Machine Learning with Clojure: Benefits and Perspectives

May 14, 2024
5 min read

When you start a machine learning (ML) project and select the technology stack, you expect it to provide data immutability, great parallel programming features, and excellent data manipulation and encoding capabilities. That’s exactly what Clojure brings to the table. Although it’s not the most popular choice for ML projects, its functional paradigm and the entire Java ecosystem behind it can make it a game-changer.

Why use Clojure for machine learning? 

As someone who has never truly worked with machine learning, I initially found stepping into this field from the Clojure side pretty confusing. There are so many different frameworks, collections of Java libraries, and Clojure wrappers for them that you don't know where to start, and there are not many resources out there to help you.

So, why would we want to use Clojure for data science stuff in the first place? Let’s dive deeper.

Functional paradigm

Leveraging Clojure's functional paradigm for machine learning pipelines facilitates modularization. With this approach, each step (like data preprocessing, feature extraction, or model training) is expressed within a pure function, which makes the code more readable, maintainable, and testable. This paradigm ensures that functions produce the same output for a given input and avoids side effects, making code more predictable.
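To make the idea more concrete, here is a minimal sketch of such a pipeline written with nothing but core Clojure. The data, the function names (`preprocess`, `extract-features`, `train`), and the toy "model" (mean text length per label) are all assumptions made up for illustration; a real project would plug in an actual ML library at the training step.

```clojure
(ns example.pipeline
  (:require [clojure.string :as str]))

;; Hypothetical raw records; keys and values are invented for this sketch.
(def raw-rows
  [{:text " Great product " :label "pos"}
   {:text "terrible"        :label "neg"}
   {:text ""                :label "pos"}])

(defn preprocess
  "Trims and lower-cases the text, then drops rows whose text is blank."
  [rows]
  (->> rows
       (map #(update % :text (comp str/lower-case str/trim)))
       (remove #(str/blank? (:text %)))))

(defn extract-features
  "Adds a toy numeric feature: the character count of the text."
  [rows]
  (map #(assoc % :length (count (:text %))) rows))

(defn train
  "Stand-in for model training: mean text length per label."
  [rows]
  (into {}
        (map (fn [[label group]]
               [label (/ (reduce + (map :length group)) (count group))]))
        (group-by :label rows)))

;; The pipeline is just function composition over immutable data.
(def model
  (-> raw-rows preprocess extract-features train))
;; model is equal to {"pos" 13, "neg" 8}
```

Because every step is a pure function, each one can be tested in isolation at the REPL, and `raw-rows` is left untouched no matter how many times the pipeline runs.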

Moreover, functional programming concepts can be applied to manipulate data efficiently, enabling streamlined processing of large datasets. This can make Clojure a practical option for large datasets, although how it compares with Python or R depends on the workload and the libraries involved.

JVM backbone 

One of the most notable reasons for choosing Clojure as a primary tool for ML is probably the speed and portability provided by the Java Virtual Machine (JVM).

The JVM's portability across different operating systems and hardware architectures is a significant advantage for Clojure. ML models developed in Clojure can run on any platform that supports the JVM without modification, ensuring consistency and ease of deployment across various environments. I’m sure you will value this point if your project requires deploying ML models across different systems or cloud platforms.

Another benefit of the JVM is its Just-In-Time compilation and runtime optimizations, which contribute to the performance of Clojure applications. This means that, while Clojure might not always be as fast as lower-level languages like C or C++, it can still achieve competitive performance for many ML tasks, especially when leveraging optimized Java libraries.

It’s worth mentioning some performance considerations for machine learning workflows in Clojure. Clojure's ability to leverage multicore processors and distributed computing frameworks like Apache Spark enhances scalability and performance for parallel and distributed ML tasks. If performance is still a concern, Clojure code can be tuned further with persistent data structures, lazy evaluation, and concurrency primitives like futures and promises.
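As a rough illustration of the primitives mentioned above, the sketch below uses only core Clojure: `pmap` for parallel mapping, `future` for background work, and a lazy sequence. The `score` function and its 50 ms delay are hypothetical stand-ins for an expensive computation, such as evaluating one hyperparameter setting.

```clojure
(ns example.parallel)

(defn score
  "Hypothetical stand-in for an expensive, pure computation."
  [x]
  (Thread/sleep 50) ; simulate work
  (* x x))

;; pmap applies `score` to the elements in parallel and preserves input order.
(def parallel-scores (doall (pmap score [1 2 3 4])))

;; A future runs its body on another thread; deref (@) blocks until it is done.
(def background (future (reduce + (map score [5 6]))))

;; Lazy evaluation: the infinite sequence is never fully realized,
;; because `take` only pulls the elements it needs.
(def first-squares (take 3 (map #(* % %) (range))))

(println parallel-scores @background first-squares)
;; prints: (1 4 9 16) 61 (0 1 4)

;; Lets a standalone script exit promptly after using futures or pmap.
(shutdown-agents)
```

Whether `pmap` actually speeds things up depends on how expensive each call is; for very cheap functions the coordination overhead can outweigh the gain.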

Wide range of ML libraries and tools 

It seems that the best entry point to get your project off the ground without too much complication is scicloj.ml. It's a library collection that includes various ML models, pipelines, and data transformation features gathered in one place. Scicloj.ml is a popular, well-documented framework with a relatively big and active community.

There’s another promising new framework, noj, that is actively developed. It aims to replace scicloj.ml and be the entry point into the Clojure data science world. At the time of writing, noj is still a work-in-progress library in the early stages of development, but the community seems to have really high hopes for it.

If you want more control over your stack, here are even more libraries you might be interested in:

  • The two most popular Java libraries for machine learning are Tribuo and Smile. Both provide APIs for classification, regression, clustering, anomaly detection, data visualization, and so on. scicloj.ml uses Smile, yet Smile 2.x is being phased out of the main Clojure ML libraries, and Smile 3.x is avoided due to GPL licensing. Tribuo seems to be more feature-rich and comprehensive, and appears to be the community's preferred library for ML workflows.
  • When it comes to Clojure wrappers for these libraries, there are Fastmath and scicloj.ml.smile for the Smile library and scicloj.ml.tribuo for Tribuo. Even though scicloj.ml.tribuo is a pretty young library, it's likely to become the main source for ML algorithms. There is also tech.ml.dataset, which focuses on efficient storage and processing of individual datasets.
  • For composing ML pipelines, there is the metamorph library. It lets you express any data transformation and machine learning pipeline as a simple sequence of pure functions. metamorph.ml builds upon metamorph and allows developers to encapsulate the variable components of the pipelines and efficiently tune them as separate entities.
  • Also, Clojure has fantastic tools for literate programming like Clerk or Clay, ensuring more maintainable codebases and better collaboration among team members.

Navigating the landscape of machine learning frameworks and libraries in the Clojure ecosystem can be both exciting and challenging. Yet there are ML development tools to suit most projects and teams, helping you keep your application stable and reliable.

Integration with popular tools 

Clojure's Java interop capabilities enable seamless integration with popular machine learning libraries. Developers can leverage existing Java libraries or create custom Java wrappers to interface with TensorFlow and PyTorch from Clojure code.
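For readers new to Java interop, here is a minimal sketch of what it looks like in practice. It deliberately uses only classes that ship with the JDK (`java.util.DoubleSummaryStatistics` and `java.lang.Math`) so that it runs without extra dependencies; the feature values are made up. Calling into a Java ML library follows the same pattern of importing a class, constructing it, and invoking its methods.

```clojure
(ns example.interop
  (:import (java.util DoubleSummaryStatistics)))

;; Hypothetical feature values, invented for this sketch.
(def feature-values [1.0 2.0 3.0 4.0])

;; `(ClassName.)` calls a Java constructor.
(def stats (DoubleSummaryStatistics.))

;; `(.method object args)` calls an instance method; here each value
;; is fed into the (mutable) Java accumulator.
(doseq [x feature-values]
  (.accept stats (double x)))

(def summary
  {:count (.getCount stats)
   :mean  (.getAverage stats)
   :max   (.getMax stats)})

;; `(Class/method args)` calls a static method.
(def root (Math/sqrt 16.0))

(println summary root)
```

The Java object here is mutable, unlike Clojure's own data structures, so it is usually a good idea to keep such objects local and convert the results back into plain Clojure maps, as `summary` does.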

By integrating with these libraries, Clojure developers can take advantage of their extensive capabilities for deep learning tasks such as image recognition, natural language processing, and sequence modeling.

When you start exploring Clojure's support for deep learning frameworks, you realize there are quite a number of tools for handling your tasks.

For example, there are libraries providing Python bindings, numerical computing, and deep learning wrappers for Clojure:

  • Libpython-clj is the go-to library for Python interop and allows you to call Python code directly from Clojure pretty easily;
  • Sklearn-clj is another tool that provides access to estimators and models from the Python library scikit-learn by using the above-mentioned libpython-clj;
  • Neanderthal is a high-performance matrix and linear algebra library for Clojure, providing the fast numerical computing that deep learning model training and inference depend on;
  • DL4CLJ is a Clojure wrapper for Deeplearning4j (DL4J), allowing developers to build and train deep learning models on the JVM.

Integrating Clojure with popular machine learning libraries like TensorFlow and PyTorch lets you cover most ML tasks your project might require. It also makes Clojure a better fit for supporting and scaling projects that started out on a different technology stack.

Supportive community

Clojure's data-science community deserves special attention. It is pretty active and always ready to help, as is the case for the Clojure community in general. The members include the creators of many tools you might want to use, so your development team can find support and answers to any questions, ensuring there are no blockers on their way. 

The majority of discussions happen in the Clojurians Zulip chat, specifically in the #data-science stream, where you can ask for help if you run into problems.

Future of machine learning with Clojure

Clojure's current achievements in this field are far from final, as the language community keeps expanding its capabilities. As the field of machine learning continues to evolve, Clojure is likely to play a significant role in enabling developers to build scalable, maintainable, and efficient machine learning solutions.

The future of Clojure's machine learning ecosystem may bring enhancements to existing libraries, development of new machine learning algorithms, improvements in tooling for model deployment and monitoring, and even more wrappers and integrations with key technologies.

Real-world examples of Machine Learning in Clojure 

Even though Clojure is not yet the most popular choice for ML development, examples of its use can already be found in various industries and domains, including e-commerce, healthcare, and manufacturing. They include personalized recommendation systems for e-commerce platforms, fraud detection algorithms for financial services, disease diagnosis models for healthcare applications, and predictive maintenance systems for industrial equipment. Still, financial services remain the most common domain for Clojure development.

Still have some doubts? Let’s explore a couple of examples: 

  • Funding Circle, a peer-to-peer lending marketplace, has been known to use Clojure in its backend systems;
  • Nubank, a Brazilian fintech company, has embraced Clojure for various data-related tasks;
  • LinguaSys, a natural language processing company, used Clojure to develop its software solutions, including natural language processing algorithms;
  • Kira, a machine learning startup whose software supports smart and transparent work with contracts and documents, relies on Clojure for the ML part.

Considering Clojure's benefits, like expressive syntax, functional programming features, and interoperability with existing libraries, it’s no surprise that this technology is steadily winning over its audience. Developers can use Clojure to build robust and scalable machine-learning solutions to address real-world challenges and drive business value.

Summing up 

Clojure's strong data processing features, immutable data structures, and ease of interactive development and debugging make it a competitive choice for machine learning projects. The functional approach enables modularization and efficient data manipulation, while the JVM base provides portability and performance.

With a developed ecosystem of ML libraries and tools as well as seamless integration with popular frameworks, Clojure offers comprehensive capabilities for ML tasks. Moreover, its supportive community provides assistance and collaboration opportunities.

Although Clojure’s ecosystem may not be as mature as Python’s or R’s, I believe Clojure is a solid choice for machine learning and data science projects. It is growing and thriving, with many promising developments on the way. You should definitely give it a try.
