Exploring the technologies of today behind machine learning programming
Over the past decade machine learning (ML) has gained significant popularity in both the technology industry as well as academia, and is still gaining more prominence today.
As a subfield of artificial intelligence (AI), the goal of machine learning is to design and train a program to perform a task as efficiently as possible.
Machine learning programming is a big deal for the prospects of artificial intelligence- it is bringing new capabilities to software we use every day, especially in smartphones.
There are now many applications of machine learning in the consumer space that we frequently use that did not exist 10 years ago – that is quite hard to believe considering our reliance on these applications today.
Machine Learning vs Deep Learning
This article will focus on the most prominent machine learning languages and frameworks of today, and it is important to understand the difference between the two.
If you were tasked with engineering an ML application today, you will most likely opt for a machine learning framework such as PyTorch or Tensorflow (or others we will cover further down) – standardised APIs with tooling for a range of machine learning algorithms and support in general.
The codebases of these frameworks are implemented in multiple programming languages, and are made available for multiple programming languages via language bindings.
ML frameworks have become industry standard tools. However, a range of programming languages have machine language centric tools and libraries also.
Each with their own strengths, some were used extensively before the aforementioned frameworks came into existence.
Programming Languages for Machine Learning
Let’s now discuss which programming languages you should be aware of today for your machine learning and greater artificial intelligence endeavours.
If you are looking for the best programming languages to learn this year with the latest trends and use cases, check out our published article: The Best Programming Languages to Learn in 2021.
This piece focuses on the entire field rather than just machine learning programming languages.
Python
Python takes the top spot as the most popular machine learning language. The language is now the 2nd most popular programming language period, overtaking Java in 2020.
The rise of machine learning has undoubtedly boosted Python’s standing as the number 2 spot, only behind JavaScript.
If you search for machine learning on Github, over 60% of returned code repositories are written in Python.
An even larger amount of results are Jupyter Notebooks, of which Python is the most widely used programming language of – it is rather safe to assume Python is widely adopted by the data scientist and machine learning engineers.
What has supported Python in its data science dominance is a range of optimised libraries such as NumPy (the number 1 scientific computing package focused on linear algebra), Scikit Learn (algorithms and tools for machine learning tasks), Matplot Lib (plotting and data visualisation), and Pandas (data analysis and manipulation).
On top of these general purpose tools, major frameworks in the form of Tensorflow, PyTorch and Keras have their primary implementations written for Python. Both Tensorflow and PyTorch do great jobs at managing data structures, feeding that data into highly configurable models, training and testing.
Tensorflow has the edge in production deployment, whereas PyTorch is easier to use for experimentation and research purposes – but both are cutting edge machine learning libraries the reader should be aware of.
These frameworks are specifically developed for deep neural networks designed for a range of tasks including price prediction, image classification, NLP (Natural Language Processing), transfer learning tasks, and many more.
The large developer community surrounding Python regularly updates this ecosystem of tools, both from individual developers to major corporations (Google maintains TensorFlow whereas Facebook’s machine learning engineers maintain PyTorch).
Python still has strong momentum – it is very reasonable to expect the programming language to maintain its top spot for machine learning tasks for the foreseeable future.
Get Familiar with Critical Machine Learning Python Tools
- Jupyter Notebook and Google Colab: Google Colab hosts Jupyter notebooks and provides GPU support.
- Tensorflow tutorials and PyTorch tutorials
JavaScript
JavaScript opens up machine learning to the web developer and the web browser.
It is not the most optimised programming language for machine learning tasks, but this is counterbalanced with a number of useful characteristics such as:
- Giving web developers the ability to deliver machine learning models to anyone in a web browser. Models can both be trained and used in this way.
- Data visualization and analysis tools are also very effective in JavaScript, whereby HTML5 and CSS3 can be leveraged to display real-time, interactive, detailed graphs in the browser.
- The Node.js runtime can be leveraged server-side to run backend services for a machine learning model, offering a complete machine learning solution as a web service.
The standout library is Tensorflow JS, allowing the machine learning engineer to develop ML models directly in the browser or with Node.js. There are a range of tutorials and open source models freely accessible online.
What is impressive is that this TensorFlow API closely resembles that of the primary Python implementation, giving JavaScript developers the framework without compromises.
For analytics and data visualization, The Visor and Surfaces API provides the tools to output data into graphs, otherwise known as UI helpers.
Other packages in the JavaScript ecosystem complement machine learning. The Node.js file system support makes fetching data from files seamless. Promises and asynchronous designs can parallelise heavy data calculations. Popular servers such as Express can accommodate communication and network security to other web services.
Little system level experience is needed when adopting JavaScript for machine learning, a strong reason why JavaScript is as popular as it is given its shortcomings in performance.
More on JavaScript Machine Learning Libraries
- TensorFlow JS home page
- ML5JS: A library built on top of Tensorflow JS offering more beginner friendly APIs.
R
The R programming language was specifically designed for statistical computing and graphics, so you might expect R to be a very good option amongst machine learning languages – and you’d be right.
R is widely used among statisticians and data miners for developing statistical software and data analysis tools. Because of this, artificial intelligence and machine learning applications are well suited with R.
R is an open source, multi-platform project that comes with packages out of the box for a range of ML tasks. It was the undisputed programming language of choice for researchers and data science practitioners until Python rose to prominence in the field.
R’s CRAN archive contains many useful tools for machine learning engineers. CARET for example integrates training and prediction of a model. randomForest handles classification and regression tasks via the random forest algorithm. e1071 implements Support Vector Machines (SVG), shortest path computation, and more. The list goes on. And yes, the neturalnet package implements neural networks and trains them with back propagation.
R is used in both industry and academia, and therefore may be the programming language of your future institution or employer if you undertake a career in machine learning.
However, R syntax requires experience in highly expressive languages, which leads to a challenge to read and understand ML algorithms coded in the language.
Useful R Resources
- RDocumentation: Browse the archive of R packages
- An Introduction to R (PDF)
- Hands-on machine learning with R
C++
C++ is an interesting language for machine learning as it plays multiple roles in the field:
- It provides core functionality to frameworks designed for other programming languages
- It hosts its own ML libraries for native C++ development for ML problems
Being so robust, these roles make C++ a valuable language for machine learning engineers and data scientists.
Turning our attention to point 1, we can see that Tensorflow’s codebase actually consists of 61.5% of C++ at the time of writing. This makes sense; C++ is a very general purpose programming language and has access to low level system resources, memory management and so on.
These are valuable traits where highly optimised calculations decrease required resources. This is partly why C++ is popular in the machine learning space.
There are notable libraries in C++ for machine learning, the most popular of which currently being MLPack, a fast and flexible machine learning library (would we expect anything less for a C++ implementation?). It hosts a range of machine learning utilities, not dissimilar to how R’s CRAN provides R machine learning tools out of the box.
If you are already accustomed to C++ MLPack is a good place to start experimenting. MLPack’s documentation is strong and the repository is regularly updated.
Importantly, MLPack also includes language bindings for R, Python, Julia and Go, allowing machine learning developers from these languages take advantage of the speed, robustness and efficiency of MLPack.
All these languages are therefore viable languages for machine learning, although Go has still not had a significant uptake from machine learning specialists.
That is very interesting for the future prospects of MLPack as it gives larger ML communities (such as those on Python and R) the ability to test drive, give feedback and ultimately expand MLPack.
Delve deeper into MLPack
Other Machine Learning Programming Languages
We have now covered the main players in the machine learning space of 2021, but there are other languages well suited for ML that are worth mentioning.
If you are accustomed to these languages, they also make for a good pathway into machine learning development.
Julia
Julia is another one of those languages that are well suited for machine learning algorithms (with machine learning, data science and visualization being primary use cases advertised on their homepage).
Julia code compiles to native binary code for multiple platforms with a focus on performance – again, these are very strong traits for ML.
MLJ.il is the flagship machine learning library for Julia, with many surrounding tools and utilities around it.
Scala (and Java)
Scala is gaining momentum among machine learning engineers and data analysts, mostly because of its tight integration with Apache Spark – a valuable tool for data scientists.
Apache Spark is a powerful data analytics engine for large scale data processing that is actually written in Scala. Spark actually comes with its own machine learning library, namely MLlib, that comes with a host of tools for common machine learning algorithms.
Note that this package support is a common theme for all the languages we have discussed. Rarely do programmers need to implement well-known statistical techniques on their own. A programming language should provide APIs through battle-tested libraries if they are to compete in the ML space, and Scala is no exception.
This is impressive support, but what makes Spark even more impressive is that MLlib works with Java, Scala, Python, and R too. This opens the doors for the huge number of Java developers (the now number 3 programming language by usage, behind Python) to scalable machine learning applications. This also enables Apache Spark to be used alongside the other popular frameworks we’ve discussed.
As briefly touched on above, MLlib works with Java, and as Scala is interoperable with Java, Java’s – and therefore Sun Microsystems’ – prospects in big data and ML dramatically increase by providing Java engineers and entry point into data management with Spark’s parallel processing, streaming and SQL support, and more, courtesy of Scala.
With both languages running on Java Virtual Machine (JVM), Java code can live alongside Scala code and vice-versa. This tightly couples Java with Scala.
Other Machine Learning Frameworks
While we’re talking of Apache, they also have their own deep learning framework called MXNet, that is competing against the two big players Tensorflow and PyTorch.
It is in large part written in C++ with support for a range of languages including Python (primary implementation), R, Julia, Scala, Go, and Javascript.
MXNet has great performance in a range of ML tasks, albeit a much steeper learning curve than the aforementioned frameworks.
Keras is another popular machine learning framework for Python with a focus on simplicity, making it popular for newcomers to the field.
You may also come across a library by Microsoft called the Microsoft Cognitive Toolkit, or CNTK, on your ML endeavours, but the library is now deprecated and therefore not recommended to use.
Summary
That brings us to the end of this overview of the machine learning landscape, hopefully identifying the best language for machine learning specifically for you.
The programming languages mentioned in this piece are all relevant in the discipline of data science, the encompassing field from which machine learning originated.
There are a wealth of machine learning projects with their corresponding repositories on GitHub to browse through and get familiar with in this rapidly growing economy.
The reader is encouraged to adopt the more widely adopted libraries and frameworks as they will be more prevalent in industry, but any of the languages mentioned here are effective pathways into learning machine learning.
Further Research into Developer Trends
If you would like to delve deeper into the current state of the overall developer landscape, the current trends, and more, we have included the following prominent developer surveys for further research: