We know that computers are better than people at crunching series of numbers, but what about tasks that are more complex? How do you teach a computer what a cat looks like? Or how to drive a car? Or how to play a complex strategy game? Or make predictions about the stock market? These are some of the most difficult tasks in artificial intelligence, far outstripping the capabilities of normal machine learning techniques. In these cases, computer scientists turn to neural networks.
What sets neural networks apart from other machine learning algorithms is that they make use of an architecture inspired by the neurons in the human brain. These networks turn out to be well-suited to modeling high-level abstractions across a wide array of disciplines and industries.
In this article, we’re going to try to cut through the buzzwords and look at what neural networks are, how they’re different from other machine learning algorithms, and how they’re being applied today.
MACHINE LEARNING, NEURAL NETWORKS, AND DEEP LEARNING
To start, let’s define our terms. Machine learning, neural networks, and deep learning are all buzzwords right now, and they often get bandied about as though they’re the same. In reality they’re all related, but the distinctions are important.
- Machine learning is the branch of computer science that has to do with building algorithms that are guided by data. Rather than relying on human programmers to provide explicit instructions, machine learning algorithms use training sets of real-world data to infer models that are more accurate and sophisticated than humans could devise on their own.
- Within the field of machine learning, neural networks are a subset of algorithms built around a model of artificial neurons spread across three or more layers (we’ll get into the details shortly). There are plenty of other machine learning techniques that don’t rely on neural networks.
- Within neural networks, deep learning is generally used to describe particularly complex networks with many more layers than normal. The advantage of these added layers is that the networks are able to develop much greater levels of abstraction, which is necessary for certain complex tasks, like image recognition and automatic translation.
BUILDING A NEURAL NETWORK
The concepts underpinning neural networks have been around for decades, but it’s only in the last several years that the computing power has caught up. Distributed systems (like Hadoop’s MapReduce paradigm) mean that you no longer need a supercomputer to handle the massive calculations neural networks require–you can just spread the job out across clusters of commodity (read: cheap) hardware.
Neural networks are well-suited to identifying non-linear patterns–that is, patterns where there isn’t a direct, one-to-one relationship between an input and the output. Instead, the networks identify patterns between combinations of inputs and a given output. Let’s say you’re building a system to distinguish between different types of animals–dogs, lizards, and dolphins–based on the presence (or absence) of various features. In this example, the presence of four legs or warm blood alone doesn’t do a good job of predicting whether the animal in question is a dog, since the former also describes a lizard and the latter also describes a dolphin. However, the combination of four legs and warm blood is a pretty good indicator (in our example, at least) that we have a dog. Multiply the number of features and labels by a few thousand (or a few million) and you’ll have a good idea of how these networks work.
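The animal example can be sketched in a few lines of plain Python. This is a hypothetical toy dataset invented for illustration (the feature encoding and animal list are assumptions, not from any real system), showing why no single feature separates dogs from the rest but a combination does:

```python
# Hypothetical toy dataset: each animal is (has_four_legs, is_warm_blooded),
# encoded as 1 (present) or 0 (absent).
animals = {
    "dog":     (1, 1),
    "lizard":  (1, 0),  # four legs, but cold-blooded
    "dolphin": (0, 1),  # warm-blooded, but no legs
}

# Neither feature alone picks out the dog...
assert [a for a, (legs, _) in animals.items() if legs] == ["dog", "lizard"]
assert [a for a, (_, warm) in animals.items() if warm] == ["dog", "dolphin"]

# ...but the combination of both features does.
assert [a for a, (legs, warm) in animals.items() if legs and warm] == ["dog"]
```

A neural network does essentially this at scale: its hidden layer learns which combinations of inputs predict each output, without anyone spelling those combinations out by hand.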
Many media reports describe artificial neural networks as working like the human brain, but this is a bit of an oversimplification. For one, the difference in scale is tremendous: While neural networks have increased in size, they still typically contain between a few thousand and a few million neurons, compared to the 85 billion or so neurons found in a normal human brain.
The other main difference lies in how these neurons are connected. In the brain, a neuron can be connected to many other neurons around it. In a typical (feedforward) neural network, however, information flows in only one direction, from input to output. These neurons are spread across three layers:
- The input layer consists of the neurons that do nothing more than receive the data and pass it on. The number of neurons in the input layer should be equal to the number of features in your data set.
- The output layer consists of a number of nodes depending on the type of model you’re building. In a classification system, there will be one node for each type of label you might be applying, while in a regression system there will just be a single node that puts out a value.
- In between these two layers is where things get interesting. Here we have what’s called the hidden layer, which also consists of a number of neurons (a common rule of thumb ties this to the sizes of the input and output layers, though the best number is usually found by experimentation). The nodes in the hidden layer apply transformations to the inputs before passing them on. As the network is trained, the connections into and out of each node are weighted more or less heavily depending on how predictive they prove to be.
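The three layers above can be sketched as a single forward pass in NumPy. All the sizes here are arbitrary assumptions for illustration (4 input features, 5 hidden neurons, 3 output labels), and the weights are random rather than trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 4 input features, 5 hidden neurons, 3 output labels.
n_in, n_hidden, n_out = 4, 5, 3

# Randomly initialized weights and biases -- training would adjust these.
W1 = rng.standard_normal((n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(x):
    """One pass through the network: input layer -> hidden layer -> output layer."""
    hidden = np.tanh(x @ W1 + b1)   # hidden layer applies a non-linear transformation
    return hidden @ W2 + b2         # output layer: one score per possible label

x = rng.standard_normal(n_in)       # one example with 4 features
scores = forward(x)                 # scores.shape == (3,), one value per label
```

The input layer is just the vector `x` passed in unchanged; all the interesting work happens in the `tanh` transformation and the weights, which is exactly what training adjusts.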
Training the Model
One way to think of a neural network is to imagine a black box with dozens (or hundreds or millions) of knobs on the side. (This is actually how Yann LeCun, one of the pioneers of neural networks, likes to describe it.) For example, let’s say we’re trying to train a neural network to predict whether something is a picture of a cat or not. (This example is also taken from real life.) Training the model involves fiddling with those knobs until our output layer is able to correctly identify pictures of cats. Of course, when you have so many knobs, you can’t tweak them all manually–that’s where training algorithms like backpropagation come in. These algorithms automatically adjust our knobs (that is, they adjust the weights of the connections between our neurons) until the model fits the data.
An important thing to note is that, while in this example we’re adjusting our knobs until we have a great cat detector, we could also tweak them a little until we have a great dog detector, or we could adjust them even more until we’ve got a submarine detector. The point here is that these structures are very generalized, meaning the same basic structure can be trained to answer any number of questions. That’s part of what makes artificial neural networks so powerful. The key is in what weighting functions and machine learning algorithms we employ.
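The knob-turning idea can be shown with the smallest possible network: a single artificial neuron with two knobs, a weight `w` and a bias `b`. The task, data, and learning rate below are all made-up assumptions for illustration; the loop turns each knob a little in whichever direction reduces the error, which is the essence of gradient-descent training:

```python
import numpy as np

# Hypothetical toy task: output 1 when the single input is positive, else 0.
rng = np.random.default_rng(1)
X = rng.standard_normal(200)
y = (X > 0).astype(float)

w, b = 0.0, 0.0    # our two "knobs", both starting at zero
lr = 0.5           # learning rate: how far each adjustment turns the knobs

for _ in range(500):
    p = 1 / (1 + np.exp(-(w * X + b)))   # current predictions (sigmoid neuron)
    # The gradients tell us which way to turn each knob to reduce the error.
    w -= lr * np.mean((p - y) * X)
    b -= lr * np.mean(p - y)

accuracy = np.mean((p > 0.5) == y)       # close to 1.0 after training
```

Swap in a different training set of `X` and `y` and the very same loop produces a different detector, which is the generality described above: the structure stays fixed while the data determines what the knobs end up encoding.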
Tools and Skills
Neural networks are at the cutting edge of machine learning and artificial intelligence. Implementing them requires expertise in statistical analysis, distributed systems, big data processing, and related fields.
Fortunately, there are a number of different libraries available that make designing and implementing neural networks relatively easy. Here are some of the most popular options:
- scikit-learn builds on the foundational Python libraries NumPy and SciPy by adding a set of algorithms for common machine learning and data mining tasks, including support for both supervised and unsupervised neural networks. As a library, scikit-learn has a lot going for it. Its tools are well-documented and its contributors include many machine learning experts. What’s more, it’s a tightly curated library, meaning developers won’t have to choose between different versions of the same algorithm. Its power and ease of use make it popular with a lot of data-heavy startups, including Evernote, OkCupid, Spotify, and Birchbox.
- Theano is a Python machine learning library that uses NumPy-like syntax to optimize and evaluate mathematical expressions. What sets Theano apart is that it takes advantage of the computer’s GPU in order to make data-intensive calculations up to 100x faster than the CPU alone. Theano’s speed makes it especially valuable for deep learning and other computationally complex tasks.
- TensorFlow is another high-profile entrant into machine learning, developed by Google as an open-source successor to DistBelief, their previous framework for training neural networks. TensorFlow represents computations as graphs of connected nodes, which lets you quickly set up, train, and deploy artificial neural networks with large datasets. It’s what allows Google to identify objects in photos or understand spoken words in its voice-recognition app.
- Deeplearning4j is a Java-based library for implementing neural networks that’s been widely used for recommender systems, anomaly detection, and image recognition. It also comes with APIs that allow it to be used with more data-oriented languages like Scala, Python, and Clojure.
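To give a feel for how little code these libraries require, here is a minimal sketch using scikit-learn’s `MLPClassifier` (its multi-layer perceptron, i.e. feedforward neural network) on one of its built-in datasets. The hidden-layer size and other settings are arbitrary choices for illustration, not recommendations:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Built-in dataset of 8x8 handwritten-digit images; the output layer
# will have ten nodes, one per digit label.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 50 neurons (an arbitrary choice for this sketch).
clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)                 # training adjusts the weights

score = clf.score(X_test, y_test)         # accuracy on unseen images
```

All of the knob-turning described earlier happens inside `fit`; the library chooses and runs the training algorithm for you.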