There is a revolution happening right now in computing. Computers are becoming capable of many tasks that were previously considered only achievable by humans. As an example, back around 2011, if you asked an expert if a computer could tell the difference between a picture of a cat and a dog, they would probably tell you that it’s a hard problem. They are both furry creatures of varying colors that can have pictures taken from so many angles and in so many ways. How could a computer possibly figure this out? Today, it’s safe to say that this problem has been solved. And a whole lot of other challenging problems have been solved along with it.
The driving force behind these advancements is a field called machine learning. Machine learning means a computer learns from examples rather than following strict rules that a programmer wrote. In particular, a family of algorithms known as neural networks, deep neural networks, or deep learning has driven most of the recent progress. Neural networks borrow some ideas from biology in an effort to mimic the way a human brain works. Deep neural networks and deep learning build on the basic neural network algorithms in a way that lets them learn higher-level concepts.
Let’s look at one example called the ImageNet challenge. ImageNet is a collection of images that are all tagged with a word describing what is in the image. Every year there is a challenge where teams compete to have their computer programs recognize these images. The scoring works like this: for each of many images, the computer has to guess which of 1000 categories the image belongs to, covering things like different dog breeds, plants, and buildings. The computer gets 5 guesses per image, and if none of them is correct, it is considered to have failed that image. In 2011, the error rate of the best program was about 26%. In 2012, a deep learning approach won the challenge for the first time. Since then, the error rate has been almost cut in half every year. At the time of this writing, the error rate is 3.08%. This looks even more impressive when you compare it with the human score for this challenge. One person tried the challenge himself, so there would be a reference for human performance. He got 5.1% error. So it’s safe to say that computers are pretty good at image recognition now, and this is something they have been historically bad at.
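To make the scoring rule concrete, here is a minimal sketch of how a "5 guesses per image" error rate could be computed. The function name, the toy guesses, and the labels are all made up for illustration; this is not the challenge's official scoring code.

```python
def top5_error(ranked_guesses, true_labels, k=5):
    """Fraction of images whose true label is missing from the first k guesses."""
    if not true_labels or len(ranked_guesses) != len(true_labels):
        raise ValueError("need one non-empty guess list per label")
    misses = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical toy data: three images, guesses ordered from most to least confident.
guesses = [
    ["tabby cat", "tiger cat", "lynx", "fox", "beagle"],  # true label is 3rd guess
    ["oak", "maple", "palm", "church", "barn"],           # true label never guessed
    ["beagle", "basset", "foxhound", "whippet", "pug"],   # true label is 1st guess
]
labels = ["lynx", "castle", "beagle"]

error = top5_error(guesses, labels)
print(f"top-5 error: {error:.1%}")  # one miss out of three images
```

Note that the first image counts as a success even though the correct answer was only the third guess. That is why this kind of score is more forgiving than asking for a single answer.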
Before getting to other examples and applications of neural nets, I’d like to explain a little about how the image recognition works.
Machine learning differs from other ways of programming a computer because it learns from examples. Usually, when you program a computer, you give it exact rules to follow. As an example, if you want to make software to recognize an image of a tree, you could write a program that says, “If the image is green on top and brown on the bottom, then it’s a tree.” That fails pretty quickly, though, with different kinds of trees, different lighting, and of course the fall, when leaves turn red. You can solve that by writing more rules about what makes a tree a tree, but you quickly realize you’re fighting a losing battle. The machine learning approach is to show a computer program thousands or millions of images of trees and have it automatically learn the patterns that distinguish them.
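The contrast between the two approaches can be sketched in a few lines. Everything here is an assumption made for illustration: each image is reduced to three invented numbers (how green the top is, how red the top is, how brown the bottom is), and a simple nearest-neighbor lookup stands in for the learning step. Real systems work on raw pixels with neural networks and far more examples.

```python
import math


def rule_based_is_tree(features):
    """Hand-written rule: green on top and brown on the bottom."""
    green_top, red_top, brown_bottom = features
    return green_top > 0.5 and brown_bottom > 0.5


def nearest_neighbor_is_tree(features, examples):
    """Learned from examples: copy the label of the most similar known image."""
    closest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return closest[1]


# Hypothetical labeled examples: (green_top, red_top, brown_bottom) -> is it a tree?
examples = [
    ((0.9, 0.0, 0.8), True),   # summer tree
    ((0.8, 0.1, 0.7), True),   # summer tree
    ((0.1, 0.8, 0.8), True),   # autumn tree
    ((0.2, 0.7, 0.7), True),   # autumn tree
    ((0.9, 0.0, 0.1), False),  # grassy field
    ((0.0, 0.9, 0.1), False),  # red car
    ((0.1, 0.1, 0.1), False),  # sky
]

autumn_tree = (0.15, 0.75, 0.75)  # a new image with red leaves
print(rule_based_is_tree(autumn_tree))                  # False: the rule misses it
print(nearest_neighbor_is_tree(autumn_tree, examples))  # True: similar examples exist
```

Fixing the rule-based version means writing another rule for every new case. Fixing the learned version means adding more labeled examples, and no code changes.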
More recently, many of the techniques that have been gaining traction are called “deep learning.” Deep learning is machine learning, but instead of looking for simple patterns, it is able to look for patterns-of-patterns, or patterns-of-patterns-of-patterns, and so on. By doing that, the deep learning system can start to understand higher-level concepts. In the case of image recognition, it will start by recognizing simple patterns like edges. From there, it will look for patterns-of-patterns – things that you can make from the simple edges, like corners or circles. From there, it can start recognizing higher-level concepts, like if it sees a car, maybe it can start to put together the headlights or wheels from those edges and circles. And then finally, it can put all the patterns that make up car pieces into a whole car. Each one of these stages of finding patterns is called a “layer” of the neural network and it’s the fact that these systems use many layers that is the reason this is called “deep.” The “learning” part of “deep learning” is because all of these patterns that the computer looks for are learned from examples, not from manually designed rules.
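The idea of layers finding patterns-of-patterns can be sketched with a toy two-layer example. The first layer looks for vertical and horizontal edges in a tiny image, and the second layer combines those edge maps to find a corner. The filter values and the bias are set by hand here purely for illustration; in a real deep learning system they would be learned from examples, and there would be many more filters and layers.

```python
def relu(x):
    """Keep positive responses, drop negative ones."""
    return max(0, x)


def filter_layer(image, kernel):
    """Layer 1: slide a small kernel over a 2-D grid, then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            relu(sum(kernel[i][j] * image[r + i][c + j]
                     for i in range(kh) for j in range(kw)))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def corner_layer(vertical, horizontal, bias=-3):
    """Layer 2: fire where a vertical edge just below meets a horizontal edge
    just to the right, which is what a top-left corner looks like."""
    return [
        [relu(vertical[r + 1][c] + horizontal[r][c + 1] + bias)
         for c in range(len(vertical[0]) - 1)]
        for r in range(len(vertical) - 1)
    ]


# A 5x5 image: dark background (0) with a bright square (1) in the bottom right.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-set filters (assumed values): respond to dark-to-bright transitions.
VERTICAL_EDGE = [[-1, 1], [-1, 1]]    # dark on the left, bright on the right
HORIZONTAL_EDGE = [[-1, -1], [1, 1]]  # dark above, bright below

vertical = filter_layer(image, VERTICAL_EDGE)
horizontal = filter_layer(image, HORIZONTAL_EDGE)
corners = corner_layer(vertical, horizontal)

for row in corners:
    print(row)
```

The corner map has a single nonzero entry, at the spot where the square's left edge and top edge meet. Neither edge map alone says "corner"; it only shows up once the second layer combines them.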
Image recognition is just one example of how new machine learning techniques are changing what computers can do. Machine learning is generally good at problems where computers need to understand and/or predict real-world data that is not exact. Other examples are things like speech recognition and understanding natural language. There are limitless applications of this technology and every industry on the planet will be affected by it if it hasn’t been already. In some applications, it will be easy to tell that machine learning is being used. Voice recognition on your phone primarily uses machine learning. For other applications, it will be less obvious that machine learning is involved. Better image recognition can help in the process of understanding medical images. There are companies working on medical diagnosis using the same image recognition technologies that are winning the ImageNet challenge.
My work uses deep learning to recognize images of text and translate them between different languages in real time on a phone. It shows that we can now take these neural networks and run them on phones, which are much less powerful than your typical desktop or cloud computers. Even though machine learning is the underlying technology, the user of the software doesn’t necessarily know that. To them, it is an app that they can use to break down language barriers.
Convolutional Neural Network (CNN): This type of neural network is very good at image recognition. The ImageNet challenge winners tend to use variations of this algorithm.
Long short-term memory (LSTM): This type of neural network is good at understanding or predicting sequences of data. Things like speech or natural language will often be handled by LSTMs.
Deep learning / deep neural networks: Usually when people talk about “deep” learning, they are talking about CNNs or LSTMs. These networks recognize patterns and then feed those patterns into another stage of the neural network that then recognizes patterns of patterns. This process can be repeated many times to learn higher-level concepts.
In the past (80s / 90s), neural nets were hyped up and then didn’t live up to the hype. This time things are different. Even if progress in the field were to stop this instant, the progress we have seen so far would still be quite significant and game changing for many industries. But it’s not stopping. Every month exciting new research is released that pushes the boundaries and has people rethinking what was considered cutting edge last month.
The initial spark for this recent progress came primarily from computers getting faster and from access to more data. Now so much research effort is directed at this problem that we are also seeing many significant algorithmic improvements. So let’s look at each of these three things (performance, data, and algorithms):
1. Computer performance improvements overall have slowed a bit in recent years, but there are companies making hardware specifically designed for neural networks. So performance will continue to improve for neural networks, allowing for more capable machine learning systems and allowing complex applications to run on lower power processors like those found in phones.
2. More and more things are going online around the world and with that comes more data. Data quantity, quality, and diversity will continue to improve. This data can then be used to train machine learning systems.
3. More attention is being paid to the field of machine learning, and with that comes more research and more investment in companies in the space. There will continue to be algorithmic progress.
Progress in machine learning is not slowing down. There are applications of this technology that will deeply affect every industry. This will be a revolution as big as, or bigger than, personal computers, the internet, or mobile phones. Machine learning is the next underlying technology.