I’ve been studying and writing about deep learning (DL) for a few years now, and the misinformation surrounding this relatively complex learning algorithm still amazes me. This post is not about how deep learning is or is not over-hyped; that claim is well documented elsewhere. Rather, it’s a jumping-off point for a (hopefully) fresh, concise understanding of deep learning and its implications for artificial general intelligence (AGI). I’m going to be bold and try to make some claims about the role this field of study will or will not play in the genesis of AGI. With all of the news on AI breakthroughs and non-industry commentators drawing rash conclusions about how deep learning will change the world, don’t we, as the deep learning community, owe it to the world to at least have our own camp in order?
[Note added 05/05/16: Please keep in mind that this is a blog post, not an academic paper. My goal was to express my thoughts and inspire some discussion about how we should contextualize deep learning, not to lay out a deeply technical argument. Obviously, a discussion of that magnitude could not fit in a few hundred words, and this post is aimed at the machine learning layman nonetheless. One body of text cannot be all things to all people.]
Even the most academic among us mistakenly conflate two very different questions in our discussions of deep learning:
- The benefits of neural networks over other learning algorithms.
- The benefits of a “deep” neural network architecture over a “shallow” architecture.
Much of the debate going on is surprisingly still concerned with the first point instead of the second. Let’s be clear: the inspiration for, benefits of, and drawbacks of neural networks are all well documented in the literature. Why are we still talking about this as if the discussion were new? Nothing is more frustrating when discussing deep learning than someone explaining their views on why deep neural networks are “modeled after how the human brain works” and are thus the key to unlocking artificial general intelligence. This is an obvious straw man, since essentially the same argument was made when vanilla neural networks were introduced.
The idea I’d like you to take away here is that we are not asking the right question for the answer we desire. If we want to know how to contextualize deep neural networks in an increasingly artificially intelligent world, we must answer the following question: what do increased computing power and additional layers actually allow a neural network to do better than a shallow one? Answering this could yield a fruitful discussion on deep learning.
If the first question is worn out, let’s take on the second. I believe that deep neural networks are more useful than traditional neural networks for three reasons:
- The automatic encoding of features which previously had to be hand engineered.
- The exploitation of structurally/spatially associated features.
- Configurability through the use of stackable layers.
At the risk of sounding bold, that’s largely it. These are the only three benefits that I can recall in my time working with deep learning.
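To make the third point concrete, here is a minimal sketch in plain NumPy (the layer sizes and initialization are made up for illustration) of what “stackable layers” buys you: depth and width become a configuration list you can edit, rather than a new architecture you have to design from scratch.

```python
import numpy as np

def relu(x):
    # Standard rectified-linear activation.
    return np.maximum(0.0, x)

def make_layer(n_in, n_out, rng):
    # One dense layer: a small random weight matrix and a zero bias vector.
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

def forward(x, layers):
    # Stacked layers: each layer's output is the next layer's input,
    # so "depth" is simply the length of this list.
    for w, b in layers:
        x = relu(x @ w + b)
    return x

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]  # edit this list to reconfigure depth and width
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
out = forward(rng.standard_normal((2, 8)), layers)
print(out.shape)  # (2, 4)
```

Adding a layer is one more entry in `sizes`; nothing else changes. That composability, not any single layer, is what the “deep” in deep learning refers to here.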
Assuming my statements above are true, what would we expect to see in the deep learning landscape? We might expect deep neural networks to be most useful in learning problems where the data has some spatial or structural quality that can be exploited, such as images, audio, and natural language. While many areas could benefit from that spatial exploitation, we would certainly not find that the algorithm is a magical cure for any data you throw at it. We might find that deep learning helps self-driving cars perceive their environment through visual and radar-based sensory input, but not that a network can decide whether to protect its own driver or a pedestrian crossing the street. Those who read the AlphaGo paper will note that deep learning was simply a tool used by traditional AI algorithms.
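The spatial-exploitation point can be illustrated with the core operation behind convolutional layers. The toy 1-D convolution below is a hand-rolled sketch, not any particular library’s API: one small set of weights is reused at every position, which is exactly the structural assumption that makes deep networks effective on images and audio.

```python
import numpy as np

def conv1d(signal, kernel):
    # Slide one shared kernel across the signal. The same weights are
    # applied at every position, so the layer learns a local pattern
    # once and detects it anywhere it occurs.
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

edge = np.array([-1.0, 1.0])             # a tiny edge-detecting kernel
x = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # a step up, then a step down
print(conv1d(x, edge))                   # responds only where the signal changes
```

When the input has no such local structure to share weights over, this advantage evaporates, which is one reason deep networks are not a universal cure.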
Since I am feeling especially bold today, I will make another prediction: deep learning alone will not produce artificial general intelligence. There is simply not enough there to create such a complex system. I do, however, think it reasonable to expect that deep learning will serve as a sensory processing system leveraged by more traditional artificial intelligence systems.
Stop studying deep learning thinking it will lay all other algorithms to waste. Stop throwing deep learning at every dataset you see. Start experimenting with these technologies outside of the “hello world” examples in the packages you use — you will quickly learn what they are actually useful for. Most of all, let’s stop viewing deep learning as a proof that we have almost achieved AGI and start viewing it for what it truly is: a tool that is useful in assisting a computer’s ability to perceive.