Come on, people: let’s get our act together on deep learning. I’ve been studying and writing about DL for close to two years now, and it still amazes me how much misinformation surrounds this relatively complex learning algorithm.

This post is not about whether deep learning is over-hyped, as that is a well-documented debate. Rather, it’s a jumping-off point for a (hopefully) fresh, concise understanding of deep learning and its implications. This discussion/rant is somewhat off the cuff, but the whole point is to encourage those of us in the machine learning community to think clearly about deep learning. Let’s be bold and try to make some claims, based on actual science, about whether or not this technology will produce artificial intelligence. After all, aren’t we supposed to be the leaders in this field and the few who understand its intricacies and implications? With all of the news on artificial intelligence breakthroughs and non-industry commentators drawing rash conclusions about how deep learning will change the world, don’t we owe it to the world to at least have our shit together? It feels like most of us are just sitting around waiting for others to figure that out for us.

[Note added 05/05/16: Keep in mind that this is a blog post, not an academic paper. My goal was to express my thoughts and inspire some discussion about how we should contextualize deep learning, not to lay out a deeply technical argument. Obviously, a discussion of that magnitude could not be achieved in a few hundred words, and this post is aimed at the machine learning layman anyway. I leave that technical discussion as an exercise for the reader (feel free to email me). One article cannot be all things to all people.]

The Problem

Even the most academic among us mistakenly merge two very different schools of thought in our discussions on deep learning:

  1. The benefits of neural networks over other learning algorithms.
  2. The benefits of a “deep” neural network architecture over a “shallow” architecture.

Much of the ongoing debate is, surprisingly, still concerned with the first point instead of the second. Let’s be clear: the inspiration for, benefits of, and drawbacks of neural networks are all well documented in the literature. Why are we still talking about this as if the discussion were new? Nothing is more frustrating when discussing deep learning than someone explaining that deep neural networks are “modeled after how the human brain works” (much less true than the name suggests) and thus are “the key to unlocking true artificial intelligence”. This is an obvious straw man, since the same argument was made when plain old neural networks were introduced.

The idea I’d like you to take away here is that we are not asking the right question for the answer we want. If we want to know how to contextualize deep neural networks in an increasingly artificially intelligent world, we must answer the following question: what do increasing computing power and adding layers to a neural network actually allow us to do better than a normal neural network? Answering this could yield a truly fruitful discussion on deep learning.

The Answer (?)

Here is my personal answer to that question: deep neural networks are more useful than traditional neural networks for two reasons:

  1. The automatic encoding of features which previously had to be hand engineered.
  2. The exploitation of structurally/spatially associated features.

At the risk of sounding bold, that’s it — if you believe there is another benefit which is not somehow encompassed by these two traits, please let me know. These are the only two that I have come across in all my time working with deep learning.
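To make the second point a little more concrete, here is a minimal sketch in plain Python, not taken from any particular library. It assumes a hypothetical hand-picked edge kernel (`EDGE_KERNEL`) where a trained network would learn its weights, so it illustrates only the spatial-exploitation idea, not automatic feature encoding. One small set of shared weights slides over the input, so the same pattern is detected wherever it occurs, and rearranging the very same values changes the response.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: slide one shared kernel over every position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical edge detector (an assumption for illustration): it responds
# to a step from a low value to a high value. A trained convolutional
# layer would learn weights like these instead of having them written by hand.
EDGE_KERNEL = [-1.0, 1.0]

def strongest_response(signal, kernel=EDGE_KERNEL):
    """Return (position, value) of the largest filter response."""
    responses = conv1d(signal, kernel)
    position = max(range(len(responses)), key=responses.__getitem__)
    return position, responses[position]

# The same "edge" placed at two different locations.
left_edge = [0, 0, 1, 1, 1, 1, 1, 1]
right_edge = [0, 0, 0, 0, 0, 1, 1, 1]

left_pos, left_val = strongest_response(left_edge)
right_pos, right_val = strongest_response(right_edge)

# Same values as left_edge, but interleaved so neighbours are no longer
# neighbours. The filter now reports two rising edges instead of one.
scrambled = left_edge[::2] + left_edge[1::2]
scrambled_edges = conv1d(scrambled, EDGE_KERNEL).count(1.0)

print(left_pos, left_val)    # edge found near the start
print(right_pos, right_val)  # same strength, different position
print(scrambled_edges)       # response depends on spatial arrangement
```

The design choice worth noticing is the weight sharing: the kernel has two weights regardless of input length, whereas a fully connected layer mapping these 8 inputs to 7 outputs would need 56 weights and would have to learn the edge separately at every position.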

[Note added 05/05/16: In the responses, the most commonly suggested third benefit is the configurability of layers within a model. Although this is quite true, it’s not new, nor is it philosophically unique to deep learning. Fundamentally, this is just a more frictionless version of pipelining, which we have been doing for a while. I have not (yet) heard a good argument against this proof by decomposition.]

If this were true, what would we expect to see in the academic landscape? We might expect that deep neural networks would be useful in situations where the data has some spatial qualities that can be exploited, such as image data, audio data, natural language text, etc. Although we might say there are many areas that could benefit from that spatial exploitation, we would certainly not find that this algorithm was a magical cure for any data that you throw at it. The words “deep learning” will not magically cure cancer (unless you find some way to spatially exploit data associated with cancer, as has been done with the human genome), and there is no reason to believe it will start thinking and suddenly become sentient. We might see self-driving cars that assist in simply keeping the car between the lines, but not one that can decide whether to protect its own driver or the pedestrian crossing the street. Hell, anyone who actually reads the papers on AlphaGo will realize that deep learning was simply a tool used by traditional AI algorithms. Lastly, we might find, once again, that the golden mean is generally spot on: deep learning is not the answer to all machine learning problems, but it is also not completely baseless.

Since I am feeling especially bold, I will make another prediction: deep learning will not produce the universal algorithm. There is simply not enough there to create such a complex system. However, deep learning is an extremely useful tool. Where will it be most useful in AI? I predict it will be as a sensory learning system (vision, audio, etc.) that exploits spatial characteristics in data that would otherwise go unaccounted for, and which, as in AlphaGo, a truly artificially intelligent system must use as an input.

Stop studying deep learning thinking it will lay all other algorithms to waste no matter the scenario. Stop throwing deep learning at every dataset you see. Start experimenting with these technologies outside of the “hello world” examples in the packages you use — you will quickly learn what they are actually useful for. Most of all, let’s stop viewing deep learning as the “almost there!!!” universal algorithm and start viewing it for what it truly is: a tool that is useful in assisting a computer’s ability to perceive.