Since the early days of computing, there has always been this idea that artificial intelligence would one day change the world. We’ve seen this future depicted in countless pop culture references and by futurist thinkers for decades, yet the technology itself remained elusive. Incremental progress was mostly relegated to fringe academic circles and expendable corporate research departments.
That all changed five years ago. With the advent of modern deep learning, we’ve seen a real glimpse of this technology in action: Computers are beginning to see, hear, and talk. For the first time, AI feels tangible and within reach.
AI development today is centered around deep learning techniques like convolutional networks, recurrent networks, generative adversarial networks, reinforcement learning, capsule nets, and others. The one thing all of these have in common is that they take an enormous amount of computing power. To make real progress towards generalizing this kind of intelligence, we need to overhaul the computational systems that fuel this technology.
The 2009 discovery of the GPU as a compute device is often viewed as a critical juncture that helped usher in the Cambrian explosion around deep learning. Since then, the investment in parallel compute architectures has exploded. The excitement around Google’s TPU (Tensor Processing Unit) is a case in point, but the TPU is really just the beginning. New dedicated AI chip startups raised $1.5 billion in 2017 alone, a CB Insights spokesperson told my team. This is astonishing.
We’re already seeing new startups enter the scene, challenging incumbents like Intel, AMD, Nvidia, Microsoft, Qualcomm, Google, and IBM. Emerging companies like Graphcore, Nervana, Cerebras, Groq, Vathys, Cambricon, SambaNova Systems, and Wave Computing are some of the rising stars paving the way for a future powered by deep learning. Though these startups are certainly well funded, these are early days and we have yet to see who the winners will be and what will come of the old guard.
Nvidia brought GPUs into the mainstream as an alternative to CPUs for AI and deep learning. The company’s calculated transition from a leader in consumer gaming to an AI chip company has been nothing short of brilliant. Moves like its $3 billion investment in Volta and its push behind software such as CUDA and the cuDNN deep learning library catapulted it from a leading position to total market dominance. Last year, its stock went through the roof, CEO Jensen Huang was named Businessperson of the Year by Fortune, and it gained a reputation as the “new Intel.”
But while Nvidia may look completely different on the outside, it’s still just churning out the same kind of graphics cards it has been making for decades, and the future of GPUs as a technology for AI is uncertain. Critics argue that GPUs are packed with 20 years of cruft that is unfit for deep learning. GPUs are generic devices that can support a range of applications, including everything from physics simulations to cinematic rendering. And let’s not forget that the first use of GPUs in deep learning back in 2009 was essentially a hack.
The rise of ASICs
Companies attacking the chip market are making the case that AI will run orders of magnitude faster on specialized silicon. The most likely candidates are ASICs (application-specific integrated circuits), which can be highly optimized to perform a specific task.
If you think about chips as a progression from generic to specialized, the spectrum includes CPUs on one side, then GPUs and FPGAs in the middle, and then ASICs at the other extreme.
CPUs are very efficient at performing highly complex operations, essentially the opposite of the specific type of math that underpins deep learning training and inference. The new entrants are betting on ASICs because they can be designed at the chip level to handle a high volume of simple tasks. The chip can be dedicated to a narrow set of functions: in this case, matrix multiplication with a high degree of parallelism. Even FPGAs, which are designed to be programmable and thus slightly more generalized, are hindered by their inherent versatility.
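To make the "high volume of simple tasks" point concrete, here is a minimal sketch in Python of why matrix multiplication parallelizes so well: every cell of the output is an independent chain of multiply-and-add steps. The function names are illustrative only, and the thread pool is a stand-in for hardware parallelism. It shows that the work can be split up, not that Python threads make it faster.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """Compute one output cell as a chain of multiply-accumulate steps."""
    total = 0.0
    for a, b in zip(row, col):
        total += a * b
    return total


def matmul(A, B, workers=4):
    """Multiply A (m x k) by B (k x n), one independent task per output cell."""
    cols = list(zip(*B))  # columns of B
    # No task depends on any other, so they can run in any order or all at once.
    tasks = [(row, col) for row in A for col in cols]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(lambda task: dot(*task), tasks))
    n = len(cols)
    return [flat[i * n:(i + 1) * n] for i in range(len(A))]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)
print(C)
```

A dedicated chip can, in principle, lay out thousands of these multiply-accumulate units in silicon and keep them all busy at once, which is the bet the ASIC makers are placing.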
The performance speedup of dedicated AI chips is evident. So what does this mean for the broader technology landscape?
The future is decommoditized
GPUs are already less commoditized than CPUs, and what we’re seeing with the huge surge of investment in AI chips is that GPUs will ultimately be replaced by something even more specialized. There is a bit of irony here considering Nvidia came into existence with the premise that Intel’s x86 CPU technology was too generalized to meet the growing demand for graphics-intensive applications. This time, neither Intel nor Nvidia is going to sit on the sidelines and let startups devour this new market. The opportunity is too great.
The likely scenario is that we’ll see Nvidia and Intel continue to invest heavily in Volta and Nervana (as well as their successors). AMD has been struggling due to interoperability issues (see software section below) but will most likely come up with something usable soon. Microsoft and Google are making moves with Brainwave and the TPU, and a host of other projects; and then there are all the startups. The list seems to grow weekly, and you’d be hard-pressed to find a venture capital fund that hasn’t made a sizable bet on at least one of the players.
Another wrinkle in the chip space is edge computing, where inference is computed directly on devices as opposed to in cloud environments or company data centers. Models can be deployed directly on the edge to satisfy low-latency requirements (mobile) or make predictions on low-powered, intermittently connected devices (embedded, IoT). There have been several announcements recently about edge-based AI accelerators, such as Google’s Edge TPU.
Open questions about the future
Perhaps the most significant challenge facing any newcomer in the chip space is, surprisingly, not hardware but software. Nvidia has a stranglehold on the market with CUDA/cuDNN, software libraries that form a necessary abstraction layer on top of the chip, so that frameworks like TensorFlow and PyTorch can run on it without their developers having to write complex, low-level instructions. Without these high-level libraries, chips in general are difficult to target from a code perspective.
The problem is, CUDA and cuDNN are not open source. They are proprietary packages that can only run on Nvidia hardware. Before developers can leverage an ASIC, the provider first needs to find a new way to make its chip easily accessible to frameworks. Without this, there won’t be significant (if any) adoption by developers, who will just stick with Nvidia because it works. Either there needs to be an open source equivalent to CUDA/cuDNN, or frameworks will need to be ported to target specific ASICs, like what Google did with the TPU and TensorFlow. This is a huge barrier without an obvious solution.
What does this all mean?
At least in the short term, we’ll see a plethora of chips, some competing directly against each other and others focusing on particular aspects of training and inference. What this means for the industry is that developers will have options, lots of them. Unlike the CPU market, which is heavily commoditized, the AI chip industry looks like it’s headed towards a more diverse, heterogeneous, and application-specific future.
While we don’t know what the specific outcome will be, one thing is certain: The future of AI lies in purpose-built ASICs, not commodity hardware.
Daniel Kobran is cofounder of GPU cloud platform Paperspace.