The battle to create the best artificial intelligence chips is underway. Intel is approaching this challenge from its position as a maker of central processing units (CPUs), namely the Xeon microprocessors that dominate the datacenter market. Rival Nvidia is attacking from its position as a maker of graphics processing units (GPUs), and both companies are working on solutions that will handle ground-up AI processing.
Nvidia’s GPUs have already grabbed a good chunk of the market for deep learning neural network solutions, such as image recognition — one of the biggest breakthroughs for AI in the past five years. But Intel has tried to position itself through acquisitions of companies such as Nervana, Mobileye, and Movidius. And when Intel bought Nervana for $350 million in 2016, it also picked up Nervana CEO Naveen Rao.
Rao has a background as both a computer architect and a neuroscientist, and he is now vice president and general manager of the Artificial Intelligence Products Group at Intel. He spoke this week at an event where Intel announced that its Xeon CPUs generated $1 billion in revenue in 2017 from use in AI applications. Rao believes that the overall market for AI chips will reach $8 billion to $10 billion in revenue by 2022.
To get there, Intel will probably have to architect AI chips from the ground up, and beat Nvidia and a number of startups to the punch. I talked to Rao about the competition at Intel’s Data Centric Innovation Summit this week in Santa Clara, California.
Here’s an edited transcript of our interview.
VentureBeat: There are some interesting numbers. The billion-dollar number for Xeon—it’s interesting in contrast to $130 billion total. It’s a start.
Naveen Rao: In the startup world that’s huge. You’re a $20 billion company all of a sudden. It’s the start of the market. AI is really just beginning. It’s the top of the second inning right now. We have a long way to go.
VentureBeat: It seems your strategy is covering the notion that AI chips need to be architected as AI chips, as opposed to CPUs or GPUs.
Rao: To a certain degree, yes.
VentureBeat: To what degree is that true, that you have to do something more from the ground up for AI?
Rao: I’ll give you a great example from our competitor. They did almost exactly this. They took their GPU and they tacked on a tensor core, which is completely an AI thing. You could think of it like, their ecosystem is the GPU, so they stuck this other thing on the die. As opposed to what we do—our ecosystem is the whole computer. It’s the CPU. We build those capabilities into this kind of accelerator. That was good for their strategy, but it wasn’t leveraging the GPU. It was building something almost from the ground up for this problem.
I’d argue that there are already proof points to this end. But that being said, we’re evolving the CPU. There’s an arc of evolution on that, because it supports many different applications. It supports the scale we’ve come to love in the data center on CPU. We want to be careful with how we do it. We have to make sure we can maintain leadership on all the workloads that are important today, and then add new capabilities that are important for the workloads of tomorrow. The way we understand those workloads of tomorrow is building accelerators — which is really the tip of the spear — figuring out what capabilities make sense to move back into the host.
VentureBeat: Would Nvidia just come back and say, “You guys are just tacking on something to do better image recognition”? Do you still feel like we’re in this stage of adding on extras to existing things? What comes after that?
Rao: What comes next is understanding how to move the data effectively for AI workloads near the compute. That’s what the Crest line has always been about. How do we effectively do that and accomplish maximum performance? You see it in a GPU today. The utilization is extremely low, because they haven’t taken that holistic approach.
Again, it’s a great strategy for them, because they’re leveraging their platform to the hilt. That’s the right thing to do. Likewise, we’ve seen that inference is on a very fast path to expanding the number of cycles it uses in the data center. Given the position of Xeon in the data center, it’s natural that we add those capabilities. The workload mix shifts over time. This didn’t exist five years ago. Now it’s a billion dollars, as we said. In the context of the larger data center market, which is about $200 billion, that’s tiny. But it’s on a very rapid trajectory for expansion.
VentureBeat: With Nervana, are you getting more into the ground-up AI designs?
Rao: That’s right.
VentureBeat: When you’re doing this, what is totally different, compared to adding things to a CPU?
Rao: The way you manage your data, you don’t generally have automatically managed caches. That’s one aspect. The commitment to a particular data type—you don’t need to support 100 different workloads. You support the ones that matter for AI. You can be more precise about your data type. That has impacts on how many wires you connect everything with. You can optimize the interconnect on the chip, even, based around that, which gives you a performance bump.
The capabilities of distributing workloads—you don’t need to be general. You’re not going to do many different kinds of parallel distributed computing. You’re going to do a particular set of them. You can build the constructs for those sets. It allows you to be more targeted in your technology and get something to work. If you try to boil the ocean from the start you’ll never do it.
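Rao's point about committing to a data type can be made concrete with some simple arithmetic. The sketch below is an editorial illustration, not a description of Intel's or Nervana's hardware: the three formats and the 4096 x 4096 layer size are assumptions chosen only to show how a narrower number format shrinks the amount of data a chip has to store and move.

```python
# Editorial sketch: how the choice of number format changes the bytes a chip
# must store and move for one layer's weights. The formats and the layer size
# below are illustrative assumptions, not figures from the interview.

BITS_PER_VALUE = {"fp32": 32, "fp16": 16, "int8": 8}


def weight_bytes(rows: int, cols: int, dtype: str) -> int:
    """Bytes needed to hold a rows x cols weight matrix in the given format."""
    return rows * cols * BITS_PER_VALUE[dtype] // 8


def relative_traffic(rows: int, cols: int, dtype: str, baseline: str = "fp32") -> float:
    """Data volume for `dtype` as a fraction of the volume for `baseline`."""
    return weight_bytes(rows, cols, dtype) / weight_bytes(rows, cols, baseline)


if __name__ == "__main__":
    for dtype in BITS_PER_VALUE:
        size = weight_bytes(4096, 4096, dtype)
        ratio = relative_traffic(4096, 4096, dtype)
        print(f"{dtype}: {size} bytes ({ratio:.2f}x the fp32 volume)")
```

Halving the width of each value halves the data that has to cross the chip, which is one way to read Rao's remark that the data type affects how many wires connect everything.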
VentureBeat: Are there going to be different solutions for inference and training, more specialization?
Rao: I think we’re going to see what you could call specialized solutions, but really they’re dials tuned in different directions. What we’re already seeing emerging is that TCO [total cost of ownership], performance per watt, is extremely important for inference. You do massive scale. It’s tied in to your application. Tying into the stack and very good TCO are important.
For training, TCO is less important. People are okay with things not being fully occupied all the time, because when the engineer calls and wants a training job done, they want it done as fast as possible. Max performance is greater. The way you can look at that from a technology perspective, one down—okay, where does the power go in my training solution? Memory interfaces are a big part of it, and compute.
On my inference solution I don’t need the memory. It’s not as memory-intensive. I can turn that down. I use different memory technologies. If you’re building a scale-out inference technology, you’re not going to use HBM. It’s too power-hungry. It has better performance, but you don’t need that, so let’s use something lower-power, with better performance per watt. Your caches can be different. The residency of what’s actually used on the chip is different.
Again, you tweak all these parameters. They work like a family of products, but they’re actually different knobs twisted different ways to optimize for a particular task.
VentureBeat: It seemed interesting that for self-driving cars, you have multiple kinds of solutions for different things, whether it’s in the vehicle or in the cloud communicating back to the vehicle. I talked with Vinod Dham about his new startup, AlphaICs. He’s saying that strong AI and agent-based AI may be more necessary to solve the self-driving problem. How seriously is that becoming the direction everyone has to go?
Rao: Let me give you more of a contextual answer for that. If we look at a brain, and we look at parts of it, there’s the ability to process the visual environment, to segment it. Those are plants. That’s a sidewalk. That’s a light. That’s one aspect. That’s what we’re doing today. What we call AI is really processing complex data and simplifying it into things that are potentially actionable. When I build a self-driving car I segment where roads are, where pedestrians are, and feed that into some more declarative coding. “If child in front of me, apply brakes.”
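The two-stage arrangement Rao describes, perception that labels the scene followed by hand-written rules, can be sketched in a few lines. This is an editorial illustration only: the `Detection` fields, the labels, and the 30-meter threshold are hypothetical, and a real driving stack is far more involved.

```python
# Editorial sketch of "segment the scene, then apply declarative rules".
# Detection, its fields, and the braking threshold are hypothetical names
# and values chosen for illustration.
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str         # what the perception stage says this is, e.g. "child"
    distance_m: float  # estimated distance ahead of the vehicle, in meters
    in_path: bool      # whether the object lies in the vehicle's path


def choose_action(detections: List[Detection], braking_distance_m: float = 30.0) -> str:
    """Declarative rule: brake if a child or pedestrian is in the path and close."""
    for d in detections:
        if d.label in {"child", "pedestrian"} and d.in_path and d.distance_m <= braking_distance_m:
            return "brake"
    return "continue"


if __name__ == "__main__":
    scene = [Detection("plant", 4.0, False), Detection("child", 12.0, True)]
    print(choose_action(scene))
```

The learned part of the system stops at producing the list of detections; everything after that is ordinary conditional logic.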
The next level of AI is continually learning and being attached to the environment. As a neuroscientist, this was something I studied. If I want to accomplish a movement, take my hand and move it to this point in space, I have to figure out what motor command needs to be sent, and actually predict what the sensory consequences will be. What will it look like to my eyes and feel like to the sensors in my arm when it achieves that position? If there’s a mismatch, something wrong happened. How do I fix that mismatch? I learn that from my mistakes.
It’s a continually evolving loop of action, environment, consequence, and learning. That’s where we want to get to. I do believe that to get to full autonomy for a robot in the world, which is what an autonomous car is, we probably do need to solve some of those problems. We’re not quite there yet.
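The loop Rao outlines (issue a command, predict the sensory result, compare it with what actually happens, and learn from the mismatch) can be shown with a toy example. This is an editorial sketch under strong simplifying assumptions: the "arm" is a single number, the environment multiplies the command by an unknown gain, and the update is a plain delta rule. None of these details come from the interview.

```python
# Editorial toy model of the action / prediction / mismatch / learning loop.
# The one-dimensional "arm", the gains, and the learning rate are assumptions
# made for illustration.


def learn_forward_model(true_gain=2.0, target=1.0, learning_rate=0.5, steps=100):
    """Learn how commands map to outcomes by correcting prediction errors."""
    estimated_gain = 1.0  # the agent's initial, incorrect internal model
    mismatches = []
    for _ in range(steps):
        command = target / estimated_gain     # command the model says will hit the target
        predicted = estimated_gain * command  # predicted sensory consequence
        observed = true_gain * command        # what the environment actually returns
        mismatch = observed - predicted       # prediction error
        mismatches.append(abs(mismatch))
        # Delta rule: shift the internal model in the direction that shrinks the error.
        estimated_gain += learning_rate * mismatch * command
    return estimated_gain, mismatches


if __name__ == "__main__":
    gain, mismatches = learn_forward_model()
    print(f"learned gain: {gain:.4f}")
    print(f"first mismatch: {mismatches[0]:.4f}, last mismatch: {mismatches[-1]:.6f}")
```

With each pass the internal estimate moves closer to the true gain and the mismatch shrinks, which is the "learn from my mistakes" step in miniature.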