
Google’s Making Its Own Chips Now. Time for Intel to Freak Out

The Internet’s most powerful company sent a few shock waves through the tech world yesterday when it revealed that a new custom-designed chip helps run what is surely the future of its vast online empire: artificial intelligence.


In building its own chip, Google has taken yet another step along a path that has already remade the tech industry in enormous ways. Over the past decade, the company has designed all sorts of new hardware for the massive data centers that underpin its myriad online services, including computer servers, networking gear, and more. As it created services of unprecedented scope and size, it needed a more efficient breed of hardware to run these services. Over the years, so many other Internet giants have followed suit, forcing a seismic shift in the worldwide hardware market.

With its new chip, Google’s aim is the same: unprecedented efficiency. To take AI to new heights, it needs a chip that can do more in less time while consuming less power. But the effect of this chip extends well beyond the Google empire. It threatens the future of commercial chip makers like Intel and nVidia—particularly when you consider Google’s vision for the future. According to Urs Hölzle, the man most responsible for the global data center network that underpins the Google empire, this new custom chip is just the first of many.

No, Google will not sell its chips to other companies. It won’t directly compete with Intel or nVidia. But with its massive data centers, Google is by far the largest potential customer for both of those companies. At the same time, as more and more businesses adopt the cloud computing services offered by Google, they’ll be buying fewer and fewer servers (and thus chips) of their own, eating even further into the chip market.


Indeed, Google revealed its new chip as a way of promoting the cloud services that let businesses and coders tap into its AI engines and build them into their own applications. As Google tries to sell other companies on the power of its AI, it’s claiming—in rather loud ways—that it boasts the best hardware for running this AI, hardware that no other company has.

Google’s Need for Speed
Google’s new chip is called the Tensor Processing Unit, or TPU. That’s because it helps run TensorFlow, the software engine that drives Google’s deep neural networks, networks of hardware and software that can learn particular tasks by analyzing vast amounts of data. Other tech giants typically run their deep neural nets with graphics processing units (GPUs), chips that were originally designed to render images for games and other graphics-heavy applications. These are well-suited to running the types of calculations that drive deep neural networks. But Google says it has built a chip that’s even more efficient.

According to Google, it tailored the TPU specifically to machine learning so that it needs fewer transistors to run each operation. That means it can squeeze more operations into the chip with each passing second.


For now, Google is using both TPUs and GPUs to run its neural nets. Hölzle declined to go into specifics on how exactly Google was using its TPUs, except to say that they handle “part of the computation” needed to drive voice recognition on Android phones. But he said that Google would be releasing a paper describing the benefits of its chip and that Google will continue to design new chips that handle machine learning in other ways. Eventually, it seems, this will push GPUs out of the equation. “They’re already going away a little,” Hölzle says. “The GPU is too general for machine learning. It wasn’t actually built for that.”

That’s not something nVidia wants to hear. As the world’s primary seller of GPUs, nVidia is now pushing to expand its own business into the AI realm. As Hölzle points out, the latest nVidia GPU offers a mode specifically for machine learning. But clearly, Google wants the change to happen faster. Much faster.

The Smartest Chip
In the meantime, other companies, most notably Microsoft, are exploring another breed of chip. The field-programmable gate array, or FPGA, is a chip you can re-program to perform specific tasks. Microsoft has tested FPGAs with machine learning, and Intel, seeing where this market was going, recently acquired a company that sells FPGAs.

Some analysts think that’s the smarter way to go. An FPGA provides far more flexibility, says Patrick Moorhead, the president and principal analyst at Moor Insights and Strategy, a firm that closely follows the chip business. Moorhead wonders if the new Google TPU is “overkill,” pointing out that such a chip takes at least six months to build—a long time in the incredibly competitive marketplace in which the biggest Internet companies compete.

But Google doesn’t want that flexibility. More than anything, it wants speed. Asked why Google built its chip from scratch rather than using an FPGA, Hölzle said: “It’s just much faster.”

Core Business
Hölzle also points out that Google’s chip doesn’t replace CPUs, the central processing units at the heart of every computer server. The search giant still needs these chips to run the tens of thousands of machines in its data centers, and CPUs are Intel’s main business. Still, if Google is willing to build its own chips just for AI, you have to wonder if it would go so far as to design its own CPUs as well.

Hölzle plays down the possibility. “You want to solve problems that are not solved,” he says. In other words, CPUs are a mature technology that pretty much works as it should. But he also said that Google wants healthy competition in the chip market. In other words, it wants to buy from many sellers—not just, say, Intel. After all, more competition means lower prices for Google. As Hölzle explains, expanding its options is why Google is working with the OpenPower Foundation, which seeks to offer chip designs that anyone can use and modify.

That’s a powerful idea, and a potentially powerful threat to the world’s biggest chip makers. According to Shane Rau, an analyst with research firm IDC, Google buys about 5 percent of all server CPUs sold on Earth. Over a recent year-long period, he says, Google bought about 1.2 million chips. And most of those likely came from Intel. (In 2012, Intel exec Diane Bryant told WIRED that Google bought more server chips from Intel than all but five other companies—and those were all companies that sell servers.)
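Taken at face value, Rau’s two figures imply a rough size for the whole server CPU market. The following is a back-of-the-envelope sketch; it assumes the 5 percent share and the 1.2 million chips describe the same year-long period, which the analyst’s quote does not state explicitly.

```python
# Back-of-the-envelope estimate from the two IDC figures quoted above.
# Assumption (not stated in the source): both figures cover the same period.
GOOGLE_SHARE = 0.05          # "about 5 percent of all server CPUs"
GOOGLE_CHIPS = 1_200_000     # "about 1.2 million chips"


def implied_market_size(chips_bought: int, share: float) -> float:
    """Total units sold if `chips_bought` is `share` of the market."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return chips_bought / share


total = implied_market_size(GOOGLE_CHIPS, GOOGLE_SHARE)
print(f"Implied server CPU market: about {total / 1e6:.0f} million chips")
```

Under that assumption the implied market is roughly 24 million server CPUs per year, which gives a sense of how much volume a single buyer of this size represents.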

Whatever its plans for the CPU, Google will continue to explore chips specifically suited to machine learning. It will be several years before we really know what works and what doesn’t. After all, neural networks are constantly evolving as well. “We’re learning all the time,” Hölzle says. “It’s not clear to me what the final answer is.” And as Google learns, you can bet that the world’s chip makers will be watching.
 
ARM Announces the Cortex-R52 CPU

Though it didn’t attract a ton of attention at the time, back in 2013 ARM announced the ARMv8-R architecture. An update for ARM’s architecture for real-time CPUs, ARMv8-R was developed to further the real-time platform by adding support for newer features such as virtualization and memory protection. At the time the company didn’t announce any specific CPU designs for the architecture, but rather just announced the architecture on its own.

Now just under 3 years later, ARM is announcing their first ARMv8-R CPU design this evening with the Cortex-R52. An upgrade of sorts to ARM’s existing Cortex-R5, the R52 is the company’s first implementation of ARMv8-R. R52 makes specific use of many of the new features enabled by the architecture, while improving performance at the same time. ARM is pitching the new CPU core at markets that need a safety-critical CPU – a market that the Cortex-R series has been in for a while – where the deterministic nature of the CPU’s execution model is critical to ensuring quick and accurate execution.

While the focus of today’s CPU design announcement is on functionality and utility over microarchitecture, ARM has revealed a bit about how the Cortex-R52 is organized under the hood. The microarchitecture is a direct evolution of the previous Cortex-R5. This means we’re looking at a dual-issue in-order execution pipeline, with a pipeline length of 8 stages. Broadly speaking, this description is very similar to that of the better-known Cortex-A7/A53 cores, which implies that this is a real-time optimized version of the basic elements in that design.

As the Cortex-R series is focused on determinism and real-time responsiveness over total performance, ARM doesn’t heavily promote these cores on the basis of performance. But at least within the Cortex-R family, they are talking about a performance increase of upwards of 35% in common CPU benchmarks. More important for this market than throughput, however, is responsiveness: for the R52, ARM has done some specific work to improve interrupt entry and context switching performance, doubling interrupt entry performance and achieving a staggering 14-fold increase in context switching performance.

The big deal here, of course, is the deterministic nature of the CPU. The entire microarchitecture is optimized to avoid variable-time, non-deterministic operations, which is why it’s an in-order processor to begin with. This design extends to how memory is managed as well, with ARM avoiding a virtual memory system and its associated TLB misses in favor of a model they call the Protected Memory System Architecture (PMSA), which is used in conjunction with a memory protection unit (MPU) to handle memory operations without address translation.
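The contrast with a virtual memory system can be sketched in a few lines. The Python model below is an illustration only, not ARM’s implementation: the `Region` layout, the `check_access` function, and the addresses are all hypothetical. It shows the two properties described above: addresses are used as-is, with no translation, and every access is checked against a small fixed table, so the cost of the check does not depend on access history the way a TLB hit or miss does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int      # first physical address covered by the region
    limit: int     # last physical address covered (inclusive)
    readable: bool
    writable: bool


# Hypothetical region table; a real MPU has a small, fixed number of regions.
REGIONS = (
    Region(base=0x0000_0000, limit=0x0003_FFFF, readable=True, writable=False),  # code
    Region(base=0x2000_0000, limit=0x2000_FFFF, readable=True, writable=True),   # data
)


def check_access(addr: int, write: bool, regions=REGIONS) -> int:
    """Return the physical address unchanged if the access is permitted.

    There is no translation step: the address that comes in is the address
    that goes out. The loop visits every region on every call, so the amount
    of work is the same whether the access is allowed, denied, or repeated.
    """
    allowed = False
    for r in regions:
        in_range = r.base <= addr <= r.limit
        permitted = r.writable if write else r.readable
        allowed = allowed or (in_range and permitted)
    if not allowed:
        raise PermissionError(f"MPU fault at {addr:#010x}")
    return addr
```

In this sketch a write to the data region succeeds and returns the same address, while a write to the read-only code region or any access to an unmapped address raises `PermissionError`.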

On the safety side of matters, the R52 has a few different error-resiliency features to ensure accuracy. Multi-core lock step returns for this design, allowing two R52 cores to execute the same task in parallel for redundancy. And on the memory side of matters, ECC is offered across both the memory buses and the memory itself, in order to avoid random bit flips.
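The lock-step idea can be illustrated in software, with the caveat that real lock-step hardware compares the two cores’ outputs continuously, whereas this minimal sketch compares only final results. The names `run_lockstep`, `LockstepError`, and `brake_force`, and the `fault` hook used to simulate a transient error, are hypothetical.

```python
from typing import Callable, Optional


class LockstepError(RuntimeError):
    """Raised when the two redundant executions disagree."""


def run_lockstep(task: Callable[[int], int], value: int,
                 fault: Optional[Callable[[int], int]] = None) -> int:
    """Run `task` twice and accept the result only if both runs agree.

    `fault` is a hypothetical hook that corrupts the second run's result,
    standing in for a transient hardware error on one core.
    """
    primary = task(value)
    shadow = task(value)
    if fault is not None:
        shadow = fault(shadow)
    if primary != shadow:
        raise LockstepError(f"mismatch: {primary} != {shadow}")
    return primary


def brake_force(pedal_percent: int) -> int:
    """Toy control computation, clamped to the range 0..100."""
    return max(0, min(100, pedal_percent * 2))
```

Calling `run_lockstep(brake_force, 30)` returns 60, while passing `fault=lambda r: r ^ 1` flips one bit of the second result and raises `LockstepError` instead of letting a corrupted value through.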
Meanwhile in terms of new functionality for hardware developers, as part of ARMv8-R, Cortex-R52 implements support for hardware virtualization. Like virtually everything else in R52, this is deterministic as well, with the hypervisor working with the MPU to offer each guest OS its own section of the physical memory space. According to ARM this is a particularly important advancement, as previous means of separating tasks on real-time CPUs were non-deterministic, which is an obvious problem for the target market.
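As a rough sketch of what giving each guest its own section of physical memory means, the Python model below keeps a static partition table, rejects overlapping assignments, and checks guest accesses against it. This is an assumption-laden illustration rather than the R52’s actual mechanism: the table format, guest names, addresses, and function names are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical static partition table: guest name -> (base, size) in bytes.
Partitions = Dict[str, Tuple[int, int]]


def validate(partitions: Partitions) -> None:
    """Reject any configuration in which two guests share physical memory."""
    spans = sorted((base, base + size, name)
                   for name, (base, size) in partitions.items())
    # With spans sorted by base, checking neighbours is enough to find overlap.
    for (_, prev_end, prev), (start, _, cur) in zip(spans, spans[1:]):
        if start < prev_end:
            raise ValueError(f"{prev} and {cur} overlap")


def guest_may_access(partitions: Partitions, guest: str, addr: int) -> bool:
    """True if `addr` lies inside the section assigned to `guest`."""
    base, size = partitions[guest]
    return base <= addr < base + size


PARTITIONS: Partitions = {
    "safety_rtos": (0x2000_0000, 0x8000),
    "diagnostics_rtos": (0x2000_8000, 0x8000),
}
validate(PARTITIONS)
```

Because the partitions are fixed and disjoint, a check such as `guest_may_access(PARTITIONS, "diagnostics_rtos", 0x2000_0000)` is a simple bounds comparison that returns `False`, with no dependence on what the other guest is doing.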
The significance of virtualization in a real-time processor is that it allows for multiple tasks to be executed on the R52 without interfering with each other. In large, complex devices (e.g. cars), this allows for fewer processors within the device, as these tasks can be consolidated onto a smaller number of processors. At the same time, the rigid separation between the tasks means that it’s possible to run both safety-critical and non-critical (but still real-time) tasks on an R52 together, knowing that the latter will not interrupt the safety-critical tasks. For cars and other devices where there is stringent safety certification, this is especially useful as it means that other tasks can be added (via their own guest OS) without invalidating the certifications of the safety-critical tasks.

This is also why ARM’s earlier context switching and interrupt entry improvements are so important. With a hypervisor now in play and multiple tasks executing on a single processor, the vastly improved ability to switch between tasks is critical for allowing multi-tasking without a major performance hit from context switching overhead.
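To make the overhead argument concrete, here is a small sketch with made-up numbers. Only the 14-fold ratio comes from ARM’s claim; the time slice, the baseline switch cost, and the simple model of one switch per slice are hypothetical.

```python
def switch_overhead(slice_us: float, switch_us: float) -> float:
    """Fraction of CPU time lost to switching, one switch per time slice."""
    return switch_us / (slice_us + switch_us)


SLICE_US = 100.0        # hypothetical time slice per guest
OLD_SWITCH_US = 14.0    # hypothetical baseline switch cost
NEW_SWITCH_US = OLD_SWITCH_US / 14   # the 14-fold improvement ARM cites

before = switch_overhead(SLICE_US, OLD_SWITCH_US)
after = switch_overhead(SLICE_US, NEW_SWITCH_US)
print(f"overhead before: {before:.1%}, after: {after:.1%}")
```

With these illustrative figures the share of time spent switching falls from about 12.3% to about 1.0%, which is the kind of difference that makes running several guests on one core practical.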

Finally, for the potential market for the Cortex-R52, ARM is pushing the big three traditional markets for real-time and safety-critical processors: automotive, industrial, and medical. All three of these make significant use of real-time functionality, and there’s also a great deal of overlap on safety as well.

ARM is particularly interested in the Advanced Driver Assistance Systems (ADAS) market, where the Cortex-R is part of a full system of ARM IP. A full ADAS setup from start to end would utilize all three processor types (M, R, and A), with the Cortex-R handling the real-time decision making and executing on those decisions, while Cortex-A would be used to handle sensor perception/interpretation, and Cortex-M would be in many of the individual sensors.
Wrapping things up, as with most other ARM IP announcements, the announcement of the Cortex-R52 is setting the stage for future products. ARM isn’t talking about specific customers at this time, but they already have a number of companies who have licensed ARMv8-R and will be in need of a CPU design to go with it. To that end, we should be seeing Cortex-R52 start appearing under the hood of various devices in the coming years.
