Google’s Making Its Own Chips Now. Time for Intel to Freak Out
Monday, 17 October 2016
Posted by ARM Servers
The Internet’s most powerful company sent a few shock waves through the tech world
yesterday when it revealed that a new custom-designed chip helps run what is
surely the future of its vast online empire: artificial intelligence.
In building its own chip, Google has taken yet another step along a path that has
already remade the tech industry in enormous ways. Over the past decade, the
company has designed all sorts of new hardware for the massive data centers
that underpin its myriad online services, including computer servers,
networking gear, and more. As it created services of unprecedented scope and
size, it needed a more efficient breed of hardware to run these services. Over
the years, so many other Internet giants have followed suit, forcing a seismic
shift in the worldwide hardware market.
With its new chip, Google’s aim is the same: unprecedented efficiency. To take AI to
new heights, it needs a chip that can do more in less time while consuming less
power. But the effect of this chip extends well beyond the Google empire. It
threatens the future of commercial chip makers like Intel and
nVidia—particularly when you consider Google’s vision for the future. According
to Urs Hölzle, the man most responsible for the global data center network that
underpins the Google empire, this new custom chip is just the first of many.
No, Google will not sell its chips to other companies. It won’t directly compete
with Intel or nVidia. But with its massive data centers, Google is by far the
largest potential customer for both of those companies. At the same time, as
more and more businesses adopt the cloud computing services offered by Google,
they’ll be buying fewer and fewer servers (and thus chips) of their own, eating
even further into the chip market.
Indeed, Google revealed its new chip as a way of promoting the cloud services that let
businesses and coders tap into its AI engines and build them into their own
applications. As Google tries to sell other companies on the power of its AI,
it’s claiming—in rather loud ways—that it boasts the best hardware for running
this AI, hardware that no other company has.
Google’s Need for Speed
Google’s new chip is called the Tensor Processing Unit, or TPU. That’s because it helps
run TensorFlow, the software engine that drives Google’s deep neural
networks, networks of hardware and software that can learn particular tasks by
analyzing vast amounts of data. Other tech giants typically run their deep
neural nets with graphics processing units, or GPUs—chips that were originally
designed to render images for games and other graphics-heavy applications.
These are well-suited to running the types of calculations that drive deep
neural networks. But Google says it has built a chip that’s even more
efficient.
According to Google, it tailored the TPU specifically to machine learning so that it
needs fewer transistors to run each operation. That means it can squeeze more
operations into the chip with each passing second.
For now, Google is using both TPUs and GPUs to run its neural nets. Hölzle declined
to go into specifics on how exactly Google was using its TPUs, except to say
that they handle “part of the computation” needed to drive voice recognition on
Android phones. But he said that Google would be releasing a paper describing
the benefits of its chip and that Google will continue to design new chips that
handle machine learning in other ways. Eventually, it seems, this will push
GPUs out of the equation. “They’re already going away a little,” Hölzle says.
“The GPU is too general for machine learning. It wasn’t actually built for
that.”
That’s not something nVidia wants to hear. As the world’s primary seller of GPUs,
nVidia is now pushing to expand its own business into the AI realm. As Hölzle
points out, the latest nVidia GPU offers a mode specifically for machine
learning. But clearly, Google wants the change to happen faster. Much faster.
The Smartest Chip
In the meantime, other companies, most notably Microsoft, are exploring another
breed of chip. The field-programmable gate array, or FPGA, is a chip you can
re-program to perform specific tasks. Microsoft has tested FPGAs with machine
learning, and Intel, seeing where this market was going, recently acquired a
company that sells FPGAs.
Some analysts think that’s the smarter way to go. An FPGA provides far more
flexibility, says Patrick Moorhead, the president and principal analyst at Moor
Insights and Strategy, a firm that closely follows the chip business. Moorhead
wonders if the new Google TPU is “overkill,” pointing out that such a chip
takes at least six months to build—a long time in the incredibly competitive
marketplace in which the biggest Internet companies compete.
But Google doesn’t want that flexibility. More than anything, it wants speed. Asked
why Google built its chip from scratch rather than using an FPGA, Hölzle said:
“It’s just much faster.”
Core Business
Hölzle also points out that Google’s chip doesn’t replace CPUs, the central processing
units at the heart of every computer server. The search giant still needs these
chips to run the tens of thousands of machines in its data centers, and CPUs
are Intel’s main business. Still, if Google is willing to build its own chips
just for AI, you have to wonder if it would go so far as to design its own CPUs
as well.
Hölzle plays down the possibility. “You want to solve problems that are not solved,”
he says. In other words, CPUs are a mature technology that pretty much works as
it should. But he also said that Google wants healthy competition in the chip
market. In other words, it wants to buy from many sellers—not just, say, Intel.
After all, more competition means lower prices for Google. As Hölzle explains,
expanding its options is why Google is working with the OpenPower Foundation,
which seeks to offer chip designs that anyone can use and modify.
That’s a powerful idea, and a potentially powerful threat to the world’s biggest chip
makers. According to Shane Rau, an analyst with research firm IDC, Google buys
about 5 percent of all server CPUs sold on Earth. Over a recent year-long
period, he says, Google bought about 1.2 million chips. And most of those
likely came from Intel. (In 2012, Intel exec Diane Bryant told WIRED that
Google bought more server chips from Intel than all but five other
companies—and those were all companies that sell servers.)
Whatever its plans for the CPU, Google will continue to explore chips specifically
suited to machine learning. It will be several years before we really know what
works and what doesn’t. After all, neural networks are constantly evolving as
well. “We’re learning all the time,” Hölzle says. “It’s not clear to me what the
final answer is.” And as it learns, you can bet that the world’s chip makers
will be watching.
ARM Announces the Cortex-R52 CPU: Deterministic & Safe, For ADAS & More
Tuesday, 20 September 2016
Posted by ARM Servers
Though it didn’t attract a ton of attention at the time, back in 2013 ARM announced the ARMv8-R architecture.

An update for ARM’s architecture for real-time CPUs, ARMv8-R was developed to further the real-time platform by adding support for newer features such as virtualization and memory protection. At the time the company didn’t announce any specific CPU designs for the architecture, but rather just announced the architecture on its own.
Now just under 3 years later, ARM is announcing their first ARMv8-R CPU design this evening with the Cortex-R52. An upgrade of sorts to ARM’s existing Cortex-R5, the R52 is the company’s first implementation of ARMv8-R. R52 makes specific use of many of the new features enabled by the architecture, while improving performance at the same time. ARM is pitching the new CPU core at markets that need a safety-critical CPU – a market that the Cortex-R series has been in for a while – where the deterministic nature of the CPU’s execution model is critical to ensuring quick and accurate execution.
While the focus of today’s CPU design announcement is on functionality and utility over microarchitecture, ARM has revealed a bit about how the Cortex-R52 is organized under the hood. The microarchitecture is a direct evolution of the previous Cortex-R5. This means we’re looking at a dual-issue in-order execution pipeline, with a pipeline length of 8 stages. Broadly speaking, this description is very similar to that of the better-known Cortex-A7/A53 cores, which implies that this is a real-time optimized version of the basic elements in that design.
As the Cortex-R series is focused on determinism and real-time responsiveness over total performance, ARM doesn’t heavily promote these cores on the basis of performance. But at least within the Cortex-R family, they are talking about a performance increase of upwards of 35% in common CPU benchmarks. More important for this market than throughput, however, is responsiveness: for the R52, ARM has done some specific work to improve interrupt entry and context switching performance, doubling the former and achieving a staggering 14-fold increase on the latter.
The big deal here of course
is the deterministic nature of the CPU. The entire microarchitecture is
optimized to avoid variable time, non-deterministic operations, which is why
it’s an in-order processor to begin with. This design extends to how memory is
managed as well, with ARM avoiding a virtual memory system and its associated
TLB translation-misses in favor of a model they call the Protected System
Memory Architecture (PSMA), which is used in conjunction with an MPU to handle
memory operations without the translation.
On the safety side of
matters, the R52 has a few different error-resiliency features to ensure
accuracy. Multi-core lock step returns for this design, allowing two R52 cores
to execute the same task in parallel for redundancy. And on the memory side of
matters, ECC is offered across both the memory busses and the memory itself, in
order to avoid random bitflips.
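
As a rough illustration of the lock-step idea, here is a minimal software sketch in Python. It is only an analogy: real lock-step is implemented in hardware, and the names used here (run_in_lockstep, LockstepError, checksum, and the fault hook) are hypothetical and do not come from ARM. The same task runs twice on the same inputs, and any disagreement between the two results is treated as a detected fault.

```python
from typing import Callable, Optional, Sequence


class LockstepError(RuntimeError):
    """Raised when the two redundant executions disagree."""


def run_in_lockstep(
    task: Callable[[Sequence[int]], int],
    inputs: Sequence[int],
    fault: Optional[Callable[[int], int]] = None,
) -> int:
    """Run `task` twice on the same inputs and compare the results.

    `fault` is a hypothetical test hook that corrupts the second result,
    standing in for a random bit flip in one of the two cores.
    """
    primary = task(inputs)
    shadow = task(inputs)
    if fault is not None:
        shadow = fault(shadow)
    if primary != shadow:
        # A mismatch means one execution was corrupted; stop rather than
        # continue with a result that cannot be trusted.
        raise LockstepError(f"mismatch: {primary:#x} != {shadow:#x}")
    return primary


def checksum(values: Sequence[int]) -> int:
    """Toy deterministic workload: a 16-bit additive checksum."""
    total = 0
    for v in values:
        total = (total + v) & 0xFFFF
    return total


if __name__ == "__main__":
    print(run_in_lockstep(checksum, [1, 2, 3]))
    try:
        run_in_lockstep(checksum, [1, 2, 3], fault=lambda r: r ^ 0x4)
    except LockstepError as exc:
        print("fault detected:", exc)
```

The trade-off is the one the article implies: redundancy costs a second core's worth of work, in exchange for catching errors that a single execution would silently pass along.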
Meanwhile in terms of new
functionality for hardware developers, as part of ARMv8-R, Cortex-R52
implements support for hardware virtualization. Like virtually everything else
in R52, this is deterministic as well, with the hypervisor working with the MPU
to offer each guest OS its own section of the physical memory space. According
to ARM this is a particularly important advancement, as previous means of
separating tasks on real-time CPUs were non-deterministic, which is an obvious
problem for the target market.
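
To make the partitioning idea concrete, here is a minimal Python sketch of a hypervisor handing each guest OS its own slice of physical memory and checking accesses against it. This is an assumption-laden model, not the R52's actual MPU programming interface: Region, PartitionTable, assign, and allowed are hypothetical names, and the addresses and guest names are invented for illustration. The point is that an access check is a simple bounds comparison against a small, fixed set of regions, with no address translation and therefore no TLB miss to make timing vary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """A contiguous range of physical memory owned by one guest."""
    guest: str
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size  # exclusive upper bound

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.end


class PartitionTable:
    """Hypothetical hypervisor-side table of per-guest memory regions."""

    def __init__(self) -> None:
        self._regions: List[Region] = []

    def assign(self, guest: str, base: int, size: int) -> Region:
        new = Region(guest, base, size)
        for r in self._regions:
            # Reject any overlap so that guests stay rigidly separated.
            if new.base < r.end and r.base < new.end:
                raise ValueError(f"{guest} overlaps region owned by {r.guest}")
        self._regions.append(new)
        return new

    def allowed(self, guest: str, addr: int) -> bool:
        # Plain bounds checks on physical addresses; nothing is translated.
        return any(r.guest == guest and r.contains(addr) for r in self._regions)


if __name__ == "__main__":
    table = PartitionTable()
    table.assign("safety", 0x0000, 0x1000)
    table.assign("infotainment", 0x1000, 0x1000)
    print(table.allowed("safety", 0x0FFF))        # inside its own region
    print(table.allowed("infotainment", 0x0800))  # inside another guest's region
```

In this model a guest can never be granted memory that another guest already owns, which is the property that lets safety-critical and non-critical software share one processor.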
The significance of virtualization in a real-time processor is that it allows for
multiple tasks to be executed on the R52 without interfering with each other.
In large, complex devices (e.g. cars), this allows for fewer processors within
the device, as these tasks can be consolidated onto a smaller number of
processors. At the same time, the rigid separation between the tasks means that
it’s possible to run both safety-critical and non-critical (but still
real-time) tasks on an R52 together, knowing that the latter will not interrupt
the safety-critical tasks. For cars and other devices where there is stringent
safety certification, this is especially useful as it means that other tasks can
be added (via their own guest OS) without invalidating the certifications of
the safety-critical tasks.
This is also why ARM’s earlier context switching and interrupt entry improvements
are so important. With a hypervisor now in play and multiple tasks executing on
a single processor, the vastly improved ability to switch between tasks is
critical for allowing multi-tasking without a major performance hit from
context switching overhead.
ARM is particularly interested in the Advanced Driver Assistance Systems (ADAS) market, where the
Cortex-R is part of a full system of ARM IP. A full ADAS setup from start to
end would utilize all three processor types – M, R, and A – with the Cortex-R
handling the real-time decision making and executing on those decisions, while
Cortex-A would be used to handle sensor perception/interpretation, and Cortex-M
would be in many of the individual sensors.
Wrapping things up, as with
most other ARM IP announcements, the announcement of the Cortex-R52 is setting
the stage for future products. ARM isn’t talking about specific customers at
this time, but they already have a number of companies who have licensed
ARMv8-R and will be in need of a CPU design to go with it. To that end, we
should be seeing Cortex-R52 start appearing under the hood of various devices
in the coming years.









