IBM and NVIDIA Move to Corner the Enterprise Market for AI
Monday, 12 September 2016
Posted by ARM Servers
A number of coming technologies will undoubtedly change the world as we know it. Two came to light last week while I was trying, and failing, to enjoy an infrequent vacation. One is a power storage technology that has high capacity and doesn’t catch fire or explode like lithium ion batteries. The other, and far more important, is artificial intelligence (AI), which has the potential to change our lives for the better or worse, but dramatically either way.
An alliance between two of the most powerful companies in this race, IBM and NVIDIA, was announced last week around a small, intelligent, rack-mounted server called the Power System S822LC. It was part of a three-server launch, and NVIDIA is naturally very excited about it. Let’s explore why this partnership between two powerhouses could be so interesting.
Think
The word that IBM has connected with itself for much of my life is “Think,” and when it announced Watson, it put itself on a path to make that connection a reality. But Watson, as powerful as it is, is an intellectual baby when it comes to where the industry wants to go. Intelligent machines -- computers that can learn, adapt, and then make decisions based on data -- represent the future of computing and, some argue, the future of the human race.
This makes for an impressive potential impact on the world, and the firm, or firms, that get this right first will likely own the next age of computing. IBM, with Watson, got the initial lead, but Watson is expensive to buy and expensive to train.
That’s why this isn’t a one-company effort. It can’t be; it will require a team.
NVIDIA
Now, while IBM was working on large-scale AI, NVIDIA has been working on packaged intelligence as a technology. Its Drive PX and CX platforms are designed to make cars intelligent, and its DGX-1 goes well beyond this in that it forms the basis for the learning that other platforms can use in production. In short, you train the DGX-1 and it trains, at scale, everything else it feeds. This is close, in concept, to being able to manufacture things (initially cars) that come off the line with all of the knowledge they need to operate. If we were talking people, this would be like having a kid that starts out at birth knowing everything you know.
Now we just need to put the parts together.
IBM + NVIDIA
If we combine the two companies, we get the potential for a system that is not only far less expensive to buy but also far less expensive to train. The result could be a system that is far smarter than Watson, far more capable than the DGX-1, and able to move both companies to the next tier.
OpenPOWER
The market is currently largely x86, and Intel dominates. Only one non-x86 platform has the potential to address this AI opportunity near term, and that is OpenPOWER, largely because it is backed by IBM and, unlike ARM, it is in production for servers of this class. It is also a technology shared by a variety of vendors, making it more attractive to customers like Google, which is aggressive with AI and particularly favors open systems.
When you combine IBM, NVIDIA and OpenPOWER, you get something unique and potentially very powerful in this race to intelligent computing.
Wrapping Up: Power of the Partnership
In the end, the eventual success of this effort will likely be directly attributable to how well IBM and NVIDIA partner over time. A similar partnership between IBM, Intel and Microsoft created the PC market. If IBM and NVIDIA can do better (that earlier partnership fell apart), then the potential for both firms to own this next technology wave is unmatched. If not, then we’ll just have another story about big firms failing to meet their potential.
For now, IBM and NVIDIA have the inside track, but it’s early in the race. While this new line of servers is a great start, as both companies know, it matters far less who leads a race at the beginning than who leads a race at the end.
Rob Enderle is President and Principal Analyst of the Enderle Group, a forward-looking emerging technology advisory firm. With over 30 years’ experience in emerging technologies, he has provided regional and global companies with guidance in how to better target customer needs; create new business opportunities; anticipate technology changes; select vendors and products; and present their products in the best possible light. Rob covers the technology industry broadly. Before founding the Enderle Group, Rob was the Senior Research Fellow for Forrester Research and the Giga Information Group, and held senior positions at IBM and ROLM. Follow Rob on Twitter @enderle, on Facebook and on Google+.
Will IBM’s Power9 Server Chips Pose Competition to Intel’s Server Chips?
Wednesday, 7 September 2016
Posted by ARM Servers
IBM’s development of its Power9 architecture has been in the news for some time, and now the company will make it available to other hardware companies by licensing its designs. Power9 chips are scheduled to come to market in the second half of 2017 (2H17). Let’s look at some features of the new chips.
Intel’s x86 versus IBM’s Power9
At the Hot Chips 2016 conference, IBM unveiled its Power9 server processors, built on 14nm (nanometer) FinFET (fin-shaped field effect transistor) process technology, just like Intel’s current server processors.
IBM will also integrate Xilinx’s (XLNX) FPGA (field-programmable gate array) technology in its servers, just as Intel is integrating Altera’s FPGAs.
Features of Power9
IBM will launch Power9 in two basic designs: a 24-core SMT4 processor and a 12-core SMT8 processor.
The 24-core SMT4 processor will be optimized for the Linux ecosystem and will target web service companies such as Google (GOOG), which need to run across several thousand machines. It will feature four threads per core.
The 12-core SMT8 processor will be optimized for the PowerVM ecosystem and will target larger systems designed for running big data or AI (artificial intelligence) applications. It will feature eight threads per core.
Both designs will come in two models: the scale-out model will come with two CPU (central processing unit) sockets on the motherboard, and the scale-up model will come with multiple CPU sockets. The Power9 processor will have multiple connectors to attach FPGAs, GPUs (graphics processing units), and ASICs (application-specific integrated circuits).
IBM and Intel eye artificial intelligence
With all this, IBM aims to make Power9 apt for AI, cognitive computing, analytics, visual computing, and hyperscale web serving. Intel is also looking to tap AI and recently acquired an AI startup called Nervana Systems for this reason. It has also positioned its Xeon Phi processors for deep learning applications.
IBM has changed its strategy in order to pose tough competition to Intel. We’ll look at this strategy in the next part of the series.
ARM Unveils Scalable Vector Extension for HPC at Hot Chips
Wednesday, 24 August 2016
Posted by ARM Servers
ARM and Fujitsu today announced a scalable vector extension (SVE) to the ARMv8-A architecture intended to enhance ARM capabilities in HPC workloads. Fujitsu is the lead silicon partner in the effort (so far) and will use ARM with SVE technology in its post-K computer, Japan’s next flagship supercomputer planned for the 2020 timeframe. This is an important incremental step for ARM, which seeks to push more aggressively into mainstream and HPC server markets.
Fujitsu first announced plans to adopt ARM for the post-K machine – a switch from the SPARC processor technology used in the K computer – at ISC 2016 and said at the time that it would reveal more at Hot Chips about the ARM development effort needed. Bull Atos is also developing an ARM-based supercomputer.
The SVE is focused on addressing “next generation high performance computing challenges and by that we mean workloads typically found in scientific computing environment where they are very parallelizable,” said Ian Smythe, director of marketing programs, ARM Compute Products Group, in a pre-briefing. SVE is scalable from 128 bits to 2048 bits in 128-bit increments and, among other things, should enhance ARM’s ability to exploit fine-grain parallelism.
Nigel Stephens, lead ISA architect and ARM Fellow, provided more technical detail in his blog (Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A Architecture, link below) coinciding with his Hot Chips presentation. It’s worth reading for a fast but substantial summary.
“Rather than specifying a specific vector length, SVE allows CPU designers to choose the most appropriate vector length for their application and market, from 128 bits up to 2048 bits per vector register,” wrote Stephens. “SVE also supports a vector-length agnostic (VLA) programming model that can adapt to the available vector length. Adoption of the VLA paradigm allows you to compile or hand-code your program for SVE once, and then run it at different implementation performance points, while avoiding the need to recompile or rewrite it when longer vectors appear in the future. This reduces deployment costs over the lifetime of the architecture; a program just works and executes wider and faster.
“Scientific workloads, mentioned earlier, have traditionally been carefully written to exploit as much data-level parallelism as possible with careful use of OpenMP pragmas and other source code annotations. It’s therefore relatively straightforward for a compiler to vectorize such code and make good use of a wider vector unit. Supercomputers are also built with the wide, high-bandwidth memory systems necessary to feed a longer vector unit,” wrote Stephens.
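To make that concrete, here is a hypothetical example (the function and names are mine, not from the article) of the kind of annotated loop he means; the OpenMP 4.0 simd pragma tells the compiler the iterations are independent, so it can map them onto whatever vector width the target offers:

```c
/* Illustrative vectorizable loop: y[i] += a * x[i].
 * The OpenMP `simd` pragma asserts the iterations are independent,
 * letting the compiler use the full width of the vector unit.
 * Build with a flag such as -fopenmp-simd (GCC/Clang). */
void scale_add(double *restrict y, const double *restrict x,
               double a, long n) {
    #pragma omp simd
    for (long i = 0; i < n; ++i)
        y[i] += a * x[i];   /* one multiply-add per vector lane */
}
```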
While HPC is a natural fit for SVE’s longer vectors, said Stephens, it also offers an opportunity to improve vectorizing compilers that will be of general benefit over the longer term as other systems scale to support increased data-level parallelism.
Amplifying on the point, he wrote, “It is worth noting at this point that Amdahl’s Law tells us that the theoretical limit of a task’s speedup is governed by the amount of unparallelizable code. If you succeed in vectorizing 10 percent of your execution and make that code run four times faster (e.g. a 256-bit vector allows 4x64b parallel operations), then you reduce 1000 cycles down to 925 cycles and provide a limited speedup for the power and area cost of the extra gates. Even if you could vectorize 50 percent of your execution infinitely (unlikely!) you’ve still only doubled the overall performance. You need to be able to vectorize much more of your program to realize the potential gains from longer vectors.”
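Stephens’ numbers are easy to verify. Amdahl’s law puts the overall speedup at 1 / ((1 - p) + p/s) when a fraction p of execution is accelerated by a factor s; a minimal check of both of his examples:

```c
#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction p of execution
 * time is accelerated by a factor s. */
static double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    /* Vectorize 10% of the work 4x faster (4 x 64-bit lanes in a
     * 256-bit vector): 1000 cycles drop only to 925. */
    double s1 = amdahl(0.10, 4.0);
    printf("10%% at 4x: %.3fx overall (1000 -> %.0f cycles)\n",
           s1, 1000.0 / s1);

    /* Vectorize 50% "infinitely" fast: the ceiling is still only 2x. */
    printf("50%% at infinity: %.3fx overall\n", amdahl(0.50, 1e12));
    return 0;
}
```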
The ARMv7 Advanced SIMD extension (aka ARM NEON) is now about 12 years old and was originally intended to accelerate media processing tasks on the main processor. With the move to AArch64, NEON gained full IEEE double-precision float, 64-bit integer operations, and grew the register file to thirty-two 128-bit vector registers. These changes, says Stephens, made NEON a better compiler target for general-purpose compute. SVE is a complementary extension that does not replace NEON, and was developed specifically for vectorization of HPC scientific workloads, he says.
Snapshot of new SVE features compared to NEON (a sketch using several of them follows the list):
- Scalable vector length (VL)
- VL agnostic (VLA) programming
- Gather-load & scatter-store
- Per-lane predication
- Predicate-driven loop control and management
- Vector partitioning and SW managed speculation
- Extended integer and floating-point horizontal reductions
- Scalarized intra-vector sub-loops
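To show how several of these features fit together (VLA programming, per-lane predication, predicate-driven loop control), here is a minimal sketch written against ARM’s C language extensions (ACLE) for SVE, which were published after this announcement; treat the intrinsic spellings as illustrative rather than a drop-in listing:

```c
#include <arm_sve.h>

/* Vector-length agnostic SAXPY: y[i] += a * x[i].
 * Nothing here hard-codes the vector width, so the same binary
 * runs on any SVE implementation from 128 to 2048 bits. */
void saxpy(float *restrict y, const float *restrict x, float a, long n) {
    for (long i = 0; i < n; i += svcntw()) {     /* svcntw(): 32-bit lanes per vector */
        svbool_t pg = svwhilelt_b32(i, n);       /* predicate covers i..n-1; masks the tail */
        svfloat32_t vx = svld1_f32(pg, &x[i]);   /* predicated contiguous loads */
        svfloat32_t vy = svld1_f32(pg, &y[i]);
        vy = svmla_n_f32_m(pg, vy, vx, a);       /* vy += vx * a, active lanes only */
        svst1_f32(pg, &y[i], vy);                /* predicated store */
    }
}
```

The loop steps by whatever svcntw() reports at run time, and the final partial iteration is handled by the predicate instead of a scalar tail loop, which is exactly the compile-once, run-at-any-width property Smythe describes below.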
Smythe emphasized, “If you compile the code for SVE it will run on any implementation of SVE regardless of the width, whether 128 or 1024 or 2048, and the hardware implementation; that code will run on ARM architecture as a binary. That’s important and gives us scalability and compatibility into the future for the compilers and the code that HPC guys are writing.”
ARM has been steadily working to expand its ecosystem with hopes of capturing a chunk of the broader x86 market. It has notable wins in many market segments, although the market traction has been tougher to gauge, and it is only in the past couple of years that server chips have started to become available. Many design wins have been niche oriented; one example is an HPE ARM-based storage server (StoreVirtual 3200) announced earlier this month. ARM, of course, is a juggernaut in mobile computing.
Prior to the Hot Chips conference, with its distinctly technical focus, ARM was pre-briefing some of the HPC community about SVE and using the opportunity to reinforce its mission of growth and its success in ecosystem building, and to bask in some of the glory of the post-K computer win. Given the recent acquisition of ARM by SoftBank, it will be interesting to watch how the marketing and technical activities change, if at all.
Lakshmi Mandyam, senior marketing director, ARM Server Programs, said, “We’ve been focusing on enabling some base market segments to establish some beachheads and enable our partners to get adoption in those key areas. We have also been using key end users to drive our approach in terms of ecosystem enablement because clearly we are catching up with x86 in terms of software enablement.”
“The move to open source and consuming applications and workloads through [as-a-service models] is really driving a lot of disruption of the industry. It also presents an opportunity because a lot of those platforms are based on open source and Linux and/or intermediate middleware, and so the dependency on the legacy (x86) software and architectures is gone. That presents an opportunity to ARM.”
It’s also important, she said, to recognize that many modern workloads, even in HPC, are moving towards the scale-out model as opposed to a purely scale-up one. Many of those applications are driven by IO and memory performance. “This is where the ARM partnership can shine because we are able to deliver heterogeneous computing quite easily and we’re able to deliver optimized algorithm processing quite easily. If you look at a lot of these applications, it’s not about spec and benchmark performance; it’s about what can you deliver in my application.”
“When you think about Fujitsu, as they talked about the post-K computer, a lot of the folks are looking for this really tuned performance, to take a codesign approach where they are looking at the entire problem, and to deliver an application and service for a given problem. This is where their ability to tune platforms down to the silicon level pays big dividends,” she said.
Here’s a link to Nigel Stephens’ blog on the ARM SVE announcement (Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A Architecture): https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture
