
A number of coming technologies will undoubtedly change the world as we know it. Two came to light last week while I was trying, and failing, to enjoy an infrequent vacation. One is a power storage technology that has high capacity and doesn’t catch fire or explode like lithium ion batteries. The other, and far more important, is artificial intelligence (AI), which has the potential to change our lives for the better or worse, but dramatically either way.

IBM and NVIDIA Move to Corner the Enterprise Market for AI

An alliance between two of the most powerful companies in this race, IBM and NVIDIA, was announced last week around a small, intelligent, rack-mounted server called the Power System S822LC. This was part of a three-server launch, and I think the implications are significant. NVIDIA is naturally very excited about this.

Let’s explore why this partnership between two powerhouses could be really interesting.

Think

The word that IBM has connected with itself for much of my life is “Think,” and when it announced Watson, it put itself on a path to make that connection a reality. But Watson, as powerful as it is, is an intellectual baby when it comes to where the industry wants to go. Intelligent machines -- computers that can learn, adapt, and then make decisions based on data -- represent the future of computing and, some argue, the future of the human race.



This makes for an impressive potential world impact and the firm, or firms, that get this right first will likely own the next age of computing. IBM, with Watson, got the initial lead, but Watson is expensive to buy and expensive to train.

That’s why this isn’t a one-company effort. It can’t be; it will require a team.

NVIDIA 
Now, while IBM was working on large-scale AI, NVIDIA has been working on packaged intelligence as a technology. Its Drive PX and CX platforms are designed to make cars intelligent, while the DGX-1 goes well beyond this: it forms the basis for the learning that those other platforms can use in production. In short, you train the DGX-1 and it trains, at scale, everything else it feeds. This is close, in concept, to being able to manufacture things (initially cars) that come off the line with all of the knowledge they need to operate. If we were talking about people, this would be like having a kid who starts out at birth knowing everything you know.

Now we just need to put the parts together.

IBM + NVIDIA
If we combine the two companies, we get the potential for not only a system that is far less expensive to buy but one that is far less expensive to train. The result may potentially be a system that is far smarter than Watson, far more capable than the DGX-1, and able to move both companies to the next tier.

OpenPOWER
The market is currently largely x86, and Intel dominates. Only one non-x86 platform has the potential to address this AI opportunity near term, and that is OpenPOWER, largely because it is backed by IBM and, unlike ARM, it is in production for servers of this class. It is also a technology shared by a variety of vendors, making it more attractive to customers like Google, which is aggressive with AI and particularly favors open systems.

When you combine IBM, NVIDIA and OpenPOWER, you get something unique and potentially very powerful in this race to intelligent computing.

Wrapping Up: Power of the Partnership
In the end, the eventual success of this effort will likely be directly attributable to how well IBM and NVIDIA partner over time. A similar partnership between IBM, Intel and Microsoft created the PC market. If IBM and NVIDIA can do better (that earlier partnership fell apart), then the potential for both firms to own this next technology wave is unmatched. If not, then we’ll just have another story about big firms failing to meet their potential.

For now, IBM and NVIDIA have the inside track, but it’s early in the race. While this new line of servers is a great start, as both companies know, it matters far less who leads a race at the beginning than who leads a race at the end.

Rob Enderle is President and Principal Analyst of the Enderle Group, a forward-looking emerging technology advisory firm.  With over 30 years’ experience in emerging technologies, he has provided regional and global companies with guidance in how to better target customer needs; create new business opportunities; anticipate technology changes; select vendors and products; and present their products in the best possible light. Rob covers the technology industry broadly. Before founding the Enderle Group, Rob was the Senior Research Fellow for Forrester Research and the Giga Information Group, and held senior positions at IBM and ROLM. Follow Rob on Twitter @enderle, on Facebook and on Google+.
IBM’s development of its Power9 architecture has been in the news for some time, and now the company will make it available to other hardware companies by licensing its designs. Power9 chips are scheduled to come to market in 2H17. Let’s look at some features of the new chips.

IBM’s Power9 Server Chips

Intel’s x86 versus IBM’s Power9
At the Hot Chips 2016 conference, IBM unveiled its Power9 server processors, built on 14nm (nanometer) FinFET (fin-shaped field-effect transistor) process technology, just like Intel’s current server processors.

IBM will also integrate Xilinx’s (XLNX) FPGA (field-programmable gate array) technology in its servers, just as Intel is integrating Altera’s FPGA technology.

Features of Power9
IBM will launch Power9 in two basic designs: a 24-core SMT4 processor and a 12-core SMT8 processor.

The 24-core SMT4 processor will be optimized for the Linux ecosystem and will target web service companies such as Google (GOOG), which need to run workloads across several thousand machines. It will feature four threads per core.

The 12-core SMT8 processor will be optimized for the PowerVM ecosystem and will target larger systems designed for running big data or AI (artificial intelligence) applications. It will feature eight threads per core.

Both designs will come in two models: the scale-out model will come with two CPU (central processing unit) sockets on the motherboard, and the scale-up model will come with multiple CPU sockets. The Power9 processor will have multiple connectors to attach FPGAs, GPUs (graphics processing units), and ASICs (application-specific integrated circuits).

IBM and Intel eye artificial intelligence
With all this, IBM aims to make Power9 apt for AI, cognitive computing, analytics, visual computing, and hyperscale web serving. Intel is also looking to tap AI and has recently acquired an AI startup called Nervana Systems for this reason. It has also recently developed Xeon Phi processors for deep learning applications.

IBM has changed its strategy in order to pose tough competition to Intel. We’ll look at this strategy in the next part of the series.
ARM and Fujitsu today announced a scalable vector extension (SVE) to the ARMv8-A architecture intended to enhance ARM capabilities in HPC workloads. Fujitsu is the lead silicon partner in the effort (so far) and will use ARM with SVE technology in its post K computer, Japan’s next flagship supercomputer planned for the 2020 timeframe. This is an important incremental step for ARM, which seeks to push more aggressively into mainstream and HPC server markets.
ARM with SVE technology

Fujitsu first announced plans to adopt ARM for the post K machine – a switch from the SPARC processor technology used in the K computer – at ISC 2016 and said at the time that it would reveal more about the ARM development effort at Hot Chips. Bull Atos is also developing an ARM-based supercomputer.

The SVE is focused on addressing “next generation high performance computing challenges and by that we mean workloads typically found in scientific computing environment where they are very parallelizable,” said Ian Smythe, director of marketing programs, ARM Compute Products Group, in a pre-briefing. SVE is scalable from 128-bits to 2048-bits in 128-bit increments and, among other things, should enhance ARM’s ability to exploit fine grain parallelism.
SVE benefits for HPC workloads
Nigel Stephens, lead ISA architect and ARM Fellow, provided more technical detail in his blog (Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A Architecture, link below) coinciding with his Hot Chips presentation. It’s worth reading for a fast but substantial summary.

“Rather than specifying a specific vector length, SVE allows CPU designers to choose the most appropriate vector length for their application and market, from 128 bits up to 2048 bits per vector register,” wrote Stephens. “SVE also supports a vector-length agnostic (VLA) programming model that can adapt to the available vector length. Adoption of the VLA paradigm allows you to compile or hand-code your program for SVE once, and then run it at different implementation performance points, while avoiding the need to recompile or rewrite it when longer vectors appear in the future. This reduces deployment costs over the lifetime of the architecture; a program just works and executes wider and faster.

“Scientific workloads, mentioned earlier, have traditionally been carefully written to exploit as much data-level parallelism as possible with careful use of OpenMP pragmas and other source code annotations. It’s therefore relatively straightforward for a compiler to vectorize such code and make good use of a wider vector unit. Supercomputers are also built with the wide, high-bandwidth memory systems necessary to feed a longer vector unit,” wrote Stephens.

ARM-server-workloads
While HPC is a natural fit for SVE’s longer vectors, said Stephens, it also offers an opportunity to improve vectorizing compilers that will be of general benefit over the longer term as other systems scale to support increased data level parallelism.

Amplifying on the point, he wrote, “It is worth noting at this point that Amdahl’s Law tells us that the theoretical limit of a task’s speedup is governed by the amount of unparallelizable code. If you succeed in vectorizing 10 percent of your execution and make that code run four times faster (e.g. a 256-bit vector allows 4x64b parallel operations), then you reduce 1000 cycles down to 925 cycles and provide a limited speedup for the power and area cost of the extra gates. Even if you could vectorize 50 percent of your execution infinitely (unlikely!) you’ve still only doubled the overall performance. You need to be able to vectorize much more of your program to realize the potential gains from longer vectors.”

The ARMv7 Advanced SIMD (aka the ARM NEON) is now about 12 years old and was originally intended to accelerate media processing tasks on the main processor. With the move to AArch64, NEON gained full IEEE double-precision float, 64-bit integer operations, and grew the register file to thirty-two 128-bit vector registers. These changes, says Stephens, made NEON a better compiler target for general-purpose compute. SVE is a complementary extension that does not replace NEON, and was developed specifically for vectorization of HPC scientific workloads, he says.
Snapshot of new SVE features compared to NEON:
  • Scalable vector length (VL)
  • VL agnostic (VLA) programming
  • Gather-load & Scatter-store
  • Per-lane predication
  • Predicate-driven loop control and management
  • Vector partitioning and SW managed speculation
  • Extended integer and floating-point horizontal reductions
  • Scalarized intra-vector sub-loops
Smythe emphasized, “If you compile the code for SVE it will run on any implementation of SVE regardless of the width, whether 128 or 1024 or 2048, and the hardware implementation, that code will run on ARM architecture as a binary. That’s important and gives us scalability and compatibility into the future for the compilers and the code that HPC guys are writing.”
ARM ecosystem
ARM has been steadily working to expand its ecosystem with hopes of capturing a chunk of the broader x86 market. It has notable wins in many market segments, although the market traction has been tougher to gauge, and it is only in the past couple of years that server chips started to become available. Many design wins have been niche oriented; one example is an HPE ARM-based storage server (StoreVirtual 3200) announced earlier this month. ARM, of course, is a juggernaut in mobile computing.

Prior to the Hot Chips conference, with its distinctly technical focus, ARM was pre-briefing some of the HPC community about SVE and using the opportunity to reinforce its mission of growth, its success in ecosystem building, and to bask in some of the glory of the post K computer win. Given the recent acquisition of ARM by SoftBank, it will be interesting to watch how the marketing and technical activities change, if at all.

Lakshmi Mandyam, senior marketing director, ARM Server Programs, said, “We’ve been focusing on enabling some base market segments to establish some beachheads and enable our partners to get adoption in those key areas. We have also been using key end users to drive our approach in terms of ecosystem enablement because clearly we are catching up with x86 in terms of software enablement.”

“The move to open source and consuming applications and workloads through [as-a-service models] is really driving a lot of disruption of the industry. It also presents an opportunity because a lot of those platforms are based on open source and Linux and/or intermediate middleware, and so the dependency on the legacy (x86) software and architectures is gone. That presents an opportunity to ARM.”

It’s also important, she said, to recognize that many modern workloads, even in HPC, are moving toward a scale-out model as opposed to a purely scale-up one. Many of those applications are driven by IO and memory performance. “This is where the ARM partnership can shine, because we are able to deliver heterogeneous computing quite easily and we’re able to deliver optimized algorithm processing quite easily. If you look at a lot of these applications, it’s not about spec and benchmark performance; it’s about what you can deliver in my application.”

“When you think about Fujitsu, as they talked about the post K computer, a lot of the folks are looking for this really tuned performance, to take a codesign approach where they are looking at the entire problem, and to deliver an application and service for a given problem. This is where their ability to tune platforms down to the silicon level pays big dividends,” she said.

Here’s a link to Nigel Stephens’ blog on the ARM SVE announcement (Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A Architecture): https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

