
Microsoft has announced a partnership with Qualcomm to bring Windows 10 - real Windows 10, not the aborted cut-down version formerly known as Windows RT - to the company's ARM processors.

Microsoft's previous attempts at playing with non-x86/AMD64 platforms have not exactly set the world aflame. The company has long offered an embedded Windows release which supports ARM and other non-x86/AMD64 architectures, and recently made that available to a wider audience under the moniker Windows 10 IoT Core. Although Windows 10 IoT Core does indeed run on ARM-based devices, in particular the popular Raspberry Pi single-board computer, it's not Windows as most users would know it; instead it's a cut-down operating system designed to run a single application at a time, and built with the intention of winning over embedded developers from Linux and other non-Windows kernels to the Windows ecosystem.

Qualcomm, Microsoft announce Windows 10 on ARM
The closest Microsoft has ever come to a true release of a consumer-centric Windows version on ARM was Windows RT, launched alongside Windows 8 on Microsoft's Surface family of tablets. While one or two hardware partners licensed Windows RT, it was soon abandoned by both third parties and Microsoft itself: Microsoft confirmed in 2015 that Windows RT would not be updated to a Windows 10-based version, and sank the final nail into its coffin a few months later by leaving Windows RT out of its so-called 'Universal' Windows Platform.

Now, though, Microsoft is having another crack of the whip, and it's convinced Qualcomm to come along for the ride. Devices built around Qualcomm's latest Snapdragon processors will, the companies have jointly announced, be able to run Windows 10 - and this time it's truly the same release of Windows you'd find on an x86/AMD64 device. Not only will it run Windows 10, mind you, but also Windows 10's considerable ecosystem of applications - including those compiled exclusively for Win32 under the x86 architecture and the Universal Windows Platform.

'To deliver on our customers' growing needs to create on the go, we announced today that Windows 10 is coming to ARM through our partnership with Qualcomm,' explained Microsoft's Terry Myerson in a blog post late last night. 'For the first time ever, our customers will be able to experience the Windows they know with all the apps, peripherals, and enterprise capabilities they require, on a truly mobile, power efficient, always-connected cellular PC.'

Technical details of how the system will work have not yet been released, but the secret lies in emulation: a translation engine will take the x86/AMD64 instructions from the operating system and the software it's hosting and translate them into ARM instructions for the host processor. It's a tried-and-tested approach which gave machines like the Acorn Archimedes and Commodore Amiga basic x86 support in the 1980s and 1990s, though one which typically comes with a considerable performance hit - something for which Qualcomm's latest chips, it is to be hoped, can compensate.
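The translation idea can be conveyed with a deliberately tiny sketch: decode each "guest" instruction and carry out an equivalent "host" operation. The instruction set and register names below are hypothetical, and a real engine caches translated basic blocks rather than interpreting one instruction at a time; this only illustrates the principle.

```python
# Toy illustration of instruction translation/emulation. Each hypothetical
# guest instruction is a (mnemonic, destination, source) tuple.
GUEST_PROGRAM = [
    ("mov", "eax", 5),      # eax = 5
    ("add", "eax", 3),      # eax += 3
    ("mov", "ebx", "eax"),  # ebx = eax
]

def translate_and_run(program):
    """Map each guest instruction to a host-level action and execute it."""
    regs = {}

    def value(operand):
        # An operand is either a register name or an immediate constant.
        return regs[operand] if isinstance(operand, str) else operand

    for op, dest, src in program:
        if op == "mov":
            regs[dest] = value(src)
        elif op == "add":
            regs[dest] = regs[dest] + value(src)
        else:
            raise NotImplementedError(op)
    return regs

print(translate_and_run(GUEST_PROGRAM))  # {'eax': 8, 'ebx': 8}
```

The performance hit the article mentions comes from exactly this indirection: every guest instruction costs several host operations unless translated blocks are cached and reused.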

A video demonstrating Windows 10 and Adobe Photoshop running on an ARM-based device is reproduced below, with Qualcomm and Microsoft promising to launch the first units some time next year.

Packet’s not-so-secret weapon: energy-sipping bare-metal servers using ARM processors. A little-known startup is making a big bet that it can parlay new ARM chips, and backing from a Japanese investment giant, into a presence among the cloud computing giants.

ARM-powered cloud
The company, Packet, on Tuesday is launching new rentable “bare metal” computing services based on the ARM v8 chip architecture from its data centers in New Jersey, Northern California, Amsterdam, and Tokyo. Customers can set up and launch these resources within minutes, Packet said.

The move is unusual because ARM chips are not commonly found in the servers that power corporate data centers or public cloud computing services, such as those sold by Amazon Web Services. They do, however, dominate the smartphone market—scratch an Apple iPhone (God forbid) and you’ll see an ARM chip. And many techies see ARM’s energy-efficient design as an interesting option for servers going forward.

Bare metal servers, unlike typical cloud-based servers, are not virtualized. That means they can run certain jobs, like databases, faster than virtualized cloud servers. IBM, Rackspace, and some other cloud companies already offer bare metal options for rent.

New York-based Packet, which disclosed $9.4 million in funding from Softbank in September, aims to satisfy what it sees as a growing market for bare-metal computing on demand. Softbank is a great ally for Packet, since it is buying ARM Holdings for $32 billion. ARM Holdings is the U.K. company that controls and licenses ARM processor designs to manufacturers.

Packet CEO Zachary Smith acknowledges that this is a David and Goliath tale in many ways. Intel chips dominate cloud computing services and equipment, as they do inside corporate data centers. And Amazon Web Services and Microsoft Azure are the behemoths in the public cloud market; both organizations sell (or rent) massive amounts of computing power to customers from their Intel-dominated data centers.

Smith has no problem stipulating that Intel owns “99 point whatever percent” of the data center chip architecture, with a smattering of IBM-backed Power chips and Oracle SPARC chips here and there. Likewise, he admits that Intel x86 chips work with everything, that Intel fields a huge partner ecosystem of software, hardware and add-on providers, and that it also owns the biggest-and-best fabrication facilities.

But, he also insists that big changes over the past year are shifting the balance of power. “There are a billion smartphones out there with ARM chips,” Smith noted. As a result, there are many manufacturers and plenty of ARM licensees working with the technology. What that means is that ARM now has an ecosystem all its own, which is something Softbank and Packet hope to capitalize on.

Taking on established cloud giants like Amazon Web Services is a long shot, but there are some critical nuances to consider.

First, the market for rentable computer resources is growing fast enough now to float many boats, including newcomers, provided they have funding and innovative services that corporate developers and their IT strategy overlords want.

Second, even cloud giants admit that new chip technologies will be critical as cloud computing matures. Energy-efficient ARM chips that already power an estimated 95% of smartphones are bound to get a look, especially if their use can reduce data center power requirements. Microsoft and Google also talk up x86 alternative chips for some uses. And Amazon last year bought Annapurna Labs, an ARM chip licensee. Clearly, there is interest here.

Smith contended that the widespread use of ARM chips in other scenarios is also making it easier for cloud service providers (and others) to get early previews of the technology and to develop offerings using it.

Cloud scale deployment on ThunderX® with OpenStack, Secure IoT platform on OCTEON TX™ and MontaVista's CGX 2.0


SAN JOSE, CA – Oct. 26, 2016 – Cavium, Inc. (NASDAQ: CAVM), a leading provider of semiconductor products that enable intelligent processing for enterprise, data center, cloud, wired and wireless networking, will exhibit ThunderX and OCTEON TX ARM-based solutions for next-generation infrastructure deployments in booth #603 at ARM TechCon 2016 from October 26-27 in Santa Clara, California.
Cavium's ThunderX ARMv8-based workload optimized processor integrates key capabilities that are critical for the most demanding Public and Private Cloud workloads. The OpenStack cloud infrastructure enables end users to fully utilize ThunderX's features for critical cloud workloads. These workloads include Ceph for cloud storage, Apache Hadoop for Big Data Analytics, MySQL and Cassandra for distributed databases, and NGINX for secure web servers. ThunderX is also optimized for networking-specific workloads such as Network Functions Virtualization (NFV) and load balancing for Telco applications.

Cavium's OCTEON TX is a complete line of 64-bit ARM-based SoCs for control plane and data plane applications in networking, security, and storage. The OCTEON TX expands the addressability of Cavium's embedded products into control plane application areas within enterprise, service provider, data center networking and storage that need the support of an extensive software ecosystem and virtualization features. This product line is also optimized to run multiple data and control planes concurrently for security and router appliances, NFV and SDN infrastructure, service provider CPE, wireless transport, NAS, storage controllers, IoT gateways, printer and industrial applications.

Cavium will present its ThunderX and OCTEON TX products, along with its partner MontaVista's CGX 2.0, in booth #603:
  • Live demonstration of OpenStack solution (Autopilot, Juju Charms, MAAS) on ThunderX SoC.
  • Live demonstration of Secure IoT Gateway solution based on OCTEON TX CN81xx 4-core SoC and MontaVista CGX 2.0.
  • Showcase its latest OCTEON TX CN83xx, 24-core SoC for Service Centric Networks.
###
About Cavium
Cavium, Inc. (NASDAQ: CAVM), offers a broad portfolio of integrated, software compatible processors ranging in performance from 1Gbps to 100Gbps that enable secure, intelligent functionality in Enterprise, Data Center, Broadband/Consumer, Mobile and Service Provider Equipment, highly programmable switches which scale to 3.2Tbps and Ethernet and Fibre Channel adapters up to 100Gbps. Cavium processors are supported by ecosystem partners that provide operating systems, tools and application support, hardware reference designs and other products. Cavium is headquartered in San Jose, CA with design centers in California, Massachusetts, India, China and Taiwan. For more information, please visit: http://www.cavium.com.

Media Contact
Angel Atondo
Sr. Marketing Communications Manager
Telephone: +1 408-943-7417
Email: angel.atondo@cavium.com
CHIPMAKER Intel's Altera unit has unveiled the Stratix 10, an FPGA that features a quad-core 64-bit ARM Cortex-A53 and offers five times the density and twice the performance of Altera's previous-generation Stratix V.
The Stratix 10 offers 70 per cent lower power consumption for the same performance and will be produced on Intel's latest 14nm process technology. 
The device was unveiled by Dan McNamara, corporate vice president and general manager of the Programmable Solutions Group (PSG) at Intel.
"Stratix 10 combines the benefits of Intel's 14nm tri-gate process technology with a revolutionary new architecture called HyperFlex to uniquely meet the performance demands of high-end compute and data-intensive applications ranging from data centres, network infrastructure, cloud computing and radar and imaging systems," he said.
The device is intended for data centre applications and networking infrastructure, and comes after Intel signed a deal in August with ARM to produce chips based on ARM's intellectual property in Intel's most advanced chip production facilities.
The arrangement came after Intel struck a deal in 2013 to make 64-bit ARM chips for Altera when it was designing the Stratix 10.
"FPGAs are used in the data centre to accelerate the performance of large-scale data systems. When used as a high-performance, multi-function accelerator in the data centre, Stratix 10 FPGAs are capable of performing the acceleration and high-performance networking capabilities," explained McNamara.
The device is among the first new products that Intel will produce on its own fabs that incorporate ARM microprocessor technology since offloading the Xscale business to Marvell in 2006.
Intel had acquired the Xscale business, then called StrongARM, after buying Digital Equipment's semiconductor operations in the late 1990s.
Meanwhile, Intel completed the acquisition of Altera in December 2015, when CEO Brian Krzanich said: "We will apply Moore's Law to grow today's FPGA business, and we'll invent new products that make amazing experiences of the future possible - experiences like autonomous driving and machine learning."
This is not the first time that a chip design company has blended processor cores with programmable logic. The Xilinx Zynq-7000 is an all-programmable SoC comprising two 32-bit ARM Cortex-A9 cores, an FPGA fabric and a number of controller cores to handle Ethernet, USB and other interfaces.

Nobody tell Linux, okay?
Intel's followed up on its acquisition of Altera by baking a microprocessor into a field-programmable gate array (FPGA).
The Stratix 10 family is part of the company's push beyond its stagnating PC-and-servers homeland into emerging markets like high-performance computing and software-defined networking.

Intel says the quad-core 64-bit ARM Cortex-A53 processor helps position the device for “high-end compute and data-intensive applications ranging from data centres, network infrastructure, cloud computing, and radar and imaging systems.”


Compared to the Stratix V, Altera's current generation before the Chipzilla slurp, Intel says the Stratix 10 has five times the density and twice the performance; 70 per cent lower power consumption at equivalent performance; 10 Tflops (single precision); and 1 TBps memory bandwidth.

The devices will be pitched at acceleration and high-performance networking kit.
The Stratix 10 “Hyperflex architecture” uses bypassable registers – yes, they're called “Hyper-Registers” – which are associated with individual routing segments in the chip, and are available at the inputs of “all functional blocks” like adaptive logic modules (ALMs), embedded memory blocks, and digital signal processing (DSP) blocks.

Designs can bypass individual Hyper-Registers, so design tools can automatically choose the best register location. Intel says this means “performance tuning does not require additional ALM resources … and does not require additional changes or added complexity to the design's place-and-route.”

The company reckons the design also cuts down on on-chip routing congestion.
There's more on the architecture in this white paper.

Oh, and it's got an on-chip ARM core. Did we mention that? ®

ARM and Fujitsu today announced a scalable vector extension (SVE) to the ARMv8-A architecture intended to enhance ARM capabilities in HPC workloads. Fujitsu is the lead silicon partner in the effort (so far) and will use ARM with SVE technology in its post K computer, Japan’s next flagship supercomputer planned for the 2020 timeframe. This is an important incremental step for ARM, which seeks to push more aggressively into mainstream and HPC server markets.
ARM with SVE technology

Fujitsu first announced plans to adopt ARM for the post K machine – a switch from SPARC processor technology used in the K computer – at ISC2016 and said at the time that it would reveal more at Hot Chips about the ARM development effort needed. Bull Atos is also developing an ARM-based supercomputer.

The SVE is focused on addressing “next generation high performance computing challenges and by that we mean workloads typically found in scientific computing environment where they are very parallelizable,” said Ian Smythe, director of marketing programs, ARM Compute Products Group, in a pre-briefing. SVE is scalable from 128-bits to 2048-bits in 128-bit increments and, among other things, should enhance ARM’s ability to exploit fine grain parallelism.
ARM’s push into HPC server markets
Nigel Stephens, lead ISA architect and ARM Fellow, provided more technical detail in his blog (Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A Architecture, link below) coinciding with his Hot Chips presentation. It’s worth reading for a fast but substantial summary.

“Rather than specifying a specific vector length, SVE allows CPU designers to choose the most appropriate vector length for their application and market, from 128 bits up to 2048 bits per vector register,” wrote Stephens. “SVE also supports a vector-length agnostic (VLA) programming model that can adapt to the available vector length. Adoption of the VLA paradigm allows you to compile or hand-code your program for SVE once, and then run it at different implementation performance points, while avoiding the need to recompile or rewrite it when longer vectors appear in the future. This reduces deployment costs over the lifetime of the architecture; a program just works and executes wider and faster.

“Scientific workloads, mentioned earlier, have traditionally been carefully written to exploit as much data-level parallelism as possible with careful use of OpenMP pragmas and other source code annotations. It’s therefore relatively straightforward for a compiler to vectorize such code and make good use of a wider vector unit. Supercomputers are also built with the wide, high-bandwidth memory systems necessary to feed a longer vector unit,” wrote Stephens.
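The vector-length agnostic model Stephens describes can be sketched in a few lines: the same loop logic works for any legal vector width, because the width is queried at run time rather than baked into the code. The function below is an illustrative stand-in, not SVE intrinsics; it simulates the hardware vector length with a parameter.

```python
# Sketch of vector-length agnostic (VLA) processing. SVE vector lengths run
# from 128 to 2048 bits in 128-bit increments; with 64-bit lanes that means
# 2 to 32 elements per vector register.

def vla_sum(data, vector_length_bits):
    """Sum `data` in vector-sized chunks. The result is identical for any
    legal vector length - which is the point of the VLA model: compile once,
    run on narrow or wide hardware."""
    assert vector_length_bits % 128 == 0 and 128 <= vector_length_bits <= 2048
    lanes = vector_length_bits // 64   # 64-bit elements per vector register
    total = 0
    for i in range(0, len(data), lanes):
        chunk = data[i:i + lanes]      # final chunk may be partial (handled
        total += sum(chunk)            # by predication in real SVE hardware)
    return total

data = list(range(100))
# Same loop, different "hardware" widths, same answer:
print(vla_sum(data, 128), vla_sum(data, 512), vla_sum(data, 2048))
```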

While HPC is a natural fit for SVE’s longer vectors, said Stephens, it also offers an opportunity to improve vectorizing compilers that will be of general benefit over the longer term as other systems scale to support increased data level parallelism.

Amplifying on the point, he wrote, “It is worth noting at this point that Amdahl’s Law tells us that the theoretical limit of a task’s speedup is governed by the amount of unparallelizable code. If you succeed in vectorizing 10 percent of your execution and make that code run four times faster (e.g. a 256-bit vector allows 4x64b parallel operations), then you reduce 1000 cycles down to 925 cycles and provide a limited speedup for the power and area cost of the extra gates. Even if you could vectorize 50 percent of your execution infinitely (unlikely!) you’ve still only doubled the overall performance. You need to be able to vectorize much more of your program to realize the potential gains from longer vectors.”
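Stephens' arithmetic checks out, and is easy to verify: vectorising 10 percent of a 1000-cycle task at 4x speed leaves 900 scalar cycles plus 100/4 = 25 vector cycles.

```python
# Amdahl's Law applied to the cycle counts in Stephens' example.

def amdahl_cycles(total, vector_fraction, speedup):
    """Cycles remaining after speeding up a fraction of the work."""
    scalar = total * (1 - vector_fraction)          # untouched portion
    vector = total * vector_fraction / speedup      # accelerated portion
    return scalar + vector

print(amdahl_cycles(1000, 0.10, 4))              # 925.0 cycles
print(amdahl_cycles(1000, 0.50, float("inf")))   # 500.0: at best a 2x gain
```

Even an infinite speedup on half the work only halves the runtime, which is why he argues you must vectorize much more of the program to profit from longer vectors.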

The ARMv7 Advanced SIMD (aka the ARM NEON) is now about 12 years old and was originally intended to accelerate media processing tasks on the main processor. With the move to AArch64, NEON gained full IEEE double-precision float, 64-bit integer operations, and grew the register file to thirty-two 128-bit vector registers. These changes, says Stephens, made NEON a better compiler target for general-purpose compute. SVE is a complementary extension that does not replace NEON, and was developed specifically for vectorization of HPC scientific workloads, he says.
Snapshot of new SVE features compared to NEON:
  • Scalable vector length (VL)
  • VL agnostic (VLA) programming
  • Gather-load & Scatter-store
  • Per-lane predication
  • Predicate-driven loop control and management
  • Vector partitioning and SW managed speculation
  • Extended integer and floating-point horizontal reductions
  • Scalarized intra-vector sub-loops
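Per-lane predication, one of the features listed above, can be illustrated with a simple mask: a boolean predicate decides which vector lanes an operation touches, which is how a vector loop handles an array whose length is not a multiple of the vector width without a scalar clean-up loop. This is a conceptual sketch, not the SVE instruction semantics themselves.

```python
# Per-lane predication sketch: the predicate mask gates each lane.

def predicated_add(dst, src, predicate):
    """dst[i] + src[i] only in lanes where the predicate is true;
    inactive lanes keep their original value."""
    return [d + s if p else d for d, s, p in zip(dst, src, predicate)]

# A 4-lane vector with only the first 3 lanes active (e.g. a loop tail):
print(predicated_add([10, 20, 30, 40], [1, 1, 1, 1], [True, True, True, False]))
# [11, 21, 31, 40]
```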
Smythe emphasized, “If you compile the code for SVE it will run on any implementation of SVE regardless of the width, whether 128 or 1024 or 2048, and the hardware implementation, that code will run on ARM architecture as a binary. That’s important and gives us scalability and compatibility into the future for the compilers and the code that HPC guys are writing.”
ARM ecosystem
ARM has been steadily working to expand its ecosystem with hopes of capturing a chunk of the broader x86 market. It has notable wins in many market segments, although the market traction has been tougher to gauge, and it is only in the past couple of years that server chips started to become available. Many design wins have been niche oriented; one example is an HPE ARM-based storage server (StoreVirtual 3200) announced earlier this month. ARM, of course, is a juggernaut in mobile computing.

Prior to the Hot Chips conference, with its distinctly technical focus, ARM was pre-briefing some of the HPC community about SVE and using the opportunity to reinforce its mission of growth, its success in ecosystem building, and to bask in some of the glory of the post K computer win. Given the recent acquisition of ARM by SoftBank, it will be interesting to watch how the marketing and technical activities change, if at all.

Lakshmi Mandyam, senior marketing director, ARM Server Programs, said, “We’ve been focusing on enabling some base market segments to establish some beachheads and enable our partners to get adoption in those key areas. We have also been using key end users to drive our approach in terms of ecosystem enablement because clearly we are catching up with x86 in terms of software enablement.”

“The move to open source and consuming applications and workloads through [as-a-service models] is really driving a lot of disruption of the industry. It also presents an opportunity because a lot of those platforms are based on open source and Linux and/or intermediate middleware, and so the dependency on the legacy (x86) software and architectures is gone. That presents an opportunity to ARM.”

It’s also important, she said, to recognize that many modern workloads, even in HPC, are moving towards the scale-out model as opposed to a purely scale-up one. Many of those applications are driven by IO and memory performance. “This is where the ARM partnership can shine because we are able to deliver heterogeneous computing quite easily and we’re able to deliver optimized algorithm processing quite easily. If you look at a lot of these applications, it’s not about spec and benchmark performance; it’s about what can you deliver in my application.”

“When you think about Fujitsu, as they talked about the post K computer, a lot of the folks are looking for this really tuned performance, to take a codesign approach where they are looking at the entire problem, and to deliver an application and service for a given problem. This is where their ability to tune platforms down to the silicon level pays big dividends,” she said.

Here’s a link to Nigel Stephens’ blog on the ARM SVE announcement (Technology Update: The Scalable Vector Extension (SVE) for the ARMv8-A Architecture): https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture


ARM's new supercomputer chip design with vector extensions will be in Japan's Post-K computer, which will be deployed in 2020


ARM's new weapon


ARM conquered the mobile market starting with Apple’s iPhone, and now wants to be in the world’s fastest computers.

A new ARM chip design being announced on Monday is targeted at supercomputers, a lucrative market in which the company has no presence. ARM’s new chip design, which has mobile origins, has extensions and tweaks to boost computing power.

The announcement comes a few weeks after Japanese company Softbank said it would buy ARM for a mammoth $32 billion. With the cash, ARM is expected to sharpen its focus on servers and the internet of things.

ARM’s new chip design will help the company on two fronts. ARM is sending a warning to Intel, IBM and other chip makers that it too can develop fast supercomputing chips. The company will also join a race among countries and chip makers to build the world’s fastest computers.

The chip design is being detailed at the Hot Chips conference in Cupertino, California, on Monday.
Countries like the U.S., Japan and China want to be the first to reach the exascale computing threshold, in which a supercomputer delivers 1 exaflop of performance (a million trillion calculations per second). Intel, IBM and Nvidia have also been pushing the limits of chip performance to reach that goal.

Following Softbank’s agreement to buy ARM, it should come as no surprise that the first supercomputer based on the new chip design will be installed in Japan. The Post-K supercomputer will be developed by Fujitsu, which dropped a bombshell in June when it abandoned its trusty SPARC architecture in favor of ARM for high-performance computers. Fujitsu aided ARM in the development of the new chip.

Post-K will be 50 to 100 times speedier than its predecessor, the K Computer, which is currently the fifth fastest computer in the world. The K Computer delivers 10.5 petaflops of peak performance with the Fujitsu-designed SPARC64 VIIIfx processor.

The new ARM processor design will be based on the 64-bit ARM-v8A architecture and have vector processing extensions called Scalable Vector Extension. Vector processors drove early supercomputers, which then shifted over to less expensive IBM RISC chips in the early 1990s, and on to general-purpose x86 processors, which are in most high-performance servers today.
In 2013, researchers said less expensive smartphone chips, like the ones from ARM, would ultimately replace x86 processors in supercomputers. But history has turned, and vector processing is seeing a resurgence with ARM’s new chip design and Intel’s Xeon Phi supercomputing chip.

The power-efficient chip design from ARM could crank up performance while reducing power consumption. Supercomputing speed is growing at a phenomenal rate, but the power consumption isn’t coming down as quickly.

ARM’s chip design will also be part of an influx of alternative chip architectures outside x86 and IBM’s Power entering supercomputing. The world’s fastest supercomputer called the Sunway TaihuLight has a homegrown ShenWei processor developed by China. It offers peak performance of 125.4 petaflops.

ARM has struggled in servers for half a decade now, and the new chip design could give it a better chance of competing against Intel, which dominates data centers. Large server clusters are being built for machine learning, which could use the low-precision calculations provided by large numbers of ARM chips with vector extensions.

ARM servers are already available, but aren’t being widely adopted. Dell and Lenovo are testing ARM servers, and said they would ship products when demand grows, which hasn’t happened yet.
ARM server chip makers are also struggling, hanging on in the hope the market will take off someday. AMD, which once placed its server future on ARM chips, has reverted to x86 chips as it re-enters servers. Qualcomm is testing its ARM server chip with cloud developers, and won’t release a chip until the market is viable. AppliedMicro scored a big win with Hewlett Packard Enterprise, which is using the ARM server chips in storage systems. Other ARM server chip makers include Broadcom and Cavium.
