Xilinx matches FPGAs with AI on dedicated platforms

Date: 3 Oct 2018
Source: eeNews Europe - General News Websites

DESCRIPTION

At the Xilinx Developer Forum, FPGA vendor Xilinx has announced the first iteration of the adaptive compute acceleration platform (ACAP) it unveiled in March this year. Versal is the name CEO Victor Peng gave the first ACAP, one that combines Scalar Processing Engines, Adaptable Hardware Engines, and Intelligent Engines, all with leading-edge memory and interfacing technologies, to deliver powerful heterogeneous acceleration for any application. A recurring theme, of course, is that Versal ACAP's hardware and software can be programmed and optimized not only by hardware developers but also by software developers and data scientists alike, thanks to a host of tools, software, libraries, IP, middleware, and frameworks provided by Xilinx. Interestingly, the announcement reads as a fitting complement to Omnitek's AI engine announcement at the same Developer Forum, although Xilinx never mentioned the company despite holding a 50% stake in it, and could well have been inspired by it to create a dedicated fabric for such AI acceleration IP.

The Versal portfolio is built on TSMC's 7-nanometer FinFET process technology and, as promised earlier (see Xilinx promises revolutionary architecture at 7nm), it combines software programmability with domain-specific hardware acceleration and reconfigurability. The portfolio includes six series of devices uniquely architected to deliver scalability and AI inference capabilities for a host of applications across different markets: the Versal Prime, Premium and HBM series, designed for the most demanding applications, and the AI Core, AI Edge and AI RF series, all three featuring the AI Engine, a new hardware block designed to address the emerging need for low-latency AI inference across a wide variety of applications. The AI Engine is tightly coupled with the Versal Adaptable Hardware Engines to enable whole-application acceleration, meaning that both the hardware and software can be tuned to ensure maximum performance and efficiency.

The portfolio debuts with the Versal Prime series, delivering broad applicability across multiple markets, and the Versal AI Core series, which, according to Xilinx estimates, delivers an 8X AI inference performance boost over industry-leading GPUs. The Versal AI Core series is optimized for cloud, networking, and autonomous technology and comprises five devices offering 128 to 400 AI Engines. Each includes dual-core Arm Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 1,900 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 1.9 million system logic cells combined with more than 130Mb of UltraRAM, up to 34Mb of block RAM, 28Mb of distributed RAM, and 32Mb of new Accelerator RAM blocks, which can be accessed directly from any engine and are unique to the Versal AI series, all to support custom memory hierarchies. The series also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32Gbit/s SerDes, up to four integrated DDR4 memory controllers, up to four multi-rate Ethernet MACs, 650 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory and LVDS interfacing, plus 78 multiplexed I/Os to connect external components and more than 40 HD I/Os for 3.3V interfacing. All of this is interconnected by a state-of-the-art network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with power efficiency and native software programmability.
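The pitch to software developers is that a card like this is driven from a host program through familiar frameworks and libraries rather than through RTL design alone. As a rough, generic illustration of what such a host-side offload flow can look like, the C sketch below uses the standard OpenCL host API to find an accelerator device, load a precompiled kernel binary and dispatch a job to it. It is a hedged sketch under stated assumptions, not Xilinx's documented Versal tool flow: the kernel name (vadd) and binary file name (accel.xclbin) are hypothetical placeholders.

/* Minimal, generic OpenCL host-side offload sketch (illustrative only).
 * Kernel name "vadd" and binary "accel.xclbin" are hypothetical. */
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    cl_int err;
    cl_platform_id platform;
    cl_device_id device;

    /* Pick the first platform and the first accelerator-class device on it. */
    err = clGetPlatformIDs(1, &platform, NULL);
    err |= clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, NULL);
    if (err != CL_SUCCESS) { fprintf(stderr, "no accelerator found\n"); return 1; }

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    /* Load a precompiled device binary (placeholder path). */
    FILE *f = fopen("accel.xclbin", "rb");
    if (!f) { fprintf(stderr, "missing device binary\n"); return 1; }
    fseek(f, 0, SEEK_END); size_t size = (size_t)ftell(f); rewind(f);
    unsigned char *bin = malloc(size);
    fread(bin, 1, size, f); fclose(f);

    cl_program prog = clCreateProgramWithBinary(ctx, 1, &device, &size,
                                                (const unsigned char **)&bin, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", &err);  /* hypothetical kernel name */

    /* Move data to the card, run the kernel, read the result back. */
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, &err);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, &err);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, &err);

    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);
    printf("c[10] = %f\n", c[10]);

    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    free(bin);
    return 0;
}

In a Versal-class device, the host CPU would presumably orchestrate such work over the PCIe Gen4 or CCIX interfaces listed above, while the compute-heavy kernel would map onto the adaptable hardware or AI engines.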
The Versal Prime series is designed for broad applicability across multiple markets and is optimized for connectivity and in-line acceleration of a diverse set of workloads. This mid-range series is made up of nine devices, each including dual-core Arm Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 4,000 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 2 million system logic cells combined with more than 200Mb of UltraRAM, more than 90Mb of block RAM, and 30Mb of distributed RAM to support custom memory hierarchies. The series also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32Gbit/s SerDes and mainstream 58Gbit/s PAM4 SerDes, up to six integrated DDR4 memory controllers, up to four multi-rate Ethernet MACs, 700 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory and LVDS interfacing, plus 78 multiplexed I/Os to connect external components and more than 40 HD I/Os for 3.3V interfacing. As in the AI Core series, all of this is interconnected by a state-of-the-art network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with power efficiency and native software programmability.

Xilinx - www.xilinx.com

Related articles:
Convolutional neural network on FPGA beats all efficiency benchmarks
Xilinx promises revolutionary architecture at 7nm
Xilinx acquires Chinese machine learning startup