AI Startup Groq Designs Chip to Support Performance Demands of Leading AI and ML Applications

2 weeks ago by Luke James

Groq, a high-profile AI startup, has recently claimed that it has developed a single-chip architecture capable of delivering 1 PetaOp/s.

Groq, the AI hardware startup founded in 2016 with its roots in the engineering team behind Google’s original Tensor Processing Unit (TPU), has recently claimed that it has developed a single-chip architecture capable of delivering 1 PetaOp per second of computing power.

 

Architecture Designed for Cutting-Edge Computing

Groq’s CEO, Jonathan Ross, said in an announcement, “Top GPU companies have been telling customers that they’d hoped to be able to deliver one PetaOp/s performance within the next few years; Groq is announcing it today…”.

Claimed to be many times faster than anything else currently available, Groq’s new architecture, the Tensor Streaming Processor (TSP), has been a few years in the making. Just two years ago, the company recruited eight of the 10 engineers who had worked on Google’s own TPU.

 

Groq PCIe board demonstrating TSP architecture.

The PCIe board with TSP architecture that is currently being tested by Groq's customers. Image Credit: Groq.

 

Groq says its TSP architecture delivers 1 PetaOp/s (one quadrillion operations per second) and up to 250 trillion floating-point operations per second (250 TFLOPS). It has been designed to meet the performance requirements of heavy workloads such as AI, machine learning, and computer vision. It can support both new and old machine learning models and can be deployed on both x86 and non-x86 systems.

Groq describes the TSP as a software-first architecture and says it achieves both compute flexibility and parallelism without the synchronization overhead of traditional GPU and CPU architectures.
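To put the two headline figures on a common scale, the short Python sketch below converts them using standard SI prefixes. It uses only the numbers quoted above; the variable names are illustrative, and the 4:1 ratio is simple arithmetic on Groq's claims rather than an independently measured result.

```python
# SI prefixes used in the article's figures.
TERA = 10**12   # one trillion
PETA = 10**15   # one quadrillion

# Figures as claimed by Groq (not independently verified).
peak_ops_per_second = 1 * PETA       # 1 PetaOp/s
peak_flops = 250 * TERA              # 250 trillion FLOPS

# Express the floating-point figure in peta units for comparison.
peak_petaflops = peak_flops / PETA

# Ratio of the headline operation rate to the floating-point rate.
ops_to_flops_ratio = peak_ops_per_second // peak_flops

print(f"Headline rate:        {peak_ops_per_second:,} ops/s")
print(f"Floating-point rate:  {peak_flops:,} FLOPS ({peak_petaflops} PFLOPS)")
print(f"Ratio (ops : FLOPS):  {ops_to_flops_ratio}:1")
```

In other words, the quoted floating-point rate is a quarter of the headline operation rate. The article does not say which numeric formats each figure refers to.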

 

Architecture Helping to Streamline Deployment

Groq’s TSP differs markedly from existing processors and from those being developed by other startups.

Designed to be a powerful single-threaded processor with an instruction set that takes advantage of tensor manipulation and movement, it enables machine learning models to be executed more efficiently. This is achieved not only through the TSP’s hardware but also through its software-first model, which compiles TensorFlow models into independent instruction streams that are coordinated ahead of time.

Compared with traditional CPU, GPU, and FPGA architectures, Groq says its TSP architecture also streamlines qualification and deployment, allowing customers to implement high-performance, scalable systems quickly and easily.
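The idea of instruction streams "coordinated ahead of time" can be illustrated with a toy static scheduler. The Python sketch below is not Groq's compiler: the operation names, functional units, and cycle latencies are all hypothetical. It only shows the general principle that, when every latency is known at compile time, each operation can be given a fixed start cycle, so the streams need no runtime synchronization.

```python
from collections import defaultdict

# Hypothetical operation graph, listed in dependency (topological) order:
# name -> (functional unit, latency in cycles, names of operations it depends on)
OPS = {
    "load_a": ("memory", 2, []),
    "load_b": ("memory", 2, []),
    "matmul": ("matrix", 4, ["load_a", "load_b"]),
    "relu":   ("vector", 1, ["matmul"]),
    "store":  ("memory", 2, ["relu"]),
}

def schedule(ops):
    """Assign every operation a fixed start cycle before execution begins."""
    start = {}                      # operation name -> start cycle
    unit_free = defaultdict(int)    # unit -> first cycle at which it is idle
    streams = defaultdict(list)     # unit -> [(start cycle, operation name)]

    for name, (unit, latency, deps) in ops.items():
        # An operation may begin once all of its inputs have been produced...
        ready = max((start[d] + ops[d][1] for d in deps), default=0)
        # ...and once its functional unit has finished its previous operation.
        begin = max(ready, unit_free[unit])
        start[name] = begin
        unit_free[unit] = begin + latency
        streams[unit].append((begin, name))

    return start, dict(streams)

start, streams = schedule(OPS)
for unit, stream in streams.items():
    print(unit, stream)
```

Each unit ends up with its own instruction stream whose timing was fixed in advance. A real compiler would have to model far more detail, but the ordering decisions are made in the same place: before the program runs.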

At the moment, Groq is sampling its architecture with customers using a PCIe accelerator card. It is expected that Groq will make more information available in the coming months and that its TSP architecture will expand into other platforms. 
