Anyscale Teams With NVIDIA to Supercharge Large Language Model Performance

Integration of Ray and Anyscale with NVIDIA AI software accelerates computing speeds, development and deployment of generative AI LLMs and applications.

Latest Engineering Computing News

Latest Engineering Computing Resources

Cut Retrieval-Augmented Generation (RAG) Hallucinations by 50%

Most teams hit the same wall with enterprise AI: LLMs that hallucinate, pipelines that don’t scale, and infrastructure that’s harder to design than the models themselves.
What Is Intelligent BOM Management? A Guide to Smarter Product Development

Learn how intelligent Bill of Materials (BOM) management helps teams collaborate, reduce errors, and bring innovative products to market faster with cloud-based PLM tools.
More Resources

By DE Editors

September 19, 2023

Anyscale, the AI infrastructure company built by the creators of Ray, an open-source unified framework for scalable computing, announces a collaboration with NVIDIA to boost the performance and efficiency of large language model (LLM) development on Ray and the Anyscale Platform for production AI.

The companies are integrating NVIDIA AI software into Anyscale’s scalable computing platforms, including Ray open source, the Anyscale Platform and Anyscale Endpoints, announced separately today.

The open-source integrations will bring NVIDIA software, including NVIDIA TensorRT-LLM, NVIDIA Triton Inference Server and NVIDIA NeMo to Ray to supercharge end-to-end AI development and deployment. Making AI software available via open source democratizes access and increases the audience of developers that can use this integration.

For production AI, the companies will certify the NVIDIA AI Enterprise software suite for the Anyscale Platform, bringing enterprise-grade security, stability and support to companies deploying AI. An additional integration with Anyscale Endpoints will bring support for the NVIDIA software to an expanded pool of AI application developers via easy-to-use application programming interfaces.

“Realizing the incredible potential of generative AI requires computing platforms that help developers iterate quickly and save costs when building and tuning LLMs,” says Robert Nishihara, CEO and co-founder of Anyscale. “Our collaboration with NVIDIA will bring even more performance and efficiency to Anyscale’s portfolio so that developers everywhere can create LLMs and generative AI applications with unprecedented speed and efficiency.”

“LLMs are at the heart of today’s generative AI transformation, and the developers creating and customizing these models require full-stack computing with efficient orchestration throughout the AI life cycle,” says Manuvir Das, vice president of Enterprise Computing at NVIDIA. “The combination of NVIDIA AI and Anyscale unites incredible performance with ease of use and the ability to scale rapidly with success.”

NVIDIA AI Acceleration Speeds Anyscale Development

NVIDIA’s open-source and production software helps boost accelerated computing performance and efficiency for generative AI development. The integration delivers benefits for customers and users:

NVIDIA TensorRT-LLM automatically scales inference to run models in parallel over multiple GPUs, which can provide higher performance when running on NVIDIA H100 Tensor Core GPUs, compared to prior-generation GPUs. These capabilities will bring further acceleration and efficiency to Ray, which results in cost savings for at-scale LLM development, NVIDIA reports.

NVIDIA Triton Inference Server standardizes AI model deployment and execution across every workload. It supports inference across cloud, data center, edge, and embedded devices on GPUs, CPUs, and other processors, maximizing performance and reducing end-to-end latency by running multiple models concurrently to maximize GPU use and throughput for LLMs. These capabilities will add more efficiency for developers deploying AI in production on Ray and the Anyscale Platform.

NVIDIA NeMo is an end-to-end, cloud-native framework for building, customizing and deploying generative AI models anywhere. It includes training and inferencing frameworks, guardrailing toolkits, data curation tools and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.

The integration of NeMo with Ray and the Anyscale Platform will enable developers to fine-tune and customize models with enterprise data.

Anyscale Endpoints is a service that enables developers to integrate fast, cost-efficient and scalable LLMs into their applications using LLM application programming interfaces. Endpoints can be tailored to specific use cases and fine-tuned with additional content and context to serve users’ specific needs.

Availability

NVIDIA AI integrations with Anyscale are under development and expected to be available in Q4.

Sources: Press materials received from the company and additional information gleaned from the company’s website.

More about NVIDIA

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and…

Cut Retrieval-Augmented Generation (RAG) Hallucinations by 50%

Most teams hit the same wall with enterprise AI: LLMs that hallucinate, pipelines that don’t scale, and infrastructure that’s harder to design than the models themselves.

Latest in NVIDIA

About DE Editors

DE's editors contribute news and new product announcements to Digital Engineering. Press releases may be sent to them via [email protected].

Follow DE
on Facebook
on Linkedin

Anyscale Teams With NVIDIA to Supercharge Large Language Model Performance

Integration of Ray and Anyscale with NVIDIA AI software accelerates computing speeds, development and deployment of generative AI LLMs and applications.

Latest Engineering Computing News

Latest Engineering Computing Resources

NVIDIA AI Acceleration Speeds Anyscale Development

Availability

More about NVIDIA

Latest in NVIDIA

About DE Editors

Related Topics

From our Sponsors

Digital Engineering 24/7

Design

Simulate

Additive

Digital Thread

Computing

Resources

Our Partners

Design

Top Story

Latest in Design

Simulation

Top Story

Latest in Simulation

Additive Manufacturing

Top Story

Latest in Additive Manufacturing

Digital Thread

Top Story

Latest in Digital Thread

Engineering Computing

Top Story

Latest in Engineering Computing

Subscribe

Latest Magazine

Latest Special Issue

Previous Special Issue

Anyscale Teams With NVIDIA to Supercharge Large Language Model Performance

Integration of Ray and Anyscale with NVIDIA AI software accelerates computing speeds, development and deployment of generative AI LLMs and applications.

Latest Engineering Computing News

Latest Engineering Computing Resources

NVIDIA AI Acceleration Speeds Anyscale Development

Availability

More about NVIDIA

Latest in NVIDIA

About DE Editors

Related Topics

From our Sponsors

Digital Engineering 24/7

Design

Simulate

Additive

Digital Thread

Computing

Resources

Our Partners