NVIDIA Unveils Llama 3.1 AI Models for Enterprise Applications

Joerg Hiller Jul 24, 2024 11:16 UTC 03:16

3 Min Read

The newly unveiled Llama 3.1 collection of 8B, 70B, and 405B large language models (LLMs) by NVIDIA is closing the gap between proprietary and open-source models. This development is attracting more developers and enterprises to integrate these models into their AI applications, according to the NVIDIA Technical Blog.

Capabilities of Llama 3.1

These models excel at various tasks including content generation, coding, and deep reasoning. They can be used to power enterprise applications for use cases like chatbots, natural language processing, and language translation. The Llama 3.1 405B model, thanks to its extensive training data, is particularly suited for generating synthetic data to tune other LLMs, which is beneficial in industries such as healthcare, finance, and retail where real-world data is often restricted due to compliance requirements.

Additionally, Llama 3.1 405B can be tuned with domain-specific data to serve enterprise use cases, enabling better accuracy and customization for organizational requirements, including domain knowledge, company vocabulary, and cultural nuances.

Build Custom Generative AI Models with NVIDIA AI Foundry

NVIDIA AI Foundry is a platform and service designed for building custom generative AI models with enterprise data and domain-specific knowledge. Similar to how TSMC manufactures chips designed by other companies, NVIDIA AI Foundry allows organizations to develop their own AI models. This includes NVIDIA-created AI models like Nemotron and Edify, popular open foundation models, NVIDIA NeMo software for customizing models, and dedicated capacity on NVIDIA DGX Cloud.

The foundry outputs performance-optimized custom models packaged as NVIDIA NIM inference microservices for easy deployment on any accelerated cloud, data center, or workstation.

Generate Proprietary Synthetic Domain Data with Llama 3.1

Enterprises often face challenges with the lack of domain data or data accessibility due to compliance and security requirements. The Llama 3.1 405B model is ideal for synthetic data generation due to its enhanced ability to recognize complex patterns, generate high-quality data, generalize well, scale efficiently, reduce bias, and preserve privacy.

Nemotron-4 340B Reward model judges the data generated by the Llama 3.1 405B model, scoring it across various categories and filtering out lower-scored data to provide high-quality datasets that align with human preferences. This model has achieved best-in-class performance with an overall score of 92.0 on the RewardBench leaderboard.

Curate, Customize, and Evaluate Models with NVIDIA NeMo

NVIDIA NeMo is an end-to-end platform for developing custom generative AI models. It includes tools for training, customization, retrieval-augmented generation (RAG), guardrailing and toolkits, data curation tools, and model pretraining. NeMo supports several parameter-efficient fine-tuning techniques, such as p-tuning, low-rank adaption (LoRA), and its quantization version (QLoRA).

NeMo also supports supervised fine-tuning (SFT) and alignment techniques such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and NeMo SteerLM. These techniques enable steering the model responses and aligning them with human preferences, making the LLMs ready to integrate into customer-facing applications.

High-Performance Inference with NVIDIA NIM

The custom models from the AI Foundry can be packaged as an NVIDIA NIM inference microservice, part of NVIDIA AI Enterprise, for secure, reliable deployment of high-performance inferencing across the cloud, data center, and workstations. Supporting a wide range of AI models, including open foundation models, it ensures seamless, scalable AI inferencing using industry-standard APIs.

Use NIM for local deployment with a single command or autoscale on Kubernetes on NVIDIA accelerated infrastructure, anywhere. Get started with a simple guide to NIM deployment. Additionally, NIMs also support deployments of models customized with LoRA.

Start Building Your Custom Models

Depending on where you are in your AI journey, there are different ways to get started. To build a custom Llama NIM for your enterprise, learn more at NVIDIA AI Foundry. Experience the new Llama 3.1 NIMs and other popular foundation models at ai.nvidia.com. You can access the model endpoints directly or download the NIMs and run them locally.

News ▸

NVIDIA Unveils Llama 3.1 AI Models for Enterprise Applications

Capabilities of Llama 3.1

Build Custom Generative AI Models with NVIDIA AI Foundry

Generate Proprietary Synthetic Domain Data with Llama 3.1

Curate, Customize, and Evaluate Models with NVIDIA NeMo

High-Performance Inference with NVIDIA NIM

Start Building Your Custom Models

Read More

NVIDIA Introduces NeMo Retriever for Enhanced RAG Pipelines

NVIDIA Enhances Meta's Llama 3.1 with Advanced GPU Optimization

Meta Partners with Together AI to Launch High-Performance Llama 3.1 Models

NVIDIA Unveils AI Foundry for Custom Enterprise Generative AI Models

NVIDIA Unveils NeMo Retriever Microservices to Enhance AI Accuracy and Throughput