RAPIDS Introduces GPU Polars Streaming and Unified GNN API Enhancements
Tony Kim Jul 04, 2025 21:25
NVIDIA's RAPIDS suite version 25.06 unveils new features including GPU Polars streaming, a unified GNN API, and zero-code ML speedups, enhancing Python data science capabilities.

NVIDIA has announced version 25.06 of its RAPIDS suite, a collection of CUDA-X libraries for Python data science. According to NVIDIA, the release adds features aimed at faster computation and at processing larger datasets.
Polars GPU Engine Enhancements
The new release brings significant updates to the Polars GPU engine, initially launched in September 2024. One of the key features is the experimental streaming executor, which allows execution on datasets larger than the available VRAM through data partitioning and parallel processing. This development is crucial for accelerating analytics operations on extremely large datasets, scaling from hundreds of gigabytes to terabytes. Additionally, the update introduces a shuffle mechanism to facilitate data redistribution between devices and support multi-GPU execution.
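The idea behind partitioned execution with a shuffle step can be illustrated in plain Python. The following is only a conceptual sketch, not how the RAPIDS streaming executor is implemented: every function name here is hypothetical, the "workers" are ordinary lists standing in for devices, and the aggregation is a simple group-by sum.

```python
import zlib
from collections import defaultdict
from typing import Iterable, Iterator

Row = tuple[str, float]  # (group key, value)


def partitions(rows: list[Row], size: int) -> Iterator[list[Row]]:
    """Yield fixed-size chunks so only one chunk is processed at a time."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def partial_sums(chunk: Iterable[Row]) -> dict[str, float]:
    """Aggregate one chunk independently of all other chunks."""
    sums: dict[str, float] = defaultdict(float)
    for key, value in chunk:
        sums[key] += value
    return dict(sums)


def shuffle(partials: Iterable[dict[str, float]], n_workers: int) -> list[list[Row]]:
    """Route every partial result for a given key to the same worker bucket."""
    buckets: list[list[Row]] = [[] for _ in range(n_workers)]
    for partial in partials:
        for key, value in partial.items():
            worker = zlib.crc32(key.encode()) % n_workers  # stable hash partitioning
            buckets[worker].append((key, value))
    return buckets


def streaming_group_sum(rows: list[Row], chunk_size: int, n_workers: int) -> dict[str, float]:
    """Group-by sum computed chunk by chunk, then combined per worker."""
    partials = (partial_sums(chunk) for chunk in partitions(rows, chunk_size))
    result: dict[str, float] = {}
    for bucket in shuffle(partials, n_workers):
        # Each key lives in exactly one bucket, so the per-bucket sums are final.
        result.update(partial_sums(bucket))
    return result


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
totals = streaming_group_sum(rows, chunk_size=2, n_workers=2)
print(totals)
```

Because each chunk is reduced before the shuffle, no step needs the full dataset in memory at once, which is the property that lets a partitioned executor handle inputs larger than a single device's memory.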
Another enhancement includes support for rolling aggregations and expanded column manipulation capabilities, which are particularly beneficial for time series data analysis. The GPU engine now also supports a wider range of expressions for datetime column manipulation, such as .strftime() and .cast_time_unit().
Unified API for Graph Neural Networks (GNNs)
The integration of WholeGraph into NVIDIA’s cuGraph-PyG has led to the creation of a Unified API, which accelerates feature fetching for GNNs. This API allows users to seamlessly transition from a single GPU to multi-GPU or multi-node workflows without modifying their scripts. The familiar torchrun command from PyTorch is used to manage processes, facilitating ease of use for PyTorch users.
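One reason a script can stay unchanged across single-GPU and multi-GPU runs is that torchrun passes each process its role through environment variables (RANK, LOCAL_RANK, WORLD_SIZE). The sketch below shows only that launch-side mechanism; it is not the cuGraph-PyG Unified API, and the names LaunchContext and read_launch_context are hypothetical helpers introduced for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LaunchContext:
    rank: int        # global process index across all nodes
    local_rank: int  # process index on this node, typically used to pick a GPU
    world_size: int  # total number of processes in the job

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1


def read_launch_context(env: Mapping[str, str] = os.environ) -> LaunchContext:
    """Read torchrun's environment variables, defaulting to a single process."""
    return LaunchContext(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


ctx = read_launch_context()
mode = "distributed" if ctx.is_distributed else "single-process"
print(f"{mode}: rank {ctx.rank} of {ctx.world_size}, local GPU index {ctx.local_rank}")
```

Run directly with python, the script falls back to the single-process defaults; launched with a command such as torchrun --nproc_per_node=4 followed by the script name, each of the four processes sees its own rank without any change to the code.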
Zero-Code Change ML Enhancements
The RAPIDS 25.06 release expands its zero-code-change acceleration for machine learning by including support vector machines (SVMs) in the cuML library. This allows existing scikit-learn workflows using SVMs to benefit from GPU acceleration without any code modifications. The update improves compatibility with scikit-learn, enhancing parameter validation and error handling.
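To make "zero code change" concrete, the following is an ordinary scikit-learn SVM script with no GPU-specific code in it. The synthetic dataset, hyperparameters, and the file name mentioned afterwards are assumptions chosen for illustration, not taken from NVIDIA's announcement.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A standard scikit-learn support vector classifier; nothing here mentions cuML.
model = SVC(kernel="rbf", C=1.0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Saved under a hypothetical name such as svm_demo.py, the script runs on the CPU with python svm_demo.py. In an environment with cuML installed, launching the same file as python -m cuml.accel svm_demo.py is the documented way to opt in to the accelerator; how much of the workload actually runs on the GPU depends on the cuML version and the estimator parameters used.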
Additional Platform and Compatibility Updates
The release also includes upgrades to the RAPIDS Memory Manager (RMM), which now supports the hardware-based decompression engine on NVIDIA Blackwell GPUs. This feature promises performance improvements in IO-intensive workflows. The platform also adds support for Python 3.13, and 25.06 is the last release to support CUDA 11.
Overall, the RAPIDS 25.06 release delivers significant advancements for data scientists and developers, focusing on enhanced performance and ease of use for GPU-accelerated data processing tasks.