NVIDIA's cuML Enhances Tree-Based Model Inference with Forest Inference Library

Darius Baruo   Jun 05, 2025

NVIDIA has announced significant updates to its Forest Inference Library (FIL) as part of the cuML 25.04 release, aimed at supercharging the performance of tree-based model inference. The update targets faster, more efficient inference for gradient-boosted trees and random forests, particularly models trained in frameworks such as XGBoost, LightGBM, and scikit-learn, according to NVIDIA.

New Features and Optimizations

Key updates include a redesigned C++ implementation that supports batched inference on both GPU and CPU. The updated FIL adds an optimize() method for tuning inference performance and introduces advanced inference APIs such as predict_per_tree and apply. Notably, the new version promises up to a fourfold increase in GPU throughput over the previous FIL version.
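
The snippet below is a minimal sketch of that workflow in Python. The predict_per_tree and apply methods are named in the release; the import path and load() call follow the cuML FIL documentation, though the exact load() arguments may vary by model format, and the file name model.ubj here is purely hypothetical.

```python
import numpy as np
from cuml.fil import ForestInference

# Placeholder input batch; FIL expects a 2D array of features.
X = np.random.rand(1000, 32).astype(np.float32)

# Load a trained model from a hypothetical serialized XGBoost file.
fil_model = ForestInference.load("model.ubj")

# Standard batched inference.
preds = fil_model.predict(X)

# New in 25.04: per-tree predictions, useful for inspecting ensemble
# behavior or applying custom aggregation.
per_tree = fil_model.predict_per_tree(X)

# New in 25.04: apply() returns the leaf-node index each sample
# reaches in every tree.
leaves = fil_model.apply(X)
```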

The auto-optimization feature is a standout: a built-in method that tunes FIL's performance hyperparameters for a given batch size, simplifying what would otherwise be a manual tuning process. This is particularly beneficial for users who want FIL's performance without extensive manual configuration.
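
As a hedged sketch of how this might look in practice (optimize() is named in the release; the batch_size argument is an assumption based on the batch-size-driven tuning described above):

```python
from cuml.fil import ForestInference

fil_model = ForestInference.load("model.ubj")  # hypothetical model file

# Let FIL benchmark its own execution hyperparameters for the batch
# size the deployment will actually serve; small batches favor
# latency-oriented settings, large batches favor throughput.
fil_model.optimize(batch_size=1024)
```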

Performance Benchmarks

In performance tests, cuML 25.04 demonstrated significant speed improvements over its predecessor. Across a variety of model parameters and batch sizes, the new FIL outperformed the previous version in 75% of scenarios, with a median speedup of 1.16x. The gains were most pronounced in latency-sensitive batch-size-1 workloads and in maximum-throughput scenarios.

Compared to scikit-learn's native execution, FIL's performance was notably superior, with speedups ranging from 13.9x to 882x, depending on the model and batch size. These improvements highlight FIL's potential to replace more resource-intensive CPU setups with a single GPU, offering both speed and cost efficiency.
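
An illustrative, non-authoritative way to run this kind of comparison yourself is to time scikit-learn's predict against FIL on the same trained forest. The load_from_sklearn converter appears in the cuML FIL documentation, though its exact signature should be treated as an assumption here, and results will vary with hardware, model size, and batch size.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from cuml.fil import ForestInference

# Synthetic data purely for illustration.
X = np.random.rand(100_000, 32).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int32)

skl_model = RandomForestClassifier(n_estimators=100).fit(X, y)

start = time.perf_counter()
skl_model.predict(X)
print(f"scikit-learn CPU predict: {time.perf_counter() - start:.3f}s")

# Convert the trained scikit-learn forest for FIL inference.
fil_model = ForestInference.load_from_sklearn(skl_model)
fil_model.predict(X)  # warm-up to exclude one-time GPU setup costs

start = time.perf_counter()
fil_model.predict(X)
print(f"FIL GPU predict:          {time.perf_counter() - start:.3f}s")
```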

Broad Applicability and Future Developments

The versatility of FIL in cuML 25.04 is underscored by its ability to operate on systems without NVIDIA GPUs, enabling local testing and deployment flexibility. The library supports both GPU and CPU environments, making it suitable for a wide range of applications, from high-volume batch jobs to hybrid deployment scenarios.
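
A heavily hedged sketch of CPU execution follows. The using_device_type context manager exists in recent cuML releases for CPU/GPU interoperability, but whether FIL routes through it in 25.04 is an assumption; consult the cuML FIL documentation for the authoritative CPU-execution switch.

```python
import numpy as np
from cuml.common.device_selection import using_device_type
from cuml.fil import ForestInference

X = np.random.rand(256, 32).astype(np.float32)

fil_model = ForestInference.load("model.ubj")  # hypothetical model file

# Request host (CPU) execution; whether FIL honors this context
# manager in 25.04 is an assumption, noted in the text above.
with using_device_type("cpu"):
    preds = fil_model.predict(X)
```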

Looking ahead, NVIDIA plans to integrate these capabilities into future releases of the Triton Inference Server, further expanding FIL's reach and utility. Users can explore the enhancements by downloading the cuML 25.04 release, with upcoming blog posts expected to delve deeper into the technical details and provide additional benchmarks.

For more information on the Forest Inference Library and its capabilities, interested parties can refer to the cuML FIL documentation.
