Open-Source "GreenBoost" Driver Aims To Augment NVIDIA GPUs vRAM With System RAM & NVMe To Handle Larger LLMs

NVIDIA GPUs Receive a Memory Boost: Open-Source GreenBoost Extends VRAM with System RAM and NVMe

NVIDIA GPU users may soon be able to run larger AI models than their graphics card’s dedicated video memory (VRAM) allows, thanks to a new open-source project called GreenBoost. Developed by Ferran Duarri, GreenBoost is a Linux kernel module designed to augment GPU VRAM with system RAM and even NVMe storage, offering a potential solution to the growing memory demands of large language models (LLMs).

How GreenBoost Works: A Multi-Tiered Approach

GreenBoost doesn’t replace NVIDIA’s official drivers; instead, it works alongside them. It functions as a CUDA caching layer, transparently expanding memory access for AI workloads. The system utilizes a multi-tiered approach, leveraging system RAM and NVMe storage to handle data that exceeds the GPU’s VRAM capacity.

The core of GreenBoost is a kernel module (`greenboost.ko`) that allocates pinned DDR4 pages and makes them accessible to the GPU as CUDA external memory. Data moves between the GPU and system memory over the PCIe 4.0 x16 link at around 32 GB/s, which is close to that link's theoretical peak in one direction. A CUDA shim (`libgreenboost_cuda.so`) intercepts memory allocation calls and redirects large allocations to the kernel module. This process is designed to be seamless, requiring no modifications to existing CUDA software.
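The redirection idea can be illustrated with a small, purely hypothetical model of the routing decision. The cutoff value, the function name `route_allocation`, and the tier labels are assumptions made for illustration: the announcement does not state what size counts as "large", and the real shim is native code hooking CUDA allocation calls, not Python.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the article only says "large" allocations are redirected.
LARGE_ALLOC_THRESHOLD = 256 * 1024 * 1024  # 256 MiB, assumed for illustration


@dataclass
class Placement:
    size_bytes: int
    target: str  # "vram" or "greenboost_pool" (labels invented for this sketch)


def route_allocation(size_bytes: int, vram_free_bytes: int,
                     threshold: int = LARGE_ALLOC_THRESHOLD) -> Placement:
    """Decide where a requested allocation would go in this toy model."""
    if size_bytes <= 0:
        raise ValueError("allocation size must be positive")
    # Small requests that still fit are left in VRAM.
    if size_bytes < threshold and size_bytes <= vram_free_bytes:
        return Placement(size_bytes, "vram")
    # Large requests, or small ones that no longer fit, go to the
    # RAM/NVMe-backed pool in this sketch.
    return Placement(size_bytes, "greenboost_pool")


if __name__ == "__main__":
    gib = 1024 ** 3
    print(route_allocation(16 * 1024 * 1024, vram_free_bytes=8 * gib).target)
    print(route_allocation(2 * gib, vram_free_bytes=8 * gib).target)
```

With 8 GiB of free VRAM, the 16 MiB request stays in `vram` and the 2 GiB request is routed to `greenboost_pool`. A single size threshold is the simplest policy that matches the description; the actual project may weigh other factors.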

According to the developer’s announcement in the NVIDIA Forums, the system includes a watchdog thread to monitor RAM and NVMe pressure, alerting users before potential issues arise. A sysfs interface (`/sys/class/greenboost/greenboost/pool_info`) provides real-time usage monitoring.
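As a minimal sketch of how that sysfs file could be polled from a script, the snippet below reads it as plain text. Only the path comes from the announcement. The helper name `read_pool_info` is hypothetical, and because the format of the file's contents is not described here, the text is returned unparsed.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the article; it should exist only while the module is loaded.
POOL_INFO = Path("/sys/class/greenboost/greenboost/pool_info")


def read_pool_info(path: Path = POOL_INFO) -> Optional[str]:
    """Return the raw text of the sysfs file, or None if it is unavailable."""
    try:
        return path.read_text(encoding="utf-8", errors="replace")
    except OSError:
        # Covers a missing file (module not loaded) and permission errors.
        return None


if __name__ == "__main__":
    info = read_pool_info()
    print(info if info is not None else "pool_info not available")
```

Returning `None` instead of raising keeps a monitoring loop simple on machines where the module is not loaded.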

Addressing the LLM Memory Challenge

The motivation behind GreenBoost stems from the increasing size of AI models. Duarri specifically aimed to run a 31.8 GB model (glm-4.7-flash:q8_0) on a GeForce RTX 5070 with 12 GB of VRAM. The traditional workaround, offloading the layers that do not fit to the CPU and system memory, resulted in performance drops due to a lack of CUDA coherence in system memory. Switching to a more aggressive quantization, while an option, can compromise quality.
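The sizes quoted above make the gap easy to quantify. The short calculation below uses the article's figures (a 31.8 GB model and 12 GB of VRAM) together with the published PCIe 4.0 parameters (16 GT/s per lane, 128b/130b encoding). It treats GB as decimal throughout and assumes all of VRAM is available for model weights, both of which are simplifications.

```python
MODEL_GB = 31.8  # model size quoted in the article
VRAM_GB = 12.0   # RTX 5070 VRAM quoted in the article

# PCIe 4.0: 16 GT/s per lane, 128b/130b line encoding, 16 lanes.
GT_PER_S_PER_LANE = 16.0
ENCODING_EFFICIENCY = 128 / 130
LANES = 16


def pcie_peak_gb_per_s() -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    gbit_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * LANES
    return gbit_per_s / 8


def spill_gb(model_gb: float = MODEL_GB, vram_gb: float = VRAM_GB) -> float:
    """Part of the model that cannot fit in VRAM under the stated assumptions."""
    return max(0.0, model_gb - vram_gb)


if __name__ == "__main__":
    bw = pcie_peak_gb_per_s()
    spill = spill_gb()
    print(f"PCIe 4.0 x16 peak: {bw:.1f} GB/s")
    print(f"Spill beyond VRAM: {spill:.1f} GB")
    print(f"One full pass over the spill: {spill / bw:.2f} s")
```

This gives a peak of about 31.5 GB/s, at least 19.8 GB that must live outside VRAM, and roughly 0.63 s to stream that spill across the link once at peak rate. How often the spilled data actually has to cross the link depends on the model and the caching behavior, which the announcement does not quantify.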

GreenBoost offers a potential middle ground, aiming to let users run larger models without the performance penalty of layer offloading or the quality loss of heavier quantization. The project is particularly relevant as LLMs continue to grow in size and complexity.

Open Source and GPLv2 Licensed

GreenBoost is released under the GPLv2 license, encouraging community contributions and further development. The experimental code is available on GitLab. The project’s open-source nature allows for transparency and collaborative improvement, potentially accelerating its adoption and refinement.

Potential Impact on Gaming and AI

While initially focused on LLMs, the technology behind GreenBoost could have broader implications. The ability to effectively utilize system RAM and NVMe storage as GPU memory could benefit other memory-intensive applications, including gaming. Early tests suggest significant performance improvements with reduced VRAM usage, hinting at future possibilities for NVIDIA and DirectX AI-powered enhancements for gamers.

FAQ

Q: Will GreenBoost replace my NVIDIA drivers?
A: No, GreenBoost is designed to be complementary to NVIDIA’s official Linux kernel drivers, working alongside them as a dedicated kernel module.

Q: What hardware is required to use GreenBoost?
A: GreenBoost requires a Linux system with an NVIDIA discrete GPU and system RAM, and, ideally, NVMe storage as an additional tier.

Q: Is GreenBoost stable for everyday use?
A: As an experimental project, GreenBoost is still under development. Stability and performance may vary depending on the hardware and workload.

Q: What is CUDA coherence?
A: As the term is used here, CUDA coherence refers to how efficiently the GPU can access data. Ordinary system memory typically lacks this coherence, which leads to performance drops when it is used directly for GPU workloads.

Q: Where can I find more information and contribute to the project?
A: You can find the project on GitLab.

Did you know? GreenBoost intercepts CUDA memory allocation calls to redirect large requests to system RAM and NVMe storage, making it appear as if the GPU has more VRAM than it physically does.

Pro Tip: Monitor the `/sys/class/greenboost/greenboost/pool_info` interface to track GreenBoost’s memory usage and ensure optimal performance.

