Apple and NVIDIA Collaborate on ‘ReDrafter’ Technique to Boost Large Language Model Speed

Apple has teamed up with NVIDIA to advance the performance of Large Language Models (LLMs). Apple developed and open-sourced a technique called Recurrent Drafter (ReDrafter), and the two companies have worked together to bring it to NVIDIA GPUs. The technique promises to significantly speed up text generation while reducing latency and power usage.

ReDrafter: A Breakthrough in LLM Inference

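ReDrafter is a speculative decoding technique: a small recurrent draft model proposes several tokens ahead, and the main LLM verifies them instead of generating every token one at a time. It relies on two core techniques, beam search and tree attention, to keep that draft-and-verify process efficient. After rigorous internal testing, Apple collaborated with NVIDIA to integrate ReDrafter into the TensorRT-LLM inference acceleration framework, making it compatible with NVIDIA GPUs.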
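ReDrafter belongs to the family of speculative decoding methods, in which a cheap drafter proposes several tokens and the large model checks them in a single pass. The sketch below illustrates only that general draft-and-verify idea with greedy acceptance. It is not Apple's implementation and omits the beam search and tree attention that ReDrafter adds on top. The functions `target_model` and `draft_model` are hypothetical toy rules standing in for real models.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(context: List[Token]) -> Token:
    """Toy stand-in for the large LLM (hypothetical rule, not a real model)."""
    return (sum(context) * 31 + len(context)) % 50


def draft_model(context: List[Token]) -> Token:
    """Toy stand-in for the cheap drafter: wrong whenever len(context) % 4 == 0."""
    guess = target_model(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft k tokens, then keep the longest prefix the target agrees with.

    Returns the generated sequence and the number of verification passes.
    A real system scores all k drafted positions in one batched forward
    pass; here that pass is simulated by calling `target` per position.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1) The drafter proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) One verification pass by the target model.
        passes += 1
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # equals tok if accepted, else the correction
            if tok != expected or len(out) >= goal:
                break
        else:
            # Every drafted token was accepted: the same pass also yields
            # the target's next token for free.
            out.append(target(out))
    return out[:goal], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, passes = speculative_decode(target_model, draft_model, prompt, 20)
    print("matches greedy:", tokens == greedy_decode(target_model, prompt, 20))
    print("verification passes:", passes, "vs 20 sequential target calls")
```

Because every emitted token is the target model's own greedy choice, the output matches ordinary greedy decoding. The gain comes from needing fewer passes through the large model when the drafter guesses well.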

According to Apple's blog post:

"ReDrafter achieves state-of-the-art performance, delivering up to a 2.7x speed-up in token generation per second for greedy decoding in production models. This results in reduced latency, lower GPU usage, and significant power savings, making it an ideal choice for scaling LLM deployments."
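To put the 2.7x figure in perspective, the short calculation below converts a throughput speed-up into generation time. The baseline of 100 tokens per second is a made-up number chosen for round arithmetic, not a measurement from Apple or NVIDIA, and 2.7x is the quoted best case ("up to").

```python
def time_to_generate(n_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate n_tokens at a given throughput."""
    return n_tokens / tokens_per_second


BASELINE_TPS = 100.0  # hypothetical baseline throughput, for illustration only
SPEEDUP = 2.7         # best-case speed-up quoted for greedy decoding
N_TOKENS = 1000

baseline_s = time_to_generate(N_TOKENS, BASELINE_TPS)
redrafter_s = time_to_generate(N_TOKENS, BASELINE_TPS * SPEEDUP)
reduction = 1 - redrafter_s / baseline_s  # fraction of generation time saved

print(f"baseline:  {baseline_s:.2f} s")
print(f"with 2.7x: {redrafter_s:.2f} s")
print(f"time saved: {reduction:.0%}")
```

Whatever the baseline, a 2.7x throughput gain cuts generation time to 1/2.7 of the original, a reduction of roughly 63%.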

Features of ReDrafter

  • Speed: Faster token generation with up to 2.7x improvement in performance.

  • Efficiency: Reduced power consumption and GPU requirements.

  • Versatility: Integrated into NVIDIA's TensorRT-LLM framework, benefiting developers leveraging NVIDIA GPUs for LLM applications.

Implications of the Collaboration

This unexpected collaboration underscores a shared goal between Apple and NVIDIA: pushing the boundaries of AI performance. While Apple traditionally relies on its custom silicon for AI tasks, this joint effort highlights the potential of short-term partnerships to accelerate innovation. However, due to historical friction between the two companies, a long-term collaboration remains unlikely.

For now, ReDrafter is a promising advancement for both researchers and developers working with large-scale AI models, particularly those looking to optimize performance on NVIDIA's GPU platforms.

Angel Morales

Founder and lead writer at Duck-IT Tech News, dedicated to delivering the latest news, reviews, and insights in the world of technology, gaming, and AI. With experience in the tech and business sectors, Angel combines a deep passion for technology with a talent for clear and engaging writing.
