Arxiv Paper - Long Context RAG Performance of Large Language Models

In this episode, we discuss "Long Context RAG Performance of Large Language Models" by Quinn Leng, Jacob Portes, Sam Havens, Matei Zaharia, and Michael Carbin. The paper examines how long context lengths affect Retrieval Augmented Generation (RAG) in large language models, focusing on models that support contexts over 64k tokens, such as Anthropic Claude and GPT-4-turbo. Experiments across 20 LLMs at varying context lengths showed that only the most advanced models maintain accuracy beyond that threshold. The study also identifies limitations and failure modes of RAG at extended context lengths and suggests areas for future research.
