Deep Papers
Arize AI
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
27 minutes. Posted Apr 18, 2025 at 9:00 pm.
Show notes
For this week's paper read, we dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training, so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, so we used our data to fine-tune a series of small language models (SLMs) that perform just as well as their base LLM counterparts at 1/10 the cost. So, over the past few weeks, the Arize team ...