Content provided by Keith Bourne. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Keith Bourne or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ms.player.fm/legal.

RAG Evaluation with ragas: Reference-Free Metrics & Monitoring

26:47
 

Unlock the secrets to evaluating Retrieval-Augmented Generation (RAG) pipelines effectively and efficiently with ragas, the open-source framework that’s transforming AI quality assurance. In this episode, we explore how to implement reference-free evaluation, integrate continuous monitoring into your AI workflows, and optimize for production scale — all through the lens of Keith Bourne’s comprehensive Chapter 9.

In this episode:

- Overview of ragas and its reference-free metrics that achieve 95% human agreement on faithfulness scoring

- Implementation patterns and code walkthroughs for integrating ragas with LangChain, LlamaIndex, and CI/CD pipelines

- Production monitoring architecture: sampling, async evaluation, aggregation, and alerting

- Comparison of ragas with other evaluation frameworks like DeepEval and TruLens

- Strategies for cost optimization and asynchronous evaluation at scale

- Advanced features: custom domain-specific metrics with AspectCritic and multi-turn evaluation support
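
As a rough illustration of the monitoring flow listed above (sampling, aggregation, alerting), here is a minimal sketch in plain Python. It is not taken from the episode or from ragas: the class name `FaithfulnessMonitor`, its parameters, and the default values are hypothetical, and the score passed to `record` is assumed to come from whatever evaluator you run off the request path (for example, an asynchronous ragas job).

```python
import random
from collections import deque
from statistics import mean


class FaithfulnessMonitor:
    """Hypothetical rolling monitor: sample traffic, aggregate scores, flag drops."""

    def __init__(self, sample_rate=0.1, window=50, threshold=0.8, seed=None):
        self.sample_rate = sample_rate      # fraction of requests sent to evaluation
        self.threshold = threshold          # alert when the rolling mean falls below this
        self.scores = deque(maxlen=window)  # only the most recent scores are kept
        self._rng = random.Random(seed)

    def should_sample(self):
        """Decide whether this request gets evaluated at all (cost control)."""
        return self._rng.random() < self.sample_rate

    def rolling_mean(self):
        """Mean of the scores currently in the window (1.0 when empty)."""
        return mean(self.scores) if self.scores else 1.0

    def record(self, score):
        """Store one score in [0, 1]; return True if an alert should fire."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be in [0, 1]")
        self.scores.append(score)
        return self.rolling_mean() < self.threshold
```

In use, a request handler would call `should_sample()`, hand sampled requests to a background evaluator, and feed each resulting score to `record()`. Keeping the window small makes alerts react faster but also makes them noisier.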

Key tools and technologies mentioned:

- ragas (Retrieval Augmented Generation Assessment System)

- LangChain, LlamaIndex

- LangSmith, Langfuse (observability and evaluation tools)

- OpenAI GPT-4o, GPT-3.5-turbo, Anthropic Claude, Google Gemini, Ollama

- Python datasets library

Timestamps:

00:00 - Introduction and overview with Keith Bourne

03:00 - Why reference-free evaluation matters and ragas’s approach

06:30 - Core metrics: faithfulness, answer relevancy, context precision & recall

09:00 - Code walkthrough: installation, dataset structure, evaluation calls

12:00 - Integrations with LangChain, LlamaIndex, and CI/CD workflows

14:30 - Production monitoring architecture and cost considerations

17:00 - Advanced metrics and custom domain-specific evaluations

19:00 - Common pitfalls and testing strategies

20:30 - Closing thoughts and next steps
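
For readers who want a concrete picture of the faithfulness metric and the dataset shape mentioned in the timestamps, the sketch below shows the underlying arithmetic only: faithfulness as the fraction of claims in an answer that are supported by the retrieved contexts. In ragas an LLM extracts and judges the claims; here the verdicts are supplied by hand. The `RagSample` field names are illustrative assumptions and not the library's actual schema, which differs between versions.

```python
from dataclasses import dataclass, field


@dataclass
class RagSample:
    """One evaluation record. Field names are illustrative, not ragas's schema."""
    question: str
    answer: str
    contexts: list = field(default_factory=list)  # retrieved passages; no reference answer needed


def faithfulness_score(claim_supported):
    """Fraction of answer claims judged as supported by the retrieved contexts.

    claim_supported: list of booleans, one verdict per claim in the answer.
    """
    if not claim_supported:
        raise ValueError("need at least one claim verdict")
    return sum(1 for ok in claim_supported if ok) / len(claim_supported)


sample = RagSample(
    question="What does the framework evaluate?",
    answer="It evaluates RAG pipelines. It requires labelled answers.",
    contexts=["The framework evaluates RAG pipelines without reference answers."],
)
# Hand-written verdicts: the first claim is supported, the second is not.
score = faithfulness_score([True, False])
```

Because the record carries no ground-truth answer, this kind of metric can be computed on live traffic, which is what makes reference-free evaluation usable for monitoring.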

Resources:

- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

- Memriq AI: https://Memriq.ai

- ragas website: https://www.ragas.io/

- ragas GitHub repository: https://github.com/vibrantlabsai/ragas (for direct access to code and docs)

Tune in to build more reliable, scalable, and maintainable RAG systems with confidence using open-source evaluation best practices.
