
Content provided by Nicolay Gerold. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://ms.player.fm/legal.

RAG at Scale: The problems you will encounter and how to prevent (or fix) them | S2 E4

50:09
 

Hey! Welcome back.

Today we look at how we can get our RAG system ready for scale.

We discuss the common problems that appear when you introduce more users and more requests to your system, and how to solve them.

For this we are joined by Nirant Kasliwal, the author of fastembed.

Nirant shares practical insights on metadata extraction, evaluation strategies, and emerging technologies like ColPali. This episode is a must-listen for anyone looking to level up their RAG implementations.

"Naive RAG has a lot of problems on the retrieval end and then there's a lot of problems on how LLMs look at these data points as well."

"The first 30 to 50% of gains are relatively quick. The rest 50% takes forever."

"You do not want to give the same answer about company's history to the co-founding CEO and the intern who has just joined."

"Embedding similarity is the signal on which you want to build your entire search is just not quite complete."

Key insights:

  • Naive RAG often fails due to limitations of embeddings and LLMs' sensitivity to input ordering.
  • Query profiling and expansion:
    • Use clustering and tools like Latent Scope to identify problematic query types
    • Expand queries offline and use parallel searches for better results
  • Metadata extraction:
    • Extract temporal, entity, and other relevant information from queries
    • Use LLMs for extraction, with checks against libraries like Stanford NLP
  • User personalization:
    • Include user role, access privileges, and conversation history
    • Adapt responses based on user expertise and readability scores
  • Evaluation and improvement:
    • Create synthetic datasets and use real user feedback
    • Employ tools like DSPy for prompt engineering
  • Advanced techniques:
    • Query routing based on type and urgency
    • Use smaller models (1-3B parameters) for easier iteration and error spotting
    • Implement error handling and cross-validation for extracted metadata
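
The query expansion insight above (expand offline, search in parallel) could be sketched roughly as follows. This is a minimal illustration, not the approach described in the episode: the toy corpus `DOCS`, the precomputed `EXPANSIONS` table, the keyword-overlap `search` function, and the choice of reciprocal rank fusion to merge the result lists are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus; a real system would query a vector or keyword index.
DOCS = {
    "d1": "quarterly revenue report for 2023",
    "d2": "how to reset your password",
    "d3": "annual income statement and earnings summary",
    "d4": "office holiday schedule",
}

# Hypothetical expansions prepared offline (for example by a batch job),
# keyed by the raw user query.
EXPANSIONS = {
    "revenue 2023": ["revenue 2023", "earnings summary", "income statement"],
}


def search(query, k=3):
    """Toy keyword search: rank docs by how many query words they contain."""
    words = set(query.lower().split())
    scored = []
    for doc_id, text in DOCS.items():
        overlap = len(words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; doc id breaks ties so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def expanded_search(query, k=3, rrf_k=60):
    """Run every query variant in parallel, then merge the ranked lists."""
    # Fall back to the raw query when no offline expansion exists.
    variants = EXPANSIONS.get(query, [query])
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        result_lists = list(pool.map(search, variants))
    # Reciprocal rank fusion (an assumed merge strategy): each list
    # contributes 1 / (rrf_k + rank) to a document's fused score.
    fused = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    ranked = sorted(fused, key=lambda d: (-fused[d], d))
    return ranked[:k]
```

In this toy setup, `search("revenue 2023")` only finds `d1`, while `expanded_search("revenue 2023")` also surfaces `d3`, which shares no words with the raw query. Doing the expansion offline keeps the extra model calls out of the request path; only the parallel searches add latency.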

Nirant Kasliwal:

Nicolay Gerold:

query understanding, AI-powered search, LambdaMART, e-commerce ranking, networking, experts, recommendation, search


33 episodes

