Content provided by Aaron Francis and Try Hard Studios. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Aaron Francis and Try Hard Studios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ms.player.fm/legal.

Building search for AI systems with Chroma CTO Hammad Bashir

1:06:43
 

Hammad Bashir, CTO of Chroma, joins the show to break down how modern vector search systems are actually built, from local, embedded databases to massively distributed, object-storage-backed architectures. We dig into Chroma’s shared local-to-cloud API, log-structured storage on object stores, hybrid search, and why retrieval-augmented generation (RAG) isn’t going anywhere.

Follow Hammad:
Twitter/X: https://twitter.com/HammadTime
LinkedIn: https://www.linkedin.com/in/hbashir
Chroma: https://trychroma.com

Follow Aaron:
Twitter/X: https://twitter.com/aarondfrancis
Database School: https://databaseschool.com
Database School YouTube Channel: https://www.youtube.com/@UCT3XN4RtcFhmrWl8tf_o49g (Subscribe today)
LinkedIn: https://www.linkedin.com/in/aarondfrancis
Website: https://aaronfrancis.com - find articles, podcasts, courses, and more.

Chapters:
00:00 – Introduction: From high-school ASICs to CTO of Chroma
01:04 – Hammad’s background and why vector search stuck
03:01 – Why Chroma has one API for local and distributed systems
05:37 – Local experimentation vs production AI workflows
08:03 – What “unprincipled data” means in machine learning
10:31 – From computer vision to retrieval for LLMs
13:00 – Exploratory data analysis and why looking at data still matters
16:38 – Promoting data from local to Chroma Cloud
19:26 – Why Chroma is built on object storage
20:27 – Write-ahead logs, batching, and durability
26:56 – Compaction, inverted indexes, and storage layout
29:26 – Strong consistency and reading from the log
34:12 – How queries are routed and executed
37:00 – Hybrid search: vectors, full-text, and metadata
41:03 – Chunking, embeddings, and retrieval boundaries
43:22 – Agentic search and letting models drive retrieval
45:01 – Is RAG dead? A grounded explanation
48:24 – Why context windows don’t replace search
56:20 – Context rot and why retrieval reduces confusion
01:00:19 – Faster models and the future of search stacks
01:02:25 – Who Chroma is for and when it’s a great fit
01:04:25 – Hiring, team culture, and where to follow Chroma
