AI testing, benchmarks and evals
Generative AI's popularity has led to renewed interest in quality assurance, which is perhaps unsurprising given the technology's inherent unpredictability. Over the last year the field has seen a number of techniques and approaches emerge, including evals, benchmarking and guardrails. While these terms refer to different things, they share a common aim: improving the reliability and accuracy of generative AI.
To discuss these techniques and the renewed enthusiasm for testing across the industry, host Lilly Ryan is joined by Shayan Mohanty, Head of AI Research at Thoughtworks, and John Singleton, Program Manager for Thoughtworks' AI Lab. They discuss the differences between evals, benchmarking and testing and explore both what they mean for businesses venturing into generative AI and how they can be implemented effectively.
Learn more about evals, benchmarks and testing in this blog post by Shayan and John (written with Parag Mahajani): https://www.thoughtworks.com/insights/blog/generative-ai/LLM-benchmarks,-evals,-and-tests
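As a rough illustration of what an eval can look like in code, here is a minimal sketch: a small suite of prompts, each with a simple deterministic pass/fail criterion, scored as an overall pass rate. The call_model stub, the EvalCase structure and the criteria are illustrative assumptions rather than the approach described in the episode or the linked blog post; in practice the stub would be replaced with a real LLM client and richer scoring.

```python
# Minimal eval sketch: run a fixed set of prompts through a model and
# report the fraction of responses that satisfy a deterministic check.

from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # simple pass/fail criterion for this sketch


def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned response so the
    # sketch runs with no external dependencies.
    return "Paris is the capital of France."


def run_eval(cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose response meets its criterion."""
    passed = 0
    for case in cases:
        response = call_model(case.prompt)
        if case.must_contain.lower() in response.lower():
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    suite = [
        EvalCase("What is the capital of France?", must_contain="Paris"),
        EvalCase("Name the capital city of France.", must_contain="Paris"),
    ]
    print(f"Pass rate: {run_eval(suite):.0%}")
```

Tracking a pass rate like this over time is one simple way to turn ad hoc spot checks into a repeatable signal, which is the broad idea behind the eval and benchmarking practices discussed in the episode.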
All episodes
We need to talk about vibe coding (36:53)
How fitness functions can help us govern and measure AI (42:01)
Exploring the intersections of software architecture (43:32)
Who should make software architecture decisions? (35:00)
Generative AI's uncanny valley: Problem or opportunity? (28:51)
Using generative AI for legacy modernization (33:19)
Data contracts: What are they and why do they matter? (37:38)
In conversation with Thomas Squeo, Thoughtworks CTO for the Americas (33:09)
Themes from Technology Radar Vol.31 (39:39)
Build Your Own Radar: Using the Technology Radar as a governance tool (37:11)
Exploring DuckDB: A relational database built for online analytical processing (35:26)