AI testing, benchmarks and evals
Generative AI's popularity has led to renewed interest in quality assurance, which is perhaps unsurprising given the technology's inherent unpredictability. Over the last year the field has seen a number of techniques and approaches emerge, including evals, benchmarking and guardrails. While these terms refer to different things, they share a common aim: improving the reliability and accuracy of generative AI.
To discuss these techniques and the renewed enthusiasm for testing across the industry, host Lilly Ryan is joined by Shayan Mohanty, Head of AI Research at Thoughtworks, and John Singleton, Program Manager for Thoughtworks' AI Lab. They discuss the differences between evals, benchmarking and testing and explore both what they mean for businesses venturing into generative AI and how they can be implemented effectively.
Learn more about evals, benchmarks and testing in this blog post by Shayan and John (written with Parag Mahajani): https://www.thoughtworks.com/insights/blog/generative-ai/LLM-benchmarks,-evals,-and-tests
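As a rough illustration of what an eval can look like in code, here is a minimal sketch: a small suite of prompts, each with a simple deterministic pass/fail criterion, scored as an overall pass rate. The call_model stub, the EvalCase structure and the criteria are illustrative assumptions rather than the approach described in the episode or the linked blog post; in practice the stub would be replaced with a real LLM client and richer scoring.

```python
# Minimal eval sketch: run a fixed set of prompts through a model and
# report the fraction of responses that satisfy a deterministic check.

from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # simple pass/fail criterion for this sketch


def call_model(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned response so the
    # sketch runs with no external dependencies.
    return "Paris is the capital of France."


def run_eval(cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose response meets its criterion."""
    passed = 0
    for case in cases:
        response = call_model(case.prompt)
        if case.must_contain.lower() in response.lower():
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    suite = [
        EvalCase("What is the capital of France?", must_contain="Paris"),
        EvalCase("Name the capital city of France.", must_contain="Paris"),
    ]
    print(f"Pass rate: {run_eval(suite):.0%}")
```

Tracking a pass rate like this over time is one simple way to turn ad hoc spot checks into a repeatable signal, which is the broad idea behind the eval and benchmarking practices discussed in the episode.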
All episodes
We need to talk about vibe coding (36:53)
How fitness functions can help us govern and measure AI (42:01)
Exploring the intersections of software architecture (43:32)
Who should make software architecture decisions? (35:00)
Generative AI's uncanny valley: Problem or opportunity? (28:51)
Using generative AI for legacy modernization (33:19)
Data contracts: What are they and why do they matter? (37:38)
In conversation with Thomas Squeo, Thoughtworks CTO for the Americas (33:09)
Themes from Technology Radar Vol.31 (39:39)
Build Your Own Radar: Using the Technology Radar as a governance tool (37:11)
Exploring DuckDB: A relational database built for online analytical processing (35:26)