
Content provided by Daniel Filan. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Daniel Filan or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ms.player.fm/legal.

18 - Concept Extrapolation with Stuart Armstrong

1:46:19
 
Manage episode 340068925 series 2844728

Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different - like learning that the world works via special relativity, or seeing a picture of a novel sausage-bread combination. For a while, Stuart Armstrong has been thinking about concept extrapolation and how it relates to AI alignment. In this episode, we discuss where his thoughts are at on this topic, what the relationship to AI alignment is, and what the open questions are.

Topics we discuss, and timestamps:

- 00:00:44 - What is concept extrapolation

- 00:15:25 - When is concept extrapolation possible

- 00:30:44 - A toy formalism

- 00:37:25 - Uniqueness of extrapolations

- 00:48:34 - Unity of concept extrapolation methods

- 00:53:25 - Concept extrapolation and corrigibility

- 00:59:51 - Is concept extrapolation possible?

- 01:37:05 - Misunderstandings of Stuart's approach

- 01:44:13 - Following Stuart's work

The transcript: axrp.net/episode/2022/09/03/episode-18-concept-extrapolation-stuart-armstrong.html

Stuart's startup, Aligned AI: aligned-ai.com

Research we discuss:

- The Concept Extrapolation sequence: alignmentforum.org/s/u9uawicHx7Ng7vwxA

- The HappyFaces benchmark: github.com/alignedai/HappyFaces

- Goal Misgeneralization in Deep Reinforcement Learning: arxiv.org/abs/2105.14111


39 episodes

