

OVERFIT: AI, Machine Learning, and Deep Learning Made Simple
The research paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" explores a novel approach to language modeling by combining State Space Models (SSMs), which offer linear-time inference and strong performance in long-context tasks, with Mixture of Experts (MoE), a technique that scales model parameters while minimizing computational demands. The authors introduce MoE-Mamba, a model that interleaves Mamba, a recent SSM-based model, with MoE layers, resulting in significant performance gains and training efficiency. They demonstrate that MoE-Mamba outperforms both Mamba and standard Transformer-MoE architectures. The paper also explores different design choices for integrating MoE within Mamba, showcasing promising directions for future research in scaling language models beyond tens of billions of parameters.
Read it: https://arxiv.org/abs/2401.04081
71 episodes
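To make the interleaving idea concrete, here is a minimal sketch of the architecture described above: Mamba-style blocks alternated with Mixture-of-Experts layers. This is not the authors' code; the "MambaBlockStub" class is a placeholder for a real selective SSM block, the MoE layer uses a simple top-1 (switch-style) router, and all names and sizes are illustrative assumptions.

```python
# Sketch of MoE-Mamba's interleaving of SSM blocks and MoE layers (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MambaBlockStub(nn.Module):
    """Placeholder for a selective SSM (Mamba) block; here just a gated linear mix."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        h, gate = self.in_proj(x).chunk(2, dim=-1)
        return self.out_proj(h * torch.sigmoid(gate))


class MoELayer(nn.Module):
    """Token-level top-1 routing over a set of expert feed-forward networks."""
    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weight, idx = scores.max(dim=-1)       # pick one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                    # tokens routed to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out


class MoEMambaSketch(nn.Module):
    """Alternates Mamba-style blocks with MoE layers, each wrapped in a pre-norm residual."""
    def __init__(self, d_model=256, d_ff=1024, num_experts=8, num_pairs=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for _ in range(num_pairs):
            self.layers.append(MambaBlockStub(d_model))
            self.layers.append(MoELayer(d_model, d_ff, num_experts))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in self.layers])

    def forward(self, x):
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))             # residual connection around each block
        return x


if __name__ == "__main__":
    model = MoEMambaSketch()
    tokens = torch.randn(2, 16, 256)           # (batch, seq, d_model)
    print(model(tokens).shape)                 # torch.Size([2, 16, 256])
```

The point of the sketch is the layer layout only: because routing selects one expert per token, the MoE layers add parameters without a proportional increase in per-token compute, which is the scaling argument the paper combines with Mamba's linear-time sequence processing.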