Machine Learning Street Talk (MLST)

By Machine Learning Street Talk (MLST)

Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).
 The AI Models Smart Enough to Know They're Cheating — Beth Barnes & David Rein [METR]

Beth Barnes and David Rein on the one graph that ate the AI timelines discourse, and why the two people who built it are the most careful about how you read it.

**SPONSOR**

Prolific - Quality data. From real people. For faster breakthroughs.

https://www.prolific.com/?utm_source=mlst

Interview: https://youtu.be/cnxZZTl1tkk

---

Beth Barnes and David Rein from METR on the one graph that ate the AI timelines discourse, and why the people who built it are the most careful about how it gets read.

Beth founded METR after leaving OpenAI alignment. David is first author on GPQA and co-author on HCAST and the METR Time Horizons paper. Together they built the measurement Daniel Kokotajlo called the single most important piece of evidence on AI timelines: the log-linear line of "how long a task a frontier model can complete at 50% reliability" vs release date.

The conversation opens on reward hacking. Current models can articulate in chat why a behaviour is undesired and then execute it anyway as agents. From there: construct validity, Melanie Mitchell's four-problem taxonomy, and the ARC-AGI 1-to-2 collapse as a worked example of adversarially-selected benchmarks regressing once labs target them. Beth's counter: METR deliberately does not adversarially select. David's: models do not have to do the right thing for the right reasons.

Methodology, then specification: David's compiler analogy, and Beth on four-month tasks as expensive to evaluate rather than unspecifiable. Then the SWE-bench reality check, the METR finding that half of passing PRs would not be merged, and Beth's horses-versus-bank-tellers analogy for the labour market.

The close: monitorability, the coin-spinning boat, two-year recursive self-improvement, and Beth's line that "overhyped now" and "big deal later" are not correlated claims.

---

TIMESTAMPS:

00:00:00 Intro

00:02:06 Sponsor break: Prolific human-feedback infrastructure

00:02:33 Welcome and the scalable oversight motivation

00:06:02 Construct validity, benchmark pathologies and the Chollet worry

00:15:45 Time Horizons: human time, HCAST tasks and the 50% logistic

00:24:50 Is human difficulty really one variable?

00:33:05 Agent harness evolution and the inference-compute dividend

00:40:00 Scaffolding bells, token budgets and the credit-assignment problem

00:44:15 Look at the damn graph: regularisation bug and reliability nuance

00:50:00 Why 50%? Reliability, reward hacking and pizza-party transcripts

00:55:20 Extrapolation risk and straight lines on graphs

00:59:25 Software engineering as a specification acquisition problem

01:07:40 Compilers also made ugly code: vibe-coding quality and Claude on METR Slack

01:15:15 Strongest defensible claim, Carlini's compiler swarm and AI 2027

01:23:45 SWE-bench merge rates, the bank-teller analogy and horses

01:31:45 Scheming, alignment faking and the mentalistic vocabulary problem

01:40:45 Reward hacking, monitorability and chain-of-thought faithfulness

01:45:25 Recursive self-improvement, knowledge vs intelligence and closing

ReScript: https://app.rescript.info/public/share/de3bb40cc02ee39fdf36e2c60366eb4d

(PDF, refs, transcript etc)
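
For readers who want a feel for the "50% time horizon" idea in the description above, here is a minimal sketch. It assumes a logistic model of success probability against the base-2 log of human task time, plus a horizon that doubles at a fixed interval. The function names, coefficients and doubling period are hypothetical values chosen for illustration; this is not METR's code or their fitted numbers.

```python
import math


def success_probability(task_minutes: float, intercept: float, slope: float) -> float:
    """Logistic model: success becomes less likely as log2(task length) grows."""
    z = intercept - slope * math.log2(task_minutes)
    return 1.0 / (1.0 + math.exp(-z))


def fifty_percent_horizon(intercept: float, slope: float) -> float:
    """Task length in minutes at which the modelled success rate is 50%."""
    # P = 0.5 exactly when intercept - slope * log2(t) = 0,
    # which gives t = 2 ** (intercept / slope).
    return 2.0 ** (intercept / slope)


def extrapolate_horizon(horizon_now: float, months_ahead: float, doubling_months: float) -> float:
    """Log-linear trend: the horizon doubles every `doubling_months` months."""
    return horizon_now * 2.0 ** (months_ahead / doubling_months)


# Hypothetical coefficients, for illustration only.
horizon = fifty_percent_horizon(intercept=6.0, slope=1.0)
print(horizon, success_probability(horizon, 6.0, 1.0))
print(extrapolate_horizon(horizon, months_ahead=12.0, doubling_months=6.0))
```

Extrapolating like this is exactly the "straight lines on graphs" step the episode treats with caution: the arithmetic is trivial, and the open question is whether the trend holds.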

May 04, 2026 | 01:53:27
When AI Discovers The Next Transformer - Robert Lange (Sakana)

Robert Lange, founding researcher at Sakana AI, joins Tim to discuss *Shinka Evolve* — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: systems like AlphaEvolve can optimize solutions to fixed problems, but real scientific progress requires co-evolving the problems themselves.


GTC, the premier AI conference, is coming, and it is a great opportunity to learn about AI. NVIDIA and partners will showcase breakthroughs in physical AI, AI factories, agentic AI, and inference, exploring the next wave of AI innovation for developers and researchers. Register for virtual GTC for free using my link and you could win an NVIDIA DGX Spark (https://nvda.ws/4qQ0LMg)


• Why AlphaEvolve gets stuck — it needs a human to hand it the right problem. Shinka tries to invent new problems automatically, drawing on ideas from POET, PowerPlay, and MAP-Elites quality-diversity search.


• The *architecture* of Shinka: an archive of programs organized as islands, LLMs used as mutation operators, and a UCB bandit that adaptively selects between frontier models (GPT-5, Sonnet 4.5, Gemini) mid-run. The credit-assignment problem across models turns out to be genuinely hard.


• Concrete results — state-of-the-art circle packing with dramatically fewer evaluations, second place in an AtCoder competitive programming challenge, evolved load-balancing loss functions for mixture-of-experts models, and agent scaffolds for AIME math benchmarks.


• Are these systems actually thinking outside the box, or are they parasitic on their starting conditions? When LLMs run autonomously, "nothing interesting happens." Robert pushes back with the stepping-stone argument — evolution doesn't need to extrapolate, just recombine usefully.


• The AI Scientist question: can automated research pipelines produce real science, or just workshop-level slop that passes surface-level review? Robert is honest that the current version is more co-pilot than autonomous researcher.


• Where this lands in 5-20 years — Robert's prediction that scientific research will be fundamentally transformed, and Tim's thought experiment about alien mathematical artifacts that no human could have conceived.
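
The bandit idea in the architecture bullet above can be sketched in a few lines. This is the classic UCB1 rule applied to a choice among mutation operators, not ShinkaEvolve's actual implementation; the arm names, the simulated success rates and the binary reward are all assumptions made for the example.

```python
import math
import random


class UCB1:
    """Classic UCB1 bandit over a fixed set of arms (here: hypothetical LLM choices)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}
        self.totals = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Pull every arm once before trusting the confidence bounds.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total_pulls = sum(self.counts.values())

        def upper_bound(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total_pulls) / self.counts[arm])
            return mean + bonus

        return max(self.arms, key=upper_bound)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def run_bandit(steps=3000, seed=0):
    rng = random.Random(seed)
    # Hypothetical chance that a mutation proposed by each model improves fitness.
    improvement_rate = {"model_a": 0.1, "model_b": 0.7, "model_c": 0.3}
    bandit = UCB1(improvement_rate)
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < improvement_rate[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


if __name__ == "__main__":
    print(run_bandit())
```

In a real evolutionary run the reward is much noisier than this, since a mutation's value depends on the parent program it was applied to, which is one way to read the credit-assignment difficulty mentioned above.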


Robert Lange: https://roberttlange.com/


---

TIMESTAMPS:

00:00:00 Introduction: Robert Lange, Sakana AI and Shinka Evolve

00:04:15 AlphaEvolve's Blind Spot: Co-Evolving Problems with Solutions

00:09:05 Unknown Unknowns, POET, and Auto-Curricula for AI Science

00:14:20 MAP-Elites and Quality-Diversity: Shinka's Evolutionary Architecture

00:28:00 UCB Bandits, Mutations and the Vibe Research Vision

00:40:00 Scaling Shinka: Meta-Evolution, Democratisation and the Three-Axis Model

00:47:10 Applications, ARC-AGI and the Future of Work

00:57:00 The AI Scientist and the Human Co-Pilot: Who Steers the Search?

01:06:00 AI Scientist v2, Slop Critique and the Future of Scientific Publishing


---

REFERENCES:

paper:

[00:03:30] ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

https://arxiv.org/abs/2509.19349

[00:04:15] AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery

https://arxiv.org/abs/2506.13131

[00:06:30] Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

https://arxiv.org/abs/2505.22954

[00:09:05] Paired Open-Ended Trailblazer (POET)

https://arxiv.org/abs/1901.01753

[00:10:00] PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

https://arxiv.org/abs/1112.5309

[00:10:40] Automated Capability Discovery via Foundation Model Self-Exploration

https://arxiv.org/abs/2502.07577

[00:15:30] Illuminating Search Spaces by Mapping Elites (MAP-Elites)

https://arxiv.org/abs/1504.04909

[00:47:10] Automated Design of Agentic Systems (ADAS)

https://arxiv.org/abs/2408.08435


PDF : https://app.rescript.info/api/sessions/b8a9dcf60623657c/pdf/download

Transcript: https://app.rescript.info/public/share/SDOD_3oXOcli3zTqcAtR8eibT5U3gam84oo4KRtI-Vk

Mar 13, 2026 | 01:18:07
"Vibe Coding is a Slot Machine" - Jeremy Howard

Dive into the realities of AI-assisted coding, the origins of modern fine-tuning, and the cognitive science behind machine learning with fast.ai founder Jeremy Howard. In this episode, we unpack why AI might be turning software engineering into a slot machine and how to maintain true technical intuition in the age of large language models.


Jeremy Howard is a renowned data scientist, researcher, entrepreneur, and educator. As the co-founder of fast.ai, former President of Kaggle, and the creator of ULMFiT, Jeremy has spent decades democratizing deep learning. His pioneering work laid the foundation for modern transfer learning and the pre-training and fine-tuning paradigm that powers today's language models.


Key Topics and Main Insights Discussed:


- The Origins of ULMFiT and Fine-Tuning

- The Vibe Coding Illusion and Software Engineering

- Cognitive Science, Friction, and Learning

- The Future of Developers


RESCRIPT: https://app.rescript.info/public/share/BhX5zP3b0m63srLOQDKBTFTooSzEMh_ARwmDG_h_izk


Jeremy Howard:

https://x.com/jeremyphoward

https://www.answer.ai/


---

TIMESTAMPS (fixed):

00:00:00 Introduction & GTC Sponsor

00:04:30 ULMFiT & The Birth of Fine-Tuning

00:12:00 Intuition & The Mechanics of Learning

00:18:30 Abstraction Hierarchies & AI Creativity

00:23:00 Claude Code & The Interpolation Illusion

00:27:30 Coding vs. Software Engineering

00:30:00 Cosplaying Intelligence: Dennett vs. Searle

00:36:30 Automation, Radiology & Desirable Difficulty

00:42:30 Organizational Knowledge & The Slope

00:48:00 Vibe Coding as a Slot Machine

00:54:00 The Erosion of Control in Software

01:01:00 Interactive Programming & REPL Environments

01:05:00 The Notebook Debate & Exploratory Science

01:17:30 AI Existential Risk & Power Centralization

01:24:20 Current Risks, Privacy & Enfeeblement


---

REFERENCES:

Blog Post:

[00:03:00] fast.ai Blog: Self-Supervised Learning

https://www.fast.ai/posts/2020-01-13-self_supervised.html

[00:13:30] DeepMind Blog: Gemini Deep Think

https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/

[00:19:30] Modular Blog: Claude C Compiler analysis

https://www.modular.com/blog/the-claude-c-compiler-what-it-reveals-about-the-future-of-software

[00:19:45] Anthropic Engineering Blog: Building C Compiler

https://www.anthropic.com/engineering/building-c-compiler

[00:48:00] Cursor Blog: Scaling Agents

https://cursor.com/blog/scaling-agents

[01:05:15] fast.ai Blog: nbdev Merge Driver

https://www.fast.ai/posts/2022-08-25-jupyter-git.html

[01:17:30] Jeremy Howard: Response to AI Risk Letter

https://www.normaltech.ai/p/is-avoiding-extinction-from-ai-really

Book:

[00:08:30] M. Chirimuuta: The Brain Abstracted

https://mitpress.mit.edu/9780262548045/the-brain-abstracted/

[00:30:00] Daniel Dennett: Consciousness Explained

https://www.amazon.com/Consciousness-Explained-Daniel-C-Dennett/dp/0316180661

[00:42:30] Cesar Hidalgo: Infinite Alphabet / Laws of Knowledge

https://www.amazon.com/Infinite-Alphabet-Laws-Knowledge/dp/0241655676

Archive Article:

[00:13:45] MLST Archive: Why Creativity Cannot Be Interpolated

https://archive.mlst.ai/read/why-creativity-cannot-be-interpolated

Research Study:

[00:24:30] METR Study: Early-2025 AI and Experienced Open-Source Developers

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Paper:

[00:24:45] Fred Brooks: No Silver Bullet

https://www.cs.unc.edu/techreports/86-020.pdf

[00:30:15] John Searle: Minds, Brains, and Programs

https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/minds-brains-and-programs/DC644B47A4299C637C89772FACC2706A


Mar 03, 2026 | 01:26:40
Evolution "Doesn't Need" Mutation - Blaise Agüera y Arcas

What if life itself is just a really sophisticated computer program that wrote itself into existence?


Blaise Agüera y Arcas presents at ALife 2025, in the most technically detailed public walkthrough of the ideas in his *What is Life?* and *What is Intelligence?* books that we've come across.

He covers the BFF experiments (self-replicating programs emerging spontaneously from random noise), the mathematical framework connecting Lotka-Volterra population dynamics with Smoluchowski coagulation, eigenvalue analysis of cooperation matrices, and his central claim that symbiogenesis, not mutation, is the primary engine of evolutionary novelty.

The experimental results are genuinely striking: complex self-replicating code arising from random byte strings with zero mutation, a sharp phase transition that looks like gelation, and a proof that blocking deep symbiogenetic ancestry trees prevents the transition entirely.

A few things worth flagging for critical viewers:

- The substrate is more carefully engineered than the framing sometimes suggests. The choice of language, tape length, interaction protocol, and step limits all shape what emerges. Their own SUBLEQ counterexample (where self-replicators *don't* arise despite being theoretically possible) highlights that these design choices matter substantially, and a general theory of which substrates support this transition is still missing.

- The leap from "self-replicating programs on fixed-length tapes" to "life was computational and intelligent from the start" involves significant philosophical extrapolation beyond what the experiments directly demonstrate.

- The Bedau et al. (2000) open problems paper he references at the start actually sets a higher bar for Challenge 3.2 than BFF currently meets: it asks that "the internal organization of these 'organisms' and the boundaries separating them from their environment arise and be sustained through the activities of lower-level primitives", whereas BFF's tape boundaries are fixed by design, not emergent.

---

TIMESTAMPS:

00:00:00 Introduction: From Noise to Programs & ALife History

00:03:15 Defining Life: Function as the "Spirit"

00:05:45 Von Neumann's Insight: Life is Embodied Computation

00:09:15 Physics of Computation: Irreversibility & Fallacies

00:15:00 The BFF Experiment: Spontaneous Generation of Code

00:23:45 The Mystery: Complexity Growth Without Mutation

00:27:00 Symbiogenesis: The Engine of Novelty

00:33:15 Mathematical Proof: Blocking Symbiosis Stops Life

00:40:15 Evolutionary Implications: It's Symbiogenesis All The Way Down

00:44:30 Intelligence as Modeling Others

00:46:49 Q&A: Levels of Abstraction & Definitions


---

REFERENCES:

Paper:

[00:01:16] Open Problems in Artificial Life

https://direct.mit.edu/artl/article/6/4/363/2354/Open-Problems-in-Artificial-Life

[00:09:30] When does a physical system compute?

https://arxiv.org/abs/1309.7979

[00:15:00] Computational Life

https://arxiv.org/abs/2406.19108

[00:27:30] On the Origin of Mitosing Cells

https://pubmed.ncbi.nlm.nih.gov/11541392/

[00:42:00] The Major Evolutionary Transitions

https://www.nature.com/articles/374227a0

[00:44:00] The ARC gene

https://www.nih.gov/news-events/news-releases/memory-gene-goes-viral

Person:

[00:05:45] Alan Turing

https://plato.stanford.edu/entries/turing/

[00:07:30] John von Neumann

https://en.wikipedia.org/wiki/John_von_Neumann

[00:11:15] Hector Zenil

https://hectorzenil.net/

[00:12:00] Robert Sapolsky

https://profiles.stanford.edu/robert-sapolsky



---

LINKS:

RESCRIPT: https://app.rescript.info/public/share/ff7gb6HpezOR3DF-gr9-rCoMFzzEgUjLQK6voV5XVWY

Feb 16, 2026 | 55:48
VAEs Are Energy-Based Models? [Dr. Jeff Beck]

What makes something truly *intelligent?* Is a rock an agent? Could a perfect simulation of your brain actually *be* you? In this fascinating conversation, Dr. Jeff Beck takes us on a journey through the philosophical and technical foundations of agency, intelligence, and the future of AI.


Jeff doesn't hold back on the big questions. He argues that from a purely mathematical perspective, there's no structural difference between an agent and a rock – both execute policies that map inputs to outputs. The real distinction lies in *sophistication* – how complex are the internal computations? Does the system engage in planning and counterfactual reasoning, or is it just a lookup table that happens to give the right answers?


*Key topics explored in this conversation:*


*The Black Box Problem of Agency* – How can we tell if something is truly planning versus just executing a pre-computed response? Jeff explains why this question is nearly impossible to answer from the outside, and why the best we can do is ask which model gives us the simplest explanation.


*Energy-Based Models Explained* – A masterclass on how EBMs differ from standard neural networks. The key insight: traditional networks only optimize weights, while energy-based models optimize *both* weights and internal states – a subtle but profound distinction that connects to Bayesian inference.


*Why Your Brain Might Have Evolved from Your Nose* – One of the most surprising moments in the conversation. Jeff proposes that the complex, non-smooth nature of olfactory space may have driven the evolution of our associative cortex and planning abilities.


*The JEPA Revolution* – A deep dive into Yann LeCun's Joint Embedding Prediction Architecture and why learning in latent space (rather than predicting every pixel) might be the key to more robust AI representations.


*AI Safety Without Skynet Fears* – Jeff takes a refreshingly grounded stance on AI risk. He's less worried about rogue superintelligences and more concerned about humans becoming "reward function selectors" – couch potatoes who just approve or reject AI outputs. His proposed solution? Use inverse reinforcement learning to derive AI goals from observed human behavior, then make *small* perturbations rather than naive commands like "end world hunger."


Whether you're interested in the philosophy of mind, the technical details of modern machine learning, or just want to understand what makes intelligence *tick,* this conversation delivers insights you won't find anywhere else.
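
To make the energy-based-model point above concrete (optimizing internal states, not only weights), here is a toy sketch. It assumes a one-dimensional quadratic energy and plain gradient descent on the internal state with the weight held fixed. The energy function, names and hyperparameters are illustrative choices, not anything taken from the episode.

```python
def energy(x, z, w, penalty=0.1):
    """Toy energy: reconstruction error of x from state z, plus a penalty on z."""
    return (x - w * z) ** 2 + penalty * z ** 2


def infer_state(x, w, penalty=0.1, learning_rate=0.05, steps=500):
    """Inference as optimization: descend the energy in z while w stays fixed."""
    z = 0.0
    for _ in range(steps):
        grad_z = -2.0 * w * (x - w * z) + 2.0 * penalty * z
        z -= learning_rate * grad_z
    return z


def closed_form_state(x, w, penalty=0.1):
    """Exact minimizer of the quadratic energy, used to check the descent."""
    return w * x / (w * w + penalty)


if __name__ == "__main__":
    z_star = infer_state(x=3.0, w=2.0)
    print(z_star, closed_form_state(3.0, 2.0), energy(3.0, z_star, 2.0))
```

Learning would add an outer loop that also adjusts `w` to lower the energy at the inferred state. A standard feed-forward network skips the inner loop entirely and computes its hidden state in one pass.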


---

TIMESTAMPS:

00:00:00 Geometric Deep Learning & Physical Symmetries

00:00:56 Defining Agency: From Rocks to Planning

00:05:25 The Black Box Problem & Counterfactuals

00:08:45 Simulated Agency vs. Physical Reality

00:12:55 Energy-Based Models & Test-Time Training

00:17:30 Bayesian Inference & Free Energy

00:20:07 JEPA, Latent Space, & Non-Contrastive Learning

00:27:07 Evolution of Intelligence & Modular Brains

00:34:00 Scientific Discovery & Automated Experimentation

00:38:04 AI Safety, Enfeeblement & The Future of Work


---

REFERENCES:

Concept:

[00:00:58] Free Energy Principle (FEP)

https://en.wikipedia.org/wiki/Free_energy_principle

[00:06:00] Monte Carlo Tree Search

https://en.wikipedia.org/wiki/Monte_Carlo_tree_search

Book:

[00:09:00] The Intentional Stance

https://mitpress.mit.edu/9780262540537/the-intentional-stance/

Paper:

[00:13:00] A Tutorial on Energy-Based Learning (LeCun 2006)

http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf

[00:15:00] Auto-Encoding Variational Bayes (VAE)

https://arxiv.org/abs/1312.6114

[00:20:15] JEPA (Joint Embedding Prediction Architecture)

https://openreview.net/forum?id=BZ5a1r-kVsf

[00:22:30] The Wake-Sleep Algorithm

https://www.cs.toronto.edu/~hinton/absps/ws.pdf


---

RESCRIPT:

https://app.rescript.info/public/share/DJlSbJ_Qx080q315tWaqMWn3PixCQsOcM4Kf1IW9_Eo

PDF:

https://app.rescript.info/api/public/sessions/0efec296b9b6e905/pdf

Jan 25, 2026 | 46:57
Abstraction & Idealization: AI's Plato Problem [Mazviita Chirimuuta]

Professor Mazviita Chirimuuta joins us for a fascinating deep dive into the philosophy of neuroscience and what it really means to understand the mind.

*What can neuroscience actually tell us about how the mind works?*

In this thought-provoking conversation, we explore the hidden assumptions behind computational theories of the brain, the limits of scientific abstraction, and why the question of machine consciousness might be more complicated than AI researchers assume.

Mazviita, author of *The Brain Abstracted,* brings a unique perspective shaped by her background in both neuroscience research and philosophy. She challenges us to think critically about the metaphors we use to understand cognition, from the reflex theory of the late 19th century to today's dominant view of the brain as a computer.

*Key topics explored:*

*The problem of oversimplification* – Why scientific models necessarily leave things out, and how this can sometimes lead entire fields astray. The cautionary tale of reflex theory shows how elegant explanations can blind us to biological complexity.

*Is the brain really a computer?* – Mazviita unpacks the philosophical assumptions behind computational neuroscience and asks: if we can model anything computationally, what makes brains special? The answer might challenge everything you thought you knew about AI.

*Haptic realism* – A fresh way of thinking about scientific knowledge that emphasizes interaction over passive observation. Knowledge isn't about reading the "source code of the universe"; it's something we actively construct through engagement with the world.

*Why embodiment matters for understanding* – Can a disembodied language model truly understand? Mazviita makes a compelling case that human cognition is deeply entangled with our sensory-motor engagement and biological existence in ways that can't simply be abstracted away.

*Technology and human finitude* – Drawing on Heidegger, we discuss how the dream of transcending our physical limitations through technology might reflect a fundamental misunderstanding of what it means to be a knower.

This conversation is essential viewing for anyone interested in AI, consciousness, philosophy of mind, or the future of cognitive science. Whether you're skeptical of strong AI claims or a true believer in machine consciousness, Mazviita's careful philosophical analysis will give you new tools for thinking through these profound questions.

---

TIMESTAMPS:

00:00:00 The Problem of Generalizing Neuroscience

00:02:51 Abstraction vs. Idealization: The "Kaleidoscope"

00:05:39 Platonism in AI: Discovering or Inventing Patterns?

00:09:42 When Simplification Fails: The Reflex Theory

00:12:23 Behaviorism and the "Black Box" Trap

00:14:20 Haptic Realism: Knowledge Through Interaction

00:20:23 Is Nature Protean? The Myth of Converging Truth

00:23:23 The Computational Theory of Mind: A Useful Fiction?

00:27:25 Biological Constraints: Why Brains Aren't Just Neural Nets

00:31:01 Agency, Distal Causes, and Dennett's Stances

00:37:13 Searle's Challenge: Causal Powers and Understanding

00:41:58 Heidegger's Warning & The Experiment on Children

---

REFERENCES:

Book:

[00:01:28] The Brain Abstracted

https://mitpress.mit.edu/9780262548045/the-brain-abstracted/

[00:11:05] The Integrative Action of the Nervous System

https://www.amazon.sg/integrative-action-nervous-system/dp/9354179029

[00:18:15] The Quest for Certainty (Dewey)

https://www.amazon.com/Quest-Certainty-Relation-Knowledge-Lectures/dp/0399501916

[00:19:45] Realism for Realistic People (Chang)

https://www.cambridge.org/core/books/realism-for-realistic-people/ACC93A7F03B15AA4D6F3A466E3FC5AB7

---

RESCRIPT:

https://app.rescript.info/public/share/A6cZ1TY35p8ORMmYCWNBI0no9ChU3-Kx7dPXGJURvZ0

PDF Transcript:

https://app.rescript.info/api/public/sessions/0fb7767e066cf712/pdf

Jan 23, 2026 | 53:38
Why Every Brain Metaphor in History Has Been Wrong [SPECIAL EDITION]

What if everything we think we know about the brain is just a really good metaphor that we forgot was a metaphor?

This episode takes you on a journey through the history of scientific simplification, from a young Karl Friston watching wood lice in his garden to the bold claims that your mind is literally software running on biological hardware.

We bring together some of the most brilliant minds we've interviewed (Professor Mazviita Chirimuuta, Francois Chollet, Joscha Bach, Professor Luciano Floridi, Professor Noam Chomsky, Nobel laureate John Jumper, and more) to wrestle with a deceptively simple question: *When scientists simplify reality to study it, what gets captured and what gets lost?*

*Key ideas explored:*

*The Spherical Cow Problem* – Science requires simplification. We're limited creatures trying to understand systems far more complex than our working memory can hold. But when does a useful model become a dangerous illusion?

*The Kaleidoscope Hypothesis* – Francois Chollet's beautiful idea that beneath all the apparent chaos of reality lie simple, repeating patterns, like bits of colored glass in a kaleidoscope creating infinite complexity. Is this profound truth or Platonic wishful thinking?

*Is Software Really Spirit?* – Joscha Bach makes the provocative claim that software is literally spirit, not metaphorically. We push back on this, asking whether the "sameness" we see across different computers running the same program exists in nature or only in our descriptions.

*The Cultural Illusion of AGI* – Why does artificial general intelligence seem so inevitable to people in Silicon Valley? Professor Chirimuuta suggests we might be caught in a "cultural historical illusion": our mechanistic assumptions about minds making AI seem like destiny when it might just be a bet.

*Prediction vs. Understanding* – Nobel Prize winner John Jumper: AI can predict and control, but understanding requires a human in the loop.

Throughout history, we've described the brain as hydraulic pumps, telegraph networks, telephone switchboards, and now computers. Each metaphor felt obviously true at the time. This episode asks: what will we think was naive about our current assumptions in fifty years?

Featuring insights from *The Brain Abstracted* by Mazviita Chirimuuta, possibly the most influential book on how we think about thinking in 2025.

---

TIMESTAMPS:

00:00:00 The Wood Louse & The Spherical Cow

00:02:04 The Necessity of Abstraction

00:04:42 Simplicius vs. Ignorantio: The Boxing Match

00:06:39 The Kaleidoscope Hypothesis

00:08:40 Is the Mind Software?

00:13:15 Critique of Causal Patterns

00:14:40 Temperature is Not a Thing

00:18:24 The Ship of Theseus & Ontology

00:23:45 Metaphors Hardening into Reality

00:25:41 The Illusion of AGI Inevitability

00:27:45 Prediction vs. Understanding

00:32:00 Climbing the Mountain vs. The Helicopter

00:34:53 Haptic Realism & The Limits of Knowledge

---

REFERENCES:

Person:

[00:00:00] Karl Friston (UCL)

https://profiles.ucl.ac.uk/1236-karl-friston

[00:06:30] Francois Chollet

https://fchollet.com/

[00:14:41] Cesar Hidalgo, MLST interview.

https://www.youtube.com/watch?v=vzpFOJRteeI

[00:30:30] Terence Tao's Blog

https://terrytao.wordpress.com/

Book:

[00:02:25] The Brain Abstracted

https://mitpress.mit.edu/9780262548045/the-brain-abstracted/

[00:06:00] On Learned Ignorance

https://www.amazon.com/Nicholas-Cusa-learned-ignorance-translation/dp/0938060236

[00:24:15] Science and the Modern World

https://amazon.com/dp/0684836394


RESCRIPT:

https://app.rescript.info/public/share/CYy0ex2M2kvcVRdMnSUky5O7H7hB7v2u_nVhoUiuKD4

PDF Transcript:

https://app.rescript.info/api/public/sessions/6c44c41e1e0fa6dd/pdf

Thank you to Dr. Maxwell Ramstead (Ph.D student of Friston) for early script work on this show; the woodlice story came from him!

Jan 23, 2026 | 42:04
Bayesian Brain, Scientific Method, and Models [Dr. Jeff Beck]

Dr. Jeff Beck, mathematician turned computational neuroscientist, joins us for a fascinating deep dive into why the future of AI might look less like ChatGPT and more like your own brain.


**SPONSOR MESSAGES START**

Prolific - Quality data. From real people. For faster breakthroughs.

https://www.prolific.com/?utm_source=mlst

**END**


*What if the key to building truly intelligent machines isn't bigger models, but smarter ones?*


In this conversation, Jeff makes a compelling case that we've been building AI backwards. While the tech industry races to scale up transformers and language models, Jeff argues we're missing something fundamental: the brain doesn't work like a giant prediction engine. It works like a scientist, constantly testing hypotheses about a world made of *objects* that interact through *forces* — not pixels and tokens.


*The Bayesian Brain* — Jeff explains how your brain is essentially running the scientific method on autopilot. When you combine what you see with what you hear, you're doing optimal Bayesian inference without even knowing it. This isn't just philosophy — it's backed by decades of behavioral experiments showing humans are surprisingly efficient at handling uncertainty.


*AutoGrad Changed Everything* — Forget transformers for a moment. Jeff argues the real hero of the AI boom was automatic differentiation, which turned AI from a math problem into an engineering problem. But in the process, we lost sight of what actually makes intelligence work.


*The Cat in the Warehouse Problem* — Here's where it gets practical. Imagine a warehouse robot that's never seen a cat. Current AI would either crash or make something up. Jeff's approach? Build models that *know what they don't know*, can phone a friend to download new object models on the fly, and keep learning continuously. It's like giving robots the ability to say "wait, what IS that?" instead of confidently being wrong.


*Why Language is a Terrible Model for Thought* — In a provocative twist, Jeff argues that grounding AI in language (like we do with LLMs) is fundamentally misguided. Self-report is the least reliable data in psychology — people routinely explain their own behavior incorrectly. We should be grounding AI in physics, not words.


*The Future is Lots of Little Models* — Instead of one massive neural network, Jeff envisions AI systems built like video game engines: thousands of small, modular object models that can be combined, swapped, and updated independently. It's more efficient, more flexible, and much closer to how we actually think.
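

The "combine what you see with what you hear" point above has a standard textbook form: two independent Gaussian cues about the same quantity are fused by weighting each by its precision (inverse variance). The sketch below assumes Gaussian noise and a flat prior; the function name and the example numbers are hypothetical and not taken from the episode.

```python
def combine_cues(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian cues about one quantity (flat prior assumed)."""
    precision_a = 1.0 / var_a
    precision_b = 1.0 / var_b
    posterior_var = 1.0 / (precision_a + precision_b)
    # Each cue's mean is weighted by its precision, so the more reliable cue dominates.
    posterior_mean = posterior_var * (precision_a * mean_a + precision_b * mean_b)
    return posterior_mean, posterior_var


if __name__ == "__main__":
    # Hypothetical example: a sharp visual cue and a noisier auditory cue.
    mean, var = combine_cues(mean_a=10.0, var_a=1.0, mean_b=14.0, var_b=4.0)
    print(mean, var)
```

The fused estimate sits closer to the more reliable cue, and its variance is smaller than either cue's alone. That is the sense in which combining the senses can be called optimal.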


Rescript: https://app.rescript.info/public/share/D-b494t8DIV-KRGYONJghvg-aelMmxSDjKthjGdYqsE


---

TIMESTAMPS:

00:00:00 Introduction & The Bayesian Brain

00:01:25 Bayesian Inference & Information Processing

00:05:17 The Brain Metaphor: From Levers to Computers

00:10:13 Micro vs. Macro Causation & Instrumentalism

00:16:59 The Active Inference Community & AutoGrad

00:22:54 Object-Centered Models & The Grounding Problem

00:35:50 Scaling Bayesian Inference & Architecture Design

00:48:05 The Cat in the Warehouse: Solving Generalization

00:58:17 Alignment via Belief Exchange

01:05:24 Deception, Emergence & Cellular Automata


---

REFERENCES:

Paper:

[00:00:24] Zoubin Ghahramani (Google DeepMind)

https://pmc.ncbi.nlm.nih.gov/articles/PMC3538441/pdf/rsta201

[00:19:20] Mamba: Linear-Time Sequence Modeling

https://arxiv.org/abs/2312.00752

[00:27:36] xLSTM: Extended Long Short-Term Memory

https://arxiv.org/abs/2405.04517

[00:41:12] 3D Gaussian Splatting

https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

[01:07:09] Lenia: Biology of Artificial Life

https://arxiv.org/abs/1812.05433

[01:08:20] Growing Neural Cellular Automata

https://distill.pub/2020/growing-ca/

[01:14:05] DreamCoder

https://arxiv.org/abs/2006.08381

[01:14:58] The Genomic Bottleneck

https://www.nature.com/articles/s41467-019-11786-6

Person:

[00:16:42] Karl Friston (UCL)

https://www.youtube.com/watch?v=PNYWi996Beg

Dec 31, 2025
01:16:37
Your Brain is Running a Simulation Right Now [Max Bennett]


Tim sits down with Max Bennett to explore how our brains evolved over 600 million years—and what that means for understanding both human intelligence and AI.


Max isn't a neuroscientist by training. He's a tech entrepreneur who got curious, started reading, and ended up weaving together three fields that rarely talk to each other: comparative psychology (what different animals can actually do), evolutionary neuroscience (how brains changed over time), and AI (what actually works in practice).


*Your Brain Is a Guessing Machine*

You don't actually "see" the world. Your brain builds a simulation of what it *thinks* is out there and just uses your eyes to check if it's right. That's why optical illusions work—your brain is filling in a triangle that isn't there, or can't decide if it's looking at a duck or a rabbit.


*Rats Have Regrets*

*Chimps Are Machiavellian*

*Language Is the Human Superpower*

*Does ChatGPT Think?*


(truncated description, more on rescript)


Understanding how the brain evolved isn't just about the past. It gives us clues about:

- What's actually different between human intelligence and AI

- Why we're so easily fooled by status games and tribal thinking

- What features we might want to build into—or leave out of—future AI systems


Get Max's book:

https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343


Rescript: https://app.rescript.info/public/share/R234b7AXyDXZusqQ_43KMGsUSvJ2TpSz2I3emnI6j9A


---

TIMESTAMPS:

00:00:00 Introduction: Outsider's Advantage & Neocortex Theories

00:11:34 Perception as Inference: The Filling-In Machine

00:19:11 Understanding, Recognition & Generative Models

00:36:39 How Mice Plan: Vicarious Trial & Error

00:46:15 Evolution of Self: The Layer 4 Mystery

00:58:31 Ancient Minds & The Social Brain: Machiavellian Apes

01:19:36 AI Alignment, Instrumental Convergence & Status Games

01:33:07 Metacognition & The IQ Paradox

01:48:40 Does GPT Have Theory of Mind?

02:00:40 Memes, Language Singularity & Brain Size Myths

02:16:44 Communication, Language & The Cyborg Future

02:44:25 Shared Fictions, World Models & The Reality Gap


---

REFERENCES:

Person:

[00:00:05] Karl Friston (UCL)

https://www.youtube.com/watch?v=PNYWi996Beg

[00:00:06] Jeff Hawkins

https://www.youtube.com/watch?v=6VQILbDqaI4

[00:12:19] Hermann von Helmholtz

https://plato.stanford.edu/entries/hermann-helmholtz/

[00:38:34] David Redish (U. Minnesota)

https://redishlab.umn.edu/

[01:10:19] Robin Dunbar

https://www.psy.ox.ac.uk/people/robin-dunbar

[01:15:04] Emil Menzel

https://www.sciencedirect.com/bookseries/behavior-of-nonhuman-primates/vol/5/suppl/C

[01:19:49] Nick Bostrom

https://nickbostrom.com/

[02:28:25] Noam Chomsky

https://linguistics.mit.edu/user/chomsky/

[03:01:22] Judea Pearl

https://samueli.ucla.edu/people/judea-pearl/

Concept/Framework:

[00:05:04] Active Inference

https://www.youtube.com/watch?v=KkR24ieh5Ow

Paper:

[00:35:59] Predictions not commands [Rick A Adams]

https://pubmed.ncbi.nlm.nih.gov/23129312/

Book:

[01:25:42] The Elephant in the Brain

https://www.amazon.com/Elephant-Brain-Hidden-Motives-Everyday/dp/0190495995

[01:28:27] The Status Game

https://www.goodreads.com/book/show/58642436-the-status-game

[02:00:40] The Selfish Gene

https://amazon.com/dp/0198788606

[02:14:25] The Language Game

https://www.amazon.com/Language-Game-Improvisation-Created-Changed/dp/1541674987

[02:54:40] The Evolution of Language

https://www.amazon.com/Evolution-Language-Approaches/dp/052167736X

[03:09:37] The Three-Body Problem

https://amazon.com/dp/0765377063

Dec 30, 2025
03:17:10
The 3 Laws of Knowledge [César Hidalgo]


César Hidalgo has spent years trying to answer a deceptively simple question: What is knowledge, and why is it so hard to move around?


We all have this intuition that knowledge is just... information. Write it down in a book, upload it to GitHub, train an AI on it—done. But César argues that's completely wrong. Knowledge isn't a thing you can copy and paste. It's more like a living organism that needs the right environment, the right people, and constant exercise to survive.


Guest: César Hidalgo, Director of the Center for Collective Learning


1. Knowledge Follows Laws (Like Physics)

2. You Can't Download Expertise

3. Why Big Companies Fail to Adapt

4. The "Infinite Alphabet" of Economies


If you think AI can just "copy" human knowledge, or that development is just about throwing money at poor countries, or that writing things down preserves them forever—this conversation will change your mind. Knowledge is fragile, specific, and collective. It decays fast if you don't use it.


The Infinite Alphabet [César A. Hidalgo]

https://www.penguin.co.uk/books/458054/the-infinite-alphabet-by-hidalgo-cesar-a/9780241655672

https://x.com/cesifoti


Rescript link.

https://app.rescript.info/public/share/eaBHbEo9xamwbwpxzcVVm4NQjMh7lsOQKeWwNxmw0JQ


---

TIMESTAMPS:

00:00:00 The Three Laws of Knowledge

00:02:28 Rival vs. Non-Rival: The Economics of Ideas

00:05:43 Why You Can't Just 'Download' Knowledge

00:08:11 The Detective Novel Analogy

00:11:54 Collective Learning & Organizational Networks

00:16:27 Architectural Innovation: Amazon vs. Barnes & Noble

00:19:15 The First Law: Learning Curves

00:23:05 The Samuel Slater Story: Treason & Memory

00:28:31 Physics of Knowledge: Joule's Cannon

00:32:33 Extensive vs. Intensive Properties

00:35:45 Knowledge Decay: Ise Temple & Polaroid

00:41:20 Absorptive Capacity: Sony & Donetsk

00:47:08 Disruptive Innovation & S-Curves

00:51:23 Team Size & The Cost of Innovation

00:57:13 Geography of Knowledge: Vespa's Origin

01:04:34 Migration, Diversity & 'Planet China'

01:12:02 Institutions vs. Knowledge: The China Story

01:21:27 Economic Complexity & The Infinite Alphabet

01:32:27 Do LLMs Have Knowledge?


---

REFERENCES:

Book:

[00:47:45] The Innovator's Dilemma (Christensen)

https://www.amazon.com/Innovators-Dilemma-Revolutionary-Change-Business/dp/0062060244

[00:55:15] Why Greatness Cannot Be Planned

https://amazon.com/dp/3319155237

[01:35:00] Why Information Grows

https://amazon.com/dp/0465048994

Paper:

[00:03:15] Endogenous Technological Change (Romer, 1990)

https://web.stanford.edu/~klenow/Romer_1990.pdf

[00:03:30] A Model of Growth Through Creative Destruction (Aghion & Howitt, 1992)

https://dash.harvard.edu/server/api/core/bitstreams/7312037d-2b2d-6bd4-e053-0100007fdf3b/content

[00:14:55] Organizational Learning: From Experience to Knowledge (Argote & Miron-Spektor, 2011)

https://www.researchgate.net/publication/228754233_Organizational_Learning_From_Experience_to_Knowledge

[00:17:05] Architectural Innovation (Henderson & Clark, 1990)

https://www.researchgate.net/publication/200465578_Architectural_Innovation_The_Reconfiguration_of_Existing_Product_Technologies_and_the_Failure_of_Established_Firms

[00:19:45] The Learning Curve Equation (Thurstone, 1916)

https://dn790007.ca.archive.org/0/items/learningcurveequ00thurrich/learningcurveequ00thurrich.pdf

[00:21:30] Factors Affecting the Cost of Airplanes (Wright, 1936)

https://pdodds.w3.uvm.edu/research/papers/others/1936/wright1936a.pdf

[00:52:45] Are Ideas Getting Harder to Find? (Bloom et al.)

https://web.stanford.edu/~chadj/IdeaPF.pdf

[01:33:00] LLMs/ Emergence

https://arxiv.org/abs/2506.11135

Person:

[00:25:30] Samuel Slater

https://en.wikipedia.org/wiki/Samuel_Slater

[00:42:05] Masaru Ibuka (Sony)

https://www.sony.com/en/SonyInfo/CorporateInfo/History/SonyHistory/1-02.html


Dec 27, 2025
01:37:06
"I Desperately Want To Live In The Matrix" - Dr. Mike Israetel

This is a lively, no-holds-barred debate about whether AI can truly be intelligent, conscious, or understand anything at all — and what happens when (or if) machines become smarter than us.


Dr. Mike Israetel is a sports scientist, entrepreneur, and co-founder of RP Strength (a fitness company). He describes himself as a "dilettante" in AI but brings a fascinating outsider's perspective.


Jared Feather (IFBB Pro bodybuilder and exercise physiologist)


The Big Questions:


1. When is superintelligence coming?

2. Does AI actually understand anything?

3. The Simulation Debate (The Spiciest Part)

4. Will AI kill us all? (The Doomer Debate)

5. What happens to human jobs and purpose?

6. Do we need suffering?


Mike's channel: https://www.youtube.com/channel/UCfQgsKhHjSyRLOp9mnffqVg


RESCRIPT INTERACTIVE PLAYER: https://app.rescript.info/public/share/GVMUXHCqctPkXH8WcYtufFG7FQcdJew_RL_MLgMKU1U


---

TIMESTAMPS:

00:00:00 Introduction & Workout Demo

00:04:15 ASI Timelines & Definitions

00:10:24 The Embodiment Debate

00:18:28 Neutrinos & Abstract Knowledge

00:25:56 Can AI Learn From YouTube?

00:31:25 Diversity of Intelligence

00:36:00 AI Slop & Understanding

00:45:18 The Simulation Argument: Fire & Water

00:58:36 Consciousness & Zombies

01:04:30 Do Reasoning Models Actually Reason?

01:12:00 The Live Learning Problem

01:19:15 Superintelligence & Benevolence

01:28:59 What is True Agency?

01:37:20 Game Theory & The "Kill All Humans" Fallacy

01:48:05 Regulation & The China Factor

01:55:52 Mind Uploading & The Future of Love

02:04:41 Economics of ASI: Will We Be Useless?

02:13:35 The Matrix & The Value of Suffering

02:17:30 Transhumanism & Inequality

02:21:28 Debrief: AI Medical Advice & Final Thoughts


---

REFERENCES:

Paper:

[00:10:45] Alchemy and Artificial Intelligence (Dreyfus)

https://www.rand.org/content/dam/rand/pubs/papers/2006/P3244.pdf

[00:10:55] The Chinese Room Argument (John Searle)

https://home.csulb.edu/~cwallis/382/readings/482/searle.minds.brains.programs.bbs.1980.pdf

[00:11:05] The Symbol Grounding Problem (Stephen Harnad)

https://arxiv.org/html/cs/9906002

[00:23:00] Attention Is All You Need

https://arxiv.org/abs/1706.03762

[00:45:00] GPT-4 Technical Report

https://arxiv.org/abs/2303.08774

[01:45:00] Anthropic Agentic Misalignment Paper

https://www.anthropic.com/research/agentic-misalignment

[02:17:45] Retatrutide

https://pubmed.ncbi.nlm.nih.gov/37366315/

Organization:

[00:15:50] CERN

https://home.cern/

[01:05:00] METR Long Horizon Evaluations

https://evaluations.metr.org/

MLST Episode:

[00:23:10] MLST: Llion Jones - Inventors' Remorse

https://www.youtube.com/watch?v=DtePicx_kFY

[00:50:30] MLST: Blaise Agüera y Arcas Interview

https://www.youtube.com/watch?v=rMSEqJ_4EBk

[01:10:00] MLST: David Krakauer

https://www.youtube.com/watch?v=dY46YsGWMIc

Event:

[00:23:40] ARC Prize/Challenge

https://arcprize.org/

Book:

[00:24:45] The Brain Abstracted

https://www.amazon.com/Brain-Abstracted-Simplification-Philosophy-Neuroscience/dp/0262548046

[00:47:55] Machines Who Think (Pamela McCorduck)

https://www.amazon.com/Machines-Who-Think-Artificial-Intelligence/dp/1568812051

[01:23:15] The Singularity Is Nearer (Ray Kurzweil)

https://www.amazon.com/Singularity-Nearer-Ray-Kurzweil-ebook/dp/B08Y6FYJVY

[01:27:35] A Fire Upon The Deep (Vernor Vinge)

https://www.amazon.com/Fire-Upon-Deep-S-F-MASTERWORKS-ebook/dp/B00AVUMIZE/

[02:04:50] Deep Utopia (Nick Bostrom)

https://www.amazon.com/Deep-Utopia-Meaning-Solved-World/dp/1646871642

[02:05:00] Technofeudalism (Yanis Varoufakis)

https://www.amazon.com/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1685891241

Visual Context Needed:

[00:29:40] AT-AT Walker (Star Wars)

https://starwars.fandom.com/wiki/All_Terrain_Armored_Transport

Person:

[00:33:15] Andrej Karpathy

https://karpathy.ai/

Video:

[01:40:00] Mike Israetel vs Liron Shapira AI Doom Debate

https://www.youtube.com/watch?v=RaDWSPMdM4o

Company:

[02:26:30] Examine.com

https://examine.com/

Dec 24, 2025
02:55:47
Making deep learning perform real algorithms with Category Theory (Andrew Dudzik, Petar Velichkovich, Taco Cohen, Bruno Gavranović, Paul Lessard)


We often think of Large Language Models (LLMs) as all-knowing, but as the team reveals, they still struggle with the logic of a second-grader. Why can’t ChatGPT reliably add large numbers? Why does it "hallucinate" the laws of physics? The answer lies in the architecture. This episode explores how *Category Theory* —an ultra-abstract branch of mathematics—could provide the "Periodic Table" for neural networks, turning the "alchemy" of modern AI into a rigorous science.


In this deep-dive exploration, *Andrew Dudzik*, *Petar Velichkovich*, *Taco Cohen*, *Bruno Gavranović*, and *Paul Lessard* join host *Tim Scarfe* to discuss the fundamental limitations of today’s AI and the radical mathematical framework that might fix them.


TRANSCRIPT:

https://app.rescript.info/public/share/LMreunA-BUpgP-2AkuEvxA7BAFuA-VJNAp2Ut4MkMWk


---


Key Insights in This Episode:


* *The "Addition" Problem:* *Andrew Dudzik* explains why LLMs don't actually "know" math—they just recognize patterns. When you change a single digit in a long string of numbers, the pattern breaks because the model lacks the internal "machinery" to perform a simple carry operation.

* *Beyond Alchemy:* Deep learning is currently in its "alchemy" phase: we have powerful results, but we lack a unifying theory. Category Theory is proposed as the framework to move AI from trial-and-error to principled engineering. [00:13:49]

* *Algebra with Colors:* To make Category Theory accessible, the guests use brilliant analogies—like thinking of matrices as *magnets with colors* that only snap together when the types match. This "partial compositionality" is the secret to building more complex internal reasoning. [00:09:17]

* *Synthetic vs. Analytic Math:* *Paul Lessard* breaks down the philosophical shift needed in AI research: moving from "Analytic" math (what things are made of) to "Synthetic" math. [00:23:41]
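The carry operation mentioned in the "Addition" bullet is easy to state as an explicit procedure. The sketch below is ordinary grade-school addition on digit strings (the function name `add_digit_strings` is illustrative, not something from the episode). It shows why one changed digit can alter every position to its left: the carry has to be propagated step by step, which pattern matching over surface digits does not do.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Add two non-negative decimal strings right to left, carrying explicitly."""
    i, j = len(a) - 1, len(b) - 1
    carry = 0
    digits = []
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        total = da + db + carry
        digits.append(str(total % 10))  # digit that stays in this column
        carry = total // 10             # digit passed to the next column
        i -= 1
        j -= 1
    return "".join(reversed(digits)) or "0"


# One changed digit decides whether the carry ripples through every column.
print(add_digit_strings("99999998", "1"))  # 99999999
print(add_digit_strings("99999999", "1"))  # 100000000
```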


---


Why This Matters for AGI

If we want AI to solve the world's hardest scientific problems, it can't just be a "stochastic parrot." It needs to internalize the rules of logic and computation. By imbuing neural networks with categorical priors, researchers are attempting to build a future where AI doesn't just predict the next word—it understands the underlying structure of the universe.


---

TIMESTAMPS:

00:00:00 The Failure of LLM Addition & Physics

00:01:26 Tool Use vs Intrinsic Model Quality

00:03:07 Efficiency Gains via Internalization

00:04:28 Geometric Deep Learning & Equivariance

00:07:05 Limitations of Group Theory

00:09:17 Category Theory: Algebra with Colors

00:11:25 The Systematic Guide of Lego-like Math

00:13:49 The Alchemy Analogy & Unifying Theory

00:15:33 Information Destruction & Reasoning

00:18:00 Pathfinding & Monoids in Computation

00:20:15 System 2 Reasoning & Error Awareness

00:23:31 Analytic vs Synthetic Mathematics

00:25:52 Morphisms & Weight Tying Basics

00:26:48 2-Categories & Weight Sharing Theory

00:28:55 Higher Categories & Emergence

00:31:41 Compositionality & Recursive Folds

00:34:05 Syntax vs Semantics in Network Design

00:36:14 Homomorphisms & Multi-Sorted Syntax

00:39:30 The Carrying Problem & Hopf Fibrations


Petar Veličković (GDM)

https://petar-v.com/

Paul Lessard

https://www.linkedin.com/in/paul-roy-lessard/

Bruno Gavranović

https://www.brunogavranovic.com/

Andrew Dudzik (GDM)

https://www.linkedin.com/in/andrew-dudzik-222789142/


---

REFERENCES:


Model:

[00:01:05] Veo

https://deepmind.google/models/veo/

[00:01:10] Genie

https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

Paper:

[00:04:30] Geometric Deep Learning Blueprint

https://arxiv.org/abs/2104.13478

https://www.youtube.com/watch?v=bIZB1hIJ4u8

[00:16:45] AlphaGeometry

https://arxiv.org/abs/2401.08312

[00:16:55] AlphaCode

https://arxiv.org/abs/2203.07814

[00:17:05] FunSearch

https://www.nature.com/articles/s41586-023-06924-6

[00:37:00] Attention Is All You Need

https://arxiv.org/abs/1706.03762

[00:43:00] Categorical Deep Learning

https://arxiv.org/abs/2402.15332

Dec 22, 2025
43:58
Are AI Benchmarks Telling The Full Story? [SPONSORED] (Andrew Gordon and Nora Petrova - Prolific)


Is a car that wins a Formula 1 race the best choice for your morning commute? Probably not. In this sponsored deep dive with Prolific, we explore why the same logic applies to Artificial Intelligence. While models are currently shattering records on technical exams, they often fail the most important test of all: **the human experience.**


Why High Benchmark Scores Don’t Mean Better AI


Joining us are **Andrew Gordon** (Staff Researcher in Behavioral Science) and **Nora Petrova** (AI Researcher) from **Prolific**. They reveal the hidden flaws in how we currently rank AI and introduce a more rigorous, "humane" way to measure whether these models are actually helpful, safe, and relatable for real people.


---


Key Insights in This Episode:


* *The F1 Car Analogy:* Andrew explains why a model that excels at "Humanity's Last Exam" might be a nightmare for daily use. Technical benchmarks often ignore the nuances of human communication and adaptability.

* *The "Wild West" of AI Safety:* As users turn to AI for sensitive topics like mental health, Nora highlights the alarming lack of oversight and the "thin veneer" of safety training—citing recent controversial incidents like Grok-3’s "Mecha Hitler."

* *Fixing the "Leaderboard Illusion":* The team critiques current popular rankings like Chatbot Arena, discussing how anonymous, unstratified voting can lead to biased results and how companies can "game" the system.

* *The Xbox Secret to AI Ranking:* Discover how Prolific uses *TrueSkill*—the same algorithm Microsoft developed for Xbox Live matchmaking—to create a fairer, more statistically sound leaderboard for LLMs.

* *The Personality Gap:* Early data from the **HUMAINE Leaderboard** suggests that while AI is getting smarter, it is actually performing *worse* on metrics like personality, culture, and "sycophancy" (the tendency for models to become annoying "people-pleasers").
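The rating idea behind the TrueSkill bullet can be sketched for the simplest case: one head-to-head comparison with a clear winner. This is a minimal sketch of the published two-player, no-draw update, assuming the commonly cited defaults (mean 25, standard deviation 25/3, performance noise 25/6). It omits draws, teams, and the dynamics term, and it is not Prolific's implementation; the function name `update_ratings` is hypothetical.

```python
import math

MU0 = 25.0          # assumed default prior mean
SIGMA0 = MU0 / 3    # assumed default prior standard deviation
BETA = MU0 / 6      # assumed default performance noise


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_ratings(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs after `winner` beats `loser`."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c2 = 2 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # how surprising the result was
    w = v * (v + t)         # how much uncertainty to remove
    new_winner = (
        mu_w + sigma_w ** 2 / c * v,
        sigma_w * math.sqrt(1 - sigma_w ** 2 / c2 * w),
    )
    new_loser = (
        mu_l - sigma_l ** 2 / c * v,
        sigma_l * math.sqrt(1 - sigma_l ** 2 / c2 * w),
    )
    return new_winner, new_loser


model_a = (MU0, SIGMA0)
model_b = (MU0, SIGMA0)
model_a, model_b = update_ratings(model_a, model_b)  # a human preferred A
print(model_a, model_b)
```

Because each rating carries its own uncertainty, a system like this can decide which pair of models to compare next based on where the uncertainty is largest, which is one reason it needs fewer votes than a plain win-rate table.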


---


About the HUMAINE Leaderboard

Moving beyond simple "A vs. B" testing, the researchers discuss their new framework that samples participants based on *census data* (Age, Ethnicity, Political Alignment). By using a representative sample of the general public rather than just tech enthusiasts, they are building a standard that reflects the values of the real world.
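The sampling idea can be sketched as quota-based stratified sampling. This is an illustration only: the pool, the single age-band attribute, and the target shares are made up rather than real census figures, and `stratified_sample` is a hypothetical helper, not Prolific's pipeline.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(pool, targets, key, n, seed=0):
    """Draw about n participants so each stratum's share matches `targets`."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in pool:
        by_stratum[key(person)].append(person)

    sample = []
    for stratum, share in targets.items():
        quota = round(share * n)  # rounding can make the total differ slightly from n
        candidates = by_stratum[stratum]
        if len(candidates) < quota:
            raise ValueError(f"not enough participants in stratum {stratum!r}")
        sample.extend(rng.sample(candidates, quota))
    return sample


# Hypothetical pool that over-represents younger participants.
pool = (
    [{"id": i, "age_band": "18-34"} for i in range(160)]
    + [{"id": 160 + i, "age_band": "35-54"} for i in range(30)]
    + [{"id": 190 + i, "age_band": "55+"} for i in range(10)]
)
# Made-up target shares standing in for census proportions.
targets = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

panel = stratified_sample(pool, targets, key=lambda p: p["age_band"], n=20)
print(Counter(p["age_band"] for p in panel))
```

To stratify on several attributes at once (for example age, ethnicity, and political alignment), `key` can return a tuple and `targets` can be keyed by the same tuples.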


*Are we building models for benchmarks, or are we building them for humans? It’s time to change the scoreboard.*


Rescript link:

https://app.rescript.info/public/share/IDqwjY9Q43S22qSgL5EkWGFymJwZ3SVxvrfpgHZLXQc


---

TIMESTAMPS:

00:00:00 Introduction & The Benchmarking Problem

00:01:58 The Fractured State of AI Evaluation

00:03:54 AI Safety & Interpretability

00:05:45 Bias in Chatbot Arena

00:06:45 Prolific's Three Pillars Approach

00:09:01 TrueSkill Ranking & Efficient Sampling

00:12:04 Census-Based Representative Sampling

00:13:00 Key Findings: Culture, Personality & Sycophancy


---

REFERENCES:

Paper:

[00:00:15] MMLU

https://arxiv.org/abs/2009.03300

[00:05:10] Constitutional AI

https://arxiv.org/abs/2212.08073

[00:06:45] The Leaderboard Illusion

https://arxiv.org/abs/2504.20879

[00:09:41] HUMAINE Framework Paper

https://huggingface.co/blog/ProlificAI/humaine-framework

Company:

[00:00:30] Prolific

https://www.prolific.com

[00:01:45] Chatbot Arena

https://lmarena.ai/

Person:

[00:00:35] Andrew Gordon

https://www.linkedin.com/in/andrew-gordon-03879919a/

[00:00:45] Nora Petrova

https://www.linkedin.com/in/nora-petrova/


Algorithm:

[00:09:01] Microsoft TrueSkill

https://www.microsoft.com/en-us/research/project/trueskill-ranking-system/

Leaderboard:

[00:09:21] Prolific HUMAINE Leaderboard

https://www.prolific.com/humaine

[00:09:31] HUMAINE HuggingFace Space

https://huggingface.co/spaces/ProlificAI/humaine-leaderboard

[00:10:21] Prolific AI Leaderboard Portal

https://www.prolific.com/leaderboard

Dataset:

[00:09:51] Prolific Social Reasoning RLHF Dataset

https://huggingface.co/datasets/ProlificAI/social-reasoning-rlhf

Organization:

[00:10:31] MLCommons

https://mlcommons.org/

Dec 20, 2025
16:05
The Mathematical Foundations of Intelligence [Professor Yi Ma]


What if everything we think we know about AI understanding is wrong? Is compression the key to intelligence? Or is there something more—a leap from memorization to true abstraction?


In this fascinating conversation, we sit down with **Professor Yi Ma**—world-renowned expert in deep learning, IEEE/ACM Fellow, and author of the groundbreaking new book *Learning Deep Representations of Data Distributions*. Professor Ma challenges our assumptions about what large language models actually do, reveals why 3D reconstruction isn't the same as understanding, and presents a unified mathematical theory of intelligence built on just two principles: **parsimony** and **self-consistency**.


**SPONSOR MESSAGES START**

Prolific - Quality data. From real people. For faster breakthroughs.

https://www.prolific.com/?utm_source=mlst

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

**END**


Key Insights:


**LLMs Don't Understand—They Memorize**

Language models process text (*already* compressed human knowledge) using the same mechanism we use to learn from raw data.


**The Illusion of 3D Vision**

Systems such as Sora and NeRFs that can reconstruct 3D scenes still fail miserably at basic spatial reasoning.


**"All Roads Lead to Rome"**

Why adding noise is *necessary* for discovering structure.


**Why Gradient Descent Actually Works**

Natural optimization landscapes are surprisingly smooth, a "blessing of dimensionality".


**Transformers from First Principles**

Transformer architectures can be mathematically derived from compression principles.
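As a hedged pointer for the compression principle above: in the published work by Ma and co-authors on maximal coding rate reduction, the compactness of a representation is measured with a coding-rate function. The notation below is recalled from that literature rather than from the episode, so treat it as a sketch: $Z \in \mathbb{R}^{d \times n}$ holds $n$ feature vectors of dimension $d$, $\epsilon$ is the allowed distortion, and $\Pi_j$ is a diagonal membership matrix for group $j$ of $k$.

```latex
R(Z, \epsilon) = \frac{1}{2} \log\det\!\left( I + \frac{d}{n\,\epsilon^{2}}\, Z Z^{\top} \right)

\Delta R(Z, \Pi, \epsilon) = R(Z, \epsilon)
  - \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2n}
    \log\det\!\left( I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top} \right)
```

Maximizing $\Delta R$ asks for features that are expansive as a whole (large first term) while each group stays compressible on its own (small second term), which is one way to read "parsimony" as an objective.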



INTERACTIVE AI TRANSCRIPT PLAYER w/REFS (ReScript):

https://app.rescript.info/public/share/Z-dMPiUhXaeMEcdeU6Bz84GOVsvdcfxU_8Ptu6CTKMQ


About Professor Yi Ma


Yi Ma is the inaugural director of the School of Computing and Data Science at the University of Hong Kong and a visiting professor at UC Berkeley.


https://people.eecs.berkeley.edu/~yima/

https://scholar.google.com/citations?user=XqLiBQMAAAAJ&hl=en

https://x.com/YiMaTweets


**Slides from this conversation:**

https://www.dropbox.com/scl/fi/sbhbyievw7idup8j06mlr/slides.pdf?rlkey=7ptovemezo8bj8tkhfi393fh9&dl=0


**Related Talks by Professor Ma:**

- Pursuing the Nature of Intelligence (ICLR): https://www.youtube.com/watch?v=LT-F0xSNSjo

- Earlier talk at Berkeley: https://www.youtube.com/watch?v=TihaCUjyRLM


TIMESTAMPS:

00:00:00 Introduction

00:02:08 The First Principles Book & Research Vision

00:05:21 Two Pillars: Parsimony & Consistency

00:09:50 Evolution vs. Learning: The Compression Mechanism

00:14:36 LLMs: Memorization Masquerading as Understanding

00:19:55 The Leap to Abstraction: Empirical vs. Scientific

00:27:30 Platonism, Deduction & The ARC Challenge

00:35:57 Specialization & The Cybernetic Legacy

00:41:23 Deriving Maximum Rate Reduction

00:48:21 The Illusion of 3D Understanding: Sora & NeRF

00:54:26 All Roads Lead to Rome: The Role of Noise

00:59:56 All Roads Lead to Rome: The Role of Noise

01:00:14 Benign Non-Convexity: Why Optimization Works

01:06:35 Double Descent & The Myth of Overfitting

01:14:26 Self-Consistency: Closed-Loop Learning

01:21:03 Deriving Transformers from First Principles

01:30:11 Verification & The Kevin Murphy Question

01:34:11 CRATE vs. ViT: White-Box AI & Conclusion


REFERENCES:

Book:

[00:03:04] Learning Deep Representations of Data Distributions

https://ma-lab-berkeley.github.io/deep-representation-learning-book/

[00:18:38] A Brief History of Intelligence

https://www.amazon.co.uk/BRIEF-HISTORY-INTELLIGEN-HB-Evolution/dp/0008560099

[00:38:14] Cybernetics

https://mitpress.mit.edu/9780262730099/cybernetics/

Book (Yi Ma):

[00:03:14] 3-D Vision book

https://link.springer.com/book/10.1007/978-0-387-21779-6

More references: see the ReScript link or YouTube.

Dec 13, 2025
01:39:15
Pedro Domingos: Tensor Logic Unifies AI Paradigms


Pedro Domingos, author of the bestselling book "The Master Algorithm," introduces his latest work: Tensor Logic - a new programming language he believes could become the fundamental language for artificial intelligence.


Think of it like this: Physics found its language in calculus. Circuit design found its language in Boolean logic. Pedro argues that AI has been missing its language - until now.


**SPONSOR MESSAGES START**

Build your ideas with AI Studio from Google - http://ai.studio/build

Prolific - Quality data. From real people. For faster breakthroughs.

https://www.prolific.com/?utm_source=mlst

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

**END**


Current AI is split between two worlds that don't play well together:


Deep Learning (neural networks, transformers, ChatGPT) - great at learning from data, terrible at logical reasoning

Symbolic AI (logic programming, expert systems) - great at logical reasoning, terrible at learning from messy real-world data


Tensor Logic unifies both. It's a single language where you can:

Write logical rules that the system can actually learn and modify

Do transparent, verifiable reasoning (no hallucinations)

Mix "fuzzy" analogical thinking with rock-solid deduction
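The link between logical rules and tensor operations can be illustrated with a small sketch. The Tensor Logic paper builds on the observation that a logical rule behaves like an Einstein summation; the code below is not Tensor Logic syntax, just plain Python lists standing in for Boolean tensors, and the relation names and people are made up. The rule `Aunt(x, z) <- Sister(x, y), Parent(y, z)` becomes a sum over the shared variable `y` followed by a step function.

```python
def step(x):
    """Map any positive count of proofs to True (1), otherwise False (0)."""
    return 1 if x > 0 else 0


def join_rule(left, right):
    """Evaluate Head(x, z) <- Left(x, y), Right(y, z).

    Computed as step(sum over y of left[x][y] * right[y][z]), i.e. a
    Boolean matrix product.
    """
    rows, inner, cols = len(left), len(right), len(right[0])
    return [
        [
            step(sum(left[x][y] * right[y][z] for y in range(inner)))
            for z in range(cols)
        ]
        for x in range(rows)
    ]


people = ["alice", "bob", "carol"]  # hypothetical individuals

# sister[x][y] == 1 means people[x] is a sister of people[y].
sister = [[0, 1, 0],
          [0, 0, 0],
          [0, 0, 0]]
# parent[y][z] == 1 means people[y] is a parent of people[z].
parent = [[0, 0, 0],
          [0, 0, 1],
          [0, 0, 0]]

aunt = join_rule(sister, parent)
print(aunt)  # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
```

Dropping the step function and letting the entries be real numbers turns the same expression into an ordinary matrix product, which hints at how one notation could cover both deduction and learned, "fuzzy" computation.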


INTERACTIVE TRANSCRIPT:

https://app.rescript.info/public/share/NP4vZQ-GTETeN_roB2vg64vbEcN7isjJtz4C86WSOhw


TOC:

00:00:00 - Introduction

00:04:41 - What is Tensor Logic?

00:09:59 - Tensor Logic vs PyTorch & Einsum

00:17:50 - The Master Algorithm Connection

00:20:41 - Predicate Invention & Learning New Concepts

00:31:22 - Symmetries in AI & Physics

00:35:30 - Computational Reducibility & The Universe

00:43:34 - Technical Details: RNN Implementation

00:45:35 - Turing Completeness Debate

00:56:45 - Transformers vs Turing Machines

01:02:32 - Reasoning in Embedding Space

01:11:46 - Solving Hallucination with Deductive Modes

01:16:17 - Adoption Strategy & Migration Path

01:21:50 - AI Education & Abstraction

01:24:50 - The Trillion-Dollar Waste


REFS

Tensor Logic: The Language of AI [Pedro Domingos]

https://arxiv.org/abs/2510.12269

The Master Algorithm [Pedro Domingos]

https://www.amazon.co.uk/Master-Algorithm-Ultimate-Learning-Machine/dp/0241004543

Einsum is All You Need [Tim Rocktäschel]

https://rockt.ai/2018/04/30/einsum

https://www.youtube.com/watch?v=6DrCq8Ry2cw

Autoregressive Large Language Models are Computationally Universal (Dale Schuurmans et al - GDM)

https://arxiv.org/abs/2410.03170

Memory Augmented Large Language Models are Computationally Universal [Dale Schuurmans]

https://arxiv.org/pdf/2301.04589

On the Computational Power of Neural Nets [Siegelmann & Sontag, 1995]

https://binds.cs.umass.edu/papers/1995_Siegelmann_JComSysSci.pdf

Sebastian Bubeck

https://www.reddit.com/r/OpenAI/comments/1oacp38/openai_researcher_sebastian_bubeck_falsely_claims/

I Am a Strange Loop [Douglas Hofstadter]

https://www.amazon.co.uk/Am-Strange-Loop-Douglas-Hofstadter/dp/0465030793

Stephen Wolfram

https://www.youtube.com/watch?v=dkpDjd2nHgo

The Complex World: An Introduction to the Foundations of Complexity Science [David C. Krakauer]

https://www.amazon.co.uk/Complex-World-Introduction-Foundations-Complexity/dp/1947864629

Geometric Deep Learning

https://www.youtube.com/watch?v=bIZB1hIJ4u8

Andrew Wilson (NYU)

https://www.youtube.com/watch?v=M-jTeBCEGHc

Yi Ma

https://www.patreon.com/posts/yi-ma-scientific-141953348

The Road to Reality [Roger Penrose]

https://www.amazon.co.uk/Road-Reality-Complete-Guide-Universe/dp/0099440687

Artificial Intelligence: A Modern Approach [Russell and Norvig]

https://www.amazon.co.uk/Artificial-Intelligence-Modern-Approach-Global/dp/1292153962

Dec 08, 2025
01:27:48
He Co-Invented the Transformer. Now: Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]


The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument, and also introduce new research which might lead the way forwards.


**SPONSOR MESSAGES START**

Build your ideas with AI Studio from Google - http://ai.studio/build

Tufa AI Labs is hiring ML Research Engineers https://tufalabs.ai/

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

**END**


The "Spiral" Problem – Llion uses a striking visual analogy to explain what current AI is missing. If you ask a standard neural network to understand a spiral shape, it solves it by drawing tiny straight lines that just happen to look like a spiral. It "fakes" the shape without understanding the concept of spiraling.
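The "tiny straight lines" picture can be made concrete with a small sketch. ReLU networks compute piecewise-linear functions, so a curved shape can only be approximated by many flat pieces. The code below is not a neural network; it assumes an Archimedean spiral (r = theta) purely for illustration and measures how far a polyline with a given number of segments strays from the true curve.

```python
import math


def spiral_point(theta):
    """Point on the Archimedean spiral r = theta."""
    return (theta * math.cos(theta), theta * math.sin(theta))


def max_chord_error(n_segments, theta_max=4 * math.pi):
    """Largest gap between the spiral and a polyline with n_segments pieces.

    The gap is checked at the middle angle of each segment.
    """
    worst = 0.0
    for k in range(n_segments):
        t0 = theta_max * k / n_segments
        t1 = theta_max * (k + 1) / n_segments
        x0, y0 = spiral_point(t0)
        x1, y1 = spiral_point(t1)
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2    # midpoint of the straight piece
        sx, sy = spiral_point((t0 + t1) / 2)     # true point on the curve
        worst = max(worst, math.hypot(sx - mx, sy - my))
    return worst


for n in (8, 64, 512):
    print(n, max_chord_error(n))
```

The error shrinks as segments are added, yet nothing in the polyline encodes the rule "radius grows with angle"; extending the shape by another turn requires more segments rather than reusing a concept.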


Introducing the Continuous Thought Machine (CTM) – Luke Darlow takes a deep dive into their solution: a biology-inspired model that fundamentally changes how AI processes information.


The Maze Analogy: Luke explains that standard AI tries to solve a maze by staring at the whole image and guessing the entire path instantly. Their new machine "walks" through the maze step-by-step.

Thinking Time: This allows the AI to "ponder." If a problem is hard, the model can naturally spend more time thinking about it before answering, effectively allowing it to correct its own mistakes and backtrack—something current Language Models struggle to do genuinely.
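The "thinking time" idea can be sketched as a halting rule over internal steps. This is a toy illustration under stated assumptions, not the CTM itself: the per-step class probabilities are hard-coded, hypothetical values, and the stopping rule (stop once one minus normalized entropy passes a threshold) is an assumed stand-in for how a model might decide it has thought long enough.

```python
import math


def certainty(probs):
    """1 minus normalized entropy: 1.0 for one-hot, 0.0 for uniform.

    Expects at least two classes.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def think_until_certain(tick_predictions, threshold=0.8):
    """Return (tick, prediction) for the first tick that is certain enough.

    Falls back to the final tick if the threshold is never reached.
    """
    for tick, probs in enumerate(tick_predictions):
        if certainty(probs) >= threshold:
            return tick, probs
    return len(tick_predictions) - 1, tick_predictions[-1]


# Hypothetical per-tick predictions over three classes.
easy_problem = [[0.97, 0.02, 0.01]]
hard_problem = [
    [0.40, 0.30, 0.30],  # unsure
    [0.60, 0.30, 0.10],  # leaning
    [0.97, 0.02, 0.01],  # confident
]

print(think_until_certain(easy_problem)[0])  # 0
print(think_until_certain(hard_problem)[0])  # 2
```

The easy input halts immediately while the hard one uses three internal steps, which is the behaviour the description calls "pondering".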


https://sakana.ai/

https://x.com/YesThisIsLion

https://x.com/LearningLukeD


TRANSCRIPT:

https://app.rescript.info/public/share/crjzQ-Jo2FQsJc97xsBdfzfOIeMONpg0TFBuCgV2Fu8


TOC:

00:00:00 - Stepping Back from Transformers

00:00:43 - Introduction to Continuous Thought Machines (CTM)

00:01:09 - The Changing Atmosphere of AI Research

00:04:13 - Sakana’s Philosophy: Research Freedom

00:07:45 - The Local Minimum of Large Language Models

00:18:30 - Representation Problems: The Spiral Example

00:29:12 - Technical Deep Dive: CTM Architecture

00:36:00 - Adaptive Computation & Maze Solving

00:47:15 - Model Calibration & Uncertainty

01:00:43 - Sudoku Bench: Measuring True Reasoning



REFS:

Why Greatness Cannot Be Planned [Kenneth Stanley]

https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237

https://www.youtube.com/watch?v=lhYGXYeMq_E


The Hardware Lottery [Sara Hooker]

https://arxiv.org/abs/2009.06489

https://www.youtube.com/watch?v=sQFxbQ7ade0


Continuous Thought Machines [Luke Darlow et al / Sakana]

https://arxiv.org/abs/2505.05522

https://sakana.ai/ctm/


LSTM: The Comeback Story? [Prof. Sepp Hochreiter]

https://www.youtube.com/watch?v=8u2pW2zZLCs


Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]

https://arxiv.org/pdf/2505.11581


A Spline Theory of Deep Networks [Randall Balestriero]

https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

https://www.youtube.com/watch?v=86ib0sfdFtw

https://www.youtube.com/watch?v=l3O2J3LMxqI


On the Biology of a Large Language Model [Anthropic, Jack Lindsey et al]

https://transformer-circuits.pub/2025/attribution-graphs/biology.html


The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] “The ARChitects”

https://www.youtube.com/watch?v=mTX_sAq--zY


Neural Turing Machine [Graves]

https://arxiv.org/pdf/1410.5401


Adaptive Computation Time for Recurrent Neural Networks [Graves]

https://arxiv.org/abs/1603.08983


Sudoku Bench [Sakana]

https://pub.sakana.ai/sudoku/

Nov 23, 2025
01:12:40
Why Humans Are Still Powering AI [Sponsored]

Ever wonder where AI models actually get their "intelligence"? We reveal the dirty secret of Silicon Valley: behind every impressive AI system are thousands of real humans providing crucial data, feedback, and expertise.

Guest: Phelim Bradley, CEO and Co-founder of Prolific

Phelim Bradley runs Prolific, a platform that connects AI companies with verified human experts who help train and evaluate their models. Think of it as a sophisticated marketplace matching the right human expertise to the right AI task - whether that's doctors evaluating medical chatbots or coders reviewing AI-generated software.

Prolific: https://prolific.com/?utm_source=mlst

https://uk.linkedin.com/in/phelim-bradley-84300826

The discussion dives into:

**The human data pipeline**: How AI companies rely on human intelligence to train, refine, and validate their models - something rarely discussed openly

**Quality over quantity**: Why paying humans well and treating them as partners (not commodities) produces better AI training data

**The matching challenge**: How Prolific solves the complex problem of finding the right expert for each specific task, similar to matching Uber drivers to riders but with deep expertise requirements

**Future of work**: What it means when human expertise becomes an on-demand service, and why this might actually create more opportunities rather than fewer

**Geopolitical implications**: Why the centralization of AI development in US tech companies should concern Europe and the UK

Nov 03, 2025
24:20
The Universal Hierarchy of Life - Prof. Chris Kempes [SFI]

"What is life?" - asks Chris Kempes, a professor at the Santa Fe Institute.


Chris explains that scientists are moving beyond a purely Earth-based, biological view and are searching for a universal theory of life that could apply to anything, anywhere in the universe. He proposes that things we don't normally consider "alive" (like human culture, language, or even artificial intelligence) could be seen as life forms existing on different "substrates".


To understand this, Chris presents a fascinating three-level framework:


- Materials: The physical stuff life is made of. He argues this could be incredibly diverse across the universe, and we shouldn't expect alien life to share our biochemistry.


- Constraints: The universal laws of physics (like gravity or diffusion) that all life must obey, regardless of what it's made of. This is where different life forms start to look more similar.


- Principles: At the highest level are abstract principles like evolution and learning. Chris suggests these computational or "optimization" rules are what truly define a living system.


A key idea is "convergence" – using the example of the eye. It's such a complex organ that you'd think it evolved only once. However, eyes evolved many separate times across different species. This is because the physics of light provides a clear "target", and evolution found similar solutions to the problem of seeing, even with different starting materials.



**SPONSOR MESSAGES**

Prolific - Quality data. From real people. For faster breakthroughs.

https://www.prolific.com/?utm_source=mlst

Check out NotebookLM from Google here - https://notebooklm.google.com/ - it’s really good for doing research directly from authoritative source material, minimising hallucinations.

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst


Prof. Chris Kempes:

https://www.santafe.edu/people/profile/chris-kempes


TRANSCRIPT:

https://app.rescript.info/public/share/Y2cI1i0nX_-iuZitvlguHvaVLQTwPX1Y_E1EHxV0i9I


TOC:

00:00:00 - Introduction to Chris Kempes and the Santa Fe Institute

00:02:28 - The Three Cultures of Science

00:05:08 - What Makes a Good Scientific Theory?

00:06:50 - The Universal Theory of Life

00:09:40 - The Role of Material in Life

00:12:50 - A Hierarchy for Understanding Life

00:13:55 - How Life Diversifies and Converges

00:17:53 - Adaptive Processes and Defining Life

00:19:28 - Functionalism, Memes, and Phylogenies

00:22:58 - Convergence at Multiple Levels

00:25:45 - The Possibility of Simulating Life

00:28:16 - Intelligence, Parasitism, and Spectrums of Life

00:32:39 - Phase Changes in Evolution

00:36:16 - The Separation of Matter and Logic

00:37:21 - Assembly Theory and Quantifying Complexity


REFS:

Developing a predictive science of the biosphere requires the integration of scientific cultures [Kempes et al]

https://www.pnas.org/doi/10.1073/pnas.2209196121


Seeing with an extra sense (“Dangerous prediction”) [Rob Phillips]

https://www.sciencedirect.com/science/article/pii/S0960982224009035


The Multiple Paths to Multiple Life [Christopher P. Kempes & David C. Krakauer]

https://link.springer.com/article/10.1007/s00239-021-10016-2


The Information Theory of Individuality [David Krakauer et al]

https://arxiv.org/abs/1412.2447


Minds, Brains and Programs [Searle]

https://home.csulb.edu/~cwallis/382/readings/482/searle.minds.brains.programs.bbs.1980.pdf


The error threshold

https://www.sciencedirect.com/science/article/abs/pii/S0168170204003843


Assembly theory and its relationship with computational complexity [Kempes et al]

https://arxiv.org/abs/2406.12176

Oct 25, 2025
41:00
Google Researcher Shows Life "Emerges From Code" - Blaise Agüera y Arcas

Blaise Agüera y Arcas explores some mind-bending ideas about what intelligence and life really are—and why they might be more similar than we think (filmed at ALIFE conference, 2025 - https://2025.alife.org/).


Life and intelligence are both fundamentally computational (he says). From the very beginning, living things have been running programs. Your DNA? It's literally a computer program, and the ribosomes in your cells are tiny universal computers building you according to those instructions.


**SPONSOR MESSAGES**

Prolific - Quality data. From real people. For faster breakthroughs.

https://www.prolific.com/?utm_source=mlst

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst


Blaise argues that there is more to evolution than random mutations (like most people think). The secret to increasing complexity is *merging* i.e. when different organisms or systems come together and combine their histories and capabilities.


Blaise describes his "BFF" experiment where random computer code spontaneously evolved into self-replicating programs, showing how purpose and complexity can emerge from pure randomness through computational processes.


https://en.wikipedia.org/wiki/Blaise_Ag%C3%BCera_y_Arcas

https://x.com/blaiseaguera?lang=en


TRANSCRIPT:

https://app.rescript.info/public/share/VX7Gktfr3_wIn4Bj7cl9StPBO1MN4R5lcJ11NE99hLg


TOC:

00:00:00 Introduction - New book "What is Intelligence?"

00:01:45 Life as computation - Von Neumann's insights

00:12:00 BFF experiment - How purpose emerges

00:26:00 Symbiogenesis and evolutionary complexity

00:40:00 Functionalism and consciousness

00:49:45 AI as part of collective human intelligence

00:57:00 Comparing AI and human cognition


REFS:

What is intelligence [Blaise Agüera y Arcas]

https://whatisintelligence.antikythera.org/ [Read free online, interactive rich media]

https://mitpress.mit.edu/9780262049955/what-is-intelligence/ [MIT Press]


Large Language Models and Emergence: A Complex Systems Perspective

https://arxiv.org/abs/2506.11135


Our first Noam Chomsky MLST interview

https://www.youtube.com/watch?v=axuGfh4UR9Q


Chance and Necessity [Jacques Monod]

https://monoskop.org/images/9/99/Monod_Jacques_Chance_and_Necessity.pdf


Wonderful Life: The Burgess Shale and the Nature of History [Stephen Jay Gould]

https://www.amazon.co.uk/Wonderful-Life-Burgess-Nature-History/dp/0099273454


The major evolutionary transitions [E Szathmáry, J M Smith]

https://wiki.santafe.edu/images/0/0e/Szathmary.MaynardSmith_1995_Nature.pdf


Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle [Dan Everett]

https://www.amazon.com/Dont-Sleep-There-Are-Snakes/dp/0307386120


The Nature of Technology: What It Is and How It Evolves [W. Brian Arthur]

https://www.amazon.com/Nature-Technology-What-How-Evolves-ebook/dp/B002RI9W16/


The MANIAC [Benjamin Labatut]

https://www.amazon.com/MANIAC-Benjam%C3%ADn-Labatut/dp/1782279814


When We Cease to Understand the World [Benjamin Labatut]

https://www.amazon.com/When-We-Cease-Understand-World/dp/1681375664/


The Boys in the Boat [Daniel James Brown]

https://www.amazon.com/Boys-Boat-Americans-Berlin-Olympics/dp/0143125478


[Petter Johansson] (Split brain)

https://www.lucs.lu.se/fileadmin/user_upload/lucs/2011/01/Johansson-et-al.-2006-How-Something-Can-Be-Said-About-Telling-More-Than-We-Can-Know.pdf


If Anyone Builds It, Everyone Dies [Eliezer Yudkowsky, Nate Soares]

https://www.amazon.com/Anyone-Builds-Everyone-Dies-Superhuman/dp/0316595640


The science of cycology

https://link.springer.com/content/pdf/10.3758/bf03195929.pdf



Oct 21, 2025
59:53
The Secret Engine of AI - Prolific [Sponsored] (Sara Saab, Enzo Blindow)

We sat down with Sara Saab (VP of Product at Prolific) and Enzo Blindow (VP of Data and AI at Prolific) to explore the critical role of human evaluation in AI development and the challenges of aligning AI systems with human values. Prolific is a human annotation and orchestration platform for AI used by many of the major AI labs. This is a sponsored show in partnership with Prolific.


**SPONSOR MESSAGES**

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst


While technologists want to remove humans from the loop for speed and efficiency, these non-deterministic AI systems actually require more human oversight than ever before. Prolific's approach is to put "well-treated, verified, diversely demographic humans behind an API" - making human feedback as accessible as any other infrastructure service.


When AI models like Grok 4 achieve top scores on technical benchmarks but feel awkward or problematic to use in practice, it exposes the limitations of our current evaluation methods. The guests argue that optimizing for benchmarks may actually weaken model performance in other crucial areas, like cultural sensitivity or natural conversation.


We also discuss Anthropic's research showing that frontier AI models, when given goals and access to information, independently arrived at solutions involving blackmail - without any prompting toward unethical behavior. Even more concerning, the more sophisticated the model, the more susceptible it was to this "agentic misalignment."


Enzo and Sara present Prolific's "Humane" leaderboard as an alternative to existing benchmarking systems. By stratifying evaluations across diverse demographic groups, they reveal that different populations have vastly different experiences with the same AI models.


Looking ahead, the guests imagine a world where humans take on coaching and teaching roles for AI systems - similar to how we might correct a child or review code. This also raises important questions about working conditions and the evolution of labor in an AI-augmented world. Rather than replacing humans entirely, we may be moving toward more sophisticated forms of human-AI collaboration.


As AI tech becomes more powerful and general-purpose, the quality of human evaluation becomes more critical, not less. We need more representative evaluation frameworks that capture the messy reality of human values and cultural diversity.


Visit Prolific:

https://www.prolific.com/

Sara Saab (VP Product):

https://uk.linkedin.com/in/sarasaab


Enzo Blindow (VP Data & AI):

https://uk.linkedin.com/in/enzoblindow


TRANSCRIPT:

https://app.rescript.info/public/share/xZ31-0kJJ_xp4zFSC-bunC8-hJNkHpbm7Lg88RFcuLE


TOC:

[00:00:00] Intro & Background

[00:03:16] Human-in-the-Loop Challenges

[00:17:19] Can AIs Understand?

[00:32:02] Benchmarking & Vibes

[00:51:00] Agentic Misalignment Study

[01:03:00] Data Quality vs Quantity

[01:16:00] Future of AI Oversight


REFS:

Anthropic Agentic Misalignment

https://www.anthropic.com/research/agentic-misalignment


Value Compass

https://arxiv.org/pdf/2409.09586


Reasoning Models Don’t Always Say What They Think (Anthropic)

https://www.anthropic.com/research/reasoning-models-dont-say-think

https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf


Apollo research - science of evals blog post

https://www.apolloresearch.ai/blog/we-need-a-science-of-evals


Leaderboard Illusion

https://www.youtube.com/watch?v=9W_OhS38rIE MLST video


The Leaderboard Illusion [2025]

Shivalika Singh et al

https://arxiv.org/abs/2504.20879


(Truncated, full list on YT)



Oct 18, 2025
01:19:39
AI Agents Can Code 10,000 Lines of Hacking Tools In Seconds - Dr. Ilia Shumailov (ex-GDM)

Dr. Ilia Shumailov - Former DeepMind AI Security Researcher, now building security tools for AI agents


Ever wondered what happens when AI agents start talking to each other—or worse, when they start breaking things? Ilia Shumailov spent years at DeepMind thinking about exactly these problems, and he's here to explain why securing AI is way harder than you think.


**SPONSOR MESSAGES**

Check out NotebookLM for your research project, it's really powerful: https://notebooklm.google.com/

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst


We're racing toward a world where AI agents will handle our emails, manage our finances, and interact with sensitive data 24/7. But there is a problem. These agents are nothing like human employees. They never sleep, they can touch every endpoint in your system simultaneously, and they can generate sophisticated hacking tools in seconds. Traditional security measures designed for humans simply won't work.


Dr. Ilia Shumailov

https://x.com/iliaishacked

https://iliaishacked.github.io/

https://sequrity.ai/


TRANSCRIPT:

https://app.rescript.info/public/share/dVGsk8dz9_V0J7xMlwguByBq1HXRD6i4uC5z5r7EVGM


TOC:

00:00:00 - Introduction & Trusted Third Parties via ML

00:03:45 - Background & Career Journey

00:06:42 - Safety vs Security Distinction

00:09:45 - Prompt Injection & Model Capability

00:13:00 - Agents as Worst-Case Adversaries

00:15:45 - Personal AI & CaMeL System Defense

00:19:30 - Agents vs Humans: Threat Modeling

00:22:30 - Calculator Analogy & Agent Behavior

00:25:00 - IMO Math Solutions & Agent Thinking

00:28:15 - Diffusion of Responsibility & Insider Threats

00:31:00 - Open Source Security Concerns

00:34:45 - Supply Chain Attacks & Trust Issues

00:39:45 - Architectural Backdoors

00:44:00 - Academic Incentives & Defense Work

00:48:30 - Semantic Censorship & Halting Problem

00:52:00 - Model Collapse: Theory & Criticism

00:59:30 - Career Advice & Ross Anderson Tribute


REFS:

Lessons from Defending Gemini Against Indirect Prompt Injections

https://arxiv.org/abs/2505.14534


Defeating Prompt Injections by Design.

Debenedetti, E., Shumailov, I., Fan, T., Hayes, J., Carlini, N., Fabian, D., Kern, C., Shi, C., Terzis, A., & Tramèr, F.

https://arxiv.org/pdf/2503.18813


Agentic Misalignment: How LLMs could be insider threats

https://www.anthropic.com/research/agentic-misalignment


STOP ANTHROPOMORPHIZING INTERMEDIATE TOKENS AS REASONING/THINKING TRACES!

Subbarao Kambhampati et al

https://arxiv.org/pdf/2504.09762


Meiklejohn, S., Blauzvern, H., Maruseac, M., Schrock, S., Simon, L., & Shumailov, I. (2025).

Machine learning models have a supply chain problem.

https://arxiv.org/abs/2505.22778


Gao, Y., Shumailov, I., & Fawaz, K. (2025).

Supply-chain attacks in machine learning frameworks.

https://openreview.net/pdf?id=EH5PZW6aCr


Apache Log4j Vulnerability Guidance

https://www.cisa.gov/news-events/news/apache-log4j-vulnerability-guidance


Bober-Irizar, M., Shumailov, I., Zhao, Y., Mullins, R., & Papernot, N. (2022).

Architectural backdoors in neural networks.

https://arxiv.org/pdf/2206.07840


Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches

David Glukhov, Ilia Shumailov, ...

https://proceedings.mlr.press/v235/glukhov24a.html


AlphaEvolve MLST interview [Matej Balog, Alexander Novikov]

https://www.youtube.com/watch?v=vC9nAosXrJw

Oct 04, 2025
01:01:08
New top score on ARC-AGI-2-pub (29.4%) - Jeremy Berman

We need AI systems to synthesise new knowledge, not just compress the data they see. Jeremy Berman is a research scientist at Reflection AI and recent winner of the ARC-AGI v2 public leaderboard.


**SPONSOR MESSAGES**

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst


Imagine trying to teach an AI to think like a human, i.e. solving puzzles that are easy for us but stump even the smartest models. Jeremy's evolutionary approach (evolving natural language descriptions instead of Python code, as in his previous version) landed him at the top with about 30% accuracy on ARCv2.


We discuss why current AIs are like "stochastic parrots" that memorize but struggle to truly reason or innovate, as well as big ideas like building "knowledge trees" for real understanding, the limits of neural networks versus symbolic systems, and whether we can train models to synthesize new ideas without forgetting everything else.


Jeremy Berman:

https://x.com/jerber888


TRANSCRIPT:

https://app.rescript.info/public/share/qvCioZeZJ4Q_NlR66m-hNUZnh-qWlUJcS15Wc2OGwD0


TOC:

Introduction and Overview [00:00:00]

ARC v1 Solution [00:07:20]

Evolutionary Python Approach [00:08:00]

Trade-offs in Depth vs. Breadth [00:10:33]

ARC v2 Improvements [00:11:45]

Natural Language Shift [00:12:35]

Model Thinking Enhancements [00:13:05]

Neural Networks vs. Symbolism Debate [00:14:24]

Turing Completeness Discussion [00:15:24]

Continual Learning Challenges [00:19:12]

Reasoning and Intelligence [00:29:33]

Knowledge Trees and Synthesis [00:50:15]

Creativity and Invention [00:56:41]

Future Directions and Closing [01:02:30]


REFS:

Jeremy’s 2024 article on winning ARCAGI1-pub

https://jeremyberman.substack.com/p/how-i-got-a-record-536-on-arc-agi


Getting 50% (SoTA) on ARC-AGI with GPT-4o [Greenblatt]

https://blog.redwoodresearch.org/p/getting-50-sota-on-arc-agi-with-gpt

https://www.youtube.com/watch?v=z9j3wB1RRGA [his MLST interview]


A Thousand Brains: A New Theory of Intelligence [Hawkins]

https://www.amazon.com/Thousand-Brains-New-Theory-Intelligence/dp/1541675819

https://www.youtube.com/watch?v=6VQILbDqaI4 [MLST interview]


Francois Chollet + Mike Knoop’s lab

https://ndea.com/


On the Measure of Intelligence [Chollet]

https://arxiv.org/abs/1911.01547


On the Biology of a Large Language Model [Anthropic]

https://transformer-circuits.pub/2025/attribution-graphs/biology.html


The ARChitects [won 2024 ARC-AGI-1-private]

https://www.youtube.com/watch?v=mTX_sAq--zY


Connectionism critique 1988 [Fodor/Pylyshyn]

https://uh.edu/~garson/F&P1.PDF


Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]

https://arxiv.org/pdf/2505.11581


AlphaEvolve interview (also program synthesis)

https://www.youtube.com/watch?v=vC9nAosXrJw


ShinkaEvolve: Evolving New Algorithms with LLMs, Orders of Magnitude More Efficiently [Lange et al]

https://sakana.ai/shinka-evolve/


Deep learning with Python Rev 3 [Chollet] - READ CHAPTER 19 NOW!

https://deeplearningwithpython.io/

Sep 27, 2025
01:08:27
Deep Learning is Not So Mysterious or Different - Prof. Andrew Gordon Wilson (NYU)

Professor Andrew Wilson from NYU explains why many common-sense ideas in artificial intelligence might be wrong. For decades, the rule of thumb in machine learning has been to fear complexity. The thinking goes: if your model has too many parameters (is "too complex") for the amount of data you have, it will "overfit" by essentially memorizing the data instead of learning the underlying patterns. This leads to poor performance on new, unseen data. This is known as the classic "bias-variance trade-off" i.e. a balancing act between a model that's too simple and one that's too complex.


**SPONSOR MESSAGES**

Tufa AI Labs is an AI research lab based in Zurich. **They are hiring ML research engineers!**

This is a once in a lifetime opportunity to work with one of the best labs in Europe

Contact Benjamin Crouzier - https://tufalabs.ai/

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst


Description Continued:


Professor Wilson challenges this fundamental belief (fearing complexity). He makes a few surprising points:


**Bigger Can Be Better**: massive models don't just get more flexible; they also develop a stronger "simplicity bias". So, if your model is overfitting, the solution might paradoxically be to make it even bigger.


**The "Bias-Variance Trade-off" is a Misnomer**: Wilson claims you don't actually have to trade one for the other. You can have a model that is incredibly expressive and flexible while also being strongly biased toward simple solutions. He points to the "double descent" phenomenon, where performance first gets worse as models get more complex, but then surprisingly starts getting better again.


**Honest Beliefs and Bayesian Thinking**: His core philosophy is that we should build models that honestly represent our beliefs about the world. We believe the world is complex, so our models should be expressive. But we also believe in Occam's razor—that the simplest explanation is often the best. He champions Bayesian methods, which naturally balance these two ideas through a process called marginalization, which he describes as an automatic Occam's razor.
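
The "automatic Occam's razor" of marginalization can be seen in a toy case. The sketch below is an illustration under assumed models, not something taken from the episode: it compares a zero-parameter "fair coin" model with a more flexible model that has an unknown bias and a uniform prior, by computing each model's marginal likelihood (evidence) for an observed sequence of flips. The function names are hypothetical.

```python
from math import factorial


def evidence_fair(heads, tails):
    """Evidence for a specific flip sequence under a fair coin (no free parameters)."""
    return 0.5 ** (heads + tails)


def evidence_unknown_bias(heads, tails):
    """Evidence under an unknown bias p with a uniform prior on [0, 1].

    Marginalizing over p gives the Beta integral:
    integral of p^h * (1 - p)^t dp = h! * t! / (h + t + 1)!
    """
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)


def bayes_factor(heads, tails):
    """Ratio > 1 favours the flexible model, < 1 favours the simple one."""
    return evidence_unknown_bias(heads, tails) / evidence_fair(heads, tails)


# Balanced data: the flexible model spreads its prior mass over many biases
# it does not need, so the simple model has the higher evidence.
print(bayes_factor(5, 5))
# Lopsided data: the simple model cannot explain it, so the flexible one wins.
print(bayes_factor(9, 1))
```

No explicit complexity penalty appears anywhere in the code; the preference for the simpler model on balanced data comes purely from averaging the likelihood over the prior.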


TOC:


[00:00:00] Introduction and Thesis

[00:04:19] Challenging Conventional Wisdom

[00:11:17] The Philosophy of a Scientist-Engineer

[00:16:47] Expressiveness, Overfitting, and Bias

[00:28:15] Understanding, Compression, and Kolmogorov Complexity

[01:05:06] The Surprising Power of Generalization

[01:13:21] The Elegance of Bayesian Inference

[01:33:02] The Geometry of Learning

[01:46:28] Practical Advice and The Future of AI


Prof. Andrew Gordon Wilson:

https://x.com/andrewgwils

https://cims.nyu.edu/~andrewgw/

https://scholar.google.com/citations?user=twWX2LIAAAAJ&hl=en

https://www.youtube.com/watch?v=Aja0kZeWRy4

https://www.youtube.com/watch?v=HEp4TOrkwV4


TRANSCRIPT:

https://app.rescript.info/public/share/H4Io1Y7Rr54MM05FuZgAv4yphoukCfkqokyzSYJwCK8


Hosts:

Dr. Tim Scarfe / Dr. Keith Duggar (MIT Ph.D)


REFS:


Deep Learning is Not So Mysterious or Different [Andrew Gordon Wilson]

https://arxiv.org/abs/2503.02113


Bayesian Deep Learning and a Probabilistic Perspective of Generalization [Andrew Gordon Wilson, Pavel Izmailov]

https://arxiv.org/abs/2002.08791


Compute-Optimal LLMs Provably Generalize Better With Scale [Marc Finzi, Sanyam Kapoor, Diego Granziol, Anming Gu, Christopher De Sa, J. Zico Kolter, Andrew Gordon Wilson]

https://arxiv.org/abs/2504.15208

Sep 19, 2025
02:03:48
Karl Friston - Why Intelligence Can't Get Too Large (Goldilocks principle)

In this episode, hosts Tim and Keith finally realize their long-held dream of sitting down with their hero, the brilliant neuroscientist Professor Karl Friston. The conversation is a fascinating and mind-bending journey into Professor Friston's life's work, the Free Energy Principle, and what it reveals about life, intelligence, and consciousness itself.


**SPONSORS**

Gemini CLI is an open-source AI agent that brings the power of Gemini directly into your terminal - https://github.com/google-gemini/gemini-cli

---

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!

---

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

***


They kick things off by looking back on the 20-year journey of the Free Energy Principle. Professor Friston explains it as a fundamental rule for survival: all living things, from a single cell to a human being, are constantly trying to make sense of the world and reduce unpredictability. It’s this drive to minimize surprise that allows things to exist and maintain their structure.

This leads to a bigger question: What does it truly mean to be "intelligent"? The group debates whether intelligence is everywhere, even in a virus or a plant, or if it requires a certain level of complexity.


Professor Friston introduces the idea of different "kinds" of things, suggesting that creatures like us, who can model themselves and think about the future, possess a unique and "strange" kind of agency that sets us apart.


From intelligence, the discussion naturally flows to the even trickier concept of consciousness. Is it the same as intelligence? Professor Friston argues they are different. He explains that consciousness might emerge from deep, layered self-awareness—not just acting, but understanding that you are the one causing your actions and thinking about your place in the world.


They also explore intelligence at different sizes. Is a corporation intelligent? What about the entire planet? Professor Friston suggests there might be a "Goldilocks zone" for intelligence. It doesn't seem to exist at the super-tiny atomic level or at the massive scale of planets and solar systems, but thrives in the complex middle-ground where we live.


Finally, they tackle one of the most pressing topics of our time: Can we build a truly conscious AI? Professor Friston shares his doubts about whether our current computers are capable of a feat like that. He suggests that genuine consciousness might require a different kind of "mortal" computation, where the machine's physical body and its "mind" are inseparable, much like in biological creatures.


TRANSCRIPT:

https://app.rescript.info/public/share/FZkF8BO7HMt9aFfu2_q69WGT_ZbYZ1VVkC6RtU3eeOI


TOC:

00:00:00: Introduction & Retrospective on the Free Energy Principle

00:09:34: Strange Particles, Agency, and Consciousness

00:37:45: The Scale of Intelligence: From Viruses to the Biosphere

01:01:35: Modelling, Boundaries, and Practical Application

01:21:12: Conclusion

Sep 10, 2025
01:21:40
The Day AI Solves My Puzzles Is The Day I Worry (Prof. Cristopher Moore)

We are joined by Cristopher Moore, a professor at the Santa Fe Institute with a diverse background in physics, computer science, and machine learning.


The conversation begins with Cristopher, who calls himself a "frog", explaining that he prefers to dive deep into specific, concrete problems rather than taking a high-level "bird's-eye view".


They explore why current AI models, like transformers, are so surprisingly effective. Cristopher argues it's because the real world isn't random; it's full of rich structures, patterns, and hierarchies that these models can learn to exploit, even if we don't fully understand how.


**SPONSORS**

Take the Prolific human data survey - https://www.prolific.com/humandatasurvey?utm_source=mlst and be the first to see the results and benchmark their practices against the wider community!

---

cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy.

Oct SF conference - https://dagihouse.com/?utm_source=mlst - Joscha Bach keynoting(!) + OAI, Anthropic, NVDA,++

Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst

Submit investment deck: https://cyber.fund/contact?utm_source=mlst

***


Cristopher Moore:

https://sites.santafe.edu/~moore/


TOC:

00:00:00 - Introduction

00:02:05 - Meet Cristopher Moore: A Frog in the World of Science

00:05:14 - The Limits of Transformers and Real-World Data

00:11:19 - Intelligence as Creative Problem-Solving

00:23:30 - Grounding, Meaning, and Shared Reality

00:31:09 - The Nature of Creativity and Aesthetics

00:44:31 - Computational Irreducibility and Universality

00:53:06 - Turing Completeness, Recursion, and Intelligence

01:11:26 - The Universe Through a Computational Lens

01:26:45 - Algorithmic Justice and the Need for Transparency


TRANSCRIPT: https://app.rescript.info/public/share/VRe2uQSvKZOm0oIBoDsrNwt46OMCqRnShVnUF3qyoFk


Filmed at DISI (Diverse Intelligences Summer Institute)

https://disi.org/


REFS:

The Nature of Computation [Cristopher Moore]

https://nature-of-computation.org/


Birds and Frogs [Freeman Dyson]

https://www.ams.org/notices/200902/rtx090200212p.pdf


Replica Theory [Parisi et al]

https://arxiv.org/pdf/1409.2722


Janossy pooling [Fabian Fuchs]

https://fabianfuchsml.github.io/equilibriumaggregation/


Cracking the cryptic [YT channel]

https://www.youtube.com/c/CrackingTheCryptic


Sudoku Bench [Sakana]

https://sakana.ai/sudoku-bench/


Fractured entangled representations “phylogenetic locking in comment” [Kumar/Stanley]

https://arxiv.org/pdf/2505.11581 (see our shows on this)


The War Against Cliché: [Martin Amis]

https://www.amazon.com/War-Against-Cliche-Reviews-1971-2000/dp/0375727167


Rule 110 (CA)

https://mathworld.wolfram.com/Rule110.html


Universality in Elementary Cellular Automata [Matthew Cook]

https://wpmedia.wolfram.com/sites/13/2018/02/15-1-1.pdf


Small Semi-Weakly Universal Turing Machines [Damien Woods]

https://tilde.ini.uzh.ch/users/tneary/public_html/WoodsNeary-FI09.pdf


COMPUTING MACHINERY AND INTELLIGENCE [Turing, 1950]

https://courses.cs.umbc.edu/471/papers/turing.pdf


Comment on Space Time as a causal set [Moore, 88]

https://sites.santafe.edu/~moore/comment.pdf


Recursion Theory on the Reals and Continuous-time Computation [Moore, 96]
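
For listeners who have not met Rule 110 (listed in the references above), here is a minimal sketch of the update rule in Python. It is only an illustration, not anything shown in the episode; the function names `step` and `run`, the wrap-around boundary, and the single-cell starting row are all assumptions made for the example.

```python
# Rule 110: each cell's next state depends on (left, center, right).
# The number 110 = 0b01101110 encodes the outputs for the neighborhoods
# 111, 110, 101, 100, 011, 010, 001, 000, in that order.
RULE = 110


def step(cells):
    """Advance one generation, using wrap-around boundaries (an assumption)."""
    n = len(cells)
    return [
        (RULE >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def run(width=32, generations=16):
    """Return a list of rows, starting from a single live cell at the right edge."""
    cells = [0] * width
    cells[-1] = 1
    rows = [cells]
    for _ in range(generations - 1):
        cells = step(cells)
        rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in run():
        print("".join("#" if c else "." for c in row))
```

Running the script prints one row per generation. The point of the universality results cited above is that an eight-entry lookup table like this one is enough, with a suitably prepared starting row, to support arbitrary computation.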

Sep 04, 2025 | 01:34:53
Michael Timothy Bennett: Defining Intelligence and AGI Approaches

Dr. Michael Timothy Bennett is a computer scientist who's deeply interested in understanding artificial intelligence, consciousness, and what it means to be alive. He's known for his provocative paper "What the F*** is Artificial Intelligence", which challenges conventional thinking about AI and intelligence.

**SPONSOR MESSAGES**

Prolific: Quality data. From real people. For faster breakthroughs.

https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=mb

***

Michael takes us on a journey through some of the biggest questions in AI and consciousness. He starts by exploring what intelligence actually is, settling on the idea that it's about "adaptation with limited resources" (a definition from researcher Pei Wang that he particularly likes).

The discussion ranges from technical AI concepts to philosophical questions about consciousness, with Michael offering fresh perspectives that challenge Silicon Valley's "just scale it up" approach to AI. He argues that true intelligence isn't just about having more parameters or data; it's about being able to adapt efficiently, like biological systems do.

TOC:

1. Introduction & Paper Overview [00:01:34]

2. Definitions of Intelligence [00:02:54]

3. Formal Models (AIXI, Active Inference) [00:07:06]

4. Causality, Abstraction & Embodiment [00:10:45]

5. Computational Dualism & Mortal Computation [00:25:51]

6. Modern AI, AGI Progress & Benchmarks [00:31:30]

7. Hybrid AI Approaches [00:35:00]

8. Consciousness & The Hard Problem [00:39:35]

9. The Diverse Intelligences Summer Institute (DISI) [00:53:20]

10. Living Systems & Self-Organization [00:54:17]

11. Closing Thoughts [01:04:24]

Michael's socials:

https://michaeltimothybennett.com/

https://x.com/MiTiBennett

Transcript:

https://app.rescript.info/public/share/4jSKbcM77Sf6Zn-Ms4hda7C4krRrMcQt0qwYqiqPTPI

References:

Bennett, M.T. "What the F*** is Artificial Intelligence"

https://arxiv.org/abs/2503.23923

Bennett, M.T. "Are Biological Systems More Intelligent Than Artificial Intelligence?"

https://arxiv.org/abs/2405.02325

Bennett, M.T. PhD Thesis "How To Build Conscious Machines"

https://osf.io/preprints/thesiscommons/wehmg_v1

Legg, S. & Hutter, M. (2007). "Universal Intelligence: A Definition of Machine Intelligence"

Wang, P. "Defining Artificial Intelligence" - on non-axiomatic reasoning systems (NARS)

Chollet, F. (2019). "On the Measure of Intelligence" - introduces the ARC benchmark and developer-aware generalization

Hutter, M. (2005). "Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability"

Chalmers, D. "The Hard Problem of Consciousness"

Descartes, R. - Cartesian dualism and the pineal gland theory (historical context)

Friston, K. - Free Energy Principle and Active Inference framework

Levin, M. - Work on collective intelligence, cancer as information isolation, and "mind blindness"

Hinton, G. (2022). "The Forward-Forward Algorithm" - introduces mortal computation concept

Alexander Ororbia & Friston - Formal treatment of mortal computation

Sutton, R. "The Bitter Lesson" - on search and learning in AI

Pearl, J. "The Book of Why" - causal inference and reasoning

Alternative AGI Approaches

Wang, P. - NARS (Non-Axiomatic Reasoning System)

Goertzel, B. - Hyperon system and modular AGI architectures

Benchmarks & Evaluation

Hendrycks, D. - Humanity's Last Exam benchmark (mentioned re: saturation)

Filmed at:

Diverse Intelligences Summer Institute (DISI) https://disi.org/

Aug 28, 2025 | 01:05:45
Superintelligence Strategy (Dan Hendrycks)

Deep dive with Dan Hendrycks, a leading AI safety researcher and co-author of the "Superintelligence Strategy" paper with former Google CEO Eric Schmidt and Scale AI CEO Alexandr Wang.


*** SPONSOR MESSAGES

Gemini CLI is an open-source AI agent that brings the power of Gemini directly into your terminal - https://github.com/google-gemini/gemini-cli


Prolific: Quality data. From real people. For faster breakthroughs.

https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen

***


Hendrycks argues that society is making a fundamental mistake in how it views artificial intelligence. We often compare AI to transformative but ultimately manageable technologies like electricity or the internet. He contends a far better and more realistic analogy is nuclear technology. Like nuclear power, AI has the potential for immense good, but it is also a dual-use technology that carries the risk of unprecedented catastrophe.


The Problem with an AI "Manhattan Project":


A popular idea is for the U.S. to launch a "Manhattan Project" for AI—a secret, all-out government race to build a superintelligence before rivals like China. Hendrycks argues this strategy is deeply flawed and dangerous for several reasons:


- It wouldn’t be secret. You cannot hide a massive, heat-generating data center from satellite surveillance.


- It would be destabilizing. A public race would alarm rivals, causing them to start their own desperate, corner-cutting projects, dramatically increasing global risk.


- It’s vulnerable to sabotage. An AI project can be crippled in many ways, from cyberattacks that poison its training data to physical attacks on its power plants. This is what the paper refers to as a "maiming attack."


This vulnerability leads to the paper's central concept: Mutual Assured AI Malfunction (MAIM). This is the AI-era version of the nuclear-era's Mutual Assured Destruction (MAD). In this dynamic, any nation that makes an aggressive, destabilizing bid for a world-dominating AI must expect its rivals to sabotage the project to ensure their own survival.


This deterrence, Hendrycks argues, is already the default reality we live in.


A Better Strategy: The Three Pillars

Instead of a reckless race, the paper proposes a more stable, three-part strategy modeled on Cold War principles:


- Deterrence: Acknowledge the reality of MAIM. The goal should not be to "win" the race to superintelligence, but to deter anyone from starting such a race in the first place through the credible threat of sabotage.


- Nonproliferation: Just as we work to keep fissile materials for nuclear bombs out of the hands of terrorists and rogue states, we must control the key inputs for catastrophic AI. The most critical input is advanced AI chips (GPUs). Hendrycks makes the powerful claim that building cutting-edge GPUs is now more difficult than enriching uranium, making this strategy viable.


- Competitiveness: The race between nations like the U.S. and China should not be about who builds superintelligence first. Instead, it should be about who can best use existing AI to build a stronger economy, a more effective military, and more resilient supply chains (for example, by manufacturing more chips domestically).


Dan says the stakes are high if we fail to manage this transition:


- Erosion of Control

- Intelligence Recursion

- Worthless Labor


Hendrycks maintains that while the risks are existential, the future is not set.


TOC:

1 Measuring the Beast [00:00:00]

2 Defining the Beast [00:11:34]

3 The Core Strategy [00:38:20]

4 Ideological Battlegrounds [00:53:12]

5 Mechanisms of Control [01:34:45]


TRANSCRIPT:

https://app.rescript.info/public/share/cOKcz4pWRPjh7BTIgybd7PUr_vChUaY6VQW64No8XMs



Aug 14, 2025 | 01:45:39
DeepMind Genie 3 [World Exclusive] (Jack Parker Holder, Shlomi Fruchter)

This episode features Shlomi Fruchter and Jack Parker Holder from Google DeepMind, who are unveiling a new AI called Genie 3. The host, Tim Scarfe, describes it as the most mind-blowing technology he has ever seen. We were invited to their offices to conduct the interview (not sponsored).

Imagine you could create a video game world just by describing it. That's what Genie 3 does. It's an AI "world model" that learns how the real world works by watching massive amounts of video. Unlike a normal video game engine (like Unreal or the one for Doom) that needs to be programmed manually, Genie generates a realistic, interactive, 3D world from a simple text prompt.

**SPONSOR MESSAGES**

Prolific: Quality data. From real people. For faster breakthroughs.

https://prolific.com/mlst?utm_campaign=98404559-MLST&utm_source=youtube&utm_medium=podcast&utm_content=script-gen

***

Here’s a breakdown of what makes it so revolutionary:

- From Text to a Virtual World: You can type "a drone flying by a beautiful lake" or "a ski slope," and Genie 3 creates that world for you in about three seconds. You can then navigate and interact with it in real-time.

- It's Consistent: The worlds it creates have a reliable memory. If you look away from an object and then look back, it will still be there, just as it was. The guests explain that this consistency isn't explicitly programmed in; it's a surprising, "emergent" capability of the powerful AI model.

- A Huge Leap Forward: The previous version, Genie 2, was a major step, but it wasn't fast enough for real-time interaction and was much lower resolution. Genie 3 is 720p, interactive, and photorealistic, running smoothly for several minutes at a time.

- The Killer App - Training Robots: Beyond entertainment, the team sees Genie 3 as a game-changer for training AI. Instead of training a self-driving car or a robot in the real world (which is slow and dangerous), you can create infinite simulations. You can even prompt rare events to happen, like a deer running across the road, to teach an AI how to handle unexpected situations safely.

- The Future of Entertainment: This could lead to a "YouTube version 2" or a new form of VR, where users can create and explore endless, interconnected worlds together, like the experience machine from philosophy.

While the technology is still a research prototype and not yet available to the public, it represents a monumental step towards creating true artificial worlds from the ground up.

Jack Parker Holder [Research Scientist at Google DeepMind in the Open-Endedness Team]

https://jparkerholder.github.io/

Shlomi Fruchter [Research Director, Google DeepMind]

https://shlomifruchter.github.io/

TOC:

[00:00:00] - Introduction: "The Most Mind-Blowing Technology I've Ever Seen"

[00:02:30] - The Evolution from Genie 1 to Genie 2

[00:04:30] - Enter Genie 3: Photorealistic, Interactive Worlds from Text

[00:07:00] - Promptable World Events & Training Self-Driving Cars

[00:14:21] - Guest Introductions: Shlomi Fruchter & Jack Parker Holder

[00:15:08] - Core Concepts: What is a "World Model"?

[00:19:30] - The Challenge of Consistency in a Generated World

[00:21:15] - Context: The Neural Network Doom Simulation

[00:25:25] - How Do You Measure the Quality of a World Model?

[00:28:09] - The Vision: Using Genie to Train Advanced Robots

[00:32:21] - Open-Endedness: Human Skill and Prompting Creativity

[00:38:15] - The Future: Is This the Next YouTube or VR?

[00:42:18] - The Next Step: Multi-Agent Simulations

[00:52:51] - Limitations: Thinking, Computation, and the Sim-to-Real Gap

[00:58:07] - Conclusion & The Future of Game Engines

REFS:

World Models [David Ha, Jürgen Schmidhuber]

https://arxiv.org/abs/1803.10122

POET

https://arxiv.org/abs/1901.01753

The Fractured Entangled Representation Hypothesis [Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley]

https://arxiv.org/pdf/2505.11581

TRANSCRIPT:

https://app.rescript.info/public/share/Zk5tZXk6mb06yYOFh6nSja7Lg6_qZkgkuXQ-kl5AJqM

Aug 05, 2025 | 58:23
Large Language Models and Emergence: A Complex Systems Perspective (Prof. David C. Krakauer)

Prof. David Krakauer, President of the Santa Fe Institute, argues that we are fundamentally confusing knowledge with intelligence, especially when it comes to AI.


He defines true intelligence as the ability to do more with less—to solve novel problems with limited information. This is contrasted with current AI models, which he describes as doing less with more; they require astounding amounts of data to perform tasks that don't necessarily demonstrate true understanding or adaptation. He humorously calls this "really shit programming".


David challenges the popular notion of "emergence" in Large Language Models (LLMs). He explains that the tech community's definition—seeing a sudden jump in a model's ability to perform a task like three-digit math—is superficial. True emergence, from a complex systems perspective, involves a fundamental change in the system's internal organization, allowing for a new, simpler, and more powerful level of description. He gives the example of moving from tracking individual water molecules to using the elegant laws of fluid dynamics. For LLMs to be truly emergent, we'd need to see them develop new, efficient internal representations, not just get better at memorizing patterns as they scale.


Drawing on his background in evolutionary theory, David explains that systems like brains, and later, culture, evolved to process information that changes too quickly for genetic evolution to keep up. He calls culture "evolution at light speed" because it allows us to store our accumulated knowledge externally (in books, tools, etc.) and build upon it without corrupting the original.


This leads to his concept of "exbodiment," where we outsource our cognitive load to the world through things like maps, abacuses, or even language itself.


We create these external tools, internalize the skills they teach us, improve them, and create a feedback loop that enhances our collective intelligence.


However, he ends with a warning. While technology has historically complemented our deficient abilities, modern AI presents a new danger. Because we have an evolutionary drive to conserve energy, we will inevitably outsource our thinking to AI if we can. He fears this is already leading to a "diminution and dilution" of human thought and creativity. Just as our muscles atrophy without use, he argues our brains will too, and we risk becoming mentally dependent on these systems.


TOC:

[00:00:00] Intelligence: Doing more with less

[00:02:10] Why brains evolved: The limits of evolution

[00:05:18] Culture as evolution at light speed

[00:08:11] True meaning of emergence: "More is Different"

[00:10:41] Why LLM capabilities are not true emergence

[00:15:10] What real emergence would look like in AI

[00:19:24] Symmetry breaking: Physics vs. Life

[00:23:30] Two types of emergence: Knowledge In vs. Out

[00:26:46] Causality, agency, and coarse-graining

[00:32:24] "Exbodiment": Outsourcing thought to objects

[00:35:05] Collective intelligence & the boundary of the mind

[00:39:45] Mortal vs. Immortal forms of computation

[00:42:13] The risk of AI: Atrophy of human thought


David Krakauer

President and William H. Miller Professor of Complex Systems

https://www.santafe.edu/people/profile/david-krakauer


REFS:

Large Language Models and Emergence: A Complex Systems Perspective

David C. Krakauer, John W. Krakauer, Melanie Mitchell

https://arxiv.org/abs/2506.11135


Filmed at the Diverse Intelligences Summer Institute:

https://disi.org/

Jul 31, 2025 | 49:49
Pushing compute to the limits of physics

Dr. Maxwell Ramstead grills Guillaume Verdon (AKA “Beff Jezos”), founder of the thermodynamic computing startup Extropic.

Guillaume shares his unique path – from dreaming about space travel as a kid to becoming a physicist, then working on quantum computing at Google, to developing a radically new form of computing hardware for machine learning. He explains how he hit roadblocks with traditional physics and computing, leading him to start his company – building "thermodynamic computers." These are based on a new design for super-efficient chips that use the natural chaos of electrons (think noise and heat) to power AI tasks, which promises to speed up AND lower the costs of modern probabilistic techniques like sampling. He is driven by the pursuit of building computers that work more like your brain, which (by the way) runs on a banana and a glass of water! 

Guillaume talks about his alter ego, Beff Jezos, and the "Effective Accelerationism" (e/acc) movement that he initiated. Its objective is to speed up tech progress in order to “grow civilization” (as measured by energy use and innovation), rather than “slowing down out of fear”. Guillaume argues we need to embrace variance, exploration, and optimism to avoid getting stuck or outpaced by competitors like China. He and Maxwell discuss big ideas like merging humans with AI, decentralizing intelligence, and why boundless growth (with smart constraints) is “key to humanity's future”.

REFS:

1. John Archibald Wheeler - "It From Bit" Concept

00:04:45 - Foundational work proposing that physical reality emerges from information at the quantum level

Learn more: https://cqi.inf.usi.ch/qic/wheeler.pdf 

2. AdS/CFT Correspondence (Holographic Principle)

00:05:15 - Theoretical physics duality connecting quantum gravity in Anti-de Sitter space with conformal field theory

https://en.wikipedia.org/wiki/Holographic_principle 

3. Renormalization Group Theory

00:06:15 - Mathematical framework for analyzing physical systems across different length scales

https://www.damtp.cam.ac.uk/user/dbs26/AQFT/Wilsonchap.pdf 

4. Maxwell's Demon and Information Theory

00:21:15 - Thought experiment linking information processing to thermodynamics and entropy

https://plato.stanford.edu/entries/information-entropy/ 

5. Landauer's Principle

00:29:45 - Fundamental limit establishing minimum energy required for information erasure

https://en.wikipedia.org/wiki/Landauer%27s_principle 

6. Free Energy Principle and Active Inference

01:03:00 - Mathematical framework for understanding self-organizing systems and perception-action loops

https://www.nature.com/articles/nrn2787 

7. Max Tegmark - Information Bottleneck Principle

01:07:00 - Connections between information theory and renormalization in machine learning

https://arxiv.org/abs/1907.07331 

8. Fisher's Fundamental Theorem of Natural Selection

01:11:45 - Mathematical relationship between genetic variance and evolutionary fitness

https://en.wikipedia.org/wiki/Fisher%27s_fundamental_theorem_of_natural_selection 

9. Tensor Networks in Quantum Systems

00:06:45 - Computational framework for simulating many-body quantum systems

https://arxiv.org/abs/1912.10049 

10. Quantum Neural Networks

00:09:30 - Hybrid quantum-classical models for machine learning applications

https://en.wikipedia.org/wiki/Quantum_neural_network 

11. Energy-Based Models (EBMs)

00:40:00 - Probabilistic framework for unsupervised learning based on energy functions

https://www.researchgate.net/publication/200744586_A_tutorial_on_energy-based_learning 

12. Markov Chain Monte Carlo (MCMC)

00:20:00 - Sampling algorithm fundamental to modern AI and statistical physics

https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo 

13. Metropolis-Hastings Algorithm

00:23:00 - Core sampling method for probability distributions

https://arxiv.org/abs/1504.01896
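
To give a feel for the sampling workloads listed above (energy-based models, MCMC, Metropolis-Hastings), here is a minimal software sketch in Python. It is an illustration only and says nothing about how Extropic's hardware works; the double-well `energy` function, the Gaussian proposal, and the parameter names are all assumptions chosen for the example.

```python
import math
import random


def energy(x):
    """Hypothetical double-well energy; lower energy means higher probability."""
    return (x * x - 1.0) ** 2


def metropolis_hastings(n_samples, step_size=0.5, temperature=1.0, seed=0):
    """Draw samples from p(x) proportional to exp(-energy(x) / temperature)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step_size)
        # The proposal is symmetric, so the acceptance probability
        # reduces to min(1, exp(-delta / temperature)).
        delta = energy(proposal) - energy(x)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = proposal
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis_hastings(10_000)
    near_wells = sum(1 for d in draws if abs(abs(d) - 1.0) < 0.5)
    print(f"fraction of samples near the wells at +/-1: {near_wells / len(draws):.2f}")
```

Every iteration needs fresh random numbers and an accept-or-reject decision, which is the kind of repeated probabilistic step that the episode describes as a target for cheaper, faster hardware.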

***SPONSOR MESSAGE***

Google Gemini 2.5 Flash is a state-of-the-art language model in the Gemini app. Sign up at https://gemini.google.com

Jul 21, 2025 | 01:23:33
The Fractured Entangled Representation Hypothesis (Kenneth Stanley, Akarsh Kumar)

Are the AI models you use today imposters?


Please watch the intro video we did before this: https://www.youtube.com/watch?v=o1q6Hhz0MAg


In this episode, hosts Dr. Tim Scarfe and Dr. Keith Duggar are joined by AI researcher Prof. Kenneth Stanley and MIT PhD student Akarsh Kumar to discuss their fascinating paper, "Questioning Representational Optimism in Deep Learning."


Imagine you ask two people to draw a perfect skull. One is a brilliant artist who understands anatomy, the other is a machine that just traces the image. Both drawings look identical, but the artist understands what a skull is—they know where the mouth is, how the jaw works, and that it's symmetrical. The machine just has a tangled mess of lines that happens to form the right picture.


An AI with an elegant representation has the building blocks to generate truly new ideas.


The Path Is the Goal: As Kenneth Stanley puts it, "it matters not just where you get, but how you got there". Two students can ace a math test, but the one who truly understands the concepts—instead of just memorizing formulas—is the one who will go on to make new discoveries.


The show is a mixture of three separate recordings: the original Patreon warmup with Tim/Kenneth, the Tim/Keith "Steakhouse" recorded after the main interview, and the main interview with Kenneth/Akarsh/Keith/Tim. Feel free to skip around. We had to edit this in a rush as we are travelling next week, but it's reasonably cleaned up.


TOC:

00:00:00 Intro: Garbage vs. Amazing Representations

00:05:42 How Good Representations Form

00:11:14 Challenging the "Bitter Lesson"

00:18:04 AI Creativity & Representation Types

00:22:13 Steakhouse: Critiques & Alternatives

00:28:30 Steakhouse: Key Concepts & Goldilocks Zone

00:39:42 Steakhouse: A Sober View on AI Risk

00:43:46 Steakhouse: The Paradox of Open-Ended Search

00:47:58 Main Interview: Paper Intro & Core Concepts

00:56:44 Main Interview: Deception and Evolvability

01:36:30 Main Interview: Reinterpreting Evolution

01:56:16 Main Interview: Impostor Intelligence

02:11:15 Main Interview: Recommendations for AI Research


REFS:

Questioning Representational Optimism in Deep Learning:

The Fractured Entangled Representation Hypothesis

Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley

https://arxiv.org/pdf/2505.11581


Kenneth O. Stanley, Joel Lehman

Why Greatness Cannot Be Planned: The Myth of the Objective

https://amzn.to/44xLaXK


Original show with Kenneth from 4 years ago:

https://www.youtube.com/watch?v=lhYGXYeMq_E


Kenneth Stanley is SVP Open Endedness at Lila Sciences

https://x.com/kenneth0stanley


Akarsh Kumar (MIT)

https://akarshkumar.com/


AND... Kenneth is HIRING (this is an OPPORTUNITY OF A LIFETIME!)

Research Engineer: https://job-boards.greenhouse.io/lila/jobs/7890007002

Research Scientist: https://job-boards.greenhouse.io/lila/jobs/8012245002


TRANSCRIPT:

https://app.rescript.info/public/share/W_T7E1OC2Wj49ccqlIOOztg2MJWaaVbovTeyxcFEQdU

Jul 06, 2025 | 02:16:23
The Fractured Entangled Representation Hypothesis (Intro)

What if today's incredible AI is just a brilliant "impostor"? This episode features host Dr. Tim Scarfe in conversation with guests Prof. Kenneth Stanley (ex-OpenAI), Dr. Keith Duggar (MIT), and Akarsh Kumar (MIT).

While AI today produces amazing results on the surface, its internal understanding is a complete mess, described as "total spaghetti" [00:00:49]. This is because it's trained with a brute-force method (SGD) that’s like building a sandcastle: it looks right from a distance, but has no real structure holding it together [00:01:45].

To explain the difference, Keith Duggar shares a great analogy about his high school physics classes [00:03:18]. One class was about memorizing lots of formulas for specific situations (like the "impostor" AI). The other used calculus to derive the answers from a deeper understanding, which was much easier and more powerful. This is the core difference: one method memorizes, the other truly understands.

The episode then introduces a different, more powerful way to build AI, based on Kenneth Stanley's old experiment, "Picbreeder" [00:04:45]. This method creates AI with a shockingly clean and intuitive internal model of the world. For example, it might develop a model of a skull where it understands the "mouth" as a separate component it can open and close, without ever being explicitly trained on that action [00:06:15]. This deep understanding emerges bottom-up, without massive datasets.

The secret is to abandon a fixed goal and embrace "deception" [00:08:42]: the idea that the stepping stones to a great discovery often don't look anything like the final result. Instead of optimizing for a target, the AI is built through an open-ended process of exploring what's "interesting" [00:09:15]. This creates a more flexible and adaptable foundation, a bit like how evolvability wins out in nature [00:10:30].

The show concludes by arguing that this choice matters immensely. The "impostor" path may be hitting a wall, requiring insane amounts of money and energy for progress and failing to deliver true creativity or continual learning [00:13:00]. The ultimate message is a call to not put all our eggs in one basket [00:14:25]. We should explore these open-ended, creative paths to discover a more genuine form of intelligence, which may be found where we least expect it.

REFS:

Questioning Representational Optimism in Deep Learning:

The Fractured Entangled Representation Hypothesis

Akarsh Kumar, Jeff Clune, Joel Lehman, Kenneth O. Stanley

https://arxiv.org/pdf/2505.11581

Kenneth O. Stanley, Joel Lehman

Why Greatness Cannot Be Planned: The Myth of the Objective

https://amzn.to/44xLaXK

Original show with Kenneth from 4 years ago:

https://www.youtube.com/watch?v=lhYGXYeMq_E

Kenneth Stanley is SVP Open Endedness at Lila Sciences

https://x.com/kenneth0stanley

Akarsh Kumar (MIT)

https://akarshkumar.com/

AND... Kenneth is HIRING (this is an OPPORTUNITY OF A LIFETIME!)

Research Engineer: https://job-boards.greenhouse.io/lila/jobs/7890007002

Research Scientist: https://job-boards.greenhouse.io/lila/jobs/8012245002

Tim's Code visualisation of FER based on Akarsh repo: https://github.com/ecsplendid/fer

TRANSCRIPT: https://app.rescript.info/public/share/YKAZzZ6lwZkjTLRpVJreOOxGhLI8y4m3fAyU8NSavx0

Jul 05, 2025 | 15:45
Three Red Lines We're About to Cross Toward AGI (Daniel Kokotajlo, Gary Marcus, Dan Hendrycks)

What if the most powerful technology in human history is being built by people who openly admit they don't trust each other? In this explosive 2-hour debate, three AI experts pull back the curtain on the shocking psychology driving the race to Artificial General Intelligence—and why the people building it might be the biggest threat of all. Kokotajlo predicts AGI by 2028 based on compute scaling trends. Marcus argues we haven't solved basic cognitive problems from his 2001 research. The stakes? If Kokotajlo is right and Marcus is wrong about safety progress, humanity may have already lost control.


Sponsor messages:

========

Google Gemini: Google Gemini features Veo3, a state-of-the-art AI video generation model in the Gemini app. Sign up at https://gemini.google.com


Tufa AI Labs are hiring for ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard!

https://tufalabs.ai/

========


Guest Powerhouse

Gary Marcus - Cognitive scientist, author of "Taming Silicon Valley," and AI's most prominent skeptic who's been warning about the same fundamental problems for 25 years (https://garymarcus.substack.com/)

Daniel Kokotajlo - Former OpenAI insider turned whistleblower who reveals the disturbing rationalizations of AI lab leaders in his viral "AI 2027" scenario (https://ai-2027.com/)

Dan Hendrycks - Director of the Center for AI Safety who created the benchmarks used to measure AI progress and argues we have only years, not decades, to prevent catastrophe (https://danhendrycks.com/)


Transcript:

http://app.rescript.info/public/share/tEcx4UkToi-2jwS1cN51CW70A4Eh6QulBRxDILoXOno


TOC:

Introduction: The AI Arms Race

00:00:04 - The Danger of Automated AI R&D

00:00:43 - The Rationalization: "If we don't, someone else will"

00:01:56 - Sponsor Reads (Tufa AI Labs & Google Gemini)

00:02:55 - Guest Introductions


The Philosophical Stakes

00:04:13 - What is the Positive Vision for AGI?

00:07:00 - The Abundance Scenario: Superintelligent Economy

00:09:06 - Differentiating AGI and Superintelligence (ASI)

00:11:41 - Sam Altman: "A Decade in a Month"

00:14:47 - Economic Inequality & The UBI Problem


Policy and Red Lines

00:17:13 - The Pause Letter: Stopping vs. Delaying AI

00:20:03 - Defining Three Concrete Red Lines for AI Development

00:25:24 - Racing Towards Red Lines & The Myth of "Durable Advantage"

00:31:15 - Transparency and Public Perception

00:35:16 - The Rationalization Cascade: Why AI Labs Race to "Win"


Forecasting AGI: Timelines and Methodologies

00:42:29 - The Case for Short Timelines (Median 2028)

00:47:00 - Scaling Limits: Compute, Data, and Money

00:49:36 - Forecasting Models: Bio-Anchors and Agentic Coding

00:53:15 - The 10^45 FLOP Thought Experiment


The Great Debate: Cognitive Gaps vs. Scaling

00:58:41 - Gary Marcus's Counterpoint: The Unsolved Problems of Cognition

01:00:46 - Current AI Can't Play Chess Reliably

01:08:23 - Can Tools and Neurosymbolic AI Fill the Gaps?

01:16:13 - The Multi-Dimensional Nature of Intelligence

01:24:26 - The Benchmark Debate: Data Contamination and Reliability

01:31:15 - The Superhuman Coder Milestone Debate

01:37:45 - The Driverless Car Analogy


The Alignment Problem

01:39:45 - Has Any Progress Been Made on Alignment?

01:42:43 - "Fairly Reasonably Scares the Sh*t Out of Me"

01:46:30 - Distinguishing Model vs. Process Alignment


Scenarios and Conclusions

01:49:26 - Gary's Alternative Scenario: The Neurosymbolic Shift

01:53:35 - Will AI Become Jeff Dean?

01:58:41 - Takeoff Speeds and Exceeding Human Intelligence

02:03:19 - Final Disagreements and Closing Remarks


REFS:

Gary Marcus (2001) - The Algebraic Mind

https://mitpress.mit.edu/9780262632683/the-algebraic-mind/

00:59:00


Gary Marcus & Ernest Davis (2019) - Rebooting AI

https://www.penguinrandomhouse.com/books/566677/rebooting-ai-by-gary-marcus-and-ernest-davis/

01:31:59


Gary Marcus (2024) - Taming Silicon Valley

https://www.hachettebookgroup.com/titles/gary-marcus/taming-silicon-valley/9781541704091/

00:03:01


Jun 24, 2025 · 02:07:07
How AI Learned to Talk and What It Means - Prof. Christopher Summerfield

We interview Professor Christopher Summerfield from Oxford University about his new book "These Strange New Minds: How AI Learned to Talk and What It Means". AI learned to understand the world just by reading text - something scientists thought was impossible. You don't need to see a cat to know what one is; you can learn everything from words alone. This is "the most astonishing scientific discovery of the 21st century."

People are split: some refuse to call what AI does "thinking" even when it outperforms humans, while others believe if it acts intelligent, it is intelligent. Summerfield takes the middle ground - AI does something genuinely like human reasoning, but that doesn't make it human.

Sponsor messages:
========
Google Gemini: Google Gemini features Veo3, a state-of-the-art AI video generation model in the Gemini app. Sign up at https://gemini.google.com
Tufa AI Labs are hiring for ML Engineers and a Chief Scientist in Zurich/SF. They are top of the ARCv2 leaderboard! https://tufalabs.ai/
========

Prof. Christopher Summerfield
https://www.psy.ox.ac.uk/people/christopher-summerfield

These Strange New Minds: How AI Learned to Talk and What It Means
https://amzn.to/4e26BVa

Table of Contents:

Introduction & Setup
00:00:00 Superman 3 Metaphor - Humans Absorbed by Machines
00:02:01 Book Introduction & AI Debate Context
00:03:45 Sponsor Segments (Google Gemini, Tufa Labs)

Philosophical Foundations
00:04:48 The Fractured AI Discourse
00:08:21 Ancient Roots: Aristotle vs Plato (Empiricism vs Rationalism)
00:10:14 Historical AI: Symbolic Logic and Its Limits

The Language Revolution
00:12:11 ChatGPT as the Rubicon Moment
00:14:00 The Astonishing Discovery: Learning Reality from Words Alone
00:15:47 Equivalentists vs Exceptionalists Debate

Cognitive Science Perspectives
00:19:12 Functionalism and the Duck Test
00:21:48 Brain-AI Similarities and Computational Principles
00:24:53 Reconciling Chomsky: Evolution vs Learning
00:28:15 Lamarckian AI vs Darwinian Human Learning

The Reality of AI Capabilities
00:30:29 Anthropomorphism and the Clever Hans Effect
00:32:56 The Intentional Stance and Nature of Thinking
00:37:56 Three Major AI Worries: Agency, Personalization, Dynamics

Societal Risks and Complex Systems
00:37:56 AI Agents and Flash Crash Scenarios
00:42:50 Removing Frictions: The Lawfare Example
00:46:15 Gradual Disempowerment Theory
00:49:18 The Faustian Pact of Technology

Human Agency and Control
00:51:18 The Crisis of Authenticity
00:56:22 Psychology of Control vs Reward
01:00:21 Dopamine Hacking and Variable Reinforcement

Future Directions
01:02:27 Evolution as Goal-less Optimization
01:03:31 Open-Endedness and Creative Evolution
01:06:46 Writing, Creativity, and AI-Generated Content
01:08:18 Closing Remarks

REFS:

Academic References (Abbreviated)

Essential Books
"These Strange New Minds" - C. Summerfield [00:02:01] - Main discussion topic
"The Mind is Flat" - N. Chater [00:33:45] - Summerfield's favorite on cognitive illusions
"AI: A Guide for Thinking Humans" - M. Mitchell [00:04:58] - Host's previous favorite
"Principia Mathematica" - Russell & Whitehead [00:11:00] - Logic Theorist reference
"Syntactic Structures" - N. Chomsky (1957) [00:13:30] - Generative grammar foundation
"Why Greatness Cannot Be Planned" - Stanley & Lehman [01:04:00] - Open-ended evolution

Key Papers & Studies
"Gradual Disempowerment" - D. Duvenaud [00:46:45] - AI threat model
"Counterfeit People" - D. Dennett (Atlantic) [00:52:45] - AI societal risks
"Open-Endedness is Essential..." - DeepMind/Rocktäschel/Hughes [01:03:42]
Heider & Simmel (1944) [00:30:45] - Agency attribution to shapes
Whitehall Studies - M. Marmot [00:59:32] - Control and health outcomes
"Clever Hans" - O. Pfungst (1911) [00:31:47] - Animal intelligence illusion

Historical References

Jun 17, 2025 · 01:08:29
"Blurring Reality" - Chai's Social AI Platform (SPONSORED)

"Blurring Reality" - Chai's Social AI Platform - sponsored


This episode of MLST explores the groundbreaking work of Chai, a social AI platform that quietly built one of the world's largest AI companion ecosystems before ChatGPT's mainstream adoption. With over 10 million active users and just 13 engineers serving 2 trillion tokens per day, Chai discovered the massive appetite for AI companionship through serendipity while searching for product-market fit.


CHAI sponsored this show *because they want to hire amazing engineers* --


CAREER OPPORTUNITIES AT CHAI

Chai is actively hiring in Palo Alto with competitive compensation ($300K-$800K+ equity) for roles including AI Infrastructure Engineers, Software Engineers, Applied AI Researchers, and more. Fast-track qualification available for candidates with significant product launches, open source contributions, or entrepreneurial success.

https://www.chai-research.com/jobs/


The conversation with founder William Beauchamp and engineers Tom Lu and Nischay Dhankhar covers Chai's innovative technical approaches including reinforcement learning from human feedback (RLHF), model blending techniques that combine smaller models to outperform larger ones, and their unique infrastructure challenges running exaflop-class compute.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers in Zurich and SF.


Go to https://tufalabs.ai/

***


Key themes explored include:

- The ethics of AI engagement optimization and attention hacking

- Content moderation at scale with a lean engineering team

- The shift from AI as utility tool to AI as social companion

- How users form deep emotional bonds with artificial intelligence

- The broader implications of AI becoming a social medium


We also examine OpenAI's recent pivot toward companion AI with April's new GPT-4o, suggesting a fundamental shift in how we interact with artificial intelligence - from utility-focused tools to companion-like experiences that blur the lines between human and artificial intimacy.


The episode also covers Chai's unconventional approach to hiring only top-tier engineers, their bootstrap funding strategy focused on user revenue over VC funding, and their rapid experimentation culture where one in five experiments succeeds.


TOC:

00:00:00 - Introduction: Steve Jobs' AI Vision & Chai's Scale

00:04:02 - Chapter 1: Simulators - The Birth of Social AI

00:13:34 - Chapter 2: Engineering at Chai - RLHF & Model Blending

00:21:49 - Chapter 3: Social Impact of GenAI - Ethics & Safety

00:33:55 - Chapter 4: The Lean Machine - 13 Engineers, Millions of Users

00:42:38 - Chapter 5: GPT-4o Becoming a Companion - OpenAI's Pivot

00:50:10 - Chapter 6: What Comes Next - The Future of AI Intimacy


TRANSCRIPT: https://www.dropbox.com/scl/fi/yz2ewkzmwz9rbbturfbap/CHAI.pdf?rlkey=uuyk2nfhjzezucwdgntg5ubqb&dl=0

May 26, 2025 · 50:59
Google AlphaEvolve - Discovering new science (exclusive interview)

Today Google DeepMind released AlphaEvolve: a Gemini coding agent for algorithm discovery. It beat the famous Strassen algorithm for matrix multiplication, a record that had stood for 56 years. Google has been killing it recently. We had early access to the paper and interviewed the researchers behind the work.
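
For context on the record being discussed: Strassen's scheme multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8, and applying it recursively to a 4x4 matrix (treated as a 2x2 matrix of 2x2 blocks) costs 7 x 7 = 49. The TOC for this episode credits AlphaEvolve with 48. The sketch below illustrates only Strassen's 2x2 step in plain Python; the function names are ours, and it is not AlphaEvolve's algorithm.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using Strassen's 7 products (naive uses 8)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def strassen_multiplications(n):
    """Scalar multiplications used by recursive Strassen on n x n matrices (n a power of two)."""
    return 1 if n == 1 else 7 * strassen_multiplications(n // 2)


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(strassen_multiplications(4))  # 49
```

Saving a single multiplication matters because the saving compounds when the scheme is applied recursively to larger matrices.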


AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Authors: Alexander Novikov*, Ngân Vũ*, Marvin Eisenberger*, Emilien Dupont*, Po-Sen Huang*, Adam Zsolt Wagner*, Sergey Shirobokov*, Borislav Kozlovskii*, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, Matej Balog*

(* indicates equal contribution or special designation, if defined elsewhere)


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


AlphaEvolve works like a very smart, tireless programmer. It uses powerful AI language models (like Gemini) to generate ideas for computer code. Then, it uses an "evolutionary" process – like survival of the fittest for programs. It tries out many different program ideas, automatically tests how well they solve a problem, and then uses the best ones to inspire new, even better programs.
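
The generate, test, and select loop described above can be sketched in a few lines. This is a toy illustration under loud assumptions: the "program" is just a list of integers, `propose` is a random mutation standing in for the language-model step, and `evaluate` is a made-up scoring function. None of these names come from AlphaEvolve itself.

```python
import random


def evaluate(candidate, target):
    """Automatic scorer: higher is better (negative squared error to the target)."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def propose(parent, rng):
    """Stand-in for the LLM step: return a slightly modified copy of a parent."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child


def evolve(target, generations=200, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [[0] * len(target) for _ in range(population_size)]
    for _ in range(generations):
        # Generate new candidates inspired by the current population.
        children = [propose(rng.choice(population), rng) for _ in range(population_size)]
        # Score everything and keep only the fittest candidates.
        pool = population + children
        pool.sort(key=lambda c: evaluate(c, target), reverse=True)
        population = pool[:population_size]
    return population[0]


target = [3, -2, 5]
best = evolve(target)
print(best, evaluate(best, target))
```

Because the survivors of each round are kept in the pool, the best score never gets worse from one generation to the next; the quality of the proposals only affects how fast it improves.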


Beyond this mathematical breakthrough, AlphaEvolve has already been used to improve real-world systems at Google, such as making their massive data centers run more efficiently and even speeding up the training of the AI models that power AlphaEvolve itself. The discussion also covers how humans work with AlphaEvolve, the challenges of making AI discover things, and the exciting future of AI helping scientists make new discoveries.


In short, AlphaEvolve is a powerful new AI tool that can invent new algorithms and solve complex problems, showing how AI can be a creative partner in science and engineering.


Guests:

Matej Balog: https://x.com/matejbalog

Alexander Novikov: https://x.com/SashaVNovikov


REFS:

MAP Elites [Jean-Baptiste Mouret, Jeff Clune]

https://arxiv.org/abs/1504.04909


FunSearch [Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli & Alhussein Fawzi]

https://www.nature.com/articles/s41586-023-06924-6


TOC:


[00:00:00] Introduction: AlphaEvolve's Breakthroughs, DeepMind's Lineage, and Real-World Impact

[00:12:06] Introducing AlphaEvolve: Concept, Evolutionary Algorithms, and Architecture

[00:16:56] Search Challenges: The Halting Problem and Enabling Creative Leaps

[00:23:20] Knowledge Augmentation: Self-Generated Data, Meta-Prompting, and Library Learning

[00:29:08] Matrix Multiplication Breakthrough: From Strassen to AlphaEvolve's 48 Multiplications

[00:39:11] Problem Representation: Direct Solutions, Constructors, and Search Algorithms

[00:46:06] Developer Reflections: Surprising Outcomes and Superiority over Simple LLM Sampling

[00:51:42] Algorithmic Improvement: Hill Climbing, Program Synthesis, and Intelligibility

[01:00:24] Real-World Application: Complex Evaluations and Robotics

[01:05:39] Role of LLMs & Future: Advanced Models, Recursive Self-Improvement, and Human-AI Collaboration

[01:11:22] Resource Considerations: Compute Costs of AlphaEvolve


This is a trial of posting videos on Spotify, thoughts? Email me or chat in our Discord

May 14, 2025 · 01:13:58
Prof. Randall Balestriero - LLMs without pretraining and SSL

Randall Balestriero joins the show to discuss some counterintuitive findings in AI. He shares research showing that huge language models, even when started from scratch (randomly initialized) without massive pre-training, can learn specific tasks like sentiment analysis surprisingly well, train stably, and avoid severe overfitting, sometimes matching the performance of costly pre-trained models. This raises questions about when giant pre-training efforts are truly worth it.


He also talks about how self-supervised learning (where models learn from data structure itself) and traditional supervised learning (using labeled data) are fundamentally similar, allowing researchers to apply decades of supervised learning theory to improve newer self-supervised methods.


Finally, Randall touches on fairness in AI models used for Earth data (like climate prediction), revealing that these models can be biased, performing poorly in specific locations like islands or coastlines even if they seem accurate overall, which has important implications for policy decisions based on this data.
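
To see how a model can look accurate overall while failing in specific places, it helps to compare the overall mean error with the mean error per region. The snippet below is a minimal sketch with made-up numbers and hypothetical names (`error_by_region`, the `"inland"` and `"island"` labels); it is not the metric used in the paper discussed in the episode.

```python
from collections import defaultdict


def error_by_region(records):
    """records: iterable of (region, absolute_error) pairs.

    Returns (overall_mean_error, {region: mean_error}).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, err in records:
        totals[region] += err
        counts[region] += 1
    overall = sum(totals.values()) / sum(counts.values())
    per_region = {r: totals[r] / counts[r] for r in totals}
    return overall, per_region


# Hypothetical data: many inland points with small error, few island points with large error.
records = [("inland", 0.1)] * 95 + [("island", 2.0)] * 5
overall, per_region = error_by_region(records)
print(round(overall, 3), {r: round(e, 3) for r, e in per_region.items()})
```

In this made-up case the overall average is dominated by the numerous inland points, so the much larger island error is only visible once the results are broken down by region.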


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + SHOWNOTES:

https://www.dropbox.com/scl/fi/n7yev71nsjso71jyjz1fy/RANDALLNEURIPS.pdf?rlkey=0dn4injp1sc4ts8njwf3wfmxv&dl=0


TOC:

1. Model Training Efficiency and Scale

[00:00:00] 1.1 Training Stability of Large Models on Small Datasets

[00:04:09] 1.2 Pre-training vs Random Initialization Performance Comparison

[00:07:58] 1.3 Task-Specific Models vs General LLMs Efficiency


2. Learning Paradigms and Data Distribution

[00:10:35] 2.1 Fair Language Model Paradox and Token Frequency Issues

[00:12:02] 2.2 Pre-training vs Single-task Learning Spectrum

[00:16:04] 2.3 Theoretical Equivalence of Supervised and Self-supervised Learning

[00:19:40] 2.4 Self-Supervised Learning and Supervised Learning Relationships

[00:21:25] 2.5 SSL Objectives and Heavy-tailed Data Distribution Challenges


3. Geographic Representation in ML Systems

[00:25:20] 3.1 Geographic Bias in Earth Data Models and Neural Representations

[00:28:10] 3.2 Mathematical Limitations and Model Improvements

[00:30:24] 3.3 Data Quality and Geographic Bias in ML Datasets


REFS:

[00:01:40] Research on training large language models from scratch on small datasets, Randall Balestriero et al.

https://openreview.net/forum?id=wYGBWOjq1Q

[00:10:35] The Fair Language Model Paradox (2024), Andrea Pinto, Tomer Galanti, Randall Balestriero

https://arxiv.org/abs/2410.11985

[00:12:20] Muppet: Massive Multi-task Representations with Pre-Finetuning (2021), Armen Aghajanyan et al.

https://arxiv.org/abs/2101.11038

[00:14:30] Dissociating language and thought in large language models (2023), Kyle Mahowald et al.

https://arxiv.org/abs/2301.06627

[00:16:05] The Birth of Self-Supervised Learning: A Supervised Theory, Randall Balestriero et al.

https://openreview.net/forum?id=NhYAjAAdQT

[00:21:25] VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, Adrien Bardes, Jean Ponce, Yann LeCun

https://arxiv.org/abs/2105.04906

[00:25:20] No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data (2025), Daniel Cai, Randall Balestriero, et al.

https://arxiv.org/abs/2502.06831

[00:33:45] Mark Ibrahim et al.'s work on geographic bias in computer vision datasets, Mark Ibrahim

https://arxiv.org/pdf/2304.12210

Apr 23, 2025 · 34:31
How Machines Learn to Ignore the Noise (Kevin Ellis + Zenna Tavares)

Prof. Kevin Ellis and Dr. Zenna Tavares talk about making AI smarter, like humans. They want AI to learn from just a little bit of information by actively trying things out, not just by looking at tons of data.


They discuss two main ways AI can "think": one way is like following specific rules or steps (like a computer program), and the other is more intuitive, like guessing based on patterns (like modern AI often does). They found combining both methods works well for solving complex puzzles like ARC.


A key idea is "compositionality" - building big ideas from small ones, like LEGOs. This is powerful but can also be overwhelming. Another important idea is "abstraction" - understanding things simply, without getting lost in details, and knowing there are different levels of understanding.
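
Both sides of compositionality, its power and its combinatorial cost, show up even in a toy program search. The sketch below assumes a made-up set of four list primitives and brute-force enumeration; it is an illustration only, not DreamCoder or any system from the episode.

```python
from itertools import product

# A tiny, made-up set of building blocks that each map a list to a list.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "tail": lambda xs: xs[1:],
}


def run(program, xs):
    """Apply the named primitives left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def search(examples, max_depth=3):
    """Return the first (shortest) program consistent with all input/output examples."""
    tried = 0
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            tried += 1
            if all(run(program, i) == o for i, o in examples):
                return program, tried
    return None, tried


examples = [([3, 1, 2], [6, 4, 2]), ([5, 4], [10, 8])]
program, tried = search(examples)
print(program, tried)
```

Two examples are enough to pin down a three-step program here, which is the appeal. The cost is that with 4 primitives there are 4^d programs of length d, so the search space grows exponentially with program length.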


Ultimately, they believe the best AI will need to explore, experiment, and build models of the world, much like humans do when learning something new.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT:

https://www.dropbox.com/scl/fi/3ngggvhb3tnemw879er5y/BASIS.pdf?rlkey=lr2zbj3317mex1q5l0c2rsk0h&dl=0

Zenna Tavares:

http://www.zenna.org/

Kevin Ellis:

https://www.cs.cornell.edu/~ellisk/


TOC:

1. Compositionality and Learning Foundations

[00:00:00] 1.1 Compositional Search and Learning Challenges

[00:03:55] 1.2 Bayesian Learning and World Models

[00:12:05] 1.3 Programming Languages and Compositionality Trade-offs

[00:15:35] 1.4 Inductive vs Transductive Approaches in AI Systems


2. Neural-Symbolic Program Synthesis

[00:27:20] 2.1 Integration of LLMs with Traditional Programming and Meta-Programming

[00:30:43] 2.2 Wake-Sleep Learning and DreamCoder Architecture

[00:38:26] 2.3 Program Synthesis from Interactions and Hidden State Inference

[00:41:36] 2.4 Abstraction Mechanisms and Resource Rationality

[00:48:38] 2.5 Inductive Biases and Causal Abstraction in AI Systems


3. Abstract Reasoning Systems

[00:52:10] 3.1 Abstract Concepts and Grid-Based Transformations in ARC

[00:56:08] 3.2 Induction vs Transduction Approaches in Abstract Reasoning

[00:59:12] 3.3 ARC Limitations and Interactive Learning Extensions

[01:06:30] 3.4 Wake-Sleep Program Learning and Hybrid Approaches

[01:11:37] 3.5 Project MARA and Future Research Directions


REFS:

[00:00:25] DreamCoder, Kevin Ellis et al.

https://arxiv.org/abs/2006.08381


[00:01:10] Mind Your Step, Ryan Liu et al.

https://arxiv.org/abs/2410.21333


[00:06:05] Bayesian inference, Griffiths, T. L., Kemp, C., & Tenenbaum, J. B.

https://psycnet.apa.org/record/2008-06911-003


[00:13:00] Induction and Transduction, Wen-Ding Li, Zenna Tavares, Yewen Pu, Kevin Ellis

https://arxiv.org/abs/2411.02272


[00:23:15] Neurosymbolic AI, Garcez, Artur d'Avila et al.

https://arxiv.org/abs/2012.05876


[00:33:50] Induction and Transduction (II), Wen-Ding Li, Kevin Ellis et al.

https://arxiv.org/abs/2411.02272


[00:38:35] ARC, François Chollet

https://arxiv.org/abs/1911.01547


[00:39:20] Causal Reactive Programs, Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares

http://www.zenna.org/publications/autumn2022.pdf


[00:42:50] MuZero, Julian Schrittwieser et al.

http://arxiv.org/pdf/1911.08265


[00:43:20] VisualPredicator, Yichao Liang

https://arxiv.org/abs/2410.23156


[00:48:55] Bayesian models of cognition, Joshua B. Tenenbaum

https://mitpress.mit.edu/9780262049412/bayesian-models-of-cognition/


[00:49:30] The Bitter Lesson, Rich Sutton

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


[01:06:35] Program induction, Kevin Ellis, Wen-Ding Li

https://arxiv.org/pdf/2411.02272


[01:06:50] DreamCoder (II), Kevin Ellis et al.

https://arxiv.org/abs/2006.08381


[01:11:55] Project MARA, Zenna Tavares, Kevin Ellis

https://www.basis.ai/blog/mara/

Apr 08, 2025 · 01:16:55
Eiso Kant (CTO poolside) - Superhuman Coding Is Coming!

Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback, which is an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts that human-level AI in knowledge work could be achieved within 18-36 months, and outlines poolside's vision to dramatically increase software development productivity and accessibility.
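
As a rough illustration of what "code execution feedback" can mean as a training signal, the sketch below runs a candidate solution against test cases and returns the fraction passed as a reward. Everything here is an assumption made for illustration (the `execution_reward` name, the `solution` entry point, the reward definition); it is not poolside's pipeline.

```python
def execution_reward(candidate_source, test_cases, func_name="solution"):
    """Fraction of test cases the candidate passes; 0.0 if it fails to load.

    Note: exec() runs arbitrary code. A real system would sandbox this step.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0

    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # A crash on this case counts as a failure.
    return passed / len(test_cases)


tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
good = "def solution(a, b):\n    return a + b\n"
buggy = "def solution(a, b):\n    return a - b\n"
print(execution_reward(good, tests), execution_reward(buggy, tests))
```

The appeal of a signal like this is that it is computed by running the code rather than by asking a human or another model to judge it.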


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


Eiso Kant:

https://x.com/eisokant

https://poolside.ai/


TRANSCRIPT:

https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0


TOC:

1. Foundation Models and AI Strategy

[00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development

[00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision

[00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs


2. Reinforcement Learning and Model Economics

[00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches

[00:22:06] 2.2 Model Economics and Experimental Optimization


3. Enterprise AI Implementation

[00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure

[00:26:00] 3.2 Enterprise-First Business Model and Market Focus

[00:27:05] 3.3 Foundation Models and AGI Development Approach

[00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements


4. LLM Architecture and Performance

[00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization

[00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs

[00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons

[00:43:26] 4.4 Balancing Creativity and Determinism in AI Models

[00:50:01] 4.5 AI-Assisted Software Development Evolution


5. AI Systems Engineering and Scalability

[00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges

[00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends

[01:01:25] 5.3 Distributed Systems and Engineering Complexity

[01:01:50] 5.4 GenAI Architecture and Scalability Patterns

[01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation


6. AI Safety and Future Capabilities

[01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches

[01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems

[01:16:27] 6.3 AI vs Human Capabilities in Software Development

[01:33:45] 6.4 Enterprise Deployment and Security Architecture


CORE REFS (see shownotes for URLs/more refs):


[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (Key finding on synthetic data risk)


[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. (Seminal NLP technique)


[00:22:15] OpenAI O3 model's breakthrough performance on ARC Prize Challenge, OpenAI (Significant AI reasoning benchmark achievement)


[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (Influential AI definition/philosophy)


[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (Details on a major new model)


[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (Key paper on LLM scaling)


[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (Influential "Bitter Lesson" perspective)

Apr 02, 2025 · 01:36:28
The Compendium - Connor Leahy and Gabriel Alfour

Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination", where a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + REFS + NOTES:

https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0


https://www.thecompendium.ai/

https://en.wikipedia.org/wiki/Connor_Leahy

https://www.conjecture.dev/about

https://substack.com/@gabecc


TOC:

1. AI Intelligence and Safety Fundamentals

[00:00:00] 1.1 Understanding Intelligence and AI Capabilities

[00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges

[00:10:18] 1.3 Human vs Animal Intelligence Debate

[00:18:00] 1.4 AI Regulation and Risk Assessment Approaches

[00:26:14] 1.5 Competing AI Development Ideologies


2. Economic and Social Impact

[00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios

[00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics

[00:37:40] 2.3 Ethical Frameworks and AI Governance Debates

[00:40:52] 2.4 AI Alignment Evolution and Technical Challenges


3. Technical Governance Framework

[00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness

[00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models

[00:57:35] 3.3 Limitations of Current Boundedness Approaches

[00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions


4. Democratic Implementation and Coordination

[00:59:20] 4.1 Governance Design and Measurement Challenges

[01:00:10] 4.2 Democratic Institutions and Experimental Governance

[01:14:10] 4.3 Political Engagement and AI Safety Advocacy

[01:25:30] 4.4 Practical AI Safety Measures and International Coordination


CORE REFS:

[00:01:45] The Compendium (2023), Leahy et al.

https://pdf.thecompendium.ai/the_compendium.pdf


[00:06:50] Geoffrey Hinton Leaves Google, BBC News

https://www.bbc.com/news/world-us-canada-65452940


[00:10:00] ARC-AGI, Chollet

https://arcprize.org/arc-agi


[00:13:25] A Brief History of Intelligence, Bennett

https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343


[00:25:35] Statement on AI Risk, Center for AI Safety

https://www.safe.ai/work/statement-on-ai-risk


[00:26:15] Machines of Loving Grace, Amodei

https://darioamodei.com/machines-of-loving-grace


[00:26:35] The Techno-Optimist Manifesto, Andreessen

https://a16z.com/the-techno-optimist-manifesto/


[00:31:55] Techno-Feudalism, Varoufakis

https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270


[00:42:40] Introducing Superalignment, OpenAI

https://openai.com/index/introducing-superalignment/


[00:47:20] Three Laws of Robotics, Asimov

https://www.britannica.com/topic/Three-Laws-of-Robotics


[00:50:00] Symbolic AI (GOFAI), Haugeland

https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence


[00:52:30] Intent Alignment, Christiano

https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety


[00:55:10] Large Language Model Alignment: A Survey, Jiang et al.

http://arxiv.org/pdf/2309.15025


[00:55:40] Constitutional Checks and Balances, Bok

https://plato.stanford.edu/entries/montesquieu/

Mar 30, 2025 · 01:37:10
ARC Prize v2 Launch! (Francois Chollet and Mike Knoop)

We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the challenges have been calibrated with humans such that at least 2 humans could solve each task in a reasonable time, but also adversarially selected so that frontier reasoning models can't solve them. The best LLMs today get negligible performance on this challenge.


https://arcprize.org/


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT:

https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0


TOC:

1. ARC v2 Core Design & Objectives

[00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture

[00:03:16] 1.2 Test-Time Optimization and AGI Assessment

[00:06:24] 1.3 Human-AI Capability Analysis

[00:13:02] 1.4 OpenAI o3 Initial Performance Results


2. ARC Technical Evolution

[00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements

[00:21:12] 2.2 Human Validation Methodology

[00:26:05] 2.3 Task Design and Gaming Prevention

[00:29:11] 2.4 Intelligence Measurement Framework


3. O3 Performance & Future Challenges

[00:38:50] 3.1 O3 Comprehensive Performance Analysis

[00:43:40] 3.2 System Limitations and Failure Modes

[00:49:30] 3.3 Program Synthesis Applications

[00:53:00] 3.4 Future Development Roadmap


REFS:

[00:00:15] On the Measure of Intelligence, François Chollet

https://arxiv.org/abs/1911.01547

[00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop

https://arcprize.org/

[00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team

https://arcprize.org/blog/oai-o3-pub-breakthrough

[00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.

https://arxiv.org/abs/2201.11903

[00:21:45] ARC-v2 benchmark tasks, Mike Knoop

https://arcprize.org/blog/introducing-arc-agi-public-leaderboard

[00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.

https://arxiv.org/html/2412.04604v2

[00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt

https://arxiv.org/abs/2412.04604

[00:48:55] The Bitter Lesson, Rich Sutton

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[00:53:30] Decoding strategies in neural text generation, Sina Zarrieß

https://www.mdpi.com/2078-2489/12/9/355/pdf

Mar 24, 2025 · 54:15
Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman)

Mohamed Osman joins to discuss MindsAI's highest scoring entry to the ARC challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and flexibility of the network.
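
One simple way to picture an ensemble voting step for ARC-style outputs is a majority vote over candidate grids. The sketch below assumes the candidates already exist (for example, from several runs of a fine-tuned model) and just picks the most common one; the `vote` function and the sample grids are hypothetical and are not MindsAI's actual mechanism.

```python
from collections import Counter


def vote(candidate_grids):
    """Return the most frequently predicted grid and its share of the votes."""
    # Grids are lists of lists; convert to tuples so they can be counted.
    keys = [tuple(tuple(row) for row in grid) for grid in candidate_grids]
    winner, count = Counter(keys).most_common(1)[0]
    return [list(row) for row in winner], count / len(keys)


predictions = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
]
grid, share = vote(predictions)
print(grid, share)
```

The vote share doubles as a crude confidence signal: a grid that most candidates agree on is more likely to be right than one predicted only once.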


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0


Mohamed Osman (Tufa Labs)

https://x.com/MohamedOsmanML


Jack Cole (Tufa Labs)

https://x.com/MindsAI_Jack


How and why deep learning for ARC paper:

https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf


TOC:

1. Abstract Reasoning Foundations

[00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview

[00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning

[00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture

[00:20:26] 1.4 Technical Implementation with Long T5 Model


2. ARC Solution Architectures

[00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions

[00:27:54] 2.2 Model Generalization and Function Generation Challenges

[00:32:53] 2.3 Input Representation and VLM Limitations

[00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration

[00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches


3. Advanced Systems Integration

[00:43:00] 3.1 DreamCoder Evolution and LLM Integration

[00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs

[00:54:15] 3.3 ARC v2 Development and Performance Scaling

[00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations

[01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution


REFS:

[00:01:32] Original ARC challenge paper, François Chollet

https://arxiv.org/abs/1911.01547


[00:06:55] DreamCoder, Kevin Ellis et al.

https://arxiv.org/abs/2006.08381


[00:12:50] Deep Learning with Python, François Chollet

https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438


[00:13:35] Deep Learning with Python, François Chollet

https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438


[00:13:35] Influence of pretraining data for reasoning, Laura Ruis

https://arxiv.org/abs/2411.12580


[00:17:50] Latent Program Networks, Clement Bonnet

https://arxiv.org/html/2411.08706v1


[00:20:50] T5, Colin Raffel et al.

https://arxiv.org/abs/1910.10683


[00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.

https://arxiv.org/abs/2411.02272


[00:34:15] Six finger problem, Chen et al.

https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf


[00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B


[00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.

https://arxiv.org/html/2412.04604v2


[00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis

https://arxiv.org/html/2503.15540


[00:54:25] Abstraction and Reasoning Corpus, François Chollet

https://github.com/fchollet/ARC-AGI


[00:57:10] o3 breakthrough on ARC-AGI, OpenAI

https://arcprize.org/


[00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell

https://arxiv.org/abs/2305.07141


[01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf

Mar 22, 2025 | 01:03:36
GSMSymbolic paper - Iman Mirzadeh (Apple)

Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.


TRANSCRIPT + RESEARCH:

https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0


TOC:

1. Intelligence vs Achievement in AI Systems

[00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems

[00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess

[00:10:10] 1.3 Language Models and Distribution Learning Limitations

[00:14:47] 1.4 Research Methodology and Theoretical Frameworks


2. Intelligence Measurement and Learning

[00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning

[00:29:00] 2.2 Intelligence Definition and Measurement Approaches

[00:34:35] 2.3 Learning Capabilities and Agency in AI Systems

[00:39:26] 2.4 Abstract Reasoning and Symbol Understanding


3. LLM Performance and Evaluation

[00:47:15] 3.1 Scaling Laws and Fundamental Limitations

[00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks

[00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs

[01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment


REFS:

[00:01:00] AlphaZero chess AI system, Silver et al.

https://arxiv.org/abs/1712.01815

[00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan

https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184

[00:11:35] Cross-entropy loss in language modeling, Voita

http://lena-voita.github.io/nlp_course/language_modeling.html

[00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.

https://arxiv.org/abs/2410.05229

[00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn

https://www.sciencedirect.com/science/article/pii/001002779090014B

[00:28:55] Brain-to-body mass ratio scaling laws, Sutskever

https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training

[00:29:40] On the Measure of Intelligence, Chollet

https://arxiv.org/abs/1911.01547

[00:33:30] On definition of intelligence, Gignac et al.

https://www.sciencedirect.com/science/article/pii/S0160289624000266

[00:35:30] Defining intelligence, Wang

https://cis.temple.edu/~wangp/papers.html

[00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene

https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884

[00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander

https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

[00:43:15] Chain-of-thought prompting, Wei et al.

https://arxiv.org/abs/2201.11903

[00:47:20] Test-time scaling laws in machine learning, Brown

https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058

[00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.

https://arxiv.org/abs/2001.08361

[00:55:15] Tensor product variable binding, Smolensky

https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

[01:08:45] GSM-8K dataset, OpenAI

https://huggingface.co/datasets/openai/gsm8k

Mar 19, 2025 | 01:11:24
Reasoning, Robustness, and Human Feedback in AI - Max Bartolo (Cohere)

Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.


Max Bartolo (Cohere):

https://www.maxbartolo.com/

https://cohere.com/command


TRANSCRIPT:

https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0


TOC:

1. Model Reasoning and Verification

[00:00:00] 1.1 Model Consistency and Reasoning Verification

[00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis

[00:10:28] 1.3 AI Application Development and Model Deployment

[00:14:24] 1.4 AI Alignment and Human Feedback Limitations


2. Evaluation and Bias Assessment

[00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment

[00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior

[00:32:43] 2.3 Adversarial Examples and Model Robustness


3. Benchmarking Systems and Methods

[00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches

[00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics

[00:50:33] 3.3 Evolution of Model Benchmarking Methods

[00:51:15] 3.4 Hierarchical Capability Testing Framework

[00:52:35] 3.5 Benchmark Platforms and Tools


4. Model Architecture and Performance

[00:55:15] 4.1 Cohere's Model Development Process

[01:00:26] 4.2 Model Quantization and Performance Evaluation

[01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards

[01:08:27] 4.4 Training Progression and Technical Challenges


5. Future Directions and Challenges

[01:13:48] 5.1 Context Window Evolution and Trade-offs

[01:22:47] 5.2 Enterprise Applications and Future Challenges


REFS:

[00:03:10] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models, Laura Ruis, Max Bartolo et al.

https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20

[00:04:15] Influence functions in machine learning, Koh & Liang

https://arxiv.org/abs/1703.04730

[00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.

https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf

[00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

[00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI

https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

[00:13:30] OpenInterpreter

https://github.com/KillianLucas/open-interpreter

[00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom

https://arxiv.org/abs/2309.16349

[00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.

https://arxiv.org/abs/2404.16019

[00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry

https://arxiv.org/abs/1905.02175

[00:43:00] DynaBench platform paper, Douwe Kiela et al.

https://aclanthology.org/2021.naacl-main.324.pdf

[00:50:15] Sara Hooker's work on compute limitations, Sara Hooker

https://arxiv.org/html/2407.05694v1

[00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.

https://arxiv.org/abs/2207.10062

[01:04:35] DROP, Dheeru Dua et al.

https://arxiv.org/abs/1903.00161

[01:07:05] GSM8k, Cobbe et al.

https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

[01:09:30] ARC, François Chollet

https://github.com/fchollet/ARC-AGI

[01:15:50] Command A, Cohere

https://cohere.com/blog/command-a

[01:22:55] Enterprise search using LLMs, Cohere

https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers

Mar 18, 2025 | 01:23:12
Tau Language: The Software Synthesis Future (sponsored)

This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + RESEARCH:

https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0


Tau:

https://tau.net/

Tau Language:

https://tau.ai/tau-language/

Research:

https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf


TOC:

1. Machine Learning Foundations and Limitations

[00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory

[00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning

[00:08:57] 1.3 Language, Reality, and AI System Design

[00:12:58] 1.4 Program Synthesis and Formal Verification Approaches


2. Logical Programming Architecture

[00:31:55] 2.1 Safe AI Development Requirements

[00:32:05] 2.2 Self-Referential Language Architecture

[00:32:50] 2.3 Boolean Algebra and Logical Foundations

[00:37:52] 2.4 SAT Solvers and Complexity Challenges

[00:44:30] 2.5 Program Synthesis and Specification

[00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra

[00:56:05] 2.7 Tau Language Implementation and User Control


3. Blockchain-Based Software Governance

[01:09:10] 3.1 User Control and Software Governance Mechanisms

[01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities

[01:21:43] 3.3 Development Status and Token Implementation

[01:24:52] 3.4 Consensus Building and Opinion Mapping System

[01:35:29] 3.5 Automation and Financial Applications


CORE REFS (more in pinned comment):

[00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant

https://en.wikipedia.org/wiki/Probably_approximately_correct_learning

[00:06:10] Boolean Satisfiability Problem (SAT), Various

https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

[00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup

https://plato.stanford.edu/entries/epistemology/

[00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein

https://plato.stanford.edu/entries/wittgenstein/

[00:21:25] Boolean algebras, Ohad Asor

https://tau.net/tau-language-research/

[00:26:10] The Halting Problem

https://plato.stanford.edu/entries/turing-machine/#HaltProb

[00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente

https://plato.stanford.edu/entries/tarski/

[00:41:50] DPLL

https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf

[00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski

https://plato.stanford.edu/entries/tarski-truth/

[00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk

https://plato.stanford.edu/entries/boolalg-math/

[01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson

https://plato.stanford.edu/entries/logic-belief-revision/

[01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler

https://people.math.wisc.edu/~hkeisler/random.pdf

[01:08:35] Quantifier elimination in Tau language specification, Ohad Asor

https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

[01:11:50] Tau Net blockchain platform

https://tau.net/

[01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract

https://tau.net/Whitepaper.pdf

Mar 12, 2025 | 01:41:19
John Palazza - Vice President of Global Sales @ CentML (sponsored)

John Palazza from CentML joins us in this sponsored interview to discuss the critical importance of infrastructure optimization in the age of Large Language Models and Generative AI. We explore how enterprises can transition from the innovation phase to production and scale, highlighting the significance of efficient GPU utilization and cost management. The conversation covers the open-source versus proprietary model debate, the rise of AI agents, and the need for platform independence to avoid vendor lock-in, as well as emerging trends in AI infrastructure and the pivotal role of strategic partnerships.


SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/


***


TRANSCRIPT:

https://www.dropbox.com/scl/fi/dnjsygrgdgq5ng5fdlfjg/JOHNPALAZZA.pdf?rlkey=hl9wyydi9mj077rbg5acdmo3a&dl=0


John Palazza:

Vice President of Global Sales @ CentML

https://www.linkedin.com/in/john-p-b34655/


TOC:

1. Enterprise AI Organization and Strategy

[00:00:00] 1.1 Organizational Structure and ML Ownership

[00:02:59] 1.2 Infrastructure Efficiency and GPU Utilization

[00:07:59] 1.3 Platform Centralization vs Team Autonomy

[00:11:32] 1.4 Enterprise AI Adoption Strategy and Leadership


2. MLOps Infrastructure and Resource Management

[00:15:08] 2.1 Technology Evolution and Enterprise Integration

[00:19:10] 2.2 Enterprise MLOps Platform Development

[00:22:15] 2.3 AI Interface Evolution and Agent-Based Solutions

[00:25:47] 2.4 CentML's Infrastructure Solutions

[00:30:00] 2.5 Workload Abstraction and Resource Allocation


3. LLM Infrastructure Optimization and Independence

[00:33:10] 3.1 GPU Optimization and Cost Efficiency

[00:36:47] 3.2 AI Efficiency and Innovation Challenges

[00:41:40] 3.3 Cloud Provider Strategy and Infrastructure Control

[00:46:52] 3.4 Platform Independence and Vendor Lock-in

[00:50:53] 3.5 Technical Innovation and Growth Strategy


REFS:

[00:01:25] Apple Acquires GraphLab, Apple Inc.

https://techcrunch.com/2016/08/05/apple-acquires-turi-a-machine-learning-company/

[00:03:50] Bain Tech Report 2024, Bain & Company

https://www.bain.com/insights/topics/technology-report/

[00:04:50] PaaS vs IaaS Efficiency, Gartner

https://www.gartner.com/en/newsroom/press-releases/2024-11-19-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-total-723-billion-dollars-in-2025

[00:14:55] Fashion Quote, Oscar Wilde

https://www.amazon.com/Complete-Works-Oscar-Wilde-Collins/dp/0007144369

[00:15:30] PointCast Network, PointCast Inc.

https://en.wikipedia.org/wiki/Push_technology

[00:18:05] AI Bain Report, Bain & Company

https://www.bain.com/insights/how-generative-ai-changes-the-game-in-tech-services-tech-report-2024/

[00:20:40] Uber Michelangelo, Uber Engineering Team

https://www.uber.com/en-SE/blog/michelangelo-machine-learning-platform/

[00:20:50] Algorithmia Acquisition, DataRobot

https://www.datarobot.com/newsroom/press/datarobot-is-acquiring-algorithmia-enhancing-leading-mlops-architecture-for-the-enterprise/

[00:22:55] Fine Tuning vs RAG, Heydar Soudani, Evangelos Kanoulas & Faegheh Hasibi.

https://arxiv.org/html/2403.01432v2

[00:24:40] LLM Agent Survey, Lei Wang et al.

https://arxiv.org/abs/2308.11432

[00:26:30] CentML CServe, CentML

https://docs.centml.ai/apps/llm

[00:29:15] CentML Snowflake, Snowflake

https://www.snowflake.com/en/engineering-blog/optimize-llms-with-llama-snowflake-ai-stack/

[00:30:15] NVIDIA H100 GPU, NVIDIA

https://www.nvidia.com/en-us/data-center/h100/

[00:33:25] CentML's 60% savings, CentML

https://centml.ai/platform/

Mar 10, 2025 | 54:50
Transformers Need Glasses! - Federico Barbero

Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!".


Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.


Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making.


But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.
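
The point about softmax limiting sharp decision-making can be illustrated with a small numerical sketch. It assumes a deliberately simplified setting, which is not taken from the episode: one "important" token whose logit leads all others by a fixed gap, with the remaining logits equal. The helper names `softmax` and `max_attention_weight` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_attention_weight(n_tokens, gap=5.0):
    """Largest softmax weight when one logit leads the rest by `gap`."""
    logits = [gap] + [0.0] * (n_tokens - 1)
    return max(softmax(logits))

# With a fixed logit gap, the winning token's share shrinks as the
# sequence grows, so attention cannot stay sharp on one token.
for n in (10, 100, 1000, 10000):
    print(n, round(max_attention_weight(n), 4))
```

Under this assumption the largest weight is exp(gap) / (exp(gap) + n - 1), which tends to zero as the number of tokens n grows while the gap stays bounded.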


https://federicobarbero.com/


TRANSCRIPT + RESEARCH:

https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0


TOC:

1. Transformer Limitations: Token Detection & Representation

[00:00:00] 1.1 Transformers fail at single token detection

[00:02:45] 1.2 Representation collapse in transformers

[00:03:21] 1.3 Experiment: LLMs fail at copying last tokens

[00:18:00] 1.4 Attention sharpness limitations in transformers


2. Transformer Limitations: Information Flow & Quantization

[00:18:50] 2.1 Unidirectional information mixing

[00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers

[00:21:50] 2.3 Diagonal attention heads as expensive no-ops in Llama/Gemma

[00:27:14] 2.4 Sequence entropy affects transformer model distinguishability

[00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse

[00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms


3. Transformers and the Nature of Reasoning

[00:40:30] 3.1 Turing completeness conditions in transformers

[00:43:23] 3.2 Transformers struggle with sequential tasks

[00:45:50] 3.3 Windowed attention as solution to information compression

[00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning

[01:00:35] 3.5 Epistemic foraging introduced


REFS:

[00:01:05] Transformers Need Glasses!, Barbero et al.

https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf


[00:05:30] Softmax is Not Enough, Veličković et al.

https://arxiv.org/abs/2410.01104


[00:11:30] Adv Alg Lecture 15, Chawla

https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf


[00:15:05] Graph Attention Networks, Veličković

https://arxiv.org/abs/1710.10903


[00:19:15] Extract Training Data, Carlini et al.

https://arxiv.org/pdf/2311.17035


[00:31:30] 1-bit LLMs, Ma et al.

https://arxiv.org/abs/2402.17764


[00:38:35] LLMs Solve Math, Nikankin et al.

https://arxiv.org/html/2410.21272v1


[00:38:45] Subitizing, Railo

https://link.springer.com/10.1007/978-1-4419-1428-6_578


[00:43:25] NN & Chomsky Hierarchy, Delétang et al.

https://arxiv.org/abs/2207.02098


[00:51:05] Measure of Intelligence, Chollet

https://arxiv.org/abs/1911.01547


[00:52:10] AlphaZero, Silver et al.

https://pubmed.ncbi.nlm.nih.gov/30523106/


[00:55:10] Golden Gate Claude, Anthropic

https://www.anthropic.com/news/golden-gate-claude


[00:56:40] Chess Positions, Chase & Simon

https://www.sciencedirect.com/science/article/abs/pii/0010028573900042


[01:00:35] Epistemic Foraging, Friston

https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full

Mar 08, 2025 | 01:00:55
Sakana AI - Chris Lu, Robert Tjarko Lange, Cong Lu

We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems.


The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist at Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.


* DiscoPOP - A framework where language models discover their own optimization algorithms

* EvoLLM - Using language models as evolution strategies for optimization

* The AI Scientist - A fully automated system that conducts scientific research end-to-end

* Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate
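
For readers unfamiliar with evolution strategies, the loop below is a minimal sketch of a plain, hand-written evolution strategy on a toy objective. It is not EvoLLM or any Sakana AI code; the names `sphere` and `evolution_strategy` and all hyperparameters are assumptions chosen for illustration. In the setting discussed in the episode, a language model takes over the role of the update rule that is hard-coded here.

```python
import random

def sphere(x):
    """Toy black-box objective: squared distance from the origin."""
    return sum(v * v for v in x)

def evolution_strategy(f, dim=3, pop=20, elite=5, sigma=0.3, gens=100, seed=0):
    """Sample around a mean, keep the best samples, and recenter on them."""
    rng = random.Random(seed)
    mean = [rng.uniform(-2, 2) for _ in range(dim)]
    for _ in range(gens):
        # Propose a population of perturbed candidates around the mean.
        samples = [[m + sigma * rng.gauss(0, 1) for m in mean] for _ in range(pop)]
        # Rank by objective value only (no gradients are used).
        samples.sort(key=f)
        best = samples[:elite]
        # Move the mean to the average of the elite candidates.
        mean = [sum(s[i] for s in best) / elite for i in range(dim)]
    return mean

best = evolution_strategy(sphere)
print(sphere(best))
```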


TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0


Robert Tjarko Lange

https://roberttlange.com/

Chris Lu

https://chrislu.page/

Cong Lu

https://www.conglu.co.uk/

Sakana

https://sakana.ai/blog/


TOC:

1. LLMs for Algorithm Generation and Optimization

[00:00:00] 1.1 LLMs generating algorithms for training other LLMs

[00:04:00] 1.2 Evolutionary black-box optimization using neural network loss parameterization

[00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data

[00:20:45] 1.4 External entropy injection for preventing model collapse

[00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences


2. Model Learning and Generalization

[00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories

[00:31:30] 2.2 Transformers learning gradient descent

[00:33:00] 2.3 LLM tokenization biases towards specific numbers

[00:34:50] 2.4 LLMs as evolution strategies for black box optimization

[00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms


3. AI Agents and System Architectures

[00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches

[00:54:35] 3.2 LangChain / modular agent components

[00:57:50] 3.3 Debate improves LLM truthfulness

[01:00:55] 3.4 Time limits controlling AI agent systems

[01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies

[01:04:05] 3.6 Agents follow own interest gradients

[01:09:50] 3.7 Go-Explore algorithm: archive-based exploration

[01:11:05] 3.8 Foundation models for interesting state discovery

[01:13:00] 3.9 LLMs leverage prior game knowledge


4. AI for Scientific Discovery and Human Alignment

[01:17:45] 4.1 Encoding Alignment & Aesthetics via Reward Functions

[01:20:00] 4.2 AI Scientist: Automated Open-Ended Scientific Discovery

[01:24:15] 4.3 DiscoPOP: LLM for Preference Optimization Algorithms

[01:28:30] 4.4 Balancing AI Knowledge with Human Understanding

[01:33:55] 4.5 AI-Driven Conferences and Paper Review


Mar 01, 2025 | 01:37:54
Clement Bonnet - Can Latent Program Networks Solve Abstract Reasoning?

Clement Bonnet discusses his novel approach to the ARC (Abstraction and Reasoning Corpus) challenge. Unlike approaches that rely on fine-tuning LLMs or generating samples at inference time, Clement's method encodes input-output pairs into a latent space, optimizes this representation with a search algorithm, and decodes outputs for new inputs. This end-to-end architecture uses a VAE loss, including reconstruction and prior losses.
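
The encode, search, decode loop described above can be sketched with a toy example. This is not the LPN architecture: the real model uses neural encoders and decoders trained with a VAE loss, whereas here the "decoder" is a hand-written linear function, the "encoder" is a crude guess, and the names `encode`, `refine` and `decode` are hypothetical. The sketch only shows the idea of refining a latent by gradient-based search on the given input-output pairs before decoding a new input.

```python
def decode(z, x):
    """Toy decoder: the latent z = (a, b) parameterizes the program y = a*x + b."""
    a, b = z
    return a * x + b

def encode(pairs):
    """Toy encoder: a crude initial latent guess from the first pair only."""
    x0, y0 = pairs[0]
    return (y0 / x0 if x0 else 0.0, 0.0)

def refine(z, pairs, steps=5000, lr=0.05):
    """Gradient descent on the latent to reduce mean squared reconstruction error."""
    a, b = z
    n = len(pairs)
    for _ in range(steps):
        grad_a = sum(2 * (decode((a, b), x) - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (decode((a, b), x) - y) for x, y in pairs) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b)

pairs = [(1.0, 5.0), (2.0, 7.0), (3.0, 9.0)]  # hidden rule: y = 2*x + 3
z0 = encode(pairs)      # initial latent from the encoder
z = refine(z0, pairs)   # test-time search in latent space
print(round(decode(z, 4.0), 2))  # decode a new input with the refined latent
```

The design point is that the decoder's weights never change at test time; only the latent is optimized, which is what distinguishes this style of search from test-time fine-tuning.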


TRANSCRIPT + RESEARCH OVERVIEW:

https://www.dropbox.com/scl/fi/j7m0gaz1126y594gswtma/CLEMMLST.pdf?rlkey=y5qvwq2er5nchbcibm07rcfpq&dl=0


Clement Bonnet and Matthew Macfarlane:

https://www.linkedin.com/in/clement-bonnet16/

https://github.com/clement-bonnet

https://mvmacfarlane.github.io/


TOC

1. LPN Fundamentals

[00:00:00] 1.1 Introduction to ARC Benchmark and LPN Overview

[00:05:05] 1.2 Neural Networks' Challenges with ARC and Program Synthesis

[00:06:55] 1.3 Induction vs Transduction in Machine Learning


2. LPN Architecture and Latent Space

[00:11:50] 2.1 LPN Architecture and Latent Space Implementation

[00:16:25] 2.2 LPN Latent Space Encoding and VAE Architecture

[00:20:25] 2.3 Gradient-Based Search Training Strategy

[00:23:39] 2.4 LPN Model Architecture and Implementation Details


3. Implementation and Scaling

[00:27:34] 3.1 Training Data Generation and re-ARC Framework

[00:31:28] 3.2 Limitations of Latent Space and Multi-Thread Search

[00:34:43] 3.3 Program Composition and Computational Graph Architecture


4. Advanced Concepts and Future Directions

[00:45:09] 4.1 AI Creativity and Program Synthesis Approaches

[00:49:47] 4.2 Scaling and Interpretability in Latent Space Models


REFS

[00:00:05] ARC benchmark, Chollet

https://arxiv.org/abs/2412.04604


[00:02:10] Latent Program Spaces, Bonnet, Macfarlane

https://arxiv.org/abs/2411.08706


[00:07:45] Kevin Ellis work on program generation

https://www.cs.cornell.edu/~ellisk/


[00:08:45] Induction vs transduction in abstract reasoning, Li et al.

https://arxiv.org/abs/2411.02272


[00:17:40] VAEs, Kingma, Welling

https://arxiv.org/abs/1312.6114


[00:27:50] re-ARC, Hodel

https://github.com/michaelhodel/re-arc


[00:29:40] Grid size in ARC tasks, Chollet

https://github.com/fchollet/ARC-AGI


[00:33:00] Critique of deep learning, Marcus

https://arxiv.org/vc/arxiv/papers/2002/2002.06177v1.pdf

Feb 19, 2025 | 51:26
Prof. Jakob Foerster - ImageNet Moment for Reinforcement Learning?

Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.


TRANSCRIPT/REFS:

https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0


Prof. Jakob Foerster

https://x.com/j_foerst

https://www.jakobfoerster.com/

University of Oxford Profile:

https://eng.ox.ac.uk/people/jakob-foerster/


Chris Lu:

https://chrislu.page/


TOC

1. GPU Acceleration and Training Infrastructure

[00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview

[00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL

[00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions

[00:08:40] 1.4 JAX Implementation and Technical Acceleration


2. Learning Frameworks and Policy Optimization

[00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework

[00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms

[00:21:47] 2.3 Language Models and Benchmark Challenges

[00:28:15] 2.4 Creativity and Meta-Learning in AI Systems


3. Multi-Agent Systems and Decentralization

[00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence

[00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems

[00:42:44] 3.3 Democratic Control and Decentralization of AI Development

[00:46:14] 3.4 Open Source AI and Alignment Challenges

[00:49:31] 3.5 Collaborative Models for AI Development


REFS

[00:00:05] ARC Benchmark, Chollet

https://github.com/fchollet/ARC-AGI


[00:03:05] DRL Doesn't Work, Irpan

https://www.alexirpan.com/2018/02/14/rl-hard.html


[00:05:55] AI Training Data, Data Provenance Initiative

https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html


[00:06:10] JaxMARL, Foerster et al.

https://arxiv.org/html/2311.10090v5


[00:08:50] M-FOS, Lu et al.

https://arxiv.org/abs/2205.01447


[00:09:45] JAX Library, Google Research

https://github.com/jax-ml/jax


[00:12:10] Kinetix, Matthews, Beukman et al.

https://arxiv.org/abs/2410.23208


[00:12:45] Genie 2, DeepMind

https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/


[00:14:42] Mirror Learning, Grudzien, Kuba et al.

https://arxiv.org/abs/2208.01682


[00:16:30] Discovered Policy Optimisation, Lu et al.

https://arxiv.org/abs/2210.05639


[00:24:10] Goodhart's Law, Goodhart

https://en.wikipedia.org/wiki/Goodhart%27s_law


[00:25:15] LLM ARChitect, Franzen et al.

https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf


[00:28:55] AlphaGo, Silver et al.

https://arxiv.org/pdf/1712.01815.pdf


[00:30:10] Meta-learning, Lu, Towers, Foerster

https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf


[00:31:30] Emergence of Pragmatics, Yuan et al.

https://arxiv.org/abs/2001.07752


[00:34:30] AI Safety, Amodei et al.

https://arxiv.org/abs/1606.06565


[00:35:45] Intentional Stance, Dennett

https://plato.stanford.edu/entries/ethics-ai/


[00:39:25] Multi-Agent RL, Zhou et al.

https://arxiv.org/pdf/2305.10091


[00:41:00] Open Source Generative AI, Foerster et al.

https://arxiv.org/abs/2405.08597



Feb 18, 2025 | 53:31