Ludwig - Research Papers

Essays, notes, papers, and resources by Ludwig Abap.

ludwigabap.com 8,821 words

Research Papers

203 papers in library

Graph Representations for Higher-Order Logic and Theorem Proving

This paper presents the first use of graph neural networks (GNNs) for higher-order proof search and demonstrates that GNNs can improve upon state-of-the-art results in this domain. Interactive, higher…

Computer Science - Artificial Intelligence Computer Science - Logic in Computer Science Computer Science - Machine Learning Statistics - Machine Learning

EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty

Large Language Models (LLMs) for formal theorem proving have shown significant promise, yet they often lack generalizability and are fragile to even minor transformations of problem statements. To add…

Computer Science - Artificial Intelligence

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to b…

Computer Science - Artificial Intelligence Computer Science - Computational Geometry Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning Statistics - Machine Learning

LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4

Automated theorem proving (ATP) has been a classical problem in artificial intelligence since its inception, yet it remains challenging due to its vast state and action space. Large language models (L…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Aristotle: IMO-level Automated Theorem Proving

We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Ar…

Computer Science - Artificial Intelligence Computer Science - Computation and Language

Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean

Neural theorem proving combines large language models (LLMs) with proof assistants such as Lean, where the correctness of formal proofs can be rigorously verified, leaving no room for hallucination. W…

Computer Science - Artificial Intelligence Computer Science - Logic in Computer Science Computer Science - Machine Learning Statistics - Machine Learning

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 codes. However, ATP has not yet been revolutionized by the recent posttrain…

Computer Science - Artificial Intelligence

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictab…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Sheaf Theory through Examples (Abridged Version)

This book provides an inviting tour through sheaf theory, from the perspective of applied category theory and pitched at a less specialized audience than is typical with introductions to sheaves. The…

Mathematics - Category Theory

Group Theory in a Nutshell for Physicists

Cover — Title — Copyright — Dedication — Contents — Preface — A Brief Review of Linear Algebra — Part I: Groups: Discrete or Continuous, Finite or Infinite — I.1 Symmetry and Groups — I.2 Fin…

Lean Finder: Semantic Search for Mathlib That Understands User Intents

Mind Everywhere: A Framework for Conceptualizing Goal-Directedness in Biology and Other Domains—Part One

What makes a system—evolved, engineered, or hybrid—describable by teleological and mentalistic terms such as intelligent, goal-directed, cognitive, and intentional? In this two-part article, we review…

Mind Everywhere: A Framework for Conceptualizing Goal-Directedness in Biology and Other Domains—Part Two

What makes a system—evolved, engineered, or hybrid—describable by teleological and mentalistic terms such as intelligent, goal-directed, cognitive, and intentional? In this two-part article, we review…

What does it mean for a system to compute?

Many real-world dynamic systems, both natural and artificial, are understood to be performing computations. For artificial dynamic systems, explicitly designed to p…

An Introduction to Sheaves on Grothendieck Topologies

A CATEGORICAL INTRODUCTION TO SHEAVES

Sheaf is a very useful notion when defining and computing many different cohomology theories over topological spaces. There are several ways to build up sheaf theory with different axioms; however, some…

Cognition all the way down 2.0: neuroscience beyond neurons in the diverse intelligence era

This paper formalizes biological intelligence as search efficiency in multi-scale problem spaces, aiming to resolve epistemic deadlocks in the basal “cognition wars…

In-Context Algebra

We investigate the mechanisms that arise when transformers are trained to solve arithmetic on sequences where tokens are variables whose meaning is determined only through their interactions. While pr…

Computer Science - Computation and Language Computer Science - Machine Learning

An Introduction to Topos Theory

The purpose of this text is to equip the reader with an intuitive but precise understanding of elementary structures of category and topos theory. In order to achieve this goal, we provide a guided to…

A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points

Following F. William Lawvere, we show that many self-referential paradoxes, incompleteness theorems and fixed point theorems fall out of the same simple scheme. We demonstrate these similarities by sh…

Mathematics - Logic

A Type Theory with a Tiny Object

We present an extension of Martin-Löf Type Theory that contains a tiny object; a type for which there is a right adjoint to the formation of function types as well as the expected left adjoint. We dem…

Computer Science - Programming Languages Mathematics - Category Theory Mathematics - Logic

Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence

The emerging field of Diverse Intelligence seeks to identify, formalize, and understand commonalities in behavioral competencies across a wide range of implementations. Especially interesting are simp…

Computer Science - Artificial Intelligence Computer Science - Data Structures and Algorithms Computer Science - Multiagent Systems Computer Science - Neural and Evolutionary Computing

Flow-Lenia: Emergent evolutionary dynamics in mass conservative continuous cellular automata

Central to the artificial life endeavour is the creation of artificial systems spontaneously generating properties found in the living world such as autopoiesis, self-replication, evolution and open-e…

Computer Science - Artificial Intelligence Nonlinear Sciences - Cellular Automata and Lattice Gases

A Background Independent Algebra in Quantum Gravity

We propose an algebra of operators along an observer’s worldline as a background-independent algebra in quantum gravity. In that context, it is natural to think of the Hartle-Hawking no boundary state…

High Energy Physics - Theory

LLMSTEP: LLM proofstep suggestions in Lean

We present LLMSTEP, a tool for integrating a language model into the Lean proof assistant. LLMSTEP is a Lean 4 tactic that sends a user’s proof state to a server hosting a language model. The language…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

miniCTX: Neural Theorem Proving with (Long-)Contexts

Real-world formal theorem proving often depends on a wealth of context, including definitions, lemmas, comments, file structure, and other information. We introduce miniCTX, which tests a model’s abil…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

AI-Assisted Generation of Difficult Math Questions

Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying sole…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Categorical Proof-Theoretic Semantics

In proof-theoretic semantics, model-theoretic validity is replaced by proof-theoretic validity. Validity of formulae is defined inductively from a base giving the validity of atoms using inductive cla…

Computer Science - Logic in Computer Science Mathematics - Category Theory Mathematics - Logic

Hilbert $*$-categories: Where limits in analysis and category theory meet

This article introduces Hilbert $*$-categories: an abstraction of categories with similar algebraic and analytic properties to the categories of real, complex, and quaternionic Hilbert spaces and boun…

Mathematics - Category Theory Mathematics - Functional Analysis Mathematics - Operator Algebras

Basic Category Theory

This short introductory category theory textbook is for readers with relatively little mathematical background (e.g. the first half of an undergraduate mathematics degree). At its heart is the concept…

Mathematics - Algebraic Topology Mathematics - Category Theory Mathematics - Logic

Coend calculus

The book formerly known as “This is the (co)end, my only (co)friend”.

Mathematics - Category Theory

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trai…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Logic in Computer Science Computer Science - Machine Learning

Premise Selection for a Lean Hammer

Neural methods are transforming automated reasoning for proof assistants, yet integrating these advances into practical verification workflows remains challenging. Hammers are tools that interface wit…

Computer Science - Artificial Intelligence Computer Science - Logic in Computer Science Computer Science - Machine Learning

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

Large language models (LLMs) have shown promise in proving formal theorems using proof assistants such as Lean. However, existing methods are difficult to reproduce or build on, due to private code, d…

Computer Science - Artificial Intelligence Computer Science - Logic in Computer Science Computer Science - Machine Learning Statistics - Machine Learning

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent b…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Mathematics - Category Theory Mathematics - Rings and Algebras Statistics - Machine Learning

Recursive Language Models

We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy…

Computer Science - Artificial Intelligence Computer Science - Computation and Language

Adjointness in Foundations

A categorical view of nouns in their semantical roles

Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation

Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastness of sequ…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning Condensed Matter - Mesoscale and Nanoscale Physics Condensed Matter - Soft Condensed Matter

Closing the loop on morphogenesis: a mathematical model of morphogenesis by closed-loop reaction-diffusion

Morphogenesis, the establishment and repair of emergent complex anatomy by groups of cells, is a fascinating and biomedically-relevant problem. One of its most fascinating aspects is that a developing…

Category Theory for Programming

In these lecture notes, we give a brief introduction to some elements of category theory. The choice of topics is guided by applications to functional programming. Firstly, we study initial algebras,…

Computer Science - Programming Languages Mathematics - Category Theory

Deep Manifold Part 2: Neural Network Mathematics

This work develops the global equations of neural networks through stacked piecewise manifolds, fixed-point theory, and boundary-conditioned iteration. Once fixed coordinates and operators are removed…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Foundations of Mathematics MATH 220 Lecture Notes

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simul…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Computer Science - Neural and Evolutionary Computing

Transformers know more than they can tell — Learning the Collatz sequence

We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ($u_{n+1}=u_n/2$ if $u_n$ is even…

Computer Science - Machine Learning
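The truncated formula in this entry is the standard Collatz map; as an illustrative sketch (not code from the paper, and with hypothetical function names), the odd-to-odd “long step” it alludes to can be written as:

```python
def collatz_step(u: int) -> int:
    # Standard Collatz map: u/2 if u is even, 3u + 1 if u is odd.
    return u // 2 if u % 2 == 0 else 3 * u + 1

def long_step(u: int) -> int:
    # From an odd u, apply 3u + 1 and then halve until odd again,
    # jumping directly to the next odd term of the sequence.
    assert u % 2 == 1, "long_step is defined on odd integers"
    v = 3 * u + 1
    while v % 2 == 0:
        v //= 2
    return v
```

For example, starting from 7 the sequence runs 7 → 22 → 11, so the long step of 7 is 11.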

Forms of life, forms of mind | Dr. Michael Levin | Algorithms Redux: finding unexpected properties in truly minimal systems

Learning in Transcriptional Network Models: Computational Discovery of Pathway-Level Memory and Effective Interventions

Trainability, in any substrate, refers to the ability to change future behavior based on past experiences. An understanding of such capacity within biological cells and tissues would enable a particul…

Top-down models in biology: explanation and control of complex living systems above the molecular level

It is widely assumed in developmental biology and bioengineering that optimal understanding and control of complex living systems follows from models of molecular events. The success of reductionism h…

Darwin’s agential materials: evolutionary implications of multiscale competency in developmental biology

A critical aspect of evolution is the layer of developmental physiology that operates between the genotype and the anatomical phenotype. While much work has addressed the evolution of developmental me…

Technological Approach to Mind Everywhere: An Experimentally-Grounded Framework for Understanding Diverse Bodies and Minds

Synthetic biology and bioengineering provide the opportunity to create novel embodied cognitive systems (otherwise known as minds) in a very wide variety of chimeric architectures combining evolved an…

The Universal Weight Subspace Hypothesis

We show that deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces. We provide the first large-scale empirical evidence that demonstrates th…

Computer Science - Artificial Intelligence Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimiza…

Computer Science - Artificial Intelligence Computer Science - Programming Languages

Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis

As GPU architectures rapidly evolve to meet the growing demands of exascale computing and machine learning, the performance implications of architectural innovations remain poorly understood across…

Computer Science - Hardware Architecture

What neuroscience can tell AI about learning in continuously changing environments

Modern artificial intelligence (AI) models, such as large language models, are usually trained once on a huge corpus of data, potentially fine-tuned for a specific task and then deployed with fixed pa…

Computational neuroscience Machine learning

Origins of the brain networks for advanced mathematics in expert mathematicians

The origins of human abilities for mathematics are debated: Some theories suggest that they are founded upon evolutionarily ancient brain circuits for number and space and others that they are grounde…

Collective intelligence: A unifying concept for integrating biology across scales and substrates

A defining feature of biology is the use of a multiscale architecture, ranging from molecular networks to cells, tissues, organs, whole bodies, and swarms. Crucially however, biology is not only neste…

The free-energy principle: a unified brain theory?

It’s not the thought that counts: Allostasis at the core of brain function

In psychology and neuroscience, scientific questions are often framed in terms of mental activity (e.g., cognition, emotion, and perception); however, the brain is an organ with a particular function…

Proof Theory

Proof theory is not an esoteric technical subject that was invented to support a formalist doctrine in the philosophy of mathematics; rather, it has been developed as an attempt to analyze aspects of ma…

Hilbert, David: program in the foundations of mathematics · epsilon calculus · logic: linear · mathematics: constructive · proof theory: development of · reasoning: automated · semantics: proof-theoretic · set theory: constructive and intuitionistic ZF · type theory

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Multi-LLM systems harness the complementary strengths of diverse Large Language Models, achieving performance and efficiency gains unattainable by a single model. In existing designs, LLMs communicate…

Computer Science - Computation and Language Computer Science - Machine Learning

Metaphorical motion in mathematical reasoning: Further evidence for pre-motor implementation of structure mapping in abstract domains

Continual Learning: Fast and Slow

According to the Complementary Learning Systems (CLS) theory (McClelland et al., 1995) in neuroscience, humans do effective continual learning through two complementary systems: a fast learnin…

Computer Science - Artificial Intelligence Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning

The Future of Continual Learning in the Era of Foundation Models: Three Key Directions

Continual learning—the ability to acquire, retain, and refine knowledge over time—has always been fundamental to intelligence, both human and artificial. Historically, different AI paradigms have ac…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Fantastic Pretraining Optimizers and Where to Find Them

AdamW has long been the dominant optimizer in language model pretraining, despite numerous claims that alternative optimizers offer 1.4 to 2x speedup. We posit that two methodological shortcomings hav…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Statistics - Machine Learning

Autonomous Code Evolution Meets NP-Completeness

Large language models (LLMs) have recently shown strong coding abilities, enabling not only static code generation but also iterative code self-evolving through agentic frameworks. Recently, AlphaEvol…

Computer Science - Artificial Intelligence Computer Science - Logic in Computer Science Computer Science - Machine Learning

Complexity Classes as Mathematical Axioms

Treating a conjecture, $P^{\#P} \neq NP$, on the separation of complexity classes as an axiom, an implication is found in three-manifold topology with little obvious connection to complexity theory. This is…

Computer Science - Computational Complexity Mathematics - Geometric Topology

NP-complete Problems and Physical Reality

Can NP-complete problems be solved efficiently in the physical universe? I survey proposals including soap bubbles, protein folding, quantum computing, quantum advice, quantum adiabatic algorithms, qu…

Computer Science - Computational Complexity General Relativity and Quantum Cosmology Quantum Physics

Topological views on computational complexity.

Future of Quantum Computing

On Tuesday 26th November 2024, four discussants participated in a moderated virtual panel titled Future of Quantum Computing as one session of the 8th International Conference on Quantum Techniques in…

Quantum Physics

The Ghost in the Quantum Turing Machine

In honor of Alan Turing’s hundredth birthday, I unwisely set out some thoughts about one of Turing’s obsessions throughout his life, the question of physics and free will. I focus relatively narrowly…

Computer Science - General Literature Physics - History and Philosophy of Physics Quantum Physics

Learning Universal Predictors

Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling ge…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Large Language Models as Computable Approximations to Solomonoff Induction

The rapid advancement of large language models (LLMs) calls for a rigorous theoretical framework to explain their empirical success. While significant progress has been made in understanding LLM behav…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

From Theory to Practice: Plug and Play with Succinct Data Structures

Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to-use, highly optimized, and composable…

Computer Science - Data Structures and Algorithms

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Large language models interact with users through a simulated ‘Assistant’ persona. While the Assistant is typically trained to be helpful, harmless, and honest, it sometimes deviates from these ideals…

Computer Science - Computation and Language Computer Science - Machine Learning

DataRater: Meta-Learned Dataset Curation

The quality of foundation models depends heavily on their training data. Consequently, great efforts have been put into dataset curation. Yet most approaches rely on manual tuning of coarse-grained mi…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Statistics - Machine Learning

Diffusion Beats Autoregressive in Data-Constrained Settings

Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promis…

Computer Science - Artificial Intelligence Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning Computer Science - Robotics

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a “teacher” model with some trait T (su…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

LLM4Decompile: Decompiling Binary Code with Large Language Models

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in La…

Computer Science - Computation and Language Computer Science - Programming Languages

Large Language Models and Emergence: A Complex Systems Perspective

Emergence is a concept in complexity science that describes how many-body systems manifest novel higher-level properties, properties that can be described by replacing high-dimensional mechanisms with…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning Computer Science - Neural and Evolutionary Computing

Transformers are Efficient Compilers, Provably

Transformer-based large language models (LLMs) have demonstrated surprisingly robust performance across a wide range of language-related tasks, including programming language understanding and generat…

Computer Science - Machine Learning Computer Science - Programming Languages

Why Philosophers Should Care about Computational Complexity

One might think that, once we know something is computable, how efficiently it can be computed is a practical question with little further philosophical importance. In this essay, I offer a detailed case…

Fast and Simplex: 2-Simplicial Attention in Triton

Recent work has shown that training loss scales as a power law with both model size and the number of tokens, and that achieving compute-optimal models requires scaling model size and token count toge…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Simple linear attention language models balance the recall-throughput tradeoff

Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is…

Computer Science - Computation and Language Computer Science - Machine Learning

Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks

The recent success of multiple neural architectures like CNNs, Transformers, and MLP-Mixers motivated us to look for similarities and differences between them. We found that these architectures can be…

Computer Science - Machine Learning

Intelligence at the Edge of Chaos

We explore the emergence of intelligent behavior in artificial systems by investigating how the complexity of rule-based systems influences the capabilities of models trained to predict these rules. O…

Computer Science - Artificial Intelligence Computer Science - Neural and Evolutionary Computing

Magistral

We introduce Magistral, Mistral’s first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior mod…

Computer Science - Computation and Language

Reinforcement Pre-Training

In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a rea…

Computer Science - Computation and Language

Frontier AI development relies on powerful AI supercomputers, yet analysis of these systems is limited. We create a dataset of 500 AI supercomputers from 2019 to 2025 and analyze key trends in perform…

Computer Science - Artificial Intelligence Computer Science - Computers and Society

eGPU: Extending eBPF Programmability and Observability to GPUs

Precise GPU observability and programmability are essential for optimizing performance in AI workloads and other computationally intensive high-performance computing (HPC) applications. In this paper,…

Attention-Level Speculation

As Large Language Models (LLMs) grow in size and context length, efficient inference strategies are essential to maintain low-latency token generation. Unfortunately, conventional tensor and data para…

Qwen3 Technical Report

In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capa…

Computer Science - Computation and Language

LazyLog: A New Shared Log Abstraction for Low-Latency Applications

Shared logs offer linearizable total order across storage shards. However, they enforce this order eagerly upon ingestion, leading to high latencies. We observe that in many modern shared-log applicat…

How Much Knowledge Can You Pack Into the Parameters of a Language Model?

It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the p…

Computer Science - Computation and Language Computer Science - Machine Learning Statistics - Machine Learning

The empirical status of predictive coding and active inference

Research on predictive processing models has focused largely on two specific algorithmic theories: Predictive Coding for perception and Active Inference for decision-making. While these interconnected…

Active Inference Bayesian Brain Computational Modeling Predictive Coding Predictive Processing

Predictive eye movements are adjusted in a Bayes-optimal fashion in response to unexpectedly changing environmental probabilities

This study examined the application of active inference to dynamic visuomotor control. Active inference proposes that actions are dynamically planned according to uncertainty about sensory information…

Continuous Thought Machines

Biological brains demonstrate complex neural activity, where the timing and interplay between neurons is critical to how brains process information. Most deep learning architectures simplify neural ac…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

A tutorial on the free-energy framework for modelling perception and learning

This paper provides an easy-to-follow tutorial on the free-energy framework for modelling perception developed by Friston, which extends the predictive coding model of Rao and Ballard. These models as…

Canonical Microcircuits for Predictive Coding

This Perspective considers the influential notion of a canonical (cortical) microcircuit in light of recent theories about neuronal processing. Specifically, we conciliate quantitative studies of micr…

Language Models use Lookbacks to Track Beliefs

How do language models (LMs) represent characters’ beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilitie…

Computer Science - Computation and Language

The Diffusion Duality

Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and ma…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit “accuracy collapse” on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily refle…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Self-Adapting Language Models

Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a framework…

Chain-of-Thought Reasoning is a Policy Improvement Operator

Large language models have astounded the world with fascinating new capabilities. However, they currently lack the ability to teach themselves new skills, relying instead on large amounts of human-gen…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

General agents need world models

Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of gene…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Computer Science - Robotics Statistics - Machine Learning

Reasoning with Language Model is Planning with World Model

Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still stru…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Emergent Abilities of Large Language Models

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we…

Observer Theory and the Ruliad: An Extension to the Wolfram Model

This paper presents an extension of Observer Theory within the context of the Ruliad, using a mathematically rigorous formalization with category theory as the unifying framework. This paper demonstra…

Minimum Description Length and Generalization Guarantees for Representation Learning

A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While t…

Compute-Optimal LLMs Provably Generalize Better With Scale

Why do larger language models generalize better? To investigate this question, we develop generalization bounds on the pretraining objective of large language models (LLMs) in the compute-optimal regi…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Large Language Model Compression with Global Rank and Sparsity Optimization

Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of exis…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

A Survey to Recent Progress Towards Understanding In-Context Learning

In-Context Learning (ICL) empowers Large Language Models (LLMs) with the ability to learn from a few examples provided in the prompt, enabling downstream generalization without the requirement for gra…

Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

Today’s AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would acceler…

Computer Science - Artificial Intelligence

AlphaEvolve: A coding agent for scientific and algorithmic discovery

The Complexity Dynamics of Grokking

We investigate the phenomenon of generalization through the lens of compression. In particular, we study the complexity dynamics of neural networks to explain grokking, where networks suddenly transit…

Computer Science - Machine Learning

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still a…

Computer Science - Machine Learning Condensed Matter - Disordered Systems and Neural Networks Quantitative Biology - Neurons and Cognition Statistics - Machine Learning

Mechanistic Design and Scaling of Hybrid Architectures

The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and e…

Computer Science - Machine Learning

Mathematical discoveries from program search with large language models

Large language models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from conf…

Computer science Pure mathematics

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to b…

Computer Science - Artificial Intelligence Computer Science - Computational Geometry Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning Statistics - Machine Learning

Trade-offs in Data Memorization via Strong Data Processing Inequalities

Recent research demonstrated that training large language models involves memorization of a significant fraction of training data. Such memorization can lead to privacy violations when training on sen…

Computer Science - Information Theory Computer Science - Machine Learning Mathematics - Information Theory Statistics - Machine Learning

What Formal Languages Can Transformers Express? A Survey

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal language…

The Illusion of State in State-Space Models

State-space models (SSMs) have emerged as a potential alternative to transformers. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and stat…

Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models

Emerging Properties in Unified Multimodal Pretraining

Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. In this work, we introduce BAGEL, an open-source foundational model that nativel…

Computer Science - Computer Vision and Pattern Recognition

How much do language models memorize?

We propose a new method for estimating how much a model “knows” about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have stru…

Computer Science - Computation and Language

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to addr…

Computer Science - Machine Learning
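
The quadratic cost the abstract refers to, and the tiling idea behind FlashAttention, can be sketched in a few lines of numpy. This is an illustrative sketch of the online-softmax recurrence, not the paper's CUDA kernel; the block size and shapes are arbitrary choices.

```python
import numpy as np

def naive_attention(Q, K, V):
    # materializes the full (n, n) score matrix: the quadratic memory cost
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=32):
    # streams K/V in blocks with a running max and normalizer per query row,
    # so no (n, n) matrix is ever materialized (the core FlashAttention idea)
    n, d = Q.shape
    out, m, l = np.zeros((n, d)), np.full(n, -np.inf), np.zeros(n)
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)      # one (n, block) tile only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)                  # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        out = out * scale[:, None] + P @ V[j:j + block]
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 64, 16))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The tiled version touches only one score tile at a time yet matches the naive result to float precision, which is the sense in which FlashAttention is exact rather than approximate attention.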

BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization

Byte Pair Encoding (BPE) tokenizers, widely used in Large Language Models, face challenges in multilingual settings, including penalization of non-Western scripts and the creation of tokens with parti…

Computer Science - Computation and Language
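
For context, the vanilla BPE merge learning that SCRIPT-style pretokenization revises can be sketched as follows. This is plain BPE only, not the paper's structured encoding; the toy corpus is made up for illustration.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    # repeatedly merge the most frequent adjacent symbol pair in the corpus
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in corpus:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        for w in corpus:               # apply the merge in place
            i = 0
            while i < len(w) - 1:
                if w[i] == a and w[i + 1] == b:
                    w[i:i + 2] = [a + b]
                i += 1
    return merges, corpus

merges, corpus = learn_bpe(["low", "lower", "lowest", "low"], 2)
```

On this corpus the frequent pair `('l', 'o')` is merged first, then `('lo', 'w')`, so `"low"` becomes a single token while rarer suffixes stay split, which is exactly the frequency bias that penalizes scripts underrepresented in the training data.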

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models with verifiable rewards. However, it remains contentious whe…

Computer Science - Artificial Intelligence Computer Science - Computation and Language

Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs

Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Programming Languages Statistics - Computation

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of front…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains in…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Learning to Model the World with Language

To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple langua…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Hardware-Efficient Attention for Fast Decoding

LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decodi…

Computer Science - Computation and Language Computer Science - Machine Learning

FP8 Formats for Deep Learning

FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary inte…


Computer Science - Machine Learning

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Reinforcement learning (RL) yields substantial improvements in large language models (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from upda…

Computer Science - Machine Learning

Deep Reinforcement Learning, a textbook

Deep reinforcement learning has gathered much attention recently. Impressive results were achieved in activities as diverse as autonomous driving, game playing, molecular recombination, and robotics…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Large Language Diffusion Models

Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre…

Computer Science - Computation and Language Computer Science - Machine Learning

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR wo…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment

This paper investigates approaches to enhance the reasoning capabilities of Large Language Model (LLM) agents using Reinforcement Learning (RL). Specifically, we focus on multi-turn tool-use scenarios…

Computer Science - Machine Learning

Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)

Recent developments in deep learning optimization have brought about radically new algorithms based on the Linear Minimization Oracle (LMO) framework, such as $\sf Muon$ and $\sf Scion$. After over a…

Computer Science - Machine Learning Mathematics - Optimization and Control Statistics - Machine Learning

Visual Planning: Let’s Think Only with Images

Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning

The Platonic Representation Hypothesis

We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways…

Computer Science - Artificial Intelligence Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning Computer Science - Neural and Evolutionary Computing

Round and Round We Go! What makes Rotary Positional Encodings useful?

Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information. One of the most p…

Computer Science - Computation and Language Computer Science - Machine Learning
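
The rotary scheme the paper analyzes can be sketched directly: each channel pair is rotated by a position-dependent angle, so query-key dot products depend only on relative position. A minimal sketch; `base=10000` follows the usual RoPE convention, and the interleaved pairing is one of several equivalent layouts.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate each channel pair (2i, 2i+1) of x by angle pos * theta_i."""
    d = x.shape[0]
    theta = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.standard_normal((2, 64))
# relative-position property: <rope(q, m), rope(k, n)> depends only on m - n
a = rope(q, 5) @ rope(k, 3)
b = rope(q, 105) @ rope(k, 103)
assert np.isclose(a, b)
```

Because each 2D rotation by `m * theta` composed with one by `n * theta` yields a net rotation by `(n - m) * theta`, shifting both positions by the same offset leaves the dot product unchanged.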

HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position

Addressing the limitation of context length in large language models for code-related tasks is the primary focus of this paper. Existing LLMs are constrained by their pre-trained context lengths, lead…

Computer Science - Software Engineering

Efficient Memory Management for Large Language Model Serving with PagedAttention

High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for eac…

Computer Science - Distributed, Parallel, and Cluster Computing Computer Science - Machine Learning
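
The paging idea can be illustrated with toy bookkeeping (hypothetical names, not vLLM's API): KV entries live in fixed-size physical blocks, and each sequence holds a block table mapping logical block index to physical block, so memory is allocated on demand rather than reserved contiguously for the maximum context length.

```python
import numpy as np

BLOCK = 4                       # tokens per physical KV block
pool = {}                       # physical block id -> (BLOCK, d) array
free = list(range(8))           # free physical blocks

class Sequence:
    def __init__(self):
        self.table, self.length = [], 0   # block table, tokens written

    def append_kv(self, kv):
        if self.length % BLOCK == 0:      # current block full: grab a new one
            blk = free.pop()
            pool[blk] = np.zeros((BLOCK, kv.shape[0]))
            self.table.append(blk)
        blk = self.table[self.length // BLOCK]
        pool[blk][self.length % BLOCK] = kv
        self.length += 1

    def gather(self):
        # reassemble the logical KV sequence via the block table
        rows = [pool[self.table[i // BLOCK]][i % BLOCK] for i in range(self.length)]
        return np.stack(rows)

seq = Sequence()
for t in range(6):
    seq.append_kv(np.full(2, float(t)))
assert seq.gather().shape == (6, 2)
assert len(seq.table) == 2      # 6 tokens -> ceil(6/4) = 2 blocks
```

Only two of the eight physical blocks are consumed for a 6-token sequence, and freed blocks can be handed to other sequences, which is the fragmentation win the paper quantifies.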

Transformers Represent Belief State Geometry in their Residual Stream

Produced while the author was an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued at PIBBSS…

Consequences of the Moosbauer-Poole Algorithms

Moosbauer and Poole have recently shown that the multiplication of two $5\times 5$ matrices requires no more than 93 multiplications in the (possibly non-commutative) coefficient ring, and that the mu…

Computer Science - Symbolic Computation

Emergence of Language in the Developing Brain

Illuminating search spaces by mapping elites

Many fields use search algorithms, which automatically explore a search space to find high-performing solutions: chemists search through the space of molecules to discover new drugs; engineers search…

Computer Science - Artificial Intelligence Computer Science - Neural and Evolutionary Computing Computer Science - Robotics Quantitative Biology - Populations and Evolution
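
The MAP-Elites loop itself is short. A toy sketch on a 2D genome, with made-up parameters rather than the paper's experiments: keep one elite per behaviour-descriptor cell, generate candidates by mutating random elites, and replace a cell's occupant only on improvement.

```python
import numpy as np

def map_elites(iters=2000, bins=10, seed=0):
    rng = np.random.default_rng(seed)
    archive = {}                            # cell -> (fitness, genome)
    for _ in range(iters):
        if archive and rng.random() < 0.9:  # mutate a random elite
            keys = list(archive)
            _, parent = archive[keys[rng.integers(len(keys))]]
            x = np.clip(parent + rng.normal(0.0, 0.1, 2), -1.0, 1.0)
        else:                               # random bootstrap genome
            x = rng.uniform(-1.0, 1.0, 2)
        fitness = -float(np.linalg.norm(x))              # toy objective
        cell = tuple(np.minimum((bins * (x + 1) / 2).astype(int), bins - 1))
        if cell not in archive or archive[cell][0] < fitness:
            archive[cell] = (fitness, x)
    return archive

archive = map_elites()
```

Unlike a plain optimizer, the result is not one solution but a map: each descriptor cell holds its own best genome, "illuminating" how fitness varies across the behaviour space.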

Iteratively reweighted kernel machines efficiently learn sparse functions

The impressive practical performance of neural networks is often attributed to their ability to learn low-dimensional data representations and hierarchical structure directly from data. In this work,…

Computer Science - Machine Learning Mathematics - Optimization and Control Mathematics - Statistics Theory Statistics - Machine Learning Statistics - Statistics Theory

Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features

In recent years neural networks have achieved impressive results on many technological and scientific tasks. Yet, the mechanism through which these models automatically select features, or patterns in…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Statistics - Machine Learning

Byte Latent Transformer: Patches Scale Better Than Tokens

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inferen…

Computer Science - Computation and Language

Voyager: An Open-Ended Embodied Agent with Large Language Models

We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human inter…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Denoising Diffusion Probabilistic Models

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results…

Computer Science - Machine Learning Statistics - Machine Learning
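
The forward (noising) half of the model has a closed form that is easy to sketch; the schedule below (T = 1000, linear betas from 1e-4 to 0.02) follows the paper, while the data is random for illustration.

```python
import numpy as np

# Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
# with abar_t the cumulative product of (1 - beta_s).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating t noising steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(64)
x_T = q_sample(x0, T - 1, rng)
# by t = T the signal coefficient is essentially zero: x_T is nearly pure noise
assert alphas_bar[-1] < 1e-3
```

Training then amounts to asking a network to predict `eps` from `x_t` and `t`; sampling runs the learned reverse process from pure noise.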

A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27

How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of ab…

Learning high-level visual representations from a child’s perspective without strong inductive biases

Young children develop sophisticated internal models of the world based on their visual experience. Can such models be learned from a child’s visual experience without strong inductive biases? To inve…

Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning Computer Science - Neural and Evolutionary Computing Quantitative Biology - Neurons and Cognition

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers

In this paper, I introduce the retrieval problem, a simple yet common reasoning task that can be solved only by transformers with a minimum number of layers, which grows logarithmically with the input…

Computer Science - Computation and Language Computer Science - Machine Learning

Scaling Laws for Precision

Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise “precision-aware” scaling laws for b…

Computer Science - Computation and Language Computer Science - Machine Learning

FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary L…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

Learning to Reason for Long-Form Story Generation

Generating high-quality stories spanning thousands of tokens requires competency across a variety of skills, from tracking plot and character arcs to keeping a consistent and engaging style. Due to th…

Computer Science - Computation and Language

Diffusion Models are Evolutionary Algorithms

In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we m…

Computer Science - Machine Learning Computer Science - Neural and Evolutionary Computing

Similarity of Neural Network Representations Revisited

Recent work has sought to understand the behavior of neural networks by comparing representations between layers and between different trained models. We examine methods for comparing neural network r…

Computer Science - Machine Learning Quantitative Biology - Neurons and Cognition Statistics - Machine Learning
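
One of the similarity indices the paper examines, linear CKA, fits in a few lines; a minimal numpy sketch with random activations standing in for real layer outputs.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activations X: (n, d1) and Y: (n, d2)
    computed on the same n examples."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
# invariant to orthogonal transforms and isotropic scaling of the features
assert np.isclose(linear_cka(X, X @ Q), 1.0)
```

These invariances (orthogonal transforms and isotropic scaling, but not arbitrary invertible maps) are central to the paper's argument for when a similarity index is meaningful.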

Layers at Similar Depths Generate Similar Activations Across LLM Architectures

How do the latent spaces used by independently-trained LLMs relate to one another? We study the nearest neighbor relationships induced by activations at different layers of 24 open-weight LLMs, and fi…

Computer Science - Artificial Intelligence Computer Science - Computation and Language

What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

As enthusiasm for scaling computation (data and parameters) in the pretraining era gradually diminished, test-time scaling (TTS)—also referred to as “test-time computing”—has emerged as a prominent re…

Computer Science - Artificial Intelligence Computer Science - Computation and Language

TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems

This paper focuses on multimodal alignment within the realm of Artificial Intelligence, particularly in text and image modalities. The semantic gap between the textual and visual modality poses a disc…

Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Archi…

History, Development, and Principles of Large Language Models-An Introductory Survey

Language models serve as a cornerstone in natural language processing (NLP), utilizing mathematical methods to generalize language laws and knowledge for prediction and generation. Over extensive rese…

Computer Science - Computation and Language

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability

Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept o…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning

A mathematical theory of semantic development in deep neural networks

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fund…

Computer Science - Artificial Intelligence Computer Science - Machine Learning Quantitative Biology - Neurons and Cognition Statistics - Machine Learning

Progress measures for grokking via mechanistic interpretability

Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the amount of parameters, training data, or training steps. One approach to understanding em…

Computer Science - Artificial Intelligence Computer Science - Machine Learning

Towards Automated Circuit Discovery for Mechanistic Interpretability

Through considerable effort and intuition, several recent works have reverse-engineered nontrivial behaviors of transformer models. This paper systematizes the mechanistic interpretability process the…

Computer Science - Machine Learning

Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes

I argue that data becomes temporarily interesting by itself to some self-improving, but computationally limited, subjective observer once he learns to predict or compress the data in a better way, thu…

Computer Science - Artificial Intelligence Computer Science - Neural and Evolutionary Computing

Circuit Tracing: Revealing Computational Graphs in Language Models

We describe an approach to tracing the “step-by-step” computation involved when a model responds to a single prompt.

On the Biology of a Large Language Model

We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic’s lightweight production model — in a variety of contexts, using our circuit tracing methodology.

Generalization through variance: how noise shapes inductive biases in diffusion models

How diffusion models generalize beyond their training set is not known, and is somewhat mysterious given two facts: the optimum of the denoising score matching (DSM) objective usually used to train di…

Training Large Language Models to Reason in a Continuous Latent Space

Large language models (LLMs) are restricted to reason in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. Ho…

Computer Science - Computation and Language

Contextualization Machines

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language — a question of key importance for understanding how language mo…

Computer Science - Computation and Language Computer Science - Computers and Society

On the Emergence of Thinking in LLMs I: Searching for the Right Intuition

Recent advancements in AI, such as OpenAI’s new o models, Google’s Gemini Thinking model, and Deepseek R1, are transforming LLMs into LRMs (Large Reasoning Models). Unlike LLMs, LRMs perform thinking…

Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Machine Learning
