Goodfire
677 posts
Goodfire
@GoodfireAI
Using interpretability to understand, learn from, and design AI.
San Francisco
goodfire.ai
Joined August 2024
29 Following
24.8K Followers
  • Pinned
    Goodfire
    @GoodfireAI
    May 7
    Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵
  • Goodfire reposted
    Lucius Bushnaq ⏹️
    @BushnaqLucius
    Jun 26
    Replying to @bygregorr and @GoodfireAI
    Checked. Same story as French and Spanish: the LoRAs wreck Dutch and Swedish performance, while the single-component suppression fine-tune leaves them alone.
  • Goodfire
    @GoodfireAI
    Jun 25
    Replying to @GoodfireAI
    Correction: a plotting error caused the bars in the plot of off-target effects to display at 0.01 nats above the true means. The corrected plot is below:
  • Goodfire
    @GoodfireAI
    Jun 25
    We removed an LM's ability to speak German by fine-tuning on only 4 German tokens. As part of a 1-day hackathon with our product Silico, we removed a 67M-parameter language model's ability to predict German text by tuning only a scalar factor on one subcomponent of its weights.
    Goodfire
    @GoodfireAI
    Jun 25
    Replying to @GoodfireAI
    Plus, that interpretability lets us notice and fix problems. For example, we initially tuned the top 16 German-related components, but their labels showed that most were about foreign languages in general. So we narrowed the edit to the single component for German alone, improving precision. (5/6)
    Goodfire
    @GoodfireAI
    Jun 25
    This is an early demo of how parameter decomposition could enable targeted, predictable model editing. Details on this experiment: lesswrong.com/posts/ieoWstub… If you want to run experiments on your model too, learn more and request access to Silico: goodfire.ai/silico
    lesswrong.com
    Exploration: fine-tuning with parameter decomposition — LessWrong
    TL;DR: We can destroy a 67M-parameter language model's ability to predict German text by fine-tuning a single number: the scalar prefactor on one Ger…
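    A minimal sketch of the core idea in this thread, editing a model by tuning one scalar prefactor on one component of a weight decomposition, under loudly stated assumptions: the rank-one components, their orthonormal input directions, and the squared-output loss are all hypothetical stand-ins, not Goodfire's actual decomposition, Silico's API, or the objective used in the experiment. Only the prefactor of the chosen component is trained; every other component keeps a weight of 1.

```python
import numpy as np

def rank_one_components(d, n, seed=0):
    """Toy decomposition W = sum_c outer(u_c, v_c) with orthonormal input directions v_c.

    Hypothetical stand-in for a learned parameter decomposition.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # columns of q are orthonormal
    vs = [q[:, c] for c in range(n)]
    us = [u / np.linalg.norm(u) for u in rng.normal(size=(n, d))]
    return us, vs

def compose(us, vs, scales):
    """Rebuild the weight matrix with a scalar prefactor on each component."""
    return sum(s * np.outer(u, v) for s, u, v in zip(scales, us, vs))

def tune_single_scalar(us, vs, k, x_target, lr=0.1, steps=200):
    """Gradient descent on the prefactor of component k only; all others stay at 1.

    Toy loss: ||W(alpha) @ x_target||^2, a hypothetical stand-in for a language-modeling loss.
    """
    scales = np.ones(len(us))
    for _ in range(steps):
        y = compose(us, vs, scales) @ x_target
        # d/d(alpha_k) of ||y||^2 = 2 * (y . u_k) * (v_k . x_target)
        grad = 2.0 * (y @ us[k]) * (vs[k] @ x_target)
        scales[k] -= lr * grad
    return scales

us, vs = rank_one_components(d=8, n=8)
k = 3  # hypothetical index of the single "German" component
scales = tune_single_scalar(us, vs, k, x_target=vs[k])
W_orig, W_edit = compose(us, vs, np.ones(8)), compose(us, vs, scales)
print(abs(scales[k]) < 1e-6)                        # True: component k is switched off
print(np.allclose(W_edit @ vs[0], W_orig @ vs[0]))  # True: off-target input unchanged
```

    Because the toy input directions are orthonormal, an input that does not use component k is untouched by the edit: a toy analogue of the thread's point that the single-component edit leaves other languages alone.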
  • Goodfire reposted
    Eric Ho
    @ericho_goodfire
    Jun 24
    We're hiring for a bunch of technical GTM roles at @GoodfireAI across forward-deployed engineering, sales, and growth. Come help us understand every model across biology, materials, robotics, language, and more. Apply here or DM me: goodfire.ai/careers
  • Goodfire
    @GoodfireAI
    Jun 23
    Stories have shapes: a comedy rises toward joy; a tragedy falls into loss. Inside an LLM, that’s visible more literally: as an LLM reads a story, its internal activations trace a wandering path that reflects the model’s sense of what kind of story it is reading. (1/5)
    Goodfire
    @GoodfireAI
    Jun 23
    Replying to @GoodfireAI
    Emotions in stories are a simple case study, but the lesson is general: a model's activations, viewed over time, trace trajectories along manifolds. Fully understanding models, and debugging and designing them, means studying how representations change over time! (4/5)
    Goodfire
    @GoodfireAI
    Jun 23
    Read the full post:
    Meandering on Manifolds: The Neural Geometry of Stories Over Time
    From goodfire.ai
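    The thread does not spell out how these trajectories are computed. As a minimal sketch, assuming per-token activation vectors have already been extracted from some layer, one generic way to view them "over time" is to project the sequence onto its top two principal components and trace the resulting 2-D path. PCA here is a swapped-in, generic technique, not necessarily the method in Goodfire's post, and the activations below are synthetic rather than taken from a model.

```python
import numpy as np

def trajectory_2d(acts):
    """Project a (T, d) sequence of per-token activations onto its top-2 principal components."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # one 2-D point per token

def path_length(points):
    """Total distance traveled between consecutive tokens."""
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())

# Synthetic "activations" (not from a real model): a half-circle arc lying in a
# random 2-D plane of a 16-D space, one point per token.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(16, 2)))  # two orthonormal directions
t = np.linspace(0.0, np.pi, 50)
acts = 3.0 + np.cos(t)[:, None] * basis[:, 0] + np.sin(t)[:, None] * basis[:, 1]

traj = trajectory_2d(acts)
print(traj.shape)                                        # (50, 2)
print(np.isclose(path_length(traj), path_length(acts)))  # True: the plane is recovered
```

    When the activations really do lie near a low-dimensional plane, the projection preserves distances along the path; real activations are higher-dimensional and curved, so a 2-D view is only a first look at the manifold.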
  • Goodfire reposted
    Vmax
    @VmaxAI
    Jun 18
    Following the blog post from our collaboration with @GoodfireAI, the arxiv paper for PROPEL is now available.
    Augustine Mavor-Parker
    @MavorParker
    Jun 18
    Replying to @MavorParker
    The arxiv is now live! arxiv.org/abs/2606.18284
  • Goodfire
    @GoodfireAI
    Jun 17
    We're hosting a happy hour at ICML, Wednesday July 8! Come connect with members of the Goodfire team. Learn about our work in neural geometry and other recent publications. Note that space is limited, and we’re prioritizing attendees who are actively engaged in relevant AI
    Goodfire
    @GoodfireAI
    Jun 17
    Sign up for our ICML happy hour here:
    Goodfire ICML Happy Hour · Luma
    From luma.com
  • Goodfire reposted
    Santiago Aranguri
    @santiaranguri
    Jun 12
    Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.
  • Goodfire
    @GoodfireAI
    Jun 11
    Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
    Goodfire
    @GoodfireAI
    Jun 11
    Replying to @GoodfireAI
    If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: goodfire.ai/silico (9/9)
    Build AI models the way you write software
    From goodfire.ai
    Goodfire
    @GoodfireAI
    Jun 11
    Read the full blog post on predictive data debugging:
    Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train
    From goodfire.ai