Goodfire (@GoodfireAI) / X

677 posts

Using interpretability to understand, learn from, and design AI.

San Francisco

Joined August 2024

Pinned
Goodfire
@GoodfireAI
May 7
Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵
3.3M
Goodfire reposted
Lucius Bushnaq ⏹️
@BushnaqLucius
Jun 26
Replying to @bygregorr and @GoodfireAI
Checked. Same story as French and Spanish. The LoRAs wreck Dutch and Swedish performance, the single component edit suppression fine-tune leaves them alone.
2K
Goodfire
@GoodfireAI
Jun 25
Replying to @GoodfireAI
Correction: a plotting error caused the bars in the plot of off-target effects to display at 0.01 nats above the true means. The corrected plot is below:
6K
Goodfire
@GoodfireAI
Jun 25
We removed an LM's ability to speak German by fine-tuning on only 4 German tokens. As part of a 1-day hackathon with our product Silico, we removed a 67M-parameter language model's ability to predict German text, by tuning only a scalar factor on one subcomponent of the weights.
302K
Goodfire
@GoodfireAI
Jun 25
Replying to @GoodfireAI
Plus, that interpretability lets us notice and fix problems. E.g.: initially we tuned the top 16 German-related components, but their labels showed most were about foreign languages in general. So we narrowed to the single component for German alone, improving precision. (5/6)
7.7K
Goodfire
@GoodfireAI
Jun 25
This is an early demo of how parameter decomposition could enable targeted, predictable model editing. Details on this experiment: lesswrong.com/posts/ieoWstub… If you want to run experiments on your model too, learn more and request access to Silico: goodfire.ai/silico
lesswrong.com
Exploration: fine-tuning with parameter decomposition — LessWrong
TL;DR: We can destroy a 67M-parameter language model's ability to predict German text by fine-tuning a single number: the scalar prefactor on one Ger…
7.4K
Goodfire reposted
Eric Ho
@ericho_goodfire
Jun 24
we're hiring for a bunch of technical GTM roles at @GoodfireAI across forward deployed engineering, sales, and growth
come help us understand every model across biology, materials, robotics, language, and more
apply here or DM me: goodfire.ai/careers
23K
Goodfire
@GoodfireAI
Jun 23
Stories have shapes: a comedy rises toward joy; a tragedy falls into loss. Inside an LLM, that’s visible more literally: as an LLM reads a story, its internal activations trace a wandering path that reflects the model’s sense of what kind of story it is reading. (1/5)
102K
Goodfire
@GoodfireAI
Jun 23
Replying to @GoodfireAI
Emotions in stories are a simple case study, but the lesson is general: a model's activations, viewed over time, trace trajectories along manifolds. Fully understanding models, and debugging and designing them, means studying how representations change over time! (4/5)
2.6K
Goodfire
@GoodfireAI
Jun 23
Read the full post:
Meandering on Manifolds: The Neural Geometry of Stories Over Time
From goodfire.ai
4.5K
Goodfire reposted
Vmax
@VmaxAI
Jun 18
Following the blog post from our collaboration with @GoodfireAI, the arxiv paper for PROPEL is now available.
Augustine Mavor-Parker
@MavorParker
Jun 18
Replying to @MavorParker
The arxiv is now live! arxiv.org/abs/2606.18284
3.8K
Goodfire
@GoodfireAI
Jun 17
We're hosting a happy hour at ICML, Wednesday July 8! Come connect with members of the Goodfire team. Learn about our work in neural geometry and other recent publications. Note that space is limited, and we’re prioritizing attendees who are actively engaged in relevant AI
15K
Goodfire
@GoodfireAI
Jun 17
Sign up for our ICML happy hour here:
Goodfire ICML Happy Hour · Luma
From luma.com
2.1K
Goodfire reposted
Santiago Aranguri
@santiaranguri
Jun 12
Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.
2.4K
Goodfire
@GoodfireAI
Jun 11
Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
180K
Goodfire
@GoodfireAI
Jun 11
Replying to @GoodfireAI
If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: goodfire.ai/silico (9/9)
Build AI models the way you write software
From goodfire.ai
4.7K
Goodfire
@GoodfireAI
Jun 11
Read the full blog post on predictive data debugging:
Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train
From goodfire.ai
4.4K