Sebastian Raschka
19.8K posts
Sebastian Raschka
@rasbt
ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (amzn.to/4fqvn0D) & reasoning (mng.bz/lZ5B)
United States
sebastianraschka.com
Joined October 2012
1,156 Following
471K Followers
  • Pinned
    Sebastian Raschka
    @rasbt
    Apr 4
    Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation. Link: magazine.sebastianraschka.com/p/components-o…
    158K
  • Sebastian Raschka
    @rasbt
    Dec 28, 2022
    Looks like the first open source equivalent of ChatGPT has arrived: github.com/lucidrains/PaL… I.e., an implementation of RLHF (Reinforcement Learning with Human Feedback) on top of Google’s 540 billion parameter PaLM architecture
    GitHub - lucidrains/PaLM-rlhf-pytorch: Implementation of RLHF (Reinforcement Learning with Human...
    From github.com
    1.2M
  • Sebastian Raschka
    @rasbt
    Sep 13, 2025
    When I started LLMs-from-scratch I just hoped it might help a few people learn. Just saw that the GitHub repo has now been forked 10k times! More than the stars, the best part is seeing thousands of people actually use and build on the code ☺️
    234K
  • Sebastian Raschka
    @rasbt
    Jul 12, 2025
    Kimi K2 is basically DeepSeek V3 but with fewer heads and more experts:
    551K
  • Sebastian Raschka
    @rasbt
    Feb 9, 2025
    Maybe a hot take, but what about the following advice to the next gen: Don't get an AI degree; the curriculum will be outdated before you graduate. Instead, study math, stats, or physics as your foundation, and stay current with AI through code-focused books, blogs, and papers.
    285K
  • Sebastian Raschka
    @rasbt
    Aug 17, 2025
    Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…
    Sebastian Raschka
    @rasbt
    Aug 14, 2025
    Gemma 3 270M! Great to see another awesome, small open-weight LLM for local tinkering. Here's a side-by-side comparison with Qwen3. Biggest surprise: it only has 4 attention heads!
    350K
  • Sebastian Raschka
    @rasbt
    Nov 27, 2023
    "Simplifying Transformer Blocks" ranks easily among my favorite research papers that I've read this year. Here, the authors look into how the standard transformer block, essential to LLMs, can be simplified without compromising convergence properties and downstream task
    1M
  • Sebastian Raschka
    @rasbt
    Oct 5, 2024
    The Llama 3.2 1B and 3B models are my favorite LLMs -- small but very capable. If you want to understand what the architectures look like under the hood, I implemented them from scratch (one of the best ways to learn): github.com/rasbt/LLMs-fro…
    293K
  • Sebastian Raschka
    @rasbt
    Jun 5, 2019
    Just reorganized and uploaded all the TensorFlow and PyTorch models and methods I implemented for teaching in a fresh GitHub repo -- 80 Jupyter notebooks in total :) github.com/rasbt/deeplear…
  • Sebastian Raschka
    @rasbt
    Aug 28, 2025
    I’ve been working on something new: 📚 Build a Reasoning Model (From Scratch). The first chapters just went live! (The book will cover topics from inference-time scaling to reinforcement learning)
    160K
  • Sebastian Raschka
    @rasbt
    Oct 22, 2024
    "What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance. The concept is relatively simple. The authors delete attention
    195K
  • Sebastian Raschka
    @rasbt
    Dec 17, 2023
    One of the best ways to understand LLMs is to code one from scratch! Last summer, I started working on a new book, "Build a Large Language Model (from Scratch)": manning.com/books/build-a-… I'm excited to share that the first chapters are now available via Manning's early access
    Build a Large Language Model (From Scratch) - Sebastian Raschka
    From manning.com
    680K
  • Sebastian Raschka
    @rasbt
    Oct 26, 2022
    My top-10 study list if I was learning machine learning again: 1. Python 2. Intro Data Science 3. Intro Machine Learning 4. Version Control 5. Intro Algos & Data Structures 6. Intro Linear Algebra 7. Intro Calculus 8. Deep Learning 9. Intro Proba & Stats 10. Parallel Computing
  • Sebastian Raschka
    @rasbt
    Oct 14, 2024
    Just put together a short Jupyter notebook with tips and tricks for reducing memory usage when loading larger and larger models (like LLMs) in PyTorch: github.com/rasbt/LLMs-fro… (PS: This is an LLM example but the same concepts apply to any PyTorch model)
    158K
