Last Friday, at Google HQ in Milan, I had the opportunity to speak about how modern LLMs actually work under the hood, at an event hosted by the Google Developer Group - PoliMi. A big thank you to Mouadh Ltifi and the whole team for organizing everything, and to Google for the hospitality.

Building on the work of Sebastian Raschka and Andrej Karpathy, I demonstrated how you can build a customized Gemma from the 270M blueprint and train it in a reasonable time on consumer hardware, obtaining sound results while learning first-hand about its architecture and design choices: tokenization, embeddings, global and local sliding-window attention, RoPE, RMSNorm, Gated GeLU, QK-Norm, and pre-training tips and tricks.

Everything is implemented in PyTorch and JAX (https://lnkd.in/d95NgzhB). The code runs on GPUs and on macOS Metal, and it is open source for anyone to try and learn from.

In the coming weeks, I will publish a few blog articles that revisit the explanations from this talk and go into more detail on how to build your own Gemma implementation from scratch and on how LLMs work under the hood.

#GoogleMilano #LLM #AI #MachineLearning #Gemmaverse
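P.S. As a small taste of one of the components mentioned above, here is a minimal RMSNorm sketch in PyTorch. This is an illustrative, generic version (class name, eps value, and the example hidden size are my own choices here), not the exact code from the released implementation, which may differ in details such as how the learnable scale is applied.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization, a common choice in Gemma-style transformer blocks."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale by the root mean square of the activations instead of
        # subtracting a mean and dividing by a variance as LayerNorm does.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

# Quick check with illustrative shapes (batch, sequence, hidden):
norm = RMSNorm(dim=640)
x = torch.randn(2, 16, 640)
print(norm(x).shape)  # torch.Size([2, 16, 640])
```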