I built a ~9M-parameter LLM from scratch to understand how these models actually work. It's a vanilla transformer trained on 60K synthetic conversations, in ~130 lines of PyTorch, and it trains in about 5 minutes on a free Colab T4. The fish thinks the meaning of life is food.<p>Fork it and swap the personality for your own character.
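<p>For a rough sense of where a figure like ~9M parameters comes from, here is a back-of-the-envelope sketch in plain Python. The hyperparameters (vocabulary size, width, depth, context length) are hypothetical values picked to land near 9M, not the project's actual configuration, and the layout assumed is a generic vanilla transformer with learned position embeddings, biases on every linear layer, and an output head tied to the token embedding.

```python
def transformer_param_count(vocab_size, d_model, n_layers, d_ff,
                            max_seq_len, tie_output_head=True):
    """Count trainable parameters of a generic vanilla transformer (a sketch)."""
    # Token embedding table plus learned position embedding table.
    embeddings = vocab_size * d_model + max_seq_len * d_model
    # Q, K, V and output projections, each d_model x d_model with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward block: d_model -> d_ff -> d_model, with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + mlp + layer_norms
    # One final LayerNorm before the output head.
    final_norm = 2 * d_model
    # A tied head reuses the token embedding, so it adds no parameters.
    head = 0 if tie_output_head else vocab_size * d_model
    return embeddings + n_layers * per_layer + final_norm + head


# Hypothetical configuration, chosen only for illustration.
HYPOTHETICAL_CONFIG = dict(
    vocab_size=4096,
    d_model=384,
    n_layers=4,
    d_ff=1536,
    max_seq_len=128,
)

total = transformer_param_count(**HYPOTHETICAL_CONFIG)
print(f"{total / 1e6:.2f}M parameters")
```

With these assumed values, the transformer blocks account for most of the total and the embedding table for most of the rest, so changing `d_model` or `n_layers` moves the count far more than changing `max_seq_len`.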
See also <a href="https://x.com/stevesi/status/2036921223150440542" rel="nofollow">https://x.com/stevesi/status/2036921223150440542</a> (<a href="https://xcancel.com/stevesi/status/2036921223150440542" rel="nofollow">https://xcancel.com/stevesi/status/2036921223150440542</a>)