GPT-2 From Scratch to Tiny Stories

LLMs
PyTorch
Building the GPT-2 transformer from scratch
Published: September 10, 2024

Source: "Attention Is All You Need" by Vaswani et al.

The goal of this project is to build a GPT-2 transformer from scratch in PyTorch, make speed and memory optimizations, train a custom tokenizer (and discuss its mechanics), tune the model's hyperparameters for training on the TinyStories dataset across two RTX 3090s, and finally generate stories from the trained model.
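As a preview of where the post is headed, here is a minimal sketch of the kind of pre-norm GPT-2 decoder block built later. The hyperparameters (`n_embd=768`, `n_head=12`) are the GPT-2-small defaults rather than the TinyStories configuration tuned further on, and `nn.MultiheadAttention` stands in for the attention module written from scratch in the post.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm GPT-2 block: causal self-attention + MLP, each with a residual."""

    def __init__(self, n_embd: int = 768, n_head: int = 12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        # Stand-in for the from-scratch attention implemented later in the post.
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # GPT-2 expands the hidden dim 4x
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Boolean causal mask: True entries are positions a token may NOT attend to.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln_1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                # residual connection around attention
        x = x + self.mlp(self.ln_2(x))  # residual connection around the MLP
        return x
```

A full GPT-2 model is then a token-plus-position embedding, a stack of these blocks, a final LayerNorm, and a language-modeling head, all of which the post assembles step by step.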