# ShakespeareGPT
Building and training GPT from scratch, based on Andrej Karpathy's tutorial *Let's build GPT: from scratch, in code, spelled out*.

Dataset: tiny-shakespeare (the original with slight modifications).
## tutorialGPT (following the video)
- `basic_bigramLM.py` : a basic bigram language model with a `generate` method, to get things rolling.
- `tutorial.ipynb` : the basic attention mechanism using `tril`, `masked_fill`, and softmax, plus notes on attention.
- `LMwithAttention.py` : the model extended with a single attention head, token embeddings, and positional embeddings.
- `AttentionBlock.py` : a single attention head as a standalone module.
- `LM_multihead_attention_ffwd.ipynb` : the model extended to multiple concatenated attention heads, with a separate feed-forward layer before `lm_head`.
- `tutorialGPT.ipynb` : the full transformer block: layering, residual connections, better loss evaluation, dropout, and LayerNorm.
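The causal masking trick covered in `tutorial.ipynb` can be sketched as follows (a minimal sketch, not the notebook's exact code; the sequence length and random scores are illustrative):

```python
import torch
import torch.nn.functional as F

# Toy causal self-attention weights for a sequence of length T.
T = 4
torch.manual_seed(0)
scores = torch.randn(T, T)                             # raw scores (q @ k.T)
tril = torch.tril(torch.ones(T, T))                    # lower-triangular causal mask
scores = scores.masked_fill(tril == 0, float("-inf"))  # hide future positions
weights = F.softmax(scores, dim=-1)                    # each row sums to 1
```

After the softmax, every row is a valid probability distribution over past positions only: the `-inf` entries become exact zeros, so token `t` never attends to tokens after `t`.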
## Character Level GPT
Uses a character-level tokenizer. Two versions were trained with different configurations to better understand the impact of hyperparameters such as `n_embed` and `n_heads`.
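A character-level tokenizer simply maps each unique character in the corpus to an integer id. A minimal sketch (the sample text is illustrative):

```python
text = "To be, or not to be"

# Vocabulary = all unique characters, in sorted order.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

Encoding followed by decoding is lossless (`decode(encode(text)) == text`); the trade-off versus BPE is a tiny vocabulary but much longer token sequences.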
Try on Kaggle
- v1:
- notebook
- saved model
- results
- v2:
- notebook
- saved model
- results
## ShakespeareGPT
Uses a byte-pair encoding (BPE) tokenizer.
Try on Kaggle
- `gpt.py` : the full GPT model.
- `dataset.py` : the PyTorch dataset.
- `build_tokenizer.py` : builds a GPT-2-style BPE tokenizer from scratch using Hugging Face `tokenizers`; saved at `tokenizer`.
- `train.py` : the training script: optimizer, config, loss function, train loop, validation loop, and model saving.
- `generate.py` : generates text by loading the model on CPU.
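The actual tokenizer is built with Hugging Face `tokenizers`, but the core idea BPE training automates — repeatedly merging the most frequent adjacent pair of ids into a new id — can be sketched in plain Python (the example ids are illustrative):

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent id pairs and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = [1, 2, 1, 2, 3]
pair = most_frequent_pair(ids)  # -> (1, 2)
merged = merge(ids, pair, 4)    # -> [4, 4, 3]
```

Running this merge step until the vocabulary reaches the target size is, in essence, what `BpeTrainer` does over the whole corpus.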
### Versions
- V1
  - `n_embed = 384`
  - `n_heads = 12`
  - `head_size = 32`
  - `n_layers = 4`
  - `lr = 6e-4`
  - `attn_dropout = 0.1`
  - `block_dropout = 0.1`
  - Train loss: 4.0204
  - Valid loss: 6.2131
- notebook
- saved model
- results
- V2
  - `n_embed = 384`
  - `n_heads = 6`
  - `head_size = 64`
  - `n_layers = 3`
  - `lr = 5e-4`
  - `attn_dropout = 0.2`
  - `block_dropout = 0.2`
  - Train loss: 3.9331
  - Valid loss: 5.9705
- notebook
- saved model
- results
As always, an incredible tutorial by Andrej!