# ShakespeareGPT
Building and training GPT from scratch, based on Andrej Karpathy's tutorial *Let's build GPT: from scratch, in code, spelled out*.

Dataset: tiny-shakespeare (the original with slight modifications).
## tutorialGPT (following the video)
- `basic_bigramLM.py` : a basic bigram language model with a `generate` method, to get things rolling.
- `tutorial.ipynb` : the basic attention mechanism using `tril`, `masked_fill`, and softmax, plus notes on attention.
- `LMwithAttention.py` : the model extended with a single attention head, token embeddings, and positional embeddings.
- `AttentionBlock.py` : a single attention head as a standalone module.
- `LM_multihead_attention_ffwd.ipynb` : the model extended to multiple concatenated attention heads, with a separate feed-forward layer before `lm_head`.
- `tutorialGPT.ipynb` : the full transformer block: layering, residual connections, better loss evaluation, dropout, and LayerNorm.
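The causal masking trick covered in `tutorial.ipynb` can be sketched as follows (a minimal sketch, not the notebook's exact code; the sequence length and random scores are illustrative):

```python
import torch
import torch.nn.functional as F

# Toy causal self-attention weights for a sequence of length T.
T = 4
torch.manual_seed(0)
scores = torch.randn(T, T)                             # raw scores (q @ k.T)
tril = torch.tril(torch.ones(T, T))                    # lower-triangular causal mask
scores = scores.masked_fill(tril == 0, float("-inf"))  # hide future positions
weights = F.softmax(scores, dim=-1)                    # each row sums to 1
```

After the softmax, every row is a valid probability distribution over past positions only: the `-inf` entries become exact zeros, so token `t` never attends to tokens after `t`.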
## Character Level GPT
Uses a character-level tokenizer. Two versions were trained with different configurations to better understand the impact of hyperparameters such as `n_embed` and `n_heads`.
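A character-level tokenizer simply maps each unique character in the corpus to an integer id. A minimal sketch (the sample text is illustrative):

```python
text = "To be, or not to be"

# Vocabulary = all unique characters, in sorted order.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

Encoding followed by decoding is lossless (`decode(encode(text)) == text`); the trade-off versus BPE is a tiny vocabulary but much longer token sequences.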
Try on Kaggle
- v1:
- notebook
- saved model
- results
- v2:
- notebook
- saved model
- results
## ShakespeareGPT
Uses a byte-pair encoding (BPE) tokenizer.
Try on Kaggle
- `gpt.py` : the full GPT model.
- `dataset.py` : the PyTorch dataset.
- `build_tokenizer.py` : builds a GPT-2-style BPE tokenizer from scratch using Hugging Face `tokenizers`; saved at `tokenizer`.
- `train.py` : the training script: optimizer, config, loss function, train loop, validation loop, and model saving.
- `generate.py` : generates text by loading the model on CPU.
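The actual tokenizer is built with Hugging Face `tokenizers`, but the core idea BPE training automates — repeatedly merging the most frequent adjacent pair of ids into a new id — can be sketched in plain Python (the example ids are illustrative):

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent id pairs and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = [1, 2, 1, 2, 3]
pair = most_frequent_pair(ids)  # -> (1, 2)
merged = merge(ids, pair, 4)    # -> [4, 4, 3]
```

Running this merge step until the vocabulary reaches the target size is, in essence, what `BpeTrainer` does over the whole corpus.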
### Versions
- V1
  - `n_embed = 384`
  - `n_heads = 12`
  - `head_size = 32`
  - `n_layers = 4`
  - `lr = 6e-4`
  - `attn_dropout = 0.1`
  - `block_dropout = 0.1`
  - Train loss: 4.0204
  - Valid loss: 6.2131
- notebook
- saved model
- results
- V2
  - `n_embed = 384`
  - `n_heads = 6`
  - `head_size = 64`
  - `n_layers = 3`
  - `lr = 5e-4`
  - `attn_dropout = 0.2`
  - `block_dropout = 0.2`
  - Train loss: 3.9331
  - Valid loss: 5.9705
- notebook
- saved model
- results
As always, an incredible tutorial by Andrej!