Fine-tuning and text generation with OpenAI's GPT-2 on a blog dataset from https://trustmeyourealive.wordpress.com/.
content-extraction : Extracts blog data using the WordPress API (a minimal sketch follows this file list)
dataset : Train, validation, and test datasets built from the extracted content
prepare_data.ipynb : Prepares the data and splits it into train, valid, and test files
text_generation.ipynb : Fine-tunes GPT-2 on the prepared train set and generates text
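
As referenced above, here is a minimal sketch of the extraction step, assuming the public WordPress.com REST API's standard wp/v2 posts route. The endpoint, paging parameters, and field handling are assumptions for illustration; the actual code lives in content-extraction.

```python
# Sketch only: pull published posts from the blog via the WordPress.com REST
# API (wp/v2 posts route, assumed here) and strip HTML tags to plain text.
import html
import re

import requests

SITE = "trustmeyourealive.wordpress.com"
API = f"https://public-api.wordpress.com/wp/v2/sites/{SITE}/posts"

def fetch_posts(per_page=50):
    """Yield plain-text article bodies, paging through the posts endpoint."""
    page = 1
    while True:
        resp = requests.get(API, params={"per_page": per_page, "page": page})
        if resp.status_code != 200:   # past the last page (or an error)
            break
        posts = resp.json()
        if not posts:
            break
        for post in posts:
            raw = post["content"]["rendered"]        # HTML body of the post
            text = re.sub(r"<[^>]+>", " ", raw)      # drop tags
            yield html.unescape(re.sub(r"\s+", " ", text)).strip()
        page += 1

if __name__ == "__main__":
    articles = list(fetch_posts())
    print(f"Fetched {len(articles)} articles")
```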
Total tokens : 246446 (76 articles)
Vocabulary : 50260
Training set (by line) : 2752
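
The counts above can be reproduced approximately with the GPT-2 tokenizer from the transformers library, as sketched below. The file path is an assumption; note that the base GPT-2 vocabulary is 50,257, so a reported size of 50,260 suggests a few added special tokens.

```python
# Rough sketch: count lines and GPT-2 tokens in the prepared training file.
# "dataset/train.txt" is an assumed path, not necessarily the repo layout.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

with open("dataset/train.txt", encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]

total_tokens = sum(len(tokenizer.encode(line)) for line in lines)
print(f"Lines: {len(lines)}")
print(f"Tokens: {total_tokens}")
print(f"Vocabulary size: {len(tokenizer)}")
```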
Code files in the cloned transformers repo that need to be replaced: run_generation.py and run_language_modeling.py (instructions in text_generation.ipynb; example invocations below)
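
Example notebook cells showing how such scripts are typically invoked. This is a sketch only: the flag names follow the run_language_modeling.py and run_generation.py examples shipped with transformers at the time and may differ in newer versions, and all paths, the epoch count, and the prompt are placeholders; text_generation.ipynb has the authoritative commands.

```python
# Notebook cells (IPython "!" shell escapes). Paths and hyperparameters below
# are illustrative placeholders, not values taken from this repo.

# Fine-tune GPT-2 on the prepared line-by-line training file
!python run_language_modeling.py \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train --train_data_file=dataset/train.txt \
    --do_eval --eval_data_file=dataset/valid.txt \
    --line_by_line \
    --num_train_epochs=3 \
    --output_dir=output/gpt2-blog \
    --overwrite_output_dir

# Generate text from the fine-tuned checkpoint
!python run_generation.py \
    --model_type=gpt2 \
    --model_name_or_path=output/gpt2-blog \
    --length=200 \
    --prompt="Some days I wonder"
```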
Frankly, I am in awe and a little shocked - these generated sequences truly sound like me, and I'm quite relieved GPT-3 hasn't been open-sourced (yet):