A re-implementation of the "Extracting Training Data from Large Language Models" paper by Carlini et al. The paper already has an official implementation (https://github.com/ftramer/LM_Memorization), from which I have borrowed parts of the code while improving the readability of a few functions.
However, the official repository does not cover -
I was really fascinated by the paper and wanted to implement it myself. Like the official implementation, I have also included a Samples.md file, which has some of the memorized content that I could extract from GPT-2. Although I was able to find some interesting memorized content, the results still have a few limitations -
Or, directly
pip install -r requirements.txt
The generated samples are ranked according to six membership inference metrics introduced in the paper:
The top 10 samples according to each metric are printed out, and the top 100 samples according to each metric are logged in the outfile. These samples are likely to contain verbatim text from the GPT-2 training data.
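As a rough illustration of how such a ranking can work, here is a minimal sketch of one metric and the top-n selection step. It covers only the zlib-based metric, and the exact form of the ratio (zlib entropy divided by log-perplexity) is an assumption on my part. The names `zlib_entropy`, `zlib_ratio` and `top_n` are hypothetical and are not taken from the scripts in this repository; the log-perplexity is assumed to have been computed separately with the language model.

```python
import zlib


def zlib_entropy(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8")))


def zlib_ratio(log_perplexity: float, text: str) -> float:
    """Assumed form of the zlib metric: zlib entropy / log-perplexity.

    A high value means the model finds the text easy to predict (low
    log-perplexity) even though the text is not trivially compressible.
    """
    return zlib_entropy(text) / log_perplexity


def top_n(samples, scores, n=10):
    """Return the n (sample, score) pairs with the highest scores."""
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [(samples[i], scores[i]) for i in order[:n]]
```

Dividing by a compression-based baseline is meant to push down samples that score well only because they are repetitive, since those compress to very few bytes.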
python extraction_top_n.py --N 5000 --batch_size 20 --outfile top_n_samples.txt
This generates 5000 samples with GPT2-XL. The samples are generated with top-k sampling (k=40) and an empty prompt.
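To make the sampling step concrete, here is a small standard-library sketch of top-k sampling for a single decoding step. It is not the code used by the script; `sample_top_k` is a hypothetical helper that takes a plain list of logits instead of a model output.

```python
import math
import random


def sample_top_k(logits, k=40, rng=random):
    """Sample one token id, restricted to the k highest-logit tokens."""
    k = min(k, len(logits))
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits only (shifted for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

With Hugging Face `transformers`, the same behaviour is usually requested by passing `do_sample=True, top_k=40` to `generate`; that the script does exactly this is my assumption.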
python extraction_temperature_decay.py --N 5000 --batch_size 20 --outfile temperature_decay_samples.txt
This generates 5000 samples with GPT2-XL. The samples are generated by sampling with temperature decay (the softmax temperature decays from 10 to 1 over the first 20 tokens and stays at 1 for all subsequent tokens) and an empty prompt.
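The schedule can be sketched as below. The description only fixes the endpoints (10 and 1) and the 20-token window, so the linear interpolation here is an assumption, and `decayed_temperature` and `softmax_with_temperature` are hypothetical names rather than functions from the script.

```python
import math


def decayed_temperature(step, start=10.0, end=1.0, decay_steps=20):
    """Temperature for the token at position `step` (0-based).

    Assumes a linear decay from `start` to `end` over `decay_steps` tokens,
    then a constant `end` afterwards.
    """
    if step >= decay_steps:
        return end
    return start + (end - start) * step / decay_steps


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, shifted for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A high temperature flattens the distribution, so the first few tokens are more diverse, while the later tokens are sampled at temperature 1 and follow the model's own predictions.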
In the paper, the authors also tried prompting the GPT2-XL model with snippets of text from the web (Common Crawl), which increased the chance of the model generating memorized content.
I used the same ~350 MB sample of the May 2021 Common Crawl that the authors used.
./download_cc.sh
Then,
python extraction_commoncrawl.py --N 5000 --batch_size 20 --outfile commoncrawl_samples.txt
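One simple way to draw prompts from the downloaded text is to cut random fixed-length windows out of it, as in the sketch below. This illustrates the idea and is not the script's actual logic: `sample_prompts` is a hypothetical helper, the 100-character window is an arbitrary choice, and the corpus is passed in as a string because the name of the file written by `download_cc.sh` is not stated here.

```python
import random


def sample_prompts(corpus, num_prompts, prompt_chars=100, rng=random):
    """Return `num_prompts` random substrings of `corpus`, each `prompt_chars` long."""
    if len(corpus) < prompt_chars:
        raise ValueError("corpus is shorter than the requested prompt length")
    prompts = []
    for _ in range(num_prompts):
        # randint is inclusive on both ends, so the last valid start is allowed.
        start = rng.randint(0, len(corpus) - prompt_chars)
        prompts.append(corpus[start:start + prompt_chars])
    return prompts
```

Each snippet would then be tokenized and passed to the model as the prefix for one generated sample.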
All the generated sequences have a final length of at most 256 tokens.
Some interesting outputs that were extracted from GPT-2 can be found here.