Scaling Laws for Language Transfer
1.0.0
Code and models from the blog post Scaling Laws for Language Transfer Learning.
Building on Scaling Laws for Transfer (Hernandez et al., 2021), my experiments explore fine-tuning on non-English languages and try to answer the question: how much does pre-training on English help when transferring across different languages as we vary the dataset size and model size?
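For context, a minimal sketch of the kind of power-law fit used in the Hernandez et al. framing of "effective data transferred", D_T = k * (D_F)^alpha * (N)^beta, where D_F is the fine-tuning dataset size and N is the model size. This is not the repo's actual fitting code, and the data points below are placeholders for illustration only:

```python
# Illustrative sketch: fit D_T = k * D_F**alpha * N**beta in log space.
# The measurements below are placeholders, not results from this repo.
import numpy as np
from scipy.optimize import curve_fit

def log_power_law(x, log_k, alpha, beta):
    """log(D_T) as a linear function of log(D_F) and log(N)."""
    log_df, log_n = x
    return log_k + alpha * log_df + beta * log_n

# Placeholder data: (fine-tuning tokens, model parameters, effective data transferred)
d_f = np.array([1e6, 1e7, 1e8, 1e6, 1e7, 1e8])
n   = np.array([1e7, 1e7, 1e7, 1e8, 1e8, 1e8])
d_t = np.array([2e6, 9e6, 4e7, 5e6, 2e7, 1e8])

params, _ = curve_fit(log_power_law, (np.log(d_f), np.log(n)), np.log(d_t))
log_k, alpha, beta = params
print(f"k={np.exp(log_k):.3g}, alpha={alpha:.2f}, beta={beta:.2f}")
```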
This repo contains the code for:
All English pre-trained models were trained for 26 billion tokens with no repeats: