# The Open Language Models List

v1.0.0
This is a list of permissively licensed language models, i.e. models released under MIT, Apache 2.0, or similarly permissive licenses. We use the term "language model" broadly here to include not only autoregressive models but also models trained with different objectives, such as masked language modeling (MLM).
This work was largely inspired by Stella Biderman's Directory of Generative AI and The Foundation Model Development Cheatsheet. Unlike those two very comprehensive resources, however, this list is meant to be a quick and more focused reference.
> [!IMPORTANT]
> This is still a work in progress. Contributions, corrections, and feedback are very welcome!
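As a quick illustration of the two training objectives mentioned above, the sketch below (assuming the Hugging Face `transformers` library and its `pipeline` API) loads one encoder-style MLM and one autoregressive decoder from the table:

```python
from transformers import pipeline

# Encoder-only model trained with masked language modeling (MLM):
# it fills in a masked token using context from both directions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Language models are [MASK] useful.")[0]["token_str"])

# Decoder-only autoregressive model trained with next-token prediction:
# it continues the prompt one token at a time, left to right.
generate = pipeline("text-generation", model="gpt2")
print(generate("Language models are", max_new_tokens=10)[0]["generated_text"])
```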
| Model | Parameters | Architecture | Encoder | Decoder | MoE | Year | Hugging Face | License |
|---|---|---|---|---|---|---|---|---|
| GPT-1 | 120M | Transformer | - | ✅ | - | 2018 | ? | MIT |
| BERT-Base-Cased | 110M | Transformer | ✅ | - | - | 2018 | ? | Apache 2.0 |
| BERT-Base-Uncased | 110M | Transformer | ✅ | - | - | 2018 | ? | Apache 2.0 |
| BERT-Large-Cased | 340M | Transformer | ✅ | - | - | 2018 | ? | Apache 2.0 |
| BERT-Large-Uncased | 340M | Transformer | ✅ | - | - | 2018 | ? | Apache 2.0 |
| GPT-2-Small | 124M | Transformer | - | ✅ | - | 2019 | ? | MIT |
| GPT-2-Medium | 355M | Transformer | - | ✅ | - | 2019 | ? | MIT |
| GPT-2-Large | 774M | Transformer | - | ✅ | - | 2019 | ? | MIT |
| GPT-2-XL | 1.5B | Transformer | - | ✅ | - | 2019 | ? | MIT |
| T5-Small | 60M | Transformer | ✅ | ✅ | - | 2019 | ? | Apache 2.0 |
| T5-Base | 220M | Transformer | ✅ | ✅ | - | 2019 | ? | Apache 2.0 |
| T5-Large | 770M | Transformer | ✅ | ✅ | - | 2019 | ? | Apache 2.0 |
| T5-3B | 3B | Transformer | ✅ | ✅ | - | 2019 | ? | Apache 2.0 |
| T5-11B | 11B | Transformer | ✅ | ✅ | - | 2019 | ? | Apache 2.0 |
| XLM-RoBERTa-Large | 560M | Transformer | ✅ | - | - | 2019 | ? | MIT |
| XLM-RoBERTa-Base | 250M | Transformer | ✅ | - | - | 2019 | ? | MIT |
| RoBERTa-Base | 125M | Transformer | ✅ | - | - | 2019 | ? | MIT |
| RoBERTa-Large | 355M | Transformer | ✅ | - | - | 2019 | ? | MIT |
| DistilBERT-Base-Cased | 66M | Transformer | ✅ | - | - | 2019 | ? | Apache 2.0 |
| DistilBERT-Base-Uncased | 66M | Transformer | ✅ | - | - | 2019 | ? | Apache 2.0 |
| ALBERT-Base | 12M | Transformer | ✅ | - | - | 2019 | ? | Apache 2.0 |
| ALBERT-Large | 18M | Transformer | ✅ | - | - | 2019 | ? | Apache 2.0 |
| ALBERT-XLarge | 60M | Transformer | ✅ | - | - | 2019 | ? | Apache 2.0 |
| ALBERT-XXLarge | 235M | Transformer | ✅ | - | - | 2019 | ? | Apache 2.0 |
| DeBERTa-Base | 134M | Transformer | ✅ | - | - | 2020 | ? | MIT |
| DeBERTa-Large | 350M | Transformer | ✅ | - | - | 2020 | ? | MIT |
| DeBERTa-XLarge | 750M | Transformer | ✅ | - | - | 2020 | ? | MIT |
| ELECTRA-Small-Discriminator | 14M | Transformer | ✅ | - | - | 2020 | ? | Apache 2.0 |
| ELECTRA-Base-Discriminator | 110M | Transformer | ✅ | - | - | 2020 | ? | Apache 2.0 |
| ELECTRA-Large-Discriminator | 335M | Transformer | ✅ | - | - | 2020 | ? | Apache 2.0 |
| GPT-Neo-125M | 125M | Transformer | - | ✅ | - | 2021 | ? | MIT |
| GPT-Neo-1.3B | 1.3B | Transformer | - | ✅ | - | 2021 | ? | MIT |
| GPT-Neo-2.7B | 2.7B | Transformer | - | ✅ | - | 2021 | ? | MIT |
| GPT-J | 6B | Transformer | - | ✅ | - | 2021 | ? | Apache 2.0 |
| XLM-RoBERTa-XL | 3.5B | Transformer | ✅ | - | - | 2021 | ? | MIT |
| XLM-RoBERTa-XXL | 10.7B | Transformer | ✅ | - | - | 2021 | ? | MIT |
| DeBERTa-v2-XLarge | 900M | Transformer | ✅ | - | - | 2021 | ? | MIT |
| DeBERTa-v2-XXLarge | 1.5B | Transformer | ✅ | - | - | 2021 | ? | MIT |
| DeBERTa-v3-XSmall | 22M | Transformer | ✅ | - | - | 2021 | ? | MIT |
| DeBERTa-v3-Small | 44M | Transformer | ✅ | - | - | 2021 | ? | MIT |
| DeBERTa-v3-Base | 86M | Transformer | ✅ | - | - | 2021 | ? | MIT |
| DeBERTa-v3-Large | 304M | Transformer | ✅ | - | - | 2021 | ? | MIT |
| mDeBERTa-v3-Base | 86M | Transformer | ✅ | - | - | 2021 | ? | MIT |
| GPT-NeoX | 20B | Transformer | - | ✅ | - | 2022 | ? | Apache 2.0 |
| UL2 | 20B | Transformer | ✅ | ✅ | - | 2022 | ? | Apache 2.0 |
| YaLM | 100B | Transformer | - | ✅ | - | 2022 | ? | Apache 2.0 |
| Pythia-14M | 14M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-70M | 70M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-160M | 160M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-410M | 410M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-1B | 1B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-1.4B | 1.4B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-2.8B | 2.8B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-6.9B | 6.9B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Pythia-12B | 12B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-111M | 111M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-256M | 256M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-590M | 590M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-1.3B | 1.3B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-2.7B | 2.7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-6.7B | 6.7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Cerebras-GPT-13B | 13B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| BTLM | 3B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Phi-1 | 1.3B | Transformer | - | ✅ | - | 2023 | ? | MIT |
| Phi-1.5 | 1.3B | Transformer | - | ✅ | - | 2023 | ? | MIT |
| Phi-2 | 2.7B | Transformer | - | ✅ | - | 2023 | ? | MIT |
| RedPajama-INCITE-3B | 2.8B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| RedPajama-INCITE-7B | 6.9B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| FLM | 101B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| MPT-1B | 1.3B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| MPT-7B | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| MPT-7B-8K | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| MPT-30B | 30B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mistral-7B-v0.1 | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mistral-7B-v0.2 | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Mistral-7B-v0.3 | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Falcon-1B | 1B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Falcon-7B | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Falcon-40B | 40B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| TinyLlama | 1.1B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| OpenLLaMA-3B-v1 | 3B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| OpenLLaMA-7B-v1 | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| OpenLLaMA-13B-v1 | 13B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| OpenLLaMA-3B-v2 | 3B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| OpenLLaMA-7B-v2 | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| DeciLM-7B | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Amber | 7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Solar | 10.7B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mixtral-8x7B | 46.7B | Transformer | - | ✅ | ✅ | 2023 | ? | Apache 2.0 |
| OpenMoE-Base-128B | 637M | Transformer | - | ✅ | ✅ | 2023 | ? | Apache 2.0 |
| Mamba-130M | 130M | SSM | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mamba-370M | 370M | SSM | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mamba-790M | 790M | SSM | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mamba-1.4B | 1.4B | SSM | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mamba-2.8B | 2.8B | SSM | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Mamba-2.8B-slimpj | 2.8B | SSM | - | ✅ | - | 2023 | ? | Apache 2.0 |
| OpenBA | 15B | Transformer | ✅ | ✅ | - | 2023 | ? | Apache 2.0 |
| Yi-6B | 6B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Yi-6B-200K | 6B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Yi-9B | 9B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Yi-9B-200K | 9B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Yi-34B-200K | 34B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Persimmon-8B | 8B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Palmyra-3B | 3B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Palmyra-Small-128M | 128M | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Palmyra-Base-5B | 5B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| Palmyra-Large-20B | 20B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| SEA-LION-3B | 3B | Transformer | - | ✅ | - | 2023 | ? | MIT |
| SEA-LION-7B | 7B | Transformer | - | ✅ | - | 2023 | ? | MIT |
| PLaMo-13B | 13B | Transformer | - | ✅ | - | 2023 | ? | Apache 2.0 |
| LiteLlama | 460M | Transformer | - | ✅ | - | 2024 | ? | MIT |
| H2O-Danube | 1.8B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| H2O-Danube2 | 1.8B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Cosmo | 1.8B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| MobiLlama-0.5B | 0.5B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| MobiLlama-0.8B | 0.8B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| MobiLlama-1B | 1.2B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| OLMo-1B | 1B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| OLMo-7B | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| OLMo-7B-Twin-2T | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| OLMo-1.7-7B | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Poro | 34B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Grok-1 | 314B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-8B-1.1T | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-8B-1T | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-8B-800B | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-8B-600B | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-8B-400B | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-8B-200B | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| OpenMoE-34B-200B | 34B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| Jamba | 52B | SSM-Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| JetMoE | 8B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| Mambaoutai | 1.6B | SSM | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Tele-FLM | 52B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Arctic-Base | 480B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| Zamba-7B | 7B | SSM-Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Mixtral-8x22B-v0.1 | 141B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| Granite-7b-base | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Chuxin-1.6B-Base | 1.6B | Transformer | - | ✅ | - | 2024 | ? | MIT |
| Chuxin-1.6B-1M | 1.6B | Transformer | - | ✅ | - | 2024 | ? | MIT |
| Neo | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Yi-1.5-6B | 6B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Yi-1.5-9B | 9B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Yi-1.5-34B | 34B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| GECKO-7B | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Qwen2-0.5B | 0.5B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Qwen2-1.5B | 1.5B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Qwen2-7B | 7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Qwen2-57B-A14B | 57B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| K2 | 65B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Pile-T5-Base | 248M | Transformer | ✅ | ✅ | - | 2024 | ? | Apache 2.0 |
| Pile-T5-Large | 783M | Transformer | ✅ | ✅ | - | 2024 | ? | Apache 2.0 |
| Pile-T5-XL | 2.85B | Transformer | ✅ | ✅ | - | 2024 | ? | Apache 2.0 |
| SmolLM-135M | 135M | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| SmolLM-360M | 360M | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| SmolLM-1.7B | 1.7B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| GRIN | 42B | Transformer | - | ✅ | ✅ | 2024 | ? | MIT |
| OLMoE-1B-7B | 7B | Transformer | - | ✅ | ✅ | 2024 | ? | Apache 2.0 |
| Zamba2-1.2B | 1.2B | SSM-Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Zamba2-2.7B | 2.7B | SSM-Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
| Fox-1-1.6B | 1.6B | Transformer | - | ✅ | - | 2024 | ? | Apache 2.0 |
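Licenses on the Hugging Face Hub occasionally change, so it is worth double-checking a model's declared license before relying on this table. A minimal sketch, assuming the `huggingface_hub` library and public model repositories:

```python
from huggingface_hub import model_info

# Read the license tag declared in each model card's metadata.
# The repository IDs here are examples; substitute the model you care about.
for repo_id in ["gpt2", "bert-base-uncased", "EleutherAI/pythia-160m"]:
    info = model_info(repo_id)
    license_tag = info.card_data.license if info.card_data else "unknown"
    print(f"{repo_id}: {license_tag}")  # e.g. "gpt2: mit"
```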
```bibtex
@misc{hamdy2024openlmlist,
  title  = {The Open Language Models List},
  author = {Mohammed Hamdy},
  url    = {https://github.com/mmhamdy/open-language-models},
  year   = {2024},
}
```