awesome sentence embedding
1.0.0
A curated list of pretrained sentence and word embedding models
| date | paper | citation count | training code | pretrained models |
|---|---|---|---|---|
| - | WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models | N/A | - | RusVectōrēs |
| 2013/01 | Efficient Estimation of Word Representations in Vector Space | 999+ | C | Word2Vec |
| 2014/12 | Word Representations via Gaussian Embedding | 221 | Cython | - |
| 2014/?? | A Probabilistic Model for Learning Multi-Prototype Word Embeddings | 127 | DMTK | - |
| 2014/?? | Dependency-Based Word Embeddings | 719 | C++ | word2vecf |
| 2014/?? | GloVe: Global Vectors for Word Representation | 999+ | C | GloVe |
| 2015/06 | Sparse Overcomplete Word Vector Representations | 129 | C++ | - |
| 2015/06 | From Paraphrase Database to Compositional Paraphrase Model and Back | 3 | Theano | PARAGRAM |
| 2015/06 | Non-distributional Word Vector Representations | 68 | Python | WordFeat |
| 2015/?? | Joint Learning of Character and Word Embeddings | 195 | C | - |
| 2015/?? | SensEmbed: Learning Sense Embeddings for Word and Relational Similarity | 249 | - | SensEmbed |
| 2015/?? | Topical Word Embeddings | 292 | Cython | - |
| 2016/02 | Swivel: Improving Embeddings by Noticing What's Missing | 61 | TF | - |
| 2016/03 | Counter-fitting Word Vectors to Linguistic Constraints | 232 | Python | counter-fitting(broken) |
| 2016/05 | Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec | 91 | Chainer | - |
| 2016/06 | Siamese CBOW: Optimizing Word Embeddings for Sentence Representations | 166 | Theano | Siamese CBOW |
| 2016/06 | Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations | 58 | Go | lexvec |
| 2016/07 | Enriching Word Vectors with Subword Information | 999+ | C++ | fastText |
| 2016/08 | Morphological Priors for Probabilistic Neural Word Embeddings | 34 | Theano | - |
| 2016/11 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks | 359 | C++ | charNgram2vec |
| 2016/12 | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge | 604 | Python | Numberbatch |
| 2016/?? | Learning Word Meta-Embeddings | 58 | - | Meta-Emb(broken) |
| 2017/02 | Offline bilingual word vectors, orthogonal transformations and the inverted softmax | 336 | Python | - |
| 2017/04 | Multimodal Word Distributions | 57 | TF | word2gm |
| 2017/05 | Poincaré Embeddings for Learning Hierarchical Representations | 413 | Pytorch | - |
| 2017/06 | Context encoders as a simple but powerful extension of word2vec | 13 | Python | - |
| 2017/06 | Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints | 99 | TF | Attract-Repel |
| 2017/08 | Learning Chinese Word Representations From Glyphs Of Characters | 44 | C | - |
| 2017/08 | Making Sense of Word Embeddings | 92 | Python | sensegram |
| 2017/09 | Hash Embeddings for Efficient Word Representations | 25 | Keras | - |
| 2017/10 | BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages | 91 | Gensim | BPEmb |
| 2017/11 | SPINE: SParse Interpretable Neural Embeddings | 48 | Pytorch | SPINE |
| 2017/?? | AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP | 161 | Gensim | AraVec |
| 2017/?? | Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics | 25 | C | - |
| 2017/?? | Dict2vec : Learning Word Embeddings using Lexical Dictionaries | 49 | C++ | Dict2vec |
| 2017/?? | Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components | 63 | C | - |
| 2018/04 | Representation Tradeoffs for Hyperbolic Embeddings | 120 | Pytorch | h-MDS |
| 2018/04 | Dynamic Meta-Embeddings for Improved Sentence Representations | 60 | Pytorch | DME/CDME |
| 2018/05 | Analogical Reasoning on Chinese Morphological and Semantic Relations | 128 | - | ChineseWordVectors |
| 2018/06 | Probabilistic FastText for Multi-Sense Word Embeddings | 39 | C++ | Probabilistic FastText |
| 2018/09 | Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks | 3 | TF | SynGCN |
| 2018/09 | FRAGE: Frequency-Agnostic Word Representation | 64 | Pytorch | - |
| 2018/12 | Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia | 17 | Cython | Wikipedia2Vec |
| 2018/?? | Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings | 106 | - | ChineseEmbedding |
| 2018/?? | cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information | 45 | C++ | - |
| 2019/02 | VCWE: Visual Character-Enhanced Word Embeddings | 5 | Pytorch | VCWE |
| 2019/05 | Learning Cross-lingual Embeddings from Twitter via Distant Supervision | 2 | Text | - |
| 2019/08 | An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning | 5 | TF | - |
| 2019/08 | ViCo: Word Embeddings from Visual Co-occurrences | 7 | Pytorch | ViCo |
| 2019/11 | Spherical Text Embedding | 25 | C | - |
| 2019/?? | Unsupervised word embeddings capture latent knowledge from materials science literature | 150 | Gensim | - |
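Several of the pretrained word vectors listed above (for example GloVe, and the text exports of Word2Vec and fastText) are distributed as plain text with one token per line followed by its vector components. The following is a minimal sketch, assuming that layout, of reading such a file and averaging word vectors into a crude sentence vector. The function names `load_text_vectors` and `average_embedding` and the toy data are hypothetical and not part of any listed project; check each project's documentation for its actual file format.

```python
import io


def load_text_vectors(handle):
    """Parse 'token v1 v2 ...' lines into a dict mapping token -> list of floats."""
    vectors = {}
    for line_no, line in enumerate(handle):
        parts = line.rstrip().split(" ")
        # word2vec-style text files start with a "<vocab_size> <dim>" header line.
        # Assumption: a two-field first line is such a header (this would misread
        # a file of 1-dimensional vectors without a header).
        if line_no == 0 and len(parts) == 2:
            continue
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def average_embedding(tokens, vectors):
    """Return the mean vector of in-vocabulary tokens, or None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy stand-in for a real vector file (hypothetical values).
TOY_FILE = "3 2\ncat 1.0 0.0\ndog 0.0 1.0\nsat 0.5 0.5\n"
vectors = load_text_vectors(io.StringIO(TOY_FILE))
sentence_vector = average_embedding(["cat", "sat", "unknown"], vectors)
print(sentence_vector)
```

Out-of-vocabulary tokens are simply skipped here. Subword-based models in the list, such as fastText or BPEmb, handle them differently.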

| date | paper | citation count | code | pretrained models |
|---|---|---|---|---|
| - | Language Models are Unsupervised Multitask Learners | N/A | TF Pytorch, TF2.0 Keras | GPT-2(117M, 124M, 345M, 355M, 774M, 1558M) |
| 2017/08 | Learned in Translation: Contextualized Word Vectors | 524 | Pytorch Keras | CoVe |
| 2018/01 | Universal Language Model Fine-tuning for Text Classification | 167 | Pytorch | ULMFit(English, Zoo) |
| 2018/02 | Deep contextualized word representations | 999+ | Pytorch TF | ELMo(AllenNLP, TF-Hub) |
| 2018/04 | Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling | 26 | Pytorch | LD-Net |
| 2018/07 | Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation | 120 | Pytorch | ELMo |
| 2018/08 | Direct Output Connection for a High-Rank Language Model | 24 | Pytorch | DOC |
| 2018/10 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 999+ | TF Keras Pytorch, TF2.0 MXNet PaddlePaddle TF Keras | BERT(BERT, ERNIE, KoBERT) |
| 2018/?? | Contextual String Embeddings for Sequence Labeling | 486 | Pytorch | Flair |
| 2018/?? | Improving Language Understanding by Generative Pre-Training | 999+ | TF Keras Pytorch, TF2.0 | GPT |
| 2019/01 | Multi-Task Deep Neural Networks for Natural Language Understanding | 364 | Pytorch | MT-DNN |
| 2019/01 | BioBERT: pre-trained biomedical language representation model for biomedical text mining | 634 | TF | BioBERT |
| 2019/01 | Cross-lingual Language Model Pretraining | 639 | Pytorch Pytorch, TF2.0 | XLM |
| 2019/01 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 754 | TF Pytorch Pytorch, TF2.0 | Transformer-XL |
| 2019/02 | Efficient Contextual Representation Learning Without Softmax Layer | 2 | Pytorch | - |
| 2019/03 | SciBERT: Pretrained Contextualized Embeddings for Scientific Text | 124 | Pytorch, TF | SciBERT |
| 2019/04 | Publicly Available Clinical BERT Embeddings | 229 | Text | clinicalBERT |
| 2019/04 | ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission | 84 | Pytorch | ClinicalBERT |
| 2019/05 | ERNIE: Enhanced Language Representation with Informative Entities | 210 | Pytorch | ERNIE |
| 2019/05 | Unified Language Model Pre-training for Natural Language Understanding and Generation | 278 | Pytorch | UniLMv1(unilm1-large-cased, unilm1-base-cased) |
| 2019/05 | HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization | 81 | - | - |
| 2019/06 | Pre-Training with Whole Word Masking for Chinese BERT | 98 | Pytorch, TF | BERT-wwm |
| 2019/06 | XLNet: Generalized Autoregressive Pretraining for Language Understanding | 999+ | TF Pytorch, TF2.0 | XLNet |
| 2019/07 | ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | 107 | PaddlePaddle | ERNIE 2.0 |
| 2019/07 | SpanBERT: Improving Pre-training by Representing and Predicting Spans | 282 | Pytorch | SpanBERT |
| 2019/07 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | 999+ | Pytorch Pytorch, TF2.0 | RoBERTa |
| 2019/09 | Subword ELMo | 1 | Pytorch | - |
| 2019/09 | Knowledge Enhanced Contextual Word Representations | 115 | - | - |
| 2019/09 | TinyBERT: Distilling BERT for Natural Language Understanding | 129 | - | - |
| 2019/09 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | 136 | Pytorch | Megatron-LM(BERT-345M, GPT-2-345M) |
| 2019/09 | MultiFiT: Efficient Multi-lingual Language Model Fine-tuning | 29 | Pytorch | - |
| 2019/09 | Extreme Language Model Compression with Optimal Subwords and Shared Projections | 32 | - | - |
| 2019/09 | MULE: Multimodal Universal Language Embedding | 5 | - | - |
| 2019/09 | Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks | 51 | - | - |
| 2019/09 | K-BERT: Enabling Language Representation with Knowledge Graph | 59 | - | - |
| 2019/09 | UNITER: Learning UNiversal Image-TExt Representations | 60 | - | - |
| 2019/09 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | 803 | TF | - |
| 2019/10 | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | 349 | Pytorch | BART(bart.base, bart.large, bart.large.mnli, bart.large.cnn, bart.large.xsum) |
| 2019/10 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | 481 | Pytorch, TF2.0 | DistilBERT |
| 2019/10 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 696 | TF | T5 |
| 2019/11 | CamemBERT: a Tasty French Language Model | 102 | - | CamemBERT |
| 2019/11 | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations | 15 | Pytorch | - |
| 2019/11 | Unsupervised Cross-lingual Representation Learning at Scale | 319 | Pytorch | XLM-R (XLM-RoBERTa)(xlmr.large, xlmr.base) |
| 2020/01 | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | 35 | Pytorch | ProphetNet(ProphetNet-large-16GB, ProphetNet-large-160GB) |
| 2020/02 | CodeBERT: A Pre-Trained Model for Programming and Natural Languages | 25 | Pytorch | CodeBERT |
| 2020/02 | UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training | 33 | Pytorch | - |
| 2020/03 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | 203 | TF | ELECTRA(ELECTRA-Small, ELECTRA-Base, ELECTRA-Large) |
| 2020/04 | MPNet: Masked and Permuted Pre-training for Language Understanding | 5 | Pytorch | MPNet |
| 2020/05 | ParsBERT: Transformer-based Model for Persian Language Understanding | 1 | Pytorch | ParsBERT |
| 2020/05 | Language Models are Few-Shot Learners | 382 | - | - |
| 2020/07 | InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training | 12 | Pytorch | - |

| date | paper | citation count | code | model_name |
|---|---|---|---|---|
| - | Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings | N/A | Python | AraSIF |
| 2014/05 | Distributed Representations of Sentences and Documents | 999+ | Pytorch Python | Doc2Vec |
| 2014/11 | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models | 849 | Theano Pytorch | VSE |
| 2015/06 | Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books | 795 | Theano TF Pytorch, Torch | SkipThought |
| 2015/11 | Order-Embeddings of Images and Language | 354 | Theano | order-embedding |
| 2015/11 | Towards Universal Paraphrastic Sentence Embeddings | 411 | Theano | ParagramPhrase |
| 2015/?? | From Word Embeddings to Document Distances | 999+ | C, Python | Word Mover's Distance |
| 2016/02 | Learning Distributed Representations of Sentences from Unlabelled Data | 363 | Python | FastSent |
| 2016/07 | Charagram: Embedding Words and Sentences via Character n-grams | 144 | Theano | Charagram |
| 2016/11 | Learning Generic Sentence Representations Using Convolutional Neural Networks | 76 | Theano | ConvSent |
| 2017/03 | Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features | 319 | C++ | Sent2Vec |
| 2017/04 | Learning to Generate Reviews and Discovering Sentiment | 293 | TF Pytorch Pytorch | Sentiment Neuron |
| 2017/05 | Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings | 60 | Theano | GRAN |
| 2017/05 | Supervised Learning of Universal Sentence Representations from Natural Language Inference Data | 999+ | Pytorch | InferSent |
| 2017/07 | VSE++: Improving Visual-Semantic Embeddings with Hard Negatives | 132 | Pytorch | VSE++ |
| 2017/08 | Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm | 357 | Keras Pytorch | DeepMoji |
| 2017/09 | StarSpace: Embed All The Things! | 129 | C++ | StarSpace |
| 2017/10 | DisSent: Learning Sentence Representations from Explicit Discourse Relations | 47 | Pytorch | DisSent |
| 2017/11 | Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations | 128 | Theano | para-nmt |
| 2017/11 | Dual-Path Convolutional Image-Text Embedding with Instance Loss | 44 | Matlab | Image-Text-Embedding |
| 2018/03 | An efficient framework for learning sentence representations | 183 | TF | Quick-Thought |
| 2018/03 | Universal Sentence Encoder | 564 | TF-Hub | USE |
| 2018/04 | End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions | 14 | Theano | DEISTE |
| 2018/04 | Learning general purpose distributed sentence representations via large scale multi-task learning | 198 | Pytorch | GenSen |
| 2018/06 | Embedding Text in Hyperbolic Spaces | 50 | TF | HyperText |
| 2018/07 | Representation Learning with Contrastive Predictive Coding | 736 | Keras | CPC |
| 2018/08 | Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations | 8 | Python | CMD |
| 2018/09 | Learning Universal Sentence Representations with Mean-Max Attention Autoencoder | 14 | TF | Mean-MaxAAE |
| 2018/10 | Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model | 35 | TF-Hub | USE-xling |
| 2018/10 | Improving Sentence Representations with Consensus Maximisation | 4 | - | Multi-view |
| 2018/10 | BioSentVec: creating sentence embeddings for biomedical texts | 70 | Python | BioSentVec |
| 2018/11 | Word Mover's Embedding: From Word2Vec to Document Embedding | 47 | C, Python | WordMoversEmbeddings |
| 2018/11 | A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks | 76 | Pytorch | HMTL |
| 2018/12 | Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond | 238 | Pytorch | LASER |
| 2018/?? | Convolutional Neural Network for Universal Sentence Embeddings | 6 | Theano | CSE |
| 2019/01 | No Training Required: Exploring Random Encoders for Sentence Classification | 54 | Pytorch | randsent |
| 2019/02 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model | 4 | Pytorch | CMOW |
| 2019/07 | GLOSS: Generative Latent Optimization of Sentence Representations | 1 | - | GLOSS |
| 2019/07 | Multilingual Universal Sentence Encoder | 52 | TF-Hub | MultilingualUSE |
| 2019/08 | Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | 261 | Pytorch | Sentence-BERT |
| 2020/02 | SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models | 11 | Pytorch | SBERT-WK |
| 2020/06 | DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations | 4 | Pytorch | DeCLUTR |
| 2020/07 | Language-agnostic BERT Sentence Embedding | 5 | TF-Hub | LaBSE |
| 2020/11 | On the Sentence Embeddings from Pre-trained Language Models | 0 | TF | BERT-flow |
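
The encoders in the table above map a sentence to a fixed-size vector, and such vectors are commonly compared with cosine similarity. The sketch below illustrates that comparison on toy vectors. It does not call any of the listed models, and the names `cosine_similarity` and `rank_by_similarity` as well as the example vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Sort (name, score) pairs from most to least similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy sentence vectors standing in for real encoder output (hypothetical values).
query = [1.0, 0.0]
candidates = {
    "a": [2.0, 0.0],  # same direction as the query
    "b": [0.0, 3.0],  # orthogonal to the query
    "c": [1.0, 1.0],  # 45 degrees from the query
}
ranking = rank_by_similarity(query, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

Cosine similarity ignores vector length, so encoders that output unnormalized vectors can still be compared this way.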