I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.
I did my best to cover as many NLP tasks as possible, but admittedly the list is far from exhaustive, purely because of the limits of my own knowledge. The selected references are also biased toward recent deep learning work. I hope they serve as a starting point when you are about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!
Oct. 13, 2017.
by Kyubyong
Reviewed and updated by YJ Choe on Oct. 18, 2017.
PAPER Automatic Text Scoring Using Neural Networks
PAPER A Neural Approach to Automated Essay Scoring
CHALLENGE Kaggle: The Hewlett Foundation: Automated Essay Scoring
PROJECT EASE (Enhanced AI Scoring Engine)

WIKI Speech recognition
PAPER Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
PAPER WaveNet: A Generative Model for Raw Audio
PROJECT A TensorFlow implementation of Baidu's DeepSpeech architecture
PROJECT Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
CHALLENGE The 5th CHiME Speech Separation and Recognition Challenge
DATA The 5th CHiME Speech Separation and Recognition Challenge
DATA CSTR VCTK Corpus
DATA LibriSpeech ASR corpus
DATA Switchboard-1 Telephone Speech Corpus
DATA TED-LIUM Corpus
DATA Open Speech and Language Resources
DATA Common Voice

WIKI Automatic summarization
BOOK Automatic Text Summarization
PAPER Text Summarization Using Neural Networks
PAPER Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
DATA Text Analytics Conferences (TAC)
DATA Document Understanding Conferences (DUC)

INFO Coreference Resolution
PAPER Deep Reinforcement Learning for Mention-Ranking Coreference Models
PAPER Improving Coreference Resolution by Learning Entity-Level Distributed Representations
CHALLENGE CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
CHALLENGE CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
CHALLENGE SemEval 2018 Task 4: Character Identification on Multiparty Dialogues

PAPER A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
PAPER Neural Network Translation Models for Grammatical Error Correction
PAPER Adapting Sequence Models for Sentence Correction
CHALLENGE CoNLL-2013 Shared Task: Grammatical Error Correction
CHALLENGE CoNLL-2014 Shared Task: Grammatical Error Correction
DATA NUS Non-commercial research/trial corpus license
DATA Lang-8 Learner Corpora
DATA Cornell Movie--Dialogs Corpus
PROJECT Deep Text Corrector
PRODUCT deep grammar

PAPER Grapheme-to-Phoneme Models for (Almost) Any Language
PAPER Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
PAPER Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
PROJECT Sequence-to-Sequence G2P toolkit
PROJECT g2p_en: A Simple Python Module for English Grapheme To Phoneme Conversion
DATA Multilingual Pronunciation Data

PAPER Automatic Sarcasm Detection: A Survey
PAPER Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
PAPER Sarcasm Detection on Twitter: A Behavioral Modeling Approach
CHALLENGE SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
CHALLENGE SemEval-2017 Task 7: Detection and Interpretation of English Puns
DATA Sarcastic comments from Reddit
DATA Sarcasm Corpus V2
DATA Sarcasm Amazon Reviews Corpus

WIKI Symbol grounding problem
PAPER The Symbol Grounding Problem
PAPER From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning
PAPER Encoding of phonology in a recurrent neural model of grounded speech
PAPER Gated-Attention Architectures for Task-Oriented Language Grounding
PAPER Sound-Word2Vec: Learning Word Representations Grounded in Sounds
COURSE Language Grounding to Vision and Control
WORKSHOP Language Grounding for Robotics

WIKI Language identification
PAPER AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS
PAPER Natural Language Processing with Small Feed-Forward Networks
CHALLENGE 2015 Language Recognition Evaluation

WIKI Language model
TOOLKIT KenLM Language Model Toolkit
PAPER Distributed Representations of Words and Phrases and their Compositionality
PAPER Generating Sequences with Recurrent Neural Networks
PAPER Character-Aware Neural Language Models
THESIS Statistical Language Models Based on Neural Networks
DATA Penn Treebank
TUTORIAL TensorFlow Tutorial on Language Modeling with Recurrent Neural Networks

WIKI Lemmatisation
PAPER Joint Lemmatization and Morphological Tagging with LEMMING
TOOLKIT WordNet Lemmatizer
DATA Treebank-3

WIKI Lip reading
PAPER LipNet: End-to-End Sentence-level Lipreading
PAPER Lip Reading Sentences in the Wild
PAPER Large-Scale Visual Speech Recognition
PROJECT Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
PRODUCT Liopa
DATA The GRID audiovisual sentence corpus
DATA The BBC-Oxford 'Multi-View Lip Reading Sentences' (MV-LRS) Dataset

PAPER Neural Machine Translation by Jointly Learning to Align and Translate
PAPER Neural Machine Translation in Linear Time
PAPER Attention Is All You Need
PAPER Six Challenges for Neural Machine Translation
PAPER Phrase-Based & Neural Unsupervised Machine Translation
CHALLENGE ACL 2014 NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
CHALLENGE EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17)
DATA OpenSubtitles2016
DATA WIT3: Web Inventory of Transcribed and Translated Talks
DATA The QCRI Educational Domain (QED) Corpus
PAPER Multi-task Sequence to Sequence Learning
PAPER Unsupervised Pretraining for Sequence to Sequence Learning
PAPER Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
TOOLKIT Subword Neural Machine Translation with Byte Pair Encoding (BPE)
TOOLKIT Multi-Way Neural Machine Translation
TOOLKIT OpenNMT: Open-Source Toolkit for Neural Machine Translation

WIKI Inflection
PAPER Morphological Inflection Generation Using Character Sequence to Sequence Learning
CHALLENGE SIGMORPHON 2016 Shared Task: Morphological Reinflection
DATA sigmorphon2016

WIKI Entity linking
PAPER Robust and Collective Entity Disambiguation through Semantic Embeddings

WIKI Named-entity recognition
PAPER Neural Architectures for Named Entity Recognition
PROJECT OSU Twitter NLP Tools
CHALLENGE Named Entity Recognition in Twitter
CHALLENGE CoNLL 2002 Language-Independent Named Entity Recognition
CHALLENGE Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
DATA CoNLL-2002 NER corpus
DATA CoNLL-2003 NER corpus
DATA NUT Named Entity Recognition in Twitter Shared task
TOOLKIT Stanford Named Entity Recognizer

PAPER Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
PROJECT Paralex: Paraphrase-Driven Learning for Open Question Answering
CHALLENGE SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter
DATA Microsoft Research Paraphrase Corpus
DATA Microsoft Research Video Description Corpus
DATA Pascal Dataset
DATA Flickr Dataset
DATA The SICK data set
DATA PPDB: The Paraphrase Database
DATA WikiAnswers Paraphrase Corpus

PAPER Neural Paraphrase Generation with Stacked Residual LSTM Networks
DATA Neural Paraphrase Generation with Stacked Residual LSTM Networks
CODE Neural Paraphrase Generation with Stacked Residual LSTM Networks
PAPER A Deep Generative Framework for Paraphrase Generation
PAPER Paraphrasing Revisited with Neural Machine Translation

WIKI Parsing
TOOLKIT The Stanford Parser: A statistical parser
TOOLKIT spaCy parser
PAPER Grammar as a Foreign Language
PAPER A fast and accurate dependency parser using neural networks
PAPER Universal Semantic Parsing
CHALLENGE CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
CHALLENGE CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
CHALLENGE CoNLL 2015 Shared Task: Shallow Discourse Parsing
CHALLENGE SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!

WIKI Part-of-speech tagging
PAPER Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
PAPER Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
DATA Treebank-3
TOOLKIT nltk.tag package

WIKI Pinyin input method
PAPER Neural Network Language Model for Chinese Pinyin Input Method Engine
PROJECT Neural Chinese Transliterator

WIKI Question answering
PAPER Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
PAPER Dynamic Memory Networks for Visual and Textual Question Answering
CHALLENGE TREC Question Answering Task
CHALLENGE NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
CHALLENGE CLEF Question Answering Track
CHALLENGE SemEval-2017 Task 3: Community Question Answering
CHALLENGE SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge
DATA MS MARCO: Microsoft MAchine Reading COmprehension Dataset
DATA Maluuba NewsQA
DATA SQuAD: 100,000+ Questions for Machine Comprehension of Text
DATA GraphQuestions: A Characteristic-rich Question Answering Dataset
DATA Story Cloze Test and ROCStories Corpora
DATA Microsoft Research WikiQA Corpus
DATA DeepMind Q&A Dataset
DATA QASent
DATA Textbook Question Answering

WIKI Relationship extraction
PAPER A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm
CHALLENGE SemEval-2018 task 7 Semantic Relation Extraction and Classification in Scientific Papers

WIKI Semantic role labeling
BOOK Semantic Role Labeling
PAPER End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
PAPER Neural Semantic Role Labeling with Dependency Path Embeddings
PAPER Deep Semantic Role Labeling: What Works and What's Next
CHALLENGE CoNLL-2005 Shared Task: Semantic Role Labeling
CHALLENGE CoNLL-2004 Shared Task: Semantic Role Labeling
TOOLKIT Illinois Semantic Role Labeler (SRL)
DATA CoNLL-2005 Shared Task: Semantic Role Labeling

WIKI Sentence boundary disambiguation
PAPER A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
TOOLKIT NLTK Tokenizers
DATA The British National Corpus
DATA Switchboard-1 Telephone Speech Corpus

WIKI Sentiment analysis
INFO Awesome Sentiment Analysis
CHALLENGE Kaggle: UMICH SI650 - Sentiment Classification
CHALLENGE SemEval-2017 Task 4: Sentiment Analysis in Twitter
CHALLENGE SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
PROJECT SenticNet
PROJECT Stanford NLP Group Sentiment Analysis
DATA Multi-Domain Sentiment Dataset (version 2.0)
DATA Stanford Sentiment Treebank
DATA Twitter Sentiment Corpus
DATA Twitter Sentiment Analysis Training Corpus
DATA AFINN: List of English words rated for valence

PAPER Video-based Sign Language Recognition without Temporal Segmentation
PAPER SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition
DATA RWTH-PHOENIX-Weather
DATA ASLLRP
PROJECT SignAll

PAPER Singing voice synthesis based on deep neural networks
PAPER A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
PRODUCT VOCALOID: voice synthesis technology and software developed by Yamaha
CHALLENGE Special Session Interspeech 2016 Singing synthesis challenge "Fill-in the Gap"

WORKSHOP NLP+CSS: Workshops on Natural Language Processing and Computational Social Science
TOOLKIT Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
TOOLKIT Online Variational Bayes for Latent Dirichlet Allocation (LDA)
GROUP The University of Chicago Knowledge Lab

WIKI Source separation
PAPER From Blind to Guided Audio Source Separation
PAPER Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
CHALLENGE Signal Separation Evaluation Campaign (SiSEC)
CHALLENGE CHiME Speech Separation and Recognition Challenge

WIKI Speaker diarisation
PAPER DNN-based speaker clustering for speaker diarisation
PAPER Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
PAPER Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
CHALLENGE Rich Transcription Evaluation

WIKI Speaker recognition
PAPER A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK
PAPER DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION
PAPER Deep Speaker: an End-to-End Neural Speaker Embedding System
PROJECT Voice Vector: which of the Hollywood stars is most similar to my voice?
CHALLENGE NIST Speaker Recognition Evaluation (SRE)
INFO Are there any suggestions for free databases for speaker recognition?
DATA VoxCeleb2: Deep Speaker Recognition

WIKI Speech segmentation
PAPER Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
PAPER Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
PAPER Unsupervised Lexicon Discovery from Acoustic Input
PAPER Weakly supervised spoken term discovery using cross-lingual side information
DATA CALLHOME Spanish Speech

WIKI Speech synthesis
PAPER Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
PAPER WaveNet: A Generative Model for Raw Audio
PAPER Tacotron: Towards End-to-End Speech Synthesis
PAPER Deep Voice 3: 2000-Speaker Neural Text-to-Speech
PAPER Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
DATA The World English Bible
DATA LJ Speech Dataset
DATA Lessac Data
CHALLENGE Blizzard Challenge 2017
PRODUCT Lyrebird
PROJECT The Festvox project
TOOLKIT Merlin: The Neural Network (NN) based Speech Synthesis System

WIKI Speech enhancement
BOOK Speech enhancement: theory and practice
PAPER An Experimental Study on Speech Enhancement Based on Deep Neural Network
PAPER A Regression Approach to Speech Enhancement Based on Deep Neural Networks
PAPER Speech Enhancement Based on Deep Denoising Autoencoder

WIKI Stemming
PAPER A BACKPROPAGATION NEURAL NETWORK TO IMPROVE ARABIC STEMMING
TOOLKIT NLTK Stemmers

WIKI Terminology extraction
PAPER Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection

WIKI Semantic similarity
PAPER A Survey of Text Similarity Approaches
PAPER Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
PAPER Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
CHALLENGE SemEval-2014 Task 3: Cross-Level Semantic Similarity
CHALLENGE SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
CHALLENGE SemEval-2017 Task 1: Semantic Textual Similarity
WIKI Semantic Textual Similarity Wiki

WIKI Text simplification
PAPER Aligning Sentences from Standard Wikipedia to Simple Wikipedia
PAPER Problems in Current Text Simplification Research: New Data Can Help
DATA Newsela Data

WIKI Textual entailment
PROJECT Textual Entailment with TensorFlow
PAPER Textual Entailment with Structured Attentions and Composition
CHALLENGE SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
CHALLENGE SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

WIKI Transliteration
INFO Transliteration of Non-Latin scripts
PAPER A Deep Learning Approach to Machine Transliteration
CHALLENGE NEWS 2016 Shared Task on Transliteration of Named Entities
PROJECT Neural Japanese Transliteration—can you do better than SwiftKey™ Keyboard?

PAPER PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING
PROJECT Deep neural networks for voice conversion (voice style transfer) in Tensorflow
PROJECT An implementation of voice conversion system utilizing phonetic posteriorgrams
CHALLENGE Voice Conversion Challenge 2016
CHALLENGE Voice Conversion Challenge 2018
DATA CMU_ARCTIC speech synthesis databases
DATA TIMIT Acoustic-Phonetic Continuous Speech Corpus

WIKI Word embedding
TOOLKIT Gensim: word2vec
TOOLKIT fastText
TOOLKIT GloVe: Global Vectors for Word Representation
INFO Where to get a pretrained model
PROJECT Pre-trained word vectors
PROJECT Pre-trained word vectors of 30+ languages
PROJECT Polyglot: Distributed word representations for multilingual NLP
PROJECT BPEmb: a collection of pre-trained subword embeddings in 275 languages
CHALLENGE SemEval 2018 Task 10 Capturing Discriminative Attributes
PAPER Bilingual Word Embeddings for Phrase-Based Machine Translation
PAPER A Survey of Cross-Lingual Embedding Models

INFO What is Word Prediction?
PAPER The prediction of character based on recurrent neural network language model
PAPER An Embedded Deep Learning based Word Prediction
PAPER Evaluating Word Prediction: Framing Keystroke Savings
DATA An Embedded Deep Learning based Word Prediction
PROJECT Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?
CHALLENGE SemEval-2018 Task 2, Multilingual Emoji Prediction

WIKI Word segmentation
PAPER Neural Word Segmentation Learning for Chinese
PROJECT Convolutional neural network for Chinese word segmentation
TOOLKIT Stanford Word Segmenter
TOOLKIT NLTK Tokenizers

DATA Word-sense disambiguation
PAPER Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data
DATA Train-O-Matic Data
DATA BabelNet