NLP arsenal mainly includes implementations of NLP competition strategies, tutorials for various tasks, experience posts, learning materials, and conference schedules. If you find it helpful, please give us a star.
This project mainly contains the following contents:
The project is continually being improved. If you have suggestions, please open an issue or contact us by email ([email protected]).
All content was collected and organized by us from publicly available information on the Internet, and the copyright belongs to the original authors. If anything infringes your rights, please contact us immediately and we will handle it promptly.
Compiling this material takes considerable effort. When reposting, please include the GitHub link to this project. Thank you for helping maintain a healthy open source environment.
This section lists competitions that are currently running. They offer generous prizes and suit NLPers with some experience. The end time is the one given on the official website, or the date of the associated conference.
| Field/Conference | Competition | Registration time | End time |
|---|---|---|---|
| Large model | Knowledge Base Q&A Based on General Large Models; Large Language Model Prompt Injection Attack and Defense Competition; Open Source Software Security Application Intelligent Application Development Based on Large Models | 2023.08.23-10.24 2023.8.14-9.28 Same as above | 2023.11.03-11.05 2023.11 Same as above |
| CAIL2023 | 1. Judicial examination; 2. Conversational similar-case retrieval; 3. Similar-case retrieval; 4. Fact determination; 5. Argument understanding; 6. Information extraction; 7. Judicial model | 2023.8-11, see each event's schedule for details | 2023.12 |
| CHIP2023 | Evaluation 1: CHIP-PromptCBLUE medical large model evaluation task (no fine-tuning, parameter fine-tuning); Evaluation 2: Chinese medical text few-shot named entity recognition evaluation task; Evaluation 3: Drug paper document recognition and entity relation extraction task | 2023.8.1-9.27 | 2023.10.27-10.29 |
| SMP2023 | ChatGLM Financial Large Model Challenge | 2023.7.19-8.16 | 2023.9 |
| AI Developer Contest | Chinese semantic sentence recognition and correction challenge multilingual machine translation challenge person post matching challenge 2.0 Automotive Text Rule X Generalization Enhancement Challenge Based on Paper Abstract Text Classification and Keyword Extraction Challenge Machine Translation Quality Assessment Challenge 2023 Campus recruitment resume job application position matching project skills detection challenge Campus recruitment resume information integrity detection challenge cross-domain migration challenge Weibo comment robot ChatGPT Generate Text Detector Bid Entity Extraction Challenge Natural Language-based Software Task Execution Challenge Academic Document Chapter Level Structure Recovery Challenge Academic Document Element Classification Challenge | 2023.5-9, see each event's schedule for details | 2023.10.24 |
| DSTC11 | Track 4: Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems Track 5: Task-oriented Conversational Modeling with Subjective Knowledge | -2023.3 | 2023.8-9 |
| CCMT2023 | Chinese-English and English-Chinese news translation evaluation, co-organized by CCMT and WMT2023; Uyghur-Chinese, Mongolian-Chinese, and Tibetan-Chinese translation evaluation; Translation Quality Estimation Evaluation; Automatic Post-Editing Evaluation; "Belt and Road" Low-Resource Language Machine Translation Task; Chinese-Centered Multilingual Machine Translation Task; Chinese-English Zero-Anaphora Machine Translation Task | -2023.5.10 | 2023.10 |
| Thousand Words Dataset | Text generation, sentiment analysis, reading comprehension, Chinese dialogue, text similarity, semantic parsing, simultaneous machine translation, information extraction, entity linking, low-resource language translation, natural language inference, fact verification, interpretability evaluation, paragraph retrieval, and video semantic understanding; 60 datasets | Now | None yet |
| Chinese Medical Information Processing Challenge Leaderboard (CBLUE) | Current tasks include medical text information extraction (entity recognition, relation extraction), medical term normalization, medical text classification, medical sentence relation judgment, and medical QA, 8 subtasks in total; see the official website | Now | None yet |
This section lists long-running practice competitions with leaderboards, which are convenient for newcomers to NLP.
| Field | Competition | Registration time | End time |
|---|---|---|---|
| Text classification | WEBSHELL Text Testing Learning Competition Medical Diagnosis Dialogue Intention Identification Challenge Zhihu Questions Automatically Annotate (Data) Data Analysis Expert Competition 1: User Emotional Visual Analysis Chinese News Text Title Classification Finance User Comments Classification Chinese Dialogue Sentiment Analysis News Text Classification Text Classification Fighting Attack False Job Recruitment Prediction Internet False News Detection During the Epidemic Period Internet Emotional Identification O2O shop food safety related comments discovery Internet news sentiment analysis automotive industry user opinions topics and emotional recognition film review text sentiment analysis spam classification short text classification competition-Turing federal emotional classification competition-Turing federal medical text classification-FlyAI Chinese spam message recognition - FlyAI English spam classification social networking website message content classification - FlyAI User mall evaluation sentiment analysis - FlyAI Stanford-Sentiment-Treebank Sentiment Analysis - FlyAI COLA English sentence comprehensibility classification - FlyAI Today's Headline News Classification - FlyAI American review website Yelp review prediction competition - FlyAI Thousand Words Dataset: Sentiment Analysis - Baidu AI Studio Kaggle-Contradictory, My Dear Watson Kaggle-Natural Language Processing with Disaster Tweets CLEF 2019 Lab ProtestNews (Document, Sentence, Token) | - - Ended - - - - - - - - - - - - - - Every month 1st - - - - - - - - - - - - - | 2024.02 2024.02 Ended 2022.4.30 2023.01 - - - 2021.12.31 - - - - - - - - On the 27th of every month - - - - - - - - - 2023.1 - - - |
| Text matching | Medical Search Query Relevance Judgment; Quora - Detect Whether Two Questions Are Duplicates - FlyAI; Thousand Words Dataset: Text Similarity; Thousand Words Dataset: Question Matching Robustness; English Text Semantic Similarity; IMDB Review Spoiler Detection; Medical Search Query Relevance Judgment; CCKS2021 Chinese NLP Address Relevance Task (dataset) | - - - - - - -2022.9.30 - | 2024.02 - 2023.1 2023.1 - - 2022.10.7 - |
| Textual entailment | Contradictory, My Dear Watson | - | - |
| Recommender systems | Alibaba Mobile Recommendation Algorithm Challenge; Recommender Systems from Scratch - News Recommendation; Tianchi Newcomer Challenge: Alibaba Mobile Recommendation Algorithm; E-commerce User Purchase Behavior Prediction; Book Recommendation System | - - - - - | 2024.02 - - - - |
| Q&A | Epidemic Government Affairs Q&A Assistant; Medical Intelligent Q&A - FlyAI; 2021 Psychological Dialogue Q&A Challenge; CommonsenseQA Dataset; OpenBookQA Dataset | - - - - - | - - - 2026.4.15 2026.4.15 |
| Semantic parsing | Thousand Words Dataset: Semantic Parsing | - | 2023.1 |
| Summarization | Automatic Summarization of Media Articles; Zhihu Text Summarization; Automatic News Summary Generation; Q&A Summarization and Reasoning (end: 2023.1) | - | - |
| Speech | Chinese Speech Recognition in Everyday Scenarios | - | - |
| Information extraction | CCKS2021 Chinese NLP Address Element Parsing; CCF BDCI Text Entity Recognition and Relation Extraction; Thousand Words Dataset: Information Extraction; English Text Entity Relation Extraction; Legal-Domain Document-Level Multi-Event Detection | - - - - - | 2024.02 - 2023.1.1 - - |
| Entity linking | Thousand Words Dataset: Entity Linking | - | 2023.1.1 |
| Machine translation | Thousand Words Dataset: Low-Resource Language Translation; Machine Translation Domain Adaptation | - - | 2023.1.1 - |
| Named entity recognition | Chinese Named Entity Recognition - FlyAI | - | - |
| Relation extraction | English Text Entity Relation Extraction (with data) | Ended | Ended |
| Stance detection | Chinese Weibo Stance Detection - FlyAI; Weibo Stance Detection | - - | - - |
| Dialogue | MuTual Dataset; Thousand Words Dataset: Spoken Language Understanding in Open-Domain Dialogue; Dialogue System | - - - | 2026.4.15 2023.1.1 - |
| Text2SQL | Yale text to SQL | - | - |
| Reading comprehension | Thousand Words Dataset: Reading Comprehension; Chinese Reading Comprehension Practice Competition - FlyAI; RACE Dataset; RACE-C Dataset; Dream Dataset; C3 Dataset; SciQ Dataset; LogiQA Dataset; MCTest Dataset; OpenBookQA Dataset | - - - - - - - - - - | 2023.1.1 - 2026.4.15 2026.4.15 2026.4.15 2026.4.15 2026.4.15 2026.4.15 2026.4.15 2026.4.15 |
| Graph | HGB-Node Classification; HGB-Link Prediction; HGB-Knowledge-aware Recommendation | 2021.6.28- | 2030.6 |
| Other | Review Sentiment Word Extraction (with data) | Ended | Ended |
This section archives past competitions that have been collected and organized, including data downloads and competition solutions.
| Category | Competitions |
|---|---|
| Text classification | 2018 Franco-Crime Prediction 2018 Fayan Cup - Legal Article Recommendation 2019 Fayan Cup - Element Identification 2019CHIP-Clinical Trial Screening Criteria Short Text Classification The correlation calculation model between "Technical Requirements" and "Technical Achievements" projects in 2019 2020smp Weibo Sentiment Analysis Review 2020 Baidu Artificial Intelligence Open Source Competition-Visual Reading Comprehension Task 2020CCKS COVID-19 Knowledge Graph Construction and Q&A Review - Sub-task 1: Inference of the New Coronavirus Encyclopedia Knowledge Graph Type 2020CCKS COVID-19 Knowledge Graph Construction and Q&A Review - Subtask 2: Prediction of the upper and lower relationships of the COVID-19 concept map 2021SMP-ECISA Chinese Implicit Sentiment Analysis Evaluation 2021DIGIX-Quality Discrimination of Pre-trained Articles Based on Multi-Model Migration 2021 Test Tag Prediction Challenge 2021 Simple Triage Challenge for Non-Standardized Disease Requirements 2021CHIP-Medical Dialogue Clinical Discovery Task 2021CCL-Evaluation on Chinese Space Semantics Understanding 2021CCL-"Maverick Cup" multi-modal humor recognition evaluation 2022 Epidemic Weibo Emotion Recognition Challenge 2022 Simple Triage Challenge for Non-Standardized Disease Requirements 2.0 2022 Machine Translation Quality Assessment Challenge 2022 Text classification and query questions and answers based on paper abstracts 2022 Application Type Identification Challenge 2022 Amazon KDD Cup (task2 Multi-class Product Classification, task3 Product Substitute Identification) 2022 Medical Search Intent Identification Challenge [2022CCF BDCI small sample data classification task](./Previous competitions/Text classification/2022CCF BDCI small sample data classification task.md) 2023CCL Telecom Network Fraud Cases Classification Evaluation |
| Entity linking | 2019CCKS Chinese Short Text Entity Linking; 2020CCKS Entity Linking for Chinese Short Text; 2020CCKS Title-Based Large-Scale Product Entity Retrieval; 2020 Thousand Words Dataset: Entity Linking for Chinese Short Text; 2021SDU@AAAI-Task2-Acronym Disambiguation |
| Entity Identification | 2019 New Internet Finance Entity Discovery 2020CHIP-Traditional Medicine Instructions Entity Identification Challenge 2020CHIP-Chinese Medical Text Naming Entity Recognition 2020CCKS named entity recognition for experimental identification 2020CCKS Medical Entities and Event Extraction for Chinese Electronic Medical Records - Subtask 1: Medical Named Entity Identification 2021 Intelligent Medical Decision Identification and early warning of risk events of Internet public opinion enterprises in 2021 2021 Haitong & Industry-2021 Identification and early warning of risk events of Internet public opinion enterprises |
| Question generation | 2020CHIP-Question Generation from TCM Literature |
| Summary generation | 2020 Fayan Cup - Judicial Summarization; 2021MEDIQA-Summarization of Consumer Health Questions; 2021MEDIQA-Summarization of Multiple Answers; 2021MEDIQA-Summarization of Radiology Reports |
| Syntactic analysis | 2021CCL-Cross-Domain Syntactic Analysis Evaluation; 2021CCL-Chinese translation-Nihao unsupervised Chinese word segmentation evaluation |
| Reading Comprehension | 2018 Machine Reading Comprehension Technology Competition 2019 Fayan Cup-Reading Comprehension 2020 Fayan Cup-Reading Comprehension 2020 Language and Intelligent Technology Competition: Machine Reading Comprehension Task 2021 Haihua AI Challenge·Chinese Reading Comprehension (Technical Group) 2021 Language and Intelligent Technology Competition: Machine Reading Comprehension Task 2021NLPCC-AIDebater |
| Text Matching | 2019 Big Data Challenge Negative and subject judgment of financial information in 2019 2019CHIP-Disease Q&A Transfer Learning Competition 2019CHIP-Clinical Terminology Standardization Task 2019 Fayan Cup-Similar Case Matching 2020 "Public Welfare AI Star" Challenge-The New Coronavirus Epidemic Sentences and Judgment Competition 2020 Real Estate Industry Chat Matching Q&A 2020CHIP-Clinical Terminology Standardization Task 2020 Fa Research Cup - Debate and Digging 2021 Sohu Campus Text Matching Algorithm Competition 2021 Xiaobu Assistant Dialogue Short Text Semantic Matching 2021CHIP-Clinical Terminology Standardization Task |
| dialogue | 2019 SMP Chinese human-computer dialogue technology evaluation 2020 Thousand Words: Multi-Skill Dialogue 2020 Language and Intelligent Technology Competition: Recommended Dialogue Tasks 2021SMP dialogue AI algorithm technology evaluation (small sample dialogue intention recognition and slot extraction, dialogue reference digestion and omission recovery) 2021CCL-Intelligent Dialogue Diagnosis and Treatment Evaluation Competition 2021DSTC10 |
| Text2SQL | 2019 Chinese NL2SQL Challenge; 2020 Language and Intelligent Technology Competition: Semantic Parsing Task |
| Q&A | 2020CCKS COVID-19 Knowledge Graph Construction and Q&A Evaluation - Subtask 4: COVID-19 Encyclopedia Knowledge Graph Q&A Evaluation; 2020 Fayan Cup - Judicial Examination |
| Information extraction | 2020 iFLYTEK Event Draw Challenge 2020 Language and Intelligent Technology Competition: Relationship Extraction Task 2020 Language and Intelligent Technology Competition: Event Extraction Tasks 2020-SemEval Task 6: Definition Extraction from Free Text with the DEFT Corpus 2020CCKS Medical Entities and Event Extraction for Chinese Electronic Medical Records - Subtask 2: Medical Event Extraction 2020CCKS small sample cross-class migration event extraction for the financial field 2020CCKS's chapter-level event subjects and factors extraction for the financial field 2020CHIP-Chinese medical text entity relationship extraction 2021 Language and Intelligent Technology Competition: Multi-form information extraction task 2021 Medical Entity and Relationship Identification Challenge 2021NLPCC-AutoIE 2 2021CHIP-Clinical Discovery Event Extraction Task 2021SDU@AAAI-Task1-Acronym Identification |
| Machine Translation | 2020CCMT-Bilingual, multilingual, pronunciation, quality assessment, corpus filtering 2021 NAACL Simultaneous Workshop: Thousand Words - Simultaneous Machine 2021 Low Resource Multilingual Text Translation Challenge 2021 Domain Migration Machine Translation Challenge 2021CCMT-Bilingual, multilingual, low resources, automatic translation and post-editing, quality evaluation, corpus filtering |
| other | 2018 Fayan Cup - Criminal Prediction 2020NLP Chinese Pre-training Model Generalization Ability Challenge 2020CCKS COVID-19 Knowledge Graph Construction and Q&A Review - Sub-task 3: Link prediction of antiviral drug map for COVID-19 research 2021 Future Cup - Explore the Future of Technology (Paper Recommendation) 2021NLPCC-FewCLUE |
Sources for NLP-related news and articles, covering academia, industry, theory, practice, and current events.
| Platform | Main fields | Self-media |
|---|---|---|
| WeChat official account | Technology | Coggle Data Science, DataFunTalk (industry-oriented) |
| | Industry information | The Heart of Machines, Machine Energy, AI Reporting, AI Frontline, AI Technology Review, Machine Learning Research Group Subscription |
| | Academic | Science Space, PaperWeekly, Zhiyuan Community, Frontier Artificial Intelligence Teaching, Special Knowledge, AINLP, AI TIME On Tao, Xi Xiaoyao’s Cute House, Machine Learning Algorithms and Natural Language Processing (MLNLP) |
| BiliBili | Frontier forums | Zhiyuan Community, AITIME Discussion |
| | Basics | Learn AI from Li Mu |
| Website | Competitions | Coggle Data Science, CompHub |
| | Academic | Papers with Code, AMiner Academic Headlines, Science Space |
Warning: Please carefully evaluate the credibility of third-party platforms and beware of the leakage of important information such as code and data.
| Platform | Computing power | Price | Notes |
|---|---|---|---|
| featurize | 2080Ti, 3090 | 2080Ti (¥2/h), 3090 (¥3.6/h) | Image-based environments, flexible to use; supports remote connection through JupyterLab, VS Code, and PyCharm |
| AutoDL | RTX A5000, 3090, A100 | ¥0.6/h~¥8.5/h | Standalone SSH connection; storage space is limited, but it is very cheap |
| Zhixing Cloud | 1080Ti, 3080, 3090, V100/A100, etc. | ¥2.1/h~¥11/h | Whole machines; supports remote connection (PyCharm/VS Code) |
| Fengyun Platform | ML270 | ¥2.8/h | One-stop AI computing platform; the CPU can be configured incrementally, and fees are charged by running time |
| Hengyuan Cloud | 2080Ti, 3060, 3090, V100, etc. | ¥1.25/h~¥5.5/h | Can be paired with a full CPU and hard drive, offering more freedom than Bithub. Currently in a promotion period with many discounts |
| Parallel cloud | V100, 2080Ti, P100, etc. | Unknown | Compute nodes come from a supercomputer. The number of CPU cores, GPUs, and storage space can be customized. It has a very simple interface and provides a remote Linux desktop, with more flexibility than the three platforms above. Currently in a promotion period with many discounts |
| AI Studio | V100 | Basically free | Developed by Baidu. V100 access can occasionally be applied for, with up to 8 cards free of charge. It mainly uses the PaddlePaddle framework; other frameworks require manual setup. X2Paddle can convert code and models from other frameworks to PaddlePaddle with one click. It can be used for most competitions |
| Tianchi DSW | P100 | Free, limited to 8 hours per session, no limit on the number of sessions | An Alibaba online platform; it cannot be shut down while running |
| Tianchi Laboratory | V100 | Free, 60h/year | Unlike AI Studio, it does not restrict the deep learning framework, but the available time is relatively short |
| Kaggle | K80 | Free, limited to 30 hours per week | Has access to the external Internet |
| Google Colab | K80, T4, P4, P100 | Free, limited to 12 hours per session | Has access to the external Internet; a specific GPU cannot be requested. Users without a Colab Pro subscription are assigned a K80 most of the time |
Catalog of International Academic Conferences and Journals Recommended by the China Computer Federation (CCF), 2022
Catalog of Chinese Science and Technology Journals Recommended by the China Computer Federation (CCF)
dblp: computer science bibliography
AI Conference Deadlines: conference countdown
Conference time record table: updated by Jackie Tseng, Tsinghua Computer Vision and Intelligent Learning Lab
Note: The times below are as given on each conference's official website and have not been converted to Beijing time.
| Conference | Level | Abstract deadline | Paper deadline | Notification | Conference dates | Notes |
|---|---|---|---|---|---|---|
| ICLR (official website, dblp) | * | 2023.9.21 | 2023.9.28 | 2023.11.10 (review), 2024.1.15 (final) | 2024.5.7-5.11 | Vienna |
| ACL (official website, dblp) | CCF-A | | | | | Toronto, Canada |
| NeurIPS (official website, dblp) | CCF-A | | | 2023.9.21 | 2023.12.10-12.16 | New Orleans Ernest N. Morial Convention Center |
| ICML (official website, dblp) | CCF-A | * | ? | ? | 2024.7.21-7.27 | Messe Wien Exhibition Congress Center |
| SIGIR (official website, dblp) | CCF-A | | | | | Taipei, Taiwan |
| WWW (official website, dblp) | CCF-A | 2023.10.5 | 2023.10.12 | 2023.12.1-12.14 (rebuttal), 2024.2.1 (final) | 2024.5.13-5.17 | Singapore |
| AAAI (official website, dblp) | CCF-A | | | 2023.9.27 (phase 1 rejections), 2023.12.19 (final) | 2024.2.20-2.27 | Vancouver, Canada |
| IJCAI (official website, dblp) | CCF-A | | | | | Cape Town, South Africa |
| EMNLP (official website, dblp) | CCF-B | | | 2023.8.22~8.28 (rebuttal), 2023.10.6 | 2023.12.6-12.10 | Singapore |
| NAACL (official website, dblp) | CCF-B | * | 2023.12.15 (ARR), 2024.2.20 (Commitment) | 2024.3.15 | 2024.6.16-6.21 | Mexico City, Mexico |
| COLING (official website, dblp) | CCF-B | * | | | | Gyeongju, Korea |
| CoNLL (official website, dblp) | CCF-C | * | | 2023.10.6 | 2023.12.6-12.7 | Co-located with EMNLP 2023 |
| NLPCC (official website, dblp) | CCF-C | * | | | 2023.10.12-10.15 | Foshan |
| IJCNN (official website, dblp) | CCF-C | * | | | | Queensland, Australia |
| ICONIP (official website) | CCF-C | * | | | | New Delhi, India |
| ACML (official website) | CCF-C | * | 2023.5.26 (Journal) | 2023.8.11-8.18 (rebuttal), 9.8 (final); 2023.7.7 (first review), 9.8 (final) | 2023.11.11-11.14 | İstanbul, Turkey |
| AACL (official website) | * | * | | 2023.8.2-8.9 (rebuttal), 9.4 (final) | 2023.11.1-11.4 | Bali, Indonesia |
| EACL (official website, dblp) | * | * | | | | Kiev, Ukraine, online |
| CCL (official website, dblp) | * | * | | | | Harbin |
| CCKS (official website, dblp) | * | * | | | | Shenyang |
| SMP (official website, dblp) | * | * | | | 2023.11.24-11.26 | Beijing |
| CCMT (official website) | * | * | | | 2023.10.19-10.21 | Jinan, Shandong |
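
Because the conference times in this table are not converted to Beijing time, here is a minimal sketch of doing the conversion yourself. It is only an illustration: the helper name `to_beijing` is hypothetical, the input format follows the `YYYY.M.D` style used in the table, and the default source time zone is assumed to be "Anywhere on Earth" (UTC-12), which many conferences use but which you should confirm on each official website.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones, so no time zone database is required.
BEIJING = timezone(timedelta(hours=8), name="UTC+8")
# Assumption: "Anywhere on Earth" (UTC-12), a common but not universal deadline zone.
AOE = timezone(timedelta(hours=-12), name="AoE")


def to_beijing(deadline: str, source_tz: timezone = AOE) -> datetime:
    """Convert 'YYYY.M.D' or 'YYYY.M.D HH:MM' given in source_tz to Beijing time.

    A date without a time is treated as 23:59 on that day (an assumption).
    """
    if " " in deadline:
        local = datetime.strptime(deadline, "%Y.%m.%d %H:%M")
    else:
        local = datetime.strptime(deadline, "%Y.%m.%d").replace(hour=23, minute=59)
    # Attach the source zone, then express the same instant in Beijing time.
    return local.replace(tzinfo=source_tz).astimezone(BEIJING)


if __name__ == "__main__":
    print(to_beijing("2023.9.28"))                       # date only, assumed AoE
    print(to_beijing("2023.10.12 12:00", timezone.utc))  # explicit time, given in UTC
```

Under these assumptions, a deadline of 2023.9.28 (end of day, AoE) corresponds to 19:59 on 2023.9.29 in Beijing. Pass a different `source_tz` when the official website states another time zone.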