awesome_Chinese_medical_NLP
A curated collection of public resources for Chinese medical NLP: terminology sets, corpora, word vectors, pretrained models, knowledge graphs, named entity recognition, QA, information extraction, and more
Benchmark
- Chinese Medical Information Processing Challenge Leaderboard CBLUE Dataset Baseline CBLUE (Chinese Biomedical Language Understanding Evaluation) was initiated by the Medical Health and Bioinformatics Processing Professional Committee of the Chinese Information Processing Society of China, in the spirit of lawful, open sharing. It is hosted by the Alibaba Cloud Tianchi platform and co-organized by Yidu Cloud (Beijing) Technology Co., Ltd., Ping An Medical Technology, Peking University, Zhengzhou University, Pengcheng Laboratory, Harbin Institute of Technology (Shenzhen), Tongji University, Quark, Alibaba DAMO Academy, and other organizations working on smart healthcare research. It aims to advance Chinese medical NLP technology and its community.
Term set/corpus
- medical-news Chinese medical news crawler
- medical-books Open-source Chinese medical books in LaTeX
- THUOCL Medical vocabulary from the Tsinghua University THUNLP group
- ICD9 Chinese mapping of ICD-9
- ICD10 Chinese mapping of ICD-10
- ICD11 Chinese mapping of ICD-11
- OMAHA Tangram Medical Terminology Collection Sample Data
- Chinese diabetes annotation dataset contains entity annotation and relation annotation
Word vector/pretrained model
- ChineseEHRBert BERT pre-trained on Chinese electronic medical records; evaluated on named entity recognition, question answering, and relation extraction tasks
- MC-BERT ChineseBLUE dataset and model
- bertcner Pre-trained Chinese medical BERT model for named entity recognition
- PCL-MedBERT Pengcheng medical BERT pre-trained model
- medbert Research on applying the BERT model to Chinese clinical natural language processing
- Chinese-Word2vec-Medicine Word vectors for the Chinese biomedical domain
- SMedBERT
- eHealth Building Chinese Biomedical Language Models via Multi-Level Text Discrimination
Word segmentation
- PKUSEG Word segmentation tool; supports selecting a medical-domain model
- cmekg medical word segmentation tool github
- GTS A 922-item Chinese medical word segmentation test set, annotated at two granularities (coarse and fine)
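A test set annotated at two granularities, such as GTS above, is typically scored by comparing predicted word spans against the gold spans at each granularity. The following is a minimal sketch of that scoring, assuming each sample is available as a list of words. The example sentence, its two gold segmentations, and the function names are made up for illustration and are not taken from GTS or any of the tools listed here.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def prf(gold_words, pred_words):
    """Span-level precision, recall, and F1 for one segmented sentence."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical sentence with a coarse and a fine gold segmentation.
gold_coarse = ["慢性胃炎", "患者"]
gold_fine = ["慢性", "胃炎", "患者"]
predicted = ["慢性", "胃炎", "患者"]

coarse_scores = prf(gold_coarse, predicted)
fine_scores = prf(gold_fine, predicted)
```

Scoring the same prediction against both gold standards shows whether a segmenter leans toward coarse or fine units: here the prediction matches the fine segmentation exactly but only partly matches the coarse one.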
Knowledge graph/relation extraction
- cMeKG github Chinese Medical Knowledge Graph
- Ruijin Hospital AI-Assisted Knowledge Graph Construction Contest Entity annotation and entity relation extraction tasks on diabetes clinical guidelines
- OMAHA Knowledge Graph (Drug Indications) Knowledge graph data on drugs and drug indications, constructed by the Open Medical and Healthcare Alliance (OMAHA)
- Medical Knowledge Graph Data (ownthink)
- Patient Event Graph Dataset The Patient Event Graph is a new RDF-based representation model for medical observational data that can clearly represent event types such as clinical examination, diagnosis, and treatment, as well as the temporal relations between events. Built from the electronic medical records of three Grade A tertiary hospitals in Shanghai, the dataset covers 3 specialties, 173,395 medical events, and 501,335 temporal relations between events, and links to 5,313 knowledge base concepts.
- Chinese Symptom Library A dataset of symptom entities and symptom-related triples. The data comes from 8 mainstream health consultation websites, 3 Chinese encyclopedia websites, and electronic medical records. It also includes the results of linking Chinese symptoms to concepts in UMLS.
- Traditional Chinese medicine case knowledge graph Extracts clinical knowledge from medical case records to build a knowledge graph that helps users understand the clinical manifestations associated with traditional Chinese medicine treatments and with diseases (such as "chronic gastritis"), as well as related therapies and health care methods.
- Herbnet Targets traditional Chinese medicine (TCM) research. Based on the characteristics of the TCM domain, it builds a TCM ontology covering TCM diseases, prescriptions, Chinese herbal medicines, chemical components, pharmacological effects, TCM experiments, and chemical experimental methods. A series of databases is then integrated on top of this ontology to build a TCM knowledge graph.
- CHIP2020 Chinese medical text entity relation extraction
- CCKS2020 Novel coronavirus knowledge graph construction and Q&A
- cmekg medical relation extraction tool
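The patient event dataset listed above represents clinical events and their temporal order as RDF triples. The following is a minimal sketch of that idea in plain Python, using (subject, predicate, object) tuples. Every identifier (`ex:event1`, `ex:before`, `ex:Examination`, and so on) is a hypothetical name chosen for illustration and is not the dataset's actual vocabulary.

```python
# Each triple is (subject, predicate, object); all names are hypothetical.
triples = [
    ("ex:event1", "rdf:type", "ex:Examination"),
    ("ex:event2", "rdf:type", "ex:Diagnosis"),
    ("ex:event3", "rdf:type", "ex:Treatment"),
    ("ex:event1", "ex:before", "ex:event2"),
    ("ex:event2", "ex:before", "ex:event3"),
]


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def timeline(start):
    """Follow ex:before links from `start` and return the ordered event types."""
    order, current, seen = [], start, set()
    while current is not None and current not in seen:
        seen.add(current)
        order.append(objects(current, "rdf:type")[0])
        following = objects(current, "ex:before")
        current = following[0] if following else None
    return order


event_order = timeline("ex:event1")
```

Keeping event types and temporal relations as separate triples means a new relation kind can be added without changing how events themselves are described.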
Named entity recognition
- CCKS2017 Medical entity recognition and attribute extraction dataset for Chinese electronic medical records
- CCKS2018 Medical entity recognition and attribute extraction dataset for Chinese electronic medical records
- CCKS2019 Data Download Medical entity recognition and attribute extraction dataset for Chinese electronic medical records
- CHIP2020 Chinese medical text named entity recognition
- CHIP2020 Entity recognition in traditional Chinese medicine drug instructions
- CCKS2020 Medical entity and event extraction for Chinese electronic medical records
- cmekg medical NER tool
- CCKS2021 Medical entity and event extraction for Chinese electronic medical records
QA
- CCIR2019 CCIR 2019 data-query question answering based on electronic medical records
- cMedQA Chinese Medical QA Dataset
- cMedQA2 Chinese Medical QA Dataset
- CMID Chinese medical QA intent understanding dataset
- KGQA Intelligent Q&A System Based on Medical Knowledge Graph
- chatbot-base-on-Knowledge-Graph Dialogue system for the medical vertical domain: deep learning methods parse the question, and a knowledge graph stores and queries the knowledge points
- Chinese medical dialogue data
- webMedQA
- MedDialog The MedDialog dataset contains conversations (in Chinese) between doctors and patients. It has 1.1 million dialogs and 4 million utterances.
- CHIP2020 Question generation for traditional Chinese medicine literature
- NLPEC A Medical Multi-Choice Question Dataset for the National Licensed Pharmacist Examination in China
- CCKS2021 Chinese medical dialogue generation containing entities
- IMCS21 CBLUE@Tianchi Chinese Medical Dialogue Dataset IMCS21
- EMPEC Examinations-for-Medical-PErsonnel-in-Chinese (EMPEC)
Standardization of terminology
- CHIP2019 Clinical Terminology Standardization Task: Yidu Cloud Standardization 7K Dataset
- CHIP2020 Clinical Terminology Standardization Task
Sentence similarity
- "Public Welfare AI Star" Challenge: Novel Coronavirus Epidemic Similar Sentence Judgment Competition The competition compiled nearly 10,000 real-world patient questions on epidemic-related topics such as pneumonia, mycoplasma pneumonia, bronchitis, upper respiratory tract infection, tuberculosis, asthma, pleurisy, emphysema, cold, and hemoptysis, and asked contestants to identify similar patient questions using natural language processing.
Text classification
- CHIP2019 Short text classification of clinical trial screening criteria
Other
- CHIP2018 Question intent matching on a corpus of real Chinese patient health consultations
- CHIP2019 Ping An Medical Technology Disease Q&A Transfer Learning Competition
- CCLUE Chinese clinical natural language processing algorithm evaluation benchmark
- CCKS2021 Content understanding of popular Chinese medical knowledge