Awesome Domain LLM
Since the emergence of large language models represented by ChatGPT, a new wave of research and applications has taken off, and many general-purpose models including LLaMA, ChatGLM, Baichuan, and Qwen have appeared. Practitioners in different fields have since adapted these general models to vertical domains through continued pre-training or instruction fine-tuning.

This project aims to collect and organize open-source models, datasets, and evaluation benchmarks in vertical domains. Everyone is welcome to contribute models, datasets, benchmarks, and other resources not yet included, and to jointly promote the empowerment of large models across all industries!
News
- [2023/11/26] Added the network security model SecGPT, the medical model ChiMed-GPT, the financial model Tongyi-Finance-14B, and the financial evaluation benchmarks FinanceBench and CFBenchmark.
- [2023/11/01] Added the DevOps model DevOps-Model and the evaluation benchmark DevOps-Eval, released by Ant Group and Peking University.
- [2023/10/28] Added the financial model DISC-FinLLM, the medical model AlpaCare, and the ocean model MarineGPT.
- [2023/10/19] Added the mental health model MentalLLaMA, the bilingual biomedical model Taiyi, and the ocean model OceanGPT.
- [2023/10/10] Added OWL, a model for the IT operations field jointly developed by Yunzhi Intelligence Research Institute and Beihang University, together with its evaluation benchmark OWL-Bench. Added LAiW, an evaluation benchmark for Chinese legal models.
- [2023/10/05] Added the mental health model ChatPsychiatrist, the financial model InvestLM, the agricultural model AgriGPT, and the medical model WiNGPT2.
- [2023/10/03] Added two legal evaluation benchmarks: LawBench for the Chinese legal system and LegalBench for the US legal system.
- [2023/10/01] Added DISC-LawLLM, a legal-domain model open-sourced by Fudan University that aims to provide users with professional, intelligent, and comprehensive legal services. Added FinGLM, an open, public-interest, long-term financial model project that uses open source to promote "AI + finance".
- [2023/9/25] Updated Qwen: added the Qwen-14B and Qwen-14B-Chat models and updated Qwen-7B and Qwen-7B-Chat. Compared with the original Qwen-7B, the new version is trained on more data (2.4T tokens) and extends the sequence length from 2048 to 8192, with markedly better Chinese and coding ability.
- [2023/9/22] Added InternLM, a multilingual foundation model open-sourced by Shanghai AI Laboratory and SenseTime, together with the Chinese University of Hong Kong and Fudan University.
- [2023/9/15] Added Zhongjing-LLaMA, a Chinese medical model with a complete training pipeline of pre-training, supervised fine-tuning, and RLHF.
- [2023/9/14] Added WeaverBird, a financial-domain dialogue model with access to local knowledge bases and online search.
- [2023/9/13] Added the judicial model Mingzha, jointly developed by Shandong University, Inspur Cloud, and China University of Political Science and Law.
Table of Contents
- Model
  - General Model
  - Domain Model
    - Medical
    - ⚖ Legal
    - Finance
    - Education
    - ➕ Others
- Dataset
- Evaluation Benchmark
- Appendix
  - Star History
  - Friendly Links
Model
General Model
Domain models are usually obtained by continued pre-training or instruction fine-tuning on top of general models. Here we list commonly used open-source general models.
| Model | Size | Organization | Paper |
|---|---|---|---|
| LLaMA2 | 7B/7B-Chat 13B/13B-Chat 70B/70B-Chat | Meta | paper |
| ChatGLM3-6B | 6B-Base/6B/6B-32K | Tsinghua University | paper |
| Qwen | 1.8B/1.8B-Chat 7B/7B-Chat 14B/14B-Chat 72B/72B-Chat | Alibaba Cloud | paper |
| Baichuan2 | 7B/7B-Chat 13B/13B-Chat | Baichuan Intelligent | paper |
| InternLM | 7B/7B-Chat 20B/20B-Chat | Shanghai AI Laboratory | paper |
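Most of the domain models below are produced by instruction fine-tuning one of these base models. A common data layout for this is the Alpaca-style (instruction, input, output) record; the sketch below is purely illustrative (the field names follow the widely used Alpaca convention, and the medical content is invented, not taken from any dataset in this list).

```python
import json

# Hypothetical Alpaca-style instruction-tuning record. During fine-tuning,
# the instruction and input are assembled into a prompt and the model is
# trained to produce the output tokens.
example = {
    "instruction": "Answer the patient's question as a licensed physician.",
    "input": "I have had a mild fever and sore throat for two days. What should I do?",
    "output": "Rest, stay hydrated, and monitor your temperature. "
              "See a doctor if the fever exceeds 39 degrees or lasts more than three days.",
}

# A typical prompt template wraps the fields with section markers.
prompt = (
    f"### Instruction:\n{example['instruction']}\n\n"
    f"### Input:\n{example['input']}\n\n"
    f"### Response:\n"
)
print(json.dumps(example, ensure_ascii=False, indent=2))
```

Continued pre-training, by contrast, simply feeds raw domain text (textbooks, reports, consultations) with the usual language-modeling objective; the instruction format only appears at the fine-tuning stage.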
Domain Model
Medical
ChiMed-GPT [paper]
- ChiMed-GPT is a Chinese medical model built on Ziya-v2, trained with a complete pipeline of pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF).
AlpaCare [paper]
- The project open-sources the medical model AlpaCare, fine-tuned from LLaMA.
Taiyi [paper]
- The project open-sources Taiyi, a bilingual (Chinese-English) biomedical model that explores multi-task capabilities in bilingual biomedical natural language processing.
MentalLLaMA [paper]
- The project open-sources the mental health model MentalLLaMA, which supports interpretable mental health analysis on social media.
WiNGPT2
- WiNGPT is a GPT-based medical vertical-domain model that aims to integrate professional medical knowledge, information, and data to provide intelligent services such as medical Q&A, diagnostic support, and medical knowledge, improving diagnosis and treatment efficiency and the quality of medical services.
ChatPsychiatrist [paper]
- The project open-sources ChatPsychiatrist, a counseling model fine-tuned from LLaMA-7B that can quickly identify psychological problems and provide tailored treatment recommendations.
Zhongjing-LLaMA [paper]
- The project open-sources the first Chinese medical model with a complete training pipeline of pre-training, supervised fine-tuning, and RLHF; it generalizes well and even approaches the level of professional doctors in some dialogue scenarios. The project also releases a multi-turn dialogue dataset of 70,000 conversations drawn entirely from real doctor-patient interactions. It contains many proactive questions asked by doctors, which helps improve the model's ability to conduct medical inquiry.
DoctorGLM [paper]
- A Chinese medical consultation model based on ChatGLM-6B, fine-tuned on Chinese medical dialogue datasets with LoRA, P-Tuning v2, and other techniques, with deployment support.
BenTsao [paper]
- The project open-sources a set of large language models fine-tuned with Chinese medical instructions, based on LLaMA, Alpaca-Chinese, Bloom, the HuoZi model, and others. Using a medical knowledge graph and medical literature together with the ChatGPT API, the authors built a Chinese medical instruction fine-tuning dataset and used it to instruction-tune the various base models, improving their medical Q&A performance.
Med-ChatGLM
- The project open-sources a ChatGLM-6B model fine-tuned with Chinese medical instructions; the fine-tuning data is the same as BenTsao's.
BianQue [paper]
- The project open-sources a health-domain model. The authors analyzed the single-turn/multi-turn characteristics and doctor-inquiry patterns of current open-source Chinese medical Q&A datasets (MedDialog-CN, IMCS-V2, CHIP-MDCFNPC, MedDG, cMedQA2, Chinese-medical-dialogue-data) and combined them with a self-built health dialogue corpus to construct BianQueCorpus, a health dialogue dataset at the tens-of-millions scale. BianQue was then obtained by full-parameter instruction fine-tuning, with ChatGLM-6B as the initialization model.
HuatuoGPT [paper]
- The project open-sources the medical model HuatuoGPT, including HuatuoGPT-7B trained from Baichuan-7B and HuatuoGPT-13B trained from Ziya-LLaMA-13B-Pretrain-v1.
QiZhenGPT
- This project builds a Chinese medical instruction dataset from the QiZhen medical knowledge base and uses it to instruction fine-tune the Chinese-LLaMA-Plus-7B, CaMA-13B, and ChatGLM-6B models, greatly improving their performance in Chinese medical scenarios.
ChatMed
- The project open-sources the Chinese medical model ChatMed-Consult, trained on ChatMed_Consult_Dataset, a Chinese online-consultation dataset of 500k+ real queries paired with ChatGPT replies, via LoRA fine-tuning of LLaMA-7B.
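Several entries here (ChatMed, MeChat, DoctorGLM, FinGPT) rely on LoRA for parameter-efficient fine-tuning. As a minimal sketch of the idea, assuming nothing about any particular implementation and using toy dimensions chosen only for illustration: a frozen weight matrix W is augmented with a trainable low-rank update B·A, so only a small fraction of the parameters are trained.

```python
import numpy as np

# Minimal LoRA (Low-Rank Adaptation) sketch with numpy. The frozen weight W
# stays untouched; only the low-rank factors A and B are trained, giving
# r * (d_in + d_out) trainable parameters instead of d_out * d_in.
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init

def lora_forward(x, alpha=8.0):
    # y = W x + (alpha / r) * B A x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialized to zero, the adapted layer equals the frozen one,
# so fine-tuning starts exactly from the pre-trained behavior.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 4096 512
```

With d_in = d_out = 64 and rank 4, the update trains 512 parameters versus 4096 for full fine-tuning; for real 7B-scale models the savings are what make single-GPU domain adaptation practical.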
ShenNong-TCM-LLM
- The project open-sources the traditional Chinese medicine (TCM) model ShenNong-TCM-LLM. Starting from an open-source TCM knowledge graph, it uses an entity-centered self-instruct method, calling ChatGPT to build ChatMed_TCM_Dataset, a TCM instruction dataset with 26k+ entries, and then fine-tunes LLaMA on it with LoRA.
XrayGLM
- The project open-sources a Chinese multimodal medical dataset and model, which show remarkable potential in medical imaging diagnosis and multi-turn interactive dialogue.
MedicalGPT
- The project open-sources the medical model MedicalGPT and its training pipeline, which includes incremental pre-training, supervised fine-tuning, RLHF (reward modeling and reinforcement learning), and DPO (direct preference optimization).
Sunsimiao
- The project open-sources the Chinese medical model Sunsimiao, fine-tuned from the baichuan-7B and ChatGLM-6B base models on 100,000 high-quality Chinese medical data entries.
CareGPT
- The project open-sources the medical model CareGPT and also gathers dozens of public medical fine-tuning datasets and open medical large language models, covering LLM training, evaluation, deployment, and more, to promote the rapid development of medical LLMs.
DISC-MedLLM [paper]
- Released by Fudan University, this project provides a medical-domain model and dataset designed for healthcare dialogue scenarios. The model is obtained by instruction fine-tuning Baichuan-13B-Base on the DISC-Med-SFT dataset, effectively aligning with human preferences in medical scenarios and bridging the gap between general-model output and real-world medical dialogue.
PMC-LLaMA [paper]
- The project open-sources the medical model PMC-LLaMA, including the pre-trained MedLLaMA_13B and the instruction-tuned PMC_LLaMA_13B.
ChatDoctor [paper]
- The project open-sources the medical model ChatDoctor, trained on LLaMA.
MING
- The project open-sources the medical model MING; MING-7B is obtained by instruction fine-tuning bloomz-7b and supports medical Q&A, intelligent consultation, and other functions.
IvyGPT
- The project open-sources the medical model IvyGPT, supervised fine-tuned on high-quality medical Q&A data and further trained with reinforcement learning from human feedback.
PULSE
- The project open-sources the Chinese medical model PULSE, instruction fine-tuned on about 4,000,000 examples from the Chinese medical and general domains. It supports a variety of medical NLP tasks, including health education, physician exam questions, report interpretation, structured medical records, and simulated diagnosis and treatment.
HuangDI
- The project open-sources the traditional Chinese medicine model HuangDI. Starting from the Ziya-LLaMA-13B-V1 base model, it first adds TCM textbooks, TCM website data, and other corpora to pre-train a model that understands TCM knowledge, and then applies supervised fine-tuning on a large amount of instruction dialogue data from TCM classical texts plus general instruction data, giving the model the ability to answer questions about TCM classics.
ZhongJing
- The project open-sources the TCM model ZhongJing, which aims to illuminate the profound knowledge of traditional Chinese medicine, carrying forward ancient wisdom alongside modern technological innovation, and ultimately to provide a trustworthy, professional tool for the medical field.
TCMLLM
- The project plans to use large models for TCM clinical decision support (disease diagnosis, prescription recommendation, etc.) and TCM knowledge Q&A, promoting rapid progress in these areas. For the prescription-recommendation task in intelligent TCM clinical diagnosis and treatment, it has released TCMLLM-PR, a prescription recommendation model obtained by fine-tuning ChatGLM on a 68k-entry instruction dataset built from real-world clinical records, medical classics, and TCM textbooks.
MeChat [paper]
- The project open-sources a Chinese mental health support dialogue model and dataset. The model is obtained by 16-bit LoRA instruction fine-tuning of ChatGLM-6B. The dataset was built by using ChatGPT to rewrite real peer-support Q&A into multi-turn mental health dialogues; it contains 56k multi-turn conversations whose topics, vocabulary, and discourse semantics are richer and more diverse, better matching long-term multi-turn dialogue scenarios.
SoulChat [paper]
- The project open-sources the mental health model SoulChat, initialized from ChatGLM-6B and fine-tuned jointly on a million-scale corpus of Chinese long-text counseling instructions and multi-turn empathetic dialogue data.
MindChat
- The project open-sources the mental health model MindChat, trained on about 200,000 high-quality, manually curated multi-turn psychological dialogues covering work, family, study, life, social interaction, safety, and other topics. It aims to help people relieve psychological stress and resolve psychological confusion across four dimensions (psychological counseling, assessment, diagnosis, and treatment) and to improve their mental health.
QiaoBan
- The project open-sources QiaoBan, a children's emotional-companionship dialogue model. Starting from an open-source general model, it is instruction fine-tuned on general-domain human-computer dialogues, single-turn instruction data, and children's emotional companionship dialogue data to build a model suited to emotionally accompanying children.
⚖ Legal
Finance
Tongyi-Finance-14B
- Tongyi-Finance-14B is a large language model for the financial industry. Built by incremental training of the Tongyi Qianwen base model on industry corpora, it strengthens knowledge and scenario capabilities in finance, covering financial Q&A, text classification, information extraction, text generation, reading comprehension, logical reasoning, multimodality, and coding.
DISC-FinLLM [paper]
- DISC-FinLLM is a financial large language model: a multi-expert financial system composed of four modules for different financial scenarios (financial consulting, financial text analysis, financial computation, and financial knowledge retrieval Q&A). These modules show clear advantages in four evaluations covering financial NLP tasks, human exam questions, data analysis, and current-affairs analysis, demonstrating that DISC-FinLLM can provide strong support for a broad range of financial applications.
InvestLM [paper]
- The project open-sources an English financial model fine-tuned from LLaMA-65B.
FinGLM
- FinGLM is committed to building an open, public-interest, long-term financial model project, using openness and open source to promote "AI + finance".
WeaverBird [paper]
- The project open-sources a financial-domain dialogue model fine-tuned on a bilingual Chinese-English financial corpus, with access to local knowledge bases and online search engines.
BBT-FinCUGE-Applications [paper]
- The project open-sources the Chinese financial corpus BBT-FinCorpus, the knowledge-enhanced model BBT-FinT5, and the evaluation benchmark CFLEB.
Cornucopia
- The project builds an instruction dataset from public and crawled Chinese financial Q&A data and uses it to instruction fine-tune LLaMA-family models, improving their Q&A performance in the financial domain.
XuanYuan [paper]
- XuanYuan is the first open-source hundred-billion-parameter Chinese dialogue model, and the first optimized for the Chinese financial domain. Based on BLOOM-176B, it has undergone targeted pre-training and fine-tuning for both the general Chinese domain and the financial domain, so it can handle general questions as well as a wide range of finance-related ones, providing users with accurate and comprehensive financial information and advice.
PIXIU [paper]
- The project open-sources the financial instruction fine-tuning dataset FIT, the financial model FinMA, and the evaluation benchmark FLARE.
FinGPT [paper1] [paper2]
- The project open-sources several financial models, including ChatGLM2-6B+LoRA and LLaMA2-7B+LoRA variants, along with Chinese and English training data drawn from financial news, social media, financial reports, and more.
FLANG [paper]
- The project open-sources the financial model FLANG and the evaluation benchmark FLUE.
Education
Taoli
- The project open-sources a model for the field of international Chinese language education. From more than 500 textbooks and teaching aids, Chinese proficiency test questions, and Chinese learner dictionaries currently circulating in the field, the authors built an international Chinese education resource library, then constructed 88,000 high-quality Q&A examples using various instruction formats and used this data for instruction fine-tuning, giving the model the ability to apply international Chinese education knowledge in concrete scenarios.
EduChat [paper]
- The project open-sources dialogue models for the education vertical. Building on pre-trained models, it studies education dialogue technology, integrates diverse education-domain data, and applies instruction fine-tuning and value alignment to provide rich functions for educational scenarios, such as automatic question generation, homework correction, emotional support, course tutoring, and college entrance exam consultation, serving teachers, students, and parents and helping realize fair, caring, intelligent education tailored to each learner.
➕ Others
Dataset
Evaluation Benchmark
C-Eval [paper]
- C-Eval is a Chinese foundation model evaluation benchmark released by Shanghai Jiao Tong University. It contains 13,948 multiple-choice questions across 52 subjects in four major areas (humanities, social sciences, STEM, and other specialties), spanning middle school through university, graduate, and professional examinations.
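Multiple-choice benchmarks like C-Eval (and CMMLU and MMCU below) are typically scored by exact-match accuracy over predicted option letters. A minimal scorer of this kind might look like the following; the gold labels and predictions are invented for illustration, not drawn from the benchmark.

```python
# Minimal multiple-choice scorer: each item has a gold option letter, and
# the metric is the fraction of predictions that match exactly.
def accuracy(gold: list[str], pred: list[str]) -> float:
    assert len(gold) == len(pred), "one prediction per question"
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)

gold = ["A", "C", "B", "D"]   # invented gold answers
pred = ["A", "C", "D", "D"]   # invented model outputs
print(accuracy(gold, pred))   # 0.75
```

Real harnesses add per-subject averaging and answer extraction from free-form model output, but the core metric is this simple.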
AGIEval [paper]
- AGIEval is a benchmark released by Microsoft to evaluate large models on human cognitive tasks. It includes 20 official, public, high-standard admission and qualification examinations intended for general test-takers, such as general university entrance exams (the Chinese Gaokao and the US SAT), law school admission tests, math competitions, bar exams, and national civil service exams.
Xiezhi [paper]
- Xiezhi is a comprehensive, multidisciplinary, automatically updated domain-knowledge evaluation benchmark released by Fudan University. It covers 13 disciplines (philosophy, economics, law, education, literature, history, natural sciences, engineering, agriculture, medicine, military science, management, and art), 516 specific sub-disciplines, and 249,587 questions.
CMMLU [paper]
- CMMLU is a comprehensive Chinese evaluation benchmark designed to assess language models' knowledge and reasoning abilities in a Chinese context. It covers 67 topics ranging from basic disciplines to advanced professional levels, including natural sciences that require calculation and reasoning, humanities and social sciences that require knowledge, and everyday matters such as Chinese driving rules that require common sense. Many CMMLU tasks have China-specific answers that may not apply in other regions or languages, making it a fully Chinese-oriented benchmark.
MMCU [paper]
- MMCU is a comprehensive Chinese evaluation benchmark with tests from four major fields: medicine, law, psychology, and education.
CG-Eval [paper]
- CG-Eval is a benchmark for evaluating the generation capabilities of Chinese large models, jointly released by Oracle Yi AI Research Institute and LanguageX AI Lab. It contains 11,000 questions of various types across 55 sub-subjects in six major categories: science and engineering, humanities and social sciences, mathematical calculation, physician qualification exams, judicial exams, and certified public accountant exams. CG-Eval uses a composite scoring scheme: non-calculation questions (term explanations and short answers) have standard reference answers and are scored against multiple criteria combined by a weighted sum; for calculation questions, the final result and the solution process are extracted and scored jointly.
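The weighted-sum scoring CG-Eval describes for non-calculation questions can be sketched as below. The criterion names and weights here are invented purely for illustration and are not the benchmark's actual configuration.

```python
# Sketch of a CG-Eval-style composite score: each answer is rated against
# several criteria, and the per-criterion scores are combined by a
# normalized weighted sum.
def composite_score(criteria_scores: dict[str, float],
                    weights: dict[str, float]) -> float:
    total_weight = sum(weights.values())
    return sum(criteria_scores[k] * weights[k] for k in weights) / total_weight

# Hypothetical per-criterion scores for one short-answer response.
scores = {"semantic_similarity": 0.8, "keyword_coverage": 0.6, "fluency": 0.9}
weights = {"semantic_similarity": 0.5, "keyword_coverage": 0.3, "fluency": 0.2}
print(round(composite_score(scores, weights), 3))  # 0.76
```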
CBLUE [paper]
- CBLUE is a Chinese biomedical language understanding evaluation benchmark comprising 8 Chinese medical language understanding tasks.
PromptCBLUE [paper]
- PromptCBLUE is an evaluation benchmark for Chinese medical scenarios. Built on top of the CBLUE benchmark, it converts NLP tasks from 16 different medical scenarios into prompt-based language generation tasks.
LAiW [paper]
- LAiW is a Chinese legal model evaluation benchmark with 13 basic tasks across three ability levels: 1) basic legal NLP, which evaluates basic legal tasks, basic NLP tasks, and legal information extraction, including legal article recommendation, element recognition, named entity recognition, judicial summarization, and case recognition; 2) basic legal application, which evaluates how well a large model applies legal knowledge, including dispute focus mining, case matching, criminal judgment prediction, civil judgment prediction, and legal Q&A; 3) complex legal application, which evaluates the complex application of legal knowledge, including judicial reasoning generation, case understanding, and legal consultation.
LawBench [paper]
- LawBench is a legal evaluation benchmark for the Chinese legal system. It models three dimensions of judicial cognition and selects 20 tasks for evaluating large models. Compared with existing benchmarks consisting only of multiple-choice questions, LawBench includes more task types closely tied to real-world applications, such as legal entity recognition, reading comprehension, crime-amount calculation, and consultation.
LegalBench [paper]
- LegalBench is a legal evaluation benchmark for the US legal system, including 162 legal reasoning tasks.
LEXTREME [paper]
- LEXTREME is a multilingual legal evaluation benchmark that contains 11 evaluation data sets in 24 languages.
LexGLUE [paper]
- LexGLUE is an English-language legal evaluation benchmark.
CFBenchmark [paper]
- CFBenchmark is a benchmark for evaluating how well large language models assist with work in Chinese financial scenarios. Its basic version contains 3,917 financial texts organized into three aspects (financial recognition, financial classification, and financial generation) and eight tasks.
FinanceBench [paper]
- FinanceBench is a benchmark for open-ended financial question answering, containing 10,231 questions about listed companies together with their answers.
FinEval [paper]
- FinEval is a financial knowledge evaluation benchmark containing 4,661 high-quality multiple-choice questions across 34 academic subjects in finance, economics, accounting, and professional certification.
FLARE [paper]
- FLARE is a financial evaluation benchmark that includes tasks such as understanding and prediction of financial knowledge.
CFLEB [paper]
- CFLEB is a Chinese financial evaluation benchmark, which includes two language generation tasks and four language comprehension tasks.
FLUE [paper]
- FLUE is a financial evaluation benchmark that contains 5 financial field data sets.
GeoGLUE [paper]
- GeoGLUE is a geographic semantic understanding evaluation benchmark jointly released by Alibaba DAMO Academy and Gaode, aiming to advance geography-related text processing technology and its community. Drawing on typical scenarios such as map search, e-commerce logistics, government registration, and financial transportation, it defines six core tasks: address element parsing, geographic entity alignment, Query-POI retrieval, Query-POI relevance ranking, address Query component analysis, and Where-What segmentation.
OWL-Bench [paper]
- OWL-Bench is a bilingual evaluation benchmark for the IT operations field. It contains 317 Q&A questions and 1,000 multiple-choice questions covering many real industrial scenarios across nine subfields (information security, applications, system architecture, software architecture, middleware, networking, operating systems, infrastructure, and databases) to ensure diversity.
DevOps-Eval
- DevOps-Eval is an evaluation benchmark for large language models in the DevOps field, released by Ant Group and Peking University.
Appendix
Star History
Friendly Links
- Awesome Code LLM
- The project collects papers related to code models and compiles a survey.
- CodeFuse-ChatBot
- CodeFuse-ChatBot is an open-source AI assistant developed by the Ant CodeFuse team, committed to simplifying and optimizing every stage of the software development life cycle.
- Awesome AIGC Tutorials
- The project collects a variety of selected tutorials and resources on AIGC, suitable for both beginners and advanced AI enthusiasts.
- Awesome Tool Learning
- The project gathers resources on tool learning, including papers, frameworks, and applications.
- Awesome LLM reasoning
- The project collects resources on reasoning with large language models, including papers, datasets, and more.