# Deep Learning Paper

v1.0.0
This is a collection of papers related to NLP and Deep Learning that I have read, organized from basic to advanced. You can also check my Korean paper reviews by clicking the links in the tables.
You can find more paper reviews, code implementations, and mathematical explanations on my blog: https://cartinoe5930.tistory.com/
I have also written several articles explaining selected Deep Learning topics in detail. These articles can be found in the table below.
| Title | Blog link |
|---|---|
| How has scaling law developed in NLP? | https://cartinoe5930.tistory.com/entry/How-has-scaling-law-developed-in-NLP-%F0%9F%A4%94-NLP%EC%97%90%EC%84%9C-scaling-law%EB%8A%94-%EC%96%B4%EB%96%BB%EA%B2%8C-%EB%B0%9C%EC%A0%84%EB%90%98%EC%97%88%EC%9D%84%EA%B9%8C |
| Closed-source? Open-source? What are they? | https://cartinoe5930.tistory.com/entry/The-hopes-of-researchers-Open-source-%F0%9F%A4%97-%EC%97%B0%EA%B5%AC%EC%9E%90%EB%93%A4%EC%9D%98-%ED%9D%AC%EB%A7%9D-Open-source-%F0%9F%A4%97 |
| Context window of LMs: should it be long or short? | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| What is the optimal way to evaluate an LM? | https://cartinoe5930.tistory.com/entry/LM%EC%9D%84-%EA%B0%80%EC%9E%A5-%EC%B5%9C%EC%A0%81%EC%9C%BC%EB%A1%9C-%ED%8F%89%EA%B0%80%ED%95%A0-%EC%88%98-%EC%9E%88%EB%8A%94-%EB%B0%A9%EB%B2%95%EC%9D%80-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C-%F0%9F%98%8E |
| Is the performance of ChatGPT getting worse?! | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-%EC%84%B1%EB%8A%A5%EC%9D%B4-%EC%95%88-%EC%A2%8B%EC%95%84%EC%A7%80%EA%B3%A0-%EC%9E%88%EB%8B%A4%EA%B5%AC-%F0%9F%98%B2%F0%9F%98%B2 |
| You can fine-tune too, with PEFT! | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
| Let's think step by step, like humans! | https://cartinoe5930.tistory.com/entry/%ED%95%9C-%EB%8B%A8%EA%B3%84-%ED%95%9C-%EB%8B%A8%EA%B3%84%EC%94%A9-%EC%9D%B8%EA%B0%84%EC%B2%98%EB%9F%BC-%EC%83%9D%EA%B0%81%ED%95%B4%EB%B3%B4%EC%9E%90-%F0%9F%A7%A0%F0%9F%A4%94 |
| How fine-tuning methods evolved: from fine-tuning to RLHF | https://cartinoe5930.tistory.com/entry/Fine-tuning-method%EC%9D%98-%EC%A7%84%ED%99%94-%EA%B3%BC%EC%A0%95-%F0%9F%A6%96%E2%9E%A1%EF%B8%8F%F0%9F%A7%91 |
| It's time to fine-tune ChatGPT! ⏰ | https://cartinoe5930.tistory.com/entry/%EC%9D%B4%EC%A0%9C%EB%8A%94-ChatGPT%EB%A5%BC-fine-tuning-%ED%95%A0-%EC%8B%9C%EA%B0%84-%E2%8F%B0 |
| Noise makes LLM better! - NEFTune | https://cartinoe5930.tistory.com/entry/Noise-makes-LLM-better-NEFTune-%F0%9F%98%89 |
## Word Embeddings and RNN-Family Models

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Embedding Matrix | https://wikidocs.net/book/2155 | https://cartinoe5930.tistory.com/entry/Embedding-Matrix-%ED%95%99%EC%8A%B5 |
| LSTM: Long Short-Term Memory | https://colah.github.io/posts/2015-08-Understanding-LSTMs/ | https://cartinoe5930.tistory.com/entry/%EC%95%8C%EA%B8%B0-%EC%89%BD%EA%B2%8C-LSTM-networks-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0 |
| GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | https://arxiv.org/abs/1406.1078 | https://cartinoe5930.tistory.com/entry/GRU-Empirical-Evaluation-of-Gated-Recurrent-Neural-Networks-on-Sequence-Modeling-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LSTM vs. GRU: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | https://arxiv.org/abs/1412.3555 | https://cartinoe5930.tistory.com/entry/LSTM-vs-GRU-%EB%AD%90%EA%B0%80-%EB%8D%94-%EB%82%98%EC%9D%84%EA%B9%8C-Empirical-Evaluation-of-Gated-Recurrent-Neural-Networks-on-Sequence-Modeling-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
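For quick reference, the LSTM update from the Colah post above can be written as follows (standard notation: $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication):

```math
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

The GRU merges the forget and input gates into a single update gate and drops the separate cell state, which is why the LSTM-vs-GRU paper above can compare them so directly.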
## Transformers and Pre-trained Language Models

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Transformer: Attention Is All You Need | https://arxiv.org/abs/1706.03762 | https://cartinoe5930.tistory.com/entry/Transformer-Attention-Is-All-You-Need-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ELMo: Deep contextualized word representations | https://arxiv.org/abs/1802.05365 | https://cartinoe5930.tistory.com/entry/Pre-trained-Language-Modeling-paper-reading1-ELMo-Deep-contextualized-word-representations |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://arxiv.org/abs/1810.04805 | https://cartinoe5930.tistory.com/entry/Pre-trained-Language-Modeling-paper-reading2-BERT-Pre-training-of-Deep-Bidirectional-Transformers-for-Language-Understanding |
| GPT-1: Improving Language Understanding by Generative Pre-Training | https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf | https://cartinoe5930.tistory.com/entry/Pre-trained-Language-Modeling-paper-reading3-GPT-1-Improving-Language-Understanding-by-Generative-Pre-Training |
| GPT-2: Language Models are Unsupervised Multitask Learners | https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf | https://cartinoe5930.tistory.com/entry/GPT-2-Language-Models-are-Unsupervised-Multitask-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| GPT-3: Language Models are Few-Shot Learners | https://arxiv.org/abs/2005.14165 | https://cartinoe5930.tistory.com/entry/GPT-3-Language-Models-are-Few-Shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | https://arxiv.org/abs/1901.02860 | https://cartinoe5930.tistory.com/entry/Transformer-XL-Attentive-Language-Models-Beyond-a-Fixed-Length-Context-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Sparse Transformers: Generating Long Sequences with Sparse Transformers | https://arxiv.org/abs/1904.10509 | https://cartinoe5930.tistory.com/entry/Sparse-Transformers-Generating-Long-Sequence-with-Sparse-Transformers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| XLNET: Generalized Autoregressive Pretraining for Language Understanding | https://arxiv.org/abs/1906.08237 | https://cartinoe5930.tistory.com/entry/XLNet-Generalized-Autoregressive-Pretraining-for-Language-Understanding-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| SpanBERT: Improving Pre-training by Representing and Predicting Spans | https://arxiv.org/abs/1907.10529 | https://cartinoe5930.tistory.com/entry/SpanBERT-Improving-Pre-training-by-Representing-and-Predicting-Spans-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| RoBERTa: A Robustly Optimized BERT Pre-training Approach | https://arxiv.org/abs/1907.11692 | https://cartinoe5930.tistory.com/entry/RoBERTa-A-Robustly-Optimized-BERT-Pretraining-Approach-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | https://arxiv.org/abs/1908.10084 | https://cartinoe5930.tistory.com/entry/Sentence-BERT-Sentence-Embeddings-using-Siamese-BERT-Networks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | https://arxiv.org/abs/1909.11942 | https://cartinoe5930.tistory.com/entry/ALBERT-A-Lite-BERT-for-Self-supervised-Learning-of-Language-Representations-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | https://arxiv.org/abs/1910.13461 | https://cartinoe5930.tistory.com/entry/BART-Denoising-Sequence-to-Sequence-Pre-training-for-Natural-Language-Generation-Translation-and-Comprehension-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Pre-LN Transformer: On Layer Normalization in the Transformer Architecture | https://arxiv.org/abs/2002.04745 | https://cartinoe5930.tistory.com/entry/Pre-LN-Transformer-On-Layer-Normalization-in-the-Transformer-Architecture-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ELECTRA: Pre-training Text Encoders as Discriminators rather than Generators | https://arxiv.org/abs/2003.10555 | https://cartinoe5930.tistory.com/entry/ELECTRA-Pre-training-Text-Encoders-as-Discriminators-rather-than-Generators |
| Longformer: The Long-Document Transformer | https://arxiv.org/abs/2004.05150 | https://cartinoe5930.tistory.com/entry/Longformer-The-Long-Document-Transformer-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| BigBird: Transformers for Longer Sequences | https://arxiv.org/abs/2007.14062 | https://cartinoe5930.tistory.com/entry/BigBird-Transformers-for-Longer-Sequences-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| WebGPT: Browser-assisted question-answering with human feedback | https://arxiv.org/abs/2112.09332 | https://cartinoe5930.tistory.com/entry/WebGPT-Browser-assisted-question-answering-with-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| OPT: Open Pre-trained Transformer Language Models | https://arxiv.org/abs/2205.01068 | https://cartinoe5930.tistory.com/entry/OPT-Open-Pre-trained-Transformer-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | https://arxiv.org/abs/2312.00752 | No plan! |
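Almost every model in the table above is built on the same core operation, so here is a minimal sketch of scaled dot-product attention from "Attention Is All You Need" (my own illustration in PyTorch, not code from any of the papers):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v               # (..., seq_q, d_v)

# toy usage: batch of 2, 4 tokens, head dim 8
q = k = v = torch.randn(2, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)     # torch.Size([2, 4, 8])
```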
## Knowledge Distillation and Small Language Models

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| TinyBERT: Distilling BERT for Natural Language Understanding | https://arxiv.org/abs/1909.10351 | https://cartinoe5930.tistory.com/entry/TinyBERT-Distilling-BERT-for-Natural-Language-Understanding-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | https://arxiv.org/abs/1910.01108 | https://cartinoe5930.tistory.com/entry/DistilBERT-a-distilled-version-of-BERT-smaller-faster-cheaper-and-lighter-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| It's Not Just Size That Matters: Small Language Models are Also Few-Shot Learners (an application of PET) | https://arxiv.org/abs/2009.07118 | https://cartinoe5930.tistory.com/entry/Its-Not-Just-Size-That-Matters-Small-Language-Models-Are-Also-Few-Shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
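The two distillation papers above share the same basic recipe from Hinton's original KD paper (also listed in the Etc. table at the end). A minimal sketch of that loss, assuming a classification setup:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD: KL between temperature-softened distributions
    (scaled by T^2) mixed with the ordinary hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy usage: 4 examples, 10 classes
s, t = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y))
```

TinyBERT and DistilBERT add further terms on top of this (e.g. matching hidden states, attention maps, or embeddings between teacher and student).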
## Open-Source LLMs and Scaling

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Chinchilla: Training Compute-Optimal Large Language Models | https://arxiv.org/abs/2203.15556 | https://cartinoe5930.tistory.com/entry/%EC%A7%80%EA%B8%88-%EA%B9%8C%EC%A7%80%EC%9D%98-LM-Scaling-Law%EC%97%90%EB%8A%94-%EB%AC%B8%EC%A0%9C%EC%A0%90%EC%9D%B4-%EC%9E%88%EB%8B%A4-%F0%9F%98%B6%E2%80%8D%F0%9F%8C%AB%EF%B8%8F-Chinchilla-Training-Compute-Optimal-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | https://arxiv.org/abs/2304.01373 | No plan! |
| LIMA: Less Is More for Alignment | https://arxiv.org/abs/2305.11206 | https://cartinoe5930.tistory.com/entry/LIMA-Less-Is-More-for-Alignment-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LLaMA: Open and Efficient Foundation Language Models | https://arxiv.org/abs/2302.13971 | https://cartinoe5930.tistory.com/entry/LLaMA-Open-and-Efficient-Foundation-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| WizardLM: Empowering Large Language Models to Follow Complex Instructions | https://arxiv.org/abs/2304.12244 | https://cartinoe5930.tistory.com/entry/Open-domain-instruction%EC%9D%98-%ED%9A%A8%EA%B3%BC-%F0%9F%AA%84-WizardLM-Empowering-Large-Language-Models-to-Follow-Complex-Instructions-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | https://arxiv.org/abs/2306.08568 | https://huggingface.co/WizardLM/WizardCoder-15B-V1.0 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | https://arxiv.org/abs/2308.09583 | https://huggingface.co/WizardLM/WizardMath-70B-V1.0 |
| Alpaca: A Strong, Replicable Instruction-Following Model | https://crfm.stanford.edu/2023/03/13/alpaca.html | https://cartinoe5930.tistory.com/entry/Alpaca-A-Strong-Replicable-Instruction-Following-Model-%EB%A6%AC%EB%B7%B0 |
| Vicuna: An Open-Source Chatbot Impressing GPT-4 | https://lmsys.org/blog/2023-03-30-vicuna/ | https://cartinoe5930.tistory.com/entry/Vicuna-An-Open-Source-Chatbot-Impressing-GPT-4-%EB%A6%AC%EB%B7%B0 |
| Koala: A Dialogue Model for Academic Research | https://bair.berkeley.edu/blog/2023/04/03/koala/ | https://cartinoe5930.tistory.com/entry/%EC%A4%91%EC%9A%94%ED%95%9C-%EA%B1%B4-%EA%BA%BE%EC%9D%B4%EC%A7%80-%EC%95%8A%EB%8A%94-high-quality-data-Koala%F0%9F%90%A8-A-Dialogue-Model-for-Academic-Researc |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | https://arxiv.org/abs/2304.01196 | https://cartinoe5930.tistory.com/entry/%F0%9F%90%B2Baize-An-Open-Source-Chat-Model-with-Parameter-Efficient-Tuning-on-Self-Chat-Data-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Scaling Data-Constrained Language Models | https://arxiv.org/abs/2305.16264 | https://www.youtube.com/watch?v=TK0-sitkCMw&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDUuMTYyNjQ%3D |
| Falcon & RefinedWeb | https://arxiv.org/abs/2306.01116 | https://cartinoe5930.tistory.com/entry/Open-LLM-Leaderboard%EB%A5%BC-%ED%9C%A9%EC%93%B4-Falcon%F0%9F%A6%85-LLM-Falcon-RefinedWeb |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | https://arxiv.org/pdf/2306.02707 | https://cartinoe5930.tistory.com/entry/%F0%9F%90%ACOrca-Progressive-Learning-from-Complex-Explanation-Traces-of-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| phi-1: Textbooks Are All You Need | https://arxiv.org/abs/2306.11644 | https://cartinoe5930.tistory.com/entry/%ED%95%84%EC%9A%94%ED%95%9C-%EA%B1%B4-%EC%98%A4%EC%A7%81-%EA%B5%90%EA%B3%BC%EC%84%9C-%EC%88%98%EC%A4%80%EC%9D%98-%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%BF%90-%F0%9F%93%96-phi-1-Textbooks-Are-All-You-Need-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| AlpaGasus: Training a Better Alpaca with Fewer Data | https://arxiv.org/abs/2307.08701 | Will be uploaded later! |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | https://arxiv.org/abs/2307.09288 | https://cartinoe5930.tistory.com/entry/The-hopes-of-researchers-Open-source-%F0%9F%A4%97-%EC%97%B0%EA%B5%AC%EC%9E%90%EB%93%A4%EC%9D%98-%ED%9D%AC%EB%A7%9D-Open-source-%F0%9F%A4%97 |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | https://arxiv.org/abs/2308.07317 | Will be uploaded later! |
| Code Llama: Open Foundation Models for Code | https://arxiv.org/abs/2308.12950 | No plan! |
| FLM-101B: An Open LLM and How to Train It with $100K Budget | https://arxiv.org/pdf/2309.03852 | No plan! |
| Textbooks are All You Need II: phi-1.5 technical report | https://arxiv.org/abs/2309.05463 | https://huggingface.co/microsoft/phi-1_5 |
| OpenChat: Advancing Open-Source Language Models with Mixed-Quality Data | https://arxiv.org/abs/2309.11235 | https://github.com/imoneoi/openchat |
| Mistral 7B | https://arxiv.org/abs/2310.06825 | https://mistral.ai/news/announcing-mistral-7b/ |
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | https://arxiv.org/abs/2310.08491 | https://huggingface.co/papers/2310.08491#652a8e7f30355beba68c1be6 |
| Zephyr: Direct Distillation of LM Alignment | https://arxiv.org/abs/2310.16944 | https://www.youtube.com/watch?v=TkZBg3mKsIo |
| Orca 2: Teaching Small Language Models How to Reason | https://arxiv.org/abs/2311.11045 | https://www.microsoft.com/en-us/research/blog/orca-2-teaching-small-language-models-how-to-reason/ |
| The Falcon Series of Open Language Models | https://arxiv.org/abs/2311.16867 | No plan! |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | https://arxiv.org/abs/2312.15166 | No plan! |
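Most models in this table ship with Hugging Face checkpoints, so trying one is only a few lines of `transformers`. A minimal sketch (the Mistral model id is my assumption of the official Hub checkpoint; any open model from the table works, and a 7B model needs substantial RAM or a GPU):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"   # assumed Hub id; swap in any open model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("The scaling law for language models says", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```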
## Closed-Source LLMs

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| LaMDA: Language Models for Dialog Applications | blog: https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html, paper: https://arxiv.org/abs/2201.08239 | https://cartinoe5930.tistory.com/entry/%EA%B5%AC%EA%B8%80%EC%9D%98-%EC%B5%9C%EA%B0%95-%EC%B1%97%EB%B4%87-LaMDA%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B3%B4%EC%9E%90-Language-Models-for-Dialog-Applications-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| PaLM: Scaling Language Modeling with Pathways | blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html, paper: https://arxiv.org/abs/2204.02311 | 1: https://cartinoe5930.tistory.com/entry/LaMDA%EC%9D%98-%EB%92%A4%EB%A5%BC-%EC%9E%87%EB%8A%94-Pathways%EB%A5%BC-%ED%99%9C%EC%9A%A9%ED%95%9C-%EC%B4%88%EA%B1%B0%EB%8C%80-%EC%96%B8%EC%96%B4-%EB%AA%A8%EB%8D%B8-PaLM-%EB%A6%AC%EB%B7%B0, 2: https://cartinoe5930.tistory.com/entry/LaMDA%EC%9D%98-%EB%92%A4%EB%A5%BC-%EC%9E%87%EB%8A%94-Pathways%EB%A5%BC-%EC%82%AC%EC%9A%A9%ED%95%9C-%EC%B4%88%EA%B1%B0%EB%8C%80-%EC%96%B8%EC%96%B4-%EB%AA%A8%EB%8D%B8-PaLM-%EB%A6%AC%EB%B7%B02 |
| GPT-4 Technical Report | blog: https://openai.com/research/gpt-4, paper: https://arxiv.org/abs/2303.08774 | https://cartinoe5930.tistory.com/entry/GPT-4-Techinal-Report-Review |
| Gemini: A Family of Highly Capable Multimodal Models | https://arxiv.org/abs/2312.11805 | No plan! |
| AlphaCode 2 Technical Report | https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf | No plan! |
## Instruction Tuning

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| FLAN: Fine-tuned Language Models are Zero-shot Learners | https://arxiv.org/abs/2109.01652 | https://cartinoe5930.tistory.com/entry/FLAN-Fine-tuned-Language-Models-are-Zero-shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| T0: Multitask Prompted Training Enables Zero-shot Task Generalization | https://arxiv.org/abs/2110.08207 | https://cartinoe5930.tistory.com/entry/T0-Multitask-Prompted-Training-Enables-Zero-shot-Task-Generalization-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Super-Natural Instructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | https://arxiv.org/abs/2204.07705 | https://cartinoe5930.tistory.com/entry/Super-Natural-Instructions-Generalization-via-Declarative-Instructions-on-1600-NLP-Tasks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor | https://arxiv.org/abs/2212.09689 | Will be uploaded later! |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-shot Learners | https://arxiv.org/abs/2210.02969 | https://cartinoe5930.tistory.com/entry/Guess-the-Instruction-Flipped-Learning-Makes-Language-Models-Stronger-Zero-shot-Learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Scaling Instruction-Finetuned Language Models | https://arxiv.org/abs/2210.11416 | https://cartinoe5930.tistory.com/entry/Scaling-Instruction-Finetuned-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | https://arxiv.org/abs/2302.03202 | https://cartinoe5930.tistory.com/entry/Exploring-the-Benefits-of-Training-Expert-Language-Models-over-Instruction-Tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ICIL: In-Context Instruction Learning | https://arxiv.org/abs/2302.14691 | https://cartinoe5930.tistory.com/entry/ICIL-In-Context-Instruction-Learning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Instruction tuning with GPT-4 | https://arxiv.org/abs/2304.03277 | https://cartinoe5930.tistory.com/entry/Instruction-Tuning-with-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| FIP: Fixed Input Parameterization for Efficient Prompting | https://aclanthology.org/2023.findings-acl.533.pdf | Will be uploaded later! |
| Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning | https://arxiv.org/abs/2307.02053 | Will be uploaded later! |
| Maybe Only 0.5% Data Is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning | https://arxiv.org/abs/2305.09246 | Will be uploaded later! |
| Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning | https://arxiv.org/abs/2307.03692 | Will be uploaded later! |
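Concretely, instruction tuning just means fine-tuning on (instruction, response) pairs rendered into a fixed prompt template. A sketch using the well-known Alpaca-style template (quoted from memory of the Alpaca release, so treat the exact wording as approximate):

```python
# Alpaca-style instruction template; fine-tuning maximizes the likelihood of
# the response text that follows "### Response:".
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

example = {"instruction": "Summarize the text.", "input": "LLMs are ..."}
print(ALPACA_TEMPLATE.format(**example))
```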
## Alignment and RLHF

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| RLHF(Reinforcement Learning from Human Feedback) | https://huggingface.co/blog/rlhf | https://cartinoe5930.tistory.com/entry/%EC%82%AC%EB%9E%8C%EC%9D%98-%ED%94%BC%EB%93%9C%EB%B0%B1%EC%9D%84-%ED%86%B5%ED%95%9C-%EA%B0%95%ED%99%94%ED%95%99%EC%8A%B5-Reinforcement-Learning-from-Human-Feedback-RLHF |
| Red Teaming Language Models with Language Models | https://arxiv.org/abs/2202.03286 | https://cartinoe5930.tistory.com/entry/Red-Teaming-Language-Models-with-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| InstructGPT: Training language models to follow instructions with human feedback | https://arxiv.org/abs/2203.02155 | https://cartinoe5930.tistory.com/entry/InstructGPT-Training-language-models-to-follow-instructions-with-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Training a helpful and harmless assistant with reinforcement learning from human feedback | https://arxiv.org/abs/2204.05862 | https://cartinoe5930.tistory.com/entry/Training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | https://arxiv.org/abs/2305.14387 | Will be uploaded later! |
| ALMoST: Aligning Large Language Models through Synthetic Feedback | https://arxiv.org/abs/2305.13735 | https://cartinoe5930.tistory.com/entry/Aligning-Large-Language-Models-through-Synthetic-Feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | https://arxiv.org/abs/2307.15217 | Will be uploaded later! |
| RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | https://arxiv.org/abs/2309.00267 | No plan! |
| SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF | https://arxiv.org/abs/2310.05344 | No plan! |
| HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM | https://arxiv.org/abs/2311.09528 | No plan! |
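The reward models behind most of the RLHF papers above are trained with the same pairwise objective (InstructGPT-style): make the chosen response score higher than the rejected one. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards, rejected_rewards):
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected),
    averaged over a batch of human preference pairs."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# toy usage: scalar rewards for 4 preference pairs
chosen, rejected = torch.randn(4), torch.randn(4)
print(reward_model_loss(chosen, rejected))
```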
## Parameter-Efficient Fine-Tuning (PEFT)

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Adapter: Parameter-Efficient Transfer Learning for NLP | https://arxiv.org/abs/1902.00751 | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
| Prefix-Tuning: Optimizing Continuous Prompts for Generation | https://arxiv.org/abs/2101.00190 | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
| LoRA: Low-Rank Adaptation of Large Language Models | https://arxiv.org/abs/2106.09685 | https://cartinoe5930.tistory.com/entry/%EB%8B%B9%EC%8B%A0%EB%8F%84-Fine-tuning-%ED%95%A0-%EC%88%98-%EC%9E%88%EC%8A%B5%EB%8B%88%EB%8B%A4-with-PEFT-%F0%9F%A4%97 |
| Towards a Unified View of Parameter-Efficient Transfer Learning | https://arxiv.org/abs/2110.04366 | Will be uploaded later! |
| UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning | https://arxiv.org/abs/2110.07577 | Will be uploaded later! |
| (IA)^3: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | https://arxiv.org/abs/2205.05638 | Will be uploaded later! |
| QLoRA: Efficient Fine-tuning of Quantized LLMs | https://arxiv.org/abs/2305.14314 | Will be uploaded later! |
| Stack More Layers Differently: High-Rank Training Through Low-Rank Updates | https://arxiv.org/abs/2307.05695 | Will be uploaded later! |
| LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | https://arxiv.org/abs/2307.13269 | Will be uploaded later! |
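Among the PEFT methods above, LoRA is the easiest to see in code: freeze the pre-trained weight and learn a low-rank additive update. A minimal sketch (my own simplified version, not the reference implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B A x, with W frozen and only A, B trained."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)               # freeze W
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # B = 0 => no-op at init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)                       # torch.Size([2, 64])
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B
```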
## Data Selection for Instruction Tuning

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Instruction Mining: High-quality Instruction Data Selection for Large Language Models | https://arxiv.org/abs/2307.06290 | No plan! |
| SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | https://arxiv.org/abs/2212.10465 | No plan! |
| MoDS: Model-oriented Data Selection for Instruction Tuning | https://arxiv.org/abs/2311.15653 | No plan! |
| Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | https://arxiv.org/abs/2312.06585 | No plan! |
| Magicoder: Source Code Is All You Need | https://arxiv.org/abs/2312.02120 | No plan! |
| WaveCoder: Widespread and Versatile Enhanced Instruction Tuning with Refined Data Generation | https://arxiv.org/abs/2312.14187 | No plan! |
| What Makes Good Data for Alignment: A Comprehensive Study of Automatic Data Selection in Instruction Tuning | https://arxiv.org/abs/2312.15685 | No plan! |
## Prompting and Chain-of-Thought

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| What is 'Prompt Engineering'? | See my blog! | https://cartinoe5930.tistory.com/entry/Prompt-Engineering%EC%9D%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C |
| CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | blog: https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html, paper: https://arxiv.org/abs/2201.11903 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%B4-%EC%82%AC%EB%9E%8C%EA%B3%BC-%EC%9C%A0%EC%82%AC%ED%95%9C-%EC%83%9D%EA%B0%81-%ED%94%84%EB%A1%9C%EC%84%B8%EC%8A%A4%EB%A5%BC-%EA%B0%80%EC%A7%80%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-Chain-of-Thought-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Zero-shot CoT: Large Language Models Are Zero-shot Reasoners | https://arxiv.org/abs/2205.11916 | https://cartinoe5930.tistory.com/entry/Large-Language-Models-are-Zero-Shot-Reasoners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Language Models are Multilingual Chain-of-Thought Reasoners | https://arxiv.org/abs/2210.03057 | Will be uploaded later! |
| Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models | https://arxiv.org/abs/2210.03493 | Will be uploaded later! |
| CoT KD: Teaching Small Language Models to Reason | https://arxiv.org/abs/2212.08410 | Will be uploaded later! |
| ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models | https://arxiv.org/abs/2305.10601 | https://cartinoe5930.tistory.com/entry/Tree-of-Thoughts-Deliberate-Problem-Solving-with-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | https://arxiv.org/abs/2305.14045 | https://cartinoe5930.tistory.com/entry/CoT-Collection-Improving-Zero-shot-and-Few-shot-Learning-of-Language-Models-via-Chain-of-Thought-Fine-tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Let's verify step-by-step | https://arxiv.org/abs/2305.20050 | https://cartinoe5930.tistory.com/entry/Lets-verify-step-by-step-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Measuring Faithfulness in Chain-of-Thought Reasoning | https://arxiv.org/abs/2307.13702 | Will be uploaded later! |
| SoT: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding | https://arxiv.org/abs/2307.15337 | Will be uploaded later! |
| Graph of Thoughts: Solving Elaborate Problems with Large Language Models | https://arxiv.org/abs/2308.09687 | Will be uploaded later! |
| From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting | https://arxiv.org/abs/2309.04269 | No plan! |
| Chain-of-Verification Reduces Hallucination in Large Language Models | https://arxiv.org/abs/2309.11495 | https://www.youtube.com/watch?v=l0zFjwRegog&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMTE0OTU%3D |
| Contrastive Chain-of-Thought Prompting | https://arxiv.org/abs/2311.09277 | No plan! |
| Thread of Thought Unraveling Chaotic Contexts | https://arxiv.org/abs/2311.08734 | No plan! |
| System 2 Attention (Is Something You Might Need Too) | https://arxiv.org/abs/2311.11829 | No plan! |
| Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | https://arxiv.org/abs/2312.04474 | No plan! |
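The cheapest trick in this table is zero-shot CoT: appending one trigger phrase to the question. A sketch (`call_llm` is a hypothetical stand-in for whatever completion API you use):

```python
def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT (Kojima et al.): a single trigger phrase is enough to
    elicit step-by-step reasoning from a sufficiently large model."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot("A train covers 60 km in 45 minutes. What is its speed in km/h?")
print(prompt)
# answer = call_llm(prompt)   # hypothetical client call
```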
## Efficient Attention and Inference

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | https://arxiv.org/abs/2205.14135 | https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad |
| Exponentially Faster Language Modeling | https://arxiv.org/abs/2311.10770 | No plan! |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | https://arxiv.org/abs/2312.11514 | No plan! |
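FlashAttention computes exact attention without materializing the full attention matrix by streaming over key/value chunks with an online softmax. The sketch below shows just that recurrence in plain PyTorch; the paper's actual contribution is fusing it into an IO-aware CUDA kernel, so this Python loop is only for understanding:

```python
import torch

def chunked_attention(q, k, v, chunk=64):
    """Exact attention via the online-softmax recurrence over K/V chunks."""
    scale = q.size(-1) ** -0.5
    m = torch.full(q.shape[:-1], float("-inf"))   # running row-max
    l = torch.zeros(q.shape[:-1])                 # running softmax denominator
    acc = torch.zeros_like(q)                     # running weighted sum of V
    for start in range(0, k.size(-2), chunk):
        k_c = k[..., start:start + chunk, :]
        v_c = v[..., start:start + chunk, :]
        s = q @ k_c.transpose(-2, -1) * scale     # scores for this chunk
        m_new = torch.maximum(m, s.amax(dim=-1))
        corr = torch.exp(m - m_new)               # rescale old statistics
        p = torch.exp(s - m_new.unsqueeze(-1))
        l = l * corr + p.sum(dim=-1)
        acc = acc * corr.unsqueeze(-1) + p @ v_c
        m = m_new
    return acc / l.unsqueeze(-1)

q = k = v = torch.randn(2, 256, 32)
ref = torch.softmax(q @ k.transpose(-2, -1) / 32 ** 0.5, dim=-1) @ v
print(torch.allclose(chunked_attention(q, k, v), ref, atol=1e-4))  # True
```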
## Self-Improvement and Other Training Techniques

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Data Augmentation in NLP | blogs: https://neptune.ai/blog/data-augmentation-nlp, https://amitness.com/2020/05/data-augmentation-for-nlp/ | https://cartinoe5930.tistory.com/entry/Data-Augmentation-methods-in-NLP |
| PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference | https://arxiv.org/abs/2001.07676 | https://cartinoe5930.tistory.com/entry/PET-Exploiting-Cloze-Questions-for-Few-Shot-Text-Classification-and-Natural-Language-Inference-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Pathways | https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ | https://cartinoe5930.tistory.com/entry/%EB%A7%8C%EC%95%BD-%EB%AA%A8%EB%8D%B8%EC%9D%B4-%EC%97%AC%EB%9F%AC-%EA%B0%90%EA%B0%81%EC%9D%84-%EB%8A%90%EB%82%84-%EC%88%98-%EC%9E%88%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-Pathways-%EB%A6%AC%EB%B7%B0 |
| LMSI: Large Language Models Can Self-Improve | https://arxiv.org/abs/2210.11610 | https://cartinoe5930.tistory.com/entry/LMSI-Large-Language-Models-can-Self-Improve-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Self-Instruct: Aligning Language Models with Self-Generated Instructions | https://arxiv.org/abs/2212.10560 | https://cartinoe5930.tistory.com/entry/Self-Instruct-Aligning-Language-Model-with-Self-Generated-Instructions-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | https://arxiv.org/abs/2303.11366 | https://cartinoe5930.tistory.com/entry/Reflexion-Language-Agents-with-Verbal-Reinforcement-Learning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Self-Refine: Iterative Refinement with Self-Feedback | https://arxiv.org/abs/2303.17651 | https://cartinoe5930.tistory.com/entry/Self-Refine-Iterative-Refinement-with-Self-Feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| REFINER: Reasoning Feedback on Intermediate Representations | https://arxiv.org/abs/2304.01904 | No plan! |
| SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation | https://kaistai.github.io/SelFee/ | https://cartinoe5930.tistory.com/entry/SelFee-Iterative-Self-Revising-LLM-Expowered-by-Self-Feedback-Generation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | https://arxiv.org/abs/2305.13245 | https://aliissa99.medium.com/-a596e4d86f79 |
| Shepherd: A Critic for Language Model Generation | https://arxiv.org/abs/2308.04592 | Will be uploaded later! |
| Self-Alignment with Instruction Backtranslation | https://arxiv.org/pdf/2308.06259 | Will be uploaded later! |
| SCREWS: A Modular Framework for Reasoning with Revisions | https://arxiv.org/pdf/2309.13075 | No plan! |
| NEFTune: Noisy Embeddings Improve Instruction Finetuning | https://arxiv.org/abs/2310.05914 | https://cartinoe5930.tistory.com/entry/Noise-makes-LLM-better-NEFTune-%F0%9F%98%89 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | https://arxiv.org/abs/2311.03099 | No plan! |
| LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment | https://arxiv.org/abs/2312.09979 | No plan! |
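NEFTune from this table is small enough to show in full: during fine-tuning, add uniform noise to the token embeddings, scaled by alpha / sqrt(L * d). A sketch based on the paper's description (the training loop itself is omitted):

```python
import torch

def neftune_noise(embeddings, alpha=5.0):
    """Add Uniform(-1, 1) noise scaled by alpha / sqrt(L * d) to token
    embeddings (L = sequence length, d = embedding dimension)."""
    L, d = embeddings.shape[-2], embeddings.shape[-1]
    scale = alpha / (L * d) ** 0.5
    return embeddings + torch.empty_like(embeddings).uniform_(-1, 1) * scale

emb = torch.randn(2, 128, 768)        # (batch, seq_len, hidden)
noisy = neftune_noise(emb)
print((noisy - emb).abs().max() <= 5.0 / (128 * 768) ** 0.5)   # tensor(True)
```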
## Retrieval-Augmented Generation (RAG)

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | https://arxiv.org/abs/2005.11401 | No plan! |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | https://arxiv.org/abs/2310.11511 | No plan! |
| InstructRetro: Instruction Tuning Post Retrieval-Augmented Pretraining | https://arxiv.org/abs/2310.07713 | No plan! |
| Retrieval-Augmented Generation for Large Language Models: A Survey | https://arxiv.org/abs/2312.10997 | No plan! |
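The retrieval step these papers build on is simple: embed the query and the documents, take the nearest documents, and paste them into the prompt. A self-contained sketch, with random vectors standing in for a real embedding model (that stand-in is the assumption here):

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1][:k]

docs = ["LoRA adds low-rank adapters.",
        "Mamba is a state-space model.",
        "RAG retrieves documents before generating."]
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(len(docs), 16))            # stand-in embeddings
query_vec = doc_vecs[2] + 0.1 * rng.normal(size=16)    # query "about" doc 2
context = "\n".join(docs[i] for i in retrieve_top_k(query_vec, doc_vecs))
print(f"Context:\n{context}\n\nQuestion: What does RAG do?")
```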
## Evaluation and Benchmarks

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| BIG-Bench Hard: Challenging BIG-Bench tasks and whether chain-of-thought can solve them | https://arxiv.org/abs/2210.09261 | Will be uploaded later! |
| Large Language Models are not Fair Evaluators | https://arxiv.org/abs/2305.17926 | Will be uploaded later! |
| MT-Bench: Judging LLM-as-a-judge with MT-Bench | https://arxiv.org/abs/2306.05685 | Will be uploaded later! |
| InstructEval: Towards Holistic Evaluation of Instruction-Tuned Large Language Models | https://arxiv.org/abs/2306.04757 | Will be uploaded later! |
| FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | https://arxiv.org/abs/2307.10928 | Will be uploaded later! |
| GAIA: A Benchmark for General AI Assistants | https://arxiv.org/abs/2311.12983 | No plan! |
## Long-Context Language Models

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| A Length-Extrapolatable Transformer | https://arxiv.org/abs/2212.10554 | No plan! |
| Extending Context Window of Large Language Models via Positional Interpolation | https://arxiv.org/abs/2306.15595 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| LongNet: Scaling Transformers to 1,000,000,000 Tokens | https://arxiv.org/abs/2307.02486 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| Lost in the Middle: How Language Models Use Long Contexts | https://arxiv.org/abs/2307.03172 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| YaRN: Efficient Context Window Extension of Large Language Models | https://arxiv.org/abs/2309.00071 | No plan! |
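Positional Interpolation (the second paper above) extends a RoPE model's context not by extrapolating to unseen positions but by rescaling new positions back into the trained range. A sketch of RoPE plus that rescaling (my simplification; real implementations apply this per attention head):

```python
import torch

def rope(x, positions, base=10000.0):
    """Rotary position embedding over the last dimension of x: (..., seq, d)."""
    d = x.size(-1)
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float) / d)
    angles = positions[:, None] * inv_freq[None, :]        # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                   # rotate each 2-D pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Positional Interpolation: run a model trained on L_train = 2048 positions at
# L_new = 8192 by scaling every position index into the trained range.
L_train, L_new = 2048, 8192
positions = torch.arange(L_new, dtype=torch.float) * (L_train / L_new)
print(rope(torch.randn(L_new, 64), positions).shape)       # torch.Size([8192, 64])
```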
## LLM Analysis

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Why can GPT learn in-context? | https://arxiv.org/abs/2212.10559 | https://cartinoe5930.tistory.com/entry/Why-can-GPT-learn-in-context-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | paper: https://arxiv.org/abs/2303.12712, youtube: https://www.youtube.com/watch?v=Mqg3aTGNxZ0 | https://cartinoe5930.tistory.com/entry/Sparks-of-Artificial-General-Intelligence-Early-experiments-with-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| The False Promise of Imitating Proprietary LLMs | https://arxiv.org/abs/2305.15717 | https://cartinoe5930.tistory.com/entry/%EA%B8%B0%EC%A1%B4-imitation-model%EC%9D%80-%EC%9E%98%EB%AA%BB-%ED%95%99%EC%8A%B5%EB%90%98%EA%B3%A0-%EC%9E%88%EB%8B%A4-%F0%9F%AB%A2-The-False-Promise-of-Imitating-Proprietary-L |
| TULU: How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources | https://arxiv.org/abs/2306.04751 | Will be uploaded later! |
| How Is ChatGPT's Behavior Changing over Time? | https://arxiv.org/abs/2307.09009 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-%EC%84%B1%EB%8A%A5%EC%9D%B4-%EC%95%88-%EC%A2%8B%EC%95%84%EC%A7%80%EA%B3%A0-%EC%9E%88%EB%8B%A4%EA%B5%AC-%F0%9F%98%B2%F0%9F%98%B2 |
| Large Language Models Cannot Self-Correct Reasoning Yet | https://arxiv.org/abs/2310.01798 | |
| How Far Are Large Language Models from Agents with Theory-of-Mind | https://arxiv.org/pdf/2310.03051 | No plan! |
| Can LLMs Follow Simple Rules? | https://arxiv.org/abs/2311.04235 | https://www.youtube.com/watch?v=CY6o43037OY |
| Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 | https://arxiv.org/abs/2311.10702 | No plan! |
| ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching Up? | https://arxiv.org/abs/2311.16989 | No plan! |
| An In-depth Look at Gemini's Language Abilities | https://arxiv.org/abs/2312.11444 | No plan! |
## Other LLM Techniques and Applications

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature | https://arxiv.org/abs/2301.11305 | https://cartinoe5930.tistory.com/entry/%EC%9D%B4-%EA%B8%80%EC%9D%B4-LM%EC%9D%B4-%EB%A7%8C%EB%93%A4%EC%96%B4%EB%82%B8-%EA%B8%80%EC%9D%BC%EA%B9%8C-%EB%8F%84%EC%99%80%EC%A4%98-DetectGPT-DetectGPT-Zero-Shot-Machine-Generated-Text-Detection-using-Probability-Curvature-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback | https://arxiv.org/abs/2302.12813 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-hallucination-%EC%96%B4%EB%96%BB%EA%B2%8C-%ED%95%B4%EA%B2%B0%ED%95%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-Check-Your-Facts-and-Try-Again-Improving-Large-Language-Models-with-External-Knowledge-and-Automated-Feedback |
| RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text | https://arxiv.org/abs/2305.13304 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%97%90-%EB%B0%98%EB%B3%B5-%EB%A9%94%EC%BB%A4%EB%8B%88%EC%A6%98LSTM%EC%9D%84-%EC%82%AC%EC%9A%A9%ED%95%9C%EB%8B%A4%EB%A9%B4-RecurrentGPT-Interactive-Generation-of-Arbitrarily-Long-Text-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Large Language Models as Tool Makers | https://arxiv.org/abs/2305.17126 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%B4-%EB%8F%84%EA%B5%AC%EB%A5%BC-%EC%82%AC%EC%9A%A9%ED%95%98%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-%F0%9F%94%AC-Large-Language-Models-as-Tool-Makers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | https://arxiv.org/abs/2306.02561 | No plan! |
| Knowledge Distillation of Large Language Models | https://arxiv.org/abs/2306.08543 | https://cartinoe5930.tistory.com/entry/KD%EC%97%90-%EC%82%B4%EC%A7%9D%EC%9D%98-%EB%B3%80%ED%99%94%EB%A5%BC-%EC%A4%98%EB%B3%B4%EC%9E%90-%F0%9F%98%9C-Knowledge-Distillation-of-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | https://arxiv.org/abs/2308.01825 | Will be uploaded later! |
| ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs | https://arxiv.org/abs/2307.16789 | Will be uploaded later! |
| SelfCheck: Using LLMs to Zero-shot Check Their Own Step-by-Step Reasoning | https://arxiv.org/abs/2308.00436 | Will be uploaded later! |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | https://arxiv.org/abs/2308.07921 | Will be uploaded later! |
| Large Language Models as Optimizers | https://arxiv.org/abs/2309.03409 | No plan! |
| FIAT: Fusing Learning Paradigms with Instruction-Accelerated Tuning | https://arxiv.org/abs/2309.04663 | https://www.youtube.com/watch?v=EZsZEcRDte0&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMDQ2NjM%3D |
| Contrastive Decoding Improves Reasoning in Large Language Models | https://arxiv.org/abs/2309.09117 | https://www.youtube.com/watch?v=nMR56TkwC1Q&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMDkxMTc%3D |
| Think before you speak: Training Language Models with Pause Tokens | https://arxiv.org/abs/2310.02226 | https://www.youtube.com/watch?v=MtJ1jacr_yI |
| Large Language Models Can Learn Rules | https://arxiv.org/abs/2310.07064 | No plan! |
| In-context Pretraining: Language Modeling Beyond Document Boundaries | https://arxiv.org/abs/2310.10638 | https://www.youtube.com/watch?v=GI-0lAaILrU |
| Learning From Mistakes Makes LLM Better Reasoner | https://arxiv.org/abs/2310.20689 | No plan! |
| Language Models can be Logical Solvers | https://arxiv.org/abs/2311.06158 | No plan! |
| MART: Improving LLM Safety with Multi-round Automatic Red-Teaming | https://arxiv.org/abs/2311.07689 | No plan! |
| Fine-tuning Language Models for Factuality | https://arxiv.org/abs/2311.08401 | No plan! |
| Positional Description Matters for Transformers Arithmetic | https://arxiv.org/abs/2311.14737 | No plan! |
| Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision | https://arxiv.org/abs/2312.09390 | https://openai.com/research/weak-to-strong-generalization |
| TinyGSM: achieving >80% on GSM8K with small language models | https://arxiv.org/abs/2312.09241 | No plan! |
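From this table, contrastive decoding is compact enough for a sketch: score the next token by the gap between a strong "expert" model and a weak "amateur" model, restricted to tokens the expert already finds plausible (my paraphrase of the method; the generation loop is omitted):

```python
import torch

def contrastive_decoding_logits(expert_logits, amateur_logits, alpha=0.1):
    """Score tokens by log p_expert - log p_amateur, masking out tokens whose
    expert probability falls below alpha * (max expert probability)."""
    log_pe = expert_logits.log_softmax(dim=-1)
    log_pa = amateur_logits.log_softmax(dim=-1)
    cutoff = log_pe.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(alpha))
    scores = log_pe - log_pa
    return scores.masked_fill(log_pe < cutoff, float("-inf"))

expert, amateur = torch.randn(1, 32000), torch.randn(1, 32000)
print(contrastive_decoding_logits(expert, amateur).argmax(dim=-1))  # next-token pick
```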
## Korean NLP

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Morpheme-aware Subword Tokenizer: An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks | https://arxiv.org/abs/2010.02534 | Will be uploaded later! |
| What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | https://arxiv.org/abs/2109.04650 | Will be uploaded later! |
## Computer Vision

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| History of CNNs | LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, ResNeXt, Xception, MobileNet, DenseNet, EfficientNet, ConvNeXt | https://cartinoe5930.tistory.com/entry/CNN-network%EC%9D%98-%EC%97%AD%EC%82%AC |
| ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/abs/2010.11929 | https://cartinoe5930.tistory.com/entry/ViT-An-Image-Worth-16-x-16-Words-Transformers-for-Image-Recognition-at-Scale |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | https://arxiv.org/abs/2103.14030 | https://cartinoe5930.tistory.com/entry/Swin-Transformer-Hierarchical-Vision-Transformer-using-Shifted-Windows-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| CLIP: Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/abs/2103.00020 | https://cartinoe5930.tistory.com/entry/CLIP-Learning-Transferable-Visual-Models-From-Natural-Language-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
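ViT's key move is in its title: an image becomes a sequence of 16x16 patch tokens, after which a standard Transformer takes over. A minimal sketch of that patch-embedding front end (my own simplification; position embeddings are omitted):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into P x P patches and linearly project each to a
    d-dimensional token; a strided convolution does both steps at once."""
    def __init__(self, patch=16, in_ch=3, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))    # [CLS] token

    def forward(self, x):                                  # x: (B, C, H, W)
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, tokens], dim=1)             # (B, N + 1, dim)

print(PatchEmbedding()(torch.randn(1, 3, 224, 224)).shape)  # (1, 197, 768)
```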
## Vision-Language Models

| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Let's learn about VLMs (Vision-Language Models) | https://huggingface.co/blog/vision_language_pretraining#supporting-vision-language-models-in-%F0%9F%A4%97-transformers | https://cartinoe5930.tistory.com/entry/VLMVision-Language-Model%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B3%B4%EC%9E%90 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | https://arxiv.org/abs/1908.03557 | https://cartinoe5930.tistory.com/entry/VisualBERT-A-Simple-and-Performant-Baseline-for-Vision-and-Language-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ViLBERT: Pre-training Task-Agnostic Visiolinguistic Representations for Visual-and-Language Tasks | https://arxiv.org/abs/1908.02265 | https://cartinoe5930.tistory.com/entry/ViLBERT-Pretraining-Task-Agnostic-Visiolinguistic-Representations-for-Visual-and-Language-Tasks |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | https://arxiv.org/abs/1908.07490 | https://cartinoe5930.tistory.com/entry/LXMERT-Learning-Cross-Modality-Encoder-Representations-from-Transformers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | https://arxiv.org/abs/1908.08530 | https://cartinoe5930.tistory.com/entry/VL-BERT-Pre-training-of-Generic-Visual-Linguistic-Representations-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VLP: Unified Vision-Language Pre-Training for Image Captioning and VQA | https://arxiv.org/abs/1909.11059 | https://cartinoe5930.tistory.com/entry/VLP-Unified-Vision-Language-Pre-Traning-for-Image-Captioning-and-VQA-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | https://arxiv.org/abs/2004.06165 | https://cartinoe5930.tistory.com/entry/Oscar-Object-Semantics-Aligned-Pre-training-for-Vision-Language-Tasks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | https://arxiv.org/abs/2101.00529 | https://cartinoe5930.tistory.com/entry/VinVL-Revisiting-Visual-Representations-in-Vision-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | https://arxiv.org/abs/2102.03334 | https://cartinoe5930.tistory.com/entry/ViLT-Vision-and-Language-Transformer-Without-Convolution-or-Region-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | https://arxiv.org/abs/2102.05918 | https://cartinoe5930.tistory.com/entry/ALIGN-Scaling-up-Visual-and-Vision-Language-Representation-with-Noisy-Text-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ALBEF: Vision and Language Representation Learning with Momentum Distillation | https://arxiv.org/abs/2107.07651 | https://cartinoe5930.tistory.com/entry/ALBEF-Vision-and-Language-Representation-Learning-with-Momentum-Distillation-%EB%85%BC%EB%AC%B8 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | https://arxiv.org/abs/2108.10904 | https://cartinoe5930.tistory.com/entry/SimVLM-Simple-Visual-Language-Model-Pre-training-with-Weak-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VLMo: Unified Vision-Language Pre-training with Mixture-of-Modality-Experts | https://arxiv.org/abs/2111.02358 | https://cartinoe5930.tistory.com/entry/VLMo-Unified-Vision-Language-Pre-training-with-Mixture-of-Modality-Experts-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LiT: Zero-Shot Transfer with Locked-image text Tuning | https://arxiv.org/abs/2111.07991 | https://cartinoe5930.tistory.com/entry/LiT%F0%9F%94%A5-Zero-Shot-Transfer-with-Locked-image-text-Tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| FLAVA: A Foundational Language And Vision Alignment Model | https://arxiv.org/abs/2112.04482 | https://cartinoe5930.tistory.com/entry/FLAVA-A-Foundational-Language-And-Vision-Alignment-Model-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | https://arxiv.org/abs/2201.12086 | https://cartinoe5930.tistory.com/entry/BLIP-Bootstrapping-Language-Image-Pre-training-fro-Unified-Vision-Language-Understanding-and-Generation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
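Several models in this table (and CLIP/ALIGN above) rest on the same symmetric contrastive objective: matched image-text pairs are positives, and every other pairing in the batch is a negative. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def contrastive_image_text_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE over a batch of image/text embeddings."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature          # (B, B) cosine similarities
    targets = torch.arange(img.size(0))         # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

print(contrastive_image_text_loss(torch.randn(8, 512), torch.randn(8, 512)))
```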
## Etc.

| Paper or Posting Title | Reference site Link | Review |
|---|---|---|
| Knowledge Distillation: Distilling the Knowledge in a Neural Network | https://arxiv.org/abs/1503.02531 | https://cartinoe5930.tistory.com/entry/Distilling-the-Knowledge-in-a-Neural-Network-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| What is Zero-shot, One-shot, Few-shot Learning? | see my blog! | https://cartinoe5930.tistory.com/entry/Zero-shot-One-shot-Few-shot-Learning%EC%9D%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C |