Deep Learning Paper
1.0.0
I read papers related to NLP and deep learning. Below are various papers, from basic to advanced. You can also check my Korean paper reviews by clicking the links attached to the tables.
You can find more paper reviews, code implementations, and mathematical explanations on my blog <- click here
I have written several articles explaining some deep learning techniques in detail. These articles can be found in the table below.
| Title | Blog Link |
|---|---|
| How has the scaling law developed in NLP? 🤔 | https://cartinoe5930.tistory.com/entry/how-has-saling-law-developed-in-nlp-%f0%9f%A4%94-NLP%EC%97%90%EC%84%9c-scaling-law%EB%8a%94-%EC%96%B4%EB%96%BB%EA%B2%8C-%EB%B0%9C%EC%A0%84%EB%90%98%EC%97%88%EC%9D%84%EA%B9%8C |
| Closed-source?? Open-source?? What are they?? 🤗 | https://cartinoe5930.tistory.com/entry/the-hopes-of-researchers-open-source-%f0%9f%A4%97-%EC%97%B0%EA%B5%AC%EC%9e%90%EB%93%A4%EC%9d%98-%ED%9d%AC%EB%A7%9D-Open-Source-%F0%9F%A4%97 |
| The context window of LMs: should it be long? Should it be short? 📏🤨 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| What is the most optimal way to evaluate LMs? | https://cartinoe5930.tistory.com/entry/lm%EC%9D%84-%AA dengan%B0%80%EC%9E%A5-%EB5%A%9C%AD%A0%81%9C%ABC%BC%A1%9C-%AD%8F%8F%89 %89C%89C%ABC%ABC%A1%9C-%8F%8F%8F%8F. B0%80%ED%95%A0-%EC%88%98-%EC%9e%88%EB%8A%94-%EB%B0%A9%EB%B2%95%EC%9d%80-%EB%AC%B4%EC%97%87%EC%9d%BC%EA%B9%8c |
| ChatGPT's performance is getting worse?!?! 😲😲 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-%EC%84%B1%EB%8A%A5%EC%9D%B4-%EC%95%88-%EC%A2%8B%EC%95%84%EC%A7%80%EA%B3%A0-%EC%9E%88%EB%8B%A4%EA%B5%AC-%F0%9F%98%B2%F0%9F%98%B2 |
| You can fine-tune too! with PEFT 🤗 | https://cartinoe5930.tistory.com/entry/%eb%8b%B9%EC%8B%A0%EB%8F%84-Fine-Tuning-%ED%95%A0-%EC%88%98-%EC%9e%88%EC%8A%B5%EB%8B%88%EB%8B%A4-WITH-PEFT-%F0%9F%A4%97 |
| Let's think step by step, like a human! 🧠 | https://cartinoe5930.tistory.com/entry/%ed%95%9c-%eb%8b%A8%AA%B3%84-%ed%95%9c-%eb%8b%A8%AA%B3%84%94%A9-%EB%8B%A8%AA dengan%B3%94%94%94%A9- %EC%9d%B8%EA%B0%84%EC%B2%98%EB%9F%BC-%EC%83%9d%EA%B0%81%ED%95%B4%EB%B3%B4%EC%9E%90-%F0%9F%A7%A0%F0%9F A4%A4%94 |
| The evolution of fine-tuning methods!! From fine-tuning to RLHF ➡️ | https://cartinoe5930.tistory.com/entry/fine-tuning-method%9d%98-%EC%A7%84%ED%99%94-%AA dengan%E2%AM%A0ikel |
| Now it's time to fine-tune ChatGPT!! ⏰ | https://cartinoe5930.tistory.com/entry/%EC%9D%B4%AC%A0%9C%EB%8A%94-ChatGPT%AB%A5%BC-FINE-TUNING-%ED%95%A0-%8B%9C%AA dengan%AB0%84-%84-%E2 |
| Noise makes LLMs better! - NEFTune 😉 | https://cartinoe5930.tistory.com/entry/noise-makes-llm-better-neftune-%f0%9f%98%89 |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Embedding matrix | https://wikidocs.net/book/2155 | https://cartinoe5930.tistory.com/entry/embedding-matrix-%ed%95%99%ec%8a%B5 |
| LSTM: Long Short-Term Memory | https://colah.github.io/posts/2015-08-Understanding-LSTMs/ | https://cartinoe5930.tistory.com/entry/%EC%95%8C%AA%B8%B0-%EC%89%BD%AA%B2%B4C-LSTM-NETWORKS-%AC%9D%B4%ED%95%B4%ED%95%98%AB8%B8%B4%AD%95%B4%ED%95%98%AA dengan%AB8%B8%B4%AD |
| GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | https://arxiv.org/abs/1406.1078 | https://cartinoe5930.tistory.com/entry/gru-emphirical-evaluuation-of-gated-recurrent-neural-networks-on-setence-modeling-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LSTM vs. GRU: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling | https://arxiv.org/abs/1412.3555 | https://cartinoe5930.tistory.com/entry/lstm-vs-gru-%eb%AD%90%AA%B0%80-%EB%8D%94-%EB%82%98%EC%9D%9D%AB9%B9%8C-EMPIRICAL-EVALUATION-OF-GATED-RECURRENT-NEURAL-NETWORKS-ON-SEQUENCE-MODELING-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
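Before moving on, a minimal sketch (assuming PyTorch; the sizes are illustrative) of how the two recurrent cells above differ in practice: an LSTM carries both a hidden state and a cell state, while a GRU keeps only a hidden state, which is the main simplification the LSTM-vs-GRU paper evaluates.

```python
import torch
import torch.nn as nn

batch, seq_len, d_in, d_hidden = 4, 20, 32, 64
x = torch.randn(batch, seq_len, d_in)

lstm = nn.LSTM(d_in, d_hidden, batch_first=True)
gru = nn.GRU(d_in, d_hidden, batch_first=True)

# LSTM returns (output, (hidden state h_n, cell state c_n));
# GRU returns (output, h_n) only -- fewer gates, fewer parameters.
out_lstm, (h_n, c_n) = lstm(x)
out_gru, h_gru = gru(x)

print(out_lstm.shape, out_gru.shape)  # torch.Size([4, 20, 64]) twice
```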
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Transformer: Attention Is All You Need | https://arxiv.org/abs/1706.03762 | https://cartinoe5930.tistory.com/entry/transformer-attention-is-all-you-need-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ELMo: Deep Contextualized Word Representations | https://arxiv.org/abs/1802.05365 | https://cartinoe5930.tistory.com/entry/pre-trained-language-modeling-paper-reading1-elmo-deep-contextualized-word-representations |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | https://arxiv.org/abs/1810.04805 | https://cartinoe5930.tistory.com/entry/pre-trained-language-modeling-paper-reading2-bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding |
| GPT-1: Improving Language Understanding by Generative Pre-Training | https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf | https://cartinoe5930.tistory.com/entry/pre-trained-language-modeling-paper-reading3-gpt-1-Improving-language-understanding-by-generative-pre-training |
| GPT-2: Language Models are Unsupervised Multitask Learners | https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf | https://cartinoe5930.tistory.com/entry/gpt-2-language-models-are-unsupervised-multitask-learnners-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| GPT-3: Language Models are Few-Shot Learners | https://arxiv.org/abs/2005.14165 | https://cartinoe5930.tistory.com/entry/gpt-3-language-models-are-few-shot-learnners-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | https://arxiv.org/abs/1901.02860 | https://cartinoe5930.tistory.com/entry/transformer-xl-attentive-language-models-beyond-a-fixed-length-context-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Sparse Transformers: Generating Long Sequences with Sparse Transformers | https://arxiv.org/abs/1904.10509 | https://cartinoe5930.tistory.com/entry/sparse-transformers-generating-long-sequence-with-sparse-transformers-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | https://arxiv.org/abs/1906.08237 | https://cartinoe5930.tistory.com/entry/xlnet-generalized-autoregressive-pretraining-for-language-understanding-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| SpanBERT: Improving Pre-training by Representing and Predicting Spans | https://arxiv.org/abs/1907.10529 | https://cartinoe5930.tistory.com/entry/spanbert-improving-pre-training-by-representing-and-predicting-spans-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | https://arxiv.org/abs/1907.11692 | https://cartinoe5930.tistory.com/entry/roberta-a-robustly-optimizized-bert-pretraining-approach-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | https://arxiv.org/abs/1908.10084 | https://cartinoe5930.tistory.com/entry/sentence-bert-sentence-embeddings-using-siamese-bert-networks-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | https://arxiv.org/abs/1909.11942 | https://cartinoe5930.tistory.com/entry/albert-a-lite-bert-for-self-supervised-learning-of-language-representations-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | https://arxiv.org/abs/1910.13461 | https://cartinoe5930.tistory.com/entry/bart-denoising-setence-to-setence-pre-training-for-natural-language-generation-translation-and-comprehension-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Pre-LN Transformer: On Layer Normalization in the Transformer Architecture | https://arxiv.org/abs/2002.04745 | https://cartinoe5930.tistory.com/entry/pre-ln-transformer-on-layer-normalization-ln-transformer-architecture-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | https://arxiv.org/abs/2003.10555 | https://cartinoe5930.tistory.com/entry/electra-pre-training-text-encoders-as-discriminators-rather-than-generators |
| Longformer: The Long-Document Transformer | https://arxiv.org/abs/2004.05150 | https://cartinoe5930.tistory.com/entry/longformer-the-long-document-transformer-%eb%85%bc%eb%ac%b8-%EB%A6%AC%EB%B7%B0 |
| BigBird: Transformers for Longer Sequences | https://arxiv.org/abs/2007.14062 | https://cartinoe5930.tistory.com/entry/bigbird-transformers-for-longer-sequences-%eb%85%bc%eb%ac%b8-%EB%A6%AC%EB%B7%B0 |
| WebGPT: Browser-assisted question answering with human feedback | https://arxiv.org/abs/2112.09332 | https://cartinoe5930.tistory.com/entry/webgpt-browser-assisted-question-answering-with-human-feedback-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| OPT: Open Pre-trained Transformer Language Models | https://arxiv.org/abs/2205.01068 | https://cartinoe5930.tistory.com/entry/opt-open-pre-trained-transformer-language-models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | https://arxiv.org/abs/2312.00752 | No plan! |
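A common building block of the papers above is the scaled dot-product attention from "Attention Is All You Need". A minimal sketch (assuming PyTorch; the shapes are illustrative, not taken from any specific paper):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, as in "Attention Is All You Need"
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 16, 64)  # (batch, heads, seq_len, d_k)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 8, 16, 64])
```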
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| TinyBERT: Distilling BERT for Natural Language Understanding | https://arxiv.org/abs/1909.10351 | https://cartinoe5930.tistory.com/entry/tinybert-distilling-bert-for-natural-language-understanding-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| DistilBERT: a distilled version of BERT: smaller, faster, cheaper and lighter | https://arxiv.org/abs/1910.01108 | https://cartinoe5930.tistory.com/entry/distilbert-a-distilled-version-of-bert-smaller-faster-cheaper-and-lighter-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (an application of PET) | https://arxiv.org/abs/2009.07118 | https://cartinoe5930.tistory.com/entry/its-not-just-size-that-matters-small-language-models-are-also-few-shot-learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Chinchilla: Training Compute-Optimal Large Language Models | https://arxiv.org/abs/2203.15556 | https://cartinoe5930.tistory.com/entry/%EC%A7%80%AA%B8%88-%AA%B9%8C%AC%A7%80%EC%9D%98-LM-SCALING-LAW%EC%97%90%EB%8A%94-%EB%AC%B8%EC%A0%9C%EC%A0%90%EC%9d%B4-%EC%9e%88%EB%8B%A4-%F0%9F%98%B6%E2%80%8d%F0%9F%8c%AB%EF%B8%8F-CHINCHILLA-TRAINING-Compute-Optimal-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | https://arxiv.org/abs/2304.01373 | No plan! |
| LIMA: Less Is More for Alignment | https://arxiv.org/abs/2305.11206 | https://cartinoe5930.tistory.com/entry/lima-less-is-more-for-alignment-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LLaMA: Open and Efficient Foundation Language Models | https://arxiv.org/abs/2302.13971 | https://cartinoe5930.tistory.com/entry/llama-open-and-eficient-foundation--language-models-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| WizardLM: Empowering Large Language Models to Follow Complex Instructions | https://arxiv.org/abs/2304.12244 | https://cartinoe5930.tistory.com/entry/open-domain-instruction%ec%9d%98-%ed%9a%A8%EA%B3%BC-%F0%9F%AA%84-Wizardlm-empowering-large-language-models-to-follow-complex-instructions-%eb%85%bc%eb%ac%b8-%eb%a6%ac%eb%b7%b0 |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | https://arxiv.org/abs/2306.08568 | https://huggingface.co/wizardlm/wizardcoder-15b-v1.0 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | https://arxiv.org/abs/2308.09583 | https://huggingface.co/wizardlm/wizardmath-70b-v1.0 |
| Alpaca: A Strong, Replicable Instruction-Following Model | https://crfm.stanford.edu/2023/03/13/alpaca.html | https://cartinoe5930.tistory.com/entry/alpaca-a-strong-replicable-instruction-following-model-%eb%A6%AC%EB%B7%B0 |
| Vicuna: An Open-Source Chatbot Impressing GPT-4 | https://lmsys.org/blog/2023-03-30-vicuna/ | https://cartinoe5930.tistory.com/entry/vicuna-an-open-source-chatbot-impressing-gpt-4-%eb%A6%AC%EB%B7%B0 |
| Koala: A Dialogue Model for Academic Research | https://bair.berkeley.edu/blog/2023/04/03/koala/ | https://cartinoe5930.tistory.com/entry/%EC%A4%91%EC%9A%94%ED%95%9C-%AA%B1%B4-%AA%BA%BA%EC%9D%B4%EC%A7%80-%EC%95%8A%EB%8a%94-koala-koala%f0%9f%90%a8-a-dialogue-model-for-academic-research |
| Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data | https://arxiv.org/abs/2304.01196 | https://cartinoe5930.tistory.com/entry/%F0%9F%90%B2BAIZE-AN-OPEN-SOURCE-CHAT-MODEL-WITH-PARAMETER-EFFICIENT-TUNING-ON-SELF-CHAT-DATA-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Scaling Data-Constrained Language Models | https://arxiv.org/abs/2305.16264 | https://www.youtube.com/watch?v=tk0-sitkcmw&pp=ygugahr0chm6ly9hcnhpdi5vcmcvywjzlzizmduumtyynjq%3D |
| Falcon & RefinedWeb | https://arxiv.org/abs/2306.01116 | https://cartinoe5930.tistory.com/entry/open-llm-leaderboard%EB%A5%BC-%ED%9C%A9%EC%93%B4-FALCON%F0%9F%A6%85-LLM-FALCON-RefinedWeb |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | https://arxiv.org/pdf/2306.02707 | https://cartinoe5930.tistory.com/entry/%f0%9f%90%Acorca-progressive-learning-from-complex-explanation-traces-of-gpt-4-%eb%85%bc%eb%ac%b8-%eb%a6%ac%eb%b7%b0 |
| phi-1: Textbooks Are All You Need | https://arxiv.org/abs/2306.11644 | https://cartinoe5930.tistory.com/entry/%ED%95%84%EC%9A%94%ED%95%9C-%AA%B1%B4-%AC%98%A4%A dengan%A7%81-%B5%90%AC%AC%B3%A4%A %A4%80%EC%9d%98-%EB%8d%B0%EC%9d%B4%ed%84%B0%EB%BF%90-%F0%9F%93%96-PHI-1-Textbooks-Are-all-you-need-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| AlpaGasus: Training A Better Alpaca with Fewer Data | https://arxiv.org/abs/2307.08701 | Will be uploaded later! |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | https://arxiv.org/abs/2307.09288 | https://cartinoe5930.tistory.com/entry/the-hopes-of-researchers-open-source-%f0%9f%A4%97-%EC%97%B0%EA%B5%AC%EC%9e%90%EB%93%A4%EC%9d%98-%ED%9d%AC%EB%A7%9D-Open-Source-%F0%9F%A4%97 |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | https://arxiv.org/abs/2308.07317 | Will be uploaded later! |
| Code Llama: Open Foundation Models for Code | https://arxiv.org/abs/2308.12950 | No plan! |
| FLM-101B: An Open LLM and How to Train It with $100K Budget | https://arxiv.org/pdf/2309.03852 | No plan! |
| Textbooks Are All You Need II: phi-1.5 technical report | https://arxiv.org/abs/2309.05463 | https://huggingface.co/microsoft/phi-1_5 |
| OpenChat: Advancing Open-source Language Models with Mixed-Quality Data | https://arxiv.org/abs/2309.11235 | https://github.com/imoneoi/openchat |
| Mistral 7B | https://arxiv.org/abs/2310.06825 | https://mistral.ai/news/announcing-mistral-7b/ |
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | https://arxiv.org/abs/2310.08491 | https://huggingface.co/papers/2310.08491#652a8e7f30355beba68c1be6 |
| Zephyr: Direct Distillation of LM Alignment | https://arxiv.org/abs/2310.16944 | https://www.youtube.com/watch?v=tkzbg3mksio |
| Orca 2: Teaching Small Language Models How to Reason | https://arxiv.org/abs/2311.11045 | https://www.microsoft.com/en-us/research/blog/orca-2-teaching-small-language-models-how-to-reason/ |
| The Falcon Series of Open Language Models | https://arxiv.org/abs/2311.16867 | No plan! |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | https://arxiv.org/abs/2312.15166 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| LaMDA: Language Models for Dialog Applications | Blog: https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html, paper: https://arxiv.org/abs/2201.08239 | https://cartinoe5930.tistory.com/entry/%AA%B5%AC%AA%B8%80%EC%9D%98-%EC%B5%9C%AA dengan lamda%95-%97%97%177%B4%87 lamda%97%97² %80%ED%95%B4-%EC%95%8c%EC%95%84%EB%B3%B4%EC%9e%90-Language-Models-for-Dialog-Applications-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| PaLM: Scaling Language Modeling with Pathways | Blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-t.html, paper: https://arxiv.org/abs/2204.02311 | 1: https://cartinoe5930.tistory.com/entry/lamda%ec%9d%98-%EB%92%A4%EB%A5%BC-%AC%9E%17%EB%8A%94-Pathway 99%9c%EC%9a%A9%ed%95%9c-%EC%B4%88%EA%B1%B0%EB%8C%80-%EC%96%B8%EC%96%B4-%EB%AA%A8%EB%8D%B8-PALM-%EB%A6%AC%EB%B7%B0 2: https://cartinoe5930.tistory.com/entry/lamda%ec%9d%98-%eb%92%A4%EB%A5%BC-%AC%9E%17%EB%8A%94-Pathways%A5%BC-%EC%82%ac%ec%9a%a9%ed%95%9c-%ec%b4%88%ea%b1%b0%eb%8c%80-%ec%96%b8%ec%96%b4-%eb%aa%a8%eb%8d%b8-palm-%eb%a6%ac%eb%b7%b0 |
| GPT-4: Technical Report | Blog: https://openai.com/research/gpt-4, paper: https://arxiv.org/abs/2303.08774 | https://cartinoe5930.tistory.com/entry/gpt-4-techinal-report-review |
| Gemini: A Family of Highly Capable Multimodal Models | https://arxiv.org/abs/2312.11805 | No plan! |
| AlphaCode 2 Technical Report | https://storage.googleapis.com/deepmind-media/alphacode2/alphacode2_tech_report.pdf | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| FLAN: Finetuned Language Models Are Zero-Shot Learners | https://arxiv.org/abs/2109.01652 | https://cartinoe5930.tistory.com/entry/flan-fine-tuned-language-models-are-zero-shot-learnners-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| T0: Multitask Prompted Training Enables Zero-Shot Task Generalization | https://arxiv.org/abs/2110.08207 | https://cartinoe5930.tistory.com/entry/t0-multitask-prompted-training-enables-zero-shot-task-generalization-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | https://arxiv.org/abs/2204.07705 | https://cartinoe5930.tistory.com/entry/super-natural-instructions-generalization-via-declarative-instructions-N-1600-nlp-Tasks-k-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor | https://arxiv.org/abs/2212.09689 | Will be uploaded later! |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | https://arxiv.org/abs/2210.02969 | https://cartinoe5930.tistory.com/entry/guess-the-instruction-flipped-learning-makes-language-models-stronger-zero-shot-learners-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Scaling Instruction-Finetuned Language Models | https://arxiv.org/abs/2210.11416 | https://cartinoe5930.tistory.com/entry/scaling-instruction-finetuned--language-models-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | https://arxiv.org/abs/2302.03202 | https://cartinoe5930.tistory.com/entry/exploring-the-benefits-of-training-expert-language-models-over-instruction-tuning-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ICIL: In-Context Instruction Learning | https://arxiv.org/abs/2302.14691 | https://cartinoe5930.tistory.com/entry/icil-in-context-instruction-learning-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Instruction Tuning with GPT-4 | https://arxiv.org/abs/2304.03277 | https://cartinoe5930.tistory.com/entry/instruction-tuning-with-gpt-4-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| FIP: Fixed Input Parameterization for Efficient Prompting | https://aclanthology.org/2023.findings-acl.533.pdf | Will be uploaded later! |
| Flacuna: Unleashing the Problem-Solving Power of Vicuna using FLAN Fine-Tuning | https://arxiv.org/abs/2307.02053 | Will be uploaded later! |
| Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning | https://arxiv.org/abs/2305.09246 | Will be uploaded later! |
| Becoming Self-Instruct: Introducing Early Stopping Criteria for Minimal Instruct Tuning | https://arxiv.org/abs/2307.03692 | Will be uploaded later! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| RLHF (Reinforcement Learning from Human Feedback) | https://huggingface.co/blog/rlhf | https://cartinoe5930.tistory.com/entry/%EC%82%AC%EB%9E%8C%EC%9D%98-%ed%94%BC%EB%93%9C%EB%B0%B1%EC%9D%84-%ed%86%B5%ed%95%9c-%ea%b0%95%ed%99%94%ed%95%99%ec%8a%B5-reinforcement-learning-from-human-feedback-rlhf |
| Red Teaming Language Models with Language Models | https://arxiv.org/abs/2202.03286 | https://cartinoe5930.tistory.com/entry/red-teaming-language-models-with-language-models-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| InstructGPT: Training language models to follow instructions with human feedback | https://arxiv.org/abs/2203.02155 | https://cartinoe5930.tistory.com/entry/instructgpt-training-language-models-to-follow-instructions-with-human-feedback-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | https://arxiv.org/abs/2204.05862 | https://cartinoe5930.tistory.com/entry/training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback-%eb%85%bc%eb%ac%b8-%eb%a6%ac%eb%b7%b0 |
| AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | https://arxiv.org/abs/2305.14387 | Will be uploaded later! |
| ALMoST: Aligning Large Language Models through Synthetic Feedback | https://arxiv.org/abs/2305.13735 | https://cartinoe5930.tistory.com/entry/aligning-large-language-models-through-synthetic-feedback-%eb%85%bc%eb%ac%b8-%eb%a6%ac%eb%b7%b0 |
| Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | https://arxiv.org/abs/2307.15217 | Will be uploaded later! |
| RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback | https://arxiv.org/abs/2309.00267 | No plan! |
| SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF | https://arxiv.org/abs/2310.05344 | No plan! |
| HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM | https://arxiv.org/abs/2311.09528 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Adapter: Parameter-Efficient Transfer Learning for NLP | https://arxiv.org/abs/1902.00751 | https://cartinoe5930.tistory.com/entry/%eb%8b%B9%EC%8B%A0%EB%8F%84-Fine-Tuning-%ED%95%A0-%EC%88%98-%EC%9e%88%EC%8A%B5%EB%8B%88%EB%8B%A4-WITH-PEFT-%F0%9F%A4%97 |
| Prefix-Tuning: Optimizing Continuous Prompts for Generation | https://arxiv.org/abs/2101.00190 | https://cartinoe5930.tistory.com/entry/%eb%8b%B9%EC%8B%A0%EB%8F%84-Fine-Tuning-%ED%95%A0-%EC%88%98-%EC%9e%88%EC%8A%B5%EB%8B%88%EB%8B%A4-WITH-PEFT-%F0%9F%A4%97 |
| LoRA: Low-Rank Adaptation of Large Language Models | https://arxiv.org/abs/2106.09685 | https://cartinoe5930.tistory.com/entry/%eb%8b%B9%EC%8B%A0%EB%8F%84-Fine-Tuning-%ED%95%A0-%EC%88%98-%EC%9e%88%EC%8A%B5%EB%8B%88%EB%8B%A4-WITH-PEFT-%F0%9F%A4%97 |
| Towards a Unified View of Parameter-Efficient Transfer Learning | https://arxiv.org/abs/2110.04366 | Will be uploaded later! |
| UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning | https://arxiv.org/abs/2110.07577 | Will be uploaded later! |
| (IA)^3: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | https://arxiv.org/abs/2205.05638 | Will be uploaded later! |
| QLoRA: Efficient Finetuning of Quantized LLMs | https://arxiv.org/abs/2305.14314 | Will be uploaded later! |
| Stack More Layers Differently: High-Rank Training Through Low-Rank Updates | https://arxiv.org/abs/2307.05695 | Will be uploaded later! |
| LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | https://arxiv.org/abs/2307.13269 | Will be uploaded later! |
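To make the shared idea behind the PEFT papers above concrete, here is a minimal LoRA-style sketch (assuming PyTorch; `r` and `alpha` are illustrative hyperparameters, and real implementations such as the `peft` library handle much more): the pretrained weight is frozen and only the low-rank update (alpha/r)·BA is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update (alpha / r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # B = 0, so training starts from the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 = 2 * 8 * 768, vs. ~590k weights in the frozen base layer
```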
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Instruction Mining: High-Quality Instruction Data Selection for Large Language Models | https://arxiv.org/abs/2307.06290 | No plan! |
| SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization | https://arxiv.org/abs/2212.10465 | No plan! |
| MoDS: Model-oriented Data Selection for Instruction Tuning | https://arxiv.org/abs/2311.15653 | No plan! |
| Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | https://arxiv.org/abs/2312.06585 | No plan! |
| Magicoder: Source Code Is All You Need | https://arxiv.org/abs/2312.02120 | No plan! |
| WaveCoder: Widespread And Versatile Enhanced Instruction Tuning | https://arxiv.org/abs/2312.14187 | No plan! |
| What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning | https://arxiv.org/abs/2312.15685 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| What is 'prompt engineering'? | See my blog! | https://cartinoe5930.tistory.com/entry/promppt-engineering%ec%9d%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C |
| CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Blog: https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html, paper: https://arxiv.org/abs/2201.11903 | https://cartinoe5930.tistory.com/entry/lm%ec%9d%B4-%EC%82%AC%EB%9E%8C%EA dengan%9 >%9c-%ec%83C%A0%ACKikel %9C%EC%84%B8%EC%8A%A4%EB%A5%BC-%EA%B0%80%EC%A7%80%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-Chain-of-Thought-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Zero-shot CoT: Large Language Models are Zero-Shot Reasoners | https://arxiv.org/abs/2205.11916 | https://cartinoe5930.tistory.com/entry/large-language-models-are-zero-shot-reasoners-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Language Models are Multilingual Chain-of-Thought Reasoners | https://arxiv.org/abs/2210.03057 | Will be uploaded later! |
| Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models | https://arxiv.org/abs/2210.03493 | Will be uploaded later! |
| CoT KD: Teaching Small Language Models to Reason | https://arxiv.org/abs/2212.08410 | Will be uploaded later! |
| ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models | https://arxiv.org/abs/2305.10601 | https://cartinoe5930.tistory.com/entry/tree-of-thoughts-deliberate-problem-solving-with-large-language-models-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | https://arxiv.org/abs/2305.14045 | https://cartinoe5930.tistory.com/entry/cot-collection-improving-zero-shot-and-few-shot-learning-of-language-models-via-chain-of-thought-fine-tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Let's Verify Step by Step | https://arxiv.org/abs/2305.20050 | https://cartinoe5930.tistory.com/entry/lets-verify-step-by-step-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Measuring Faithfulness in Chain-of-Thought Reasoning | https://arxiv.org/abs/2307.13702 | Will be uploaded later! |
| SoT: Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding | https://arxiv.org/abs/2307.15337 | Will be uploaded later! |
| Graph of Thoughts: Solving Elaborate Problems with Large Language Models | https://arxiv.org/abs/2308.09687 | Will be uploaded later! |
| From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting | https://arxiv.org/abs/2309.04269 | No plan! |
| Chain-of-Verification Reduces Hallucination in Large Language Models | https://arxiv.org/abs/2309.11495 | https://www.youtube.com/watch?v=l0zfjwreg&pp=ygugahr0chm6ly9hcnhpdi5vcmcvywjzlzizmdkumte0otu%3D |
| Contrastive Chain-of-Thought Prompting | https://arxiv.org/abs/2311.09277 | No plan! |
| Thread of Thought Unraveling Chaotic Contexts | https://arxiv.org/abs/2311.08734 | No plan! |
| System 2 Attention (is something you might need too) | https://arxiv.org/abs/2311.11829 | No plan! |
| Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | https://arxiv.org/abs/2312.04474 | No plan! |
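As a concrete illustration of the simplest technique in this table, a sketch of zero-shot CoT from "Large Language Models are Zero-Shot Reasoners": append the trigger phrase "Let's think step by step." to elicit a reasoning chain, then extract the answer with a second call. `call_llm` is a hypothetical stand-in for whatever completion API you use.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real completion API."""
    raise NotImplementedError("plug in a real model here")

def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit a reasoning chain with the trigger phrase.
    reasoning = call_llm(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: extract the final answer conditioned on that chain.
    return call_llm(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )
```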
| Paper Title | Paper | Paper Review |
|---|---|---|
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | https://arxiv.org/abs/2205.14135 | https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad |
| Exponentially Faster Language Modelling | https://arxiv.org/abs/2311.10770 | No plan! |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | https://arxiv.org/abs/2312.11514 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Data Augmentation in NLP | Blog: https://neptune.ai/blog/data-augmentation-nlp, https://amitness.com/2020/05/data-augmentation-for-nlp/ | https://cartinoe5930.tistory.com/entry/data-augmentation-methods-in-nlp |
| PET: Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference | https://arxiv.org/abs/2001.07676 | https://cartinoe5930.tistory.com/entry/pet-exploiting-cloze-questions-for-few-shot-text-classification-and-natural-language-inference-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Pathways | https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ | https://cartinoe5930.tistory.com/entry/%eb%A7%8C%EC%95%BD-%EB%AA%A8%EB%8D%B8%EC%9D%B4-%AC%97%AC%9F%AC-%AA%B4%9090909097%97%AC%9F%9F%AC-%AA dengan%AA0909090909090909090909099997 B0%81%EC%9d%84-%EB%8A%90%EB%82%84-%EC%88%98-%EC%9e%88%EA%B2%8C-%EB%90%9c%EB%8B%A4%EB%A9%B4-PATHWAYS-%EB%A6%AC%EB%B7%B0 |
| LMSI: Large Language Models Can Self-Improve | https://arxiv.org/abs/2210.11610 | https://cartinoe5930.tistory.com/entry/lmsi-large-language-models-can-self-improve-%eb%85%bc%eb%ac%b8-%eb%a6%ac%eb%b7%b0 |
| Self-Instruct: Aligning Language Models with Self-Generated Instructions | https://arxiv.org/abs/2212.10560 | https://cartinoe5930.tistory.com/entry/self-instruct-aligning-language-model-with-self-generated-Instructions-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Reflexion: Language Agents with Verbal Reinforcement Learning | https://arxiv.org/abs/2303.11366 | https://cartinoe5930.tistory.com/entry/reflexion-language-agents-with-verbal-reinforcement-learning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Self-Refine: Iterative Refinement with Self-Feedback | https://arxiv.org/abs/2303.17651 | https://cartinoe5930.tistory.com/entry/self-refine-iterative-refinement-with-self-feedback-%eb%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| REFINER: Reasoning Feedback on Intermediate Representations | https://arxiv.org/abs/2304.01904 | No plan! |
| SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation | https://kaistai.github.io/selfee/ | https://cartinoe5930.tistory.com/entry/selfee-iterative-self-revising-llm-empowered-by-self-feedback-generation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | https://arxiv.org/abs/2305.13245 | https://aliissa99.medium.com/-a596e4d86f79 |
| Shepherd: A Critic for Language Model Generation | https://arxiv.org/abs/2308.04592 | Will be uploaded later! |
| Self-Alignment with Instruction Backtranslation | https://arxiv.org/pdf/2308.06259 | Will be uploaded later! |
| SCREWS: A Modular Framework for Reasoning with Revisions | https://arxiv.org/pdf/2309.13075 | No plan! |
| NEFTune: Noisy Embeddings Improve Instruction Finetuning | https://arxiv.org/abs/2310.05914 | https://cartinoe5930.tistory.com/entry/noise-makes-llm-better-neftune-%f0%9f%98%89 |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | https://arxiv.org/abs/2311.03099 | No plan! |
| LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment | https://arxiv.org/abs/2312.09979 | No plan! |
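NEFTune (listed above) is simple enough to sketch in a few lines. My reading of the paper is that, during fine-tuning only, you add uniform noise to the input embeddings, scaled by alpha / sqrt(L * d); treat this as a sketch rather than a reference implementation (assuming PyTorch):

```python
import torch

def neftune_embeddings(emb: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add Uniform(-1, 1) noise scaled by alpha / sqrt(seq_len * dim).

    `emb` has shape (..., seq_len, dim); apply only during training.
    """
    seq_len, dim = emb.shape[-2], emb.shape[-1]
    noise = torch.empty_like(emb).uniform_(-1, 1)
    return emb + noise * (alpha / (seq_len * dim) ** 0.5)

emb = torch.randn(2, 128, 768)
print(neftune_embeddings(emb).shape)  # torch.Size([2, 128, 768])
```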
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | https://arxiv.org/abs/2005.11401 | No plan! |
| Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | https://arxiv.org/abs/2310.11511 | No plan! |
| InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining | https://arxiv.org/abs/2310.07713 | No plan! |
| Retrieval-Augmented Generation for Large Language Models: A Survey | https://arxiv.org/abs/2312.10997 | No plan! |
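A minimal sketch of the retrieval-augmented generation loop these papers build on (assuming NumPy; the embedding model and the generator are deliberately left out, so `doc_vecs` and the prompt consumer are assumptions): retrieve the top-k documents by cosine similarity, then pack them into the prompt.

```python
import numpy as np

def top_k_docs(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Cosine-similarity top-k retrieval over a small in-memory index."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    return np.argsort(-sims)[:k]

def rag_prompt(question: str, docs: list[str], idx: np.ndarray) -> str:
    """Pack the retrieved documents into a grounded prompt."""
    context = "\n".join(docs[i] for i in idx)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQ: {question}\nA:"
```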
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| BIG-Bench Hard: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them | https://arxiv.org/abs/2210.09261 | Will be uploaded later! |
| Large Language Models are not Fair Evaluators | https://arxiv.org/abs/2305.17926 | Will be uploaded later! |
| MT-Bench: Judging LLM-as-a-Judge with MT-Bench | https://arxiv.org/abs/2306.05685 | Will be uploaded later! |
| InstructEval: Towards Holistic Evaluation of Instruction-Tuned Large Language Models | https://arxiv.org/abs/2306.04757 | Will be uploaded later! |
| FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets | https://arxiv.org/abs/2307.10928 | Will be uploaded later! |
| GAIA: A Benchmark for General AI Assistants | https://arxiv.org/abs/2311.12983 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| A Length-Extrapolatable Transformer | https://arxiv.org/abs/2212.10554 | No plan! |
| Extending Context Window of Large Language Models via Positional Interpolation | https://arxiv.org/abs/2306.15595 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| LongNet: Scaling Transformers to 1,000,000,000 Tokens | https://arxiv.org/abs/2307.02486 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| Lost in the Middle: How Language Models Use Long Contexts | https://arxiv.org/abs/2307.03172 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%98-context-window-%EA%B8%B8%EC%96%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-%EC%A7%A7%EC%95%84%EC%95%BC-%ED%95%A0%EA%B9%8C-%F0%9F%93%8F%F0%9F%A4%A8 |
| YaRN: Efficient Context Window Extension of Large Language Models | https://arxiv.org/abs/2309.00071 | No plan! |
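The core trick of "Extending Context Window of Large Language Models via Positional Interpolation" (listed above) fits in a few lines: rather than letting position indices extrapolate past the trained range, rescale them back into it before computing the RoPE angles. A minimal sketch (assuming PyTorch; the lengths are illustrative):

```python
import torch

def interpolated_positions(target_len: int, trained_len: int) -> torch.Tensor:
    """Rescale positions of a longer sequence into the trained range
    [0, trained_len) before the rotary embedding angles are computed."""
    scale = trained_len / target_len  # < 1 when extending the window
    return torch.arange(target_len, dtype=torch.float32) * scale

pos = interpolated_positions(target_len=8192, trained_len=2048)
print(pos[-1])  # tensor(2047.7500) -- never leaves the trained range
```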
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Why can GPT learn in-context? | https://arxiv.org/abs/2212.10559 | https://cartinoe5930.tistory.com/entry/Why-can-GPT-learn-in-context-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | paper: https://arxiv.org/abs/2303.12712, youtube: https://www.youtube.com/watch?v=Mqg3aTGNxZ0 | https://cartinoe5930.tistory.com/entry/Sparks-of-Artificial-General-Intelligence-Early-experiments-with-GPT-4-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| The False Promise of Imitating Proprietary LLMs | https://arxiv.org/abs/2305.15717 | https://cartinoe5930.tistory.com/entry/%EA%B8%B0%EC%A1%B4-imitation-model%EC%9D%80-%EC%9E%98%EB%AA%BB-%ED%95%99%EC%8A%B5%EB%90%98%EA%B3%A0-%EC%9E%88%EB%8B%A4-%F0%9F%AB%A2-The-False-Promise-of-Imitating-Proprietary-L |
| TULU: How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources | https://arxiv.org/abs/2306.04751 | Will be uploaded later! |
| How Is ChatGPT's Behavior Changing over Time? | https://arxiv.org/abs/2307.09009 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-%EC%84%B1%EB%8A%A5%EC%9D%B4-%EC%95%88-%EC%A2%8B%EC%95%84%EC%A7%80%EA%B3%A0-%EC%9E%88%EB%8B%A4%EA%B5%AC-%F0%9F%98%B2%F0%9F%98%B2 |
| Large Language Models Cannot Self-Correct Reasoning Yet | https://arxiv.org/abs/2310.01798 | |
| How Far Are Large Language Models from Agents with Theory-of-Mind | https://arxiv.org/pdf/2310.03051 | No plan! |
| Can LLMs Follow Simple Rules | https://arxiv.org/abs/2311.04235 | https://www.youtube.com/watch?v=CY6o43037OY |
| Camels in a Changing Climate; Enhancing LM Adaptation with Tulu 2 | https://arxiv.org/abs/2311.10702 | No plan! |
| ChatGPT's One-year Anniversary; Are Open-Source Large Language Models Catching up | https://arxiv.org/abs/2311.15653 | No plan! |
| An In-depth Look at Gemini's Language Abilities | https://arxiv.org/abs/2312.11444 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature | https://arxiv.org/abs/2301.11305 | https://cartinoe5930.tistory.com/entry/%EC%9D%B4-%EA%B8%80%EC%9D%B4-LM%EC%9D%B4-%EB%A7%8C%EB%93%A4%EC%96%B4%EB%82%B8-%EA%B8%80%EC%9D%BC%EA%B9%8C-%EB%8F%84%EC%99%80%EC%A4%98-DetectGPT-DetectGPT-Zero-Shot-Machine-Generated-Text-Detection-using-Probability-Curvature-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback | https://arxiv.org/abs/2302.12813 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%9D%98-hallucination-%EC%96%B4%EB%96%BB%EA%B2%8C-%ED%95%B4%EA%B2%B0%ED%95%B4%EC%95%BC-%ED%95%A0%EA%B9%8C-Check-Your-Facts-and-Try-Again-Improving-Large-Language-Models-with-External-Knowledge-and-Automated-Feedback |
| RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text | https://arxiv.org/abs/2305.13304 | https://cartinoe5930.tistory.com/entry/ChatGPT%EC%97%90-%EB%B0%98%EB%B3%B5-%EB%A9%94%EC%BB%A4%EB%8B%88%EC%A6%98LSTM%EC%9D%84-%EC%82%AC%EC%9A%A9%ED%95%9C%EB%8B%A4%EB%A9%B4-RecurrentGPT-Interactive-Generation-of-Arbitrarily-Long-Text-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Large Language Models as Tool Makers | https://arxiv.org/abs/2305.17126 | https://cartinoe5930.tistory.com/entry/LM%EC%9D%B4-%EB%8F%84%EA%B5%AC%EB%A5%BC-%EC%82%AC%EC%9A%A9%ED%95%98%EA%B2%8C-%EB%90%9C%EB%8B%A4%EB%A9%B4-%F0%9F%94%AC-Large-Language-Models-as-Tool-Makers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion | https://arxiv.org/abs/2306.02561 | No plan! |
| Knowledge Distillation of Large Language Models | https://arxiv.org/abs/2306.08543 | https://cartinoe5930.tistory.com/entry/KD%EC%97%90-%EC%82%B4%EC%A7%9D%EC%9D%98-%EB%B3%80%ED%99%94%EB%A5%BC-%EC%A4%98%EB%B3%B4%EC%9E%90-%F0%9F%98%9C-Knowledge-Distillation-of-Large-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Scaling Relationship on Learning Mathematical Reasoning with Large Language Models | https://arxiv.org/abs/2308.01825 | Will be uploaded later! |
| ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs | https://arxiv.org/abs/2307.16789 | Will be uploaded later! |
| SelfCheck: Using LLMs to Zero-shot Check Their Own Step-by-Step Reasoning | https://arxiv.org/abs/2308.00436 | Will be uploaded later! |
| Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification | https://arxiv.org/abs/2308.07921 | Will be uploaded later! |
| Large Language Models as Optimizers | https://arxiv.org/abs/2309.03409 | No plan! |
| FIAT: Fusing Learning Paradigms with Instruction-Accelerated Tuning | https://arxiv.org/abs/2309.04663 | https://www.youtube.com/watch?v=EZsZEcRDte0&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMDQ2NjM%3D |
| Contrastive Decoding Improves Reasoning in Large Language Models | https://arxiv.org/abs/2309.09117 | https://www.youtube.com/watch?v=nMR56TkwC1Q&pp=ygUgaHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIzMDkuMDkxMTc%3D |
| Think before you speak: Training Language Models with Pause Tokens | https://arxiv.org/abs/2310.02226 | https://www.youtube.com/watch?v=MtJ1jacr_yI |
| Large Language Models Can Learn Rules | https://arxiv.org/abs/2310.07064 | No plan! |
| In-context Pretraining: Language Modeling Beyond Document Boundaries | https://arxiv.org/abs/2310.10638 | https://www.youtube.com/watch?v=GI-0lAaILrU |
| Learning From Mistakes Makes LLM Better Reasoner | https://arxiv.org/abs/2310.20689 | No plan! |
| Language Models can be Logical Solvers | https://arxiv.org/abs/2311.06158 | No plan! |
| MART: Improving LLM Safety with Multi-round Automatic Red-Teaming | https://arxiv.org/abs/2311.07689 | No plan! |
| Fine-tuning Language Models for Factuality | https://arxiv.org/abs/2311.08401 | No plan! |
| Positional Description Matters for Transformers Arithmetic | https://arxiv.org/abs/2311.14737 | No plan! |
| Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision | https://arxiv.org/abs/2312.09390 | https://openai.com/research/weak-to-strong-generalization |
| TinyGSM: achieving higher than 80 percentage on GSM8k with small language models | https://arxiv.org/abs/2312.09241 | No plan! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Morpheme-aware Subword Tokenizer: An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks | https://arxiv.org/abs/2010.02534 | Will be uploaded later! |
| What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers | https://arxiv.org/abs/2109.04650 | Will be uploaded later! |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| History of CNNs | LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, ResNeXt, Xception, MobileNet, DenseNet, EfficientNet, ConvNeXt | https://cartinoe5930.tistory.com/entry/CNN-network%EC%9D%98-%EC%97%AD%EC%82%AC |
| ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | https://arxiv.org/abs/2010.11929 | https://cartinoe5930.tistory.com/entry/ViT-An-Image-Worth-16-x-16-Words-Transformers-for-Image-Recognition-at-Scale |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | https://arxiv.org/abs/2103.14030 | https://cartinoe5930.tistory.com/entry/Swin-Transformer-Hierarchical-Vision-Transformer-using-Shifted-Windows-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| CLIP: Learning Transferable Visual Models From Natural Language Supervision | https://arxiv.org/abs/2103.00020 | https://cartinoe5930.tistory.com/entry/CLIP-Learning-Transferable-Visual-Models-From-Natural-Language-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Paper Title | Paper or reference site Link | Paper Review |
|---|---|---|
| Let's learn about VLMs (Vision-Language Models) | https://huggingface.co/blog/vision_language_pretraining#supporting-vision-language-models-in-%F0%9F%A4%97-transformers | https://cartinoe5930.tistory.com/entry/VLMVision-Language-Model%EC%97%90-%EB%8C%80%ED%95%B4-%EC%95%8C%EC%95%84%EB%B3%B4%EC%9E%90 |
| VisualBERT: A Simple and Performant Baseline for Vision and Language | https://arxiv.org/abs/1908.03557 | https://cartinoe5930.tistory.com/entry/VisualBERT-A-Simple-and-Performant-Baseline-for-Vision-and-Language-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ViLBERT: Pre-training Task-Agnostic Visiolinguistic Representations for Visual-and-Language Tasks | https://arxiv.org/abs/1908.02265 | https://cartinoe5930.tistory.com/entry/ViLBERT-Pretraining-Task-Agnostic-Visiolinguistic-Representations-for-Visual-and-Language-Tasks |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | https://arxiv.org/abs/1908.07490 | https://cartinoe5930.tistory.com/entry/LXMERT-Learning-Cross-Modality-Encoder-Representations-from-Transformers-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VL-BERT: Pre-training of Generic Visual-Linguistic Representations | https://arxiv.org/abs/1908.08530 | https://cartinoe5930.tistory.com/entry/VL-BERT-Pre-training-of-Generic-Visual-Linguistic-Representations-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VLP: Unified Vision-Language Pre-Training for Image Captioning and VQA | https://arxiv.org/abs/1909.11059 | https://cartinoe5930.tistory.com/entry/VLP-Unified-Vision-Language-Pre-Traning-for-Image-Captioning-and-VQA-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | https://arxiv.org/abs/2004.06165 | https://cartinoe5930.tistory.com/entry/Oscar-Object-Semantics-Aligned-Pre-training-for-Vision-Language-Tasks-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | https://arxiv.org/abs/2101.00529 | https://cartinoe5930.tistory.com/entry/VinVL-Revisiting-Visual-Representations-in-Vision-Language-Models-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision | https://arxiv.org/abs/2102.03334 | https://cartinoe5930.tistory.com/entry/ViLT-Vision-and-Language-Transformer-Without-Convolution-or-Region-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | https://arxiv.org/abs/2102.05918 | https://cartinoe5930.tistory.com/entry/ALIGN-Scaling-up-Visual-and-Vision-Language-Representation-with-Noisy-Text-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| ALBEF: Vision and Language Representation Learning with Momentum Distillation | https://arxiv.org/abs/2107.07651 | https://cartinoe5930.tistory.com/entry/ALBEF-Vision-and-Language-Representation-Learning-with-Momentum-Distillation-%EB%85%BC%EB%AC%B8 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | https://arxiv.org/abs/2108.10904 | https://cartinoe5930.tistory.com/entry/SimVLM-Simple-Visual-Language-Model-Pre-training-with-Weak-Supervision-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| VLMo: Unified Vision-Language Pre-training with Mixture-of-Modality-Experts | https://arxiv.org/abs/2111.02358 | https://cartinoe5930.tistory.com/entry/VLMo-Unified-Vision-Language-Pre-training-with-Mixture-of-Modality-Experts-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| LiT : Zero-Shot Transfer with Locked-image text Tuning | https://arxiv.org/abs/2111.07991 | https://cartinoe5930.tistory.com/entry/LiT%F0%9F%94%A5-Zero-Shot-Transfer-with-Locked-image-text-Tuning-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| FLAVA: A Foundational Language And Vision Alignment Model | https://arxiv.org/abs/2112.04482 | https://cartinoe5930.tistory.com/entry/FLAVA-A-Foundational-Language-And-Vision-Alignment-Model-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | https://arxiv.org/abs/2201.12086 | https://cartinoe5930.tistory.com/entry/BLIP-Bootstrapping-Language-Image-Pre-training-fro-Unified-Vision-Language-Understanding-and-Generation-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| Paper or Posting Title | Reference site Link | Review |
|---|---|---|
| Knowledge Distillation: Distilling the Knowledge in a Neural Network | https://arxiv.org/abs/1503.02531 | https://cartinoe5930.tistory.com/entry/Distilling-the-Knowledge-in-a-Neural-Network-%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0 |
| What is Zero-shot, One-shot, Few-shot Learning? | see my blog! | https://cartinoe5930.tistory.com/entry/Zero-shot-One-shot-Few-shot-Learning%EC%9D%B4-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C |