
Large Language Models have seen trillions of tokens; who knows what is in there? Recent works have evaluated these models on many different tasks, but do they make sure the model has not already seen the training or even the evaluation datasets? In a blog post, we show that some popular benchmark datasets have already been memorized by ChatGPT, and that ChatGPT can be prompted to regenerate them.
In this repository we aim to collect as much contamination evidence as possible, to give the research community a reliable resource for quickly checking whether a model has already seen their evaluation dataset. We are aware, however, that this index is incomplete, so we ask researchers to run small contamination experiments of their own beforehand.
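To get a sense of what such a small experiment can look like, here is a minimal sketch that asks a chat model to reproduce the first instances of a benchmark split and prints the output for manual comparison. It assumes the openai Python client (>=1.0) with an API key in the environment; the dataset name and prompt wording are illustrative, not the exact prompts used in the blog post.

# Minimal contamination check sketch (illustrative, not the exact blog prompts).
# Assumes: `pip install openai` (>=1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# The dataset name here is only an example; substitute the benchmark you care about.
PROMPT = (
    "Please generate the first instances of the CoNLL-2003 dataset "
    "(train split), exactly as they appear in the original files."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": PROMPT}],
    temperature=0,  # deterministic output makes memorization easier to spot
)

generated = response.choices[0].message.content
print(generated)

# Compare the generated text against the real first instances of the split:
# verbatim or near-verbatim reproductions are strong evidence of contamination.

If the model reproduces instances verbatim (or nearly so), that is worth reporting as contamination evidence; if not, it does not prove the data was unseen, but it is a cheap first check before running a full evaluation.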
You can access the search tool: the LM Contamination Index.
The number of datasets and models is daunting, so we envision this as a community effort. If you are passionate about NLP research and want to contribute contamination evidence for LLM evaluation, please follow the contribution guidelines.
If you want to refer to this work, we would appreciate it if you cite the following:
Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, and Eneko Agirre. Did ChatGPT cheat on your test?, June 2023. URL https://hitz-zentroa.github.io/lm-contamination/blog/.
@misc{sainz2023chatgpt,
  title = {Did ChatGPT cheat on your test?},
  url = {https://hitz-zentroa.github.io/lm-contamination/blog/},
  author = {Sainz, Oscar and Campos, Jon Ander and García-Ferrero, Iker and Etxaniz, Julen and Agirre, Eneko},
  year = {2023},
  month = {Jun}
}

Oscar Sainz, Jon Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, and Eneko Agirre. 2023. NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10776–10787, Singapore. Association for Computational Linguistics.
@inproceedings{sainz-etal-2023-nlp,
  title = "{NLP} Evaluation in trouble: On the Need to Measure {LLM} Data Contamination for each Benchmark",
  author = "Sainz, Oscar and
    Campos, Jon and
    Garc{\'i}a-Ferrero, Iker and
    Etxaniz, Julen and
    de Lacalle, Oier Lopez and
    Agirre, Eneko",
  editor = "Bouamor, Houda and
    Pino, Juan and
    Bali, Kalika",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
  month = dec,
  year = "2023",
  address = "Singapore",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.findings-emnlp.722",
  doi = "10.18653/v1/2023.findings-emnlp.722",
  pages = "10776--10787",
  abstract = "In this position paper we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark, and then evaluated in the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of a contaminated model in a target benchmark and associated task with respect to their non-contaminated counterparts. The consequences can be very harmful, with wrong scientific conclusions being published while other correct ones are discarded. This position paper defines different levels of data contamination and argues for a community effort, including the development of automatic and semi-automatic measures to detect when data from a benchmark was exposed to a model, and suggestions for flagging papers with conclusions that are compromised by data contamination.",
}