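Datasets for Entity Recognition

This repository contains datasets from several domains, annotated with a variety of entity types, that are useful for entity recognition and named entity recognition (NER) tasks.

Note: I am not actively adding datasets to this list. There are likely many more NER datasets that have appeared since 2020. However, feel free to add datasets via an issue or pull request.

Datasets for NER in English

The following table lists datasets for English-language entity recognition (for NER datasets in other languages, see below). The data directory contains information on where to obtain the datasets that could not be shared due to licensing restrictions, as well as code to convert them (if necessary) to the CoNLL 2003 format.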

Dataset	Domain	License	Reference	Availability
CoNLL 2003	News	DUA	Sang and Meulder, 2003	Easy to find
NIST-IEER	News	None	NIST 1999 IE-ER	NLTK data
MUC-6	News	LDC	Grishman and Sundheim, 1996	LDC 2003T13
OntoNotes 5	Various	LDC	Weischedel et al., 2013	LDC 2013T19
BBN	Various	LDC	Weischedel and Brunstein, 2005	LDC 2005T33
GMB-1.0.0	Various	None	Bos et al., 2017	http://gmb.let.rug.nl/data.php
GUM-3.1.0	Wiki	Several (*2)	Zeldes, 2017	Included here
WikiGold	Wikipedia	CC-BY 4.0	Balasuriya et al., 2009	Included here
Ritter	Twitter	None	Ritter et al., 2011	No split, train/test/dev
BTC	Twitter	CC-BY 4.0	Derczynski et al., 2016	Included here
WNUT17	Social media	CC-BY 4.0	Derczynski et al., 2017	Included here
i2b2-2006	Medical	DUA	Uzuner et al., 2007	http://www.i2b2.org
i2b2-2014	Medical	DUA	Stubbs et al., 2015	http://www.i2b2.org
CADEC	Medical	CSIRO	Karimi et al., 2015	http://data.csiro.au/
AnEM	Anatomical	CC-SA 3.0	Ohta et al., 2012	Included here
MITRestaurant	Queries	None	Liu et al., 2013a	http://groups.csail.mit.edu/sls/
MITMovie	Queries	None	Liu et al., 2013b	http://groups.csail.mit.edu/sls/
MalwareTextDB	Malware	None	Lim et al., 2017	http://www.statnlp.org/
re3d	Defense	Several (*1)	DSTL, 2017	Included here
SEC-filings	Finance	CC-BY 3.0	Alvarado et al., 2015	Included here
Assembly	Robotics	X	Costa et al., 2017	X
WikiNEuRal	Wikipedia	CC BY-SA-NC 4.0	Tedeschi et al., 2021	https://github.com/Babelscape/wikineural
MultiNERD	Wikipedia	CC BY-SA-NC 4.0	Tedeschi et al., 2022	https://github.com/Babelscape/multinerd
HIPE-2022	Historical	CC BY-SA-NC 4.0	Ehrmann et al., 2022	https://github.com/hipe-eval/hipe-2022-data
Music-NER	Music	MIT	Epure and Hennequin, 2023	https://github.com/deezer/music-ner-eacl2023
wiesp2022-ner	Astrophysics	CC BY-SA-NC 4.0	Grezes et al., 2022	https://huggingface.co/datasets/adsabs/wiesp2022-ner
NNE	News	CC 4.0 / LDC	Ringland et al., 2019	https://github.com/nickyringland/nested_named_entities
Worldwide	News	CC BY-SA-NC 4.0	Shan et al., 2023	https://github.com/stanfordnlp/en-worldwide-newswire https://arxiv.org/abs/2404.13465
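
The CoNLL 2003 format mentioned above stores one token per line, with the NER tag in the last column and a blank line between sentences. The repository's own conversion code is not reproduced here; the following is only a minimal sketch of how such a file might be read. It assumes whitespace-separated columns, B-/I-/O style tags, and `-DOCSTART-` document markers. The names `read_conll`, `extract_spans` and `SAMPLE` are hypothetical and chosen for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs

# Hypothetical sample in a CoNLL 2003-like layout: token, POS, chunk, NER tag.
SAMPLE = """\
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the NER tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_spans(sentence: Sentence) -> List[Tuple[str, Optional[str]]]:
    """Collect (entity text, entity type) pairs from B-/I-/O tags."""
    spans: List[Tuple[str, Optional[str]]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        prefix, _, kind = tag.partition("-")
        # An I- tag continues the open entity only if the type matches.
        continues = prefix == "I" and kind == label
        if tokens and not continues:
            spans.append((" ".join(tokens), label))
            tokens, label = [], None
        if prefix in ("B", "I"):
            tokens.append(token)
            label = kind
    if tokens:
        spans.append((" ".join(tokens), label))
    return spans


if __name__ == "__main__":
    for sent in read_conll(SAMPLE.splitlines()):
        print(extract_spans(sent))
```

Because an `I-` tag with no open entity is treated as the start of a new one, the same sketch tolerates both the IOB1 and IOB2 tagging variants. Datasets that use other schemes (for example nested annotations) would need a different reader.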

Licenses

Notes on licenses:

(1) re3d ("Relationship and Entity Extraction Evaluation Dataset") contains several datasets with different licenses. These are:

CC-SA 3.0 (Wikipedia dataset)
CC BY-NC 3.0 (BBC_Online dataset)
CC BY 3.0 AU (Australian_Department_of_Foreign_Affairs dataset)
Public domain (US_State_Department dataset, CENTCOM dataset)
UK Open Government Licence v3.0 (UK_Government dataset)
Delegation_of_the_European_Union_to_Syria: see https://eeas.europa.eu/delegations/syria/8157/legal-notice_en

(2) GUM-3.1.0 consists of three datasets, licensed under CC-BY 3.0, CC-BY-SA 3.0 and CC-BY-NC-SA 3.0. The annotations are licensed under CC-BY 4.0.

More detailed license information for each dataset can be found in the corresponding subdirectory.

To do later:

Tabassum et al., Code and Named Entity Recognition in StackOverflow: https://cocoxu.github.io/publications/acl2020_stackoverflow_ner.pdf
LitBank: https://github.com/dbamman/litbank (annotated dataset)
NNE: A Dataset for Nested Named Entity Recognition in English Newswire, 2019: https://github.com/nickyringland/nested_named_entities
Mars Target Encyclopedia - LPSC Abstracts labeled dataset: https://zenodo.org/record/104848419#.W5A2
https://www.kaggle.com/dataturks/best-buy-ecommerce-ner-dataset/home
Resume Entities for NER: https://www.kaggle.com/dataturks/resume-entities-for-ner/home
Few-NERD: https://aclanthology.org/2021.acl-long.248/

Datasets for NER in other languages

Lexical named entity resources

HeiNER: http://heiner.cl.uni-heidelberg.de/index.shtml
NECKAr: https://event.ifi.uni-heidelberg.de/?page_id=532#wikidata_ne_dataset

Code-switching

English-Spanish tweets (CALCS 2018): https://code-switching.github.io/2018/; https://code-switching.github.io/2018/files/spa-eng/release.zip; http://www.aclweb.org/anthology/W18-3219
Arabic-Egyptian tweets (CALCS 2018): https://code-switching.github.io/2018/; https://code-switching.github.io/2018/files/msa-egy/arabictweetstokenassigner.zip; http://www.aclweb.org/anthology/W18-3219
Hindi-English social media text: https://github.com/silentflame/named-entity-recognition; http://aclweb.org/anthology/W18-2405
EMNLP 2014 shared task - code-switched tweets (Nepali-English, Spanish-English, Mandarin-English, Arabic-dialectal Arabic): http://emnlp2014.org/workshops/codeswitch/call.html

German

CoNLL 2003 (English, German): https://www.clips.uantwerpen.be/conll2003/ner/
GermEval 2014: https://sites.google.com/site/germeval2014ner/data
Tübingen Treebank of Written German (TüBa-D/Z): http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html
Europeana Newspapers (Dutch, French, German): https://github.com/europeananewspapers/ner-corpora; http://lab.kb.nl/dataset/europeana-newspapers-ner#access
German Europarl transcripts (subset): https://nlpado.de/~sebastian/software/ner_german.shtml
Named Entity Model for German, Politics (NEMGP): https://www.thomas-zastrow.de/nlp/
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
DFKI SmartData Corpus (geography): https://dfki-lt-re-group.bitbucket.io/smartdata-corpus/ (Gabryszak, Leonhard Hennig, Proceedings, 2018)
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - densely annotated Wikipedia texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
Elena Leitner, Georg Rehm, Juli... Data: https://github.com/elenanereiss/legal-entity-recognition
HIPE-2022, named entity recognition and linking in multilingual historical documents: https://hipe-eval.github.io/HIPE-2022/ https://github.com/hipe-eval/hipe-2022-data

Dutch

CoNLL 2002 (Spanish, Dutch): https://www.clips.uantwerpen.be/conll2002/ner/
Europeana Newspapers (Dutch, French, German): https://github.com/europeananewspapers/ner-corpora; http://lab.kb.nl/dataset/europeana-newspapers-ner#access
MEANTIME corpus (parallel corpus: English, Spanish, Italian, Dutch): http://www.newsreader-project.eu/results/data/wikinews/
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
Dutch parliamentary documents 2015-2016, 1848.nl (Jonkers, Named Entity Recognition on Dutch Parliamentary Documents using Frog, 2016, University of Amsterdam): https://github.com/poezedoez/ner/blob/master/data/lobby/golden_stand
SoNaR-1 - Desmet and Hoste, Fine-grained Dutch named entity recognition, 2014 (hierarchy of classes)
Corpus-SoNaR Books and Corpus Gutenberg Dutch: http://blog.namescape.nl/?page_id=85; http://portal.clarin.nl/node/1940

Afrikaans

NCHLT Afrikaans Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/299

Spanish

CoNLL 2002 (Spanish, Dutch): https://www.clips.uantwerpen.be/conll2002/ner/
AnCora (Spanish, Catalan): http://clic.ub.edu/corpus/en
DEFT Spanish Treebank (LDC2018T01): https://catalog.ldc.upenn.edu/LDC2018T01
PANACEA (lab): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-lab-es
PANACEA (env): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-env-es
MEANTIME corpus (parallel corpus: English, Spanish, Italian, Dutch): http://www.newsreader-project.eu/results/data/wikinews/
ACE 2007 (Spanish and Arabic): https://catalog.ldc.upenn.edu/LDC2014T18
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
http://www.grupolys.org/~marcos/pub/lrec16.tar.bz2 ("Incorporating lexico-semantic heuristics into coreference for named entity recognition at the document level")
Multilingual corpus with coreference annotation of person entities (Spanish, Galician, Portuguese): http://gramatica.usc.es/~marcos/lrec.tar.bz2
DrugSemantics gold standard (Moreno et al., DrugSemantics: A corpus for named entity recognition in Spanish summaries of product characteristics, 2017): https://data.mendeley.com/datasets/fwc7jrc5jr/11
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - densely annotated Wikipedia texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
CANTEMIST (Cancer Text Mining Shared Task - tumor named entity recognition) - named entity recognition of an important type of cancer-related concept, namely tumor morphology, in Spanish medical texts: https://temu.bsc.es/cantemist/

Catalan

AnCora (Spanish, Catalan): http://clic.ub.edu/corpus/en

Galician

Galician NER corpus: https://gramatica.usc.es/~marcos/resources/corpus_gal_nec.txt.gz
Multilingual corpus with coreference annotation of person entities (Spanish, Galician, Portuguese): http://gramatica.usc.es/~marcos/lrec.tar.bz2

Basque

Basque named entities corpus (EIEC): http://ixa.eus/node/4486?language=en
Basque disambiguated named entities corpus (EDIEC): http://ixa.si.ehu.es/node/4485?language=en
Egunkaria 2000 corpus (383 newswire texts): http://qtleap.eu/wp-content/uploads/2014/04/qtleap-2013-d5.1.pdf

Portuguese

HAREM: https://www.linguateca.pt/aval_conjunta/harem/harem_ing.html
CINTIL corpus: http://cintil.ul.pt/cintilfeatures.html#corpus
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
Multilingual corpus with coreference annotation of person entities (Spanish, Galician, Portuguese): http://gramatica.usc.es/~marcos/lrec.tar.bz2
Bosque 8.0 in EAGLES format: https://gramatica.usc.es/~marcos/resources/corpora_flpt.tgz
LeNER-Br (Brazilian legal documents): https://cic.unb.br/~teodecampos/lener-br/
Paramopama: a Brazilian-Portuguese corpus for named entity recognition

French

ESTER: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0241/
ESTER 2: http://catalogue.elra.info/en-us/repository/browse/ELRA-S0338/
ETAPE: http://catalogue.elra.info/en-us/repository/browse/ELRA-E0046/
Europeana Newspapers (Dutch, French, German): https://github.com/europeananewspapers/ner-corpora; http://lab.kb.nl/dataset/europeana-newspapers-ner#access
Quaero French Medical Corpus: https://quaerofrenchmed.limsi.fr/
Quaero Broadcast News Extended Named Entity corpus: http://catalog.elra.info/en-us/repository/browse/ELRA-S0349/
Quaero Old Press Extended Named Entity corpus: http://catalog.elra.info/en-us/repository/browse/ELRA-W0073/
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNER-fr-gold: https://arxiv.org/abs/2411.00030 https://huggingface.co/datasets/danrun/wikiner-fr-gold
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - densely annotated Wikipedia texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
CAP 2017 (Twitter data), Lopez et al., CAP 2017 challenge: Twitter Named Entity Recognition, 2017: http://cap2017.imag.fr/competition.html
HIPE-2022, named entity recognition and linking in multilingual historical documents: https://hipe-eval.github.io/HIPE-2022/ https://github.com/hipe-eval/hipe-2022-data

Italian

KIND: https://github.com/dhfbk/kind
EVALITA: http://www.evalita.it/2009/tasks/entity
MEANTIME corpus (parallel corpus: English, Spanish, Italian, Dutch): http://www.newsreader-project.eu/results/data/wikinews/
PANACEA (env): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-env-it
PANACEA (lab): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-lab-it
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - densely annotated Wikipedia texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation

Romanian

RONEC (Dumitrescu and Avram, Introducing RONEC - the Romanian Named Entity Corpus. LREC 2020). Paper: https://arxiv.org/pdf/1909.01247.pdf Data: https://github.com/dumitrescustefan/ronec
Romanian Journalistic Corpus (RoCo): http://metashare.elda.org/repository/browse/romanistic-corpus-roco/038baa80dc7311e5aa0b00237df3e35838381D7C022084057a018a018a018a018a0187a0187a0187a
Romanian Balanced Corpus (ROMBAC): http://metashare.elda.org/repository/browse/romanian-balanced-corpus-rombac/0a7dd85edc7311e5aa0b00237df3e35873a0d66244442d94fba48c294fba48c294fba48c2948c29

Greek

PANACEA (env): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-env-el
PANACEA (lab): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-lab-el

Hungarian

Hungarian Named Entity Corpus: http://rgai.inf.u-szeged.hu/index.php?lang=en&page=corpus_ne
hunNERwiki: http://hlt.sztaki.hu/resources/hunnerwiki.html
NYTK-NerKor: https://github.com/nytud/nytk-nerkor

Czech

Czech Named Entity Corpus: http://ufal.mff.cuni.cz/cnec
BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
CzEng 1.0 (parallel corpus: Czech-English): http://ufal.mff.cuni.cz/czeng/czeng10
PERO OCR NER (Czech historical OCR chronicles): https://github.com/roman-janik/poner https://dspace.vut.cz/items/6092e1b0-3d75-4451-8582-28573ac30404

Polish

Polish Sejm Corpus: http://clip.ipipan.waw.pl/psc
BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
Polish Coreference Corpus: http://zil.ipipan.waw.pl/polishcoreferencecorpus
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
Corpus of Economic News (CEN Corpus): http://www.nlp.pwr.wroc.pl/narzedzia-i-zasoby/zasoby/cen
KPWr (Korpus Języka Polskiego Politechniki Wrocławskiej / Polish Corpus of Wrocław University of Technology): http://plwordnet.pwr.wroc.pl/index.php?option=com_content&view=article&id=35&Itemid=181&lang=pl; http://plwordnet.pwr.wroc.pl/attachments/article/35/kpwr-1.1.7z (Broda et al., KPWr: Free Corpus of Polish, 2012)
NKJP: http://clip.ipipan.waw.pl/nationalcorpusofpolish?action=attachfile&do=view&target=nkjp-podkorpusmilionowy-2.2.tar.gz

Croatian

hr500k 1.0: http://hdl.handle.net/11356/1183
BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
ReLDI-NormTagNER-hr (Croatian tweets): http://hdl.handle.net/11356/1170

Slovak

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
Slovak Categorized News Corpus: https://nlp.web.tuke.sk/pages/categorizednews

Slovene

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
ssj500k: http://www.slovenscina.eu/tehnologije/ucni-korpus; http://eng.slovenscina.eu/tehnologije/ucni-korpus; https://www.clarin.si/repository/xmlui/handle/11356/1029; Note: for v2.2, see http://hdl.handle.net/11356/1210
Slovene News: http://zitnik.si/mediawiki/index.php?title=datasets#slovene_news; http://zitnik.si/mediawiki/images/7/7d/rtvslo_dec2011.tsv; http://zitnik.si/mediawiki/images/5/5e/rtvslo_dec2011_v2.tsv
Janes-Tag 2.0 (social media text): https://www.clarin.si/repository/xmlui/handle/11356/1123; See: Fišer et al., The Janes project: language resources and tools for Slovene user generated content, 2018.

Ukrainian

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
Ukrainian Brown NER corpus: https://github.com/lang-uk/ner-uk; http://lang.org.ua/en/corpora/

Serbian

SETimes.SR: http://hdl.handle.net/11356/1200
Serbian Named Entity Evaluation Corpus: http://www.korpus.matf.bg.ac.rs/srpneval/
ReLDI-NormTagNER-sr (Serbian tweets): http://hdl.handle.net/11356/1171

Bulgarian

BulTreeBank (BTB)

Icelandic

MIM-GOLD-NER (Ingólfsdóttir, Svanhvít Lilja, Sigurjón Þorsteinsson, and Hrafn Loftsson): http://www.malfong.is/index.php?pg=mim_gold_ner

Danish

DaNE: Hvingelby et al., DaNE: A Named Entity Resource for Danish
Danish PropBank (DPB): http://catalog.elra.info/en-us/repository/browse/ELRA-W0117/
Arboretum treebank: http://catalog.elra.info/en-us/repository/browse/ELRA-W0084/

Norwegian

Named Entity Recognition for Norwegian, Proceedings of the Nordic Conference on Computational Linguistics, Bjarte Johansen. 2019 (https://www.aclweb.org/anthology/W19-6123.pdf). Data: https://github.com/ljos/navnkjenner
Fredrik Jørgensen et al., NorNE: Annotating Named Entities for Norwegian, 2019 (https://arxiv.org/pdf/1911.12146.pdf). Data: https://github.com/ltgoslo/norne/; https://www.nb.no/sprakbanken/show?serial=oai%3Anb.no%3Asbr-49

Swedish

Stockholm Internet Corpus: https://www.ling.su.se/english/nlp/corpora-and-resources/sic
SUC 3.0: https://spraakbanken.gu.se/eng/resource/suc3
Swedish manually annotated NER: https://github.com/klintan/swedish-ner-corpus/
Medical Wikipedia data (Almgren et al., Named Entity Recognition in Swedish Health Records with Character-Based Deep Bidirectional LSTMs, 2016): https://github.com/olofmogren/biomedical-ner-data-swedish
HIPE-2022, named entity recognition and linking in multilingual historical documents: https://hipe-eval.github.io/HIPE-2022/ https://github.com/hipe-eval/hipe-2022-data

Finnish

A Finnish dataset for named entity recognition: https://github.com/mpsilfve/finer-data
Turku NER corpus: https://github.com/turkunlp/turku-ner-corpus
HIPE-2022, named entity recognition and linking in multilingual historical documents: https://hipe-eval.github.io/HIPE-2022/ https://github.com/hipe-eval/hipe-2022-data

Estonian

Estonian NER corpus: https://metashare.ut.ee/repository/browse/estonian-ner-corpus/88d030c0acde11e2a2a6e4005056b0024f1def472ed254e77a8952e1003d9d9f81e/

Latvian and Lithuanian

https://github.com/accurat-toolkit/tildener/tree/master/test (Pinnis, Latvian and Lithuanian Named Entity Recognition with TildeNER, LREC 2012)
Training data for LV Tagger: https://github.com/peterisp/lvtagger/tree/master/nertrainingdata

Turkish

Küçük and Can, A Tweet Dataset Annotated for Named Entity Recognition and Stance Detection, 2019: https://github.com/dkucuk/tweet-dataset-ner-sd
Küçük et al., Named Entity Recognition on Turkish Tweets: http://optima.jrc.it/resources/2014_jrc_twitter_tr_ner-dataset.zip
English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset (http://arxiv.org/abs/1702.02363): https://data.mendeley.com/datasets/cdcztymf4k/11
Choban et al., Named Entity Recognition over FbNER: A New Facebook Dataset in Turkish: https://ieeexplore.ieee.org/document/9598971 (data available on request)

Kazakh

KazNERD: https://arxiv.org/pdf/2111.13419.pdf, https://github.com/is2ai/kaznerd

Uyghur

Uyghur Named Entity Relation corpus: https://github.com/kaharjan/uynerel (Abiderexiti et al., Annotation schemes for constructing Uyghur named entity relation corpus. IALP 2016)

Armenian

pioNER (gold-standard and silver-standard datasets): https://github.com/ispras-texterra/pioner (Ghukasyan et al., pioNER: Datasets and Baselines for Armenian Named Entity Recognition, 2018)
ArmTDP-NER: https://github.com/myavrum/armtdp-ner

Coptic

Coptic Universal Dependencies Treebank: https://github.com/universaldependencies/ud_coptic-scriptorium/tree/dev (see also https://copticscriptorium.org/treebank.html). It includes 46,000 tokens with nested (non-)named entity annotations from Sahidic Coptic texts.

Amharic

SAY corpus (see "Named Entity Recognition for Amharic Using Deep Learning"): https://github.com/geezorg/data/tree/master/amharic/tagged/nmsu-say; http://data.geez.org/

Arabic

AQMAR Arabic Wikipedia Named Entity Corpus: http://www.cs.cmu.edu/~ark/arabicner/
NE3L named entities Arabic corpus (Arabic, Chinese, Russian): http://catalog.elra.info/en-us/repository/browse/ELRA-W0078/
REFLEX Entity Translation (parallel corpus: English, Arabic, Chinese): https://catalog.ldc.upenn.edu/LDC2009T11
ANERCorp: http://users.dsic.upv.es/~ybenajiba/downloads.html (see also http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html)
ACE 2003 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/LDC2004T09
ACE 2004 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/LDC2005T09
ACE 2005 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/LDC2006T06
ACE 2007 (Spanish and Arabic): https://catalog.ldc.upenn.edu/LDC2014T18
OntoNotes 5 (English, Arabic, Chinese): https://catalog.ldc.upenn.edu/LDC2013T19
DAWT dataset - densely annotated Wikipedia texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
Wojood - 2022 nested Arabic named entity corpus: https://dlnlp.ai/st/wojood/ https://aclanthology.org/2022.lrec-1.387.pdf https://codalab.lisn.upsaclay.fr/competitions/11740

Persian

ArmanPersoNERCorpus: http://islrn.org/resources/399-379-640-828-6/; https://github.com/haniehp/persianner

Sindhi

SiNER: https://aclanthology.org/2020.lrec-1.361/, https://github.com/aliwazir/siner-dataset

Urdu

IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5
UNER dataset (Khan et al., Named Entity Dataset for Urdu Named Entity Recognition Task, 2016). Available at http://www.iiu.edu.pk/?page_id=5181
MK-PUCIT: https://www.dropbox.com/sh/1ivw7ykm2tugg94/aab9t5wnn7fynespo7tjjw8la; See: Kanwal et al., Urdu Named Entity Recognition: Corpus Generation and Deep Learning Applications, 2019

Indic

Naamapadam: a named entity recognition (NER) dataset for 11 major Indian languages from two language families. https://research.ibm.com/publications/naamapadam-a-large-scale-named-named-Annotated-data-for-indic-languages https://ai4bharat.iitm.ac.in/naamapadam

Hindi

HiNER: https://github.com/cfiltnlp/hiner
Hindi health dataset: https://www.kaggle.com/aijain/hindi-health-dataset/home
FIRE 2015, ESM-IL (English, Hindi, Tamil, Malayalam): http://au-kbc.org/nlp/esm-fire2015/#traincorpus
FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/
IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5

Bengali

FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/
IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5
Bengali-NER: https://github.com/rifat1493/bengali-ner, https://ieeexplore.ieee.org/document/8944804
NER-Bangla: https://github.com/misabic/ner-bangla-dataset, https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs179349

Telugu

NER_Telugu: https://github.com/anikethjr/ner_telugu
IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5
Named entity annotated corpora for Telugu: http://www.tdil-dc.in/index.php?option=com_download&task=showresourcedetails&toolid=982&lang=en

Maithili

The first named entity recognizer in Maithili: resource creation and system development: https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs210051

Nepali

EverestNER: https://journals.flvc.org/flairs/article/view/130725, https://github.com/nowalab/everest-ner

Marathi

Named entity annotated corpora for Marathi: http://www.tdil-dc.in/index.php?option=com_download&task=showresourcedetails&toolid=979&lang=en
L3Cube-MahaNER: https://arxiv.org/abs/2204.06029 https://github.com/l3cube-pune/marathinlp

Punjabi

Named entity annotated corpora for Punjabi: http://www.tdil-dc.in/index.php?option=com_download&task=showresourcedetails&toolid=980&lang=en

Tamil

FIRE 2015, ESM-IL (English, Hindi, Tamil, Malayalam): http://au-kbc.org/nlp/esm-fire2015/#traincorpus
FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/

Malayalam

FIRE 2015, ESM-IL (English, Hindi, Tamil, Malayalam): http://au-kbc.org/nlp/esm-fire2015/#traincorpus
FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/

Oriya/Odia

IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5

Sinhala/Sinhalese

LORELEI (LDC2018E57)

Thai

thai-named-entity-recognition-data: https://github.com/pythainlp/thai-named-entity-recognition-data
Thai named entity corpora: http://pioneer.chula.ac.th/~awirote/resources/corpora--data.html; http://pioneer.chula.ac.th/~awirote/data-nutcha.zip; http://pioneer.chula.ac.th/~awirote/data-sasiwimon.zip; http://pioneer.chula.ac.th/~awirote/data-nattadaporn.zip
LST20: https://huggingface.co/datasets/lst20; https://arxiv.org/abs/2008.05055
Thai-NNER: https://github.com/vistec-ai/thai-nner, https://aclanthology.org/2022.findings-acl.116

Indonesian

IDENTIC: http://metashare.elda.org/repository/browse/identic/fed3fada7ef111e5aa3b001d8b71c66c98eee36eabd42f18ffd9a95da9104cc/
https://github.com/yohanesgultom/nlp-experiments/tree/master/data/ner
Indonesia-NER: Syaifudin & Nurwidyantoro https://ieeexplore.ieee.org/document/7828656 https://github.com/yusufsyaifudin/indonesia-ner
IDNer-news-2k: a dataset of Indonesian news for the named entity recognition task. Syaifudin & Nurwidyantoro https://dl.acm.org/doi/10.1145/3592854#fn8 https://github.com/khairunnisaor/idner-news-2k/
NERP and NER-Grit: IndoNLP/IndoNLU https://github.com/indonlp/indonlu/tree/master/dataset https://aclanthology.org/2020.aacl-main.85/

Vietnamese

VLSP 2016: http://vlsp.org.vn/resources-vlsp2016; https://github.com/undertheseanlp/ner
VLSP 2018: http://vlsp.org.vn/resources-vlsp2018; https://github.com/undertheseanlp/ner
PhoNER_COVID19: https://github.com/vinairesearch/phoner_covid19

Japanese

IREX: https://nlp.cs.nyu.edu/irex/package/
MET-2 (Japanese, Chinese): https://www-nlpir.nist.gov/related_projects/muc/
BCCWJ Basic NE corpus: https://sites.google.com/site/projectnextnlpne/en (Iwakura et al., Constructing a Japanese Basic Named Entity Corpus of Various Genres, NEWS 2016)
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
Data from: Mai et al., An Empirical Study on Fine-Grained Named Entity Recognition, COLING 2018 (English, Japanese): https://fgner.alt.ai/duc/ene/testsets/comp/
Wikipedia NER corpus: https://github.com/stockmarkteam/ner-wikipedia-dataset
WikiANN: https://elisa-ie.github.io/wikiann/
GSD: conversion of the UD GSD dataset to named entities by Megagon Labs: https://github.com/megagonlabs/ud_japanese-gsd
KWDLC: Kyoto University Web Document Leads Corpus: https://nlp.ist.i.kyoto-u.ac.jp/en/index.php?kwdlc https://github.com/ku-nlp/kwdlc https://nagisa.readthedocs.io.io/en/latetig

Korean

National Institute of Korean Language (ROK) - NER corpus: https://github.com/digitalprk/koreaner; https://ithub.korean.go.kr/user/total/referenceview.do?boardseq=5&articleseq=118&boardgb=t&sinsupd&boardtype=corpus
KMOU NER: https://github.com/kmounlp/ner
Korean Language Understanding Evaluation - KLUE NER: https://klue-benchmark.com/tasks/69/overview/description
https://github.com/songys/entity
HLCT 2016 corpus, updated: https://github.com/machinereading/koreannercorpus

Chinese

ACE 2003 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/LDC2004T09
ACE 2004 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/LDC2005T09
ACE 2005 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/LDC2006T06
OntoNotes 5 (English, Arabic, Chinese): https://catalog.ldc.upenn.edu/LDC2013T19
MET-2 (Japanese, Chinese): https://www-nlpir.nist.gov/related_projects/muc/
REFLEX Entity Translation (parallel corpus: English, Arabic, Chinese): https://catalog.ldc.upenn.edu/LDC2009T11
NE3L named entities Chinese corpus (Arabic, Chinese, Russian): http://catalogue.elra.info/en-us/repository/browse/ELRA-W0079/
Chinese original short message data collation I (named entities): http://catalog.elra.info/en-us/repository/browse/ELRA-W0045_04/
Chinese original short message data collation II (named entities): http://catalog.elra.info/en-us/repository/browse/ELRA-W0045_08/
ERE DEFT corpora (parallel corpus: English, Chinese): Mott et al., Parallel Chinese-English Entities, Relations and Events Corpora, 2016 (LDC2015E78, LDC2014E114)
Chinese Weibo: DEFT-style annotation for named and nominal mentions in Chinese social media (Weibo): https://github.com/hltcoe/golden-horse
Chinese EduNER: a 2023 dataset for the education domain: https://link.springer.com/article/10.1007/s00521-023-08635-5 https://github.com/anonymous-xl/eduner
Chinese Aerospace NER: https://www.nature.com/articles/s41598-023-50705-0 https://github.com/Coder-XIAOKAI/Aerospace_NERdatasets
SciCN: A Chinese Dataset and Benchmark for Scientific Information Extraction https://file.techscience.com/files/cmc/2024/TSP_CMC-78-3/TSP_CMC_35594/TSP_CMC_35594.pdf https://github.com/yangjingla/SciCN
ENP NER: Historical Chinese https://aclanthology.org/2024.lrec-main.35.pdf https://gitlab.com/enpchina/ENP-NER

Tagalog

TLUnified: https://arxiv.org/abs/2311.07161 https://huggingface.co/datasets/ljvmiranda921/tlunified-ner

Russian

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
NE3L named entities Russian corpus (Arabic, Chinese, Russian): https://catalog.elra.info/en-us/repository/browse/ELRA-W0080/
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
factRuEval-2016: https://github.com/dialogue-evaluation/factRuEval-2016
RuREBus 2020 (Russian Relation Extraction for Business) corpus https://github.com/dialogue-evaluation/RuREBus

Yoruba

GV-Yorùbá-NER. Data: https://github.com/ajesujoba/YorubaTwi-Embedding/tree/master/Yoruba/Yor%C3%B9b%C3%A1-NER ; Data statement: https://drive.google.com/file/d/177xu-O2FTJ7VJQ-0ohCWjVd1qu61Tvml/view ; Paper: Jesujoba O Alabi, Kwabena Amponsah-Kaakyire, David I Adelani, and Cristina España-Bonet. Massive vs. curated word embeddings for low-resourced languages. The case of Yorùbá and Twi. In LREC, 2020 (https://arxiv.org/abs/1912.02481)

Swahili

Helsinki Corpus of Swahili 2.0 (HCS 2.0) Annotated Version: http://metashare.csc.fi/repository/browse/helsinki-corpus-of-swahili-20-hcs-20-annotated-version/232c1910b9eb11e5915e005056be118e59fb2e920f1f4c0cafc94915fc6f5cac/ See: Shah et al., 2010. SYNERGY: A Named Entity Recognition System for Resource-scarce Languages such as Swahili using Online Machine Translation

Igbo

IgboNER: https://aclanthology.org/2022.lrec-1.547/ https://github.com/Chiamakac/IgboNER-Models later updated in https://openreview.net/pdf?id=tHUS9-vmUfC from https://sites.google.com/view/africanlp2023/home

isiNdebele

NCHLT isiNdebele Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/306

Xhosa

NCHLT isiXhosa Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/312

Zulu

NCHLT isiZulu Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/319

Sepedi

NCHLT Sepedi Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/328

Sesotho

NCHLT Sesotho Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/334

Setswana

NCHLT Setswana Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/341

Siswati

NCHLT Siswati Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/346

Venda

NCHLT Tshivenda Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/355
MPHAYANER: Named Entity Recognition for Tshivenḓa: https://openreview.net/pdf?id=0nneuL3bSLt https://github.com/rendanim/MphayaNER from https://sites.google.com/view/africanlp2023/home

Xitsonga

NCHLT Xitsonga Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/362

Latin

Herodotos Project: https://github.com/alexerdmann/Herodotos_Project_Annotation

A long list can be found here: http://damien.nouvels.net/resourcesen/corpora.html

References

[Alvarado et al., 2015] Alvarado, Julio Cesar Salinas, Karin Verspoor, and Timothy Baldwin. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015, pp. 84-90. 2015. Accessed: August 2018.

[Balasuriya et al., 2009] Balasuriya, Dominic, Nicky Ringland, Joel Nothman, Tara Murphy, and James R. Curran. Named entity recognition in wikipedia. In Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, pp. 10-18. Association for Computational Linguistics, 2009

[Bos et al., 2017] Bos, Johan, Valerio Basile, Kilian Evang, Noortje J. Venhuizen, and Johannes Bjerva. The Groningen meaning bank. In Handbook of linguistic annotation, pp. 463-496. Springer, Dordrecht, 2017.

[Derczynski et al., 2016] Derczynski, Leon, Kalina Bontcheva, and Ian Roberts. Broad twitter corpus: A diverse named entity recognition resource. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1169-1179. 2016. Available at: https://github.com/GateNLP/broad_twitter_corpus Accessed: August 2018.

[Derczynski et al., 2017] Leon Derczynski, Eric Nichols, Marieke van Erp, Nut Limsopatham (2017) Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition, in Proceedings of the 3rd Workshop on Noisy, User-generated Text. Available at: https://noisy-text.github.io/2017/emerging-rare-entities.html

[DSTL, 2017] Defence Science and Technology Laboratory. 2017. Relationship and Entity Extraction Evaluation Dataset. https://github.com/dstl/re3d. Accessed: January 2018.

[Grishman and Sundheim, 1996] Ralph Grishman and Beth Sundheim. 1996. Message Understanding Conference-6: A brief history. In COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics.

[Karimi et al., 2015] Sarvnaz Karimi, Alejandro Metke-Jimenez, Madonna Kemp, and Chen Wang. 2015. Cadec: A corpus of adverse drug event annotations. Journal of biomedical informatics, 55:73-81. Available at https://data.csiro.au Accessed: November 2017.

[Lim et al., 2017] Lim, Swee Kiat, Aldrian Obaja Muis, Wei Lu, and Chen Hui Ong. MalwareTextDB: A database for annotated malware articles. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1557-1567. 2017.

[Liu et al., 2013a] Jingjing Liu, Panupong Pasupat, Scott Cyphers, and Jim Glass. 2013. Asgard: A portable architecture for multilingual dialogue systems. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8386-8390. IEEE. Available at https://groups.csail.mit.edu/sls/downloads/restaurant/ Accessed: January 2018

[Liu et al., 2013b] Jingjing Liu, Panupong Pasupat, Yining Wang, Scott Cyphers, and Jim Glass. 2013. Query understanding enhanced by hierarchical parsing structures. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pages 72-77. IEEE. Available at https://groups.csail.mit.edu/sls/downloads/movie/ We used the trivia10k13 portion. Accessed: January 2018

[NIST, 1999 IE-ER] NIST. 1999. Information Extraction - Entity Recognition Evaluation. http://www.nist.gov/speech/tests/ieer/er_99/er_99.htm. The newswire development test data only (included in the NLTK package).

[Ohta et al., 2012] Tomoko Ohta, Sampo Pyysalo, Jun'ichi Tsujii and Sophia Ananiadou. 2012. Open-domain Anatomical Entity Mention Detection. In Proceedings of ACL 2012 Workshop on Detecting Structure in Scholarly Discourse (DSSD), pp. 27-36. Available at: http://www.nactem.ac.uk/anatomy/ and https://github.com/openbiocorpora/anem Accessed: November 2017.

[Ritter et al., 2011] Alan Ritter, Sam Clark, Mausam, and Oren Etzioni. 2011. Named entity recognition in tweets: An experimental study. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1524-1534, Edinburgh, Scotland, UK., July. Association for Computational Linguistics. Accessed January 2018.

[Sang and Meulder, 2003] Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003.

[Stubbs et al., 2015] Amber Stubbs and Ozlem Uzuner. 2015. Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. Journal of biomedical informatics, 58:S20-S29. Available at https://www.i2b2.org/NLP/DataSets/ Accessed: February 2018.

[Uzuner et al., 2007] Ozlem Uzuner, Yuan Luo, and Peter Szolovits. 2007. Evaluating the state-of-the-art in automatic de-identification. Journal of the American Medical Informatics Association, 14(5):550-563. Available at https://www.i2b2.org/NLP/DataSets/ Accessed: February 2018.

[Weischedel and Brunstein, 2005] Ralph Weischedel and Ada Brunstein. 2005. BBN pronoun coreference and entity type corpus. Linguistic Data Consortium, Philadelphia.

[Weischedel et al., 2013] Weischedel, Ralph, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue et al. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA (2013).

[Zeldes, 2017] Amir Zeldes. 2017. The GUM corpus: creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3):581-612. Available at https://github.com/amir-zeldes/gum/tree/master/coref/tsv/ Accessed: November 2017.

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2025-04-17
Size: 2.39 MB
Source: GitHub