entity recognition datasets

Datasets for Entity Recognition

This repository contains datasets from several domains annotated with a variety of entity types, useful for entity recognition and named entity recognition (NER) tasks.

Note: I am no longer actively adding datasets to this list, and many more NER datasets have likely appeared since 2020. However, I am happy to add more datasets via issues or pull requests.

Datasets for NER in English

The following table lists datasets for English-language entity recognition (for NER datasets in other languages, see below). The data directory contains information on where to obtain the datasets that could not be shared due to licensing restrictions, as well as code to convert them (if necessary) to the CoNLL 2003 format. Links to NER corpora in other languages are also listed below.

Dataset	Domain	License	Reference	Availability
CoNLL 2003	News	DUA	Sang and Meulder, 2003	Easy to find
NIST-IEER	News	None	NIST 1999 IE-ER	NLTK data
MUC-6	News	LDC	Grishman and Sundheim, 1996	LDC 2003T13
OntoNotes 5	Various	LDC	Weischedel et al., 2013	LDC 2013T19
BBN	Various	LDC	Weischedel and Brunstein, 2005	LDC 2005T33
GMB-1.0.0	Various	None	Bos et al., 2017	http://gmb.let.rug.nl/data.php
GUM-3.1.0	Wiki	Several (*2)	Zeldes, 2016	✔ Included here
wikigold	Wikipedia	CC-BY 4.0	Balasuriya et al., 2009	✔ Included here
Ritter	Twitter	None	Ritter et al., 2011	No split, train/test/dev split
BTC	Twitter	CC-BY 4.0	Derczynski et al., 2016	✔ Included here
WNUT17	Social media	CC-BY 4.0	Derczynski et al., 2017	✔ Included here
i2b2-2006	Medical	DUA	Uzuner et al., 2007	http://www.i2b2.org
i2b2-2014	Medical	DUA	Stubbs et al., 2015	http://www.i2b2.org
CADEC	Medical	CSIRO	Karimi et al., 2015	http://data.csiro.au/
AnEM	Anatomical	CC-BY-SA 3.0	Ohta et al., 2012	✔ Included here
MITRestaurant	Queries	None	Liu et al., 2013a	http://groups.csail.mit.edu/sls/
MITMovie	Queries	None	Liu et al., 2013b	http://groups.csail.mit.edu/sls/
MalwareTextDB	Malware	None	Lim et al., 2017	http://www.statnlp.org/
re3d	Defense	Several (*1)	DSTL, 2017	✔ Included here
SEC-filings	Finance	CC-BY 3.0	Alvarado et al., 2015	✔ Included here
Assembly	Robotics	X	Costa et al., 2017	X
WikiNEuRal	Wikipedia	CC BY-SA-NC 4.0	Tedeschi et al., 2021	https://github.com/babelscape/wikineural
MultiNERD	Wikipedia	CC BY-SA-NC 4.0	Tedeschi et al., 2022	https://github.com/babelscape/multinerd
HIPE-2022	Historical	CC BY-SA-NC 4.0	Ehrmann et al., 2022	https://github.com/hipe-eval/hipe-2022-data
music-ner	Music	MIT	Epure and Hennequin, 2023	https://github.com/deezer/music-ner-eacl2023
WIESP2022-NER	Astrophysics	CC BY-SA-NC 4.0	Grezes et al., 2022	https://huggingface.co/datasets/adsabs/wiesp2022-ner
NNE	News	CC 4.0 / LDC	Ringland et al., 2019	https://github.com/nickyringland/nested_named_entities
Worldwide	News	CC BY-SA-NC 4.0	Shan et al., 2023	https://github.com/stanfordnlp/en-worldwide-newswire https://arxiv.org/abs/2404.13465
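
The CoNLL 2003 format referred to above is a plain-text, column-based layout: one token per line with whitespace-separated columns, the NER tag in the last column, and a blank line between sentences. As a minimal sketch (not the repository's own conversion code), the following Python shows how such data could be read and written back out. The function names `read_conll` and `write_conll`, the sample text, and the BIO-style tags in the sample are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

# One sentence is a list of (token, NER tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Parse CoNLL 2003-style lines into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers both close the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # The token is the first column; the NER tag is the last column.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def write_conll(sentences: Iterable[Sentence]) -> str:
    """Serialize sentences as two-column 'token tag' lines, one blank line between sentences."""
    blocks = ["\n".join(f"{tok} {tag}" for tok, tag in sent) for sent in sentences]
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample in a four-column layout (token, POS, chunk, NER tag).
SAMPLE = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""

parsed = read_conll(SAMPLE.splitlines())
print(write_conll(parsed))
```

Reading only the first and last columns keeps the sketch usable for files that carry extra columns (such as POS or chunk tags) as well as for two-column files. Note that tag schemes differ between distributions of the data (for example IOB1 versus BIO), so any conversion step should be checked against the specific files at hand.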

License

License notes:

(1) re3d ("Relationship and Entity Extraction Evaluation Dataset") contains several datasets, with different licenses. These are:

CC-BY-SA 3.0 (Wikipedia dataset)
CC BY-NC 3.0 (BBC_ONLINE dataset)
CC BY 3.0 AU (Australia_Department_of_foreign_affairs dataset)
Public Domain (US_State_Department dataset, Centcom dataset)
UK Open Government Licence v3.0 (UK_Government dataset)
Delegation_of_the_european_union_to_syria: see https://eeas.europa.eu/delegations/syria/8157/legal-notice_en

(2) GUM 3.1.0 consists of three datasets, with licenses CC-BY 3.0, CC-BY-SA 3.0 and CC-BY-NC-SA 3.0. The annotations are licensed under CC-BY 4.0.

More detailed license information for each dataset can be found in the corresponding subdirectory.

Later...

Tabassum et al., Code and Named Entity Recognition in StackOverflow: https://cocoxu.github.io/publications/acl2020_stackovlow_ner.pdf
LitBank: https://github.com/dbamman/litbank (Bamman, Popat and Shen, An Annotated Dataset of Literary Entities, 2019)
NNE: A Dataset for Nested Named Entity Recognition in English Newswire, 2019: https://github.com/nickyringland/nested_named_entities
Mars Target Encyclopedia - LPSC Labeled Abstracts: https://zenodo.org/record/1048419
Best Buy e-commerce NER dataset: https://www.kaggle.com/dataturks/best-buy-ecommerce-ner-dataset/home
Resume Entities for NER: https://www.kaggle.com/dataturks/resume-entities-for-ner/home
Few-NERD: A Few-shot Named Entity Recognition Dataset: https://aclanthology.org/2021.acl-long.248/

Datasets for NER in other languages

Lexical Named Entity Resources

HeiNER: http://heiner.cl.uni-heidelberg.de/index.shtml
NECKAr: https://event.ifi.uni-heidelberg.de/?page_id=532#wikidata_ne_dataset

Code-switching

English-Spanish tweets (CALCS 2018): https://code-switching.github.io/2018/; https://code-switching.github.io/2018/files/spa-eng/release.zip; http://www.aclweb.org/anthology/w18-3219
Modern Standard Arabic-Egyptian tweets (CALCS 2018): https://code-switching.github.io/2018/; https://code-switching.github.io/2018/files/msa-egy/arabictweetstokenAsigner.zip; http://www.aclweb.org/anthology/w18-3219
Hindi-English social media text: https://github.com/silentflame/named-entity-cognition; http://aclweb.org/anthology/w18-2405
EMNLP 2014 Shared Task - code-switched tweets (Nepali-English, Spanish-English, Mandarin-English, Arabic-Arabic dialects): http://emnlp2014.org/workshops/codeswitch/call.html

German

CoNLL 2003 (English, German): https://www.clips.uantwerpen.be/conll2003/ner/
GermEval 2014: https://sites.google.com/site/germeval2014ner/data
Tübingen Treebank of Written German (TüBa-D/Z): http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html
Europeana Newspapers (Dutch, French, German): https://github.com/EuropeanaNewspapers/ner-corpora; http://lab.kb.nl/dataset/europeana-newspapers-ner#access
German Europarl transcripts (subset): https://nlpado.de/~sebastian/software/ner_german.shtml
Named Entity Model for German, Politics (NEMGP): https://www.thomas-zastrow.de/nlp/
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
DFKI SmartData Corpus (geo-entities): https://dfki-lt-re-group.bitbucket.io/smartdata-corpus/ (Martin Schiersch, Veselina Mironova, Maximilian Schmitt, Philippe Thomas, Aleksandra Gabryszak, Leonhard Hennig, A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events)
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - Densely Annotated Wikipedia Texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
Elena Leitner, Georg Rehm, Julián Moreno-Schneider, A Dataset of German Legal Documents for Named Entity Recognition, LREC 2020: http://georg-re.hm/pdf/lrec-2020-leitner-et-al-preprint.pdf; Data: https://github.com/elenanereiss/legal-entity-recognition
HIPE-2022, named entity recognition and entity linking in multilingual historical documents: https://hipe-eval.github.io/hipe-2022/ https://github.com/hipe-eval/hipe-2022-data

Dutch

CoNLL 2002 (Spanish, Dutch): https://www.clips.uantwerpen.be/conll2002/ner/
Europeana Newspapers (Dutch, French, German): https://github.com/EuropeanaNewspapers/ner-corpora; http://lab.kb.nl/dataset/europeana-newspapers-ner#access
MEANTIME Corpus (parallel corpus: English, Spanish, Italian, Dutch): http://www.newsreader-project.eu/results/data/wikinews/
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
Dutch parliamentary documents 2015-2016, from 1848.nl (Jonkers, Named Entity Recognition on Dutch Parliamentary Documents using Frog, thesis, University of Amsterdam, 2016): https://github.com/poezedoez/ner/blob/master/code/datab.com
SoNaR 1 - Desmet and Hoste, Fine-grained Dutch Named Entity Recognition, 2014 (class hierarchy)
SoNaR book corpus and Dutch Gutenberg corpus: http://blog.namescape.nl/?page_id=85; http://portal.clarin.nl/node/1940

Afrikaans

NCHLT Afrikaans Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/299

Spanish

CoNLL 2002 (Spanish, Dutch): https://www.clips.uantwerpen.be/conll2002/ner/
AnCora (Spanish, Catalan): http://clic.ub.edu/corpus/en
DEFT Spanish Treebank (LDC2018T01): https://catalog.ldc.upenn.edu/ldc2018t01
PANACEA (lab): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-lab-es
PANACEA (env): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-env-es
MEANTIME Corpus (parallel corpus: English, Spanish, Italian, Dutch): http://www.newsreader-project.eu/results/data/wikinews/
ACE 2007 (Spanish and Arabic): https://catalog.ldc.upenn.edu/ldc2014t18
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
http://www.grupolys.org/~marcos/pub/lrec16.tar.bz2 (used in "Incorporating lexico-semantic heuristics into coreference resolution for named entity recognition at document level")
Multilingual corpora with coreferential annotation of person entities (Spanish, Galician, Portuguese): http://gramatatica.usc.es/~marcos/lrec.tar.bz2
DrugSemantics Gold Standard (Moreno et al., DrugSemantics: A Corpus for Named Entity Recognition in Spanish Summaries of Product Characteristics, 2017): https://data.mendeley.com/datasets/fwc7jrc5jr/1
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - Densely Annotated Wikipedia Texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
CANTEMIST (Cancer Text Mining Shared Task - tumor named entity recognition) - named entity recognition of a critical type of concept related to cancer, namely tumor morphology, in Spanish medical texts: https://temu.bsc.es/cantemist/

Catalan

AnCora (Spanish, Catalan): http://clic.ub.edu/corpus/en

Galician

Galician NER corpus: https://gramatatica.usc.es/~marcos/resources/corpus_gal_nec.txt.gz
Multilingual corpora with coreferential annotation of person entities (Spanish, Galician, Portuguese): http://gramatatica.usc.es/~marcos/lrec.tar.bz2

Basque

Basque Named Entities Corpus (EIEC): http://ixa.eus/node/4486?Language=en
Basque Disambiguated Named Entities Corpus (EDIEC): http://ixa.si.ehu.es/node/4485?language=en
Egunkaria 2000 corpus (383 newswire texts), mentioned in http://qtleap.eu/wp-content/uploads/2014/04/qtleap-2013-d5.1.pdf

Portuguese

HAREM: https://www.linguateca.pt/aval_conjunta/harem/harem_ing.html
CINTIL corpus: http://cintil.ul.pt/cintilfeatures.html#corpus
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
Multilingual corpora with coreferential annotation of person entities (Spanish, Galician, Portuguese): http://gramatatica.usc.es/~marcos/lrec.tar.bz2
Bosque 8.0, EAGLES format: https://gramatatica.usc.es/~marcos/resources/corpora_flpt.tgz
LeNER-Br (Brazilian legal documents): https://cic.unb.br/~teodecampos/lener-r/
Paramopama: a Brazilian-Portuguese corpus for named entity recognition

French

ESTER: http://catalogue.elra.info/en-us/repository/browse/elra-s0241/
ESTER 2: http://catalogue.elra.info/en-us/repository/browse/elra-s0338/
ETAPE: http://catalogue.elra.info/en-us/repository/browse/elra-e0046/
Europeana Newspapers (Dutch, French, German): https://github.com/EuropeanaNewspapers/ner-corpora; http://lab.kb.nl/dataset/europeana-newspapers-ner#access
Quaero French Medical Corpus: https://quaerofrenchmed.limsi.fr/
Quaero Broadcast News Extended Named Entity Corpus: http://catalog.elra.info/en-us/repository/browse/elra-s0349/
Quaero Old Press Extended Named Entity Corpus: http://catalog.elra.info/en-us/repository/browse/elra-w0073/
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNER-fr-gold: https://arxiv.org/abs/2411.00030 https://huggingface.co/datasets/danrun/wikiner-fr-gold
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - Densely Annotated Wikipedia Texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
CAp 2017 (Twitter data), Lopez et al., CAp 2017 Challenge: Twitter Named Entity Recognition, 2017: http://cap2017.imag.fr/competition.html
HIPE-2022, named entity recognition and entity linking in multilingual historical documents: https://hipe-eval.github.io/hipe-2022/ https://github.com/hipe-eval/hipe-2022-data

Italian

KIND: https://github.com/dhfbk/kind
EVALITA: http://www.evalita.it/2009/tasks/entity
MEANTIME Corpus (parallel corpus: English, Spanish, Italian, Dutch): http://www.newsreader-project.eu/results/data/wikinews/
PANACEA (env): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-env-it
PANACEA (lab): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-lab-it
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
DAWT dataset - Densely Annotated Wikipedia Texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation

Romanian

RONEC (Dumitrescu and Avram, Introducing RONEC - the Romanian Named Entity Corpus. LREC 2020). Paper: https://arxiv.org/pdf/1909.01247.pdf Data: https://github.com/dumitrescustefan/ronec
Romanian Journalistic Corpus (Roco): http://metashare.elda.org/repository/browse/romanian-journalistic-corpus-roco/038baa80dc7311e5aa0b00847df3e3583781d7c0b0084405df3e3e3583781d7c7c0b00844012
Romanian Balanced Corpus (ROMBAC): http://metashare.elda.org/repository/browse/romanian-balanced-corpus-rombac/0a7dd85edc7311e5aa0b00237df3e35873a0d662435d42dd94fba48c29dc0065/

Greek

PANACEA (env): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-env-el
PANACEA (lab): http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/dependency-lab-el

Hungarian

Hungarian Named Entity Corpora: http://rgai.inf.u-szeged.hu/index.php?lang=en&page=corpus_ne
hunNERwiki: http://hlt.sztaki.hu/resources/hunnerwiki.html
NYTK-NerKor: https://github.com/nytud/nytk-nerkor

Czech

Czech Named Entity Corpus: http://ufal.mff.cuni.cz/cnec
BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
CzEng 1.0 (parallel corpus: Czech-English): http://ufal.mff.cuni.cz/czeng/czeng10
PERO OCR NER (historical Czech OCR chronicles): https://github.com/roman-janik/poner https://dspace.vut.cz/items/6092e1b0-3d75-4451-8582-28573AC30404

Polish

The Polish Sejm Corpus: http://clip.ipipan.waw.pl/psc
BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
Polish Coreference Corpus: http://zil.ipipan.waw.pl/polishcoreferenceCorpus
WikiNER: https://figshare.com/articles/learning_multilingual_named_entity_recognition_from_wikipedia/5462500
WikiNEuRal: https://github.com/babelscape/wikineural
MultiNERD: https://github.com/babelscape/multinerd
Corpus of Economic News (CEN Corpus): http://www.nlp.pwr.wroc.pl/narzedzia-i-zasoby/zasoby/cen
KPWr (Korpus Języka Polskiego Politechniki Wrocławskiej/Polish Corpus of Wrocław University of Technology): http://plwordnet.pwr.wroc.pl/index.php?option=com_content&view=article&id=35&ipid=1818181 http://plwordnet.pwr.wroc.pl/attachments/article/35/kpwr-1.1.7z (Broda et al., KPWr: Towards a Free Corpus of Polish, 2012)
NKJP: http://clip.ipipan.waw.pl/nationalcorpusofpolish?action=attachfile&do=view&target=nkjp-podkorpusmilionowy-1.2.tar.gz

Croatian

hr500k 1.0: http://hdl.handle.net/11356/1183
BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
ReLDI-NormTagNER-hr (Croatian tweets): http://hdl.handle.net/11356/1170

Slovak

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
Slovak Categorized News Corpus: https://nlp.web.tone.sk/pages/categorizedNews

Slovene

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
ssj500k: http://www.slovenscina.eu/tehnologije/ucni-korpus; http://eng.slovenscina.eu/tehnologije/ucni-korpus; https://www.clarin.si/repository/xmlui/handle/11356/1029; Note: for v2.2 see: http://hdl.handle.net/11356/1210
Slovene News: http://zitnik.si/mediawiki/index.php?title=datasets#slovene_news; http://zitnik.si/mediawiki/images/7/7d/rtvslo_dec2011.tsv; http://zitnik.si/mediawiki/images/5/5e/rtvslo_dec2011_v2.tsv
Janes-Tag 2.0 (social media text): https://www.clarin.si/repository/xmlui/handle/11356/1123; See also: Fišer et al., The Janes Project: Language Resources and Tools for Slovene User Generated Content, 2018.

Ukrainian

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
Ukrainian Brown NER Corpus: https://github.com/lang-uk/ner-uk; http://lang.org.ua/en/corpora/

Serbian

SETimes.SR: http://hdl.handle.net/11356/1200
Named Entity Evaluation Corpus for Serbian: http://www.korpus.matf.bg.ac.rs/srpneval/
ReLDI-NormTagNER-sr (Serbian tweets): http://hdl.handle.net/11356/1171

Bulgarian

BulTreeBank (BTB)

Icelandic

MIM-GOLD-NER (Ingólfsdóttir, Svanhvít Lilja, Sigurjón Þorsteinsson, and Hrafn Loftsson. "Towards High Accuracy Named Entity Recognition for Icelandic." Proceedings of the 22nd Nordic Conference on Computational Linguistics. 2019): http://www.malfong.is/index.php?pg=mim_gold_ner

Danish

DaNE: Hvingelby et al., DaNE: A Named Entity Resource for Danish (http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.565.pdf), LREC 2020: https://github.
Danish PropBank (DPB): http://catalog.elra.info/en-us/repository/browse/elra-w0117/
Arboretum Treebank: http://catalog.elra.info/en-us/repository/browse/elra-w0084/

Norwegian

Bjarte Johansen, Named-Entity Recognition for Norwegian, Proceedings of the 22nd Nordic Conference on Computational Linguistics. 2019 (https://www.aclweb.org/anthology/w19-6123.pdf) Data: https://github.com/ljos/navnkjenner
Fredrik Jørgensen et al., NorNE: Annotating Named Entities for Norwegian, 2019 (https://arxiv.org/pdf/1911.12146.pdf). Data: https://github.com/ltgoslo/norne/; https://www.nb.no/sprakbanken/show?serial=oai%3anb.no%3asbr-49

Swedish

Stockholm Internet Corpus: https://www.ling.su.se/english/nlp/corpora-and-sources/sic
SUC 3.0: https://spraakbanken.gu.se/eng/resource/suc3
Swedish manually annotated NER: https://github.com/klintan/swedish-ner-corpus/
Medical Wikipedia data (Almgren et al., Named Entity Recognition in Swedish Health Records with Character-Based Deep Bidirectional LSTMs, 2016): https://github.com/olofmogren/biomedical-ner-data-swedish
HIPE-2022, named entity recognition and entity linking in multilingual historical documents: https://hipe-eval.github.io/hipe-2022/ https://github.com/hipe-eval/hipe-2022-data

Finnish

Data sets for Finnish Named Entity Recognition: https://github.com/mpsilfve/finer-data
Turku NER corpus: https://github.com/turkunlp/turku-ner-corpus
HIPE-2022, named entity recognition and entity linking in multilingual historical documents: https://hipe-eval.github.io/hipe-2022/ https://github.com/hipe-eval/hipe-2022-data

Estonian

Estonian Ner Corpus: https://metashare.ut.ee/repository/browse/estonian-ner-corpus/88d030c0acde11e2a6e4005056b40024f1def472ed254e77a8952e1003d9f82ed254e7a8952e1

Latvian and Lithuanian

https://github.com/accurat-toolkit/tildener/tree/master/test (Pinnis, Latvian and Lithuanian Named Entity Recognition with TildeNER, LREC 2012)
Training data for LV Tagger: https://github.com/peterisp/lvtagger/tree/master/nertraindata

Turkish

Küçük and Can, A Tweet Dataset Annotated for Named Entity Recognition and Stance Detection, 2019: https://github.com/dkucuk/tweet-dataset-ner-sd
Küçük et al., Named Entity Recognition on Turkish Tweets: http://optima.jrc.it/resources/2014_jrc_twitter_tr_ner-dataset.zip
English/Turkish Wikipedia named entity recognition and text categorization dataset (http://arxiv.org/abs/1702.02363): https://data.mendeley.com/datasets/cdcztymf4k/1
Çoban et al., Named Entity Recognition over FbNER: A New Facebook Dataset in Turkish: https://ieeexplore.ieee.org/document/9598971 Data available for research purposes upon request

Kazakh

KazNERD: https://arxiv.org/pdf/2111.13419.pdf, https://github.com/is2ai/kaznerd

Uyghur

Uyghur Named Entity Relation Corpus: https://github.com/kaharjan/uynerel (Abiderexiti et al., Annotation Schemes for Constructing Uyghur Named Entity Relation Corpus. IALP 2016)

Armenian

pioNER (gold-standard and silver-standard datasets): https://github.com/ispras-texterra/pioner (Ghukasyan et al., pioNER: Datasets and Baselines for Armenian Named Entity Recognition, 2018)
ArmTDP-NER: https://github.com/myavrum/armtdp-ner

Coptic

Coptic Universal Dependencies Treebank: https://github.com/universaldependencies/ud_coptic-scriptorium/tree/dev (see also https://copticscriptorium.org/treebank.html). It contains 46,000 tokens with nested named and non-named entities, wikified, from Sahidic Coptic texts.

Amharic

SAY corpus (see "Named Entity Recognition for Amharic Using Deep Learning"): https://github.com/geezorg/data/tree/master/amharic/tagged/nmsu-say; http://data.geez.org/

Arabic

AQMAR Arabic Wikipedia Named Entity Corpus: http://www.cs.cmu.edu/~ark/arabicner/
NE3L named entities Arabic corpus (Arabic, Chinese, Russian): http://catalog.elra.info/en-us/repository/browse/elra-w0078/
REFLEX Entity Translation (parallel corpus: English, Arabic, Chinese): https://catalog.ldc.upenn.edu/ldc2009t11
ANERCorp: http://users.dsic.upv.es/~ybenjaban/downloads.html (see also: http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html)
ACE 2003 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/ldc2004t09
ACE 2004 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/ldc2005t09
ACE 2005 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/ldc2006t06
ACE 2007 (Spanish and Arabic): https://catalog.ldc.upenn.edu/ldc2014t18
OntoNotes 5 (English, Arabic, Chinese): https://catalog.ldc.upenn.edu/ldc2013t19
DAWT dataset - Densely Annotated Wikipedia Texts across multiple languages (English, Spanish, French, Italian, German, Arabic): https://github.com/klout/opendata/tree/master/wiki_annotation
Wojood - 2022 Nested Arabic Named Entity Corpus: https://dlnlp.ai/st/wojood/ https://aclanthology.org/2022.lrec-1.387.pdf https://codalab.lisn.upsaclay.fr/competitions/11740

Persian

ArmanPersoNERCorpus: http://islrn.org/resources/399-379-640-828-6/; https://github.com/haniehp/persianner

Sindhi

SiNER: https://aclanthology.org/2020.lrec-1.361/, https://github.com/aliwazir/siner-dataset

Urdu

IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5
UNER dataset (Khan et al., Named Entity Dataset for Urdu Named Entity Recognition Task, 2016). Available at http://www.iiu.edu.pk/?page_id=5181
MK-PUCIT: https://www.dropbox.com/sh/1ivw7ykm2tugg94/aab9t5wnn7fypo7tjjw8la; See: Kanwal et al., Urdu Named Entity Recognition: Corpus Generation and Deep Learning Applications, 2019

Indic

Naamapadam: named entity recognition (NER) dataset for the 11 major Indian languages from two language families. https://research.ibm.com/publications/naamapadam-a-large-scale-named-entity-annotated-data-for-indic-languages https://ai4bharat.iitm.ac.in/naamapadam

Hindi

HiNER: https://github.com/cfiltnlp/hiner
Hindi Health Dataset: https://www.kaggle.com/aijain/hindi-health-dataset/home
FIRE 2015, ESM-IL (English, Hindi, Tamil, Malayalam): http://au-kbc.org/nlp/esm-fire2015/#traincorpus
FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/
IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5

Bengali

FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/
IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5
Bengali-NER: https://github.com/rifat1493/bengali-ner, https://ieeexplore.ieee.org/document/8944804
NER-Bangla: https://github.com/misabic/ner-bangla-dataset, https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs179349

Telugu

NER_Telugu: https://github.com/anikethjr/ner_telugu
IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5
Named Entity Annotated Corpora for Telugu: http://www.tdil-dc.in/index.php?option=com_download&task=showResourceDetails&toolid=982&lang=en

Maithili

The First Named Entity Recognizer in Maithili: Resource Creation and System Development: https://content.iospress.com/articles/journal-of-intelligent-and-fuzzy-systems/ifs210051

Nepali

EverestNER: https://journals.flvc.org/flairs/article/view/130725, https://github.com/nowalab/everest-ner

Marathi

Named Entity Annotated Corpora for Marathi: http://www.tdil-dc.in/index.php?option=com_download&task=showResourceDetails&toolid=979&lang=en
L3Cube MahaNER: https://arxiv.org/abs/2204.06029 https://github.com/l3cube-pune/marathinlp

Punjabi

Named Entity Annotated Corpora for Punjabi: http://www.tdil-dc.in/index.php?option=com_download&task=showResourceDetails&toolid=980&lang=en

Tamil

FIRE 2015, ESM-IL (English, Hindi, Tamil, Malayalam): http://au-kbc.org/nlp/esm-fire2015/#traincorpus
FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/

Malayalam

FIRE 2015, ESM-IL (English, Hindi, Tamil, Malayalam): http://au-kbc.org/nlp/esm-fire2015/#traincorpus
FIRE NER 2013 (English, Hindi, Tamil, Malayalam, Bengali): http://au-kbc.org/nlp/ner-fire2013/

Oriya/Odia

IJCNLP 2008 SSEAL: http://ltrc.iiit.ac.in/ner-ssea-08/index.cgi?topic=5

Sinhala/Sinhalese

LORELEI (LDC2018E57)

Thai

thai-named-entity-recognition-data: https://github.com/pythainlp/thai-named-entity-recognition-data
Thai Named Entity Corpora: http://pioneer.chula.ac.th/~awirote/resources/corpora--data.html; http://pioneer.chula.ac.th/~awirote/data-nutcha.zip; http://pioneer.chula.ac.th/~awirote/data-sasiwimon.zip; http://pioneer.chula.ac.th/~awirote/data-nattadaporn.zip
LST20: https://huggingface.co/datasets/lst20; https://arxiv.org/abs/2008.05055
Thai-NNER: https://github.com/vistec-ai/thai-nner, https://aclanthology.org/2022.findings-acl.116

Indonesian

IDENTIC: http://metashare.elda.org/repository/browse/ididenc/fed3fada7ef111e5aa3b001dd8b71c66c98EeEEEEE36EAD42F18FFD9A95DA9104CC/
https://github.com/yohanesgultom/nlp-experiments/tree/master/data/ner
Indonesia-NER: Syaifudin & Nurwidyantoro https://ieeexplore.ieee.org/document/7828656 https://github.com/yusufsyaifudin/indonesia-ner
IDNER-News-2k: Indonesian news dataset for the named entity recognition task. Reannotation of Syaifudin & Nurwidyantoro https://dl.acm.org/doi/10.1145/3592854#fn8 https://github.com/khairunnisaor/idner-news-2k/
NERP and NER-Grit: two Indonesian datasets from IndoNLP/IndoNLU https://github.com/indonlp/indonlu/tree/master/dataset https://aclanthology.org/2020.aacl-main.85/

Vietnamese

VLSP 2016: http://vlsp.org.vn/resources-vlsp2016; https://github.com/undertheseanlp/ner
VLSP 2018: http://vlsp.org.vn/resources-vlsp2018; https://github.com/undertheseanlp/ner
PhoNER_COVID19: https://github.com/vinairesearch/phoner_covid19

Japanese

IREX: https://nlp.cs.nyu.edu/irex/package/
MET-2 (Japanese, Chinese): https://www-nlpir.nist.gov/related_projects/muc/
BCCWJ Basic NE Corpus: https://sites.google.com/site/projectnextnlpne/en (Iwakura et al., Constructing a Japanese Basic Named Entity Corpus of Various Genres, NEWS 2016)
DBpedia abstract corpus (English, German, Dutch, French, Italian, Japanese): http://downloads.dbpedia.org/2015-04/ext/nlp/abstracts/
Data from: Mai et al., An Empirical Study on Fine-Grained Named Entity Recognition, COLING 2018 (English, Japanese): https://fgner.alt.ai/duc/ene/testsets/comp/
Wikipedia NER corpus: https://github.com/stockmarkteam/ner-wikipedia-dataset
WikiAnn: https://elisa-ie.github.io/wikiann/
GSD: conversion of the UD GSD dataset to named entities by Megagon Labs https://github.com/megagonlabs/ud_japanese-gsd
KWDLC: Kyoto University Web Document Leads Corpus https://nlp.ist.i.kyoto-u.ac.jp/en/index.php?kwdlc https://github.com/ku-nlp/kwdlc

Korean

National Institute of Korean Language (ROK) - NER corpus: https://github.com/digitalprk/koreaner; https://ithub.korean.go.kr/user/total/referenceview.do?boardseq=5&articleSeq=11&boardgb=T&isinsupd&boardType=corpus
KMOU NER - https://github.com/kmounlp/ner
Korean Language Understanding Evaluation - KLUE NER - https://klue-benchmark.com/tasks/69/overview/description
https://github.com/songys/entity
HLCT 2016 corpus, with updates - https://github.com/machinereading/koreannercorpus

Chinese

ACE 2003 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/ldc2004t09
ACE 2004 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/ldc2005t09
ACE 2005 (English, Chinese, Arabic): https://catalog.ldc.upenn.edu/ldc2006t06
OntoNotes 5 (English, Arabic, Chinese): https://catalog.ldc.upenn.edu/ldc2013t19
MET-2 (Japanese, Chinese): https://www-nlpir.nist.gov/related_projects/muc/
REFLEX Entity Translation (parallel corpus: English, Arabic, Chinese): https://catalog.ldc.upenn.edu/ldc2009t11
NE3L named entities Chinese corpus (Arabic, Chinese, Russian): http://catalogue.elra.info/en-us/repository/browse/elra-w0079/
Original Message Data Collation I in Chinese (named entities): http://catalog.elra.info/en-us/repository/browse/elra-w0045_04/
Original Short Message Data Collation II in Chinese (named entities): http://catalog.elra.info/en-us/repository/browse/elra-w0045_08/
DEFT ERE corpora (parallel corpus: English, Chinese): Mott et al., Parallel Chinese-English Entities, Relations and Events Corpora, 2016 (LDC2015E78, LDC2014E114)
Weibo Chinese: DEFT-style annotation for named and nominal mentions in Chinese social media (Weibo): https://github.com/hltcoe/golden-horse
EduNER Chinese: 2023 dataset in the education domain: https://link.springer.com/article/10.1007/s00521-023-08635-5 https://github.com/anonymous-xl/eduner
Chinese Aerospace NER: https://www.nature.com/articles/s41598-023-50705-0 https://github.com/Coder-XIAOKAI/Aerospace_NERdatasets
SciCN: A Chinese Dataset and Benchmark for Scientific Information Extraction https://file.techscience.com/files/cmc/2024/TSP_CMC-78-3/TSP_CMC_35594/TSP_CMC_35594.pdf https://github.com/yangjingla/SciCN
ENP NER: Historical Chinese https://aclanthology.org/2024.lrec-main.35.pdf https://gitlab.com/enpchina/ENP-NER

Tagalog

TLUnified: https://arxiv.org/abs/2311.07161 https://huggingface.co/datasets/ljvmiranda921/tlunified-ner

Russian

BSNLP 2017 (Croatian, Czech, Polish, Russian, Slovak, Slovene, Ukrainian): http://bsnlp-2017.cs.helsinki.fi/shared_task_results.html
NE3L named entities Russian corpus (Arabic, Chinese, Russian): https://catalog.elra.info/en-us/repository/browse/ELRA-W0080/
WikiNER: https://figshare.com/articles/Learning_multilingual_named_entity_recognition_from_Wikipedia/5462500
WikiNEuRal: https://github.com/Babelscape/wikineural
MultiNERD: https://github.com/Babelscape/multinerd
factRuEval-2016: https://github.com/dialogue-evaluation/factRuEval-2016
RuREBus 2020 (Russian Relation Extraction for Business) corpus https://github.com/dialogue-evaluation/RuREBus

Yoruba

GV-Yorùbá-NER. Data: https://github.com/ajesujoba/YorubaTwi-Embedding/tree/master/Yoruba/Yor%C3%B9b%C3%A1-NER ; Data statement: https://drive.google.com/file/d/177xu-O2FTJ7VJQ-0ohCWjVd1qu61Tvml/view ; Paper: Jesujoba O Alabi, Kwabena Amponsah-Kaakyire, David I Adelani, and Cristina España-Bonet. Massive vs. curated word embeddings for low-resourced languages. The case of Yorùbá and Twi. In LREC, 2020 (https://arxiv.org/abs/1912.02481)

Swahili

Helsinki Corpus of Swahili 2.0 (HCS 2.0) Annotated Version: http://metashare.csc.fi/repository/browse/helsinki-corpus-of-swahili-20-hcs-20-annotated-version/232c1910b9eb11e5915e005056be118e59fb2e920f1f4c0cafc94915fc6f5cac/ See: Shah et al., 2010. SYNERGY: A Named Entity Recognition System for Resource-scarce Languages such as Swahili using Online Machine Translation

Igbo

IgboNER: https://aclanthology.org/2022.lrec-1.547/ https://github.com/Chiamakac/IgboNER-Models later updated in https://openreview.net/pdf?id=tHUS9-vmUfC from https://sites.google.com/view/africanlp2023/home

isiNdebele

NCHLT isiNdebele Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/306

Xhosa

NCHLT isiXhosa Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/312

Zulu

NCHLT isiZulu Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/319

Sepedi

NCHLT Sepedi Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/328

Sesotho

NCHLT Sesotho Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/334

Setswana

NCHLT Setswana Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/341

Siswati

NCHLT Siswati Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/346

Venda

NCHLT Tshivenda Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/355
MPHAYANER: Named Entity Recognition for Tshivenḓa: https://openreview.net/pdf?id=0nneuL3bSLt https://github.com/rendanim/MphayaNER from https://sites.google.com/view/africanlp2023/home

Xitsonga

NCHLT Xitsonga Named Entity Annotated Corpus: https://repo.sadilar.org/handle/20.500.12185/362

Latin

Herodotos Project: https://github.com/alexerdmann/Herodotos_Project_Annotation

A long list can be found here: http://damien.nouvels.net/resourcesen/corpora.html

References

[Alvarado et al., 2015] Alvarado, Julio Cesar Salinas, Karin Verspoor, and Timothy Baldwin. Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015, pp. 84-90. 2015. Accessed: August 2018.

[Balasuriya et al., 2009] Balasuriya, Dominic, Nicky Ringland, Joel Nothman, Tara Murphy, and James R. Curran. Named entity recognition in wikipedia. In Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, pp. 10-18. Association for Computational Linguistics, 2009

[Bos et al., 2017] Bos, Johan, Valerio Basile, Kilian Evang, Noortje J. Venhuizen, and Johannes Bjerva. The Groningen meaning bank. In Handbook of linguistic annotation, pp. 463-496. Springer, Dordrecht, 2017.

[Derczynski et al., 2016] Derczynski, Leon, Kalina Bontcheva, and Ian Roberts. Broad twitter corpus: A diverse named entity recognition resource. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1169-1179. 2016. Available at: https://github.com/GateNLP/broad_twitter_corpus Accessed: August 2018.

[Derczynski et al., 2017] Leon Derczynski, Eric Nichols, Marieke van Erp, Nut Limsopatham (2017) Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition, in Proceedings of the 3rd Workshop on Noisy, User-generated Text. Available at: https://noisy-text.github.io/2017/emerging-rare-entities.html

[DSTL, 2017] Defence Science and Technology Laboratory. 2017. Relationship and Entity Extraction Evaluation Dataset. https://github.com/dstl/re3d. Accessed: January 2018.

[Grishman and Sundheim, 1996] Ralph Grishman and Beth Sundheim. 1996. Message Understanding Conference-6: A brief history. In COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics.

[Karimi et al., 2015] Sarvnaz Karimi, Alejandro Metke-Jimenez, Madonna Kemp, and Chen Wang. 2015. Cadec: A corpus of adverse drug event annotations. Journal of biomedical informatics, 55:73-81. Available at https://data.csiro.au Accessed: November 2017.

[Lim et al., 2017] Lim, Swee Kiat, Aldrian Obaja Muis, Wei Lu, and Chen Hui Ong. MalwareTextDB: A database for annotated malware articles. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1557-1567. 2017.

[Liu et al., 2013a] Jingjing Liu, Panupong Pasupat, Scott Cyphers, and Jim Glass. 2013. Asgard: A portable architecture for multilingual dialogue systems. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8386-8390. IEEE. Available at https://groups.csail.mit.edu/sls/downloads/restaurant/ Accessed: January 2018

[Liu et al., 2013b] Jingjing Liu, Panupong Pasupat, Yining Wang, Scott Cyphers, and Jim Glass. 2013. Query understanding enhanced by hierarchical parsing structures. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pages 72-77. IEEE. Available at https://groups.csail.mit.edu/sls/downloads/movie/ We used the trivia10k13 portion. Accessed: January 2018

[NIST, 1999 IE-ER] NIST. 1999. Information Extraction - Entity Recognition Evaluation. http://www.nist.gov/speech/tests/ieer/er_99/er_99.htm. The newswire development test data only (included in the NLTK package).

[Ohta et al., 2012] Tomoko Ohta, Sampo Pyysalo, Jun'ichi Tsujii and Sophia Ananiadou. 2012. Open-domain Anatomical Entity Mention Detection. In Proceedings of ACL 2012 Workshop on Detecting Structure in Scholarly Discourse (DSSD), pp. 27-36. Available at: http://www.nactem.ac.uk/anatomy/ and https://github.com/openbiocorpora/anem Accessed: November 2017.

[Ritter et al., 2011] Alan Ritter, Sam Clark, Mausam, and Oren Etzioni. 2011. Named entity recognition in tweets: An experimental study. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1524-1534, Edinburgh, Scotland, UK., July. Association for Computational Linguistics. Accessed January 2018.

[Sang and Meulder, 2003] Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003.

[Stubbs et al., 2015] Amber Stubbs and Ozlem Uzuner. 2015. Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. Journal of biomedical informatics, 58:S20-S29. Available at https://www.i2b2.org/NLP/DataSets/ Accessed: February 2018.

[Uzuner et al., 2007] Ozlem Uzuner, Yuan Luo, and Peter Szolovits. 2007. Evaluating the state-of-the-art in automatic de-identification. Journal of the American Medical Informatics Association, 14(5):550-563. Available at https://www.i2b2.org/NLP/DataSets/ Accessed: February 2018.

[Weischedel and Brunstein, 2005] Ralph Weischedel and Ada Brunstein. 2005. BBN pronoun coreference and entity type corpus. Linguistic Data Consortium, Philadelphia.

[Weischedel et al., 2013] Weischedel, Ralph, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue et al. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA (2013).

[Zeldes, 2017] Amir Zeldes. 2017. The GUM corpus: creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3):581-612. Available at https://github.com/amir-zeldes/gum/tree/master/coref/tsv/ Accessed: November 2017.
