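Wikipedia2Vec

Wikipedia2Vec is a tool for obtaining embeddings (or vector representations) of words and entities (i.e., concepts that have corresponding pages in Wikipedia). It is developed and maintained by Studio Ousia.

This tool enables you to learn embeddings of words and entities simultaneously, and places similar words and entities close to one another in a continuous vector space. Embeddings can be easily trained with a single command, using a publicly available Wikipedia dump as input.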
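The idea that similar words and entities end up close together can be illustrated with cosine similarity, a common way to compare embedding vectors. The sketch below is a minimal, self-contained illustration and not Wikipedia2Vec's own API: the vectors are made-up three-dimensional toy values, and the "word:"/"entity:" key prefixes and the most_similar helper are hypothetical names chosen only for this example.

```python
import math

# Hypothetical toy vectors sharing one space; real embeddings are learned
# from a Wikipedia dump and usually have hundreds of dimensions.
EMBEDDINGS = {
    "word:tokyo": [0.9, 0.1, 0.0],
    "entity:Tokyo": [0.85, 0.15, 0.05],
    "entity:Japan": [0.8, 0.3, 0.1],
    "word:banana": [0.0, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(key, k=2):
    """Rank every other word or entity by cosine similarity to `key`."""
    query = EMBEDDINGS[key]
    scored = [(other, cosine(query, vec))
              for other, vec in EMBEDDINGS.items() if other != key]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(most_similar("word:tokyo"))
```

Because words and entities live in the same space, a word query can return entities as its nearest neighbours, which is what the toy ranking shows.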

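This tool implements the conventional skip-gram model to learn the embeddings of words, and its extension proposed in Yamada et al. (2016) to learn the embeddings of entities.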
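As a rough sketch of what a skip-gram model consumes, the function below extracts (target, context) training pairs from a token sequence using a fixed window. It covers only pair extraction; the training step (for example with negative sampling) and the entity extension from Yamada et al. (2016) are not shown, and the function name, window size, and sample sentence are illustrative assumptions rather than Wikipedia2Vec internals.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every context token within `window` positions of the target."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(tokens):
        # Clamp the context window to the sentence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a token is never its own context
                pairs.append((target, tokens[j]))
    return pairs


# Illustrative sentence; window=1 keeps the output short.
for pair in skipgram_pairs(["the", "capital", "of", "Japan"], window=1):
    print(pair)
```

The model is then trained so that each target's vector predicts its context tokens, which is why tokens that share contexts end up with similar vectors.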

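An empirical comparison between Wikipedia2Vec and existing embedding tools (i.e., FastText, Gensim, RDF2Vec, and Wiki2vec) is also available.

Documentation is available online at http://wikipedia2vec.github.io/.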

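Basic Usage

Wikipedia2Vec can be installed via PyPI:

% pip install wikipedia2vec

With this tool, embeddings are learned by running the train command with a Wikipedia dump as input. For example, the following commands download the latest English Wikipedia dump and learn embeddings from it: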

% wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
% wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE

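The learned embeddings are then written to MODEL_FILE. Note that this command accepts many optional parameters; please refer to our documentation for further details.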

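Pretrained Embeddings

Pretrained embeddings for 12 languages (i.e., English, Arabic, Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) can be downloaded from the project's pretrained embeddings page.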
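If you download the pretrained embeddings as a plain-text file, a small parser like the one below can separate words from entities. This is a sketch under two assumptions you should verify against the file you actually download: that it follows the word2vec text layout (a "count dimension" header followed by one "token value ... value" row per item), and that entity rows carry an ENTITY/ prefix. The function name and the sample rows are hypothetical.

```python
import io
from typing import Dict, List, TextIO, Tuple

# Assumed prefix marking entity rows; check it against the downloaded file.
ENTITY_PREFIX = "ENTITY/"


def load_text_embeddings(
    stream: TextIO,
) -> Tuple[Dict[str, List[float]], Dict[str, List[float]]]:
    """Parse word2vec-style text into separate word and entity dictionaries."""
    count, dim = map(int, stream.readline().split())
    words: Dict[str, List[float]] = {}
    entities: Dict[str, List[float]] = {}
    for line in stream:
        line = line.rstrip()
        if not line:
            continue
        # Split off exactly `dim` values from the right so a token that
        # contains spaces stays in one piece.
        token, *values = line.rsplit(" ", dim)
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {token!r}, got {len(values)}")
        vector = [float(v) for v in values]
        if token.startswith(ENTITY_PREFIX):
            entities[token[len(ENTITY_PREFIX):]] = vector
        else:
            words[token] = vector
    if len(words) + len(entities) != count:
        raise ValueError("row count does not match the header")
    return words, entities


# Hypothetical in-memory sample standing in for a downloaded file.
sample = io.StringIO(
    "3 2\n"
    "tokyo 0.1 0.2\n"
    "ENTITY/Tokyo 0.3 0.4\n"
    "ENTITY/Studio Ousia 0.5 0.6\n"
)
words, entities = load_text_embeddings(sample)
print(sorted(words), sorted(entities))
```

Splitting from the right keeps the parser working whether entity titles are stored with spaces or with underscores.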

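Use Cases

Wikipedia2Vec has been applied to the following tasks:

Entity linking: Yamada et al., 2016; Eshel et al., 2017; Chen et al., 2019; Poerner et al., 2020; van Hulst et al., 2020.
Named entity recognition: Sato et al., 2017; Lara-Clares and Garcia-Serrano, 2019.
Question answering: Yamada et al., 2017; Poerner et al., 2020.
Entity typing: Yamada et al., 2018.
Text classification: Yamada et al., 2018; Yamada and Shindo, 2019; Alam et al., 2020.
Relation classification: Poerner et al., 2020.
Paraphrase detection: Duong et al., 2018.
Knowledge graph completion: Shah et al., 2019; Shah et al., 2020.
Fake news detection: Singh et al., 2019; Ghosal et al., 2020.
Movie plot analysis: Papalampidi et al., 2019.
Novel entity discovery: Zhang et al., 2020.
Entity retrieval: Gerritse et al., 2020.
Deepfake detection: Zhong et al., 2020.
Conversational information seeking: Rodriguez et al., 2020.
Query expansion: Rosin et al., 2020.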

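References

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto, Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia.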

 @inproceedings{yamada2020wikipedia2vec,
  title = "{W}ikipedia2{V}ec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from {W}ikipedia",
  author={Yamada, Ikuya and Asai, Akari and Sakuma, Jin and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu and Matsumoto, Yuji},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year = {2020},
  publisher = {Association for Computational Linguistics},
  pages = {23--30}
}

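The embedding model was originally proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation.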

 @inproceedings{yamada2016joint,
  title={Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation},
  author={Yamada, Ikuya and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
  booktitle={Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning},
  year={2016},
  publisher={Association for Computational Linguistics},
  pages={250--259}
}

The text classification model implemented in this example was proposed in the following paper:

Ikuya Yamada, Hiroyuki Shindo, Neural Attentive Bag-of-Entities Model for Text Classification.

 @inproceedings{yamada2019neural,
  title={Neural Attentive Bag-of-Entities Model for Text Classification},
  author={Yamada, Ikuya and Shindo, Hiroyuki},
  booktitle={Proceedings of The 23rd SIGNLL Conference on Computational Natural Language Learning},
  year={2019},
  publisher={Association for Computational Linguistics},
  pages={563--573}
}

License

Apache License 2.0


Additional Information

Version: v2.0.0
Type: Other source code
Updated: 2025-04-18
Size: 747.51 KB
Source: GitHub
