
ngram language model


Version 1.0.0


N-Gram Language Model

A Python implementation of an N-gram language model with Laplace smoothing and sentence generation.

Some NLTK functions are used (nltk.ngrams, nltk.FreqDist), but almost everything else is implemented by hand.
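As a rough illustration of the counting and smoothing steps involved, here is a minimal standard-library sketch. It is not the project's code: the helper names (pad, ngrams, train) are hypothetical, collections.Counter stands in for nltk.FreqDist, and the add-lambda formula is the textbook form of Laplace smoothing, assumed rather than taken from the source.

```python
from collections import Counter

def pad(tokens, n):
    """Add n-1 start markers (at least one) and a single end marker."""
    return ["<s>"] * max(1, n - 1) + list(tokens) + ["</s>"]

def ngrams(tokens, n):
    """Return successive n-token tuples (a stand-in for nltk.ngrams)."""
    return zip(*(tokens[i:] for i in range(n)))

def train(sentences, n, laplace=0.01):
    """Return a function mapping an n-gram tuple to its smoothed probability."""
    padded = [pad(s.split(), n) for s in sentences]
    vocab = {tok for sent in padded for tok in sent}
    ngram_counts = Counter(g for sent in padded for g in ngrams(sent, n))

    # Count each context (the first n-1 tokens) by summing over its n-grams.
    context_counts = Counter()
    for gram, count in ngram_counts.items():
        context_counts[gram[:-1]] += count

    def prob(ngram):
        # Add-lambda smoothing: unseen n-grams get a small non-zero probability.
        numerator = ngram_counts[ngram] + laplace
        denominator = context_counts[ngram[:-1]] + laplace * len(vocab)
        return numerator / denominator

    return prob

prob = train(["the cat sat", "the cat ran"], n=2, laplace=1)
print(prob(("the", "cat")))
```

With laplace=1 this is add-1 smoothing; smaller values move less probability mass to unseen n-grams.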

Note: the LanguageModel class expects to be given data that has already been tokenized into sentences. If you use the included load_data function, the train.txt and test.txt files should already be processed so that:

punctuation is removed
each sentence is on its own line

See the data/ directory for examples.
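If your raw text is not yet in that shape, the preprocessing described in the note above could be sketched as follows. The preprocess function is a hypothetical helper, not part of the project, and the regex sentence splitter is a naive assumption that will mis-split abbreviations such as "U.S.".

```python
import re

def preprocess(text):
    """Return text with one punctuation-free sentence per line.

    Hypothetical helper: sentences are split naively on ., ! or ?
    followed by whitespace, which is an assumption for illustration.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    cleaned = []
    for sentence in sentences:
        # Drop every character that is neither a word character nor whitespace.
        no_punct = re.sub(r"[^\w\s]", "", sentence)
        tokens = no_punct.split()
        if tokens:
            cleaned.append(" ".join(tokens))
    return "\n".join(cleaned)

print(preprocess("The company said it agreed. Shares rose, he said!"))
```

The result can be written to train.txt or test.txt in the data directory.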

Example output for a trigram model trained on data/train.txt and tested against data/test.txt:

Loading 3-gram model...
Vocabulary size: 23505
Generating sentences...
...
<s> <s> the company said it has agreed to sell its shares in a statement </s> (0.03163)
<s> <s> he said the company also announced measures to boost its domestic economy and could be a long term debt </s> (0.01418)
<s> <s> this is a major trade bill that would be the first quarter of 1987 </s> (0.02182)
...
Model perplexity: 51.555

The numbers in parentheses beside the generated sentences are the cumulative probabilities of those sentences occurring.
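To make these two figures concrete, here is a minimal sketch of a cumulative sentence probability (the product of the per-n-gram probabilities) and of perplexity (the exponential of the negative mean log-probability). The function names and the toy uniform model are assumptions for illustration, and the exact figures the script prints may be computed or scaled differently.

```python
import math

def sentence_probability(tokens, n, prob):
    """Multiply the model probability of every n-gram in a padded sentence."""
    total = 1.0
    for i in range(len(tokens) - n + 1):
        total *= prob(tuple(tokens[i:i + n]))
    return total

def perplexity(sentences, n, prob):
    """Return exp(-mean log-probability) over all n-grams in the sentences."""
    log_sum, count = 0.0, 0
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            log_sum += math.log(prob(tuple(tokens[i:i + n])))
            count += 1
    return math.exp(-log_sum / count)

# Toy stand-in model (assumption): every n-gram has probability 0.25.
def uniform(ngram):
    return 0.25

sentence = ["<s>", "a", "b", "</s>"]
print(sentence_probability(sentence, 2, uniform))
print(perplexity([sentence], 2, uniform))
```

Summing logarithms rather than multiplying raw probabilities avoids floating-point underflow on long test sets. Lower perplexity means the model finds the test data less surprising.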

Usage info:

usage: N-gram Language Model [-h] --data DATA --n N [--laplace LAPLACE] [--num NUM]

optional arguments:
  -h, --help         show this help message and exit
  --data DATA        Location of the data directory containing train.txt and test.txt
  --n N              Order of N-gram model to create (i.e. 1 for unigram, 2 for bigram, etc.)
  --laplace LAPLACE  Lambda parameter for Laplace smoothing (default is 0.01 -- use 1 for add-1 smoothing)
  --num NUM          Number of sentences to generate (default 10)
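A parser that accepts the same arguments could be sketched with argparse as follows. The flag names, help strings, and defaults are taken from the usage text above. The argument types (int, float) and the build_parser name are assumptions, and this is not necessarily how the original script builds its parser.

```python
import argparse

def build_parser():
    """Build a parser mirroring the usage text (argument types are assumed)."""
    parser = argparse.ArgumentParser(prog="N-gram Language Model")
    parser.add_argument("--data", type=str, required=True,
                        help="Location of the data directory containing "
                             "train.txt and test.txt")
    parser.add_argument("--n", type=int, required=True,
                        help="Order of N-gram model to create "
                             "(i.e. 1 for unigram, 2 for bigram, etc.)")
    parser.add_argument("--laplace", type=float, default=0.01,
                        help="Lambda parameter for Laplace smoothing "
                             "(default is 0.01 -- use 1 for add-1 smoothing)")
    parser.add_argument("--num", type=int, default=10,
                        help="Number of sentences to generate (default 10)")
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--data", "data/", "--n", "3"])
print(args.data, args.n, args.laplace, args.num)
```

Here --data and --n are required, while --laplace and --num fall back to their defaults when omitted.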

Originally authored by Josh Loehr and Robin Cosbey, with slight modifications. Last edited February 8, 2018.
