LDA

Latent Dirichlet allocation (LDA) topic modeling in JavaScript for node.js. LDA is a machine learning algorithm that extracts topics and their related keywords from a collection of documents.

In LDA, a document may contain several different topics, each with its own related terms. The algorithm uses a probabilistic model to detect the specified number of topics and extract their related keywords. For example, a document may contain topics that could be classified as beach-related and weather-related. The beach topic may contain related words such as sand, ocean, and water. Similarly, the weather topic may contain related words such as sun, temperature, and clouds.
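To make the beach and weather example concrete, the sketch below counts how many words of a short text fall into each of two hand-written word lists. It only illustrates the idea that one document can mix several topics: the word lists, the sample sentence, and the function name `topicShares` are assumptions made for this example, and real LDA learns its topics from the data instead of being given them.

```javascript
// Hypothetical, hand-written topics (real LDA infers these from the documents).
const topics = {
  beach: ['sand', 'ocean', 'water'],
  weather: ['sun', 'temperature', 'clouds']
};

// For each topic, return the share of the text's topic words that belong to it.
function topicShares(text, topics) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  const counts = {};
  let total = 0;
  for (const name of Object.keys(topics)) {
    counts[name] = words.filter(w => topics[name].includes(w)).length;
    total += counts[name];
  }
  const shares = {};
  for (const name of Object.keys(counts)) {
    shares[name] = total === 0 ? 0 : counts[name] / total;
  }
  return shares;
}

const shares = topicShares(
  'The sun warmed the sand while clouds drifted over the ocean.',
  topics
);
console.log(shares);
```

Here the sample sentence contains two beach words and two weather words, so each topic receives an equal share.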

See http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

$ npm install lda

Usage

var lda = require('lda');

// Example document.
var text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';

// Extract sentences.
var documents = text.match(/[^.!?]+[.!?]+/g);

// Run LDA to get terms for 2 topics (5 terms each).
var result = lda(documents, 2, 5);

The usage example above produces the following result with two topics (topic 1 is "cat-related", topic 2 is "dog-related"):

 Topic 1
cats (0.21%)
dogs (0.19%)
small (0.1%)
mice (0.1%)
chase (0.1%)

Topic 2
dogs (0.21%)
cats (0.19%)
big (0.11%)
eat (0.1%)
bones (0.1%)
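The sentence-extraction step in the usage example can be checked on its own. The sketch below applies the same regular expression and also trims each match, since the pattern keeps the space that follows the previous sentence. The helper name `splitSentences` is an assumption for this illustration and is not part of the lda package.

```javascript
// Split text into sentences ending in '.', '!' or '?', trimming surrounding whitespace.
// Text after the last terminator (if any) is not matched and is dropped.
function splitSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]+/g) || [];
  return matches.map(s => s.trim());
}

const sentences = splitSentences(
  'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.'
);
console.log(sentences);
```

Falling back to an empty array guards against `match` returning `null` when the text contains no sentence terminator at all.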

Output

LDA returns an array of topics, each containing an array of terms. The result has the following format:

 [ [ { term: 'dogs', probability: 0.2 },
    { term: 'cats', probability: 0.2 },
    { term: 'small', probability: 0.1 },
    { term: 'mice', probability: 0.1 },
    { term: 'chase', probability: 0.1 } ],
  [ { term: 'dogs', probability: 0.2 },
    { term: 'cats', probability: 0.2 },
    { term: 'bones', probability: 0.11 },
    { term: 'eat', probability: 0.1 },
    { term: 'big', probability: 0.099 } ] ]

The result can be traversed as follows:

var result = lda(documents, 2, 5);

// For each topic.
for (var i = 0; i < result.length; i++) {
	var row = result[i];
	console.log('Topic ' + (i + 1));

	// For each term.
	for (var j = 0; j < row.length; j++) {
		var term = row[j];
		console.log(term.term + ' (' + term.probability + '%)');
	}

	console.log('');
}
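Note that the loop above prints the raw probability followed by a percent sign, so a probability of 0.21 appears as "0.21%". If true percentages are wanted, a small formatting helper along the following lines could be used. The helper name `formatTopics` is hypothetical, and the sample data is a shortened copy of the output format shown earlier, not the result of a real run.

```javascript
// Sample data in the shape returned by lda(); values taken from the format example.
const sample = [
  [{ term: 'dogs', probability: 0.2 }, { term: 'small', probability: 0.1 }],
  [{ term: 'cats', probability: 0.2 }, { term: 'big', probability: 0.099 }]
];

// Build one line per topic, converting each probability to a percentage.
function formatTopics(result) {
  return result.map((terms, i) => {
    const parts = terms.map(
      t => `${t.term} (${(t.probability * 100).toFixed(1)}%)`
    );
    return `Topic ${i + 1}: ${parts.join(', ')}`;
  });
}

formatTopics(sample).forEach(line => console.log(line));
```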

Additional Languages

LDA uses stop words to ignore common terms in the text (for example: this, is, we). By default, the stop-word list is English. To use additional languages, you can specify an array of language ids, as follows:

// Use English (this is the default).
result = lda(documents, 2, 5, ['en']);

// Use German.
result = lda(documents, 2, 5, ['de']);

// Use English + German.
result = lda(documents, 2, 5, ['en', 'de']);

To add a new language-specific stop-word list, create the file /lib/lib/stopwords_XX.js, where XX is the id for the language. For example, a French stop-word list might be named "stopwords_fr.js". The contents of the file should follow the format of an existing stop-word list. The format is as follows:

exports.stop_words = [
    'cette',
    'que',
    'une',
    'il'
];
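To illustrate what a stop-word list does, the sketch below removes listed words from a sentence before any topic modeling would take place. It is a simplified stand-in, not the package's internal code: the function `removeStopWords`, the tokenization rule, and the French sample sentence are assumptions for this example.

```javascript
// A stop-word list in the same array form as the file format above.
const stopWords = ['cette', 'que', 'une', 'il'];

// Lowercase the text, split it into words, and drop any word found in the stop list.
function removeStopWords(text, stopList) {
  const stop = new Set(stopList);
  // \p{L} matches any Unicode letter, so accented characters stay inside words.
  const words = text.toLowerCase().match(/\p{L}+/gu) || [];
  return words.filter(w => !stop.has(w));
}

const kept = removeStopWords('Il mange une pomme', stopWords);
console.log(kept);
```

Only "mange" and "pomme" survive the filter, which is the kind of content-bearing vocabulary a topic model is meant to work with.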

Setting a Random Seed

A specific random seed can be used to compute the same terms and probabilities on subsequent runs. You can specify the random seed as follows:

// Use the random seed 123.
result = lda(documents, 2, 5, null, null, null, 123);
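A fixed seed gives repeatable output because a seeded pseudo-random generator always emits the same sequence. The sketch below demonstrates this with mulberry32, a small public-domain generator chosen here purely for illustration. It is not claimed to be the generator the lda package uses, and `makeRandom` is a hypothetical name.

```javascript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
// Illustration only; not necessarily what the lda package uses internally.
function makeRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators created with the same seed produce identical sequences.
const first = makeRandom(123);
const second = makeRandom(123);
const runA = [first(), first(), first()];
const runB = [second(), second(), second()];
console.log(runA.every((v, i) => v === runB[i]));
```

Because the sampling inside a topic model draws from such a sequence, reusing the seed reproduces the same draws and therefore the same terms and probabilities.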

Author

Kory Becker http://www.primaryobjects.com

Based on the original JavaScript implementation https://github.com/awaisathar/lda.js

More information

Version 1.0.0
Type Other source code
Updated 2025-04-19
Size 16.56KB
Source Github
