Stable 0.2.7 release

Tuotuo

Tuotuo is a topic modeling library written in Python. Tuotuo is also a cute boy, my son, who is currently 6 months old.

Installation

Use the package manager pip to install Tuotuo. The distribution can be found on PyPI.

pip install TuoTuo --upgrade

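Usage

Currently, the library only supports topic modeling via Latent Dirichlet Allocation (LDA). As we know, LDA can be implemented using either Gibbs sampling or variational inference. We choose the latter because it is mathematically more sophisticated.

Generate some documents from pre-defined Dirichlet parameters over 5 different topics and 40 unique words: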
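For orientation, the textbook mean-field coordinate-ascent updates for smoothed LDA are sketched below. The symbols $\alpha$, $\eta$, $\lambda$ and $\gamma$ follow the parameter names the library prints during fitting. The per-word topic responsibility $\phi$, the indices $d$, $n$, $k$, $v$, and the document length $N_d$ are notation assumed here. Whether Tuotuo implements exactly these updates is also an assumption, since the source does not show its internals.

```latex
\begin{aligned}
\phi_{dnk} &\propto \exp\!\Big(\Psi(\gamma_{dk}) + \Psi(\lambda_{k,w_{dn}}) - \Psi\big(\textstyle\sum_{v}\lambda_{kv}\big)\Big),\\
\gamma_{dk} &= \alpha_k + \sum_{n=1}^{N_d} \phi_{dnk},\\
\lambda_{kv} &= \eta + \sum_{d=1}^{M}\sum_{n=1}^{N_d} \phi_{dnk}\,\mathbb{1}[w_{dn} = v].
\end{aligned}
```

Here $\Psi$ is the digamma function, $w_{dn}$ is the $n$-th word of document $d$, $\gamma$ has one row per document, and $\lambda$ has one row per topic.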

import torch as tr
from tuotuo.generator import doc_generator

gen = doc_generator(
    M=100,  # sample 100 documents
    L=20,  # each document contains 20 pre-defined words
    # an exchangeable Dirichlet distribution as the topic prior,
    # i.e. uniform over the 5 topics
    topic_prior=tr.tensor([1, 1, 1, 1, 1], dtype=tr.double),
)
train_docs = gen.generate_doc()
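To make the generation step concrete, here is a minimal standard-library sketch of the LDA generative process. It is not Tuotuo's implementation: the helper names `sample_dirichlet` and `generate_docs` are hypothetical, the all-ones word prior is an assumption, and documents are represented as plain lists of word indices rather than the library's own document type.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_docs(num_docs, doc_len, topic_prior, word_prior, seed=0):
    """Sample documents (lists of word indices) via the LDA generative process."""
    rng = random.Random(seed)
    num_topics = len(topic_prior)
    vocab_size = len(word_prior)
    # One word distribution per topic, drawn from the word prior.
    topic_word = [sample_dirichlet(word_prior, rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Per-document topic mixture, drawn from the topic prior.
        theta = sample_dirichlet(topic_prior, rng)
        doc = []
        for _ in range(doc_len):
            # Pick a topic for this position, then a word from that topic.
            topic = rng.choices(range(num_topics), weights=theta)[0]
            word = rng.choices(range(vocab_size), weights=topic_word[topic])[0]
            doc.append(word)
        docs.append(doc)
    return docs


# Same sizes as the example above: 100 documents, 20 words each,
# 5 topics, 40 unique words (word prior of all ones is an assumption).
docs = generate_docs(
    num_docs=100,
    doc_len=20,
    topic_prior=[1.0] * 5,
    word_prior=[1.0] * 40,
)
```

Fixing the seed keeps the sampled corpus reproducible, which helps when comparing perplexity curves across runs.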

Build the training documents and fit the variational inference parameters of LDA:

from tuotuo.lda_model import LDASmoothed
import matplotlib.pyplot as plt

lda = LDASmoothed(
    num_topics=5,
)

perplexes = lda.fit(
    train_docs,
    sampling=False,
    verbose=True,
    return_perplexities=True,
)
plt.plot(perplexes)

=>=>=>=>=>=>=>=>
Topic Dirichlet Prior, Alpha
1

Exchangeable Word Dirichlet Prior, Eta
1

Var Inf - Word Dirichlet prior, Lambda
(5, 40)

Var Inf - Topic Dirichlet prior, Gamma
(100, 5)

Init perplexity = 84.99592157507153
End perplexity = 45.96696541539976

Figure: perplexity over 100 iterations

Check the top 5 words of each topic according to the variational inference parameter $\lambda$:
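As a reminder of what the plotted quantity means, perplexity is conventionally the exponential of the negative mean per-token log-likelihood. The sketch below uses a hypothetical `perplexity` helper. How Tuotuo computes its values (for example, from a variational bound rather than the exact likelihood) is not shown in the source, so treat this only as the general definition.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-(mean log-probability per token))."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that spreads probability uniformly over 40 unique words
# assigns log(1/40) to every token, so its perplexity is about 40.
uniform_log_probs = [math.log(1 / 40)] * 2000
print(perplexity(uniform_log_probs))
```

Lower perplexity means the model assigns higher probability to the observed words.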

import numpy as np

for topic_index in range(lda._lambda_.shape[0]):
    # argsort is ascending, so the last 5 entries are the 5 largest weights
    top5 = np.argsort(lda._lambda_[topic_index, :])[-5:]
    print(f"Topic {topic_index}")
    for i, idx in enumerate(top5):
        print(f"Top {i + 1} -> {lda.train_doc.idx_to_vocab[idx]}")
    print()

=>=>=>=>=>=>=>=>
Topic 0 
Top 1 -> physical
Top 2 -> quantum
Top 3 -> research
Top 4 -> scientst
Top 5 -> astrophysics

Topic 1
Top 1 -> divorce
Top 2 -> attorney
Top 3 -> court
Top 4 -> bankrupt
Top 5 -> contract

Topic 2
Top 1 -> content
Top 2 -> Craftsmanship
Top 3 -> concert
Top 4 -> asymmetrical
Top 5 -> Symmetrical

Topic 3
Top 1 -> recreation
Top 2 -> FIFA
Top 3 -> football
Top 4 -> Olympic
Top 5 -> athletics

Topic 4
Top 1 -> fever
Top 2 -> appetite
Top 3 -> contagious
Top 4 -> decongestant
Top 5 -> injection
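Note that `np.argsort` sorts in ascending order, so slicing with `[-5:]` yields the five largest weights from smallest to largest. In the listing above, "Top 1" is therefore the lowest-weighted of the five words. If a strongest-first ranking is preferred, a small helper such as the following could be used. `top_k_indices` and the sample weights are hypothetical and not part of Tuotuo.

```python
def top_k_indices(weights, k):
    """Return the indices of the k largest weights, largest first."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Hypothetical weights for one topic over a 6-word vocabulary.
row = [0.5, 3.2, 1.1, 7.4, 0.1, 2.8]
print(top_k_indices(row, 3))  # [3, 1, 5]
```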