Learns an n-gram language model from a given corpus. The corpus should be a text file with a single word per line, containing no spaces.
The learned quantities are the n-gram statistics estimated from this corpus.
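For illustration, here is a minimal sketch of the kind of estimation such a trainer performs, assuming a character bigram model and plain maximum-likelihood counting; the fixture path and the counting scheme are assumptions, not this script's actual implementation:

    from collections import Counter

    # Read the corpus: one word per line, no spaces within a line.
    with open("fixtures/corpus.txt") as f:  # hypothetical fixture name
        words = [line.strip() for line in f if line.strip()]

    # Count character unigrams, bigrams, and bigram histories per word.
    unigrams, bigrams, histories = Counter(), Counter(), Counter()
    for word in words:
        unigrams.update(word)
        bigrams.update(zip(word, word[1:]))
        histories.update(word[:-1])  # every character that starts a bigram

    # Maximum-likelihood estimate of P(b | a).
    def bigram_prob(a, b):
        return bigrams[(a, b)] / histories[a] if histories[a] else 0.0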
Test the script by running it with no arguments:
python3 ngramModelTrainer
Use the -h flag for details on how to use the tool with proper input:
python3 ngramModelTrainer -h
There are a few example inputs in fixtures/.
The output is saved as four MATLAB matrices.
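A minimal sketch of inspecting that output from Python, assuming the matrices are written to a single .mat file readable by scipy; the file name "model.mat" is a placeholder, not the script's actual output name:

    from scipy.io import loadmat

    # Load the saved model; the file name here is hypothetical.
    data = loadmat("model.mat")

    # Print each stored matrix and its shape (keys starting with "__"
    # are loadmat metadata, not model quantities).
    for name, value in data.items():
        if not name.startswith("__"):
            print(name, value.shape)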
An alphabet of acceptable unigrams must be defined. By default, an alphabet of 36 possible letters/digits is used. These are held, in a fixed order, in a Python list called 'alphabet'.
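Purely as an illustration, the default alphabet could look like the sketch below; the actual contents and ordering are defined in the script, and the lowercase-letters-then-digits order assumed here may not match it:

    import string

    # Assumed default: 26 lowercase letters followed by the 10 digits.
    alphabet = list(string.ascii_lowercase) + list(string.digits)
    assert len(alphabet) == 36

    # Map each character to its position, e.g. for indexing rows and
    # columns of the learned matrices (the indexing role is an assumption).
    char_to_index = {c: i for i, c in enumerate(alphabet)}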
Non-'standard' versions of the above alphabet may also be used. These include:

- dutta_extended: a number of extra characters, notably encodings of the characters and punctuation found in the George Washington handwritten document set.
- sophia: polytonic Greek characters.
- dummy: a limited testing set of 3 characters.
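One way such named alphabets could be organized is a simple registry keyed by name, as in this hypothetical sketch; the character sets shown are placeholders, and the real definitions live in the script:

    # Hypothetical registry of named alphabets; real definitions are in the script.
    ALPHABETS = {
        "standard": list("abcdefghijklmnopqrstuvwxyz0123456789"),
        "dummy": list("abc"),  # a limited 3-character testing set
        # "dutta_extended" and "sophia" would be defined similarly, covering the
        # George Washington encodings and polytonic Greek characters respectively.
    }

    def get_alphabet(name="standard"):
        return list(ALPHABETS[name])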