NK-VEC is a neural-network embedding model that borrows ideas from Word2Vec. It performs the same task as today's embedding models, but with a much simpler structure. Through the NK-Vector library you can use the NK-VEC model to build a vector set from your own data in the simplest possible way. In addition, NK-Vector provides several useful features and algorithms for solving NLP problems.

I created it while working on a large project in the Informatics Room of Nguyen Khuyen Secondary School (Da Nang), so "NK" abbreviates the school's name, an imprint I wanted to preserve and one the school deserves.
| Function | Parameters | Example | Note |
|---|---|---|---|
| create_one_hot | <file_url, url_save> | "E:/project/data.txt", "E:/project/onehot.json" | By default, English stopwords and special characters other than '_' are filtered out |
| create_window_words | <file_url, window_size, url_save> | "E:/project/data.txt", 5, "E:/project/window.txt" | By default, English stopwords and special characters other than '_' are filtered out. window_size must be an odd number |
| train | <size_output, url_data_one_hot, url_data_window_words, url_save> | 512, "E:/project/onehot.json", "E:/project/window.txt", "E:/project/data_vector.json" | size_output is the number of output vector dimensions; it must be smaller than the first dimension of the input one-hot vector |
| building_vec_sentences | <"doc", url_vecs_of_words, url_save> | "Hello everyone", "E:/project/data_vector.json", "" | If url_save has length zero, the vector is returned without saving; to save, use the JSON format, e.g. "E:/project/data_sentence_vector" |
| search_word_similarity | <"target", url_vecs_of_word, size_result> | "king", "E:/project/data_vector.json", 15 | size_result is the number of most similar words returned, ordered from highest to lowest similarity |
| knn | <target, type_distance, data, k> | [7, 8], 'eculid', points, 4 | See the usage example below |
| vn_segmentation_tag | <"document"> | "Welcome to me" | Make sure your Node version is a full 10.16.0 or newer |
| clear_sentence_vn | <"document"> | "Welcome to me" | Vietnamese stopwords and special characters are filtered out of your sentence |
| clear_sentence_en | <"document"> | "Welcome to me" | English stopwords and special characters are filtered out of your sentence |
| remove_duplicate_words | <"document"> | "Welcome to me" | Removes duplicated words from the sentence; works for both English and Vietnamese |
| fast_build_chatbot | <"text"> | "How is the weather today" | The bot returns one of the labels: chemistry, general_asking, math, good_bye, hello, introduction, thanks, ask_weather, unknown |
| sentiment_vn | <"text"> | "Today is so gloomy" | Returns one of the labels: sad, happy, frustrated, normal, unknown; for this example, the returned string is 'sad' |
| fix_telex | <"text"> | "Anh thisch awn busn char cas" | Returns the telex-corrected text; for this example, the Vietnamese for 'I like to eat bún chả cá' |
| english_or_vietnamese | <"text"> | "Hello, huh, you?" | Returns an object with the fields your_text, label and fix_text, e.g. {your_text: 'Hello, huh, you?', label: 'english', fix_text: 'Hello, how are you?'} |
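Judging by the `knn` example output further below (distances 0, 8, 32, 72), the 'eculid' mode appears to return squared Euclidean distances rather than plain Euclidean ones. The following minimal plain-JavaScript sketch, not the library's implementation, reproduces that behaviour:

```javascript
// Squared Euclidean distance between two vectors of equal length.
function squaredEuclidean(a, b) {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) * (a[i] - b[i])
  }
  return sum
}

// k-nearest-neighbour sketch: pair each point with its distance to the
// target, sort nearest-first, and keep the k closest.
function knnSketch(target, points, k) {
  return points
    .map(p => [p, squaredEuclidean(target, p)])
    .sort((x, y) => x[1] - y[1])
    .slice(0, k)
}

console.log(knnSketch([7, 8], [[1, 2], [3, 4], [5, 6], [7, 8]], 4))
// → [ [ [7, 8], 0 ], [ [5, 6], 8 ], [ [3, 4], 32 ], [ [1, 2], 72 ] ]
```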
- Install Node.js
- Run: `npm i nk-vector`
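`create_one_hot` builds one-hot vectors for the words in your corpus. As a rough conceptual sketch of what a one-hot mapping looks like (`buildOneHot` is a hypothetical helper, not the library's actual output format):

```javascript
// Hypothetical sketch of one-hot encoding: each vocabulary word maps to a
// vector with a single 1 at its own index. The real create_one_hot JSON
// layout may differ.
function buildOneHot(words) {
  const vocab = [...new Set(words)] // unique words, in first-seen order
  const oneHot = {}
  vocab.forEach((word, i) => {
    const vec = new Array(vocab.length).fill(0)
    vec[i] = 1
    oneHot[word] = vec
  })
  return oneHot
}

console.log(buildOneHot(['king', 'queen', 'king', 'man']))
// → { king: [1, 0, 0], queen: [0, 1, 0], man: [0, 0, 1] }
```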
```javascript
let NKV = require('nk-vector')
```

Example: using the knn function

```javascript
let NKV = require('nk-vector')

let points = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8]
]
let nearest = NKV.knn([7, 8], 'eculid', points, 4)
console.log(nearest)
/* Result:
[ [ [ 7, 8 ], 0 ],
  [ [ 5, 6 ], 8 ],
  [ [ 3, 4 ], 32 ],
  [ [ 1, 2 ], 72 ] ]
Explanation of the returned array: [<vector in the data set>, <distance from the input vector to this vector>]
*/
```

Example: using the building_vec_sentences function
```javascript
let NKV = require('nk-vector')

let sentence = NKV.VN_segmentation_tag(NKV.clear_sentence_vn('cân bằng phương trình hóa học'))
let full_sentence = ''
for (let word in sentence) {
  // join multi-token segments with '_' (a regex so every space is replaced,
  // not just the first one)
  full_sentence += sentence[word].replace(/ /g, '_') + ' '
}
if (full_sentence.length > 0) {
  console.log(full_sentence)
  console.log(NKV.build_vec_sentences(full_sentence.trim(), 'E:/<name_project>/data_vec.json', ''))
}
/* Result:
{"cân_bằng phương_trình hóa học":[0.002338010428122218,...,0,0,0.00111962700489077,0.0009866701202071657,0.00111962700489077,0,0.00111962700489077,0,0,0.0009866701202071657,0,0.0010865777210490053,0,0.0010865777210490053,0,0,0,0,0,0.0009866701202071657,0,0,0,0,0,0,0.0010865777210490053,...,0,0.0010865777210490053,...,0]}
*/
```

Example: using the clear_sentence_vn function
```javascript
let NKV = require('nk-vector')

let clear_sentence = NKV.clear_sentence_vn('Chào mừng các bạn lên trên trời, ở đây là trên trời')
console.log(clear_sentence)
// Result: chào mừng trời trời
```

- Red: a fatal error; execution cannot continue
- Yellow: just a normal notice; execution still continues
If you encounter an error saying the stopword file cannot be found, locate the failing line from the path shown in the terminal and change it to:
- path.join(__dirname, "/src/stop_word.txt") for the clear_sentence_en function
- path.join(__dirname, "/src/stop_word_vnt_vn.txt") for the clear_sentence_vn function

Or point the path at the filter file in whatever way suits your setup. This error is reported to the user at the red level.
This error occurs when all the words and characters that make up the sentence were removed by the stopword and special-character filters during training, so no vectors exist for them; the loaded sentence ends up empty and no vector can be built. This error is reported to the user at the yellow level.
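The sentence vectors produced by build_vec_sentences can be compared with a standard similarity measure once you have them. Cosine similarity is an assumption here (the library does not state which measure search_word_similarity uses), but it is a common choice for embedding vectors:

```javascript
// Cosine similarity between two vectors of equal length.
// Assumption: this is one common way to compare embedding vectors; the
// nk-vector library itself may use a different measure internally.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

console.log(cosineSimilarity([1, 0, 1], [1, 0, 1])) // → 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1]))       // → 0 (orthogonal)
```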
Thank you for using NK-Vector; I will update it with new algorithms regularly!

Thank you, VNB, for developing the excellent VNTK package.

Code search: https://code-search-vni.herokuapp.com/

Until GPT-3 publishes its keys for everyone, I can still offer a place to search Python code by semantics, in the spirit of GPT-3's examples.