This is a model library customized and modified from Meelfy's pytorch_pretrained_BERT library.
The project was written mainly for the convenience of my own experiments, so it will not be updated frequently.
Install:
pip install torchKbert
For typical usage examples, please refer to the official examples directory.
If you want to use hierarchical decomposition position encoding so that BERT can process long text, just pass the parameter is_hierarchical=True when calling the model. For example:
from torchKbert.modeling import BertModel  # import path assumed to mirror pytorch_pretrained_BERT
model = BertModel(config)
# is_hierarchical=True enables hierarchical decomposition position encoding
encoder_outputs, _ = model(input_ids, token_ids, input_mask, is_hierarchical=True)
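For intuition, hierarchical decomposition reuses the 512 trained position embeddings u_1, ..., u_512 to cover longer sequences: position n = 512*i + j is encoded as alpha*u_i + (1-alpha)*u_j (Su Jianlin's formulation, typically alpha = 0.4). Below is a minimal PyTorch sketch of that idea; it is only an illustration under those assumptions, not necessarily this library's internal implementation:

import torch

def hierarchical_position_embeddings(position_embeddings, seq_len, alpha=0.4):
    # position_embeddings: the (512, hidden_size) matrix trained with the original BERT
    base = (position_embeddings - alpha * position_embeddings[:1]) / (1 - alpha)
    position_ids = torch.arange(seq_len)
    embeddings_x = base[position_ids // base.size(0)]  # high-order part of the position index
    embeddings_y = base[position_ids % base.size(0)]   # low-order part of the position index
    return alpha * embeddings_x + (1 - alpha) * embeddings_y  # shape (seq_len, hidden_size)

For positions below 512 this reduces to the original embedding u_n, so short inputs behave exactly as before.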
If you want to use the word-granularity Chinese WoBERT, just pass a new parameter when constructing the BertTokenizer object:
import jieba
from torchKbert.tokenization import BertTokenizer

tokenizer = BertTokenizer(
    vocab_file=vocab_path,
    pre_tokenizer=lambda s: jieba.cut(s, HMM=False))
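With a pre_tokenizer set, an ordinary call (text here stands for whatever string you are tokenizing) returns word-level tokens:

tokens = tokenizer.tokenize(text)  # word-level tokenization by default when pre_tokenizer is set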
If pre_tokenizer is not passed, it defaults to None. When it is set, tokenization is done at word level by default; if you want to switch back to character-level units, just pass the new parameter pre_tokenize=False when calling tokenize:
tokenizer.tokenize(text, pre_tokenize=False)
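For intuition, the word-granularity behaviour works roughly as follows: the pre_tokenizer first splits the text into words, words found in the WoBERT vocabulary are kept as single tokens, and everything else falls back to the usual finer-grained tokenization. The sketch below only illustrates that WoBERT-style idea; the helper names and the exact fallback are assumptions, not this library's internal code:

import jieba

def word_granularity_tokenize(text, vocab, fallback_tokenize, pre_tokenize=True):
    # Illustrative WoBERT-style sketch (assumption), not torchKbert's actual implementation.
    if not pre_tokenize:
        return fallback_tokenize(text)              # plain character/WordPiece tokenization
    tokens = []
    for word in jieba.cut(text, HMM=False):         # pre-split the text into words
        if word in vocab:
            tokens.append(word)                     # keep whole words that are in the vocab
        else:
            tokens.extend(fallback_tokenize(word))  # fall back for out-of-vocabulary words
    return tokens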
I had been using Meelfy's pytorch_pretrained_BERT for a while, and it is very convenient for loading pretrained models and fine-tuning. Later, for my own needs, I wanted to rewrite a version that supports hierarchical decomposition position encoding.
Sushen's bert4keras already implements this feature, but since I am used to PyTorch and have not used Keras for a long time, I decided to rewrite one in PyTorch myself.