BART (**B**idirectional and **A**uto-**R**egressive **T**ransformers) is trained as a denoising autoencoder: noise is added to part of the input text and the model learns to restore the original text. KoBART is a Korean encoder-decoder language model trained on more than 40GB of Korean text using the Text Infilling noise function from the BART paper. The resulting KoBART-base model is distributed here.
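As a rough illustration of the Text Infilling objective (a hedged sketch only; this is not the actual KoBART training code, and the helper below is made up), contiguous spans of tokens are replaced with a single mask token and the model is trained to reconstruct the original sequence:

```python
import random

def text_infilling(tokens, mask_token="<mask>", span_len=2):
    # Replace one contiguous span of tokens with a single mask token.
    # (The BART paper samples span lengths from a Poisson distribution;
    # a fixed length is used here only to keep the sketch short.)
    start = random.randrange(0, max(1, len(tokens) - span_len))
    return tokens[:start] + [mask_token] + tokens[start + span_len:]

original = ["나는", "오늘", "학교에", "갔다", "."]
noised = text_infilling(original)
print(noised)  # e.g. ['나는', '<mask>', '갔다', '.'] -> the model must restore `original`
```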

```bash
pip install git+https://github.com/SKT-AI/KoBART#egg=kobart
```

| Data | # of sentences |
|---|---|
| Korean wiki | 5M |
| Other corpus | 0.27B |
In addition to Korean Wikipedia, various corpora such as news, books, and the Modu Corpus (모두의 말뭉치) v1.0 (dialogue, news, ...) were used to train the model.
The tokenizer is a Character BPE tokenizer trained with the Hugging Face `tokenizers` package.
The vocab size is 30,000, with emoticons and emojis that frequently appear in conversation (e.g. `:-)`, `:)`, `(-:`, ...) added so that they are tokenized properly.
In addition, unused tokens from `<unused0>` to `<unused99>` are defined so that they can be freely assigned to whatever subtasks need them.
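For instance (a minimal sketch, not from the original docs; the "separator" role below is made up), a reserved token can be adopted as a task-specific marker without resizing the embedding matrix:

```python
>>> from kobart import get_kobart_tokenizer
>>> tok = get_kobart_tokenizer()
>>> # Hypothetical downstream use: adopt <unused0> as a task-specific separator.
>>> sep = '<unused0>'
>>> tok.convert_tokens_to_ids(sep)               # the id is already reserved in the vocab
>>> tok(sep + ' 안녕하세요.', return_tensors='pt')['input_ids']
```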
```python
>>> from kobart import get_kobart_tokenizer
>>> kobart_tokenizer = get_kobart_tokenizer()
>>> kobart_tokenizer.tokenize("안녕하세요. 한국어 BART 입니다.?:)l^o")
['▁안녕하', '세요.', '▁한국어', '▁B', 'A', 'R', 'T', '▁입', '니다.', '?', ':)', 'l^o']
```

| Model | # of params | Type | # of layers | # of heads | ffn_dim | hidden_dims |
|---|---|---|---|---|---|---|
| KoBART-base | 124M | Encoder | 6 | 16 | 3072 | 768 |
| | | Decoder | 6 | 16 | 3072 | 768 |
```python
>>> from transformers import BartModel
>>> from kobart import get_pytorch_kobart_model, get_kobart_tokenizer
>>> kobart_tokenizer = get_kobart_tokenizer()
>>> model = BartModel.from_pretrained(get_pytorch_kobart_model())
>>> inputs = kobart_tokenizer(['안녕하세요.'], return_tensors='pt')
>>> model(inputs['input_ids'])
Seq2SeqModelOutput(last_hidden_state=tensor([[[-0.4418, -4.3673,  3.2404,  ...,  5.8832,  4.0629,  3.5540],
         [-0.1316, -4.6446,  2.5955,  ...,  6.0093,  2.7467,  3.0007]]],
       grad_fn=<NativeLayerNormBackward>), past_key_values=((tensor([[[[-9.7980e-02, -6.6584e-01, -1.8089e+00,  ...,  9.6023e-01, -1.8818e-01, -1.3252e+00],
```

| | NSMC (ACC) | KORSTS (Spearman) | Question Pair (ACC) |
|---|---|---|---|
| KoBART-base | 90.24 | 81.66 | 94.34 |
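These scores come from fine-tuning the base model on each downstream task. As a rough sketch of how that can be wired up (this is not the script that produced the numbers above; the linear head and pooling choice here are illustrative):

```python
>>> import torch
>>> from transformers import BartModel
>>> from kobart import get_pytorch_kobart_model, get_kobart_tokenizer
>>> tok = get_kobart_tokenizer()
>>> bart = BartModel.from_pretrained(get_pytorch_kobart_model())
>>> head = torch.nn.Linear(bart.config.d_model, 2)        # e.g. NSMC: negative / positive
>>> inputs = tok(['이 영화 정말 재미있어요.'], return_tensors='pt')
>>> hidden = bart(inputs['input_ids']).last_hidden_state  # (1, seq_len, 768)
>>> logits = head(hidden[:, -1, :])                       # classify from the final token state
>>> # Train the head (and optionally the backbone) with cross-entropy on labeled data.
```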

The example above is the result of summarizing a ZDNet article.
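A minimal sketch of what such a summarization demo involves, assuming a KoBART checkpoint that has been fine-tuned for summarization (the base model distributed here is not fine-tuned, so its raw generations will not be good summaries):

```python
>>> from transformers import BartForConditionalGeneration
>>> from kobart import get_pytorch_kobart_model, get_kobart_tokenizer
>>> tok = get_kobart_tokenizer()
>>> # Swap in a summarization fine-tuned checkpoint path for real use.
>>> model = BartForConditionalGeneration.from_pretrained(get_pytorch_kobart_model())
>>> article = "..."  # full article text
>>> input_ids = tok([article], return_tensors='pt')['input_ids']
>>> summary_ids = model.generate(input_ids, max_length=128, num_beams=5)
>>> print(tok.decode(summary_ids[0], skip_special_tokens=True))
```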
If you have an interesting example using KoBART, please open a PR!
- aws s3
- Fixed a tokenizer bug where the `<unk>` token disappeared
- KoBART model update (improved sample efficiency)
- Added the Modu Corpus (모두의 말뭉치)
- pip installation support

Please report KoBART-related issues here.
KoBART is released under a modified MIT license. If you use the model or code, please comply with the terms of the license. The license details can be found in the LICENSE file.