
PL BERT


Version 1.0.0


Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns. However, these models are usually word-level or sup-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder has improved the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.
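The two pretext objectives in the abstract can be made concrete with a small loss sketch. This is a simplified, framework-free illustration under stated assumptions, not the repository's training code: `cross_entropy` and `pretext_loss` are hypothetical names, logits are plain Python lists rather than tensors, and summing the two terms with equal weight is an assumption.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def pretext_loss(
    phoneme_logits: List[List[float]],   # one row per phoneme position
    phoneme_targets: List[int],          # original (unmasked) phoneme IDs
    grapheme_logits: List[List[float]],  # one row per phoneme position
    grapheme_targets: List[int],         # token ID of the word each phoneme belongs to
    masked: List[bool],                  # True where the input phoneme was masked
) -> float:
    # Masked phoneme prediction: only masked positions contribute.
    mlm = [cross_entropy(row, t)
           for row, t, is_masked in zip(phoneme_logits, phoneme_targets, masked)
           if is_masked]
    mlm_loss = sum(mlm) / len(mlm) if mlm else 0.0

    # Grapheme prediction: every phoneme position predicts its word's token.
    p2g = [cross_entropy(row, t) for row, t in zip(grapheme_logits, grapheme_targets)]
    p2g_loss = sum(p2g) / len(p2g)

    # Assumption: the two objectives are summed with equal weight.
    return mlm_loss + p2g_loss
```

With uniform logits over 4 phonemes and 2 word tokens, the sketch returns log 4 + log 2 = log 8, i.e. the loss of a model that has learned nothing yet.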

Paper: https://arxiv.org/abs/2301.08810

Audio samples: https://pl-bert.github.io/

Prerequisites

Python >= 3.7
Clone this repository:

git clone https://github.com/yl4579/PL-BERT.git
cd PL-BERT

Create a new environment (recommended):

conda create --name BERT python=3.8
conda activate BERT
python -m ipykernel install --user --name BERT --display-name "BERT"

Install Python requirements:

pip install pandas singleton-decorator datasets "transformers<4.33.3" accelerate nltk phonemizer sacremoses pebble
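Because the `transformers<4.33.3` pin is easy to violate in an existing environment, a quick check before running the notebooks can save a confusing failure later. The sketch below is a hypothetical helper (`check_environment` and `version_tuple` are illustrative names); it uses only the standard library, so its version parsing is deliberately simple and ignores PEP 440 details that the `packaging` library would handle. `importlib.metadata` needs Python 3.8, which matches the conda environment created above.

```python
import sys
from importlib import metadata  # standard library since Python 3.8
from typing import List, Optional, Tuple


def version_tuple(version: str) -> Tuple[int, int, int]:
    """Parse the leading numeric part of 'X.Y.Z...' into (X, Y, Z); missing parts become 0."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def check_environment(
    python_version: Optional[Tuple[int, ...]] = None,
    transformers_version: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the README's constraints hold."""
    problems = []
    py = tuple(python_version or sys.version_info[:3])
    if py < (3, 7):
        problems.append("Python >= 3.7 is required, found %s" % ".".join(map(str, py)))

    if transformers_version is None:
        try:
            transformers_version = metadata.version("transformers")
        except metadata.PackageNotFoundError:
            problems.append("transformers is not installed")
            return problems

    if version_tuple(transformers_version) >= (4, 33, 3):
        problems.append("transformers<4.33.3 is required, found %s" % transformers_version)
    return problems
```

Calling `print(check_environment() or "environment OK")` inside the new environment reports any violated constraint.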

Preprocessing

Please refer to the notebook preprocess.ipynb for more details. The preprocessing is for the English Wikipedia dataset only. I will make a new branch for Japanese if I have extra time to demonstrate training on other languages. You may also refer to #6 for preprocessing in other languages such as Japanese.
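Since the grapheme-prediction task needs a word label for every phoneme, preprocessing has to keep track of which word each phoneme came from. The sketch below illustrates only that bookkeeping; it is an assumption about the general idea, not a description of what preprocess.ipynb does. `align_phonemes_to_words` is a hypothetical helper, and the toy `lexicon` and `word_vocab` dictionaries stand in for a real phonemizer backend and tokenizer vocabulary.

```python
from typing import Dict, List, Tuple


def align_phonemes_to_words(
    words: List[str],
    lexicon: Dict[str, List[str]],
    word_vocab: Dict[str, int],
    unk_id: int = 0,
) -> Tuple[List[str], List[int]]:
    """Phonemize word by word and label every phoneme with its word's token ID.

    `lexicon` is a stand-in for a real G2P front end (e.g. phonemizer);
    `word_vocab` is a stand-in for a real tokenizer vocabulary.
    """
    phonemes: List[str] = []
    grapheme_ids: List[int] = []
    for word in words:
        key = word.lower()
        word_phonemes = lexicon[key]
        word_id = word_vocab.get(key, unk_id)  # unknown words share one ID
        phonemes.extend(word_phonemes)
        grapheme_ids.extend([word_id] * len(word_phonemes))
    return phonemes, grapheme_ids
```

Phonemizing word by word keeps the alignment trivial; how the real pipeline handles word boundaries, punctuation and subword tokens is not covered by this sketch.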

Training

Please run each cell in the notebook Train.ipynb. You will need to change the line config_path = "Configs/config.yml" in cell 2 if you wish to use a different configuration file. The training code is in a Jupyter notebook primarily because the initial experiment was conducted in one, but you can easily turn it into a Python script if you want.
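If you do convert the notebook into a script, the hard-coded `config_path` in cell 2 becomes a natural command-line option. A minimal sketch, assuming `argparse` for the option and PyYAML for reading the file; `parse_args`, `load_config` and `main` are hypothetical names, and the remaining notebook cells are not reproduced.

```python
import argparse
from typing import List, Optional

DEFAULT_CONFIG_PATH = "Configs/config.yml"  # the value hard-coded in cell 2


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run PL-BERT training as a script")
    parser.add_argument("--config_path", default=DEFAULT_CONFIG_PATH,
                        help="YAML configuration file to train with")
    return parser.parse_args(argv)


def load_config(path: str) -> dict:
    import yaml  # PyYAML; imported here so parse_args() works without it
    with open(path) as f:
        return yaml.safe_load(f)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    config = load_config(args.config_path)
    # The remaining notebook cells would follow here, reading from `config`.
    print("loaded", args.config_path, "with keys:", sorted(config))
```

Saved as, say, `train.py` (a hypothetical filename) and ended with `if __name__ == "__main__": main()`, it would run as `python train.py --config_path Configs/config.yml`.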

Finetuning

Here is an example of how to use it for StyleTTS finetuning. You can use it for other TTS models by replacing the text encoder with the pre-trained PL-BERT.

Modify line 683 in models.py with the following code to load the BERT model into StyleTTS:

import os
from collections import OrderedDict

import torch
import torch.nn as nn
import yaml
from munch import Munch
from transformers import AlbertConfig, AlbertModel

log_dir = "YOUR PL-BERT CHECKPOINT PATH"
config_path = os.path.join(log_dir, "config.yml")
with open(config_path) as f:
    plbert_config = yaml.safe_load(f)

albert_base_configuration = AlbertConfig(**plbert_config['model_params'])
bert = AlbertModel(albert_base_configuration)

# find the latest checkpoint, saved as step_<N>.t7
ckpts = [f for f in os.listdir(log_dir)
         if f.startswith("step_") and os.path.isfile(os.path.join(log_dir, f))]
iters = max(int(f.split('_')[-1].split('.')[0]) for f in ckpts)

checkpoint = torch.load(os.path.join(log_dir, "step_" + str(iters) + ".t7"), map_location='cpu')
state_dict = checkpoint['net']

# keep only the ALBERT encoder weights, with their prefixes stripped
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`
    if name.startswith('encoder.'):
        name = name[8:]  # remove `encoder.`
        new_state_dict[name] = v
bert.load_state_dict(new_state_dict)

# `args`, `predictor`, `decoder`, etc. are defined earlier in models.py
nets = Munch(bert=bert,
             # linear projection to match the hidden size (BERT 768, StyleTTS 512)
             bert_encoder=nn.Linear(plbert_config['model_params']['hidden_size'], args.hidden_dim),
             predictor=predictor,
             decoder=decoder,
             pitch_extractor=pitch_extractor,
             text_encoder=text_encoder,
             style_encoder=style_encoder,
             text_aligner=text_aligner,
             discriminator=discriminator)
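The checkpoint-handling part of this snippet can be factored into two small, testable helpers. This is a hedged refactoring sketch with hypothetical helper names: it assumes checkpoints are named `step_<N>.t7`, as the `torch.load` call implies, and it strips `module.` only when that prefix is present, whereas `k[7:]` assumes every key carries it.

```python
import re
from collections import OrderedDict
from typing import Any, Dict, Iterable, Mapping

_STEP_FILE = re.compile(r"^step_(\d+)\.t7$")


def latest_checkpoint(filenames: Iterable[str]) -> str:
    """Return the step_<N>.t7 filename with the largest integer N."""
    steps = {}
    for name in filenames:
        match = _STEP_FILE.match(name)
        if match:
            steps[int(match.group(1))] = name
    if not steps:
        raise FileNotFoundError("no step_<N>.t7 checkpoint found")
    return steps[max(steps)]


def extract_encoder_state(state_dict: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only `encoder.*` weights, dropping an optional wrapper `module.` prefix."""
    new_state_dict: Dict[str, Any] = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith("module."):  # typically added by DataParallel-style wrappers
            key = key[len("module."):]
        if key.startswith("encoder."):
            new_state_dict[key[len("encoder."):]] = value
    return new_state_dict
```

With these, the loading lines would read `checkpoint = torch.load(os.path.join(log_dir, latest_checkpoint(os.listdir(log_dir))), map_location='cpu')` followed by `bert.load_state_dict(extract_encoder_state(checkpoint['net']))`. Comparing steps as integers matters: as strings, `step_20000.t7` would sort after `step_100000.t7`.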

Modify line 126 in train_second.py with the following code to adjust the learning rate of the BERT model:

# for stability
for g in optimizer.optimizers['bert'].param_groups:
    g['betas'] = (0.9, 0.99)
    g['lr'] = 1e-5
    g['initial_lr'] = 1e-5
    g['min_lr'] = 0
    g['weight_decay'] = 0.01
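This loop edits the optimizer's hyperparameters in place: in PyTorch, `param_groups` is a list of plain dicts, and optimizers such as Adam read `lr`, `betas` and `weight_decay` from them on every `step()`. A minimal framework-free sketch of the same operation (`BERT_FINETUNE_HPARAMS` and `override_param_groups` are hypothetical names) makes it testable with ordinary dicts. Learning-rate schedulers such as `torch.optim.lr_scheduler.OneCycleLR` also read keys like `initial_lr` and `min_lr` from each group, which is presumably why the snippet sets them alongside `lr`.

```python
from typing import Any, Dict, List

# Values copied from the snippet above.
BERT_FINETUNE_HPARAMS: Dict[str, Any] = {
    "betas": (0.9, 0.99),
    "lr": 1e-5,
    "initial_lr": 1e-5,
    "min_lr": 0,
    "weight_decay": 0.01,
}


def override_param_groups(param_groups: List[Dict[str, Any]], hparams: Dict[str, Any]) -> None:
    """Write every key of `hparams` into every param group, in place.

    Keys not in `hparams` (notably "params", the tensors being optimized) are untouched.
    """
    for group in param_groups:
        group.update(hparams)
```

Against a real optimizer the call would be `override_param_groups(optimizer.optimizers['bert'].param_groups, BERT_FINETUNE_HPARAMS)`.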

Modify line 211 in train_second.py with the following code to replace the text encoder with the BERT encoder:

            bert_dur = model.bert(texts, attention_mask=(~text_mask).int()).last_hidden_state
            d_en = model.bert_encoder(bert_dur).transpose(-1, -2)
            d, _ = model.predictor(d_en, s,
                                   input_lengths,
                                   s2s_attn_mono,
                                   m)

Line 257:

            _, p = model.predictor(d_en, s,
                                   input_lengths,
                                   s2s_attn_mono,
                                   m)

And line 415:

                bert_dur = model.bert(texts, attention_mask=(~text_mask).int()).last_hidden_state
                d_en = model.bert_encoder(bert_dur).transpose(-1, -2)
                d, p = model.predictor(d_en, s,
                                       input_lengths,
                                       s2s_attn_mono,
                                       m)
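The `attention_mask=(~text_mask).int()` argument in these snippets flips a padding mask. Assuming, as in StyleTTS, that `text_mask` is `True` at padded positions, inverting it yields the Hugging Face convention of 1 for tokens to attend to and 0 for padding. The list-based sketch below uses hypothetical helpers standing in for the tensor operations. In the real code, `.last_hidden_state` has shape (batch, length, hidden), and `.transpose(-1, -2)` after `bert_encoder` moves it to (batch, channels, length), presumably the layout `model.predictor` expected from the original text encoder.

```python
from typing import List


def length_to_mask(lengths: List[int]) -> List[List[bool]]:
    """Padding mask: True marks positions at or beyond each sequence's length."""
    max_len = max(lengths)
    return [[pos >= n for pos in range(max_len)] for n in lengths]


def to_attention_mask(text_mask: List[List[bool]]) -> List[List[int]]:
    """Invert a padding mask into 1 = attend, 0 = ignore (the ~mask then .int() step)."""
    return [[int(not padded) for padded in row] for row in text_mask]
```

For a batch with lengths 3 and 1, the second row's last two positions are padding and receive an attention weight of 0.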

Modify line 347 in train_second.py with the following code to make sure the parameters of the BERT model are updated:

            optimizer.step('bert_encoder')
            optimizer.step('bert')
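The training script keeps a separate optimizer per sub-network and steps each one by name (the snippets above use `optimizer.optimizers['bert']` and `optimizer.step('bert')`). As a result, the new `bert` and `bert_encoder` modules are updated only if their names are stepped explicitly. The class below is a hypothetical stand-in illustrating that pattern, not StyleTTS's actual optimizer wrapper.

```python
from typing import Dict, Protocol


class Steppable(Protocol):
    def step(self) -> None: ...


class NamedOptimizers:
    """Hypothetical wrapper holding one optimizer per sub-network, keyed by name."""

    def __init__(self, optimizers: Dict[str, Steppable]) -> None:
        self.optimizers = optimizers

    def step(self, key: str) -> None:
        # Only the named optimizer updates its parameters; a sub-network
        # that is never stepped keeps its weights unchanged.
        self.optimizers[key].step()
```

This is why the two added `optimizer.step(...)` calls are needed: without them, gradients would still be computed for PL-BERT and the projection layer, but their weights would never change.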

The PL-BERT pre-trained on Wikipedia for 1M steps can be downloaded at: PL-BERT link.

A demo on the LJSpeech dataset, along with the pre-modified StyleTTS repo and pre-trained models, can be downloaded here: StyleTTS link. This zip file contains the code modifications above, the pre-trained PL-BERT model listed above, pre-trained StyleTTS w/ PL-BERT, pre-trained StyleTTS w/o PL-BERT, and the pre-trained HiFi-GAN on LJSpeech from the StyleTTS repo.