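Reparameterized Discrete Diffusion Models for Text Generation

This repository contains the official implementation of the paper A Reparameterized Discrete Diffusion Model for Text Generation.

Dependencies

The codebase is implemented with FairSeq. To install the dependencies, run the following commands (a virtual environment is recommended):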

pip install -r requirements.txt

# install our package of discrete diffusion models
pip install -e discrete_diffusion

# install our fork of fairseq
cd fairseq
python3 setup.py build develop
cd ..

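Note that the environment was tested with Python 3.8.10, PyTorch 1.10.0/1.12.0, and CUDA 11.3. Also note that our fork of FairSeq modifies several files in the original codebase; using a more recent version of FairSeq may lead to unexpected dependency conflicts.

Basic Usage of the Discrete-diffusion Library

We implement discrete diffusion models in a self-contained library, discrete_diffusion, for general use. The library provides implementations of various typical discrete diffusion models, including

(Vanilla/Reparameterized) multinomial diffusion: diffusion processes that inject uniform noise into the token sequence. The implementation of vanilla multinomial diffusion closely follows the codebase of the original paper.
(Vanilla/Reparameterized) absorbing diffusion: diffusion processes in which tokens within the sequence can be absorbed into the masking state, as described in the D3PM paper.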
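
The difference between the two corruption processes can be illustrated with a minimal sketch in plain Python. This is not the library's code: the helper names, the `<mask>` symbol, and the single `keep_prob` parameter (standing in for the noise level at a given time step) are assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical masking symbol, for illustration only


def corrupt_absorbing(x_0, keep_prob, rng):
    """Keep each token with probability keep_prob, otherwise replace it with MASK."""
    return [tok if rng.random() < keep_prob else MASK for tok in x_0]


def corrupt_multinomial(x_0, keep_prob, vocab, rng):
    """Keep each token with probability keep_prob, otherwise resample it uniformly from vocab."""
    return [tok if rng.random() < keep_prob else rng.choice(vocab) for tok in x_0]


rng = random.Random(0)
vocab = ["the", "cat", "sat", "on", "mat"]
x_0 = ["the", "cat", "sat", "on", "the", "mat"]

x_abs = corrupt_absorbing(x_0, keep_prob=0.5, rng=rng)
x_mul = corrupt_multinomial(x_0, keep_prob=0.5, vocab=vocab, rng=rng)
print(x_abs)
print(x_mul)
```

In the absorbing case a corrupted position is always recognizable (it holds the mask symbol), whereas in the multinomial case a corrupted position looks like any other token.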

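These diffusion models share the same set of interfaces, which allows them to be used externally. In particular, they are defined as subclasses of the DiscreteDiffusion class, which takes the following form: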

import torch.nn as nn


class DiscreteDiffusion(nn.Module):
    """
    The parent class for discrete denoising diffusion probabilistic models.

    It supports the following methods:
    - q_sample()
        Sample x_t ~ q(x_t | x_0) to construct noisy Transformer inputs.
    - compute_losses()
        Compute the loss L_t = KL(q||p) at t-th time step.
    - sample_step()
        Sample x_{t-1} ~ p(x_{t-1} | x_t, x_0) at t-th time step.
    """
    
    def __init__(self, num_timesteps):
        super().__init__()
        self.num_timesteps = num_timesteps

    def q_sample(self, x_0, t, **kwargs):
        """

        Sample from q(x_t | x_0), which is used as the model inputs.

        Args:
            x_0: token ids with shape [B, N]
            t: current time step, tensor with shape [B]

        Returns:
            return a dict of relevant outputs including x_t.
            
        """

    def compute_losses(self, inputs, **kwargs):
        """
        
        Compute the loss objective KL(q||p) to train our generative process.

        Args:
            inputs: a dict that contains input types specific to different diffusion processes, containing
                - x_t: token ids with shape [B, N]
                - t: scalar timesteps, with shape [B]

        Returns:
            possibly return a dict of relevant outputs, including the loss used for training.
            
        """

    def sample_step(self, decoder_out, denoising_fn, **kwargs):
        """
        Given a time step t, start from x_t and sample x_{t-k} from q(x_{t-k} | x_t).
        
        Args:
            decoder_out: a namedtuple that contains decoding info, including
                - x_t: token ids with shape [B, N]
                - t: scalar timesteps
                - max_steps: the maximum number of decoding steps
                - ...
            
            denoising_fn: a function that takes in x_t and t and returns model logits

            kwargs: other arguments that are used to control decoding.
        
        Returns:
            return a new decoder_out namedtuple.
        """

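As a rough illustration of how the sample_step() interface might be driven during decoding, the sketch below repeatedly feeds a decoder_out namedtuple back into sample_step() until the time step reaches zero. Everything here is a stand-in: ToyAbsorbingSampler, toy_denoiser, the mask id, and the "reveal an equal share per step" rule are hypothetical and written in plain Python so the example runs without PyTorch; the real classes subclass nn.Module and operate on tensors.

```python
from collections import namedtuple

# Mirrors the fields named in the sample_step() docstring (assumed subset).
DecoderOut = namedtuple("DecoderOut", ["x_t", "t", "max_steps"])
MASK = -1  # hypothetical id of the masking state


class ToyAbsorbingSampler:
    """Stand-in that exposes only sample_step(); not the library's class."""

    def sample_step(self, decoder_out, denoising_fn, **kwargs):
        x_t, t, max_steps = decoder_out
        x_0_hat = denoising_fn(x_t, t)  # the model's guess of the clean sequence
        masked = [i for i, tok in enumerate(x_t) if tok == MASK]
        # Reveal an equal share of the still-masked positions (ceil division).
        n_reveal = -(-len(masked) // t) if masked else 0
        new_x = list(x_t)
        for i in masked[:n_reveal]:
            new_x[i] = x_0_hat[i]
        return DecoderOut(new_x, t - 1, max_steps)


def toy_denoiser(x_t, t):
    """Hypothetical 'model': predicts token id == position index."""
    return list(range(len(x_t)))


def decode(sampler, denoising_fn, length, max_steps):
    """Start from an all-mask sequence and call sample_step() until t == 0."""
    out = DecoderOut([MASK] * length, max_steps, max_steps)
    while out.t > 0:
        out = sampler.sample_step(out, denoising_fn)
    return out.x_t


result = decode(ToyAbsorbingSampler(), toy_denoiser, length=7, max_steps=3)
print(result)
```
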
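A DiscreteDiffusion model can be instantiated by configuring the following:

Basic attributes, including
- --num-diffusion-timesteps <int> specifies the total number of diffusion time steps (default: 50)
- --diffusion-type <str> specifies the diffusion model type (choices: {absorbing, multinomial, reparam-absorbing, reparam-multinomial})
- --noise-scheduler-type <str> specifies the noise schedule, used only in vanilla/reparam multinomial diffusion (typical choices: {linear, cosine}; default: cosine)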
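
To give a feel for what a linear versus a cosine noise schedule means, here is a small sketch. The two formulas are common textbook forms and are assumptions, not the definitions used in the repository; `alpha` denotes the probability that a token is still uncorrupted at step `t` out of `T`.

```python
import math


def alpha_linear(t, T):
    """Assumed linear schedule: the keep-probability decays linearly from 1 to 0."""
    return 1.0 - t / T


def alpha_cosine(t, T):
    """Assumed cosine schedule: squared cosine over a quarter period."""
    return math.cos(0.5 * math.pi * t / T) ** 2


T = 50  # matches the default of --num-diffusion-timesteps
linear = [alpha_linear(t, T) for t in range(T + 1)]
cosine = [alpha_cosine(t, T) for t in range(T + 1)]

for t in (0, 10, 25, 40, 50):
    print(t, round(linear[t], 3), round(cosine[t], 3))
```

Under these assumed forms, the cosine schedule corrupts tokens more slowly in early steps and faster in late steps than the linear one.
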
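Important arguments specific to the forward sampling routine in q_sample(), including
- --q-sample-mode <str> specifies the sampling strategy (choices: {default, coupled, multi-step, multi-sample}; default: default). We provide various choices for sampling from $q(x_t | x_0)$ to prepare corrupted token sequences for denoising, including
  - default: a single sample is drawn as $x_t \sim q(x_t | x_0)$, identical to previous practice;
  - multi-step: sample two i.i.d. time steps $s, t$ and draw $x_s \sim q(x_s | x_0)$ and $x_t \sim q(x_t | x_0)$, respectively. We then optimize the average $\frac{1}{2}(\mathcal{L}_s + \mathcal{L}_t)$ for variance reduction;
  - multi-sample: draw two i.i.d. samples $x_t \sim q(x_t | x_0)$ and $x_t^{'} \sim q(x_t | x_0)$ at the same time step, and compute the loss averaged over these two samples;
  - coupled: also known as conditioned training, which is detailed in Appendix F of the paper. This starts by sampling two i.i.d. time steps $s, t$ (assume $s < t$). We draw $x_t \sim q(x_t | x_0)$ as usual, but draw $x_s$ from a distribution conditioned on $x_t$, as $x_s \sim q(x_s | x_t, x_0)$. We then compute the average $\frac{1}{2}(\mathcal{L}_s + \mathcal{L}_t)$ as the objective. This strategy simulates the backward transition process and helps stabilize training. In preliminary experiments, we found that the coupled sampling mode brings significant improvements for both vanilla multinomial and absorbing diffusion, but the gain is not consistent in the reparameterized variants.
- --not-diffusing-special-sym indicates whether special symbols are left out of the diffusion process (default: False)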
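
The coupled mode can be sketched for the absorbing case as follows. This is a plain-Python illustration under stated assumptions, not the repository's routine: it assumes a linear keep-probability `alpha(t) = 1 - t/T`, uses a hypothetical `<mask>` symbol, and draws two distinct time steps so that `s < t` always holds. For absorbing diffusion, a token that is intact at step `t` must also be intact at the earlier step `s`, while a token masked at `t` is masked at `s` with probability `(1 - alpha(s)) / (1 - alpha(t))`.

```python
import random

MASK = "<mask>"  # hypothetical masking symbol


def alpha(t, T):
    """Assumed linear schedule: probability that a token is still intact at step t."""
    return 1.0 - t / T


def sample_coupled(x_0, T, rng):
    """Draw x_t ~ q(x_t | x_0), then x_s ~ q(x_s | x_t, x_0) with s < t."""
    s, t = sorted(rng.sample(range(1, T + 1), 2))  # distinct steps, so s < t
    a_s, a_t = alpha(s, T), alpha(t, T)

    # Forward sample at the later step t.
    x_t = [tok if rng.random() < a_t else MASK for tok in x_0]

    # Conditional sample at the earlier step s.
    p_stay_masked = (1.0 - a_s) / (1.0 - a_t)
    x_s = []
    for tok_t, tok_0 in zip(x_t, x_0):
        if tok_t != MASK:
            x_s.append(tok_t)  # intact at t implies intact at s
        elif rng.random() < p_stay_masked:
            x_s.append(MASK)   # was already masked at s
        else:
            x_s.append(tok_0)  # got masked somewhere between s and t
    return (s, x_s), (t, x_t)


rng = random.Random(0)
x_0 = list("abcdefghij")
(s, x_s), (t, x_t) = sample_coupled(x_0, T=50, rng=rng)
print(s, x_s)
print(t, x_t)
```

The two losses computed on `(s, x_s)` and `(t, x_t)` would then be averaged, as described for the coupled mode.
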
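Important arguments specific to the loss objective calculation in compute_losses(), including
- --reweighting-type <str> specifies the reweighting scheme in our reparameterized family (choices: {linear, reciprocal, none}; default: linear)
- --label-smoothing <float> specifies the rate of label smoothing (default: 0.1)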
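
The interplay of a time-dependent weight and label smoothing can be sketched as below. The weight formulas attached to the names linear, reciprocal, and none are illustrative assumptions and are not taken from the repository; the label-smoothing term is the common variant that mixes the target log-likelihood with a uniform average over the vocabulary.

```python
import math


def time_weight(t, T, scheme="linear"):
    """Assumed per-step loss weights; the repository's exact forms may differ."""
    if scheme == "linear":
        return 1.0 - (t - 1) / T
    if scheme == "reciprocal":
        return 1.0 / t
    if scheme == "none":
        return 1.0
    raise ValueError(f"unknown reweighting scheme: {scheme}")


def label_smoothed_nll(log_probs, target, eps=0.1):
    """Negative log-likelihood of one token, smoothed toward a uniform distribution."""
    nll = -log_probs[target]
    uniform = -sum(log_probs) / len(log_probs)
    return (1.0 - eps) * nll + eps * uniform


def weighted_token_loss(log_probs, target, t, T, scheme="linear", eps=0.1):
    """Label-smoothed loss of one token, scaled by the weight of time step t."""
    return time_weight(t, T, scheme) * label_smoothed_nll(log_probs, target, eps)


log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
loss = weighted_token_loss(log_probs, target=0, t=10, T=50)
print(round(loss, 4))
```
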
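Important arguments specific to the decoding routine in sample_step(), including
- --argmax-decoding indicates whether to use argmax decoding for the denoised Transformer output $\tilde{x}_0$ (default: False)
- --temperature <float> specifies the temperature $\tau$ for sampling $\tilde{x}_0 \sim \operatorname{Categorical}(f(x_t;\theta)/\tau)$ if the argmax decoding scheme is not used (default: 1.0)
- --decoding-strategy <str> specifies the use of the vanilla (default) or reparameterized (reparam-<options>; see the details below) decoding strategy (choices: {default, reparam-<options>}; default: default)
- --load-ema-weights indicates whether to load the EMA model weights for generation (default: False)
- --iter-decode-max-iter <int> specifies the maximum number of time steps for decoding (default: 10)
- --iter-decode-with-beam <int> specifies the beam size for decoding multiple sequences with different lengths in parallel (default: 1)
- --iter-decode-force-max-iter indicates that iterative decoding must run the specified number of iterations and must not exit early. Setting this flag is recommended.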
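
The difference between argmax decoding and temperature sampling of $\tilde{x}_0$ can be sketched for a single position as follows. The function name and the list-of-floats representation of logits are assumptions for illustration; the library operates on batched tensors.

```python
import math
import random


def decode_token(logits, temperature=1.0, argmax=False, rng=random):
    """Pick a token id from logits by argmax or by sampling softmax(logits / temperature)."""
    if argmax:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
rng = random.Random(0)
print(decode_token(logits, argmax=True))
print(decode_token(logits, temperature=1.0, rng=rng))
```

Lower temperatures concentrate the sampling distribution on the highest-scoring token, so sampling approaches argmax decoding as the temperature goes to zero.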

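See the repository for a more comprehensive list of arguments.

Decoding Strategies

Vanilla Sampling Scheme

Passing --decoding-strategy default selects the vanilla sampling scheme (specific to each discrete diffusion process).

Improved Sampling with Reparameterization

A more advanced decoding approach can be invoked by passing --decoding-strategy reparam-<conditioning-of-v>-<topk_mode>-<schedule>. This approach is based on the reparameterization proposed in our paper and allows for more effective decoding procedures. The options specify the decoding algorithm via

<conditioning-of-v>: uncond or cond (default uncond): whether to generate the routing variable $v_t$ in a conditional or unconditional manner;
<topk_mode>: stochastic<float> or deterministic (default deterministic): whether to use stochastic or deterministic top-$k$ selection. The float value in stochastic<float> specifies the degree of randomness in the stochastic top-$k$ selection;
<schedule>: linear or cosine (default cosine): the schedule for $k$ during the denoising procedure, which controls the number of top-$k$ tokens to be denoised at the next decoding step.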
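
The structure of the strategy string and the role of the $k$ schedule can be sketched as below. This is an illustration only: the parser, the ReparamOptions container, and the two rate formulas are hypothetical and may not match the repository's implementation.

```python
import math
from typing import NamedTuple


class ReparamOptions(NamedTuple):
    conditioning: str  # "uncond" or "cond"
    topk_mode: str     # "deterministic" or "stochastic"
    randomness: float  # only meaningful when topk_mode == "stochastic"
    schedule: str      # "linear" or "cosine"


def parse_strategy(strategy):
    """Split 'reparam-<conditioning-of-v>-<topk_mode>-<schedule>' into its parts."""
    prefix, cond, topk, schedule = strategy.split("-")
    if prefix != "reparam" or cond not in ("uncond", "cond"):
        raise ValueError(f"bad decoding strategy: {strategy}")
    if schedule not in ("linear", "cosine"):
        raise ValueError(f"bad schedule: {schedule}")
    if topk == "deterministic":
        return ReparamOptions(cond, topk, 0.0, schedule)
    if topk.startswith("stochastic"):
        return ReparamOptions(cond, "stochastic", float(topk[len("stochastic"):]), schedule)
    raise ValueError(f"bad top-k mode: {topk}")


def num_denoised(step, max_steps, seq_len, schedule="cosine"):
    """Assumed schedule for k: how many tokens count as denoised after `step` steps."""
    remaining = 1.0 - step / max_steps
    rate = 1.0 - remaining if schedule == "linear" else math.cos(0.5 * math.pi * remaining)
    return max(1, min(seq_len, int(seq_len * rate)))


def topk_indices(scores, k):
    """Deterministic top-k: positions of the k highest confidence scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


opts = parse_strategy("reparam-uncond-deterministic-cosine")
print(opts)
print([num_denoised(i, 10, 20, opts.schedule) for i in range(1, 11)])
print(topk_indices([0.1, 0.9, 0.5, 0.7], k=2))
```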

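See the implementation for more details about the options.

Machine Translation

Data Preprocessing

Please see the scripts below for details.

Note
All tasks considered in this work operate on the original data and do not adopt knowledge distillation (KD).

IWSLT14 DE-EN

We follow the standard pre-processing in fairseq/examples to prepare the binarized data: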

# fetch and preprocess the data to BPE codes
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

# binarize the data
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --joined-dictionary --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 20

WMT14 EN-DE

We use the data released in fairseq/examples to prepare the dataset:

wget http://dl.fbaipublicfiles.com/nat/original_dataset.zip
unzip original_dataset.zip
TEXT=wmt14_ende
fairseq-preprocess --joined-dictionary \
    --source-lang en --target-lang de \
    --trainpref $TEXT/train.en-de --validpref $TEXT/valid.en-de --testpref $TEXT/test.en-de \
    --destdir data-bin/wmt14_ende --thresholdtgt 0 --thresholdsrc 0 \
    --workers 20

WMT16 EN-RO

For this dataset, we use the raw data wmt16.tar.gz as pre-processed in this repository.

tar xzvf wmt16.tar.gz

TEXT=wmt16/en-ro

# move train/ dev/ test/ bpe codes into the $TEXT folder
mv $TEXT/train/corpus.bpe.en $TEXT/train.bpe.en
mv $TEXT/train/corpus.bpe.ro $TEXT/train.bpe.ro
mv $TEXT/dev/dev.bpe.en $TEXT/dev.bpe.en
mv $TEXT/dev/dev.bpe.ro $TEXT/dev.bpe.ro
mv $TEXT/test/test.bpe.en $TEXT/test.bpe.en
mv $TEXT/test/test.bpe.ro $TEXT/test.bpe.ro

# binarize the data
fairseq-preprocess --joined-dictionary \
    --source-lang en --target-lang ro \
    --trainpref $TEXT/train.bpe --validpref $TEXT/dev.bpe --testpref $TEXT/test.bpe \
    --destdir data-bin/wmt16_enro --thresholdtgt 0 --thresholdsrc 0 \
    --workers 20

Training

We first move into the fairseq folder and then run the following commands to train the models.

######## training scripts for IWSLT'14, WMT'14, and WMT'16
# first cd to fairseq
# we use 1 GPU for IWSLT'14, 4 GPUs for WMT'14 and 2 GPUs for WMT'16 datasets respectively.
CUDA_VISIBLE_DEVICES=0 bash experiments/mt_train.sh -m absorbing -d <iwslt/wmt14/wmt16> -s default -e True --store-ema --label-smoothing 0.1
CUDA_VISIBLE_DEVICES=1 bash experiments/mt_train.sh -m multinomial -d <iwslt/wmt14/wmt16> -s default -e True --not-diffusing-special-sym --store-ema --label-smoothing 0.0
CUDA_VISIBLE_DEVICES=2 bash experiments/mt_train.sh -m reparam-absorbing -d <iwslt/wmt14/wmt16> -s default -e True --q-sample-mode coupled --store-ema --label-smoothing 0.1 --reweighting-type linear
CUDA_VISIBLE_DEVICES=3 bash experiments/mt_train.sh -m reparam-multinomial -d <iwslt/wmt14/wmt16> -s default -e True --not-diffusing-special-sym --q-sample-mode coupled --store-ema --label-smoothing 0.1 --reweighting-type linear

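Note
-s <str> is used to specify the name of the experiment.
Custom arguments specific to training can be passed by appending them after -e True.

Generation & Evaluation

The evaluation pipeline is handled by experiments/mt_generate.sh. The script generates the translation results and evaluates the BLEU score.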

########### IWSLT'14, WMT'14, and WMT'16 datasets
# we recommend putting each checkpoint into a separate folder
# since the script will put the decoded results into a file under the same folder of each checkpoint.
CUDA_VISIBLE_DEVICES=0 bash experiments/mt_generate.sh -a false -c <checkpoint_path> -d <iwslt/wmt14/wmt16>

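Arguments:

-a: whether to average multiple checkpoints
-c: indicates the location of the checkpoint. If -a false (no checkpoint averaging), pass the checkpoint path; if -a true, pass the directory that stores multiple checkpoints saved at different training steps for averaging.
-d: the name of the dataset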
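
Conceptually, checkpoint averaging (the -a true case) takes the element-wise mean of each parameter across several saved checkpoints. The sketch below shows the idea on plain Python lists standing in for parameter tensors; the function name and data layout are assumptions, and the actual script works on saved model files.

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of each parameter across a list of {name: values} dicts."""
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 4.0], "b": [1.0]}
avg = average_checkpoints([ckpt_a, ckpt_b])
print(avg)
```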

Trained Model Checkpoints

We also provide the checkpoints of our trained models.

Dataset	Model	Checkpoint link
IWSLT'14	Multinomial	link
IWSLT'14	Absorbing	link
IWSLT'14	Reparam-multinomial	link
IWSLT'14	Reparam-absorbing	link
WMT'14	Multinomial	link
WMT'14	Absorbing	link
WMT'14	Reparam-multinomial	link
WMT'14	Reparam-absorbing	link
WMT'16	Multinomial	link
WMT'16	Absorbing	link
WMT'16	Reparam-multinomial	link
WMT'16	Reparam-absorbing	link

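Question Generation and Paraphrasing Tasks

We follow the experimental setup in DiffuSeq for the question generation and paraphrasing tasks.

Data Preprocessing

The raw data for these two tasks can be fetched from the original DiffuSeq repository. We then binarize the data via the provided script.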

# put the raw data in the directory ``diffuseq_data/QG``
# Preprocess the question generation dataset
bash diffusion_mt/scripts/preprocess_diffuseq_datasets.sh QG

# put the raw data in the directory ``diffuseq_data/QQP``
# Preprocess the paraphrasing dataset
bash diffusion_mt/scripts/preprocess_diffuseq_datasets.sh QQP

Training

# QQP or QG datasets
# first cd to fairseq
CUDA_VISIBLE_DEVICES=0,1 bash experiments/diffuseq_train.sh -m absorbing -d <qqp/qg> -s default -e True --store-ema --label-smoothing 0.1
CUDA_VISIBLE_DEVICES=2,3 bash experiments/diffuseq_train.sh -m multinomial -d <qqp/qg> -s default -e True --not-diffusing-special-sym --store-ema --label-smoothing 0.0
CUDA_VISIBLE_DEVICES=0,1 bash experiments/diffuseq_train.sh -m reparam-multinomial -d <qqp/qg> -s default -e True --not-diffusing-special-sym --q-sample-mode coupled --store-ema --label-smoothing 0.1 --reweighting-type linear
CUDA_VISIBLE_DEVICES=2,3 bash experiments/diffuseq_train.sh -m reparam-absorbing -d <qqp/qg> -s default -e True --q-sample-mode coupled --store-ema --label-smoothing 0.1 --reweighting-type linear

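Generation & Evaluation

We closely follow the generation and evaluation protocols of DiffuSeq to ensure a head-to-head comparison. The whole pipeline is re-implemented in fairseq/diffusion_mt/scripts/decode_diffuseq.py and fairseq/diffusion_mt/scripts/eval_diffuseq.py, respectively, to be compatible with Fairseq. Run the following commands: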

# we recommend putting each checkpoint into a separate folder
# since the script will put the decoded results into a file under the same folder of each checkpoint.
CUDA_VISIBLE_DEVICES=0 bash experiments/diffuseq_generate.sh -a false -b true -c <checkpoint_path> -d <qqp/qg>

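Arguments:

-a: whether to average multiple checkpoints
-b: whether to use multiple samples for MBR decoding
-c: indicates the location of the checkpoint. If -a false (no checkpoint averaging), pass the checkpoint path; if -a true, pass the directory that stores multiple checkpoints saved at different training steps for averaging.
-d: the name of the dataset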
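
MBR (minimum Bayes risk) decoding with multiple samples, as enabled by -b, picks the candidate that agrees most with the other candidates. The sketch below illustrates the selection rule only; the token-overlap F1 utility is a simple stand-in chosen for this example and is an assumption, not the similarity metric used by the evaluation scripts.

```python
from collections import Counter


def overlap_f1(hyp, ref):
    """Token-overlap F1 between two whitespace-tokenized strings (stand-in utility)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=overlap_f1):
    """Return the candidate with the highest total utility against all other candidates."""
    def total_utility(i):
        return sum(utility(candidates[i], c) for j, c in enumerate(candidates) if j != i)

    best = max(range(len(candidates)), key=total_utility)
    return candidates[best]


samples = [
    "how do i learn python",
    "how do i learn python fast",
    "how can i learn python",
    "what is the weather",
]
print(mbr_select(samples))
```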

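Trained Model Checkpoints

We also provide the checkpoints of our trained models.

Dataset	Model	Checkpoint link
QG	Multinomial	link
QG	Absorbing	link
QG	Reparam-multinomial	link
QG	Reparam-absorbing	link
QQP	Multinomial	link
QQP	Absorbing	link
QQP	Reparam-multinomial	link
QQP	Reparam-absorbing	link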

Citation

@article{zheng2023rdm,
  title={A Reparameterized Discrete Diffusion Model for Text Generation},
  author={Zheng, Lin and Yuan, Jianbo and Yu, Lei and Kong, Lingpeng},
  journal={arXiv preprint arXiv:2302.05737},
  year={2023}
}