DeepFilterNet

A low-complexity speech enhancement framework for full-band audio (48kHz) using deep filtering.

For PipeWire integration as a virtual noise suppression microphone, see the LADSPA plugin documentation in the repository.

Demo

DeepFilternet-demo-new.mp4

To run the demo (Linux only), use:

cargo +nightly run -p df-demo --features ui --bin df-demo --release

News

New DeepFilterNet demo: DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement
- Paper: https://arxiv.org/abs/2305.08227
- Video: https://youtu.be/eo7n96ywnye
New multi-frame filtering paper: Deep Multi-Frame Filtering for Hearing Aids
- Paper: https://arxiv.org/abs/2305.08225
Real-time version and a LADSPA plugin
- Pre-compiled binary with no Python dependencies. Usage: deep-filter audio-file.wav
- LADSPA plugin with PipeWire filter-chain integration for real-time noise reduction on your mic.
DeepFilterNet2 paper: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio
- Paper: https://arxiv.org/abs/2205.05474
- Samples: https://rikorose.github.io/deepfilternet2-samples/
- Demo: https://huggingface.co/spaces/hshr/deepfilternet2
Original DeepFilterNet paper: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering
- Paper: https://arxiv.org/abs/2110.05588
- Samples: https://rikorose.github.io/deepfilternet-samples/
- Demo: https://huggingface.co/spaces/hshr/deepfilternet
- Video lecture: https://youtu.be/it90gbqky6k

Usage

deep-filter

Download a pre-compiled deep-filter binary from the release page. You can use deep-filter to suppress noise in noisy .wav audio files. Currently, only WAV files with a sampling rate of 48kHz are supported.

USAGE:
    deep-filter [OPTIONS] [FILES]...

ARGS:
    <FILES>...

OPTIONS:
    -D, --compensate-delay
            Compensate delay of STFT and model lookahead
    -h, --help
            Print help information
    -m, --model <MODEL>
            Path to model tar.gz. Defaults to DeepFilterNet2.
    -o, --out-dir <OUT_DIR>
            [default: out]
    --pf
            Enable postfilter
    -v, --verbose
            Logging verbosity
    -V, --version
            Print version information
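
Since deep-filter currently accepts only 48kHz WAV input, it can help to check the sampling rate of your files before passing them to the binary. The following is a minimal sketch that uses only Python's standard wave module. The helper names is_48khz_wav and split_by_sample_rate are illustrative and not part of DeepFilterNet, and the wave module itself only reads PCM WAV files.

```python
import wave

REQUIRED_SR = 48_000  # deep-filter currently supports 48kHz WAV files only


def is_48khz_wav(path):
    """Return True if `path` is a PCM WAV file sampled at 48kHz."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getframerate() == REQUIRED_SR
    except (wave.Error, EOFError):
        # Not a WAV file that the standard library can parse
        return False


def split_by_sample_rate(paths):
    """Split `paths` into (supported, unsupported) lists."""
    supported, unsupported = [], []
    for path in paths:
        (supported if is_48khz_wav(path) else unsupported).append(path)
    return supported, unsupported
```

Files in the unsupported list would need to be resampled to 48kHz with a tool of your choice before running deep-filter on them.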

If you want to use the PyTorch backend, e.g. for GPU processing, see further below for Python usage.

DeepFilterNet Framework

This framework supports Linux, MacOS and Windows. Training is only tested under Linux. The framework is structured as follows:

libDF contains Rust code used for data loading and augmentation.
DeepFilterNet contains the DeepFilterNet code for training, evaluation and visualization, as well as pretrained model weights.
pyDF contains a Python wrapper of the libDF STFT/ISTFT processing loop.
pyDF-data contains a Python wrapper of the libDF dataset functionality and provides a PyTorch data loader.
ladspa contains a LADSPA plugin for real-time noise suppression.
models contains pretrained models for usage in DeepFilterNet (Python) or libDF/deep-filter (Rust).

DeepFilterNet Python: PyPI

Install the DeepFilterNet Python wheel via pip:

# Install cpu/cuda pytorch (>=1.9) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install DeepFilterNet
pip install deepfilternet
# Or install DeepFilterNet including data loading functionality for training (Linux only)
pip install deepfilternet[train]

To enhance noisy audio files using DeepFilterNet, run:

# Specify an output directory with --output-dir [OUTPUT_DIR]
deepFilter path/to/noisy_audio.wav

Manual Installation

Install cargo via rustup. Usage of conda or virtualenv is recommended. Please read the comments and only execute the commands that you need.

Installation of Python dependencies and libDF:

cd path/to/DeepFilterNet/  # cd into repository
# Recommended: Install or activate a python env
# Mandatory: Install cpu/cuda pytorch (>=1.8) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install build dependencies used to compile libdf and DeepFilterNet python wheels
pip install maturin poetry

# Install remaining DeepFilterNet python dependencies
# *Option A:* Install DeepFilterNet python wheel globally within your environment. Do this if you want to use
# this repo as is, and don't want to develop within this repository.
poetry -C DeepFilterNet install -E train -E eval
# *Option B:* If you want to develop within this repo, install only dependencies and work with the repository version
poetry -C DeepFilterNet install -E train -E eval --no-root
export PYTHONPATH=$PWD/DeepFilterNet # And set the python path correctly

# Build and install libdf python package required for enhance.py
maturin develop --release -m pyDF/Cargo.toml
# *Optional*: Install libdfdata python package with dataset and dataloading functionality for training
# Required build dependency: HDF5 headers (e.g. ubuntu: libhdf5-dev)
maturin develop --release -m pyDF-data/Cargo.toml
# If you have trouble with hdf5, you may try to build and link hdf5 statically:
maturin develop --release --features hdf5-static -m pyDF-data/Cargo.toml

Use DeepFilterNet from the command line

To enhance noisy audio files using DeepFilterNet, run:

$ python DeepFilterNet/df/enhance.py --help
usage: enhance.py [-h] [--model-base-dir MODEL_BASE_DIR] [--pf] [--output-dir OUTPUT_DIR] [--log-level LOG_LEVEL] [--compensate-delay]
                  noisy_audio_files [noisy_audio_files ...]

positional arguments:
  noisy_audio_files     List of noise files to mix with the clean speech file.

optional arguments:
  -h, --help            show this help message and exit
  --model-base-dir MODEL_BASE_DIR, -m MODEL_BASE_DIR
                        Model directory containing checkpoints and config.
                        To load a pretrained model, you may just provide the model name, e.g. `DeepFilterNet`.
                        By default, the pretrained DeepFilterNet2 model is loaded.
  --pf                  Post-filter that slightly over-attenuates very noisy sections.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory in which the enhanced audio files will be stored.
  --log-level LOG_LEVEL
                        Logger verbosity. Can be one of (debug, info, error, none)
  --compensate-delay, -D
                        Add some padding to compensate the delay introduced by the real-time STFT/ISTFT implementation.

# Enhance audio with original DeepFilterNet
python DeepFilterNet/df/enhance.py -m DeepFilterNet path/to/noisy_audio.wav

# Enhance audio with DeepFilterNet2
python DeepFilterNet/df/enhance.py -m DeepFilterNet2 path/to/noisy_audio.wav
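
When enhancing many files from another Python program, the command line above can be assembled programmatically. The sketch below only uses options shown in the help text (-m, --output-dir, --pf). The function name enhance_command and its defaults are illustrative assumptions, and the script path assumes you run it from the repository root.

```python
import sys


def enhance_command(files, model="DeepFilterNet2", out_dir=None, postfilter=False):
    """Build an argument list for DeepFilterNet/df/enhance.py."""
    if not files:
        raise ValueError("at least one noisy audio file is required")
    cmd = [sys.executable, "DeepFilterNet/df/enhance.py", "-m", model]
    if out_dir is not None:
        cmd += ["--output-dir", str(out_dir)]
    if postfilter:
        cmd.append("--pf")
    # Positional arguments: the noisy audio files to enhance
    cmd += [str(f) for f in files]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Building a list rather than a shell string avoids quoting problems with file names that contain spaces.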

Use DeepFilterNet within your Python script

from df import enhance, init_df

model, df_state, _ = init_df()  # Load default model
enhanced_audio = enhance(model, df_state, noisy_audio)

A full example is available in the repository.
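
The snippet above leaves open where noisy_audio comes from and where the result goes. The following sketch fills in those steps under stated assumptions: load_audio and save_audio are assumed to be importable from df.enhance, and df_state.sr() is assumed to return the model's sampling rate. Check both against your installed version. The helper names enhanced_path and enhance_file are illustrative.

```python
from pathlib import Path


def enhanced_path(noisy_path, out_dir="out"):
    """Return the output path: same file name, placed inside `out_dir`."""
    return Path(out_dir) / Path(noisy_path).name


def enhance_file(noisy_path, out_dir="out"):
    """Denoise one audio file with the default model and save the result."""
    # Imported here so that enhanced_path() works without DeepFilterNet installed.
    from df import enhance, init_df
    # Assumption: these I/O helpers are provided by df.enhance.
    from df.enhance import load_audio, save_audio

    model, df_state, _ = init_df()  # Load default model
    # Load the file at the sampling rate expected by the model
    noisy_audio, _ = load_audio(str(noisy_path), sr=df_state.sr())
    enhanced_audio = enhance(model, df_state, noisy_audio)

    out_path = enhanced_path(noisy_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    save_audio(str(out_path), enhanced_audio, df_state.sr())
    return out_path
```

For more than a handful of files, it would be better to call init_df() once and reuse model and df_state across files rather than reloading the model for each one.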

Training

The entry point is DeepFilterNet/df/train.py. It expects a data directory containing HDF5 datasets as well as a dataset configuration JSON file.

So, you first need to create your datasets in HDF5 format. Each dataset typically holds only the training, validation, or test set of noise, speech or RIRs.

# Install additional dependencies for dataset creation
pip install h5py librosa soundfile
# Go to DeepFilterNet python package
cd path/to/DeepFilterNet/DeepFilterNet
# Prepare text file (e.g. called training_set.txt) containing paths to .wav files
#
# usage: prepare_data.py [-h] [--num_workers NUM_WORKERS] [--max_freq MAX_FREQ] [--sr SR] [--dtype DTYPE]
#                        [--codec CODEC] [--mono] [--compression COMPRESSION]
#                        type audio_files hdf5_db
#
# where:
#   type: One of `speech`, `noise`, `rir`
#   audio_files: Text file containing paths to audio files to include in the dataset
#   hdf5_db: Output HDF5 dataset.
python df/scripts/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
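
The text file passed as audio_files has to be prepared beforehand. A minimal sketch for generating it from a directory of recordings is shown here. It assumes one path per line, which the usage text does not state explicitly, and the function name write_file_list is illustrative.

```python
from pathlib import Path


def write_file_list(audio_dir, list_file="training_set.txt"):
    """Write the absolute paths of all .wav files under `audio_dir`, one per line.

    Returns the number of files listed.
    """
    wav_paths = sorted(p.resolve() for p in Path(audio_dir).rglob("*.wav"))
    text = "".join(f"{p}\n" for p in wav_paths)
    Path(list_file).write_text(text, encoding="utf-8")
    return len(wav_paths)
```

Sorting the paths makes the list reproducible between runs, which helps when you regenerate a dataset later.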

All datasets should be made available in one dataset folder for the train script.

The dataset configuration file should contain 3 entries: "train", "valid", "test". Each of those contains a list of datasets (e.g. a speech, a noise and a RIR dataset). You can use multiple speech or noise datasets. Optionally, a sampling factor may be specified that can be used to over- or under-sample the dataset. Say, you have a specific dataset with transient noises and want to increase the amount of non-stationary noises by oversampling. In most cases you want to set this factor to 1.

Dataset config example:

dataset.cfg

{
  "train": [
    [
      "TRAIN_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_NOISE.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_RIR.hdf5",
      1.0
    ]
  ],
  "valid": [
    [
      "VALID_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "VALID_SET_NOISE.hdf5",
      1.0
    ],
    [
      "VALID_SET_RIR.hdf5",
      1.0
    ]
  ],
  "test": [
    [
      "TEST_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TEST_SET_NOISE.hdf5",
      1.0
    ],
    [
      "TEST_SET_RIR.hdf5",
      1.0
    ]
  ]
}
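
If you maintain several datasets, it may be easier to generate this file than to edit it by hand. The sketch below builds the same structure with Python's json module and adds a second noise dataset with a sampling factor of 2.0 to illustrate the oversampling case described above. The file name TRAIN_SET_NOISE_TRANSIENT.hdf5 and the function name make_dataset_config are hypothetical, and the reading that a factor above 1 oversamples a dataset follows the description above rather than the training code itself.

```python
import json


def make_dataset_config(splits):
    """Build the config from {split: [(hdf5_name, sampling_factor), ...]}."""
    config = {}
    for split in ("train", "valid", "test"):
        if split not in splits:
            raise KeyError(f"missing required entry: {split}")
        config[split] = [[name, float(factor)] for name, factor in splits[split]]
    return config


splits = {
    "train": [
        ("TRAIN_SET_SPEECH.hdf5", 1.0),
        ("TRAIN_SET_NOISE.hdf5", 1.0),
        # Hypothetical dataset of transient noises, oversampled by a factor of 2
        ("TRAIN_SET_NOISE_TRANSIENT.hdf5", 2.0),
        ("TRAIN_SET_RIR.hdf5", 1.0),
    ],
    "valid": [
        ("VALID_SET_SPEECH.hdf5", 1.0),
        ("VALID_SET_NOISE.hdf5", 1.0),
        ("VALID_SET_RIR.hdf5", 1.0),
    ],
    "test": [
        ("TEST_SET_SPEECH.hdf5", 1.0),
        ("TEST_SET_NOISE.hdf5", 1.0),
        ("TEST_SET_RIR.hdf5", 1.0),
    ],
}

# Serialized config; write this string to dataset.cfg
config_text = json.dumps(make_dataset_config(splits), indent=2)
```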

Finally, start the training script. The training script may create a model base_dir if it does not exist, which is used for logging, some audio samples, model checkpoints, and the config. If no config file is found, it will create a default config. See DeepFilterNet/pretrained_models/DeepFilterNet for a config file.

# usage: train.py [-h] [--debug] data_config_file data_dir base_dir
python df/train.py path/to/dataset.cfg path/to/data_dir/ path/to/base_dir/
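
Because all datasets named in the config must be present in data_dir, a quick check before launching a long training run can save time. This is a minimal sketch that assumes the config layout shown above, where each entry is a [file name, sampling factor] pair. The function name missing_datasets is illustrative and not part of the framework.

```python
import json
from pathlib import Path


def missing_datasets(config_file, data_dir):
    """Return the dataset files named in the config that are absent from `data_dir`."""
    config = json.loads(Path(config_file).read_text(encoding="utf-8"))
    missing = []
    for split in ("train", "valid", "test"):
        for entry in config.get(split, []):
            name = entry[0]  # each entry is [file_name, sampling_factor]
            if not (Path(data_dir) / name).is_file():
                missing.append(f"{split}: {name}")
    return missing
```

An empty list means every referenced HDF5 file was found. Otherwise each item names the split and the file that is missing.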

Citation Guide

To reproduce any metrics, we recommend using the Python implementation via pip install deepfilternet.

If you use this framework, please cite: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering

@inproceedings{schroeter2022deepfilternet,
  title = {{DeepFilterNet}: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering},
  author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  booktitle = {ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2022},
  organization = {IEEE}
}

If you use the DeepFilterNet2 model, please cite: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio

@inproceedings{schroeter2022deepfilternet2,
  title = {{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
  author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  booktitle = {17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022)},
  year = {2022},
}

If you use the DeepFilterNet3 model, please cite: DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

@inproceedings{schroeter2023deepfilternet3,
  title = {{DeepFilterNet}: Perceptually Motivated Real-Time Speech Enhancement},
  author = {Schröter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle = {INTERSPEECH},
  year = {2023},
}

If you use the multi-frame beamforming algorithms, please cite: Deep Multi-Frame Filtering for Hearing Aids

@inproceedings{schroeter2023deep_mf,
  title = {Deep Multi-Frame Filtering for Hearing Aids},
  author = {Schröter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle = {INTERSPEECH},
  year = {2023},
}

License

DeepFilterNet is free and open source! All code in this repository is dual-licensed under either:

MIT License (LICENSE-MIT or http://opensource.org/licenses/mit)
Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/license-2.0)

at your option. This means you can select the license you prefer!

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
