TEVR ASR Tool

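State-of-the-art performance
- 3.64% word error rate on Common Voice German
- Rank 1 on paperswithcode.com
No GPU needed
100% offline
100% private
100% free
MIT license
Linux x86_64
Command-line tool
Easy to understand
- Only 284 lines of C++ code
- AI model on Hugging Face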

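High transcription quality

In August 2022, we ranked first on "Speech Recognition on Common Voice German (using extra training data)" with a word error rate of 3.64%. Accordingly, the performance of this tool is considered to be the best currently possible for German speech recognition.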
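To make the word error rate figure concrete, here is a minimal Python sketch of how WER is usually computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. This is not the benchmark's evaluation code; the function name `word_error_rate` and the example hypothesis are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # Made-up hypothesis with one wrong word out of five.
    reference = "mückenstiche sollte man nicht aufkratzen"
    hypothesis = "mückenstiche sollte man nicht kratzen"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

A WER of 3.64% therefore corresponds to roughly one word-level error in every 27 reference words.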

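How does this work?

In the C++ source, L175-L185 load the WAV file. L189-L229 execute the acoustic AI model. L260-L275 convert the predicted token logits into string snippets. L73-L162 implement beam search re-scoring based on a KenLM language model.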
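The step from token logits to text can be illustrated with a small Python sketch. It assumes a CTC-style output, as is common for wav2vec2 models, and uses greedy decoding: pick the best token per frame, collapse repeats, and drop the blank token. The tool itself uses beam search with KenLM re-scoring instead, and the vocabulary, `BLANK_ID`, and `greedy_ctc_decode` below are hypothetical names for illustration only.

```python
from typing import Sequence

# Hypothetical toy vocabulary of multi-character tokens.
# Index 0 is assumed to be the CTC blank token.
VOCAB = ["<blank>", "mück", "en", "stich", "e", " "]
BLANK_ID = 0


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]],
    vocab: Sequence[str] = VOCAB,
    blank_id: int = BLANK_ID,
) -> str:
    """Take the argmax token per frame, collapse repeats, and skip blanks."""
    pieces = []
    previous_id = blank_id
    for frame in logits:
        best_id = max(range(len(frame)), key=frame.__getitem__)
        # Emit a token only when it differs from the previous frame's token
        # and is not the blank token.
        if best_id != previous_id and best_id != blank_id:
            pieces.append(vocab[best_id])
        previous_id = best_id
    return "".join(pieces)


def one_hot_logits(token_ids: Sequence[int], size: int = len(VOCAB)):
    """Build fake logits whose argmax per frame is the given token id."""
    return [[1.0 if i == t else 0.0 for i in range(size)] for t in token_ids]


if __name__ == "__main__":
    frames = one_hot_logits([1, 1, 0, 2, 3, 3, 0, 4])
    print(greedy_ctc_decode(frames))
```

Greedy decoding commits to the single best token per frame. Beam search keeps several candidate transcriptions alive so that a language model can re-rank them, which is the role the KenLM re-scoring plays in the tool.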

If you are curious how the acoustic AI model works and why I designed it this way, here is the paper: https://arxiv.org/abs/2206.12693 and here is the pre-trained Hugging Face Transformers model: https://huggingface.co/fxtentacle/wav2vec2-xls-r-1b-tevr

Install the Debian/Ubuntu package

Download the multi-part zip containing tevr_asr_tool-1.0.0-Linux-x86_64.deb from GitHub, then join the parts and extract the archive:

wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.001"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.002"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.003"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.004"
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/download/v1.0.0/tevr_asr_tool-1.0.0-Linux-x86_64.zip.005"
cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip
unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip
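If you prefer to script the join-and-extract step, the following Python sketch does the same as the `cat` and `unzip` commands above, plus a basic sanity check on the result. It assumes the part files are already in the current directory; `join_parts` is a hypothetical helper name, not part of the tool.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(parts, destination):
    """Concatenate part files in sorted name order, like `cat x.zip.00* > x.zip`."""
    destination = Path(destination)
    with destination.open("wb") as out:
        for part in sorted(Path(p) for p in parts):
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    # is_zipfile only looks for the zip end record, so this is a basic
    # sanity check (it catches a missing last part), not a full integrity test.
    if not zipfile.is_zipfile(destination):
        raise ValueError(f"{destination} is not a valid zip archive; parts may be missing")
    return destination


if __name__ == "__main__":
    name = "tevr_asr_tool-1.0.0-Linux-x86_64.zip"
    parts = list(Path(".").glob(name + ".00*"))
    if parts:
        archive = join_parts(parts, name)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(".")
        print(f"extracted {archive}")
    else:
        print("no part files found in the current directory")
```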

Install it:

sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb

Install from source code

Download the submodules:

git submodule update --init

Configure and build with CMake:

cmake -DCMAKE_BUILD_TYPE=MinSizeRel -DCPACK_CMAKE_GENERATOR=Ninja -S . -B build
cmake --build build --target tevr_asr_tool -j 16

Create the Debian package:

(cd build && cpack -G DEB)

Install it:

sudo dpkg -i build/tevr_asr_tool-1.0.0-Linux-x86_64.deb

Usage

tevr_asr_tool --target_file=test_audio.wav 2> log.txt

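This should display the correct transcription: mückenstiche sollte man nicht aufkratzen. log.txt will contain the diagnostics and progress messages that were logged to stderr during execution.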
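For calling the tool from a script, a thin Python wrapper around the command above might look like the sketch below. It assumes the tool is installed and on `PATH`, and that the transcription is printed to stdout as UTF-8 (the usage text only states that diagnostics go to stderr). `build_command` and `transcribe` are hypothetical helper names; only the `--target_file` flag comes from the usage line.

```python
import subprocess


def build_command(wav_path, executable="tevr_asr_tool"):
    """Build the argument list for one transcription run."""
    return [executable, f"--target_file={wav_path}"]


def transcribe(wav_path, log_path="log.txt", executable="tevr_asr_tool"):
    """Run the tool, write stderr to log_path, and return stdout as text.

    Assumption: the transcription is what the tool prints to stdout.
    """
    with open(log_path, "wb") as log:
        result = subprocess.run(
            build_command(wav_path, executable),
            stdout=subprocess.PIPE,
            stderr=log,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    return result.stdout.decode("utf-8").strip()
```

With the tool installed, `transcribe("test_audio.wav")` would be expected to return the transcription shown above and leave the diagnostics in log.txt.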

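GPU acceleration for developers

I plan to release Vulkan & OpenGL accelerated real-time low-latency transcription software for developers soon. Like this tool, it will run 100% private + 100% offline, but instead of processing WAV files on the CPU, it will stream real-time GPU transcription of your microphone input through a REST API with WebRTC, so that you can easily integrate it with your own voice control projects. For example, this will enable hackable voice typing together with pynput.keyboard.

If you want to be notified when it launches, please enter your email at https://madmimi.com/signups/f0da3b13840d40ce9e061cafea6280d5/join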

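Commercial customization

This tool itself is free to use, including for commercial purposes. Of course, it comes with no warranty of any kind.

But if you have an idea for a commercial use case for a customized version of this tool or similar technology, ideally one that helps small and medium-sized businesses in northern Germany become more competitive, please contact me at [email protected].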

Research citation

If you use this for research, please cite:

@misc{https://doi.org/10.48550/arxiv.2206.12693,
  doi = {10.48550/ARXIV.2206.12693},
  url = {https://arxiv.org/abs/2206.12693},
  author = {Krabbenhöft, Hajo Nils and Barth, Erhardt},
  keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, F.2.1; I.2.6; I.2.7},
  title = {TEVR: Improving Speech Recognition by Token Entropy Variance Reduction},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}