SparseML is an open-source model optimization toolkit that lets you create inference-optimized sparse models using pruning, quantization, and distillation algorithms. Models optimized with SparseML can then be exported to ONNX and deployed with DeepSparse for GPU-class performance on CPU hardware.
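As a rough illustration of that flow, the sketch below exports a PyTorch model to ONNX with SparseML and notes how the result could be handed to DeepSparse. It is a minimal sketch only: the ResNet-50 stand-in, the output directory, and the ModuleExporter/compile_model calls are assumptions based on the SparseML and DeepSparse documentation, not code taken from this README.

import torch
from torchvision.models import resnet50

from sparseml.pytorch.utils import ModuleExporter

# In practice this would be a model sparsified with SparseML recipes;
# a dense torchvision ResNet-50 stands in here.
model = resnet50()

# Export the module along with a sample batch so the ONNX graph gets concrete shapes.
exporter = ModuleExporter(model, output_dir="onnx_export")
exporter.export_onnx(sample_batch=torch.randn(1, 3, 224, 224))

# The resulting onnx_export/model.onnx can then be compiled for CPU inference with
# DeepSparse, e.g. deepsparse.compile_model("onnx_export/model.onnx", batch_size=1).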

Neural Magic is excited to preview one-shot LLM compression workflows using the new SparseGPTModifier!
To prune and quantize a TinyLlama Chat model, it takes just a few steps to install the dependencies, download a recipe, and apply it to the model:
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
wget https://huggingface.co/neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds/raw/main/recipe.yaml
sparseml.transformers.text_generation.oneshot --model_name TinyLlama/TinyLlama-1.1B-Chat-v1.0 --dataset_name open_platypus --recipe recipe.yaml --output_dir ./obcq_deployment --precision float16
The README at src/sparseml/transformers/sparsification/obcq has a detailed walkthrough.
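For orientation, a one-shot recipe of this kind pairs a SparseGPTModifier (target sparsity, block size) with quantization settings. The fragment below is purely illustrative: the stage and group names and the field values are assumptions about the recipe format, so rely on the actual recipe.yaml downloaded above rather than this sketch.

# Illustrative sketch only; not the downloaded recipe.
test_stage:
    obcq_modifiers:
        SparseGPTModifier:
            sparsity: 0.5      # prune roughly half of the weights in one shot
            block_size: 128    # block size used by the SparseGPT solver
            quantize: true     # apply the recipe's quantization settings as well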
SparseML enables you to create a sparse model trained on your dataset in two ways:
Sparse Transfer Learning lets you fine-tune a pre-sparsified model from SparseZoo (an open-source repository of sparse models such as BERT, YOLOv5, and ResNet-50) onto your dataset while maintaining sparsity. This pathway works just like the typical fine-tuning you are used to when training CV and NLP models, and it is strongly preferred when your model architecture is available in SparseZoo.
Sparsification from Scratch lets you apply state-of-the-art pruning (such as gradual magnitude pruning) and quantization (such as quantization-aware training) algorithms to arbitrary PyTorch and Hugging Face models. This pathway requires more experimentation, but it allows you to create a sparse version of any model.

The repository is tested on Python 3.8-3.11 and on Linux/Debian systems.
Installing inside a virtual environment is recommended to keep your system clean. The currently supported ML frameworks are: torch>=1.1.0,<=2.0, tensorflow>=1.8.0,<2.0.0, tensorflow.keras>=2.2.0.
Install with pip using:
pip install sparseml
More information on installation, such as optional dependencies and requirements, can be found here.
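For example, the Transformers-specific dependencies used in the one-shot LLM example above can be installed as a pip extra:

pip install "sparseml[transformers]"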
To enable flexibility, ease of use, and repeatability, SparseML uses a declarative interface called recipes to specify the sparsity-related algorithms and hyperparameters that SparseML should apply.
Recipes are YAML files formatted as a list of modifiers, which encode the instructions for SparseML. Example modifiers can be anything from setting the learning rate to encoding the hyperparameters of the gradual magnitude pruning algorithm. SparseML parses the recipes into a native format for each framework and applies the modifications to the model and training pipeline.
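For illustration, a simple pruning recipe might look roughly like the following. This is a minimal sketch: the modifier names (EpochRangeModifier, GMPruningModifier, SetLearningRateModifier) follow SparseML's documented PyTorch modifiers, but the specific hyperparameter values are placeholders rather than a recommended configuration.

modifiers:
    - !EpochRangeModifier
        start_epoch: 0.0
        end_epoch: 30.0

    - !GMPruningModifier
        params: __ALL_PRUNABLE__
        init_sparsity: 0.05
        final_sparsity: 0.85
        start_epoch: 0.0
        end_epoch: 25.0
        update_frequency: 1.0

    - !SetLearningRateModifier
        start_epoch: 25.0
        learning_rate: 0.000005

Here the GMPruningModifier gradually prunes all prunable layers to 85% sparsity over the first 25 epochs, and the learning rate is then lowered for the remaining fine-tuning epochs.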
Because of this declarative, recipe-based approach, you can add SparseML to your existing PyTorch training pipelines. The ScheduledModifierManager class is responsible for parsing the YAML recipes and overriding the standard PyTorch model and optimizer objects, encoding the logic of the sparsity algorithms from the recipe. Once you call manager.modify, you can use the model and optimizer as usual, since SparseML abstracts away the complexity of the sparsification algorithms.
The workflow looks like this:
model = Model()  # model definition
optimizer = Optimizer()  # optimizer definition
train_data = TrainData()  # train data definition
batch_size = BATCH_SIZE  # training batch size
steps_per_epoch = len(train_data) // batch_size

from sparseml.pytorch.optim import ScheduledModifierManager
manager = ScheduledModifierManager.from_yaml(PATH_TO_RECIPE)
optimizer = manager.modify(model, optimizer, steps_per_epoch)

# typical PyTorch training loop, using your model/optimizer as usual
manager.finalize(model)

The documentation has details on using SparseML together with the Hugging Face Trainer. In addition to the code-level APIs, SparseML also provides pre-made training pipelines for common NLP and CV tasks via a CLI interface. The CLI lets you kick off training runs with utilities such as dataset loading and preprocessing, checkpoint saving, metric reporting, and logging handled for you, which makes it easy to get up and running in common training pathways.
For example, we can kick off a YOLOv5 sparse transfer learning run onto the VOC dataset (using SparseZoo stubs to pull down a sparse model checkpoint and a transfer learning recipe) with the following:
sparseml.yolov5.train
--weights zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn
--recipe zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned75_quant-none?recipe_type=transfer_learn
--data VOC.yaml
--hyp hyps/hyp.finetune.yaml --cfg yolov5s.yaml --patience 0

More information on the codebase and the processes contained within can be found in the SparseML docs:
Official builds are hosted on PyPI.
Additionally, more information can be found via GitHub Releases.
The project is licensed under the Apache License, Version 2.0.
We appreciate contributions to the code, examples, integrations, and documentation, as well as bug reports and feature requests! Learn how here.
For user help or questions about SparseML, sign up or log in to our Neural Magic community. We are growing the community member by member and are happy to see you there. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue.
You can get the latest news, webinar and event invites, research papers, and other ML performance updates by subscribing to the Neural Magic community.
For more general questions about Neural Magic, please fill out this form.
Find this project useful in your research or other communications? Please consider citing:
@InProceedings{pmlr-v119-kurtz20a,
title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks},
author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
pages = {5533--5543},
year = {2020},
editor = {Hal Daumé III and Aarti Singh},
volume = {119},
series = {Proceedings of Machine Learning Research},
address = {Virtual},
month = {13--18 Jul},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
url = {http://proceedings.mlr.press/v119/kurtz20a.html},
abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.}
}

@misc{singh2020woodfisher,
title = {WoodFisher: Efficient Second-Order Approximation for Neural Network Compression},
author = {Sidak Pal Singh and Dan Alistarh},
year = {2020},
eprint = {2004.14340},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}