TensorRT Download: TensorRT Source Code Download

TensorRT

Python

v2.5.0


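Torch-TensorRT

Easily achieve the best inference performance for any PyTorch model on the NVIDIA platform.

Torch-TensorRT brings the power of TensorRT to PyTorch. With a single line of code, it can reduce inference latency by up to 5x compared to eager execution.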

Installation

Stable versions of Torch-TensorRT are published on PyPI:

pip install torch-tensorrt

Nightly versions of Torch-TensorRT are published on the PyTorch package index:

pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu124

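Torch-TensorRT is also distributed in the ready-to-run NVIDIA NGC PyTorch container, which includes all dependencies at the proper versions along with example notebooks.

For more advanced installation methods, please see here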
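
After either install command, a quick check can confirm that the packages are visible to the current interpreter. The following is a minimal sketch, not part of the official instructions: the helper name `check_install` is hypothetical, and it only looks up module specs and version metadata without importing anything, so it does not need a GPU.

```python
import importlib.util
from importlib import metadata


def check_install(module_names=("torch", "torch_tensorrt")):
    """Report, for each top-level module name, whether it is installed and its version if known."""
    report = {}
    for name in module_names:
        # find_spec returns None for a missing top-level module instead of raising.
        found = importlib.util.find_spec(name) is not None
        version = None
        if found:
            try:
                # Distribution names may use hyphens where module names use underscores
                # (torch_tensorrt is installed from the torch-tensorrt distribution).
                version = metadata.version(name.replace("_", "-"))
            except metadata.PackageNotFoundError:
                version = None
        report[name] = {"found": found, "version": version}
    return report
```

Calling `check_install()` returns a dictionary keyed by module name; a `found` value of `False` for `torch_tensorrt` suggests the install went into a different environment.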

Quickstart

Option 1: torch.compile

You can use Torch-TensorRT anywhere you use torch.compile:

import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
x = torch.randn((1, 3, 224, 224)).cuda()  # define what the inputs to the model will look like

optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x)  # compiled on first run

optimized_model(x)  # this will be fast!
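
To see how much faster the compiled model is on a given setup, the eager and compiled callables can be timed side by side. The helper below is a rough sketch rather than an official benchmark tool: the name `mean_latency_ms` and its parameters are hypothetical, and it works with any Python callable. Warm-up iterations are run first because, as noted above, compilation happens on the first call.

```python
import time


def mean_latency_ms(fn, *args, warmup=3, iters=20, sync=None):
    """Return the mean wall-clock latency of fn(*args) in milliseconds.

    sync is an optional zero-argument callable invoked before each clock read;
    for CUDA models, pass torch.cuda.synchronize so queued GPU work is counted.
    """
    for _ in range(warmup):
        # Warm-up runs absorb one-time costs such as first-call compilation.
        fn(*args)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters
```

With the names from the snippet above, comparing `mean_latency_ms(model, x, sync=torch.cuda.synchronize)` against `mean_latency_ms(optimized_model, x, sync=torch.cuda.synchronize)` gives an estimate of the speedup; the actual figure depends on the model and hardware.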

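Option 2: Export

If you want to optimize your model ahead of time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).

Step 1: Optimize + serialize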

import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # define a list of representative inputs here

trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
# PyTorch only supports Python runtime for an ExportedProgram. For C++ deployment, use a TorchScript file
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)

Step 2: Deploy

Deployment in PyTorch:

import torch
import torch_tensorrt

inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # your inputs go here

# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module()  # this also works
model(*inputs)

Deployment in C++:

#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...]; // fill this with your inputs
auto results = trt_mod.forward({input_tensor});

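Further resources

Up to 50% faster Stable Diffusion inference with one line of code
Optimize LLMs from Hugging Face with Torch-TensorRT [coming soon]
Run your model in FP8 with Torch-TensorRT
Tools to resolve graph breaks and boost performance [coming soon]
Tech Talk (GTC '23)
Documentation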

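Platform Support

Platform	Support
Linux AMD64 / GPU	Supported
Windows / GPU	Supported (Dynamo only)
Linux aarch64 / GPU	Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux aarch64 / DLA	Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux ppc64le / GPU	Not supported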
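
For scripts that need to decide whether to attempt an install, the table above can be encoded as a simple lookup. This is an illustrative sketch only: the names `support_status`, `LINUX_SUPPORT`, and `ARCH_ALIASES` are hypothetical, the status strings are transcribed from the table, and the code inspects only the operating system and CPU architecture (it does not detect GPUs, DLA, or the JetPack version).

```python
import platform

# Linux rows transcribed from the platform table, keyed by CPU architecture.
LINUX_SUPPORT = {
    "x86_64": "Supported",
    "aarch64": "Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)",
    "ppc64le": "Not supported",
}

# Common alternative spellings reported by platform.machine().
ARCH_ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def support_status(system=None, machine=None):
    """Return the table's support status for an OS/architecture pair.

    Defaults to the current machine; platforms absent from the table yield "Not listed".
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    machine = ARCH_ALIASES.get(machine, machine)
    if system == "windows":
        # The table lists Windows without naming an architecture.
        return "Supported (Dynamo only)"
    if system == "linux":
        return LINUX_SUPPORT.get(machine, "Not listed")
    return "Not listed"
```

Calling `support_status()` with no arguments checks the machine the script is running on.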