TensorRT

Python

v2.5.0

Torch-TensorRT

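Easily achieve the best inference performance for any PyTorch model on the NVIDIA platform.

Torch-TensorRT brings the power of TensorRT to PyTorch. Reduce inference latency by up to 5x compared to eager execution with just one line of code.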

Installation

Stable versions of Torch-TensorRT are published on PyPI

pip install torch-tensorrt

Nightly versions of Torch-TensorRT are published on the PyTorch package index

pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu124

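Torch-TensorRT is also distributed in the ready-to-run NVIDIA NGC PyTorch Container, which includes all dependencies at the proper versions along with example notebooks.

For more advanced installation methods, see the installation guide in the Torch-TensorRT documentation.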
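
After installing, it can be useful to confirm which build pip actually picked up, since the stable and nightly builds share the same distribution name. The following is a minimal sketch using only the standard library; the helper name `installed_version` is an illustrative assumption and is not part of Torch-TensorRT.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "torch-tensorrt") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "torch-tensorrt is not installed")
```

This only reads package metadata, so it works without importing `torch` or `torch_tensorrt` and without a GPU.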

Quickstart

Option 1: torch.compile

You can use Torch-TensorRT anywhere you use torch.compile:

import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
x = torch.randn((1, 3, 224, 224)).cuda()  # define what the inputs to the model will look like

optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x)  # compiled on first run

optimized_model(x)  # this will be fast!
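
To see how much faster the second call is on your own model, you can time both the eager and the compiled module. The sketch below is one way to do that with the standard library; `measure_latency` and its parameters are hypothetical names introduced here. It runs untimed warm-up calls first so that first-call compilation is excluded, and it accepts an optional `sync` callable because GPU work is queued asynchronously.

```python
import statistics
import time
from typing import Any, Callable, Optional


def measure_latency(
    fn: Callable[..., Any],
    *args: Any,
    warmup: int = 1,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock seconds per call of fn(*args).

    Warm-up calls are not timed, which keeps one-off costs such as
    first-call compilation out of the result. If sync is given, it is
    called before each clock read so that queued work has finished.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

With the names from the example above, `measure_latency(model, x, sync=torch.cuda.synchronize)` and `measure_latency(optimized_model, x, sync=torch.cuda.synchronize)` give two numbers whose ratio is the observed speedup. The median is used rather than the mean so that a single slow iteration does not dominate.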

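Option 2: Export

If you want to optimize your model ahead of time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).

Step 1: Optimize + serialize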

import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # define a list of representative inputs here

trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
# PyTorch only supports Python runtime for an ExportedProgram.
# For C++ deployment, use a TorchScript file.
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
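
The two `save` calls above differ only in the file they write and in whether they pass `output_format="torchscript"`, and which one you need depends on where the module will run. A small sketch of that choice as a lookup table follows; the `SaveSpec` type, the `save_spec_for` helper, and the target labels `"python"` and `"cpp"` are illustrative assumptions, not part of the Torch-TensorRT API.

```python
from typing import Any, Dict, NamedTuple


class SaveSpec(NamedTuple):
    """File name and extra keyword arguments for one save call."""
    path: str
    extra_kwargs: Dict[str, Any]


# Mirrors the two save calls above; the keys are hypothetical target labels.
_SAVE_SPECS = {
    "python": SaveSpec("trt.ep", {}),
    "cpp": SaveSpec("trt.ts", {"output_format": "torchscript"}),
}


def save_spec_for(target: str) -> SaveSpec:
    """Return the save settings for a deployment target ("python" or "cpp")."""
    try:
        return _SAVE_SPECS[target]
    except KeyError:
        raise ValueError(
            f"unknown target {target!r}; expected one of {sorted(_SAVE_SPECS)}"
        ) from None
```

Given `spec = save_spec_for("cpp")`, the corresponding call would be `torch_tensorrt.save(trt_gm, spec.path, inputs=inputs, **spec.extra_kwargs)`.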

Step 2: Deploy

Deployment in PyTorch:

import torch
import torch_tensorrt

inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # your inputs go here

# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module()  # this also works
model(*inputs)
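
Before relying on a reloaded module, it is worth checking that it reproduces the eager model's outputs. An optimized engine generally will not match bit for bit, so the comparison should allow a tolerance. The sketch below is a plain-Python version of that check on flat sequences of floats; the helper name `outputs_close` and the default tolerances are assumptions chosen for illustration, not recommended values.

```python
import math
from typing import Iterable


def outputs_close(
    expected: Iterable[float],
    actual: Iterable[float],
    rel_tol: float = 1e-3,
    abs_tol: float = 1e-3,
) -> bool:
    """Return True if both sequences have equal length and match element-wise
    within the given relative or absolute tolerance."""
    expected = list(expected)
    actual = list(actual)
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )
```

For tensors, one option is to pass `out.flatten().tolist()` for each output. In a PyTorch-only test suite, `torch.testing.assert_close` serves the same purpose directly on tensors.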

Deployment in C++:

#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...]; // fill this with your inputs
auto results = trt_mod.forward({input_tensor});

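Further resources

Up to 50% faster Stable Diffusion inference with one line of code
Optimize LLMs from Hugging Face with Torch-TensorRT [coming soon]
Run your model in FP8 with Torch-TensorRT
Tools to resolve graph breaks and boost performance [coming soon]
Tech Talk (GTC '23)
Documentation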

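Platform support

Platform	Support
Linux AMD64 / GPU	Supported
Windows / GPU	Supported (Dynamo only)
Linux aarch64 / GPU	Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux aarch64 / DLA	Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux ppc64le / GPU	Not supported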
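
To find which row of the table applies to the machine you are on, you can look up the OS and CPU architecture reported by Python. This is a rough sketch only: the `support_status` helper, the alias table, and the "Not listed" fallback are assumptions made for illustration, and the lookup covers the GPU rows without detecting whether a GPU or DLA is actually present.

```python
import platform
from typing import Optional

# Common spellings of the same architecture, mapped to the table's names.
_ARCH_ALIASES = {"x86_64": "amd64", "arm64": "aarch64"}

# Linux GPU rows of the platform table, keyed by architecture.
_LINUX_GPU_SUPPORT = {
    "amd64": "Supported",
    "aarch64": "Native compilation supported on JetPack-4.4+ "
               "(use v1.0.0 for the time being)",
    "ppc64le": "Not supported",
}


def support_status(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the table's support entry for an OS and CPU architecture.

    Defaults to the current machine when arguments are omitted.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    machine = _ARCH_ALIASES.get(machine, machine)
    if system == "windows":
        # The table lists Windows without an architecture.
        return "Supported (Dynamo only)"
    if system == "linux":
        return _LINUX_GPU_SUPPORT.get(machine, "Not listed")
    return "Not listed"


if __name__ == "__main__":
    print(support_status())
```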