
TensorRT

Python

v2.5.0


Torch-TensorRT

Easily achieve the best inference performance for any PyTorch model on the NVIDIA platform.

Torch-TensorRT brings the power of TensorRT to PyTorch. Reduce inference latency by up to 5x compared to eager execution with just one line of code.
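One way to read the "up to 5x" figure, assuming it is the ratio of per-inference latencies measured on the same model, inputs, and GPU (t_eager and t_TRT are our own labels, not the project's):

```latex
S = \frac{t_{\text{eager}}}{t_{\text{TRT}}}, \qquad
S \le 5 \;\Longrightarrow\; t_{\text{TRT}} \ge \frac{t_{\text{eager}}}{5} = 0.2\,t_{\text{eager}}
```

In other words, under this reading the best case cuts latency by 80%; the speedup a given model actually sees may be lower.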

Installation

Stable versions of Torch-TensorRT are published on PyPI:

pip install torch-tensorrt

Nightly versions of Torch-TensorRT are published on the PyTorch package index:

pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu124
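If you need a nightly built against a different CUDA version, the index URL above appears to follow a cuXYZ naming pattern. The sketch below assumes that pattern holds; nightly_index_url and nightly_install_command are hypothetical helpers, and you should confirm which CUDA variants are actually published before relying on the result.

```python
def nightly_index_url(cuda_version: str) -> str:
    """Build a PyTorch nightly index URL from a CUDA version such as "12.4".

    Assumption: the index follows the .../whl/nightly/cuXYZ pattern of the
    cu124 URL above. Extra components such as "12.4.1" are ignored.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/nightly/cu{int(major)}{int(minor)}"


def nightly_install_command(cuda_version: str) -> str:
    """Return the pip command for a Torch-TensorRT nightly for that CUDA version."""
    return "pip install --pre torch-tensorrt --index-url " + nightly_index_url(cuda_version)
```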

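Torch-TensorRT is also distributed in the ready-to-run NVIDIA NGC PyTorch Container, which includes all dependencies at the proper versions along with example notebooks.

For more advanced installation methods, see the installation guide in the documentation.

Quickstart

Option 1: torch.compile

You can use Torch-TensorRT anywhere you use torch.compile: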

import torch
import torch_tensorrt  # importing registers the "tensorrt" backend for torch.compile

model = MyModel().eval().cuda()  # define your model here
x = torch.randn((1, 3, 224, 224)).cuda()  # define what the inputs to the model will look like

optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x)  # compiled on first run

optimized_model(x)  # this will be fast!
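Because the first call includes compilation, a fair before/after comparison should time that call separately from the steady-state calls. A minimal sketch of such a harness follows; benchmark is a hypothetical helper, not part of Torch-TensorRT. CUDA kernels launch asynchronously, so for GPU models the callable you pass should synchronize before returning, otherwise you only measure launch overhead.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 1, iters: int = 20) -> Dict[str, float]:
    """Time a zero-argument callable, reporting the first call separately.

    The first call (which, for a torch.compile'd model, includes compilation)
    is excluded from the steady-state median.
    """
    if warmup < 1 or iters < 1:
        raise ValueError("warmup and iters must be >= 1")

    start = time.perf_counter()
    fn()  # first call: may include compilation
    first_call_s = time.perf_counter() - start

    for _ in range(warmup - 1):  # any extra, untimed warm-up calls
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {"first_call_s": first_call_s, "steady_median_s": statistics.median(samples)}
```

For example, benchmark(lambda: (optimized_model(x), torch.cuda.synchronize())) times the compiled model, and the same call with model in place of optimized_model gives an eager baseline.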

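Option 2: Export

If you want to optimize your model ahead of time and/or deploy it in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).

Step 1: Optimize + serialize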

import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # define a list of representative inputs here

trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
# PyTorch only supports the Python runtime for an ExportedProgram. For C++ deployment, use a TorchScript file
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
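The comment above boils down to a simple rule: save an ExportedProgram (.ep) when deploying from Python, and a TorchScript file (.ts) when deploying with libtorch in C++. A sketch encoding that rule follows; save_plan is a hypothetical helper, and "exported_program" is assumed to be the output_format value torch_tensorrt.save uses by default.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical mapping: deployment target -> (output_format, file suffix),
# following the "trt.ep" / "trt.ts" convention in the example above.
_FORMATS = {
    "python": ("exported_program", ".ep"),  # load with torch.export.load
    "cpp": ("torchscript", ".ts"),          # load with torch::jit::load
}


def save_plan(target: str, stem: str = "trt") -> Tuple[str, str]:
    """Return (output_format, file path) for a deployment target."""
    try:
        fmt, suffix = _FORMATS[target]
    except KeyError:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(_FORMATS)}") from None
    return fmt, str(Path(stem).with_suffix(suffix))
```

Calling fmt, path = save_plan("cpp") and then torch_tensorrt.save(trt_gm, path, output_format=fmt, inputs=inputs) reproduces the TorchScript call above.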

Step 2: Deploy

Deployment in PyTorch:

import torch
import torch_tensorrt

inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # your inputs go here

# You can run this in a new Python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module()  # this also works
model(*inputs)

Deployment in C++:

#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

int main() {
  auto trt_mod = torch::jit::load("trt.ts");
  // Replace with your real inputs; this placeholder matches the shape used above
  auto input_tensor = torch::randn({1, 3, 224, 224}).to(torch::kCUDA);
  auto results = trt_mod.forward({input_tensor});
}

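Further Resources

Up to 50% faster Stable Diffusion inference with one line of code
Optimize LLMs from Hugging Face with Torch-TensorRT [coming soon]
Run your model in FP8 with Torch-TensorRT
Tools to resolve graph breaks and boost performance [coming soon]
Tech Talk (GTC '23)
Documentation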

Platform Support

Platform	Support
Linux AMD64 / GPU	Supported
Windows / GPU	Supported (Dynamo only)
Linux aarch64 / GPU	Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux aarch64 / DLA	Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)
Linux ppc64le / GPU	Not supported

Note: Refer to the NVIDIA L4T PyTorch NGC container for PyTorch libraries on JetPack.
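To check the current machine against the table, a sketch like the following could help. It assumes Python's platform.machine() reports x86_64 on 64-bit Linux and AMD64 on 64-bit Windows, and it cannot distinguish a GPU target from a DLA target, so both aarch64 rows collapse into one entry. support_status is a hypothetical helper.

```python
import platform
from typing import Optional

# Support status transcribed from the table above; keys are
# (platform.system(), platform.machine()) pairs, which is an assumption.
SUPPORT = {
    ("Linux", "x86_64"): "Supported",
    ("Windows", "AMD64"): "Supported (Dynamo only)",
    ("Linux", "aarch64"): "Native compilation supported on JetPack-4.4+ (use v1.0.0 for the time being)",
    ("Linux", "ppc64le"): "Not supported",
}


def support_status(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Look up the table entry for a platform, defaulting to the current machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return SUPPORT.get((system, machine), "Not listed in the support table")
```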

Dependencies

These are the dependencies used to verify the test cases. Torch-TensorRT may work with other versions, but the tests are not guaranteed to pass.

Bazel 6.3.2
Libtorch 2.5.0.dev (latest nightly) (built with CUDA 12.4)
CUDA 12.4
TensorRT 10.6.0.26
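Since other versions may work but are not covered by the tests, it can help to flag where an environment differs from the list above. The sketch below compares versions only to the precision the list gives them (so CUDA 12.4.1 counts as 12.4, and a +cu124 local suffix is ignored). TESTED and untested are hypothetical names, and collecting the installed versions is left to you.

```python
from typing import Dict, List, Tuple

# Versions the test suite was validated against, copied from the list above.
TESTED = {
    "bazel": "6.3.2",
    "libtorch": "2.5.0.dev",
    "cuda": "12.4",
    "tensorrt": "10.6.0.26",
}


def _release(version: str) -> Tuple[int, ...]:
    """Leading numeric components, e.g. "2.5.0.dev20240912+cu124" -> (2, 5, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def untested(installed: Dict[str, str]) -> List[str]:
    """List dependencies that are missing or differ from the tested versions."""
    report = []
    for name, tested in TESTED.items():
        have = installed.get(name)
        want = _release(tested)
        if have is None:
            report.append(f"{name}: not found (tested with {tested})")
        elif _release(have)[: len(want)] != want:
            report.append(f"{name}: {have} differs from tested {tested}")
    return report
```

In a Python session, torch.version.cuda and tensorrt.__version__ are two places you could read installed versions from.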

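Deprecation Policy

Deprecation is used to inform developers that some APIs and tools are no longer recommended for use. Beginning with version 2.3, Torch-TensorRT has the following deprecation policy:

Deprecation notices are communicated in the release notes. Deprecated API functions have a statement in the source documenting when they were deprecated. Deprecated methods and classes issue deprecation warnings at runtime if they are used. Torch-TensorRT provides a 6-month migration period after the deprecation, during which the APIs and tools continue to work. After the migration period ends, the APIs and tools are removed in a manner consistent with semantic versioning.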
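To make the policy concrete, here is a sketch of how a Python library could apply it: a decorator that records the deprecation date in the docstring, emits a DeprecationWarning on each call, and reports the earliest removal date six months later. deprecated and add_months are hypothetical helpers; this is not Torch-TensorRT's own implementation.

```python
import calendar
import datetime
import functools
import warnings


def add_months(d: datetime.date, months: int) -> datetime.date:
    """Shift a date by whole months, clamping the day (e.g. Aug 31 + 6 -> Feb 28/29)."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return datetime.date(year, month, day)


def deprecated(since: datetime.date, alternative: str, migration_months: int = 6):
    """Warn at call time and state the earliest date the API may be removed."""
    removable_after = add_months(since, migration_months)

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated since {since.isoformat()} and may be "
                f"removed after {removable_after.isoformat()}; use {alternative} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)

        # Document the deprecation in the source, as the policy requires.
        wrapper.__doc__ = (fn.__doc__ or "") + f"\n\nDeprecated since {since.isoformat()}."
        return wrapper

    return decorate
```

The migration window is computed in calendar months rather than a fixed number of days, so the removal date lands on the same day of the month where possible.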