Current CI status:
PyTorch/XLA is a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework and Cloud TPUs. You can try it right now, for free, on a single Cloud TPU VM with Kaggle!
Take a look at one of our Kaggle notebooks to get started:
To install the PyTorch/XLA stable build in a new TPU VM:
pip install torch~=2.5.0 torch_xla[tpu]~=2.5.0 -f https://storage.googleapis.com/libtpu-releases/index.html
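A quick way to confirm the install worked is to create a tensor on the XLA device; this is a minimal sanity-check sketch (not part of the official instructions) and assumes you run it on the TPU VM itself:

```python
import torch
import torch_xla.core.xla_model as xm

# Allocate a tensor directly on the XLA device backed by the TPU.
device = xm.xla_device()
t = torch.randn(2, 2, device=device)
print(t.device)  # expected to print an XLA device such as "xla:0"
```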
To install the PyTorch/XLA nightly build in a new TPU VM:
pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl' -f https://storage.googleapis.com/libtpu-releases/index.html
PyTorch/XLA now provides GPU support through a plugin package, similar to libtpu:
pip install torch~=2.5.0 torch_xla~=2.5.0 https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.5.0-py3-none-any.whl
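The CUDA plugin is selected through the PJRT runtime; the sketch below shows one way to point PyTorch/XLA at the GPU. Treat the PJRT_DEVICE / GPU_NUM_DEVICES values as assumptions and check the GPU guide for your release:

```python
import os

# Select the CUDA PJRT plugin before importing torch_xla
# (these environment variable values are assumptions; see the GPU guide for your version).
os.environ.setdefault("PJRT_DEVICE", "CUDA")
os.environ.setdefault("GPU_NUM_DEVICES", "1")

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
print(device)  # e.g. "xla:0", now backed by the GPU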
To update your existing training loop, make the following changes:
```diff
-import torch.multiprocessing as mp
+import torch_xla as xla
+import torch_xla.core.xla_model as xm

 def _mp_fn(index):
   ...

+  # Move the model parameters to your XLA device
+  model.to(xla.device())

   for inputs, labels in train_loader:
+    with xla.step():
+      # Transfer data to the XLA device. This happens asynchronously.
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
-      optimizer.step()
+      # `xm.optimizer_step` combines gradients across replicas
+      xm.optimizer_step(optimizer)

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  # xla.launch automatically selects the correct world size
+  xla.launch(_mp_fn, args=())
```

If you're using DistributedDataParallel, make the following changes:
```diff
 import torch.distributed as dist
-import torch.multiprocessing as mp
+import torch_xla as xla
+import torch_xla.core.xla_model as xm
+import torch_xla.distributed.xla_backend

 def _mp_fn(rank):
   ...

-  os.environ['MASTER_ADDR'] = 'localhost'
-  os.environ['MASTER_PORT'] = '12355'
-  dist.init_process_group("gloo", rank=rank, world_size=world_size)
+  # Rank and world size are inferred from the XLA device runtime
+  dist.init_process_group("xla", init_method='xla://')
+
+  model.to(xm.xla_device())
+  # `gradient_as_bucket_view=True` required for XLA
+  ddp_model = DDP(model, gradient_as_bucket_view=True)
-  model = model.to(rank)
-  ddp_model = DDP(model, device_ids=[rank])

   for inputs, labels in train_loader:
+    with xla.step():
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = ddp_model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
       optimizer.step()

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  xla.launch(_mp_fn, args=())
```

Additional information on PyTorch/XLA, including a description of its semantics and functions, is available at pytorch.org. See the best practices guide when writing networks that run on XLA devices (TPU, CUDA, CPU and ...).
Our comprehensive user guides are available at:
Documentation for the latest release
Documentation for the master branch
PyTorch/XLA releases starting with version r2.1 will be available on PyPI. You can now install the main build with pip install torch_xla. To install the Cloud TPU plugin corresponding to your installed torch_xla, install the optional tpu dependencies:
pip install torch_xla[tpu] -f https://storage.googleapis.com/libtpu-releases/index.html
GPU and nightly builds are available in our public GCS bucket.
| Version | Cloud GPU VM Wheels |
|---|---|
| 2.5 (CUDA 12.1 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.4 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.4 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.4 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
| nightly (Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp38-cp38-linux_x86_64.whl |
| nightly (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl |
| nightly (CUDA 12.1 + Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.6.0.dev-cp38-cp38-linux_x86_64.whl |
pip3 install torch==2.6.0.dev20240925+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly%2B20240925-cp310-cp310-linux_x86_64.whl
The torch wheel version 2.6.0.dev20240925+cpu can be found at https://download.pytorch.org/whl/nightly/torch/.
You can also append yyyymmdd to torch_xla-2.6.0.dev to get the nightly wheel for a specific date. Here is an example:
pip3 install torch==2.5.0.dev20240820+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.5.0.dev20240820-cp310-cp310-linux_x86_64.whl
The torch wheel version 2.5.0.dev20240820+cpu can be found at https://download.pytorch.org/whl/nightly/torch/.
| Version | Cloud TPU VMs Wheel |
|---|---|
| 2.4 (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.3 (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.2 (Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.1 (XRT + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/xrt/tpuvm/torch_xla-2.1.0%2Bxrt-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.1 (Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.1.0-cp38-cp38-linux_x86_64.whl |
| Version | GPU Wheel |
|---|---|
| 2.5 (CUDA 12.1 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.4 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.4 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.5 (CUDA 12.4 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl |
| 2.4 (CUDA 12.1 + Python 3.9) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp39-cp39-manylinux_2_28_x86_64.whl |
| 2.4 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.4 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp311-cp311-manylinux_2_28_x86_64.whl |
| 2.3 (CUDA 12.1 + Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp38-cp38-manylinux_2_28_x86_64.whl |
| 2.3 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.3 (CUDA 12.1 + Python 3.11) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp311-cp311-manylinux_2_28_x86_64.whl |
| 2.2 (CUDA 12.1 + Python 3.8) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp38-cp38-manylinux_2_28_x86_64.whl |
| 2.2 (CUDA 12.1 + Python 3.10) | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl |
| 2.1 + CUDA 11.8 | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.8/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl |
| nightly + CUDA 12.0 >= 2023/06/27 | https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.0/torch_xla-nightly-cp38-cp38-linux_x86_64.whl |
| Version | Cloud TPU VMs Docker |
|---|---|
| 2.5 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_tpuvm |
| 2.4 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_tpuvm |
| 2.3 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_tpuvm |
| 2.2 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_tpuvm |
| 2.1 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm |
| nightly python | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm |
To use the above Docker images, please pass --privileged --net host --shm-size=16G along. Here is an example:
docker run --privileged --net host --shm-size=16G -it us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm /bin/bash
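Once inside the container, a quick way to check that the runtime can see the accelerator is a short snippet like the following (a sketch, assuming the TPU VM image above):

```python
# Run inside the container; prints an XLA device such as "xla:0" if the TPU runtime is visible.
import torch_xla.core.xla_model as xm
print(xm.xla_device())
```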
| Version | GPU CUDA 12.4 Docker |
|---|---|
| 2.5 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.4 |
| 2.4 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.4 |
| Version | GPU CUDA 12.1 Docker |
|---|---|
| 2.5 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.1 |
| 2.4 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.1 |
| 2.3 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1 |
| 2.2 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_cuda_12.1 |
| 2.1 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_12.1 |
| nightly | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1 |
| nightly at date | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1_YYYYMMDD |
| Version | GPU CUDA 11.8 + Docker |
|---|---|
| 2.1 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_11.8 |
| 2.0 | us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.0_3.8_cuda_11.8 |
To run on compute instances with GPUs.
If PyTorch/XLA isn't performing as expected, see the troubleshooting guide, which has suggestions for debugging and optimizing your network(s).
The PyTorch/XLA team is always happy to hear from users and OSS contributors! The best way to reach out is by filing an issue on this GitHub repository. Questions, bug reports, feature requests, build issues, etc. are all welcome!
See the contribution guide.
This repository is jointly operated and maintained by Google, Meta, and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to [email protected]. For questions directed at Google, please send an email to [email protected]. For all other questions, please open an issue in this repository.
You can find