PERCEIVERIO PYTORCH

Adaptation of DeepMind's Perceiver model (https://arxiv.org/abs/2103.03206) to PyTorch. The original JAX/Haiku code can be found here: https://github.com/deepmind/deepmind-research/tree/master/perceiver

Installation

Clone the repository:

git clone https://github.com/JOBR0/PerceiverIO_Pytorch
cd PerceiverIO_Pytorch

Create a virtual environment and activate it:

python3 -m venv perceiverEnv
source perceiverEnv/bin/activate

Install PyTorch following the official instructions: https://pytorch.org/get-started/locally/
Install the other required packages from requirements.txt:

pip3 install -r requirements.txt

Examples

The implementation covers the following example tasks, for which pretrained models are available:

Masked language modeling (example_language.py)
Image classification (example_img_classify.py)
Multimodal video autoencoding (example_multimodal.py)
Optical flow estimation (example_opt_flow.py)

Pretrained models

The Haiku checkpoints from the official DeepMind repository were converted to PyTorch checkpoints and can be downloaded from Google Drive. The PyTorch checkpoints must be placed in the folder "pytorch_checkpoints" so that the example code can find them.
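
As a minimal sketch of how a script might load a converted checkpoint from that folder, assuming each file is a plain state_dict saved with torch.save (the helper load_from_checkpoint_dir and any file names passed to it are hypothetical, not part of the repository):

```python
import os

import torch
from torch import nn

CHECKPOINT_DIR = "pytorch_checkpoints"  # folder name given in this README


def load_from_checkpoint_dir(model: nn.Module, filename: str,
                             checkpoint_dir: str = CHECKPOINT_DIR) -> nn.Module:
    """Hypothetical helper: load a state_dict from checkpoint_dir into model.

    Assumes the checkpoint file is a state_dict saved with torch.save.
    """
    path = os.path.join(checkpoint_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    # map_location="cpu" lets checkpoints saved on a GPU load on CPU-only machines.
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model
```

If the real checkpoint files store something other than a bare state_dict, the loading line would have to be adapted accordingly.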

Usage

To create a new PerceiverIO for a custom task, use the PerceiverIO class in perceiver_io/perceiver.py:

from torch import nn


class PerceiverIO(nn.Module):
    """The Perceiver: a scalable, fully attentional architecture.
    Args:
        num_blocks (int): Number of times the block is applied with shared weights. Default: 8
        num_self_attends_per_block (int): Number of self-attentions in the block. Default: 6
        num_latents (int): Number of latent vectors. Default: 512
        num_latent_channels (int): Number of channels for the latent vectors. Default: 1024
        final_project (bool): Whether to apply a linear layer to the outputs before the post-processors. Default: True
        final_project_out_channels (int): Number of output channels for the final projection layer. Default: None
        perceiver_encoder_kwargs (Dict): Additional arguments for the perceiver encoder class. Default: {}
        perceiver_decoder_kwargs (Dict): Additional arguments for the perceiver decoder class. Default: {}
        input_preprocessors (dict / nn.Module): Optional input preprocessors. 1 or none for each modality. Default: None
        output_postprocessors (dict / nn.Module): Optional output postprocessors. 1 or none for each modality. Default: None
        output_queries (dict / nn.Module): Modules that create the output queries. 1 for each modality. Default: None
        output_query_padding_channels (int): Number of learnable feature channels that are added to the output queries. Default: 0
        input_padding_channels (int): Number of learnable feature channels that are added to the preprocessed inputs. Default: 0
        input_channels (dict / int): The number of input channels needs to be specified if NO preprocessor is used. Otherwise,
                                     the number will be inferred from the preprocessor. Default: None
        input_mask_probs (dict): Probability with which each input modality will be masked out. Default: None
    """

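To make the roles of num_latents, num_latent_channels, num_blocks and num_self_attends_per_block more concrete, here is a heavily simplified sketch of the encode / process / decode flow. It is not the repository's implementation: TinyPerceiverIO is a hypothetical name, it uses torch.nn.MultiheadAttention and torch.nn.TransformerEncoderLayer in place of the repository's attention modules, and it omits preprocessors, postprocessors, padding, masking and the layer norms / MLPs around the cross-attentions.

```python
import torch
from torch import nn


class TinyPerceiverIO(nn.Module):
    """Hypothetical, simplified Perceiver IO (not the repository's class)."""

    def __init__(self, input_channels: int, query_channels: int, out_channels: int,
                 num_latents: int = 8, num_latent_channels: int = 32,
                 num_blocks: int = 2, num_self_attends_per_block: int = 2,
                 num_heads: int = 4):
        super().__init__()
        # Learned latent array: (num_latents, num_latent_channels).
        self.latents = nn.Parameter(0.02 * torch.randn(num_latents, num_latent_channels))
        # Encoder cross-attention: latents are the queries, inputs the keys/values.
        self.encode = nn.MultiheadAttention(num_latent_channels, num_heads,
                                            kdim=input_channels, vdim=input_channels,
                                            batch_first=True)
        # One block of latent self-attention layers; the same block
        # (shared weights) is applied num_blocks times in forward().
        self.block = nn.ModuleList([
            nn.TransformerEncoderLayer(num_latent_channels, num_heads,
                                       dim_feedforward=2 * num_latent_channels,
                                       batch_first=True)
            for _ in range(num_self_attends_per_block)
        ])
        self.num_blocks = num_blocks
        # Decoder cross-attention: output queries attend to the final latents.
        self.decode = nn.MultiheadAttention(query_channels, num_heads,
                                            kdim=num_latent_channels,
                                            vdim=num_latent_channels,
                                            batch_first=True)
        # Loosely corresponds to final_project / final_project_out_channels.
        self.final_project = nn.Linear(query_channels, out_channels)

    def forward(self, inputs: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # inputs: (B, M, input_channels); queries: (B, O, query_channels)
        batch_size = inputs.shape[0]
        latents = self.latents.unsqueeze(0).expand(batch_size, -1, -1)
        attended, _ = self.encode(latents, inputs, inputs)
        z = latents + attended  # residual connection around the cross-attention
        for _ in range(self.num_blocks):
            for layer in self.block:
                z = layer(z)
        out, _ = self.decode(queries, z, z)  # (B, O, query_channels)
        return self.final_project(out)  # (B, O, out_channels)
```

Because the inputs only enter through the encoder cross-attention and the outputs only through the decoder cross-attention, the cost of the latent self-attention depends on num_latents rather than on the input or output size, and the number of output queries O is independent of the input length M.
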
Below is a diagram of the Perceiver for a multimodal application:

Input preprocessors (optional)

Input preprocessors take the raw input data and preprocess it so that it can be queried by the first cross-attention. This can be, for example, something like creating patches from an image. Usually, positional encodings are added by the preprocessor. Instead of using a preprocessor, the inputs can also be processed manually.

Several input_preprocessors can be found in perceiver_io/io_processors/preprocessors.py
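
As an illustration of the patch example, the following sketch turns an image batch into a sequence of flattened patches and appends a simple 2D coordinate encoding to each patch. image_to_patches_with_pos is a hypothetical helper; the repository's preprocessors may use different patching and, for instance, Fourier-feature positional encodings instead of raw coordinates.

```python
import torch


def image_to_patches_with_pos(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Hypothetical preprocessor: (B, C, H, W) -> (B, num_patches, C*p*p + 2)."""
    b, c, h, w = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "H and W must be divisible by patch_size"
    gh, gw = h // p, w // p
    # Split into non-overlapping p x p patches and flatten each one (row-major order).
    patches = images.reshape(b, c, gh, p, gw, p)
    patches = patches.permute(0, 2, 4, 1, 3, 5).reshape(b, gh * gw, c * p * p)
    # Normalized (row, col) coordinates of each patch in [-1, 1] as positional encoding.
    ys = torch.linspace(-1.0, 1.0, gh)
    xs = torch.linspace(-1.0, 1.0, gw)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)  # (gh, gw, 2)
    pos = grid.reshape(1, gh * gw, 2).expand(b, -1, -1).to(images.dtype)
    return torch.cat([patches, pos], dim=-1)
```

The output is a (batch, sequence, channels) tensor, which is the shape the first cross-attention expects for its keys and values.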

Output postprocessors (optional)

Output postprocessors take the final output of the Perceiver and process it into the desired output format.

Several output_postprocessors can be found in perceiver_io/io_processors/postprocessors.py
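
For example, a dense prediction task such as optical flow needs one output vector per pixel, reshaped back into an image. The sketch below assumes one output query per pixel; ReshapeToImagePostprocessor is a hypothetical name, and its linear projection is an assumption rather than the repository's design.

```python
import torch
from torch import nn


class ReshapeToImagePostprocessor(nn.Module):
    """Hypothetical postprocessor: (B, H*W, C_in) -> (B, C_out, H, W)."""

    def __init__(self, in_channels: int, out_channels: int, height: int, width: int):
        super().__init__()
        self.proj = nn.Linear(in_channels, out_channels)
        self.height, self.width = height, width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        assert n == self.height * self.width, "one output query per pixel expected"
        x = self.proj(x)  # (B, H*W, C_out)
        # Move channels first, then unflatten the pixel axis into (H, W).
        return x.transpose(1, 2).reshape(b, -1, self.height, self.width)
```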

Output queries

Output queries create the features that are used to query the final latent representation of the Perceiver in order to produce the output. They receive the preprocessed input as an argument so that they can use it if desired. They also usually include positional encodings.
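
A minimal sketch of such a module, assuming learned per-position queries (the class name TrainableQuery and the concat_inputs option are hypothetical, not the repository's API). The queries are expanded to the batch size of the preprocessed input and can optionally be concatenated with it, which shows one way a query module might use the input it receives.

```python
import torch
from torch import nn


class TrainableQuery(nn.Module):
    """Hypothetical output-query module: one learned vector per output position."""

    def __init__(self, num_queries: int, num_channels: int, concat_inputs: bool = False):
        super().__init__()
        # Acts as a learned positional encoding for each output position.
        self.queries = nn.Parameter(0.02 * torch.randn(num_queries, num_channels))
        self.concat_inputs = concat_inputs

    def forward(self, preprocessed_inputs: torch.Tensor) -> torch.Tensor:
        # preprocessed_inputs: (B, N, C_in); returns (B, num_queries, C_q).
        b, n, _ = preprocessed_inputs.shape
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        if self.concat_inputs:
            # Dense tasks can reuse the input features: one query per input element.
            assert n == self.queries.shape[0], "needs one query per input element"
            q = torch.cat([preprocessed_inputs, q], dim=-1)
        return q
```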

Several output_queries can be found in perceiver_io/output_queries.py

Multiple modalities

To process several modalities at once, a dictionary mapping each modality to a module can be used for input_preprocessors, output_postprocessors and output_queries (see perceiver_io/multimodal_perceiver.py). To make the different inputs compatible with each other, they are padded to the same channel size with trainable parameters. It is also possible to use a different number of output queries than there are inputs.
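
The padding step for multiple modalities can be sketched as follows. ModalityPadder and its min_padding argument are hypothetical (min_padding is an assumption, loosely analogous to input_padding_channels in the docstring); the repository's multimodal code may differ, for example in how the common channel size is chosen.

```python
import torch
from torch import nn


class ModalityPadder(nn.Module):
    """Hypothetical: pad each modality to a common channel size with trainable
    parameters, then concatenate all modalities along the sequence axis."""

    def __init__(self, channels_per_modality: dict, min_padding: int = 2):
        super().__init__()
        self.common_channels = max(channels_per_modality.values()) + min_padding
        # One trainable padding vector per modality fills its missing channels.
        self.padding = nn.ParameterDict({
            name: nn.Parameter(0.02 * torch.randn(self.common_channels - c))
            for name, c in channels_per_modality.items()
        })

    def forward(self, inputs: dict) -> torch.Tensor:
        padded = []
        for name in sorted(inputs):  # fixed modality order
            x = inputs[name]  # (B, N_m, C_m)
            b, n, _ = x.shape
            # The same learned padding vector is broadcast to every position.
            pad = self.padding[name].expand(b, n, -1)
            padded.append(torch.cat([x, pad], dim=-1))
        return torch.cat(padded, dim=1)  # (B, sum of N_m, common_channels)
```

After padding, all modalities share one channel size, so a single encoder cross-attention can attend over the concatenated sequence.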

Citations

@misc{jaegle2021perceiver,
    title         = {Perceiver IO: A General Architecture for Structured Inputs & Outputs},
    author        = {Andrew Jaegle and Sebastian Borgeaud and Jean-Baptiste Alayrac and Carl Doersch and Catalin Ionescu and David Ding and Skanda Koppula and Andrew Brock and Evan Shelhamer and Olivier Hénaff and Matthew M. Botvinick and Andrew Zisserman and Oriol Vinyals and João Carreira},
    year          = {2021},
    eprint        = {2107.14795},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG}
}