Quick introduction to large language models (theoretical learning and fine-tuning practice)
Install the Python dependencies listed in requirements.txt:
pip install -r requirements.txt
Generally, the GPU driver and CUDA versions must be compatible with the installed versions of PyTorch and TensorFlow.
Most newly released large language models use PyTorch v2.0 or later, for which PyTorch officially considers CUDA 11.8, together with a matching GPU driver version, to be the minimum. For details, see the reply from the PyTorch team on the minimum required CUDA version.
In short, it is recommended to install the latest CUDA version directly (12.3 at the time of writing). For details, see the official NVIDIA installation packages.
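The version requirement above boils down to a tuple comparison. The sketch below is only an illustration: it assumes the header text printed by the nvidia-smi command, whose "CUDA Version" field reports the highest CUDA version the installed driver supports, and it treats 11.8 as the minimum. The helper names `parse_cuda_version` and `meets_minimum` are hypothetical.

```python
import re

# Minimum CUDA version for PyTorch 2.0+, taken from the text above (assumption).
MIN_CUDA = (11, 8)


def parse_cuda_version(smi_header):
    """Return (major, minor) from nvidia-smi header text, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MIN_CUDA):
    """Compare versions as integer tuples rather than as strings or floats."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    header = "| NVIDIA-SMI 529.08 Driver Version: 529.08 CUDA Version: 12.0 |"
    version = parse_cuda_version(header)
    print(version, meets_minimum(version))
```

Comparing integer tuples avoids the pitfalls of string or float comparison, where "11.10" would sort before "11.8".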
After the installation is complete, run the nvidia-smi command to check the version:
nvidia-smi
Fri Mar 1 11:16:55 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 529.08       Driver Version: 529.08       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8     6W /  30W |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
To use the OpenAI API, you need an API key from the OpenAI console. Once you have the key, set it as an environment variable:
On Unix-based systems such as Ubuntu or macOS, run the following command in the terminal:
export OPENAI_API_KEY='your-api-key'
On Windows, run the following command in the command prompt:
set OPENAI_API_KEY=your-api-key
As for the requirements, install them as needed:
pip install -r requirements.txt
Setting up the development environment involves the following parts.
Miniconda is a Python environment management tool for creating and managing multiple Python environments. It is a lightweight alternative to Anaconda and does not include any IDE tools. You can download the Miniconda installer from the official website or from a mirror site:
# Download the Miniconda installer
$ wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Alternatively, download it with curl
$ curl -O https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Install Miniconda
$ bash Miniconda3-latest-Linux-x86_64.sh
During installation you will be asked a few questions, such as the installation path and whether to add Miniconda to the environment variables. After installation, restart the terminal so that the environment variables take effect.
You can use the following command to verify that Miniconda is installed successfully:
$ conda --version
Miniconda's configuration is stored in ~/.condarc. You can edit the file manually by referring to the documentation, or modify it with the conda config command.
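One detail worth knowing when using conda config: `--add` places the new value at the front of a list key, so the channel added last ends up with the highest priority. The sketch below is a simplified model of that behaviour, not conda's actual implementation. The `add_channel` helper is hypothetical, and the starting list of just `defaults` is an assumption.

```python
def add_channel(channels, url):
    """Mimic `conda config --add channels URL`: URL goes first, no duplicates."""
    return [url] + [c for c in channels if c != url]


MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/anaconda"

# Assumed starting point: only the built-in "defaults" channel.
channels = ["defaults"]
for path in ("pkgs/free/", "pkgs/main/", "cloud/conda-forge/", "cloud/pytorch/"):
    channels = add_channel(channels, f"{MIRROR}/{path}")

# Print the channels from highest to lowest priority.
for channel in channels:
    print(channel)
```

If a different priority order is wanted, changing the order of the `--add` calls (or using `--append`, which adds to the end of the list) is usually simpler than editing ~/.condarc by hand.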
# Configure the Tsinghua mirror
$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
$ conda config --set show_channel_urls yes
# View the ~/.condarc configuration
$ conda config --show-sources
channels:
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- defaults
show_channel_urls: True
# Install mamba
$ conda install -n base -c conda-forge mamba
# Install micromamba
$ conda install -n base -c conda-forge micromamba
After that, you can use the mamba or micromamba command in place of the conda command.
# Create a virtual environment with Python 3.11
(base) $ conda create -n transformers python=3.11
# Activate the transformers environment
(base) $ conda activate transformers
Unless otherwise noted, all steps below are carried out in the newly created transformers environment.
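A quick sanity check can confirm that the right environment is active before anything is installed into it. The sketch below assumes the environment name `transformers` and Python 3.11 from the commands above, and relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. The `check_environment` helper is hypothetical.

```python
import os
import sys


def check_environment(env_name, python_version,
                      expected_env="transformers", expected_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if env_name != expected_env:
        problems.append(f"active conda env is {env_name!r}, expected {expected_env!r}")
    if tuple(python_version[:2]) != expected_python:
        found = ".".join(str(part) for part in python_version[:2])
        wanted = ".".join(str(part) for part in expected_python)
        problems.append(f"Python is {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is unset when no conda environment is active.
    issues = check_environment(os.environ.get("CONDA_DEFAULT_ENV"), sys.version_info)
    print("\n".join(issues) or "Environment looks as expected.")
```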
Jupyter Lab is an interactive development environment that runs in a browser. It supports a variety of programming languages, including Python, R, and Julia. Jupyter Lab is provided by conda-forge. Configure the mirror first, then install it with the following command:
(transformers) $ conda install jupyterlab
Hugging Face Transformers is a natural language processing toolkit built on PyTorch and TensorFlow. It provides a large number of pre-trained models that can be used for a variety of NLP tasks. Hugging Face Transformers can be installed via conda:
(transformers) $ conda install -c huggingface transformers
Installation documentation: Hugging Face Transformers
Transformers needs a deep learning backend such as TensorFlow to run model inference. The following command installs TensorFlow with both CPU and GPU support:
(transformers) $ pip install tensorflow
If you are using a Mac with an M1/M2 chip, you can install the Metal plug-in, which also lets you try some smaller models:
(transformers) $ pip install tensorflow-metal
Installation documentation:
Transformers can also use PyTorch to run model inference. The pytorch and conda-forge mirror channels were already configured in an earlier step. Use the following commands to install the PyTorch build that matches your CUDA version:
# Linux
# CUDA 11.8
(transformers) $ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c nvidia
# CUDA 12.1
(transformers) $ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c nvidia
# Mac
(transformers) $ conda install pytorch::pytorch torchvision torchaudio
Installation documentation: PyTorch
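Once the installs finish, it can help to confirm what actually landed in the environment. The sketch below is one possible check, not part of the original setup: it looks up package versions with the standard library and, if PyTorch imports, reports which device it would use. The distribution names `transformers`, `tensorflow`, and `torch` are assumptions (some platforms ship TensorFlow under a different distribution name), and both helper functions are hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("transformers", "tensorflow", "torch")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def describe_torch_device():
    """Report the device PyTorch would use, without failing if it is absent."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"cuda (PyTorch built against CUDA {torch.version.cuda})"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print("device:", describe_torch_device())
```

If the device comes back as `cpu` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or a driver that is too old for the chosen pytorch-cuda version.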
Processing images, audio, and other data requires some additional dependencies:
(transformers) $ conda install tqdm iprogress ffmpeg ffmpeg-python pillow
Good luck with your studies!