FMAT

Author

Han-Wu-Shuang (Bruce) Bao 包寒吳霜

[email protected]

psychbruce.github.io

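Citation

Bao, H.-W.-S. (2023). FMAT: The Fill-Mask Association Test. https://CRAN.R-project.org/package=FMAT
- Note: This is the original citation. For the APA-7 citation of the version you installed, see the message printed when you run library(FMAT).
Bao, H.-W.-S. (2024). The Fill-Mask Association Test (FMAT): Measuring propositions in natural language. Journal of Personality and Social Psychology, 127(3), 537–561. https://doi.org/10.1037/pspa0000396
Bao, H.-W.-S., & Gries, P. (2024). Intersectional race–gender stereotypes in natural language. British Journal of Social Psychology, 63(4), 1771–1786. https://doi.org/10.1111/bjso.12748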

Installation

To use the FMAT, you need to install the R package FMAT and three Python packages (transformers, torch, huggingface-hub).

(1) R Package

## Method 1: Install from CRAN
install.packages("FMAT")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force = TRUE)

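(2) Python Environment and Packages

Install Anaconda (a recommended package manager that automatically installs Python, Python IDEs such as Spyder, and a large set of necessary Python package dependencies).

Specify Anaconda's Python interpreter in RStudio.

RStudio → Tools → Global/Project Options
→ Python → Select → Conda Environments
→ Choose ".../anaconda3/python.exe"

Install specific versions of the Python packages "transformers", "torch", and "huggingface-hub".
(RStudio Terminal / Anaconda Prompt / Windows Command)

For CPU users: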

 pip install transformers==4.40.2 torch==2.2.1 huggingface-hub==0.20.3

For GPU (CUDA) users:

 pip install transformers==4.40.2 huggingface-hub==0.20.3
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121

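If you have an NVIDIA GPU device on your PC and want to use the GPU to accelerate the pipeline, please see the installation instructions in the GPU Acceleration Guide section.
As of the May 2024 releases, "transformers" ≥ 4.41 depends on "huggingface-hub" ≥ 0.23. The suggested versions of "transformers" (4.40.2) and "huggingface-hub" (0.20.3) ensure that the console displays progress bars when downloading BERT models, while keeping these packages as new as possible.
Proxy users should use the "global mode" (全局模式) to download models.
If you see the error HTTPSConnectionPool(host='huggingface.co', port=443), please try to (1) reinstall Anaconda so that some unknown issues may be fixed or (2) downgrade the "urllib3" package to version ≤ 1.25.11 (pip install urllib3==1.25.11) so that it will use HTTP proxies (rather than HTTPS proxies as in later versions) to connect to Hugging Face.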
- https://www.cnblogs.com/devilmaycry812839668/p/17872452.html
- https://zhuanlan.zhihu.com/p/350015032
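To confirm that the Python environment actually holds the suggested versions, a small check like the following may help. It is only a sketch: the helper names (base_version, installed_versions, mismatches) are made up for this illustration and are not part of FMAT, and the script is assumed to run with the same Python interpreter that RStudio is configured to use.

```python
from importlib import metadata

# Versions suggested in the pip commands above.
SUGGESTED = {
    "transformers": "4.40.2",
    "torch": "2.2.1",
    "huggingface-hub": "0.20.3",
}


def base_version(version):
    """Drop a local build tag such as '+cu121' ('2.2.1+cu121' -> '2.2.1')."""
    return version.split("+", 1)[0]


def installed_versions(packages):
    """Return {package: version or None} for the current Python environment."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


def mismatches(installed, suggested=SUGGESTED):
    """List (package, installed, suggested) for every package that differs."""
    report = []
    for name, wanted in suggested.items():
        have = installed.get(name)
        if have is None or base_version(have) != wanted:
            report.append((name, have, wanted))
    return report


if __name__ == "__main__":
    # Prints nothing when all three packages match the suggested versions.
    for name, have, wanted in mismatches(installed_versions(SUGGESTED)):
        print(f"{name}: installed {have}, suggested {wanted}")
```

The CUDA build of torch reports a version such as 2.2.1+cu121, which is why the build tag is stripped before comparing.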

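FMAT Guide

Step 1: Download BERT Models

Use BERT_download() to download BERT models (see the BERT Models section). Model files are saved to your local folder "%USERPROFILE%/.cache/huggingface". A full list of BERT models is available on Hugging Face.

Use BERT_info() and BERT_vocab() to look up detailed information about BERT models.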
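If you want to inspect the downloaded files outside R, the cache folder can be browsed with a few lines of Python. This is a rough sketch rather than anything FMAT provides: the helper names are hypothetical, the "models--<name>" folder naming reflects the usual Hugging Face hub cache layout as an assumption, and the location can differ if your environment relocates the cache.

```python
from pathlib import Path


def default_hub_cache():
    """Default Hugging Face hub cache folder under the user's home directory."""
    return Path.home() / ".cache" / "huggingface" / "hub"


def cache_folder_name(model_id):
    """Folder name assumed for a model, e.g. 'models--vinai--bertweet-base'."""
    return "models--" + model_id.replace("/", "--")


def folder_size_mb(folder):
    """Total size of regular files under `folder` in MB (1 MB = 1024**2 bytes).

    Symbolic links are skipped so that linked files are not counted twice.
    """
    total = sum(
        p.stat().st_size
        for p in Path(folder).rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1024**2


if __name__ == "__main__":
    cache = default_hub_cache()
    for model_id in ["bert-base-uncased", "vinai/bertweet-base"]:
        folder = cache / cache_folder_name(model_id)
        if folder.exists():
            print(f"{model_id}: {folder_size_mb(folder):.0f} MB")
        else:
            print(f"{model_id}: not found in {cache}")
```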

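Step 2: Design FMAT Queries

Design queries that conceptually represent the constructs you want to measure (see Bao, 2024, JPSP for how to design queries).

Use FMAT_query() and/or FMAT_query_bind() to prepare a data.table of queries.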
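The idea of a query table can be illustrated without R: one template sentence containing [MASK] is crossed with the candidate mask words and the target phrases, giving one row per combination. The sketch below is purely conceptual. The function expand_queries, the {TARGET} placeholder, and the example words are assumptions made for this illustration; they do not document the arguments of FMAT_query(), so please consult its help page for the real interface.

```python
from itertools import product


def expand_queries(template, mask_words, targets):
    """Cross one template with mask words and target phrases.

    `template` must contain "[MASK]" and the (hypothetical) "{TARGET}" slot.
    Returns a list of dicts, one per (target, mask word) combination.
    """
    rows = []
    for target, mask_word in product(targets, mask_words):
        rows.append({
            "query": template.replace("{TARGET}", target),
            "mask_word": mask_word,  # word whose probability at [MASK] is of interest
            "target": target,
        })
    return rows


if __name__ == "__main__":
    for row in expand_queries("[MASK] is {TARGET}.", ["He", "She"], ["a doctor", "a nurse"]):
        print(row)
```

Each row keeps the [MASK] token in place; the mask word is stored separately because it is the candidate whose probability a masked language model would be asked to estimate.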

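Step 3: Run FMAT

Use FMAT_run() to get raw data (probability estimates) for further analysis.

Several preprocessing steps are built into the function for ease of use (see FMAT_run() for details).

For BERT variants that use <mask> rather than [MASK] as the mask token, the input query is automatically modified so that users can always use [MASK] in query design.
For some BERT variants, special prefix characters (e.g., \u0120 and \u2581) are automatically added to match whole words (rather than subwords) for [MASK].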
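The two preprocessing steps above can be pictured as simple string operations. The following is a minimal sketch under stated assumptions, not the code FMAT_run() actually executes: the function names are hypothetical, and the mapping of \u0120 to byte-level BPE vocabularies (as in RoBERTa) and \u2581 to SentencePiece vocabularies (as in ALBERT) is general tokenizer background rather than something taken from the FMAT source.

```python
# Word-start markers used by some tokenizer families (background assumption).
WORD_START_PREFIX = {
    "byte-level BPE": "\u0120",  # displayed as "Ġ"
    "SentencePiece": "\u2581",   # displayed as "▁"
}


def adapt_mask_token(query, model_mask_token):
    """Rewrite a query written with [MASK] for a model that uses another token."""
    return query.replace("[MASK]", model_mask_token)


def whole_word_token(word, prefix=""):
    """Vocabulary entry for `word` as a whole word rather than a subword piece."""
    return prefix + word


if __name__ == "__main__":
    print(adapt_mask_token("[MASK] is a doctor.", "<mask>"))
    print(whole_word_token("she", WORD_START_PREFIX["byte-level BPE"]))
    print(whole_word_token("she", WORD_START_PREFIX["SentencePiece"]))
```

Because of this handling, queries can be written once with [MASK] and reused across models, including those listed with <mask> in the BERT_info() output.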

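Notes

Improvements are ongoing, especially for adapting to more diverse (less popular) BERT models.
If you find bugs or run into problems using the functions, please report them at GitHub Issues or send me an email.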

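GPU Acceleration Guide

By default, the FMAT package uses the CPU so that the functionality is available to all users. For advanced users who want to accelerate the pipeline with a GPU, the FMAT_run() function now supports using a GPU device, which is about 3 times faster than the CPU.

Test results (on the developer's computer, depending on BERT model size):

CPU (Intel 13th-Gen i7-1355U): 500~1000 queries/min
GPU (NVIDIA GeForce RTX 2050): 1500~3000 queries/min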
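These throughput figures translate directly into a rough running-time estimate. The sketch below simply divides a query count by the reported rates; the function name and the example of 30,000 queries are illustrative assumptions, and real timings will vary with hardware and model size.

```python
# Throughput ranges reported above, in queries per minute (low, high).
CPU_RATE = (500, 1000)
GPU_RATE = (1500, 3000)


def estimated_minutes(n_queries, rate_low, rate_high):
    """Return (fastest, slowest) running time in minutes for `n_queries`."""
    return n_queries / rate_high, n_queries / rate_low


if __name__ == "__main__":
    n = 30_000  # hypothetical number of queries for one model
    cpu_fast, cpu_slow = estimated_minutes(n, *CPU_RATE)
    gpu_fast, gpu_slow = estimated_minutes(n, *GPU_RATE)
    print(f"CPU: {cpu_fast:.0f} to {cpu_slow:.0f} min")
    print(f"GPU: {gpu_fast:.0f} to {gpu_slow:.0f} min")
```

For 30,000 queries this gives 30 to 60 minutes on the CPU and 10 to 20 minutes on the GPU, consistent with the roughly threefold speed-up.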

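Checklist:

Ensure that you have an NVIDIA GPU device (e.g., GeForce RTX Series) and an NVIDIA GPU driver installed on your system.
Install PyTorch (the Python torch package) with CUDA support.
- Find the installation command at https://pytorch.org/get-started/locally/.
- CUDA is available only on Windows and Linux, not on MacOS.
- If you have installed a version of torch without CUDA support, first uninstall it (command: pip uninstall torch) and then install the suggested one.
- You may also install the corresponding version of the CUDA Toolkit (e.g., for a torch version supporting CUDA 12.1, CUDA Toolkit 12.1 may also be installed).

Example code for installing PyTorch with CUDA support:
(RStudio Terminal / Anaconda Prompt / Windows Command)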

 pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121
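After installation, you may want to verify from Python that torch can see the GPU. The following is a minimal sketch: cuda_status is a hypothetical helper written for this illustration, while torch.cuda.is_available(), torch.version.cuda, and torch.cuda.get_device_name() are standard PyTorch calls.

```python
def cuda_status():
    """Summarize whether torch is installed and whether it can use CUDA."""
    try:
        import torch
    except ImportError:
        # torch is missing from this Python environment
        return {
            "torch_installed": False,
            "cuda_available": False,
            "cuda_version": None,
            "device": None,
        }
    available = bool(torch.cuda.is_available())
    return {
        "torch_installed": True,
        "cuda_available": available,
        "cuda_version": torch.version.cuda,  # None for CPU-only builds
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_status().items():
        print(f"{key}: {value}")
```

If cuda_available is False even though a GPU is present, the installed torch is probably a CPU-only build; uninstalling it and reinstalling with the command above is the step described in the checklist.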

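BERT Models

The reliability and validity of the following 12 representative BERT models have been established in my research articles, but future work is needed to examine the performance of other models.

(model name on Hugging Face - downloaded model file size)

bert-base-uncased (420 MB)
bert-base-cased (416 MB)
bert-large-uncased (1283 MB)
bert-large-cased (1277 MB)
distilbert-base-uncased (256 MB)
distilbert-base-cased (251 MB)
albert-base-v1 (45 MB)
albert-base-v2 (45 MB)
roberta-base (476 MB)
distilroberta-base (316 MB)
vinai/bertweet-base (517 MB)
vinai/bertweet-large (1356 MB)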
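Before downloading all 12 models, it can be useful to add up the listed sizes and check your free disk space. The short sketch below only sums the numbers from the list above; because the per-model sizes are rounded, the total may differ slightly from the 6.52 GB reported in the download log later in this document.

```python
# Downloaded model file sizes in MB, copied from the list above.
MODEL_SIZES_MB = {
    "bert-base-uncased": 420,
    "bert-base-cased": 416,
    "bert-large-uncased": 1283,
    "bert-large-cased": 1277,
    "distilbert-base-uncased": 256,
    "distilbert-base-cased": 251,
    "albert-base-v1": 45,
    "albert-base-v2": 45,
    "roberta-base": 476,
    "distilroberta-base": 316,
    "vinai/bertweet-base": 517,
    "vinai/bertweet-large": 1356,
}

total_mb = sum(MODEL_SIZES_MB.values())
total_gb = total_mb / 1024  # assuming 1 GB = 1024 MB

if __name__ == "__main__":
    print(f"{len(MODEL_SIZES_MB)} models, {total_mb} MB (about {total_gb:.1f} GB)")
```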

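If you are new to BERT, these references may be helpful:

What is Fill-Mask? [HuggingFace]
An Explorable BERT [HuggingFace]
BERT Model Documentation [HuggingFace]
BERT Explained
Breaking BERT Down
Illustrated BERT
Visual Guide to BERT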

library(FMAT)
models = c(
  "bert-base-uncased",
  "bert-base-cased",
  "bert-large-uncased",
  "bert-large-cased",
  "distilbert-base-uncased",
  "distilbert-base-cased",
  "albert-base-v1",
  "albert-base-v2",
  "roberta-base",
  "distilroberta-base",
  "vinai/bertweet-base",
  "vinai/bertweet-large"
)
BERT_download(models)

 ℹ Device Info:

R Packages:
FMAT          2024.5
reticulate    1.36.1

Python Packages:
transformers  4.40.2
torch         2.2.1+cu121

NVIDIA GPU CUDA Support:
CUDA Enabled: TRUE
CUDA Version: 12.1
GPU (Device): NVIDIA GeForce RTX 2050


── Downloading model "bert-base-uncased" ──────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 570/570 [00:00<00:00, 114kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 23.9kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.98MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 440M/440M [00:36<00:00, 12.1MB/s] 
✔ Successfully downloaded model "bert-base-uncased"

── Downloading model "bert-base-cased" ────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 570/570 [00:00<00:00, 63.3kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 8.66kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 10.1MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 436M/436M [00:37<00:00, 11.6MB/s] 
✔ Successfully downloaded model "bert-base-cased"

── Downloading model "bert-large-uncased" ─────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 571/571 [00:00<00:00, 268kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 12.0kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.99MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 1.34G/1.34G [01:36<00:00, 14.0MB/s]
✔ Successfully downloaded model "bert-large-uncased"

── Downloading model "bert-large-cased" ───────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 762/762 [00:00<00:00, 125kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 12.3kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.41MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 5.39MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 1.34G/1.34G [01:35<00:00, 14.0MB/s]
✔ Successfully downloaded model "bert-large-cased"

── Downloading model "distilbert-base-uncased" ────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 483/483 [00:00<00:00, 161kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 9.46kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 16.5MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 14.8MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 268M/268M [00:19<00:00, 13.5MB/s] 
✔ Successfully downloaded model "distilbert-base-uncased"

── Downloading model "distilbert-base-cased" ──────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 465/465 [00:00<00:00, 233kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 9.80kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 8.70MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 263M/263M [00:24<00:00, 10.9MB/s] 
✔ Successfully downloaded model "distilbert-base-cased"

── Downloading model "albert-base-v1" ─────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 684/684 [00:00<00:00, 137kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 3.57kB/s]
spiece.model: 100%|██████████| 760k/760k [00:00<00:00, 4.93MB/s]
tokenizer.json: 100%|██████████| 1.31M/1.31M [00:00<00:00, 13.4MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 47.4M/47.4M [00:03<00:00, 13.4MB/s]
✔ Successfully downloaded model "albert-base-v1"

── Downloading model "albert-base-v2" ─────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 684/684 [00:00<00:00, 137kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 4.17kB/s]
spiece.model: 100%|██████████| 760k/760k [00:00<00:00, 5.10MB/s]
tokenizer.json: 100%|██████████| 1.31M/1.31M [00:00<00:00, 6.93MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 47.4M/47.4M [00:03<00:00, 13.8MB/s]
✔ Successfully downloaded model "albert-base-v2"

── Downloading model "roberta-base" ───────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 481/481 [00:00<00:00, 80.3kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 6.25kB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 2.72MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 8.22MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 8.56MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 499M/499M [00:38<00:00, 12.9MB/s] 
✔ Successfully downloaded model "roberta-base"

── Downloading model "distilroberta-base" ─────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 480/480 [00:00<00:00, 96.4kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 12.0kB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 6.59MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 9.46MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 11.5MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 331M/331M [00:25<00:00, 13.0MB/s] 
✔ Successfully downloaded model "distilroberta-base"

── Downloading model "vinai/bertweet-base" ────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 558/558 [00:00<00:00, 187kB/s]
→ (2) Downloading tokenizer...
vocab.txt: 100%|██████████| 843k/843k [00:00<00:00, 7.44MB/s]
bpe.codes: 100%|██████████| 1.08M/1.08M [00:00<00:00, 7.01MB/s]
tokenizer.json: 100%|██████████| 2.91M/2.91M [00:00<00:00, 9.10MB/s]
→ (3) Downloading model...
pytorch_model.bin: 100%|██████████| 543M/543M [00:48<00:00, 11.1MB/s] 
✔ Successfully downloaded model "vinai/bertweet-base"

── Downloading model "vinai/bertweet-large" ───────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 614/614 [00:00<00:00, 120kB/s]
→ (2) Downloading tokenizer...
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 5.90MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 7.30MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 8.31MB/s]
→ (3) Downloading model...
pytorch_model.bin: 100%|██████████| 1.42G/1.42G [02:29<00:00, 9.53MB/s]
✔ Successfully downloaded model "vinai/bertweet-large"

── Downloaded models: ──

                           size
albert-base-v1            45 MB
albert-base-v2            45 MB
bert-base-cased          416 MB
bert-base-uncased        420 MB
bert-large-cased        1277 MB
bert-large-uncased      1283 MB
distilbert-base-cased    251 MB
distilbert-base-uncased  256 MB
distilroberta-base       316 MB
roberta-base             476 MB
vinai/bertweet-base      517 MB
vinai/bertweet-large    1356 MB

✔ Downloaded models saved at C:/Users/Bruce/.cache/huggingface/hub (6.52 GB)

BERT_info(models)

                      model   size vocab  dims   mask
                     <fctr> <char> <int> <int> <char>
 1:       bert-base-uncased  420MB 30522   768 [MASK]
 2:         bert-base-cased  416MB 28996   768 [MASK]
 3:      bert-large-uncased 1283MB 30522  1024 [MASK]
 4:        bert-large-cased 1277MB 28996  1024 [MASK]
 5: distilbert-base-uncased  256MB 30522   768 [MASK]
 6:   distilbert-base-cased  251MB 28996   768 [MASK]
 7:          albert-base-v1   45MB 30000   128 [MASK]
 8:          albert-base-v2   45MB 30000   128 [MASK]
 9:            roberta-base  476MB 50265   768 <mask>
10:      distilroberta-base  316MB 50265   768 <mask>
11:     vinai/bertweet-base  517MB 64001   768 <mask>
12:    vinai/bertweet-large 1356MB 50265  1024 <mask>

(Tested 2024-05-16 on the developer's computer: HP Probook 450 G10 Notebook PC)