FMAT

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

- [email protected]

- psychbruce.github.io

Citation

Bao, H.-W.-S. (2023). FMAT: The Fill-Mask Association Test. https://cran.r-project.org/package=FMAT
- Note: This is the original citation. Please refer to the information shown when you run library(FMAT) for the APA-7 format of the version you installed.
Bao, H.-W.-S. (2024). The Fill-Mask Association Test (FMAT): Measuring propositions in natural language. Journal of Personality and Social Psychology, 127(3), 537–561. https://doi.org/10.1037/pspa0000396
Bao, H.-W.-S., & Gries, P. (2024). Intersectional race–gender stereotypes in natural language. British Journal of Social Psychology, 63(4), 1771–1786. https://doi.org/10.1111/bjso.12748

Installation

To use the FMAT, you need to install the R package FMAT and three Python packages (transformers, torch, huggingface-hub).

(1) R Package

## Method 1: Install from CRAN
install.packages("FMAT")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force = TRUE)

(2) Python Environment and Packages

Install Anaconda (a recommended package manager that automatically installs Python, Python IDEs such as Spyder, and a large list of necessary Python package dependencies).

Specify Anaconda's Python interpreter in RStudio.

RStudio → Tools → Global/Project Options
→ Python → Select → Conda Environments
→ Choose ".../anaconda3/python.exe"

Install specific versions of the Python packages "transformers", "torch", and "huggingface-hub".
(RStudio Terminal / Anaconda Prompt / Windows Command)

For CPU users:

pip install transformers==4.40.2 torch==2.2.1 huggingface-hub==0.20.3

For GPU (CUDA) users:

pip install transformers==4.40.2 huggingface-hub==0.20.3
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121

See [Guidance for GPU Acceleration] for installation guidance if you have an NVIDIA GPU device on your PC and want to use the GPU to accelerate the pipeline.
According to the May 2024 releases, "transformers" ≥ 4.41 depends on "huggingface-hub" ≥ 0.23. The suggested versions of "transformers" (4.40.2) and "huggingface-hub" (0.20.3) ensure that progress bars are displayed in the console when downloading BERT models, while keeping these packages as new as possible.
Proxy users should use the "global mode" (全局模式) to download models.
If you see the error HTTPSConnectionPool(host='huggingface.co', port=443), try (1) reinstalling Anaconda, which may fix some unknown issues, or (2) downgrading the "urllib3" package (pip install urllib3==1.25.11) so that you can connect to Hugging Face.
- https://www.cnblogs.com/devilmaycry812839668/p/17872452.html
- https://zhuanlan.zhihu.com/p/350015032
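
To confirm that the Python environment matches the versions pinned in the install commands above, a small check like the following may help. It is only a sketch: it uses the standard-library importlib.metadata module, and the names RECOMMENDED, installed_version, and check_pins are hypothetical helpers that are not part of FMAT.

```python
from importlib import metadata

# Version pins copied from the install commands above.
RECOMMENDED = {
    "transformers": "4.40.2",
    "torch": "2.2.1",
    "huggingface-hub": "0.20.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=RECOMMENDED):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        # CUDA builds of torch carry a local suffix such as "2.2.1+cu121".
        elif found.split("+")[0] == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_pins().items():
        print(f"{package}: {status}")
```

Run it with the same interpreter that RStudio is configured to use; otherwise it reports on a different environment.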

Guidance for FMAT

Step 1: Download BERT Models

Use BERT_download() to download [BERT Models]. Model files are saved to your local folder "%USERPROFILE%/.cache/huggingface". A full list of BERT models is available at Hugging Face.

Use BERT_info() and BERT_vocab() to find detailed information about BERT models.
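
To see which models are already in the local cache without starting R, the cache folder can be inspected directly. The sketch below assumes the default cache location under the user's home directory (it can differ if environment variables such as HF_HOME are set) and relies on the Hugging Face hub convention of storing each model in a folder named "models--<org>--<name>". The function cached_models is a hypothetical helper, not an FMAT function.

```python
from pathlib import Path


def cached_models(hub_dir=None):
    """List model names found in a Hugging Face hub cache folder."""
    # Assumed default location; pass hub_dir explicitly if your cache lives elsewhere.
    hub = Path(hub_dir) if hub_dir else Path.home() / ".cache" / "huggingface" / "hub"
    if not hub.is_dir():
        return []
    names = []
    for folder in sorted(hub.iterdir()):
        # Each model sits in a folder such as "models--vinai--bertweet-base".
        if folder.is_dir() and folder.name.startswith("models--"):
            names.append(folder.name[len("models--"):].replace("--", "/"))
    return names


if __name__ == "__main__":
    for name in cached_models():
        print(name)
```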

Step 2: Design FMAT Queries

Design queries that conceptually represent the constructs you want to measure (see Bao, 2024, JPSP for how to design queries).

Use FMAT_query() and/or FMAT_query_bind() to prepare a data.table of queries.

Step 3: Run FMAT

Use FMAT_run() to get raw data (probability estimates) for further analysis.

Several preprocessing steps are built into the function for easier use (see FMAT_run() for details).

For BERT variants that use <mask> rather than [MASK] as the mask token, the input query is modified automatically so that users can always use [MASK] in query design.
For some BERT variants, special prefix characters such as \u0120 and \u2581 are added automatically to match whole words (rather than subwords) for [MASK].
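
The two preprocessing steps above can be pictured with plain string operations. This is an illustration of the idea only, not FMAT's actual implementation: the function names and the example query are hypothetical, and which marker a given model needs depends on its tokenizer (byte-level BPE vocabularies such as RoBERTa's use "\u0120", SentencePiece vocabularies such as ALBERT's use "\u2581").

```python
# Word-boundary markers mentioned above, written as Unicode escapes.
BPE_SPACE_MARKER = "\u0120"      # "Ġ", byte-level BPE vocabularies (e.g., RoBERTa)
SENTENCEPIECE_MARKER = "\u2581"  # "▁", SentencePiece vocabularies (e.g., ALBERT)


def adapt_mask_token(query, mask_token):
    """Swap the generic [MASK] placeholder for the model's own mask token."""
    return query.replace("[MASK]", mask_token)


def whole_word_token(word, marker=None):
    """Prepend a word-boundary marker so the word matches a whole-word vocabulary entry."""
    return marker + word if marker else word


query = "[MASK] is a nurse."  # hypothetical example query
print(adapt_mask_token(query, "<mask>"))              # <mask> is a nurse.
print(whole_word_token("she", BPE_SPACE_MARKER))      # Ġshe
print(whole_word_token("she", SENTENCEPIECE_MARKER))  # ▁she
```

Because FMAT_run() handles both steps internally, queries can be written once with [MASK] and reused across all models.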

Notes

Improvements are ongoing, especially for adaptation to more diverse (less popular) BERT models.
If you find bugs or have problems using the functions, please report them at GitHub Issues or send me an email.

Guidance for GPU Acceleration

By default, the FMAT package uses the CPU so that it works for all users. For advanced users who want to accelerate the pipeline with a GPU, the FMAT_run() function now supports using a GPU device, which is about 3x faster than the CPU.

Test results (on the developer's computer, depending on BERT model size):

CPU (Intel 13th-Gen i7-1355U): 500~1000 queries/min
GPU (NVIDIA GeForce RTX 2050): 1500~3000 queries/min
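
As a rough planning aid, the throughput ranges above can be turned into run-time estimates. The sketch below simply divides a query count by the reported rates; the figures come from the developer's machine, so results on other hardware will differ, and the 30,000-query workload is a made-up example.

```python
# Throughput ranges (queries per minute) reported above for the developer's machine.
RATES = {"CPU": (500, 1000), "GPU": (1500, 3000)}


def estimated_minutes(n_queries, device):
    """Return (best_case, worst_case) run time in minutes for n_queries."""
    slow, fast = RATES[device]
    return n_queries / fast, n_queries / slow


for device in RATES:
    best, worst = estimated_minutes(30000, device)
    print(f"{device}: {best:.0f} to {worst:.0f} minutes")
# CPU: 30 to 60 minutes
# GPU: 10 to 20 minutes
```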

Checklist:

Ensure that you have NVIDIA GPU device(s) (e.g., GeForce RTX Series) and an NVIDIA GPU driver installed on your system.
Install PyTorch (the Python torch package) with CUDA support.
- Find guidance for the installation command at https://pytorch.org/get-started/locally/
- CUDA is available only on Windows and Linux, not on macOS.
- If you have installed a version of torch without CUDA support, first uninstall it (command: pip uninstall torch) and then install the suggested one.
- You may also install the corresponding version of CUDA Toolkit (e.g., for the torch version supporting CUDA 12.1, the same version, CUDA Toolkit 12.1, may also be installed).

Example code for installing PyTorch with CUDA support:
(RStudio Terminal / Anaconda Prompt / Windows Command)

pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121
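
After installation, it can be worth checking from Python whether the installed torch build actually sees the GPU. The following is a minimal sketch using standard PyTorch calls (torch.cuda.is_available(), torch.version.cuda, torch.cuda.get_device_name()); the wrapper function cuda_status is a hypothetical helper and returns early if torch is not installed.

```python
import importlib.util


def cuda_status():
    """Describe whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}
    import torch

    available = torch.cuda.is_available()
    status = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": available,
        "cuda_version": torch.version.cuda,  # None for CPU-only builds
    }
    if available:
        status["device"] = torch.cuda.get_device_name(0)
    return status


if __name__ == "__main__":
    for key, value in cuda_status().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with an NVIDIA GPU, the usual cause is a CPU-only torch build, which the checklist above addresses by uninstalling and reinstalling with the CUDA index URL.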

BERT Models

The reliability and validity of the following 12 representative BERT models have been established in my research articles, but future work is needed to examine the performance of other models.

(model name on Hugging Face - downloaded model file size)

bert-base-uncased (420 MB)
bert-base-cased (416 MB)
bert-large-uncased (1283 MB)
bert-large-cased (1277 MB)
distilbert-base-uncased (256 MB)
distilbert-base-cased (251 MB)
albert-base-v1 (45 MB)
albert-base-v2 (45 MB)
roberta-base (476 MB)
distilroberta-base (316 MB)
vinai/bertweet-base (517 MB)
vinai/bertweet-large (1356 MB)

If you are new to BERT, these references can be helpful:

What is Fill-Mask? [HuggingFace]
An Explorable BERT [HuggingFace]
BERT Model Documentation [HuggingFace]
BERT Explained
Breaking BERT Down
Illustrated BERT
Visual Guide to BERT

library(FMAT)
models = c(
  "bert-base-uncased",
  "bert-base-cased",
  "bert-large-uncased",
  "bert-large-cased",
  "distilbert-base-uncased",
  "distilbert-base-cased",
  "albert-base-v1",
  "albert-base-v2",
  "roberta-base",
  "distilroberta-base",
  "vinai/bertweet-base",
  "vinai/bertweet-large"
)
BERT_download(models)

 ℹ Device Info:

R Packages:
FMAT          2024.5
reticulate    1.36.1

Python Packages:
transformers  4.40.2
torch         2.2.1+cu121

NVIDIA GPU CUDA Support:
CUDA Enabled: TRUE
CUDA Version: 12.1
GPU (Device): NVIDIA GeForce RTX 2050


── Downloading model "bert-base-uncased" ──────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 570/570 [00:00<00:00, 114kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 23.9kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.98MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 440M/440M [00:36<00:00, 12.1MB/s] 
✔ Successfully downloaded model "bert-base-uncased"

── Downloading model "bert-base-cased" ────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 570/570 [00:00<00:00, 63.3kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 8.66kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 10.1MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 436M/436M [00:37<00:00, 11.6MB/s] 
✔ Successfully downloaded model "bert-base-cased"

── Downloading model "bert-large-uncased" ─────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 571/571 [00:00<00:00, 268kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 12.0kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.99MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 1.34G/1.34G [01:36<00:00, 14.0MB/s]
✔ Successfully downloaded model "bert-large-uncased"

── Downloading model "bert-large-cased" ───────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 762/762 [00:00<00:00, 125kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 12.3kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.41MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 5.39MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 1.34G/1.34G [01:35<00:00, 14.0MB/s]
✔ Successfully downloaded model "bert-large-cased"

── Downloading model "distilbert-base-uncased" ────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 483/483 [00:00<00:00, 161kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 9.46kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 16.5MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 14.8MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 268M/268M [00:19<00:00, 13.5MB/s] 
✔ Successfully downloaded model "distilbert-base-uncased"

── Downloading model "distilbert-base-cased" ──────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 465/465 [00:00<00:00, 233kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 9.80kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 8.70MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 263M/263M [00:24<00:00, 10.9MB/s] 
✔ Successfully downloaded model "distilbert-base-cased"

── Downloading model "albert-base-v1" ─────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 684/684 [00:00<00:00, 137kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 3.57kB/s]
spiece.model: 100%|██████████| 760k/760k [00:00<00:00, 4.93MB/s]
tokenizer.json: 100%|██████████| 1.31M/1.31M [00:00<00:00, 13.4MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 47.4M/47.4M [00:03<00:00, 13.4MB/s]
✔ Successfully downloaded model "albert-base-v1"

── Downloading model "albert-base-v2" ─────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 684/684 [00:00<00:00, 137kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 4.17kB/s]
spiece.model: 100%|██████████| 760k/760k [00:00<00:00, 5.10MB/s]
tokenizer.json: 100%|██████████| 1.31M/1.31M [00:00<00:00, 6.93MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 47.4M/47.4M [00:03<00:00, 13.8MB/s]
✔ Successfully downloaded model "albert-base-v2"

── Downloading model "roberta-base" ───────────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 481/481 [00:00<00:00, 80.3kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 6.25kB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 2.72MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 8.22MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 8.56MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 499M/499M [00:38<00:00, 12.9MB/s] 
✔ Successfully downloaded model "roberta-base"

── Downloading model "distilroberta-base" ─────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 480/480 [00:00<00:00, 96.4kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 12.0kB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 6.59MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 9.46MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 11.5MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 331M/331M [00:25<00:00, 13.0MB/s] 
✔ Successfully downloaded model "distilroberta-base"

── Downloading model "vinai/bertweet-base" ────────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 558/558 [00:00<00:00, 187kB/s]
→ (2) Downloading tokenizer...
vocab.txt: 100%|██████████| 843k/843k [00:00<00:00, 7.44MB/s]
bpe.codes: 100%|██████████| 1.08M/1.08M [00:00<00:00, 7.01MB/s]
tokenizer.json: 100%|██████████| 2.91M/2.91M [00:00<00:00, 9.10MB/s]
→ (3) Downloading model...
pytorch_model.bin: 100%|██████████| 543M/543M [00:48<00:00, 11.1MB/s] 
✔ Successfully downloaded model "vinai/bertweet-base"

── Downloading model "vinai/bertweet-large" ───────────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 614/614 [00:00<00:00, 120kB/s]
→ (2) Downloading tokenizer...
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 5.90MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 7.30MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 8.31MB/s]
→ (3) Downloading model...
pytorch_model.bin: 100%|██████████| 1.42G/1.42G [02:29<00:00, 9.53MB/s]
✔ Successfully downloaded model "vinai/bertweet-large"

── Downloaded models: ──

                           size
albert-base-v1            45 MB
albert-base-v2            45 MB
bert-base-cased          416 MB
bert-base-uncased        420 MB
bert-large-cased        1277 MB
bert-large-uncased      1283 MB
distilbert-base-cased    251 MB
distilbert-base-uncased  256 MB
distilroberta-base       316 MB
roberta-base             476 MB
vinai/bertweet-base      517 MB
vinai/bertweet-large    1356 MB

✔ Downloaded models saved at C:/Users/Bruce/.cache/huggingface/hub (6.52 GB)

BERT_info(models)

                      model   size vocab  dims   mask
                     <fctr> <char> <int> <int> <char>
 1:       bert-base-uncased  420MB 30522   768 [MASK]
 2:         bert-base-cased  416MB 28996   768 [MASK]
 3:      bert-large-uncased 1283MB 30522  1024 [MASK]
 4:        bert-large-cased 1277MB 28996  1024 [MASK]
 5: distilbert-base-uncased  256MB 30522   768 [MASK]
 6:   distilbert-base-cased  251MB 28996   768 [MASK]
 7:          albert-base-v1   45MB 30000   128 [MASK]
 8:          albert-base-v2   45MB 30000   128 [MASK]
 9:            roberta-base  476MB 50265   768 <mask>
10:      distilroberta-base  316MB 50265   768 <mask>
11:     vinai/bertweet-base  517MB 64001   768 <mask>
12:    vinai/bertweet-large 1356MB 50265  1024 <mask>

(Tested 2024-05-16 on the developer's computer: HP Probook 450 G10 Notebook PC)

Related Packages

While the FMAT is an innovative method for the computational intelligent analysis of psychology and society, you may also want an integrative toolbox for other text-analytic methods. Another R package I developed, PsychWordVec, is useful and user-friendly for word embedding analysis (e.g., the Word Embedding Association Test, WEAT). Please refer to its documentation and feel free to use it.

