scpscraper

v1.0.1

SCP Scraper

A small Python library designed for scraping data from the SCP Wiki. It was made with AI training (namely NLP models) and dataset collection (for things like categorizing SCPs for external projects) in mind, and it has arguments that make it easy to use in those applications.

Below you will find installation instructions, examples of how to use this library, and other ways you can put it to use. I hope you find it as useful as I have!

Sample Code

Installation

scpscraper can be installed via pip install. This is the command I recommend using, so that you consistently have the latest version.

 pip3 install --upgrade scpscraper
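
To confirm which version pip actually installed, a minimal sketch using the standard library's importlib.metadata is shown below. The helper name installed_version is made up for illustration and is not part of scpscraper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper, not part of scpscraper.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if the package is not installed.
print(installed_version("scpscraper"))
```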

The Basics

Importing the Library

 # Before we begin, we obviously have to import scpscraper.
import scpscraper

Grabbing an SCP's Name

 # Let's use 3001 (Red Reality) as an example.
name = scpscraper.get_scp_name(3001)

print(name)  # Outputs "Red Reality"

Grabbing as Many Details as Possible About an SCP

 # Again using 3001 as an example
info = scpscraper.get_scp(3001)

print(info)  # Outputs a dictionary with the
# name, object id, rating, page content by section, etc.
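
Printing the whole dictionary can be hard to read when it holds full page content. The following is a rough sketch of one way to get an overview of any dictionary without assuming its key names. The summarize helper and the sample data are invented for illustration; the real keys and values are whatever scpscraper.get_scp() returns.

```python
def summarize(info, max_len=60):
    """Return one 'key (type): preview' line per entry of a dictionary."""
    lines = []
    for key, value in info.items():
        preview = repr(value)
        if len(preview) > max_len:
            # Truncate long values so each entry stays on one line.
            preview = preview[: max_len - 3] + "..."
        lines.append(f"{key} ({type(value).__name__}): {preview}")
    return lines


# Stand-in data for illustration only, NOT the library's real output.
sample = {"name": "Red Reality", "content": {"Description": "x" * 200}}
print("\n".join(summarize(sample)))
```

With the real library, the same helper could be called as summarize(info) on the dictionary from the example above.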

The Fun Stuff

Grabbing an SCP's `page-content` div

For reference, the page-content div contains what the user actually wrote, without all the extra Wikidot markup around it.

 # Once again, 3001 is the example
scp = scpscraper.get_single_scp(3001)

# Grab the page-content div specifically
content = scp.find_all('div', id='page-content')

print(content)  # Outputs a list holding the div: [<div id="page-content"> ... </div>]
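
To make concrete what "only the page-content div" means, here is a small self-contained sketch that pulls the text out of that div while ignoring the rest of a page. It uses the standard library's html.parser rather than the object returned by get_single_scp(), so it is not how scpscraper itself works. The class name, the function name, and the miniature HTML page are all made up for illustration.

```python
from html.parser import HTMLParser


class PageContentText(HTMLParser):
    """Collect the text inside <div id="page-content">, including nested divs."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div>s deep we are inside page-content
        self.chunks = []  # text fragments found inside the div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside page-content
        elif dict(attrs).get("id") == "page-content":
            self.depth = 1   # entered the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def page_content_text(html):
    """Return the text found inside the page-content div of `html`."""
    parser = PageContentText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up miniature page, not real SCP Wiki markup.
html = (
    '<div id="header">Site navigation</div>'
    '<div id="page-content"><p>Item #: SCP-XXXX</p>'
    '<div class="note">Nested note</div></div>'
    '<div id="footer">License</div>'
)
print(page_content_text(html))  # Item #: SCP-XXXX Nested note
```

The header and footer text is dropped because the depth counter is only non-zero between the opening and closing tags of the page-content div.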

Scraping HTML or Information from Multiple SCPs

 # Grab info on SCPs 000-099
scpscraper.scrape_scps(0, 100)

# Same as above, but only grabbing Keter-class SCPs
scpscraper.scrape_scps(0, 100, tags=['keter'])

# Grab 000-099 in a format that can be used to train AI
scpscraper.scrape_scps(0, 100, ai_dataset=True)

 # Scrape the page-content div's HTML from SCP-000 to SCP-099

# Only including this as an example, but scrape_scps_html() has
# all the same options as scrape_scps().
scpscraper.scrape_scps_html(0, 100)
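
The comments above describe the arguments (0, 100) as covering SCP-000 through SCP-099, which suggests the end of the range is excluded, as with Python's range(). Assuming that reading is right, this sketch lists the designations such a call would cover. The scp_designations helper is hypothetical and not part of scpscraper.

```python
def scp_designations(start, stop):
    """List designations for start..stop-1, zero-padded to three digits.

    Assumes a half-open range like Python's range(); this is an inference
    from the examples, not documented library behavior.
    """
    return [f"SCP-{number:03d}" for number in range(start, stop)]


batch = scp_designations(0, 100)
print(batch[0], batch[-1], len(batch))  # SCP-000 SCP-099 100
```

Checking the range this way before a long scrape is a cheap guard against off-by-one surprises.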

Google Colaboratory-Only Usage

Because of the google.colab module included in Google Colaboratory, we can do a few extra things there that we can't do elsewhere.
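
Since these features depend on the google.colab module, a script that should also run outside Colaboratory may want to check for it first. A minimal sketch of such a check follows; the in_colab function is an illustrative name, not something scpscraper provides.

```python
def in_colab():
    """Return True when the google.colab module can be imported."""
    try:
        import google.colab  # noqa: F401  (ships with Colaboratory runtimes)
    except ImportError:
        return False
    return True


if in_colab():
    print("Colaboratory detected: the Google Drive features should be usable.")
else:
    print("Not running in Colaboratory: skip the Google Drive steps.")
```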

Mount Your Google Drive to the Colaboratory VM

 # Mounts it to the directory /content/drive/
scpscraper.gdrive.mount()

Scrape SCP Info/HTML and Copy It to Your Google Drive Afterwards

 # Requires your Google Drive to be mounted at the directory /content/drive/
scpscraper.scrape_scps(0, 100, copy_to_drive=True)

scpscraper.scrape_scps_html(0, 100, copy_to_drive=True)

Copy Other Files to/from Your Google Drive

 # Requires your Google Drive to be mounted at the directory /content/drive/
scpscraper.gdrive.copy_to_drive('example.txt')

scpscraper.gdrive.copy_from_drive('example.txt')

Planned Updates

Potential future updates to make scraping data from any website easy and viable, allowing for easy mass collection of data.

Link to GitHub Repo

Please consider checking it out! You can report issues, request features, contribute to this project, etc. in the GitHub repo. That is the best way to reach me about issues or feedback relating to this project.

https://github.com/jaonhax/scpscraper/
