scpscraper

v1.0.1

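SCP Scraper

A small Python library designed for scraping data from the SCP Wiki. It was made with AI training (namely NLP models) and dataset collection (for example, categorizing SCPs for external projects) in mind, and it provides function arguments that make it easy to use in those applications.

Below you will find installation instructions, examples of how to use this library, and the ways in which you can put it to use. I hope you find it as useful as I have!

Sample Code

Installation

scpscraper can be installed via pip. The command below is the recommended one, since --upgrade ensures you consistently have the latest version.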

 pip3 install --upgrade scpscraper
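
To confirm which version pip actually installed, the package metadata can be queried from Python. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_version is illustrative and is not part of scpscraper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("scpscraper")
    print(found if found else "scpscraper is not installed")
```

Returning None instead of raising keeps the check usable in setup scripts that want to decide whether to run the pip command at all.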

The Basics

Importing the Library

# Before we begin, we obviously have to import scpscraper.
import scpscraper

Grabbing an SCP's Name

# Let's use 3001 (Red Reality) as an example.
name = scpscraper.get_scp_name(3001)

print(name)  # Outputs "Red Reality"

Grabbing as Many Details as Possible About an SCP

# Again using 3001 as an example
info = scpscraper.get_scp(3001)

print(info)  # Outputs a dictionary with the
# name, object id, rating, page content by section, etc.
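
Because the exact keys of that dictionary are not listed here, it can help to inspect its structure before relying on any field. The sketch below does that with plain Python; the helper describe_fields and the sample dictionary (including its key names) are hypothetical stand-ins, not the library's documented output.

```python
from typing import Any, Dict, List, Tuple


def describe_fields(info: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (key, type name) pairs sorted by key, for a quick look at the structure."""
    return [
        (key, type(value).__name__)
        for key, value in sorted(info.items(), key=lambda pair: pair[0])
    ]


# Hypothetical stand-in for the result of scpscraper.get_scp(3001);
# the real key names may differ.
sample = {
    "name": "Red Reality",
    "id": "SCP-3001",
    "content": {"Description": "..."},
}

for key, type_name in describe_fields(sample):
    print(f"{key}: {type_name}")
```

Running the same helper on the real return value shows which sections are available before any of them are fed into a dataset.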

The Fun Stuff

Grabbing an SCP's `page-content` div HTML

For reference, the page-content div contains what the user actually wrote, without all the extra Wikidot boilerplate around it.

# Once again, 3001 is the example
scp = scpscraper.get_single_scp(3001)

# Grab the page-content div specifically
content = scp.find_all('div', id='page-content')

print(content)  # Outputs the matching <div id="page-content"> ... </div> element(s)
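
To illustrate why isolating the page-content div matters, here is a small offline sketch that pulls only the authored text out of an inline HTML string. It uses the standard library's html.parser rather than scpscraper, and the class name, helper name, and sample markup are all invented for the example; it is not how the library itself is implemented.

```python
from html.parser import HTMLParser


class PageContentText(HTMLParser):
    """Collect text found inside <div id="page-content">, ignoring everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0    # div nesting depth inside the target; 0 means outside it
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested div inside the target
        elif dict(attrs).get("id") == "page-content":
            self.depth = 1   # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def page_content_text(html: str) -> str:
    """Return the text inside the page-content div, joined by single spaces."""
    parser = PageContentText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample markup: site chrome around a page-content div.
html = (
    '<div id="header">Site navigation</div>'
    '<div id="page-content"><p>First paragraph</p>'
    '<div><p>Nested note</p></div></div>'
    '<div id="footer">License info</div>'
)
print(page_content_text(html))
```

The header and footer text never reach the output, which is the property that makes the page-content div a cleaner source for training data.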

Scraping HTML or Information From Multiple SCPs

# Grab info on SCPs 000-099
scpscraper.scrape_scps(0, 100)

# Same as above, but only grabbing Keter-class SCPs
scpscraper.scrape_scps(0, 100, tags=['keter'])

# Grab 000-099 in a format that can be used to train AI
scpscraper.scrape_scps(0, 100, ai_dataset=True)

# Scrape the page-content div's HTML from SCP-000 to SCP-099

# Only including this as an example, but scrape_scps_html() has
# all the same options as scrape_scps().
scpscraper.scrape_scps_html(0, 100)
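
The comments above indicate that the two numbers behave like a half-open range: (0, 100) covers SCP-000 through SCP-099 and stops before 100. The following sketch shows that mapping with plain Python; scp_ids is a hypothetical helper for illustration, and the zero-padded naming is an assumption based on the "000-099" wording.

```python
from typing import List


def scp_ids(start: int, stop: int) -> List[str]:
    """List the SCP designations covered by the half-open range [start, stop)."""
    return [f"SCP-{number:03d}" for number in range(start, stop)]


ids = scp_ids(0, 100)
print(ids[0], ids[-1], len(ids))  # SCP-000 SCP-099 100
```

Listing the designations up front is a cheap way to double-check the bounds before starting a long scrape.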

Google Colaboratory-Only Usage

Because the google.colab module is included in Google Colaboratory, the library can offer a few extra features there that are not possible elsewhere.
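
Code that should run both inside and outside Colaboratory may want to check for that module before calling any of the features below. This is a minimal sketch using the standard library; the helper name in_colab is an assumption made for the example and is not part of scpscraper.

```python
import importlib.util


def in_colab() -> bool:
    """Return True when the google.colab module can be located."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent "google" package is missing or has no valid spec.
        return False


if __name__ == "__main__":
    print("Running in Colaboratory" if in_colab() else "Not running in Colaboratory")
```

Using find_spec avoids importing google.colab itself, so the check has no side effects on a regular machine.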

Mount Your Google Drive to the Colaboratory VM

# Mounts it to the directory /content/drive/
scpscraper.gdrive.mount()

Scrape SCP Info/HTML and Copy It to Your Google Drive Afterwards

# Requires your Google Drive to be mounted at the directory /content/drive/
scpscraper.scrape_scps(0, 100, copy_to_drive=True)

scpscraper.scrape_scps_html(0, 100, copy_to_drive=True)
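
Since copy_to_drive=True requires the drive to be mounted at /content/drive/, a quick pre-flight check can save a long scrape from failing at the copy step. The sketch below is one possible check, assuming a mounted drive shows up as a non-empty directory; drive_is_mounted is a hypothetical helper, not a function provided by scpscraper.

```python
import os


def drive_is_mounted(mount_point: str = "/content/drive") -> bool:
    """Return True if the mount point exists as a directory and is not empty."""
    return os.path.isdir(mount_point) and bool(os.listdir(mount_point))


if __name__ == "__main__":
    if drive_is_mounted():
        print("Google Drive is mounted; copy_to_drive=True should be usable.")
    else:
        print("Google Drive is not mounted; mount it before scraping.")
```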

Copy Other Files to/From Your Google Drive

# Requires your Google Drive to be mounted at the directory /content/drive/
scpscraper.gdrive.copy_to_drive('example.txt')

scpscraper.gdrive.copy_from_drive('example.txt')

Planned Updates

A potential future update would make it easy and viable to scrape data from any website, allowing for easy mass collection of data.

Link to GitHub Repo

Please consider checking it out! In the GitHub repo you can report issues, request features, contribute to this project, and so on. That is the best way to reach me about issues or feedback relating to this project.

https://github.com/jaonhax/scpscraper/

Additional Information

Version: v1.0.1
Type: AI source code
Updated: 2025-09-12
Size: 14.54 KB
Source: GitHub
