v0.9.0

Pychan

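Overview
Installation
Usage
1. General Notes
2. Setup
3. Fetch Board Names
4. Fetch Threads
5. Fetch Archived Threads
6. Search 4chan
  1. A Note About Cloudflare
  2. Search 4chan Code Example
7. Fetch Posts for a Specific Thread
pychan Models
1. Threads
2. Posts
  1. A Note About Replies
3. Posters
4. Files
Contributing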

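Overview

pychan is a Python client for interacting with 4chan. 4chan does not have an official API, and third-party attempts to implement one have tended to struggle, so this library instead provides abstractions over interacting with (scraping) 4chan directly. pychan is object-oriented, and its implementation is lazy where reasonable (using Python Generators) in order to optimize performance and minimize superfluous blocking I/O operations.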

Installation

If you have Python >=3.10 and <4.0 installed, pychan can be installed from PyPI using something like:

pip install pychan

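Usage

General Notes

All 4chan interactions are throttled internally by sleeping the executing thread. If you execute pychan in a multithreaded way, you will not get the benefits of this throttling. In that case, pychan takes no responsibility for the consequences of excessive HTTP requests.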
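
One way to keep the built-in throttling effective in a threaded program is to funnel every pychan call through a single worker thread. The sketch below is not part of pychan: `submit_to_pychan_worker` is a hypothetical helper, and it assumes that lazy results are ordinary Python generators that can be drained with `list()`.

```python
import inspect
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

# A single worker thread means every call (and any sleep it performs
# internally) runs strictly one after another.
_PYCHAN_WORKER = ThreadPoolExecutor(max_workers=1, thread_name_prefix="pychan")


def submit_to_pychan_worker(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Future:
    """Run func(*args, **kwargs) on the single worker thread (hypothetical helper)."""

    def task() -> Any:
        result = func(*args, **kwargs)
        # A lazy generator only does its work while being iterated, so drain
        # it here, on the worker, rather than on the caller's thread.
        if inspect.isgenerator(result):
            return list(result)
        return result

    return _PYCHAN_WORKER.submit(task)
```

For example, `submit_to_pychan_worker(fourchan.get_threads, "b").result()` would hand back a fully materialized list. This trades laziness for predictable request pacing, which may or may not be the right trade for your program.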

Setup

from pychan import FourChan, LogLevel, PychanLogger

# With all defaults (logging disabled, all exceptions raised)
fourchan = FourChan()

# Tell pychan to gracefully ignore HTTP exceptions, if any, within its internal logic
fourchan = FourChan(raise_http_exceptions=False)

# Tell pychan to gracefully ignore parsing exceptions, if any, within its internal logic
fourchan = FourChan(raise_parsing_exceptions=False)

# Configure logging explicitly
logger = PychanLogger(LogLevel.INFO)
fourchan = FourChan(logger=logger)

# Use all of the above settings at once
logger = PychanLogger(LogLevel.INFO)
fourchan = FourChan(logger=logger, raise_http_exceptions=True, raise_parsing_exceptions=True)

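The rest of the examples in this README assume that you have already created an instance of the FourChan class as shown above.

Fetch Board Names

This function dynamically fetches the boards from 4chan at call time.

Note: boards that are incompatible with pychan are not returned in this list.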

boards = fourchan.get_boards()
# Sample return value:
# ['a', 'b', 'c', 'd', 'e', 'g', 'gif', 'h', 'hr', 'k', 'm', 'o', 'p', 'r', 's', 't', 'u', 'v', 'vg', 'vm', 'vmg', 'vr', 'vrpg', 'vst', 'w', 'wg', 'i', 'ic', 'r9k', 's4s', 'vip', 'qa', 'cm', 'hm', 'lgbt', 'y', '3', 'aco', 'adv', 'an', 'bant', 'biz', 'cgl', 'ck', 'co', 'diy', 'fa', 'fit', 'gd', 'hc', 'his', 'int', 'jp', 'lit', 'mlp', 'mu', 'n', 'news', 'out', 'po', 'pol', 'pw', 'qst', 'sci', 'soc', 'sp', 'tg', 'toy', 'trv', 'tv', 'vp', 'vt', 'wsg', 'wsr', 'x', 'xs']

Fetch Threads

# Iterate over all threads in /b/
for thread in fourchan.get_threads("b"):
    # Do stuff with the thread
    print(thread.title)
    # You can also iterate over all the posts in the thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post - refer to the model documentation in pychan's README for details
        print(post.text)
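
Because the library is described as lazy where reasonable, it can be useful to stop iterating early instead of walking an entire board. The following is a minimal sketch, assuming `get_threads()` yields threads lazily; `first_thread_titles` is a hypothetical helper, not part of pychan.

```python
from itertools import islice
from typing import List, Optional


def first_thread_titles(fourchan, board: str, limit: int) -> List[Optional[str]]:
    """Return the titles of at most `limit` threads from `board`."""
    # islice stops pulling from the underlying iterable after `limit` items,
    # so a lazy source never has to produce the remaining threads.
    threads = islice(fourchan.get_threads(board), limit)
    return [thread.title for thread in threads]
```

Calling `first_thread_titles(fourchan, "b", 5)` would then look at no more than five threads.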

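Fetch Archived Threads

Note: some boards do not have an archive (e.g. /b/). Such boards will either return an empty list or raise an exception, depending on how you have configured your FourChan instance.

The threads returned by this function will always have a title field containing the text shown in 4chan's interface under the "Excerpt" column header. This text can be either the thread's real title or a preview of the original post's text. Passing any of the threads returned by this method to the get_posts() method will automatically correct the title field (if necessary) on the thread attached to the returned posts. See Fetch Posts for a Specific Thread for more details.

Technically, pychan could work around the title behavior described above by issuing an additional HTTP request for each thread to get its real title, but in the spirit of making as few HTTP requests as possible, pychan uses the excerpt directly.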

for thread in fourchan.get_archived_threads("pol"):
    # Do stuff with the thread
    print(thread.title)
    # You can also iterate over all the posts in the thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post - refer to the model documentation in pychan's README for details
        print(post.text)
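
If you need the real title rather than the excerpt, the corrected value can be read from the thread attached to any returned post. The sketch below shows one way to collect it; `corrected_archive_titles` is a hypothetical helper, and it relies only on the `get_archived_threads()`, `get_posts()`, `post.thread` and `thread.title` members described in this README. Expect it to cost extra HTTP requests, since it fetches posts for every thread it inspects.

```python
from itertools import islice
from typing import Dict, Optional


def corrected_archive_titles(fourchan, board: str, limit: int) -> Dict[int, Optional[str]]:
    """Map thread numbers to the titles found on the posts' copy of the thread."""
    titles: Dict[int, Optional[str]] = {}
    for thread in islice(fourchan.get_archived_threads(board), limit):
        # thread.title may still be an excerpt of the original post here.
        for post in fourchan.get_posts(thread):
            # post.thread is the copy whose title has been corrected if needed.
            titles[thread.number] = post.thread.title
            break  # the first post is enough to read the thread metadata
    return titles
```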

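Search 4chan

A Note About Cloudflare

Searching 4chan is more cumbersome than accessing the rest of 4chan's data. This is because 4chan has a Cloudflare firewall in front of its REST API, so the only way to get data back from a search is to provide the HTTP request information required to pass Cloudflare's anti-bot checks. Ultimately, this amounts to sending certain headers along with the HTTP request, but the challenge lies in actually obtaining those headers.

Generating these headers for you is currently outside the scope of pychan, so if you want to automate obtaining the Cloudflare values, you may want to consider using one of the following projects (this list is alphabetized and non-exhaustive):

ultrafunkamsterdam/undetected-chromedriver
VeNoMouS/cloudscraper
wkeeling/selenium-wire

The manual way to obtain these values is to perform a 4chan search in a web browser and use the browser's developer tools to inspect the network requests made during the search. The request containing the Cloudflare values will be made to https://find.4chan.org/api with some query parameters. Once you have found this request, copy the User-Agent and Cookie values sent in it and pass them to pychan's search() method. Note that Cloudflare cookies expire, so this manual workaround will only return results until Cloudflare invalidates your cookies. After that, you will need to obtain new values.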
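
When copying values by hand, the Cookie header arrives as one `name=value; name=value` string, whereas `search()` is shown taking a dictionary. The sketch below converts one into the other. `cookies_from_header` is a hypothetical helper, and keeping only `cf_clearance` by default is an assumption based on the example in this README; pass other names if your request needs them.

```python
from typing import Dict, Iterable


def cookies_from_header(
    cookie_header: str, wanted: Iterable[str] = ("cf_clearance",)
) -> Dict[str, str]:
    """Extract the named cookies from a raw Cookie header value."""
    wanted_names = set(wanted)
    cookies: Dict[str, str] = {}
    for pair in cookie_header.split(";"):
        # Split on the first "=" only, so values containing "=" survive intact.
        name, separator, value = pair.strip().partition("=")
        if separator and name in wanted_names:
            cookies[name] = value
    return cookies
```

The returned dictionary can then be passed as the `cloudflare_cookies` argument.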

Search 4chan Code Example

Note: closed, stickied, and archived threads are never returned in search results.

# This "threads" variable will contain a Python Generator (not a list) in order to facilitate laziness
threads = fourchan.search(
    board="b",
    text="ylyl",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    cloudflare_cookies={
        "cf_clearance": "bm2RICpcDeR4cXoC2nfI_cnZcbAkN4UYpN6c1zzeb8g-1440859602-0-160"
    },
)
for thread in threads:
    # The thread object is the same class as the one returned by get_threads()
    for post in fourchan.get_posts(thread):
        # Do stuff with the post - refer to the model documentation in pychan's README for details
        print(post.text)

Fetch Posts for a Specific Thread

from pychan.models import Thread

# Instantiate a Thread instance with which to query for posts
thread = Thread("int", 168484869)

# Note: the thread contained within the returned posts will have all applicable metadata (such as
# title and sticky status), regardless of whether you provided such data above - pychan will
# "auto-discover" all metadata and include it in the post models' copy of the thread
posts = fourchan.get_posts(thread)
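
If you start from a thread URL rather than a board and number, both values can be pulled out of the URL first. This is a minimal sketch: `parse_thread_url` is a hypothetical helper, and the URL pattern is inferred from the example URLs in this README rather than from any specification.

```python
import re
from typing import Tuple

# Pattern inferred from URLs such as https://boards.4chan.org/a/thread/251097344
_THREAD_URL = re.compile(
    r"^https?://boards\.4chan\.org/(?P<board>[a-z0-9]+)/thread/(?P<number>\d+)"
)


def parse_thread_url(url: str) -> Tuple[str, int]:
    """Return (board, thread number) for a 4chan thread or post URL."""
    match = _THREAD_URL.match(url)
    if match is None:
        raise ValueError(f"not a 4chan thread URL: {url!r}")
    return match.group("board"), int(match.group("number"))
```

The result can be unpacked straight into the constructor shown above, for example `Thread(*parse_thread_url(url))`.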

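pychan Models

The tables below summarize all of the data available on the various models used by this library.

Also note that all model classes in pychan implement the following methods: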

__repr__
__str__
__hash__
__eq__
__iter__ - this is implemented so that models can be passed to Python's tuple() function
__copy__
__deepcopy__
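
Since the models implement `__hash__` and `__eq__`, they can be used as dictionary keys or set members, which is handy when merging results from several calls. The sketch below removes duplicates while keeping the original order. `unique_in_order` is a hypothetical helper, and which fields pychan compares for equality is not stated here, so treat the exact notion of "duplicate" as an assumption.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def unique_in_order(models: Iterable[T]) -> List[T]:
    """Drop duplicate models, keeping the first occurrence of each."""
    # dict preserves insertion order, and its keys rely on __hash__/__eq__.
    return list(dict.fromkeys(models))
```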

Threads

The table below corresponds to the pychan.models.Thread class.

Field	Type	Example Value(s)
`thread.board`	`str`	`"b"`, `"int"`
`thread.number`	`int`	`882774935`, `168484869`
`thread.title`	`Optional[str]`	`None`, `"YLYL thread"`
`thread.is_stickied`	`bool`	`True`, `False`
`thread.is_closed`	`bool`	`True`, `False`
`thread.is_archived`	`bool`	`True`, `False`
`thread.url`	`str`	`"https://boards.4chan.org/a/thread/251097344"`

Posts

The table below corresponds to the pychan.models.Post class.

Field	Type	Example Value(s)
`post.thread`	`Thread`	`pychan.models.Thread`
`post.number`	`int`	`882774935`, `882774974`
`post.timestamp`	`datetime.datetime`	`datetime.datetime`
`post.poster`	`Poster`	`pychan.models.Poster`
`post.text`	`str`	`">be me\n>be bored\n>write pychan\n>somehow it works"`
`post.is_original_post`	`bool`	`True`, `False`
`post.file`	`Optional[File]`	`None`, `pychan.models.File`
`post.replies`	`list[Post]`	`[]`, `[pychan.models.Post, pychan.models.Post]`
`post.url`	`str`	`"https://boards.4chan.org/a/thread/251097344#p251097419"`

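A Note About Replies

The replies field shown above is purely a convenience feature provided by pychan for accessing all posts in a thread that "reply" to the current post using the >> operator. However, you do not need the replies field to access all of the posts in a thread. When you call the get_posts() method, you still receive every post (in the order in which they were posted) as a single flat list.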
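
As an illustration of combining the flat list with the replies field, the sketch below ranks posts by how many replies they received. `most_replied_posts` is a hypothetical helper that relies only on the `number` and `replies` fields from the table above.

```python
from typing import Iterable, List, Tuple


def most_replied_posts(posts: Iterable, top: int = 3) -> List[Tuple[int, int]]:
    """Return (post number, reply count) pairs, most replied-to first."""
    counts = [(post.number, len(post.replies)) for post in posts]
    # The sort is stable, so posts with equal counts keep their posting order.
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:top]
```

A call such as `most_replied_posts(fourchan.get_posts(thread))` would then give the three most replied-to posts in that thread.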

Posters

The table below corresponds to the pychan.models.Poster class.

Field	Type	Example Value(s)
`poster.name`	`str`	`"Anonymous"`
`poster.is_moderator`	`bool`	`True`, `False`
`poster.id`	`Optional[str]`	`None`, `"BYagKQXI"`
`poster.flag`	`Optional[str]`	`None`, `"United States"`, `"Canada"`
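
Because `poster.flag` is optional, code that aggregates over it has to skip `None`. Below is a small sketch that tallies posts per flag; `count_flags` is a hypothetical helper that uses only the `post.poster` and `poster.flag` fields described in this README.

```python
from collections import Counter
from typing import Iterable


def count_flags(posts: Iterable) -> Counter:
    """Count posts per poster flag, skipping posters without a flag."""
    return Counter(
        post.poster.flag for post in posts if post.poster.flag is not None
    )
```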

Files

The table below corresponds to the pychan.models.File class.

Field	Type	Example Value(s)
`file.url`	`str`	`"https://i.4cdn.org/pol/1658892700380132.jpg"`
`file.name`	`str`	`"wojak.jpg"`, `"i feel alone.jpg"`
`file.size`	`str`	`"601 KB"`
`file.dimensions`	`tuple[int, int]`	`(1920, 1080)`, `(800, 600)`
`file.is_spoiler`	`bool`	`True`, `False`
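
Since `file.size` is a display string rather than a number, comparing or summing sizes requires parsing it first. The following is a rough sketch, assuming the string always has the form `"<number> <unit>"` with units of B, KB or MB and 1024-based multiples; both the unit set and the factor are assumptions, and the result is only as precise as the rounded value shown. `file_size_in_bytes` is a hypothetical helper.

```python
# Assumed units and 1024-based factors; adjust if other units appear.
_UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2}


def file_size_in_bytes(size: str) -> int:
    """Convert a size string such as "601 KB" to an approximate byte count."""
    value, unit = size.split()
    factor = _UNIT_FACTORS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unknown size unit: {unit!r}")
    return round(float(value) * factor)
```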