GoogleImageScraper

1.0.0

Google Image Scraper

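This is a library for retrieving and downloading images from Google Images.
It takes an input query and arguments, then searches for and retrieves image objects. These images may be protected by copyright, and you should not do anything punishable with them, such as using them commercially. The library was inspired by hardikvasa's google-images-download, but adds some quality-of-life improvements, such as the ability to retrieve URLs as well. It would not have been possible without their work and that of the people who are striving to continue it.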

Arguments

Each of the main functions takes one required argument and two optional ones:

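Argument	Type	Description
query	str, list	A string or list containing the keywords to search for. If the query is a string, it is split into separate keywords. Required.
limit	int	The number of images to search for. Cannot be greater than 100. Defaults to 1.
arguments	dict	A dictionary of optional values, all of which are listed below. They fall into two categories: search arguments and download arguments.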
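
The rules in the table can be sketched as a small pre-check. This is only an illustration, not the library's own validation: `normalize_query` and `check_limit` are hypothetical helpers, splitting a string query on whitespace is an assumption, and so is the lower bound of 1 for the limit.

```python
from typing import List, Union

MAX_LIMIT = 100  # upper bound stated in the table above


def normalize_query(query: Union[str, List[str]]) -> List[str]:
    """Return the query as a list of keywords (hypothetical helper)."""
    if isinstance(query, str):
        # Assumption: a string query is split on whitespace.
        keywords = query.split()
    elif isinstance(query, list) and all(isinstance(k, str) for k in query):
        keywords = list(query)
    else:
        raise TypeError("query must be a str or a list of str")
    if not keywords:
        raise ValueError("query must contain at least one keyword")
    return keywords


def check_limit(limit: int = 1) -> int:
    """Return limit unchanged if it is an int between 1 and MAX_LIMIT."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise TypeError("limit must be an int")
    # Assumption: a limit below 1 is not meaningful.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return limit
```

Running such a check before calling the library lets a script fail early, without sending a request.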

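Download arguments

Argument	Type	Description
download_format	str	Specifies the file extension to download all images as. Must be a valid image file extension recognized by PIL. Note: this takes considerably longer for large numbers of images.
directory	str	Specifies the name of the directory to download the images to. Unless a directory or path is specified, one is created automatically.
path	str	Specifies the path in which the download directory is created.
timeout	int, float	Specifies the maximum time, in seconds, that the program will wait for a single image.
verbose	bool	Set to `True` to print progress updates to the console.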

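Search arguments

Argument	Accepted values	Description
color	'red', 'orange', 'yellow', 'green', 'teal', 'blue', 'purple', 'pink', 'white', 'gray', 'black', 'brown'	Filters images by dominant color.
color_type	'full', 'grayscale', 'transparent'	Filters images by color type: full color, grayscale, or transparent.
license	'creative_commons', 'other_licenses'	Filters images by usage license.
type	'face', 'photo', 'clipart', 'lineart', 'gif'	Filters by the type of image to search for. Not to be confused with search_format.
time	'past_day', 'past_week', 'past_month', 'past_year'	Only finds images posted within the specified time.
aspect_ratio	'tall', 'square', 'wide', 'panoramic'	Specifies the aspect ratio of the images.
search_format	'jpg', 'gif', 'png', 'bmp', 'svg', 'webp', 'ico', 'raw'	Filters out images that are not in the specified format. To download images as a specific format, use the 'download_format' argument instead.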
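
The accepted values above lend themselves to a quick check of an arguments dictionary before it is passed on. The sketch below is not part of the library: `find_invalid_search_arguments` is a hypothetical helper, and the key names and value spellings are assumptions taken from the table.

```python
# Accepted values copied from the search arguments table (spellings assumed).
ACCEPTED_SEARCH_VALUES = {
    "color": {"red", "orange", "yellow", "green", "teal", "blue",
              "purple", "pink", "white", "gray", "black", "brown"},
    "color_type": {"full", "grayscale", "transparent"},
    "license": {"creative_commons", "other_licenses"},
    "type": {"face", "photo", "clipart", "lineart", "gif"},
    "time": {"past_day", "past_week", "past_month", "past_year"},
    "aspect_ratio": {"tall", "square", "wide", "panoramic"},
    "search_format": {"jpg", "gif", "png", "bmp", "svg", "webp", "ico", "raw"},
}


def find_invalid_search_arguments(arguments: dict) -> dict:
    """Return {key: value} for each search argument with an unaccepted value."""
    invalid = {}
    for key, value in arguments.items():
        accepted = ACCEPTED_SEARCH_VALUES.get(key)
        if accepted is None:
            # Not a search argument (for example a download argument): skip it.
            continue
        if not isinstance(value, str) or value not in accepted:
            invalid[key] = value
    return invalid
```

For instance, `find_invalid_search_arguments({'color': 'magenta', 'download_format': 'png'})` reports only the `color` entry, because `download_format` is not a search argument.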

Usage

There are four available functions: download, urls, image_objects, and download_image. The last one works differently from the others.

download:

import GoogleImageScraper

images = GoogleImageScraper.download(query, limit, arguments)

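This downloads images according to the arguments. The returned value follows this format:

{'images': [images], 'errors': <number of errors>}

Each image in the images list also follows a specific format:

{'path': <image path>, 'url': <image URL>}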
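
A short sketch of how a caller might read that structure. The `sample` dictionary is made-up data in the documented shape, and `summarize_download` is a hypothetical helper, not a library function.

```python
from typing import Dict, List, Tuple


def summarize_download(result: Dict) -> Tuple[List[str], int]:
    """Return the saved file paths and the error count from a download result."""
    paths = [image["path"] for image in result["images"]]
    return paths, result["errors"]


# Made-up data in the documented shape; not real library output.
sample = {
    "images": [{"path": "images/dogs-0.png", "url": "https://example.com/dog.jpg"}],
    "errors": 0,
}

paths, errors = summarize_download(sample)
print(f"{len(paths)} image(s) saved, {errors} error(s)")
```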

urls:

import GoogleImageScraper

urls = GoogleImageScraper.urls(query, limit, arguments)

This function simply returns a list of image URLs for the search terms.
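
Because the result is a plain list of strings, it can be post-processed with the standard library. The sketch below keeps only URLs whose path ends in a wanted extension; `filter_by_extension` is a hypothetical helper, not part of the library.

```python
import posixpath
from typing import Iterable, List
from urllib.parse import urlparse


def filter_by_extension(urls: Iterable[str], extensions: Iterable[str]) -> List[str]:
    """Keep the URLs whose path ends in one of the given extensions."""
    wanted = {ext.lower() for ext in extensions}
    kept = []
    for url in urls:
        # Only the path is inspected, so a query string such as
        # "?width=465" does not hide the extension.
        _, ext = posixpath.splitext(urlparse(url).path)
        if ext.lower() in wanted:
            kept.append(url)
    return kept
```

Note that some image URLs carry no extension in their path at all; a filter like this would drop them.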

image_objects:

This function is somewhat niche, but may be useful to some. Instead of returning a list of image URLs as the urls function does, it returns a list of image objects containing useful data, as shown below:

{'url': <image URL>, 'thumbnail': <URL of the image thumbnail>, 'source_url': <the webpage the image was found on>, 'source': <the base URL of the source>}

Usage is similar to the previous functions:

import GoogleImageScraper

image_objects = GoogleImageScraper.image_objects(query, limit, arguments)
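
One possible use of the extra fields is grouping results by the site they came from. The sketch below assumes only the documented keys; `group_by_source` is a hypothetical helper and `sample_objects` is made-up data in the documented shape.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_source(image_objects: List[dict]) -> Dict[str, List[str]]:
    """Map each 'source' value to the image URLs found on that site."""
    grouped = defaultdict(list)
    for obj in image_objects:
        grouped[obj["source"]].append(obj["url"])
    return dict(grouped)


# Made-up objects in the documented shape; not real library output.
sample_objects = [
    {"url": "https://a.example/1.jpg", "thumbnail": "https://t.example/1",
     "source_url": "https://a.example/page", "source": "a.example"},
    {"url": "https://a.example/2.jpg", "thumbnail": "https://t.example/2",
     "source_url": "https://a.example/other", "source": "a.example"},
    {"url": "https://b.example/3.jpg", "thumbnail": "https://t.example/3",
     "source_url": "https://b.example/", "source": "b.example"},
]

grouped = group_by_source(sample_objects)
```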

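download_image:

Use this function to download an image by URL. It differs from the other functions in that it takes different input arguments, as shown below:

Argument	Type	Description
url	str	The URL to download the image from. Required.
name	str	The name of the file. Do not include the file extension. Required.
path	str	The path to download the image to.
download_format	str	The format to download the image as. Takes a while.
overwrite	bool	Whether to overwrite a file with the same name. Defaults to `True`. If `False`, raises `FileExistsError` if the file exists.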
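
The overwrite behaviour described in the table can be illustrated without touching the network. This is a sketch of the semantics only, not the library's implementation: `resolve_target` is a hypothetical helper, and joining the name and extension with a dot is an assumption.

```python
import os
import tempfile


def resolve_target(path: str, name: str, extension: str, overwrite: bool = True) -> str:
    """Build the target file path, refusing to clobber a file if overwrite is False."""
    target = os.path.join(path, f"{name}.{extension.lstrip('.')}")
    if not overwrite and os.path.exists(target):
        raise FileExistsError(f"{target} already exists")
    return target


refused = False
with tempfile.TemporaryDirectory() as folder:
    first = resolve_target(folder, "cat", "png")
    open(first, "wb").close()  # simulate an earlier download
    try:
        resolve_target(folder, "cat", "png", overwrite=False)
    except FileExistsError:
        refused = True  # the existing file is left untouched
```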

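Errors

There is a chance that you will not reach the number of images specified in the limit argument. This happens when an image fails to download, is not in an image format, or the request times out. It is most likely when downloading a large number of images. The 'errors' item in the dictionary returned by download is how you keep track of this. For example, if your limit is 100 and 3 images throw errors, you will get 97 images and the 'errors' item will be 3. However, if your limit is 20 and 3 images throw errors, you will still receive 20 items and the 'errors' item will be 0. This is because at most 100 URLs are available per search, so the closer your limit is to 100, the greater the chance of falling short of it.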
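
The two cases in that paragraph can be reproduced with a little arithmetic. The model below is an assumption made for illustration (a pool of 100 candidate URLs, with failed ones replaced by spare ones while any remain), not a description of the library's internals; `expected_outcome` is a hypothetical helper.

```python
from typing import Tuple

POOL_SIZE = 100  # assumption: at most 100 candidate URLs per search


def expected_outcome(limit: int, failures: int, pool_size: int = POOL_SIZE) -> Tuple[int, int]:
    """Return (images downloaded, reported errors) under the assumed model."""
    usable = max(pool_size - failures, 0)  # URLs that actually yield an image
    downloaded = min(limit, usable)        # spare URLs cover failures if any are left
    return downloaded, limit - downloaded


print(expected_outcome(100, 3))  # limit uses the whole pool, so failures show up
print(expected_outcome(20, 3))   # plenty of spare URLs, so the limit is still met
```

Under this model the first call gives (97, 3) and the second gives (20, 0), matching the figures in the text.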

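Included errors

Error	Description
`LimitError`	Raised when the limit argument is higher than 100 or is not an appropriate type.
`ArgumentError`	Raised when an invalid argument value is given.
`QueryError`	Raised if there is no query or the query is not an appropriate type.
`UnpackError`	Raised if no images could be found on the page.
`DownloadError`	Exclusive to the download_image function. Raised if the image fails to download.

Import them like this:

from GoogleImageScraper.errors import <error>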

Examples:

Some real examples are listed here:

urls:

import GoogleImageScraper
urls = GoogleImageScraper.urls(query='cats', limit=10, arguments={'color': 'black'})

Result:

[ 'https://www.rd.com/wp-content/uploads/2021/01/GettyImages-1175550351.jpg' , 
'https://www.history.com/.image/ar_4:3%2Cc_fill%2Ccs_srgb%2Cfl_progressive%2Cq_auto:good%2Cw_1200/MTg0NTEzNzgyNTMyNDE2OTk5/black-cat-gettyimages-901574784.jpg' , 
'https://www.thesprucepets.com/thmb/kF3_dQW_JT1ClMQDlISxq3BgeT4=/6843x5132/smart/filters:no_upscale()/facts-about-black-cats-554102-hero-7281a22d75584d448290c359780c2ead.jpg' , 
'https://i.guim.co.uk/img/media/c5e73ed8e8325d7e79babf8f1ebbd9adc0d95409/2_5_1754_1053/master/1754.jpg?width=465&quality=45&auto=format&fit=max&dpr=2&s=065f279099ded1062688e357b155dc29' , 
'https://cdn.cnn.com/cnnnext/dam/assets/141030105303-kiki-irpt.jpg' , 
'https://imagesvc.meredithcorp.io/v3/mm/image?url=https%3A%2F%2Fstatic.onecms.io%2Fwp-content%2Fuploads%2Fsites%2F34%2F2021%2F09%2F27%2Fblack-cat-kitchen-rug-getty-0921-2000.jpg' , 
'https://www.gannett-cdn.com/presto/2021/10/28/USAT/1bf79c6a-5d88-4e64-b398-c40418a79829-XXX_iStock_000017680551Large.jpg' ,
'https://cdn.sanity.io/images/0vv8moc6/dvm360/f28cc9b680aed62edd018ce47a5cbb96c4f78f3b-4860x3024.jpg' , 
'https://vbspca.com/wp-content/uploads/2019/10/Image-e1570199876255.jpeg' , 
'https://ichef.bbci.co.uk/news/976/cpsprodpb/AECE/production/_99805744_gettyimages-625757214.jpg' ]

download:

import GoogleImageScraper
images = GoogleImageScraper.download(query='dogs', limit=1, arguments={'color': 'brown', 'download_format': 'png'})

Result:

{'images': [{'path': '<path>\images\dogs-0.png', 'url': 'https://post.medicalnewstoday.com/wp-content/uploads/sites/3/2020/02/322868_1100-800x825.jpg'}], 'errors': 0}

image_objects:

import GoogleImageScraper
objects = GoogleImageScraper.image_objects(query='birds', limit=1, arguments={'color': 'yellow'})

Result:

[{ 'thumbnail' : 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQwDI5y3_n2rwFQLZKrBXs5VL_J38zlZVvdZAooD8F8d7lY8ZA9iLEb1-AoBBWpGftpdoc&usqp=CAU' , 'url' : 'https://www.sfvaudubon.org/wp-content/uploads/2020/03/YEWAcrop.jpg' , 'source_url' : 'https://www.sfvaudubon.org/sfv-backyard-bird-identification/' , 'source' : 'sfvaudubon.org' }, { 'thumbnail' : 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR1k5IhGCAPgU468tyPrgkuY9WC3T83zRxzFrTOOUs0OL_kanPG8VPKXV3euijAlzW9AsE&usqp=CAU' , 'url' : 'https://ca.audubon.org/sites/default/files/styles/article_teaser/public/yellowwarbler_peter_latourrette.jpg?itok=PFRtxcGN' , 'source_url' : 'https://ca.audubon.org/birds-0' , 'source' : 'ca.audubon.org' }]