SerpScraper
v4.0.1
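The purpose of this library is to provide an easy, undetectable, and captcha-resistant way to extract search results from popular search engines such as Google and Bing.
The recommended way to install it is via Composer: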
composer require athlon1600/serpscraper "^4.0"

<?php
use SerpScraper\Engine\GoogleSearch;

$page = 1;
$google = new GoogleSearch();

// all available preferences for Google
$google->setPreference('results_per_page', 100);
//$google->setPreference('google_domain', 'google.lt');
//$google->setPreference('date_range', 'hour');

$results = array();

do {
    $response = $google->search("how to scrape google", $page);

    // error field must be empty otherwise query failed
    if (empty($response->error)) {
        $results = array_merge($results, $response->results);
        $page++;
    } else if ($response->error == 'captcha') {
        // read below
        break;
    }
} while ($response->has_next_page);

Solving Google captchas automatically requires an account with the 2captcha.com service and an API key. Using a proxy server is also strongly recommended.
To install a private proxy server on your own VPS, see:
https://github.com/athlon1600/useful#squid
<?php
use SerpScraper\Engine\GoogleSearch;
use SerpScraper\GoogleCaptchaSolver;

$google = new GoogleSearch();

// route all requests through the proxy
$browser = $google->getBrowser();
$browser->setProxy('PROXY:IP');

$solver = new GoogleCaptchaSolver($browser);

while (true) {
    $response = $google->search('famous people born in ' . mt_rand(1500, 2020));

    if ($response->error == 'captcha') {
        echo "Captcha detected!" . PHP_EOL;

        // third argument (90) is passed through unchanged from the original example
        $temp = $solver->solveUsingTwoCaptcha($response, '2CAPTCHA_API_KEY', 90);

        if ($temp->status == 200) {
            echo "Captcha solved successfully!" . PHP_EOL;
        } else {
            echo 'Solving captcha has failed...' . PHP_EOL;
        }
    } else {
        echo "OK." . PHP_EOL;
    }

    sleep(2);
}

<?php
use SerpScraper\Engine\BingSearch;

$bing = new BingSearch();
$results = array();

// fetch up to 9 pages, stopping early when there are no more results
for ($page = 1; $page < 10; $page++) {
    $response = $bing->search("search bing using php", $page);

    if ($response->error == false) {
        $results = array_merge($results, $response->results);
    }

    if ($response->has_next_page == false) {
        break;
    }
}

var_dump($results);
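
The Google and Bing examples above share the same pagination pattern: request a page, check `error`, merge `results`, and stop when `has_next_page` is false. The sketch below pulls that pattern into one helper with a hard page limit, so that a response that keeps reporting an error cannot loop forever. `collectResults` and `$fakeSearch` are hypothetical names introduced for illustration and are not part of SerpScraper. The only assumption about the response is that it exposes the three fields used in the examples above.

```php
<?php
// Hypothetical helper, NOT part of SerpScraper.
// $search is any callable that takes a page number and returns an object
// with the `error`, `results` and `has_next_page` fields shown above.
function collectResults(callable $search, int $maxPages = 10): array
{
    $results = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        $response = $search($page);

        // Stop on any error (captcha or otherwise) rather than
        // requesting the same page again and again.
        if (!empty($response->error)) {
            break;
        }

        $results = array_merge($results, $response->results);

        if (empty($response->has_next_page)) {
            break;
        }
    }

    return $results;
}

// Stand-in for a real search engine: three pages, two results each.
$fakeSearch = function (int $page): object {
    return (object) [
        'error' => '',
        'results' => ["result-{$page}-a", "result-{$page}-b"],
        'has_next_page' => $page < 3,
    ];
};

$all = collectResults($fakeSearch);
echo count($all) . PHP_EOL; // prints 6
```

With the library itself, the callable would presumably wrap the `search` call from the examples, for instance `function ($page) use ($google) { return $google->search('how to scrape google', $page); }`. Captcha handling is left out on purpose: the helper simply returns what it has collected so far, and the caller decides whether to solve the captcha and resume.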