Web Scraping with phantomjs: Implementation Code

Author: Eve Cole  Updated: 2025-05-16 11:48:02

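phantomjs is a headless browser that can run JavaScript, so it can also query the DOM nodes of a loaded page. That makes it well suited to web scraping.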

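For example, suppose we want to scrape, in bulk, the contents of the "Today in History" (歷史上的今天) page. Website: http://www.todayonhistory.com/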

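Inspecting the DOM structure shows that we only need the title value of every element matching .list li a. So we use that selector with querySelectorAll to collect the nodes and build the output string: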

    var d = '';
    var c = document.querySelectorAll('.list li a'); // every link in the list
    var l = c.length;
    for (var i = 0; i < l; i++) {
        d = d + c[i].title + '\n'; // one title per line
    }

After that, all that remains is to get this JS code running inside phantomjs.

    var page = require('webpage').create();
    page.open('http://www.todayonhistory.com/', function (status) { // open the page
        if (status !== 'success') {
            console.log('FAIL to load the address');
        } else {
            // evaluate() runs the function inside the page and returns its result
            console.log(page.evaluate(function () {
                var d = '';
                var c = document.querySelectorAll('.list li a');
                var l = c.length;
                for (var i = 0; i < l; i++) {
                    d = d + c[i].title + '\n';
                }
                return d;
            }));
        }
        phantom.exit(); // always exit, otherwise phantomjs keeps running
    });

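Finally, we save the script as catch.js, run it from the command prompt (DOS), and redirect the output to a txt file (the file could also be written with phantomjs's file API).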
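The file API route mentioned above might look roughly like the sketch below. It is only an illustration, not the author's script: the output name history.txt and the helper toFileText are assumptions made up for this example, and the phantomjs part is guarded so that it only runs when the phantom global exists.

```javascript
// Sketch of a catch.js variant that writes the titles with phantomjs's fs module.
// OUTPUT_FILE is a hypothetical file name chosen for this example.
var OUTPUT_FILE = 'history.txt';

// Hypothetical helper: turns a list of titles into file text, one title per line.
function toFileText(titles) {
    return titles.join('\n') + '\n';
}

// The `phantom` global only exists when the script is run by phantomjs.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var page = require('webpage').create();

    page.open('http://www.todayonhistory.com/', function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
            phantom.exit(1);
            return;
        }
        // evaluate() runs inside the page; an array of strings is returned as-is.
        var titles = page.evaluate(function () {
            var links = document.querySelectorAll('.list li a');
            var result = [];
            for (var i = 0; i < links.length; i++) {
                result.push(links[i].title);
            }
            return result;
        });
        // 'w' overwrites the file if it already exists.
        fs.write(OUTPUT_FILE, toFileText(titles), 'w');
        phantom.exit();
    });
}
```

Running phantomjs catch.js with this variant writes the file directly. With the original console.log version, the same effect comes from shell redirection, for example phantomjs catch.js > history.txt.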