This article introduces the tutorial sharing of batch crawling a list page. Let’s take a look at the detailed tutorial below. Friends who need it can refer to it.
Some people are a treasure when crawling programs, but they are still selling them for TND. It’s really true that these guys are! Maybe the following thing is a bit bad
The following does not have a write-to-store function, and it has reached this point. The entry function is very simple. Please complete it yourself if you need it. Please improve other functions by yourself! Copy the code and run it directly to see the effect
Dim Url,List_PageCode,Array_ArticleID,i,ArticleID
Dim Content_PageCode,Content_TempCode
Dim Content_CategoryID,Content_CategoryName,BorderID,ClassID,BorderName,ClassName
Dim ArticleTitle,ArticleAuthor,ArticleFrom,ArticleContent
Url = "http://www.webasp.net/article/class/1.htm"
List_PageCode = getHTTPPage(Url)
List_PageCode = RegExpText(List_PageCode,"Print","
List_PageCode = RegExpText(List_PageCode," 'Get the article link of the current list page to separate it
Array_ArticleID = Split(List_PageCode,",") 'Create an array and store the article ID
For i=0 To Ubound(Array_ArticleID)-1
ArticleID = Array_ArrayID(i) 'ArrayID
Content_PageCode = getHTTPPage("http://www.webasp.net/article/"&ArticleID) 'Get the content of the article page
'=================================================
Content_TempCode = RegExpText(Content_PageCode,"Technical Tutorial >> ",">> Content",0)
Content_CategoryID = RegExpText(Content_PageCode,"",1)
BorderID = Split(Content_CategoryID,",")(0) 'Big Class ID
ClassID = Split(Content_CategoryID,",")(1) 'Subclass ID
'============== Check whether the major category exists. Start=================
'If it does not exist, enter the database
'============== Check whether the major category exists End==================
'Response.Write(BorderID & "," & ClassID & "
")
Content_CategoryName = RegExpText(Content_PageCode,"/'>","",1)
BorderName = Split(Content_CategoryName,",")(0) 'Big class name
ClassName = Split(Content_CategoryName,",")(1) 'Subclass name
'============== Check whether the subclass exists Start=================
'If it does not exist, enter the database
'============== Check whether the subclass exists End=================
'===================================================
'========================================================
ArticleTitle = RegExpText(Content_PageCode," "," ",0)
ArticleAuthor = RegExpText(Content_PageCode," Author:","",0)
ArticleFrom = RegExpText(Content_PageCode," Source:","",0)
ArticleContent = RegExpText(Content_PageCode,"",""&VBCrlf&" "&VBCrlf&" ",0)
'==========================================================
Response.Write(ArticleTitle& "
")
Response.Flush()
Next
Several functions are attached:
Function getHTTPPage(url)IF(IsObjInstalled("Microsoft.XMLHTTP") = False)THEN
Response.Write "
The server does not support Microsoft.XMLHTTP component"
Err.Clear
Response.End
END IF
On Error Resume Next
Dim http
SET http=Server.CreateObject("Msxml2.XMLHTTP")
Http.open "GET",url,False
Http.send()
IF(Http.readystate4)THEN
Exit Function
END IF
getHTTPPage=BytesToBSTR(Http.responseBody,"GB2312")
SET http=NOTHING
IF(Err.number0)THEN
Response.Write "
An error occurred when obtaining the file contents"
'Response.End
Err.Clear
END IF
End Function
Function BytesToBstr(CodeBody,CodeSet)
Dim objStream
SET objStream = Server.CreateObject("adodb.stream")
objStream.Type = 1
objStream.Mode =3
objStream.Open
objStream.Write CodeBody
objStream.Position = 0
objStream.Type = 2
objStream.Charset = CodeSet
BytesToBstr = objStream.ReadText
objStream.Close
SET objStream = NOTHING
End Function
'==================================================================
'Function: Check whether the component has been installed
'Return value: True ----installed
' False ---Not installed
'==================================================================
Function IsObjInstalled(objName)
On Error Resume Next
IsObjInstalled = False
Err = 0
Dim testObj
SET testObj = Server.CreateObject(objName)
IF(0 = Err)THEN IsObjInstalled = True
SET testObj = NOTHING
Err = 0
End Function
Function RegExpText(strng,strStart,strEnd,n)
Dim regEx, Match, Matches, RetStr
SET regEx = New RegExp
regEx.Pattern = strStart&"([/s/S]*?)"&strEnd
regEx.IgnoreCase = True
regEx.Global = True
SET Matches = regEx.Execute(strng)
For Each Match in Matches
IF(n=1)THEN
RetStr = RetStr & regEx.Replace(Match.Value,"$1") & ","
ELSE
RetStr = RetStr & regEx.Replace(Match.Value,"$1")
END IF
Next
RegExpText = RetStr
SET regEx=NOTHING
The above is the introduction of the entire content of the tutorial sharing of batch crawling a certain list page. I hope the relevant knowledge and materials compiled by the editor will be helpful to you. For more content, please continue to pay attention to the website of the Wuxin Technology Channel!