A Tutorial on Batch-Crawling a List Page

Author: Eve Cole  Updated: 2025-08-28 00:48:01

This article shares a tutorial on batch-crawling the articles linked from a list page. The details follow below; refer to them if you need them.

Some people treat a crawling program as a treasure and even charge money for it. Really! The code below may be a little rough.

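The code below does not write anything to a database; it stops once the fields have been extracted. The insert step is simple, so add it yourself if you need it, and improve the other functions as you see fit. Copy the code and run it directly to see the result.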

' Note: "..." marks a string literal whose HTML markup was lost from the published
' listing. Replace each one with the markup that delimits that field on the target page.
' The text literals ("print", "author :", and so on) must likewise match the page's own wording.
Dim Url, List_PageCode, Array_ArticleID, i, ArticleID
Dim Content_PageCode, Content_TempCode
Dim Content_CategoryID, Content_CategoryName, BorderID, ClassID, BorderName, ClassName
Dim ArticleTitle, ArticleAuthor, ArticleFrom, ArticleContent

Url = "http://www.webasp.net/article/class/1.htm"
List_PageCode = gethttppage(Url)
' Narrow the page to the block that holds the article links (n = 0 is assumed here)
List_PageCode = regexptext(List_PageCode, "print", "...", 0)
' Extract the article links on the current list page as a comma-separated string
List_PageCode = regexptext(List_PageCode, "...", "...", 1)
Array_ArticleID = Split(List_PageCode, ",") ' build an array that stores the article IDs

' With n = 1, regexptext leaves a trailing comma, so the last array element is empty and is skipped
For i = 0 To UBound(Array_ArticleID) - 1
    ArticleID = Array_ArticleID(i) ' article ID
    Content_PageCode = gethttppage("http://www.webasp.net/article/" & ArticleID) ' fetch the article page

    '=============== Article category IDs and names ===============
    Content_TempCode = regexptext(Content_PageCode, "Technical Tutorials >>", ">> Content", 0)
    Content_CategoryID = regexptext(Content_PageCode, "...", "...", 1)
    BorderID = Split(Content_CategoryID, ",")(0) ' main category ID
    ClassID = Split(Content_CategoryID, ",")(1) ' subcategory ID
    '=============== Check whether the main category exists: start ===============
    ' If it does not exist, insert it into the database
    '=============== Check whether the main category exists: end ===============
    'Response.Write(BorderID & "," & ClassID)
    Content_CategoryName = regexptext(Content_PageCode, "/'>", "...", 1)
    BorderName = Split(Content_CategoryName, ",")(0) ' main category name
    ClassName = Split(Content_CategoryName, ",")(1) ' subcategory name
    '=============== Check whether the subcategory exists: start ===============
    ' If it does not exist, insert it into the database
    '=============== Check whether the subcategory exists: end ===============

    '=============== Article fields ===============
    ArticleTitle = regexptext(Content_PageCode, "...", "...", 0)
    ArticleAuthor = regexptext(Content_PageCode, "author :", "...", 0)
    ArticleFrom = regexptext(Content_PageCode, "source :", "...", 0)
    ArticleContent = regexptext(Content_PageCode, "...", "..." & vbCrLf & "..." & vbCrLf & "...", 0)
    '===============================================

    Response.Write(ArticleTitle & "...")
    Response.Flush()
Next
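The article leaves the database insert to the reader. As one possible starting point, here is a minimal sketch of that step using a parameterized ADO command. Everything specific in it is an assumption rather than part of the original code: the helper name SaveArticle, the table name Article, its column names, and the column sizes are all hypothetical and must be adapted to your own schema.

```vbscript
' Hypothetical helper: inserts one crawled article with a parameterized query.
' conn is an already opened ADODB.Connection. The table and column names
' below are placeholders, not taken from the original article.
Sub SaveArticle(conn, title, author, source, content)
    Dim cmd
    Set cmd = CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = conn
    cmd.CommandType = 1    ' adCmdText
    cmd.CommandText = "INSERT INTO Article (Title, Author, Source, Content) VALUES (?, ?, ?, ?)"
    ' 202 = adVarWChar, 203 = adLongVarWChar, 1 = adParamInput
    cmd.Parameters.Append cmd.CreateParameter("Title", 202, 1, 255, Left(title, 255))
    cmd.Parameters.Append cmd.CreateParameter("Author", 202, 1, 100, Left(author, 100))
    cmd.Parameters.Append cmd.CreateParameter("Source", 202, 1, 100, Left(source, 100))
    ' content & "" coerces an Empty value (no match found) to an empty string
    cmd.Parameters.Append cmd.CreateParameter("Content", 203, 1, Len(content) + 1, content & "")
    cmd.Execute , , 128    ' adExecuteNoRecords: no recordset is returned
    Set cmd = Nothing
End Sub
```

Inside the loop, once the four fields have been extracted, a call such as SaveArticle conn, ArticleTitle, ArticleAuthor, ArticleFrom, ArticleContent would store the row. Passing the values as parameters rather than concatenating them into the SQL text keeps quotes in the crawled content from breaking the statement.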

The helper functions used above are attached below.

Function gethttppage(url)
    If (isobjinstalled("Microsoft.XMLHTTP") = False) Then
        Response.Write "The server does not support the Microsoft.XMLHTTP component."
        Err.Clear
        Response.End
    End If
    On Error Resume Next
    Dim http
    Set http = Server.CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False
    http.Send()
    If (http.readyState <> 4) Then
        Exit Function
    End If
    ' Decode the raw response bytes as GB2312
    gethttppage = bytestobstr(http.responseBody, "GB2312")
    Set http = Nothing
    If (Err.Number <> 0) Then
        Response.Write "An error occurred while retrieving the file content."
        'Response.End
        Err.Clear
    End If
End Function

Function bytestobstr(codebody, codeset)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    objstream.Type = 1 ' binary
    objstream.Mode = 3 ' read/write
    objstream.Open
    objstream.Write codebody
    objstream.Position = 0
    objstream.Type = 2 ' text
    objstream.Charset = codeset
    bytestobstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function
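The fetch-and-decode pair above returns whatever the server sends, including error pages. The following is a minimal sketch of a stricter variant, not the author's method: the name FetchPage, the timeout values, and the choice to return an empty string on failure are all assumptions. It uses MSXML2.ServerXMLHTTP, the MSXML component intended for server-side requests, instead of MSXML2.XMLHTTP, and it checks the HTTP status before decoding.

```vbscript
' Hypothetical variant of the fetch helper: returns the decoded page text,
' or an empty string when the request fails or the status is not 200.
Function FetchPage(url, charset)
    Dim http, stream
    FetchPage = ""
    On Error Resume Next
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    ' Timeouts in milliseconds: resolve, connect, send, receive (arbitrary values)
    http.setTimeouts 5000, 5000, 10000, 30000
    http.open "GET", url, False
    http.send
    If Err.Number <> 0 Then
        Err.Clear
        Exit Function
    End If
    On Error GoTo 0
    If http.status <> 200 Then Exit Function

    ' Decode the raw bytes with the given charset via ADODB.Stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1        ' adTypeBinary
    stream.Open
    stream.Write http.responseBody
    stream.Position = 0
    stream.Type = 2        ' adTypeText
    stream.Charset = charset
    FetchPage = stream.ReadText
    stream.Close
    Set stream = Nothing
    Set http = Nothing
End Function
```

A caller would then skip an article when FetchPage(url, "GB2312") returns an empty string instead of running the extraction on an error page.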

'================================================================================
' Purpose: check whether a component is installed
' Returns: True  - installed
'          False - not installed
'================================================================================
Function isobjinstalled(objname)
    On Error Resume Next
    isobjinstalled = False
    Err = 0
    Dim testobj
    Set testobj = Server.CreateObject(objname)
    If (0 = Err) Then isobjinstalled = True
    Set testobj = Nothing
    Err = 0
End Function

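' Returns the text found between strstart and strend in strng.
' n = 1: every match is followed by a comma, ready for Split
' otherwise: the matches are concatenated as they are
Function regexptext(strng, strstart, strend, n)
    Dim regex, match, matches, retstr
    Set regex = New RegExp
    regex.Pattern = strstart & "([\s\S]*?)" & strend
    regex.IgnoreCase = True
    regex.Global = True
    Set matches = regex.Execute(strng)
    For Each match In matches
        If (n = 1) Then
            retstr = retstr & regex.Replace(match.Value, "$1") & ","
        Else
            retstr = retstr & regex.Replace(match.Value, "$1")
        End If
    Next
    regexptext = retstr
    Set regex = Nothing
End Function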
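The extraction function joins its matches with commas, which is why the calling loop has to skip the empty element after the last comma. The following sketch shows the same delimiter-based extraction returning an array directly. It is an illustration under assumptions, not the original code: the name ExtractAll and the sample HTML are hypothetical, and it reads the captured group through the match's SubMatches collection instead of calling Replace with "$1".

```vbscript
' Hypothetical helper: returns the text between strStart and strEnd,
' one array element per match, so no trailing separator has to be skipped.
Function ExtractAll(text, strStart, strEnd)
    Dim re, m, matches, items(), idx
    Set re = New RegExp
    re.Pattern = strStart & "([\s\S]*?)" & strEnd
    re.IgnoreCase = True
    re.Global = True
    Set matches = re.Execute(text)
    If matches.Count = 0 Then
        ExtractAll = Array()   ' empty array, UBound is -1
        Exit Function
    End If
    ReDim items(matches.Count - 1)
    idx = 0
    For Each m In matches
        items(idx) = m.SubMatches(0)   ' first captured group
        idx = idx + 1
    Next
    ExtractAll = items
End Function

' Usage under Windows Script Host (cscript); in an ASP page use Response.Write instead.
' The HTML below is made-up sample data.
Dim html, ids, id
html = "<a href=""../101.htm"">A</a><a href=""../102.htm"">B</a>"
ids = ExtractAll(html, "<a href=""\.\./", """")
For Each id In ids
    WScript.Echo id   ' prints 101.htm, then 102.htm
Next
```

Because the delimiters are inserted into the pattern unchanged, any regular-expression metacharacters in them (such as the dots in "../") must be escaped by the caller, exactly as with the original function.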

That is the complete tutorial on batch-crawling a list page. I hope the material collected here is helpful.