A brief discussion of the principles behind an ASP collection (scraping) program

Author：Eve Cole Update Time：2025-04-12 04:16:02

The main steps of the collection procedure are as follows:

1. Obtain the content of the collected page

2. Extract the required data from the retrieved page source

1. Obtain the content of the collected page

The commonly used methods I know of for fetching a page in ASP are:

1. Use the ServerXMLHTTP component to obtain data

The following content is the program code:

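Function GetBody(weburl)
    ' Create the HTTP request object
    Dim ObjXMLHTTP
    Set ObjXMLHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
    ' Request the page; the third argument False makes the call synchronous
    ObjXMLHTTP.Open "GET", weburl, False
    ObjXMLHTTP.send
    ' With a synchronous request, send has already completed, so this loop is only a safeguard
    ' (the waitForResponse timeout is given in seconds)
    While ObjXMLHTTP.readyState <> 4
        ObjXMLHTTP.waitForResponse 1000
    Wend
    ' Get the result as a byte array
    GetBody = ObjXMLHTTP.responseBody
    ' Release the object
    Set ObjXMLHTTP = Nothing
End Function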

Usage:

GetBody(URL of the page to fetch)

2. Use the XMLHTTP component to obtain data

The following content is the program code:

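Function GetBody(weburl)
    ' Create the HTTP request object
    Dim Retrieval
    Set Retrieval = CreateObject("Microsoft.XMLHTTP")
    With Retrieval
        ' Synchronous GET with empty user name and password
        .Open "GET", weburl, False, "", ""
        .Send
        ' Get the result as a byte array
        GetBody = .ResponseBody
    End With
    ' Release the object
    Set Retrieval = Nothing
End Function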

Usage:

GetBody(URL of the page to fetch)

Both functions return responseBody, which is a raw byte array. It must be converted to a string using the page's character encoding before it can be used.

The following content is the program code:

Function BytesToBstr(body, Cset)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    objstream.Type = 1          ' 1 = binary: write the raw bytes
    objstream.Mode = 3          ' 3 = read/write
    objstream.Open
    objstream.Write body
    objstream.Position = 0      ' Rewind before reading
    objstream.Type = 2          ' 2 = text: read the bytes back as text
    objstream.Charset = Cset    ' Decode with the given character set
    BytesToBstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function

Usage: BytesToBstr(data to be converted, character set). Commonly used character sets are GB2312 and UTF-8.
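The fetch-then-decode idea above does not depend on ASP. As a rough illustration only, here is the same two-step flow sketched in Python with the standard library; the names `get_body` and `bytes_to_str` are hypothetical stand-ins for GetBody and BytesToBstr, not part of the original code.

```python
import urllib.request


def get_body(url, timeout=10):
    """Return the raw response bytes for url, without decoding them."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def bytes_to_str(body, charset):
    """Decode raw bytes with the given charset.

    Bytes that are invalid in that charset are replaced instead of
    raising, so a wrong charset shows up as garbled text.
    """
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    raw = "采集".encode("gb2312")       # stands in for a fetched GB2312 page
    print(bytes_to_str(raw, "gb2312"))
```

Decoding with the wrong character set does not fail loudly here; it produces garbled text, which is the usual symptom when a GB2312 page is read as UTF-8 or the other way round.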

2. Extract the required data from the retrieved page source

The methods I know are:

1. Use VBScript's built-in Mid function to cut out the required data

The following content is the program code:

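Function body(wstr, start, over)
    ' start: the unique mark where the required data begins
    ' Newstring is a helper that is not shown in this article; it is assumed to return
    ' the position of the mark within wstr, as InStr(wstr, start) would
    start = Newstring(wstr, start)
    ' over: the unique mark where the required data ends
    over = Newstring(wstr, over)
    ' Cut out the text from the start position up to the end position
    body = Mid(wstr, start, over - start)
End Function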

Usage: body(content of the collected page, start mark, end mark)

2. Use a regular expression to obtain the required data
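To make the start-mark/end-mark idea concrete, here is a minimal Python sketch of the same slicing technique. The function name `between` is hypothetical. Unlike the Mid call above, which starts cutting at the position of the start mark, this version assumes you want only the text after the start mark, and it returns an empty string when either mark is missing.

```python
def between(text, start_mark, end_mark):
    """Return the text between the first start_mark and the next end_mark.

    Returns "" when either mark cannot be found.
    """
    begin = text.find(start_mark)
    if begin == -1:
        return ""
    begin += len(start_mark)            # skip past the start mark itself
    end = text.find(end_mark, begin)    # look for the end mark after it
    if end == -1:
        return ""
    return text[begin:end]


if __name__ == "__main__":
    print(between("<h1>News</h1>", "<h1>", "</h1>"))
```

Checking for missing marks matters in practice: a page whose layout has changed should yield an empty result rather than an arbitrary slice of the source.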

The following content is the program code:

Function body(wstr, start, over)
    Dim xiaoqi, Matches, Match
    Set xiaoqi = New RegExp                  ' Create the regular expression object
    xiaoqi.IgnoreCase = True                 ' Ignore case
    xiaoqi.Global = True                     ' Search the whole text
    xiaoqi.Pattern = start & ".+?" & over    ' Lazy match from the start mark to the end mark
    Set Matches = xiaoqi.Execute(wstr)       ' Run the search
    Set xiaoqi = Nothing
    body = ""
    For Each Match In Matches
        body = body & Match.Value            ' Append every match
    Next
End Function

Usage: body(content of the collected page, start mark, end mark)

Collection procedure and detailed approach:

1. Obtain the address of every page in the site's paginated list
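The lazy-match technique can be tried outside ASP as well. The Python sketch below is illustrative only, with the hypothetical name `find_all_between`. It assumes the marks are literal text, so it escapes them before building the pattern, and it returns the captured pieces as a list instead of concatenating them.

```python
import re


def find_all_between(text, start_mark, end_mark):
    """Return every piece of text found between start_mark and end_mark.

    The marks are treated as literal text (re.escape), matching is
    case-insensitive, and .+? is lazy so each match stops at the
    nearest end_mark.
    """
    pattern = re.compile(
        re.escape(start_mark) + r"(.+?)" + re.escape(end_mark),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.findall(text)


if __name__ == "__main__":
    print(find_all_between("<li>a</li><LI>b</LI>", "<li>", "</li>"))
```

Escaping the marks is a design choice: HTML fragments often contain characters such as `?`, `.` or `(` that would otherwise be read as regular expression syntax.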

Currently, most dynamic websites have rules for paging addresses, such as:

Dynamic page

Page 1: index.asp?page=1

Page 2: index.asp?page=2

Page 3: index.asp?page=3

.....

Static page

Page 1: page_1.htm

Page 2: page_2.htm

Page 3: page_3.htm

.....

To get the address of every list page, replace the part of the address that changes from page to page with a variable, for example: page_<%=page%>.htm
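As a small illustration of turning the changing part of the address into a variable, the following Python sketch builds the list-page addresses from a template. The function name `list_page_urls` and the `{page}` placeholder are assumptions made for this example.

```python
def list_page_urls(template, first, last):
    """Build list-page addresses by substituting the page number.

    template must contain a {page} placeholder; first and last are
    inclusive page numbers.
    """
    return [template.format(page=n) for n in range(first, last + 1)]


if __name__ == "__main__":
    for url in list_page_urls("index.asp?page={page}", 1, 3):
        print(url)
```

The same helper covers both the dynamic form (index.asp?page={page}) and the static form (page_{page}.htm).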

2. Obtain the content of each list page of the site being collected

3. Extract the URLs of the content pages from the list page source

In most list pages, the links to content pages also follow a fixed pattern, such as:

Link 1

Link 2

Link 3

Use the following code to collect the link URLs

The following content is the program code:

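Set xiaoqi = New RegExp
xiaoqi.IgnoreCase = True
xiaoqi.Global = True
' Place the fixed text that surrounds each link URL before and after .+?
xiaoqi.Pattern = ".+?"
' pageListContent holds the source of the list page
Set Matches = xiaoqi.Execute(pageListContent)
Set xiaoqi = Nothing
url = ""
For Each Match In Matches
    url = url & Match.Value
Next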
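For comparison, here is a hedged Python sketch of collecting content-page links from a list page. It assumes the links are ordinary quoted href attributes, which may not match every site, and it resolves relative addresses against the list page URL. The name `extract_links` and the example.com addresses are made up for the example.

```python
import re
from urllib.parse import urljoin

# Assumes links look like href="..." or href='...'
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_links(list_html, list_url):
    """Return absolute link URLs found in list_html, without duplicates.

    Relative addresses are resolved against list_url, and the order in
    which links first appear is kept.
    """
    absolute = (urljoin(list_url, href) for href in HREF_RE.findall(list_html))
    return list(dict.fromkeys(absolute))


if __name__ == "__main__":
    html = '<a href="show.asp?id=1">Link 1</a> <a href="show.asp?id=2">Link 2</a>'
    for link in extract_links(html, "http://example.com/list/index.asp?page=1"):
        print(link)
```

Returning a list of separate URLs, rather than one concatenated string, makes it straightforward to loop over the content pages in the next step.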

4. Obtain each content page and extract the required data from it using the start and end marks.

Because the pages are generated dynamically, most content pages share the same HTML tags, so each part of the required content can be extracted based on these recurring tags.

For example:

Each page has a title enclosed in <title> and </title> tags. You can extract the text between them with the Mid-based function above, or with a regular expression.

Example: body(content of the collected page, "<title>", "</title>")
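Putting the four steps together, the overall flow can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the article's ASP implementation: `collect`, `extract_title`, and the `fetch` callable are hypothetical names, `fetch` is expected to return already decoded page text, and only the page title is extracted from each content page.

```python
import re


def extract_title(html):
    """Return the text inside <title>...</title>, or "" if there is none."""
    m = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else ""


def collect(list_urls, fetch, link_pattern):
    """Walk the list pages, follow each content link, and collect titles.

    fetch is any callable mapping a URL to decoded page text.
    link_pattern is a regular expression with one group that captures
    a content-page URL. Returns {content_url: title}.
    """
    results = {}
    for list_url in list_urls:
        for content_url in re.findall(link_pattern, fetch(list_url)):
            if content_url not in results:      # skip pages already collected
                results[content_url] = extract_title(fetch(content_url))
    return results


if __name__ == "__main__":
    # In-memory stand-in for a website, so the sketch runs without a network
    pages = {
        "list1": '<a href="c1">Link 1</a><a href="c2">Link 2</a>',
        "c1": "<html><title>First</title></html>",
        "c2": "<html><title>Second</title></html>",
    }
    print(collect(["list1"], pages.__getitem__, r'href="([^"]+)"'))
```

Passing the fetch function in as a parameter keeps the extraction logic testable against saved pages before it is pointed at a live site.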
