From the collection principles described earlier, you can see that most collection programs work by analyzing rules, such as the naming rule of the paging files and the structure of the page code.
1. Preventing collection based on paging file names
Most collectors rely on analyzing the naming rule of the paging files to collect many pages in a batch. If others cannot work out the naming rule of your paging files, they cannot batch-collect multiple pages of your website.
Implementation method:
I think hashing the paging file names with MD5 is a good approach. Some people will object that if you hash the paging file names with MD5, others can reproduce your rule and compute the file names themselves.
The point is that when you hash the paging file name, you should not hash only the part of the name that changes.
If I is the page number, do not generate the name like this: page_name=Md5(I,16)&".htm"
Instead, append one or more characters to the page number before hashing, for example: page_name=Md5(I&"any one or several letters", 16)&".htm"
Because MD5 is a one-way hash and cannot be reversed, others see only the resulting digest in the file name. The collector cannot tell which letters you appended after I unless he brute-forces the MD5 input, which is impractical as long as the appended string is long and random enough.
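The salted file name idea above can be sketched in Python. This is a minimal illustration, not the original ASP code: the secret suffix and the function name are hypothetical, and taking the first 16 hex characters is only an assumption about what Md5(..., 16) returns.

```python
import hashlib

# Hypothetical secret suffix; keep it server-side, long, and random.
SECRET_SUFFIX = "k3Qz-not-the-real-secret"


def page_name(page_number: int, secret: str = SECRET_SUFFIX) -> str:
    """Return an unpredictable file name for the given page number."""
    # Hash the page number together with the secret, never the number alone.
    data = f"{page_number}{secret}".encode("utf-8")
    digest = hashlib.md5(data).hexdigest()
    # Assumption: a 16-character name is wanted, so truncate the 32-char digest.
    return digest[:16] + ".htm"
```

Calling page_name(1), page_name(2), and so on yields names with no visible pattern, while the server can always recompute the name for any page number.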
2. Preventing collection based on page code rules
If our content pages follow no consistent code pattern, others cannot extract the pieces of content they want from the code. So the step we need to take here is to make the code free of regular patterns.
Implementation method:
Randomize the markers that the other party relies on to extract content.
1. Create several page templates in which the important HTML tags differ, and pick a template at random when rendering the page content. Some pages can be laid out with CSS+DIV and others with tables. This method is somewhat laborious, since each content page needs several extra templates. But anti-collection is tedious work by nature, and for many people the protection that extra templates provide is worth the effort.
2. If the method above is too much trouble, just randomize the important HTML tags within the page.
The more templates you make and the more random the HTML code is, the harder it is for the other party to analyze the content code and write a collection rule for your website. At that point most people will give up, since people who collect from other sites tend to be lazy to begin with. Besides, most people currently use collection programs developed by others; only a few write their own.
Here are a few more simple ideas:
1. Use client-side scripts to display content that matters to data collectors but not to search engines.
2. Split one page of data across N pages, which also makes collection more difficult.
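As a rough Python sketch of these two ideas: the first function splits a list of paragraphs into at most N pages, and the second emits an HTML fragment whose text is stored Base64-encoded and filled in by a client-side script. The function names and the element id are hypothetical, and Base64 is only light obfuscation against simple pattern-matching collectors, not real protection.

```python
import base64
import math


def split_into_pages(paragraphs: list, n_pages: int) -> list:
    """Split paragraphs into at most n_pages chunks of roughly equal size."""
    if n_pages < 1:
        raise ValueError("n_pages must be at least 1")
    size = max(1, math.ceil(len(paragraphs) / n_pages))
    return [paragraphs[i:i + size] for i in range(0, len(paragraphs), size)]


def client_side_snippet(text: str, element_id: str = "protected") -> str:
    """Return HTML that shows `text` only after the browser runs the script."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return (
        f'<span id="{element_id}"></span>\n'
        "<script>\n"
        # Decode Base64 to bytes, then bytes to a UTF-8 string, in the browser.
        f'  var bytes = Uint8Array.from(atob("{encoded}"), '
        "function (c) { return c.charCodeAt(0); });\n"
        f'  document.getElementById("{element_id}").textContent = '
        "new TextDecoder().decode(bytes);\n"
        "</script>"
    )
```

A collector that only downloads the HTML and matches patterns in it sees the encoded string rather than the text. Clients that do not execute scripts see nothing, so this suits only content you do not need search engines to index.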