learning_spider
1.0.0
This project is divided into three main parts.
| Difficulty | Content | Type | Method | Key challenge |
|---|---|---|---|---|
| Basics | Simple usage of various libraries | Basic usage | Read the documentation and write demos | |
| Beginner | Maoyan Movies Top 100 ranking | Static web page | Requests | |
| | Amazon China search results page | Static web page | Requests | |
| | Toutiao search results | Dynamic web page | Requests | |
| | Weibo mobile: user posts | Dynamic web page | Requests | Locating the `since_id` parameter |
| | Bilibili observer sees the same index | Dynamic web page | Requests | Restoring compressed index data |
| | The simplest slider CAPTCHA | Dynamic web page | Selenium | Moving the slider |
| Easy | Password encryption used by a certain router | Single JS file | Locating the encryption function | |
| | Handling infinite `debugger` loops | Dynamic web page | ReRes | Anti-debugging |
| | AAEncode decoding | Dynamic web page | DevTools | Encoding-based obfuscation |
| | CSS absolute-positioning anti-scraping | Static web page | pyppeteer | Restoring element order |
| | CSS pseudo-class anti-scraping | Static web page | Requests | Restoring pseudo-class content |
| | 58.com brand apartments | Static web page | Requests | Static font encryption |
| | Anjuke fingerprinting study | Single JS file | DevTools | Understanding what the collected information is used for |
| Medium | Zhihu article information | Dynamic web page | Requests | Encryption of the `x-zse-86` header parameter; time-based anti-debugging |
| | China_cn font encryption | Dynamic web page | fontTools | Dynamic font encryption |
| | Baidu obfuscated code processing | Single JS file | @babel | Writing various deobfuscation plugins |
| | Jiasule obfuscated code processing | Blocking page that sets cookies | @babel | Restoring OB-obfuscated code |
| Hard | Carbosynch captures a picture | Simple TLS fingerprint | Modifying the default security component configuration | Understanding TLS |
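The "Restoring element order" challenge for the absolute-positioning exercise boils down to reading each fragment's horizontal coordinate and sorting by it. The sketch below is a minimal illustration, not the project's solution: it assumes every fragment is a `<span>` carrying an inline `left: Npx` style (real pages may use CSS classes, `top` offsets, or negative margins instead), and the names `PositionedSpanParser` and `restore_order` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches "left: 12px" but not "padding-left: 12px" (assumed inline-style layout).
LEFT_RE = re.compile(r"(?<![\w-])left\s*:\s*(-?\d+(?:\.\d+)?)px")


class PositionedSpanParser(HTMLParser):
    """Collect (left offset, text) pairs from absolutely positioned <span> tags."""

    def __init__(self):
        super().__init__()
        self.items = []            # list of (left_px, text) pairs
        self._current_left = None  # left offset of the span currently open

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        style = dict(attrs).get("style") or ""
        match = LEFT_RE.search(style)
        self._current_left = float(match.group(1)) if match else None

    def handle_data(self, data):
        # Only keep text that sits inside a span with a known left offset.
        if self._current_left is not None and data.strip():
            self.items.append((self._current_left, data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_left = None


def restore_order(html: str) -> str:
    """Join the span texts in on-screen (left-to-right) order."""
    parser = PositionedSpanParser()
    parser.feed(html)
    return "".join(text for _, text in sorted(parser.items, key=lambda item: item[0]))
```

For the pyppeteer variant of the exercise, the rendered page's HTML could be passed to `restore_order` in the same way, as long as the positions end up in inline styles.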
Website URL (recorded): http://learnspider.evilreclose.top/
| Type | Difficulty | Name | Description |
|---|---|---|---|
| Slider verification | Beginner | The simplest slider verification | Just drag the slider all the way to the end to pass; there is no detection at all |
| | Easy | SliderCaptcha | Deployed with default settings; basic human-machine checks are present, so dragging at a constant speed or along a linear track will not pass |
| CSS anti-scraping | Beginner | Absolute-positioning anti-scraping | Exploits absolute positioning: the data is scattered through the HTML, and the visible order is restored by coordinates |
| | Easy | Pseudo-class anti-scraping | Exploits the fact that a pseudo-class's `content` can display data, so part of the data is shown only through `content` |
| | Medium | Static font encryption | Some Unicode characters are rendered with a custom font, so anyone decoding them with standard Unicode cannot scrape the data; the font does not change within a single visit |
| JS anti-scraping | Beginner | Anti-debugging | Uses timer-triggered or nested `debugger` statements to keep the browser stuck in a debugging state it cannot exit |
| | Easy | Disable debugging | Code that prevents the browser console from being opened |
| | Easy | AAEncode | Replaces ordinary characters with emoticon-like characters, making the code hard to read |
| | Easy | JSFuck | Rewrites code using only six basic characters (`[]()!+`), making it hard to read |
| Data encryption | Medium | AES symmetric encryption | Encrypts the transmitted data |
| | Medium | Custom Base64 alphabet encryption | Encodes the transmitted data with Base64 using a custom alphabet |
| Fingerprint anti-scraping | Easy | The simplest Selenium detection | Checks for two variables that Selenium creates automatically |
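A custom Base64 alphabet only permutes the 64 output symbols, so once the site's alphabet has been found in its JavaScript, the data can be decoded by translating each symbol back to the standard alphabet and running an ordinary Base64 decode. This is a minimal sketch under two assumptions: the site keeps `=` as the padding character, and the reversed alphabet in the usage example is purely hypothetical. `custom_b64decode` and `custom_b64encode` are illustrative names.

```python
import base64

STANDARD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _check_alphabet(alphabet: str) -> None:
    # A valid substitute alphabet must be a permutation of 64 distinct symbols.
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct characters")


def custom_b64decode(data: str, custom_alphabet: str) -> bytes:
    """Decode Base64 text produced with a custom 64-character alphabet."""
    _check_alphabet(custom_alphabet)
    # Map each custom symbol back to the standard symbol at the same index;
    # "=" padding is left untouched (assumed unchanged by the site).
    table = str.maketrans(custom_alphabet, STANDARD_ALPHABET)
    return base64.b64decode(data.translate(table))


def custom_b64encode(raw: bytes, custom_alphabet: str) -> str:
    """Inverse operation: encode normally, then swap in the custom alphabet."""
    _check_alphabet(custom_alphabet)
    table = str.maketrans(STANDARD_ALPHABET, custom_alphabet)
    return base64.b64encode(raw).decode("ascii").translate(table)


if __name__ == "__main__":
    alphabet = STANDARD_ALPHABET[::-1]  # hypothetical site alphabet
    token = custom_b64encode(b"learning_spider", alphabet)
    print(token, custom_b64decode(token, alphabet))
```

Because only the symbol table changes, the same translate-then-decode step works for any alphabet recovered from the page.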
| Purpose | Technology | Description |
|---|---|---|
| Specification | REST | Standard API, standard responses |
| CDN | bootcdn.cn | Free CDN acceleration service for open-source front-end projects |
| Front end | jQuery 2.2.4 | A fast and concise JavaScript library |
| | Materialize | Responsive front-end framework based on Material Design |
| | Twitter Bootstrap 3.4.1 | Open-source front-end development toolkit from Twitter |
| | Font Awesome 4.7.0 | Icon font library and CSS toolkit |
| | metisMenu 3.0.6 | Vanilla-JS collapsible menu plugin |
| Proxy server | nginx | High-performance HTTP and reverse proxy server |
| Web server | uWSGI | Application server for Python web apps |
| Back end | Flask 1.1.2 | Lightweight Python web framework |
| | Flask-RESTful 0.3.8 | Flask extension for quickly building REST APIs |
Tools/Scripts
| Name | Description |
|---|---|
| Auto DL ChromeWebDriver | A Windows script that reads the installed Chrome version from the registry and downloads the best-matching ChromeDriver from Google so that Selenium runs correctly. (In practice, it is more advisable to deploy Docker on a server, pull the Selenium image, and call it remotely.) |
| Slother | A wrapper layer over Selenium that handles common problems encountered when scraping with Selenium |
| @babel/traverse API documentation | Self-written documentation and use cases for the @babel/traverse API, now moved to another repository. Because Babel does not officially document @babel/traverse, the content is based on my own reading of the source code and may contain errors; corrections are welcome. |
| Font Encryption Detective | OCR-based font decryption script |
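The "download the best-matching version" step of Auto DL ChromeWebDriver needs some rule for choosing among published driver versions. The sketch below shows one plausible heuristic, not necessarily the one the script uses: keep only drivers with the same major version as Chrome, prefer the newest one that is not newer than Chrome, and otherwise fall back to the oldest newer one. Reading the version from the registry (Windows-only `winreg`) and the actual download are left out, and the version list in the usage example is hypothetical input.

```python
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '96.0.4664.45' into (96, 0, 4664, 45) for numeric comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def pick_driver_version(chrome_version: str, available: Iterable[str]) -> Optional[str]:
    """Pick the driver version that best matches the installed Chrome.

    Returns None when no driver shares Chrome's major version.
    """
    chrome = parse_version(chrome_version)
    # ChromeDriver releases track Chrome's major version, so filter on it first.
    same_major = [v for v in available if parse_version(v)[0] == chrome[0]]
    if not same_major:
        return None
    not_newer = [v for v in same_major if parse_version(v) <= chrome]
    if not_newer:
        return max(not_newer, key=parse_version)
    return min(same_major, key=parse_version)


if __name__ == "__main__":
    published = ["95.0.4638.69", "96.0.4664.18", "96.0.4664.35", "97.0.4692.20"]
    print(pick_driver_version("96.0.4664.45", published))
```

Comparing parsed integer tuples instead of raw strings avoids ordering mistakes such as "96.0.4664.9" sorting after "96.0.4664.35".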
November 7, 2021