Our project needed car brand and car series information, so yesterday I spent a day learning how to crawl website data with jsoup. The project is built with Maven + Spring + Spring MVC + MyBatis.
jsoup development guide:
Website to crawl: https://car.autohome.com.cn/zhaoche/pinpai/
1. First, add the dependencies to pom.xml
Because I need to save the images locally, I also added the commons-net package.
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-net/commons-net -->
<dependency>
    <groupId>commons-net</groupId>
    <artifactId>commons-net</artifactId>
    <version>3.3</version>
</dependency>
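The controller in the next step downloads each logo with java.net.URL and java.io streams and never calls commons-net, so the save step can also be done with the JDK alone. Below is a minimal sketch, assuming Java 7 or later: it resolves an img src (including protocol-relative values such as "//host/logo.png") against the page URL with java.net.URI, takes the last path segment as the file name, and writes the bytes with Files.copy. The class and method names (LogoSaver, resolveImageUri, fileNameOf, save) and the example.invalid host are hypothetical placeholders, not values taken from the page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class: a JDK-only sketch of "save the logo image locally".
public class LogoSaver {

    // Resolves an img src (absolute, relative, or protocol-relative "//host/a.png")
    // against the page URL; a protocol-relative src inherits the page's scheme.
    static URI resolveImageUri(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src);
    }

    // Uses the last path segment of the image URI as the local file name.
    static String fileNameOf(URI imageUri) {
        String path = imageUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Copies the stream into dir/fileName, creating dir if needed and
    // replacing an existing file; the input stream is closed afterwards.
    static Path save(InputStream in, Path dir, String fileName) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileName);
        try (InputStream is = in) {
            Files.copy(is, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        URI logo = resolveImageUri("https://car.autohome.com.cn/zhaoche/pinpai/",
                "//example.invalid/logos/brand.png");
        System.out.println(logo);             // https://example.invalid/logos/brand.png
        System.out.println(fileNameOf(logo)); // brand.png
        // An in-memory stream stands in for the network stream so the sketch runs offline.
        Path dir = Files.createTempDirectory("imgs");
        Path saved = save(new ByteArrayInputStream("demo".getBytes(StandardCharsets.UTF_8)),
                dir, fileNameOf(logo));
        System.out.println(Files.size(saved)); // 4
    }
}
```

In the crawler itself, the stream would come from something like logo.toURL().openStream(), with Paths.get(saveImgPath) as the target directory.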
2. Crawler implementation
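When a brand has several joint venture manufacturers, the controller that follows assumes the manufacturer container's children alternate between a manufacturer heading (even index j) and that manufacturer's series list (odd index j). Stripped of jsoup, that pairing can be sketched in plain Java as below. The input lists are hypothetical stand-ins for the text extracted from those child elements, and the alternating layout is the controller's assumption about the page, not something verified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class illustrating the even/odd pairing used by the crawler.
public class SeriesGrouping {

    // Even index = manufacturer heading (first string is its name),
    // odd index = that manufacturer's series names. A trailing heading
    // with no list after it maps to an empty list. Insertion order is kept.
    static Map<String, List<String>> group(List<List<String>> blocks) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int j = 0; j < blocks.size(); j += 2) {
            String manufacturer = blocks.get(j).get(0);
            List<String> series = j + 1 < blocks.size()
                    ? new ArrayList<>(blocks.get(j + 1))
                    : new ArrayList<String>();
            result.put(manufacturer, series);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data shaped like one brand with two manufacturers.
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("Manufacturer A"),
                Arrays.asList("Series A1", "Series A2"),
                Arrays.asList("Manufacturer B"),
                Arrays.asList("Series B1"));
        // Prints {Manufacturer A=[Series A1, Series A2], Manufacturer B=[Series B1]}
        System.out.println(group(blocks));
    }
}
```

Collecting into a LinkedHashMap keeps the manufacturers in page order, which is convenient if the results are later inserted into the database rather than only printed.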
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
@RequestMapping("/car/")
public class CarController {
    // Directory where the downloaded logo images are saved
    private static final String saveImgPath = "C://imgs";

    /**
     * @Title: insert  Crawls brand names and logo images
     * @return void
     * @throws IOException
     * @date January 29, 2018, 4:42:57 PM
     */
    @RequestMapping("add")
    public void insert() throws IOException {
        // Address of the page to crawl
        String url = "https://car.autohome.com.cn/zhaoche/pinpai/";
        // Fetch and parse the page
        Document doc = Jsoup.connect(url).get();
        // Get the brand containers by class name
        Elements elementsByClass = doc.getElementsByClass("uibox-con");
        // Make sure the image directory exists before writing files into it
        new File(saveImgPath).mkdirs();
        for (Element element : elementsByClass) {
            // Number of child ELEMENTS; childNodeSize() would also count text nodes,
            // which child(i) skips, and could run past the last element
            int childNodeSize_1 = element.children().size();
            for (int i = 0; i < childNodeSize_1; i++) {
                // Car logo image address
                String tupian = element.child(i).child(0).child(0).child(0).child(0).attr("src");
                // Brand name
                String pinpai = element.child(i).child(0).child(1).text();
                // Print the values to check that they are correct
                System.out.println("Car logo image address---------" + tupian);
                System.out.println("Brand name---------" + pinpai);
                // Assumption: src may be protocol-relative ("//..."); java.net.URL needs a scheme
                String tupian_1 = tupian.startsWith("//") ? "https:" + tupian : tupian;
                URL url1 = new URL(tupian_1);
                URLConnection uri = url1.openConnection();
                // Local file name = last path segment of the image URL
                String imageName = tupian.substring(tupian.lastIndexOf("/") + 1);
                // Copy the image stream to a local file; try-with-resources closes both streams
                try (InputStream is = uri.getInputStream();
                     OutputStream os = new FileOutputStream(new File(saveImgPath, imageName))) {
                    byte[] buf = new byte[1024];
                    int p;
                    while ((p = is.read(buf)) != -1) {
                        os.write(buf, 0, p);
                    }
                }
                /*
                 * A brand may have several joint venture manufacturers,
                 * e.g. FAW-Volkswagen, SAIC Volkswagen and imported Volkswagen,
                 * so we loop to get each manufacturer's name and its car series.
                 */
                // Number of car series
                int childNodeSize_2 = element.child(i).child(1).child(0).children().size();
                // Number of child elements under this tag; 1 means there are no other joint venture manufacturers
                int childNodeSize_3 = element.child(i).child(1).children().size();
                if (childNodeSize_3 == 1) {
                    // Get each car series
                    for (int j = 0; j < childNodeSize_2; j++) {
                        String chexi = element.child(i).child(1).child(0).child(j).child(0).child(0).child(0).text();
                        System.out.println("Car series---------------" + chexi);
                    }
                } else {
                    // childNodeSize_3 > 1: several joint venture manufacturers,
                    // so get the car series under each one
                    for (int j = 0; j < childNodeSize_3; j++) {
                        int childNodeSize_4 = element.child(i).child(1).child(j).children().size();
                        // Even j: joint venture manufacturer name; odd j: its car series list
                        if (j % 2 == 0) {
                            String hezipinpai = element.child(i).child(1).child(j).child(0).text();
                            System.out.println("Joint venture name-------------" + hezipinpai);
                        } else {
                            // Get each car series of this joint venture manufacturer
                            for (int l = 0; l < childNodeSize_4; l++) {
                                String chexi = element.child(i).child(1).child(j).child(l).child(0).child(0).text();
                                System.out.println("Car series-------------" + chexi);
                            }
                        }
                    }
                    System.out.println("*********************");
                    System.out.println("***************************");
                }
            }
        }
    }
}

3. Running results
That is all for this article. I hope it helps with your learning, and please continue to support Wulin.com.