This article looks at how to handle connection timeouts in a Java web crawler.
Web crawlers frequently fail with the following error, which means the attempt to open a TCP connection to the server timed out:

```
Exception in thread "main" java.net.ConnectException: Connection timed out: connect
```

The usual remedy has two parts: allow more time both for establishing the connection and for waiting on the response, and, if a timeout still occurs, send the request again, up to a fixed number of retries.
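The retry half of this remedy does not depend on any HTTP library. Below is a minimal plain-Java sketch, assuming the request can be wrapped in a `Callable`; the class `RetrySketch`, the method `callWithRetries`, and the choice to retry only on `ConnectException` and `SocketTimeoutException` are illustrative assumptions, not part of the original program.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper class for illustration; not from the original article.
public class RetrySketch {

    /**
     * Runs task, retrying up to maxRetries extra times when it fails with a
     * connection failure or a socket timeout. Any other exception is rethrown at once.
     */
    public static <T> T callWithRetries(Callable<T> task, int maxRetries, long pauseMillis)
            throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return task.call();
            } catch (ConnectException | SocketTimeoutException e) {
                attempt++;
                if (attempt > maxRetries) {
                    throw e; // retry budget exhausted: surface the last failure
                }
                Thread.sleep(pauseMillis); // short pause before re-requesting
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated request that times out twice, then succeeds.
        int[] calls = {0};
        String body = callWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "<html>ok</html>";
        }, 5, 10);
        System.out.println(body + " after " + calls[0] + " attempts");
    }
}
```

Catching only timeout-type exceptions keeps genuine errors (a malformed URL, a parsing bug) from being retried pointlessly; in a real crawler the `Callable` would wrap the HTTP call itself.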
The following sample program uses Apache HttpClient to apply both measures: it sets the connection and socket timeouts and registers a retry handler.
```java
package daili;

import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

/*
 * Author: College of Management, Hefei University of Technology
 */
// Needs Apache HttpClient 4.3+ (DefaultHttpClient gained close() there);
// DefaultHttpClient and the HttpParams setters are deprecated but still work.
public class Test1 {

    public static void main(String[] args)
            throws ClientProtocolException, IOException, InterruptedException {
        getRawHTML("http://club.autohome.com.cn/bbs/forum-c-2098-1.html#pvareaid=103447");
    }

    public static String getRawHTML(String url)
            throws ClientProtocolException, IOException, InterruptedException {
        // Initialize the client with browser-compatible cookie handling
        DefaultHttpClient httpclient = new DefaultHttpClient();
        httpclient.getParams().setParameter("http.protocol.cookie-policy",
                CookiePolicy.BROWSER_COMPATIBILITY);

        // Timeouts: 6 s to establish the connection, 120 s of socket inactivity
        HttpParams params = httpclient.getParams();
        HttpConnectionParams.setConnectionTimeout(params, 6000);
        HttpConnectionParams.setSoTimeout(params, 6000 * 20);

        // Re-request on timeouts, up to 5 retries. DefaultHttpRequestRetryHandler(5, true)
        // would not retry here: it treats ConnectException and InterruptedIOException
        // (the parent of both timeout exceptions) as non-retriable.
        httpclient.setHttpRequestRetryHandler(new HttpRequestRetryHandler() {
            @Override
            public boolean retryRequest(IOException exception, int executionCount,
                                        HttpContext context) {
                return executionCount <= 5
                        && (exception instanceof ConnectException
                            || exception instanceof ConnectTimeoutException
                            || exception instanceof SocketTimeoutException);
            }
        });

        // Keep cookies set by the server for requests made with this context
        HttpContext localContext = new BasicHttpContext();
        localContext.setAttribute(ClientContext.COOKIE_STORE, new BasicCookieStore());

        HttpGet request = new HttpGet();
        request.setURI(URI.create(url));
        // User-Agent is a request header, not a cookie
        request.setHeader("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");

        String rawHTML = "";
        try {
            HttpResponse response = httpclient.execute(request, localContext);
            int statusCode = response.getStatusLine().getStatusCode(); // response status code
            System.out.println(statusCode);
            if (statusCode == 200) { // 200 means the request succeeded
                // Read the entity content; toString also releases the entity
                rawHTML = EntityUtils.toString(response.getEntity());
                System.out.println(rawHTML);
            } else {
                EntityUtils.consume(response.getEntity()); // release the entity's stream
                Thread.sleep(20 * 60 * 1000);              // on an error status, pause for 20 minutes
            }
        } finally {
            httpclient.close();
        }
        return rawHTML;
    }
}
```
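On Java 11 and later, the JDK's own `java.net.http` client can express the same two timeouts without the deprecated `DefaultHttpClient`. The sketch below is an alternative under stated assumptions, not the author's code: it copies the 6 s connect timeout, the 120 s value, and the User-Agent string from the sample, while the class `JdkTimeoutSketch` and its methods are hypothetical. Note that `HttpRequest.Builder.timeout` is documented as a limit on how long to wait for the response, which is not the same as `setSoTimeout`'s limit on each idle gap between socket reads.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Hypothetical helper class for illustration; not from the original article.
public class JdkTimeoutSketch {

    // Same values as the HttpClient 4.x sample: 6000 ms and 6000 * 20 ms.
    public static final Duration CONNECT_TIMEOUT = Duration.ofMillis(6000);
    public static final Duration REQUEST_TIMEOUT = Duration.ofMillis(6000 * 20);
    public static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

    public static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT) // limit for establishing the connection
                .build();
    }

    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(REQUEST_TIMEOUT)        // limit for receiving the response
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }

    /** Sends a GET, retrying up to maxRetries times on connect failures and timeouts. */
    public static String fetch(HttpClient client, String url, int maxRetries)
            throws IOException, InterruptedException {
        HttpRequest request = buildRequest(url);
        for (int attempt = 0; ; attempt++) {
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                // Like the sample, return the body only for status 200
                return response.statusCode() == 200 ? response.body() : "";
            } catch (HttpTimeoutException | ConnectException e) {
                // HttpTimeoutException also covers HttpConnectTimeoutException
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }
}
```

Calling `JdkTimeoutSketch.fetch(JdkTimeoutSketch.buildClient(), url, 5)` would mirror the sample's five retries; the 20-minute pause on an error status is left out of this sketch.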
Summary
That covers the code for handling connection timeouts in a Java web crawler. I hope it helps.