For work, I recently looked into converting HTML to PDF. The key difficulty in HTML-to-PDF conversion is handling the complex CSS styles used in real web pages. After researching online, I found that current HTML-to-PDF solutions fall into three main categories:
Client mode: invoke an external program and let it perform the PDF conversion. Tools tested: wkhtmltopdf and PhantomJS.
Java library mode: Java code parses the CSS styles and translates the HTML file into a PDF file. Libraries tested: iText, Flying Saucer, and PD4ML.
JavaScript front-end mode: JavaScript running in the browser converts the HTML page for the PDF. Library tested: html2canvas.
Taking the solutions described online and the needs of an actual project, I analyzed each of them in turn in terms of performance and functionality.
1. Test page introduction
According to the descriptions of each solution online, all of the above schemes handle simple HTML styles and ordinary table styles when converting to PDF. However, real business pages are more demanding, so this test deliberately uses Bootstrap (v3.3.6) CSS and also applies new CSS3 features. I wrote a static HTML page using these features. The page looks like this in the browser:
2. wkhtmltopdf test
wkhtmltopdf is a tool built on the WebKit rendering engine that converts HTML into PDF. It can be integrated with many scripting languages for document conversion. Official website: http://wkhtmltopdf.org/
Technical features: wkhtmltopdf converts a web page, as it would appear in a browser, directly into a PDF. It is a standalone program that must be installed on the server. From Java code, you can invoke it as a command-line process to convert a web page to PDF.
Function test: enter the test command directly in CMD to watch the conversion progress.
First argument: the path to wkhtmltopdf.exe
Second argument: the HTML page to convert to PDF
Third argument: the path and file name of the output PDF
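Since the post mentions calling wkhtmltopdf from Java through the command line, here is a minimal sketch of that call using the JDK's `ProcessBuilder`. It assumes wkhtmltopdf is installed and that the command takes exactly the three arguments listed above: executable path, HTML page, and output PDF. The class and method names are illustrative and are not part of any library. `List.of` requires Java 9 or later.

```java
import java.io.IOException;
import java.util.List;

class WkhtmltopdfRunner {

    // Argument order follows the CMD test: executable, source page (URL or file), output PDF.
    static List<String> buildCommand(String exePath, String htmlSource, String pdfPath) {
        return List.of(exePath, htmlSource, pdfPath);
    }

    // Runs wkhtmltopdf and returns its exit code; 0 normally means the PDF was written.
    static int convert(String exePath, String htmlSource, String pdfPath)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(exePath, htmlSource, pdfPath));
        // Reuse this JVM's console so wkhtmltopdf's progress output stays visible,
        // and so an unread output pipe can never fill up and stall the child process.
        builder.inheritIO();
        Process process = builder.start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list, rather than as one command string, avoids shell quoting problems when paths contain spaces. A call would look like `WkhtmltopdfRunner.convert("C:\\wkhtmltopdf\\bin\\wkhtmltopdf.exe", "http://localhost:8080/test.html", "D:\\test.pdf")`. These paths are hypothetical.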
The exported PDF looks like this:
Test description:
Overall, wkhtmltopdf supports Bootstrap's CSS well. New CSS3 features such as circular images are not well supported, and some page styles are lost. ECharts charts are not supported: the export reports an error. However, ECharts provides an interface that converts a chart into an image, so a chart can still be included in the PDF by using that image's address.
3. PhantomJS test
PhantomJS is a headless browser based on the WebKit engine. It has no UI, but it is a full browser whose actions, such as clicking and page turning, are driven by a program. It provides a JavaScript API, so JS code can interact directly with the WebKit engine. On top of that, it can be combined with languages such as Java, which call the JS operations. This removes the earlier limitation that high-quality WebKit-based crawlers had to be developed in C/C++. It also provides packages for different operating systems such as Windows, Linux, and Mac, so it can be used for crawling or automated testing on different platforms. Official website: http://phantomjs.org/
PhantomJS can be used for web page analysis and has many functions. For this test, only its web page screenshot function was used. The test in CMD is as follows:
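PhantomJS can be driven from Java in the same way. The sketch below assumes the `rasterize.js` example script that ships with PhantomJS, which takes a page URL and an output file name and is expected to write a PDF when the name ends in `.pdf`. The script path, class name, and timeout are illustrative assumptions. A headless browser can hang on a page that never finishes loading, so the sketch limits the wait with `Process.waitFor(long, TimeUnit)`.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

class PhantomJsRunner {

    // Assumed argument order for rasterize.js: phantomjs <script> <url> <output file>.
    static List<String> buildCommand(String phantomExe, String scriptPath, String url, String outputFile) {
        return List.of(phantomExe, scriptPath, url, outputFile);
    }

    // Returns true only if PhantomJS exits with code 0 within the given number of seconds.
    static boolean render(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // A page that never finishes loading would otherwise block the caller forever.
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

For example, `PhantomJsRunner.render(PhantomJsRunner.buildCommand("phantomjs", "rasterize.js", "http://localhost:8080/test.html", "test.pdf"), 60)` would give the page one minute to render. The URL and time limit are hypothetical.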
The exported test page looks like this:
Test description:
PhantomJS supports Bootstrap's styles. New CSS3 features such as circular images are not well supported, and some page styles are lost. ECharts charts, however, can be exported directly. The result is as follows:
4. iText and Flying Saucer
iText implements HTML-to-PDF conversion. It is fast but has poor error tolerance. It supports Chinese (the HTML must use Unicode encoding), but only with a single Chinese font. It is open source. Flying Saucer also implements HTML-to-PDF conversion with poor error tolerance. It supports several Chinese fonts, though some styles cannot be recognized. It is open source.
Technical features: these libraries parse and apply the page's CSS in Java, and currently they only handle fairly simple pages and styles. Compatibility with CSS3 and complex CSS is extremely poor, and processing is slow when the page content is long. Reference: https://code.google.com/archive/p/flying-saucer/
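Poor error tolerance is easy to run into in practice. Flying Saucer renders XHTML, so markup that browsers accept, such as an unclosed `<br>`, can make it fail. One cheap safeguard is to check that the page is well-formed XML before passing it to the library. This sketch does that with only the JDK's built-in XML parser. The class name is hypothetical, and the check is a pre-check only. It does not replace the library's own parsing.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XhtmlPrecheck {

    // Returns true if the markup parses as well-formed XML, i.e. it is acceptable as XHTML input.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download the XHTML DTD from the network just to check well-formedness.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler throws on fatal errors and stays silent otherwise,
            // which avoids the parser's default "[Fatal Error]" console output.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // unclosed or mismatched tags, a bare '&', and similar problems
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser could not be set up", e);
        }
    }
}
```

For example, `isWellFormed("<p>line<br/>next</p>")` passes, while the browser-tolerated `<p>line<br>next</p>` is rejected. That is the kind of page the library would also struggle with.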
Test results: the test page used in this experiment could not be converted at all. The result for an ordinary test page is as follows:
Test description:
iText and Flying Saucer are essentially incompatible with CSS3. According to what I found, this technology is relatively old, and these open source projects are no longer updated or maintained. For simple tables and statistical data exports, newer components such as Bootstrap Table and EasyUI DataGrid offer their own table export. Although this approach is widely introduced online, it is not recommended.
5. PD4ML test
PD4ML is a pure Java library that uses HTML and CSS as the page layout and content format for generating PDF documents. It simplifies PDF generation for end users. Reference website: http://www.pd4ml.com
Advantages:
It supports a fairly complete set of HTML tags and CSS properties, so conversion distortion is relatively small.
HTML+CSS can be used for precise layout control.
It tolerates errors in markup and CSS syntax well.
It converts images without any extra handling.
Disadvantages:
It is not open source. When I downloaded and tested the latest demo version, I found that it does not support Chinese, so you must buy the commercial version. (This was a real pitfall: I spent a long time trying to fix the garbled characters before discovering that the demo simply does not support Chinese.) Some cracked older versions solve the garbled text problem, but their CSS support is incomplete.
Test results:
Test description:
The new version garbles Chinese text but supports some CSS styles. The cracked old version has poor style compatibility and weak support for Bootstrap. It can render data and images without problems. It is paid software and its results are not perfect, and ordinary pages can be exported with templates or other tools, so it is not recommended.
6. html2canvas test
html2canvas is a good JavaScript library that uses new HTML5 and CSS3 features to take screenshots of a web page on the client. It reads the page's DOM and element style information and renders them into a canvas image. No server-side rendering is needed: the whole image is created in the client browser. When the browser does not support canvas, FlashCanvas or ExplorerCanvas is used instead. The script works well in Firefox 3.5+, Google Chrome, recent versions of Opera, and IE9 and above. Each browser renders pages differently, so the generated images are not identical. The library is still under development, but it is promising. This plugin depends on jQuery, and the latest version is recommended.
Limitations: it does not support cross-origin images, cannot capture browser plugin content, and supports neither SVG images nor Flash. It also does not support iframes, although the JS source can be modified to add iframe support.
In this test, html2canvas captured many project pages correctly, including ECharts charts. Only a few new CSS3 features were unsupported, and the screenshot quality was good. However, testing revealed a serious problem. After a page module called html2canvas, some of the original page's CSS suddenly stopped working. Tracing the problem showed that the JS functions html2canvas uses to process CSS it cannot recognize were the cause. Modules that toggle between hidden and shown are handled especially badly.
The screenshot looks like this:
However, afterwards the original page's CSS stops working and the page renders abnormally: hidden and shown elements are displayed chaotically.
Test description:
html2canvas supports Bootstrap's styles fairly well. New CSS3 features such as circular images are not well supported. Its main advantage is that it is a lightweight front-end solution. To work around the changes it makes to the original page's styles, you can export the image first and then refresh the page.
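Exporting the image first means sending the screenshot somewhere it can be stored and later placed into a PDF. This is a minimal sketch. It assumes the browser calls the standard `canvas.toDataURL("image/png")` on the canvas that html2canvas produces and posts the resulting string to a Java backend. The server decodes the Base64 payload with `java.util.Base64` and checks the PNG signature before using the bytes. The class and method names are hypothetical, and the step of embedding the image into a PDF is left to whichever library you choose.

```java
import java.util.Base64;

class CanvasImageDecoder {

    private static final String PNG_PREFIX = "data:image/png;base64,";
    // The fixed 8-byte signature that begins every PNG file.
    private static final byte[] PNG_SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Turns the string produced by canvas.toDataURL("image/png") into raw PNG bytes.
    static byte[] decodePngDataUrl(String dataUrl) {
        if (dataUrl == null || !dataUrl.startsWith(PNG_PREFIX)) {
            throw new IllegalArgumentException("expected a data:image/png;base64 URL");
        }
        // Base64.Decoder.decode throws IllegalArgumentException on malformed input.
        byte[] bytes = Base64.getDecoder().decode(dataUrl.substring(PNG_PREFIX.length()));
        if (bytes.length < PNG_SIGNATURE.length) {
            throw new IllegalArgumentException("payload too short to be a PNG");
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (bytes[i] != PNG_SIGNATURE[i]) {
                throw new IllegalArgumentException("payload is not a PNG image");
            }
        }
        return bytes;
    }
}
```

Validating on the server keeps a malformed or unexpected upload from reaching the PDF step. Once the image is safely stored, the page can be refreshed to restore its original styles.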
7. Summary
I tested the HTML-to-PDF approaches most commonly recommended online. Most of them work for simple HTML, but in practical applications they still have many problems and are hard to use. Analyzing how these methods work leads to the following conclusions:
No solution can fully convert an arbitrary HTML page to PDF. If the page is just a form or table page, avoid CSS3 properties in its styles as much as possible, and it can then be handled with client mode or html2canvas.

Front-end styling is developing rapidly. New CSS3 features look good, and CSS keeps defining new rules and syntax. Java conversion libraries such as iText and Flying Saucer cannot keep up with these changes at all, because their conversion logic cannot be updated in time. These open source projects are older technology, and their teams have since stopped maintaining them. PD4ML likewise processes CSS in Java, but it is commercial software with a team working on CSS3 compatibility, so it is stronger than iText and Flying Saucer in both performance and functionality. Still, it lacks support for a few CSS styles, and the garbled Chinese text problem is inconvenient.

Among the client browser-engine tools, PhantomJS and wkhtmltopdf are the more capable. For PhantomJS, screenshots are only one small feature, and it can also be used for web page analysis, so PhantomJS is recommended. html2canvas's screenshot approach is flexible, which makes it a good lightweight front-end screenshot tool. Some features are still incomplete, but the overall result is decent. When taking a screenshot disturbs the original page, you can store the screenshot first and then refresh the page, which still gets the screenshot into the PDF.