Apache POI is a free, open-source, cross-platform library written in Java. It lets Java programs read and write files in Microsoft Office formats.
Project download page: http://poi.apache.org/download.html
Apache POI is a project that creates and maintains Java APIs for file formats based on the Office Open XML (OOXML) standard and Microsoft's OLE 2 Compound Document format (OLE2). With it you can read, create, and modify MS Excel files from Java, and you can also read and create MS Word and MS PowerPoint files. In short, Apache POI gives Java programs a way to work with Excel files.
Example of reading Excel document
We use HSSFWorkbook in POI to read Excel data.
public void test(File file) throws IOException {
    // HSSFWorkbook handles only the OLE2-based .xls format (Excel 97-2003)
    try (InputStream inp = new FileInputStream(file)) {
        HSSFWorkbook workbook = new HSSFWorkbook(inp);
        // workbook... traversal operation
    }
}
This code reads Excel 2003 (.xls) files without problems, but as soon as it is given an Excel 2007 (.xlsx) file it throws an exception: "The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)"
A little research shows that Excel 2007 and later (.xlsx) files must be read with XSSFWorkbook instead:
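public void test(File file) throws IOException {
    // XSSFWorkbook handles the OOXML-based .xlsx format (Excel 2007+)
    try (InputStream inp = new FileInputStream(file)) {
        XSSFWorkbook workbook = new XSSFWorkbook(inp);
        // workbook... traversal operation
    }
}
Note: XSSFWorkbook lives in the poi-ooxml module, so poi-ooxml-3.9.jar and poi-ooxml-schemas-3.9.jar must also be on the classpath (together with their own dependencies, such as xmlbeans).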
With this version, Excel 2007 files import fine, but Excel 2003 files now throw an exception.
So when importing Excel, the code has to detect the file's format and call the matching class.
My first idea was to determine the type from the file extension, but that is unreliable because renaming a file does not change its contents. If someone renames an .xlsx file to .xls, code that picks the reader by extension will choose the xls reader, and reading fails with an error: the extension looks right, but the content is still OOXML.
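To make the point concrete, a file's real format can be read from its first bytes instead of its name. The following is a minimal, stdlib-only sketch and not POI code. It assumes the published signatures: an OLE2 compound document (.xls) begins with the eight bytes D0 CF 11 E0 A1 B1 1A E1, and an OOXML file (.xlsx), being a ZIP archive, begins with the ZIP local-file-header bytes 50 4B 03 04 ("PK\3\4"). The class ExcelSniffer and its method sniff are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: classifies a file by its
// leading "magic" bytes instead of trusting the file extension.
public class ExcelSniffer {

    enum Format { XLS_OLE2, XLSX_OOXML, UNKNOWN }

    // Signature of an OLE2 compound document (used by .xls)
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Signature of a ZIP local file header, "PK\3\4" (OOXML files are ZIP archives)
    static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // True if data begins with every byte of prefix
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Decides the format from the header bytes only; the file name plays no role
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.XLS_OLE2;
        if (startsWith(header, ZIP_MAGIC))  return Format.XLSX_OOXML;
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        // An .xlsx renamed to "report.xls": the name says xls, the bytes say ZIP/OOXML
        String fileName = "report.xls";
        byte[] header = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00 };
        System.out.println(fileName + " -> " + sniff(header));
    }
}
```

Running main prints `report.xls -> XLSX_OOXML`: the name claims xls, but the bytes identify a ZIP container. A ZIP signature only says the file is a ZIP archive, so a real reader still has to open it as an OOXML package to be sure it is a workbook.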
The recommended solution is WorkbookFactory.create(inputStream) from poi-ooxml, which returns a Workbook. Because both HSSFWorkbook and XSSFWorkbook implement the Workbook interface, the rest of the code can work with either format. The code is as follows:
Workbook wb = WorkbookFactory.create(is);
As you would expect, WorkbookFactory.create() has to check the file type somewhere. Let's look at how the source code does it:
/**
 * Creates the appropriate HSSFWorkbook / XSSFWorkbook from
 *  the given InputStream.
 * Your input stream MUST either support mark/reset, or
 *  be wrapped as a {@link PushbackInputStream}!
 */
public static Workbook create(InputStream inp) throws IOException, InvalidFormatException {
    // If clearly doesn't do mark/reset, wrap up
    if(! inp.markSupported()) {
        inp = new PushbackInputStream(inp, 8);
    }
    if(POIFSFileSystem.hasPOIFSHeader(inp)) {
        return new HSSFWorkbook(inp);
    }
    if(POIXMLDocument.hasOOXMLHeader(inp)) {
        return new XSSFWorkbook(OPCPackage.open(inp));
    }
    throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
}
As you can see, the factory creates the matching Workbook object for each file type, and it decides the type from the file's header bytes rather than its name. So renaming the file does not fool it: a renamed .xlsx is still recognized as OOXML.
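The comment about mark/reset matters because checking the header consumes bytes, while the workbook constructor chosen afterwards must still see the stream from its first byte. A PushbackInputStream handles this by letting the checker put the bytes back. The sketch below shows that peek-and-unread idea with plain java.io; HeaderPeek and peek are hypothetical names, and this illustrates the technique rather than reproducing POI's actual hasPOIFSHeader implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical illustration of peeking at a stream's header without losing it.
public class HeaderPeek {

    // Reads up to n bytes from the front of the stream, then pushes them back
    // so the next reader starts at byte 0 again.
    // The stream's pushback buffer must hold at least n bytes.
    static byte[] peek(PushbackInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r == -1) break; // stream is shorter than n bytes
            total += r;
        }
        if (total > 0) {
            in.unread(buf, 0, total); // restore the consumed bytes
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] fileBytes = { 0x50, 0x4B, 0x03, 0x04, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E };
        // Wrap with an 8-byte pushback buffer, mirroring the size used in create()
        PushbackInputStream in =
            new PushbackInputStream(new ByteArrayInputStream(fileBytes), 8);
        byte[] header = peek(in, 8);
        System.out.println("peeked " + header.length + " bytes, first = " + in.read());
    }
}
```

Running main prints `peeked 8 bytes, first = 80`: after peeking, the next read() still returns the first byte (0x50), so a downstream reader would receive the complete stream.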