1. About InputStream.read()
When reading from a data stream, the InputStream.read() method is often used for the sake of simplicity. This method reads only one byte from the stream per call, which is very inefficient. A better way is to use InputStream.read(byte[] b) or InputStream.read(byte[] b, int off, int len), which read multiple bytes at a time.
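As a minimal sketch of the bulk-read approach, the following drains a stream through a reusable buffer. The class name BulkReadSketch, the helper readAll, and the 4096-byte buffer size are illustrative assumptions, not something prescribed by the API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BulkReadSketch {
    // Hypothetical helper: copies everything from the stream using
    // read(byte[]) instead of one read() call per byte.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
        int n;
        // read(byte[]) returns the number of bytes placed in the buffer,
        // or -1 once the end of the stream is reached.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, stream".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length); // 13
    }
}
```

Only the first n bytes of the buffer are valid after each call, which is why the write uses the returned count rather than buffer.length.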
2. About the available() method of the InputStream class
When reading multiple bytes at a time, the InputStream.available() method is often used to find out, before reading, an estimate of how many bytes can be read from the stream without blocking. Note that this generally works without problems when reading local files, but it often causes trouble in network operations. For example, during Socket communication the other party clearly sent 1,000 bytes, yet your program's call to available() returns only 900, or 100, or even 0, and the cause seems inexplicable. In fact, network transmission is intermittent, and a sequence of bytes is often delivered in several batches. When available() returns 0, it may be because the other party has not responded yet, or because it has responded but the data has not yet reached the local machine. The 1,000 bytes sent to you might arrive in 3 batches, in which case you have to call available() 3 times to get all the data.
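The batch behaviour described above can be imitated locally. The sketch below uses a hypothetical BatchedInputStream (an assumption made for illustration, not a real socket stream) whose available() reports only the batch that has "arrived", and shows that reading until -1 still collects every byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BatchedStreamSketch {
    /** Hypothetical stand-in for a socket stream: hands out data one batch at a time. */
    public static class BatchedInputStream extends InputStream {
        private final byte[][] batches;
        private int batch = 0; // index of the current batch
        private int pos = 0;   // read position inside the current batch

        public BatchedInputStream(byte[]... batches) {
            this.batches = batches;
        }

        // Moves on to the next batch once the current one is used up.
        private void skipExhausted() {
            while (batch < batches.length && pos >= batches[batch].length) {
                batch++;
                pos = 0;
            }
        }

        @Override
        public int available() {
            // Only the rest of the current batch counts as "already arrived".
            return batch < batches.length ? batches[batch].length - pos : 0;
        }

        @Override
        public int read() {
            skipExhausted();
            return batch >= batches.length ? -1 : batches[batch][pos++] & 0xFF;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            skipExhausted();
            if (batch >= batches.length) return -1;
            // Never returns more than one batch per call.
            int n = Math.min(len, batches[batch].length - pos);
            System.arraycopy(batches[batch], pos, b, off, n);
            pos += n;
            return n;
        }
    }

    // Reads until end of stream, ignoring available() entirely.
    public static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 1,000 bytes split into three batches of 400, 350 and 250.
        InputStream in = new BatchedInputStream(new byte[400], new byte[350], new byte[250]);
        System.out.println(in.available());   // 400, not 1000
        System.out.println(drain(in).length); // 1000
    }
}
```

On a real socket, end of stream is only signalled when the peer closes the connection, so a protocol that keeps the connection open needs some other way (a length prefix or delimiter, for example) to know when a message is complete.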
If you write the code like this:
    int count = in.available();
    byte[] b = new byte[count];
    in.read(b);
Errors often occur during network operations, because when you call the available() method the data sent by the other party may not have arrived yet, so the count you get is 0.
It needs to be changed like this:
    int count = 0;
    // Poll until at least some data has arrived (note that this busy-waits)
    while (count == 0) {
        count = in.available();
    }
    byte[] b = new byte[count];
    in.read(b);
3. About InputStream.read(byte[] b) and InputStream.read(byte[] b, int off, int len)
Both methods read multiple bytes from a stream. Experienced programmers will notice that these two methods often return fewer bytes than requested. For example, with the first method programmers often expect the program to read b.length bytes, but in practice the system often cannot deliver that many. A careful reading of the Java API documentation shows that the method does not guarantee to read that many bytes; it only guarantees to read at most that many (and at least 1, unless the end of the stream has been reached). Therefore, if you want the program to read exactly count bytes, it is best to use code like the following:
    byte[] b = new byte[count];
    int readCount = 0; // The number of bytes that have been successfully read
    while (readCount < count) {
        int n = in.read(b, readCount, count - readCount);
        if (n == -1) { // End of stream reached before count bytes arrived
            throw new java.io.EOFException();
        }
        readCount += n;
    }
This code ensures that count bytes are read, unless an IOException occurs midway or the data stream ends first (EOFException).
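The standard library also provides this kind of loop ready-made: DataInputStream.readFully(byte[] b) keeps reading until the array is full and throws EOFException if the stream ends first. A minimal sketch follows; the class name ReadFullySketch and the helper readExactly are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
    // Hypothetical helper: returns exactly count bytes or fails.
    public static byte[] readExactly(InputStream in, int count) throws IOException {
        byte[] b = new byte[count];
        // readFully blocks until b is filled; it throws EOFException
        // if the stream ends before count bytes have been read.
        new DataInputStream(in).readFully(b);
        return b;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        byte[] first = readExactly(new ByteArrayInputStream(source), 4);
        System.out.println(first.length); // 4

        try {
            // Asking for more bytes than the stream holds
            readExactly(new ByteArrayInputStream(source), 20);
        } catch (EOFException e) {
            System.out.println("stream ended early");
        }
    }
}
```

DataInputStream does no buffering of its own, so wrapping the stream this way should not consume bytes beyond the ones requested.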
4. Example of reading PowerPoint files
    import java.io.InputStream;

    import org.apache.lucene.document.Document;
    import org.apache.poi.hslf.HSLFSlideShow;
    import org.apache.poi.hslf.model.Slide;
    import org.apache.poi.hslf.model.TextRun;
    import org.apache.poi.hslf.usermodel.SlideShow;

    public Document getDocument(Index index, String url, String title, InputStream is) throws DocCenterException {
        StringBuffer content = new StringBuffer("");
        try {
            // is is the InputStream of the file; create the SlideShow from it
            SlideShow ss = new SlideShow(new HSLFSlideShow(is));
            Slide[] slides = ss.getSlides(); // Get each slide
            for (int i = 0; i < slides.length; i++) {
                // To obtain the text content of the slide, get its TextRuns
                TextRun[] t = slides[i].getTextRuns();
                for (int j = 0; j < t.length; j++) {
                    content.append(t[j].getText()); // The text content is appended to content
                }
                content.append(slides[i].getTitle());
            }
            index.AddIndex(url, title, content.toString());
        } catch (Exception ex) {
            System.out.println(ex.toString());
        }
        return null;
    }