Tutorial on reading byte stream files in Java (I)

Author: Eve Cole  Update Time: 2025-08-18 17:48:02

Preface

In the previous article, we introduced the File class, which abstracts a file or directory on disk. It only describes the file or directory; it cannot read or modify a file's contents.

Java IO streams are designed for reading and writing file contents. They transfer data in both directions: from a disk file into memory, and from memory out to a disk file.

The design of Java IO streams is not perfect. It involves a large number of classes, which makes IO streams harder to understand, but they fall into just two major categories: byte streams for binary files and character streams for text files. In this article, we first look at the principles and usage scenarios of the byte streams. The stream types covered here are the base classes InputStream/OutputStream, the file streams FileInputStream/FileOutputStream, and the byte array streams ByteArrayInputStream/ByteArrayOutputStream.

Base class byte stream Input/OutputStream

InputStream and OutputStream are the base classes for reading and writing byte streams, respectively. Every byte stream must inherit from one of them. As abstract classes, they also define the most basic read and write operations. Let's take a look:

Take InputStream as an example:

 public abstract int read() throws IOException;

This is an abstract method with no default implementation, so every subclass must implement it. Its job is to return the next byte of the stream.

You will notice that the return type of this method is int. Why not byte?

First, each call to read returns one byte of data, that is, eight bits, which can take any value from "0000 0000" to "1111 1111". Java's byte type is signed, so it would represent those bit patterns as the range [-128, 127].

The read method also specifies that when the end of the file is reached, that is, there is no next byte to read, it returns -1. If byte were the return type, a returned -1 would be ambiguous: is it a data byte from the file (the bit pattern 1111 1111) or the end-of-stream marker?

An int occupies four bytes. For a data byte, the three high-order bytes are all 0 and only the lowest byte is used, so data is always returned as a value in the range [0, 255]. The end-of-stream marker is the four-byte -1 (32 ones), which cannot be confused with the data byte 1111 1111 (24 zeros followed by 8 ones, that is, 255).
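The difference between a data byte and the end-of-stream marker can be checked with a small sketch. It uses an in-memory ByteArrayInputStream holding the single byte 0xFF, so no file is needed; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;

class ReadReturnValueDemo {
    // Calls read() twice on a stream over the given bytes and returns
    // both results. ByteArrayInputStream.read() does not declare
    // IOException, so no exception handling is needed here.
    static int[] readTwice(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        return new int[] { in.read(), in.read() };
    }

    public static void main(String[] args) {
        int[] results = readTwice(new byte[] { (byte) 0xFF });
        System.out.println(results[0]);        // 255: the data byte 1111 1111
        System.out.println(results[1]);        // -1: end of stream
        System.out.println((byte) results[0]); // -1: what a byte return type would show
    }
}
```

The last line shows the ambiguity: once the data byte is narrowed to byte, it looks exactly like the end-of-stream value.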

Next come two more read methods, for which InputStream does provide default implementations:

    public int read(byte b[]) throws IOException {
        return read(b, 0, b.length);
    }

    public int read(byte b[], int off, int len) throws IOException {
        // Body omitted to keep this short; see the JDK source for the details
    }

These two methods are essentially the same; the first is a special case of the second. It takes a byte array and asks the stream to fill it with bytes read from the file, starting at array index 0 and reading at most as many bytes as the array is long.

The second method is more general: it lets you specify the starting index in the array and the maximum number of bytes to read. Both return the number of bytes actually read, which may be smaller than requested, or -1 if the end of the stream has already been reached.
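As a minimal sketch of the three-argument form, the following reads from an in-memory stream into the middle of a buffer. The helper name readInto and its bytesRead out-parameter are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

class ReadIntoArrayDemo {
    // Reads up to len bytes from source into a fresh buffer of the
    // given size, starting at index off, and returns the buffer.
    // The value returned by read() is stored in bytesRead[0].
    static byte[] readInto(byte[] source, int bufferSize, int off, int len, int[] bytesRead) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        byte[] buffer = new byte[bufferSize];
        bytesRead[0] = in.read(buffer, off, len);
        return buffer;
    }

    public static void main(String[] args) {
        byte[] source = { 1, 2, 3, 4, 5 };
        int[] bytesRead = new int[1];

        byte[] buffer = readInto(source, 8, 2, 3, bytesRead);
        System.out.println(bytesRead[0]);            // 3
        System.out.println(Arrays.toString(buffer)); // [0, 0, 1, 2, 3, 0, 0, 0]

        // Asking for more than the stream holds returns fewer bytes.
        readInto(source, 8, 0, 8, bytesRead);
        System.out.println(bytesRead[0]);            // 5
    }
}
```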

InputStream has several other methods, most of which have only trivial default implementations. Let's look at them briefly.

public long skip(long n): skip n bytes and return the actual number of bytes skipped
public void close(): close the stream and release the corresponding resources
public synchronized void mark(int readlimit)
public synchronized void reset()
public boolean markSupported()

The mark method records the current read position of the stream, and the reset method moves the read pointer back to that mark.

Reading from a file is not actually rewound. Instead, streams that support this operation generally keep a temporary copy of every byte read after the mark. When reset is called, the bytes are simply read again from that saved copy, and readlimit limits how many bytes the stream has to cache.

The markSupported method tells you whether the current stream supports this "fallback" read operation.
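Here is a small sketch of mark and reset in use. It assumes BufferedInputStream, a JDK stream not covered in this article that supports marking, wrapped around in-memory data; the class and method names are illustrative only.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {
    // Reads one character, marks the position, reads two more, then
    // resets and reads once again. Returns the characters in the
    // order they were read.
    static String readWithMark(InputStream in) throws IOException {
        StringBuilder seen = new StringBuilder();
        seen.append((char) in.read());  // 'a'
        if (!in.markSupported()) {
            return seen.toString();     // this stream cannot go back
        }
        in.mark(16);                    // remember the position after 'a'
        seen.append((char) in.read());  // 'b'
        seen.append((char) in.read());  // 'c'
        in.reset();                     // back to the mark
        seen.append((char) in.read());  // 'b' again
        return seen.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdef".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(readWithMark(in)); // abcb
    }
}
```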

OutputStream is similar to InputStream, except that it writes instead of reads, so we will not repeat the details here.
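To make the symmetry concrete, here is a hypothetical OutputStream subclass (not a JDK class) that implements only the abstract write(int) method and counts the bytes it receives. The array-based write methods are inherited from OutputStream and end up calling write(int) once per byte.

```java
import java.io.IOException;
import java.io.OutputStream;

// A hypothetical stream that discards its data and only counts bytes.
class CountingOutputStream extends OutputStream {
    private long count;

    // The one abstract method every OutputStream subclass must implement.
    @Override
    public void write(int b) {
        count++;
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream out = new CountingOutputStream();
        out.write(65);                      // one byte
        out.write(new byte[] { 1, 2, 3 });  // inherited; calls write(int) three times
        System.out.println(out.getCount()); // 4
    }
}
```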

File Byte Stream FileInput/OutputStream

We will again focus on FileInputStream; FileOutputStream is similar.

First, FileInputStream provides the following constructors:

    public FileInputStream(String name) throws FileNotFoundException {
        this(name != null ? new File(name) : null);
    }

    public FileInputStream(File file) throws FileNotFoundException {
        String name = (file != null ? file.getPath() : null);
        SecurityManager security = System.getSecurityManager();
        if (security != null) {
            security.checkRead(name);
        }
        if (name == null) {
            throw new NullPointerException();
        }
        if (file.isInvalid()) {
            throw new FileNotFoundException("Invalid file path");
        }
        fd = new FileDescriptor();
        fd.attach(this);
        path = name;
        open(name);
    }

These two constructors are essentially the same; the former is a special case of the latter. Most of the latter is just security checking. The core is the open method, which opens the file.

With either constructor, a FileNotFoundException is thrown if the file does not exist or the path is invalid.
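A short sketch of handling that exception follows. The helper canOpen and the file name "no-such-file.bin" are placeholders chosen for this example.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

class OpenFileDemo {
    // Returns true if the file at the given path can be opened for reading.
    static boolean canOpen(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return true;
        } catch (FileNotFoundException e) {
            // Thrown by the constructor when the file is missing,
            // is a directory, or cannot be opened for reading.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // "no-such-file.bin" is a placeholder assumed not to exist.
        System.out.println(canOpen("no-such-file.bin"));
    }
}
```

The try-with-resources statement closes the stream automatically when the file does open.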

Recall that the base class InputStream declares an abstract read method that every subclass must implement. FileInputStream implements it with a native method:

    public int read() throws IOException {
        return read0();
    }

    private native int read0() throws IOException;

We cannot explore the implementation of read0 for now, but you must be clear about what this read method does: it returns the next byte in the stream, or -1 when the end of the file has been reached and there are no more bytes to read.
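The usual way to use this method is a loop that runs until -1 comes back. The sketch below counts the bytes of a file that way; the class name and the temporary file created with java.nio.file.Files are assumptions made so the example can run on its own.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CountBytesDemo {
    // Counts the bytes in a file by calling read() until it returns -1.
    static long countBytes(String path) throws IOException {
        long count = 0;
        try (FileInputStream in = new FileInputStream(path)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file so the demo is self-contained.
        Path file = Files.createTempFile("count-demo", ".bin");
        Files.write(file, new byte[] { 1, 2, 3, 4, 5 });
        System.out.println(countBytes(file.toString())); // 5
        Files.delete(file);
    }
}
```

Reading one byte per call is simple but slow for large files, which is why the array-based read methods exist.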

FileInputStream has some other reading-related methods as well, most of which are also implemented with native methods. Here is a quick look:

public int read(byte b[]): reads up to b.length bytes into the array
public int read(byte b[], int off, int len): reads up to len bytes into the array, starting at index off
public native long skip(long n): skips n bytes
public void close(): releases the stream's resources

Those are the main methods of FileInputStream. There are some more advanced and complex ones that we do not need for now; we will learn them later. Let's take a brief look at an example of file reading:

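    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream input = new FileInputStream("C://Users//yanga//Desktop//test.txt")) {
            byte[] buffer = new byte[1024];
            int len = input.read(buffer);
            // Decode only the bytes actually read, not the whole buffer;
            // len is -1 when the file is empty
            String str = (len > 0) ? new String(buffer, 0, len) : "";
            System.out.println(str);
            System.out.println(len);
        }
    }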

The output is simple: it prints the content of our test file and the number of bytes actually read. But careful readers will ask: how can you be sure that the test file is no larger than 1024 bytes?

To read the whole file, one solution is to make the buffer large enough to hold all of the file's contents.

This approach is clearly undesirable, because we usually do not know in advance how large the file will be, and simply allocating an oversized byte array is a very bad solution.

The second way is to use a dynamic byte array stream, which adjusts the size of its internal byte array as needed to provide enough capacity. We will introduce it in detail later.
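As a sketch of that second approach, the following reads a file in fixed-size chunks and collects them in a ByteArrayOutputStream, so the file size never has to be known up front. The class name, the helper readAll, and the temporary test file are assumptions of this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWholeFileDemo {
    // Reads the whole file, 1024 bytes at a time, into a growing
    // in-memory stream and returns the collected bytes.
    static byte[] readAll(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len;
            // read returns -1 once the end of the file is reached
            while ((len = in.read(buffer)) != -1) {
                collected.write(buffer, 0, len); // copy only the bytes just read
            }
            return collected.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // A 3000-byte temporary file, larger than the 1024-byte buffer.
        Path file = Files.createTempFile("read-all-demo", ".bin");
        Files.write(file, new byte[3000]);
        System.out.println(readAll(file.toString()).length); // 3000
        Files.delete(file);
    }
}
```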

For FileOutputStream, one more thing worth emphasizing is its constructors. It has the following two:

    public FileOutputStream(String name, boolean append)
    public FileOutputStream(File file, boolean append)

The append parameter indicates whether writes through this stream overwrite the file or are appended to it: true means append, false means overwrite.
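The effect of the append flag can be seen in a short sketch. It writes to a temporary file and reads it back with java.nio.file.Files, which this article does not cover and is used here only for convenience; the class and helper names are illustrative.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class AppendDemo {
    // Writes text to the file. append=true adds to the end,
    // append=false replaces the existing content.
    static void write(Path file, String text, boolean append) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile(), append)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the whole file back as a string.
    static String contentOf(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-demo", ".txt");
        write(file, "abc", false);
        write(file, "def", true);
        System.out.println(contentOf(file)); // abcdef
        write(file, "x", false);
        System.out.println(contentOf(file)); // x
        Files.delete(file);
    }
}
```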

ByteArrayInput/OutputStream

The so-called "byte array stream" is a stream that operates on a byte array. Unlike the other streams, it does not read from or write to a file.

Although the byte array stream is not file-based, it is still very important. The byte array wrapped by ByteArrayOutputStream is not fixed in size but grows dynamically, which makes it a very good fit in certain scenarios.

ByteArrayInputStream is a stream that reads from a byte array. It can be instantiated with the following constructors:

    protected byte buf[];
    protected int pos;
    protected int count;

    public ByteArrayInputStream(byte buf[]) {
        this.buf = buf;
        this.pos = 0;
        this.count = buf.length;
    }

    public ByteArrayInputStream(byte buf[], int offset, int length)

buf is the byte array wrapped by ByteArrayInputStream. All of its read operations revolve around this array.

Therefore, when instantiating a ByteArrayInputStream, you must pass in at least the target byte array.

The pos field records the current read position of the stream, and count is the index one past the last valid byte of the target array.

With this understood, the various read methods are not hard to follow:

    // Read the next byte
    public synchronized int read() {
        return (pos < count) ? (buf[pos++] & 0xff) : -1;
    }

    // Read up to len bytes into byte array b, starting at index off
    public synchronized int read(byte b[], int off, int len) {
        // Body omitted here as well; see the JDK source
    }
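A minimal sketch of these pieces working together: the three-argument constructor restricts reading to a slice of the array, and the `& 0xff` in read turns the stored byte 0xFF into 255 rather than -1. The class and helper names are invented for illustration.

```java
import java.io.ByteArrayInputStream;

class ByteArraySliceDemo {
    // Reads every byte of the slice that starts at offset and spans
    // length bytes, and returns the values as a comma-separated string.
    static String readSlice(byte[] data, int offset, int length) {
        ByteArrayInputStream in = new ByteArrayInputStream(data, offset, length);
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 10, (byte) 0xFF, 30, 40 };
        System.out.println(readSlice(data, 1, 2)); // 255,30
        System.out.println(readSlice(data, 0, 4)); // 10,255,30,40
    }
}
```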

In addition, ByteArrayInputStream also implements the "repeat read" operation very simply.

    public void mark(int readAheadLimit) {
        mark = pos;
    }

    public synchronized void reset() {
        pos = mark;
    }

Because ByteArrayInputStream is backed by a byte array, repeated reads are easy to implement: moving an index is enough.

ByteArrayOutputStream is the byte array stream for writing. Its implementation has some characteristics of its own. Let's take a look.

First, it has these two fields:

    protected byte buf[];
    // count is the number of valid bytes in buf
    protected int count;

Constructor:

    public ByteArrayOutputStream() {
        this(32);
    }

    public ByteArrayOutputStream(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("Negative initial size: " + size);
        }
        buf = new byte[size];
    }

The core task of the constructor is to initialize the internal byte array buf. You can pass in size to set the initial array size explicitly; otherwise the default length is 32.

Write content to ByteArrayOutputStream from the outside:

    public synchronized void write(int b) {
        ensureCapacity(count + 1);
        buf[count] = (byte) b;
        count += 1;
    }

    public synchronized void write(byte b[], int off, int len) {
        if ((off < 0) || (off > b.length) || (len < 0) ||
            ((off + len) - b.length > 0)) {
            throw new IndexOutOfBoundsException();
        }
        ensureCapacity(count + len);
        System.arraycopy(b, off, buf, count, len);
        count += len;
    }

As you can see, every write operation first calls the ensureCapacity method, which makes sure that the stream's byte array can hold the bytes about to be written.

This method is also quite interesting. If it finds that the internal buf is too small for the write, it calls the grow method to expand it. The idea is similar to how ArrayList expands, except that here the capacity is doubled (ArrayList grows by about half), or raised to the required minimum if doubling is still not enough.
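The growth strategy can be sketched with a simplified stand-in class. GrowableBuffer is not JDK code: it only mirrors the doubling idea described above and leaves out details of the real class such as synchronization and overflow handling.

```java
import java.util.Arrays;

// A simplified stand-in for ByteArrayOutputStream's growth logic.
class GrowableBuffer {
    private byte[] buf;
    private int count;

    GrowableBuffer(int initialCapacity) {
        buf = new byte[initialCapacity];
    }

    void write(int b) {
        ensureCapacity(count + 1);
        buf[count++] = (byte) b;
    }

    // Grows the array only when the pending write would not fit.
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > buf.length) {
            grow(minCapacity);
        }
    }

    // Doubles the capacity, or jumps to minCapacity if doubling is too small.
    private void grow(int minCapacity) {
        int newCapacity = buf.length * 2;
        if (newCapacity < minCapacity) {
            newCapacity = minCapacity;
        }
        buf = Arrays.copyOf(buf, newCapacity);
    }

    int capacity() {
        return buf.length;
    }

    int size() {
        return count;
    }

    public static void main(String[] args) {
        GrowableBuffer buffer = new GrowableBuffer(2);
        for (int i = 0; i < 5; i++) {
            buffer.write(i);
            System.out.println(buffer.capacity()); // prints 2, 2, 4, 4, 8 on separate lines
        }
    }
}
```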

In addition, ByteArrayOutputStream also has a writeTo method:

    public synchronized void writeTo(OutputStream out) throws IOException {
        out.write(buf, 0, count);
    }

This writes the valid bytes of the internal byte array to another output stream.
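One plausible use of writeTo is to assemble data in memory first and then hand it to the real destination in a single call. The sketch below does this with System.out as the target; the class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class WriteToDemo {
    // Builds a short message in memory, then writes all of it
    // to the target stream with one writeTo call.
    static void buildAndSend(OutputStream target) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        memory.write('H');
        memory.write('i');
        memory.write('\n');
        memory.writeTo(target);
    }

    public static void main(String[] args) throws IOException {
        buildAndSend(System.out); // prints "Hi"
        System.out.flush();
    }
}
```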

Some of the remaining methods are also very commonly used:

public synchronized byte[] toByteArray(): returns a copy of the valid bytes in the internal array
public synchronized int size(): returns the number of valid bytes in buf
public synchronized String toString(): returns the bytes decoded as a string, using the default character set
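A brief sketch of these three methods follows. It also shows that toByteArray hands back a copy, so changing the returned array does not affect the stream. The class name and the helper filled are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class SnapshotDemo {
    // Returns a ByteArrayOutputStream holding the ASCII bytes of text.
    static ByteArrayOutputStream filled(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bytes, 0, bytes.length);
        return out;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = filled("abc");
        byte[] snapshot = out.toByteArray();
        snapshot[0] = 'X';                  // changes only the copy
        System.out.println(out.size());     // 3
        System.out.println(out.toString()); // abc
    }
}
```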

Note that although these two classes are called "streams", they do not hold any system resources the way real streams do, so there is no need to call close, and calling it does nothing (the official documentation says it "has no effect").

The test cases are not included here. I will upload all of the code used in this article later, and you can download it yourself.

To keep this article to a reasonable length, the remaining material will be covered in the next article.

All code, images, and files from the article are stored on my GitHub:

(https://github.com/SingleYam/overview_java)

You can also download them locally.
