This article explains the difference between character streams and byte streams in Java.
1. What is a stream
A stream in Java is an abstraction over a sequence of bytes. Picture a water pipe that carries a sequence of bytes instead of water. Like water, a stream in Java has a direction of flow: an object from which a sequence of bytes can be read is called an input stream, and an object to which a sequence of bytes can be written is called an output stream.
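To make the two directions concrete, here is a minimal sketch that reads bytes from an in-memory input stream and writes them to an in-memory output stream. The class name `PipeDemo` and the helper `copy` are made up for this illustration; `ByteArrayInputStream` and `ByteArrayOutputStream` simply stand in for a real file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class PipeDemo {
    // Hypothetical helper: moves every byte from `in` to `out` and returns the count.
    static int copy(InputStream in, OutputStream out) throws IOException {
        int count = 0;
        int b;
        // read() returns the next byte as an int in 0..255, or -1 at end of stream
        while ((b = in.read()) != -1) {
            out.write(b);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {10, 20, 30};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int copied = copy(new ByteArrayInputStream(source), sink);
        System.out.println(copied + " bytes copied");
    }
}
```

The same `copy` helper would work unchanged with any other `InputStream` and `OutputStream`, which is the point of the abstraction.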
2. Byte stream
The basic unit of byte stream processing in Java is a single byte, and byte streams are usually used to process binary data. The two most basic byte stream classes in Java are InputStream and OutputStream, which are the base classes of all input byte streams and output byte streams, respectively. Both InputStream and OutputStream are abstract classes. In practice, we usually use one of the subclasses that the Java class library provides. Let's take the InputStream class as an example to introduce byte streams in Java.
The InputStream class defines a basic method read for reading bytes from a byte stream. The definition of this method is as follows:
public abstract int read() throws IOException;
This is an abstract method, so every concrete input byte stream class derived from InputStream must implement it. The method reads one byte from the stream and returns it as an int in the range 0 to 255, or returns -1 if the end of the stream has been reached. Note that the method blocks until a byte is available, the end of the stream is detected, or an exception is thrown. In addition, byte streams are not buffered by default, which means that every call to read asks the operating system for one byte. This is often accompanied by disk I/O, so it is relatively inefficient. You might think that the overloaded read method that takes a byte array can read many bytes at once and so avoid frequent disk I/O. Is that the case? Let's take a look at the source code of this method in InputStream:
public int read(byte b[]) throws IOException {
    return read(b, 0, b.length);
}
It calls another overload of read, so let's follow that one:
public int read(byte b[], int off, int len) throws IOException {
    if (b == null) {
        throw new NullPointerException();
    } else if (off < 0 || len < 0 || len > b.length - off) {
        throw new IndexOutOfBoundsException();
    } else if (len == 0) {
        return 0;
    }
    int c = read();
    if (c == -1) {
        return -1;
    }
    b[off] = (byte)c;
    int i = 1;
    try {
        for (; i < len ; i++) {
            c = read();
            if (c == -1) {
                break;
            }
            b[off + i] = (byte)c;
        }
    } catch (IOException ee) {
    }
    return i;
}
As the code shows, the default read(byte[]) implementation in InputStream simply calls read() in a loop to fill the array "at one time", so by itself it uses no memory buffer and saves no I/O requests. Concrete subclasses may override it: FileInputStream, for example, overrides read(byte[], int, int) with a native bulk read. To get a memory buffer behind any input stream, including for single-byte read() calls, we should wrap it in a BufferedInputStream.
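The loop in the inherited read(byte[]) can be observed directly. The following is a minimal sketch, assuming a made-up subclass `CountingInputStream` that implements only read() and counts how often it is called; the class and method names are hypothetical and exist only for this demonstration.

```java
import java.io.IOException;
import java.io.InputStream;

class CountingStreamDemo {
    // Hypothetical stream that implements only read() and counts its invocations.
    static class CountingInputStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        int singleByteReads = 0;

        CountingInputStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            singleByteReads++;
            // Mask to 0..255 so negative bytes are not confused with the -1 end marker
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    // Fills a buffer through the inherited read(byte[]) and reports the read() call count.
    static int callsNeededFor(int byteCount) throws IOException {
        CountingInputStream in = new CountingInputStream(new byte[byteCount]);
        byte[] buffer = new byte[byteCount];
        in.read(buffer);
        return in.singleByteReads;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read() calls for 5 bytes: " + callsNeededFor(5));
    }
}
```

Filling a 5-byte array this way costs five read() calls. This describes the inherited default only: a subclass that overrides read(byte[], int, int), as FileInputStream does, is free to fetch the whole range in one request.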
3. Character stream
The basic unit of character stream processing in Java is the Unicode symbol, more precisely a UTF-16 code unit (2 bytes in size, the Java char type), and character streams are usually used to process text data. A code unit is a value in the range 0x0000~0xFFFF. Most common characters correspond to a single code unit, while characters outside the Basic Multilingual Plane are represented by a pair of code units (a surrogate pair). A Java String is a sequence of these UTF-16 code units in memory. Data stored on disk, however, can use any of a variety of encodings, and the same characters have different binary representations under different encodings. Character streams therefore work like this:
Output character stream: encodes the character sequence (in fact a sequence of Unicode symbols) into a byte sequence in the specified encoding, and then writes it to the file;
Input character stream: decodes the byte sequence being read into the corresponding character sequence (in fact a sequence of Unicode symbols) using the specified encoding, so that it can be held in memory.
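Both directions can be sketched with in-memory streams. This is an illustration under assumptions, not part of the original demo: the class name `EncodingDemo` and the helpers `encode` and `decode` are made up, while `OutputStreamWriter` and `InputStreamReader` are the standard bridge classes between character streams and byte streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Output direction: characters -> bytes in the given charset.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    // Input direction: bytes -> characters, decoded with the given charset.
    static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9"; // the two characters 'h' and 'é'
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);
        System.out.println("UTF-8 bytes: " + utf8.length + ", UTF-16BE bytes: " + utf16.length);
        System.out.println("round trip ok: " + decode(utf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

The same two characters occupy 3 bytes in UTF-8 and 4 bytes in UTF-16BE, which is why the reader must decode with the same encoding the writer used.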
We use a demo to deepen our understanding of this process. The sample code is as follows:
import java.io.FileWriter;
import java.io.IOException;

public class FileWriterDemo {
    public static void main(String[] args) {
        // try-with-resources closes the writer even if write() throws,
        // and never calls close() on null when the constructor fails
        try (FileWriter fileWriter = new FileWriter("demo.txt")) {
            fileWriter.write("demo");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In the above code, we use FileWriter to write the four characters "demo" to demo.txt. We then use the hex editor WinHex to view the content of demo.txt:
The hex view shows that the "demo" we wrote is encoded as "64 65 6D 6F", even though the code above does not explicitly specify an encoding. When no encoding is specified, FileWriter uses the platform default charset, which has traditionally been taken from the operating system's settings and is UTF-8 by default since Java 18.
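To avoid depending on the default, the encoding can be named explicitly. The following is a minimal sketch, assuming UTF-8 as the target encoding and a temporary file in place of demo.txt; the class name `ExplicitCharsetDemo` and the helper `writeAndDump` are made up for this illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {
    // Writes `text` to `file` as UTF-8 and returns the file's bytes as spaced hex.
    static String writeAndDump(Path file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : Files.readAllBytes(file)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // The charset that FileWriter would pick when none is given
        System.out.println("default charset: " + Charset.defaultCharset());
        Path file = Files.createTempFile("demo", ".txt"); // temp file, an assumption of this sketch
        try {
            System.out.println(writeAndDump(file, "demo"));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

For the text "demo" the dump is "64 65 6D 6F", the same bytes as in the hex editor, because these four letters are encoded identically in every ASCII-compatible encoding. Text with non-ASCII characters would differ from one encoding to another.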
Because a character stream has to convert the Unicode symbol sequence into bytes in the target encoding before output, it keeps the converted bytes in a memory buffer and writes them to the disk file together, when the buffer fills up or the stream is flushed or closed.
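This buffering can be made visible by checking the underlying byte stream before and after a flush. The sketch below is an illustration under assumptions: the class name `WriterBufferDemo` and the helper `sizesAroundFlush` are hypothetical, and a `ByteArrayOutputStream` replaces the disk file so that its size can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterBufferDemo {
    // Returns {bytes visible before flush, bytes visible after flush}.
    static int[] sizesAroundFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
        writer.write(text);
        int before = sink.size(); // encoded bytes may still sit in the writer's buffer
        writer.flush();
        int after = sink.size();  // flush() pushes them to the underlying stream
        writer.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("demo");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

For a short string the count before the flush is typically 0, though the exact moment at which bytes leave the buffer is an implementation detail. After the flush all 4 bytes are in the underlying stream, which is why forgetting to flush or close a writer can leave a file empty or truncated.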
4. The difference between character stream and byte stream
From the description above, the main differences between byte streams and character streams are as follows:
The basic unit of a byte stream operation is the byte; the basic unit of a character stream operation is the Unicode symbol.
By default, a byte stream does not use a buffer; a character stream buffers the encoded bytes before writing them out.
A byte stream is usually used to process binary data. It can in fact process any type of data, but it does not support reading or writing Unicode symbols directly; a character stream usually processes text data and supports reading and writing Unicode symbols.
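Since byte streams are unbuffered by default, buffering has to be added explicitly when it is wanted. The following is a minimal sketch of that, assuming a made-up class `BufferedCopyDemo` with a hypothetical `copy` helper and temporary files as input and output.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedCopyDemo {
    // Copies a file byte by byte; the Buffered* wrappers serve most read() and
    // write() calls from memory instead of going to the operating system each time.
    static long copy(Path from, Path to) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(from.toFile()));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(to.toFile()))) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path from = Files.createTempFile("source", ".bin"); // temp files, an assumption of this sketch
        Path to = Files.createTempFile("target", ".bin");
        try {
            Files.write(from, new byte[] {1, 2, 3, 4});
            System.out.println(copy(from, to) + " bytes copied");
        } finally {
            Files.deleteIfExists(from);
            Files.deleteIfExists(to);
        }
    }
}
```

The loop still calls read() once per byte, but only the occasional refill of the internal buffer reaches the file, so the per-call cost is much lower than with a bare FileInputStream.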
The above is my understanding of character streams and byte streams in Java. If anything is unclear or inaccurate, corrections are welcome. Thank you.