Detailed explanation of the Buffer module in NodeJS

Author: Eve Cole  Update Time: 2025-05-01 13:48:02

1. Introduction

The term "buffer" means a temporary storage area: a piece of memory that temporarily holds input and output data.

Traditionally, the JavaScript language had only a string type and no binary data type. Node.js therefore provides a global constructor, Buffer, on a par with String, for operating on binary data. Besides being obtained by reading files, Buffer instances can also be created directly, for example:


var buffer = Buffer.from([ 0x68, 0x65, 0x6c, 0x6c, 0x6f ]);

A Buffer resembles a string: its .length property gives its length in bytes, and buffer[index] reads the byte at a given position, for example:


buffer[0]; // 0x68 (104)
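Because .length counts bytes rather than characters, it can differ from the length of the string a Buffer was made from. A short sketch comparing the two; the string "héllo" is just an illustrative choice containing one non-ASCII character:

```javascript
// A string with one non-ASCII character (é, U+00E9).
const text = "héllo";
const bytes = Buffer.from(text, "utf-8");

console.log(text.length);  // 5 characters (UTF-16 code units)
console.log(bytes.length); // 6 bytes: é takes two bytes in UTF-8
console.log(bytes[1]);     // 195 (0xc3), the first byte of é
```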

Buffers and strings can be converted into each other. For example, binary data can be decoded into a string using a given encoding:


var str = buffer.toString("utf-8"); // hello

Conversely, a string can be encoded into binary data using a given encoding:


var buffer = Buffer.from("hello", "utf-8"); // <Buffer 68 65 6c 6c 6f>

One key difference:

There is an important difference between a Buffer and a String. Strings are immutable: any modification of a string produces a new string, and the original string remains unchanged.

A Buffer, by contrast, behaves more like a C array that supports pointer-style operations. For example, the byte at a given position can be modified in place with buffer[index] = value.
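The contrast can be seen directly by "modifying" a string and a Buffer side by side. A minimal sketch; the variable names are illustrative:

```javascript
// Strings are immutable: methods return a new string and leave the original alone.
const s = "hello";
const upper = s.toUpperCase();
console.log(s, upper); // hello HELLO

// Buffers are mutable: assigning through an index changes the bytes in place.
const buf = Buffer.from("hello");
buf[0] = 0x48; // 0x48 is "H"
console.log(buf.toString()); // Hello
```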


The slice method does not copy any data. Instead of an independent buffer, it returns a Buffer that points into the middle of the original buffer's memory, as shown below.

[ 0x68, 0x65, 0x6c, 0x6c, 0x6f ]
  ^           ^
  |           |
  buffer      buffer.slice(2)

Therefore, modifying the Buffer returned by slice also modifies the original buffer, for example:


var buffer = Buffer.from([ 0x68, 0x65, 0x6c, 0x6c, 0x6f ]);
var sub = buffer.slice(2); // sub shares memory with buffer, starting at byte 2
sub[0] = 0x65; // writes byte 2 of buffer
console.log(buffer); // <Buffer 68 65 65 6c 6f>
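In current Node.js releases, buf.slice() is deprecated in favor of buf.subarray(), which has the same memory-sharing behavior. A minimal sketch of the same experiment using subarray(), also checking that both views sit on one underlying ArrayBuffer:

```javascript
const buffer = Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]);

// subarray(2) is a view starting at byte 2 of the same memory; no bytes are copied.
const sub = buffer.subarray(2);
sub[0] = 0x65;

console.log(buffer); // <Buffer 68 65 65 6c 6f>
// Both views are backed by the same ArrayBuffer.
console.log(sub.buffer === buffer.buffer); // true
```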

To copy a Buffer, first create a new Buffer of the same length and then copy the original Buffer's data into it with the .copy() method.

This is analogous to allocating a new block of memory and copying the data from existing memory into it. Here is an example.


var buffer = Buffer.from([ 0x68, 0x65, 0x6c, 0x6c, 0x6f ]);
var dup = Buffer.alloc(buffer.length); // new, zero-filled memory of the same size
buffer.copy(dup); // copy every byte of buffer into dup
dup[0] = 0x48;
console.log(buffer); // <Buffer 68 65 6c 6c 6f>
console.log(dup); // <Buffer 48 65 6c 6c 6f>
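Passing an existing Buffer to Buffer.from() is a shorter way to get the same independent copy, since it allocates new memory and copies the bytes. A minimal sketch:

```javascript
const buffer = Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]);

// Buffer.from(buffer) allocates new memory and copies the bytes into it.
const dup = Buffer.from(buffer);
dup[0] = 0x48;

console.log(buffer); // <Buffer 68 65 6c 6c 6f> (unchanged)
console.log(dup);    // <Buffer 48 65 6c 6c 6f>
```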

In short, Buffer extends JS's data processing capabilities from strings to arbitrary binary data.

That is a brief introduction to what a Buffer is. The sections below cover how to use it and the situations where it applies.

2. More about Buffer

JavaScript handles strings very well: wide-byte and single-byte text are both simply strings. But Node also has to handle network protocols, databases, images, file uploads, and so on, all of which involve large amounts of binary data. JavaScript's built-in strings are far from adequate for these tasks, and so Buffer was created.

Buffer structure

Buffer is a typical example of a module that combines JavaScript and C++: the performance-critical parts are implemented in C++, and the rest in JavaScript.

Node loads Buffer into memory when the process starts and places it on the global object, so there is no need to require() it.

Buffer object: similar to an array whose elements are bytes, each displayed as a two-digit hexadecimal number (an integer from 0 to 255).
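Because each element is a single byte, values written by index are stored as unsigned 8-bit integers: fractional parts are dropped and out-of-range values wrap modulo 256, the same rules as a Uint8Array (which Buffer extends in current Node.js). A small sketch:

```javascript
const buf = Buffer.alloc(4); // four zero-filled bytes

buf[0] = 0xff; // 255 fits in one byte
buf[1] = 256;  // wraps to 0 (256 mod 256)
buf[2] = -1;   // wraps to 255
buf[3] = 3.7;  // fractional part is dropped: 3

console.log(buf);             // <Buffer ff 00 ff 03>
console.log(buf[0] === 0xff); // true: elements read back as numbers 0-255
```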

Buffer memory allocation

Buffer memory is not allocated on the V8 heap; instead, Node requests it at the C++ level.

To use the memory it requests efficiently, Node adopts slab allocation, a dynamic memory-management mechanism used in many *nix operating systems. A slab is always in one of three states:

(1) full: fully allocated

(2) partial: partially allocated

(3) empty: not allocated
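In current Node.js the slab idea shows up as a pre-allocated pool: small Buffers created with Buffer.allocUnsafe() (and by Buffer.from() for short inputs) are carved out of a shared block of Buffer.poolSize bytes (8 KiB by default), while Buffer.alloc() always gets its own memory. A sketch that makes this visible through each Buffer's underlying ArrayBuffer; it reflects Node's current internals, which may change between versions:

```javascript
console.log(Buffer.poolSize); // 8192 by default

// Small unsafe allocations are sliced from the shared pool (a "slab").
const a = Buffer.allocUnsafe(16);
const b = Buffer.allocUnsafe(16);
console.log(a.buffer.byteLength);   // typically 8192 (the shared pool), not 16
console.log(a.buffer === b.buffer); // often true: both live in the same slab

// Buffer.alloc() never uses the pool, so its ArrayBuffer is exactly its size.
const c = Buffer.alloc(16);
console.log(c.buffer.byteLength); // 16
```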

Buffer conversion

Buffers and strings can be converted into each other. The supported encodings are:

ASCII, UTF-8, UTF-16LE/UCS-2, Base64, Binary (an alias for Latin-1), Hex

String to Buffer

Buffer.from(str, [encoding]), with UTF-8 as the default encoding (older code uses new Buffer(str, [encoding]), which is deprecated)

buf.write(string, [offset], [length], [encoding])
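The two approaches differ in where the bytes go: Buffer.from() creates a new Buffer, while buf.write() encodes into an existing one and returns the number of bytes it actually wrote. A minimal sketch; the string values are illustrative:

```javascript
// Buffer.from(): encode a whole string into a new Buffer (UTF-8 by default).
const fromStr = Buffer.from("hello");
console.log(fromStr); // <Buffer 68 65 6c 6c 6f>

// buf.write(): encode into an existing Buffer; returns the number of bytes written.
const target = Buffer.alloc(4);
const written = target.write("hello"); // only 4 bytes fit
console.log(written, target.toString()); // 4 hell

// The optional offset selects where writing starts.
target.write("ey", 1);
console.log(target.toString()); // heyl

// Partially encoded characters are not written: "€" needs 3 bytes in UTF-8.
console.log(Buffer.alloc(2).write("€")); // 0
```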

Buffer to string

buf.toString([encoding], [start], [end])
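A short sketch of buf.toString() with different encodings and a byte range, plus the reverse direction through Buffer.from() with the same encoding name:

```javascript
const buf = Buffer.from("hello");

console.log(buf.toString("hex"));        // 68656c6c6f
console.log(buf.toString("base64"));     // aGVsbG8=
console.log(buf.toString("utf8", 1, 3)); // el (bytes 1 and 2; end is exclusive)

// Decoding works in reverse by passing the encoding to Buffer.from().
console.log(Buffer.from("aGVsbG8=", "base64").toString()); // hello
```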

Encodings not supported by Buffer

Use Buffer.isEncoding(encoding) to check whether Buffer supports a given encoding. For unsupported encodings, two third-party modules are commonly used:

iconv-lite: a pure JavaScript implementation; it is lighter and performs better because it avoids conversions between C++ and JavaScript.

iconv: performs the conversion by calling the libiconv library from C++.
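Since Buffer.isEncoding() returns a boolean, it can guard a decode before falling back to a third-party converter. A minimal sketch; decodeOrNull is a hypothetical helper, and GBK is used only as an example of an encoding Buffer does not support:

```javascript
console.log(Buffer.isEncoding("utf8"));   // true
console.log(Buffer.isEncoding("base64")); // true
console.log(Buffer.isEncoding("gbk"));    // false: needs iconv-lite or iconv

// Hypothetical helper: decode with Buffer when possible, otherwise return null
// to signal that an external library (such as iconv-lite) is needed.
function decodeOrNull(buf, encoding) {
  if (!Buffer.isEncoding(encoding)) {
    return null; // caller should fall back to an external converter
  }
  return buf.toString(encoding);
}

console.log(decodeOrNull(Buffer.from("hi"), "utf8")); // hi
console.log(decodeOrNull(Buffer.from("hi"), "gbk"));  // null
```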

Buffer concatenation

Note that in res.on('data', function (chunk) {}), the parameter chunk is a Buffer object. Concatenating chunks directly with + implicitly converts each chunk to a string on its own, so a multi-byte (wide) character that is split across two chunks is decoded incorrectly and comes out garbled.
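The problem can be reproduced without a network by splitting one UTF-8 string into two chunks in the middle of a character. A minimal sketch; the two subarray() calls stand in for two "data" events:

```javascript
// "你好" is 6 bytes in UTF-8 (3 bytes per character).
const data = Buffer.from("你好");

// Simulate two "data" events whose boundary falls inside the second character.
const chunk1 = data.subarray(0, 4);
const chunk2 = data.subarray(4);

// Using + converts each chunk to a string separately, breaking the split character.
let body = "";
body += chunk1;
body += chunk2;

console.log(body === "你好");          // false
console.log(body.includes("\uFFFD")); // true: U+FFFD replacement characters appear
```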

Solutions:

(1) Call setEncoding() on the readable stream. The data event then delivers encoded strings instead of Buffer objects; internally this uses the StringDecoder module.

(2) Store the Buffer chunks in an array, combine them into one large Buffer at the end, and only then decode it into a string.
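Both solutions can be sketched on the same split chunks. Solution (1) is shown with a StringDecoder used directly, since that is what setEncoding() relies on; solution (2) uses Buffer.concat():

```javascript
const { StringDecoder } = require("string_decoder");

const data = Buffer.from("你好");
const chunks = [data.subarray(0, 4), data.subarray(4)];

// Solution (1): a StringDecoder holds back incomplete multi-byte sequences
// until the rest of the character arrives in a later chunk.
const decoder = new StringDecoder("utf8");
let viaDecoder = "";
for (const chunk of chunks) {
  viaDecoder += decoder.write(chunk);
}
viaDecoder += decoder.end();
console.log(viaDecoder); // 你好

// Solution (2): collect the raw chunks, concatenate once, then decode once.
const viaConcat = Buffer.concat(chunks).toString("utf8");
console.log(viaConcat); // 你好
```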

Buffer is widely used in file I/O and network I/O, so its performance is critical; for I/O it is much more efficient than ordinary strings, which must first be encoded into bytes.

Besides the cost of converting to strings, the highWaterMark setting has a major impact on performance when reading files with Buffers:

a. highWaterMark affects how Buffer memory is allocated and used.

b. Setting highWaterMark too small can cause too many system calls.
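The effect of highWaterMark can be observed by counting the Buffer chunks that fs.createReadStream() emits for the same file: fewer bytes per chunk means more reads. A minimal sketch, assuming a writable temporary directory; the 1 MiB file size and the two highWaterMark values are arbitrary choices (the fs.createReadStream() default is 64 KiB), and countChunks is a hypothetical helper:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: count the "data" events (Buffer chunks) for a given highWaterMark.
function countChunks(file, highWaterMark) {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    fs.createReadStream(file, { highWaterMark })
      .on("data", () => { chunks++; })
      .on("end", () => resolve(chunks))
      .on("error", reject);
  });
}

// A 1 MiB scratch file in a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hwm-"));
const file = path.join(dir, "data.bin");
fs.writeFileSync(file, Buffer.alloc(1024 * 1024));

(async () => {
  for (const hwm of [1024, 64 * 1024]) {
    // Smaller chunks mean more reads (and so more system calls) for the same file.
    console.log(`highWaterMark=${hwm}: ${await countChunks(file, hwm)} chunks`);
  }
})();
```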

When should you use Buffer, and when not? Pure JavaScript handles Unicode strings well but has no support for binary data. TCP streams and file streams have to be processed as byte streams, and whenever we store non-UTF-8 strings, binary data, or other such formats, we must use Buffer.

3. An example


var str = "this is text concat test !";
var buf = Buffer.from(str);
var N = 100000;

// Buffer version: collect N Buffers, concatenate once, then decode.
console.time("buffer concat test");
var bufList = [];
var len = 0; // total byte length, accumulated in the loop
for (var i = 0; i < N; i++) {
  bufList.push(buf);
  len += buf.length;
}
var s1 = Buffer.concat(bufList, len).toString();
console.timeEnd("buffer concat test");

// String version: collect N strings, then join them.
console.time("string concat test");
var strList = [];
for (var j = 0; j < N; j++) {
  strList.push(str);
}
var s2 = strList.join("");
console.timeEnd("string concat test");

Running this code prints the elapsed time of each test.

In the author's run, the string version was faster, partly because the Buffer version also has to call toString() at the end. So when storing text, we should still use strings; even when concatenating large strings, strings are not slower than Buffers.

So when do we actually need Buffers? When there is no alternative: when we store non-UTF-8 strings, binary data, or other such formats.

4. Summary

(1) JavaScript is well suited to Unicode-encoded data but handles binary data poorly.

(2) Therefore, when working with TCP streams or the file system, we need to process octet streams.

(3) Node provides several ways to create, process, and consume octet streams.

(4) The raw data is stored in Buffer instances. A Buffer is similar to an array of integers, but its memory is allocated outside the V8 heap. The size of a Buffer cannot be changed after it is created.

(5) The supported encodings are: ascii, utf8, utf16le, ucs2 (alias for utf16le), base64, binary (alias for latin1), hex.

(6) Buffer is a global, so it can be used without require(). Older code creates instances with new Buffer(); that constructor is deprecated, and Buffer.from(), Buffer.alloc(), or Buffer.allocUnsafe() should be used instead.