Why do you need performance monitoring? Let’s talk about Node.js performance monitoring

Author：Eve Cole Update Time：2022-08-18 08:58:05

Why do you need performance monitoring? This article will take you through Node.js performance monitoring. I hope it will be helpful to you!

Why performance monitoring is needed

Node, as a runtime for Javascript on the server side, greatly enriches the application scenarios of Javascript.

However, Node.js Runtime itself is a black box. We cannot perceive the runtime status, and it is difficult to reproduce online problems.

Therefore, performance monitoring is the cornerstone of the "normal operation" of Node.js applications. Not only can various runtime indicators be monitored at any time, but it can also help troubleshoot abnormal scenario problems.

Components

Performance monitoring can be divided into two parts:

Collection and display of performance indicators
- Process-level data: CPU, Memory, Heap, GC, etc.
- System-level data: disk occupancy, I/O load, TCP/UDP connection status, etc.
- Application-layer data: QPS, slow HTTP, business processing link logs, etc.
Capture and analysis of performance data
- Heapsnapshot: heap memory snapshot
- Cpuprofile: CPU snapshot
- Coredump: application crash snapshot

Comparison of solutions

From the picture above, you can see the advantages and disadvantages of the three current mainstream Node.js performance monitoring solutions. The following is a brief introduction to the composition of these three solutions:

Prometheus
- prom-client is the nodejs implementation of the prometheus client, which is used to collect performance indicators.
- Grafana is a visualization platform used to display various data charts; it can be connected to prometheus.
- This solution only supports the collection and display of performance indicators. Other snapshot tools are needed for troubleshooting in order to form a closed loop.
AliNode
- Alinode is an extended runtime compatible with official nodejs, providing some additional functions:
  - v8's runtime memory status monitoring
  - libuv's runtime status monitoring
  - Online fault diagnosis functions: heap snapshot, CPU Profile, GC Trace, etc.
- agenthub is a resident process used to collect performance indicators and report them.
  - It integrates the convenience tools agentx + commandx.
- This solution forms a closed loop from monitoring and display to snapshot and analysis. Access is convenient and simple, but extending the runtime still carries risks.
Easy-Monitor
- xprofiler is responsible for real-time runtime status sampling and for outputting performance logs (that is, capturing performance data).
- xtransit is responsible for the collection and transmission of performance logs.
- The biggest difference from AliNode is that it uses a Node.js Addon to implement the sampler.

Performance indicators

CPU

The CPU time consumption data of the current process can be obtained through process.cpuUsage(). The unit of the return value is microseconds.

user: the CPU time the process spent executing its own (user-mode) code
system: the CPU time the system (kernel) spent working on behalf of the process
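
As a minimal sketch, the two fields can be turned into a utilisation percentage by sampling process.cpuUsage() twice and dividing the difference by the elapsed wall-clock time. The helper name cpuPercentSince and the busy loop are illustrative assumptions, not part of any monitoring library.

```javascript
// Hypothetical helper: CPU utilisation (as a percentage of one core) since an earlier sample.
function cpuPercentSince(prevUsage, prevTimeNs) {
  // Passing a previous sample makes process.cpuUsage() return the difference, in microseconds.
  const delta = process.cpuUsage(prevUsage);
  const elapsedUs = Number(process.hrtime.bigint() - prevTimeNs) / 1000;
  return {
    user: (delta.user / elapsedUs) * 100,
    system: (delta.system / elapsedUs) * 100,
  };
}

const startUsage = process.cpuUsage();
const startTimeNs = process.hrtime.bigint();

// Illustrative CPU-bound work so that there is something to measure.
let sum = 0;
for (let i = 0; i < 5e6; i++) sum += Math.sqrt(i);

const cpuPct = cpuPercentSince(startUsage, startTimeNs);
console.log(`user ${cpuPct.user.toFixed(1)}%, system ${cpuPct.system.toFixed(1)}%`);
```

In a real collector the same calculation would typically run on a timer, with each sample becoming the baseline for the next one.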

Memory

The memory allocation data of the current process can be obtained through process.memoryUsage(). The unit of the return value is bytes.

rss: resident memory, the total memory size allocated to the node process
heapTotal: the heap memory size requested by v8
heapUsed: the heap memory size used by v8
external: the memory size occupied by C++ objects managed by v8
arrayBuffers: the memory size allocated to ArrayBuffer
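
A small sketch of reading these fields, converting bytes to megabytes for readability. The helper name memoryUsageInMB is an assumption made for this example.

```javascript
// Convert every field returned by process.memoryUsage() from bytes to megabytes.
function memoryUsageInMB() {
  const usage = process.memoryUsage();
  const result = {};
  for (const [key, bytes] of Object.entries(usage)) {
    result[key] = Math.round((bytes / 1024 / 1024) * 100) / 100;
  }
  return result;
}

const memoryMB = memoryUsageInMB();
console.log(memoryMB); // fields include rss, heapTotal, heapUsed and external
```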

As can be seen from the above figure, rss includes code segment ( Code Segment ), stack memory ( Stack ), and heap memory ( Heap ).

Code Segment: stores code segments.
Stack: stores local variables and management function calls.
Heap: stores objects, closures, and everything else.

Heap

Analysis data of v8 heap memory and heap spaces can be obtained through v8.getHeapStatistics() and v8.getHeapSpaceStatistics(). The following figure shows the heap memory composition distribution of v8:

The heap memory is first divided into spaces, and each space is divided into pages. The memory is paged according to 1MB alignment.

New Space: new generation space, used to store object data with a relatively short life cycle. It is divided into two spaces (of type semi space): from space and to space
- Promotion condition: the object still survives after two GCs in New space
Old Space: old generation space, used to store objects promoted from New Space
Code Space: stores the executable code compiled by the v8 JIT
Map Space: stores the hidden classes that objects point to. A hidden class is the object layout structure recorded by v8 at runtime, and is used to access object members quickly.
Large Object Space: used to store objects larger than 1MB that cannot be allocated within a page
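
As a rough illustration, the two calls can be combined to print the overall heap figures and a per-space breakdown. The exact list of spaces returned depends on the v8 version bundled with the Node.js release in use.

```javascript
const v8 = require('node:v8');

// Overall heap figures, in bytes.
const heapStats = v8.getHeapStatistics();
console.log('heap size limit (MB):', (heapStats.heap_size_limit / 1024 / 1024).toFixed(0));
console.log('used heap (MB):', (heapStats.used_heap_size / 1024 / 1024).toFixed(1));

// One entry per heap space (new_space, old_space, code_space, ...).
const spaces = v8.getHeapSpaceStatistics().map((space) => ({
  name: space.space_name,
  usedKB: Math.round(space.space_used_size / 1024),
  sizeKB: Math.round(space.space_size / 1024),
}));
console.table(spaces);
```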

GC

The garbage collection algorithms of v8 are divided into two categories:

Major GC: uses the Mark-Sweep-Compact algorithm to recycle objects in the old generation.
Minor GC: uses the Scavenge algorithm to recycle objects in the new generation.

Scavenge

Premise: New space is divided into two object spaces: from and to.

Trigger timing: when New space is full.

Steps:

In from space, perform a breadth-first traversal and find the surviving (reachable) objects.
- Objects that have already survived once (experienced one Scavenge) are promoted to Old space.
- The others are copied to to space.
When copying ends, only surviving objects remain in to space, and from space is emptied.
Exchange from space and to space, and start the next round of Scavenge.

Scavenge suits objects that are recycled frequently and do not occupy much memory. It is a typical space-for-time strategy; its disadvantage is that it wastes twice the space.
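
The steps above can be sketched as a toy model. This is not how v8 is implemented: objects here are plain records with an assumed shape { name, refs, survivedOnce }, the spaces are JavaScript Sets, and the function name scavenge is chosen for this example only.

```javascript
// Toy model of one Scavenge round. Returns the set that becomes the next from space.
function scavenge(roots, fromSpace, oldSpace) {
  const toSpace = new Set();
  const visited = new Set();
  const queue = [...roots]; // FIFO queue gives a breadth-first traversal

  while (queue.length > 0) {
    const obj = queue.shift();
    if (visited.has(obj)) continue;
    visited.add(obj);

    if (fromSpace.has(obj)) {
      if (obj.survivedOnce) {
        oldSpace.add(obj); // survived a previous Scavenge: promote to Old space
      } else {
        obj.survivedOnce = true;
        toSpace.add(obj); // first survival: copy to to space
      }
    }
    queue.push(...obj.refs);
  }

  // Objects that were never reached are dropped together with the old from space.
  // Returning toSpace as the next from space models the exchange of the two spaces.
  return toSpace;
}

const a = { name: 'a', refs: [], survivedOnce: false };
const b = { name: 'b', refs: [], survivedOnce: false };
const garbage = { name: 'garbage', refs: [], survivedOnce: false };
a.refs.push(b);

const oldSpace = new Set();
let fromSpace = new Set([a, b, garbage]);

fromSpace = scavenge([a], fromSpace, oldSpace); // first round: a and b are copied
const afterFirst = [...fromSpace].map((o) => o.name);
fromSpace = scavenge([a], fromSpace, oldSpace); // second round: a and b are promoted
const promoted = [...oldSpace].map((o) => o.name);
console.log(afterFirst, promoted, fromSpace.size);
```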

Mark-Sweep-Compact

Three steps: marking, sweeping, and compacting.

Trigger timing: when Old space is full.

Steps:

Marking (three-color marking method)
- White: represents recyclable objects.
- Black: represents non-recyclable objects whose references have all been scanned.
- Gray: represents non-recyclable objects whose references have not yet been scanned.
- Put the objects directly referenced by the V8 root objects into a marking queue (an explicit stack), and mark these objects gray.
- Start a depth-first traversal from these objects. Each time an object is visited, pop it from the marking queue and mark it black.
- Then mark all white objects referenced by that object gray and push them onto the marking queue.
- Repeat until all objects on the stack have been popped. At that point only two kinds of objects remain in the old generation: black (non-recyclable) and white (recyclable).
- PS: When an object is too large to be pushed onto the stack, whose space is limited, v8 keeps the object gray and skips it, marks the entire stack as overflowed, and after the stack has been emptied traverses and marks again, which requires an additional scan of the heap.
Sweep
- Clears the white objects.
- Leaves the memory space discontinuous.
Compact
- Sweep leaves the memory space discontinuous, which makes it harder to allocate space for new objects.
- Compact moves the black (surviving) objects to one end of Old space, so that the cleared space is continuous and complete.
- Although it solves the memory fragmentation problem, it increases the pause time (slow execution speed).
- Mark-compact is therefore used only when there is not enough space to allocate objects promoted from the new generation.
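
The marking, sweeping, and compacting steps can be sketched on a toy heap. This is an illustration under simplifying assumptions, not v8's implementation: objects are plain records { name, refs }, Old space is an array, the function name markSweepCompact is hypothetical, and sweeping and compacting are collapsed into a single filter that keeps the surviving objects contiguous.

```javascript
const WHITE = 'white';
const GRAY = 'gray';
const BLACK = 'black';

function markSweepCompact(roots, oldSpace) {
  // Every object starts out white (recyclable until proven otherwise).
  for (const obj of oldSpace) obj.color = WHITE;

  // Marking: objects referenced directly by the roots are marked gray and pushed.
  const markingStack = [];
  for (const obj of roots) {
    obj.color = GRAY;
    markingStack.push(obj);
  }
  while (markingStack.length > 0) {
    const obj = markingStack.pop(); // LIFO stack gives a depth-first traversal
    obj.color = BLACK;
    for (const ref of obj.refs) {
      if (ref.color === WHITE) {
        ref.color = GRAY;
        markingStack.push(ref);
      }
    }
  }

  // Sweep drops the white objects; building a new dense array stands in for Compact.
  return oldSpace.filter((obj) => obj.color === BLACK);
}

const x = { name: 'x', refs: [] };
const y = { name: 'y', refs: [] };
const z = { name: 'z', refs: [] };
const w = { name: 'w', refs: [] };
x.refs.push(y);
y.refs.push(x); // a cycle between live objects is handled by the color check
w.refs.push(z); // w and z reference each other's graph but are unreachable from the root

const survivors = markSweepCompact([x], [z, x, w, y]).map((o) => o.name);
console.log(survivors);
```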

Stop-The-World

Originally, when v8 performed garbage collection, it had to stop the program, scan the entire heap, and reclaim the memory before resuming the program. This behavior is called a full pause (Stop-The-World).

In the new generation the active objects are small and are recycled frequently, so a full pause has little impact. In the old generation, however, the surviving objects are many and large, so the pauses caused by marking, sweeping, and compacting are more serious.

Optimization strategy

Incremental Marking: in the marking phase, once the heap reaches a certain size, incremental GC starts. Each time a certain amount of memory has been allocated, the running program is paused, marking runs for a few milliseconds to tens of milliseconds, and then the program resumes.

This concept is a bit like the Fiber architecture in the React framework: the Fiber Tree is traversed and its tasks are performed only during the browser's idle time. Otherwise execution is deferred, so that the main thread's tasks are affected as little as possible, application lag is avoided, and application performance improves.

Concurrent Sweeping: lets other threads do the sweeping at the same time, without worrying about conflicts with the main thread that is executing the program.
Parallel Sweeping: lets multiple sweeping threads work at the same time, which improves sweeping throughput and shortens the entire GC cycle.

Space adjustment

v8 places default limits on the space of the new and old generations.

New space default limit: 32M for 64-bit systems and 16M for 32-bit systems.
Old space default limit: 1400M for 64-bit systems and 700M for 32-bit systems.

Therefore, node provides two parameters to adjust the upper space limits of the new and old generations:

--max-semi-space-size : sets the maximum size of New Space
--max-old-space-size : sets the maximum size of Old Space
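
One way to check that such a flag has taken effect is to read heap_size_limit from v8.getHeapStatistics() inside a process started with it. The sketch below starts two child processes with different --max-old-space-size values; the helper name heapLimitWith and the values 256 and 1024 are arbitrary choices for this example. The reported limit also includes space for the new generation, so it is not expected to equal the flag value exactly.

```javascript
const { execFileSync } = require('node:child_process');

// Start a child Node.js process with the given old-space limit (in MB)
// and return the heap size limit, in bytes, that v8 reports inside it.
function heapLimitWith(maxOldSpaceMB) {
  const output = execFileSync(
    process.execPath,
    [
      `--max-old-space-size=${maxOldSpaceMB}`,
      '-e',
      'console.log(require("v8").getHeapStatistics().heap_size_limit)',
    ],
    { encoding: 'utf8' },
  );
  return Number(output.trim());
}

const smallLimit = heapLimitWith(256);
const largeLimit = heapLimitWith(1024);
console.log({ smallLimit, largeLimit });
```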

Viewing the GC log

node also provides three ways to view GC logs:

--trace_gc : prints one log line per GC, briefly describing its time, type, heap size change, and cause
--trace_gc_verbose : displays the detailed status of each v8 heap space after each GC
--trace_gc_nvp : prints detailed key-value pair information for each GC, including GC type, pause time, memory changes, etc.

Since the GC log is relatively raw and requires secondary processing, you can use v8-gc-log-parser, developed by the AliNode team.

Snapshot tools

Heapsnapshot

Takes a snapshot of the heap memory of a running program, which can be used to analyze memory consumption and changes.

Generation methods

.heapsnapshot files can be generated in the following ways:

Using heapdump

Using v8's heap-profile

Using the API provided by the built-in v8 module of nodejs
- v8.getHeapSnapshot()
- v8.writeHeapSnapshot(fileName)

Using v8-profiler-next
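
A minimal sketch of the built-in option, v8.writeHeapSnapshot(fileName). The output location (a file in the OS temp directory) and the file name pattern are assumptions for this example. Writing a snapshot blocks the process while the heap is serialised and can be slow and memory-hungry on large heaps, so it is usually triggered on demand rather than on a schedule.

```javascript
const v8 = require('node:v8');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Assumed output location: a uniquely named file in the OS temp directory.
const snapshotPath = path.join(
  os.tmpdir(),
  `demo-${process.pid}-${Date.now()}.heapsnapshot`,
);

// writeHeapSnapshot runs synchronously and returns the name of the file it wrote.
const writtenFile = v8.writeHeapSnapshot(snapshotPath);
const sizeBytes = fs.statSync(writtenFile).size;
console.log(`wrote ${writtenFile} (${sizeBytes} bytes)`);
```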

Analysis method

The generated .heapsnapshot file can be uploaded in the Memory panel of the Chrome devtools toolbar, and the results will be displayed as shown below:

The default view is Summary view. Here we need to pay attention to the two rightmost columns: Shallow Size and Retained Size

Shallow Size : the size of the object itself as allocated in the v8 heap memory.
Retained Size : the memory that would be freed if the object were collected, that is, its own Shallow Size plus the Shallow Size of all objects that are kept alive only through it.

When it is found that Retained Size is particularly large, there may be a memory leak inside the object. You can further expand to locate the problem.

The Comparison view is used to compare and analyze the heap snapshots of two different periods. The Delta column can be used to filter out the objects with the largest memory changes.

Cpuprofile

Performs snapshot sampling of the CPU while the program is running, which can be used to analyze CPU time and its proportions.

There are several ways to generate a .cpuprofile file:

v8-profiler (a tool officially provided by node, but it no longer supports node v10 or above and is no longer maintained)
v8-profiler-next (a version maintained by Chinese developers, supports the latest node v18, under continuous maintenance)

This is a 5-minute CPU Profile sample collection
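
The listing for that collection is not included here. As a rough stand-in, the sketch below uses the built-in inspector module rather than v8-profiler-next, and profiles a short synchronous workload instead of a five-minute window. The helper name captureCpuProfile, the workload, and the output path are all assumptions made for this example.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: profile `work` and pass the resulting profile object to `done`.
function captureCpuProfile(work, done) {
  const session = new inspector.Session();
  session.connect();
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      work(); // the code being measured
      session.post('Profiler.stop', (err, result) => {
        session.disconnect();
        done(err, err ? undefined : result.profile);
      });
    });
  });
}

captureCpuProfile(
  () => {
    // Illustrative CPU-bound workload.
    let total = 0;
    for (let i = 0; i < 2e6; i++) total += Math.sqrt(i);
    return total;
  },
  (err, profile) => {
    if (err) throw err;
    const file = path.join(os.tmpdir(), `demo-${process.pid}.cpuprofile`);
    fs.writeFileSync(file, JSON.stringify(profile));
    console.log(`wrote ${file} with ${profile.nodes.length} nodes`);
  },
);
```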

Analysis method

The generated .cpuprofile file can be displayed in the Javascript Profiler of the Chrome devtools toolbar (it is not in the default tabs; you need to open it under More on the right side of the toolbar). After selecting and uploading the file, the results are displayed as follows:

The default view is the Heavy view. Here we see two columns: Self Time and Total Time

Self Time : represents the execution time of this function itself (excluding other calls).
Total Time : represents the execution time of this function (including other calling functions).

When Total Time and Self Time are found to deviate greatly, the function may contain a CPU-intensive calculation that takes a lot of time. You can expand it for further troubleshooting.

Coredump

When the application unexpectedly crashes and terminates, the system automatically records key information about the process at the moment of the crash, such as memory allocation information, the Program Counter, and the stack pointer, and generates a core file.

Generation methods

Three methods to generate .core files:

ulimit -c unlimited : lifts the kernel limit on core file size
node --abort-on-uncaught-exception : adding this parameter when starting node generates a core file when an uncaught exception occurs in the application
gcore <pid> : manually generates a core file

Analysis method

After obtaining the .core file, the actual cause of the process crash can be analyzed and diagnosed with tools such as mdb, gdb, and lldb.

llnode `which node` -c /path/to/core/dump

Case analysis

Observation

It can be observed from monitoring that the heap memory continues to increase, so heap snapshots are needed for troubleshooting.

Analysis

From the heapsnapshot we can find that there is a newThing object that has always held a relatively large amount of memory.

Troubleshooting

It can be seen from the code that although the unused method is never called, the newThing object is referenced from theThing, causing it to always exist in the execution context of the replaceThing function without being released. This is a typical memory leak case caused by closures.

Summary

Common memory leaks include the following situations:

global variables,
closures,
timers,
event listeners,
caches.

Therefore, in the above situations, you must carefully consider whether the object in memory will be recycled automatically. If it will not, you need to recycle it manually, for example by setting objects to null, removing timers, or unbinding event listeners.
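
The closure case analyzed above, and one way of releasing it manually, can be sketched as follows. The original listing is not reproduced in this article, so this is a reconstruction from the names mentioned in the text (theThing, newThing, unused, replaceThing); the properties longStr and someMethod and the function replaceThingFixed are assumptions. Whether a particular v8 version actually retains the whole chain may vary.

```javascript
let theThing = null;

// Leaky pattern: each call keeps the previous theThing reachable.
function replaceThing() {
  const newThing = theThing; // the previous value of theThing
  // Never called, but it captures newThing in the closure scope of replaceThing.
  const unused = function () {
    if (newThing) console.log('hi');
  };
  theThing = {
    longStr: '*'.repeat(1000000),
    // Shares the same closure scope as `unused`, so it can keep newThing reachable.
    someMethod() {
      return 'someMessage';
    },
  };
}

// Manual release: drop the captured reference once it is no longer needed.
function replaceThingFixed() {
  let newThing = theThing;
  const unused = function () {
    if (newThing) console.log('hi');
  };
  theThing = {
    longStr: '*'.repeat(1000000),
    someMethod() {
      return 'someMessage';
    },
  };
  newThing = null; // the previous theThing can now be collected
}

for (let i = 0; i < 5; i++) replaceThingFixed();
console.log(theThing.longStr.length);
```

In a long-running service the leaky version would typically be driven by a timer or by incoming requests, which is what makes the heap grow steadily in monitoring.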

This concludes this article, which has given a detailed introduction to the entire Node.js performance monitoring system.

First, it introduces the problems solved by performance monitoring, its components, and a comparison of the advantages and disadvantages of mainstream solutions.

Then, the two major parts of performance indicators and snapshot tools are introduced in detail.

The performance indicators mainly focus on CPU, memory, heap space, and GC indicators. At the same time, the GC strategy and GC optimization plan of v8 are introduced.
The snapshot tools mainly include heap snapshot, CPU snapshot, and Coredump during a crash.

Finally, a simple memory leak case is reproduced from observation, analysis, and troubleshooting, and common memory leak situations and solutions are summarized.

I hope this article can help everyone understand the entire Node.js performance monitoring system.