
Node.js, as a server-side runtime for JavaScript, greatly broadens the scenarios in which JavaScript can be used.
However, the Node.js runtime itself is a black box: we cannot observe its state while it runs, and problems that occur online are difficult to reproduce.
Performance monitoring is therefore the cornerstone of keeping Node.js applications running normally. It lets you watch runtime indicators at any time, and it also helps troubleshoot problems in abnormal scenarios.
Performance monitoring can be divided into two parts:
Collection and display of performance indicators
Capture and analysis of performance data
Performance indicators include, for example, QPS, slow HTTP requests, and business processing link logs.
From the picture above, you can see the advantages and disadvantages of the three current mainstream Node.js performance monitoring solutions. The following is a brief introduction to the composition of these three solutions:
Prometheus
AliNode
AliNode is an extended runtime compatible with official Node.js that provides some additional functions:
agenthub is a resident process used to collect performance indicators and report them.
AliNode forms a closed loop of monitoring, display, snapshot, and analysis. Access is convenient and simple, but extending the runtime still carries risks.
Easy-Monitor
Uses a Node.js addon to implement the sampler.
The CPU time consumed by the current process can be obtained through process.cpuUsage(). The return value is in microseconds.
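As a minimal sketch of how this call might be sampled, the snippet below measures CPU time across a short busy loop. The workload and the variable names are illustrative assumptions, not part of any monitoring solution mentioned here.

```javascript
// Take a baseline: { user, system }, both in microseconds.
const startUsage = process.cpuUsage();
const startTime = process.hrtime.bigint(); // wall clock, in nanoseconds

// Illustrative workload so that the measured delta is non-zero.
let sum = 0;
for (let i = 0; i < 5e6; i++) sum += Math.sqrt(i);

// Passing the baseline returns the usage since that baseline.
const cpuDelta = process.cpuUsage(startUsage);
const elapsedUs = Number(process.hrtime.bigint() - startTime) / 1000;

// Share of a single CPU core used during the interval.
const cpuPercent = ((cpuDelta.user + cpuDelta.system) / elapsedUs) * 100;
console.log(`user=${cpuDelta.user}us system=${cpuDelta.system}us cpu=${cpuPercent.toFixed(1)}%`);
```

A collector would typically repeat this on a timer and report the percentage for each interval.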

The memory allocation data of the current process can be obtained through process.memoryUsage(). The return value is in bytes.
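A small sketch of reading these values; the toMB helper is a hypothetical convenience for display and is not part of Node.js.

```javascript
// Hypothetical helper: format a byte count as megabytes.
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(2) + ' MB';

const memUsage = process.memoryUsage(); // all fields are in bytes
console.log({
  rss: toMB(memUsage.rss),             // resident set size of the process
  heapTotal: toMB(memUsage.heapTotal), // heap memory reserved by V8
  heapUsed: toMB(memUsage.heapUsed),   // heap memory actually in use
  external: toMB(memUsage.external),   // memory of C++ objects bound to JS objects
});
```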

As can be seen from the above figure, rss includes the code segment (Code Segment), stack memory (Stack), and heap memory (Heap).
Analysis data for the V8 heap memory and heap spaces can be obtained through v8.getHeapStatistics() and v8.getHeapSpaceStatistics(). The following figure shows the composition of the V8 heap memory:

The heap memory is first divided into spaces, and each space is divided into pages. Pages are aligned to 1MB.
New Space: the new generation space, used to store objects with a relatively short life cycle. It is divided into two spaces (space type: semi space): from space and to space.
Old Space: the old generation space, used to store objects promoted from New Space.
Code Space: stores the executable code compiled by the V8 JIT.
Map Space: stores the hidden classes that objects point to. A hidden class records the object layout that V8 observes at runtime and is used to access object members quickly.
Large Object Space: used to store objects larger than 1MB that cannot be allocated to pages.
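The spaces listed above can be inspected at runtime. The sketch below prints one row per space reported by v8.getHeapSpaceStatistics(); the exact set of space names depends on the Node.js and V8 version, so treat the output as version-specific.

```javascript
const v8 = require('v8');

// One entry per heap space, sizes converted from bytes to KB.
const heapSpaces = v8.getHeapSpaceStatistics().map((s) => ({
  name: s.space_name,
  sizeKB: Math.round(s.space_size / 1024),
  usedKB: Math.round(s.space_used_size / 1024),
  availableKB: Math.round(s.space_available_size / 1024),
}));
console.table(heapSpaces);

// Whole-heap totals, in bytes.
const heapTotals = v8.getHeapStatistics();
console.log(`total_heap_size=${heapTotals.total_heap_size} used_heap_size=${heapTotals.used_heap_size}`);
```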
V8's garbage collection uses two algorithms: the Mark-Sweep-Compact algorithm is used to recycle objects in the old generation, and the Scavenge algorithm is used to recycle objects in the new generation.
Premise: New space is divided into two object spaces, from space and to space.
Trigger timing: when New space is full.
Steps:
In from space, perform a breadth-first traversal.
Surviving (reachable) objects found by the traversal are copied to Old space or to to space.
When copying ends, to space contains only surviving objects and from space is emptied.
from space and to space are exchanged, and the next round of Scavenge starts.
Scavenge is suitable for objects that are recycled frequently and occupy little memory. It is a typical space-for-time strategy, and its disadvantage is that it needs twice the space.
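To make the copying steps concrete, here is a toy sketch of a semi-space collection. It is an illustration only and not V8's implementation: objects are plain { name, refs } records, the heap is a hypothetical { fromSpace } holder, and promotion to Old space is left out for brevity.

```javascript
// Toy semi-space scavenge over objects shaped like { name, refs: [...] }.
function scavenge(heap, roots) {
  const forwarded = new Map(); // original object -> its copy in to space
  const toSpace = [];

  // Copy an object into to space once and remember where it went.
  const evacuate = (obj) => {
    if (!forwarded.has(obj)) {
      const copy = { name: obj.name, refs: obj.refs };
      forwarded.set(obj, copy);
      toSpace.push(copy);
    }
    return forwarded.get(obj);
  };

  const newRoots = roots.map(evacuate);
  // Breadth-first: to space doubles as the FIFO queue of copies still to scan.
  for (let i = 0; i < toSpace.length; i++) {
    toSpace[i].refs = toSpace[i].refs.map(evacuate);
  }

  // Everything left behind is garbage; the two semi spaces swap roles.
  heap.fromSpace = toSpace;
  return newRoots;
}

const objA = { name: 'a', refs: [] };
const objB = { name: 'b', refs: [] };
const objC = { name: 'c', refs: [] }; // unreachable from the roots
objA.refs.push(objB);
objB.refs.push(objA); // a cycle is handled through the forwarding map

const toyHeap = { fromSpace: [objA, objB, objC] };
const liveRoots = scavenge(toyHeap, [objA]);
console.log(toyHeap.fromSpace.map((o) => o.name)); // [ 'a', 'b' ]
```

The cost is proportional to the number of surviving objects, not to the size of the space, which is why the approach fits short-lived data.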

Mark-Sweep-Compact has three steps: marking, sweeping, and compacting.
Trigger timing: when Old space is full.
Steps:
Marking (three-color marking method):
Push the root objects onto the marking queue (an explicit stack) and mark them gray.
Pop an object off the marking queue and mark it black.
Mark the objects it references gray and push them onto the marking queue. Repeat until the marking queue is empty.
Sweep: reclaim the objects that were not marked.
Compact: move the surviving objects in Old space so that the cleared space is continuous and complete.
When V8 originally performed garbage collection, it had to stop the program, scan the entire heap, and reclaim the memory before resuming the program. This behavior is called a full pause (Stop-The-World).
In the new generation, live objects are few and collection is frequent, so a full pause has little impact. In the old generation, however, surviving objects are many and large, so the pauses caused by marking, sweeping, and compacting are more serious.
This concept is actually a bit like the Fiber architecture in the React framework: the Fiber tree is traversed to perform the corresponding tasks only during the browser's idle time, and otherwise execution is deferred. This affects the main thread's tasks as little as possible, avoids application lag, and improves application performance.
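The three-color marking used by Mark-Sweep-Compact can be sketched with a toy object graph. This is a simplified illustration under assumed data shapes ({ name, refs, color }), not V8's marker; compaction is omitted.

```javascript
const WHITE = 'white'; // not visited yet
const GRAY = 'gray';   // discovered, references not scanned yet
const BLACK = 'black'; // scanned, known to be alive

function markAndSweep(objects, roots) {
  for (const obj of objects) obj.color = WHITE;

  const markingQueue = []; // used as an explicit stack
  const shade = (obj) => {
    if (obj.color === WHITE) {
      obj.color = GRAY;
      markingQueue.push(obj);
    }
  };

  roots.forEach(shade);
  while (markingQueue.length > 0) {
    const obj = markingQueue.pop();
    obj.color = BLACK;
    obj.refs.forEach(shade); // white children turn gray and are queued
  }

  // Sweep: anything still white was never reached and is reclaimed.
  return objects.filter((obj) => obj.color === BLACK);
}

const nodeX = { name: 'x', refs: [] };
const nodeY = { name: 'y', refs: [] };
const nodeZ = { name: 'z', refs: [] }; // only references others, nothing references it
nodeX.refs.push(nodeY);
nodeZ.refs.push(nodeX);

const survivors = markAndSweep([nodeX, nodeY, nodeZ], [nodeX]);
console.log(survivors.map((o) => o.name)); // [ 'x', 'y' ]
```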
V8 has default limits on the space of the new and old generations:
New space default limit: 32M for 64-bit systems and 16M for 32-bit systems.
Old space default limit: 1400M for 64-bit systems and 700M for 32-bit systems.
Therefore, node provides two parameters to adjust the upper space limits of the new and old generations:
--max-semi-space-size: sets the maximum size of New Space
--max-old-space-size: sets the maximum size of Old Space
node also provides three ways to view GC logs:
--trace_gc: one log line briefly describing the time, type, heap size change, and cause of each GC
--trace_gc_verbose: displays the detailed status of each V8 heap space after each GC
--trace_gc_nvp: detailed key-value information for each GC, including GC type, pause time, memory changes, etc.
Since the GC log is relatively raw and requires secondary processing, you can use v8-gc-log-parser, developed by the AliNode team.
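One way to check that a size flag took effect is to read heap_size_limit from v8.getHeapStatistics(). The sketch below assumes it is saved under a hypothetical name such as check-limit.js and started with, for example, node --max-old-space-size=2048 check-limit.js; the reported limit should then grow roughly in line with the flag.

```javascript
const v8 = require('v8');

// heap_size_limit is reported in bytes.
const heapLimitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;

// execArgv lists the node flags the process was started with.
console.log(`node flags: ${JSON.stringify(process.execArgv)}`);
console.log(`heap_size_limit: ${heapLimitMB.toFixed(0)} MB`);
```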
A heap snapshot captures the heap memory of a running program and can be used to analyze memory consumption and how it changes.
.heapsnapshot files can be generated in the following ways:
Using heapdump

Using v8’s heap-profile

Using the API provided by the built-in v8 module of Node.js:
v8.getHeapSnapshot()

v8.writeHeapSnapshot(fileName)
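A minimal sketch of the two built-in calls. The file location in the temp directory and the streamHeapSnapshot helper are illustrative assumptions; v8.writeHeapSnapshot() is synchronous and blocks the event loop while it writes, so it should be used with care on a busy process.

```javascript
const v8 = require('v8');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Synchronous variant: returns the name of the file it wrote.
const snapshotFile = path.join(os.tmpdir(), `demo-${process.pid}-${Date.now()}.heapsnapshot`);
const writtenFile = v8.writeHeapSnapshot(snapshotFile);
console.log(`heap snapshot written to ${writtenFile}`);

// Streaming variant (hypothetical helper): v8.getHeapSnapshot() returns a
// Readable stream that can be piped to a file or uploaded elsewhere.
function streamHeapSnapshot(targetFile) {
  return new Promise((resolve, reject) => {
    v8.getHeapSnapshot()
      .pipe(fs.createWriteStream(targetFile))
      .on('finish', () => resolve(targetFile))
      .on('error', reject);
  });
}
```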

Using v8-profiler-next

The generated .heapsnapshot file can be loaded in the Memory panel of the Chrome DevTools toolbar, and the results are displayed as shown below:

The default view is the Summary view. Here we need to pay attention to the two rightmost columns: Shallow Size and Retained Size.
Shallow Size: indicates the size of the object itself as allocated in the V8 heap memory.
Retained Size: indicates the sum of the Shallow Size of all objects referenced by the object.
When Retained Size is found to be particularly large, there may be a memory leak inside the object. You can expand it further to locate the problem.
The Comparison view is used to compare and analyze the heap snapshots of two different periods. The Delta column can be used to filter out the objects with the largest memory changes.

A CPU profile samples the CPU while the program is running and can be used to analyze CPU time and its proportion.
There are several ways to generate a .cpuprofile file:
This is a 5-minute CPU Profile sample collection
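A minimal sketch of such a collection, assuming the built-in inspector module is used (other tools, such as v8-profiler-next, may be used instead). The captureCpuProfile helper, its parameters, and the 200 ms duration are hypothetical; a 5-minute sample would pass 5 * 60 * 1000.

```javascript
const inspector = require('inspector');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: profile the current process for durationMs
// and write the result as a .cpuprofile file.
function captureCpuProfile(durationMs, outFile, done) {
  const session = new inspector.Session();
  session.connect();
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      setTimeout(() => {
        session.post('Profiler.stop', (err, result) => {
          session.disconnect();
          if (err) return done(err);
          fs.writeFileSync(outFile, JSON.stringify(result.profile));
          done(null, outFile);
        });
      }, durationMs);
    });
  });
}

const cpuProfileFile = path.join(os.tmpdir(), `demo-${process.pid}.cpuprofile`);
captureCpuProfile(200, cpuProfileFile, (err, file) => {
  if (err) throw err;
  console.log(`CPU profile written to ${file}`);
});
```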

The generated .cpuprofile file can be loaded in the JavaScript Profiler of Chrome DevTools.

The default view is the Heavy view. Here we see two columns: Self Time and Total Time.
Self Time: represents the execution time of this function itself (excluding other calls).
Total Time: represents the execution time of this function (including the functions it calls).
When the Total Time and Self Time of a function are particularly large, the function may contain a CPU-intensive calculation that takes a lot of time, and you can troubleshoot it further.
When the application crashes and terminates unexpectedly, the system automatically records the memory allocation information, program counter, stack pointer, and other key information at the moment of the crash and generates a core file.
Three methods to generate .core files:
ulimit -c unlimited: lifts the kernel limit on core file size.
node --abort-on-uncaught-exception: adding this parameter when starting node generates a core file when an uncaught exception occurs in the application.
gcore <pid>: manually generates a core file.
After obtaining the .core file, you can use tools such as mdb, gdb, and lldb to analyze and diagnose the actual cause of the process crash.
llnode `which node` -c /path/to/core/dump
Monitoring shows that the heap memory keeps increasing, so heap snapshots are needed for troubleshooting.

By analyzing the heapsnapshot, we find that a newThing object has consistently retained a relatively large amount of memory.

The leak involves unused, newThing, theThing, and replaceThing, and is a case caused by closures.
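The identifiers above match a widely known closure leak pattern. The sketch below is an assumption modeled on that pattern, not the code of this case: the function bodies, the string size, and the replaceThingFixed variant are illustrative.

```javascript
let theThing = null;

function replaceThing() {
  const newThing = theThing; // the previous generation of theThing
  // unused is never called, but it captures newThing, which forces
  // newThing into the closure scope shared by all functions created here.
  const unused = () => {
    if (newThing) console.log('hi');
  };
  theThing = {
    longStr: new Array(1000).join('*'),
    // someMethod shares that closure scope, so each new theThing keeps
    // the previous one reachable and a chain of generations builds up.
    someMethod: () => console.log('someMethod'),
  };
}

// One possible fix: drop the captured reference before returning.
function replaceThingFixed() {
  let newThing = theThing;
  const unused = () => {
    if (newThing) console.log('hi');
  };
  theThing = {
    longStr: new Array(1000).join('*'),
    someMethod: () => console.log('someMethod'),
  };
  newThing = null; // breaks the chain between generations
}

// Calling this repeatedly (for example from a timer) grows the chain.
for (let i = 0; i < 3; i++) replaceThing();
```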
Common memory leaks include the following situations:
Therefore, in the above situations, you must carefully consider whether the objects in memory will be recycled automatically. If they will not, you need to recycle them manually, for example by setting objects to null, removing timers, and unbinding event listeners.
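The manual recycling steps mentioned above can be sketched as follows. The emitter, the cache object, and the cleanup function are hypothetical stand-ins for application resources.

```javascript
const { EventEmitter } = require('events');

const emitter = new EventEmitter();
let cache = { payload: new Array(1000).fill('x') }; // some large object
const onData = () => {};                            // listener kept by the emitter
emitter.on('data', onData);
const timer = setInterval(() => {}, 1000);          // timer keeping its callback alive

function cleanup() {
  clearInterval(timer);                   // remove the timer
  emitter.removeListener('data', onData); // unbind the event listener
  cache = null;                           // drop the reference so it can be collected
}

cleanup();
console.log(emitter.listenerCount('data')); // 0
```

Keeping a named reference to each listener and timer at creation time is what makes this kind of cleanup possible later.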
This article has given a detailed introduction to the entire Node.js performance monitoring system.
First, it introduces the problems solved by performance monitoring, its components, and a comparison of the advantages and disadvantages of mainstream solutions.
Then, the two major parts of performance indicators and snapshot tools are introduced in detail.
Finally, a simple memory leak case is reproduced from observation, analysis, and troubleshooting, and common memory leak situations and solutions are summarized.
I hope this article can help everyone understand the entire Node.js performance monitoring system.