In server operation and maintenance, it is often necessary to monitor server resources such as CPU load, disk usage, and the number of running processes, so that an alarm can be raised and the system administrator notified as soon as something goes wrong. This article introduces several common monitoring requirements on Linux and shows how to write shell scripts for them.
Article Directory:
1. Check if the process exists
2. Detect process CPU utilization
3. Detect process memory usage
4. Detect process handle usage
5. Check whether a TCP or UDP port is listening
6. Check the number of running processes
7. Detect system CPU load
8. Detect system disk space
9. Summary
Check if the process exists
When monitoring a process, we generally need its process ID first. The process ID uniquely identifies a process, but several processes with the same name may be running on the server under different users. The GetPID function below returns the process ID for a given process name under a given user (for now it assumes that the user has started only one process with that name). It takes two parameters: the user name and the process name. It uses ps to list the user's processes, filters out the wanted process with grep, and finally extracts the process ID with sed and awk. The function can be adapted as needed, for example to filter on other information.
Listing 1. Monitoring the process
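The code is as follows:
function GetPID    # arguments: User Name
{
    PsUser=$1
    PsName=$2
    # List the user's processes, keep lines matching the name, drop editors and
    # helper tools, take the first remaining line and print its PID column
    pid=`ps -u "$PsUser" | grep "$PsName" | grep -v grep | grep -v vi | grep -v dbx \
        | grep -v tail | grep -v start | grep -v stop | sed -n 1p | awk '{print $1}'`
    echo $pid
}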
Sample demonstration:
1) Source program (for example, find the ID of the process named CFTestApp running as user root)
The code is as follows:
PID=`GetPID root CFTestApp`
echo $PID
2) Results output
The output is as follows:
11426
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that 11426 is the process ID of the CFTestApp program under the root user.
4) Command introduction
1. ps: Displays a snapshot of the current processes in the system. Parameters: -u <user ID> lists the processes that belong to the user; a user name can be given instead. -p <process ID> lists the status of the specified process. -o specifies the output format.
2. grep: Searches files or standard input for lines that match a pattern. Parameters: -v inverts the selection, that is, it shows the lines that do not contain the search string.
3. sed: A non-interactive stream editor that edits text from files or from standard input, processing one line at a time. Parameters: -n suppresses the automatic printing of every line, so that only explicitly printed lines appear. The p command prints the matching lines, so sed -n 1p prints only the first line.
4. awk: A programming language for processing text and data under Linux/Unix. The data can come from standard input, from one or more files, or from the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions, and is a powerful programming tool under Linux/Unix. It can be used on the command line, but is more often used in scripts. awk scans its input line by line, from the first line to the last, looks for lines that match the given pattern, and performs the requested action on them. If no action is given, the matching lines are printed to standard output (the screen). If no pattern is given, the action is applied to every line. Parameters: -F fs or --field-separator fs specifies the input field separator; fs is a string or a regular expression, such as -F:.
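The grep chain in GetPID matches substrings, so it can also pick up a process whose name merely contains the target name. As a minimal sketch of a stricter variant, the helper below compares the command name exactly. FirstPidByName is a hypothetical name, and the sketch assumes its input comes from ps -u USER -o pid= -o comm=, which prints the PID and the command name without a header line. It reads from standard input so that it can be tried with fixed sample data.

```shell
# Print the PID of the first process whose command name equals $1 exactly.
# Reads "PID COMMAND" lines on standard input.
# FirstPidByName is a hypothetical helper, not part of the original scripts.
FirstPidByName() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Demonstration with fixed sample input so the result is reproducible.
printf '%s\n' ' 1201 bash' '11426 CFTestApp' '11502 CFTestAppHelper' \
    | FirstPidByName CFTestApp
# prints 11426
```

On a live system it could be called as PID=$(ps -u root -o pid= -o comm= | FirstPidByName CFTestApp). Note that Linux truncates the comm field to 15 characters, so longer process names would need a different match.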
Sometimes the process has not been started at all. The following snippet checks whether a process ID was found. If the process is not running, it outputs "The process does not exist."
The code is as follows:
# Check if the process exists
if [ "-$PID" = "-" ]
then
{
    echo "The process does not exist."
}
fi
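The check above only tells us that GetPID found nothing. A PID that was stored earlier may also belong to a process that has since exited. One possible sketch of a liveness check uses kill -0, which sends no signal and only reports whether the process can be signalled. IsRunning is a hypothetical helper name. A known limitation of this approach is that kill -0 also fails for processes owned by another user when the caller is not root.

```shell
# Return success (0) when $1 is a numeric PID of a process we can signal.
# IsRunning is a hypothetical helper, not part of the original scripts.
IsRunning() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: treat as not running
    esac
    kill -0 "$1" 2>/dev/null        # signal 0 only checks that the process exists
}

# Demonstration: the current shell ($$) is certainly running.
if IsRunning "$$"; then
    echo "The process exists."
else
    echo "The process does not exist."
fi
```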
Detect process CPU utilization
When maintaining application services, we often see business processing stall because CPU usage is too high, which leads to service interruptions. High CPU usage may be caused by excessive business load or by abnormal behavior such as an infinite loop. By monitoring the CPU usage of the business process with a script, maintenance personnel can be notified as soon as utilization becomes abnormal, so that they can analyze and locate the problem in time and avoid an interruption. The following function obtains the CPU utilization of the process with the specified ID. It takes one parameter, the process ID. It uses ps to get the process information, removes the %CPU header line with grep -v, and finally extracts the integer part of the CPU utilization percentage with awk (on a system with multiple CPUs, the value can exceed 100%).
Listing 2. Real-time monitoring of business process CPU
The code is as follows:
function GetCpu
{
    # Print %CPU for the given PID, drop the header line, keep the integer part
    CpuValue=`ps -p $1 -o pcpu | grep -v CPU | awk '{print $1}' | awk -F. '{print $1}'`
    echo $CpuValue
}
The following function obtains the CPU utilization of the process through the GetCpu function above, and then uses a conditional statement to determine whether the utilization exceeds the limit. If it exceeds 80% (adjust this to your situation), an alarm is output; otherwise a normal message is output.
Listing 3. Determine whether CPU utilization exceeds the limit
The code is as follows:
function CheckCpu
{
    PID=$1
    cpu=`GetCpu $PID`
    echo "The usage of cpu is $cpu"
    if [ $cpu -gt 80 ]
    then
    {
        echo "The usage of cpu is larger than 80%"
    }
    else
    {
        echo "The usage of cpu is normal"
    }
    fi
}
Sample demonstration:
1) Source program (assuming the process ID of CFTestApp found above is 11426)
The code is as follows:
CheckCpu 11426
2) Results output
The output is as follows:
The usage of cpu is 75
The usage of cpu is normal
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that the current CPU usage of the CFTestApp program is 75%, which is normal and does not exceed the 80% alarm limit.
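GetCpu keeps only the integer part of the percentage, so a value such as 80.5 is compared as 80 and does not trigger the alarm. In addition, when the process has already exited, ps prints nothing and the numeric test in CheckCpu fails with an error. A minimal sketch of a comparison that accepts the fractional value and tolerates empty input is shown below. CpuExceeds is a hypothetical helper name, and treating an empty value as "not exceeded" is an assumption of this sketch.

```shell
# Succeed when CPU value $1 (for example "85.3") is greater than limit $2.
# An empty or non-numeric value counts as "not exceeded" here; that is an
# assumption of this sketch, and a real monitor may prefer to raise an alarm.
# CpuExceeds is a hypothetical helper, not part of the original scripts.
CpuExceeds() {
    awk -v v="$1" -v limit="$2" 'BEGIN {
        if (v !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
        exit !((v + 0) > (limit + 0))
    }'
}

# Demonstration with a fixed value.
if CpuExceeds "80.5" 80; then
    echo "The usage of cpu is larger than 80%"
else
    echo "The usage of cpu is normal"
fi
# prints "The usage of cpu is larger than 80%"
```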
Detect process memory usage
When maintaining application services, we often see a process crash because it uses too much memory, which interrupts the business. For example, a 32-bit program can address at most 4 GB of memory; once it needs more than that, memory allocation fails, and physical memory is limited as well. Excessive memory usage may be caused by memory leaks, message backlogs, and so on. By monitoring the memory usage of the business process with a script, an alarm can be sent (for example by SMS) as soon as memory usage becomes abnormal, so that maintenance personnel can handle it in time. The following function obtains the memory usage of the process with the specified ID. It takes one parameter, the process ID. It uses ps to get the process information, removes the VSZ header line with grep -v, and then converts the value, which ps reports in kilobytes, to approximate megabytes by dividing by 1000.
Listing 4. Monitoring of business process memory usage
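The code is as follows:
function GetMem
{
    # Virtual memory size of the process in kilobytes, without the header line
    MEMUsage=`ps -o vsz -p $1 | grep -v VSZ`
    # Convert to (approximate) megabytes
    (( MEMUsage /= 1000 ))
    echo $MEMUsage
}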
The following snippet obtains the memory usage of the process through the GetMem function above, and then uses a conditional statement to determine whether the memory usage exceeds the limit. If it exceeds 1.6G (adjust this to your situation), an alarm is output; otherwise a normal message is output.
Listing 5. Determine whether memory usage exceeds the limit
The code is as follows:
mem=`GetMem $PID`
if [ $mem -gt 1600 ]
then
{
    echo "The usage of memory is larger than 1.6G"
}
else
{
    echo "The usage of memory is normal"
}
fi
Sample demonstration:
1) Source program (assuming the process ID of CFTestApp found above is 11426)
The code is as follows:
mem=`GetMem 11426`
echo "The usage of memory is $mem M"
if [ $mem -gt 1600 ]
then
{
    echo "The usage of memory is larger than 1.6G"
}
else
{
    echo "The usage of memory is normal"
}
fi
2) Results output
The output is as follows:
The usage of memory is 248 M
The usage of memory is normal
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that the current memory usage of the CFTestApp program is 248M, which is normal and does not exceed the 1.6G alarm limit.
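The conversion step in GetMem can also be written as a small filter, which makes it easy to check against known input. The sketch below is one possible form: VszToMiB is a hypothetical helper name, it assumes the input is the output of ps -o vsz -p PID (a header line followed by one value in KiB), and it divides by 1024 rather than 1000 to get whole MiB. It prints nothing when the process no longer exists and ps returns only the header.

```shell
# Convert "ps -o vsz -p PID" output (a header line plus one KiB value)
# into whole MiB. Prints nothing if no numeric value follows the header.
# VszToMiB is a hypothetical helper, not part of the original scripts.
VszToMiB() {
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print int($1 / 1024) }'
}

# Demonstration with fixed sample input.
printf '%s\n' '   VSZ' '254000' | VszToMiB
# prints 248
```

On a live system it could be called as mem=$(ps -o vsz -p 11426 | VszToMiB).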
Detect process handle usage
When maintaining application services, we often see business interruptions caused by excessive handle usage. Each platform limits the number of handles (file descriptors) a process may use. On Linux, for example, we can obtain the limit with the ulimit -n command (shown as open files (-n) 1024 in the output of ulimit -a) or by viewing the content of /etc/security/limits.conf. Excessive handle usage may be caused by excessive load, a handle leak, and so on. By monitoring the handle usage of the business process with a script, an alarm can be sent (for example by SMS) as soon as something abnormal happens, so that maintenance personnel can handle it in time. The following function obtains the handle usage of the process with the specified ID. It takes one parameter, the process ID. It uses ls to list the handles of the process, and then counts them with wc -l.
The code is as follows:
function GetDes
{
    # Each entry under /proc/<PID>/fd is one open handle of the process
    DES=`ls /proc/$1/fd | wc -l`
    echo $DES
}
The following snippet obtains the handle usage of the process through the GetDes function above, and then uses a conditional statement to determine whether the handle usage exceeds the limit. If it exceeds 900 (adjust this to your situation), an alarm is output; otherwise a normal message is output.
The code is as follows:
des=`GetDes $PID`
if [ $des -gt 900 ]
then
{
    echo "The number of des is larger than 900"
}
else
{
    echo "The number of des is normal"
}
fi
Sample demonstration:
1) Source program (assuming the process ID of CFTestApp found above is 11426)
The code is as follows:
des=`GetDes 11426`
echo "The number of des is $des"
if [ $des -gt 900 ]
then
{
    echo "The number of des is larger than 900"
}
else
{
    echo "The number of des is normal"
}
fi
2) Results output
The output is as follows:
The number of des is 528
The number of des is normal
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that the CFTestApp program currently uses 528 handles, which is normal and does not exceed the alarm limit of 900.
4) Command introduction
wc: Counts the bytes, words, and lines in the specified file and prints the result. Parameters: -l counts lines. -c counts bytes. -w counts words.
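The fixed threshold of 900 fits a limit of 1024, but the limit of a given process may differ. As a sketch of one alternative, the soft limit can be read from /proc/PID/limits and the alarm expressed as a percentage of it. SoftFdLimit and FdAbovePercent are hypothetical helper names, and the sample text below only imitates the layout of that file.

```shell
# Print the soft "Max open files" limit from /proc/PID/limits content
# supplied on standard input (the fourth field of that line).
# SoftFdLimit and FdAbovePercent are hypothetical helpers.
SoftFdLimit() {
    awk '/^Max open files/ { print $4 }'
}

# Succeed when the used count $1 exceeds $3 percent of the limit $2.
FdAbovePercent() {
    [ "$1" -gt $(( $2 * $3 / 100 )) ]
}

# Demonstration with fixed sample input in the layout of /proc/PID/limits.
limit=$(printf '%s\n' \
    'Limit                     Soft Limit           Hard Limit           Units' \
    'Max open files            1024                 4096                 files' \
    | SoftFdLimit)
if FdAbovePercent 528 "$limit" 90; then
    echo "The number of des is larger than 90% of $limit"
else
    echo "The number of des is normal"
fi
# prints "The number of des is normal"
```

On a live system the limit could be read with limit=$(SoftFdLimit < /proc/11426/limits).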
Check whether a TCP or UDP port is listening
Port checks come up often in system resource monitoring, and in network communication the state of a port is especially important. Sometimes the processes, CPU, memory, and so on are all in a normal state, but a port is not, and the business does not run normally. The following function determines whether the specified port is being listened on. It takes one parameter, the port to check. It first uses netstat to output port usage information, then filters it with grep and awk and counts the listening TCP sockets with wc. The second statement counts the UDP sockets on the port in the same way. If both the TCP and the UDP counts are 0, the function outputs 0; otherwise it outputs 1.
Listing 6. Port Detection
The code is as follows:
function Listening
{
    # Count TCP sockets in LISTEN state on the port
    TCPListeningnum=`netstat -an | grep ":$1 " | \
        awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
    # Count UDP sockets bound to the port
    UDPListeningnum=`netstat -an | grep ":$1 " \
        | awk '$1 == "udp" && $NF == "0.0.0.0:*" {print $0}' | wc -l`
    (( Listeningnum = TCPListeningnum + UDPListeningnum ))
    if [ $Listeningnum -eq 0 ]
    then
    {
        echo "0"
    }
    else
    {
        echo "1"
    }
    fi
}
Sample demonstration:
1) Source program (for example, check whether port 8080 is being listened on)
The code is as follows:
isListen=`Listening 8080`
if [ $isListen -eq 1 ]
then
{
    echo "The port is listening"
}
else
{
    echo "The port is not listening"
}
fi
2) Results output
The output is as follows:
The port is listening
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that port 8080 of this Linux server is in the listening state.
4) Command introduction
netstat: Displays statistics related to the IP, TCP, UDP, and ICMP protocols, and is generally used to check the network connection status of each port on the machine. Parameters: -a shows all sockets, including listening ones. -n shows numeric addresses and ports instead of resolving names.
The following commands can also be used to check whether a TCP or UDP port is in a normal state.
The code is as follows:
tcp: netstat -an | egrep $1 | awk '$6 == "LISTEN" && $1 == "tcp" {print $0}'
udp: netstat -an | egrep $1 | awk '$1 == "udp" && $5 == "0.0.0.0:*" {print $0}'
Command introduction
egrep: Searches files for the specified pattern. egrep behaves like grep -E, and its syntax and parameters are the same as those of grep. The difference lies in how the pattern is interpreted: egrep uses extended regular expression syntax, while grep uses basic regular expression syntax. Extended regular expressions are more expressive than basic regular expressions.
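The port checks above match the port number anywhere on the line and only accept the protocol names tcp and udp, so a foreign address with the same port can match and IPv6 sockets (tcp6, udp6) are skipped. A minimal sketch of a stricter filter is shown below. CountListeners is a hypothetical helper name, and the sketch assumes Linux netstat -an output, where the fourth column is the local address and the fifth is the foreign address.

```shell
# Count listening TCP sockets and unconnected UDP sockets bound to port $1.
# Reads "netstat -an" style lines on standard input.
# CountListeners is a hypothetical helper, not part of the original scripts.
CountListeners() {
    awk -v port="$1" '
        $4 !~ (":" port "$")           { next }   # local address must end in :PORT
        $1 ~ /^tcp/ && $NF == "LISTEN" { n++ }    # tcp and tcp6 listeners
        $1 ~ /^udp/ && $5 ~ /:\*$/     { n++ }    # udp and udp6 sockets with no peer
        END { print n + 0 }
    '
}

# Demonstration with fixed sample input.
printf '%s\n' \
    'tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN' \
    'tcp        0      0 127.0.0.1:18080         0.0.0.0:*               LISTEN' \
    'tcp        0      0 10.0.0.5:22             10.0.0.9:8080           ESTABLISHED' \
    'tcp6       0      0 :::8080                 :::*                    LISTEN' \
    'udp        0      0 0.0.0.0:8080            0.0.0.0:*' \
    | CountListeners 8080
# prints 3
```

On a live system it could be called as netstat -an | CountListeners 8080.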
Check the number of running processes
Sometimes we need to know how many instances of a process are running on the server. The following command counts the running processes; in this example the process name is CFTestApp.
The code is as follows:
Runnum=`ps -ef | grep -v vi | grep -v tail | grep "[ /]CFTestApp" | grep -v grep | wc -l`
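The chain of grep -v filters is needed because ps -ef prints full command lines, which also contain the names of editors or tail commands that happen to mention CFTestApp. One possible sketch that avoids this counts exact command names instead. CountByName is a hypothetical helper name, and the sketch assumes its input comes from ps -e -o comm=, which prints one command name per line without a header.

```shell
# Count the input lines that are exactly equal to the command name $1.
# Reads one command name per line on standard input.
# CountByName is a hypothetical helper, not part of the original scripts.
CountByName() {
    awk -v name="$1" '$0 == name { n++ } END { print n + 0 }'
}

# Demonstration with fixed sample input.
printf '%s\n' 'CFTestApp' 'bash' 'CFTestApp' 'CFTestAppHelper' | CountByName CFTestApp
# prints 2
```

On a live system it could be called as Runnum=$(ps -e -o comm= | CountByName CFTestApp).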
Detect system CPU load
When maintaining a server, we sometimes see business interruptions caused by excessive system CPU load (utilization). Several processes may be running on the server, and while the CPU usage of each single process looks normal, the CPU load of the whole system may still be abnormal. By monitoring the system CPU load with a script, an alarm can be sent as soon as something abnormal happens, so that maintenance personnel can deal with it in time and prevent incidents. The following function measures the system CPU usage. It uses vmstat to sample the idle CPU percentage 5 times, averages the samples, and subtracts the average from 100 to get the current CPU usage.
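The code is as follows:
function GetSysCPU
{
    # Take 5 samples at 1-second intervals, skip the two header lines,
    # average the idle column (column 15) and keep the integer part
    CpuIdle=`vmstat 1 5 | sed -n '3,$p' \
        | awk '{x = x + $15} END {print x/5}' | awk -F. '{print $1}'`
    CpuNum=`echo "100-$CpuIdle" | bc`
    echo $CpuNum
}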
Sample demonstration:
1) Source program
The code is as follows:
cpu=`GetSysCPU`
echo "The system CPU is $cpu"
if [ $cpu -gt 90 ]
then
{
    echo "The usage of system cpu is larger than 90%"
}
else
{
    echo "The usage of system cpu is normal"
}
fi
2) Results output
The output is as follows:
The system CPU is 87
The usage of system cpu is normal
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that the current CPU utilization of the Linux server system is 87%, which is normal and does not exceed the 90% alarm limit.
4) Command introduction
vmstat: Short for Virtual Memory Statistics. It reports on the operating system's virtual memory, processes, and CPU activity.
Parameters: -n prints the header only once, rather than periodically, when output is repeated.
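GetSysCPU hard-codes the idle column as column 15 and the sample count as 5, and it includes the first vmstat report, which gives averages since the last reboot rather than a current sample. A sketch of a more tolerant parser is shown below: it locates the id column from the header row, skips the first report, and averages whatever samples remain. SysCpuFromVmstat is a hypothetical helper name, and the sample data is made up for illustration.

```shell
# Compute CPU usage (100 minus average idle) from vmstat output on stdin.
# The first data row is skipped because it holds averages since boot.
# SysCpuFromVmstat is a hypothetical helper, not part of the original scripts.
SysCpuFromVmstat() {
    awk '
        $1 == "r" {                                  # header row: find the "id" column
            for (i = 1; i <= NF; i++) if ($i == "id") col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ {                     # data row
            rows++
            if (rows > 1) { idle += $col; used++ }   # skip the since-boot averages
        }
        END { if (used) printf "%d\n", 100 - idle / used }
    '
}

# Demonstration with fixed sample input (illustrative values only).
printf '%s\n' \
    'procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----' \
    ' r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st' \
    ' 1  0      0 803456  81408 644096    0    0    12     8   55  120  3  1 96  0  0' \
    ' 2  0      0 803200  81408 644100    0    0     0     0  310  620 10  2 88  0  0' \
    ' 0  0      0 803100  81408 644100    0    0     0    16  290  580  7  1 92  0  0' \
    | SysCpuFromVmstat
# prints 10
```

On a live system it could be called as cpu=$(vmstat 1 5 | SysCpuFromVmstat).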
Detect system disk space
Disk space monitoring is an important part of system resource monitoring. During system maintenance, we often need to check how much of the server's disk space is in use. Because some services write call records, logs, or temporary files all the time, running out of disk space may also interrupt the business. The following function obtains the disk space usage of a directory on the current system. The input parameter is the name of the directory to check. The function uses df to output disk space usage information, and then extracts the usage percentage of the directory by filtering with grep and awk.
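The code is as follows:
function GetDiskSpc
{
    if [ $# -ne 1 ]
    then
        return 1
    fi
    # Anchor the pattern at the end of the line so that it matches the
    # mount point column, which is the last column of the df output
    Folder="$1$"
    DiskSpace=`df -k | grep "$Folder" | awk '{print $5}' | awk -F% '{print $1}'`
    echo $DiskSpace
}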
Sample demonstration:
1) Source program (the directory to check is /boot)
The code is as follows:
Folder="/boot"
DiskSpace=`GetDiskSpc $Folder`
echo "The system $Folder disk space is $DiskSpace%"
if [ $DiskSpace -gt 90 ]
then
{
    echo "The usage of system disk($Folder) is larger than 90%"
}
else
{
    echo "The usage of system disk($Folder) is normal"
}
fi
2) Results output
The output is as follows:
The system /boot disk space is 14%
The usage of system disk(/boot) is normal
[dyu@xilinuxbldsrv shell]$
3) Results Analysis
From the above output, we can see that 14% of the disk space of the /boot directory on this Linux server is currently in use, which is normal and does not exceed the 90% alarm limit.
4) Command introduction
df: Checks the disk space usage of file systems. This command shows how much disk space is occupied and how much is left. Parameters: -k displays sizes in kilobytes.
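The grep pattern in GetDiskSpc matches any mount point that ends with the given name, so /boot would also match a mount such as /mnt/boot. A minimal sketch of an exact match on the mount point column is shown below. DiskUsedPct is a hypothetical helper name, and the sketch assumes df -P -k output, where the POSIX -P option keeps each file system on a single line with the mount point in the sixth column (mount points containing spaces are not handled).

```shell
# Print the usage percentage (without the % sign) of the file system
# mounted exactly at $1. Reads "df -P -k" output on standard input.
# DiskUsedPct is a hypothetical helper, not part of the original scripts.
DiskUsedPct() {
    awk -v mount="$1" 'NR > 1 && $6 == mount { sub(/%$/, "", $5); print $5 }'
}

# Demonstration with fixed sample input (illustrative values only).
printf '%s\n' \
    'Filesystem     1024-blocks    Used Available Capacity Mounted on' \
    '/dev/sda1           495844   65218    405026      14% /boot' \
    '/dev/sda2         41152832 9876540  29162388      26% /' \
    | DiskUsedPct /boot
# prints 14
```

On a live system it could be called as DiskSpace=$(df -P -k | DiskUsedPct /boot).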
Summary
On the Linux platform, monitoring with shell scripts is a simple, convenient, and effective way to watch servers and processes, and it is very helpful to system developers and to the people who maintain the processes. It can monitor the information described above and send alarms, and it can also monitor process logs and other information. I hope this article is helpful to everyone.