# ssxrver
ssxrver is a high-performance, high-concurrency network library running on the Linux platform. It is written in C++17 and supports TCP and UDP protocols.
Please try to match my development environment as closely as possible. If you do not need the database module, modify CMakeLists.txt accordingly.
cmake installation:

```shell
# debian/ubuntu
sudo apt-get install cmake
```

boost library installation:

```shell
wget http://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2
tar -xvf boost_1_72_0.tar.bz2
cd ./boost_1_72_0
./bootstrap.sh --prefix=/usr/local
sudo ./b2 install --with=all
```

Run ./build.sh in the ssxrver directory. You can edit build.sh to choose a Debug or Release build (Release is the default).

```shell
./build.sh
```

A successful compile generates the build/ directory, and the executable is placed in the subdirectory for the chosen build type. For example, with the Release build, the executable is at build/Release/ssxrver.
Create your configuration file following the format of conf/ssxrver.json.example (note: the configuration file must not contain comments — no comments, really, no comments). The options are explained below. Many parameters have sensible defaults, so leaving them unset is fine.
```json
{
    "port" : 4507,                  # port number; defaults to 4507 if omitted
    "address" : "127.0.0.1",        # address to bind
    "worker_processes" : 4,         # number of IO threads; defaults to 4
    "worker_connections" : -1,      # max connections per IO thread; -1 means no limit, create as many as possible
    "task_processes" : 0,           # task threads; defaults to 0
    "cpu_affinity" : "off",         # CPU affinity; off by default
    "http" : {                      # http module
        "max_body_size" : 67108864, # maximum size of a single http body
        "root_path" : "/home/randylambert/sunshouxun/ssxrver/html/"  # root path for file access
    },
    "log" : {                       # log module
        "level" : "INFO",           # output level: DEBUG, INFO, or WARN; defaults to INFO
        "ansync_started" : "off",   # whether to enable the asynchronous logging thread; off by default
        "flush_second" : 3,         # how often (in seconds) the async thread flushes to disk
        "roll_size" : 67108864,     # log file roll size
        "path" : "/home/randylambert/sunshouxun/ssxrver/logs/",      # log file directory
        "base_name" : "ssxrver"     # log file base name
    },
    "mysql" : {                     # database module
        "mysql_started" : "off",    # whether to enable the database module; off by default
        "address" : "127.0.0.1",    # connection info for the database follows
        "user" : "root",
        "password" : "123456",
        "database_name" : "ttms",
        "port" : 0,
        "unix_socket" : null,
        "client_flag" : 0
    },
    "blocks_ip" : ["122.0.0.2", "198.1.2.33"]  # IPs to block (e.g. malicious clients)
}
```

(The # comments above are for explanation only; remember that a real configuration file must not contain them.)

Run the executable file.
```shell
./ssxrver -f /path/to/your/config/file
# for example
./build/Release/ssxrver -f ./conf/ssxrver.json
```

| Test environment | Value |
|---|---|
| Operating system distribution | deepin v20.1 Community Edition (1030) |
| Kernel version | 5.4.70-amd64-desktop (64-bit) |
| Compiler version | gcc 8.3 |
| boost library version | 1.72 |
| processor | Intel(R) Core(TM) i7-8750H CPU @2.20GHz |
| L1 Cache Size | 32K |
| L2 Cache Size | 256K |
| L3 Cache Size | 9216K |
| Hard disk speed | 1.8 TiB mechanical hard drive 5400 rpm |
| Hard disk read and write speed | 370 MB in 3.03 seconds = 122.27 MB/sec |
| Memory | 7.6GB |
| Swap partition | 4.7GB |
| Logical core count | 12 cores |
To control variables, the computer was restarted before each test to ensure no other applications with high CPU or IO load were running in the test environment.
The test tool is webbench 1.5. The first warm-up run's data is discarded. The test command is as follows (100 clients sustained for 15 seconds):

```shell
./webbench -c 100 -t 15 http://127.0.0.1:8081/
```

The test subjects are Apache/2.4.38, nginx/1.14.2, and ssxrver.
Note: whether webbench or ab is used, the numbers produced by such load-testing tools can only serve as a rough reference. Proper load testing requires comprehensive, multi-angle evaluation, not just running a single command. Moreover, in these tests the data never actually traverses the network at all — it only loops through the kernel.
| Network library | Speed(pages/min) | Requests success rate |
|---|---|---|
| ssxrver returns the response generated in memory | 7107414 | 100% |
| ssxrver returns static files | 5114376 | 100% |
| Apache/2.4.38 | 2884072 | 100% |
| nginx/1.14.2 | 4728748 | 100% |
The test results of ssxrver are pretty good, but strangely, I expected the numbers to be higher. Early in development, before I had done much optimization, serving a response generated directly in memory peaked at nearly 8,000,000 pages/min (I didn't screenshot the 8,000,000 run; a 7,550,778 result survives), and nginx/1.14.2 peaked around 5,000,000 pages/min at the time. Now neither ssxrver nor nginx/1.14.2 reaches those values, and I don't know what caused such a large gap in the final results (is my computer aging?  ̄□ ̄||)
At present, if I have time, I plan to revise ssxrver's Buffer and Log modules.
First, the simplest Buffer change is to switch to a circular (ring) buffer, which effectively reduces how often the Buffer has to move data toward the front; alternatively, abandon the current Buffer implementation entirely and re-implement a high-performance one.
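The circular-buffer idea can be sketched as follows. This is a minimal illustration of the technique, not ssxrver's actual Buffer: read and write positions wrap around the underlying storage, so consuming data never requires shifting the remaining bytes to the front.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal fixed-capacity ring buffer. Positions wrap modulo the capacity
// instead of moving data forward. Growth policy is omitted for brevity.
class RingBuffer {
public:
    explicit RingBuffer(size_t capacity) : buf_(capacity) {}

    size_t readableBytes() const { return size_; }
    size_t writableBytes() const { return buf_.size() - size_; }

    // Appends up to writableBytes() bytes; returns how many were written.
    size_t write(const char* data, size_t len) {
        len = std::min(len, writableBytes());
        for (size_t i = 0; i < len; ++i)
            buf_[(writePos_ + i) % buf_.size()] = data[i];
        writePos_ = (writePos_ + len) % buf_.size();
        size_ += len;
        return len;
    }

    // Copies up to len readable bytes into out; returns how many were read.
    size_t read(char* out, size_t len) {
        len = std::min(len, size_);
        for (size_t i = 0; i < len; ++i)
            out[i] = buf_[(readPos_ + i) % buf_.size()];
        readPos_ = (readPos_ + len) % buf_.size();
        size_ -= len;
        return len;
    }

private:
    std::vector<char> buf_;
    size_t readPos_ = 0, writePos_ = 0, size_ = 0;
};
```

A real implementation would also handle growth and expose scatter/gather views for readv/writev, but the wrap-around indexing above is the core of the optimization.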
Second, the current Log module is written in C++ stream style. Although it certainly outperforms using iostream directly, overloading operator<< for logging still makes format control inconvenient and incurs performance costs from long chains of function calls. Both problems can be solved by implementing the Log in printf style.
Due to time constraints, ssxrver does not implement a memory management module. Writing a general-purpose high-performance allocator is nearly impossible anyway (it is better to use jemalloc or tcmalloc directly), but by analyzing the network-library workload there is still some chance of writing an allocator that beats a general one in this specific scenario. If I have time, I will study the implementation in nginx and learn from it.
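The nginx approach mentioned above is a region/pool allocator: small allocations are bumped out of large blocks, and everything is released at once when the pool dies (e.g. at the end of a connection's lifetime). The following is a heavily simplified sketch of that idea, not nginx's actual ngx_pool_t: there is no separate large-allocation list and no per-block tuning.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified pool allocator: bump-allocates from fixed-size blocks and
// frees all blocks together when the pool is destroyed. Individual
// deallocation is intentionally unsupported, which is what makes it fast.
class MemoryPool {
public:
    explicit MemoryPool(size_t blockSize = 4096) : blockSize_(blockSize) {}

    void* allocate(size_t n) {
        // Round up so every returned pointer stays max-aligned.
        n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (blocks_.empty() || used_ + n > blockSize_) {
            blocks_.push_back(std::make_unique<char[]>(std::max(n, blockSize_)));
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += n;
        return p;
    }

    size_t blockCount() const { return blocks_.size(); }

private:
    size_t blockSize_;
    size_t used_ = 0;
    std::vector<std::unique_ptr<char[]>> blocks_;  // all freed in ~MemoryPool
};
```

For a per-connection or per-request lifetime this pattern turns many small malloc/free pairs into pointer bumps plus one bulk release, which is exactly the scenario-specific win described above.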
While researching, I came to the conclusion that in C++17 std::string_view can replace const string& for some efficiency gains, so I tried replacing const string& with std::string_view everywhere in my project. However, when I then examined the load with perf top, I was surprised to find that the load of some functions actually increased after the replacement. I was puzzled by this. Due to time constraints I will not investigate the root cause for now; when I get the chance I will look into the underlying implementation to find the specific reason.
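For reference, the expected upside of the replacement looks like this. std::string_view is a non-owning (pointer, length) pair, so slicing one never copies character data, whereas std::string::substr allocates. (One known pitfall, offered only as a possible explanation for the regression above, unverified: if the callee ultimately needs a std::string, the copy happens anyway, plus an extra conversion.)

```cpp
#include <string_view>

// Returns a view of s with leading spaces stripped. No allocation, no copy:
// the returned view points into the caller's original storage.
inline std::string_view trimLeadingSpaces(std::string_view s) {
    size_t i = s.find_first_not_of(' ');
    return i == std::string_view::npos ? std::string_view{} : s.substr(i);
}
```

The same function taking and returning std::string would allocate in substr; the string_view version is pure pointer arithmetic, which is where the "some efficiency" claim comes from.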
When implementing the http parsing module, the first version used a handwritten state machine that matched strings directly; I then replaced it with a state machine generated by Ragel. In recent tests, however, I found that the load of the http parsing function is surprisingly high, reaching 10%. Could using Ragel have caused a performance regression? (If merely parsing headers causes such high load, it seems HTTP/2 should indeed bring a significant performance improvement.) Unfortunately, I never profiled the parsing function back when the state machine was handwritten, so I cannot compare the two right away; when I get the chance I will write a benchmark for this.
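For illustration, a handwritten state machine of the kind described above, restricted to the request line only ("GET /index.html HTTP/1.1"). This is a sketch, not ssxrver's parser: token characters are not validated and headers/body are not handled.

```cpp
#include <string>
#include <string_view>

struct RequestLine {
    std::string method, path, version;
};

// Tiny three-state machine: accumulate characters into the current token
// and advance the state on each space. Returns false on malformed input.
inline bool parseRequestLine(std::string_view line, RequestLine& out) {
    enum State { METHOD, PATH, VERSION } state = METHOD;
    std::string token;
    for (char c : line) {
        if (c == ' ') {
            if (state == METHOD)    { out.method = token; state = PATH; }
            else if (state == PATH) { out.path = token; state = VERSION; }
            else return false;  // unexpected extra space after the version
            token.clear();
        } else {
            token += c;
        }
    }
    if (state != VERSION || token.empty()) return false;
    out.version = token;
    return !out.method.empty() && !out.path.empty();
}
```

A benchmark comparing something like this against the Ragel-generated parser on identical inputs would settle the 10%-load question.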
ssxrver supports simple UDP transmission, but I personally think a UDP framework without congestion control, flow control, and retransmission of lost packets basically cannot be used normally. When I have time, I plan to study the QUIC and KCP protocols and fill in my UDP-related knowledge. I believe the more efficient and flexible UDP-based protocols will be used more and more widely in the future!
In fact, I think the best network architecture at present is this: with port and address reuse (SO_REUSEPORT), multiple threads (or processes) bind the same address and port and the kernel performs accept-time load balancing automatically, combined with a coroutine framework that hooks blocking system calls. This architecture delivers high performance without a main thread distributing connections, and without falling into asynchronous callback hell.
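The SO_REUSEPORT half of that architecture can be sketched as below (Linux 3.9+). Each worker thread or process creates its own listening socket bound to the same address and port, then runs its own accept/epoll loop; the kernel load-balances incoming connections across the sockets. Error handling is reduced to returning -1 for brevity.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Creates a non-blocking TCP listener with SO_REUSEPORT set, so several
// workers can each call this with the same ip/port. Without SO_REUSEPORT
// the second bind would fail with EADDRINUSE.
int makeReuseportListener(const char* ip, uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0) return -1;
    int on = 1;
    ::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    ::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    ::inet_pton(AF_INET, ip, &addr.sin_addr);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, SOMAXCONN) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}
```

Each worker then owns its listener end to end, so no cross-thread connection handoff is needed — which is exactly why the main-thread dispatcher disappears in this design.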
In addition, using io_uring, the asynchronous IO mechanism added in Linux kernel 5.1, should push server performance higher still. However, I do not yet know io_uring well, and I am not currently able to design an asynchronous IO network library based on it.