Now that Node.js is in full swing, we can use it to do all kinds of things. A while ago I took part in a hackathon, where we set out to build a game that would get the "heads-down" smartphone crowd talking to each other more. Its core feature is real-time multiplayer interaction in the spirit of a LAN party. The hackathon was a pitifully short 36 hours and everything had to move fast, so under that premise our early preparation was inevitably a bit rough. For the cross-platform client we chose node-webkit: simple enough, and it met our requirements.
The requirements let us split development up by module. This article walks through the development of Spaceroom (our real-time multiplayer game framework): a series of explorations and attempts, the limitations we ran into in Node.js and the WebKit platform themselves, and the solutions we came up with.
Getting started
Spaceroom at a Glance
In the beginning, Spaceroom's design was entirely demand-driven. We wanted the framework to provide the following basic features:
Distinguish groups of users by room (or channel)
Collect instructions from the users in a room
Broadcast game data to every client accurately at a specified interval, keeping the clients in sync
Eliminate the impact of network latency as much as possible
Of course, later in the coding we gave Spaceroom more features, including pausing the game and generating consistent random numbers across clients (these can also be implemented in your own code as needed; they don't have to live in Spaceroom, which is first and foremost a framework working at the communication level).
APIs
Spaceroom is split into a front end and a back end. The server's job includes maintaining the room list and providing the ability to create and join rooms. Our client APIs look like this:
spaceroom.connect(address, callback) Connect to the server
spaceroom.createRoom(callback) Create a room
spaceroom.joinRoom(roomId) Join a room
spaceroom.on(event, callback) Listen for events
...
After the client connects to the server, it receives various events. For example, a user in a room may receive an event when a new player joins, or when the game starts. We give the client a "life cycle": at any moment it is in exactly one of a set of states.
You can get the current status of the client through spaceroom.state.
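To give a feel for how these APIs fit together, here is a minimal usage sketch. It is only illustrative: the require path, the address format, and the event names ('playerJoined', 'gameStart') are assumptions, not necessarily what Spaceroom actually uses.

var spaceroom = require('spaceroom'); // assumed client entry point

spaceroom.connect('ws://localhost:8080', function () {
  // Once connected, create a room; the callback is assumed to receive the room id.
  spaceroom.createRoom(function (roomId) {
    console.log('room created:', roomId, 'current state:', spaceroom.state);
  });
});

// Event names below are made up for illustration.
spaceroom.on('playerJoined', function (player) {
  console.log('a new player joined:', player);
});

spaceroom.on('gameStart', function () {
  console.log('the game has started');
});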
Using the server-side framework is fairly simple. With the default configuration file you can run it directly. We had one basic requirement: the server code should be able to run inside the client itself, without a dedicated server. Anyone who has played on a PS or PSP will know what I mean. Of course, running it on a dedicated server works just as well.
I'll keep the logic implementation brief here. The first generation of Spaceroom acts as a socket server: it maintains the room list, including each room's state, and handles the in-game communication for each room (instruction collection, bucket broadcasting, and so on). For details, please refer to the source code.
Synchronization Algorithm
So, how do we keep what every client displays consistent in real time?
It sounds interesting. Think about it: what do we actually need the server to deliver for us? Naturally you think of what can make the logic diverge between clients: user instructions. Since every client runs the same game-logic code, the same inputs produce the same results; the only difference between clients is the stream of player instructions received during the game. So we need a way to synchronize these instructions. If every client gets the same instructions, then in theory every client computes the same result.
Online games use a wide variety of synchronization algorithms, each suited to different scenarios. Spaceroom uses an algorithm in the spirit of frame locking (lockstep). We divide the timeline into consecutive intervals, each called a bucket. A bucket holds instructions and is maintained on the server. At the end of each bucket's time span, the server broadcasts the bucket to all clients; when a client receives a bucket, it takes the instructions out, verifies them, and executes them.
To reduce the impact of network latency, each instruction the server receives from a client is delivered to a bucket according to the following steps:
1. Let order_start be the time carried by the instruction, and let t be the start time of the bucket that contains order_start.
2. If t + delay_time <= the start time of the bucket currently collecting instructions, deliver the instruction to the bucket currently collecting instructions; otherwise continue to step 3.
3. Deliver the instruction to the bucket that contains t + delay_time.
Here delay_time is the agreed server delay, which can be taken as the average latency between clients; the default in Spaceroom is 80, and the default bucket length is 48. At the end of each bucket's time span, the server broadcasts the bucket to all clients and starts collecting instructions for the next bucket. The client paces its logic according to the interval between received buckets, keeping the time error within an acceptable range.
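As a rough illustration of the delivery rule above (not Spaceroom's actual code; buckets are assumed to be keyed by their start time and aligned to multiples of the bucket length):

var BUCKET_SIZE = 48;   // default bucket length in ms
var DELAY_TIME = 80;    // agreed server delay in ms

// Deliver one instruction to the proper bucket.
// instruction.orderStart is the time carried by the instruction (order_start);
// currentBucketStart is the start time of the bucket currently collecting instructions;
// buckets is a plain object mapping a bucket's start time to its array of instructions.
function deliver(instruction, currentBucketStart, buckets) {
  // t: start time of the bucket that contains order_start
  var t = Math.floor(instruction.orderStart / BUCKET_SIZE) * BUCKET_SIZE;
  var target;
  if (t + DELAY_TIME <= currentBucketStart) {
    // Step 2: deliver it to the bucket that is currently collecting instructions.
    target = currentBucketStart;
  } else {
    // Step 3: deliver it to the bucket that contains t + delay_time.
    target = Math.floor((t + DELAY_TIME) / BUCKET_SIZE) * BUCKET_SIZE;
  }
  (buckets[target] || (buckets[target] = [])).push(instruction);
}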
This means that under normal circumstances a client receives a bucket from the server every 48ms, and when that bucket's time comes it processes it accordingly. Assuming the client runs at 60 FPS, it receives a bucket roughly every 3 frames and updates its logic from it. If a bucket has not arrived by the time it is due, because of network fluctuation, the client pauses the game logic and waits. Within a bucket interval, the displayed state can be smoothed with interpolation (lerp).
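On the client side, the pacing might look roughly like this. This is only a sketch under the assumptions just described: applyInstructions, lerp, and render are placeholders for the real game logic, and requestAnimationFrame is assumed to be available (the client runs in a browser / Node-WebKit context).

var BUCKET_SIZE = 48;

var pendingBuckets = [];               // buckets received from the server, oldest first
var currentBucketStart = 0;            // wall-clock time at which the current interval began
var lastState = {}, targetState = {};  // assumed game-state snapshots

// Placeholders for the real game logic.
function applyInstructions(state, instructions) { return state; }
function lerp(a, b, t) { return t < 1 ? a : b; }
function render(state) { /* draw the state onto the canvas */ }

function update() {
  var now = Date.now();
  if (now >= currentBucketStart + BUCKET_SIZE) {
    var bucket = pendingBuckets.shift();
    if (bucket) {
      // Time to advance the logic: apply every instruction in this bucket.
      lastState = targetState;
      targetState = applyInstructions(lastState, bucket.instructions);
      currentBucketStart += BUCKET_SIZE;
    }
    // else: the bucket is overdue (network hiccup), so pause the logic and keep waiting.
  }
  // Inside an interval, smooth what we display with interpolation (lerp).
  var t = Math.min((now - currentBucketStart) / BUCKET_SIZE, 1);
  render(lerp(lastState, targetState, t));
  requestAnimationFrame(update);
}

// Kick off once the game starts, e.g. when the first bucket arrives.
currentBucketStart = Date.now();
requestAnimationFrame(update);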
With delay_time = 80 and bucket_size = 48, any instruction is delayed by at least 96ms before execution. Changing these two parameters, for example to delay_time = 60 and bucket_size = 32, gives a minimum delay of 64ms.
A bloody incident caused by a timer
Looking at the whole design, the framework needs an accurate timer while it runs, to broadcast buckets at a fixed interval. Our first thought was of course setInterval(), but the next second we realized how unreliable that idea is: the naughty setInterval() has serious error, and what's worse, the error accumulates, with increasingly serious consequences.
So we immediately thought of using setTimeout() and dynamically correcting the next delay to keep our loop roughly stable around the target interval. For example, if this setTimeout() fired 5ms off target, we adjust the next one by 5ms to compensate. But the test results were not satisfactory, and however you look at it this is not elegant.
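For reference, the self-correcting approach looked roughly like this (a sketch, not our exact code; broadcastBucket is a placeholder for the per-interval work):

var INTERVAL = 48;                    // target interval in ms
var expected = Date.now() + INTERVAL;

function broadcastBucket() { /* collect and broadcast the current bucket */ }

function tick() {
  var drift = Date.now() - expected;  // how late (or early) this call actually was
  broadcastBucket();
  expected += INTERVAL;
  // Schedule the next call, subtracting the drift we just measured.
  setTimeout(tick, Math.max(0, INTERVAL - drift));
}

setTimeout(tick, INTERVAL);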
So we changed our thinking: what if we made setTimeout() fire as quickly as possible, and on each callback simply checked whether the target time had been reached? For example, looping with setTimeout(callback, 1) to keep checking the time seems like a good idea.
Disappointing timer
We immediately wrote a piece of code to test the idea, and the results were disappointing. On the then-latest stable version of node.js (v0.10.32), on Windows, run this code:
var sum = 0, count = 0;
function test() {
  var now = Date.now();
  setTimeout(function () {
    // Measure how long the 1ms timeout actually took
    var diff = Date.now() - now;
    sum += diff;
    count++;
    test();
  }, 1);
}
test();
After letting it run for a while, type sum / count in the console and you will see something like this:
> sum / count
15.624555160142348
What?! I asked for a 1ms interval, but you tell me the actual average interval is 15.625ms! What a beautiful sight. We ran the same test on a Mac and got 1.4ms. We were baffled: what on earth is going on? If I were an Apple fan, I might have concluded that Windows is just trash and given up on it. Fortunately, I am a rigorous front-end engineer, so I kept thinking about this number.
Wait, why does this number look so familiar? Isn't it suspiciously close to the maximum timer interval on Windows? I immediately downloaded ClockRes to check, and running it in the console gave the following results:
Maximum timer interval: 15.625 ms
Minimum timer interval: 0.500 ms
Current timer interval: 1.001 ms
Sure enough! Looking at the node.js manual, we find this description of setTimeout:
The actual delay depends on external factors like OS timer granularity and system load.
However, the test shows that this actual delay equals the maximum timer interval (note that the system's current timer interval is only 1.001ms), which is unacceptable in any case. Strong curiosity drove us to dig through the node.js source code to get at the truth.
BUG in Node.js
I believe most of you, like me, have some understanding of Node.js' event loop mechanism. Reading the source code of the timer implementation gives a rough picture of how timers work. Let's start with the main loop of the event loop:
while (r != 0 && loop->stop_flag == 0) {
  /* Update global time */
  uv_update_time(loop);
  /* Check whether the timer expires and execute the corresponding timer callback */
  uv_process_timers(loop);

  /* Call idle callbacks if nothing to do. */
  if (loop->pending_reqs_tail == NULL &&
      loop->endgame_handles == NULL) {
    /* Prevent event loop from exiting */
    uv_idle_invoke(loop);
  }

  uv_process_reqs(loop);
  uv_process_endgames(loop);
  uv_prepare_invoke(loop);

  /* Collect IO events */
  (*poll)(loop, loop->idle_handles == NULL &&
                loop->pending_reqs_tail == NULL &&
                loop->endgame_handles == NULL &&
                !loop->stop_flag &&
                (loop->active_handles > 0 ||
                 !ngx_queue_empty(&loop->active_reqs)) &&
                !(mode & UV_RUN_NOWAIT));

  /* setImmediate() etc */
  uv_check_invoke(loop);

  r = uv__loop_alive(loop);
  if (mode & (UV_RUN_ONCE | UV_RUN_NOWAIT))
    break;
}
The source code of the uv_update_time function is as follows (https://github.com/joyent/libuv/blob/v0.10/src/win/timer.c):
void uv_update_time(uv_loop_t* loop) {
  /* Get the current system time */
  DWORD ticks = GetTickCount();

  /* The assumption is made that LARGE_INTEGER.QuadPart has the same type */
  /* loop->time, which happens to be. Is there any way to assert this? */
  LARGE_INTEGER* time = (LARGE_INTEGER*) &loop->time;

  /* If the timer has wrapped, add 1 to it's high-order dword. */
  /* uv_poll must make sure that the timer can never overflow more than */
  /* once between two subsequent uv_update_time calls. */
  if (ticks < time->LowPart) {
    time->HighPart += 1;
  }
  time->LowPart = ticks;
}
Internally this function uses Windows' GetTickCount() to set the current loop time. Simply put, after setTimeout is called, after a series of struggles the internal timer->due is set to the current loop time + timeout. In the event loop, uv_update_time first updates the current loop time, and then uv_process_timers checks whether any timer has expired; if so, we enter the JavaScript world. To summarize, each iteration of the event loop roughly does the following:
Update the global time
Check the timers; if a timer has expired, run its callback
Check the reqs queue and execute the waiting requests
Enter the poll function to collect IO events. When an IO event arrives, the corresponding handler is added to the reqs queue, to be executed on the next iteration of the event loop. Inside poll, a system call is used to collect IO events; it blocks the process until an IO event arrives or the given timeout expires, and that timeout is set to the time at which the nearest timer expires. In other words, IO events are collected by blocking, and the maximum blocking time is the expiry time of the next timer.
Source code of one of the poll functions under Windows:
static void uv_poll(uv_loop_t* loop, int block) {
  DWORD bytes, timeout;
  ULONG_PTR key;
  OVERLAPPED* overlapped;
  uv_req_t* req;

  if (block) {
    /* Take out the expiration time of the most recent timer */
    timeout = uv_get_poll_timeout(loop);
  } else {
    timeout = 0;
  }

  GetQueuedCompletionStatus(loop->iocp,
                            &bytes,
                            &key,
                            &overlapped,
                            /* At most blocking until the next timer expires */
                            timeout);

  if (overlapped) {
    /* Package was dequeued */
    req = uv_overlapped_to_req(overlapped);
    /* Insert IO events into the queue */
    uv_insert_pending_req(loop, req);
  } else if (GetLastError() != WAIT_TIMEOUT) {
    /* Serious error */
    uv_fatal_error(GetLastError(), "GetQueuedCompletionStatus");
  }
}
Following the steps above, suppose we set a timer with timeout = 1ms. The poll function should block for at most 1ms and then return (if no IO event arrives in the meantime). On the next iteration of the event loop, uv_update_time updates the time, and uv_process_timers should find that our timer has expired and run the callback. So the preliminary analysis is that either uv_update_time has a problem (it does not update the current time correctly), or the poll function's 1ms wait has a problem.
Looking at MSDN, we were surprised to find this description of the GetTickCount function:
The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.
GetTickCount's resolution really is that coarse! Suppose the poll function correctly blocks for 1ms, but the next time uv_update_time runs, the current loop time is not actually updated. Then our timer is not judged to have expired, so poll waits another 1ms and we enter the next iteration, and so on, until GetTickCount finally updates (once every 15.625ms, the number we measured), the loop's current time moves forward, and uv_process_timers finally sees our timer as expired.
Ask WebKit for help
Node.js' source leaves us helpless here: it uses a low-precision time function and does nothing to compensate. But then we remembered that since we are using Node-WebKit, besides Node.js' setTimeout we also have Chromium's setTimeout. Write a test page and run it in a browser or in Node-WebKit: http://marks.lrednight.com/test.html#1 (the number after # is the interval to measure).
According to the HTML5 specification, the theoretical result is that the first 5 intervals are 1ms and the following ones are 4ms. The test page displays results starting from the third call, so in theory the table should show 1ms for the first three entries and 4ms after that. The results have some error, and by the specification the smallest interval we can get is 4ms. Not fully satisfying, but clearly much better than node.js. Strong curiosity drove us to look at Chromium's source code to see how it is implemented. (https://chromium.googlesource.com/chromium/src.git/+/38.0.2125.101/base/time/time_win.cc)
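The page itself is not reproduced here, but what it measures can be approximated with something like the following (an assumption of how the page works, based on the description above):

// Read the interval to measure from the URL hash, e.g. test.html#48
var interval = parseInt(location.hash.slice(1), 10) || 1;
var last = Date.now();
var count = 0;
var samples = [];

function step() {
  var now = Date.now();
  count++;
  if (count >= 3) {            // only record from the third call on
    samples.push(now - last);
  }
  last = now;
  if (samples.length < 50) {
    setTimeout(step, interval);
  } else {
    console.log('min:', Math.min.apply(null, samples),
                'max:', Math.max.apply(null, samples));
  }
}

setTimeout(step, interval);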
First, Chromium uses the timeGetTime() function to determine the current loop time. MSDN tells us that the precision of this function is affected by the current system timer interval, which on our test machine is theoretically the 1.001ms mentioned above. However, by default the timer interval is its maximum value (15.625ms on the test machine), unless an application changes the global timer interval.
If you follow IT industry news you will have seen the reports: Chromium sets the system timer interval very low! So it seems we don't need to worry about the system timer interval at all? Don't celebrate too early; a fix dealt us a blow. In fact, this behavior was fixed in Chrome 38. Should we stick with a Node-WebKit based on an earlier Chromium? That is clearly not elegant, and it would keep us from using a better-performing version of Chromium.
Reading further in the Chromium source, we find that when a timer with a timeout < 32ms exists, Chromium changes the system's global timer interval to achieve a precision finer than 15.625ms. (View source code.) When a timer is started, something called HighResolutionTimerManager is enabled; this class calls the EnableHighResolutionTimer function according to the device's current power source. Specifically, when the device is on battery it calls EnableHighResolutionTimer(false), and when on AC power it passes true. The implementation of EnableHighResolutionTimer is as follows:
void Time::EnableHighResolutionTimer(bool enable) {
  base::AutoLock lock(g_high_res_lock.Get());
  if (g_high_res_timer_enabled == enable)
    return;
  g_high_res_timer_enabled = enable;
  if (!g_high_res_timer_count)
    return;
  // Since g_high_res_timer_count != 0, an ActivateHighResolutionTimer(true)
  // was called which called timeBeginPeriod with g_high_res_timer_enabled
  // with a value which is the opposite of |enable|. With that information we
  // call timeEndPeriod with the same value used in timeBeginPeriod and
  // therefore undo the period effect.
  if (enable) {
    timeEndPeriod(kMinTimerIntervalLowResMs);
    timeBeginPeriod(kMinTimerIntervalHighResMs);
  } else {
    timeEndPeriod(kMinTimerIntervalHighResMs);
    timeBeginPeriod(kMinTimerIntervalLowResMs);
  }
}
Here kMinTimerIntervalLowResMs = 4 and kMinTimerIntervalHighResMs = 1. timeBeginPeriod and timeEndPeriod are Windows functions for changing the system timer interval. In other words, on AC power the smallest timer interval we can get is 1ms, and on battery it is 4ms. Since our loop calls setTimeout continuously, the W3C specification clamps the minimum interval to 4ms anyway, so we can relax: this has little impact on us.
Another precision problem
Back to the earlier point: the tests show that the setTimeout interval is not stable at 4ms but keeps fluctuating, and http://marks.lrednight.com/test.html#48 likewise shows intervals jumping between 48ms and 49ms. The reason is that in both Chromium's and Node.js' event loops, the precision of the Windows call that waits for IO events is affected by the current system timer. Our game logic needs requestAnimationFrame (to keep redrawing the canvas), which requires a roughly 16ms timer; since that is below 32ms, it triggers the high-precision path and keeps the system timer interval at no more than kMinTimerIntervalLowResMs. With the test machine on AC power the system timer interval is 1ms, so the test results have an error of about ±1ms. If your computer's system timer interval has not been lowered and you run the #48 test above, max may reach 48 + 16 = 64ms.
Using Chromium's setTimeout implementation, we can keep the error of setTimeout(fn, 1) to about 4ms, and the error of setTimeout(fn, 48) to about 1ms. So a new blueprint formed in our minds, making our code look like this:
/* Get the maximum interval deviation */
var deviation = getMaxIntervalDeviation(bucketSize); // bucketSize = 48, deviation = 2
var previousBucket = Date.now();

function gameLoop() {
  var now = Date.now();
  if (previousBucket + bucketSize <= now) {
    previousBucket = now;
    doLogic();
  }
  if (previousBucket + bucketSize - Date.now() > deviation) {
    // Wait 46ms. The actual delay will be less than 48ms.
    setTimeout(gameLoop, bucketSize - deviation);
  } else {
    // Busy waiting. Use setImmediate instead of process.nextTick,
    // because the former does not block IO events.
    setImmediate(gameLoop);
  }
}
The code above makes us wait for a span slightly shorter than bucket_size (bucket_size minus the deviation) instead of the full bucket_size. Even if the maximum error occurs during that 46ms wait, by the reasoning above the actual delay stays under 48ms. For the remaining time we busy-wait, ensuring that gameLoop runs at its interval with sufficient precision.
While we solved the problem to some extent with Chromium, this is obviously not elegant enough.
Remember our initial requirement? The server code should be able to run on any machine with a Node.js environment, without the Node-WebKit client. If you run the code above directly under plain Node.js, the deviation has to be at least 16ms, which means that in every 48ms we busy-wait for up to 16ms, and CPU usage goes up accordingly.
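For completeness, here is one possible shape of the getMaxIntervalDeviation helper used above. It is a guess based on the numbers in this article, not the real Spaceroom implementation, and it assumes node-webkit exposes its version under process.versions['node-webkit']:

function getMaxIntervalDeviation(bucketSize) {
  // Under Node-WebKit (Chromium timers, lowered system timer interval) a couple of
  // milliseconds of slack is enough; under plain Node.js v0.10 on Windows the error
  // can be a full 15.625ms system tick.
  var inNodeWebkit = typeof process !== 'undefined' &&
                     process.versions && !!process.versions['node-webkit'];
  var deviation = inNodeWebkit ? 2 : 16;
  return Math.min(deviation, bucketSize);
}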
Unexpected surprise
How annoying. Has nobody noticed such a big bug in Node.js? The answer made us overjoyed: this bug was fixed in v0.11.3. You can also see the fix by looking directly at the master branch of the libuv code. The approach is to add the timeout to the loop's current time once the poll function finishes waiting. That way, even if GetTickCount has not caught up, the time spent waiting in poll is still added, so the timer can expire on schedule.
In other words, the problem we worked on for so long had already been solved in v0.11.3. But our effort was not wasted, because even with GetTickCount out of the picture, the poll function itself is still affected by the system timer interval. One solution is to write a Node.js addon that changes the system timer interval.
That said, our game's initial design does not rely on a dedicated server: after a client creates a room it becomes the server, and the server code runs inside the Node-WebKit environment, so the timer issue on plain Windows Node.js is not our top priority. With the solution given above, the results are good enough for us.
Ending
With the timer problem solved, there were basically no more obstacles to implementing the framework. We provide WebSocket support (for pure HTML5 environments) and a custom communication protocol for higher-performance socket support (in the Node-WebKit environment). Of course, Spaceroom's features were fairly basic at first, but as new requirements come up and time allows, we keep improving the framework.
For example, when we found that our game needed consistent random numbers across clients, we added that to Spaceroom: at the start of a game, Spaceroom distributes a random seed, and the client-side Spaceroom provides a method that generates pseudo-random numbers from that seed with the help of md5.
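A minimal sketch of the idea, using Node's crypto module for md5 (the actual Spaceroom implementation may differ):

var crypto = require('crypto');

function SeededRandom(seed) {
  this.state = String(seed);
}

// Returns a pseudo-random number in [0, 1), identical on every client
// that starts from the same seed and draws numbers in the same order.
SeededRandom.prototype.next = function () {
  this.state = crypto.createHash('md5').update(this.state).digest('hex');
  // Use the first 8 hex digits (32 bits) of the digest as the random value.
  return parseInt(this.state.slice(0, 8), 16) / 0x100000000;
};

// Usage: every client receives the same seed from the server at game start.
var rng = new SeededRandom('seed-from-server');
console.log(rng.next(), rng.next());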
It is quite satisfying, and I learned a lot in the process of writing this framework. If you are interested in Spaceroom, you are welcome to get involved. I believe Spaceroom will get to flex its muscles in many more places.