We will not discuss here whether the environment is PHP, JSP or .NET; we look at the problem from the perspective of architecture. The implementation language is not the issue: a language's strength lies in how it is used, not in whether it is inherently good or bad. Whatever language you choose, these are the architectural problems you must face.
1. Processing of massive data
As we all know, for relatively small sites the amount of data is not large; plain SELECT and UPDATE statements solve the problems we face, the load is light, and at most a few indexes are needed. For a large website, the data volume per day may run into the millions. A poorly designed many-to-many relationship causes no trouble in the early stage, but as the user base grows, the data volume grows geometrically. At that point, a SELECT or UPDATE on a single table (to say nothing of a join across multiple tables) becomes very expensive.
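One common way to keep single-table operations cheap as data grows is horizontal sharding. The sketch below is an assumption for illustration, not anything from the original text: a hypothetical `user_follow` many-to-many table is split into N smaller shard tables, and queries are routed by user id so each statement touches one small table.

```python
# Hypothetical sketch: route rows of a huge many-to-many table
# (e.g. user_follow) to one of N horizontal shards by user id.
N_SHARDS = 16

def shard_table(user_id: int, base_name: str = "user_follow") -> str:
    """Return the name of the shard table that holds this user's rows."""
    return f"{base_name}_{user_id % N_SHARDS:02d}"

# A query then hits only one small table instead of one giant one, e.g.:
#   SELECT followee_id FROM <shard_table(uid)> WHERE follower_id = ?
```

The trade-off is that cross-shard queries (global counts, joins spanning users) become harder and usually move to offline or aggregated paths.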
2. Data concurrency processing
At some point, every 2.0 CTO reaches for the same trump card: the cache. But the cache itself becomes a serious problem under high concurrency. In the entire application, the cache is shared globally; when we modify it, if two or more requests try to update the same cache entry at the same time, the application can fail outright. A sound data-concurrency strategy and caching strategy are needed here.
In addition, there is the database deadlock problem. We may not normally notice it, but under high concurrency the probability of deadlock is very high, and disk caching is a big problem as well.
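A minimal sketch of one concurrency strategy for the cache-update race described above, assuming an in-process dict cache (a real deployment would more likely use memcached or Redis with their atomic operations): a lock plus a re-check ensures only one request recomputes a missing entry while the others wait and reuse it.

```python
import threading

# Assumed in-process cache; the lock serializes recomputation so two
# simultaneous requests cannot both rebuild (or corrupt) the same entry.
_cache = {}
_lock = threading.Lock()

def get_or_compute(key, compute):
    val = _cache.get(key)
    if val is not None:
        return val                 # fast path: no lock on cache hits
    with _lock:
        val = _cache.get(key)      # re-check after acquiring the lock
        if val is None:
            val = compute()        # only one thread runs the expensive work
            _cache[key] = val
        return val
```

The same double-checked pattern applies with a distributed lock when the cache is shared across machines.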
3. File storage issues
For a 2.0 site that supports file uploads, even as we celebrate ever-larger hard disks, we should think harder about how files are stored and indexed effectively. A common solution is to store files by date and type. But when the file volume is massive, if a single disk holds 500 GB of small scattered files, disk I/O becomes a huge problem during maintenance and use: even if your bandwidth is sufficient, your disk may not keep up. If uploads are hitting it at the same time, the disk can easily be overwhelmed.
Perhaps RAID and dedicated storage servers solve today's problem, but another remains: access from different regions. Our server may sit in Beijing; what about the access speed from Yunnan or Xinjiang? And if storage is distributed, how should our file index and architecture be planned?
So we have to admit that file storage is a very hard problem.
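One widely used refinement of the "store by date and type" scheme mentioned above is hash-based directory fan-out. This is a sketch under assumed conventions (root path and two-level layout are illustrative): deriving the path from a content hash spreads files evenly so no single directory accumulates millions of entries, and identical uploads deduplicate for free.

```python
import hashlib
import os

def storage_path(data: bytes, root: str = "/data/uploads") -> str:
    """Place a file under root/xx/yy/<full-hash>, where xx and yy are
    the first two byte-pairs of its SHA-1 digest (hypothetical layout)."""
    h = hashlib.sha1(data).hexdigest()
    return os.path.join(root, h[:2], h[2:4], h)
```

With 256 x 256 buckets, even hundreds of millions of files leave each directory small enough for the filesystem to handle comfortably; the hash also serves as a stable index key when storage later becomes distributed.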
4. Processing of data relationships
We can easily design a database that conforms to third normal form, full of many-to-many relationships, and can even replace IDENTITY columns with GUIDs. But in the 2.0 era, where many-to-many relationships flood everything, third normal form is the first thing that should be abandoned: multi-table joins must be minimized as far as possible.
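What abandoning third normal form looks like in practice is deliberate redundancy. The sketch below uses hypothetical in-memory structures standing in for tables: a `follower_count` is stored redundantly on the user record and maintained at write time, so a page view reads one row instead of running a COUNT over the many-to-many table.

```python
# Assumed toy "tables": a users dict and a follows relation.
users = {1: {"name": "alice", "follower_count": 0}}
follows = set()  # (follower_id, followee_id) pairs

def follow(follower_id: int, followee_id: int) -> None:
    """Record a follow and keep the denormalized counter in sync."""
    if (follower_id, followee_id) not in follows:
        follows.add((follower_id, followee_id))
        users[followee_id]["follower_count"] += 1  # redundant copy

follow(2, 1)  # alice now has 1 follower, readable without any join
```

The cost is that every write path must update the redundant copy, and a bug there leaves counters drifting; batch reconciliation jobs are the usual safety net.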
5. Data indexing problem
As we all know, an index is the cheapest and easiest way to speed up database queries. But under heavy UPDATE load, the cost of maintaining indexes on update and delete becomes unthinkably high. The author once encountered a case where an update touching a clustered index took ten minutes to complete; for a live site, that is simply unbearable.
Indexing and updates are natural enemies. Problems A, D and E are the ones we must weigh when doing the architecture, and they may well consume the most time.
6. Distributed processing
For 2.0 websites, high interactivity means a CDN achieves essentially nothing: content is updated in real time, and that is our normal mode of operation. To guarantee access speed everywhere, we face a huge problem: how to synchronize and update data effectively, and how to achieve real-time communication between servers in different locations.
7. Analysis of the pros and cons of Ajax
AJAX is both a blessing and a curse. It has become the mainstream trend, and POST and GET over XMLHTTP suddenly seem so easy: the client GETs or POSTs data to the server, and the server responds once it receives the request. That is a perfectly normal AJAX exchange. But with a packet-capture tool, the data returned and how it is processed are laid bare. For AJAX requests that are computationally expensive, an attacker can easily build a tool that replays them in bulk and kill a web server with little effort.
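A minimal defence against replayed expensive AJAX calls is server-side throttling. This sketch is an assumption for illustration (sliding-window limits per client IP; the constants are arbitrary): each client gets at most MAX_CALLS to the expensive endpoint per WINDOW seconds.

```python
import time
from collections import defaultdict, deque

MAX_CALLS, WINDOW = 5, 60.0      # illustrative limits
_history = defaultdict(deque)    # client -> recent request timestamps

def allow(client_ip, now=None):
    """Return True if this request is within the rate limit."""
    now = time.monotonic() if now is None else now
    calls = _history[client_ip]
    while calls and now - calls[0] > WINDOW:
        calls.popleft()          # forget requests outside the window
    if len(calls) >= MAX_CALLS:
        return False             # reject: protect the web server
    calls.append(now)
    return True
```

Production systems usually add CAPTCHAs or signed tokens on top, since a distributed attacker rotates IPs; but even this simple gate stops a single machine from killing the server.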
8. Analysis of data security
Under the HTTP protocol, packets travel in plain text. We might answer that we can use encryption, but the encryption itself can often be reverse-engineered (QQ's encryption, for example, is known to be easy enough to analyze that a matching encrypt/decrypt routine can be written). When your site's traffic is small, nobody will bother with you; once traffic grows, the so-called plug-ins and bulk-posting bots follow one after another (the early waves of QQ spam show where this leads). We might insist that higher-level checks, or even HTTPS, can handle it; note, however, that all of this costs you dearly in database load, I/O and CPU, and against some bulk-posting attacks it is basically hopeless. The author has already managed automated bulk posting to Baidu Space and QQ Space; it is not hard for anyone to try.
9. Problems of data synchronization and cluster processing
When one of our database servers is overwhelmed, we need database-level load balancing and clustering. This may be the most troublesome problem of all. Data travels over the network, and replication delay is a terrible yet unavoidable fact. We therefore need other means to keep interaction consistent during those seconds or minutes of delay: data hashing, segmentation, content partitioning and similar techniques.
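Data hashing, as mentioned above, is often done with consistent hashing so that adding or removing a database server moves only a small fraction of keys. The sketch below is one standard construction, with assumed node names and replica count for illustration: each node is placed at many points on a hash ring, and a key belongs to the first node clockwise from its own hash.

```python
import bisect
import hashlib

def _h(s: str) -> int:
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=100):
        # Each node occupies `replicas` virtual points for even spread.
        self._ring = sorted((_h(f"{n}#{i}"), n)
                            for n in nodes for i in range(replicas))
        self._keys = [k for k, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node responsible for this key."""
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]
```

Adding a fourth node then reclaims only roughly a quarter of the keys, instead of remapping nearly everything as plain `hash % N` would.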
10. Data sharing channels and the OpenAPI trend
An open API has become an inevitable trend: from Google, Facebook and MySpace to campus sites at home, everyone is considering it. It retains users more effectively, stimulates more interest, and lets more people help you build the most valuable extensions. An open data platform thus becomes an indispensable channel, and guaranteeing the security and performance of open interfaces is yet another issue we must take seriously.