The term "Web Scale" has been hyped up a lot recently, and people are scaling out their application architectures to make their systems "web scale". But what exactly is web scale, and how do we achieve it?
The most hyped aspect of scaling is scaling the load. For example, a system that supports a single user should also support 10, 100, or even a million users. Ideally, our systems should stay as "stateless" as possible. Even where state must exist, it can be transferred between different processing nodes on the network. When load becomes the bottleneck, latency is usually not the problem, so taking 50 to 100 milliseconds for a single request is acceptable. This is called scaling out.
Scaling performance behaves completely differently. For example, an algorithm that successfully processes one piece of data should also process 10, 100, or even a million pieces. Whether this kind of growth is feasible is best described by time complexity (Big O notation). Latency is the killer of performance scaling, so you will do everything you can to keep all operations on the same machine. This is called scaling up.
If free lunches could fall from the sky (of course they can't), we could combine scaling out and scaling up indefinitely. Today, however, we will only introduce the following simple ways to improve efficiency.
Both the ForkJoinPool introduced in Java 7 and parallel streams (parallelStream) in Java 8 help with parallel processing. The benefit is especially obvious when deploying Java programs on multi-core processors, since all cores can access the same memory.
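As a minimal sketch of what this looks like in practice (the range and numbers are arbitrary), a parallel stream splits its work across the common ForkJoinPool, whose worker threads all share the same memory:

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        // Sequential stream
        long sequential = LongStream.rangeClosed(1, 1_000_000).sum();

        // Parallel stream: the work is forked across the common
        // ForkJoinPool and joined back together in shared memory
        long parallel = LongStream.rangeClosed(1, 1_000_000)
                                  .parallel()
                                  .sum();

        System.out.println(sequential == parallel); // true
        System.out.println(parallel);               // 500000500000
    }
}
```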
Therefore, the fundamental benefit of this kind of parallelism, compared to scaling out across different machines over a network, is the almost complete absence of latency.
But don't be fooled by the effect of parallel processing! Keep the following two points in mind:
Reducing algorithmic complexity is undoubtedly the most effective way to improve performance. For example, a HashMap lookup with O(1) time complexity and O(1) space complexity is as fast as it gets. But this is often impossible, let alone easy to achieve.
If you cannot reduce the complexity of your algorithm, you can still gain significant performance by finding the critical path in the algorithm and improving it there. Suppose we have the following algorithm sketch:
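A hedged sketch of such a branching algorithm, with purely illustrative names and sizes (not taken from the original), might look like this. The innermost statement of the right branch is the "NOPE branch": trivial work executed N x O x P times.

```java
public class Nope {
    static long run(int N, int O, int P) {
        long cheapCalls = 0;
        for (int n = 0; n < N; n++) {
            if (n % 10 == 0) {
                expensiveOperation(); // left branch: costly but rare
            }
            else {
                for (int o = 0; o < O; o++)
                    for (int p = 0; p < P; p++)
                        cheapCalls++; // right branch: cheap but very hot
            }
        }
        return cheapCalls;
    }

    static void expensiveOperation() { /* e.g. I/O or regex matching */ }

    public static void main(String[] args) {
        // 90 of 100 outer iterations take the hot branch: 90 * 100 * 100
        System.out.println(run(100, 100, 100)); // 900000
    }
}
```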
The overall time complexity of this algorithm is O(N³), or O(N x O x P) if we count the individual orders of magnitude separately. Either way, we will find some strange scenarios when we analyse this code:
Without production data as a reference, we might easily conclude that we should optimise the "expensive operation". But the optimisation we made has no measurable effect on the delivered product.
The golden rules of optimisation are nothing more than the following:
That's all for the theory. Assuming we have found that the problem lies in the right branch, it is very likely that something trivial is eating a huge amount of time in the product simply because it runs so often (remember that N, O, and P are assumed to be very large, while the left branch mentioned above, though O(N³), executes rarely). The effort spent here does not scale, but it saves the user's time now and postpones the harder performance improvements until later.
Here are 10 tips for improving Java performance:
1. Use StringBuilder
StringBuilder should be used by default in your Java code; avoid the + operator. You may argue that it is just syntax sugar for StringBuilder anyway, as in:
String x = "a" + args.length + "b";
Will be compiled as:
 0  new java.lang.StringBuilder [16]
 3  dup
 4  ldc <String "a"> [18]
 6  invokespecial java.lang.StringBuilder(java.lang.String) [20]
 9  aload_0 [args]
10  arraylength
11  invokevirtual java.lang.StringBuilder.append(int) : java.lang.StringBuilder [23]
14  ldc <String "b"> [27]
16  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [29]
19  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [32]
22  astore_1 [x]
But what really happens? What if you later need to amend the String with optional parts, like this?
String x = "a" + args.length + "b";

if (args.length == 1)
    x = x + args[0];
Now we have a second StringBuilder, one that needlessly consumes extra heap memory and puts pressure on the GC. Write this instead:
StringBuilder x = new StringBuilder("a");
x.append(args.length);
x.append("b");

if (args.length == 1)
    x.append(args[0]);

In the example above, it is probably irrelevant whether you use explicit StringBuilder instances or rely on the Java compiler to create implicit ones for you. But remember: in the NOPE branch, every CPU cycle wasted on something as silly as GC, or on allocating a StringBuilder's default capacity, is wasted N x O x P times.
Generally speaking, using StringBuilder is better than using the + operator. If possible, pass the StringBuilder reference across multiple methods when you need to build complex strings, because constructing intermediate String objects consumes extra resources. jOOQ uses this approach when generating complex SQL statements: only one StringBuilder is used throughout the entire SQL rendering of the abstract syntax tree.
What's more tragic is if you are still using StringBuffer: replace it with StringBuilder. There really are very few cases where string construction needs to be synchronised.
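As a sketch of that pattern (the method names are hypothetical, not jOOQ's actual API), a single StringBuilder can be threaded through several rendering methods instead of concatenating intermediate Strings:

```java
public class SqlRender {
    // One StringBuilder is passed through all rendering methods
    static String render() {
        StringBuilder sb = new StringBuilder();
        renderSelect(sb);
        renderFrom(sb);
        return sb.toString();
    }

    // Each method appends to the shared builder, allocating nothing
    static void renderSelect(StringBuilder sb) {
        sb.append("SELECT id, title ");
    }

    static void renderFrom(StringBuilder sb) {
        sb.append("FROM book");
    }

    public static void main(String[] args) {
        System.out.println(render()); // SELECT id, title FROM book
    }
}
```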
2. Avoid regular expressions

Regular expressions give the impression of being quick and convenient. But using them in a NOPE branch is about the worst decision you can make. If you absolutely must use regular expressions in computation-intensive code, at least cache the Pattern to avoid compiling it again and again:
static final Pattern HEAVY_REGEX =
    Pattern.compile("(((X)*Y)*Z)*");

But if the regular expression is as trivial as the following:
String[] parts = ipAddress.split("\\.");

…then it is better to use a plain char[] array or index-based manipulation. For example, the following barely readable code does the same thing:
// Assuming a dotted IPv4 address, i.e. four parts
String[] parts = new String[4];
int length = ipAddress.length();
int offset = 0;
int part = 0;
for (int i = 0; i < length; i++) {
    if (i == length - 1 ||
        ipAddress.charAt(i + 1) == '.') {
        parts[part] =
            ipAddress.substring(offset, i + 1);
        part++;
        offset = i + 2;
    }
}

The above also shows that premature optimisation is often meaningless: compared with the split() method, this code is far less maintainable.
Challenge: can any clever reader come up with an even faster algorithm?
Regular expressions are very useful, but they come at a price. Especially when you are deep down in a NOPE branch, avoid them at all costs. Also beware of the various JDK String methods that use regular expressions internally, such as String.replaceAll() or String.split(). You can use a popular library such as Apache Commons Lang for your string manipulation instead.
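For example, here is a sketch of caching a compiled Pattern so that repeated splits do not recompile the regular expression each time (class and method names are illustrative):

```java
import java.util.regex.Pattern;

public class SplitExample {
    // Compiled once, reused across all calls
    private static final Pattern DOT = Pattern.compile("\\.");

    static String[] splitIp(String ipAddress) {
        // Pattern.split() skips the per-call compilation that
        // String.split() performs for this pattern
        return DOT.split(ipAddress);
    }

    public static void main(String[] args) {
        String[] parts = splitIp("192.168.0.1");
        System.out.println(parts.length); // 4
        System.out.println(parts[0]);     // 192
    }
}
```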
3. Do not use iterator()

This suggestion does not apply in general, only deep down in a NOPE branch. Even so, it is worth understanding. The foreach loop introduced in Java 5 is so convenient that we can forget about its internals, for example:
for (String value : strings) {
    // Do something useful here
}

Every time this loop runs, if strings is an Iterable, the code creates a new instance of Iterator. If you are using an ArrayList, the virtual machine allocates an object holding three ints on the heap:
private class Itr implements Iterator<E> {
    int cursor;
    int lastRet = -1;
    int expectedModCount = modCount;
    // ...

Instead, you can use the following equivalent loop, which merely "wastes" a single int value on the stack, and that is dirt cheap:
int size = strings.size();
for (int i = 0; i < size; i++) {
    String value = strings.get(i);
    // Do something useful here
}

If the contents of the list really do not change during the loop, you can even iterate over an array version of it:
for (String value : stringArray) {
    // Do something useful here
}

Iterators, the Iterable interface, and the foreach loop are very useful from both a readability and an API design perspective. But the cost is that an extra object is created on the heap each time such a loop starts. If the loop is executed many, many times, be careful to avoid creating meaningless instances; it is better to write a basic index-based loop instead of the iterator, Iterable, and foreach variants.
Some dissenting opinions about the above (especially about replacing iterators with index-based access) are discussed on Reddit.
4. Don't call that method

Some methods are simply expensive. Our NOPE branch example has no such method at its leaf, but yours may. Suppose your JDBC driver has to go through great pains to compute the return value of the ResultSet.wasNull() method. A home-grown SQL framework might look like this:
if (type == Integer.class) {
    result = (T) wasNull(rs,
        Integer.valueOf(rs.getInt(index)));
}

// And then...
static final <T> T wasNull(ResultSet rs, T value)
throws SQLException {
    return rs.wasNull() ? null : value;
}

In this logic, ResultSet.wasNull() is called every time an int is fetched from the result set. But the getInt() contract reads:
Returns: the column value; if the value is SQL NULL, the value returned is 0.
So a simple and effective way to improve it is as follows:
static final <T extends Number> T wasNull(
    ResultSet rs, T value
)
throws SQLException {
    return (value == null ||
           (value.intValue() == 0 && rs.wasNull()))
        ? null : value;
}

This one is a no-brainer.
So: cache calls to expensive methods, or avoid calling expensive methods at leaf nodes altogether if the method contract allows it.
5. Use primitive types and the stack

The example above comes from jOOQ, which uses a lot of generics and is therefore forced to use wrapper types for byte, short, int, and long. At least until generics are specialised (in Java 10 or the Valhalla project), this should not be a constraint in your own code, because you can replace:
// Stored on the heap
Integer i = 817598;
…with this:
// Stored on the stack
int i = 817598;
Things can get worse when using arrays:
// Three objects on the heap!
Integer[] i = { 1337, 424242 };

…with this:
// Only one object on the heap
int[] i = { 1337, 424242 };

Deep down in a NOPE branch, we should avoid wrapper classes as much as possible, because they put great pressure on the GC, which has to work constantly to clean up the objects they generate.
A particularly useful optimisation is to use a primitive type and a large one-dimensional array, plus a few delimiter variables to indicate where exactly an encoded object is located in the array.
trove4j, an LGPL-licensed Java collections library, provides collection implementations for primitive types, which perform far better than collections of boxed values.
There is an exception to this rule: boolean and byte have few enough values to be entirely cached by the JDK. We can write:
Boolean a1 = true; // ... syntax sugar for:
Boolean a2 = Boolean.valueOf(true);

Byte b1 = (byte) 123; // ... syntax sugar for:
Byte b2 = Byte.valueOf((byte) 123);
The same is true for low values of the other integer primitive types, including char, short, int, and long.
These low values are cached when you auto-box them or when you call the TheType.valueOf() methods.

Never call the constructor on a wrapper type unless you really want an instance that is not cached. About the only thing that is good for is playing a huge April Fools' joke on your colleagues.
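The difference between the cached valueOf() path and fresh instances can be observed directly; the JDK guarantees caching of Integer values at least in the range -128 to 127:

```java
public class BoxingCache {
    public static void main(String[] args) {
        // Small values are cached, so valueOf() returns the same instance
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b);   // true: same cached object

        // Values outside the default cache range produce distinct objects
        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c == d);   // false: two heap objects
    }
}
```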
Of course, you may also want to experiment with off-heap libraries, although that is more of a strategic decision than a local optimisation. An interesting article on off-heap storage, written by Peter Lawrey and Ben Cotton, is: OpenJDK and HashMap… Safely Teaching an Old Dog New (Off-Heap!) Tricks.
6. Avoid recursion

Nowadays, functional programming languages like Scala encourage the use of recursion, because recursion often means tail recursion, which can be optimised back into iteration. It is nice if your language supports that. But even then, beware that the slightest tweak to an algorithm can turn a tail recursion into an ordinary one.
Hopefully the compiler will detect this automatically; otherwise we may be wasting a lot of stack frames on something that could have been done with just a few local variables.
There is not much else to say about this section, except: deep down in the NOPE branch, always prefer iteration over recursion.
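A minimal sketch of the difference (a toy sum, with illustrative names): the JVM does not eliminate tail calls, so the recursive variant spends one stack frame per step and risks a StackOverflowError for large inputs, while the iterative one needs only a couple of local variables.

```java
public class SumExample {
    // Recursive: one stack frame per element, no tail-call
    // elimination on the JVM
    static long sumRecursive(long n) {
        return n == 0 ? 0 : n + sumRecursive(n - 1);
    }

    // Iterative: a few local variables, no extra frames
    static long sumIterative(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++)
            total += i;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumIterative(1_000_000)); // 500000500000
        System.out.println(sumRecursive(1000));      // 500500
    }
}
```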
7. Use entrySet()

When we want to iterate over a Map of key-value pairs, we must have a very good reason for writing:
for (K key : map.keySet()) {
    V value = map.get(key);
}

…instead of the following:
for (Entry<K, V> entry : map.entrySet()) {
    K key = entry.getKey();
    V value = entry.getValue();
}

In the NOPE branch we should use Maps with caution anyway, because many access operations that appear to be O(1) are really composed of a series of operations, and the access itself is not free. At the very least, if you do have to use a map, iterate with entrySet(), so that the only extra instances we access are the Map.Entry objects.
When it is necessary to iterate over the Map in the form of key-value pairs, be sure to use the entrySet() method.
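A minimal sketch of the recommended iteration (map contents are arbitrary):

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetExample {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);

        // Key and value both come from the same Map.Entry:
        // no second hash lookup per key, unlike keySet() + get()
        int sum = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sum += entry.getValue();
        }
        System.out.println(sum); // 3
    }
}
```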
8. Use EnumSet or EnumMap
In some cases, for example when using configuration maps, we know the possible keys in advance. If the set of keys is small, we should seriously consider EnumSet or EnumMap instead of the usual HashSet or HashMap. The following code from EnumMap explains why clearly:
private transient Object[] vals;

public V put(K key, V value) {
    // ...
    int index = key.ordinal();
    vals[index] = maskNull(value);
    // ...
}

The essence of this code is that an array is used instead of a hash table. In particular, when inserting a new value into the map, all that is needed to find the slot is the constant ordinal generated by the compiler for each enum value. If there is a global configuration map (i.e. only one instance), EnumMap will outperform HashMap under increasing access pressure, because EnumMap uses a bit less heap memory than HashMap, and HashMap has to call the hashCode() and equals() methods on every key.
Enum and EnumMap are close friends. Whenever your keys have an enum-like structure, consider declaring them as an actual enum and using them as EnumMap keys.
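A small sketch of the pattern, with a hypothetical configuration enum:

```java
import java.util.EnumMap;
import java.util.Map;

public class ConfigExample {
    // Hypothetical configuration keys, known in advance
    enum ConfigKey { HOST, PORT, TIMEOUT }

    public static void main(String[] args) {
        Map<ConfigKey, String> config = new EnumMap<>(ConfigKey.class);
        config.put(ConfigKey.HOST, "localhost");
        config.put(ConfigKey.PORT, "8080");

        // Lookup is a plain array access via ConfigKey.ordinal(),
        // with no hashCode() or equals() calls on the key
        System.out.println(config.get(ConfigKey.PORT)); // 8080
        System.out.println(config.size());              // 2
    }
}
```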
9. Optimise hashCode() and equals()

In the event that you cannot use EnumMap, at least optimise your hashCode() and equals() methods. A good hashCode() method is essential, because it prevents unnecessary calls to the far more expensive equals() method.
In every class hierarchy, you may need simple, widely used objects. Let's look at how jOOQ's org.jooq.Table implementations handle this.
The simplest and fastest hashCode() implementation method is as follows:
// AbstractTable, a generic base implementation of tables:

@Override
public int hashCode() {

    // [#1938] This is a much more efficient hashCode()
    // implementation compared to that of standard QueryParts
    return name.hashCode();
}

Here, name is simply the table name. We don't even need to consider the schema or any other table property, because table names are usually distinct enough within a database. Moreover, name is a String, which itself already caches its hashCode() value.
The comment in this code is important, because AbstractTable extends AbstractQueryPart, the base implementation of every abstract syntax tree element. Ordinary AST elements do not have any properties, so they cannot make any assumptions that would allow an optimised hashCode() implementation. Their overridden hashCode() method looks like this:
// AbstractQueryPart, a generic base implementation of AST elements:

@Override
public int hashCode() {

    // This is a working default implementation.
    // Concrete subclasses should override this
    // method to improve performance.
    return create().renderInlined(this).hashCode();
}

In other words, the entire SQL rendering workflow has to be triggered just to compute the hash code of an ordinary AST element.
The equals() method is even more interesting:
// AbstractTable, a generic base implementation of tables:

@Override
public boolean equals(Object that) {
    if (this == that) {
        return true;
    }

    // [#2144] Quickly decide whether two objects cannot be
    // equal, before calling the expensive
    // AbstractQueryPart.equals() method
    if (that instanceof AbstractTable) {
        if (StringUtils.equals(name,
            ((AbstractTable<?>) that).name)) {
            return super.equals(that);
        }

        return false;
    }

    return false;
}

First of all, abort the equals() method early (not only in the NOPE branch) if:

- this == argument
- the argument has an incompatible type
Note that the latter condition already includes argument == null, provided we use instanceof to check for compatible types early. I have blogged about this before; see 10 Subtle Best Practices when Coding Java.
Once the cheap comparisons are done, we can draw conclusions early. For example, jOOQ's Table.equals() method compares whether two tables are the same: regardless of the concrete implementation type, they must have the same table name, so two tables with different names can never be equal.
If we can cheaply determine that the argument cannot be equal to this, we abandon the comparison and return false immediately; only when the cheap check passes do we go on to the expensive super implementation. Since most of the objects being compared are not equal, ending the method early saves a lot of CPU time.
Some objects are more equal than others.
In jOOQ, most table instances are generated by jOOQ's code generator, and their equals() methods are deeply optimised. The dozens of other table types (derived tables, table-valued functions, array tables, joined tables, pivot tables, common table expressions, and so on) keep the basic equals() implementation.
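Here is a hedged sketch of the early-abort pattern described above, using a hypothetical value class rather than jOOQ's actual implementation: a cheap name comparison filters out most non-equal objects before any expensive deep comparison runs.

```java
import java.util.Arrays;

public class Table {
    final String name;
    final String[] columns; // potentially expensive to compare

    Table(String name, String... columns) {
        this.name = name;
        this.columns = columns;
    }

    @Override
    public int hashCode() {
        // Delegate to the String hash code, which is cached
        return name.hashCode();
    }

    @Override
    public boolean equals(Object that) {
        if (this == that)
            return true;
        if (!(that instanceof Table))
            return false; // also covers that == null

        Table other = (Table) that;

        // Cheap pre-filter: different names can never be equal
        if (!name.equals(other.name))
            return false;

        // Expensive comparison only for likely-equal candidates
        return Arrays.equals(columns, other.columns);
    }

    public static void main(String[] args) {
        Table a = new Table("BOOK", "ID", "TITLE");
        Table b = new Table("BOOK", "ID", "TITLE");
        Table c = new Table("AUTHOR", "ID");
        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // false: aborts on the name
    }
}
```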
10. Think in sets, not in individual elements

Last but not least, here is something that applies to all languages, not only Java. Besides, the NOPE branch we studied earlier also helps in understanding how to move from O(N³) to O(n log n).
Unfortunately, many programmers think in simple, local algorithms. They are used to solving problems step by step, branch by branch, loop by loop, method by method. This is the imperative and/or functional programming style. While it becomes easier to model the "bigger picture" when moving from pure imperative programming to object-oriented programming to functional programming, all of these styles lack something that only SQL and R (and similar languages) have:
Declarative programming.
In SQL, we declare the result we want from the database, without caring about the algorithm. The database can then pick the best algorithm based on the shape of the data, such as constraints, keys, and indexes.
In theory, we have had these basic ideas ever since SQL and relational calculus. In practice, SQL vendors have been implementing efficient cost-based optimisers for decades, and only in the 2010s did we finally discover the full potential of SQL.
But we don't need to write SQL to think in sets. All languages and libraries support sets, collections, bags, and lists. The main benefit of thinking in sets is that it makes our algorithms concise and clear. For example, write:
SomeSet INTERSECT SomeOtherSet
…instead of:
// pre-Java 8 style
Set result = new HashSet();
for (Object candidate : someSet)
    if (someOtherSet.contains(candidate))
        result.add(candidate);

// Even Java 8 doesn't help much
someSet.stream()
       .filter(someOtherSet::contains)
       .collect(Collectors.toSet());
Some may argue that functional programming and Java 8 already help us write simpler, more concise algorithms. That is not necessarily true. We can translate an imperative Java 7 loop into a Java 8 Stream pipeline, but we are still using the same algorithm. SQL-style expressions are different:
SomeSet INTERSECT SomeOtherSet

The expression above can be implemented in 1000 different ways by the underlying engine. For instance, the engine might automatically convert the two sets into EnumSets before running the INTERSECT operation, or it might parallelise the INTERSECT without us ever calling the low-level Stream.parallel() method.
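In Java itself we can at least apply the EnumSet part of that idea by hand; here is a sketch with an illustrative enum, where the bulk retainAll() operation plays the role of INTERSECT and compiles down to cheap bit-mask operations inside EnumSet:

```java
import java.util.EnumSet;

public class IntersectExample {
    enum Color { RED, GREEN, BLUE, YELLOW }

    public static void main(String[] args) {
        // EnumSet represents each set as a bit vector over the ordinals
        EnumSet<Color> someSet = EnumSet.of(Color.RED, Color.GREEN, Color.BLUE);
        EnumSet<Color> someOtherSet = EnumSet.of(Color.GREEN, Color.BLUE, Color.YELLOW);

        // "SomeSet INTERSECT SomeOtherSet" as a single bulk operation
        EnumSet<Color> intersection = EnumSet.copyOf(someSet);
        intersection.retainAll(someOtherSet);

        System.out.println(intersection); // [GREEN, BLUE]
    }
}
```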
In this article, we discussed optimisations for the NOPE branch, i.e. deep inside a high-complexity algorithm. As developers of jOOQ, we are happy to optimise SQL generation.
jOOQ sits at the "bottom of the food chain", because it is the last API our application calls before leaving the JVM and entering the DBMS. Being at the bottom of the food chain means that any line of code executed inside jOOQ runs N x O x P times, so it should be optimised as eagerly as possible.
Our business logic may not be as complex as the NOPE branch, but the underlying frameworks may well be (home-grown SQL frameworks, home-grown libraries, and so on). So review your code following the principles discussed today, and use Java Mission Control or another profiler to check whether there is anywhere that needs optimising.
Original link: jaxenter translation: ImportNew.com - Always on the road translation link: http://www.importnew.com/16181.html