This article compares the performance of several commonly used compression algorithms. The results show that some of them remain usable even when CPU time is extremely scarce.
The comparisons in the article include:
JDK GZIP - a slow algorithm with a high compression ratio, whose output is well suited for long-term storage. The JDK implementation is java.util.zip.GZIPInputStream / GZIPOutputStream.
JDK deflate - the other algorithm available in the JDK (it is also used inside zip files). Unlike gzip, it lets you specify a compression level, so you can trade compression time against output size. Available levels range from 0 (no compression) through 1 (fastest) to 9 (slowest, best compression). The implementation is java.util.zip.DeflaterOutputStream / InflaterInputStream.
LZ4, Java implementation - the fastest compressor among the algorithms covered in this article. Its compression ratio is slightly worse than that of deflate at its fastest level.
Snappy - a very popular compression algorithm developed by Google. It aims to provide a reasonable balance of speed and compression ratio.
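For readers who have not used the JDK classes mentioned above, here is a minimal sketch using only the standard java.util.zip API. The class and method names below are purely illustrative and are not part of the benchmark code; they simply show how a stream is wrapped with GZIP and with deflate at an explicit level.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

public class JdkCompressionExamples {
    // Compress a byte array with GZIP (GZIPOutputStream does not expose a level parameter).
    static byte[] gzip( final byte[] data ) throws IOException {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try ( OutputStream os = new GZIPOutputStream( bos ) ) {
            os.write( data );
        }
        return bos.toByteArray();
    }

    // Compress a byte array with deflate at an explicit level (0-9).
    static byte[] deflate( final byte[] data, final int level ) throws IOException {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream();
        final Deflater deflater = new Deflater( level );
        try ( OutputStream os = new DeflaterOutputStream( bos, deflater ) ) {
            os.write( data );
        } finally {
            deflater.end(); // release the native zlib resources
        }
        return bos.toByteArray();
    }
}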
Compression test
It took me a while to find a file that is suitable for compression testing and is already present on most Java developers' machines (I don't want you to download several hundred megabytes of data just to run this test). Eventually I realized that most people have the JDK documentation installed locally. So I decided to concatenate the entire javadoc directory into a single file. This is easy to do with the tar command, but not everyone uses Linux, so I wrote a small program to generate the file:
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

public class InputGenerator {
    private static final String JAVADOC_PATH = "your_path_to_JDK/docs";
    public static final File FILE_PATH = new File( "your_output_file_path" );

    static {
        try {
            if ( !FILE_PATH.exists() )
                makeJavadocFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void makeJavadocFile() throws IOException {
        try ( OutputStream os = new BufferedOutputStream( new FileOutputStream( FILE_PATH ), 65536 ) ) {
            appendDir( os, new File( JAVADOC_PATH ) );
        }
        System.out.println( "Javadoc file created" );
    }

    // Recursively append every file under root to the output stream.
    private static void appendDir( final OutputStream os, final File root ) throws IOException {
        for ( File f : root.listFiles() ) {
            if ( f.isDirectory() )
                appendDir( os, f );
            else
                Files.copy( f.toPath(), os );
        }
    }
}

The size of the resulting file on my machine is 354,509,602 bytes (338 Mb).
Test
At first I wanted to read the entire file into memory and then compress it. However, it turned out that even on a 4 GB machine it is easy to run out of heap space that way.
So I decided to rely on the operating system's file cache instead. The test framework used here is JMH. The file is loaded into the OS cache during the warm-up phase (it is compressed twice during warm-up). I compress the content into a ByteArrayOutputStream (I know this is not the fastest approach, but it is stable across tests and does not spend time writing the compressed data to disk), so some memory is needed to hold the output.
Below is the base class for all tests. The tests differ only in the implementation of the compressing output stream, so they can reuse this base class and simply construct a stream from a StreamFactory implementation:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 2)
@Measurement(iterations = 3)
@BenchmarkMode(Mode.SingleShotTime)
public class TestParent {
    protected Path m_inputFile;

    @Setup
    public void setup() {
        m_inputFile = InputGenerator.FILE_PATH.toPath();
    }

    interface StreamFactory {
        public OutputStream getStream( final OutputStream underlyingStream ) throws IOException;
    }

    public int baseBenchmark( final StreamFactory factory ) throws IOException {
        try ( ByteArrayOutputStream bos = new ByteArrayOutputStream( (int) m_inputFile.toFile().length() );
              OutputStream os = factory.getStream( bos ) ) {
            Files.copy( m_inputFile, os );
            os.flush();
            return bos.size();
        }
    }
}

The test cases are all very similar (their source code is available at the end of the article), so only one example is listed here - the test class for JDK deflate:
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;

public class JdkDeflateTest extends TestParent {
    @Param({"1", "2", "3", "4", "5", "6", "7", "8", "9"})
    public int m_lvl;

    @Benchmark
    public int deflate() throws IOException {
        return baseBenchmark( new StreamFactory() {
            @Override
            public OutputStream getStream( OutputStream underlyingStream ) throws IOException {
                final Deflater deflater = new Deflater( m_lvl, true );
                return new DeflaterOutputStream( underlyingStream, deflater, 512 );
            }
        });
    }
}
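The other test cases follow the same pattern. Purely as an illustration (this is not the article's original code), here is a sketch of what an LZ4 (fast) test might look like, assuming the lz4-java library (net.jpountz) and its LZ4BlockOutputStream; the class name and block size here are my own choices.

import java.io.IOException;
import java.io.OutputStream;

import net.jpountz.lz4.LZ4BlockOutputStream;
import net.jpountz.lz4.LZ4Factory;

import org.openjdk.jmh.annotations.Benchmark;

public class Lz4FastTest extends TestParent {
    @Benchmark
    public int lz4Fast() throws IOException {
        return baseBenchmark( new StreamFactory() {
            @Override
            public OutputStream getStream( OutputStream underlyingStream ) throws IOException {
                // 64 KB blocks, fastest available compressor (JNI if the native library is present)
                return new LZ4BlockOutputStream( underlyingStream, 64 * 1024,
                        LZ4Factory.fastestInstance().fastCompressor() );
            }
        });
    }
}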
Test results

Output file size
First, let’s look at the size of the output file:
||Implementation||File size (bytes)||
|GZIP|64,200,201|
|Snappy (normal)|138,250,196|
|Snappy (framed)|101,470,113|
|LZ4 (fast)|98,316,501|
|LZ4 (high)|82,076,909|
|Deflate (lvl=1)|78,369,711|
|Deflate (lvl=2)|75,261,711|
|Deflate (lvl=3)|73,240,781|
|Deflate (lvl=4)|68,090,059|
|Deflate (lvl=5)|65,699,810|
|Deflate (lvl=6)|64,200,191|
|Deflate (lvl=7)|64,013,638|
|Deflate (lvl=8)|63,845,758|
|Deflate (lvl=9)|63,839,200|
As you can see, the output sizes vary widely (from about 61 Mb to 132 Mb). Let's now look at how long each compression method takes.
Compression time
||Implementation||Compression time (ms)||
|Snappy.framedOutput|2264.700|
|Snappy.normalOutput|2201.120|
|Lz4.testFastNative|1056.326|
|Lz4.testFastUnsafe|1346.835|
|Lz4.testFastSafe|1917.929|
|Lz4.testHighNative|7489.958|
|Lz4.testHighUnsafe|10306.973|
|Lz4.testHighSafe|14413.622|
|deflate (lvl=1)|4522.644|
|deflate (lvl=2)|4726.477|
|deflate (lvl=3)|5081.934|
|deflate (lvl=4)|6739.450|
|deflate (lvl=5)|7896.572|
|deflate (lvl=6)|9783.701|
|deflate (lvl=7)|10731.761|
|deflate (lvl=8)|14760.361|
|deflate (lvl=9)|14878.364|
|GZIP|10351.887|
Next, let's merge the compression times and file sizes into a single table, compute the throughput of each algorithm, and see what conclusions can be drawn.
Throughput and efficiency
||Implementation||Time (ms)||Uncompressed file size (Mb)||Throughput (Mb/sec)||Compressed file size (Mb)||
|Snappy.normalOutput|2201.12|338|153.5581885586|131.8454742432|
|Snappy.framedOutput|2264.7|338|149.2471409017|96.7693328857|
|Lz4.testFastNative|1056.326|338|319.9769768045|93.7557220459|
|Lz4.testFastSafe|1917.929|338|176.2317583185|93.7557220459|
|Lz4.testFastUnsafe|1346.835|338|250.9587291688|93.7557220459|
|Lz4.testHighNative|7489.958|338|45.1270888301|78.2680511475|
|Lz4.testHighSafe|14413.622|338|23.4500391366|78.2680511475|
|Lz4.testHighUnsafe|10306.973|338|32.7933332124|78.2680511475|
|deflate (lvl=1)|4522.644|338|74.7350443679|74.7394561768|
|deflate (lvl=2)|4726.477|338|71.5120374012|71.7735290527|
|deflate (lvl=3)|5081.934|338|66.5101120951|69.8471069336|
|deflate (lvl=4)|6739.45|338|50.1524605124|64.9452209473|
|deflate (lvl=5)|7896.572|338|42.8033835442|62.6564025879|
|deflate (lvl=6)|9783.701|338|34.5472536415|61.2258911133|
|deflate (lvl=7)|10731.761|338|31.4952969974|61.0446929932|
|deflate (lvl=8)|14760.361|338|22.8991689295|60.8825683594|
|deflate (lvl=9)|14878.364|338|22.7175514727|60.8730316162|
|GZIP|10351.887|338|32.651051929|61.2258911133|
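(For reference, the throughput column is simply the uncompressed size divided by the compression time: for LZ4 fast native, 338 Mb / 1.056 s ≈ 320 Mb/sec.)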
As can be seen, most of the implementations are quite slow: on a Xeon E5-2650 processor, the high deflate levels run at about 23 Mb/sec, and even GZIP only manages about 33 Mb/sec, which is unlikely to satisfy anyone. At the same time, the fastest deflate level reaches about 75 Mb/sec, Snappy about 150 Mb/sec, and LZ4 (fast, JNI implementation) an incredible 320 Mb/sec!
The table also makes it clear that two implementations are at a disadvantage: Snappy is slower than LZ4 (fast compression) and its compressed output is larger. LZ4 (high compression), in turn, is slower than deflate at levels 1 to 4, and its output file is larger than that of deflate at level 1.
Therefore, if you need "real-time compression", I would definitely choose between the LZ4 (fast) JNI implementation and deflate at level 1. Of course, if your company does not allow third-party libraries, deflate is your only option. You should also consider how much spare CPU you have and where the compressed data will be stored. For example, if you write the compressed data to an HDD, a compressor running at around 100 Mb/sec will not help you (assuming your files are large enough) - the HDD itself becomes the bottleneck. If the output goes to an SSD instead, even LZ4 looks slow by comparison. If you compress data before sending it over a network, LZ4 is the better choice, because deflate's ~75 Mb/sec compression speed falls well short of the ~125 Mb/sec the network can carry (yes, I know network traffic also includes packet headers, but even accounting for them the gap is still considerable).
Summary
If you find data compression painfully slow, consider the LZ4 (fast) implementation, which compresses this text file at roughly 320 Mb/sec - compression at that speed should be imperceptible to most applications.
If you cannot use third-party libraries, or just want a slightly better compression ratio, consider JDK deflate (lvl=1) - it compresses the same file at roughly 75 Mb/sec.
Source code
Java compression test source code