1. Overview
This tutorial demonstrates how to read large files efficiently in Java. It is part of the "Java - Back to Basics" series.
2. Read in memory
The standard way to read the lines of a file is to load them all into memory. Both Guava and Apache Commons IO provide methods to do this quickly:
Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));
The problem with this approach is that all the lines of the file are kept in memory, which quickly leads to an OutOfMemoryError once the file is large enough.
For example, reading a file of about 1 GB:
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

At the start, this approach uses only a small amount of memory (about 0 MB consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
However, once the whole file has been read into memory, we can see that about 2 GB of memory is consumed:
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
This means the process consumes about 2.1 GB of memory, and the reason is simple: all the lines of the file are now stored in memory.
Keeping the entire contents of the file in memory will quickly exhaust the available memory, no matter how much memory is actually available.
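For reference, memory figures like the ones in the logs above can be read from the JVM's Runtime. The following is a minimal sketch assuming an SLF4J logger; the class name and exact output format are illustrative, not the article's actual test code:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MemoryLogger {

    private static final Logger LOG = LoggerFactory.getLogger(MemoryLogger.class);

    // Logs the JVM's current total and free heap in megabytes,
    // producing output similar to the figures shown above
    public static void logMemory() {
        long totalMb = Runtime.getRuntime().totalMemory() / (1024 * 1024);
        long freeMb = Runtime.getRuntime().freeMemory() / (1024 * 1024);
        LOG.info("Total Memory: {} Mb", totalMb);
        LOG.info("Free Memory: {} Mb", freeMb);
    }
}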
In addition, we usually don't need all the lines of the file in memory at once. Instead, we only need to iterate through them one by one, do the corresponding processing, and discard each line once we're done with it. So that's exactly what we'll do: iterate through the lines instead of holding them all in memory.
3. File Stream
Now let's look at another approach: we will use the java.util.Scanner class to run through the contents of the file and read it line by line:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution iterates through all the lines in the file, allowing each line to be processed without keeping a reference to it. In short, the lines are not stored in memory (about 150 MB of memory is consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
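On Java 7 and later, the same Scanner approach can be written more compactly with try-with-resources, which closes the stream and the scanner automatically. This is just a sketch of an equivalent variant; it assumes the same path variable and imports (java.io.FileInputStream, java.util.Scanner) as the snippet above:

// try-with-resources closes both resources even if an exception is thrown
try (FileInputStream inputStream = new FileInputStream(path);
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line here
    }
    // Scanner suppresses IOExceptions from the underlying stream, so surface them explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}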
4. Apache Commons IO Stream
You can also implement this with the Commons IO library, using the custom LineIterator the library provides:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is not held in memory, this also keeps memory consumption fairly low (about 150 MB consumed):
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
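As a design note, in recent Commons IO versions LineIterator implements Closeable, so the explicit finally block (or the older LineIterator.closeQuietly call) can be replaced with try-with-resources. A minimal sketch, assuming the same theFile variable as above:

// Requires a Commons IO version in which LineIterator implements Closeable
try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
}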
5. Conclusion
This short article shows how to process the lines of a large file without loading the whole file and without running out of memory, which is quite useful when working with large files.
All these examples and code snippets are implemented in my GitHub project. It's an Eclipse-based project, so it should be easy to import and run as is.