This article describes approaches to batch processing massive amounts of data with Hibernate, shared for your reference.
From a performance standpoint, naive Hibernate batch processing of massive data is actually undesirable and wastes a great deal of memory. By its mechanism, Hibernate first fetches all the data that meets the conditions into memory and only then performs the operations, which is very unsatisfactory in practice. In my own tests, the third optimization approach below took about 30 minutes to insert 100,000 rows into the database, which is painful. (By contrast, I have inserted 1,000,000 rows in 10 minutes when the fields were relatively small.)
There are three ways to work around this performance problem:
1: Bypass the Hibernate API and use the JDBC API directly. This method performs best and is the fastest.
2: Use stored procedures.
3: Still use the Hibernate API for regular batch processing, with one adjustment: once a certain number of records has been processed, promptly remove them from the cache with session.flush() followed by session.evict(...) on the object set (or session.clear()). This recovers some of the performance loss. The "certain number" must be chosen as a working figure based on actual conditions, generally around 30-60, but the results are still not ideal.
1: Bypass the Hibernate API and work directly through the JDBC API. This method performs best and is the fastest. (The example is an update operation.)
Transaction tx = session.beginTransaction(); // note: the Hibernate transaction boundary is still used
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("update CUSTOMER set salary = salary + 1 where salary > 1000");
stmt.executeUpdate();
tx.commit(); // note: the Hibernate transaction boundary is still used

In this small program, the JDBC API is called directly to access the database, which is very efficient. It avoids the performance problem of Hibernate first querying the data, loading it into memory, and only then operating on it.
2: Use stored procedures. However, this method is not generally recommended, because it hurts portability and makes program deployment less convenient. (The example is an update operation.)
If the underlying database (such as Oracle) supports stored procedures, batch updates can also be performed through a stored procedure. Stored procedures run directly inside the database and are therefore faster. In an Oracle database, a stored procedure named batchUpdateCustomer() can be defined as follows:
create or replace procedure batchUpdateCustomer(p_age in number) as
begin
  update CUSTOMERS set AGE=AGE+1 where AGE>p_age;
end;
The stored procedure above takes one parameter, p_age, representing the customer's age. An application can call the stored procedure as follows:
tx = session.beginTransaction();
Connection con = session.connection();
String procedure = "{call batchUpdateCustomer(?)}";
CallableStatement cstmt = con.prepareCall(procedure);
cstmt.setInt(1, 0); // set the age parameter to 0
cstmt.executeUpdate();
tx.commit();

As the program above shows, the application must still bypass the Hibernate API and call the stored procedure directly through the JDBC API.
3: Still use the Hibernate API for regular batch processing, but once a certain number of records has been processed, promptly remove them from the cache with session.flush() followed by session.evict(...) on the object set; this recovers some of the performance loss. As noted above, the "certain number" must be chosen as a working figure based on actual conditions.
(The example is a save operation)
The business logic is: we want to insert 100,000 rows into the database.
tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer custom = new Customer();
    custom.setName("user" + i);
    session.save(custom);
    if (i % 50 == 0) { // treat every 50 records as one processing unit; as mentioned above, this number must be chosen as appropriate
        session.flush();
        session.clear();
    }
}
tx.commit();

This keeps the system's memory usage within a stable range.
During project development we often need, due to project requirements, to insert large quantities of data into the database: tens of thousands, hundreds of thousands, even millions of rows. If Hibernate is used naively to insert data at this order of magnitude, an exception is likely, most commonly OutOfMemoryError (memory overflow).
First, let's briefly review the mechanism of Hibernate's insert operation. Hibernate maintains an internal cache, and when we perform inserts it places every object being persisted into that internal cache for management.
Speaking of Hibernate's caches, Hibernate distinguishes an internal (first-level) cache from a second-level cache. Hibernate manages these two caches differently: for the second-level cache we can configure its size, whereas toward the internal cache Hibernate takes a laissez-faire attitude and places no limit on its capacity. Now the crux of the problem is clear: when we insert massive amounts of data, that many objects accumulate in the internal cache (which lives in memory), so system memory is eaten up bit by bit, and it is no surprise when the system finally blows up.
So let's think about how to deal with this problem better. Some development environments require Hibernate to be used; other projects are more flexible and can seek out other methods.
Here I recommend two methods:
(1): Optimize Hibernate, using segmented inserts and clearing the cache from the program in a timely manner.
(2): Bypass the Hibernate API and do the batch insert directly through the JDBC API. This method performs best and is the fastest.
For Method 1 above, the basic idea is: optimize Hibernate by setting the hibernate.jdbc.batch_size parameter in the configuration file to specify how many SQL statements are submitted at a time, and in the program insert in segments, clearing the cache in a timely manner (the Session implements asynchronous write-behind, which lets Hibernate batch its write operations explicitly); that is, after inserting a certain amount of data, promptly clear those objects from the internal cache and free the memory they occupy.
To set the hibernate.jdbc.batch_size parameter, you can refer to the following configuration.
<hibernate-configuration>
  <session-factory>
    ...
    <property name="hibernate.jdbc.batch_size">50</property>
    ...
  </session-factory>
</hibernate-configuration>
The point of configuring the hibernate.jdbc.batch_size parameter is to make as few round trips to the database as possible: the larger the value, the fewer round trips and the faster the processing. With the configuration above, Hibernate waits until the program has accumulated 50 SQL statements and then submits them as one batch.
The author also suspects that hibernate.jdbc.batch_size is not simply "the bigger the better"; from a performance perspective that remains open to discussion. It must be set as appropriate for the actual situation; generally a value of 30 or 50 meets the need.
In terms of program implementation, the author takes the insertion of 10,000 rows as an example:
Session session = HibernateUtil.currentSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 10000; i++) {
    Student st = new Student();
    st.setName("feifei");
    session.save(st);
    if (i % 50 == 0) { // treat every 50 records as one processing unit
        session.flush(); // stay synchronized with the database data
        session.clear(); // clear all internally cached data, freeing the occupied memory in time
    }
}
tx.commit();

At a given data scale, this approach keeps the system's memory resources within a relatively stable range.
Note: the second-level cache mentioned earlier makes a remark necessary here. If the second-level cache is enabled, then in order to maintain it Hibernate will push the corresponding data into the second-level cache on every insert, update, and delete, at a huge cost in performance. The author therefore recommends disabling the second-level cache during batch processing.
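For reference, a minimal sketch of switching the second-level cache off for a batch run, using the standard Hibernate 3 configuration property (adjust to your Hibernate version):

<hibernate-configuration>
  <session-factory>
    ...
    <property name="hibernate.cache.use_second_level_cache">false</property>
    ...
  </session-factory>
</hibernate-configuration>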
For Method 2, traditional JDBC batch processing is used, handled through the JDBC API.
Please refer to the separate write-up on Java batch processing of self-executed SQL; a minimal sketch of the idea is given below.
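For readers who do not have that article at hand, here is a minimal sketch of what such traditional JDBC batch code typically looks like. The connection URL, credentials, and driver are placeholder assumptions; the addBatch()/executeBatch() pattern is the point:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchInsert {
    public static void main(String[] args) throws Exception {
        // placeholder URL and credentials; T_STUDENT is the table used elsewhere in this article
        Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "user", "password");
        conn.setAutoCommit(false); // manage the transaction by hand
        PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT(name) values(?)");
        for (int i = 1; i <= 10000; i++) {
            stmt.setString(1, "feifei");
            stmt.addBatch();         // queue the statement instead of executing it one by one
            if (i % 50 == 0) {
                stmt.executeBatch(); // send each group of 50 inserts in a single round trip
            }
        }
        stmt.executeBatch();         // flush whatever remains in the last partial batch
        conn.commit();
        stmt.close();
        conn.close();
    }
}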
Looking at the code above, does something feel off? Yes: this is still traditional JDBC programming, without any Hibernate flavor.
The above code can be modified to the following:
Transaction tx = session.beginTransaction(); // use Hibernate's transaction handling
Connection conn = session.connection();
PreparedStatement stmt = conn.prepareStatement("insert into T_STUDENT(name) values(?)");
for (int j = 0; j < 200; j++) {
    for (int i = 0; i < 50; i++) {
        stmt.setString(1, "feifei");
        stmt.addBatch();
    }
    stmt.executeBatch(); // submit each group of 50 as one JDBC batch
}
tx.commit(); // use Hibernate's transaction boundary

With this change the code takes on a Hibernate flavor. In the author's tests, batch processing through the JDBC API performed nearly 10 times better than through the Hibernate API; JDBC's dominance in raw performance here is undisputed.
Regarding batch update and delete in Hibernate 2: for a batch update, Hibernate first fetches the data that meets the conditions and then performs the update on it. Batch delete works the same way: first fetch the qualifying data, then delete it.
This has two major disadvantages:
(1): Takes up a lot of memory.
(2): When processing massive data, the number of update/delete statements executed is itself massive, and each update/delete statement can operate on only one object; with the database hit that frequently, low performance is only to be expected. (A sketch of this pattern follows.)
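To make both drawbacks concrete, here is a sketch of that Hibernate 2 style, assuming a hypothetical Student entity with an age property (this illustrates the problem; it is not a recommendation):

import java.util.Iterator;
import java.util.List;

Transaction tx = session.beginTransaction();
// step 1: every matching object is first loaded into memory (disadvantage 1)
List students = session.createQuery("from Student s where s.age > 20").list();
for (Iterator it = students.iterator(); it.hasNext();) {
    Student st = (Student) it.next();
    st.setAge(st.getAge() + 1); // step 2: each dirty object produces its own UPDATE (disadvantage 2)
}
tx.commit(); // the flush issues one UPDATE statement per modified object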
With the release of Hibernate 3, bulk update/delete was introduced for batch update/delete operations. The principle is to complete the batch update or delete with a single HQL statement, much like JDBC's batch update/delete. In performance terms this is a great improvement over Hibernate 2's batch update/delete.
Transaction tx = session.beginTransaction();
String HQL = "delete STUDENT";
Query query = session.createQuery(HQL);
int size = query.executeUpdate();
tx.commit();
The console outputs only one delete statement: Hibernate: delete from T_STUDENT. Few statements are executed, and performance is nearly the same as plain JDBC, so this is a good way to improve performance. Of course, for even better performance the author recommends that batch updates and deletes still go through JDBC; the method and the basic points are the same as in batch-insert Method 2 above, so they will not be repeated here.
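For completeness, a bulk update can be written the same way in Hibernate 3 with a single HQL statement (a hypothetical Student entity with an age property is assumed):

Transaction tx = session.beginTransaction();
String hql = "update Student s set s.age = s.age + 1 where s.age > :age";
Query query = session.createQuery(hql);
query.setInteger("age", 20);          // named parameter for the age threshold
int updated = query.executeUpdate();  // returns the number of rows affected
tx.commit();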
Here the author offers one more approach: improve performance from the database side by calling a stored procedure from the Hibernate program. Stored procedures run on the database side and are therefore faster. Taking a batch update as an example, reference code is given below.
First, create a stored procedure named batchUpdateStudent on the database side:
create or replace procedure batchUpdateStudent(a in number) as
begin
  update STUDENT set AGE=AGE+1 where AGE>a;
end;
The call code is as follows:
Transaction tx = session.beginTransaction();
Connection conn = session.connection();
String pd = "{call batchUpdateStudent(?)}";
CallableStatement cstmt = conn.prepareCall(pd);
cstmt.setInt(1, 20); // set the age parameter to 20
cstmt.executeUpdate();
tx.commit();

Observe that this code, too, bypasses the Hibernate API and calls the stored procedure through the JDBC API, while still using Hibernate's transaction boundary. Stored procedures are undoubtedly a good way to improve batch-processing performance: they run directly on the database side and, to some extent, shift the pressure of batch processing onto the database.
Postscript
This article has discussed Hibernate's batch-processing operations from the standpoint of improving performance, and it covers only one small aspect of performance tuning.
Whatever method is adopted must be weighed against the actual situation; what matters is providing users with an efficient, stable system that meets their needs.
I hope this article will be helpful to everyone's Hibernate programming.