Misconceptions about ObjectId in MongoDB and the problems they cause

Author: Eve Cole  Update Time: 2025-06-06 23:16:01

We recently reworked two applications, and a series of problems surfaced during the rollout, some of them caused by misunderstandings about ObjectId.

Let's first understand the ObjectId, using 4df2dcec2cdcd20936a8b817 as an example:

TimeStamp

The first 4 bytes are a Unix timestamp stored as an int. Take the first 4 bytes of the example ObjectId, "4df2dcec", and convert them from hexadecimal to decimal: "1307761900". This number is a timestamp. To make it more obvious, we convert it into a familiar time format (accurate to the second):

 $ date -d '1970-01-01 UTC 1307761900 sec' -u

Saturday, June 11, 2011 03:11:40 UTC

The first 4 bytes therefore record when the document was created. Because the timestamp comes first, ObjectIds sort roughly in insertion order, which is useful in several ways; for example, it makes _id efficient as an index. Another advantage is that some client drivers can derive a record's insertion time from its ObjectId. This also explains why, when many ObjectIds are created in quick succession, the leading characters rarely change: they all encode the current time. Many users worry about keeping server clocks synchronized. In fact, the absolute value of the timestamp does not matter, as long as it keeps increasing.
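The hex-to-decimal conversion described above can be sketched in plain Java, without the MongoDB driver. The class and method names are illustrative only, and the id 4df2dcec2cdcd20936a8b817 is simply the concatenation of the four hex fragments quoted in this article.

```java
import java.time.Instant;

final class ObjectIdTimestamp {
    // Returns the creation time (seconds since the Unix epoch) encoded in the
    // first 4 bytes, i.e. the first 8 hex characters, of an ObjectId string.
    static long epochSeconds(String objectIdHex) {
        if (objectIdHex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        return Long.parseLong(objectIdHex.substring(0, 8), 16);
    }

    public static void main(String[] args) {
        long seconds = epochSeconds("4df2dcec2cdcd20936a8b817");
        System.out.println(seconds);                        // 1307761900
        System.out.println(Instant.ofEpochSecond(seconds)); // 2011-06-11T03:11:40Z
    }
}
```

The result matches the output of the date command shown earlier.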

Machine

The next three bytes are 2cdcd2. They identify the host on which the ObjectId was generated, and are generally a hash of the machine's host name. This ensures that different hosts produce different values, so ids do not conflict in a distributed setup. It is also why this part of the ObjectId is identical for every id generated on the same machine. (This is the original layout; newer MongoDB drivers replace the machine and pid fields with a 5-byte random value generated once per process.)

pid

The machine field keeps ObjectIds generated on different machines from conflicting, while the pid keeps ObjectIds generated by different MongoDB processes on the same machine from conflicting. The next two bytes, 0936, are the identifier of the process that generated the ObjectId.

Increment

The first nine bytes ensure that ObjectIds generated by different machines and processes within the same second do not conflict. The last three bytes, a8b817, are an automatically incremented counter that keeps ObjectIds generated by one process within the same second from conflicting. Three bytes allow 256^3 = 16777216 unique values per process per second.
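As a rough illustration of the four fields described so far, the sketch below splits the example id into timestamp, machine, pid, and counter. It assumes the 4 + 3 + 2 + 3 byte layout this article describes; the class name is hypothetical and is not part of any driver.

```java
final class LegacyObjectIdFields {
    final long timestamp; // bytes 0-3: seconds since the Unix epoch
    final int machine;    // bytes 4-6: machine identifier
    final int pid;        // bytes 7-8: process identifier
    final int counter;    // bytes 9-11: incrementing counter

    LegacyObjectIdFields(String hex) {
        if (hex.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        timestamp = Long.parseLong(hex.substring(0, 8), 16);
        machine = Integer.parseInt(hex.substring(8, 14), 16);
        pid = Integer.parseInt(hex.substring(14, 18), 16);
        counter = Integer.parseInt(hex.substring(18, 24), 16);
    }

    public static void main(String[] args) {
        LegacyObjectIdFields f = new LegacyObjectIdFields("4df2dcec2cdcd20936a8b817");
        // prints: timestamp=1307761900 machine=2cdcd2 pid=0936 counter=a8b817
        System.out.printf("timestamp=%d machine=%06x pid=%04x counter=%06x%n",
                f.timestamp, f.machine, f.pid, f.counter);
        // A 3-byte counter has 2^24 = 256^3 distinct values.
        System.out.println(1 << 24); // 16777216
    }
}
```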

ObjectId uniqueness

You might therefore assume that an ObjectId is, for practical purposes, guaranteed to be unique, whether it is generated on the client or on the server.

Misconception 1. Is the document order consistent with the insertion order?

Single threaded situation

On the same machine and in the same process, the timestamp, machine, pid, and inc fields guarantee that each ObjectId is unique.

The problem is that MongoDB operations are multi-threaded. When threads a, b, c, ... insert concurrently, there is no guarantee which write lands first, so the stored order can differ from the ObjectId order.

Multithreaded, multi-machine or multi-process situation

With multiple machines or processes, the machine and pid fields in the ObjectId differ as well, so the data will be even more out of order.

Solution:

Since documents in a collection are not stored in ObjectId order (this holds for capped collections too), the simplest fix is to sort by ObjectId.

There are two ways to sort:

1. MongoDB query statement

 // Query, Criteria and Sort are Spring Data MongoDB classes.
 Query query = new Query();
 if (id != null) {
     query.addCriteria(Criteria.where("_id").gt(id));
 }
 query.with(new Sort(Sort.Direction.ASC, "_id"));

2. java.util.PriorityQueue

 Comparator<DBObject> comparator = new Comparator<DBObject>() {
     @Override
     public int compare(DBObject o1, DBObject o2) {
         return ((ObjectId) o1.get("_id")).compareTo((ObjectId) o2.get("_id"));
     }
 };
 PriorityQueue<DBObject> queue = new PriorityQueue<DBObject>(200, comparator);

Misconception 2. With multiple highly concurrent clients, is the order guaranteed (after sorting)?

If reads always trail writes by a comfortable margin (an interval of more than one second), you will never observe any disorder.

Consider the following example, in which the data is fetched twice.

First fetch:

4df2dcec aaaa ffff 36a8b813
4df2dcec aaaa eeee 36a8b813
4df2dcec bbbb 1111 36a8b814

Second fetch:

4df2dcec bbbb 1111 36a8b813
4df2dcec aaaa ffff 36a8b814
4df2dcec aaaa eeee 36a8b814

If the maximum id from the first fetch (4df2dcec bbbb 1111 36a8b814) is used as the lower bound of the next query, all three records from the second fetch are missed, because 4df2dcec bbbb 1111 36a8b814 is greater than every one of them.

This leads to data loss.
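The loss can be reproduced with a small in-memory simulation. This is only a sketch: the ids are the six example values above with the spaces removed, fetchGreaterThan is a hypothetical stand-in for a "_id greater than lastSeen, sorted ascending" query, and plain string comparison is used because equal-length lowercase hex strings sort the same way as the underlying bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MissedRecordsDemo {
    // Records visible at the time of the first fetch.
    static final List<String> FIRST = Arrays.asList(
            "4df2dcecaaaaffff36a8b813",
            "4df2dcecaaaaeeee36a8b813",
            "4df2dcecbbbb111136a8b814");

    // Records written afterwards, within the same second.
    static final List<String> SECOND = Arrays.asList(
            "4df2dcecbbbb111136a8b813",
            "4df2dcecaaaaffff36a8b814",
            "4df2dcecaaaaeeee36a8b814");

    // Simulates "find where _id > lastSeen, sorted ascending" over a batch.
    static List<String> fetchGreaterThan(List<String> batch, String lastSeen) {
        List<String> result = new ArrayList<>();
        for (String id : batch) {
            if (lastSeen == null || id.compareTo(lastSeen) > 0) {
                result.add(id);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> firstFetch = fetchGreaterThan(FIRST, null);
        String lastSeen = firstFetch.get(firstFetch.size() - 1);
        System.out.println(lastSeen); // 4df2dcecbbbb111136a8b814

        // Every later record sorts below lastSeen, so nothing is returned.
        List<String> secondFetch = fetchGreaterThan(SECOND, lastSeen);
        System.out.println(SECOND.size() - secondFetch.size()); // 3 records missed
    }
}
```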

Solution:

The ObjectId timestamp has only one-second resolution, and the bytes between the timestamp and the counter are the machine and process numbers. Within a single second, ordering across machines and processes is therefore decided by those numbers and not by insertion time.

1. Only process records older than a certain interval (more than one second). Even if the machine and process numbers cause disorder within a second, everything before the interval has settled and is in order.

2. Single-point insertion: route the inserts that were spread over several nodes through a single node, so that the machine and process numbers are always the same and the counter alone keeps the records in order.

Here, we used the first method.
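One way to express the first method is to cap each query with an upper-bound id built from "now minus the lag". The sketch below is an assumption about how such a bound could be computed, not the code we actually ran: the helper names are hypothetical, the two-second lag is an arbitrary example, and clock skew between machines is not handled.

```java
final class SettledWindow {
    // Smallest possible id for a given second: the 4-byte timestamp in hex
    // followed by 8 zero bytes for the machine, pid and counter fields.
    static String lowestIdForSecond(long epochSeconds) {
        return String.format("%08x", epochSeconds) + "0000000000000000";
    }

    // Exclusive upper bound for a query that only reads records whose
    // timestamp is older than lagSeconds.
    static String upperBound(long nowEpochSeconds, long lagSeconds) {
        return lowestIdForSecond(nowEpochSeconds - lagSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical "now": two seconds after the example timestamp 1307761900.
        String bound = upperBound(1307761902L, 2L);
        System.out.println(bound); // 4df2dcec0000000000000000

        // An id created during second 4df2dcec is not below the bound, so it
        // is left for a later pass, when that second can no longer change.
        System.out.println("4df2dcecbbbb111136a8b813".compareTo(bound) < 0); // false
    }
}
```

A reader would then query for ids greater than the last processed id and less than the bound, and advance the last processed id only within that settled range.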

Misconception 3. If _id is not set on the DBObject, is the ObjectId set by the MongoDB server?

When inserting into MongoDB with a new BasicDBObject(), the _id field is left empty unless it is set manually. So is it filled in on the server?

Let’s take a look at the code for the insertion operation:

Implementation Class

 public WriteResult insert(List<DBObject> list, com.mongodb.WriteConcern concern, DBEncoder encoder ){
     if (concern == null) {
         throw new IllegalArgumentException("Write concern can not be null");
     }
     return insert(list, true, concern, encoder);
 }

The shouldApply argument controls whether an _id is added. The overload above passes true, so the default is to add one:

 protected WriteResult insert(List<DBObject> list, boolean shouldApply, com.mongodb.WriteConcern concern, DBEncoder encoder ){
     if (encoder == null)
         encoder = DefaultDBEncoder.FACTORY.create();
     if ( willTrace() ) {
         for (DBObject o : list) {
             trace( "save: " + _fullNameSpace + " " + JSON.serialize( o ) );
         }
     }
     if ( shouldApply ){
         for (DBObject o : list) {
             apply(o);
             _checkObject(o, false, false);
             Object id = o.get("_id");
             if (id instanceof ObjectId) {
                 ((ObjectId) id).notNew();
             }
         }
     }
     WriteResult last = null;
     int cur = 0;
     int maxsize = _mongo.getMaxBsonObjectSize();
     while ( cur < list.size() ) {
         OutMessage om = OutMessage.insert( this , encoder, concern );
         for ( ; cur < list.size(); cur++ ){
             DBObject o = list.get(cur);
             om.putObject( o );
             // limit for batch insert is 4 x maxbson on server, use 2 x to be safe
             if ( om.size() > 2 * maxsize ){
                 cur++;
                 break;
             }
         }
         last = _connector.say( _db , om , concern );
     }
     return last;
 }

The operation that automatically adds the ObjectId

 /**
  * calls {@link DBCollection#apply(com.mongodb.DBObject, boolean)} with ensureID=true
  * @param o <code>DBObject</code> to which to add fields
  * @return the modified parameter object
  */
 public Object apply( DBObject o ){
     return apply( o , true );
 }

 /**
  * calls {@link DBCollection#doapply(com.mongodb.DBObject)}, optionally adding an automatic _id field
  * @param jo object to add fields to
  * @param ensureID whether to add an <code>_id</code> field
  * @return the modified object <code>o</code>
  */
 public Object apply( DBObject jo , boolean ensureID ){
     Object id = jo.get( "_id" );
     if ( ensureID && id == null ){
         id = ObjectId.get();
         jo.put( "_id" , id );
     }
     doapply( jo );
     return id;
 }

As you can see, the ObjectId is added automatically by the MongoDB driver.

The save method

 public WriteResult save( DBObject jo, WriteConcern concern ){
     if ( checkReadOnly( true ) )
         return null;
     _checkObject( jo , false , false );
     Object id = jo.get( "_id" );
     if ( id == null || ( id instanceof ObjectId && ((ObjectId)id).isNew() ) ){
         if ( id != null && id instanceof ObjectId )
             ((ObjectId)id).notNew();
         if ( concern == null )
             return insert( jo );
         else
             return insert( jo, concern );
     }
     DBObject q = new BasicDBObject();
     q.put( "_id" , id );
     if ( concern == null )
         return update( q , jo , true , false );
     else
         return update( q , jo , true , false , concern );
 }

To sum up: when _id is not set, the ObjectId is by default generated by the client driver, not by the server.

Misconception 4. Can findAndModify really produce an auto-increment value?

 DBObject update = new BasicDBObject("$inc", new BasicDBObject("counter", 1));
 DBObject query = new BasicDBObject("_id", key);
 // findAndModify(query, update) returns the document as it was before the update
 DBObject result = getMongoTemplate().getCollection(collectionName).findAndModify(query, update);
 if (result == null) {
     DBObject doc = new BasicDBObject();
     doc.put("counter", 1L);
     doc.put("_id", key);
     // insert(collectionName, doc);
     getMongoTemplate().save(doc, collectionName);
     return 1L;
 }
 return (Long) result.get("counter");

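An auto-increment counter is commonly written this way, but running it reveals a problem.

findAndModify performs the find first and then the modify, so it returns the document as it was before the increment. The first call finds nothing, stores counter = 1, and returns 1; the second call increments the counter to 2 but returns the old value, 1 again. So when result is null, the document should be inserted and 0 returned.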
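The corrected return value can be checked with a small simulation that needs no database. In this sketch a HashMap plays the role of the collection and findAndIncrement is a hypothetical stand-in for findAndModify with $inc that returns the value from before the update. It is single-threaded and says nothing about concurrent callers.

```java
import java.util.HashMap;
import java.util.Map;

final class CounterSketch {
    private final Map<String, Long> store = new HashMap<>();

    // Stand-in for findAndModify with $inc: increments the stored value and
    // returns the value from BEFORE the update, or null if the key is absent.
    private Long findAndIncrement(String key) {
        Long before = store.get(key);
        if (before != null) {
            store.put(key, before + 1);
        }
        return before;
    }

    long next(String key) {
        Long before = findAndIncrement(key);
        if (before == null) {
            store.put(key, 1L); // as in the original code: first document holds 1
            return 0L;          // but report 0, so the next call can report 1
        }
        return before;
    }

    public static void main(String[] args) {
        CounterSketch counter = new CounterSketch();
        System.out.println(counter.next("orders")); // 0
        System.out.println(counter.next("orders")); // 1
        System.out.println(counter.next("orders")); // 2
    }
}
```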
