May 25, 2016

MongoBulkItemWriter

Spring Batch is a nice choice for simple ETL jobs, but it doesn't work well with MongoDB, especially when writing to it. The MongoItemWriter that ships with Spring Batch doesn't do bulk inserts.
Fortunately for us, bulk inserts are quite easy to implement:
import com.mongodb.BulkWriteOperation;
import com.mongodb.DBObject;
import org.springframework.batch.item.ItemWriter;
import org.springframework.data.mongodb.core.MongoTemplate;

import java.util.List;

public class MongoBulkItemWriter<T> implements ItemWriter<T> {

    private final String collection;
    private final MongoTemplate template;

    public MongoBulkItemWriter(String collection, MongoTemplate mongoTemplate) {
        this.collection = collection;
        this.template = mongoTemplate;
    }

    @Override
    public void write(List<? extends T> items) throws Exception {
        // One unordered bulk operation per chunk instead of one insert per item.
        BulkWriteOperation bulk = template.getCollection(collection).initializeUnorderedBulkOperation();
        // Convert each item to its MongoDB representation and queue it for insertion.
        items.forEach(i -> bulk.insert((DBObject) template.getConverter().convertToMongoType(i)));
        // Send all queued inserts to the server in one go.
        bulk.execute();
    }
}
It works much faster, but beware: it does inserts only, so a single item with a duplicate id will ruin your batch. An upsert-based solution might look something like this:
        BulkWriteOperation bulk = template.getCollection(COLLECTION_NAME).initializeUnorderedBulkOperation();
        // Match each document by id; update it if it exists, insert it otherwise.
        updates.forEach(u -> bulk.find(new BasicDBObject("id", u.getId())).upsert().update(u.getDbObject()));
        bulk.execute();
Upserts are much slower than pure inserts, but still a huge win compared with per-object writes.
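
To make the difference between the two approaches concrete, here is a minimal in-memory sketch of the semantics. It does not touch MongoDB at all: FakeCollection, bulkInsert and bulkUpsert are hypothetical names made up for this illustration, and the map only models a collection with a unique key. The insert variant attempts every document and then fails if any key was already present, which is roughly how an unordered bulk insert behaves; the upsert variant simply overwrites.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkSemanticsSketch {

    /** Hypothetical stand-in for a collection with a unique key; not a MongoDB API. */
    static final class FakeCollection {
        final Map<String, String> docs = new LinkedHashMap<>();

        /**
         * Models an unordered bulk insert: every document is attempted, duplicates
         * are collected, and the call fails at the end if any duplicate was seen.
         * Each document is a {key, value} pair.
         */
        void bulkInsert(List<String[]> batch) {
            List<String> duplicates = new ArrayList<>();
            for (String[] doc : batch) {
                // putIfAbsent returns the existing value when the key is already taken.
                if (docs.putIfAbsent(doc[0], doc[1]) != null) {
                    duplicates.add(doc[0]);
                }
            }
            if (!duplicates.isEmpty()) {
                throw new IllegalStateException("duplicate keys: " + duplicates);
            }
        }

        /** Models a bulk upsert: existing keys are overwritten, new keys are added. */
        void bulkUpsert(List<String[]> batch) {
            for (String[] doc : batch) {
                docs.put(doc[0], doc[1]);
            }
        }
    }

    public static void main(String[] args) {
        FakeCollection collection = new FakeCollection();
        collection.bulkInsert(Arrays.asList(new String[]{"1", "a"}, new String[]{"2", "b"}));

        // A second batch that reuses key "2", e.g. after a job restart.
        List<String[]> rerun = Arrays.asList(new String[]{"2", "b2"}, new String[]{"3", "c"});
        try {
            collection.bulkInsert(rerun);
        } catch (IllegalStateException e) {
            System.out.println("insert failed: " + e.getMessage());
        }

        // The same batch goes through as an upsert.
        collection.bulkUpsert(rerun);
        System.out.println(collection.docs);
    }
}
```

In a Spring Batch job the exception from the insert variant is what fails the chunk, which is why rerunning a job over data that is already partly written is the typical case where the upsert variant pays for its extra cost.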
