Asked 1 month ago by StellarSatellite458
How can I prevent duplicate user processing in concurrent Firebase transactions?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I have two Firebase functions running hourly that process the same list of users. They share a batch document (this.batchId, formatted MM-dd-yyyy-HH, so it is unique per hour) and use transactions to coordinate processing. Each instance fetches a batch of users after lastProcessedId and updates the batch document with a new lastProcessedId, a global totalCount, and per-instance counts and processed-ID lists.
While the global totalCount is accurate, sometimes overlapping instance-specific fields indicate that both instances processed the same IDs (usually a batch of two).
I expected that once Thread 1 commits its transaction and updates the lastProcessedId, Thread 2 would see the new value and process the next batch, aborting if a conflict is detected. However, this isn't happening as anticipated.
Below is the code used within the transaction:
```typescript
private async getNextBatchTransaction(): Promise<{
  userDocs: QueryDocumentSnapshot<DocumentData>[] | null,
  needsCleanup: boolean
}> {
  return this.firestore.runTransaction(async (transaction) => {
    const batchRef = this.firestore.collection("batch_sequence").doc(this.batchId);
    const batchDoc = await transaction.get(batchRef);
    const data = (batchDoc.exists
      ? batchDoc.data()
      : { lastProcessedId: null, complete: false }) as BatchDocument;

    if (data.complete) {
      return { userDocs: null, needsCleanup: false };
    }

    let query = this.firestore
      .collection("users")
      .orderBy("__name__")
      .limit(this.batchSize);

    if (data.lastProcessedId) {
      query = query.startAfter(data.lastProcessedId);
    }

    const userSnapshot = await transaction.get(query);

    if (userSnapshot.empty) {
      transaction.set(batchRef, { complete: true }, { merge: true });
      return { userDocs: null, needsCleanup: false };
    }

    const batchLength = userSnapshot.docs.length;
    const lastDoc = userSnapshot.docs[batchLength - 1];
    const processedIds = userSnapshot.docs.map(doc => doc.id);

    transaction.set(
      batchRef,
      {
        lastProcessedId: lastDoc.id,
        totalCount: FieldValue.increment(batchLength),
        [`instance.${this.instanceId}`]: FieldValue.increment(batchLength),
        [`processedIds.${this.instanceId}`]: FieldValue.arrayUnion(...processedIds),
      },
      { merge: true }
    );

    return { userDocs: userSnapshot.docs, needsCleanup: false };
  });
}
```
Additionally, I reviewed the runTransaction documentation, which explains that:

> updateFunction is invoked once for each …

Why don't the transactions seem to prevent overlapping processing? Any insights or advice on ensuring that each instance processes a unique batch of users would be appreciated.
Comments 0 • Answers 2 • Views 57
Answer by TychoAI Agent • 1 month ago
Below is an explanation and some suggestions based on how Firestore transactions work:
When both functions start at about the same time, each transaction reads the batch document, sees the same lastProcessedId, and runs the same query against the users collection. Because:
• the batch document is the only thing that's "locked" by the transaction, and
• the increment/arrayUnion operations are applied commutatively (without checking that the value being updated is what was read earlier),
there's no conflict detected during commit.
In other words, although each transaction sees the previous state of the batch document, the writes use atomic server-side operations that “merge” the changes rather than force one transaction to fail. As a result, two transactions that start concurrently may both grab the same set of users—so your global count comes out correctly (it’s the sum of both increments) but the instance‑specific fields end up with overlapping IDs.
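The merge behavior described above can be sketched in plain TypeScript. This is an in-memory model only (no Firebase involved); `BatchState`, `applyCommit`, and the user IDs are illustrative names, not from the original code:

```typescript
// Illustrative sketch of commutative merge semantics: operations like
// FieldValue.increment and arrayUnion combine with prior state regardless
// of commit order, so neither write forces the other transaction to fail.
// A plain object stands in for the batch document.

type BatchState = { totalCount: number; instance: Record<string, string[]> };

function applyCommit(state: BatchState, instanceId: string, ids: string[]): BatchState {
  return {
    // like FieldValue.increment(ids.length)
    totalCount: state.totalCount + ids.length,
    // like arrayUnion(...ids) under a per-instance key
    instance: {
      ...state.instance,
      [instanceId]: [...(state.instance[instanceId] ?? []), ...ids],
    },
  };
}

// Both instances read the same lastProcessedId, so both fetch the same users.
const start: BatchState = { totalCount: 0, instance: {} };
const afterA = applyCommit(start, "A", ["u1", "u2"]);
const afterBoth = applyCommit(afterA, "B", ["u1", "u2"]);

console.log(afterBoth.totalCount); // 4
console.log(afterBoth.instance);   // overlapping IDs under both instance keys
```

Both commits succeed, the counter sums cleanly, and the per-instance fields end up with the same IDs, which matches the symptom in the question.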
Firestore transactions guarantee serializable isolation on the documents that are read. However, the users query is evaluated independently by each transaction: your transaction isn't "locking" the query results. Two transactions can see the same user records because both read the same starting point from the batch document before either commits.
To avoid processing the same IDs in parallel you might consider one or more of the following strategies:
Use a Conditional Check.
Read the batch document's lastProcessedId, then update it conditionally: if the batch document's lastProcessedId no longer matches the value you read, abort the transaction and retry. This can be done with a compare-and-set style update using a non-merging write, or by adding a "version" field to the document that you bump only if it hasn't changed since the read.
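The compare-and-set idea can be sketched in plain TypeScript, with an in-memory object standing in for the batch document. In a real implementation the `version` field would live on the Firestore batch document and the check-and-bump would run inside runTransaction; the names here (`VersionedBatch`, `claimNextBatch`) are illustrative:

```typescript
// In-memory sketch of a version-guarded (compare-and-set) update: a writer
// may advance lastProcessedId only if the version it read is still current.

interface VersionedBatch {
  version: number;
  lastProcessedId: string | null;
}

const batch: VersionedBatch = { version: 0, lastProcessedId: null };

function claimNextBatch(expectedVersion: number, newLastId: string): boolean {
  if (batch.version !== expectedVersion) {
    return false; // someone else committed first: abort, re-read, retry
  }
  batch.version += 1;
  batch.lastProcessedId = newLastId;
  return true;
}

// Two instances read version 0 and try to claim the same range.
const readVersion = batch.version;
const firstWins = claimNextBatch(readVersion, "user_010");
const secondLoses = claimNextBatch(readVersion, "user_010"); // stale version

console.log(firstWins, secondLoses); // true false
```

The loser re-reads the batch document, sees the advanced lastProcessedId, and queries the next range instead of re-processing the same one.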
Lock or Reserve the Batch Range.
Instead of having each transaction simply update the shared field with commutative operations, consider “reserving” the batch of users in a separate document or field; the reserve update (or flag) can then be checked on subsequent transactions so that once one instance “takes” a given batch, no other instance processes it.
Re-architect the Scheduling.
If possible, coordinate the functions so that only one instance runs for a given batch. For example, use a distributed lock (perhaps in Cloud Tasks or Cloud Pub/Sub) to ensure that only one function instance processes the next set of users.
Read and Write All Affected Documents.
If you need strict coordination, you might include the documents representing each “user batch” in the transaction (have separate documents that get marked as processed). That way, if two transactions try to process the same user batch document, one of them will fail due to a read–write conflict.
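The reservation idea can be sketched with create-only semantics, similar in spirit to the admin SDK's transaction.create(), which fails when the document already exists and thus produces a real conflict between two claimants. Here a Map stands in for a hypothetical "claims" collection; all names are illustrative:

```typescript
// In-memory sketch of reserving a batch range via create-only semantics:
// the first instance to create the claim record owns the range, and a
// second attempt to create the same record is rejected.

const claims = new Map<string, string>(); // rangeKey -> owning instanceId

function claimRange(rangeKey: string, instanceId: string): boolean {
  if (claims.has(rangeKey)) {
    return false; // another instance already created the claim record
  }
  claims.set(rangeKey, instanceId);
  return true;
}

const rangeKey = "after_user_000"; // e.g. derived from lastProcessedId
const a = claimRange(rangeKey, "instance-A");
const b = claimRange(rangeKey, "instance-B"); // same range: rejected

console.log(a, b, claims.get(rangeKey)); // true false instance-A
```

With Firestore, the failed create aborts the second transaction, which then retries, reads the updated cursor, and claims the next range.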
The overlap occurs because the query on the users collection isn’t “locked” by the transaction and because the batch document update uses commutative updates that don’t force a conflict even though the same lastProcessedId is used for both transactions. Modifying your coordination logic to enforce a check (or reservation) on the batch range should prevent overlapping processing.
Hope this clarifies what’s happening and helps guide you toward a solution!
No comments yet.
Answer by SolarPilot450 • 1 month ago
I was using firebase-admin version 11.8.0, which is two major releases behind the latest 13.0.2. After upgrading to 13.0.2, the code works as expected.
After running it 100 times, each instance processed unique IDs every time.
No comments yet.