Asked 1 month ago by StellarSatellite458
How can I prevent duplicate user processing in concurrent Firebase transactions?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I have two Firebase functions running hourly that process the same list of users. They share a batch document (this.batchId, formatted MM-dd-yyyy-HH, so it is unique per hour) and use transactions to coordinate processing. Each instance fetches a batch of users after lastProcessedId and updates the batch document with a new lastProcessedId, a global totalCount, and per-instance counts and processed-ID lists.
While the global totalCount is accurate, sometimes overlapping instance-specific fields indicate that both instances processed the same IDs (usually a batch of two).
I expected that once Thread 1 commits its transaction and updates the lastProcessedId, Thread 2 would see the new value and process the next batch, aborting if a conflict is detected. However, this isn't happening as anticipated.
Below is the code used within the transaction:
```typescript
private async getNextBatchTransaction(): Promise<{
  userDocs: QueryDocumentSnapshot<DocumentData>[] | null,
  needsCleanup: boolean
}> {
  return this.firestore.runTransaction(async (transaction) => {
    const batchRef = this.firestore.collection("batch_sequence").doc(this.batchId);
    const batchDoc = await transaction.get(batchRef);
    const data = (batchDoc.exists
      ? batchDoc.data()
      : { lastProcessedId: null, complete: false }) as BatchDocument;

    if (data.complete) {
      return { userDocs: null, needsCleanup: false };
    }

    let query = this.firestore
      .collection("users")
      .orderBy("__name__")
      .limit(this.batchSize);

    if (data.lastProcessedId) {
      query = query.startAfter(data.lastProcessedId);
    }

    const userSnapshot = await transaction.get(query);

    if (userSnapshot.empty) {
      transaction.set(batchRef, { complete: true }, { merge: true });
      return { userDocs: null, needsCleanup: false };
    }

    const batchLength = userSnapshot.docs.length;
    const lastDoc = userSnapshot.docs[batchLength - 1];
    const processedIds = userSnapshot.docs.map(doc => doc.id);

    transaction.set(
      batchRef,
      {
        lastProcessedId: lastDoc.id,
        totalCount: FieldValue.increment(batchLength),
        [`instance.${this.instanceId}`]: FieldValue.increment(batchLength),
        [`processedIds.${this.instanceId}`]: FieldValue.arrayUnion(...processedIds),
      },
      { merge: true }
    );

    return { userDocs: userSnapshot.docs, needsCleanup: false };
  });
}
```
Additionally, I reviewed the runTransaction documentation, which explains that:

> updateFunction is invoked once for each …

Why don't the transactions seem to prevent overlapping processing? Any insights or advice on ensuring that each instance processes a unique batch of users would be appreciated.
Comments 0 • Answers 2 • Views 57
Answer by TychoAI Agent • 1 month ago
Below is an explanation and some suggestions based on how Firestore transactions work:
When both functions start at about the same time, each transaction reads the batch document, sees the same lastProcessedId, and runs the same query against the users collection. Because:
• the batch document is the only thing that's "locked" by the transaction, and
• the increment/arrayUnion operations are applied commutatively (without checking that the value being updated is what was read earlier),
there's no conflict detected during commit.
In other words, although each transaction sees the previous state of the batch document, the writes use atomic server-side operations that “merge” the changes rather than force one transaction to fail. As a result, two transactions that start concurrently may both grab the same set of users—so your global count comes out correctly (it’s the sum of both increments) but the instance‑specific fields end up with overlapping IDs.
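The merge behavior described above can be sketched in plain TypeScript. This is an in-memory model only (no Firebase involved); `BatchState`, `applyCommit`, and the user IDs are illustrative names, not from the original code:

```typescript
// Illustrative sketch of commutative merge semantics: operations like
// FieldValue.increment and arrayUnion combine with prior state regardless
// of commit order, so neither write forces the other transaction to fail.
// A plain object stands in for the batch document.

type BatchState = { totalCount: number; instance: Record<string, string[]> };

function applyCommit(state: BatchState, instanceId: string, ids: string[]): BatchState {
  return {
    // like FieldValue.increment(ids.length)
    totalCount: state.totalCount + ids.length,
    // like arrayUnion(...ids) under a per-instance key
    instance: {
      ...state.instance,
      [instanceId]: [...(state.instance[instanceId] ?? []), ...ids],
    },
  };
}

// Both instances read the same lastProcessedId, so both fetch the same users.
const start: BatchState = { totalCount: 0, instance: {} };
const afterA = applyCommit(start, "A", ["u1", "u2"]);
const afterBoth = applyCommit(afterA, "B", ["u1", "u2"]);

console.log(afterBoth.totalCount); // 4
console.log(afterBoth.instance);   // overlapping IDs under both instance keys
```

Both commits succeed, the counter sums cleanly, and the per-instance fields end up with the same IDs, which matches the symptom in the question.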
Firestore transactions guarantee serializable isolation on the documents that are read. However, the users query is evaluated independently by each transaction: your transaction isn't "locking" the query results. Two transactions can see the same user records because both read the same starting point from the batch document before either commits.
To avoid processing the same IDs in parallel you might consider one or more of the following strategies:
Use a Conditional Check.
Read the batch document's lastProcessedId, then update it conditionally: if the batch document's lastProcessedId no longer matches the value you read, abort the transaction and retry. This can be done with a compare-and-set style update using a non-merging write, or by adding a "version" field to the document that you bump only if it hasn't changed since the read.
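The compare-and-set idea can be sketched in plain TypeScript, with an in-memory object standing in for the batch document. In a real implementation the `version` field would live on the Firestore batch document and the check-and-bump would run inside runTransaction; the names here (`VersionedBatch`, `claimNextBatch`) are illustrative:

```typescript
// In-memory sketch of a version-guarded (compare-and-set) update: a writer
// may advance lastProcessedId only if the version it read is still current.

interface VersionedBatch {
  version: number;
  lastProcessedId: string | null;
}

const batch: VersionedBatch = { version: 0, lastProcessedId: null };

function claimNextBatch(expectedVersion: number, newLastId: string): boolean {
  if (batch.version !== expectedVersion) {
    return false; // someone else committed first: abort, re-read, retry
  }
  batch.version += 1;
  batch.lastProcessedId = newLastId;
  return true;
}

// Two instances read version 0 and try to claim the same range.
const readVersion = batch.version;
const firstWins = claimNextBatch(readVersion, "user_010");
const secondLoses = claimNextBatch(readVersion, "user_010"); // stale version

console.log(firstWins, secondLoses); // true false
```

The loser re-reads the batch document, sees the advanced lastProcessedId, and queries the next range instead of re-processing the same one.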
Lock or Reserve the Batch Range.
Instead of having each transaction simply update the shared field with commutative operations, consider “reserving” the batch of users in a separate document or field; the reserve update (or flag) can then be checked on subsequent transactions so that once one instance “takes” a given batch, no other instance processes it.
Re-architect the Scheduling.
If possible, coordinate the functions so that only one instance runs for a given batch. For example, use a distributed lock (perhaps in Cloud Tasks or Cloud Pub/Sub) to ensure that only one function instance processes the next set of users.
Read and Write All Affected Documents.
If you need strict coordination, you might include the documents representing each “user batch” in the transaction (have separate documents that get marked as processed). That way, if two transactions try to process the same user batch document, one of them will fail due to a read–write conflict.
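The reservation idea can be sketched with create-only semantics, similar in spirit to the admin SDK's transaction.create(), which fails when the document already exists and thus produces a real conflict between two claimants. Here a Map stands in for a hypothetical "claims" collection; all names are illustrative:

```typescript
// In-memory sketch of reserving a batch range via create-only semantics:
// the first instance to create the claim record owns the range, and a
// second attempt to create the same record is rejected.

const claims = new Map<string, string>(); // rangeKey -> owning instanceId

function claimRange(rangeKey: string, instanceId: string): boolean {
  if (claims.has(rangeKey)) {
    return false; // another instance already created the claim record
  }
  claims.set(rangeKey, instanceId);
  return true;
}

const rangeKey = "after_user_000"; // e.g. derived from lastProcessedId
const a = claimRange(rangeKey, "instance-A");
const b = claimRange(rangeKey, "instance-B"); // same range: rejected

console.log(a, b, claims.get(rangeKey)); // true false instance-A
```

With Firestore, the failed create aborts the second transaction, which then retries, reads the updated cursor, and claims the next range.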
The overlap occurs because the query on the users collection isn’t “locked” by the transaction and because the batch document update uses commutative updates that don’t force a conflict even though the same lastProcessedId is used for both transactions. Modifying your coordination logic to enforce a check (or reservation) on the batch range should prevent overlapping processing.
Hope this clarifies what’s happening and helps guide you toward a solution!
No comments yet.
Answer by SolarPilot450 • 1 month ago
I was using firebase-admin version 11.8.0, which is two major releases behind the latest 13.0.2. After upgrading to 13.0.2, the code works as expected.
After running it 100 times, each instance processed unique IDs every time.
No comments yet.