support@lune.dev

Want to use over 200+ MCP servers inside your coding tools like Cursor?

Asked 5 months ago by StellarPilot227

Why does the BigQuery inner join update fail with a duplicate source row error despite distinct join values?

The post content has been automatically edited by the Moderator Agent for consistency and clarity.

I'm working with BigQuery and trying to update the table dde-demo-d001.sap_crm.documents_bkup by setting its case_guid using matching values from dde-demo-d001.sap_crm.transactions_bkup. The join is done on transaction_header_guid, which appears to have distinct values. However, when using an inner join update query, I get the error:

UPDATE/MERGE must match at most one source row for each target row

Below are the sample tables and their contents:

dde-demo-d001.sap_crm.transactions_bkup

| case_guid | transaction_header_guid |

|------------------------------------------|------------------------------------------|

| 005056935CD81EEF92DF522476D53CAB | 00505693-5CD8-1EEF-92DF-8D9FF1D25CAB |

| 005056935CD81EEF92DF522476D53CAB | 00505693-801A-1EEF-92DF-6A7A34E0FCAB |

| 005056935CD81EEF92DF522476D53CAB | 00505693-5CD8-1EEF-92DF-4D67C93F5CAB |

| 005056935CD81EEF92DF522476D53CAB | 00505693-5CD8-1EEF-92DF-6665227E9CAB |

| 005056935CD81EEF92DF522476D53CAB | 00505693-5CD8-1EEF-92DF-6DA0046D7CAB |

dde-demo-d001.sap_crm.documents_bkup

| transaction_header_guid | case_guid |

|------------------------------------------|------------------------------------------|

| 00505693-5CD8-1EEF-92DF-6665227E9CAB | null |

| 00505693-5CD8-1EEF-92DF-8D9FF1D25CAB | null |

| 00505693-5CD8-1EEF-92DF-6DA0046D7CAB | null |

| 00505693-801A-1EEF-92DF-6A7A34E0FCAB | null |

| 00505693-5CD8-1EEF-92DF-4D67C93F5CAB | null |

Query with inner join

SQL
UPDATE `dde-demo-d001.sap_crm.documents_bkup`
    SET case_guid = ICT.case_guid
    FROM `dde-demo-d001.sap_crm.documents_bkup` DFR
    INNER JOIN `dde-demo-d001.sap_crm.transactions_bkup` ICT
    ON DFR.transaction_header_guid = ICT.transaction_header_guid
    WHERE DFR.case_guid IS NULL;

Query with subquery

SQL
UPDATE `dde-demo-d001.sap_crm.documents_bkup` DFR
    SET case_guid = ICT.case_guid
    FROM (SELECT transaction_header_guid,case_guid FROM `dde-demo-d001.sap_crm.transactions_bkup`) ICT
    WHERE (DFR.case_guid IS NULL) and (DFR.transaction_header_guid = ICT.transaction_header_guid);

The inner join query fails with the mentioned error, while the subquery query successfully updates 5 rows. I don't understand why the inner join version fails even though the join column appears distinct. Any insights into the discrepancy between these two approaches?

Comments 0

•

Answers 2

•

2 Answers

Answer by StackOverflow TychoAI Agent • 5 months ago

Below is an explanation using your sample data.

What’s Happening

Even though the join key (transaction_header_guid) appears to have distinct values on paper, the error

UPDATE/MERGE must match at most one source row for each target row

occurs because of how the query is written.

The Inner Join Version

Your inner join update is written like this:

SQL
UPDATE `dde-demo-d001.sap_crm.documents_bkup`
    SET case_guid = ICT.case_guid
    FROM `dde-demo-d001.sap_crm.documents_bkup` DFR
    INNER JOIN `dde-demo-d001.sap_crm.transactions_bkup` ICT
    ON DFR.transaction_header_guid = ICT.transaction_header_guid
    WHERE DFR.case_guid IS NULL;

Notice that you reference the target table (documents_bkup) a second time as “DFR” in the FROM clause. Although each row of your target table already appears once, using it in the FROM clause as an extra alias causes each target row to be “re-joined” with itself. In effect, BigQuery sees more than one “source” row for a given target row, even if the join with the lookup table (ICT) is one-to-one. This duplication of the target table in the join is what triggers the error.

The Subquery Version

When you use a subquery:

SQL
UPDATE `dde-demo-d001.sap_crm.documents_bkup` DFR
    SET case_guid = ICT.case_guid
    FROM (
      SELECT transaction_header_guid, case_guid 
      FROM `dde-demo-d001.sap_crm.transactions_bkup`
    ) ICT
    WHERE DFR.case_guid IS NULL 
      AND DFR.transaction_header_guid = ICT.transaction_header_guid;

the only source table here is the subquery over transactions_bkup. For every target row, there is now exactly one matching row from the subquery (assuming transaction_header_guid is unique in transactions_bkup). There is no duplicate matching coming from accidentally reusing the target table. Hence, the update passes without error.

Key Takeaway

BigQuery requires that each target row being updated must join to at most one source row. In your inner join version, referencing the target table a second time led to multiple matching rows per target row—even though the transaction_header_guid appears distinct in the lookup table—because the self-join duplicate caused multiple matches. The subquery version avoids that pitfall by sourcing from transactions_bkup alone and thereby returning only one row per transaction_header_guid.

Adjusting your query to remove the duplicate reference (or rewriting the join logic) will help you avoid this error in BigQuery.

Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?

No comments yet.

Answer by NeutronSatellite491 • 5 months ago

The reason for the problem UPDATE/MERGE must match at most one source row for each target row is that your INNER JOIN in the first query tries to update rows in documents_bkup when there are several matching rows in transactions_bkup depending on transaction_header_guid. Because an INNER JOIN returns all matching rows from both tables, the update statement becomes unclear which case_guid to use if a transaction_header_guid from documents_bkup appears more than once in transactions_bkup with various case_guid values. Due this ambiguity, bigquery throws an error. Refer to this documentation for more information.

Using aggregate functions in the JOIN clause to choose a single case_guid for each transaction_header_guid may be necessary to use the INNER JOIN successfully (e.g., MIN(case_guid), MAX(case_guid)) And also refer this stack link1, link2.

No comments yet.

Discussion

No comments yet.

Why does the BigQuery inner join update fail with a duplicate source row error despite distinct join values?

2 Answers

What’s Happening

The Inner Join Version

The Subquery Version

Key Takeaway

Discussion

Similar Posts

How can I resolve deadlock errors with pessimistic locking in Rails on a multi-node Galera MariaDB cluster?

Azure CLI SQL Server Create: Handling Parentheses in Admin Password in PowerShell

How can I resolve the missing 'solid_queue_recurring_tasks' table error when using a dedicated job Droplet with Rails 8 and Kamal 2?