PostgreSQL’s INSERT ... ON CONFLICT DO UPDATE
(commonly known as UPSERT) statement, introduced in version 9.5, provides a powerful and atomic way to either insert a new row or update an existing one if a conflict, typically a unique constraint violation, occurs. While this feature greatly simplifies handling “insert-or-update” logic and is designed to be concurrency-safe for its specified conflict target, developers can sometimes encounter the dreaded ERROR: duplicate key value violates unique constraint
(SQLSTATE: 23505 unique_violation
) even when using it, especially under high concurrent loads.
This article explores the common reasons behind this apparent contradiction, focusing on how concurrent transactions interact with multiple unique constraints and other subtleties of the UPSERT mechanism. We will delve into diagnostic techniques and provide robust solutions to help you build more resilient data ingestion pipelines.
Understanding the “Duplicate Key” Error with ON CONFLICT
The error message ERROR: duplicate key value violates unique constraint "constraint_name"
clearly indicates an attempt to insert or update a row in a way that would create a duplicate value for a column (or set of columns) governed by a unique constraint.
The core of the confusion often arises because INSERT ... ON CONFLICT
is specifically designed to handle such conflicts. However, its conflict resolution (the DO UPDATE
or DO NOTHING
part) is tied to a single arbiter constraint (or the unique index backing it). If a concurrent operation leads to a violation of a different unique constraint on the same table, the error can still surface.
Core Reasons for Error 23505 During Concurrent Upserts
While ON CONFLICT
is atomic for its chosen conflict target, several scenarios can lead to unexpected duplicate key errors under concurrency.
1. Multiple Unique Constraints: The Primary Culprit
This is the most frequent reason. An INSERT ... ON CONFLICT
statement specifies a conflict_target
(either a list of columns that have a unique index or the name of a unique constraint). This target becomes the “arbiter” for deciding if a conflict exists and what action to take.
If a table has other unique constraints not designated as the arbiter for that specific UPSERT operation, a race condition can occur:
- Transaction A (TxA) starts an
INSERT ... ON CONFLICT (arbiter_column) DO UPDATE ...
. It performs a speculative check forarbiter_column
and finds no conflict. - Transaction B (TxB) starts a similar
INSERT
. It also finds no conflict onarbiter_column
. - TxB successfully inserts its row and commits. This row satisfies
arbiter_column
but also sets a value foranother_unique_column
. - TxA now proceeds to actually insert its row (or update based on
arbiter_column
). If the values TxA intends to write foranother_unique_column
now conflict with what TxB just committed, PostgreSQL will raise theduplicate key value violates unique constraint
error foranother_unique_column
. TheON CONFLICT
clause forarbiter_column
does not handle violations of other constraints.
Illustrative Scenario with Code:
Consider a table user_profiles
with a primary key id
and another unique constraint on email
:
|
|
Now, imagine two concurrent transactions. Transaction A attempts an UPSERT targeting id = 1
, intending to set email = 'common@example.com'
. Transaction B inserts a new row with id = 2
but also sets email = 'common@example.com'
and commits first.
|
|
If Transaction B inserts (2, 'common@example.com', 'user_b_name')
and commits, the email 'common@example.com'
is now present. When Transaction A proceeds (whether it’s inserting a new row with id=1
or updating an existing one), its ON CONFLICT (id)
logic applies. However, the attempt to set email = 'common@example.com'
will violate the separate unique constraint on the email
column (e.g., user_profiles_email_key
), and this violation is not handled by the ON CONFLICT (id)
clause. The PostgreSQL documentation notes that internal mechanisms suppress errors only on the arbiter indexes.
2. Complex WHERE
Clauses in DO UPDATE
If the DO UPDATE SET ... WHERE condition
clause is used, the condition
is evaluated after the conflict is detected. If the condition
on the existing row (the one that caused the conflict) evaluates to false
, then no update occurs, and the statement effectively behaves like DO NOTHING
for that conflicting row.
In some race conditions, if a row is inserted by another transaction after the initial conflict check but before the current transaction attempts its insert (perhaps because the DO UPDATE
path wasn’t taken for a subtly different reason initially), a duplicate key error could still occur if the logic doesn’t perfectly cover all interleavings.
3. Triggers Modifying Data
Triggers (e.g., BEFORE INSERT
or BEFORE UPDATE
) on the table might modify data in a way that introduces a unique constraint violation not directly anticipated by the ON CONFLICT
clause itself. The trigger’s operations are part of the same transaction but could create a conflicting state.
4. Index Corruption (Rare)
Though uncommon, a corrupted unique index might lead to incorrect behavior, including spurious duplicate key errors or missed conflict detections. This can sometimes occur after OS-level issues, storage problems, or certain types of crashes, especially if collation behavior changes (e.g., glibc updates).
Best Practices and Solutions
Addressing these errors involves careful schema design, precise ON CONFLICT
targeting, and sometimes application-level adjustments.
1. Specify the Correct and All-Encompassing conflict_target
If your intention is to avoid a duplicate on a specific set of columns, ensure your conflict_target
precisely matches that unique constraint. See the INSERT
documentation for details on conflict_target
.
- Using column(s):
ON CONFLICT (column1, column2, ...)
- Using constraint name:
ON CONFLICT ON CONSTRAINT constraint_name
Using the constraint name can be more robust against column reordering or renaming if the constraint itself remains stable.
|
|
This strategy ensures that if a row with unique_email@example.com
already exists, its username
and last_updated
fields are updated.
2. Handling Multiple Unique Constraints
This is the most challenging scenario. There isn’t a single ON CONFLICT
clause that can simultaneously arbitrate on multiple distinct unique constraints in one go if the conflicting values could violate any of them independently.
- Sequential Upserts or Checks: Perform separate checks or upsert attempts if you need to guarantee uniqueness across multiple dimensions that aren’t part of a single composite unique key. This adds complexity and reduces atomicity.
- Application-Level Locking: For critical sections where multiple unique constraints must be respected atomically in a complex way, consider explicit locking (e.g.,
SELECT ... FOR UPDATE
on related data, or advisory locks), but this significantly impacts concurrency and complexity. - Schema Review: Re-evaluate if multiple unique constraints are truly necessary or if the entity can be modeled differently.
- Ensure
DO UPDATE
Harmonizes: If an update occurs, ensure theSET
clause does not inadvertently violate another unique constraint. For instance, ifON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email
, you are still relying on the database to throw an error ifEXCLUDED.email
violates the uniqueness of theemail
column. TheON CONFLICT
clause forid
won’t prevent this second error.
3. Careful Crafting of DO UPDATE WHERE
Clauses
Ensure the WHERE
clause in DO UPDATE SET ... WHERE condition
correctly identifies the row to update and doesn’t lead to scenarios where an update is skipped, causing a subsequent insert attempt to fail on a newly appeared concurrent row. The WHERE
clause applies to the existing row that caused the conflict.
|
|
If an inactive counter exists, this DO UPDATE
will be skipped. If a concurrent transaction then inserts an active counter with the same name before this transaction completes its alternative path (which might be to re-try an insert or fail), an error could occur.
4. Application-Level Retries
For transient issues, especially in highly concurrent systems or when deadlocks (though ON CONFLICT
reduces these) or other non-deterministic errors occur, implement a retry mechanism in your application logic. Use exponential backoff and a limited number of retries.
|
|
Note: The C# code is illustrative. Ensure actual implementation details, like connection management and specific command execution, are handled correctly.
5. Index Maintenance
If index corruption is suspected (e.g., after system issues or if behavior is inexplicable), rebuild the relevant indexes using the REINDEX
command:
|
|
For production systems, consider REINDEX CONCURRENTLY
where possible to minimize locking, though it has its own caveats and cannot be run inside a transaction block.
6. Review and Test Triggers
Thoroughly test the interaction of your triggers with concurrent ON CONFLICT
statements. Ensure triggers do not independently cause unique constraint violations that bypass the ON CONFLICT
logic.
Diagnostic Techniques
To effectively troubleshoot these errors:
- Examine the Full Error Message: The
DETAIL
part of the PostgreSQL error message is crucial. It usually indicates the name of the violated constraint and the conflicting key value.1 2
ERROR: duplicate key value violates unique constraint "user_profiles_email_key" DETAIL: Key (email)=(common@example.com) already exists.
- Verify Constraints and Indexes: List all unique constraints and associated indexes on the table. Use system catalogs like
pg_constraint
and views likepg_indexes
.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
-- List unique constraints for a table SELECT conname AS constraint_name, pg_get_constraintdef(oid) AS constraint_definition FROM pg_constraint WHERE contype = 'u' -- 'u' for unique constraint AND conrelid = 'user_profiles'::regclass; -- List indexes (unique indexes often back unique constraints) SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'user_profiles'; -- Check for invalid indexes (should return no rows) SELECT c.relname AS index_name FROM pg_index i JOIN pg_class c ON c.oid = i.indexrelid JOIN pg_class t ON t.oid = i.indrelid WHERE t.relname = 'user_profiles' AND NOT i.indisvalid;
- Simplify and Reproduce: Attempt to create a minimal, reproducible test case with simplified concurrent operations that reliably trigger the error. This helps isolate the exact interaction causing the problem.
- PostgreSQL Logging: Temporarily increase logging verbosity (
log_statement = 'all'
orlog_min_duration_statement = 0
inpostgresql.conf
or per-session) to capture the exact SQL statements being executed by concurrent transactions. Caution: This can generate very large logs and impact performance. - Monitor System Views: During periods when the error occurs, inspect
pg_locks
for lock contention andpg_stat_activity
for query states, including waiting queries.1 2 3 4 5 6 7
SELECT query, wait_event_type, wait_event, state FROM pg_stat_activity WHERE datname = 'your_database_name'; SELECT locktype, relation::regclass, mode, granted, pid FROM pg_locks WHERE relation = 'user_profiles'::regclass;
Common Pitfalls to Avoid
- Overlooking Secondary Unique Constraints: The most common pitfall is assuming
ON CONFLICT (primary_key_column)
will handle violations of all other unique constraints on the table. It won’t. - Ambiguous
conflict_target
: If column names inON CONFLICT (column_list)
could map to multiple unique indexes, PostgreSQL’s choice might not be what you expect. Naming the constraint (ON CONFLICT ON CONSTRAINT constraint_name
) is more explicit. - Intra-Statement Conflicts: If a single
INSERT
statement attempts to insert multiple rows that would conflict with each other on the arbiter key(s) (e.g.,INSERT INTO ... VALUES (1,'a'),(1,'b') ON CONFLICT (id) ...
), PostgreSQL will raise a “cardinality violation” error. TheON CONFLICT
clause cannot update the same row twice within the same statement. - Not Considering Trigger Side Effects: Forgetting that triggers can independently modify data leading to unique violations.
Conclusion
While PostgreSQL’s INSERT ... ON CONFLICT DO UPDATE
statement is a robust tool for atomic UPSERT operations, it’s crucial to understand its scope, particularly concerning its interaction with multiple unique constraints under concurrent workloads. The ERROR: duplicate key value violates unique constraint
often signals that a unique constraint other than the one specified as the arbiter has been violated due to a race condition with a concurrent transaction.
By carefully defining conflict targets, understanding the behavior with multiple constraints, judiciously using WHERE
clauses in the DO UPDATE
part, performing thorough diagnostics, and implementing application-level retries where appropriate, you can effectively troubleshoot and mitigate these concurrency-related errors, leading to more resilient and reliable PostgreSQL applications.