
Troubleshooting PostgreSQL ERROR 23505: Duplicate Key with Concurrent `INSERT ON CONFLICT`

Published by The adllm Team. Tags: PostgreSQL SQL Database Concurrency UPSERT Unique Constraint Troubleshooting Error 23505

PostgreSQL’s INSERT ... ON CONFLICT DO UPDATE statement (commonly known as UPSERT), introduced in version 9.5, provides a powerful, atomic way to insert a new row or update an existing one when a conflict, typically a unique constraint violation, occurs. While this feature greatly simplifies “insert-or-update” logic and is designed to be concurrency-safe for its specified conflict target, developers can still encounter the dreaded ERROR: duplicate key value violates unique constraint (SQLSTATE 23505, unique_violation), especially under heavy concurrent load.

This article explores the common reasons behind this apparent contradiction, focusing on how concurrent transactions interact with multiple unique constraints and other subtleties of the UPSERT mechanism. We will delve into diagnostic techniques and provide robust solutions to help you build more resilient data ingestion pipelines.

Understanding the “Duplicate Key” Error with ON CONFLICT

The error message ERROR: duplicate key value violates unique constraint "constraint_name" clearly indicates an attempt to insert or update a row in a way that would create a duplicate value for a column (or set of columns) governed by a unique constraint.

The core of the confusion often arises because INSERT ... ON CONFLICT is specifically designed to handle such conflicts. However, its conflict resolution (the DO UPDATE or DO NOTHING part) is tied to a single arbiter constraint (or the unique index backing it). If a concurrent operation leads to a violation of a different unique constraint on the same table, the error can still surface.

Core Reasons for Error 23505 During Concurrent Upserts

While ON CONFLICT is atomic for its chosen conflict target, several scenarios can lead to unexpected duplicate key errors under concurrency.

1. Multiple Unique Constraints: The Primary Culprit

This is the most frequent reason. An INSERT ... ON CONFLICT statement specifies a conflict_target (either a list of columns that have a unique index or the name of a unique constraint). This target becomes the “arbiter” for deciding if a conflict exists and what action to take.

If a table has other unique constraints not designated as the arbiter for that specific UPSERT operation, a race condition can occur:

  1. Transaction A (TxA) starts an INSERT ... ON CONFLICT (arbiter_column) DO UPDATE .... It performs a speculative check for arbiter_column and finds no conflict.
  2. Transaction B (TxB) starts a similar INSERT. It also finds no conflict on arbiter_column.
  3. TxB successfully inserts its row and commits. This row satisfies arbiter_column but also sets a value for another_unique_column.
  4. TxA now proceeds to actually insert its row (or update based on arbiter_column). If the values TxA intends to write for another_unique_column now conflict with what TxB just committed, PostgreSQL will raise the duplicate key value violates unique constraint error for another_unique_column. The ON CONFLICT clause for arbiter_column does not handle violations of other constraints.

Illustrative Scenario with Code:

Consider a table user_profiles with a primary key id and another unique constraint on email:

CREATE TABLE user_profiles (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE,
    username VARCHAR(50) NOT NULL,
    last_updated TIMESTAMPTZ DEFAULT NOW()
);

-- Optional: Forcing the name of the unique constraint for clarity
-- This constraint is often automatically named, e.g., user_profiles_email_key
-- ALTER TABLE user_profiles
-- ADD CONSTRAINT uq_user_profiles_email UNIQUE (email);

Now, imagine two concurrent transactions. Transaction A attempts an UPSERT targeting id = 1, intending to set email = 'common@example.com'. Transaction B inserts a new row with id = 2 but also sets email = 'common@example.com' and commits first.

-- Transaction A (assume id=1 might or might not exist)
INSERT INTO user_profiles (id, email, username)
VALUES (1, 'common@example.com', 'user_a_new_name')
ON CONFLICT (id) DO UPDATE -- Arbiter is the primary key 'id'
SET
    email = EXCLUDED.email,
    username = EXCLUDED.username,
    last_updated = NOW();

-- Transaction B (inserts a new row, id=2)
-- This transaction commits first.
INSERT INTO user_profiles (id, email, username)
VALUES (2, 'common@example.com', 'user_b_name')
ON CONFLICT (id) DO UPDATE -- Arbiter is 'id'
SET
    email = EXCLUDED.email,
    username = EXCLUDED.username,
    last_updated = NOW();

If Transaction B inserts (2, 'common@example.com', 'user_b_name') and commits, the email 'common@example.com' is now present. When Transaction A proceeds (whether it’s inserting a new row with id=1 or updating an existing one), its ON CONFLICT (id) logic applies. However, the attempt to set email = 'common@example.com' will violate the separate unique constraint on the email column (e.g., user_profiles_email_key), and this violation is not handled by the ON CONFLICT (id) clause. The PostgreSQL documentation notes that internal mechanisms suppress errors only on the arbiter indexes.

2. Complex WHERE Clauses in DO UPDATE

If the DO UPDATE SET ... WHERE condition clause is used, the condition is evaluated after the conflict is detected. If the condition on the existing row (the one that caused the conflict) evaluates to false, then no update occurs, and the statement effectively behaves like DO NOTHING for that conflicting row.

A race can still arise around the skipped-update path: when the WHERE condition evaluates to false, the statement affects zero rows, and any fallback logic in the application (such as a follow-up insert) can collide with a row committed by a concurrent transaction in the meantime, raising a duplicate key error. Any logic built on a conditional DO UPDATE must therefore tolerate rows appearing between statements.

3. Triggers Modifying Data

Triggers (e.g., BEFORE INSERT or BEFORE UPDATE) on the table might modify data in a way that introduces a unique constraint violation not directly anticipated by the ON CONFLICT clause itself. The trigger’s operations are part of the same transaction but could create a conflicting state.
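As a hedged sketch of how this can happen, consider a trigger that lowercases emails before writing (the function and trigger names here are illustrative, not from the original schema). Two concurrent inserts carrying 'User@Example.com' and 'user@example.com' pass any application-level uniqueness check, but both normalize to the same value, and an ON CONFLICT (id) clause will not arbitrate the resulting collision on the email constraint:

```sql
-- Illustrative trigger: normalizes emails on the way in.
CREATE OR REPLACE FUNCTION normalize_email() RETURNS trigger AS $$
BEGIN
    NEW.email := lower(NEW.email);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_normalize_email
BEFORE INSERT OR UPDATE ON user_profiles
FOR EACH ROW EXECUTE FUNCTION normalize_email();
```

Because the trigger fires before the row is checked against indexes, the values that actually hit the unique constraints may differ from what the application believes it is inserting.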

4. Index Corruption (Rare)

Though uncommon, a corrupted unique index might lead to incorrect behavior, including spurious duplicate key errors or missed conflict detections. This can sometimes occur after OS-level issues, storage problems, or certain types of crashes, especially if collation behavior changes (e.g., glibc updates).

Best Practices and Solutions

Addressing these errors involves careful schema design, precise ON CONFLICT targeting, and sometimes application-level adjustments.

1. Specify the Correct and All-Encompassing conflict_target

If your intention is to avoid a duplicate on a specific set of columns, ensure your conflict_target precisely matches that unique constraint. See the INSERT documentation for details on conflict_target.

  • Using column(s): ON CONFLICT (column1, column2, ...)
  • Using constraint name: ON CONFLICT ON CONSTRAINT constraint_name

Using the constraint name can be more robust against column reordering or renaming if the constraint itself remains stable.

-- Assume user_profiles_email_key is the name of the unique constraint on email
INSERT INTO user_profiles (id, email, username)
VALUES (3, 'unique_email@example.com', 'user_c')
ON CONFLICT ON CONSTRAINT user_profiles_email_key DO UPDATE
SET
    username = EXCLUDED.username,
    last_updated = NOW()
WHERE user_profiles.email = EXCLUDED.email; -- Redundant guard: the conflicting row already matches on email

This strategy ensures that if a row with unique_email@example.com already exists, its username and last_updated fields are updated.

2. Handling Multiple Unique Constraints

This is the most challenging scenario. No single ON CONFLICT clause can arbitrate on multiple distinct unique constraints at once: the arbiter is always one unique index (or one composite constraint), so a value that independently violates a different constraint still raises an error.

  • Sequential Upserts or Checks: Perform separate checks or upsert attempts if you need to guarantee uniqueness across multiple dimensions that aren’t part of a single composite unique key. This adds complexity and reduces atomicity.
  • Application-Level Locking: For critical sections where multiple unique constraints must be respected atomically in a complex way, consider explicit locking (e.g., SELECT ... FOR UPDATE on related data, or advisory locks), but this significantly impacts concurrency and complexity.
  • Schema Review: Re-evaluate if multiple unique constraints are truly necessary or if the entity can be modeled differently.
  • Ensure DO UPDATE Harmonizes: If an update occurs, ensure the SET clause does not inadvertently violate another unique constraint. For instance, if ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email, you are still relying on the database to throw an error if EXCLUDED.email violates the uniqueness of the email column. The ON CONFLICT clause for id won’t prevent this second error.
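Where application-level locking is the chosen route, a transaction-scoped advisory lock keyed on the contended value can serialize all writers that might collide on a secondary constraint. A minimal sketch, assuming the user_profiles table from above; the hashtext-based key is an illustrative convention, not a requirement:

```sql
BEGIN;
-- Serialize every writer touching this email value. hashtext() maps
-- the string to a lock key; pg_advisory_xact_lock blocks until the
-- lock is free and releases it automatically at commit or rollback.
SELECT pg_advisory_xact_lock(hashtext('common@example.com'));

INSERT INTO user_profiles (id, email, username)
VALUES (1, 'common@example.com', 'user_a_new_name')
ON CONFLICT (id) DO UPDATE
SET email = EXCLUDED.email,
    username = EXCLUDED.username,
    last_updated = NOW();
COMMIT;
```

Note that advisory locks only protect writers that all follow the same convention; a session that skips the lock can still trigger the race.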

3. Careful Crafting of DO UPDATE WHERE Clauses

Ensure the WHERE clause in DO UPDATE SET ... WHERE condition correctly identifies the row to update and doesn’t lead to scenarios where an update is skipped, causing a subsequent insert attempt to fail on a newly appeared concurrent row. The WHERE clause applies to the existing row that caused the conflict.

INSERT INTO some_counters (name, value)
VALUES ('page_views', 1)
ON CONFLICT (name) DO UPDATE
SET value = some_counters.value + EXCLUDED.value
WHERE some_counters.is_active = true; -- Only update if active

If an inactive counter with that name exists, the DO UPDATE is skipped and the statement affects no rows. If the application then takes an alternative path (such as retrying with a plain insert), a concurrent transaction that inserts an active counter with the same name in the meantime can cause a duplicate key error.

4. Application-Level Retries

For transient issues, especially in highly concurrent systems or when deadlocks (though ON CONFLICT reduces these) or other non-deterministic errors occur, implement a retry mechanism in your application logic. Use exponential backoff and a limited number of retries.

// Conceptual C# example for retrying a database operation
public async Task UpsertUserDataAsync(UserData data, int maxRetries = 3)
{
    int attempts = 0;
    while (true)
    {
        try
        {
            // ... (construct and execute your INSERT ... ON CONFLICT command)
            // await ExecuteUpsertCommandAsync(data);
            return; // Success
        }
        catch (PostgresException ex) when (ex.SqlState == "23505") // Duplicate key
        {
            attempts++;
            if (attempts >= maxRetries) throw; // Rethrow after max retries

            // Log the attempt and wait before retrying
            Console.WriteLine(
                $"Attempt {attempts} failed: {ex.Message}. Retrying..."
            );
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempts)));
        }
        // Catch other relevant exceptions if necessary
    }
}

Note: The C# code is illustrative. Ensure actual implementation details, like connection management and specific command execution, are handled correctly.

5. Index Maintenance

If index corruption is suspected (e.g., after system issues or if behavior is inexplicable), rebuild the relevant indexes using the REINDEX command:

-- Rebuild a specific unique index
REINDEX INDEX user_profiles_email_key;

-- Or rebuild all indexes on the table (requires exclusive lock)
REINDEX TABLE user_profiles;

For production systems, consider REINDEX CONCURRENTLY where possible to minimize locking, though it has its own caveats and cannot be run inside a transaction block.
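For example (REINDEX ... CONCURRENTLY requires PostgreSQL 12 or later and, as noted, cannot run inside a transaction block):

```sql
-- Rebuild the unique index without holding a long exclusive lock.
-- Must be issued outside BEGIN ... COMMIT.
REINDEX INDEX CONCURRENTLY user_profiles_email_key;
```

If a concurrent rebuild fails partway, it can leave an invalid index behind; the catalog query for invalid indexes shown in the Diagnostic Techniques section below will surface it.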

6. Review and Test Triggers

Thoroughly test the interaction of your triggers with concurrent ON CONFLICT statements. Ensure triggers do not independently cause unique constraint violations that bypass the ON CONFLICT logic.

Diagnostic Techniques

To effectively troubleshoot these errors:

  1. Examine the Full Error Message: The DETAIL part of the PostgreSQL error message is crucial. It usually indicates the name of the violated constraint and the conflicting key value.
    
    ERROR:  duplicate key value violates unique constraint "user_profiles_email_key"
    DETAIL: Key (email)=(common@example.com) already exists.
    
  2. Verify Constraints and Indexes: List all unique constraints and associated indexes on the table. Use system catalogs like pg_constraint and views like pg_indexes.
    
    -- List unique constraints for a table
    SELECT conname AS constraint_name,
           pg_get_constraintdef(oid) AS constraint_definition
    FROM pg_constraint
    WHERE contype = 'u' -- 'u' for unique constraint
      AND conrelid = 'user_profiles'::regclass;
    
    -- List indexes (unique indexes often back unique constraints)
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'user_profiles';
    
    -- Check for invalid indexes (should return no rows)
    SELECT c.relname AS index_name
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_class t ON t.oid = i.indrelid
    WHERE t.relname = 'user_profiles' AND NOT i.indisvalid;
    
  3. Simplify and Reproduce: Attempt to create a minimal, reproducible test case with simplified concurrent operations that reliably trigger the error. This helps isolate the exact interaction causing the problem.
  4. PostgreSQL Logging: Temporarily increase logging verbosity (log_statement = 'all' or log_min_duration_statement = 0 in postgresql.conf or per-session) to capture the exact SQL statements being executed by concurrent transactions. Caution: This can generate very large logs and impact performance.
  5. Monitor System Views: During periods when the error occurs, inspect pg_locks for lock contention and pg_stat_activity for query states, including waiting queries.
    
    SELECT query, wait_event_type, wait_event, state
    FROM pg_stat_activity
    WHERE datname = 'your_database_name';
    
    SELECT locktype, relation::regclass, mode, granted, pid
    FROM pg_locks
    WHERE relation = 'user_profiles'::regclass;
    

Common Pitfalls to Avoid

  • Overlooking Secondary Unique Constraints: The most common pitfall is assuming ON CONFLICT (primary_key_column) will handle violations of all other unique constraints on the table. It won’t.
  • Ambiguous conflict_target: If column names in ON CONFLICT (column_list) could map to multiple unique indexes, PostgreSQL’s choice might not be what you expect. Naming the constraint (ON CONFLICT ON CONSTRAINT constraint_name) is more explicit.
  • Intra-Statement Conflicts: If a single INSERT statement attempts to insert multiple rows that would conflict with each other on the arbiter key(s) (e.g., INSERT INTO ... VALUES (1,'a'),(1,'b') ON CONFLICT (id) ...), PostgreSQL will raise a “cardinality violation” error. The ON CONFLICT clause cannot update the same row twice within the same statement.
  • Not Considering Trigger Side Effects: Forgetting that triggers can independently modify data leading to unique violations.
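For the intra-statement case in particular, deduplicating the incoming batch on the arbiter key before the upsert avoids the cardinality violation. A sketch using DISTINCT ON (the sample values are illustrative; add a tiebreaker to the ORDER BY if it matters which duplicate wins):

```sql
-- Keep one row per id from the incoming batch, then upsert.
INSERT INTO user_profiles (id, email, username)
SELECT DISTINCT ON (id) id, email, username
FROM (VALUES
    (1, 'a@example.com', 'first'),
    (1, 'b@example.com', 'second')
) AS incoming(id, email, username)
ORDER BY id
ON CONFLICT (id) DO UPDATE
SET email = EXCLUDED.email,
    username = EXCLUDED.username;
```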

Conclusion

While PostgreSQL’s INSERT ... ON CONFLICT DO UPDATE statement is a robust tool for atomic UPSERT operations, it’s crucial to understand its scope, particularly concerning its interaction with multiple unique constraints under concurrent workloads. The ERROR: duplicate key value violates unique constraint often signals that a unique constraint other than the one specified as the arbiter has been violated due to a race condition with a concurrent transaction.

By carefully defining conflict targets, understanding the behavior with multiple constraints, judiciously using WHERE clauses in the DO UPDATE part, performing thorough diagnostics, and implementing application-level retries where appropriate, you can effectively troubleshoot and mitigate these concurrency-related errors, leading to more resilient and reliable PostgreSQL applications.