Despatches from the Codeface: Don't rely on identity column IDs in SQL Server


[Part of an infrequent and sporadic series of blog posts - click here for the background]

Be very wary of exposing SQL Server auto-incremented identity column values to client software. On "Project X" we're relying on an identity column to generate Order IDs. These IDs are used as the table's primary key, and for linking to other tables - which is fine. But we also expose these IDs via the website user interface, not only via the query string, but also when generating Order IDs that get attached to orders and displayed prominently on the site. We also rely on identity values for User IDs, which are again leaked to the website's UI.

This is a problem, because SQL Server doesn't guarantee the sequential ordering of these values [1]. In some situations, SQL Server pre-allocates batches of 1,000 identity values. If the server is rebooted before the batch is fully used up, the old batch is discarded and a new batch is allocated. So Order IDs can jump from 252 to 1,253 overnight (and have done so), despite only a single order being created. Identity values also get "used up" when data is inserted into the table, and the transaction is subsequently rolled back.

To the best of my knowledge this only affects SQL Server 2012+. It's not consistent across all our sites and servers. There is info online that suggests it only affects tables with less than 1,000 rows, so it may not be a long-term issue for a rapidly growing database. It can also be avoided by starting SQL Server with certain parameters (specifically, "-T272"), but this won't solve the "rolled back transaction" situation.

Regardless, we should avoid relying on identity values for meaningful information, and certainly shouldn't be surfacing them to user interfaces (at least for use cases where gaps in IDs would be problematic). They should ideally be used only for providing primary key columns, and linking relationships between tables. When we need to surface an ID to the front-end, we should be generating an ID programatically, i.e. adding a separate public ID column which we increment manually.

[1] "...SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert..."