Wednesday, March 31, 2010

And I shall call it "Foreign Key Cascade Delete"

I know it sounds kind of stupid trying to implement cascade delete in T-SQL, when all you have to do is to activate the "cascade delete" option on each foreign key within the association graph. Similarly stupid as implementing referential integrity by hand.

But imagine a case where you generally don't want cascade delete in normal production mode (because no user should be able to delete a parent row which has any child rows still referencing it). And unfortunately there are those rare occasions, when just for the moment, for one single parent row, cascade delete would make sense again, because you really have to get rid of it, only exceptionally.

Of course one could simply activate cascade deletes for this single transaction. But I don't like to change database schemas in production.

The guys at sqlteam.com came up with a different solution: A stored procedure, invoked recursively, traversing over all foreign key relations and deleting child rows bottom-up.

Sounded nice, but did not work in our case. The approach was based on nested subqueries, and from about 8 nested queries on, SqlServer refused to even calculate a query execution plan (I do have sympathy for SqlServer here, I wouldn't enjoy building an execution plan on queries like this either. And even with a plan, the performance would most likely be extremely poor). Plus the script could not handle foreign key relation cycles (I mean cycles on a table-basis, not on a row-basis; the latter can't be solved anyway), a fact that caused a stored procedure "stack overflow" (the limit is 32 recursive calls) on our database.

I then started to adapt the sqlteam.com script in order to make it work, and was about halfway done, when I noticed that commentator GreySky had already provided a solution which was following the same approach I had planned for. He solved the execution plan / performance issue by storing parent primary key values in a temptable, together with information about the current call level, so that the delete criteria would be ONE simple IN-subselect instead of many nested subselects.

So GreySky's script worked fine, with the exception that it still did not handle cycles in foreign key relations. But at this point, with a temptable holding all parent primary keys at hand, this was a simple task. I only had to add an extra column storing the table name that the primary keys belonged to. Thus it was possible to extend the statement's where-criteria to "visit" only those child rows, that had not been marked for deletion before. (Nearly) like Dijkstra's algorithm, you know ;-) Works like a charm!