A few jobs back, I worked for a software firm. Our system had a complex system of “accounts” which were tied to client companies and users. And our users could have multiple different accounts through which data would get routed to “make widgets.” Without going into details, these accounts were simply like an individual’s savings account – they were fairly complex entities which were simply labelled as “accounts.”
My team had been wanting to clean up obsolete account data stored in our account manager database for a terribly long time. Aside from just raw storage, having obsolete accounts lingering in our system had numerous other consequences which made reporting and other things terribly irritating. But our business was always extremely afraid to ever delete anything – so we had account data for everyone we ever had, etc.
As a “pilot,” they agreed to let us finally delete several hundred internal-only accounts. These accounts were obsolete, only used for demo, and were not associated to any client companies, just our company. As the point person for making this change, I put together a complex T-SQL script to manually remove all of these accounts from the various tables of our database system. But what I also did was put together a backout/restore T-SQL script that restored all of those accounts. I did this out of paranoia and lessons learned from prior experiences. This way one could run my delete script & my restore script repeatedly. The QA team tested my scripts for 1-2 weeks and we were finally given the go-ahead to clean out the stale records.
Monday morning came along and as scheduled, our Production DBAs executed my script before start of business. 30 minutes later, the frantic calls started to reach us. Seems some of our clients could no longer “make widgets!” Accounts that they needed to route data were gone! My manager and I looked at one another in horror – we were only deleting internal accounts!!! We didn’t hesitate and immediately had our Prod DBAs back out the change with my backout script, before the rest of the United Stated started business. The backout was executed immediately and all was back to normal, but business folks were pissed and wanted to know what happened.
As we dug through things, we came to discover that some of our implementations folks had jury-rigged some of their implementations. To meet customer requests for unsupported features, they utilized our company’s internal demo accounts! WTF!!! We in development were furious. Their use of the internal accounts was not documented anywhere, but the data trail didn’t lie. And these clients weren’t in QA, which is why they never picked it up either!
Moral of the story? Have a robust restore plan. Sure, our systems were fully logged and we could have executed a point-in-time restore. But for a scenario like this, having an immutable data restore script, that was fully tested and ready to be executed, was far faster. It allowed our business to get back online almost instantaneously & saved my bacon.