Measure Twice, Cut Once…
I still remember learning early on my career, the mantra of “Measure Twice, Cut Once.” It’s a important lesson that I’ve had to be re-taught periodically over my career, sometimes in a rather stressful or painful manner. But in the unforgiving realm of code, that is quite literal if you think about it, accuracy is paramount. We should always thoroughly test our code before it gets to Production.
No One Develops in Production, Right? RIGHT?
On the hand, I’ve also had more than a handful of occasions where Production is broken in a BAD WAY, and you have to FIX IT NOW. I remember when I worked in the financial commercial credit card industry, where our post-deployment QA smoke testing would take anywhere from 8-12 hours, and that’s AFTER a minimum 4 hour deployment. We rolled out a big release on Friday night, QA started their work in the wee hours of the morning Saturday and we found a nasty breaking bug that only manifested with Production data in play. Our app devs were trying to figure out how they could fix the code, but even the effort of committing, integrating, and cutting a new build was a multi-hour affair. In the meantime, we database developers were trying to figure out if there was a viable SQL Server/stored procedure workaround that could be implemented to allow us to not have to rollback everything.
I was the db dev lead on this particular release and my gut told me that there had to be a workaround – I just had to iron it out. I requested 90 minutes of “do not bother me NO MATTER WHAT,” focused, and 90 minutes (and one interruption) later, I had successfully coded a workaround fix. But I also committed a cardinal sin – I developed in Production.
Was that Production Ready? Well, QA did spend a few hours testing it and validated it. But was it subject to the full battery of integration and performance tests that our application would typically go through? No… but we had little choice and in this case, it worked. Funny thing is that workaround fix, like many thing things that were never meant to be permanent, remained permanent.
Is There a Lesson to be Learned Here?
If I had to share one key takeaway, is that I believe it is critical that every business have a true “Prod + 1” environment. This means having a second FULL COPY of Production that’s refreshed on a regular basis, into which “next builds” are installed for testing. Unfortunately, this is not an easy or inexpensive task. Fortunately, there’s many more solutions available these days (like a certain orange organization I happen to know) that make it far more feasible too.
Thanks for reading.