Thank you Wayne for hosting this month’s T-SQL Tuesday! Last year, Wayne shared a story about having to reverse engineer a complex hierarchy pattern within a table. The developers appeared to have done this to obfuscate how the software operated. It left him asking, “What were you thinking?” So this month, we are discussing stories that left us asking that same question.
During my introduction to a new company, we ran through the processes, duties, systems, and everything else you would expect when starting a new job. Overall, it was a standard induction.
They were describing one of their intake processes to me. Every week, we would receive about 5,000 records. The process was essentially: bring in all the records from our supplier that are less than 7 days old. This ran twice a day, every day, so each record stayed in the window for 7 days.
It ran this way because, once the system went live, they realized there were tons of deadlocks. Each execution would run for 3-5 hours, depending on the day. The idea was that since we processed the same record 14 times, surely it would eventually be updated, even with so many deadlocks occurring.
To me, this was the definition of risky business. You couldn’t be sure the correct records were consistently updated. The good news was that the records we received were always distinct and always the most current version. If I asked for the last 7 days of records, I would never get more than one row per record, so ranking or ordering the results before updating was not a concern.
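To make the pattern concrete, here is a minimal sketch of my reading of that per-record processing. Every object name here (dbo.PrimaryRecords, usp_UpsertSupplierRecord, RecordId, Payload, ReceivedDate) is hypothetical; the real schema was never published. The point is simply that every record in the 7-day window was pushed through the same row-by-row insert-or-update, twice a day, every day.

```sql
-- Hypothetical primary table; the real schema is unknown.
CREATE TABLE dbo.PrimaryRecords
(
    RecordId     INT           NOT NULL PRIMARY KEY,
    Payload      NVARCHAR(MAX) NULL,
    ReceivedDate DATETIME2     NOT NULL
);
GO

-- Hypothetical per-record procedure, called once for every record
-- in the 7-day window, twice a day.
CREATE PROCEDURE dbo.usp_UpsertSupplierRecord
    @RecordId     INT,
    @Payload      NVARCHAR(MAX),
    @ReceivedDate DATETIME2
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT 1 FROM dbo.PrimaryRecords WHERE RecordId = @RecordId)
        UPDATE dbo.PrimaryRecords
        SET    Payload      = @Payload,
               ReceivedDate = @ReceivedDate
        WHERE  RecordId = @RecordId;
    ELSE
        INSERT dbo.PrimaryRecords (RecordId, Payload, ReceivedDate)
        VALUES (@RecordId, @Payload, @ReceivedDate);
END;
GO
```

With roughly 5,000 records arriving each week and each one being replayed 14 times, that is a lot of small, competing writes against the same table.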
If you were like me, your first reaction would have been shock and awe. You may have even thrown a few expletives in the mix. How could a business allow this kind of data ingestion for 15+ years when it was a critical pipeline for their data?
Is this a technical problem? We can solve it with technology, but the solutions existed before the problem did. So why did this happen, and what should the course of action be to resolve it?
The first point I want to emphasize concerns blame and negativity. The topic at hand can lead to a negative approach, but it can also lead to a productive outcome. I personally think environments of blame, finger-pointing, and overall negativity are not great places to work. It is fine to identify the problem and understand how it occurred, and then begin putting processes or technology in place to keep it from happening again.
Saying the system is inefficient and broken because X person is dumb, or because some jerk designed something poorly, is not constructive; it burns bridges and creates conflict. You are no longer a team designed to work together. You have set up an environment in which backstabbing, flying under the radar, and shifting blame are the rewarded behaviors. Thankfully, the business having problems with deadlocks was not an environment of blame. They were able to look past the “whodunit and let’s punish them” and toward “this is our current state, how do we reasonably achieve the ideal future state?”
The last point I want to make is that everyone who worked on these systems was a tenured expert in them. They had been there since the dawn of the business. Turnover and hiring were not common events. That means you have people with a limited view of how other companies or industries solve challenges. People were too close to the problem to see around it, and may never have seen alternative solutions.
There is a book called “The Alliance” that talks about having individuals take on tours of duty, and it frames the relationship between employers and employees as more of a partnership. The idea behind tours of duty is that as people become experts in one silo of the business, they switch to a new role or project. This keeps people employed longer and builds their skillsets. The business benefits too, because it is internally hiring good workers with diverse experience, which surfaces solutions that had not been introduced before.
If you want to grow your career, you will need a variety of experiences. If my own experience had been less varied, I doubt I would have seen the solution to the deadlocks as quickly.
This matters because many people believe in loyalty to the company until the day they die, while others start a new job every 3 months. I think the answer lies somewhere in the middle, and no one can say what the best approach is for each person. We all have our own factors, like bills, families, geographic restrictions, and more. There are many things that dictate how long we stay at a company and why we work in a particular job.
If you are not constantly learning and gaining varied experiences, and your employer won’t help you with that education or experience, then they are doing a disservice to both you and themselves. Again, there are other factors here, so weigh them carefully.
Likewise, if you are an employer and you see someone has worked at 4-6 places in the last 10 years, don’t throw their resume away. If the potential employee has a record of not jumping ship before delivering value, I would consider them. If a person at the company is looking to expand their skillsets, I would also look to see where else they could add value and benefit themselves before bidding them adieu.
In the end, I would say the true solution to the problem was bringing in new people with new ideas. But what was the technical solution? I’ve created two diagrams: one depicting the current state, when the deadlocks occurred, and one depicting the future state that incorporated the fix.
Current State
Future State
We were unable to alter the application, which meant the trigger and the stored procedure it called had to stay. What we did was create a new stored procedure and a staging table. The procedure referenced by the application was essentially repurposed: we gutted it to become a straight insert into the staging table.
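Here is roughly what the gutted procedure looked like, reusing the same hypothetical names as the sketch above. The application keeps calling it exactly as before, but the only work left inside is a single insert into a trigger-free staging table, so there is nothing left for the per-record calls to deadlock on.

```sql
-- Hypothetical staging table; no trigger is defined on it.
CREATE TABLE dbo.SupplierStaging
(
    RecordId     INT           NOT NULL,
    Payload      NVARCHAR(MAX) NULL,
    ReceivedDate DATETIME2     NOT NULL
);
GO

-- The gutted procedure: same name and parameters the application already
-- references, but now it only lands the row in staging.
ALTER PROCEDURE dbo.usp_UpsertSupplierRecord
    @RecordId     INT,
    @Payload      NVARCHAR(MAX),
    @ReceivedDate DATETIME2
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.SupplierStaging (RecordId, Payload, ReceivedDate)
    VALUES (@RecordId, @Payload, @ReceivedDate);
END;
GO
```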
The SQL Agent job was modified to add one more step, which executed the new stored procedure. That procedure took over the old stored procedure’s logic of inserting or updating records in the primary table. It was executed once, after the staging table was filled, and it fired the trigger once.
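And here is a sketch of what that new procedure could look like, still using the hypothetical names from above. A MERGE is just one way to express the set-based insert-or-update; the important part is that the primary table is touched by a single statement per load, so the trigger fires per statement rather than per record.

```sql
-- Hypothetical procedure run as the final SQL Agent job step.
CREATE PROCEDURE dbo.usp_ProcessSupplierStaging
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based statement moves everything from staging into the
    -- primary table, instead of thousands of individual calls.
    MERGE dbo.PrimaryRecords AS tgt
    USING dbo.SupplierStaging AS src
          ON tgt.RecordId = src.RecordId
    WHEN MATCHED THEN
        UPDATE SET Payload      = src.Payload,
                   ReceivedDate = src.ReceivedDate
    WHEN NOT MATCHED THEN
        INSERT (RecordId, Payload, ReceivedDate)
        VALUES (src.RecordId, src.Payload, src.ReceivedDate);

    -- Assumption: staging is cleared after each load; the post does not
    -- say how this was handled.
    TRUNCATE TABLE dbo.SupplierStaging;
END;
GO
```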
After the change, a similar number of transactions still occurred. The key fix was that the trigger now executed only once, which meant we were no longer seeing table contention and deadlocks. Eliminating the deadlocks improved performance significantly and markedly improved the quality of the data.
All it took was another set of eyes with a different bundle of experiences. In a similar vein, I’d recommend jumping over to Eugene’s blog to read his latest T-SQL Tuesday post about diversity. He dives significantly deeper and provides an important discussion for today’s work landscape.