Steve Jones has asked some folks in the community to talk about what it’s like to work in our field across four posts. So how do I talk about my job in four posts? If you didn’t read part 1, you can click here to read about what the day-to-day is like.
My four-post series is structured as:
- Day to day: what is normal? What are 95% of my experiences?
- Best days ever: these are the highlights.
- Worst days ever: these don’t happen all of the time, and they’re what nightmares are made of.
- Weird days: Brent Ozar said it best when he said Database Administration is weird.
The worst days ever for a DBA.
Some of the worst days you can have are the ones spent firefighting. Your projects don’t get worked on, so deadlines approach faster, and you don’t get the chance to be proactive because you’re busy fixing the problem. What really makes those days bad, though, is when policy and procedure are what allowed it to happen.
Disaster recovery practice, a proper high availability setup, and properly spent resources are what keep you afloat when you spring a leak. I often see companies where IT is treated as an afterthought and a sunk cost instead of a valuable resource. You’ll encounter varying degrees of this, and those are the areas where you’ll likely need to change some minds.
DBAs should have a solid practice environment for disaster recovery. If you aren’t prepared and something big hits, nothing is worse than trying to google a solution to a problem that is actively bleeding the company. Being prepared means your backups have been tested, you know how to recover to a point in time within minutes when necessary, the drives are already mapped and anticipated, and you know where the files need to be restored to. (DBATools is a fantastic helper for disaster recovery and for practicing, by the way!)
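To make that concrete, here’s a minimal sketch of a point-in-time restore. The database name, backup paths, and drive letters are placeholders I made up for illustration; the real win is having your own rehearsed version of this long before you need it.

```sql
-- Placeholder names and paths: swap in your own database, backup files, and drives.
-- Restore the last full backup without recovering, moving files to the drives you planned for.
RESTORE DATABASE SalesDB
    FROM DISK = N'Y:\Backups\SalesDB_Full.bak'
    WITH MOVE N'SalesDB'     TO N'D:\Data\SalesDB.mdf',
         MOVE N'SalesDB_log' TO N'L:\Logs\SalesDB_log.ldf',
         NORECOVERY, REPLACE;

-- Roll the transaction log forward and stop just before the bad statement hit.
RESTORE LOG SalesDB
    FROM DISK = N'Y:\Backups\SalesDB_Log.trn'
    WITH STOPAT = N'2019-06-01 14:35:00',
         RECOVERY;
```

If you practice this regularly, the STOPAT time is about the only thing you’re figuring out under pressure.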
Likewise, your availability solution should be thoroughly vetted and tested before deploying to production. Ideally, you’ll have several lower environments configured identically. When you patch or push new changes, you’ll do so through the lower environments first and let them “bake in”. After a testing period and proper QA, you can push the changes up the chain until they eventually land in production. Proper monitoring tools that check the health of the hardware and the software should be in place to alert you before things begin to go wrong.
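As one example of the kind of check worth wiring into that monitoring, here’s a small query, assuming you’re running Availability Groups (other HA options have their own health views), that flags any replica whose synchronization isn’t healthy:

```sql
-- Flag any Availability Group replica that isn't in a healthy synchronization state.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ars.role_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups ag
JOIN sys.availability_replicas ar
    ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states ars
    ON ars.replica_id = ar.replica_id
WHERE ars.synchronization_health_desc <> 'HEALTHY';
```

Scheduled through an agent job or your monitoring tool of choice, a non-empty result becomes an alert long before users notice anything.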
Along with the hardware, you need to make sure you’re adding resources when necessary. The most common one here is disk space, simply because our data will grow and grow. This is where a lot of employers have trouble allocating more money: they want to keep that historical data available, but they don’t want to spend any more on holding it as your MDF keeps growing. I find this is where you’re both going to have to compromise. I recommend looking into technology or practices that let you keep historical data on slower, cheaper disk, or archive it off with the ability to restore it later if necessary. You’ll need to help them weigh the pros, cons, and costs of each option in terms of the time and resources involved. (Of course, other tools that help with compression and backups may be another idea!)
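Before that conversation, it helps to know where the space is actually going and to have a cheap compromise in your back pocket. A rough sketch, with dbo.OrderHistory standing in for whatever large, rarely updated table you actually have:

```sql
-- See which database files are eating the space (size is stored in 8 KB pages).
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_file,
       type_desc,
       size * 8 / 1024      AS size_mb,
       physical_name
FROM sys.master_files
ORDER BY size DESC;

-- One possible compromise: page compression on a large, rarely updated history table.
-- dbo.OrderHistory is a placeholder; test the CPU trade-off in a lower environment first.
ALTER TABLE dbo.OrderHistory
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```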
Assuming you’re taking care of those three areas, you won’t have many bad days. I try to be the quiet DBA, not the hero DBA. The hero DBA is the one who is frequently running in to save the ship that’s on fire. The heroics look great, but if your DBA isn’t doing that frequently, it means they’re working proactively to keep things stable.
Not all disaster comes in the form of a crazy outage. Sometimes someone just runs a statement without a WHERE clause. (DELETE FROM TABLE… whoops.)
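One defensive habit that saves you from a lot of these: wrap the delete in an explicit transaction and eyeball the row count before you commit. A sketch, with dbo.Orders and the date predicate as made-up placeholders:

```sql
BEGIN TRANSACTION;

-- Placeholder table and predicate; the point is the pattern, not the query.
DELETE FROM dbo.Orders
WHERE OrderDate < '2015-01-01';

-- @@ROWCOUNT here still reflects the DELETE; if it isn't roughly what you expected, roll back.
SELECT @@ROWCOUNT AS rows_deleted;

-- COMMIT TRANSACTION;   -- only once the count looks right
-- ROLLBACK TRANSACTION; -- your escape hatch
```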
I’m proud to say that I’ve never caused a show-stopping outage or a problem that we couldn’t recover from quickly. Everyone’s going to have a moment or two like that, though; in IT, that’s just the way it goes. Even if you are perfect, there’s someone out there who isn’t going to be. If you are properly prepared, this isn’t too bad. If you aren’t properly prepared, you’ll be sweating bullets and getting that feeling deep in the pit of your stomach, the one that feels like you got kicked real hard. As you work the issue through, you’ll feel like your hair is greying in front of your eyes from the stress. So when you can confidently resolve those problems in minutes, that’s going to save you a ton of pain and hair loss. (Your employer will appreciate how quickly you resolved the issue too!)
These kinds of issues or days are pretty rare. I’ve seen code and hardware failures, worked weekends and nights, and missed events and outings with family and friends. That’s just part of the job sometimes, but it is definitely not commonplace. (If it is, you may want to do everything you can to either improve the policy and turn that ship around, or find a new ship.) It is something you have to expect, though, which is why policy should drive best practices and proper procedure, so these situations can be dealt with quickly and with minimal to no impact to the company.
And just for fun, the three worst times of my IT career:
- 36-hour shift rebuilding Microsoft Exchange servers with Microsoft on the phone.
- Two 100-hour work weeks to fix every computer on the floor. (It was nice that we had showers at work and blankets/pillows.)
- That time a DELETE statement was run but I was able to fix the fallout after a stressful week of identifying corrupted backup processes and piecing it together from historical tables.