What is the average day of a DBA? (Part 4 of 4)

Steve Jones has asked for some folks in the community to talk about what it’s like to work in our field across four posts. So how do I talk about my job across four job posts? If you didn’t read part 1, you can click here to read about what the day to day is like.

My four series post is structured as:

  1. Day to day, what is normal? What are 95% of my experiences?
  2. Best days ever, these are the highlights.
  3. Worst days ever, these don’t happen all of the time and this is what nightmares are made out of.
  4. Weird days, Brent Ozar said it best when he said Database Administration is weird.

The weird days of being a DBA.

You can run into a lot of weird things being a DBA, but I don’t think the topics I’m going to discuss are what you are expecting. Most of the weird things I run into are design structures or processes in place that go against best practice or are not typically practical. Kind of like if the past admin gave out Neapolitan ice cream instead of vanilla. Everything is custom, unique, and different, whereas I try to instantiate things as “vanilla” as possible. If things are predictable and implemented consistently, it’s easier for anyone to come in and troubleshoot or support.

Well, what kind of weird things do I see then? Well, you have self-made issues, you have known and unknown bugs, and sometimes you just have weird behavior due to … computer gremlins.

Here are some examples of impractical implementations I’ve worked with: Primary keys that are clustered indexes based on 32 bytes (guid + datetime2 + datetime2), procedures meant to implement high concurrency DML when interacting with a table without handling isolation levels, and more than five duplicate indexes on the same table (minus the name of the index of course…).

Those scenarios are not best practice and it really goes against practical design. So you need to look at the overall picture and decide what the end goal and intentions are. Think about why and why not some of the features were implemented or missed. Whenever I go against best practice design, explanations go immediately into the documentation and comments go into the code. Sometimes it’s important or necessary to stray from vanilla to get some strawberry ice cream, but make sure your successors and current employers know/understand why you deviated.

Sometimes, things just DON’T make sense. I saw a weird issue where you cannot expand a drop-down box in SSRS when you open it. This took me a few minutes to figure it out. But what was the issue? It was a service pack bug. A quick cumulative update fixed that right up. This goes into what I talked about in my last session, the worst days ever. If you are propagating your changes up through multiple environments, you should catch issues early on and you don’t have to scratch your head wondering why you suddenly had new changes.

Other times it could be a bug that has no fix or report yet, but those are even rarer. And since they are rare, you can imagine how frustrating it can be trying to fix things that aren’t documented or experienced by many people. Microsoft has employees who help on StackOverflow who can help you identify bugs, so that may be a good resource if you get stuck banging your head against the wall.

Most issues you run into are probably user error or bugs. Sometimes though, things just don’t work and there may not be an explanation you can find for it. Not that there isn’t an explanation, just that you need either the proper tools, access, or knowledge prior to the problem occurring. If it’s a one-time event and it doesn’t happen again, good luck figuring that out without having the problem monitoring in place. Likewise, if something is going on outside of your control (network, sysadmin, application, etc), you’re going to have a hard time figuring out why things are behaving why they are. (Noisy neighbor syndrome in a VM is a common example. Where one box on the host eats the resources and the other VM has to shift resources back to the ESX Host to help that neighbor.) I like to call those problems, computer gremlins.

The best way to combat computer gremlins is with mentors and knowledgeable co-workers. It’s hard to know what you don’t know. When I started learning SQL, it was difficult to learn because I didn’t know what kind of questions to ask. Mentors can help guide that path and get you where you need to be so when a gremlin inevitably pops up, you’ve thought about and then implemented monitoring that can assist in that scenario or you know someone who can help.

The three weirdest IT experiences I’ve had:

  1. A ticket that literally stated: “Desktop background is showing a clown spitting plates and there is a typo in the background. This is affecting all agents and they would like the “scary” picture changed or removed.”
  2.  That time I had to drive to another site to help teach users how to use the CTRL+C and CTRL+V hotkey functions. 
  3. Troubleshooting split brain syndrome in an AG setup.

 

What is the average day of a DBA? (Part 3 of 4)

Steve Jones has asked for some folks in the community to talk about what it’s like to work in our field across four posts. So how do I talk about my job across four job posts? If you didn’t read part 1, you can click here to read about what the day to day is like.

My four series post is structured as:

  1. Day to day, what is normal? What are 95% of my experiences?
  2. Best days ever, these are the highlights.
  3. Worst days ever, these don’t happen all of the time and this is what nightmares are made out of.
  4. Weird days, Brent Ozar said it best when he said Database Administration is weird.

The worst days ever for a DBA.

Some of the worst days you can have are when they are spent on firefighting. Your projects don’t get worked on so deadlines approach faster and you aren’t having the chance to work on being proactive because you are actively fixing the problem. What really makes those days bad though, is if policy and procedure are what’s allowing this to happen.

Disaster recovery practice, proper high availability set up, and properly spent resources are what keeps you afloat when you spring a leak. I often see IT in companies where they are more of an afterthought and a sunk cost instead of a valuable resource for the company. There are varying degrees of this that you will encounter, and those are areas that you’ll likely need to change their minds.

DBA’s should have a solid practice environment for disaster recovery. If you aren’t prepared and something big hits, nothing is worse than trying to google a solution to a problem that is bleeding the company actively. Being prepared means you know your backups have been tested, you know how to quickly recover to a point in time within minutes when necessary. The drives are already mapped and anticipated, you know where the files need to be restored to. (DBATools is a fantastic helper with disaster recovery and practicing by the way!)

Likewise, your availability should be thoroughly vetted and tested before deploying to production. Ideally, you’ll have several lower environments configured identically. When you patch or push new changes, you’ll do so through the lower environments first and let them “bake in”. After a testing period and proper QA, you can push the changes up the chain until they eventually land in production. Proper monitoring tools to check the health of the equipment and the software should be instantiated to alert you before things begin to go wrong.

Along with the hardware, you need to make sure you’re adding resources when necessary. The most common thing here is disk space, just because our data will grow and grow. This is where a lot of employers have trouble allocating more money, they want to have those historical highlights but they don’t want to spend any more on holding data because your MDF keeps growing. I find this is where you’re both going to have to compromise. I recommend looking into some new technology or practices that would allow you to keep as much historical data on slower and cheaper disk or possibly archive it off with the ability to restore it later if necessary. You’ll need to help them weigh the pros/cons and costs involved with the time and resources spent with the options. (Of course, other tools that help with compression and backups may be another idea!)

Assuming you are taking care in those three areas, you won’t have many bad days. I try to be the quiet DBA, not the hero DBA. The hero DBA is the one who is frequently running in to save the ship that’s on fire. This is great, but if your DBA isn’t doing this frequently, this means they are working proactively on the situation to keep things stable.

Not all disaster comes in the form of a crazy outage. Sometimes someone just runs something without a WHERE clause predicate. (DELETE FROM TABLE… whoops.)

I’m proud to say that I’ve never caused a show-stopping outage or problem that we couldn’t recover or fix quickly. Everyone’s going to have a moment or two like that though, in IT, that’s just the way it goes. Even if you are perfect, there’s someone out there who isn’t going to be. So if you are properly prepared, this isn’t too bad. If you aren’t properly prepared, you’ll be sweating bullets and get that feeling deep down in the pit of your stomach where it feels like you got kicked real hard. As you’re working the issue through, you’ll feel like your hair is greying in front of your eyes from the stress. So when you can resolve those problems in minutes confidently, that’s going to save you a ton of pain and hair loss. (Your employer will appreciate how quickly you resolved the issue too!)

These kinds of issues or days are pretty rare. I’ve seen code and hardware failures, worked weekends/nights, missed events and outings with families and friends. That’s just part of the job sometimes, but this is definitely not commonplace. (If it is, you may want to do everything you can to either improve the policy and turn that ship around or find a new ship.) It is something you can and have to expect though, which is why the policy should drive best practices and proper procedure so this can be dealt with quickly and with minimal to no impact to the company.

And just for fun, the three worst times of my IT career:

  1. 36-hour shift rebuilding Microsoft Exchange servers with Microsoft on the phone.
  2. Two 100 hour work weeks to fix every computer on the floor. (It was nice that we had showers at work and blankets/pillows.)
  3. That time a DELETE statement was run but I was able to fix the fallout after a stressful week of identifying corrupted backup processes and piecing it together from historical tables.

What is the average day of a DBA? (Part 2 of 4)

Steve Jones has asked for some folks in the community to talk about what it’s like to work in our field across four posts. So how do I talk about my job across four job posts? If you didn’t read part 1, you can click here to read about what the day to day is like.

My four series post is structured as:

  1. Day to day, what is normal? What are 95% of my experiences?
  2. Best days ever, these are the highlights.
  3. Worst days ever, these don’t happen all of the time and this is what nightmares are made out of.
  4. Weird days, Brent Ozar said it best when he said Database Administration is weird.

The best days ever for a DBA.

I should preface that you should have some passion or interest in technology if you are getting into IT. You can find some jobs where the duties and technology will never change. It will be the same forever, typically those are places that move slowly and they don’t need to innovate. These sites just need to keep their service afloat. Those are not the most common for IT, but those places do exist.

I wanted to bring that up because you are probably going to have to keep learning. Would you make your primary doctor someone who graduated 40 years ago and hasn’t learned anything new since then? Why would you hire a DBA then that hasn’t learned anything new or progressed either?

Some employers grant time to help their people grow during business hours, but this is not consistent. So when you find an employer that does allow this, consider this a perk. If you don’t have that perk, you may consider using a percentage of your own time to study and learn. (You may consider this even if you do have that perk!)

Most days are average and you do roughly the same rotation of tasks over a period of time. If you enjoy that type of work, then every day is going to be a pretty good day. It can get dull, but that’s why you have better than average days and also worse than average days.

The best days for me are when: I get to learn, I receive that “aha!” moment, or I get to interface with other knowledgeable professionals.

As I prefaced, not every employer grants you dedicated time to learn. So when an employer does allocate a few hours a month for me to learn or create something to learn that will better the company, those days are awesome.

The most recent and interesting project was to figure out a beneficial way to incorporate Change Tracking. One of my favorite past projects was to learn how Transactional Replication works and create a new database as a service offering from that. Most of these projects are given with light direction. I’m given the end goal and I need to get us from A to B in the most reliable and practical way possible. If that means I need to do some research into various technologies / methods, then I may do so. Getting to really dig in with research and creative approaches is an incredibly satisfying and fun experience. To be able to architect a solution from start to finish is a huge task, but it’s one of those things I love to do.

A lot of projects, including research projects or proof of concepts, usually provide that “aha!” moment. It’s that point when you solve that puzzle that you have banged your head on the wall for days. Suddenly everything seems obvious, that tedious problem that’s given you some anxiety instantly melts. You feel dumb that you didn’t see the path to get here sooner, but you also feel emboldened by the fact that YOU solved that problem.

Those moments are typically the kinds of things I’ll list on my resume. Such as when I realized that some slight adjustments to our Data Warehouse brought our load times from 16 hours to 4 hours or when we reduced 10,000 daily deadlocks down to 0.

I’m not sure how most people get those moments, but I tend to have them when I’m doing something completely different. I’ll do all of my research and testing, but usually the solution doesn’t come immediately during those periods of time.

You have questions you can google:

How do I SUM and GROUP BY?

Then you have questions you can’t google:

What would be the impact of doing a SUM and GROUP BY on my production system?

It’s those questions where you need to apply research and practice to create your own dots to connect. Sort of like putting together a map and at the end of the map you’ll find the treasure.

It’s when I’m running, going to bed, showering, or biking to work where out of seemingly nowhere… an idea strikes! You’ll have times where you are too close to the problem to see around it and find the solution. Sometimes you just need to take a step back and think about things, sometimes those things will piece themselves together for you.

I’m also the kind of person who obsesses over problems. If I can’t figure something out at work, it’s going to linger in the back of my mind until I know the answer. Or if I have an interview or a colleague asks me a question I don’t know the answer to, I can’t rest until I know that answer.

Other signs of a good day is when you are working with people who know more than you do. When I worked at Solarwinds, there was no end of tenured DBA’s who were knowledgeable. Every time we chatted about things, I felt like I learned 20 different topics. My time there was definitely an exponential burst of learning, but this goes back into what I mentioned earlier. You should have some passion for technology, because it’s that curiosity to learn and grow that is going to help you build your career.

To end on the “best day ever” piece, here were a few great and memorable (but not typical) moments I’ve experienced in IT:

  1.      When we created a Christmas tree out of Windows Server 2000 cd’s.
  2.      Flying to San Jose, CA to give a presentation and work the booth at SQL Saturday.
  3.      Driving with the team down to a SQL Server 2012 announcement and features party.

 

What is the average day of a DBA? (Part 1 of 4)

Steve Jones has asked for some folks in the community to talk about what it’s like to work in our field across four posts. So how do I talk about what it’s like in four different ways?

I figured I’ll structure my four post series as:

1. Day to day, what is normal? What are 95% of my experiences?
2. Best days ever, these are the highlights.
3. Worst days ever, these don’t happen all of the time and this is what nightmares are made out of.
4. Weird days, Brent Ozar said it best when he said Database Administration is weird.

Today is the “day to day” aspect of my job! And how appropriate when returning back from a quick break over this holiday Wednesday to a normal Thursday.

The day to day as a DBA

I like to roll into the office a few minutes early, I gather my caffeine, water, and situate my desk. I’ll go through my inbox or chat system right afterwards. I want to see if I have any alerts, read through reports that give me key metrics on performances, and see if there are requests from people who may need assistance.

I’ll log into our monitoring systems afterwards to check things out but it typically is pretty sparse. I configured alerts so I don’t have to be the watcher, but it’s nice to get a view of what’s going on anyway, it’s almost like a window into what things my database is doing.

At this point, I’ll bring up three lists. I keep a spreadsheet that tracks my time worked, a notepad of the tasks I am completing and when I completed them, and a notepad of tasks I need to do. That “need to do list” is broken down by nice to have’s, important things, or even just daily reminders. I like to refresh this list in the morning after coming in and in the evening before leaving the office.

I’ve scripted and automated a lot of my job, the pieces I have not scripted I focus on automating. Things should alert me at thresholds indicating there could be a problem in the future and things should alert me when there are problems. The idea though is to be proactive and not reactive and to always be on top of what’s going on. Every time something happens, I wonder to myself “How can I prevent this / alert myself on this / automate this task?”

From there I get a ton of adhoc requests throughout the day. Once a week or a few times a week, I may get assigned a relatively major project. Projects have that hurry up and wait feel, you’ll sprint really hard to get things done. Once you are finished, you’ll wait days, weeks, or sometimes even years before more pieces come into play and you have to pick up where you left off.

When I’m not fighting the backlog of items to do, working on a project, or an adhoc request; I’m working on figuring out where we are not performing best practices. Are we tuning indexes regularly and analyzing them? Are we performing random backup checks? Are all backups succeeding? What are our major queries that need tuning? The important thing to take away from this time is that there is no direction here. No one is asking me to go do these things or figure out if we should be doing them. These are practices and skills you pick up the longer you work in this field. You should always be proactively assessing your work situation to improve things for your employer.

Some other things I do is I read. I read a minimum of three blog posts a day which takes about ~10 minutes altogether. If I can learn three new things every day, that is a great day. I’ve worked in database for over seven years, which means I have learned at least 7,665 different things. It’s probably a lot higher than that, but the point is to illustrate that over time that cumulative knowledge adds up.

When I take my lunch or breaks, I’ll look at Reddit or StackOverflow and try to help people with questions. This gives me an even more broad view of the world and problems I would have never even thought about AND it gives you the ability to think critically about solving those problems while seeing how others go about the same process.

I have a few meetings a week, I try to keep meetings short and to the point. If I don’t have any actionable items from meetings or people don’t need to ask me questions in meetings, then I find they are not worth going to unless the knowledge to be obtained from the meeting is helpful. These meetings typically range from change request meetings, team scrum meetings, or project update meetings.

In the event I find some “spare time”, I’ll go back to the point I made earlier of always trying to improve things for your employer. I’ll work on proof of concepts and their implementations, usually learning along the way to solve problems that the employer maybe hasn’t considered or hasn’t had the resources to resolve previously.

After I get home, I’ll try to write / read / watch / practice / study SQL for 30 minutes to an hour. Over the course of a week, I’ll usually get 2-5 hours of self education in and I log that self-education in my spreadsheet so I can track how and what I’m learning.

That’s what my typical day is, I’ve mostly aggregated my experiences across my four jobs and this is what it feels like most of the time to me. Every job is different, but at its core for DBA, you know what a DBA is and does and that work has a ton of similarities. Some employers may have something weird going on, another may do things a little differently, perhaps there’s a totally different technology stack! At the end of the day though, I can roll those jobs up into similar core functions. If I had to give you a high overview of what it looks like day to day though, it would be:

1. Being proactive to problems.
2. Being reliable and responsive to users.
3. Learning and growing your skill set.
4. Using your education and skill set to better the environment for your employer. (Because how can you do something better if you haven’t learned how to do it better yet? That’s why this is #4 instead of #3.)

Edit 2018-07-06:

I wanted to fill in a few points that were made by Steve in the comments and I wanted to touch on two other topics. One topic I mentioned briefly was stating that I found the core work to be the same from shop to shop. This is primarily because I focus on job listings where they need someone who can do everything well, but they do not need to be an industry expert in any given task to succeed. The position does not yield itself to requiring a niche skill but it needs more jack of all trades skills with databases. (However, I would say my personal niche and expertise lies in performance tuning and making things go fast.) I will post the last two descriptions of my job that were sent to me when I applied, so you can see how the job looked on the outside before getting the insider info on the day to day.

Senior SQL General Specialist

Senior Data Engineer

The other topic I completely forgot to mention was “documentation“. It feels like most people never create documentation and even if they do, no one will ever read it. I personally believe creating: (1) documentation, (2) training, and (3) knowledge hand offs; are necessary to complete a task or project. It’s part of the whole “senior data professional package”. I’ll come in and fix your problems and document it so you can have a sustainable and scalable process that can be supported by future employees. If you wanted to see what I would suggest for a bare minimum for documentation, you can see my blog post “Skill Showcase” where I created a quick project and documented it.

Onto comments!

When do you check alerts?

Ideally when I create alerts, I work hard on adjusting them so you don’t hear noise. Too many alerts when things aren’t actionable is not a good idea, it becomes easy to overlook real incidents. Likewise, if you don’t get an alert when things have gone south, that’s equally terrible. It’s forever a battle with tuning the alert thresholds to trigger exactly when necessary with as few notifications as needed.

When I set those thresholds on a new alert, I like to be liberal with the wiggle room. I prefer to get more alerts than necessary to start and dial it back until I know it’s definitively alerting every time there is a problem but it’s not firing so often that I stop checking the alerts. Some alerts are pretty simple, disk space is one where you can easily set a threshold based on company policy and performance expectations. Other alerts like blocking, locking, long running queries, etc., are more difficult to narrow down.

Over time as you work in the environment and gain business knowledge of how things function, you’ll understand more about how everything links and interacts. Every environment is relative, but the philosophy behind how you approach the problems is typically static or expanded with new technology. Your company should have guidelines and expectations for what defines a successful and healthy database. If they don’t, then you could help them write it! Setting the expectations based on the business needs to function and grow is not trivial and it’s a task that should involve most, if not all department heads in the company who have a stake in the database or the data. (That doesn’t mean that they should all dictate what the policy is, but they should voice their needs and concerns so the CEO / CIO / IT Director can make educated decisions.)

What do you look for when checking alerts? Or what’s an example of things to ignore or take action on?

With that baseline of knowing what is and is not acceptable, you can create procedures and processes that drive success in the business. Policy should be driving the changes you make and the goals you work towards, which brings me back full circle to the philosophy standpoint that is inherent in most jobs. Establish a baseline, measure where you are, measure again later to find out where you are afterwards. How you measure, when you measure, and how you establish a baseline are variables that will change and will depend greatly on the circumstance.

(Baseline)(Current Performance) = (Success Results, determines if actionable items are necessary.)

(Current Performance) (Past Performance) = (Progress, determines actionable items to increase performance.)

Using the formulas above, let’s look at two scenarios.

Scenario 1: You want to tune an alert. It should not create so much noise that we ignore the alert but it needs to alert when there is a problem.

(Baseline [100% notification when an error occurs, no more than 5 noise messages during the day])(Current Performance [We have missed 1/10 alerts and we receive no noise emails]) = (Success Results [Failed response to the 1/10 alerts. Requires action.])

(Current Performance [9/10 alerts alerted, no noise emails])(Past Performance [10/10 alerts alerted, but 100+ extra email notifications]) = (Progress [We are down noise but up missed alerts, go back and tune or create a new solution to limit noise.])

Scenario 2: You track run time in your environment and users are reporting the application has been functioning slowly lately when searching for a value.

(Baseline [Query runs for 5 seconds])–  (Current Performance [Query is executing for 10 seconds]) = (Success Results [Query is beyond expected run time, it needs to be tuned.])

(Current Performance [10 seconds for run time, analyze where the SQL is taking 5 additional seconds per run. Compare current resources, plans, wait types, etc. used to the past stats.])(Past Performance [5 seconds for run time. You can use these stats to compare to the new stats and see what has changed and where the new time has been added on.]) = (Progress [Query regression, comparing the differences between the two pieces of information will help you decide the cause and actionable items.])

As for what I specifically look for when creating alerts, it really varies based on the needs of the company but it boils down to things I am responsible for. If I am responsible for said thing, I need to know if its status changes or begins to change to an actionable point. I want to prevent problems and catch anything that may be a budding problem. This saves me the headache later and it’s a win/win for your employer.

What’s it like to tune a server?

I often like to compare tuning to a Rubik’s cube. When you look at a different side of the puzzle, things are related but it looks differently from the other sides. Every time you turn the pieces and create a new view, you get closer to solving the puzzle.

Looking at wait stats, query plans, extended events, event viewer, perfmon, and using so many different tools is that feeling. I look at one thing and it gives me enough information to create a new hypothesis which in turn gets tested. I’m continually moving those puzzle pieces and looking at it from different angles to eventually complete the puzzle.

The more you tune, the more logical and step by step it becomes. There are so many variables and paths to take when analyzing that the knowledge becomes second nature, just like a Rubik’s cube. You learned the algebra and patterns behind the cube and you don’t solve it / don’t have to solve it the exact same way twice. This is true for SQL as well. I always try to look at what I have, where I need to be, and work on that puzzle piece until it accomplishes what you set out to make it do.

And once more, just like that puzzle… it’s extremely satisfying when you figure it out. If you like solving puzzles that is, I know some folks who find that frustrating.

How do you find queries to tune and what have you done with one?

This goes in hand with the policy I mentioned earlier. With my jack of all trades scenarios, I usually rely on my query regression alerts telling me when things have begun to change based on how long it used to take and how long it takes now; on a rolling average but compared to historical patterns. This is what typically indicates things could be a problem.

When I don’t see something that is a growing problem, I’ll look to wait types and run times in cache and DMV’s to help me figure out what’s going on. I like to find queries that have generated the most wait time in their history, because if I take 1 second off a query that executes a million times a day, that’s 1 million seconds I just shaved off that workload. Even if that query only took 2 seconds and was fast before, that adds up really quickly and extends the life of your hardware. So if I’m not going after a specific goal because there’s a known or growing problem, I’ll look for what will give me the biggest bang for my buck.

What is an example of a request you would get day to day?

The requests will vary, but it could be summarized as:

  1. I need X report for Y things by Z day.
  2. Can you help me login to X server?
  3. Why is server Z doing thing T?
  4. I need help with X query, it…
    1. Does not do what is expected.
    2. Is too slow.
    3. Accidentally ran a DML statement without a where clause.
  5. Can you check on X alert I saw?

Most of the time it’s regarding help with specific queries, reports, or servers. The other 1% time it is probably the initiative and beginning request to a larger project.

For time, have you tried toggl? https://toggl.com/

I have looked at it before. Cathrine Wilhelmsen made an excellent post regarding online data tools and included Toggl in that list. I’m giving it a try out for a few days to see how I like it now.