Steve Jones has asked for some folks in the community to talk about what it’s like to work in our field across four posts. So how do I talk about what it’s like in four different ways?
I figured I’ll structure my four post series as:
1. Day to day, what is normal? What are 95% of my experiences?
2. Best days ever, these are the highlights.
3. Worst days ever, these don’t happen all of the time and this is what nightmares are made out of.
4. Weird days, Brent Ozar said it best when he said Database Administration is weird.
Today is the “day to day” aspect of my job! And how appropriate when returning back from a quick break over this holiday Wednesday to a normal Thursday.
The day to day as a DBA
I like to roll into the office a few minutes early, I gather my caffeine, water, and situate my desk. I’ll go through my inbox or chat system right afterwards. I want to see if I have any alerts, read through reports that give me key metrics on performances, and see if there are requests from people who may need assistance.
I’ll log into our monitoring systems afterwards to check things out but it typically is pretty sparse. I configured alerts so I don’t have to be the watcher, but it’s nice to get a view of what’s going on anyway, it’s almost like a window into what things my database is doing.
At this point, I’ll bring up three lists. I keep a spreadsheet that tracks my time worked, a notepad of the tasks I am completing and when I completed them, and a notepad of tasks I need to do. That “need to do list” is broken down by nice to have’s, important things, or even just daily reminders. I like to refresh this list in the morning after coming in and in the evening before leaving the office.
I’ve scripted and automated a lot of my job, the pieces I have not scripted I focus on automating. Things should alert me at thresholds indicating there could be a problem in the future and things should alert me when there are problems. The idea though is to be proactive and not reactive and to always be on top of what’s going on. Every time something happens, I wonder to myself “How can I prevent this / alert myself on this / automate this task?”
From there I get a ton of adhoc requests throughout the day. Once a week or a few times a week, I may get assigned a relatively major project. Projects have that hurry up and wait feel, you’ll sprint really hard to get things done. Once you are finished, you’ll wait days, weeks, or sometimes even years before more pieces come into play and you have to pick up where you left off.
When I’m not fighting the backlog of items to do, working on a project, or an adhoc request; I’m working on figuring out where we are not performing best practices. Are we tuning indexes regularly and analyzing them? Are we performing random backup checks? Are all backups succeeding? What are our major queries that need tuning? The important thing to take away from this time is that there is no direction here. No one is asking me to go do these things or figure out if we should be doing them. These are practices and skills you pick up the longer you work in this field. You should always be proactively assessing your work situation to improve things for your employer.
Some other things I do is I read. I read a minimum of three blog posts a day which takes about ~10 minutes altogether. If I can learn three new things every day, that is a great day. I’ve worked in database for over seven years, which means I have learned at least 7,665 different things. It’s probably a lot higher than that, but the point is to illustrate that over time that cumulative knowledge adds up.
When I take my lunch or breaks, I’ll look at Reddit or StackOverflow and try to help people with questions. This gives me an even more broad view of the world and problems I would have never even thought about AND it gives you the ability to think critically about solving those problems while seeing how others go about the same process.
I have a few meetings a week, I try to keep meetings short and to the point. If I don’t have any actionable items from meetings or people don’t need to ask me questions in meetings, then I find they are not worth going to unless the knowledge to be obtained from the meeting is helpful. These meetings typically range from change request meetings, team scrum meetings, or project update meetings.
In the event I find some “spare time”, I’ll go back to the point I made earlier of always trying to improve things for your employer. I’ll work on proof of concepts and their implementations, usually learning along the way to solve problems that the employer maybe hasn’t considered or hasn’t had the resources to resolve previously.
After I get home, I’ll try to write / read / watch / practice / study SQL for 30 minutes to an hour. Over the course of a week, I’ll usually get 2-5 hours of self education in and I log that self-education in my spreadsheet so I can track how and what I’m learning.
That’s what my typical day is, I’ve mostly aggregated my experiences across my four jobs and this is what it feels like most of the time to me. Every job is different, but at its core for DBA, you know what a DBA is and does and that work has a ton of similarities. Some employers may have something weird going on, another may do things a little differently, perhaps there’s a totally different technology stack! At the end of the day though, I can roll those jobs up into similar core functions. If I had to give you a high overview of what it looks like day to day though, it would be:
1. Being proactive to problems.
2. Being reliable and responsive to users.
3. Learning and growing your skill set.
4. Using your education and skill set to better the environment for your employer. (Because how can you do something better if you haven’t learned how to do it better yet? That’s why this is #4 instead of #3.)
Edit 2018-07-06:
I wanted to fill in a few points that were made by Steve in the comments and I wanted to touch on two other topics. One topic I mentioned briefly was stating that I found the core work to be the same from shop to shop. This is primarily because I focus on job listings where they need someone who can do everything well, but they do not need to be an industry expert in any given task to succeed. The position does not yield itself to requiring a niche skill but it needs more jack of all trades skills with databases. (However, I would say my personal niche and expertise lies in performance tuning and making things go fast.) I will post the last two descriptions of my job that were sent to me when I applied, so you can see how the job looked on the outside before getting the insider info on the day to day.
Senior SQL General Specialist
Senior Data Engineer
The other topic I completely forgot to mention was “documentation“. It feels like most people never create documentation and even if they do, no one will ever read it. I personally believe creating: (1) documentation, (2) training, and (3) knowledge hand offs; are necessary to complete a task or project. It’s part of the whole “senior data professional package”. I’ll come in and fix your problems and document it so you can have a sustainable and scalable process that can be supported by future employees. If you wanted to see what I would suggest for a bare minimum for documentation, you can see my blog post “Skill Showcase” where I created a quick project and documented it.
Onto comments!
When do you check alerts?
Ideally when I create alerts, I work hard on adjusting them so you don’t hear noise. Too many alerts when things aren’t actionable is not a good idea, it becomes easy to overlook real incidents. Likewise, if you don’t get an alert when things have gone south, that’s equally terrible. It’s forever a battle with tuning the alert thresholds to trigger exactly when necessary with as few notifications as needed.
When I set those thresholds on a new alert, I like to be liberal with the wiggle room. I prefer to get more alerts than necessary to start and dial it back until I know it’s definitively alerting every time there is a problem but it’s not firing so often that I stop checking the alerts. Some alerts are pretty simple, disk space is one where you can easily set a threshold based on company policy and performance expectations. Other alerts like blocking, locking, long running queries, etc., are more difficult to narrow down.
Over time as you work in the environment and gain business knowledge of how things function, you’ll understand more about how everything links and interacts. Every environment is relative, but the philosophy behind how you approach the problems is typically static or expanded with new technology. Your company should have guidelines and expectations for what defines a successful and healthy database. If they don’t, then you could help them write it! Setting the expectations based on the business needs to function and grow is not trivial and it’s a task that should involve most, if not all department heads in the company who have a stake in the database or the data. (That doesn’t mean that they should all dictate what the policy is, but they should voice their needs and concerns so the CEO / CIO / IT Director can make educated decisions.)
What do you look for when checking alerts? Or what’s an example of things to ignore or take action on?
With that baseline of knowing what is and is not acceptable, you can create procedures and processes that drive success in the business. Policy should be driving the changes you make and the goals you work towards, which brings me back full circle to the philosophy standpoint that is inherent in most jobs. Establish a baseline, measure where you are, measure again later to find out where you are afterwards. How you measure, when you measure, and how you establish a baseline are variables that will change and will depend greatly on the circumstance.
(Baseline) – (Current Performance) = (Success Results, determines if actionable items are necessary.)
(Current Performance) – (Past Performance) = (Progress, determines actionable items to increase performance.)
Using the formulas above, let’s look at two scenarios.
Scenario 1: You want to tune an alert. It should not create so much noise that we ignore the alert but it needs to alert when there is a problem.
(Baseline [100% notification when an error occurs, no more than 5 noise messages during the day]) – (Current Performance [We have missed 1/10 alerts and we receive no noise emails]) = (Success Results [Failed response to the 1/10 alerts. Requires action.])
(Current Performance [9/10 alerts alerted, no noise emails]) – (Past Performance [10/10 alerts alerted, but 100+ extra email notifications]) = (Progress [We are down noise but up missed alerts, go back and tune or create a new solution to limit noise.])
Scenario 2: You track run time in your environment and users are reporting the application has been functioning slowly lately when searching for a value.
(Baseline [Query runs for 5 seconds])– (Current Performance [Query is executing for 10 seconds]) = (Success Results [Query is beyond expected run time, it needs to be tuned.])
(Current Performance [10 seconds for run time, analyze where the SQL is taking 5 additional seconds per run. Compare current resources, plans, wait types, etc. used to the past stats.])–(Past Performance [5 seconds for run time. You can use these stats to compare to the new stats and see what has changed and where the new time has been added on.]) = (Progress [Query regression, comparing the differences between the two pieces of information will help you decide the cause and actionable items.])
As for what I specifically look for when creating alerts, it really varies based on the needs of the company but it boils down to things I am responsible for. If I am responsible for said thing, I need to know if its status changes or begins to change to an actionable point. I want to prevent problems and catch anything that may be a budding problem. This saves me the headache later and it’s a win/win for your employer.
What’s it like to tune a server?
I often like to compare tuning to a Rubik’s cube. When you look at a different side of the puzzle, things are related but it looks differently from the other sides. Every time you turn the pieces and create a new view, you get closer to solving the puzzle.
Looking at wait stats, query plans, extended events, event viewer, perfmon, and using so many different tools is that feeling. I look at one thing and it gives me enough information to create a new hypothesis which in turn gets tested. I’m continually moving those puzzle pieces and looking at it from different angles to eventually complete the puzzle.
The more you tune, the more logical and step by step it becomes. There are so many variables and paths to take when analyzing that the knowledge becomes second nature, just like a Rubik’s cube. You learned the algebra and patterns behind the cube and you don’t solve it / don’t have to solve it the exact same way twice. This is true for SQL as well. I always try to look at what I have, where I need to be, and work on that puzzle piece until it accomplishes what you set out to make it do.
And once more, just like that puzzle… it’s extremely satisfying when you figure it out. If you like solving puzzles that is, I know some folks who find that frustrating.
How do you find queries to tune and what have you done with one?
This goes in hand with the policy I mentioned earlier. With my jack of all trades scenarios, I usually rely on my query regression alerts telling me when things have begun to change based on how long it used to take and how long it takes now; on a rolling average but compared to historical patterns. This is what typically indicates things could be a problem.
When I don’t see something that is a growing problem, I’ll look to wait types and run times in cache and DMV’s to help me figure out what’s going on. I like to find queries that have generated the most wait time in their history, because if I take 1 second off a query that executes a million times a day, that’s 1 million seconds I just shaved off that workload. Even if that query only took 2 seconds and was fast before, that adds up really quickly and extends the life of your hardware. So if I’m not going after a specific goal because there’s a known or growing problem, I’ll look for what will give me the biggest bang for my buck.
What is an example of a request you would get day to day?
The requests will vary, but it could be summarized as:
- I need X report for Y things by Z day.
- Can you help me login to X server?
- Why is server Z doing thing T?
- I need help with X query, it…
- Does not do what is expected.
- Is too slow.
- Accidentally ran a DML statement without a where clause.
- Can you check on X alert I saw?
Most of the time it’s regarding help with specific queries, reports, or servers. The other 1% time it is probably the initiative and beginning request to a larger project.
For time, have you tried toggl? https://toggl.com/
I have looked at it before. Cathrine Wilhelmsen made an excellent post regarding online data tools and included Toggl in that list. I’m giving it a try out for a few days to see how I like it now.