What is the optimal way to order a clustered index?

Someone recently was asking me about indexes and how the order of the columns affected the index.

I have a table, and I want to create a clustered index on it. SSMS lets me choose the order of the columns when I make a clustered index. Logic would dictate that this must have some effect or else it wouldn’t be offered as an option. What factors are considered when determining the optimal order?

I read on Stack Overflow, I think, that the best way is to start with the column that has the most distinct values and then decrease from there, but then someone replied below saying he was wrong. I haven’t been able to find a good source of info on this. Any help appreciated!

Can you guess my answer? Continue reading “What is the optimal way to order a clustered index?”

T-SQL Tuesday #106 Trigger headaches or happiness?

Triggers are both a useful tool with a specific niche… and the devil. Thank you Steve Jones for hosting this week’s topic on triggers!

On the left we have triggers and on the right we have constraints.

When used appropriately, it’s a great skill to have in your repertoire. When used inappropriately, you will probably ruin someone’s day. Continue reading “T-SQL Tuesday #106 Trigger headaches or happiness?”

How many Sequences can I have in SQL Server?

I thought this was an interesting question, and it makes sense to have some concern about it. If you are using Sequences instead of identity columns, you typically need more control over how those numbers are handled from creation to archive. So what if you need a lot of different Sequences on your server to handle your needs? Will you run out of Sequence objects that you can create?

This information is not intuitively simple to find; this is one of those times where you need to read various articles and connect the dots. Let’s start simple:

What is a Sequence?

A sequence is a user-defined schema bound object that generates a sequence of numeric values according to the specification with which the sequence was created.

MSDN on Sequence definition reference.
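
To ground that definition, here is a minimal sketch of a Sequence in T-SQL; the object and column names are made up for illustration:

-- Hypothetical names for illustration: a simple integer Sequence
-- and how to pull values from it.
CREATE SEQUENCE dbo.InvoiceNumbers
    AS INT
    START WITH 1000
    INCREMENT BY 1;

-- Grab the next number (what you'd use in place of an identity value).
SELECT NEXT VALUE FOR dbo.InvoiceNumbers AS NextInvoiceNumber;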

Ok, so it’s a schema-bound, user-defined object. What does SQL Server say about how many user-defined objects we can have?

Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of all objects in a database cannot exceed 2,147,483,647.

MSDN maximum capacity specifications reference.

Summary answer: you can have 2,147,483,647 objects in a database, and Sequence objects count toward that total. This is not explicitly stated, but it can be inferred from the wording “… the sum of the number of all objects in a database cannot exceed 2,147,483,647.”
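
If you want a rough sense of how close you are to that ceiling, here is a quick sketch; note that sys.objects does not cover every object type, but it is a reasonable approximation for this question:

-- Rough headroom check: schema-scoped objects in this database,
-- and how many of them are Sequences. The 2,147,483,647 cap applies to the total.
SELECT COUNT(*) AS total_objects FROM sys.objects;
SELECT COUNT(*) AS sequence_objects FROM sys.sequences;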

Do Synonyms impact performance in SQL Server?

I tried an experiment recently to test whether synonyms impacted performance in SQL Server.

Here’s our setup for the experiment:

-- Experiment table
CREATE TABLE test 
(one INT NOT NULL, 
two INT NOT NULL, 
three INT NOT NULL, 
four INT NOT NULL);

-- Load 10,000 rows (GO 10000 repeats the batch)
INSERT INTO test
VALUES (1, 1, 1, 1);
GO 10000

-- Make synonym
CREATE SYNONYM test2
FOR test;

-- Clear cache
DBCC FREEPROCCACHE;

-- Measure
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Test
SELECT * FROM test2;
SELECT * FROM test;

When experimenting, I ran each SELECT independently and ran DBCC FREEPROCCACHE (DON’T DO THAT IN PRODUCTION) between runs.
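
If you want to repeat this on a shared instance without flushing the entire plan cache, a gentler option (SQL Server 2016 and later) is to clear only the current database’s plans; a sketch:

-- Clears cached plans for the current database only (SQL Server 2016+),
-- which is less disruptive than an instance-wide DBCC FREEPROCCACHE.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;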

For my first experiment, I created the table as a HEAP. I saw no change in plan or performance.

-- TEST 1.1 Synonym used in HEAP
DBCC FREEPROCCACHE;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Test synonym
SELECT * FROM test2;

Test 1.1: Synonym used in HEAP plan

(10000 row(s) affected) Table ‘test’. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 117 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

-- TEST 1.2 Synonym unused in HEAP
-- Test base table
SELECT * FROM test;

Test 1.2: No synonym used in HEAP plan

(10000 row(s) affected) Table ‘test’. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 131 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

For the second experiment, I added a primary key and clustered index; again, I saw no change in plan or performance. Here’s the setup and first test.

DROP TABLE test;

CREATE TABLE test 
(one INT NOT NULL identity(1, 1) PRIMARY KEY, 
two INT NOT NULL, 
three INT NOT NULL, 
four INT NOT NULL);

INSERT INTO test (two, three, four)
VALUES (1, 1, 1);
GO 10000

DBCC FREEPROCCACHE;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- TEST 2.1 Synonym used in B-Tree
-- Test synonym
SELECT * FROM test2;

Test 2.1: Synonym used in clustered table plan

(10000 row(s) affected) Table ‘test’. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 16 ms, elapsed time = 102 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

-- TEST 2.2 Synonym unused in B-Tree
DBCC FREEPROCCACHE;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Test base table
SELECT * FROM test;

Test 2.2: No synonym used in clustered table plan

(10000 row(s) affected) Table ‘test’. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 83 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

For the last experiment, I created a non-clustered index on the two column and forced its use with an index hint.

CREATE NONCLUSTERED INDEX testnonc ON test (two);

DBCC FREEPROCCACHE;

-- Measure
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- TEST 3.1 Synonym used with B-Tree and Index
SELECT * FROM test2 WITH (INDEX(testnonc));

Test 3.1: Synonym used in clustered table with non-clustered index plan

(10000 row(s) affected) Table ‘test’. Scan count 1, logical reads 20654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 104 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

-- TEST 3.2 Synonym unused in B-Tree with index
DBCC FREEPROCCACHE;

-- Measure
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM test WITH (INDEX(testnonc));

Test 3.2: No synonym used in clustered table with non-clustered index plan

(10000 row(s) affected) Table ‘test’. Scan count 1, logical reads 20654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times: CPU time = 31 ms, elapsed time = 126 ms. SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

Ok, so there was actually no difference. So what’s going on? The short answer is that synonyms are resolved in the binding phase, which happens before SQL Server builds the query plan, so we won’t see any impact from the synonym in the plan itself. It also means that any negative performance associated with the synonym is so negligible that it is not relevant to tuning or worth worrying about.

Scott Hodgin on Stack Exchange has compiled a few sources talking about the bind phase and how it has little to no impact.
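
If you want to verify this yourself rather than eyeballing the plans, one option is to compare the plan hashes the two statements produced; a sketch against the plan-cache DMVs (run it after executing both SELECTs, before clearing the cache again):

-- If query_plan_hash matches for the synonym query and the base-table query,
-- the optimizer produced the same plan for both.
SELECT st.text, qs.query_hash, qs.query_plan_hash
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%FROM test%';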

Skill showcase – Pet store OLTP database

I was involved a while ago in a project that wasn’t officially a competition, but was competitive all the same, where I needed to create a database and showcase my skill set. The requirements were straightforward, vague, and yet complex. The timeline was one day, so I had to be smart about my time and use the proper toolset.

Here’s the assignment summarized:

Create a pet shop database that involves inventory/point of sale. I plan to sell pets, toys, supplies, and food. Please create a DB definition that will allow me to track purchases, inventory, vendors, holiday sales, and taxes.

It must work for a high-volume pet shop in the state and be able to expand to the rest of the states later.

So what did we end up with?

1. A project plan.
2. Load test documentation.
3. ERD of database.
4. Complete database documentation.
5. T-SQL script to create it all.

I think the end result from a single day timeline was great, even if that restriction made things more… “ruff”! And with that said, I’m proud to present… PetsDumb! Not quite as successful as PetSmart. Sorry for the bad jokes, I’ll stop “kitten” around now. Here’s the source files on my GitHub showing what I was able to accomplish with that description and time frame. Happy Saturday everyone!

https://github.com/Jonathanshaulis/Pet-Shop-OLTP-Database

Parameter validation failed in SSRS

Parameter validation failed. It is not possible to provide valid values for all parameters. (rsParameterError)

I saw this error the other day and it took me a minute to recall how I’ve fixed this in the past.

If you are unfamiliar with SSRS and Visual Studio, it can be a little “unfriendly” with some features; specifically, in this case, taking a parameter that used to be hidden and making it a visible, integral part of the report. In my particular circumstance, the report is not testable in Visual Studio and has to be uploaded to a server in order to validate it.

I found a few articles and posts, but none of them helped me in this circumstance:

Microsoft MSDN

Social MSDN Microsoft Forum

Forums ASP

SQL Server Central Forums

Things I tried that I really thought would work:

  • Changing one of my dependent parameters (populated from a dataset) to refresh only when needed instead of automatically every time.
  • Copying an existing and working parameter that is similar over the old one in the XML and tweaking it slightly.
  • Saving and exiting the report.
  • Deleting the parameter and re-creating the parameter.
  • Scouring the XML for all references to the parameter and making sure it all was kosher.

What did work in the end:

Deleting the report off the report server and re-uploading it. Overwriting the report / uploading and applying the updates to it did not change anything and it gave me a consistent error. After I deleted it from the server and re-uploaded it, the error was gone completely.

I have added a form for feedback; it would be very helpful if you would take a few minutes to fill it out. My goal is to adjust the content to best help others resolve their problems.


Featured Stack Exchange answer!

As many people I talk to about the pursuit of knowledge know, I advocate for a minimum of learning three new things per day. So typically I’ll subscribe to SQL newsletters, podcasts, blogs, and any other medium of information I can find to learn as much as possible. Well, I was really surprised when I found out an answer I posted on Stack Exchange was featured in Brent Ozar’s newsletter!

Here’s the newsletter (I have a feeling the link may break in the future, though!):
Ozar Newsletter

So here’s a screenshot of the page:

And here’s the answer I had posted!

I just thought it was really interesting and neat to pop up randomly in a newsletter I’ve been reading for 7+ years and I had to share it.

Snapshot isolation transaction failed in database (Availability Group and SSIS woes)


The Setup:

I have an SSIS package that runs a relatively basic ETL setup. We take data from TableA and move it to TableB. TableB is a heap; we index it in our ETL and drop the index afterwards. Now we join TableB to TableC. The reason we don’t go straight from A to C is the data relationship: joining them directly can cause a cascading join that generates BILLIONS of reads when we plug in certain WHERE predicate values.

Once TableB is joined to TableC, we create our file out of that data and take it away to import-land, where we do important import stuff. This is the match that lit the fire.

The error:

Snapshot isolation transaction failed in database ‘%.*ls’ because the object accessed by the statement has been modified by a DDL statement in another concurrent transaction since the start of this transaction. It is disallowed because the metadata is not versioned. A concurrent update to metadata can lead to inconsistency if mixed with snapshot isolation.

MSDN on the error.

The job ran three times. On the third time, the job succeeded but no rows were imported. The previous two runs elicited the same error.

The where:

I looked at the following timestamps:

– Job failure and SSIS package failure timestamps
– Availability Group log timestamps
– Staging table (Table B) timestamps
– Destination table (Table C) timestamps
– Historical query data timestamps

SSIS failed only milliseconds after an error appeared in the Availability Group logs. Likewise, I had a mismatch of timestamps between the staging table and the destination table. New data existed in staging but not in the destination table or the destination file.

I found a link however that gave me an idea!

What if the query taking data from TableB and TableC to a file is what was causing the issues, rather than something running in parallel blocking me? The job has a flow that should not cause issues like this; there is no parallelism in this package. It is acting, though, as if I am using two connections, where one connection is read-only, the second is R/W, and I’m holding metadata locks on the table.

I looked at the Availability Group logs and voila: lock_redo_blocked, taking out a Sch-M lock on the metadata of the table in question.
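
If you want to see that for yourself, the default AlwaysOn_health extended events session captures this event; here is a rough sketch for reading the session’s files, assuming the session is running and writing to the default log directory:

-- Read the AlwaysOn_health XE files and keep only lock_redo_blocked events.
SELECT CAST(event_data AS XML) AS event_xml
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL)
WHERE event_data LIKE '%lock_redo_blocked%';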

The why:

Let’s go back to the setup, this time I’m going to make comments between the steps to help illustrate what’s going on.

I have an SSIS package that runs a relatively basic ETL setup. We take data from TableA and move it to TableB.

This is on the primary node of a synchronous availability group, we have R/W privileges here.

TableB is a heap; we index it in our ETL and drop the index afterwards.

Same deal, we are on the R/W connection in the AG.

Now we join TableB to TableC.

Rubber… meet road, road… meet rubber.

The availability group is having trouble replicating the changes from node A to node B fast enough, and a Sch-M lock is taken out on the table’s metadata, which is causing our error. The extract source is always based off of the secondary node, which is in synchronous-commit mode.

Woah, wait! Your package succeeded the third time… but there were no rows? Your staging table must have had no records then.

Not quite; we had rows there. The table we were joining to had matches as well. We were querying the staging table on the secondary, and the rows were not present there.

Ok… that doesn’t make sense. You are in synchronous-commit mode; node A persists as soon as node B does.

True, but per Microsoft there is a catch with this replication, and it is not foolproof.

MSDN on why Availability Groups don’t love you anymore.

The client application completes an update on the primary replica successfully, but querying the secondary replica shows that the change is not reflected. This case assumes that your availability group has a healthy synchronization state. In most cases, this behavior resolves itself after a few minutes.

 If changes are still not reflected on the secondary replica after a few minutes, there may be a bottleneck in the synchronization work flow. The location of the bottleneck depends on whether the secondary replica is set to synchronous commit or asynchronous commit.

Those are the top paragraphs on that page. Ok… so it is possible that we read no rows from the secondary even though they existed on the primary, because the AG had not yet replicated them or had the schema locked. Why?

Each successful update on the primary replica has already been synchronized to the secondary replica, or that the log records have already been flushed for hardening on the secondary replica. Therefore, the bottleneck should be in the redo process that happens after the log is flushed on the secondary replica.

Bottlenecks. Alright, what are the causes of bottlenecks?

Well, MSDN suggests long-running active transactions, high network latency or low throughput, another workload blocking the redo thread, or the redo thread falling behind because of resource contention.
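
If you suspect the redo thread, the AG DMVs show the backlog directly; a sketch (run on the primary):

-- Redo backlog per database replica: a large redo_queue_size (KB) combined with
-- a low redo_rate (KB/sec) points at a redo bottleneck on the secondary.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)       AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,
       drs.redo_queue_size,
       drs.redo_rate
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;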

The solution:

Ultimately, my solution was to change the data source from the read-only secondary to the primary node, as assigned by the listener, on a R/W connection. Now I don’t care if the secondary lags behind in this particular instance, but it looks like I have some query tuning to do!
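
If you make a similar change, it’s worth a quick sanity check from the package’s connection to confirm where the extract actually lands; a sketch (the database name is a placeholder):

-- Which server answered, and is the database writable from this connection?
-- 'READ_WRITE' means the connection landed on the primary (R/W) copy.
-- 'YourDatabase' is a placeholder name.
SELECT @@SERVERNAME AS answering_server,
       DATABASEPROPERTYEX('YourDatabase', 'Updateability') AS updateability;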

I have added a form for feedback; it would be very helpful if you would take a few minutes to fill it out. My goal is to adjust the content to best help others resolve their problems.


How do I check the metadata of my temporal table?

There was an interesting question recently about checking whether a table had been created with DATA_CONSISTENCY_CHECK, whether the SYSTEM_VERSIONING option was set, and how to find the names of the history tables.

First up, DATA_CONSISTENCY_CHECK.

I have finished studying A LOT on temporal tables for my 70-761 and I knew most of the answers off the top of my head. The question regarding DATA_CONSISTENCY_CHECK and seeing if that value persisted anywhere was a head scratcher though. It seemed no one had asked a question in that way before and I did not find any links or articles that talked about that aspect.

I had to create a temporal table to see the results for myself. In short, DATA_CONSISTENCY_CHECK is a one-time flag used when you create or alter a table. It tells SQL Server to run DBCC CHECKCONSTRAINTS, validating that the data in the table you want to turn into a temporal table and the historical table receiving records from it line up. It will not allow SYSTEM_VERSIONING to be turned on if any record has a SysEndTime earlier than its SysStartTime.

If you try to look for the value DATA_CONSISTENCY_CHECK is set to, you won’t find it. It’s not persisted after run time; it is just a flag that says “perform a check, please”. It is also run by default without you specifying it, so I would say you can leave it out of your typical syntax unless you want to skip the check.
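
For reference, this is where the flag lives in the DDL; a minimal sketch with made-up table names, assuming the table already has its period columns and PERIOD FOR SYSTEM_TIME defined:

-- Made-up names: turn on system-versioning against an existing history table.
-- DATA_CONSISTENCY_CHECK = ON is the default; the check runs once, at ALTER time.
ALTER TABLE dbo.Employee
    SET (SYSTEM_VERSIONING = ON (
            HISTORY_TABLE = dbo.EmployeeHistory,
            DATA_CONSISTENCY_CHECK = ON));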

If you have temporal tables, you can see that it does not exist for yourself:

SELECT * FROM sys.tables
ORDER BY name;

SELECT * FROM sys.key_constraints;

SELECT * FROM sys.check_constraints;

SELECT * FROM sys.sysconstraints;

SELECT definition, name
FROM sys.check_constraints;

SELECT definition, name
FROM sys.default_constraints;

If you would like to run the check yourself, you can with this base command. You will want to adjust the portion after DBCC CHECKCONSTRAINTS to target your table rather than all constraints.

DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS

MSDN on DBCC CHECKCONSTRAINTS

Here is how Microsoft officially describes it:

When creating a link to an existing history table, you can choose to perform a data consistency check. This data consistency check ensures that existing records do not overlap. Performing the data consistency check is the default. Use this argument in conjunction with the PERIOD FOR SYSTEM_TIME and GENERATED ALWAYS AS ROW { START | END } arguments to enable system versioning on a table.

MSDN Source.

Generally, performing the data consistency check is recommended whenever the data between the current and history tables may be out of sync, such as when incorporating an existing history table that is populated with history data.

MSDN Source.

Now, is SYSTEM_VERSIONING currently set? That effectively means the table is in temporal-table mode. Here’s a quick way to check:

SELECT name, object_id, temporal_type, temporal_type_desc
FROM sys.tables
WHERE temporal_type IN (1, 2)
ORDER BY name;

Our last question was “How can I find the name of my history table?” We’re going to use our good friend the sys.tables object once more for this.

SELECT name
	,object_id
	,temporal_type
	,temporal_type_desc
FROM sys.tables
WHERE temporal_type_desc = 'HISTORY_TABLE'
ORDER BY name;
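
If you’d rather see each current table paired with its history table by name, sys.tables also exposes history_table_id; a quick sketch:

-- Pair each system-versioned table with its history table.
SELECT t.name AS temporal_table,
       h.name AS history_table
FROM sys.tables AS t
JOIN sys.tables AS h
    ON h.object_id = t.history_table_id
WHERE t.temporal_type = 2; -- 2 = SYSTEM_VERSIONED_TEMPORAL_TABLE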

My study guide to the 70-762: Developing SQL Databases

I couldn’t find a spreadsheet online that would help track the topics, ETA, and other notes, so I compiled all of my resources into a single spreadsheet to keep track of my studying and to help others in the community taking the 70-762. You can copy / make a copy of this and track yourself.

Spreadsheet here!

The tab Topic Resources has been taken from Daniel Calbimonte and his post on MSSQLTips. Huge thanks to him for taking the time to compile all of those resources on those topics!

In the Additional Resources tab, I’ve listed links to things I found incredibly beneficial to my studying for the 70-762. It also has a Time to Study table where you enter how many hours of studying you plan to do on certain days of the week, and it estimates the hours necessary based on what you estimate each topic will take you to learn. I’ve already filled in the hours for optimistic and pessimistic goals; as you fill in the actual hours worked, you’ll notice it adjusts for actual time spent instead of only the estimate.

The Topic and Hours page lists the topics from Microsoft’s exam page and breaks them into a table where you can track the hours you spend on each topic. This gives you a manageable way of figuring out how much time the whole thing will take. I’ve filled in my own optimistic and pessimistic estimates, so feel free to adjust them to match your own knowledge. I figured I would want at least 30 minutes to an hour on each topic, even if it’s just a brief overview to refresh myself.

The last page is the planner. This is where I keep track of the date I studied, the high / mid / detail level topics I worked on, the amount of time I worked on it, and there are two extra fields. One for notes if there is something I need to come back to or perhaps I found some additional resources I want to touch base on later. And then there’s the Other Topic column. I’m still reading through the Internal Pros book, so when I log time it’s specific to reading that book rather than any of the categories I’ve listed. I decided when I begin working on the 70-762 study guide I’ll begin filling in time on the topics and making sure I understand the topics to meet the detailed level of the topics listed by Microsoft. This will make for an interesting PowerBI graph later to see how my time was spent studying. Maybe I can use that data to study more efficiently for 70-767!

Happy studying and see you on the other side!