What is DAMA?

DAMA is the Data Management Association International, an association of data professionals. They have curated a certification program, a framework, and documentation for data management. There are many frameworks out there that you can use when dealing with data architecture:

  • DAMA
  • DCAM
  • CMMI
  • IBM
  • Stanford
  • Gartner

I’m not going to go deep into the others or their differences right now, but here’s a link from datacrossroads that talks about them all briefly. So, what exactly is the framework? Well, it is essentially a high-level guide to help you work toward Data Governance while improving your company’s data maturity.

Continue reading “What is DAMA?”

How to keep track of how many records were truncated?

Someone had an issue where a third-party app would truncate a table for maintenance, and they had no control over the truncate or when it happened. The issue was that they needed to know how many records were in the table before the truncate occurred. Continue reading “How to keep track of how many records were truncated?”
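The full write-up is behind the link above, but as a rough idea of the kind of approach involved, here is a minimal sketch that snapshots the row count on a schedule so the pre-truncate number can be looked up later. The logging table dbo.RowCountLog and target table dbo.SomeTable are made up for illustration; this is not necessarily the post’s solution:

-- Capture the current row count of dbo.SomeTable so it survives a later TRUNCATE.
INSERT INTO dbo.RowCountLog (TableName, RowCnt, CapturedAt)
SELECT 'dbo.SomeTable',
       SUM(ps.row_count),
       SYSDATETIME()
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.SomeTable')
  AND ps.index_id IN (0, 1);   -- heap or clustered index rows only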

What is the optimal way to order a clustered index?

Someone recently was asking me about indexes and how the order of the columns affected the index.

I have a table, and I want to create a clustered index on it. SSMS lets me choose the order of the columns when I make a clustered index. Logic would dictate that this must have some effect or else it wouldn’t be offered as an option. What factors are considered when determining the optimal order?

I read somewhere, on Stack Overflow I think, that the best way is to start with the column that has the most distinct values and then decrease from there, but then a guy below him said he’s wrong. I haven’t been able to find a good source of info on this. Any help appreciated!

Can you guess my answer? Continue reading “What is the optimal way to order a clustered index?”
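To make the question concrete, here is what the choice looks like in T-SQL. The table and columns are made up for illustration; the only difference between the two definitions is the column order:

CREATE CLUSTERED INDEX CIX_Orders ON dbo.Orders (CustomerID, OrderDate);
-- versus
-- CREATE CLUSTERED INDEX CIX_Orders ON dbo.Orders (OrderDate, CustomerID);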

T-SQL Tuesday #106 Trigger headaches or happiness?

Triggers are both a useful tool with a specific niche… and the devil. Thank you Steve Jones for hosting this week’s topic on triggers!

On the left we have triggers and on the right we have constraints.

When used appropriately, it’s a great skill to have in your repertoire. When used inappropriately, you will probably ruin someone’s day. Continue reading “T-SQL Tuesday #106 Trigger headaches or happiness?”
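As a quick illustration of that trade-off, here is a minimal sketch, with a made-up table and rule, of the same check written declaratively as a constraint and procedurally as a trigger; the trigger version has far more moving parts to get wrong:

-- Declarative version: a CHECK constraint.
ALTER TABLE dbo.Orders
ADD CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0);

-- Trigger version of the same rule.
CREATE TRIGGER trg_Orders_Quantity ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (SELECT 1 FROM inserted WHERE Quantity <= 0)
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Quantity must be greater than zero.', 1;
    END
END;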

How many Sequences can I have in SQL Server?

I thought this was an interesting question, and it makes sense to have some concern about it. If you are using Sequences instead of identity columns, you typically need more control over how those numbers are handled from creation to archive. So what if you need a lot of different Sequences stored on your server to handle your needs? Will you run out of Sequence objects that you can create?

This information is not easy to find; it’s one of those times where you need to read various articles and connect the dots. Let’s start simple:

What is a Sequence?

A sequence is a user-defined schema bound object that generates a sequence of numeric values according to the specification with which the sequence was created.

MSDN on Sequence definition reference.
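For anyone who has not used one yet, a minimal sketch looks like this (the object name is made up):

-- Create a simple sequence and pull the next value from it.
CREATE SEQUENCE dbo.OrderNumber
    AS INT
    START WITH 1
    INCREMENT BY 1;

SELECT NEXT VALUE FOR dbo.OrderNumber;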

Ok, it’s a schema-bound user-defined object. What does SQL Server say about how many user-defined objects we can have?

Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of all objects in a database cannot exceed 2,147,483,647.

MSDN on maximum UDF reference.

Summary answer: you can have 2,147,483,647 objects in a database, and Sequence objects count toward that total. This is not explicitly stated, but it can be inferred from the wording “… the sum of the number of all objects in a database cannot exceed 2,147,483,647.”
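If you want to see how far you are from that ceiling, a quick sanity check against the catalog views looks like this:

-- How many objects exist in the current database, and how many of them are sequences?
SELECT COUNT(*) AS total_objects   FROM sys.objects;
SELECT COUNT(*) AS total_sequences FROM sys.sequences;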

Snapshot isolation transaction failed in database (Availability Group and SSIS woes)


The Setup:

I have an SSIS package that runs a relatively basic ETL setup. We take data from TableA and move it to TableB. TableB is a heap; we index it in our ETL and drop it afterwards. Now we join TableB to TableC. The reason we don’t go straight from A to C is the data relationship: it can cause a cascading join that creates BILLIONS of reads when we plug in certain WHERE predicate values.

Once TableB is joined to TableC, we create our file out of that data and take it away to import-land, where we do important import stuff. This is the match that lit the fire.
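In rough T-SQL terms the flow looks something like this. The column names and the date filter are placeholders; the real work happens inside the SSIS package:

-- Stage: copy the slice we care about from TableA into the heap TableB.
DECLARE @CutoffDate date = '2018-01-01';

INSERT INTO dbo.TableB (KeyCol, Payload)
SELECT KeyCol, Payload
FROM dbo.TableA
WHERE LoadDate >= @CutoffDate;

-- Index the heap so the join to TableC behaves, then extract to the file.
CREATE NONCLUSTERED INDEX IX_TableB_KeyCol ON dbo.TableB (KeyCol);

SELECT b.KeyCol, b.Payload, c.ExtraDetail
FROM dbo.TableB AS b
JOIN dbo.TableC AS c ON c.KeyCol = b.KeyCol;   -- this result set becomes the export file

-- Clean up the temporary index afterwards.
DROP INDEX IX_TableB_KeyCol ON dbo.TableB;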

The error:

Snapshot isolation transaction failed in database ‘%.*ls’ because the object accessed by the statement has been modified by a DDL statement in another concurrent transaction since the start of this transaction. It is disallowed because the metadata is not versioned. A concurrent update to metadata can lead to inconsistency if mixed with snapshot isolation.

MSDN on the error.

The job ran three times. The first two runs produced the same error; on the third run, the job succeeded but no rows were imported.

The where:

I looked at the following timestamps:

– Job failure and SSIS package failure timestamps
– Availability Group log timestamps
– Staging table (Table B) timestamps
– Destination table (Table C) timestamps
– Historical query data timestamps

SSIS failed only milliseconds after an error appeared in the Availability Group logs. Likewise, I had a mismatch of timestamps between the staging table and the destination table: new data existed in staging but not in the destination table or the destination file.

I found a link however that gave me an idea!

What if the query taking data from TableB and TableC to a file is what was causing the issue, rather than something running in parallel blocking me? The job had a flow that should not cause issues like this; there is no parallelism in this package. It is acting, though, as if I were using two connections, one read-only and one read/write, and holding metadata locks on the table.

I looked at the Availability Group logs and voilà: lock_redo_blocked, taking out a SCH-M lock on the metadata of the table in question.
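If you want to look for the same event on your own servers, something along these lines should do it. This reads the built-in AlwaysOn_health extended event session files; the file name pattern is an assumption and may need the full path to your log directory:

-- Read the AlwaysOn_health extended event files and look for lock_redo_blocked events.
SELECT xe.object_name,
       CAST(xe.event_data AS XML) AS event_xml
FROM sys.fn_xe_file_target_read_file('AlwaysOn_health*.xel', NULL, NULL, NULL) AS xe
WHERE xe.object_name = 'lock_redo_blocked';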

The why:

Let’s go back to the setup, this time I’m going to make comments between the steps to help illustrate what’s going on.

I have a SSIS package and it runs a relatively basic ETL setup. We take data from TableA, move it to TableB.

This is on the primary node of a synchronous availability group, we have R/W privileges here.

TableB is a heap, we index it in our ETL and drop it afterwards.

Same deal, we are on the R/W connection in the AG.

Now we join TableB to TableC.

Rubber… meet road, road… meet rubber.

The primary AG node is having trouble replicating changes from node A to node B fast enough and is taking out a SCH-M lock on the table, which is causing our error. The extract source always points at the secondary node, which is in synchronous-commit mode.

Woah, wait! Your package succeeded the third time… but there were no rows? Your staging table must have had no records then.

Not quite, we had rows there, and the table we were joining to had matches as well. But when we queried the staging table on the secondary, the rows were not present.

Ok… that doesn’t make sense. You are in synchronous commit mode, node A persists as soon as node B does.

True, but per Microsoft there is a known problem with this replication, and it is not error-proof.

MSDN on why Availability Groups don’t love you anymore.

The client application completes an update on the primary replica successfully, but querying the secondary replica shows that the change is not reflected. This case assumes that your availability group has a healthy synchronization state. In most cases, this behavior resolves itself after a few minutes.

 If changes are still not reflected on the secondary replica after a few minutes, there may be a bottleneck in the synchronization work flow. The location of the bottleneck depends on whether the secondary replica is set to synchronous commit or asynchronous commit.

Those are the top paragraphs on that page. Ok… so that means it is possible that we read no rows from the secondary when they existed on the primary before the AG was able to replicate or lock the schema. Why?

Each successful update on the primary replica has already been synchronized to the secondary replica, or that the log records have already been flushed for hardening on the secondary replica. Therefore, the bottleneck should be in the redo process that happens after the log is flushed on the secondary replica.

Bottlenecks. Alright, what are the causes of bottlenecks?

Well, the MSDN suggests long-running active transactions, high network latency or low throughput, another workload blocking the redo thread, or the redo thread falling behind because of resource contention.
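If you want to check whether the redo thread is falling behind in your own environment, a reasonable starting point is the replica-state DMV (a minimal sketch, run on the primary):

-- How far behind is each secondary, and how fast is redo catching up?
SELECT DB_NAME(drs.database_id)       AS database_name,
       ar.replica_server_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,        -- KB of log not yet sent to the secondary
       drs.redo_queue_size,            -- KB of log waiting to be redone on the secondary
       drs.redo_rate                   -- KB/sec the secondary is currently redoing
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;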

The solution:

Ultimately my solution was to change the data source from the read-only secondary to the primary node assigned by the listener on an R/W connection. Now I don’t care if the secondary is lagging behind in this particular instance, but it looks like I have some query tuning to do!
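In connection-string terms, the change amounts to dropping the read-only intent so the listener routes the connection to the primary instead of a readable secondary. These are conceptual examples only; the listener and database names are placeholders, and the exact provider syntax in SSIS will differ:

Server=MyAgListener;Database=MyDatabase;ApplicationIntent=ReadOnly;   (routed to a readable secondary)
Server=MyAgListener;Database=MyDatabase;                              (default ReadWrite intent, lands on the primary)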

I have added a form for feedback; it would be very helpful if you would take a few minutes to fill it out. My goal is to adjust the content to best help others resolve their problems.

How do I check the metadata of my temporal table?

There was an interesting question recently about checking whether a table had been created with DATA_CONSISTENCY_CHECK, whether the SYSTEM_VERSIONING option was set, and how to find the names of the history tables.

First up, DATA_CONSISTENCY_CHECK.

I have finished studying A LOT on temporal tables for my 70-761 and I knew most of the answers off the top of my head. The question regarding DATA_CONSISTENCY_CHECK and seeing if that value persisted anywhere was a head scratcher though. It seemed no one had asked a question in that way before and I did not find any links or articles that talked about that aspect.

I had to create a temporal table to see the results for myself. In short, DATA_CONSISTENCY_CHECK is a one-time action flag when you create or alter a table. It tells SQL Server to run DBCC CHECKCONSTRAINTS, which validates that the data in the table you want to turn into a temporal table and the history table receiving records from it line up. It will not allow SYSTEM_VERSIONING to be turned on if a record’s SysEndTime is earlier than its SysStartTime.

If you try to look for the value DATA_CONSISTENCY_CHECK is set to, you won’t find it. It’s not persisted after run time; it is just a flag to say “perform a check please”. It is also run by default without you specifying it, so I would say you can leave it out of your typical syntax unless you want to declare that no check should be performed.
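For reference, this is where the flag would appear if you did want to spell it out. The table names are made up, and ON is already the default:

-- Link an existing history table and explicitly request the consistency check.
ALTER TABLE dbo.Employee
SET (SYSTEM_VERSIONING = ON (
        HISTORY_TABLE = dbo.EmployeeHistory,
        DATA_CONSISTENCY_CHECK = ON));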

If you have temporal tables, you can see that it does not exist for yourself:

select * from sys.tables
order by name;

select * from sys.key_constraints;

select * from sys.check_constraints;

select * from sys.sysconstraints;

select definition,name
from sys.check_constraints;

select definition,name
from sys.default_constraints;

If you would like to run the check yourself, you can with this base command. You will want to adjust what comes after DBCC CHECKCONSTRAINTS to scope it (an example follows).

DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS
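For example, to scope the check to a single table rather than the whole database (the table name is hypothetical):

-- Check only the constraints on one table.
DBCC CHECKCONSTRAINTS ('dbo.Employee');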

MSDN on DBCC CHECKCONSTRAINTS

Here is how Microsoft officially describes it:

When creating a link to an existing history table, you can choose to perform a data consistency check. This data consistency check ensures that existing records do not overlap. Performing the data consistency check is the default. Use this argument in conjunction with the PERIOD FOR SYSTEM_TIME and GENERATED ALWAYS AS ROW { START | END } arguments to enable system versioning on a table.

MSDN Source.

Generally, performing the data consistency check is recommended whenever the data between the current and history tables may be out of sync, such as when incorporating an existing history table that is populated with history data.

MSDN Source.

Now, is SYSTEM_VERSIONING currently set? This effectively means you are in temporal table mode. Here’s a quick way to check this:

SELECT name, object_id, temporal_type, temporal_type_desc
FROM sys.tables
WHERE temporal_type IN (1, 2)
ORDER BY name;

Our last question was “How can I find the name of my history table?” We’re going to use our good friend the sys.tables object once more for this.

SELECT name,
       object_id,
       temporal_type,
       temporal_type_desc
FROM sys.tables
WHERE temporal_type_desc = 'HISTORY_TABLE'
ORDER BY name;
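If you also want to see which history table belongs to which current table in one result set, sys.tables exposes history_table_id, so a self-join works (a minimal sketch):

-- Pair each system-versioned table with the name of its history table.
SELECT t.name AS temporal_table,
       h.name AS history_table
FROM sys.tables AS t
JOIN sys.tables AS h ON h.object_id = t.history_table_id
WHERE t.temporal_type = 2   -- SYSTEM_VERSIONED_TEMPORAL_TABLE
ORDER BY t.name;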