How long do the parts of my job take?

When you schedule a job in SQL Server, you can see how long the job took by right-clicking the job and clicking View History. If you want to see how long each step in the job took, you can click the little plus sign next to each instance of the job running and you'll get a list of information about each step.

While this is very nice, what do you do if you want to keep the information longer than the history allows? Or if you want to share it with people or graph the information in various ways? Graphing can be a good way to tell if you’re really getting the performance improvements you’re expecting. Fortunately, there’s a way to query this data directly from SQL Server.

If you want to get the data in an easily copyable or storable format, just query two msdb tables: sysjobhistory and sysjobs. Between them, you can query by job name and get the time the job ran and the duration of every step. You'll see the join in the query below. However, you should know there's something weird about the way the data is presented.

If you were to run a query and return the columns run_time and run_duration, you might expect to get the time back formatted as a time and the duration in seconds or milliseconds. What you actually get back is two ints in an HHMMSS format. If you ran your job at 3:15, you'll see the number 31500 in the run_time column. If your job took 5 minutes and 47 seconds, you'll see 547 in the run_duration column. Neither of these is formatted well for easy use. You'd want the run_time column to show the time in hours and minutes. You'd want the run_duration column either in plain seconds, if you're trying to graph it or do any math with it, or formatted the way you're used to seeing time: 547 looks like five hundred forty-seven when it is actually 5 minutes 47 seconds.

In the following query, I change both columns to be more easily readable.

SELECT run_date, run_time, (hrs * 3600) + (mts * 60) + secs as run_duration_seconds
FROM
(SELECT run_date, run_time/100 as run_time, Right(run_duration, 2) as secs,
Right(run_duration/100, 2) as mts, run_duration/10000 as hrs
FROM msdb..sysjobhistory H
INNER JOIN msdb..sysjobs J
ON J.job_id = H.job_id
WHERE J.[name] = 'MyJob'
AND step_id = 0) Z
ORDER BY run_date DESC

You can see in the subquery that I integer-divide run_time by 100 to drop the seconds from the end. What's left is hours and minutes (315 for a 3:15 run), which is easy to format in whatever program you choose for display.

For run_duration, I take the rightmost two characters as the seconds. Integer division by 100 drops the seconds, so the rightmost two characters of that result are the minutes, and division by 10,000 leaves the hours. The final step is to multiply the hours by 3,600 and the minutes by 60, add the seconds, and you have a nicely graphable, base-10 integer of your step duration in seconds. I find this little transformation results in more usable data, and it is easier to do in the query than to try to do the calculation in Excel or another program later. I also find it better to do before I store the data so I don't have to remember to do it later.
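If you'd rather see run_duration formatted like clock time for people instead of graphs, one option (a sketch against the same tables; the padding assumes the duration stays under 100 hours) is to zero-pad the value to six digits and stuff colons into it:

SELECT run_date,
    -- 547 becomes '000547', then '00:0547', then '00:05:47'
    STUFF(STUFF(RIGHT('000000' + CAST(run_duration as varchar(6)), 6), 3, 0, ':'), 6, 0, ':') as run_duration_hhmmss
FROM msdb..sysjobhistory H
INNER JOIN msdb..sysjobs J
ON J.job_id = H.job_id
WHERE J.[name] = 'MyJob'
AND step_id = 0
ORDER BY run_date DESC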



Stored Procedures don’t necessarily stop at an error

I normally expect statement order to save me in my stored procedures. If I have six statements and the third one fails, I expect the fourth through sixth not to run. Boy, was I wrong. Normally I test my stored procedures a step at a time and don't end up seeing this kind of behavior, but I now have stored procedures running that are occasionally fed bad data, and that's how I discovered this problem. I'm using SQL Server 2008 R2, so I don't know if this holds true for older versions, but I think it likely does. Fortunately, SQL Server 2008 has a solution: TRY...CATCH, which has actually been available since SQL Server 2005.

The stored procedure that revealed this error to me is one of the ones I use to import data into our warehouse daily. It takes the data from a table where it has been accumulating from the previous day and moves it to an interim table. The data is then copied from the interim table to the permanent table, but is left in this interim table so the previous day’s load can be more easily referenced. The problem comes from the truncates. Once the data is copied to the interim table, the accumulation table is truncated to prepare for the next day’s data. The interim table is truncated at the beginning of the process to remove the data we’d been storing for reference.

The other night, I was alerted to an error in the load: a primary key violation on this table. I went to see what the violation was, but the accumulation table had already been truncated. I thought for a moment that perhaps there just was no data, and wondered why I'd gotten a primary key violation at all. After all, the truncate came after the insert; if the insert had failed, surely the truncate shouldn't have happened. I did some testing and found that no, the truncate had indeed happened, and I needed to make some changes to prevent this from happening in the future. (Fortunately, the data was easily replicated, so I didn't lose anything.)

I’ll be using my Country table for this example. I created three versions of the table for this test: StartCountry, InterimCountry and EndCountry. StartCountry simulates the origin or accumulation table, InterimCountry the interim table and EndCountry the permanent table.

CREATE TABLE StartCountry(
CountryAlpha char(2) NOT NULL,
LongAlpha char(3) NULL,
NumberCode char(3) NULL,
CountryName varchar(45) NULL
)

CREATE TABLE InterimCountry(
CountryAlpha char(2) NOT NULL,
LongAlpha char(3) NULL,
NumberCode char(3) NULL,
CountryName varchar(45) NULL,
PRIMARY KEY CLUSTERED (CountryAlpha ASC)
)

CREATE TABLE EndCountry(
CountryAlpha char(2) NOT NULL,
LongAlpha char(3) NULL,
NumberCode char(3) NULL,
CountryName varchar(45) NULL
)

To simulate the error, you'll notice there's a primary key only on the interim table. In the actual database there's one on the permanent table as well; I just didn't bother to add one here.

This is the stored procedure that I had been using

TRUNCATE TABLE InterimCountry

INSERT INTO InterimCountry(CountryAlpha, LongAlpha, NumberCode, CountryName)
SELECT CountryAlpha, LongAlpha, NumberCode, CountryName
FROM StartCountry

TRUNCATE TABLE StartCountry

INSERT INTO EndCountry(CountryAlpha, LongAlpha, NumberCode, CountryName)
SELECT CountryAlpha, LongAlpha, NumberCode, CountryName
FROM InterimCountry

Nice and simple, no input parameters, doesn’t return anything, just moves some data and ends. Unfortunately, even though the error happened in the first INSERT INTO, the second TRUNCATE still ran. I had thought it would stop when the error happened, but it kept going.

I created the primary key conflict by running the following query twice

INSERT INTO StartCountry(CountryAlpha, LongAlpha, NumberCode, CountryName)
SELECT CountryAlpha, LongAlpha, NumberCode, CountryName
FROM Country

I ran the stored procedure and, sure enough, there was no data in any of the three tables. I ran the insert twice again and prepared my new stored procedure. This one uses TRY…CATCH. All my stored procedures probably should use TRY…CATCH, but old habits are hard to break.

I have two main inserts in my stored procedure, so I’m using two TRY…CATCH blocks, each with its own TRANSACTION block.

TRUNCATE TABLE InterimCountry

BEGIN TRY
      BEGIN TRANSACTION
      INSERT INTO InterimCountry(CountryAlpha, LongAlpha, NumberCode, CountryName)
      SELECT CountryAlpha, LongAlpha, NumberCode, CountryName
      FROM StartCountry

      COMMIT TRANSACTION

END TRY
BEGIN CATCH
      -- Guard against rolling back when no transaction is actually open
      IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION
      RETURN
END CATCH

TRUNCATE TABLE StartCountry

BEGIN TRY
      BEGIN TRANSACTION
      INSERT INTO EndCountry(CountryAlpha, LongAlpha, NumberCode, CountryName)
      SELECT CountryAlpha, LongAlpha, NumberCode, CountryName
      FROM InterimCountry
      COMMIT TRANSACTION

END TRY
BEGIN CATCH
      -- Guard against rolling back when no transaction is actually open
      IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION
      RETURN
END CATCH

Now this works as intended. I ran the stored procedure with the primary key conflict in place and it stopped before truncating the source table. I ran it without the violation and it truncated where it was supposed to and populated where it was supposed to. I was surprised to see my original stored procedure fail the way it did, but I'm very happy that SQL Server 2008 has a way to solve that problem without elaborate workarounds.
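One more belt-and-suspenders option worth knowing about (not something this procedure relies on, just a setting I'm noting here): SET XACT_ABORT ON makes most runtime errors, including constraint violations, abort the whole batch and roll back any open transaction instead of letting execution continue with the next statement.

SET XACT_ABORT ON -- most runtime errors now terminate the batch and roll back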

MS BOL Using Try…Catch in T-SQL
MS BOL Try…Catch


Send Text notifications

I have most of my jobs set up to email me either when they fail or when they complete, depending on how sensitive the job is. If I need to know it is taking too long, I set it up to email me on completion. This works fairly well, but some of these jobs run in the middle of the night and I need to be woken up if they fail. An email is nice, but what would be great is a text to my phone. I checked the Notifications tab on one of the jobs and the options it lists are Email, Pager and Net Send. The latter two are pretty dated, and I could have sworn that the last time I looked in BOL, Pager and Net Send were deprecated. I was surprised Text Message isn't one of the options. Still, all was not lost: there's a way to send text messages to phones using email.

I went into my operator profile under Operators, and under E-mail Name I put my text email address right after my work and home addresses. The texts you get aren't pretty, but really all they need to do is let you know you need to log in and check what's going on.

Some of the jobs need to notify me and a colleague, so I have a special operator set up to email both of us. I went to add both of our text email addresses to that operator, but I ran into a problem: the E-mail Name field for operators has a limit of 100 characters. Since I already had three email addresses in there, I hit the limit long before I could finish a text email address. It took me a few tries to figure out what was going on, because it doesn't notify or warn you that you're over the limit; it just truncates as it saves, and when the operator is used it sends to whatever valid addresses remain and ignores the rest.
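If you suspect you've hit the limit, you can check what actually got saved by querying msdb directly; a quick sketch:

-- The address list is silently truncated at 100 characters on save
SELECT name, email_address, LEN(email_address) as email_length
FROM msdb.dbo.sysoperators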

Since there’s no way to add multiple operators to a notification event, there are only 2 ways to get around this. The first is to set up a group email in Outlook with all the emails you want in it and use that as the email address in the special operator. This can be the easiest, but if you don’t have the authority to create and modify these groups, you have to go through someone else every time you want to make a modification or create a new group which can add unacceptable time to the process.

The other way is more difficult and fiddly. You can send emails by creating an SSIS package that sends the email to as many people as you want (I have some that send to 8 people with no problem), but then you have to change your job so that if any step fails, it goes to the email step, and if no step fails, it never reaches that step. If you have a job with many steps, this can become tedious. However, it does allow more flexibility: if there are steps that can fail without needing the whole job to stop, you can still get an email while allowing the job to continue. Unfortunately, that increases the complexity yet again, as you need to create a step for every email that could possibly go out.

Since there is no way to tell the email step that if it was called by step 6 it should go to step 7, and if it was called by step 12 it should go to step 13, you need a separate email step for each point where the job should resume. For most jobs, the email group is the proper method, but I do have some that are complex enough about when they can continue and when they need to stop that I may end up creating the SSIS email package and adding another dozen steps so the job can email me but keep going.
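The resume-point routing is just job-step metadata, so at least it can be scripted rather than clicked through; here's a sketch using the documented msdb procedure, with a hypothetical job name and step numbers:

EXEC msdb.dbo.sp_update_jobstep
    @job_name = N'NightlyLoad',   -- hypothetical job name
    @step_id = 6,
    @on_success_action = 4,       -- 4 = go to a specific step
    @on_success_step_id = 7,      -- success skips the email step
    @on_fail_action = 4,
    @on_fail_step_id = 13         -- failure goes to the SSIS email step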


A weird result from IN (SELECT

I was recently building a query that referenced a SQL Server 2000 database on a linked server. I'm using SQL Server 2008 R2 on my end, and I'm used to Intellisense suggesting the columns in the table I'm querying. Unfortunately, it doesn't make these suggestions for the linked server in question, but I thought I remembered the columns, so I just built the query without checking.

For example, I had a local table

CREATE TABLE DailySales(
SaleID int NOT NULL,
SaleDesc varchar(100) NOT NULL,
TenderType int NOT NULL,
Amount smallmoney NOT NULL
)

and I wanted to see if the sales IDs existed in another table on another server

CREATE TABLE SalesLoad(
Sale_ID int NOT NULL,
Sale_Desc varchar(100) NOT NULL,
Load_Date datetime NOT NULL
)

I built my query

SELECT SaleID, SaleDesc, TenderType, Amount
FROM DailySales
WHERE SaleID IN (SELECT SaleID FROM OldServer.DBName.dbo.SalesLoad WHERE Load_Date = '6/1/2011')

The table DailySales had 32,450 records. The number of matching records in SalesLoad was 560. How many records did I see returned?

32,450

At first I was very perplexed. The query must have been valid; it returned records! So why was it returning every record in the local table?
The problem is that there's an error in the IN (SELECT. It should have read

SELECT Sale_ID FROM OldServer.DBName.dbo.SalesLoad WHERE Load_Date = '6/1/2011'

But I'd left out the underscore in the column name. Instead of failing and telling me the query was wrong, it returned every record in the local table. If there's no data in the table referenced in the IN clause, no records are returned either way, but if there is data, it returns everything in the parent table without letting you know anything is wrong with your query. Yet another reason it is important to have an idea of what should be returned from a query, so you don't just assume whatever comes back is correct because data came back.

If you highlight the query in the IN clause and try to run it by itself, you'll get an error telling you the column name is wrong. I tried it again later with both tables on the same server and the same thing happens, and Intellisense won't flag it as a mistake either. What's going on is standard SQL scoping: since the subquery's table has no column called SaleID, the unqualified name resolves to the SaleID in the outer query. The subquery becomes a correlated one, effectively WHERE SaleID = SaleID, which is true for every row in DailySales as long as SalesLoad has at least one row matching the date. And because IN is a semi-join, the rows don't multiply the way they would in a CROSS APPLY; the outer table just comes back whole.
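The defensive habit that prevents this is to qualify every column in a subquery with a table alias. With aliases, the typo turns into a hard error instead of a silently wrong result; a hypothetical rewrite of the query above:

SELECT D.SaleID, D.SaleDesc, D.TenderType, D.Amount
FROM DailySales D
WHERE D.SaleID IN (SELECT SL.Sale_ID
    FROM OldServer.DBName.dbo.SalesLoad SL
    WHERE SL.Load_Date = '6/1/2011')
-- Writing SL.SaleID by mistake now fails with an invalid column name error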


Whose data is it?

Recently I needed to import some data from a flat file into SQL Server. As I was creating the table for it, I tried to determine what the data actually was. There wasn't very good documentation for it, so I played around with the numbers a little to see how earlier columns were used in calculations to come up with later columns. After I'd worked out most of them, there were still quite a few that just didn't seem to correlate with anything, so I went to one of the people who'd worked with these files and had been with the company a long time.

He managed to identify some of the remaining columns, but the response that interested me was when he said, "You shouldn't bother with the columns after this one, the data is usually wrong." That's not the way I tend to work. I don't think that's my call to make, and since he's not the business user, I don't really think it is his either.

In this kind of situation, my instinct is to bring all the data into the database and document it as best I can. His response was that no one reads documentation. Even if that's true, at least if I document the data I've done all I can for future consumers of it. If I provide all the data to the business user, they can decide whether or not to use it. If I don't provide it, they'll either never know it exists or might later decide they need it, and then the import process has to change and the back data (if it has been kept) has to be imported. If it hasn't been kept in the original files, it is most likely lost.

As the person storing the data, I have little idea what might be done with it. I can get more of an idea if I'm allowed to discuss the system with the business user, but either way I shouldn't be the one deciding whether or not to save data that is presented to me for storage. If the data is bad, it can be ignored just as easily in the table as out of it. And if the data is demonstrably bad, the business person can use it to confront the data provider and attempt to get it corrected. If the data is missing, they can't do that, and they might be paying for something they don't even know I'm filtering out.

We frequently hear the advice "It is your data," with admonitions to protect it, make sure it is good data, make sure it is backed up and so on, but it isn't really my data; I'm just holding it for the people who own it. All those things about keeping it safe and clean still apply, but in the end someone else needs to make the call about keeping or discarding it, and that someone should be a business user with the authority to do so and the willingness to put any requests to discard data in writing. You never know what the next person to inherit the data may want.


Oracle to SQL Server

It can be hard getting back to blogging after some time off, but here goes. I moved in March/April and I'll be unpacking for a long time; I do want to be blogging, it is just hard to find the time. I figure the best way to get back into it is to start small.

When importing data from an Oracle database to a SQL Server database, there are all kinds of strange conversions you have to make. In part it looks this way because of the syntax and commands you have to use: you have to use an OPENQUERY select to access the data, and if you want to limit what you're returning, you end up having to use dynamic SQL to build a string and then execute it. As a result, you end up with a LOT of single quotes, and it can be hard to determine exactly how many you need on each side of a variable. For example:

DECLARE @TestQuery as varchar(2000)
DECLARE @TestDate as varchar(20)

SET @TestDate = CONVERT(VARCHAR(20), GETDATE(), 106)

SET @TestQuery = 'INSERT INTO Sales(Store, SaleDate, SaleAmount, DataSource)
SELECT Store, SaleDate, SaleAmount, ''Oracle''
FROM (SELECT Store, SaleDate, SaleAmount
    FROM OPENQUERY(Registers, ''SELECT Store, SaleDate, SaleAmount
FROM RegisterRecord
WHERE SaleDate = ''''' + @TestDate + ''''''')) AS R'

EXEC (@TestQuery)

As much fun as it can be to figure out how many quotes you need inside a quoted string so that you still end up with the right number of single quotes within the string, that's not the part that bit me recently.
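A tip that takes some of the guesswork out of the quote counting (just a habit, not something the code above depends on): PRINT the assembled string instead of executing it, and eyeball the result before switching back to EXEC.

PRINT @TestQuery -- inspect the assembled SQL, then swap back to EXEC (@TestQuery)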

We're upgrading from a SQL Server 2000 database to a SQL Server 2008 R2 database. The code on SQL Server 2000 to get the data from Oracle is almost the same as on SQL Server 2008 R2, with one important difference. The above SQL is from 2000 and it works well. It will also work if you just pass a datetime variable.
I'm not a fan of using formatting or passing text instead of the actual data type, so when I converted this to 2008 R2 I declared @TestDate as datetime and tested it.

It didn't work. I tried it with the original CONVERT; it still didn't work. What finally worked?

DECLARE @TestDate as Date = CAST(GETDATE() as date)

SQL Server 2000 doesn't have this data type, but from 2000 Oracle was able to handle the implicit conversion from datetime, and formatting the date as text worked as well. When passing the value from 2008 R2, it needed to be a date for Oracle to parse it correctly.


Tool Choice

I've been wanting to set up a portion of my site as a shared repository where my friends and I can post recipes and talk about cooking. My immediate thought was to use a wiki, because wikis are all about shared data. They're designed to be used by multiple users and to organize information, but as I looked into it I started questioning this assumption. Most of the people who would use this aren't technical, and manually updating several pages to keep searchability and categories properly maintained wasn't likely to happen. I was also surprised to find that none of the wiki software I examined in my research supported tags.

As a result, I think I'm going to attempt to set up a multi-user WordPress blog. WordPress allows both categories and tagging, and it can all be done on the entry page. With a wiki, the users would create the entry, then have to update the navigation page with links for the ingredients, and then the sub-pages showing which recipes use each ingredient. For a complicated recipe, that could be a dozen pages. With tags, you just add a dozen tags and it does all the linking for you.


Database Design

One of the things I like best about working with computers is designing systems, whether that's programs, databases, interfaces or all three and more. Lately I've been working primarily on databases, and I thought I'd talk a little about database design. At the earliest stages of database design, it doesn't matter what database system you're working with; it can be as simple as Access or MySQL or one of the more advanced systems like Oracle or, my current favorite, SQL Server. There are two primary starting points with database design: designing from scratch or redesigning an existing system. Let's start with designing from scratch.

Ideally, you'll get to do this step whether you're designing from scratch or redesigning an existing system, but it isn't as necessary when redesigning, since you can gather much of the information from the existing system if you absolutely have to. Either way, you'll get a better final product if you can start with meetings with the business people and end users. When designing a database from scratch, you'll want to talk to both of these groups and to any programmers who will end up interfacing with the database you create. Take detailed notes on what they tell you they want in the database, but try to steer the conversation as much as possible toward what they are trying to accomplish with the system. Business people and end users have ideas about what they want the system to do, but they will frequently give you much better ideas about what the system needs to do if they stick to their area of expertise and tell you what they're trying to accomplish and how the system will integrate with the business.

This is the time to sound them out on any ideas you may have about the project. Use your area of expertise to flesh out what they're telling you and to complement their knowledge of the business. This is also the time to begin to set expectations. If you don't let them know what can be done, what is reasonable to do and the time frames involved in the various scenarios, you'll end up with a botched project.

When gathering information to design a database, you frequently want to start with the scope of the database: how much of the business will it cover, how frequently will data come in and go out, how many sources will you have for the data and so on. You'll get much of the next step while pursuing the first, but once the first step is well underway you can concentrate more on the second: thinking about how the data is grouped and the specific pieces of information to group. This is what will eventually become your tables and columns. You'll tend to get the information grouped the way the presenters of the information think it should be grouped, but keep in mind that you'll likely want to rearrange it all when putting the database together. This is also a good time to gather the terms they use for their data and what those terms really mean. You won't always want to use their terms for column or table names, as business terms can be overly specific.

When talking to the business people and end users, talk about the future of the data. How do they see it being used as time goes on? What expansion might occur? Being prepared for these developments not only makes it easier on developers in the future, it often lets you better arrange the data now. As you're having these conversations and taking these notes, you'll be learning their business. The better you know their business, the better the system you'll create; however, you have to keep a step or two back so you can look at the data without being locked into the processes they currently use. When they're telling you the details about what they want to do with the data and what all the data is, they'll be looking at it through processes they've been using for a long time. Frequently those processes are dictated by the systems they're using, and that's exactly what you're trying to get away from. This is why you want to focus more on what needs to be done than on how they currently do it. I'll talk more later about ways to ease into talking to the people you're working with about changing the way they do things, but while you're gathering information, don't let that bother you. You're trying to make the best system you can to start with; you can always compromise later if you have to.

After you've done all the gathering, you've finished the "generic" part of the design. Now you have to take your system into account in determining your column types, where you want to store the data, and even whether you want to store the data. When I talk about that, I'll use the examples of Access and SQL Server.


Column Naming

When you're designing your tables, it is important to keep track of the names you're using for the columns. Don't just name the ID columns ID. Name them so you'll know what table the ID belongs to. For example, if you have a music database and you have a table for the Artist, you could name the ID column ArtistID. Try to keep your column names unique throughout the database to minimize confusion about what table you're referencing. There's one very important exception to this, however. If your column is a foreign key, keep the name the same as in the origin table. This will prevent future confusion.

Unfortunately, I see databases fairly frequently where the foreign key column has a different name from the origin column. To make matters worse, there's frequently no explicit foreign key declared; you're just supposed to know where the link is. For example, in the music database mentioned before, if you have an Artist table whose primary key column is called ArtistID and a table called Album, name the foreign key column on Album ArtistID as well. If you lose track of what you're naming things, it is entirely possible to make that column BandID or PerformerID, and when you go back you'll have little idea what it was intended to reference.
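Here's roughly what that looks like in DDL (a sketch with made-up name columns), with the foreign key declared explicitly so the link is documented in the schema itself:

CREATE TABLE Artist(
ArtistID int NOT NULL PRIMARY KEY,
ArtistName varchar(100) NOT NULL
)

CREATE TABLE Album(
AlbumID int NOT NULL PRIMARY KEY,
ArtistID int NOT NULL, -- same name as in Artist
AlbumName varchar(100) NOT NULL,
CONSTRAINT FK_Album_Artist FOREIGN KEY (ArtistID) REFERENCES Artist (ArtistID)
)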

This gets much worse in the business world where the databases are larger and the data more generic. To help keep track of this kind of information, I highly recommend Visio. There are other database diagramming programs out there, but Visio is the one I’ve used most. Having it all laid out in front of you before you begin actually creating the tables can help make sure the design is solid and help you make sure you haven’t made any crazy naming decisions.


Redundancy

There are certain types of redundancy that are good in the IT world: backups, failover servers, recovery plans and the like. Most redundancy, however, is bad. In the programming world, bad redundancy comes in the form of repeated code instead of reusable functions. Much of this is either leftover code ported from upgrade to upgrade, persisting from a time before subroutines were the norm, or newer code from programmers who started in languages like COBOL that didn't allow for reusable code. When you come upon such code, you should endeavor to consolidate it. When you have multiple instances of code that is supposed to do the same thing, you run the chance that it won't do the same thing. The more copies there are, the more chances there are for them to differ or to fail.

The same holds true with data. When you store data in multiple tables and multiple columns, you have a greater chance for it not to match up. One of the last things you want in your database is to join two tables and have the data not match up properly. If, for example, you have the employee name in two places (say, FirstName and LastName in the Employee table and FullName in the Department table) and you do a join, what do you do when you end up with
FirstName = Lawrence
LastName = Simmons
FullName = Jane Tauscher

If you run into this, you'd better have a way to find out which is the correct entry. Depending on how and when the data is updated, you can't just assume the Employee information is correct. If nothing else, you'll have to figure out how the Department record came to be wrong. Whenever possible, hold that data in only one place. If you have a secondary database as a data warehouse for reporting that is only updated from the primary database, you can store the data in multiple places to make your reports run faster. If you only have the primary database, find some other way to speed up your reports.
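A quick way to surface these contradictions before a report does it for you (a sketch; the EmployeeID join column and the space-separated name format are assumptions, since the real schema would vary):

-- Rows returned here are employees whose two stored names disagree
SELECT E.EmployeeID, E.FirstName, E.LastName, D.FullName
FROM Employee E
INNER JOIN Department D
ON D.EmployeeID = E.EmployeeID
WHERE D.FullName <> E.FirstName + ' ' + E.LastName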
