Recently I needed to import some data from a flat file into SQL Server. As I was creating the table for it, I tried to determine what the data actually was. The documentation wasn’t very good, so I played around with the numbers a little to see how earlier columns were used in calculations to produce later columns. After I’d worked out most of them, there were still quite a few that didn’t seem to correlate with anything, so I went to one of the people who had worked with these files and had been with the company a long time.
He managed to identify some of the remaining columns, but what interested me most was his comment: “You shouldn’t bother with the columns after this one, the data is usually wrong.” That’s not the way I tend to work. I don’t think that’s my call to make, and since he’s not the business user, I don’t really think it is his either.
In this kind of situation, my instinct is to bring all the data into the database and document it as best I can. His response was that no one reads documentation. Even if that is true, by documenting the data I’ve at least done all I can for its future consumers. If I provide all the data to the business user, they can decide whether or not to use it. If I don’t provide it, they’ll either not know it exists or might later decide they need it, and then the import process has to be changed and the historical data (if it has been kept) has to be imported. If it hasn’t been kept in the original files, it is most likely lost.
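As a rough illustration of "bring everything in and document it," here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for SQL Server, and the sample file, the column names, the `staging_import` and `data_dictionary` tables, and the notes are all hypothetical. It is one possible shape for the idea, not the process I actually used.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; the real file's columns are not shown here.
SAMPLE_FILE = """qty,unit_price,total,mystery_1,mystery_2
2,5.00,10.00,17,X
3,1.50,4.50,,Q
"""

# Hypothetical notes gathered while investigating the file.
COLUMN_NOTES = {
    "qty": "Quantity.",
    "unit_price": "Price per unit.",
    "total": "Appears to equal qty * unit_price.",
}


def import_everything(conn, text, table="staging_import"):
    """Load every column of the file as text and record what is known about each."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)

    # Keep all columns, stored as TEXT so nothing is lost to type guesses.
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)

    # A simple data dictionary: one row per column, whether understood or not.
    conn.execute(
        "CREATE TABLE data_dictionary (table_name TEXT, column_name TEXT, note TEXT)"
    )
    conn.executemany(
        "INSERT INTO data_dictionary VALUES (?, ?, ?)",
        [
            (table, name, COLUMN_NOTES.get(name, "Meaning unknown; confirm with the business user."))
            for name in header
        ],
    )
    return header


conn = sqlite3.connect(":memory:")
columns = import_everything(conn, SAMPLE_FILE)
```

The unexplained columns are still loaded, and the dictionary says plainly that their meaning is unknown, so whoever inherits the table can see both the data and the limits of what was understood about it.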
As the person storing the data, I have little idea what might be done with it. I have some idea, and can get a better one if I’m allowed to discuss the system with the business user, but either way I shouldn’t be the one deciding whether or not to save data that is presented to me for storage. If the data is bad, it can be ignored just as easily when it is in the table as when it isn’t. And if the data is demonstrably bad, the business user can use it to confront the data provider and attempt to get it corrected. If the data is missing, they can’t do that, and they might be paying for something that they don’t even know I’m filtering out.
We frequently hear the advice that “It is your data,” along with admonitions to protect it, make sure it is good data, make sure it is backed up, and so on. But it isn’t really my data; I’m just holding it for the people who own it. All those things about keeping it safe and clean still apply, but in the end someone else needs to make the call about keeping or discarding it. That someone should be a business user with the authority to do so and the willingness to put any request to discard it in writing. You never know what the next person to inherit the data may want.