Monday, March 9, 2009

Finding Missing Dates With a Chart

Yesterday I was reviewing a report of sales and shipment data for 2007 and 2008. The data was very detailed with order numbers and quantities and shipdates. I wanted to know if any shipdates were missing (dates for which no shipping occurred) or if there were really shipments for every day in 2007 and 2008. This is how I looked for any missing dates:

I created a straight table chart with SHIPDATE as the dimension. I created two expressions. One expression, labeled Missing Date, looked like this:

If(Not IsNull(Above(SHIPDATE)),If(SHIPDATE-Above(SHIPDATE)<>1,Date(Above(SHIPDATE)+1)))

And the other expression, labeled Missing Date Count, looked like this:

If(Not IsNull(Above(SHIPDATE)),If(SHIPDATE-Above(SHIPDATE)<>1,(SHIPDATE-Above(SHIPDATE))-1))

The Chart Above() function lets the expression look at a value from the row above. Since the chart is sorted by SHIPDATE, the SHIPDATE value in the current row should be one day greater than the row above. If it isn’t then there is at least one missing date. The check on IsNull is required because there is no row above for the first row on the chart.

The chart worked ok. I tried it out on a sample of the data where I had intentionally excluded two dates from loading in the loadscript and it worked fine.
The total line for the Missing Date Count tells me if there are any missing dates. With this method, though, you have to scroll through down through the chart to find the specific missing dates. You can’t sort the chart to bring missing dates to the top because that would interfere with the Above() function in the expression (I disabled the interactive sort capability for this chart to make sure nobody tried changing the sorted order). The other drawback is that a gap of several consecutive missing dates only shows the one missing date that begins the gap (the Missing Date Count though shows a count of how many dates are missing in that gap).

This expression works in a chart with more dimensions too. For example, the dimensions might be ship-to-country and shipdate. Then the chart would show missing shipping dates (or dates where no shipping occurred) by ship-to-country. It only makes sense though to look for missing shipping dates within data where there is an assumption about shipping every day or every week day in order that your assumptions are tested versus the data.

I know that there are methods that could be used in the loadscript to look for missing dates and there are sql queries that can look for missing members of a sequence in the database. But, if anyone knows of a better method for finding missing dates using only QlikView sheet objects I’d be interested to know how.

No comments: