Wednesday, August 31, 2016

Is My KPI Getting Worse or Getting Better

We have a lot of QlikView documents that measure various kinds of business planning performance. The measures are usually a calculation of the difference between planned values and the actual values. Then the calculation is normalized by dividing it either by the planned or actual values so that the measure can be shown on a document as a percentage.

Key measures or key performance indicators (KPI) like that, if they are well chosen and well designed, give people an idea of how well a business process is performing. I believe that the measure can be even more useful if we add to it an indication of whether the measure has been getting “better” or getting “worse” over time.

Most of the performance measures we use are aggregation calculations that can be used with various dimensions like customer, product, or geographic region. When you show a measure like that in a line graph where the x-axis is time (for example, weeks or months) the line is usually jagged with up and down measures and maybe a “hump” or two. 

The first thing you can do to help someone figure out if the measure is getting better or worse is to use a trendline. In a line graph, go to the chart properties and pick the expression where you’d like to see the trend, find the trendlines section of the expression dialog and click the Linear check box. Now, click OK and you should see that QlikView has drawn a straight line across your jagged line graph. The upward or downward slope of the line can give a person some idea of whether the measure has been getting better or worse.
Note: I usually modify the Presentation tab of the chart properties and set Trendline Width to a narrow line (maybe 0.5) so it doesn’t compete with the main graph line for attention. 

If you stop reading this post at this point, then you’ve already picked up something useful.

Some users of these documents say that they would like the document to point out the areas where a particular measure is getting very bad so they know where to spend their time or resources. We can use a function already built in to QlikView to provide us with a number that would, if it was a trendline in a graph, be the slope of the line. 
If the line is going uphill from left to right, then that would be a positive slope and if it is going downhill then that would be a negative slope. A flat, level line would have a slope of zero. I’m going to describe here how to build a calculated dimension for a chart that can show the 20% of things where a measure has been deteriorating the worst over time.

First, to simplify the description, imagine that you have taken the performance measure expression from your straight table and put it into a variable. We’ll call the variable vKPI.
The time dimension field in the example is PWEEK.
And we are going to use this calculated dimension in a straight table where it will be labeled 20% of Projects Where KPI Has Deteriorated the Most
The linest function is used to calculate the slope of the imaginary graph line. 
The Rank function will help identify the 20% of worst performing PROJECTs. 
Aggr is used to create an array of PROJECT values for our chart dimension. In this example, the higher values indicate a deteriorating measure but you can flip the calculation around if your logic is reversed. 

,1)-1) / Count(distinct total [PROJECT]) < 0.20, [PROJECT]), [PROJECT])

I know this is one of the more complex topics I’ve tackled in this blog but the result in a document that analyzes key performance indicators is valuable. In the documents where I’ve used this concept I usually provide a method for the user to choose what dimension field they would like to use, so it is not limited to PROJECT so the user could choose PRODUCT, SALESPERSON, or REGION, etc. And then for even more flexibility, I put the calculated dimension into a cycle group made up of other calculated dimensions that identify things that need attention.

In my next post, I’ll show how a related calculation could be used to color code items in a chart so that the items where performance has been getting worse are tagged with red and items where performance has been getting better are tagged with green.


Thursday, December 24, 2015

Chart Adjusts to Whatever Fieldnames are Loaded

I’m often called on to validate a new data file from a client or to examine a new data extract. My first step is to load data into a copy of Steve Dark’s Data Profiler (search on quickintelligence data profiler). I’ve made a few modifications to Steve’s original document – here are a couple of the most useful.

I added a line to the loadscript that uses preceding load to add a record number to the rows being loaded. My line is added between the table label and the first line of the regular Load statement. It looks like this:

DATAFILE:  //table label
Load *, recno() as zrecno;  //this is the preceding load line
Load   //this is the beginning of your "regular" Load statement

For the second set of modifications, I added a new sheet or tab which I named Table.
On the new tab, I added a multibox and then I added, not fieldnames, but 20 expressions. You can add any number you like but 20 seemed like plenty. These are mostly copy and paste so they are not a lot of typing and you can start by copying the expression shown in this blog and pasting it into your document. The first multibox expression looks like this:

[$(=only({1<$FieldNo={1}>} $Field))]

That expression evaluates to the fieldname corresponding to field #1. Then I add a second expression to the multibox:

[$(=only({1<$FieldNo={2}>} $Field))]

You can see that in the second expression, I changed $FieldNo={1} to $FieldNo={2} so that the second expression evaluates to the fieldname corresponding to field #2. I did that for each of the 20 expressions so that the 20th expression in the multibox contains $FieldNo={20}

Then, I added a straight table chart. I used zrecno as the dimension. Then, I added 20 expressions. For the first expression, I typed this:

[$(=only({1<$FieldNo={1}>} $Field))]

And for the label, I typed this:

$(=only({1<$FieldNo={1}>} [$Field]))

Again, the rest of the expressions are similar, each time changing the $FieldNo={1} to be 2 for the second expression and its label, $FieldNo={3} for the third expression and its label and so on for each of the 20 expressions. Click OK when you’re done.

Now, you have a multibox and a straight table chart that appear as though you typed in the 20 fieldnames. But, these objects will adjust themselves to whatever fieldnames are loaded. If you load a file with less than 20 fieldnames, the “extra” ones will just be null.

You can copy and paste these objects into other QlikView documents and they will adjust themselves to whatever the first 20 fieldnames are in the document (see important info in first blog comment below). But, I think these are most useful in a document where different fieldnames can be loaded each time.

I checked Dropdown Select for all columns in the Presentation tab to make it easier for the document user to make selections on the chart. 

In order that we don’t burn up all of the memory on the computer, I added a calculation condition to the straight table chart with this expression =if(count(zrecno) < 200000,1,0) so that the chart will refuse to appear if there are more than 200,000 rows. I customized the error message for the calculation condition to say, “There’s more than 200k rows. You have to make some selections to reduce the amount of data in this chart.” A calculation condition like this is a good idea whenever you have an object that might try to show more data than is possible with the available memory.


Tuesday, December 1, 2015

Easy to Add Multivariate Analysis Feature

Here is a technique that I’ve used several times over the past year to quickly add a multivariate analysis feature to a document. We have numerous documents that have a multibox for selections and usually a graph and at least one chart; usually a straight table with a cycle group for the dimension. I usually start with one of these existing documents.
This feature uses QlikView alternate states. I normally steer people away from using alternate states in documents used by business users as it can be confusing. But, in this technique, it seems to be easily used and understood.

The first step, is to add an alternate state to the document. Using the menu, choose Settings->Document Properties, then choose the General tab and click the Alternate States… button (if you don’t see the button then you may be using an older version that doesn’t have the Alternate States functionality). On the Alternate States dialog box that appears, click Add and then simply type a  B  and then click OK. This defines an alternate state in the document named B. It’s not a very descriptive name but it is perfect for this technique.

Now, we’re going to duplicate the multibox. Right-click on the multibox and choose Copy to Clipboard -> Object. Then, from the menu choose Edit->Paste. QlikView will paste the new copy right on top of the old one so you need to drag the copy of the multibox off to the right. This is easiest if the multibox properties Show Caption option is checked so you can click on the title bar at the top of the multibox and drag it. Make the caption of the original multibox say Selections:   and make the caption on the new copy of the multibox say  B Selections:   It may help to make the caption font on the new copy in italics.

Easy, so far? Now, right-click on the new copy multibox and choose Properties. Choose the General tab and look for the Alternate State field at the top. Click the little down arrow on the Alternate State box and click to choose B and then click OK. You now have two multiboxes: one that controls the usual selections and one that controls an entirely different set of selections in alternate state B.

Here's the part that makes it useful. In your charts, you probably have a column something like this:   Sum( Sales )  with the label Sales on it. Add a new column, label it Sales B, and put it right next to the original column. For the expression, type  Sum({B} Sales) 
The syntax is the same as for set analysis. But, this is simpler, just use the {B} inside the argument for any aggregation function and it makes the aggregation use the selections that are defined in your B Selections multibox. No need to change the chart dimensions or change anything in the load script.

Now, you can compare the column for Sales with the Sales B column with the numbers side by side. I assume the fields most useful for the data were already defined in the multibox. So, you can change the B selections to easily compare, for example, June sales next to March sales; or compare tier 1 customers versus tier 2 customers; or profit from the 1kg size versus the 500g size, or inventory turns at the Memphis warehouse versus turns at the combined California warehouses or total cost of the cheesy-beans business versus costs with administrative costs unselected,… etc.

As quickly as you can add a new field to the two multiboxes, you can have a side by side comparison of data for different selections on that field.

I’ve found that graphs that use this technique should use a similar but noticeably different appearance for the two expressions. Using the example from above, a graph of Sales could use a red line in the line graph and use a dotted red line for Sales B.

You can use color in the chart to point out the biggest differences between the regular aggregation and the B aggregation. I’ve sometimes used the Colormix wizard to color the rows with the biggest differences in bright yellow and the smaller differences with less bright shades of yellow (the Colormix wizard is described in the manual and it is well worth your time to become familiar with it).
It would be just as easy to also add C Selections but, so far, I haven’t found a case where it adds value.


Saturday, March 7, 2015

How Many Days Since That Document Reloaded?

I was challenged to write a simple blog posting. Here it is, although it still has some code in it.

A recent project required a document to “know” how many days ago a separate .qvw document had been reloaded. This statement in the loadscript uses the pathname to the separate document and puts the answer to “how many days ago” in the variable vDaysAgo:

LET vDaysAgo = Now() - FileTime('e:\ventures\PriceAnalysis.qvw'); 



Saturday, January 31, 2015

General File Loading Technique

Here’s a technique I first used several years ago when I started a project where we had several dozen data files, all of them containing different data, in a delimited format and each of them with a header record as the first record containing the field names. The loadscript code below shows how we could load each of the files into different QlikView tables using the same loadscript.

The code below has the filename in the code but it could be loaded into the QlikView document by including the filename as a variable in the command line execution of QlikView (for an example, search on "qlikview maven command line and automation").  Other options include building a loop and loading each of files in a folder (search on “qlikview maven loading all of the files from a folder”); or, if you are comfortable with macro module code, you could use a browse-for-file technique to make it easy to find and load a filename (search on “qlikview maven browse-for-file macro button”).

Here’s the loadscript example code:  (some of the long lines are wrapped around as you view the code in the blog window but they will go back to normal if you copy and paste the code into a text file or loadscript edit window) 

Set xfile='customer_data_12.csv'; //file pathname
Let xfile=Upper('$(xfile)'); //optional convert to upper case

FIELDDAY:  //load first record only 
Load * from $(xfile)
(txt, no labels, delimiter is '|') 
where recno()=1;  //note "no labels" 

Set loadstatement='Load '; //now, build the Load statement
For i=1 to 100
Let fn=trim(SubField(peek('@$(i)',0,'FIELDDAY'),';',1));
if '$(fn)'='' then 
  exit for
  end if;
if $(i)=1 then
  Let loadstatement='$(loadstatement) @$(i) as [$(fn)]';  
  Let loadstatement='$(loadstatement), @$(i) as [$(fn)]';
  end if

$(xfile): //QlikView table name
(txt, no labels, header is 1 line, delimiter is '|'); //change delimiter character as needed

Drop table FIELDDAY;

Store $(xfile) into $(xfile).qvd (qvd); //optional save as qvd
//Drop table $(xfile);  //optional drop table

Change the file attributes as needed for your situation: for example, Excel files, skip over unneeded header records, different delimiter, etc.

There are two important ideas illustrated in the example:  first, the idea of loading the header record of a file and building a custom load statement on the spot; second, the idea that you can construct any loadscript statement or part of a statement in a variable and execute it just by putting the variable in the right place in the loadscript.

This technique helped me in a recent project where the data files were supplied with field names in the header record but all of the field names needed a little modification in order to use them. Some of them I converted to upper case and some of them needed to have special characters removed. It was easy to add the necessary text functions to edit the field names as the code assembled the load statement.

It will help you debug and validate this technique if you set the document properties option to write out a loadscript log file when the loadscript runs. The log file will show all of the lines in the loadscript as they were executed with variables fully expanded.


Sunday, January 11, 2015

A Little More Optimized Load

This is a continuation of last week’s blog topic (search on "QlikView Maven Optimized QVD Load With a Where Clause and Dates"). 
I had a load statement in a loadscript loading from a qvd file that (simplifying a bit) looked like this:

ApplyMap('custprimap',CUST_NO) as CUST_PRIORITY,
from invoice_data.qvd (qvd) 
where exists(INV_DATE); 

It was written using a mapping table feature for customer data to avoid a left join load for performance reasons. But, the load from the qvd file wasn’t an optimized load because of the ApplyMap function. The table contained many millions of rows and using an optimized load would save quite a bit of time. 
I was able to get an optimized load and still have the ApplyMap functionality by recoding that portion of the loadscript to also use a Join and a super-fast load from FieldValue like this: 

Load CUST_NO, //optimized load
from invoice_data.qvd (qvd) 
where exists(INV_DATE);

Join (INVOICE_DATA) load //preceding load
ApplyMap('custprimap',CUST_NO) as CUST_PRIORITY; 
Load FieldValue(CUST_NO,IterNo()) as CUST_NO
Autogenerate(1) while not isnull(FieldValue(CUST_NO,IterNo()));

______________ _________________ ________________

Note to Santhosh:  You can add special comments to a chart expression that will make it easier to find specific expressions with the Settings->Expression Overview, Find feature. For example, you might use an expression like this:
sum(CONTRCT_S1 * CNTRY_ADJ) //per NATO procurement 1987.36.c
and then it would be easier to find the expressions that need to be changed when procurement rules change by doing an Expression Overview Find on "NATO procurement".
Our otherwise well-intentioned friends who believe all chart expressions should be contained within variables cannot easily use this suggestion.


Wednesday, January 7, 2015

Optimized QVD Load With a Where Clause and Dates

If you are working with large quantities of data and especially if you are working with large data sets of historical data where you add a portion of new data to the historical data each week (an incremental load), then you will want to be working with qvd files in your loadscript. And if you are working with qvd files then you certainly want to load data from the qvd files with an optimized load. The optimized load is many times faster than loading the same data with an unoptimized load. 

A common problem is that adding a where clause to the load statement will cause the load to be unoptimized. Some types of where clauses though, can be used with an optimized load. A simple where exists type of where clause, for example, can usually be used with an optimized load.

Every incremental load process is a little different but if your process involves dates then this idea may help. This idea is for situations where you want to load data from a qvd file and you want your where clause to refer to a date within the data. Here’s an example:

Imagine that you have a date like 01/01/2014 in a variable named start_dt and another date like 12/31/2015 in a variable named end_dt and you want to load invoice data where the invoice date, inv_date, is between start_dt and end_dt. First, build a little table of all possible dates between start_dt and end_dt. Use this in the loadscript before loading any other data containing a INV_DATE field:

/* Build table of dates just for the "where exists" clause */
Load date(date('$(start_dt)')+IterNo()-1) as INV_DATE 
While date(date('$(start_dt)')+IterNo()-1) <= date('$(end_dt)'); 

That code can build a large table of dates in your loadscript in just a few seconds. Now, you can code the load from qvd like this:

INVOICE_DATA:   //this is an optimized load
Load * from invoice_data.qvd (qvd) 
where exists(INV_DATE);
Drop table TMP_DATE_TABLE; //no longer needed

That table of dates can be varied as needed for your specific application. For example, using (IterNo()-1)*7 could give you a set of week start dates. Use whatever method makes sense to create a list of the dates that you would want to load if they exist within the qvd file data. Remember, the basic idea here is to build a table of values you would like to load from the qvd file. It is not limited to dates and not all of the values in your temp table have to actually match something in the qvd file.

____________ _________________ _________________

Note to Santhosh:  If you need to quickly find the maximum or latest invoice date from the qvd file in order to use it with date arithmetic then you can get it in a variable by doing something like this in your loadscript before loading any other tables with the INV_DATE field:

TMP_INV_DATE:  //this is an optimized load
Load INV_DATE FROM invoice_data.qvd (qvd);
Load Max(INV_DATE) as maxinvdate;  //preceding load
Load FieldValue('INV_DATE',IterNo()) as INV_DATE
While not Isnull(FieldValue('INV_DATE',IterNo()));
Let var_maxinvdate = Peek('maxinvdate',-1,'TMP_MAXINV_DATE');
Drop tables TMP_INV_DATE, TMP_MAXINV_DATE; //not needed