Friday, November 18, 2011

Select Top 4 Values On Open

I was working on a document that contained a lot of data, and performance was a little slow when the document was opened or when the user moved from tab to tab. On my own laptop, when I save documents like that, I usually select a small amount of data so that the document will open quickly the next time. I wanted to do something like that for the users -- some kind of selection that would happen automatically when the document is opened, but it had to be a selection of data that would be useful and easy for the users to understand.



The data in the document was organized by a date field named POSTDATE so I decided that I would like the document to automatically select the most recent four POSTDATE values when it was opened. Most users look in the document for the most recent data and anyone who wanted to review older data could easily add older dates to the selection through the multibox.




The method I used is to create a trigger that would fire when the document is opened. The trigger would do a Pareto Select. Now, I'm not saying that this is the best way to accomplish the selection and it certainly is not the only way, but it was easy and didn't involve writing macro code.




Pareto Select is used to select a percentage of values in a field based on the ranking of a value or expression. It is commonly used to select things like the top 20% best-selling products. In this case I wanted to use it to select not a fixed percentage but the four most recent POSTDATE date values. This is what I did:



From the menu choose Settings->Document Properties and select the Triggers tab.
In the Document Event Triggers list, select the OnOpen event and click the Add Action(s) button.
In the Actions dialog click Add, then in the Add Action dialog choose Action Type=Selection and Action=Pareto Select and click OK.
Back in the Actions dialog, type POSTDATE into both the Field and Selection boxes. POSTDATE is not only the field I want to select but also the expression value I want ranked.
Then, in the Percentage box enter =(4/Count(DISTINCT POSTDATE))*100
Click OK. Save the document and the next time it is opened it will automatically select the four most recent POSTDATE values.
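To see why the Percentage expression works, here is the arithmetic with a made-up example of 50 distinct POSTDATE values:

```
// Percentage expression entered in the Pareto Select action
=(4/Count(DISTINCT POSTDATE))*100

// With, say, 50 distinct POSTDATE values in the document:
//   (4/50)*100 = 8
// so Pareto Select keeps the top-ranked 8% of dates, that is, 4 dates.
```

Because the expression recomputes Count(DISTINCT POSTDATE) each time, the percentage adjusts itself as new dates are loaded and the selection stays at four dates.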




A similar method could be used to do automatic selection of any number of top values of any field. Pareto Select does its selection based on the current Possible values. If you wanted to ignore the current selection and select a percentage of all values then you should add an extra action to the trigger to clear the field before making the Pareto Selection.



Testing trigger actions is most easily done by defining the same actions for a button; then you can test them by clicking the button. When you're done testing, remove the button and make sure the actions are associated with the proper event.

= = = = = = = = = = = = = = = = = = = = = = = = = =
Author's Note 6Dec2011:
This method didn't work the way I wanted when the document already included selections on the POSTDATE field. I couldn't find any good way to deal with that using the available trigger actions. I eventually replaced the Pareto Select action with a macro action that selects the four most recent POSTDATE values. The macro code looks like this:
sub Top_4_POSTDATE
    'Get a reference to the POSTDATE field
    set fd=ActiveDocument.GetField("POSTDATE")
    'Clear any existing selection on the field first
    fd.Clear
    'Select the four top-ranked (most recent) POSTDATE values
    fd.TopSelect "=if(aggr({1} rank(POSTDATE),POSTDATE)<=4,POSTDATE)",4
    'Release the object reference
    set fd=nothing
end sub

* * *

Saturday, November 12, 2011

Replace IF in Expression with Set Analysis

We often come across chart expressions that look like this:



=sum(if(SLSTYPE='DOMESTIC',SLSTOT))




It is easy to understand, and if your document is a normal size and working well then leave the expression alone. But if the document is large and you are looking for ways to speed up chart performance, consider replacing the "if" in the expression with the much faster set analysis syntax.

Remember that set analysis is related to the concept of QlikView selections: set analysis syntax works by specifying temporary selections within the function. So, to decide whether set analysis fits our example expression, imagine removing the "if" from the expression and, instead, making a selection on the SLSTYPE field to select the value DOMESTIC. If the expression would still work correctly in that situation, then we can change it to use set analysis like this:



=sum({$<SLSTYPE={'DOMESTIC'}>} SLSTOT)




We couldn't do that if the SLSTYPE field wasn't associated with SLSTOT, because selecting the value DOMESTIC would have no effect on the expression. For example, if SLSTYPE had been defined in the loadscript as a Data Island unrelated to any other data, then the expression wouldn't work correctly no matter how you coded the set analysis syntax within the sum function.
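For illustration, a data island like the one described might be loaded like this (the table name and values here are made up). Because the island table shares no field with the table containing SLSTOT, a selection in it cannot filter SLSTOT rows, so neither can a set analysis modifier on it:

```
// Hypothetical data island: in this scenario the sales table does
// not contain SLSTYPE, so this table associates with nothing, and
// selecting a SLSTYPE value filters no SLSTOT rows.
TypeIsland:
LOAD * INLINE [
SLSTYPE
DOMESTIC
EXPORT
];
```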




Another important consideration is whether or not the document is designed for users to make their own selections on the SLSTYPE field. If users are making SLSTYPE selections, they may not like an expression that silently overrides their selection. In that case we can define a separate field in the loadscript that contains exactly the same values as SLSTYPE and is part of the same QlikView table and the same rows. Maybe call it SLSTYPE_SA; then we can code the set analysis part of our expression using the SLSTYPE_SA field, and the expression will work correctly while still respecting all other user selections.
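A minimal loadscript sketch of that idea (the table name, file name, and fields other than the ones in this post are made up):

```
// Load SLSTYPE twice: once for user selections, and once as
// SLSTYPE_SA for set analysis expressions to reference.
Sales:
LOAD
    SLSTYPE,
    SLSTYPE as SLSTYPE_SA,
    SLSTOT
FROM Sales.qvd (qvd);
```

The chart expression then becomes =sum({$<SLSTYPE_SA={'DOMESTIC'}>} SLSTOT), which restricts the sum to DOMESTIC rows without overriding any user selection on the SLSTYPE field itself.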




* * *

Friday, November 4, 2011

Increasing Likelihood of Duplicate Values While Loading Data

I was working on a QlikView document last week that loaded several kinds of sales forecast data and also actual sales for a chemical manufacturer. The forecast data came from a statistical forecasting application. The charts on the document were simple and showed the forecast data rounded to integer numbers of tons.


The document contained a lot of data and performance was a little slow when making selections, so I looked for ways to improve it. Examining the forecast data, I saw that although the quantities in the database were expressed in tons, almost every quantity included a decimal fraction. Now, QlikView is very good at compressing data, and especially good at compressing data with duplicate values, since it stores each unique value for a field only once. This forecast data did not contain many duplicates because of the decimal fractions. I saw that if I rounded the values as they were loaded then values like 1.001, 1.00032, 1.2, etc. would all be loaded as 1. After changing the loadscript to round values to integers as the data was loaded, the report file size on disk was reduced by 40%, with a corresponding reduction in the memory required. The charts on the report were not affected at all.
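The rounding change might look something like this in the loadscript (the table, file, and field names here are made up for illustration; Round is a standard QlikView function):

```
// Round forecast quantities to whole tons as they are loaded, so
// that values like 1.001, 1.00032, and 1.2 all store as the single
// distinct value 1 and compress well.
Forecast:
LOAD
    Product,
    ForecastDate,
    Round(ForecastQty) as ForecastQty
FROM Forecast.qvd (qvd);
```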


This technique would not be useful in all situations. It would cause a lot of trouble if you rounded data values on some accounting documents! But sometimes, altering data slightly as it is loaded to increase the likelihood of duplicate values can save memory for a QlikView document. Besides rounding (not necessarily to integers; you can round to a specific number of decimal places), you can also increase the likelihood of duplicate values by:



  • Truncating text data - keeping only the first ten characters, for example.



  • Trimming text data to remove any leading or trailing blanks.



  • Converting to all capital letters.


  • Converting non-abbreviated words to standard abbreviations, like state abbreviations in address data.



  • Removing the time stamp portion from date data.



  • Converting date data to beginning-of-week or beginning-of-month dates.



  • Converting date data to month abbreviations if that is the level of data used in the document charts.
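A few of the techniques in the list above, sketched as loadscript field expressions (table, file, and field names are made up; the functions are standard QlikView functions):

```
Orders:
LOAD
    // truncate, trim, and capitalize text data
    Left(Trim(Upper(CustomerName)), 10) as CustomerKey,
    // remove the time stamp portion from a timestamp
    Date(Floor(OrderTimestamp)) as OrderDate,
    // convert dates to beginning-of-month dates
    MonthStart(OrderTimestamp) as OrderMonth
FROM Orders.qvd (qvd);
```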



* * *