Monday, September 20, 2010

Random Sample Selection


Here’s an idea that will certainly come in handy some day. You can make a selection of random values from any field in your QlikView document. For example, imagine that you have a document loaded with your company’s employee expense report data for the past five years. And, imagine that you want to select a random twenty percent of the employees. Just click on a listbox of employee names or click on a pulldown arrow of employee names on a multibox. Then, type this expression:
=rand()<=0.20

And you will be looking at a random selection of approximately twenty percent of the company employees. Hit Enter to complete the selection.

This idea can work with any field in the document. You can also use an advanced search dialog box instead of a listbox or multibox. This kind of search expression works within the existing selections – if you had already selected only the part of your company operating in Mexico then your expression will select a random twenty percent of employees in Mexico. If you already have some employee names selected then you might want to clear the selection before using the expression. And, like I wrote in my previous blog posting, you can save an expression like this in a bookmark.
Note: If you are going to be doing this random selection several times on the same document then it may give you the same group of values each time. To make sure you get a different set of values every time you use the expression type this slightly longer expression instead:
=rand()<=0.2+now()*0


(See the comments section below for a suggestion about how to do a random sample selection as the Calculated Dimension in a Chart)

★ ★ ★

7 comments:

Anonymous said...

I have a Straight Table chart that lists rows of data. I want to be able to apply a 20% random selection to this table of data. Do you have a tutorial for me to follow to allow this? Many thanks for your help. Jayne

Anonymous said...

I have a Straight Table chart displaying rows of data and wish to apply a 20% random selection to it. Do you have a tutorial that I could follow? The list box option doesn't seem to be working. Many thanks Jayne

-TB said...

Hi Jayne,
I don't know why the selection expression isn't working for you.
You can use a similar expression as a Calculated Dimension in a chart. For example, if you have a chart where you'd like the dimension to be a 20% random selection of values of the SLS_DATE field then you could use this as the calculated dimension:
=if(rand()<=0.20,SLS_DATE)

Anonymous said...

I believe your post and its comments have saved my bacon! Thank you so much for passing on the knowledge!

Anonymous said...

hi TB
there is a way to do this in script with a "QVD_Optimized" load ?

i use these technique, but none of them are "QVD_Optimized" load:
http://qlikviewapuntes.blogspot.com.ar/2014/01/selecionar-muestra-aleatoria.html

thk!

-TB said...

The qvd optimized load is a way to pour a file of data almost directly into memory. The qvd format is optimized to go into memory very quickly. When you add something like a "where clause" it disrupts that process. So, I suspect the answer is no.

Have you tried a solution with succeeding load something like this:

LOAD * WHERE Rand()<0.5;
LOAD * FROM BIGFILE.QVD (QVD);

Does that preserve the optimized load?

Enmanuel Santana said...

hi TB. thanks for the answer.
I tried your suggestion, but the precedent load is not optimized.

i share the best approach i found:

Temp_1: LOAD Distinct IdField From Big.qvd (qvd);
Temp_2: NoConcatenate LOAD IdField Resident Temp_1 Where Rand()>.9;
DROP Table Temp_1;
LOAD * From Big.qvd (qvd) Where Exists (IdField);
DROP Table Temp_2;



thks!