Monday, May 6, 2013

Loading an Unstructured Text File


This is the start of a series of postings about dealing with unstructured text data. I've used QlikView to help analyze message log files and error files and even program source code – there are some things QlikView can do that Notepad style text editors cannot.
First, we need to load the data. It isn’t always a completely unstructured file like my example, often the text is one or more fields inside a structured file or database.  There are various ways to load a text file; this is what I usually put into the loadscript:
TEXTFILE:
LOAD 
RecNo() as recno,
@1 as rectext
FROM
[$(File_Pathname)]
(txt, no labels, delimiter is \x7, no quotes); 

That code will load each record from the text file into the rectext field. I use a variable for the file pathname because I usually use a loadscript like this with a document where the user can supply a filename through an inputbox. Alternatively, you could add browse-for-file functionality like I wrote about in March 2009 (search on QlikView Maven Browse-for-File Macro Button). Because the data is unstructured and the document objects are meant to be used with unstructured data, the document works well as a general-purpose tool that can be used with many types of data in varying situations. 

I defined the file as though it is delimited and made the specified delimiter an uncommon character I don’t expect to find in the data. 

The record number, loaded as recno, from the input file is a useful piece of data and we’ll be using it in some of the example objects over the next few postings.