Downloads

All the materials I have published as free downloads are available from this page as zip files. The files are usually in PDF, PPS, or their original Office format. Anything placed on this download page is available for you to use as you wish in your organisation, as long as you appropriately reference the material.

The links below provide bookmarks to the headings for each of the free downloads. I hope you find these downloads enjoyable and valuable.

Best Regards

Peter Nolan

 

The Data Warehouse Journey: An amusing story of one person's possible Data Warehouse project, described in full in the next section. I hope it does not turn out to be yours!!

Instant Data Warehouse Utilities: I have packaged up all the utilities I have written around the Instant Data Warehouse into a suite. You will find them useful in your data warehousing projects.

Customer Dimension Maintenance: A sophisticated Customer Dimension Table Maintenance Program for the insurance industry. The ideas are applicable to other industries with large customer bases, such as telcos, banks and retail.

C Code: A full suite of C code to maintain fact tables in a large star schema data warehouse. The code was generated by the C Data Warehouse Generator.

Presentations: The link to all the presentations I have published. Unfortunately, most of the presentations I have written over the years cannot be published.

Newsletters: The link to the newsletters I have published. I will write and publish new newsletters according to what I think is interesting and demand from readers. If you want some topic covered, write to me at peter@peternolan.net and we will see how many people are interested.

Cobol Code: Cobol Data Warehouse Code, for those of us who are a bit older.

Intelligent Enterprise Article: An article I wrote for Intelligent Enterprise in 2001. I have been using the concept of the Management Decision Making Process since late 1993. I can't claim it as my own, though I have evolved it since then; I was taught it by one of the best salesmen I ever met, Keith Fountain, from Metaphor.

 

 

The Data Warehouse Journey

This presentation is an amusing story I call The Data Warehouse Journey. It is one person's possible story of their Data Warehouse project. I hope it does not turn out to be yours!!

I think it is entertaining and amusing. And yes, there is a point to the story. Please note that it contains a lot of photographs and is just under 3MB to download, so you might want to get a coffee while it is downloading. I hope you enjoy the show!!

 

 

The Instant Data Warehouse Utilities

I initially released a utility called the 'Data Transfer Utility'. However, as I developed more utilities around the Instant Data Warehouse, I decided to release them all as a suite.

If you would like to read about the currently available utilities please click here

If you would like to download the windows setup file for the utilities please click here

Note that the utilities include the Microsoft Visual Studio.NET CLR and hence the file is around 3.7MB.

If there are more utilities you would like to see published on this page, please let me know by emailing me at peter@peternolan.net. I plan to build more useful utilities and publish them. Feedback from data warehousing practitioners as to exactly what they would like to have is an excellent way to make sure that the utilities I develop have broad application.

 

Customer Dimension Table Maintenance Program

Why, I hear you ask, would I take the time and effort to publish a customer dimension table maintenance program?

Some Ancient History

Read this section if you want to find out where this customer dimension table maintenance program came from.

In 1993 I was asked to help an IBM customer tune a Metaphor-designed database running on DB2/MVS. I was a pretty useful DB2 performance tuner in those days and the only one in IBM Australia who had actually built a star schema database on DB2, hence I got the job of tuning this customer's star schema database.

The problem was that after the database design and ETL code were delivered, someone had decided that all the reports really needed a fact that was only available at the detail level. The report developers pointed all their reports at the detailed level fact table, and with test data volumes everything seemed to work OK. With production volumes, this was never going to work. It took a couple of days to figure out exactly what had happened and what needed to be done. The obvious fix was to move that particular fact up to the summary level fact table, which was one of the recommendations in my report. What surprised me was that some time later the customer told me it had cost 55 workdays to move that one fact field from the detail table to the summary table. It seemed too much, but the customer was adamant that was what it cost.

Not much later, one of my Metaphor customers was subject to the dreaded 'review' by McKinseys, the consulting company. The standard McKinseys method of operation is to come in, ask for the answers to a couple of hundred 'performance measure' style questions, and then charge huge sums of money while sitting around waiting for the IT people to produce the results. What happened in this account was that a McKinsey consultant and I sat at a Metaphor workstation for two days. She asked the questions and I wrote the queries against the (very primitive) data warehouse to get all the answers they were looking for. The customer was happy. At the rates McKinseys charge, I saved them a lot of money.

At the time I couldn't help thinking that if I had been able to hive off some summaries, quickly and easily, rather than run and re-run query after query against the full 1.1 million customer database, we would have been able to do the work faster, cheaper and better, as well as do more.

After these two experiences I thought it would be really interesting to figure out whether it was possible to write dimension table maintenance programs that made it really easy to add new levels of summaries into the dimensions, and therefore really easy to add new levels of summaries into the fact tables. As luck would have it, my Metaphor customer engaged me as an advisor on what was pretty much a research project to develop 'world's best practice direct marketing' in this major insurance company.

Obviously, sophisticated demographic analysis, the ability to add new summaries to fact tables and new levels to the customer dimension, and the ability to hive off portions for further research would all form part of 'world's best practice', because we were damn sure no-one else in the world was doing these things.

We had a lot of help from the folks at Metaphor, who designed our data model for us. From there it took the best part of 18 months elapsed time to bring out the first release of our 'second generation' data warehouse, but it was worth the effort. This second generation data warehouse was light years in front of our first effort from 1991. We could do analytical work that was just fantastic, so far in advance of what we had been able to do up until that point. And the results of our campaigns showed it.

One of the many things that came out of the project was a mechanism to build multi-level dimension tables and multi-level summary fact tables. We solved all the problems I mentioned above by making all the levels and summaries table-driven or, at worst, requiring a Cobol copybook change and a recompile. We did not have to change the definition of a table or change one line of code in the ETL process to generate a new summary level of data in a fact table. Query performance was fantastic compared with what we had previously achieved.

The importance of being able to do this can be measured in the tens or hundreds of thousands of dollars we would have had to spend to do this sort of analytical processing on a less sophisticated data model. Or, worse still, we would have kept doing business without this kind of customer demographic analysis because it was 'too hard' and 'too expensive'.

 

What's Happening Today?

My experience has been that insurance companies, telcos, banks and retailers have great need for customer attribute analysis, for functions ranging from sales and marketing to fraud detection.

It is no exaggeration to say that customer attribute analysis in insurance companies and banks provides the single biggest opportunity to improve the profitability of the company over the longer term, especially in cross-selling and up-selling products and services.

Yet, even today, in 2002, nearly every company I go to has: 

Severe constraints on their ability to perform ad-hoc analysis on customer attributes.
Limited ability to integrate customer analytics into the standard business processes and run those analytical applications regularly.
Limited ability to build limited-lifetime summaries for investigating correlations between customer attributes and other facts stored in fact tables.

These limitations are usually driven by one or more of the following: 

Bad customer data.
No single centralised system that records all customers and all the products each customer holds.
Customer attributes available only at the detailed level.
Fixed fields in customer summary dimensions (often referred to as 'mini-dimensions') that take significant IT time and effort to change.
Fixed levels of summaries in summary tables that take significant IT time and effort to change.

 

Unfortunately, it seems many people have still not come to grips with building really good customer dimension tables. People are still pretty confused about how to do this or, worse, they put in very limited solutions that fail to deliver on the promise of data warehousing.

 

What Am I Doing About It?

I'm proposing that there is a better way to implement customer dimension table maintenance programs. To back that up I am publishing a sophisticated customer dimension table maintenance program for the Insurance Industry.  I am publishing the actual source code, for free, in C, for you to learn from and to use as you wish.  

I know it works because I've been using this style of dimension table maintenance program since 1995!! The one I have published is the latest and most evolved version of this program, converted to C. The C conversion means there are some changes you must make to the code for new levels of summaries; it is not as 'clean' as the Cobol version. There is just no way around that.

In the Customer Dimension Maintenance Program for Insurance package you will find:

The source code for the program.
The DDL to create the input and output tables and views.
A small access database that shows you the input and output for the program.
A copy of Newsletter #9, which talks about the need for demographic analysis on large customer bases in insurance, so that you don't have to go hunting on my web site for it.

 

Features of the Customer Dimension Maintenance Program for Insurance

The Customer Dimension Maintenance Program for Insurance has the following features:

It has two physical dimension tables.
The first is the detailed level dimension table, which contains all the customer attributes you want to maintain in a type 2 dimension table. This table can be used in queries, but it is intended for queries at the customer level only.
The second is a summary level customer dimension table, which can also be used in queries.

I will talk about each of the dimension tables separately.

The detailed level dimension table has the following features.

It is a type 2 dimension. 

It has date_from, date_to and current_flag fields on it.

It has an integer key, a level column and a set of aggregate keys on it.

It sources data from an upstream staging table.

It has lots of useful attributes typical of the life insurance business already defined on it.  Though it is certain that you will want to add/delete some attributes. 
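To make this concrete, below is a minimal C sketch of what such a detail row might look like. The field names, the attribute selection and the number of levels are my own assumptions for illustration; the published source defines the real layout.

    /* A sketch only: assumed field names, not the published layout. */
    #define MAX_AGG_LEVELS 10          /* assumed number of summary levels */

    struct customer_dim_row {
        long customer_key;             /* integer surrogate key               */
        int  level;                    /* 0 = detail, 1..n = summary levels   */
        char date_from[11];            /* type 2 housekeeping: effective date */
        char date_to[11];              /* type 2 housekeeping: expiry date    */
        char current_flag;             /* 'Y' on the current version of a row */

        /* attributes typical of life insurance (illustrative only) */
        int  age;
        char sex;
        char marital_status;
        char geo_code[9];

        /* one aggregate key per summary level, pointing at the summary
           row this detail row rolls up into */
        long agg_key[MAX_AGG_LEVELS];
    };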

There is a test inside the program to check whether any of the fields that would cause a new type 2 record to be written have changed. You can test all fields, some fields or no fields, as you wish.

If the record has changed but none of the type 2 test fields have changed, it will simply replace the detail record.

If one of the aggregate groups that the detail record aggregates into has changed (e.g. age group), then the record will either be updated in place, or the current record closed and a new one inserted, depending on whether a type 2 test field has changed. In either case, the 'current' record points to the new summary level.
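As a sketch of that decision logic, using the row layout assumed above: the program compares the incoming staging record with the current dimension row, checking the designated type 2 test fields first. Which fields count as type 2 test fields is configurable in the real program; here two are hard-coded purely for illustration.

    #include <string.h>

    enum dim_action { NO_CHANGE, TYPE1_REPLACE, TYPE2_INSERT };

    /* Classify an incoming staging record against the current dimension
       row (struct customer_dim_row as sketched above). */
    static enum dim_action classify_change(const struct customer_dim_row *cur,
                                           const struct customer_dim_row *stg)
    {
        /* a change in any type 2 test field closes the current row and
           inserts a new one */
        if (cur->marital_status != stg->marital_status ||
            strcmp(cur->geo_code, stg->geo_code) != 0)
            return TYPE2_INSERT;

        /* any other change, including a changed aggregate key, is applied
           in place as a type 1 replace */
        if (cur->age != stg->age || cur->sex != stg->sex)
            return TYPE1_REPLACE;

        return NO_CHANGE;   /* nothing changed, so no update is issued */
    }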

It has a 150-character string used to create the key to the record at a particular level.

You use these 150 characters by concatenating the string representation of any fields you want on the detailed record to form a key for a summary level.  For example, it is possible to string together age, sex, marital status and geocode into one string and use that as a summary level key.  This example is implemented in the code.
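A sketch of how that concatenation might be coded, again using the assumed row layout; the delimiter, the field formatting and the banding of age into an age group are all assumptions of mine:

    #include <stdio.h>

    #define SUMMARY_KEY_LEN 150

    /* Build the summary level key for the age/sex/marital status/geocode
       level by concatenating the string form of each field. */
    static void build_summary_key(const struct customer_dim_row *row,
                                  char key[SUMMARY_KEY_LEN + 1])
    {
        int age_group = row->age / 10;   /* band ages into 10-year groups */

        snprintf(key, SUMMARY_KEY_LEN + 1, "%02d|%c|%c|%s",
                 age_group, row->sex, row->marital_status, row->geo_code);
    }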

At the detail level all fields are updated from the input staging table.

If you want a new level of summary to be available, you simply make the changes noted in the code and run the full customer file into the program. It will build the summary level combinations that actually exist within your data. You do not need to build summaries of all possible combinations.

If you then want the new summary level to actually be generated in a fact table, you simply add a row to the aggregate level control table to select this new level of aggregate in conjunction with aggregates from other dimensions. This is part of the C code I have published below.
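The idea of the aggregate level control table can be sketched as follows; the dimension names and level numbers are assumptions, not the published schema:

    /* Each control row names one combination of dimension levels for which
       summary fact rows should be generated. */
    struct agg_level_control {
        int customer_level;   /* level in the customer dimension */
        int product_level;    /* level in the product dimension  */
        int time_level;       /* level in the time dimension     */
    };

    /* Enabling a new summary is then just adding a row, e.g. customer
       level 3 by product level 1 by time level 2: */
    static const struct agg_level_control new_summary = { 3, 1, 2 };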

It performs incremental updates by checking every single field for change. If no field has changed no update takes place. This means you can safely feed this program the entire customer file every night and it will not be bogged down by massive updates against all rows.  It will only update rows that have changed.  

If no actual field has changed, but the row has been affected by the passage of time and a change has happened in an aggregate key then the change will be applied as a type 1 change.

All the summary records are actually stored in the detail table as well.

A summary record in the detail table only has those columns defined as being in the summary level filled in. All other columns are defaulted to 0, blank, or low date as appropriate.

When you perform a lookup to get customer keys, all keys are retrieved from the one record in one read, making the attribution of transaction fact records as fast as possible.
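A sketch of that lookup, using the assumed row layout: one read of the current detail row yields the detail key and every aggregate key, so a transaction record can be attributed to all levels without further lookups. find_current_row() stands in for whatever access path (a SQL select or an in-memory table) the real program uses.

    /* Hypothetical access path: return the current detail row for a
       natural (source system) customer identifier, or NULL if unknown. */
    extern const struct customer_dim_row *find_current_row(long natural_id);

    /* Fill keys_out[0] with the detail key and keys_out[1..n] with the
       aggregate keys, all from a single read of the dimension row. */
    static int lookup_customer_keys(long natural_id,
                                    long keys_out[1 + MAX_AGG_LEVELS])
    {
        const struct customer_dim_row *row = find_current_row(natural_id);
        if (row == NULL)
            return -1;                    /* unknown customer */

        keys_out[0] = row->customer_key;  /* detail level key */
        for (int i = 0; i < MAX_AGG_LEVELS; i++)
            keys_out[1 + i] = row->agg_key[i];
        return 0;
    }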

The summary level dimension table has the following features.

It is a type 1 dimension. There is no need to make the summary a type 2 table.

There are no fields such as date_from, date_to, current_flag on it.

There are no aggregate keys at the back of the record.

For each row, only those fields that are actually valid at that summary level or above are on the row. This is to avoid using a field that should not be used. 

For further protection when using a summary dimension table, you might want to apply views over the top to expose only those fields that are useful at each level. Tools like MicroStrategy used to actually require this; they may still do.

It is a duplicate of the data in the detail table and so can be rebuilt from the detail table in case of a disaster.

It does not contain the 150 character string key and it does not contain rows at the detailed customer level. 
These two points are very important. It means that this table is much smaller than the detail level table and thus takes much less processing to select rows from when constraints are applied against customer attributes. Further, it returns far fewer rows and keys with which to then search the fact table. Performance of this model is usually at least an order of magnitude better than just using the detail level customer dimension. 

If your database allows it, this table can be placed into memory as a 'hot' table, further improving performance. (SQL Server 2000 has this feature, and I really like it.) It is often not possible to load the entire customer dimension into memory for queries because of the size of the table and the waste of memory in loading things like address lines, phone numbers, email addresses, etc. These fields are almost never used for customer analysis; they are only used when you want to actually contact a subset of customers.

There is extensive documentation inside the code itself. Please feel free to read through the code for further detailed documentation.

 

Summary

Customer attribute analysis is one of the greatest opportunities to improve the long term performance of any large insurance company, bank, telco or retailer. Yet today, the data warehousing industry, or at least many of the companies I come across, is still struggling to do this well.

I have published this customer dimension maintenance program for insurance to put some knowledge of how to build sophisticated customer dimension tables into the public domain. 

It is my hope that you will recognise this as a good idea. If it is better than what you have at the moment, perhaps you will move towards a more sophisticated customer attribute analysis capability in your company, and as a result we all move the data warehousing industry that little bit further forward.

If you think you have something better and want to share it, by all means do so. I'd be happy to publish other people's ideas on my web site, with appropriate acknowledgement of course.

 

Finally...

Experience tells me that when you read it you will have one of two reactions:

I don't understand what this thing is doing or why it is a good idea. (Which is the majority by the way.)
or
This is the best piece of dimension table data management I've ever seen.  Let's do it!!

If you are in the "don't understand" category, feel free to write to me at peter@peternolan.net. I'll keep the frequently asked questions on a page here, since other people will be seeking the same answers.

If you are in the category of "Let's do it!!" I wish you all the best. 

I'd like to hear from you so I know this has been worthwhile.

 

 

C Data Warehouse Code

I personally think this suite of C Data Warehouse loading code is the most interesting download on this page. It is a complete set of the code needed to load a star schema data warehouse using C, a very standard language. I believe it will be a very valuable learning tool for people new to data warehousing.

Even if you have been doing data warehousing for a long time, you will find this code very interesting to read, and the ideas behind the construction of its features will also be of great interest to you. I know I would have loved to have had this code when I was first starting out.

To help you out I have commented the C code extensively. 

You are welcome to download the entire suite of generated code from here.  I have not published the generator itself as it forms part of what I am selling.  You can read about the generator on the Products page.   

You are welcome to take this suite of code and edit it by hand to implement your own star schemas. You will find that every place you need to make a change in the code is marked with '@@@@' and there is a note on what you have to change.  I'm confident that if you actually start changing this code to build a real data warehouse you will soon be interested in buying the generator or the Instant Data Warehouse.  It is way less boring to generate the code or write no code at all!!

The reason C has been chosen is that it is the standard language of Windows and Unix and finding C programmers is very easy. Virtually all universities require students to learn C and C++ at some stage of their degree. Even I did a lot of C coding when I was at university in 1985!!  

The run time environment for this code is intended to be C and SQL Server 2000. I have not tested the C generator on Oracle; however, I believe only very small changes are needed for the code to work on Oracle. I implemented the Cobol precursor product on SQL Server, Oracle and DB2 with only very minor differences.
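To give a feel for the shape of the generated programs, here is a minimal embedded SQL C sketch of a fact table load. The table names, columns and host variables are my own assumptions for illustration, not taken from the published code, which does far more (summary levels, error handling, restartability).

    EXEC SQL INCLUDE SQLCA;

    EXEC SQL BEGIN DECLARE SECTION;
        long   h_cust_id;       /* natural customer id from staging */
        long   h_cust_key;      /* surrogate key from the dimension */
        double h_premium_amt;   /* the fact being loaded            */
    EXEC SQL END DECLARE SECTION;

    void load_fact_table(void)
    {
        EXEC SQL DECLARE stg_cur CURSOR FOR
            SELECT customer_id, premium_amt FROM stg_policy_txn;

        EXEC SQL OPEN stg_cur;
        for (;;) {
            EXEC SQL FETCH stg_cur INTO :h_cust_id, :h_premium_amt;
            if (sqlca.sqlcode != 0)
                break;                       /* no more rows, or an error */

            /* attribute the staging row: one lookup per dimension */
            EXEC SQL SELECT customer_key INTO :h_cust_key
                     FROM customer_dim
                     WHERE customer_id = :h_cust_id AND current_flag = 'Y';

            EXEC SQL INSERT INTO policy_txn_fact (customer_key, premium_amt)
                     VALUES (:h_cust_key, :h_premium_amt);
        }
        EXEC SQL CLOSE stg_cur;
        EXEC SQL COMMIT;
    }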

I suggest you read newsletters 6 and 7 before downloading this code. These newsletters cover the background to why it was developed in the first place.

Program Source: A complete set of the embedded SQL source code that will extract data from staging area tables and process that data into a detail fact table and a summary fact table. There is also an example dimension maintenance program.

Include Files: The suite of C include files that are required to compile the code.

File Groups, Tables, Views, Batch Commands: The SQL Server 2000 DDL required to create the database. I have included:

the database and file group definitions
the table definitions
the view definitions
the batch commands to run these definitions from a DOS box on Windows 2000

Note that I only generate the create table/view statements, because setting up a database and the associated file groups is only a very small amount of work for any particular site. Note also that the DDL generator can put the primary index and the table itself into different file groups. Not all the tables are published; the tables that are used to control the generation of C code and DDL for the database are not published.

SQL Server Database: The SQL Server database copied to MS Access. I copied the database into Access so people can have a look at what is in the tables. This way you can see the data that should be generated or used.

Visual C++ Workspace: If you happen to be using Visual C++ version 5 or above, you will want to download the workspace and the projects for each program. The workspace and projects already have all the pre-processor commands in them for all the programs. They expect to be put into a directory called 'source' and they expect to find the header files in a directory at the same level called 'copy'.

 

PS. I didn't spell check the comments. The idea was to write enough comments that you can understand exactly what the code is doing, rather than writing separate documentation about the code. So please forgive the typos.

 

 

Presentations

This section contains links to all the presentations I have published.  

The Instant Data Warehouse: The Instant Data Warehouse presentation for business managers and project sponsors.

The Instant Data Warehouse: The Instant Data Warehouse presentation for data warehouse project managers.

The C Data Warehouse Generator: The C Data Warehouse Generator presentation.

What is Data Warehousing all About: I have often started a customer's 'education' with a slide set like this one. Most importantly, I believe it is important to set the scene by discussing the 'Management Decision Making Process'. Once you understand that a data warehouse is all about supporting the management decision making process, a lot of other things start falling into place. This is not to say that data warehousing technologies cannot be used for other purposes, like pure reporting systems, or archives of operational data with no thought as to what will be done with it. It is just that if the project is run with the 'Management Decision Making Process' as the filter for what should be in the data warehouse, you have a better chance of delivering something useful.

Database Design Methods: A short presentation on the two major data warehouse database design methods widely used today. Third normal form is also widely used, but you don't need me to teach you about that.

The Marketing Data Warehouse: A short presentation which argues that marketing analysis is a fine starting point for a data warehouse project. The benefits are high, the costs are lower, and you can get a lot of first hand experience in how to go about data warehousing.

The Data Warehouse Journey: The amusing story described near the top of this page, one person's possible story of their Data Warehouse project. I hope it does not turn out to be yours!!

 

 

Newsletters

This section contains some old newsletters I wrote in a previous life. They have been somewhat updated for today's world. They might provide interesting reading for people new to data warehousing.

One thing they do not take into account is the evolution of the 'Corporate Information Factory' and all the other associated systems that make up an information ecosystem. I am planning to stay with the discussion of data warehousing and business intelligence systems. The data warehouse is the heart and soul of the information ecosystem; it is also the hardest and most time consuming piece to build. I personally think there is plenty of work for me to do in this area over the coming years.

I plan to write more newsletters as I find interesting topics to write about. If you can think of an interesting topic that you think should be published here, please feel free to let me know. I am not sure what criteria I will apply to writing newsletters; probably spare time, my interest and my experience. Also, if you are really keen on getting a newsletter on some topic written, it will probably move up my list of things to do if you get some of your friends to ask for the same thing; then I will know it is more worthwhile.

Note that I add newsletters at the top, so the newest ones are at the top of the list.

Newsletter #10: Managing Multi-Language Business Objects Reports.
Some time ago I was working with a customer who wanted to be able to manage Business Objects reports in many different languages. We did quite a bit of research on this and came up with our 'best effort' solution. I've changed the document to put my own examples in. Perhaps this will be useful for people managing Business Objects universes.
Newsletter #9: Demographic Analysis Using Large Stars in Insurance.
This newsletter contains a detailed discussion of the issues and solutions involved in performing demographic analysis in a large insurance company. It is specifically tailored to insurance, though it could be interpreted to support other industries. The techniques are implemented in the Data Warehouse Generator. The Customer Dimension Maintenance Program for Insurance is an implementation of what is discussed in this newsletter.
Newsletter #8: Demographic Analysis Using Large Stars in Banking.
This newsletter contains a detailed discussion of the issues and solutions involved in performing demographic analysis in a large bank. It is specifically tailored to banking, though it could be interpreted to support other industries. The techniques are implemented in the Data Warehouse Generator.
Newsletter #7: A Data Warehouse Generator - Details.
This newsletter contains details of the Data Warehouse Generator. You might find it interesting historical reading if you are looking at the C Data Warehouse Generator.

Newsletter #6: A Data Warehouse Generator - Overview.
This newsletter contains an overview of the Data Warehouse Generator. You might find it interesting historical reading if you are looking at the C Data Warehouse Generator.
Newsletter #5: Marketing Analysis Systems (with The Marketing Data Warehouse presentation).
When I first started in data warehousing in 1991 there were really simple goals for data warehousing projects: make money, as much as possible, as fast as possible, while still being sensitive to the goals of the organisation. It seemed like a sensible thing to do at the time.

The place to make money was in 'selling stuff'. And to 'sell stuff' you need to know who your customers are, and to know who your customers are you need a single central place to store customer and product information. We proved the saying "it's 5 times easier to sell something to an existing customer than to a new customer"; in fact, we often found it was 10-20 times easier. We often got 15-30% purchase response rates to direct mail marketing. And that was selling insurance products by mail, if you can believe it.

This paper talks about Marketing Analysis Systems, which is what we developed to support the marketing processes. Many companies now offer 'products' in this area; the paper contains some notes about what to look for.

My opinion is that over the last few years I have seen far too many data warehouses built that are glorified reporting systems, which makes them very expensive reporting systems. I'd like to see a move back to developing systems that are focussed on profit contribution.

I have included a presentation to go with the newsletter.

Newsletter #4: Data Warehousing - The IT Manager's Perspective.
As with the business manager's newsletter below, there has been a lot of 'noise' aimed at IT managers about what a data warehouse is all about. This paper was my effort at describing what a data warehouse is in such a fashion that an IT manager could understand it and make some assessment of what other people were telling them.
Newsletter #3: Data Warehousing - The Business Manager's Perspective.
One of the things that has proven tough over the years is articulating to a business manager what a data warehouse is, in language and pictures they understand. There has also been the problem that lots of people are out there telling business managers what a data warehouse is, and it is usually whatever it is they are selling. That's perfectly reasonable. So this was my effort at describing what a data warehouse is in such a fashion that a business manager could understand it and make some assessment of what other people were telling them.
Newsletter #2: Database Design Methods (with the Database Design Methods presentation).
This newsletter contains a brief discussion of data modelling techniques in the data warehousing environment. It was written when I was still getting a lot of 'resistance' to dimensional modelling.

People who have watched my appends to dwlist over the years will know that I personally believe there is a place for an archival data store in a data warehouse. Though most people have agreed star schemas are the way to go, not many people I see are building archival data warehouses. I have included a PowerPoint presentation to go with the newsletter.

Newsletter #1: Cost Benefit Analysis and Business Benefits Realisation.
This newsletter contains a detailed discussion of data warehousing cost benefit analysis and business benefits realisation. It is addressed to all those situations when someone says 'So how do we cost justify building a data warehouse?' Though far fewer people ask this question nowadays, it has been a popular one over the years.

  

 

Cobol Data Warehouse Code

This section contains links to the Cobol Data Warehouse Code downloads. You can download the entire suite of generated code from here. I have not published the generator itself because I am not expecting anyone to actually use a Cobol generator nowadays; I have converted it to C and am selling it as 'The C Data Warehouse Generator'. The C Data Warehouse Generator is useful as an extension to today's ETL tools, or as a 'starter kit' for a company getting into star schema data warehouse development for the first time. The version published here is for Oracle 7.x. The published version is a very small and simple star; however, the generator has been used to build quite complex stars over the years.

I can no longer run or test any of this code, as I do not have the software with me to perform any further development in Cobol/Oracle. I left those CDs in Australia, as I had no plans to do any Cobol/Oracle development when I moved to Ireland.

I suggest you read newsletters 6 and 7 before downloading this code. These newsletters cover the background to why it was developed in the first place, and my view that it still has some useful lessons to teach people about building star schemas.

Listings: A complete set of Cobol compiler listings and the Pro*Cobol source code that was used to generate the listings.

Copy Books: The suite of Cobol copybooks that are required to compile the code.

Oracle DDL: The Oracle DDL required to create the database. Note that I only generate the create table/view statements, because setting up table spaces and index spaces is only a very small amount of work for any particular site. Hardly worth generating.

Oracle Database: The Oracle database copied to MS Access, so it is possible for people to have a look at what is in the tables.

 

 


Send mail to peter@peternolan.net with questions or comments about this web site.
Copyright © 2002 Peter Nolan
Last modified: February 14, 2003