|
|
Instant Data Warehouse FAQsAs the Instant Data Warehouse is evaluated and used it is expected that a number of frequetly asked questions will evolve. To make it easier for more people to evaluate and to consider the purchase of the Instant Data Warehouse I will compile these questions and answers on an ongoing basis.
Questions
Q001. Does the IDW place any restrictions on table and column names in the warehouse? Q002. Does the IDW require that all source and target column names be identical?
Answers
A001. Does the IDW place any restrictions on table and column names in the warehouse?No. The IDW is intended to perform all processing via views. As such, all table and column names can conform to whatever your internal standard is. The view names must conform to a naming standard documented in the User Guide. There are some naming conventions provided for some columns in some views which allow the IDW to perform it's processing by using these columns as well as assuming relationships between the names of some columns. For example the integer key of a dimension table is expected to be pk_<dimension table name>.
A002. Does the IDW require that all source and target column names be identical?Yes and No. The IDW moves columns from source to target based on the columns having the same name. The process used is that the 'assignment' is performed by looping through the column name of every target column and for each target column searching for the same column name on the input table. Where a match is found the data is moved from source to target. This way, the input table can have many more columns than the target table, and these extra columns are ignored. It is highly recommended that all columns on the target view are found on the input view. Processing an input to a target where non-nullable target fields are not present on the source is not supported. Even where the target fields are nullable, it is highly recommended that these fields are hidden to the view that the IDW uses for inserts updates etc. The field name comparisons are performed as case insensitive string compares. So a field called PK_TIME_DIM is considered to be the same name as pk_time_dim.
A003. Are dimension tables with multiple portions to the primary key supported? For example outriggers?As of 27/01/2003 the answer is No. The first release of the IDW is intended to dramatically reduce the development time to implement a large star schema data warehouse. The initial support is for those portions of the development cycle that consume large amounts of time and which can be coded in such a fashion as to require very little 'metadata'. These are the creation of multi-level dimensions and multi-level summary fact tables. Hence the IDW is aimed at these areas first. It is not the intention of the Instant Data Warehouse to 'be all things to all people' in the ETL space because this market is already crowded. The IDW will be extended to support areas where the ETL tools are weak or 'time consuming by design'. In the first instance this also happens to be in the creation of multi-level dimensions and multi-level summary fact tables. There are more products under development which the ETL community will find very useful and they are all complementary to the leading ETL tools. Having said all this, the vast majority of the work to build the IDW was in the construction of the C++/ODBC/File access classes. With these classes in place, it is quite easy and fast to develop new programs to perform new functions. So, if a number of customers ask for 'outrigger' support I will build 'outrigger' support.
A004. Is the price really $US/EUR5000 per server? What if I want a development license? What if I want it on many servers because I have many data marts?I am very familiar with the concept that ETL via ODBC over a WAN is not a great idea. I am also aware that most of the software vendors price by server (and some by the number of processors in the server). I have not yet published firm prices for a development license. It will be significantly less than the production server license. I have not yet released pricing for volume purchases. A number of evaluators have many servers they want to install the software on. We are working through the pricing that seems most appropriate. For people considering evaluating the product and wondering what those prices might be the current guidance is that the discount curve will be very steep and that after 3 copies you can expect the price to be quite low. So don't rule out the IDW because you have 10 data marts you need to install the software on. You will not be paying anywhere near 10 times the list price. Further, the reseller program has been announced. Every customer has an opportunity to recover their investment in the products by re-selling it to colleagues.
A005. How can IDW be 10 times faster than the informatica wizards for building dimension table processing?When you use an Powercentre wizard to build a dimension table you are building a very limited job. You can only build a single 'level' of data into your dimension table. Also, You cannot get the wizard to generate a concatenated string to store into the dimension table to use as the key to lookup the dimension table on. So, unless you are willing to live with these limitations you can't use the wizard anyway.If you are willing to live with these limitations you need to write a job for every level of every dimension you want to store. You will also likely need to create a separate physical table to store each level of each dimension. (If you use the wizard it will generate a sequence icon so if you want to use one single table to store many levels of information in the dimension table you cannot use the wizard alone because you need to use a shared sequence icon.) Once you have different tables for each level of each dimension for all dimensions you have an expensive maintenance issue. If you add a column, which is very common, you need to change each table and each informatica mapping. You also need to thoroughly test all of these changes because it is very easy to make a mistake when editing so many jobs. In the two IWS projects I have done in Informatica we have found that the wizards were too limited to be able to use them for our dimension table processing so we had to write the dimension table processing for each dimension. As data warehouses are becoming more complicated the number of dimension tables and the number of summaries being built and maintained has dramatically increased. With the Instant Data Warehouse you only specify source to target mapping at the table level. The columns are discovered by the IDW. So there is no coding to load the dimension table. For a type 1 dimension you just have to tell the IDW which columns to join together at each level. Then, when you add a field, there are no code changes at all. This is easily 10 times faster. Indeed, in recent testing of an extremely complex piece of ETL I was able to construct the IDW job in 1 hour and we had taken in excess of 100 hours to write the same piece of ETL in Powercentre. So, I am perfectly happy to claim that in some very specific, very complex cases, the difference in speed is up to 100 times in favour of IDW over informatica.
A006.A007.A008.A009.A010.A011.A012.
|
Send mail to peter@peternolan.net with
questions or comments about this web site.
|