Kettle Tip: Using java locales for a Date Dimension

by Roland Bouman

The Date dimension is a well known construct in general data warehousing. In many cases, the data for a date dimension is generated using a database stored procedure or shell-script.

Another approach to obtain the data for a date dimension is to generate it using an ETL tool like Pentaho Data Integration, a.k.a. Kettle. I think this approach makes sense for a number of reasons:

  • When you tend to use a particular ETL tool, you will be able to reuse the date dimension generator over an over, and on different database platforms.

  • You won't need special database privileges beyond the ones you need already. Privileges for creating tables and to perform DML will usually be available, whereas you might need to convince a DBA that you require extra privileges to create and execute stored procedures.

In addition to these general considerations, you can pull a neat little trick with Kettle to localize the data and format of the date attributes. I wouldn't go as far as to say that this feature is Kettle specific: rather, it relies on the localization support built into the java platform and the way you can put that to use in Kettle transformations.

11 Comments

Arunn
2008-02-23 01:53:31
Hi,
I am new to this pentaho tool and tried some of basic transformation.. in my transformation i hve created the date dimension table but i want to extract date from the date dimension table based on the given date range... could u plz repy me as soon as possible...


Thanks & Regards
Arunn

Roland Bouman
2008-02-23 03:09:38
Hi Arunn,


Take a look at the penthao forums at http://forums.pentaho.org/ or on the freenode ##pentaho IRC channel. There are many people there that can offer help.


I'm not sure I understand what you mean by "extract date from the date dimension table" but with in Kettle, you can use a "Table Input Step", and there you can enter an arbitrary SELECT statement. You can select a particular range there by setting up an appropriate WHERE condition.


Arunn
2008-02-24 22:43:23
Hi Roland,
Thk u for ur help...but my problem is different,actually am having 2 table input(source table) the 1st table contains start_date & end_date and 2nd table contains date(dimension table).During the tranformation i want to extract the date based on the range(start_date & end_date) and load it on to the target table.


for ex.


1st table
project_id start_date end_date
110 01/01/2008 03/01/2008



2nd table
pk_date_id date
.
.
50 01/01/2008
51 02/01/2008
52 03/01/2008
.
.


target table


project_id start_date end_date expense_date
110 01/01/2008 03/01/2008 01/01/2008
110 01/01/2008 03/01/2008 02/01/2008
110 01/01/2008 03/01/2008 03/01/2008


each row from 1st table is compare to the 2nd table and generate the no. of rows based on the range...


could u plz explain me in a brief manner as soon as possible...



Thanks
Arunn

Roland Bouman
2008-02-25 00:15:40
Hi Arunn,


Please understand that the pentaho forums should be the first place to look for help. As is always the case with voluntary help offered through community channel, you should be patient and not stess too often that you need help urgently. If you really need help "...as soon as possible..." then maybe you should look into getting a pentaho subscription so you can call for support whenever you need it. Thank you for understanding.


Now, as for your problem, there are at least two ways to solve this, but most probably more than that. I bet you can find out for yourself too if you take some time to read the kettle documentation. Anyway,


1) you can simply write the appropriate SQL to join your tables in your table input step.
2) you can use a kettle database join step


kind regards, i hope this helps.


Roland

Arunn
2008-02-26 04:29:04
Hi Roland,
sorry for the urgent expectation... i was tried all the object(sql script,database join etc..) to filter the rows based on range but it was not worked fine after tht i hve created the cartesian product with condition it's work fine.. thk for ur reply....


Thanks
Arunn

Arunn
2008-02-26 22:28:57
Hi Roland,
I hve created 3 tranformation in kettle but i wnt to schedule the transformation in particular interval.i was tried it by using job, it execute the trans. but data is not available...
i want to execute the transformation in particular interval.


Thanks
Arunn

Roland Bouman
2008-02-26 23:47:59
So what's wrong with cron or at?
Arunn
2008-02-27 02:31:07
Hi Roland,
i was tried by using at command even though it was not executed properly...


Ex.
PMA.bat, my batch file contains 2 transformations


pan.bat /rep:"My_Sample_Repository" /trans:"PMA_Project_Transformation" /dir:/ /user:admin /pass:admin /level:Basic


pan.bat /rep:"My_Sample_Repository" /trans:"PMA_Expenditure_Transformation" /dir:/ /user:admin /pass:admin /level:Basic


i want to execute this 2 transformation one after another..
In command prompt i schedule the batch file by using "at" command
ex. at 15:30 "PMA.bat"

Roland Bouman
2008-02-27 02:49:49
Well, I'm sorry but I guess you're going to have to troubleshoot your problem then.


Please allow me to remind you of the pentaho forums (forums.pentaho.org) and the ##pentaho irc channel on freenode. But are an excellent source of community support.


kind regards,


Roland Bouman

sumit
2008-06-10 03:53:55
well in case of jobs created for transformation
does the connection to MySQL is created at the sheduled execution time of the transformation or is created and stored to be used again and again for the job
Roland Bouman
2008-06-10 04:57:34
Hi!


Your scheduler invokes Kitchen, which loads all transformations and jobs, and instantiates objects such as database objects. Connections are closed again whenever the tranformation/job finishes.


Hope this thelps.