Pentaho BI Suite Installation

Installing the BI Suite is straightforward: it consists mostly of unzipping files and running programs. The components that must be downloaded are listed below:

• JDK (Java Development Kit) 1.7 – Do not use 1.8

• Database (PostgreSQL, MySQL, Oracle, MS SQL Server)

• Pentaho BI Server

• PSW – Pentaho Schema Workbench

• PRD – Pentaho Report Designer

• PDI – Pentaho Data Integration

• PMD – Pentaho Data Mining (WEKA)

It is important to have the JAVA_HOME environment variable set in the operating system.

I- Installing JDK (Java Development Kit)

• On Linux (Unix):

sudo apt-get install sun-java7-bin sun-java7-fonts sun-java7-jre sun-java7-plugin sun-java7-jdk

II- Configuring JAVA_HOME environment variable:

• On Linux (Unix):

export JAVA_HOME=/usr/lib/jvm/java-7-sun-1.7.0_80/

To check whether the variable was exported, type:

echo $JAVA_HOME

Rather than defining JAVA_HOME every time, define it once in the Linux environment variables. To do so, first verify whether the variable is already defined: open a terminal and type the following:

echo $JAVA_HOME

If the correct path of the latest installed Java version appears, excellent: it is already configured. If nothing appears, type the following in the terminal:

sudo cp /etc/bash.bashrc /etc/bash.bashrc_backup
sudo gedit /etc/bash.bashrc

Add the lines below at the end of the file, remembering to change /usr/lib/jvm/java-7-sun-1.7.0_80 to the correct installation path:

JAVA_HOME=/usr/lib/jvm/java-7-sun-1.7.0_80
export JAVA_HOME

At the next login, verify that it is correctly defined:

echo $JAVA_HOME

The result should be similar to the output below:

root@caio-laptop:/home/caio# echo $JAVA_HOME
/usr/lib/jvm/java-7-sun-1.7.0_80/
root@caio-laptop:/home/caio#

III- Choosing internal database (Pentaho)

  • Observation: the standard version of Pentaho BI Server is ready to run, so it is not necessary to configure the internal database with the BI Server tables, as it was in versions up to 2.0. Nevertheless, it is possible to change the Hibernate and Pentaho configuration in order to work with other databases such as MySQL, Oracle and PostgreSQL.

  • To migrate the internal Pentaho database (HSQLDB) to Oracle, MySQL, PostgreSQL or MS SQL Server in a Linux or Windows environment, we recommend accessing:

    http://www.prashantraju.com/projects/pentaho/

IV- Unzip/ Start/ Stop BI Server

After downloading, extract the bi-server-ce.xxxx.zip (or .tar.gz) file and execute start-pentaho.bat (Windows) or start-pentaho.sh (Linux). Afterwards, access http://localhost:8080 (user: admin, password: password).

V- Changing Pentaho BI Server port (Tomcat)

By default, Pentaho BI Server (Tomcat) uses port 8080. If you want to change the port, edit the following files:

  • Edit the server.xml file in the \biserver-ce\tomcat\conf\ folder
  • Edit the web.xml file in the \tomcat\webapps\pentaho\WEB-INF\ folder
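As a sketch only, the port swap in server.xml can also be scripted. The new port value (9090) is an arbitrary example, and the demo below operates on a stand-in file rather than a real server.xml:

```python
# Sketch: swap Tomcat's default port 8080 for another port in server.xml.
# The new port (9090) and the demo file are assumptions for illustration;
# the real file lives in biserver-ce/tomcat/conf/server.xml.
from pathlib import Path

def change_port(server_xml: Path, old: int = 8080, new: int = 9090) -> None:
    text = server_xml.read_text()
    server_xml.write_text(text.replace(f'port="{old}"', f'port="{new}"'))

# Demo on a stand-in file with a minimal Connector element.
demo = Path("server_demo.xml")
demo.write_text('<Connector port="8080" protocol="HTTP/1.1"/>')
change_port(demo)
print(demo.read_text())  # port="9090" after the swap
```

Remember that web.xml (the fully-qualified-server-url entry) must be kept consistent with whatever port you choose.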

VI- Publishing password configuration

If the publishing password is not configured, it will not be possible to publish reports and cubes using tools such as PSW, PRD and PME. It is set in:

biserver-ce/pentaho-solutions/system/publisher_config.xml

Example:

<publisher-config>
    <publisher-password>your_password</publisher-password>
</publisher-config>

VII- Important information about Pentaho BI Server administration

• Pentaho’s License

biserver-ce/licenses

• Tomcat path inside Pentaho BI Server

biserver-ce/tomcat

• Pentaho BI Server log

 Linux :    biserver-ce/tomcat/logs/catalina.out
 Windows: biserver-ce/tomcat/bin/pentaho.log

• SMTP Configuration

biserver-ce/pentaho-solutions/system/smtp-email/email_config.xml

• Google’s API Settings:

    biserver-ce/pentaho-solutions/system/google/googlesettings.xml

• Catalogs (Schemas Mondrian):

    biserver-ce/pentaho-solutions/system/olap/datasources.xml

• Pentaho BI Server temporary folders:

The folders below can be deleted* to eliminate cache problems.

    biserver-ce/pentaho-solutions/system/tmp
    biserver-ce/pentaho-solutions/tmp
    biserver-ce/tomcat/temp
    biserver-ce/tomcat/temp_pentaho
    biserver-ce/tomcat/work
  * Only with the Pentaho BI Server turned off.
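A minimal sketch of this cleanup, assuming the default biserver-ce folder layout (the demo creates a stand-in directory tree so it is self-contained; remember the server must be stopped first):

```python
# Sketch: delete the Pentaho cache folders (only with the BI Server stopped).
# The biserver-ce path and layout are assumed from the list above.
import shutil
from pathlib import Path

BISERVER = Path("biserver-ce")
CACHE_DIRS = [
    "pentaho-solutions/system/tmp",
    "pentaho-solutions/tmp",
    "tomcat/temp",
    "tomcat/temp_pentaho",
    "tomcat/work",
]

# Demo structure standing in for a real install.
for d in CACHE_DIRS:
    (BISERVER / d).mkdir(parents=True, exist_ok=True)
    (BISERVER / d / "stale.cache").touch()

# Remove each cache folder entirely.
for d in CACHE_DIRS:
    shutil.rmtree(BISERVER / d, ignore_errors=True)

print(any((BISERVER / d).exists() for d in CACHE_DIRS))  # False: all removed
```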

• External Libs*

    /biserver-ce/tomcat/webapps/pentaho/WEB-INF/lib

* Sometimes it is necessary to develop your own libs.

Now that the Pentaho BI Suite is installed, we can start our BI project.

Creating ETL processes on Kettle

ETL Processes

The processes of collecting, organizing and cleansing data, applying business rules and formulas, matching various database sources and loading the data warehouse consume a large part of the effort and resources needed to carry out a Business Intelligence project. This step must be developed together with the decision makers, because the business questions determine the ETL processes.

Defining indicators

To illustrate the ETL process, a case study based on a company's income analysis indicator is used as an example. The indicator analyses income (quantity of sales and value of sales) by vendor, sale channel, client, product and time.

The majority of companies find the process of consolidating important data difficult. Therefore, a model that can be used by companies of different sizes was developed. According to SEBRAE: "Planning sales actions and, through systematic evaluations, amplifying what is right and correcting errors is a way to increase companies' competitiveness. Adopting a sales control enables the entrepreneur to forecast income and, consequently, plan purchases. The monthly behaviour of sales also becomes easier to follow: variations due to seasonality or short-term instalments." (SEBRAE, 2010)

Hence, the model defined for this case study has the following views:

  • Total and quantity of products sold

By:

  • Vendor;

  • Sales channel;

  • Client;

  • Product;

  • Time.

Dimensional modelling

The figure below presents the dimensional model developed to answer the questions of Decision Support System users.

Figure 5: Income Cube

This model uses the star schema. In the OLAP cube defined to analyse the company's income, there is the "fact_sale" table, where the quantity of products sold and the total of sales are the metrics, and vendor, sale channel, client, product and time are the dimensions.

Description of Dimensions, facts and fields

Table: fact_sale

Metrics:

quant_products = Quantity of products sold
sales_total = Total of sales

Relations:

fact_sale.id_client -> dim_client.id_client

fact_sale.id_salechannel -> dim_salechannel.id_salechannel

fact_sale.id_vendor -> dim_vendor.id_vendor

fact_sale.id_product -> dim_product.id_product

fact_sale.id_time -> dim_time.id_time
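The star schema described by these relations can be sketched in SQL. SQLite is used here only for illustration, and the dimension tables are abbreviated to a surrogate key plus a description field (the full field lists are given below); the sample rows and values are invented:

```python
# Sketch of the star schema: one fact table with foreign keys to each dimension.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_client      (id_client INTEGER PRIMARY KEY, desc_client TEXT);
CREATE TABLE dim_salechannel (id_salechannel INTEGER PRIMARY KEY, desc_salechannel TEXT);
CREATE TABLE dim_vendor      (id_vendor INTEGER PRIMARY KEY, desc_vendor TEXT);
CREATE TABLE dim_product     (id_product INTEGER PRIMARY KEY, desc_product TEXT);
CREATE TABLE dim_time        (id_time INTEGER PRIMARY KEY, year INTEGER, month_number INTEGER);
CREATE TABLE fact_sale (
    id_client      INTEGER REFERENCES dim_client(id_client),
    id_salechannel INTEGER REFERENCES dim_salechannel(id_salechannel),
    id_vendor      INTEGER REFERENCES dim_vendor(id_vendor),
    id_product     INTEGER REFERENCES dim_product(id_product),
    id_time        INTEGER REFERENCES dim_time(id_time),
    quant_products INTEGER,   -- metric: quantity of products sold
    sales_total    REAL       -- metric: total of sales
);
""")

# A typical analytical query over the schema: total income per client.
conn.execute("INSERT INTO dim_client VALUES (1, 'ACME')")
conn.execute("INSERT INTO fact_sale VALUES (1, NULL, NULL, NULL, NULL, 10, 2500.0)")
row = conn.execute("""
    SELECT c.desc_client, SUM(f.sales_total)
    FROM fact_sale f JOIN dim_client c ON f.id_client = c.id_client
    GROUP BY c.desc_client
""").fetchone()
print(row)  # ('ACME', 2500.0)
```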

Table: dim_salechannel

Fields description:

id_salechannel -> sale channel unique id

cod_salechannel -> code of sale channel at the origin system

desc_salechannel -> description of sale channel

Table: dim_time

Fields description:

id_time -> period unique id

year -> year

trimester -> quarter of the year

month_number -> number of the month
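As an illustration, the dim_time fields can be derived from a calendar date. The surrogate-key scheme (YYYYMMDD) below is an assumption, and "trimester" is taken to mean quarter of the year:

```python
# Sketch: deriving a dim_time row from a date.
from datetime import date

def time_dimension_row(d: date) -> dict:
    return {
        "id_time": int(d.strftime("%Y%m%d")),   # assumed surrogate-key scheme
        "year": d.year,
        "trimester": (d.month - 1) // 3 + 1,    # quarter: months 1-3 -> 1, etc.
        "month_number": d.month,
    }

print(time_dimension_row(date(2010, 8, 15)))
# {'id_time': 20100815, 'year': 2010, 'trimester': 3, 'month_number': 8}
```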

Table: dim_client

Fields description:

id_client -> client's unique id

cod_client -> client’s code at origin system

document -> client’s document

type -> personal or business

desc_client -> client’s description

city -> client’s city

state-> client’s state

country -> client’s country

latitude -> client’s latitude

longitude -> client’s longitude

desc_business -> business’ description

Table: dim_product

Fields description:

id_product -> unique product id

cod_product -> code of the product at origin system

cod_category -> category code at the origin system

product_status -> A = Active, I = Inactive

Table: dim_vendor

Fields description:

id_vendor -> unique vendor id

cod_vendor -> code of the vendor at the origin system

document -> vendor’s document

birthdate -> vendor’s date of birth

city -> vendor’s city

state -> vendor’s state

country -> vendor’s country

Creating ETL processes on Kettle

PDI (Kettle) will be used to perform the extract, transform and load routines and to create the data warehouse. The components and functions used to create a traditional dimension are presented below.

Components:

1) TIN – Extract data -> using the Table Input component, the data needed for the dimension are extracted through a SQL query performed on the source database.

2) DIMLKP – Load dimension -> the next step uses the Dimension Lookup component. This component verifies whether the record already exists and either inserts it (if it does not exist) or updates it (if it does). DIMLKP creates and loads the dimension and is responsible for the controls that exist in a dimension. The creation of the fact table involves components such as Table Input, Database Lookup and others.
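As a plain-code sketch of what the Dimension Lookup step does (this is not Kettle's implementation, only the insert-or-update logic it applies, shown here with SQLite and the dim_product table):

```python
# Sketch: the insert-or-update behaviour of a Dimension Lookup step.
# Look the incoming record up by its natural key; insert if missing,
# update its attributes if it already exists.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_product (
    id_product     INTEGER PRIMARY KEY,  -- surrogate key
    cod_product    TEXT UNIQUE,          -- natural key from the origin system
    product_status TEXT)""")

def dimension_lookup(cod_product: str, status: str) -> int:
    row = conn.execute(
        "SELECT id_product FROM dim_product WHERE cod_product = ?",
        (cod_product,)).fetchone()
    if row is None:  # record does not exist: insert it
        cur = conn.execute(
            "INSERT INTO dim_product (cod_product, product_status) VALUES (?, ?)",
            (cod_product, status))
        return cur.lastrowid
    # record exists: update its attributes, keep the surrogate key
    conn.execute(
        "UPDATE dim_product SET product_status = ? WHERE id_product = ?",
        (status, row[0]))
    return row[0]

k1 = dimension_lookup("P-10", "A")   # first load: inserted
k2 = dimension_lookup("P-10", "I")   # later load: updated, same surrogate key
print(k1 == k2)  # True
```

Note that Kettle's Dimension Lookup/Update step can also version records (slowly changing dimensions); the sketch above covers only the simple overwrite case.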

Every ETL process can be created visually using PDI (Kettle), as previously presented.

OLAP Cube

The figure below presents the OLAP cube for the company's income (case study).

Figure 6: OLAP Cube- Company's income

In this cube it is possible to analyse the quantity of products sold and the total value by product, vendor, client and sale channel. This tool allows a non-technical user to create various business views.

By clicking on the icon below, the user can change the order of the dimensions, add and remove metrics, and apply filters.

An example of the income cube is presented:

In this cube, three metrics were defined: quantity of products sold, total income and total of sales. In a simple way, the user can choose which metrics will be used in each view.

Dashboard with maps

In order to help the company's manager, a dashboard using Java, jQuery (AJAX), Pentaho and Google Maps was created. On this dashboard it is possible to identify the location of the company's clients and to see, through colours, the total income of each client, filtering by income, sale channel, period, region and state.

Colours of each icon:

- Red = up to 100 thousand reais

- Blue = between 100 thousand and 300 thousand reais

- Yellow = between 300 thousand and 500 thousand reais

- Orange = between 500 thousand and 1 million reais

- Green = over 1 million reais
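The colour rule above can be expressed as a small function. The boundary handling (which side of each threshold belongs to which colour) is an assumption, since the text does not specify it:

```python
# Sketch: map a client's total income (in reais) to an icon colour.
# Thresholds follow the list above; inclusive upper bounds are an assumption.
def icon_colour(income_reais: float) -> str:
    if income_reais <= 100_000:
        return "red"
    if income_reais <= 300_000:
        return "blue"
    if income_reais <= 500_000:
        return "yellow"
    if income_reais <= 1_000_000:
        return "orange"
    return "green"

print(icon_colour(250_000))    # blue
print(icon_colour(2_000_000))  # green
```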

Figure 8: Income Dashboard

Figure 9: Income Dashboard per state

Google Android

According to Google's site: "Android is open source software created for cell phones and other devices. The Android Open Source Project, led by Google, is in charge of Android's maintenance and development." (Google, 2010)

Accessing the Business Intelligence system with Google Android

The BI portal developed for the example company can be accessed from an application that runs on cell phones with the Google Android operating system.

Figure 10: Application to access BI portal at Google Android

The application needed to access the portal is called IT4Biz BI Mobile.

Figure 11: IT4biz BI Mobile – Android

Figure 12: BI Portal login screen, accessed from Google Android

Figure 13: Example of a dashboard, visualization from Google Android

Figure 14: Example of another dashboard, visualization from Google Android

Figure 15: Example of OLAP Cube, visualization from Google Android