Pentaho BI Suite Installation
Installing the BI Suite is very simple: it mostly consists of unzipping files and running programs. The items that must be downloaded are listed below:
• JDK (Java Development Kit) 1.7 – Do not use 1.8
• Database (PostgreSQL, MySQL, Oracle, MS SQL Server)
• Pentaho BI Server
• PSW – Pentaho Schema Workbench
• PRD – Pentaho Report Designer
• PDI – Pentaho Data Integration
• PMD - Pentaho Data Mining (WEKA)
It is important to have the JAVA_HOME environment variable set in the operating system.
I- Installing JDK (Java Development Kit)
• On Linux (Unix):
sudo apt-get install sun-java7-bin sun-java7-fonts sun-java7-jre sun-java7-plugin sun-java7-jdk
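After the installation finishes, you can confirm that the JDK is available (the exact version string depends on the package installed and on your distribution):
java -version
javac -version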
II- Configuring JAVA_HOME environment variable:
• On Linux (Unix):
export JAVA_HOME=/usr/lib/jvm/java-7-sun-1.7.0_80/
To check whether the variable was exported, type:
echo $JAVA_HOME
Rather than defining JAVA_HOME every time, define it once in the Linux environment variables. To do so, first verify whether the variable is already defined: open a terminal and type the following:
echo $JAVA_HOME
If the correct path of the latest installed Java version appears, excellent, it is already configured. On the other hand, if nothing appears, type the following in the terminal:
sudo cp /etc/bash.bashrc /etc/bash.bashrc_backup
sudo gedit /etc/bash.bashrc
Add the lines below at the end of the file, remembering to change /usr/lib/jvm/java-7-sun-1.7.0_80 to the correct installation path:
JAVA_HOME=/usr/lib/jvm/java-7-sun-1.7.0_80
export JAVA_HOME
At the next login, remember to verify that it is correctly defined:
echo $JAVA_HOME
The result should be similar to the following:
root@caio-laptop:/home/caio# echo $JAVA_HOME
/usr/lib/jvm/java-6-sun-1.6.0.20/
root@caio-laptop:/home/caio#
III- Choosing internal database (Pentaho)
Observation: The standard version of Pentaho BI Server is ready to run, so it is not necessary to configure the internal database with the BI Server tables, as was required in versions up to 2.0. Nevertheless, it is possible to change the Hibernate and Pentaho configuration in order to work with other databases such as MySQL, Oracle and PostgreSQL.
To migrate the internal Pentaho database (HSQLDB) to Oracle, MySQL, PostgreSQL or MS SQL Server in a Linux or Windows environment, we recommend accessing:
IV- Unzip/ Start/ Stop BI Server
After downloading, extract the bi-server-ce.xxxx.zip or .tar.gz file and execute start-pentaho.bat (Windows) or start-pentaho.sh (Linux). Afterwards, access http://localhost:8080 (user: admin, password: password).
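A minimal sketch of this sequence on Linux (assuming the archive was downloaded to the current folder; the archive and folder names depend on the version you downloaded):
tar -xzf bi-server-ce.xxxx.tar.gz    # or: unzip bi-server-ce.xxxx.zip
cd biserver-ce
./start-pentaho.sh                   # then open http://localhost:8080
./stop-pentaho.sh                    # stop the server when finished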
V- Changing Pentaho BI Server port (Tomcat)
By default, Pentaho BI Server (Tomcat) uses port 8080. If you want to change the port, edit the following files (a quick way to locate the entries is sketched after this list):
- Edit the server.xml file in the \biserver-ce\tomcat\conf\ folder
- Edit the web.xml file in the \tomcat\webapps\pentaho\WEB-INF\ folder
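On Linux, a simple check of where the current port appears before editing (paths assume the default biserver-ce layout):
grep -n '8080' biserver-ce/tomcat/conf/server.xml
grep -n '8080' biserver-ce/tomcat/webapps/pentaho/WEB-INF/web.xml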
VI- Publishing password configuration
If the publishing password is not configured, it will not be possible to publish reports and cubes from tools such as PSW, PRD and PME. The password is set in the following file:
biserver-ce/pentaho-solutions/system/publisher_config.xml
Example:
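A minimal sketch of the file content, set here from the terminal with a hypothetical password (mypassword); the element name follows the publisher_config.xml shipped with the CE server, so back up the file and check it against your version first:
cp biserver-ce/pentaho-solutions/system/publisher_config.xml publisher_config.xml.backup
cat > biserver-ce/pentaho-solutions/system/publisher_config.xml <<'XML'
<publisher-config>
    <publisher-password>mypassword</publisher-password>
</publisher-config>
XML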
VII- Important information about Pentaho BI Server administration
• Pentaho’s License
biserver-ce/licenses
• Tomcat path inside Pentaho BI Server
biserver-ce/tomcat
• Pentaho BI Server log
Linux: biserver-ce/tomcat/logs/catalina.out
Windows: biserver-ce/tomcat/bin/pentaho.log
• SMTP configuration
biserver-ce/pentaho-solutions/system/smtp-email/email_config.xml
• Google’s API Settings:
biserver-ce/pentaho-solutions/system/google/googlesettings.xml
• Catalogs (Mondrian schemas):
biserver-ce/pentaho-solutions/system/olap/datasources.xml
• Pentaho BI Server temporary folders:
The folders below can be deleted* to eliminate cache problems (see the cache-clearing sketch after this list).
biserver-ce/pentaho-solutions/system/tmp
biserver-ce/pentaho-solutions/tmp
biserver-ce/tomcat/temp
biserver-ce/tomcat/temp_pentaho
biserver-ce/tomcat/work
* Only with the Pentaho BI Server turned off.
• External Libs*
/biserver-ce/tomcat/webapps/pentaho/WEB-INF/lib
* Sometimes it is necessary to develop your own libraries and place them here.
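As noted above, the cache folders must only be cleared with the server stopped. A minimal sketch on Linux, assuming the default biserver-ce layout:
cd biserver-ce
./stop-pentaho.sh
rm -rf pentaho-solutions/system/tmp/* pentaho-solutions/tmp/* tomcat/temp/* tomcat/temp_pentaho/* tomcat/work/*
./start-pentaho.sh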
Now that the Pentaho BI Suite is installed, we can start our BI project.
Creating ETL processes on Kettle
ETL Processes
The processes of collecting, organizing and cleansing data, applying business rules and formulas, matching various database sources, and loading the data warehouse consume a large part of the effort and resources needed to carry out a Business Intelligence project. This step must be developed together with the decision makers, because the business questions determine the ETL processes.
Defining indicators
In order to illustrate the ETL process, a case study involving the development of a company’s income analysis indicator is used as an example. The indicator will be the income analysis (quantity of sales and value of sales) by vendor, sales channel, client, product and time.
Most companies find the process of matching important data difficult. Therefore, a model that can be used by companies of different sizes was developed. According to SEBRAE: “Planning sales actions and, through systematic evaluations, amplifying what is right and correcting errors is a way to increase a company’s competitiveness. The adoption of a sales control enables the entrepreneur to forecast income and, consequently, to plan purchases. Also, the monthly behaviour of sales becomes easier to follow: variations due to seasonality or short-term instalments.” (SEBRAE, 2010)
Hence, the model defined for this case study has the following views:
- Total sales value and quantity of products sold
By:
Vendor;
Sales channel;
Client;
Product;
Time.
Dimensional modelling
The figure below presents the dimensional model developed to answer the questions of the Decision Support System users.
Figure 5: Income Cube
This model uses the star schema. In the OLAP cube defined to analyse the company’s income there is the “sales fact” table, where the quantity of products sold and the total of sales are metrics, and vendor, sales channel, client, product and time are dimensions. A SQL sketch of these tables is shown after the field descriptions below.
Description of Dimensions, facts and fields
Table: fact_sale
Metrics:
quant_products = Quantity of products sold
sales_total = Total of Sales
Relations:
fact_sale.id_client -> dim_client.id_client
fact_sale.id_salechannel -> dim_salechannel.id_salechannel
fact_sale.id_vendor -> dim_vendor.id_vendor
fact_sale.id_product -> dim_product.id_product
fact_sale.id_time -> dim_time.id_time
Table: dim_salechannel
Fields description:
id_salechannel -> sale channel unique id
cod_salechannel -> code of sale channel at the origin system
desc_salechannel -> description of sale channel
Table: dim_time
Fields description:
id_time -> period unique id
year -> year
trimester -> quarter of the year
month_number -> number of the month
Table: dim_client
Fields description:
id_client -> client's unique id
cod_client -> client’s code at origin system
document -> client’s document
type -> personal or business
desc_client -> client’s description
city -> client’s city
state -> client’s state
country -> client’s country
latitude -> client’s latitude
longitude -> client’s longitude
desc_business -> business’ description
Table: dim_product
Fields description:
id_product -> unique product id
cod_product -> code of the product at origin system
cod_category -> category description
product_status -> A=Active I= Inactive
Table: dim_vendor
Fields description:
id_vendor -> unique vendor id
cod_vendor -> code of the vendor at the origin system
document -> vendor’s document
birthdate -> vendor’s date of birth
city -> vendor’s city
state -> vendor’s state
country -> vendor’s country
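The field descriptions above translate directly into DDL. A minimal sketch for two of the tables, assuming a PostgreSQL data warehouse named dw (the column types are assumptions, not part of the case study; the remaining dimensions follow the same pattern):
psql -d dw <<'SQL'
CREATE TABLE dim_client (
    id_client     SERIAL PRIMARY KEY,   -- client unique id
    cod_client    VARCHAR(30),          -- client code at the origin system
    document      VARCHAR(30),
    type          CHAR(1),              -- personal or business
    desc_client   VARCHAR(200),
    city          VARCHAR(100),
    state         VARCHAR(50),
    country       VARCHAR(50),
    latitude      NUMERIC(10,6),
    longitude     NUMERIC(10,6),
    desc_business VARCHAR(200)
);
CREATE TABLE fact_sale (
    id_client      INTEGER REFERENCES dim_client (id_client),
    id_salechannel INTEGER,             -- references dim_salechannel
    id_vendor      INTEGER,             -- references dim_vendor
    id_product     INTEGER,             -- references dim_product
    id_time        INTEGER,             -- references dim_time
    quant_products INTEGER,             -- quantity of products sold
    sales_total    NUMERIC(15,2)        -- total of sales
);
SQL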
Creating ETL processes on Kettle
PDI – Kettle will be used to perform the extract, transform and load routines and to create the data warehouse. The components used to create a traditional dimension, and their functions, are presented below.
Components:
1) TIN – Extract data -> Using the Table Input component, the data needed for the dimension are extracted through a SQL query performed on the source database (an example of such a query is sketched after this list).
2) DIMLKP – Load dimension -> The next step is the use of the Dimension Lookup component. This component verifies whether the record already exists and either inserts it (if it does not exist) or updates it (if it exists). The DIMLKP allows the creation and loading of the dimension and is responsible for the controls that exist in a dimension. The process of creating the fact table involves components such as Table Input, Database Lookup and others.
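As an illustration of the kind of query a Table Input step might hold, the sketch below extracts client data from a hypothetical source table clients in a transactional database named erp (these source names are assumptions, not part of the case study):
psql -d erp <<'SQL'
SELECT c.client_code   AS cod_client,   -- code at the origin system
       c.tax_document  AS document,
       c.name          AS desc_client,
       c.city,
       c.state,
       c.country
  FROM clients c
 WHERE c.updated_at >= CURRENT_DATE - INTERVAL '1 day';  -- incremental extraction
SQL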
Every ETL process can be created in PDI – Kettle in a visual way, as previously presented.
OLAP Cube
The figure below presents the OLAP cube for the company’s income (case study).
Figure 6: OLAP Cube- Company's income
In this cube it is possible to analyse the quantity of products sold and the total value by product, vendor, client and sales channel. This tool allows a non-technical user to create various business views.
By clicking on the icon below, the user can change the order of the dimensions, add and remove metrics, and apply filters.
An example of the income cube is presented:
In this cube three metrics were defined: quantity of products sold, total income and total of sales. In a simple way, the user can choose which metrics will be used in each view.
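Such a cube is described by a Mondrian schema (a catalog, normally built in PSW and registered in the datasources.xml file mentioned in section VII). A rough sketch of what the schema could look like for this cube, written as a shell here-document so it can be saved to a file; the levels, measure definitions and file name are assumptions based on the model described earlier, and only two dimensions are shown:
cat > income-schema.xml <<'XML'
<Schema name="Income">
  <Cube name="Income">
    <Table name="fact_sale"/>
    <Dimension name="Client" foreignKey="id_client">
      <Hierarchy hasAll="true" primaryKey="id_client">
        <Table name="dim_client"/>
        <Level name="Client" column="desc_client"/>
      </Hierarchy>
    </Dimension>
    <Dimension name="Product" foreignKey="id_product">
      <Hierarchy hasAll="true" primaryKey="id_product">
        <Table name="dim_product"/>
        <Level name="Product" column="cod_product"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Quantity of Products Sold" column="quant_products" aggregator="sum"/>
    <Measure name="Total Income" column="sales_total" aggregator="sum"/>
    <Measure name="Total of Sales" column="id_product" aggregator="count"/>
  </Cube>
</Schema>
XML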
Dashboard with maps
In order to help the company’s manager, a dashboard using Java, jQuery (AJAX), Pentaho and Google Maps was created. In this dashboard it is possible to identify the location of the company’s clients and to see, through colours, the total income from each client, filtering by income, sales channel, period, region and state.
Colour of each icon:
- Red = up to 100 thousand reais
- Blue = between 100 and 300 thousand reais
- Yellow = between 300 and 500 thousand reais
- Orange = between 500 thousand and 1 million reais
- Green = over 1 million reais
Figure 8: Income Dashboard
Figure 9: Income Dashboard per state
Google Android
According to Google’s site: “Android is open-source software created for cell phones and other devices. The Android Open Source Project, led by Google, is in charge of Android’s maintenance and development.” (Google, 2010)
Accessing the Business Intelligence system with Google Android
The BI portal developed for the example company can be accessed from an application that runs on cell phones with the Google Android operating system.
Figure 10: Application to access BI portal at Google Android
The application needed to access the portal is called IT4Biz BI Mobile.
Figure 11: IT4biz BI Mobile – Android
Figure 12: BI Portal login screen, accessed from Google Android
Figure 13: Example of a dashboard, visualization from Google Android
Figure 14: Example of another dashboard, visualization from Google Android
Figure 15: Example of OLAP Cube, visualization from Google Android