The Power of Pentaho

1. History

Pentaho was founded in 2004 by a veteran Business Intelligence team composed of professionals from companies such as Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat and SAS. Pentaho Corporation is an American company that became a reference in open source Business Intelligence.

Headquartered in Orlando, Florida, Pentaho has contributors from all over the world and is the main developer of the Business Intelligence tools gathered in its suite.

Moreover, the company has received investment from Enterprise Associates, Sugar CRM, Xensource and Index Ventures, as well as from investors such as MySQL and Zend.

Pentaho’s business model is the subscription model: the software is marketed with no license cost, and the producer (here, Pentaho) provides support, services and improvements for an annual fee. Clients can nevertheless use the community version, which has no annual fee and no vendor support, since it is maintained by Pentaho’s community.

Pentaho coordinates and sponsors the most important open source projects related to its Business Intelligence suite, such as Mondrian, Kettle, WEKA and others. In the beginning, these projects were developed by people from all over the world and had no relation to the corporation itself. From its foundation in 2004 onward, the company incorporated these projects along with their main developers.

For example, Kettle, developed by Matt Casters, was incorporated in 2006 and became PDI (Pentaho Data Integration). Matt Casters, the most important developer of the tool, then started working at Pentaho. Another important example is the Mondrian project, developed by Julian Hyde; both the project and its developer were incorporated in 2005.

Besides Pentaho Data Integration (Kettle) and the OLAP server (Mondrian), a reporting solution was necessary. Hence, Thomas Morgner joined Pentaho with his open source project JFreeReport, which was renamed Pentaho Reporting; he is currently the project’s main leader.

Finally, in 2006, WEKA was another important acquisition, together with its developer Mark Hall, who became the Senior Engineer of Pentaho’s Data Mining and popularised WEKA as Pentaho’s data mining tool.

Other important people in Pentaho’s history include Marc Batchelor (founder and chief engineer), Richard Daley (founder and president), James Dixon (founder and chief geek), Doug Moran (founder and community vice-president) and Will Gorman (vice-president).

Pentaho’s main characteristics

• 100% Java Project;

• Multi-platform BI Server and development tools;

• Scalable;

• Open Source;

• Developed with open standards and tools available in the market;

• License free of charge (Pentaho Community Version);

• Possibility of customization and integration;

• Supplier independent;

• Support from the developer through the subscription model (paid version);

• Vast community of developers in Brazil and around the world;

• Web access to business indicators, independent of operating system (Windows, Linux, Mac, iOS, Android, etc.).

Pentaho BI Suite Components and the context of a BI project

The development of a Business Intelligence project can be illustrated by the figure below:

Figure 2: Overview of Pentaho's Business Intelligence project.

In a BI project, it is first necessary to identify where the data we need are stored: spreadsheets, text files, the company's systems or any other data source. Once all the data are available, and based on our experience and knowledge of the business and the tools, we extract these data, transform them and load them into our Data Warehouse (the ETL process). The Data Warehouse contains the most important data for the company, so this is where we find the relevant information for our BI project.
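To make the extract, transform and load steps concrete, the following is a minimal sketch in Python. It is not how PDI works internally; it only illustrates the idea. The CSV content, the table name `fact_orders` and the function names are all hypothetical, and an in-memory SQLite database stands in for the Data Warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in a real project this would come from
# spreadsheets, text files or the company's operational systems.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,10.50,2015-01-03
2,BOB,20.00,2015-01-04
3,alice,,2015-01-05
"""


def extract(text):
    """Extract: read the raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, convert types, drop rows with no amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["order_date"].strip(),
        ))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=RAW_CSV):
    """Run the three steps in order and return the number of rows loaded."""
    rows = transform(extract(text))
    load(conn, rows)
    return len(rows)
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two complete orders and skips the one without an amount. In PDI the same three stages would be drawn visually as steps of a transformation rather than written as functions.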

Afterwards, it is necessary to add value to the data. Data become useful once they are processed on a BI server (in our case, Pentaho), which delivers OLAP cubes, reports in PDF, Excel or HTML, and dashboards. These can be accessed from desktop and personal computers (Windows, Linux and Mac), from cell phones (iOS, Android, Windows) with internet access, and from other devices. Together, they provide the information needed for the decision-making process.
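As a rough illustration of what "adding value" through an OLAP cube means, the sketch below aggregates one measure at different levels of detail. It assumes a hypothetical `fact_sales` table in an in-memory SQLite database and a hypothetical helper `revenue_by`; a real BI server answers such questions through its OLAP engine, not through hand-written queries like this one.

```python
import sqlite3


def build_warehouse():
    """Create a tiny, hypothetical fact table with a few sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales ("
        "year INTEGER, month INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (2015, 1, "North", 100.0),
            (2015, 1, "South", 50.0),
            (2015, 2, "North", 70.0),
            (2014, 12, "South", 30.0),
        ],
    )
    return conn


def revenue_by(conn, *levels):
    """Sum the 'amount' measure grouped by the chosen dimension levels."""
    allowed = {"year", "month", "region"}
    if not levels or not set(levels) <= allowed:
        raise ValueError("levels must be chosen from: year, month, region")
    cols = ", ".join(levels)  # safe: names were checked against 'allowed'
    sql = (
        f"SELECT {cols}, SUM(amount) FROM fact_sales "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()
```

With this data, `revenue_by(conn, "year")` gives the yearly totals, while `revenue_by(conn, "year", "region")` drills down one level. An OLAP cube lets the end user perform the same navigation interactively.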

In this context, it is important to understand the core of the process of creating a BI project. Pentaho BI Server sits at this core.

Pentaho BI Server

Pentaho BI Server is an application that manages indicators, shares them among users, controls access, and so on. It runs on many Java EE (J2EE) web servers, such as Tomcat, JBoss and GlassFish. At the time of writing, the current version of Pentaho BI Server is 5.4, and Tomcat is the default web server.

Figure 3: Pentaho BI Server
Source: Pentaho

Figure 4: Pentaho BI Server architecture
Source: Pentaho

Beyond understanding Pentaho BI Server as a whole, it is worth presenting some of the applications that compose it:

  • Apache Tomcat

    Tomcat is a Java web server, more specifically a servlet container. It implements part of the Java EE (J2EE) specification, namely the servlet and JSP technologies, but it is not an EJB server. Developed by the Apache Software Foundation, it is distributed as free software, originally within the Apache Jakarta project, and was officially endorsed by Sun as a reference implementation of the Java Servlet and JavaServer Pages (JSP) technologies. It also supports features such as Realms, security, JNDI Resources and JDBC DataSources. It can act as a web server on its own, providing an HTTP server written purely in Java, or be integrated with a dedicated web server such as Apache or IIS. The server includes management and configuration tools; configuration can also be done manually by editing configuration files in XML format.

    Source: http://pt.wikipedia.org/wiki/Tomcat

  • Jetty

    Jetty is an HTTP server and servlet container written in Java. It is a competitor of Tomcat, known for having been used as the servlet container of JBoss. Jetty’s big advantage is its easy configuration. It was also a pioneer in the use of asynchronous I/O to handle a larger load of simultaneous users without a thread per connection. It is used by large Brazilian websites, such as the GUJ forum.

    Source: http://pt.wikipedia.org/wiki/Jetty

  • Spring Security

    Spring Security is a framework for authentication and access control.

  • Hibernate

    Hibernate is an object-relational mapping framework written in Java, also available for .NET under the name NHibernate. It facilitates the mapping of attributes between a traditional relational database and the application's object model, using XML files to establish this relation. Hibernate is free, open source software distributed under the LGPL license.

    Source: http://pt.wikipedia.org/wiki/Hibernate

  • Quartz

    Quartz is a job scheduler that can be integrated into Java EE or Java SE applications.

  • Hyper SQL (HSQLDB)

    HSQLDB (HyperSQL Database) is an open source database server written in Java. In terms of robustness and security, HSQLDB cannot be compared with database servers such as Oracle or Microsoft SQL Server; nevertheless, it is a simple solution that uses few resources and performs well. Because of these characteristics, it is used in desktop applications that interact with a persistence layer through SQL. The OpenOffice suite includes HSQLDB as its data storage engine as of version 2.0.

    Source: http://pt.wikipedia.org/wiki/HSQLDB

The Pentaho BI Suite includes the following tools:

PRD (Pentaho Report Designer)

PRD, an acronym for Pentaho Report Designer, is a tool for creating reports. The current version, 5.4, brings new functionality. It is possible to create reports with filters, charts and sub-reports in PDF, Excel or HTML and publish them directly on Pentaho BI Server. No programming is needed: the tool is completely visual and allows the creation of highly professional reports, including “pixel perfect” reports in which the position of every element can be controlled down to the pixel.

PDI (Pentaho Data Integration)

PDI is one of the most important tools of the Pentaho BI Suite and is responsible for ETL processes (Extract, Transform, Load). In a visual way, the tool allows ETL developers to connect to various databases; extract, copy, transform, match, delete and refresh data; send data to different locations; create jobs; send mail; access a server via SSH or FTP; handle errors; and perform many other ETL tasks. With PDI it is possible to integrate a company's data in a visual and organised way.

PSW (Pentaho Schema Workbench)

Pentaho Schema Workbench is an important tool for creating OLAP cubes, which are defined in an XML format. A cube can be written by hand in a text editor or built visually in the PSW application. Cubes created in PSW are published to Pentaho BI Server, and the publishing process is simple and intuitive. With PSW it is possible to create metrics (measures), dimensions and hierarchies.
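To give an idea of what such an XML cube definition looks like, the sketch below builds a small schema with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`Schema`, `Cube`, `Dimension`, `Hierarchy`, `Level`, `Measure`) follow the Mondrian 3 schema format as we understand it and should be checked against the version in use; the schema, table and column names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_schema():
    """Build a small Mondrian-style schema: one cube, one dimension, one measure."""
    schema = ET.Element("Schema", name="SalesSchema")
    cube = ET.SubElement(schema, "Cube", name="Sales")
    # Fact table that holds the measures (hypothetical name).
    ET.SubElement(cube, "Table", name="fact_sales")

    # A Time dimension with a Year > Month hierarchy.
    dim = ET.SubElement(cube, "Dimension", name="Time", foreignKey="time_id")
    hier = ET.SubElement(dim, "Hierarchy", hasAll="true", primaryKey="time_id")
    ET.SubElement(hier, "Table", name="dim_time")
    ET.SubElement(hier, "Level", name="Year", column="year",
                  uniqueMembers="true")
    ET.SubElement(hier, "Level", name="Month", column="month",
                  uniqueMembers="false")

    # A metric: the sum of the 'amount' column of the fact table.
    ET.SubElement(cube, "Measure", name="Revenue", column="amount",
                  aggregator="sum", formatString="#,##0.00")
    return schema


def to_xml(schema):
    """Serialise the schema element tree to an XML string."""
    return ET.tostring(schema, encoding="unicode")
```

Printing `to_xml(build_schema())` shows the nesting of cube, dimension, hierarchy and levels that PSW otherwise lets us assemble visually. A file produced this way would still need to be validated and published through PSW or the BI Server.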

WEKA

WEKA is the data mining tool of the Pentaho BI Suite. It is a graphical environment in which it is possible to apply statistical models, artificial intelligence techniques and other models in order to enrich databases and support the decision-making process.

With all the components presented, the next step is to install Pentaho.