1 Exploiting MySQL 5.1 For Advanced Business Intelligence Applications Matt Casters: Chief Architect, Data Integration and Kettle Project Founder MySQL User Conference, Tuesday April 24, 2007

2 Agenda
- Great news
- Pentaho Data Integration introduction
- Pentaho Data Integration MySQL support
- Table partitioning
- Database partitioning
- Clustering
- Metadata to the rescue
- Pentaho company overview & platform demo
- Questions & closing

3 Great news
- The San Francisco Giants beat the Arizona Diamondbacks last Saturday
- Barry Zito pitched a great game
- The hot dogs were great

4 Pentaho news
- Mondrian 2.3 was released a few weeks ago, with detailed cache control (clearing)
- Sun Microsystems & Pentaho announced that Pentaho Reporting will be used to drive Business Intelligence requirements in Open Office 2.3, to be released in the second half of this year
- Tomorrow we'll make another announcement

5 Pentaho Data Integration introduction From source systems … to the data warehouse … to reports … to analyses … to dashboard reports … to better information

6 Data Transformation and Integration Examples
- Data filtering: is not null, greater than, less than, includes
- Field manipulation: trimming, padding, upper- and lowercase conversion
- Data calculations: + - x /, average, absolute value, arctangent, natural logarithm
- Date manipulation: first day of month, last day of month, add months, week of year, day of year
- Data type conversion: string to number, number to string, date to number
- Merging fields & splitting fields
- Looking up data: look up in a database, in a text file, an Excel sheet, …

7 Pentaho Data Integration (Kettle) Components
- Spoon: connect to data sources, define transformation rules and design target schema(s); graphical job execution workflow engine for defining multi-stage and conditional transformation jobs
- Pan: command-line execution of single, pre-defined transformation jobs
- Kitchen: scheduler for multi-stage jobs
- Carte: remote execution "slave" server with embedded web server
- Pentaho BI Platform: integrated scheduling of transformations or jobs; ability to call real-time transformations and use output in reports and dashboards

8 Data Transformation and Integration Examples
Demo time: load 1M rows into a database table

9 Case Study: Pentaho Data Integration
- Organization: Flemish Government Traffic Centre
- Use case: monitoring the state of the road network
- Application requirement: integrate minute-by-minute data from 570 highway locations (1,550 lanes) for analysis
- Technical challenges: large volume of data, more than 2.5 billion rows
- Business usage: users can now compare traffic speeds based on weather conditions, time of day, date, and season
- Best practices: clearly understand business user requirements first; there are often multiple ways to solve data integration problems, so consider the long-term need when choosing the right way

10 Summary and Resources
- Pentaho and MySQL can help you manage your data infrastructure: Extraction, Transformation and Loading for Data Warehousing and Data Migration
- kettle.pentaho.org: Kettle project homepage
- kettle.javaforge.com: Kettle community website: forum, source, documentation, tech tips, samples, …
- www.pentaho.org/download/: all Pentaho modules, pre-configured with sample data; developer forums, documentation
- Ventana Research Open Source BI Survey: www.mysql.com
- White paper: http://dev.mysql.com/tech-resources/articles/mysql_5.0_pentaho.html
- Kettle webinar: http://www.mysql.com/news-and-events/on-demand-webinars/pentaho-2006-09-19.php
- Roland Bouman's blog on Pentaho Data Integration and MySQL: http://rpbouman.blogspot.com/2006/06/pentaho-data-integration-kettle-turns.html

11 Pentaho Data Integration MySQL support
- Reading: MySQL result streaming (cursor emulation) support
- Writing: MySQL dialects for data types
- Job entry: bulk loader of text files for MySQL
- Job entry: bulk writer to a text file for MySQL
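The bulk loader job entry above builds on MySQL's own LOAD DATA INFILE statement; a minimal hand-written sketch of that mechanism follows (the file path, delimiters and column list are hypothetical examples, not PDI's actual generated statement):

```sql
-- Sketch: bulk-load a delimited text file into MySQL.
-- File path, delimiters and column names are illustrative assumptions.
LOAD DATA INFILE '/tmp/sales.txt'
INTO TABLE sales
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(sales_date, ordernr, customernr, productnr, discount_pct);
```

Loading this way bypasses row-by-row INSERT overhead, which is why the job entry exists alongside the regular table output step.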

12 Table partitioning
Challenges:
- More data is being gathered all the time
- Data is coming from more sources than ever
- Faster access to stored information is becoming more important
- More people require concurrent access to the data
The old solution:
- Split up the data into different tables: SALES_2003, SALES_2004, … SALES_2006
- Hardcode reports to use one or the other table
- Lots of database management issues
- High report maintenance costs
- Nothing is automatic, everything is manual

13 Table partitioning
The new solution: MySQL 5.1 partitioned tables
- Automatic split-up of data into smaller chunks
- SQL engine that can automatically pull results back together

14 Table partitioning
(diagram: one logical Sales table split into Year 2003, 2004, 2005 and 2006 partitions)

15 Table partitioning
Advantages:
- Reduces query time by reducing the amount of data to "plough" through
- Increases performance by automatically "pruning" the list of partitions to search; this is done by the MySQL query optimizer based on the query being issued
- Massive reduction in I/O
- Smaller partitioned indexes, leading to faster index tree traversal
- Allows parallel access to the different partitions
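Pruning can be checked with EXPLAIN PARTITIONS, available in MySQL 5.1. As a sketch against the sales table created on the next slides, a query restricted to one year should only touch that year's partition:

```sql
-- Sketch: ask the optimizer which partitions a query will scan.
-- Assumes the sales table partitioned by YEAR(sales_date) from the CREATE TABLE slide.
EXPLAIN PARTITIONS
SELECT ordernr, customernr
FROM sales
WHERE sales_date >= '2000-01-01' AND sales_date < '2001-01-01';
-- The "partitions" column of the plan should list only P2000 if pruning kicked in.
```

If the partitions column lists every partition, the WHERE clause is not in a form the optimizer can prune on.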

16 Table partitioning
How do I create a partitioned table?

CREATE TABLE sales (
  sales_date   TIMESTAMP,
  ordernr      INT,
  customernr   INT,
  productnr    INT,
  discount_pct TINYINT
)
PARTITION BY RANGE( YEAR(sales_date) ) (
  PARTITION P2000 VALUES LESS THAN (2001),
  PARTITION P2001 VALUES LESS THAN (2002)
);

17 Table partitioning
How do I add another partition to the table?

ALTER TABLE sales ADD PARTITION (
  PARTITION P2002 VALUES LESS THAN (2003)
);

18 Table partitioning
How do I drop a partition from the table?

ALTER TABLE sales DROP PARTITION P2002;

19 Table partitioning
Retrieving partition information

mysql> desc information_schema.partitions;
+-------------------------------+--------------+------+-----+---------+-------+
| Field                         | Type         | Null | Key | Default | Extra |
+-------------------------------+--------------+------+-----+---------+-------+
| TABLE_CATALOG                 | varchar(512) | YES  |     | NULL    |       |
| TABLE_SCHEMA                  | varchar(64)  | NO   |     |         |       |
| TABLE_NAME                    | varchar(64)  | NO   |     |         |       |
| PARTITION_NAME                | varchar(64)  | YES  |     | NULL    |       |
| SUBPARTITION_NAME             | varchar(64)  | YES  |     | NULL    |       |
| PARTITION_ORDINAL_POSITION    | bigint(21)   | YES  |     | NULL    |       |
| SUBPARTITION_ORDINAL_POSITION | bigint(21)   | YES  |     | NULL    |       |
| PARTITION_METHOD              | varchar(12)  | YES  |     | NULL    |       |
| SUBPARTITION_METHOD           | varchar(12)  | YES  |     | NULL    |       |
| PARTITION_EXPRESSION          | longtext     | YES  |     | NULL    |       |
| SUBPARTITION_EXPRESSION       | longtext     | YES  |     | NULL    |       |
| PARTITION_DESCRIPTION         | longtext     | YES  |     | NULL    |       |
| TABLE_ROWS                    | bigint(21)   | NO   |     | 0       |       |
| AVG_ROW_LENGTH                | bigint(21)   | NO   |     | 0       |       |
| DATA_LENGTH                   | bigint(21)   | NO   |     | 0       |       |
| MAX_DATA_LENGTH               | bigint(21)   | YES  |     | NULL    |       |
| INDEX_LENGTH                  | bigint(21)   | NO   |     | 0       |       |
| DATA_FREE                     | bigint(21)   | NO   |     | 0       |       |
| CREATE_TIME                   | datetime     | YES  |     | NULL    |       |
| UPDATE_TIME                   | datetime     | YES  |     | NULL    |       |
| CHECK_TIME                    | datetime     | YES  |     | NULL    |       |
| CHECKSUM                      | bigint(21)   | YES  |     | NULL    |       |
| PARTITION_COMMENT             | varchar(80)  | NO   |     |         |       |
| NODEGROUP                     | varchar(12)  | NO   |     |         |       |
| TABLESPACE_NAME               | varchar(64)  | YES  |     | NULL    |       |
+-------------------------------+--------------+------+-----+---------+-------+

20 Table partitioning
Retrieving partition information

SELECT partition_name
FROM information_schema.partitions
WHERE table_name = 'sales';
+----------------+
| partition_name |
+----------------+
| P2000          |
| P2001          |
| P2002          |
| P2003          |
| P2004          |
| P2005          |
| P2006          |
| P2007          |
| P2008          |
| P2009          |
| P2010          |
| P2011          |
+----------------+

21 Table partitioning
Use case: Flemish Traffic Centre
- Medium-sized data set: 514M rows
- Partitioned by week: PARTITION BY RANGE( YEAR(mtime)*100 + WEEK(mtime) )
- 80GB of data in hundreds of partitions
- 1 row of data (speed, number of vehicles in 5 classes, etc.) per minute, per lane of traffic, for 1,550 detectors → 815M rows of data per year
- A test query aggregating speed & counts per road position per minute returned 642,319 rows: 9 seconds for MySQL to get the result, 29 seconds to pass the data to the client over JDBC
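The per-week partitioning expression above can be sketched as DDL; the table and all columns other than mtime are hypothetical stand-ins for the Traffic Centre's actual schema:

```sql
-- Sketch: range partitioning by week, using YEAR(mtime)*100 + WEEK(mtime)
-- as on the slide. Table and column names (except mtime) are assumptions.
CREATE TABLE traffic_data (
  mtime       TIMESTAMP,
  detector_id INT,
  lane        TINYINT,
  speed       SMALLINT,
  vehicles    SMALLINT
)
PARTITION BY RANGE( YEAR(mtime)*100 + WEEK(mtime) ) (
  PARTITION p200701 VALUES LESS THAN (200702),  -- week 1 of 2007
  PARTITION p200702 VALUES LESS THAN (200703),  -- week 2 of 2007
  PARTITION pmax    VALUES LESS THAN MAXVALUE   -- catch-all for later weeks
);
```

New weekly partitions can then be added with ALTER TABLE … ADD PARTITION as shown earlier for the yearly sales example.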

22 Table partitioning
Should we do table partitioning? Yes! However…
- Even though you can do sub-partitioning, you need to partition the data in one particular way: by range, by list, …
- The entry point for optimization is always going to be that partitioning method
- Pruning is always going to be based on that partitioning method
- Queries across the whole data set don't always benefit (nor do they pay a penalty)
More detailed information: "Partitioning in MySQL 5.1 and Onwards", Mikael Ronström, Senior Software Architect, MySQL AB. Tomorrow 11:50am to 12:35pm, Ballroom A.
How can we improve upon this?
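The sub-partitioning mentioned above lets a RANGE- or LIST-partitioned table be subdivided further by HASH or KEY in MySQL 5.1. A minimal sketch, reusing the sales example (the customernr hash key is an illustrative choice):

```sql
-- Sketch: yearly RANGE partitions, each subdivided into 4 HASH subpartitions.
-- Hashing on customernr is an assumption for illustration.
CREATE TABLE sales_sub (
  sales_date TIMESTAMP,
  customernr INT
)
PARTITION BY RANGE( YEAR(sales_date) )
SUBPARTITION BY HASH( customernr )
SUBPARTITIONS 4 (
  PARTITION P2000 VALUES LESS THAN (2001),
  PARTITION P2001 VALUES LESS THAN (2002)
);
```

Even so, pruning still enters through the top-level RANGE method, which is the limitation the slide is pointing at.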

23 Database partitioning
Challenge:
- Dealing with even greater data sets
- The need to present results even more quickly
The old solution: large boxes
- Huge amounts of RAM
- Very fast disk systems (fibre-attached, SAN, …)
- Multi-CPU monsters
- $$$ Very expensive and complicated systems $$$
The new solution: 2 or more MySQL servers
- Basic, off-the-shelf hardware
- Standard software
- Simple configurations
- Pentaho Data Integration & MySQL 5.1

24 Database partitioning
(diagram: the Sales table with Year 2003 to 2006 partitions distributed over four databases, DB1 to DB4, each holding a Sales table with 2003, 2004, 2005 and 2006 partitions)

25 Database partitioning
- Split the data over several databases
- Pick a splitting mechanism in advance
- Divide the load over the databases
- Make the databases smaller
- Increase the query performance
- Increase the load performance
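The splitting mechanism is just a rule that maps each row to one database. As a sketch, a modulo on a key column is a common choice; the 3-database count and detector_id key here are illustrative assumptions, not the Traffic Centre's actual rule:

```sql
-- Sketch: decide which of 3 databases each row belongs to.
-- PDI's partitioning schema applies a rule of this shape per row;
-- the MOD-based expression and column name are assumptions.
SELECT detector_id,
       MOD(detector_id, 3) AS target_db   -- 0 -> DB1, 1 -> DB2, 2 -> DB3
FROM traffic_data;
```

Because the rule is fixed in advance, both loads and queries can be routed to the right database without consulting the others.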

26 Database partitioning
How do you set it up?
- First create a partitioned / clustered database connection
- Then create a Partitioning Schema

27 Database partitioning
Demo time:
- Creating partitions
- Loading data partitioned
- Reading back data partitioned
- Reading back data partitioned and ordered

28 Database partitioning
Use case: Flemish Traffic Centre
- Medium-sized data set: 514M rows
- Same partitioning as before; same table layout, engine and configuration
- Partitioned over 3 databases by detector (per lane)
- The same test query aggregating speed & counts per road position per minute returned 642,319 rows: 3 seconds for MySQL to get the result, 10 seconds to pass the data to the client over JDBC
- Demonstrating almost linear scalability!

29 Database partitioning
Should we do database partitioning? Yes! However…
- The scalability is limited to the raw CPU power available on a single system
- This poses a scalability problem when dealing with larger numbers of partitions, or when reading back large volumes of data
- Enter Pentaho Data Integration: Clustering

30 Pentaho Data Integration: Clustering
(diagram: server X reading the 2003 to 2006 partitions of a Sales database DBx)

31 Pentaho Data Integration: Clustering
- The challenge is to eliminate CPU bottlenecks
- Pentaho Data Integration "Carte" comes to the rescue: it can be started on any system, turning it into a "slave server"
- The combination of database partitioning and PDI clustering:
  - splits the database partitions among the slave servers
  - splits the total CPU load over the slave servers
  - splits the total I/O load over the database partitions (MySQL instances)

32 Pentaho Data Integration: Clustering
Demo time:
- Start up 2 slave servers
- Run a step across the 2 servers
- Monitor

33 Pentaho Metadata to the rescue
The problem: reporting becomes harder on a database-partitioned system; instead of 1 database you read from a bunch of them.
This is where our new project, Pentaho Metadata, comes in:
- Due in a few months
- Auto-generates SQL now
- Will auto-generate PDI transformations in the near future
- Will auto-deliver the data to your report in the platform
- Will make it very easy for your users to create drag-and-drop reports

34 Pentaho Introduction
- World's most popular enterprise open source BI suite: 2 million lifetime downloads, averaging 100K / month
- Founded in 2004: pioneer in professional open source BI
- Key projects: JFreeReport (reporting), Kettle (data integration), Mondrian (OLAP), Pentaho BI Platform, Weka (data mining)
- Management and board: proven BI veterans from Business Objects, Cognos, Hyperion, SAS, Oracle; open source leaders Larry Augustin, New Enterprise Associates, Index Ventures
- MySQL Gold Partner

35 Pentaho Introduction
(diagram: departmental reports, aggregates, analysis and scorecards spanning operational to strategic use, across sales, marketing, inventory, financial and production data)

36 Pentaho Introduction Demo and overview

37 Questions and Closing ?

38 Birds of a Feather: MySQL Data Warehousing and BI
Tonight at 7:30, Camino Real
With:
- James Dixon, Chief Geek, Senior Architect & CTO, Pentaho
- Julian Hyde, OLAP Architect and Mondrian Project Founder, Pentaho
- Brian Miezejewski, Principal Consultant, MySQL
- Roland Bouman, Certification Developer, MySQL
- MySELF™
- A few bottles of beer

