Business Intelligence
BI Fundamentals: Business Transactions, Databases, Data Warehouses, Data Marts, Data Mining
Databases (DB or DBMS): “A collection of information organized in such a way that a computer program can quickly select desired pieces of data.” An electronic filing system, organized by: Fields (a single piece of information); Records (one complete set of fields); Files (a collection of records).
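The field/record/file hierarchy can be sketched in a few lines of Python. This is only an illustration: the `customers` data and the `select` helper are hypothetical names, not part of any particular DBMS.

```python
# A hypothetical "file": a collection of records.
# Each record is one complete set of fields (id, name, city).
customers = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York"},
    {"id": 3, "name": "Alan", "city": "London"},
]


def select(records, field, value):
    """Return every record whose given field equals value."""
    return [r for r in records if r.get(field) == value]


# "Quickly select desired pieces of data": all customers in London.
london = select(customers, "city", "London")
print([r["name"] for r in london])
```

A real DBMS adds indexing, persistence, and a query language on top of this idea.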
Data warehouse (DW): “Contain a wide variety of data that present a coherent picture of business conditions at a single point in time.” “A database system which contains periodically collected samples or summarized (aggregated) transactional data; e.g., daily totals, or monthly averages.” Typically a compilation of information from multiple transactional databases.
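The idea of summarizing transactional data into daily totals can be sketched with Python's built-in `sqlite3` module. The `sales` and `daily_totals` tables and their columns are assumptions made up for this example; a real warehouse would draw on several source databases.

```python
import sqlite3

# In-memory database standing in for a transactional system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "north", 10.0),
        ("2024-01-01", "south", 5.0),
        ("2024-01-02", "north", 7.5),
        ("2024-01-02", "north", 2.5),
    ],
)

# Warehouse-style summary table: one aggregated row per day.
conn.execute(
    "CREATE TABLE daily_totals AS "
    "SELECT sold_on, SUM(amount) AS total, COUNT(*) AS n_sales "
    "FROM sales GROUP BY sold_on"
)

daily_totals = conn.execute(
    "SELECT sold_on, total, n_sales FROM daily_totals ORDER BY sold_on"
).fetchall()
print(daily_totals)
```

The summary table holds far fewer rows than the transaction table, which is what makes warehouse queries over long periods practical.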
Data mart: “A database, or collection of databases, designed to help managers make strategic decisions about their business.” A smaller and more focused form of a data warehouse. Usually created for a particular department or position. A data mart created as a subset of data warehouse data is referred to as a “dependent data mart”.
Data mining: “A class of database applications that look for patterns in data to be used to predict and direct future behavior.” Increasingly used by marketers to find patterns in consumer data gathered from the web and from store purchases.
What is BI? The new technology for understanding the past and predicting the future. A broad category of technologies that allows for gathering, storing, accessing, and analyzing data to help business users make better decisions. Analyzing business performance through data-driven insight. A broad category of applications, which includes: decision support systems; query and reporting; OLAP; statistical analysis, forecasting, and data mining.
BI vs. AI: AI systems make decisions for the users. BI systems help users make the right decisions, based on the available data. However, many BI techniques have roots in AI.
BI Processes MDA (Model Driven Architecture)
Business Intelligence: PSA (Persistent Staging Area), ODS (Operational Data Store)
Data–Information–Knowledge–Decision Making Cycle
What does BI seek to find? Patterns. What kind of patterns? Sales, stocks, anything useful.
Techniques for Finding Patterns: Statistics; Trends; Correlation (searching for a best fit).
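Correlation and "best fit" can be made concrete with a small sketch. The code below computes the Pearson correlation coefficient and an ordinary least-squares line in plain Python; the weekly sales figures are invented for illustration, and least squares is only one common reading of "best fit".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx


# Hypothetical data: sales observed over five consecutive weeks.
weeks = [1, 2, 3, 4, 5]
sales = [12, 15, 17, 20, 24]

r = pearson(weeks, sales)
slope, intercept = best_fit_line(weeks, sales)
print(f"r = {r:.3f}, trend = {slope:.2f} * week + {intercept:.2f}")
```

A coefficient close to 1 indicates a strong upward trend; the fitted line can then be extrapolated, with caution, to forecast the next week.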
Patterns continued: Combinatorial. Example: if-then relationships. If we put chips on sale on a Friday, then we also sell more soda.
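One common way to quantify an if-then relationship such as "chips, then soda" is with the support and confidence measures used in association-rule mining. The sketch below is a simplification of the slide's example (it looks at co-purchases rather than the effect of a sale), and the basket data is hypothetical.

```python
# Hypothetical Friday shopping baskets, one set of items per customer.
baskets = [
    {"chips", "soda"},
    {"chips", "soda", "salsa"},
    {"chips"},
    {"soda"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule: IF chips THEN soda.
rule_support = support({"chips", "soda"}, baskets)
rule_confidence = confidence({"chips"}, {"soda"}, baskets)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here two of the five baskets contain both items, and two of the three chip baskets also contain soda. Algorithms such as apriori search for all rules whose support and confidence exceed chosen thresholds.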
Leading the Industry: Cognos Software. BI software company. Used for reporting, analysis, scorecarding, dashboards, business event management, and data integration.
Cognos: Multiple Solutions. Industry: Banking, Education, Defense, Government. Department: Executive Management, Finance, Marketing.
Open Source Tools for BI: ETL (Extract, Transform, Load) tools; OLAP (Online Analytical Processing) servers; OLAP clients; DBMSs (Database Management Systems).
ETL Tools: Bee, CloverETL, Octopus. Bee: ROLAP (Relational OLAP) oriented ETL tool. CloverETL: ROLAP oriented ETL tool; implemented in Java and uses JDBC to transfer data; cloveretl.berlios.de. Octopus: implemented in Java and uses JDBC; octopus.objectweb.org.
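The three ETL stages can be sketched with the Python standard library alone. This is not how Bee, CloverETL, or Octopus are implemented (those are Java tools using JDBC); the CSV content, the `orders` table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read files or a source database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not-a-number
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert amounts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes it easy to swap the source or the target without touching the cleaning rules.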
OLAP Servers: Bee, Lemur, Mondrian. Bee: ROLAP oriented server; uses MySQL to manage the DB; sourceforge.net/products/bee/. Lemur: HOLAP oriented server; www.nongnu.org/lemur. Mondrian: implemented in Java; can be used with any DBMS; sourceforge.net/projects/mondrian/.
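What an OLAP server computes can be illustrated with a toy roll-up: summing a measure over the dimensions the analyst is not interested in. The sketch below is plain Python and does not reflect the internals of Bee, Lemur, or Mondrian; the fact rows and dimension names are invented.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
facts = [
    ("east", "chips", "Q1", 100),
    ("east", "soda", "Q1", 80),
    ("west", "chips", "Q1", 60),
    ("east", "chips", "Q2", 120),
    ("west", "soda", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Sum the sales measure over every dimension not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last column is the sales measure
    return dict(totals)


by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```

A ROLAP server answers such queries by generating SQL against relational tables, while a MOLAP server keeps precomputed aggregates in a multidimensional store; HOLAP combines both.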
OLAP Clients: Bee, JPivot. Bee: web-based, used with the Bee OLAP server; generates pie, bar, and other charts (in 2D & 3D); exports data to Excel, PDF, PNG, PowerPoint, XML. JPivot: web-based, used with the Mondrian OLAP server; generates 2D & 3D graphics; exports data to PDF; jpivot.sourceforge.net.
DBMSs: MonetDB, MySQL, MaxDB. MonetDB: runs on Linux, Windows, Mac OS, etc.; monetdb.cwi.nl. MySQL: runs on Linux, Windows; www.mysql.com/products/mysql. MaxDB: formerly SAP DB (by SAP AG); www.mysql.com/products/maxdb.
PostgreSQL: www.postgresql.org; runs on Linux, Unix, Windows (version 8.0 and later).
PALO OLAP: Palo OLAP Server, Palo ETL Server, Palo OLAP Client. http://www.jedox.com/ Palo OLAP Server: open source MOLAP server; can be installed locally or in a company network. Palo ETL Server: enables the efficient extraction of mass data from heterogeneous data sources, i.e., all common relational database systems and flat files. Palo OLAP Client: http://www.jpalo.com/en/; two versions: Palo Client and Palo Web Client.
Data Mining Software. Open source: Borgelt Data Mining Suite, Gnome Data Mine, Weka, RapidMiner. Commercial: See5 (RuleQuest), Clementine (SPSS), Enterprise Miner (SAS), GhostMiner (Fujitsu), Statistica Data Miner (StatSoft), Oracle Data Miner (Oracle).
Borgelt Data Mining Suite. Tasks: Association: apriori, eclat; Classification: Bayesian networks, decision trees, naive Bayes; Regression: neural networks; Clustering: self-organizing maps (SOM). Platforms: Linux, Unix, MS Windows. Website: http://fuzzy.cs.uni-magdeburg.de/~borgelt/software.html
Gnome Data Mine. Tasks: Association: apriori; Classification: decision trees. Platforms: Linux, Unix, MS Windows. Website: http://www.togaware.com/datamining/gdatamine Owner: Togaware, Canberra, Australia.
WEKA. Tasks: Association: apriori; Classification: decision trees, support vector machines, conjunctive rules; Clustering: k-means. Platforms: Linux, Unix, MS Windows. Website: http://www.cs.waikato.ac.nz/ml/ Owner: University of Waikato, Hamilton, New Zealand.
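To give a feel for one of the clustering tasks listed above, here is a minimal one-dimensional k-means sketch in plain Python. It is not WEKA's implementation (WEKA is a Java toolkit); the `kmeans_1d` function, the data points, and the starting centroids are assumptions chosen for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given starting centroids (Lloyd's algorithm)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customer spend values forming two obvious groups.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = kmeans_1d(spend, [1.0, 12.0])
print(centers, groups)
```

Real toolkits generalize this to many dimensions and choose the starting centroids automatically, since the result can depend on the starting points.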
RapidMiner: http://rapid-i.com/ Billed as the world-leading open-source system for knowledge discovery and data mining. Multiplatform: implemented in Java. Supports about 400 data mining operators.
Who uses BI? Businesses; the government. What are some ethical implications of the use of BI?