Published by Marilynn Martin. Modified over 9 years ago.
1
Computer Systems and Big Data Analysis: Big Data Analysis
McGraw-Hill/Irwin ©2008 The McGraw-Hill Companies, All Rights Reserved
2
Motivating Examples
“Data is very important. The world in the future will be dominated by data.” (Ma Yun, 马云)
Guess in which provinces of China bikinis sell best.
–Guangdong, Hainan?
–No.
–According to Taobao, they are Xinjiang and Inner Mongolia.
–Explanation: each man has told his wife, lover, or girlfriend that he will take her swimming in the sea.
Orbitz is a ticket-booking website. After data analysis, they found that the ticket prices customers pay are related to their web browser: Safari users pay the most, while Chrome and Firefox users pay similar prices.
–They adjusted their strategy accordingly: Safari users are shown expensive tickets first.
3
What Is Big Data?
There is no consensus on how to define big data.
“Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” - Teradata Magazine article, 2011
“Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.” - The McKinsey Global Institute, 2011
4
Where Is This “Big Data” Coming From?
12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day
2+ billion people on the Web by the end of 2011
30 billion RFID tags today (1.3 billion in 2005)
4.6 billion camera phones worldwide
Hundreds of millions of GPS-enabled devices sold annually
76 million smart meters in 2009, 200 million by 2014
5
With Big Data, We’ve Moved into a New Era of Analytics
Volume: 12+ terabytes of tweets created daily
Velocity: 5+ million trade events per second
Variety: 100s of different types of data
Veracity: only 1 in 3 decision makers trust their information
6
3 Vs of Big Data
The “BIG” in big data isn’t just about volume.
7
Four Characteristics of Big Data
Cost-efficiently processing the growing Volume
–50x growth, 2010 to 2020 (35 ZB in 2020)
Responding to the increasing Velocity
–30 billion RFID sensors and counting
Collectively analyzing the broadening Variety
–80% of the world’s data is unstructured
Establishing the Veracity of big data sources
–1 in 3 business leaders don’t trust the information they use to make decisions
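The volume figures on this slide ("50x", "35 ZB", "2010", "2020") can be turned into a rough growth rate. The sketch below assumes they mean a 50-fold increase between 2010 and 2020 that ends at 35 ZB; that reading is an interpretation of the flattened slide, and the variable names are illustrative only.

```python
# Assumption: data volume grows 50-fold from 2010 to 2020 and reaches
# 35 zettabytes (ZB) in 2020. Both numbers come from the slide; the
# interpretation of how they relate is ours.
GROWTH_FACTOR = 50
VOLUME_2020_ZB = 35.0
YEARS = 2020 - 2010

# Implied starting volume in 2010.
volume_2010_zb = VOLUME_2020_ZB / GROWTH_FACTOR

# Implied compound annual growth rate: factor ** (1 / years) - 1.
annual_growth = GROWTH_FACTOR ** (1 / YEARS) - 1

print(f"Implied 2010 volume: {volume_2010_zb:.1f} ZB")
print(f"Implied annual growth: {annual_growth:.1%}")
```

Under this reading the 2010 volume is about 0.7 ZB, and the volume grows by a little under 50% every year.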
8
Big Data Analysis Example: Product Arrangement
How does location tracking work?
–Recognize the dead zone
9
Usage Example in Big Data
In March 2012, the White House announced a national "Big Data Initiative" in which six federal departments and agencies committed more than $200 million to big data research projects.
–PRISM is a clandestine mass electronic surveillance data mining program operated by the United States National Security Agency (NSA) since 2007.
It is reported that China is going to create a national policy on big data management.
10
Why You Need to Tame Big Data
Analyzing big data is already standard (e.g. in e-commerce).
Those who ignore it will be left behind in a few years.
–So far, they have only missed the chance to be on the bleeding edge.
Capturing data and using analysis to make decisions
–This is just an extension of what you are already doing today.
11
Filtering Big Data Effectively
Sipping from the hose
–Focus on the important pieces of the data
–It makes big data easier to handle
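"Sipping from the hose" can be illustrated with a streaming filter that inspects each record once and keeps only the pieces judged important. This is a minimal sketch, not a method taken from the slides: the record layout, the `amount` field, and the threshold are hypothetical.

```python
from typing import Iterable, Iterator


def keep_important(records: Iterable[dict], min_amount: float) -> Iterator[dict]:
    """Yield only records whose 'amount' reaches the threshold.

    Records are examined one at a time, so the full stream never has to
    fit in memory; everything below the threshold is simply dropped.
    """
    for record in records:
        if record.get("amount", 0.0) >= min_amount:
            yield record


# Hypothetical sample stream; a real feed would be far larger.
stream = [
    {"id": 1, "amount": 5.0},
    {"id": 2, "amount": 250.0},
    {"id": 3, "amount": 0.5},
    {"id": 4, "amount": 1200.0},
]

kept = list(keep_important(stream, min_amount=100.0))
kept_ids = [record["id"] for record in kept]
print(kept_ids)
```

Because the filter is a generator, it can sit directly in front of storage or analysis code, so only the retained records are ever written or processed further.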
12
Analytics With Data-in-Motion & Data at Rest
[Diagram: a stream of binary data. Labels: Data Ingest, Bootstrap, Enrich, Adaptive Analytics Model, Opportunity Cost Starts Here]
13
Big Data Exploration: Value & Diagram
Find, visualize and understand all big data to improve business knowledge
–Greater efficiencies in business processes
–New insights from combining and analyzing data types in new ways
–Develop new business models with resulting increased market presence and revenue
[Diagram labels: File Systems, Relational Data, Content Management, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources; Data Explorer; Application/Users]
14
Operations Analysis: Value & Diagram
[Diagram labels: Raw Logs and Machine Data; Machine Data Accelerator; Indexing, Search; Statistical Modeling; Root Cause Analysis; Federated Navigation & Discovery; Real-time Analysis; Only store what is needed]
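The "Indexing, Search" step over raw logs can be pictured with a toy inverted index. The sketch below is a generic illustration only; it does not describe how the Machine Data Accelerator works, and the log lines and function names are made up for the example.

```python
import re
from collections import defaultdict


def build_index(lines):
    """Map each lowercase token to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            index[token].add(line_no)
    return index


def search(index, *terms):
    """Return the sorted line numbers that contain every search term."""
    if not terms:
        return []
    postings = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical log lines.
logs = [
    "2012-03-01 10:00:01 ERROR disk full on node7",
    "2012-03-01 10:00:05 INFO job 42 started",
    "2012-03-01 10:01:17 ERROR timeout on node3",
]

index = build_index(logs)
error_lines = search(index, "error")
node7_errors = search(index, "ERROR", "node7")
print(error_lines, node7_errors)
```

Once the index exists, a query touches only the posting sets for its terms instead of rescanning every log line, which is the basic reason indexing precedes search in a pipeline like the one on the slide.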
15
The Need for Standards
Become more structured over time
Fine-tune to be friendlier for analysis
Standardize enough to make life much easier
16
Today’s Big Data Is Not Tomorrow’s Big Data
Data from the banking industry was very hard to handle even a decade ago.
“BIG” will change: big data will continue to evolve.
17
IBM Case: How Computers Make the Big Data Dream Come True
Built-in expertise systems for Big Data analysis
–Dedicated device
–Optimized for purpose
–Complete solution
–Fast installation
–Very easy operation
–Standard interfaces
–Low cost
18
BigInsights and the Data Warehouse
[Diagram labels: BigInsights (query-ready archive for “cold” warehouse data; Big Data analytic applications); Data Warehouse (traditional analytic tools); from Cognos BI via Hive JDBC]
19
BigInsights Connectivity to DBMS / Warehouse
BigInsights drives RDBMS work
DBMS drives BigInsights work
[Diagram labels: BigInsights; JDBC; DBMS; Netezza; DB2 LUW, IW with DPF]
20
IBM System for Big Data: Big Data Architecture
[Diagram labels:
InfoSphere BigInsights: Application (Map-Reduce, Lucene, SystemT); Storage (HBase, HDFS, GPFS); Query Methods (Jaql, Pig, Hive); BigSheets; Text Analytics; REST API
Netezza; RDBMS; Cognos Insight; Cognos BI Server; Cognos Business Intelligence
Connections: SQL via native, ODBC, JDBC; Load through UDF; CSV; Hive via JDBC; REST via HTTP
Stages: Explore & Analyze; Report & Act]
21
Analyze Streaming Data
22
The Platform Advantage
Benefits, in detail:
–Increase over time: by moving from entry to a 2nd and 3rd project
–Lowering deployment costs: shared components; integration
–Points of leverage: shared text analytics for Streams and BigInsights; HDFS connectors (data integration (ETL, …), Streams); Accelerators; build across multiple engines
[Diagram labels:
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance]
23
How Much Big Data Analysis Was Enhanced by the IBM Project at T-Mobile Czech Rep.
RESPONSE TIME MASSIVELY IMPROVED
Response times, original platform versus Netezza:
–Workflow Reporting: 2 hours (original platform), 1 minute (Netezza)
Invoicing and Payments reporting
–Payment discipline of current month invoices: 33 minutes (original platform), 17 seconds (Netezza)
–Overdue Debt of Invoices – in Current Month: 10 hours (original platform), 23 seconds (Netezza)
–Average Monthly Invoice Figures: 50 minutes (original platform), 38 seconds (Netezza)
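The size of the improvement is easier to judge as a speedup factor (original time divided by Netezza time). The following sketch converts the four response-time pairs from this slide to seconds and computes that ratio; the timings are the slide's, while the variable and function names are illustrative.

```python
# (report name, original platform time in seconds, Netezza time in seconds),
# converted from the response times quoted on the slide.
TIMINGS = [
    ("Workflow Reporting", 2 * 3600, 60),
    ("Payment discipline of current month invoices", 33 * 60, 17),
    ("Overdue Debt of Invoices - in Current Month", 10 * 3600, 23),
    ("Average Monthly Invoice Figures", 50 * 60, 38),
]


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the new response time is."""
    return before_s / after_s


speedups = {name: speedup(before, after) for name, before, after in TIMINGS}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster")
```

The reported times correspond to speedups of roughly 120x, 116x, 1565x and 79x respectively.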
24
Resources
–Ömer Sever (omers@tr.ibm.com), IBM SWG TR
–Martin Pavlík (Martin_pavlik@cz.ibm.com), cz.ibm.com
–iDB: Internet Database Lab