Large Scale Data Analytics


Large Scale Data Analytics Jiawan Zhang School of Computer Software, Tianjin University jwzhang@tju.edu.cn

Outline
  Big Data
  Gartner Hype Cycle 2012
  Large scale data processing
  Visual Analytics
  Chances and Challenges
  Discussions

Big Data: the 3 Vs
  Volume: Gigabyte (10^9), Terabyte (10^12), Petabyte (10^15), Exabyte (10^18), Zettabyte (10^21)
  Variety: structured, semi-structured, unstructured; text, image, audio, video, record
  Velocity: dynamic, sometimes time-varying
Big Data refers to datasets that grow so large that they become difficult to capture, store, manage, share, analyze and visualize with typical database software tools.
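
The volume units above are plain powers of ten, so converting a raw byte count into a readable unit is simple arithmetic. The following is a minimal sketch; the function name `describe_volume` and the sample value are illustrative assumptions, not part of the original slides.

```python
# Decimal (SI) storage units named on the slide, as powers of ten.
UNITS = [
    ("Zettabyte", 10**21),
    ("Exabyte", 10**18),
    ("Petabyte", 10**15),
    ("Terabyte", 10**12),
    ("Gigabyte", 10**9),
]


def describe_volume(num_bytes: int) -> str:
    """Express a byte count in the largest listed unit that gives a value >= 1.

    `describe_volume` is a hypothetical helper written for this example.
    """
    for name, size in UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:g} {name}s"
    return f"{num_bytes} bytes"


if __name__ == "__main__":
    # Illustrative input: seven terabytes expressed as a byte count.
    print(describe_volume(7 * 10**12))
```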

Numbers
How much data is there in the world?
  800 Terabytes, 2000
  160 Exabytes, 2006
  500 Exabytes (Internet), 2009
  2.7 Zettabytes, 2012
  35 Zettabytes by 2020
How much data is generated in ONE day?
  7 TB, Twitter
  10 TB, Facebook
Source: "Big data: The next frontier for innovation, competition, and productivity", McKinsey Global Institute, 2011

Why Is Big Data Important?

Gartner Hype Cycle 2012

Large Scale Visual Analytics
Definition: Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces.
People use visual analytics tools and techniques to:
  Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data
  Detect the expected and discover the unexpected
  Provide timely, defensible, and understandable assessments
  Communicate assessments effectively for action

InfoVis Reference Model to Visual Analytics

Applications
  Terrorism and Responses
  Multimedia Visual Analytics
  Situation Surveillance and Awareness in Investigative Analysis
  Disease Visual Analytics for Disease Outbreak Prediction
  Financial Visual Analytics
  Cybersecurity Visual Analytics
  Visual Analytics for Investigative Analysis on Text Documents

Techniques and Technologies
A wide variety of techniques and technologies has been developed and adapted for:
  Data aggregation
  Data manipulation
  Data analysis
  Data visualization
These techniques and technologies draw from several fields, including:
  Statistics
  Computer science
  Applied mathematics
  Economics

Techniques and Applications
  Statistics: A/B testing (split testing/bucket testing), Spatial analysis
  Predictive modeling: Regression
  Machine Learning
    Unsupervised learning: cluster analysis
    Supervised learning: classification, support vector machines (SVM), ensemble learning
    Association rule learning
  Data Mining and Pattern Recognition: neural networks, classification, clustering
  Natural language processing (NLP): Sentiment analysis
  Dimension Reduction: PCA, MDS, SVD
  Data fusion and data integration: Visual Word
  Time series analysis: Combination of statistics and signal processing
  Simulation: Monte Carlo simulations, MRF
  Optimization: Genetic algorithms
  Visualization: Scientific Viz, InfoVis, Visual Analytics
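
As one concrete instance of the simulation entry above, a Monte Carlo simulation estimates a quantity by repeated random sampling. The sketch below estimates pi from the fraction of random points that fall inside a quarter circle. The function name `estimate_pi`, the sample count, and the seed are assumptions chosen for illustration; the slides do not give a specific example.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    `estimate_pi` is a hypothetical helper written for this example.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point lies inside the quarter circle
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

The error of such an estimate shrinks roughly with the square root of the number of samples, which is why Monte Carlo methods become expensive when high precision is needed.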

Technologies
  Database and Data warehouse
  Google File System and MapReduce: Bigtable
  Hadoop: HBase and MapReduce, open source Apache project
  Cassandra: An open source (free) DBMS, originally developed at Facebook and now an Apache Software Foundation project
  Data warehouse: ETL (extract, transform, and load) tools and business intelligence tools
  Business intelligence (BI): data warehouse, reporting, real-time management dashboards
  Cloud computing: Services, SOA, etc.
  Metadata: XML
  Stream processing
  R, SAS and SPSS
  Visualization: Tag cloud, Clustergram, History flow, ThemeRiver, Treemap
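
The MapReduce model named above can be illustrated with the usual word-count example. This is a single-process sketch of the map, shuffle, and reduce steps only; it does not use Hadoop or any distributed runtime, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over a collection of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; the sketch keeps only the data flow.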

Origin of Information Visualization

InfoVis Techniques
  Scatterplot and Scatterplot Matrix
  Hierarchies Visualization: Node-Link Diagrams, Sunburst, Treemap, Circle-packing layouts
  Network Visualization: Force-Directed Layout, Arc Diagrams, Matrix Views
  Multidimensional Visualization/Parallel Coordinates
  Stacked Graphs
  Flow Maps

Scatterplot and Scatterplot Matrix

Tree Visualization (1): Node-Link Diagrams, Sunburst, Dendrogram

Tree Visualization (2): Treemap, Circle-packing layouts
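
A treemap draws each node as a rectangle whose area is proportional to its weight. The sketch below shows one level of the simple slice-and-dice layout, assuming a flat list of weights; the slides do not say which treemap algorithm was shown, and the names `Rect` and `slice_layout` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def slice_layout(weights: List[float], bounds: Rect, vertical: bool = True) -> List[Rect]:
    """Split `bounds` into strips whose areas are proportional to `weights`."""
    x, y, w, h = bounds
    total = sum(weights)
    rects: List[Rect] = []
    offset = 0.0  # fraction of the bounds already used
    for weight in weights:
        share = weight / total
        if vertical:
            # Side-by-side strips along the x axis.
            rects.append((x + offset * w, y, share * w, h))
        else:
            # Stacked strips along the y axis.
            rects.append((x, y + offset * h, w, share * h))
        offset += share
    return rects


if __name__ == "__main__":
    for rect in slice_layout([1, 1, 2], (0, 0, 100, 50)):
        print(rect)
```

For a nested hierarchy, the same function would be applied recursively to each child rectangle, alternating the `vertical` flag at every level.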

Network Visualization: Force-Directed Layout, Matrix Views, Arc Diagrams

Parallel Coordinates

Stacked Graphs

Flow Maps

Examples

Fraud Detection of Bank Wire Transactions

Displays and Views

A classical VA tool

GapMinder [Demo]

Smart Money Map [Demo]

A recent project

Chances and Challenges
The basic techniques for large scale simulation and computing are ready.
  However, large and time-consuming computing tasks need steering, or visualization of the intermediate computing results.
  Most simulation and computing tasks require tuning hundreds of parameters.
Smart/intelligent data mining and data processing algorithms are ready.
  However, most data mining algorithms have high computational complexity: O(N^2) rather than O(N log N) or O(N).
How can automatic computing (machine) be combined with high-level intelligence (human) to gain insight, and how can humans be involved in the computing?
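
The complexity gap mentioned above can be seen on a small task: finding the smallest difference between any two values. The sketch below contrasts a pairwise O(N^2) version with a sort-based O(N log N) version. The task and both function names are assumptions chosen for illustration, not an algorithm from the slides, and both functions expect at least two values.

```python
from itertools import combinations
from typing import Sequence


def min_gap_quadratic(values: Sequence[float]) -> float:
    """Compare every pair: about N*(N-1)/2 comparisons, i.e. O(N^2)."""
    return min(abs(a - b) for a, b in combinations(values, 2))


def min_gap_sorted(values: Sequence[float]) -> float:
    """Sort first, then compare neighbours only: O(N log N) overall."""
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    data = [9.0, 1.0, 4.5, 7.0, 4.0]
    print(min_gap_quadratic(data), min_gap_sorted(data))
```

For N = 1,000,000 the pairwise version needs on the order of 10^12 comparisons, while N log2 N is only about 2 x 10^7, which is why quadratic algorithms quickly become impractical on large datasets.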

Recent Research Topics
Unified Visual Analytics by Heterogeneous Data Sources (esp. Text)
  Structured and semi-structured data fusion framework
  Data indexing and similarity rank
  Visual analytics for high-dimensional heterogeneous data
Domain Risk Management and Preventive Control by Sensor Data Collection and Data Mining
  Sensor techniques
  Data Warehouse
  Coordinated Views integrate visual analytic techniques
Parallel/Distributed Computing Steering by Parameter Optimization and Visualization
  Parameter tuning and computing optimization
  Intermediate results visualization and task steering
  Markov Chain Monte Carlo (MCMC) Simulation

Questions and Thanks!