Distributed Tera-Mining R. L. Grossman Laboratory for Advanced Computing University of Illinois & Magnify, Inc.

Presentation transcript:

Distributed Tera-Mining R. L. Grossman Laboratory for Advanced Computing University of Illinois & Magnify, Inc.

1. Background Three Fundamental Trends.

Trend 1. Explosion of Data …

… All in the Wrong Format, with no one to analyze it.

The Data Gap Total new disk (TB) since 1995 New Ph.D.s Most data comes a GB and a TB at a time.

Data Mining is Inevitable log #(amount of networked data) log #(statisticians) The goal of data mining is to close this gap.

Trend 2. SONET is dead. Lambda Rules. Gigabytes can be moved in seconds.

Gigabytes can be Moved in Minutes: 1 TB in 6 hours; 10 GB in 4 minutes; 1 TB in 1.5 hours; 10 GB in 1 minute
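The transfer times on this slide imply average throughputs that are easy to check. The sketch below is only arithmetic on the slide's four figures; it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and ignores protocol overhead, and the function name `implied_rate_mbps` is made up for illustration.

```python
def implied_rate_mbps(num_bytes: float, seconds: float) -> float:
    """Average throughput in megabits per second for one transfer."""
    return num_bytes * 8 / seconds / 1e6


GB = 1e9   # assumption: decimal gigabyte
TB = 1e12  # assumption: decimal terabyte

# (bytes moved, elapsed seconds) for each figure quoted on the slide
transfers = {
    "10 GB in 4 minutes": (10 * GB, 4 * 60),
    "1 TB in 6 hours": (1 * TB, 6 * 3600),
    "10 GB in 1 minute": (10 * GB, 1 * 60),
    "1 TB in 1.5 hours": (1 * TB, 1.5 * 3600),
}

rates = {label: implied_rate_mbps(b, s) for label, (b, s) in transfers.items()}

for label, rate in rates.items():
    print(f"{label}: about {rate:,.0f} Mbit/s")
```

Under these assumptions the first pair of figures works out to roughly 0.3 to 0.4 Gbit/s and the second pair to roughly 1.3 to 1.5 Gbit/s.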

Trend 3: Most Data is Distributed. Bush's Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.

Example 1: ENSO & Cholera. El Niño Data at NCAR; Cholera Data at WHO

Example 2: Voting Table 1 Table 2

Correlation: Reform Voters vs Votes for Buchanan Palm Beach
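The voting example can be sketched as a join of two per-county tables followed by a correlation and a residual check. The code below is a minimal illustration, not the analysis behind the slide: the county names and all counts are made-up placeholders, and the least-squares fit is just one simple way to surface a county that departs from the trend.

```python
from math import sqrt

# Hypothetical placeholder data, NOT real election figures.
reform_registered = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 500, "F": 350}
buchanan_votes = {"A": 210, "B": 390, "C": 620, "D": 790, "E": 1010, "F": 3400}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Join the two tables on the shared county key.
counties = sorted(reform_registered.keys() & buchanan_votes.keys())
xs = [reform_registered[c] for c in counties]
ys = [buchanan_votes[c] for c in counties]

# Fit one line through all counties and find the largest residual.
slope, intercept = fit_line(xs, ys)
residuals = {c: y - (slope * x + intercept) for c, x, y in zip(counties, xs, ys)}
outlier = max(residuals, key=lambda c: abs(residuals[c]))

# Compare the correlation with and without that county.
r_all = pearson_r(xs, ys)
kept = [c for c in counties if c != outlier]
r_without = pearson_r(
    [reform_registered[c] for c in kept], [buchanan_votes[c] for c in kept]
)

print(f"largest residual: county {outlier}")
print(f"r with all counties: {r_all:.3f}; without {outlier}: {r_without:.3f}")
```

With this placeholder data, the planted county `F` has by far the largest residual, and removing it raises the correlation sharply, which is the kind of pattern the slide points at for Palm Beach.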

2. Internet Infrastructures for Data Data Webs, Semantic Webs, Data Grids, Distributed Data Mining, Digital Libraries and all that

Data Mining Data mining is the semi-automatic extraction of patterns, models, changes, associations, and anomalies from large data sets. data mining algorithm <tree-node node-id=8 threshold = etc. > learning set statistical model

Data Mining Process - End to End Viewpoint NCAR WHO Phase 1. Exploratory Analysis Phase 2. Data Analysis & Mining Phase 3. Deployment & Decision 50%0%50% DataSpace

DataSpace – One Approach to Making Data Useful 16 terabytes of documents 4 billion documents Today's Multi-media Web Tomorrow's Data Web petabytes of data tens of billions to trillions of records html http search by keyword workstations servers pmml & dtml dstp correlate & mine data & compute clusters Complementary to the grid, which we view as a distributed computer.

View Data as a Collection of Distributed Columns

Data Servers and Data Browsers NCAR data in Boulder WHO data in Geneva DataSpace Data browser in Chicago

attributes [aid] UCK [uckid] k[i], y[j] k[i], x[i] DSTP Server 1 DSTP Server 2 Click to obtain graph
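The idea of two servers each holding a keyed column, `k[i], x[i]` on one and `k[j], y[j]` on the other, can be sketched as a join on the shared key. The code below is an assumption-laden illustration: the two servers are simulated as in-memory dictionaries, the keys and values are made up, and `join_columns` is a hypothetical helper. It does not implement or model the actual DSTP protocol.

```python
from typing import Dict, List, Tuple

# Stand-ins for two remote column servers (hypothetical data).
# Each maps a shared correlation key k to one attribute value.
server_1: Dict[str, float] = {"1997-01": 0.4, "1997-02": 0.9, "1997-03": 1.6}     # k -> x
server_2: Dict[str, float] = {"1997-02": 12.0, "1997-03": 30.0, "1997-04": 41.0}  # k -> y


def join_columns(
    col_x: Dict[str, float], col_y: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Pair up values from two columns wherever their keys match."""
    shared_keys = sorted(col_x.keys() & col_y.keys())
    return [(k, col_x[k], col_y[k]) for k in shared_keys]


joined = join_columns(server_1, server_2)
for key, x, y in joined:
    print(key, x, y)
```

Only keys present on both servers survive the join, so the result is ready to plot or correlate as (x, y) pairs. In a real deployment each dictionary would be replaced by a network request to the corresponding server.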

3. Summary & Conclusion

Terra Mining Testbed Optical testbed for distributed tera mining of scientific data. Goal also to be testbed for broadband based business services.

Lessons Learned
1. It's the data, stupid. Cycles, cylinders & lambdas are all commodities.
2. The fundamental challenge: lower the cost to make data useful.
3. The emergence of internet infrastructure for data is inevitable. Opens up possibilities for new types of scientific discoveries.

For More Information DataSpace DataSpace Standards Selected articles Magnify –

End of Slides

FTP Still Lives

Trend 2. Bandwidth is a Commodity OC-3 OC-12 OC-48

El Nina Anomalies

Indonesia Cholera Cases

Cholera Cases

Distributed Exabytes (New Disks) Source: IDC (1999) "1999 Winchester Disk Drive Market Forecast and Review" Petabytes 1 Exabyte

Trend 3: Most Data is Distributed. W's Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.

Example 2: Voting

Database 1: Total Votes for Buchanan by County

Database 2: Total Registered Reform Voters by County

Correlation: Total Votes vs Buchanan Votes by County Palm Beach