Big Data to Knowledge Panel SKG 2014 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China August 29 2014 Geoffrey Fox

Presentation transcript:

Big Data to Knowledge Panel
SKG 2014, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
August 29, 2014
Geoffrey Fox
School of Informatics and Computing
Digital Science Center
Indiana University Bloomington

Analytics and the DIKW Pipeline
Data goes through a pipeline: Raw data → Data → Information → Knowledge → Wisdom → Decisions
Each link is enabled by a filter, which is “business logic” or “analytics”
We are interested in filters that involve “sophisticated analytics”, which require non-trivial parallel algorithms
– Improve the state of the art in both algorithm quality and (parallel) performance
Design and build SPIDAL (Scalable Parallel Interoperable Data Analytics Library)
[Diagram labels: Data, Analytics, Information, More Analytics, Knowledge]
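
The pipeline idea on this slide can be sketched as a chain of filters, each consuming the output of the previous stage. This is only a minimal illustration and not SPIDAL code: the stage functions (`clean`, `summarize`, `decide`), the sample readings, and the threshold are all hypothetical, and real "sophisticated analytics" filters would be parallel algorithms rather than one-line functions.

```python
from functools import reduce
from statistics import mean

# Hypothetical filters: each one turns the output of the previous stage
# into the input of the next (raw data -> data -> information -> decision).

def clean(raw):
    """Raw data -> data: drop missing readings."""
    return [x for x in raw if x is not None]

def summarize(data):
    """Data -> information: reduce readings to summary statistics."""
    return {"count": len(data), "mean": mean(data)}

def decide(info, threshold=10.0):
    """Information -> decision: apply a simple (assumed) threshold rule."""
    return "alert" if info["mean"] > threshold else "ok"

def run_pipeline(raw, filters):
    """Apply the filters in order, feeding each result to the next filter."""
    return reduce(lambda value, f: f(value), filters, raw)

# Made-up sensor readings; None marks a missing value.
raw_readings = [9.5, None, 12.0, 11.5, None, 13.0]
decision = run_pipeline(raw_readings, [clean, summarize, decide])
print(decision)
```

Keeping each filter as an independent function mirrors the slide's point that every link in the chain is a separate piece of business logic or analytics that could be replaced or improved on its own.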

[Figure: Raw Data → Data → Information → Knowledge → Wisdom → Decisions. Workflow through multiple filter/discovery clouds or services. Components shown: Database, Portal, Storage Cloud, Compute Cloud, Filter Clouds, Discovery Cloud, Distributed Grid, Hadoop Cluster, Another Cloud, Another Service, Another Grid, Fusion for Discovery/Decisions. SS: Sensor or Data Interchange Service.]

What is Big Data?
Big Data to Knowledge. We have
– Data to Information
– Information to Knowledge
– Knowledge to Wisdom
Big Data == Big Information == Big Knowledge
One can classify by properties like size, but I prefer a data-centric classification: it's the data that gives the answer rather than a model or theory
I see no difference between Big Data and Intelligent Big Data: Big Data is characterized by its smart transformation

Status of Big Data?
Obviously one needs good infrastructure
– Hardware
– Software
– Algorithms
The basic hardware is good: clouds or HPC both work
I suggested that algorithms and their parallel implementation needed more work
There are key problems with data
a) Coping with distribution: in some cases (where "global machine learning" is needed) one can't bring computing to the data very easily
b) Getting data, given privacy and proprietary issues. The Web Observatory is a nice step
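
As a rough illustration of point (a), some analytics split into per-site work plus a small merge step, so computing can go to the data, while others need a global view. The sketch below uses hypothetical site names and values, and uses the median only as a simple stand-in for the "global" case: the global mean can be assembled exactly from per-site (sum, count) pairs, but the global median cannot be recovered from per-site medians alone.

```python
from statistics import median

# Hypothetical data held at three separate sites.
sites = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [10.0, 20.0],
    "site_c": [4.0, 5.0, 6.0, 7.0],
}

# Step 1: each site computes a small local summary next to its data.
local_summaries = [(sum(values), len(values)) for values in sites.values()]

# Step 2: only the summaries travel; merging them gives the exact global mean.
total, count = map(sum, zip(*local_summaries))
global_mean = total / count

# Counter-example: combining per-site medians does not give the global median,
# which here requires pooling all of the data in one place.
median_of_medians = median(median(values) for values in sites.values())
pooled_median = median(x for values in sites.values() for x in values)

print(global_mean, median_of_medians, pooled_median)
```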