Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center.

Slides:



Advertisements
Similar presentations
Big Data Open Source Software and Projects ABDS in Summary XIV: Level 14B I590 Data Science Curriculum August Geoffrey Fox
Advertisements

Programming Models for IoT and Streaming Data IC2E Internet of Things Panel Judy Qiu Indiana University.
SALSA HPC Group School of Informatics and Computing Indiana University.
Authors: Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox Publish: HPDC'10, June 20–25, 2010, Chicago, Illinois, USA ACM Speaker: Jia Bao Lin.
Big Data Open Source Software and Projects ABDS in Summary XVI: Layer 13 Part 1 Data Science Curriculum March Geoffrey Fox
Big Data Open Source Software and Projects ABDS in Summary VI: Layer 6 Part 2 Data Science Curriculum March Geoffrey Fox
MapReduce in the Clouds for Science CloudCom 2010 Nov 30 – Dec 3, 2010 Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox {tgunarat, taklwu,
Cloudmesh: Software Defined Distributed Systems as a Service SDDSaaS Workshop on the Development of a Next-Generation, Interoperable, Federated Network.
Big Data Open Source Software and Projects Unit 0 Part B: Class Introduction Data Science Curriculum March Geoffrey Fox
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Dibbs Research at Digital Science
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Cyberinfrastructure Supporting Social Science Cyberinfrastructure Workshop October Chicago Geoffrey Fox
Big Data and Clouds: Challenges and Opportunities NIST January Geoffrey Fox
Remarks on Big Data Clustering (and its visualization) Big Data and Extreme-scale Computing (BDEC) Charleston SC May Geoffrey Fox
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science
Big Data Ogres and their Facets Geoffrey Fox, Judy Qiu, Shantenu Jha, Saliya Ekanayake Big Data Ogres are an attempt to characterize applications and algorithms.
Data Science at Digital Science October Geoffrey Fox Judy Qiu
Data Science at Digital Science November Geoffrey Fox Judy Qiu
FutureGrid Dynamic Provisioning Experiments including Hadoop Fugang Wang, Archit Kulshrestha, Gregory G. Pike, Gregor von Laszewski, Geoffrey C. Fox.
FutureGrid Connection to Comet Testbed and On Ramp as a Service Geoffrey Fox Indiana University Infra structure.
SALSA HPC Group School of Informatics and Computing Indiana University.
SALSASALSASALSASALSA FutureGrid Venus-C June Geoffrey Fox
Internet of Things (Smart Grid) Storm Archival Storage – NOSQL like Hbase Streaming Processing (Iterative MapReduce) Batch Processing (Iterative MapReduce)
Looking at Use Case 19, 20 Genomics 1st JTC 1 SGBD Meeting SDSC San Diego March Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox
Computing Research Testbeds as a Service: Supporting large scale Experiments and Testing SC12 Birds of a Feather November.
Recipes for Success with Big Data using FutureGrid Cloudmesh SDSC Exhibit Booth New Orleans Convention Center November Geoffrey Fox, Gregor von.
Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics,
High Performance Processing of Streaming Data Workshops on Dynamic Data Driven Applications Systems(DDDAS) In conjunction with: 22nd International Conference.
SALSASALSASALSASALSA Digital Science Center February 12, 2010, Bloomington Geoffrey Fox Judy Qiu
Streaming Applications for Robots with Real Time QoS Oct Supun Kamburugamuve Indiana University.
Big Data to Knowledge Panel SKG 2014 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China August Geoffrey Fox
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center 1.
Towards High Performance Processing of Streaming Data May Supun Kamburugamuve, Saliya Ekanayake, Milinda Pathirage and Geoffrey C. Fox Indiana.
High Performance Processing of Streaming Data in the Cloud AFOSR FA : Cloud-Based Perception and Control of Sensor Nets and Robot Swarms 01/27/2016.
1 Panel on Merge or Split: Mutual Influence between Big Data and HPC Techniques IEEE International Workshop on High-Performance Big Data Computing In conjunction.
Geoffrey Fox Panel Talk: February
Digital Science Center
Private Public FG Network NID: Network Impairment Device
Digital Science Center II
Working With Azure Batch AI
Status and Challenges: January 2017
NSF start October 1, 2014 Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Indiana University.
Department of Intelligent Systems Engineering
Big Data and Simulations: HPC and Clouds
NSF : CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science PI: Geoffrey C. Fox Software: MIDAS HPC-ABDS.
Department of Intelligent Systems Engineering
Digital Science Center I
I590 Data Science Curriculum August
Versatile HPC: Comet Virtual Clusters for the Long Tail of Science SC17 Denver Colorado Comet Virtualization Team: Trevor Cooper, Dmitry Mishin, Christopher.
High Performance Big Data Computing in the Digital Science Center
Research in Intelligent Systems Engineering
NSF Dibbs Award 5 yr. Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science IU(Fox, Qiu, Crandall, von Laszewski),
Data Science Curriculum March
Tutorial Overview February 2017
Data Science for Life Sciences Research & the Public Good
Martin Swany Gregor von Laszewski Thomas Sterling Clint Whaley
Research in Digital Science Center
Scalable Parallel Interoperable Data Analytics Library
Cloud DIKW based on HPC-ABDS to integrate streaming and batch Big Data
Digital Science Center III
Department of Intelligent Systems Engineering
$1M a year for 5 years; 7 institutions Active:
PHI Research in Digital Science Center
Panel on Research Challenges in Big Data
Big Data, Simulations and HPC Convergence
Research in Digital Science Center
Geoffrey Fox High-Performance Big Data Computing: International, National, and Local initiatives COLLABORATORS China and IU: Fudan University, SICE, OVPR.
Research in Digital Science Center
Convergence of Big Data and Extreme Computing
Presentation transcript:

Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center

Work on Applications Algorithms Systems Software Biology/Bioinformatics Computational Finance Network Science and Epidemiology Analysis of Biomolecular Simulations Analysis of Remote Sensing Data Computer Vision Pathology Images Real time robot data Parallel Algorithms and Software –Deep Learning –Clustering –Dimension Reduction –Image Analysis –Graph

Digital Science Center Research Areas Digital Science Center Facilities RaPyDLI Deep Learning Environment SPIDAL Scalable Data Analytics Library MIDAS Big Data Software Big Data Ogres Classification and Benchmarks CloudIOT Internet of Things Environment Cloudmesh Cloud and Bare metal Automation XSEDE TAS Monitoring citations and system metrics Data Science Education with MOOC’s

DSC Computing Systems Just installed 128 node Haswell based system (Juliet) –128 GB memory per node –Substantial conventional disk per node (8TB) plus PCI based SSD –Infiniband with SR-IOV –24 and 36 core nodes (3456 total cores) Working with SDSC on NSF XSEDE Comet System (Haswell 47,776 cores) Older machines –India (128 nodes, 1024 cores), Bravo (16 nodes, 128 cores), Delta(16 nodes, 192 cores), Echo(16 nodes, 192 cores), Tempest (32 nodes, 768 cores) with large memory, large disk and GPU Optimized for Cloud research and Large scale Data analytics exploring storage models, algorithms Build technology to support high performance virtual clusters

Cloudmesh Software Defined System Toolkit Cloudmesh Open source supportinghttp://cloudmesh.github.io/ –The ability to federate a number of resources from academia and industry. This includes existing FutureSystems infrastructure, Amazon Web Services, Azure, HP Cloud, Karlsruhe using several IaaS frameworks –IPython-based workflow as an interoperable onramp Supports reproducible computing environments Uses internally Libcloud and Cobbler Celery Task/Query manager (AMQP - RabbitMQ) MongoDB Gregor von Laszewski Fugang Wang

IOTCloud Device  Pub-Sub  Storm  Datastore  Data Analysis Apache Storm provides scalable distributed system for processing data streams coming from devices in real time. For example Storm layer can decide to store the data in cloud storage for further analysis or to send control data back to the devices Evaluating Pub-Sub Systems ActiveMQ, RabbitMQ, Kafka, Kestrel Turtlebot and Kinect

10 year US Stock daily price time series mapped to 3D (work in progress) 3400 stocks 10 large ones shown down up

Protein Universe Browser for COG Sequences with a few illustrative biologically identified clusters 8

3D Phylogenetic Tree from WDA SMACOF