Data Science at Digital Science October 22 2014 Geoffrey Fox Judy Qiu

Slides:



Advertisements
Similar presentations
Big Data Open Source Software and Projects ABDS in Summary VI: Layer 6 Part 2 Data Science Curriculum March Geoffrey Fox
Advertisements

Master of Arts in Data Science Geoffrey Fox for Data Science Program March
Master of Arts in Data Science
Cloudmesh: Software Defined Distributed Systems as a Service SDDSaaS Workshop on the Development of a Next-Generation, Interoperable, Federated Network.
Big Data Open Source Software and Projects Unit 0 Part B: Class Introduction Data Science Curriculum March Geoffrey Fox
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Dibbs Research at Digital Science
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Iterative computation is a kernel function to many data mining and data analysis algorithms. Missing in current MapReduce frameworks is collective communication,
Big Data and Clouds: Challenges and Opportunities NIST January Geoffrey Fox
LARGE SCALE DEPLOYMENT OF DAP AND DTS Rob Kooper Jay Alemeda Volodymyr Kindratenko.
Experimenting with FutureGrid CloudCom 2010 Conference Indianapolis December Geoffrey Fox
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science
Data Science at Digital Science November Geoffrey Fox Judy Qiu
Scientific Computing Environments ( Distributed Computing in an Exascale era) August Geoffrey Fox
FutureGrid Connection to Comet Testbed and On Ramp as a Service Geoffrey Fox Indiana University Infra structure.
Building Effective CyberGIS: FutureGrid Marlon Pierce, Geoffrey Fox Indiana University.
SALSASALSASALSASALSA FutureGrid Venus-C June Geoffrey Fox
1 CReSIS Lawrence Kansas February Geoffrey Fox (PI) Computer Science, Informatics, Physics Chair Informatics Department Director Digital Science.
Big Data Open Source Software and Projects ABDS in Summary IV: Level 7 I590 Data Science Curriculum August Geoffrey Fox
Looking at Use Case 19, 20 Genomics 1st JTC 1 SGBD Meeting SDSC San Diego March Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox
Computing Research Testbeds as a Service: Supporting large scale Experiments and Testing SC12 Birds of a Feather November.
Recipes for Success with Big Data using FutureGrid Cloudmesh SDSC Exhibit Booth New Orleans Convention Center November Geoffrey Fox, Gregor von.
Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics,
SALSASALSASALSASALSA Digital Science Center February 12, 2010, Bloomington Geoffrey Fox Judy Qiu
Big Data to Knowledge Panel SKG 2014 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China August Geoffrey Fox
Training Data Scientists DELSA Workshop DW4 May Washington DC Geoffrey Fox Informatics, Computing.
Built on the Powerful Microsoft Azure Platform, Forensic Advantage Helps Public Safety and National Security Agencies Collect, Analyze, Report, and Distribute.
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center.
Introduction to Data Analysis with R on HPC Texas Advanced Computing Center Feb
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center 1.
Ben Rogers August 18,  High Performance Computing  Data Storage  Hadoop Pilot  Secure Remote Desktop  Training Opportunities  Grant Collaboration.
1 Panel on Merge or Split: Mutual Influence between Big Data and HPC Techniques IEEE International Workshop on High-Performance Big Data Computing In conjunction.
Geoffrey Fox Panel Talk: February
Panel: Beyond Exascale Computing
Accelerated B.S./M.S An approved Accelerated BS/MS program allows an undergraduate student to take up to 6 graduate level credits as an undergraduate.
Digital Science Center
Private Public FG Network NID: Network Impairment Device
SPIDAL Analytics Performance February 2017
Digital Science Center II
Status and Challenges: January 2017
Implementing parts of HPC-ABDS in a multi-disciplinary collaboration
NSF start October 1, 2014 Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Indiana University.
Department of Intelligent Systems Engineering
NSF : CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science PI: Geoffrey C. Fox Software: MIDAS HPC-ABDS.
Department of Intelligent Systems Engineering
Digital Science Center I
HPSA18: Logistics 7:00 am – 8:00 am Breakfast
I590 Data Science Curriculum August
Versatile HPC: Comet Virtual Clusters for the Long Tail of Science SC17 Denver Colorado Comet Virtualization Team: Trevor Cooper, Dmitry Mishin, Christopher.
High Performance Big Data Computing in the Digital Science Center
RaPyDLI: Rapid Prototyping HPC Environment for Deep Learning NSF
Research in Intelligent Systems Engineering
NSF Dibbs Award 5 yr. Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science IU(Fox, Qiu, Crandall, von Laszewski),
Data Science Curriculum March
Tutorial Overview February 2017
Data Science for Life Sciences Research & the Public Good
Martin Swany Gregor von Laszewski Thomas Sterling Clint Whaley
Research in Digital Science Center
Scalable Parallel Interoperable Data Analytics Library
4 Education Initiatives: Data Science, Informatics, Computational Science and Intelligent Systems Engineering; What succeeds? National Academies Workshop.
Department of Intelligent Systems Engineering
$1M a year for 5 years; 7 institutions Active:
PHI Research in Digital Science Center
Panel on Research Challenges in Big Data
Big Data, Simulations and HPC Convergence
Research in Digital Science Center
Geoffrey Fox High-Performance Big Data Computing: International, National, and Local initiatives COLLABORATORS China and IU: Fudan University, SICE, OVPR.
Research in Digital Science Center
Convergence of Big Data and Extreme Computing
Presentation transcript:

Data Science at Digital Science October Geoffrey Fox Judy Qiu School of Informatics and Computing Digital Science Center Indiana University Bloomington

IU Data Science Masters Features Fully approved by University and State October Blended online and residential (any combination) – Online offered at Residential rates (~$1100 per course) Informatics, Computer Science, Information and Library Science in School of Informatics and Computing and the Department of Statistics, College of Arts and Science, IUB 30 credits (10 conventional courses) Basic (general) Masters degree plus tracks – Currently only track is “Computational and Analytic Data Science ” – Other tracks expected such as m A purely online 4-course Certificate in Data Science has been running since January 2014 (Technical and Decision Maker paths) with 75 students total in 2 semesters A Ph.D. Minor in Data Science has been proposed. Managed by Faculty in Data Science: expand to full campus

DSC Computing Systems Working with SDSC on NSF XSEDE Comet System (Haswell) Purchasing 128 node Haswell based system (Juliet) – GB memory per node – Substantial conventional disk per node (8TB) plus SSD – Infiniband SR-IOV – Lustre access to UITS facilities Older machines – India (128 nodes, 1024 cores), Bravo (16 nodes, 128 cores), Delta(16 nodes, 192 cores), Echo(16 nodes, 192 cores), Tempest (32 nodes, 768 cores) with large memory, large disk and GPU – Cray XT5m with 672 cores Optimized for Cloud research and Data analytics exploring storage models, algorithms Bare-metal v. Openstack virtual clusters Extensively used in Education

Cloudmesh Software Defined System Toolkit Cloudmesh Open source supportinghttp://cloudmesh.github.io/ – The ability to federate a number of resources from academia and industry. This includes existing FutureSystems infrastructure, Amazon Web Services, Azure, HP Cloud, Karlsruhe using several IaaS frameworks – IPython-based workflow as an interoperable onramp Supports reproducible computing environments Uses internally Libcloud and Cobbler Celery Task/Query manager (AMQP - RabbitMQ) MongoDB

Two NSF Data Science Projects 3 yr. XPS: FULL: DSD: Collaborative Research: Rapid Prototyping HPC Environment for Deep Learning IU, Tennessee (Dongarra), Stanford (Ng) “Rapid Python Deep Learning Infrastructure” (RaPyDLI) Builds optimized Multicore/GPU/Xeon Phi kernels (best exascale dataflow) with Python front end for general deep learning problems with ImageNet exemplar. Leverage Caffe from UCB. 5 yr. Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science IU, Rutgers (Jha), Virginia Tech (Marathe), Kansas (CReSIS), Emory (Wang), Arizona State(Beckstein), Utah(Cheatham) HPC-ABDS: Cloud-HPC interoperable software performance of HPC (High Performance Computing) and the rich functionality of the commodity Apache Big Data Stack. SPIDAL (Scalable Parallel Interoperable Data Analytics Library): Scalable Analytics for Biomolecular Simulations, Network and Computational Social Science, Epidemiology, Computer Vision, Spatial Geographical Information Systems, Remote Sensing for Polar Science and Pathology Informatics.