Published by Albert Boone. Modified over 9 years ago.
1
An Overview of PTI at Indiana University
Mayfield Visit, 12.12.12
Beth Plale, PTI Managing Director
William K. Barnett, Director, Science Community Tools, Research Technologies
Robert H. McDonald, Associate Dean for Library Technologies
2
PTI Fact Sheet
- The Pervasive Technology Institute (PTI) employs about 120 full-time employees
- At any one time, PTI has over 70 graduate research assistants engaged in research in one of the PTI centers
- Active grants from external sources in PTI totaled $72,641,407 as of 30 November 2012
- PTI outreach activities, which number over 100 every year, reach 10,000 people, the majority located in Indiana
3
What is making Big Data?
Some driving forces:
- Patient records are growing fast (70 PB of pathology data)
- Network graphs from the Internet, leading to community detection
- Large Hadron Collider (Switzerland, physics): analysis is mainly through creating histogram charts
- Commercial: Google and Bing have the largest data analytics in the world
- Time series: earthquakes, Twitter tweets, the stock market
- Image processing: from climate simulations to NASA to DoD to radiology
- Financial decision support: marketing; fraud detection; automatic preference detection (mapping users to books, films)
From Professor Geoffrey Fox, director of the Digital Science Center in PTI
4
PTI Big Data Strategy
- Developing new technology
  – New system implementations
  – New software and technology
- Training the 21st-century workforce
  – People with strong analytical and technical skills in statistics and machine learning who can analyze large volumes of data to derive business (and other) insights
  – Data-savvy managers and analysts who have the skills to be effective consumers of big data insights and who are capable of posing the right questions for analysis, interpreting and challenging the results, and making appropriate decisions
  – Technology personnel who develop, implement, and maintain the hardware and software tools needed to make use of big data
5
Big Data Needs Big Storage
- Big Data requires Big Storage
- Spreading this data over many machines and servers lets them share the work
  – Data storage, like computing, can take advantage of parallelism
- IU has operated the Data Capacitor since 2006, providing high-end performance for Big Data and Big Science
- A just-announced upgrade brings it to 5 petabytes of storage; this much data on CDs would make a stack 5 miles high
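The idea of spreading data over many servers so they share the work can be sketched in a few lines. This is only an illustration of round-robin striping, not the Data Capacitor's actual implementation; the function names `stripe` and `reassemble`, the server count, and the stripe size are all assumptions made for the example.

```python
from typing import List


def stripe(data: bytes, n_servers: int, stripe_size: int) -> List[List[bytes]]:
    """Cut data into fixed-size stripes and deal them round-robin to servers."""
    if n_servers < 1 or stripe_size < 1:
        raise ValueError("n_servers and stripe_size must be positive")
    servers: List[List[bytes]] = [[] for _ in range(n_servers)]
    for offset in range(0, len(data), stripe_size):
        stripe_index = offset // stripe_size
        servers[stripe_index % n_servers].append(data[offset:offset + stripe_size])
    return servers


def reassemble(servers: List[List[bytes]]) -> bytes:
    """Read the stripes back in the same round-robin order they were written."""
    out = bytearray()
    rows = max((len(chunks) for chunks in servers), default=0)
    for row in range(rows):
        for chunks in servers:
            if row < len(chunks):
                out += chunks[row]
    return bytes(out)


# Hypothetical example: 1024 bytes over 4 servers in 64-byte stripes.
data = bytes(range(256)) * 4
servers = stripe(data, n_servers=4, stripe_size=64)
bytes_per_server = [sum(len(c) for c in chunks) for chunks in servers]
print(bytes_per_server)
print(reassemble(servers) == data)
```

Each server ends up holding a quarter of the file, so a read or write can in principle proceed on all four at once, which is the parallelism the slide refers to.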
6
Big Data Needs Big Computation
- Big Red II: the first university-funded, university-owned supercomputer capable of 1 petaFLOPS (a thousand trillion mathematical operations per second)
- It would take one person, doing one calculation per second with a calculator, about 31 million years to do what Big Red II will be able to do in a second
- Big Red II and Data Capacitor II provide the system resources to address Big Data challenges
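A quick check of the hand-calculation comparison, as a sketch. The rate of one calculation per second is an assumption (the slide does not state how fast the person works), as is the 365.25-day year.

```python
# One second of work for a 1 petaFLOPS machine, done by hand.
MACHINE_OPS_PER_SECOND = 10**15          # 1 petaFLOPS, from the slide
HUMAN_OPS_PER_SECOND = 1                 # assumption: one calculation per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: Julian year


def years_by_hand(ops: int, ops_per_second: float = HUMAN_OPS_PER_SECOND) -> float:
    """Years needed to perform `ops` calculations at the given rate."""
    return ops / ops_per_second / SECONDS_PER_YEAR


years = years_by_hand(MACHINE_OPS_PER_SECOND)
print(f"{years:,.0f} years")
```

Under these assumptions the result is on the order of tens of millions of years.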
7
Networked Data Access
- Monon100: a 100 gigabit-per-second connection to Internet2 (Indiana was the first state to announce!). Moves data FAST
- IU has leveraged its significant network expertise to make the Data Capacitor available to users at IU, in Indiana, nationally, and internationally
  – High-performance networks together with high-performance data storage for a “data cloud” for Big Science
- To keep pace with the tremendous growth in data, we must stay on the cutting edge of computing, storage, and network technologies
  – We can’t sit still or we will be crushed by the data deluge!
- PTI is a leader in developing and integrating the latest approaches across these various technology domains
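To put a 100 gigabit-per-second link in perspective, the sketch below estimates how long it would take to move 5 petabytes (the storage figure quoted earlier in the deck). It assumes decimal units, a link running at full line rate, and no protocol overhead, none of which holds exactly in practice.

```python
# Back-of-the-envelope transfer time; all rates are idealized assumptions.
PETABYTE = 10**15            # bytes, decimal definition
GIGABIT = 10**9              # bits


def transfer_seconds(n_bytes: int, link_bits_per_second: int) -> float:
    """Time to move n_bytes over a link with no overhead or contention."""
    return n_bytes * 8 / link_bits_per_second


seconds = transfer_seconds(5 * PETABYTE, 100 * GIGABIT)
days = seconds / 86_400
print(f"{seconds:,.0f} s (about {days:.1f} days)")
```

Even at line rate, a full copy takes days rather than hours, which is why storage and network capacity have to grow together.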
8
Dealing with Big Data - NSF DataNet Program
Motivation: “… one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams.”
Response: DataNet creates “a set of exemplar national and global data research infrastructure organizations” to address this challenge.
9
SEAD Approach to DataNet Challenges
SEAD Partners - http://sead-data.net
- Contribute infrastructure to the NSF DataNet vision that supports data access, sharing, reuse, and preservation for the long tail
- Develop a data access and preservation environment that supports the research, technical, and economic requirements for data management in the long tail
- Enable active and social curation
- Utilize emerging preservation and access infrastructures
10
SEAD Social Networking/Virtual Archive at IU
[Diagram: interactions over time between the Active Curation Repository (ACR) and the Virtual Archive (VA). Components: ACR UI, VA UI, SPARQL Endpoint, SWORD Endpoint, Query Endpoint. Actors: Curator, User. Labels: Mark Data For Publication (and Accept Licensing Terms); Curator Request for Preview; (SPARQL) Query Metadata; Return Metadata; Curator Preview; Ingest Data To VA; User Queries VA for DOI; Query DOI Metadata; Metadata update and View.]
11
RoCE [rok-ee] Demonstration at SC12
- At SC12, we demonstrated a data system capable of moving enough data to stream ~1000 high-definition Blu-ray movies at once
  – This was possible previously, but our approach reduced the server stack required from 6 feet tall to about 9 inches tall, reducing power consumption and increasing efficiency
- We deployed this system in collaboration with Orange Telecom (the telephone company in France), which offers service worldwide and fields a research office in San Francisco
- Many of their clients do video distribution, and our solution eliminates much of the customized, expensive, and power-hungry hardware they currently use
12
RoCE [rok-ee] Technology
- Our approach integrated years of experience tuning networks and filesystems with an emerging protocol called RoCE (RDMA over Converged Ethernet, pronounced “Rocky”)
- RoCE eliminates many sources of overhead and inefficiency in the venerable Internet protocols
  – If the Internet is like a highway full of cars and trucks, RoCE is like an Indy car pulling a semi trailer!
- The expertise required to tune and operate a system with Lustre and RoCE is significant, and we at IU were the first to demonstrate it working over a long distance
  – We focus today on making it work for Big Data and Big Science, and tomorrow on automating it for a wider audience
13
NATIONAL CENTER FOR GENOME ANALYSIS SUPPORT (NCGAS)
- Sequencing a human genome cost $95M in 2001. Now it costs $5,000
- Genomics is now part of most biology and all disease research
- Sequences are huge: each one is 250 gigabytes (IU has the storage) and needs supercomputers to analyze (IU has the supercomputers)
- Researchers don’t know how to use supercomputers; we help them
- The National Science Foundation has provided $1.5M for us to support genomics analysis
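The figures on this slide can be combined with the 5-petabyte storage figure quoted earlier in the deck for a rough sense of scale. This is a sketch under stated assumptions: decimal units, and storage devoted entirely to raw sequences with no overhead or replication.

```python
# Rough scale estimates from the slide's own numbers.
COST_2001_USD = 95_000_000       # from the slide
COST_NOW_USD = 5_000             # from the slide
SEQUENCE_BYTES = 250 * 10**9     # 250 GB per sequence, decimal units assumed
STORAGE_BYTES = 5 * 10**15       # 5 PB, decimal units assumed

cost_reduction_factor = COST_2001_USD // COST_NOW_USD
sequences_that_fit = STORAGE_BYTES // SEQUENCE_BYTES

print(f"Sequencing is {cost_reduction_factor:,}x cheaper than in 2001")
print(f"5 PB holds about {sequences_that_fit:,} sequences of 250 GB each")
```

The point of the comparison is that sequencing cost has fallen far enough that storage and analysis, not sequencing, become the constraint.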
14
FutureGrid
Motivation: FutureGrid will make it possible for researchers to conduct experiments by submitting an experiment plan that is then executed via a sophisticated workflow engine, preserving the provenance and state information necessary to allow reproducibility.
Response: The FutureGrid Project provides a distributed test-bed of networked HPC resources that makes it possible for researchers to tackle complex research challenges in computer science related to the use and security of grids and clouds.
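The notion of executing an experiment plan while preserving provenance can be illustrated with a toy runner. This is not FutureGrid's workflow engine or API; `run_plan`, `fingerprint`, and the two example steps are hypothetical names invented for the sketch.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def fingerprint(value: Any) -> str:
    """Short, stable hash of a JSON-serializable value."""
    blob = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def run_plan(plan: List[Step], initial: Any) -> Tuple[Any, List[Dict[str, str]]]:
    """Run each step in order, recording what went in and what came out."""
    state = initial
    provenance: List[Dict[str, str]] = []
    for name, func in plan:
        before = fingerprint(state)
        state = func(state)
        provenance.append({"step": name, "input": before, "output": fingerprint(state)})
    return state, provenance


# Hypothetical two-step experiment plan.
plan: List[Step] = [
    ("double", lambda xs: [2 * x for x in xs]),
    ("total", lambda xs: sum(xs)),
]
result, log = run_plan(plan, [1, 2, 3])
rerun_result, rerun_log = run_plan(plan, [1, 2, 3])
print(result, log == rerun_log)
```

Because every step's input and output are fingerprinted, a rerun of the same plan on the same data yields an identical record, which is one simple way to check reproducibility.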
15
What Does FutureGrid Offer?
- Traditional HPC and Grid computing support
- Cloud platforms
  – Nimbus, Eucalyptus, OpenStack
- GPU computing
- Dynamic provisioning through RAIN and RAIN-MOVE
  – Image generation and registration
  – Generic image repository
  – Image deployment
- Experiment management
- Information services and performance tools
- Network device and virtual network tools
- A convenient portal for easy account and project management
- Help and support via a ticket system
16
Big Data & the 21st-century economy - PTI creates high-quality jobs
- 1,108 person-years of employment supported by grants & contracts since 1999
17
Q & A on PTI
18
PTI Impact at IU