Preparing for the Poster Session
Gagan Agrawal

Outline
- Background on the proposal
- Overall research focus
- Equipment requested
- Preparing for the site visit


Background
- A proposal submitted to the National Science Foundation (NSF) CISE Research Infrastructure program
- The program targets research equipment for multi-investigator teams doing experimental computer science; it typically funds 4-5 US universities each year
- After the initial review of proposals, a set of universities receives a site visit
- Final selection is based on the site visit

History of the Proposal
- Proposal involving 14 faculty / senior researchers across CIS, BMI, and OSC (Principal Investigators: Panda, Agrawal, Sadayappan, Shen, Saltz)
- Proposal submitted in October 2002 (a 105-page document in all!)
- Total request to NSF: $1,350,000 (+ matching funds from the state of Ohio and OSU)
- All funds are for equipment and one full-time support person to manage the equipment
- Rated as one of the top three proposals among 22 submissions this year
- 8 universities are getting a site visit; 4-6 will be funded

Site Visit Schedule
- Scheduled for 10th March; it will involve two NSF program managers and 2 experts from other universities
- Agenda:
  - Presentations about the department, our research, and the requested equipment
  - Discussion about our education programs, diversity, etc.
  - Meeting with the Dean and the Vice Provost for Research
  - Tour of facilities and demos
  - A student poster session

Motivation / Goals for the Poster Session
- Graduate education is a key mission of NSF; they want to fund where it will make a difference in graduate education
- An opportunity to show research beyond the talks from the PIs
- A further opportunity to demonstrate a vibrant group of experimental computer science researchers
- A further opportunity to stress our need for equipment

Why Should You Care
- New equipment should help your research
- Having an award like this will give more visibility to our group / department (this will help you when you look for a job)
- A good opportunity to present your work
- Posters can be reused for open houses, etc.
- Your advisor will be unhappy if you don't do a good job

Rest of This Talk
- The big research picture that was put in the proposal
  - Required to show an overall vision / synergy among the investigators
- Some details of the equipment and configuration requested
- Things to bring out in your poster
- Some kinds of questions you should be prepared for

Overall Research Focus
- Science and high performance computing are becoming data-driven
  - This is well recognized, for example, in the cyberinfrastructure report
- Clusters are a cost-effective way for
  - storing large datasets (i.e., serving as data repositories)
  - compute-intensive processing of data
- SMPs are also popular architectures for compute-intensive tasks
- Processing data where it is hosted may not always be feasible or desirable
  - Data repositories may be shared resources
  - They may not be the best configuration for compute-intensive tasks

Grid and Cluster Computing Context
- Separating the processing of data from the cluster hosting the data will be the norm in a wide-area (grid) environment
- However, it may also be done within an organization
  - Many users access the data
  - A different configuration may be better for compute-intensive tasks
- Support for hosting data at one cluster and processing the data at another cluster or an SMP machine is
  - critically required
  - a challenging problem
  - our overall focus
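
A minimal sketch of the idea on this slide, under loudly stated assumptions: the names `storage_node`, `compute_node`, and `run_pipeline` are hypothetical, two threads in one Python process stand in for the two clusters, and a bounded queue stands in for the network link. It shows data being read where it is hosted and processed elsewhere, with reading and computing overlapped chunk by chunk; it is not the project's middleware.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def storage_node(dataset, chunk_size, out_q):
    """Hypothetical data host: read the dataset chunk by chunk and ship it downstream."""
    for start in range(0, len(dataset), chunk_size):
        out_q.put(dataset[start:start + chunk_size])  # blocks while the queue is full
    out_q.put(SENTINEL)


def compute_node(in_q, results):
    """Hypothetical compute site: process chunks as they arrive."""
    total = 0
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            break
        total += sum(x * x for x in chunk)  # placeholder for a compute-intensive step
    results.append(total)


def run_pipeline(dataset, chunk_size=4, buffer_chunks=2):
    """Connect the two stages with a bounded buffer and wait for the result."""
    link = queue.Queue(maxsize=buffer_chunks)  # bounded buffer models limited buffering
    results = []
    producer = threading.Thread(target=storage_node, args=(dataset, chunk_size, link))
    consumer = threading.Thread(target=compute_node, args=(link, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results[0]


if __name__ == "__main__":
    print(run_pipeline(list(range(10))))  # sum of squares of 0..9 = 285
```

Bounding the queue makes the storage side block whenever the compute side falls behind, which is one simple way to model limited buffering between sites.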

Research Challenges
- Better intra-cluster communication and I/O support for data-intensive and interactive applications, and for allowing shared access to data repositories
- Scheduling and resource sharing policies for such an environment
- High-level programming support (middleware, compilers) for using such an environment
- Algorithms from data-intensive application areas (data mining, visualization) need to be modified or tuned for such an environment
- Real applications and real datasets are needed to drive the work
- Many individual projects already address these directions, but a common infrastructure will help integrate and evaluate the work

The Equipment We Are Asking For
- Storage cluster: 24 nodes, 80 TB of storage, located at BMI
- Compute cluster: 32 nodes, various interconnects (Myrinet, Quadrics, InfiniBand), located at CIS
- SMP machine: approx. 16 CPUs, located at CIS
- Visualization equipment (graphics cards, haptic devices)
- High-speed networking (1 Gb/s) between CIS and BMI, CIS and OSC, and BMI and OSC
- Storage and compute clusters will be upgraded during the 4th year of the grant; inter-site networking up to 10 Gb/s
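
As a back-of-the-envelope check on these numbers, assuming the 80 TB is spread evenly over the 24 storage nodes and assuming a hypothetical 1 TB dataset moved at ideal line rate (no protocol overhead), the helper below (hypothetical name `transfer_seconds`) estimates storage per node and transfer times over the 1 Gb/s links and the planned 10 Gb/s upgrade.

```python
def transfer_seconds(num_bytes, link_bits_per_second):
    """Ideal time to move num_bytes over a link at full line rate (no overhead)."""
    return num_bytes * 8 / link_bits_per_second


TB = 10**12        # decimal terabyte
ONE_GBPS = 10**9   # 1 Gb/s inter-site link
TEN_GBPS = 10**10  # 10 Gb/s planned upgrade

# Assumption: the 80 TB is spread evenly over the 24 storage nodes.
per_node_tb = 80 / 24
print(f"Storage per node: {per_node_tb:.2f} TB")

# Assumption: a hypothetical 1 TB dataset is shipped from the storage cluster.
for label, rate in [("1 Gb/s", ONE_GBPS), ("10 Gb/s", TEN_GBPS)]:
    seconds = transfer_seconds(1 * TB, rate)
    print(f"1 TB over {label}: {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions each storage node holds about 3.33 TB, and moving 1 TB takes about 8,000 s (roughly 2.2 hours) at 1 Gb/s versus about 800 s at 10 Gb/s; real transfers will be slower.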

Overall Configuration

[Figure: diagram of the overall configuration. Labels recoverable from the slide: 16-Dual Pentium 1.0 GHz Compute Servers; Visualization Server; Video Wall; 9-Dual Pentium 1.0 GHz + Terabytes of storage; Data Server; Ohio Supercomputer Center (Production Clusters + Storage Cluster); Gigabit Ethernet, Myrinet, GigaNet. Configuration within CIS: 16-Quad Pentium 700 MHz; 16-Dual Pentium 300 MHz; 8-Dual Pentium 2.4 GHz; interconnects Myrinet (Lanai 3), Myrinet (Lanai 7), Myrinet (Lanai 9), InfiniBand (4), Quadrics (4), Gigabit Ethernet (8).]

Rationale
- We need to experiment with applications on a distributed collection of compute, storage, and visualization resources
- We want to study architectures for storage clusters and compute clusters, and therefore want crashable resources (machines we are free to crash during experiments)
- We need to work with data-intensive applications on very large datasets, and need sufficient storage for them
- We want to evaluate system software in a distributed and heterogeneous environment, but need a setup that allows repeatable experiments
- Research will focus on networked clusters (and SMP machines), but is extendable to a wider-area environment through the links to OSC, the OSC machines, and the links from OSC to elsewhere

Proposed Research
- Overall theme: an integrated approach
  - support at the low level,
  - incorporated into appropriate programming systems,
  - driven or enhanced by research at the algorithms level,
  - and tested by end applications
- Four components:
  - Communication and I/O (Panda, Lauria, Wyckoff)
  - Middleware and Programming Systems (Saltz, Kurc, Catalyurek, Agrawal, Saday)
  - Data-intensive algorithms (or application areas): Srini, Hakan, Agrawal, Han-Wei, Raghu, Stredney (?)
  - End applications: Saltz et al., Stredney, Raghu, Saday, Han-Wei (?)

Area 1: Communication and I/O
- Need to enhance communication and I/O mechanisms
  - at both the intra-cluster and inter-cluster level
  - for the specific needs of data-intensive and interactive applications
- Components:
  - Support for point-to-point and collective communication, and synchronization, incorporated at the MPI and DSM layers (Panda)
  - Support for intra- and inter-cluster QoS (Panda)
  - Support for efficient and parallel I/O at the intra- and inter-cluster level (Lauria)
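
Collective communication is one of the listed components. As an illustration only, the sketch below simulates recursive doubling, a textbook algorithm for an allreduce (the operation MPI exposes as `MPI_Allreduce`): in each step every process combines its partial sum with that of a partner whose rank differs in one bit, so all p processes hold the total after log2(p) steps. The function name is hypothetical, the sketch assumes p is a power of two, and it is not the communication support developed in this project.

```python
def recursive_doubling_allreduce(values):
    """Simulate a sum-allreduce among p = len(values) processes.

    Assumes p is a power of two. In each step, process i adds the partial sum
    held by its partner i XOR distance, so after log2(p) steps every process
    holds the global sum.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of processes")
    partial = list(values)
    steps = 0
    distance = 1
    while distance < p:
        # All pairwise exchanges in a step happen at once, so every new value
        # is computed from the previous step's values.
        partial = [partial[i] + partial[i ^ distance] for i in range(p)]
        distance *= 2
        steps += 1
    return partial, steps


if __name__ == "__main__":
    totals, steps = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(totals, steps)  # [36, 36, 36, 36, 36, 36, 36, 36] 3
```

The logarithmic step count is why tree- and butterfly-style collectives scale better than having one process gather and redistribute every value.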

Area 2: Middleware and Programming Systems
- Goal: high-level programming systems and policies are required to utilize multiple clusters and SMP machines
- Components:
  - DataCutter (Saltz, Kurc, Catalyurek)
  - Compiler support on top of DataCutter (Agrawal et al.)
  - Scheduling task graphs (Saday et al.)
  - Scheduling across multiple tasks (Saday)
  - Multiple query optimization (Saltz et al.)
  - Middleware for data mining (Agrawal)
  - Indexing and declustering for data repositories (Hakan)
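
Scheduling task graphs appears among the Area 2 components. A minimal, generic sketch of greedy list scheduling: tasks of a dependency graph are assigned, as they become ready, to whichever identical processor frees up first. The function name `list_schedule`, the task names, and the durations are hypothetical, communication costs between processors are ignored, and this is a textbook heuristic rather than the scheduling work named on the slide.

```python
import heapq


def list_schedule(durations, deps, num_procs):
    """Greedy list scheduling of a task DAG onto identical processors.

    durations: {task: run time}; deps: {task: set of predecessor tasks}.
    Returns ({task: (processor, start, finish)}, makespan).
    """
    preds = {t: set(deps.get(t, ())) for t in durations}
    succs = {t: [] for t in durations}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    remaining = {t: len(ps) for t, ps in preds.items()}  # unscheduled predecessors
    ready_time = {t: 0.0 for t in durations}  # latest finish time among predecessors
    ready = [t for t, n in remaining.items() if n == 0]
    procs = [(0.0, i) for i in range(num_procs)]  # (time processor is free, id)
    heapq.heapify(procs)
    schedule = {}
    while ready:
        # Take the ready task that can start earliest (ties broken by name).
        ready.sort(key=lambda t: (ready_time[t], t))
        task = ready.pop(0)
        free_at, proc = heapq.heappop(procs)  # processor that frees up first
        start = max(free_at, ready_time[task])
        finish = start + durations[task]
        schedule[task] = (proc, start, finish)
        heapq.heappush(procs, (finish, proc))
        for s in succs[task]:
            remaining[s] -= 1
            ready_time[s] = max(ready_time[s], finish)
            if remaining[s] == 0:
                ready.append(s)
    if len(schedule) != len(durations):
        raise ValueError("dependency graph has a cycle")
    makespan = max((f for _, _, f in schedule.values()), default=0.0)
    return schedule, makespan


if __name__ == "__main__":
    durations = {"read": 2, "filter_a": 3, "filter_b": 1, "merge": 2}
    deps = {"filter_a": {"read"}, "filter_b": {"read"}, "merge": {"filter_a", "filter_b"}}
    sched, makespan = list_schedule(durations, deps, num_procs=2)
    print(makespan)  # 7.0, the critical path read -> filter_a -> merge
```

Every task starts no earlier than its processor becomes free and no earlier than all its predecessors finish, so the result is always a valid schedule, though not necessarily an optimal one.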

Area 3: Data-Intensive Algorithms
- Need to develop, fine-tune, and/or evaluate algorithms and techniques for data mining, scientific data analysis, and visualization, in our proposed environment and on top of the programming systems developed
- Components:
  - Parallel data mining algorithms, particularly for shared memory (Srini, Agrawal)
  - Scientific data analysis (Machiraju, Srini)
  - Visualization and imaging, etc. (Han-Wei, Raghu)
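
Parallel data mining for shared memory is one listed component, and FREERIDE is named later in this talk. The sketch below shows a generic shared-memory pattern for reduction-style mining kernels: each worker updates a private copy of the reduction object (here, item counts over transactions) and the copies are merged at the end, an approach sometimes called full replication, which avoids locking in the main loop. The function names and data are hypothetical, this is not FREERIDE's interface, and because of Python's global interpreter lock the threads illustrate the structure rather than real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_reduction(transactions):
    """Update a private reduction object (item -> count) for one chunk of data."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # count each item at most once per transaction
    return counts


def parallel_item_counts(transactions, num_workers=4):
    """Replicate the reduction object per worker, then merge the copies."""
    if not transactions:
        return Counter()
    chunk = -(-len(transactions) // num_workers)  # ceiling division
    parts = [transactions[i:i + chunk] for i in range(0, len(transactions), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_reduction, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge phase: no locks were needed in the main loop
    return total


if __name__ == "__main__":
    data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
    print(parallel_item_counts(data, num_workers=2))
```

The trade-off is memory: each worker holds a full copy of the reduction object, which is cheap for small count tables but can be costly when the object is large.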

Area 4: End Data-Intensive Applications
- We are working with end data-intensive, data-driven, interactive, and/or collaborative applications
  - to evaluate our work at the communication and I/O, programming systems, and algorithm levels
  - to obtain large datasets
  - to demonstrate that our research can benefit real end applications
- Components:
  - Time-varying scientific data visualization (Han-Wei)
  - Oil reservoir simulation (Saltz)
  - Medical applications (Saltz, Shen, Stredney, Machiraju)
  - Scientific (chemistry) application (Saday)
  - 3-D human scan analysis (Machiraju)

Things to Bring Out in Your Posters
- Interesting experimental computer science research, for example work involving
  - system software,
  - large datasets,
  - careful performance analysis on dedicated systems, or
  - a distributed environment
- Preferably some preliminary experimental results
  - Show that we can do quality experimental research
- Demonstrate the need for more equipment, if appropriate (part of future work?)
- Mention existing or potential collaborations, if appropriate
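
For the "careful performance analysis" and "preliminary experimental results" points, a small harness like this one can help report repeatable numbers: it times a run several times and keeps the median, and computes speedup S = T1/Tp and parallel efficiency E = S/p. The function names are hypothetical, and the timings in the usage line are made-up inputs, not measured results.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Run fn `repeats` times and return the median wall-clock time in seconds.

    The median is less sensitive than the mean to a single run disturbed by
    other activity on the machine.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup_and_efficiency(t_serial, t_parallel, num_procs):
    """Speedup S = T1 / Tp and parallel efficiency E = S / p."""
    speedup = t_serial / t_parallel
    return speedup, speedup / num_procs


if __name__ == "__main__":
    t = time_runs(lambda: sum(i * i for i in range(100_000)), repeats=5)
    print(f"median time: {t:.4f} s")
    print(speedup_and_efficiency(120.0, 20.0, 8))  # made-up timings -> (6.0, 0.75)
```

Reporting the number of repetitions and the spread alongside the median makes results on shared machines easier to judge.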

Some Questions to Be Prepared For
- What equipment have you used so far?
- Do you feel the need for any additional equipment?
- For systems posters: what benchmarks/applications might you use in the future?
  - See if any of the existing work in visualization, data mining, or end applications may be appropriate
- For algorithm / application posters: what system support could you use for scaling your work, or for moving to distributed environments?
  - See if any of the work on QoS, DataCutter, FREERIDE, or scheduling may be relevant

Some Logistics
- A rehearsal session on 28th Feb, 3:30-4:30, DL 480
- Final site visit on 10th March; poster session 1:30-2:30
  - Set up from 11:30 onwards, and plan to be available till 3:30
  - Room TBA
- Poster size: 30 inches wide, 36 inches high; it can hold 9-12 slides
- You can use the department poster printer (ask your advisor), but don't use it for the rehearsal
- Be professional during the site visit: no unnecessary talking among yourselves, no use of Hindi / Chinese / …
- Dress code: ?
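
To plan a layout, assuming the slides are tiled in an even grid and ignoring margins and any title banner (both will shrink each panel in practice), each panel's size follows from dividing the 30 x 36 inch poster by the grid dimensions. The function name `panel_size` and the grid choices are illustrative assumptions.

```python
def panel_size(width_in, height_in, cols, rows):
    """Width and height of each slide panel when the poster is split into an even grid."""
    return width_in / cols, height_in / rows


# 30 x 36 inch poster (from the logistics slide); the grids are illustrative choices.
for cols, rows in [(3, 3), (3, 4)]:
    w, h = panel_size(30, 36, cols, rows)
    print(f"{cols * rows} slides in a {cols}x{rows} grid: {w:.1f} x {h:.1f} in each")
```

Under these assumptions, 9 slides in a 3x3 grid get about 10 x 12 inches each, and 12 slides in a 3x4 grid get about 10 x 9 inches each.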