Overview of MSR External Research: Earth, Energy, and MSR Environmental Ecosystem Conceptual Model; Projects: Trident, GrayWulf, Dryad, and DryadLINQ.

Presentation transcript:


Research locations:
Redmond, Washington (Sept. 1991)
San Francisco, California (June 1995)
Cambridge, United Kingdom (July 1997)
Beijing, China (Nov. 1998) (MSR Asia)
Silicon Valley, California (July 2001)
Bangalore, India (Jan. 2005) (MSR India)
Cambridge, Massachusetts (July 2008) (MSR New England)

A division within Microsoft Research focused on partnerships between academia, industry, and government to advance computer science, education, and research in fields that rely heavily on advanced computing.
Supporting groundbreaking research to help advance human potential and the wellbeing of our planet.
Developing advanced technologies and services to support every stage of the research process.
Microsoft External Research is committed to interoperability and to providing open access, open tools, and open technology.

Core Computer Science
Earth, Energy & Environment
Education & Scholarly Communication
Health & Wellbeing
Advanced Research Tools and Services
Community and Geographic Outreach

Earth, Energy & Environment
Visualizing and Experiencing E3 Data + Information: Provide a unique experience to reduce time to insight and knowledge through visualizing data and information.
Accessible Data: Ensure E3 data (remote and local sensing) is easily accessible and consumable in the scientist's domain.
Enabling Scientific Collaboration: Look at new ways to enable collaboration in scientific virtual organizations.

[Figure: conceptual model. Labels: Data, Analysis, Insight, Publish; Knowledge, Inform, Communicate, Decide, Implement, Action.]

Each of these potentially impacts the technology, user interface, and API design:
I want to visualize ocean processes and share my analysis.
I want to do this more than once and get exactly the same answer.
I want to do this more than once, but don't care if I get exactly the same answer.
I'm only going to do this once and don't care about keeping the data or the results long term (but I need to remember the inputs).
I want to store the data in
I want full provenance to validate a result, OPM compliant.
I want to use my own provenance management system.
Each group may wish a different UI (no WF), or authoring tool.
I only want NCAR, MBARI, etc. data because I trust it.
I know that Jon really wants my results to drive his model and I want to share my workflow and executables.

Visually program workflows.
Libraries of activities and workflows, to save and reuse workflows.
Abstract parallelism for HPC, to test on the desktop and then run on a cluster.
Automatic provenance capture, for all workflows and data products.
Integrated data storage and access, which allows a researcher to store data in a SQL database, in local files, or in the cloud (Microsoft SDS, Amazon S3).
Reproducible research
[Screenshot labels: Composition Space, Activity Library, Workflow Library, Data Options & Sharing]
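To make "automatic provenance capture" concrete, here is a minimal Python sketch of a workflow that chains activities and logs a digest of each activity's input and output. It is not Trident's API: the `Workflow` and `ProvenanceRecord` classes, the `digest` helper, and the two sample activities are all hypothetical names chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ProvenanceRecord:
    """One log entry: which activity ran, and digests of what went in and out."""
    activity: str
    input_digest: str
    output_digest: str


def digest(value: Any) -> str:
    """Short, stable hash of a JSON-serialisable value (hypothetical helper)."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class Workflow:
    """A linear chain of activities with provenance recorded on every step."""
    activities: List[Callable[[Any], Any]]
    log: List[ProvenanceRecord] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        for activity in self.activities:
            result = activity(data)
            self.log.append(
                ProvenanceRecord(activity.__name__, digest(data), digest(result))
            )
            data = result
        return data


# Two illustrative activities (assumed, not from the slides).
def drop_missing(readings):
    return [r for r in readings if r is not None]


def mean(readings):
    return sum(readings) / len(readings)


wf = Workflow([drop_missing, mean])
result = wf.run([10.0, None, 14.0])
```

Because each record stores the digest of its input and output, a later run can be checked step by step against an earlier one, which is one way to approach the "get exactly the same answer" requirement.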

Pan-STARRS (Astronomy)
One of the largest visible light telescopes
Four unit telescopes acting as one
One gigapixel per telescope
Survey entire visible universe in 1 week
Catalog solar system, moving objects/asteroids
ps1sc.org: Univ. Hawaii, Johns Hopkins, …

1 PB of raw image data per year
2.5 TB image data | 1000 images | 150 M detections per night
30 TB of processed data per year
5.5 billion celestial objects
350 billion detections
The largest astronomy DB in the world! And the platform to build it upon!

[Table: survey power comparison]
Columns: Telescope | Telescope diameter (m) | Effective collecting area (m^2) [A] | Solid angle subtended by field of view (deg^2) [D] | Nominal image quality (arcsec) [Q] | Survey power [AD/Q^2] | Status
Telescopes listed: UH 2.2-m/PFCam; Palomar/QUEST; CFHT/Megacam (Active); Subaru/Suprime-Cam (Active); Pan-STARRS; DMT/LSST
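The survey power column is defined as AD/Q^2, so it grows linearly with collecting area and field of view and falls with the square of the image quality. A small Python sketch of that formula follows, together with a rough check of the data volumes. The input values in `example` are hypothetical placeholders, not figures from the table, and the yearly estimate assumes the nightly volume holds for all 365 nights.

```python
def survey_power(collecting_area_m2: float,
                 field_of_view_deg2: float,
                 image_quality_arcsec: float) -> float:
    """Survey power A * D / Q^2, using the column definitions from the table."""
    if image_quality_arcsec <= 0:
        raise ValueError("image quality must be positive")
    return collecting_area_m2 * field_of_view_deg2 / image_quality_arcsec ** 2


# Hypothetical inputs, chosen only to show the arithmetic.
example = survey_power(collecting_area_m2=10.0,
                       field_of_view_deg2=2.0,
                       image_quality_arcsec=0.5)

# Doubling Q (worse seeing) cuts the survey power to a quarter.
worse_seeing = survey_power(10.0, 2.0, 1.0)

# Rough consistency check on the quoted volumes: 2.5 TB per night,
# assuming observations every night of the year.
yearly_tb = 2.5 * 365
```

Under that assumption the nightly figure gives about 0.9 PB per year, which is in line with the quoted "1 PB of raw image data per year".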

Software and hardware design principles for data-intensive science
Enhances the Beowulf model with storage co-located with commodity HPC nodes
Databases for fast queries on indexes
High sequential I/O bandwidth for varying query patterns
Scale out instead of scale up
The GrayWulf name pays tribute to Jim Gray, who was actively involved in defining these design principles.

[Figure: GrayWulf architecture. Components: Shared Compute Resources; Shared Queryable Data Store; Configuration Management, Health and Performance Monitoring; Operator User Interface; User Interface; Data Valet User Interface; Valet Workflow; User Workflow; User Storage; Data Valet Queryable Data Store; User Queryable Data Store. Legend: Data Flow, Control Flow.]

Cluster (scheduling and monitoring): Windows HPC 2008
Cluster database (shared domain DBs and user MyDBs): SQL Server 2008
Trident Workflow Workbench: Windows Workflow Foundation, Composer, Registry, Provenance/Logging
Common data management library
Domain-specific user interfaces: Scientists, Data Valets, System Operations

3000 node cluster
12,000 cores (36 x cycles/sec)
48 terabytes of RAM
9 petabytes of persistent storage
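Dividing the cluster totals by the node count gives a feel for the individual machines. The Python sketch below does that arithmetic, assuming resources are spread evenly across nodes and using decimal units (1 TB = 1000 GB, 1 PB = 1000 TB). Neither assumption is stated on the slide.

```python
# Cluster totals as quoted on the slide.
nodes = 3000
cores = 12_000
ram_tb = 48
storage_pb = 9

# Per-node figures, assuming an even spread and decimal units.
cores_per_node = cores / nodes
ram_gb_per_node = ram_tb * 1000 / nodes
storage_tb_per_node = storage_pb * 1000 / nodes

print(f"{cores_per_node:.0f} cores, {ram_gb_per_node:.0f} GB RAM, "
      f"{storage_tb_per_node:.0f} TB storage per node")
```

Under these assumptions each node is a commodity-class machine with 4 cores, 16 GB of RAM, and 3 TB of local storage.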

Continuously deployed since 2006
Running on >> 10^4 machines
Sifting through > 10 PB of data daily
Runs on clusters of > 3000 machines
Handles jobs with > 10^5 processes each
Used by >> 100 developers
Rich platform for data analysis
Microsoft Research, Silicon Valley: Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly

Automatic plan generated by DryadLINQ
Automatic distributed execution by Dryad
Programmer writes sequential C#, VB, … code
– System figures out the data-parallelism
– Manages execution, traditional parallel-DB tricks
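The idea that the programmer writes sequential code while the system supplies the parallelism can be illustrated with a toy Python sketch. This is not DryadLINQ, which targets .NET languages and generates real execution plans. Here `sequential_query` and `run_data_parallel` are hypothetical names, and the two-stage "plan" is written by hand rather than derived automatically. It only works because the query is a sum, which can be computed per partition and then combined.

```python
from functools import reduce
from typing import Callable, Iterable, List


def sequential_query(records: Iterable[int]) -> int:
    """What the programmer writes: the sum of squares of the even values."""
    return sum(x * x for x in records if x % 2 == 0)


def run_data_parallel(partitions: List[List[int]],
                      query: Callable[[Iterable[int]], int]) -> int:
    """A hand-written two-stage plan for a decomposable (sum-like) query."""
    # Stage 1: run the unchanged query on every partition. In a real system
    # each call could execute on the machine that stores that partition.
    partials = [query(part) for part in partitions]
    # Stage 2: merge the partial results with the same associative operator.
    return reduce(lambda a, b: a + b, partials, 0)


data = list(range(10))
partitions = [data[0:3], data[3:7], data[7:10]]

single_machine = sequential_query(data)
distributed = run_data_parallel(partitions, sequential_query)
```

Both paths return the same value, so the partitioning stays invisible to the person who wrote `sequential_query`.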

A radical approach to programming at scale
Nodes talk to each other as little as possible (shared nothing).
Programmer is not allowed to communicate between nodes.
Data is spread throughout machines in advance; computation happens where it's stored.
Master program divvies up tasks based on location of data, schedules tasks on the same machine as the data resides, or at least the same rack, detects worker failures and restarts, load balances, redundant execution, etc…
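The placement preference described above (same machine first, then same rack, then anywhere) can be sketched as a small greedy scheduler in Python. This is an illustrative assumption about how such a policy might look, not the actual master's algorithm. The `schedule` function and the machine, rack, and block names are hypothetical, and failure detection, restarts, and redundant execution are left out.

```python
from typing import Dict, List, Tuple


def schedule(block_locations: Dict[str, List[str]],
             rack_of: Dict[str, str],
             free_slots: Dict[str, int]) -> Dict[str, Tuple[str, str]]:
    """Assign one task per data block, preferring data-local placement.

    Returns a mapping of block -> (machine, locality level).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, Tuple[str, str]] = {}
    for block, holders in block_locations.items():
        # 1. Prefer a machine that already stores the block.
        choice = next((m for m in holders if slots.get(m, 0) > 0), None)
        level = "same-machine"
        if choice is None:
            # 2. Otherwise, any free machine in a rack that holds the block.
            holder_racks = {rack_of[m] for m in holders}
            choice = next((m for m in slots
                           if slots[m] > 0 and rack_of[m] in holder_racks), None)
            level = "same-rack"
        if choice is None:
            # 3. Last resort: any machine with a free slot.
            choice = next((m for m in slots if slots[m] > 0), None)
            level = "remote"
        if choice is None:
            raise RuntimeError("no free slots left")
        slots[choice] -= 1
        assignment[block] = (choice, level)
    return assignment


rack_of = {"m1": "r1", "m2": "r1", "m3": "r2"}
free_slots = {"m1": 1, "m2": 1, "m3": 1}
block_locations = {"b1": ["m1"], "b2": ["m1"], "b3": ["m1"]}

plan = schedule(block_locations, rack_of, free_slots)
```

With all three blocks stored on `m1` and one slot per machine, the first task runs on `m1`, the second falls back to `m2` in the same rack, and the third has to run remotely on `m3`.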

The goal of the analysis is to execute a set of analysis functions on a collection of data files produced by high-energy physics experiments.
Histogramming of events from a large data set (TBs)
A DryadLINQ program provides an easy way to distribute the computation on the cluster.
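Histogramming distributes well because per-file histograms can be built independently and then merged by adding bin counts. The Python sketch below shows that pattern on made-up numbers. The event values, the bin width, and the `histogram` and `merge` helpers are assumptions for illustration, not part of the physics analysis described.

```python
from collections import Counter
from typing import Iterable, List


def histogram(values: Iterable[float], bin_width: float) -> Counter:
    """Count values per bin; a bin index is the value floor-divided by the width."""
    counts: Counter = Counter()
    for v in values:
        counts[int(v // bin_width)] += 1
    return counts


def merge(partials: List[Counter]) -> Counter:
    """Combine per-file histograms by adding the counts of matching bins."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total


# Three hypothetical "files" of event energies (arbitrary units).
files = [[1.0, 12.5, 13.0], [25.0, 3.0], [11.0]]
bin_width = 10.0

# Each per-file histogram could be computed on a different machine.
partials = [histogram(events, bin_width) for events in files]
merged = merge(partials)

# Reference result computed in one pass over all events.
all_events = [e for events in files for e in events]
reference = histogram(all_events, bin_width)
```

The merged histogram matches the single-pass reference, which is the property that makes the computation safe to split across files.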

Broad academic/research release of Dryad and DryadLINQ (binary for now, source release in planning), with tutorials, programming guides, sample code, libraries, and a community site.

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.