SYSTEMS SUPPORT FOR GRAPHICAL LEARNING
Ken Birman, CS6410 Fall 2014, 9/18/2014



Graphical models and applications
CS5412 Spring 2014 (Cloud Computing: Birman)
• Artificial intelligence and machine learning are core technologies in many modern cloud settings
  • Support for social networking mechanisms
  • Creating product placement recommendations
  • Understanding the flow of “influence” within communities
• Graphical processing can also matter in systems
  • Understanding what to cache and what not to cache
  • Learning common patterns to optimize

What makes this hard?
• The prior generation of solutions was too general
  • Programming languages can do anything, but they aren’t at all specialized for graph-structured data
  • Database systems are awesome for tabular data but much less optimized for graphical data
• There is also an issue of scale
  • We’re good at what can be done on one computer
  • But a company like Facebook has billions of users, and its infrastructure runs on massive data centers
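To make the tabular-versus-graph point concrete, here is a minimal Python sketch. The friendship data and all function names are hypothetical, invented for illustration. It answers the same "friends of friends" query twice: once by scanning a flat edge table, as a generic relational self-join would, and once by following an adjacency index, which only touches the neighbours it needs.

```python
from collections import defaultdict

# Hypothetical toy friendship data: each row is one directed edge (src, dst).
edge_table = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("carol", "erin"),
]

def friends_of_friends_table(rows, user):
    """Two-hop query as a self-join: every hop scans the whole edge table."""
    first_hop = {dst for src, dst in rows if src == user}
    return {dst for src, dst in rows if src in first_hop and dst != user}

def build_adjacency(rows):
    """Index the same edges by source vertex (a graph-specialized layout)."""
    adjacency = defaultdict(set)
    for src, dst in rows:
        adjacency[src].add(dst)
    return adjacency

def friends_of_friends_graph(adjacency, user):
    """Two-hop query that only visits the user's neighbours and their neighbours."""
    result = set()
    for friend in adjacency.get(user, ()):
        result |= adjacency.get(friend, set())
    result.discard(user)
    return result

adjacency = build_adjacency(edge_table)
print(friends_of_friends_table(edge_table, "alice"))
print(friends_of_friends_graph(adjacency, "alice"))
```

Both calls return the same set; the difference is the amount of data touched per hop, which is what matters once the edge table no longer fits on one machine.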

Today’s papers
• The TAO paper (I’ll start with this) gives a sense of the challenge Facebook confronts
  • Like an entire distributed operating system
  • But the whole role of the solution is to manage graphical data and support queries against it
  • Massive loads and surreal scale
• Things to notice?
  • How does the architecture of the solution reflect the special environment in which it runs?
  • How did they identify and optimize the critical paths?
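One way to picture "optimizing the critical path" for graph queries is a read-through cache placed in front of a slower store of typed edges. The Python sketch below is a toy under stated assumptions: the class name, method names, and data layout are hypothetical, and it is not TAO's actual API or design. It only illustrates why serving repeated reads from a cache keeps the common case cheap.

```python
class AssociationStore:
    """Toy read-through cache over a slow backing store of typed edges.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, backing):
        # backing maps (source_id, edge_type) -> list of target ids
        self.backing = backing
        self.cache = {}
        self.backing_reads = 0  # counts trips to the slow store

    def get_associations(self, source_id, edge_type):
        """Serve from cache when possible; otherwise read through and fill."""
        key = (source_id, edge_type)
        if key not in self.cache:
            self.backing_reads += 1
            self.cache[key] = list(self.backing.get(key, []))
        return self.cache[key]

    def add_association(self, source_id, edge_type, target_id):
        """Write to the backing store, then invalidate the cached list."""
        key = (source_id, edge_type)
        self.backing.setdefault(key, []).append(target_id)
        self.cache.pop(key, None)


store = AssociationStore({(1, "friend"): [2, 3]})
store.get_associations(1, "friend")   # miss: reads the backing store
store.get_associations(1, "friend")   # hit: served from the cache
store.add_association(1, "friend", 4)
print(store.get_associations(1, "friend"), store.backing_reads)
```

The design choice to invalidate on write rather than update in place keeps the sketch short; a real system has to reason much more carefully about consistency across many cache servers.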

Dryad/LINQ
• Here we see two concepts combined
• At Microsoft, LINQ has become very popular
  • It embeds a kind of query processing into C# code
• Dryad takes this one step further
  • Given a LINQ expression, Dryad can run it on a distributed “computing engine” of their own design
  • The idea is to obtain massive parallelism
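The idea of taking a declarative query and running it in parallel over partitioned data can be sketched in a few lines. This is a Python stand-in, not C# LINQ and not Dryad's API: the function names are hypothetical, and a local thread pool plays the role of the distributed engine. Each partition is filtered and projected independently (a "where" followed by a "select"), and the per-partition results are merged in order.

```python
from concurrent.futures import ThreadPoolExecutor

def run_partition(partition, predicate, projection):
    """Apply the query (filter, then project) to one partition of the data."""
    return [projection(x) for x in partition if predicate(x)]

def parallel_query(partitions, predicate, projection, max_workers=4):
    """Run the same query on every partition concurrently and merge results.

    A thread pool stands in for a cluster of workers (an assumption of this
    sketch); pool.map returns results in partition order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda p: run_partition(p, predicate, projection), partitions
        )
        return [row for part in results for row in part]


partitions = [[1, 2, 3], [4, 5, 6], [7, 8]]
squares_of_evens = parallel_query(
    partitions,
    predicate=lambda x: x % 2 == 0,
    projection=lambda x: x * x,
)
print(squares_of_evens)
```

Because the filter and projection need no data from other partitions, this query parallelizes trivially; operations such as joins and group-bys need a data exchange step between stages.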

Basic architecture of Dryad [figure]

Execution of a LINQ expression [figure]

A join, done in two ways [figure, 1 of 2]

A join, done in two ways [figure, 2 of 2]

MapReduce in Dryad/LINQ [figure]
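MapReduce can be expressed as a short chain of query operators: flatten the mapper's output, group by key, then apply the reducer to each group. The Python sketch below illustrates that composition with word counting as the example. The function and parameter names are hypothetical, and it runs on a single machine rather than a cluster.

```python
from collections import defaultdict

def map_reduce(records, mapper, key_of, reducer):
    """MapReduce as three query steps: flat-map, group-by, per-group reduce."""
    # Step 1 (flat-map): each record may produce zero or more items.
    mapped = [item for record in records for item in mapper(record)]
    # Step 2 (group-by): collect items that share a key.
    groups = defaultdict(list)
    for item in mapped:
        groups[key_of(item)].append(item)
    # Step 3 (reduce): each group may produce zero or more output rows.
    return [out for key, items in groups.items() for out in reducer(key, items)]


lines = ["a b a", "b c"]
counts = map_reduce(
    lines,
    mapper=lambda line: line.split(),
    key_of=lambda word: word,
    reducer=lambda word, items: [(word, len(items))],
)
print(dict(counts))
```

In a distributed setting, step 2 is where data moves between machines, which is why the grouping stage tends to dominate the cost of such jobs.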

Other major systems in this space
• Check out
  • They list 50 or so graphical databases and processing systems
• Some popular ones in research settings are Pregel (from Google), GraphLab (CMU), and Vowpal Wabbit (“Fast Learning”) (Yahoo)
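Systems in the Pregel family use a vertex-centric, superstep-based model: in each round, a vertex reads its incoming messages, updates its own value, and sends messages to its neighbours, and the computation stops when no messages remain. The Python sketch below is a single-machine toy of that style, propagating the maximum value through a graph. The function name and data layout are hypothetical and this is not Pregel's actual API.

```python
def propagate_max(graph, values):
    """Vertex-centric max propagation with synchronous supersteps.

    graph:  {vertex: [out-neighbours]}, every neighbour must also be a key
    values: {vertex: number}
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1
    # Later supersteps: a vertex that learns a larger value adopts it and
    # forwards it; a vertex that learns nothing new stays silent.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in graph[v]:
                    outbox[n].append(values[v])
        inbox = outbox
        supersteps += 1
    return values, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final_values, rounds = propagate_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1})
print(final_values, rounds)
```

Each vertex's update depends only on its own state and its inbox, so the per-vertex work in a superstep can be spread across many machines, with message delivery as the only communication.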

Takeaways
• Computer systems need to be responsive to
  • Styles of use (what our “customers” are doing)
  • Common patterns of load (optimize for this case)
• In today’s major cloud computing settings, graphical data and graphical learning solutions are becoming a highly dominant form of load and focus
• Computer systems need to evolve to track this need