CIS 6930.008: Internet-Scale Networked Systems Adriana Iamnitchi (Anda)



2 CIS 6930.008: Internet-Scale Networked Systems (Spring 2008)
Contact Info
Office: ENB 334
Office hours: Wed 2-4 and by appointment ( me)
Course page:

3 Course Goals
• Primary
  – Gain deep understanding of fundamental issues that affect design of large-scale federated distributed systems
  – Map primary contemporary research themes
  – Gain experience in distributed systems research
• Secondary
  – By studying a set of outstanding papers, build knowledge of how to present research
  – Learn how to read papers & evaluate ideas

4 What I’ll Assume You Know
• Basic Internet architecture
  – IP, TCP, DNS, HTTP
• Basic principles of distributed computing
  – Asynchrony (cannot distinguish between communication failures and latency)
  – Partial global state knowledge (cannot know everything correctly)
  – Failures happen. In very large systems, even rare failures happen often
• If there are things that don’t make sense, ask!

5 Examples of Distributed Systems
[Figures: ATT web; Gnutella network; the Internet; a sensor network]

6 Definition (a version)
• A distributed system is a collection of autonomous, programmable, failure-prone entities that are able to communicate through a communication medium that is unreliable.
  – Entity = a process on a device (PC, PDA, mote)
  – Communication medium = wired or wireless network
• “Internet-Scale”:
  – Spanning multiple institutional or network (DNS) domains
  – (Much) larger than a “cluster”

7 This Semester’s Theme (a proposal)
Exploiting Emergent Behavior in Large-Scale Distributed Systems

Filecules and Small Worlds in a Scientific Workload: Characteristics and Significance

9 Grid: Resource-Sharing Environment
• Users:
  – 1000s from 10s of institutions
  – Well-established communities
• Resources:
  – Computers, data, instruments, storage, applications
  – Owned/administered by institutions
• Applications: data- and compute-intensive processing
• Approach: common infrastructure

10 The Problem
• We have now:
  – Mature grid deployments running in production mode
• We do not have yet:
  – Quantitative characterization of real workloads
    > How many files, how much input data per process, etc.
  – And thus, benchmarks, workload models, reproducible results
• Costs:
  – Local solutions, often replicating work
  – “Temporary” solutions that become permanent
  – Far from optimal solutions
  – Impossible to compare alternatives on relevant workloads

11 Still, Why Should We Care?
• Impossibility results, high costs: tradeoffs are necessary
  – Solution: select tradeoffs based on
    > User requirements (of course)
    > Usage patterns
• Patterns exist and can be exploited. Examples:
  – Zipf distribution for request popularity (web caching), Breslau et al., Infocom’99
  – Network topology:
    [Figure panels: Partial Topology; Random 30% die; Targeted 4% die. From Saroiu et al., MMCN 2002]
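To make the Zipf example concrete, here is a minimal sketch of why skewed popularity matters for caching: it computes the share of requests that go to the k most popular items when popularity follows a Zipf law. The catalog size, the exponent `alpha`, and the function name are illustrative assumptions, not figures from Breslau et al.

```python
def zipf_top_k_share(n_items, k, alpha=1.0):
    """Fraction of requests that hit the k most popular of n_items,
    assuming request probability is proportional to 1 / rank**alpha."""
    weights = [1.0 / rank**alpha for rank in range(1, n_items + 1)]
    return sum(weights[:k]) / sum(weights)


# Hypothetical catalog of 1000 items; compare a Zipf workload with a uniform one.
skewed = zipf_top_k_share(1000, 10, alpha=1.0)
uniform = zipf_top_k_share(1000, 10, alpha=0.0)
print(f"top 1% of items, Zipf:    {skewed:.1%} of requests")
print(f"top 1% of items, uniform: {uniform:.1%} of requests")
```

Under these assumptions a cache holding 1% of the items serves far more than 1% of the requests when popularity is skewed, which is the kind of usage pattern a design can exploit.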

12 The DØ Experiment
• High-energy physics data grid
• 72 institutions, 18 countries, 500+ physicists
• Detector data
  – 1,000,000 channels
  – Event rate ~50 Hz
  – So far, 1.9 PB of data
• Data processing
  – Signals: physics events
  – Events about 250 KB, stored in files of ~1 GB
  – Every bit of raw data is accessed for processing/filtering
  – Past year overall: 0.6 PB
• DØ:
  – … processes PBs/year
  – … processes 10s of TB/day
  – … uses 25% – 50% remote computing
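A quick back-of-the-envelope check ties the quoted DØ figures together. The sketch below assumes decimal units (1 KB = 1000 bytes) and a detector that records continuously at the quoted rate; both are simplifying assumptions, not statements from the slides.

```python
# Figures quoted on the slide.
KB, GB, TB = 10**3, 10**9, 10**12
event_size_bytes = 250 * KB   # "Events about 250 KB"
event_rate_hz = 50            # "Event rate ~50 Hz"
file_size_bytes = 1 * GB      # "stored in files of ~1 GB"

# Derived quantities (assumes continuous running, decimal units).
events_per_file = file_size_bytes // event_size_bytes
raw_bytes_per_second = event_size_bytes * event_rate_hz
raw_bytes_per_day = raw_bytes_per_second * 86_400

print("events per file:", events_per_file)
print("raw data rate (MB/s):", raw_bytes_per_second / 10**6)
print("raw data per day (TB):", raw_bytes_per_day / TB)
```

Under these assumptions the detector produces roughly 1 TB of raw data per day, so processing 10s of TB per day implies that stored data is read many times over.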

Filecules and Small Worlds in Scientific Communities: Characteristics and Significance Joint work with Matei Ripeanu (UBC) and Ian Foster (ANL and UChicago)

14 New Metric: The Data-Sharing Graph G_m^T(V, E)
• V is the set of users active during interval T
• An edge in E connects users that asked for at least m common files within T
[Figure: users linked by requests for the same files: “No 24 in B minor, BWV 869”, “Les Bonbons”, “Yellow Submarine”, “Wood Is a Pleasant Thing to Think About”]
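The definition of the data-sharing graph can be sketched directly in code. This is a minimal illustration, not the authors' tooling: the trace format `(timestamp, user, file)`, the user names, and the function name are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def data_sharing_graph(requests, t_start, t_end, m):
    """Build G_m^T from an iterable of (timestamp, user, file) requests.

    V: users with at least one request in [t_start, t_end).
    E: pairs of users who requested at least m common files in that interval.
    """
    files_by_user = defaultdict(set)
    for timestamp, user, file_name in requests:
        if t_start <= timestamp < t_end:
            files_by_user[user].add(file_name)

    vertices = set(files_by_user)
    edges = set()
    # Pairwise comparison is quadratic in the number of users; fine for a sketch.
    for u, v in combinations(sorted(vertices), 2):
        if len(files_by_user[u] & files_by_user[v]) >= m:
            edges.add((u, v))
    return vertices, edges


# Toy trace using the song titles from the slide; users are hypothetical.
trace = [
    (0, "alice", "Yellow Submarine"),
    (1, "bob", "Yellow Submarine"),
    (2, "bob", "Les Bonbons"),
    (3, "carol", "Les Bonbons"),
    (50, "dave", "Yellow Submarine"),  # falls outside the interval below
]
V, E = data_sharing_graph(trace, t_start=0, t_end=10, m=1)
print(sorted(V), sorted(E))
```

Raising `m` or shortening the interval makes the graph sparser, which is why the later slides label each graph with an (interval, number of files) pair.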

15 The DØ Collaboration
6 months of traces (January – June 2002): 300+ users, 2 million requests for 200K files
• Small average path length
• Large clustering coefficient: CCoef = (# existing edges) / (# possible edges)
Small world!
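Both small-world metrics can be computed from an adjacency map. The sketch below assumes the usual per-node reading of the slide's formula (edges that exist among a node's neighbours, divided by the edges that could exist among them, averaged over nodes) and counts nodes with fewer than two neighbours as zero; that convention and the function names are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average over nodes of (existing edges among neighbours) / (possible edges)."""
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue  # assumption: such nodes contribute 0 to the average
        existing = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += existing / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs of connected nodes (BFS)."""
    dist_sum = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0


# Toy undirected graph: a triangle (a, b, c) with one pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(graph), average_path_length(graph))
```

A graph is usually called a small world when its clustering coefficient is much larger than that of a random graph with the same number of nodes and edges, while its average path length stays comparably small.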

16 Small-World Graphs
• Small path length, large clustering coefficient
  – Typically compared against random graphs
• Think of:
  – “It’s a small world!”
  – “Six degrees of separation”
• Milgram’s experiments in the 60s
• Guare’s play “Six Degrees of Separation”

17 Other Small Worlds
[Figure: word co-occurrences, film actors, LANL coauthors, Internet, Web, food web, power grid]
D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks. Nature, 393: , 1998
R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Reviews of Modern Physics 74, 47 (2002).

18 Web Data-Sharing Graphs
[Figure panels (interval T, threshold m): 7200 s, 50 files; 3600 s, 50 files; 1800 s, 100 files; 1800 s, 10 files; 300 s, 1 file]
Data-Sharing Relationships in the Web, Iamnitchi, Ripeanu, and Foster, WWW’03

19 DØ Data-Sharing Graphs
[Figure panels (interval T, threshold m): 7 days, 1 file; 28 days, 1 file]

20 KaZaA Data-Sharing Graphs
[Figure panels (interval T, threshold m): 7 days, 1 file; 28 days, 1 file; 2 hours, 1 file; 1 day, 2 files; 4 hours, 2 files; 12 hours, 4 files]
Small-World File-Sharing Communities, Iamnitchi, Ripeanu, and Foster, Infocom ’04

21 Interest-Aware Data Dissemination
[Figure panels: DØ; Web; KaZaA]
Interest-Aware Information Dissemination in Small-World Communities, Iamnitchi and Foster, HPDC’05

22 Current Work: Tagging Communities
Tracking User Attention in Collaborative Tagging Communities, Elizeu Santos-Neto, Matei Ripeanu, and Adriana Iamnitchi, Workshop on Contextualized Attention Metadata (CAMA'07), Vancouver, Canada, June

DØ Workload Characterization Joint work with Shyamala Doraimani (USF) and Gabriele Garzoglio (FNAL)

24 DØ Traces
• Traces from January 2003 to May 2005
• 234,000 jobs, 561 users, 34 domains, 1.13 million files accessed
• 108 input files per job on average
• Detailed data access information about half of these jobs (113,062)

25 Contradicts Traditional Models
File size distribution
• Expected: log-normal. Why not?
  – Deployment decisions
  – Domain specific
  – Data transformation
File popularity distribution
• Expected: Zipf. Why not? (speculations):
  – Scientific data is uniformly interesting
  – User community is relatively small

26 Filecules: Intuition

27 Filecules: General Characteristics
Filecules in High-Energy Physics: Characteristics and Impact on Resource Management, Adriana Iamnitchi, Shyamala Doraimani, Gabriele Garzoglio, HPDC’06

28 Filecules: Size
Filecules of different sizes:
• Largest filecule: 17 TB or 51,841 files
• 28% mono-file filecules
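The transcript does not spell out how filecules are identified. A minimal sketch, assuming the working definition that a filecule is a maximal group of files always requested together (that is, by exactly the same set of jobs): group files by the set of jobs that read them. The job and file names are hypothetical, and this is an illustration, not the authors' implementation.

```python
from collections import defaultdict


def find_filecules(job_inputs):
    """Partition files into filecules.

    job_inputs maps a job id to the files that job read. Two files land in
    the same filecule when exactly the same set of jobs requested them
    (assumed definition; see the lead-in).
    """
    jobs_by_file = defaultdict(set)
    for job, files in job_inputs.items():
        for file_name in files:
            jobs_by_file[file_name].add(job)

    groups = defaultdict(set)
    for file_name, jobs in jobs_by_file.items():
        groups[frozenset(jobs)].add(file_name)
    return list(groups.values())


# Hypothetical trace: three jobs and the input files each one read.
jobs = {"j1": ["a", "b", "c"], "j2": ["a", "b"], "j3": ["d"]}
filecules = find_filecules(jobs)
mono_file_share = sum(1 for f in filecules if len(f) == 1) / len(filecules)
print(filecules, mono_file_share)
```

Under this definition every file belongs to exactly one filecule, and a file that no other file shadows exactly forms a mono-file filecule, the case the slide reports for 28% of filecules.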

29 Consequences for Caching
• Use filecule membership for prefetching
  – When a file is missing from the local cache, prefetch the entire filecule
• Use time locality in cache replacement
  – Least Recently Used (classic algorithm)
• Implemented:
  – LRU with files and LRU with filecules
  – Greedy Request Value: prefetching + job reordering
    > Does not exploit temporal locality
    > Prefetching based on cache content
  – Our variant of LRU with filecules and job reordering
E. Otoo, et al. Optimal file-bundle caching algorithms for data-grids. In SC ’04
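The "LRU with filecules" idea can be sketched as an LRU cache whose unit of loading and eviction is the whole filecule: a miss on any file prefetches all members of its filecule. This is an illustrative sketch only; it omits job reordering and Greedy Request Value, and the class name, sizes, and filecule layout are hypothetical.

```python
from collections import OrderedDict


class FileculeLRU:
    """LRU cache that loads and evicts whole filecules."""

    def __init__(self, capacity, filecules):
        # filecules: {filecule_id: {file_name: size}}; sizes in arbitrary units.
        self.capacity = capacity
        self.size_of = {g: sum(files.values()) for g, files in filecules.items()}
        self.filecule_of = {f: g for g, files in filecules.items() for f in files}
        self.cache = OrderedDict()  # filecule_id -> size, least recently used first
        self.used = 0

    def request(self, file_name):
        """Return True on a cache hit, False on a miss."""
        g = self.filecule_of[file_name]
        if g in self.cache:
            self.cache.move_to_end(g)  # refresh recency of the whole filecule
            return True
        need = self.size_of[g]
        if need > self.capacity:
            return False  # filecule cannot fit; serve without caching
        while self.used + need > self.capacity:
            _, freed = self.cache.popitem(last=False)  # evict the LRU filecule
            self.used -= freed
        self.cache[g] = need  # prefetch every file in the filecule
        self.used += need
        return False


layout = {"A": {"a1": 1, "a2": 1}, "B": {"b1": 2}, "C": {"c1": 2}}
cache = FileculeLRU(capacity=4, filecules=layout)
hits = [cache.request(f) for f in ["a1", "a2", "b1", "a1", "c1", "b1", "a2"]]
print(hits)
```

In the toy run the second request (`a2`) is a hit only because the miss on `a1` prefetched the rest of filecule A; a per-file LRU would have missed it.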

30 Comparison: Caching Algorithms (1)

31 Comparison: Caching Algorithms (2)
% of cache change is a measure of transfer costs.

32 Summary Part 1
• Revisited traditional workload models
  – Generalized from file systems, the web, etc.
  – Some confirmed (temporal locality), some contradicted (file size distribution and popularity)
• Compared caching algorithms on DØ data:
  – Temporal locality is relevant
  – Filecules guide prefetching

33 Summary
• Workload characterization based on a HEP grid
  – Quantify scale (data processed, number of files)
  – Contradict traditional models
• Patterns can guide system design
  – Filecules: caching, data replication
  – Small-world data sharing: adaptive information dissemination, replica placement

34 Administrivia: Paper Reviewing (1)
• Goals:
  – Think about what you read
  – Get used to writing paper reviews
• Reviews due by noon before class
• Be professional in your writing
• Keep an eye on the writing style:
  – Clarity
  – Beware of traps: learn to use them in writing and detect them in reading
  – Detect (and stay away from) trivial claims. E.g., 1st sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”

35 Administrivia: Paper Reviewing (2)
Follow the form provided when relevant.
• State the main contribution of the paper
• Critique the main contribution:
  – Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
  – Rate how convincing the methodology is.
• Do the claims and conclusions follow from the experiments?
• Are the assumptions realistic?
• Are the experiments well designed?
• Are there different experiments that would be more convincing?
• Are there other alternatives the authors should have considered?
• (And, of course, is the paper free of methodological errors?)

36 Administrivia: Paper Reviewing (3)
• What is the most important limitation of the approach?
• What are the three strongest and/or most interesting ideas in the paper?
• What are the three most striking weaknesses in the paper?
• Name three questions that you would like to ask the authors.
• Detail an interesting extension to the work not mentioned in the future work section.
• Optional comments on the paper that you’d like to see discussed in class.

37 Administrivia: Discussion Leading
• Come prepared!
  – Prepare discussion outline
  – Prepare questions:
    > “What if”s
    > Unclear aspects of the solution proposed
    > …
  – Similar ideas in different contexts
  – Initiate short brainstorming sessions
• Leaders do NOT need to submit paper reviews
• Main goals:
  – Keep discussion flowing
  – Keep discussion relevant
  – Engage everybody (I’ll keep an eye on this, too)

38 Administrivia: Projects
• Combine with your research if relevant to the class
• Get approval from all instructors if your final projects overlap:
  – Don’t sell the same piece of work twice
  – You can get more than twice as many results with less than twice as much work
• Aim high!
  – Put in one extra month and get a publication out of it
  – It is doable (we have proof)
• Try ideas that you postponed out of fear: it’s just a class, not your PhD.

39 Administrivia: Project Deadlines (tentative)
• January 30: 1-page project proposal
• February 26: 3-page literature survey
  – Know relevant work in your problem area
  – If implementation project, list tools, similar projects
• March 31: 5-page midterm project report due
  – Have a clear image of what’s possible/doable
  – Report preliminary results
• Last class: in-class project presentation
  – Demo, if appropriate
• May 1:
  – Final report due

40 Next Classes
• Lectures on basics of distributed systems
• Will start reading papers in about 2 weeks

41 Questions?