CS244 Lecture 8: Sound Strategies For Internet Measurement

CS244 Lecture 8: Sound Strategies For Internet Measurement
Rob Sherwood

Background: Who am I? Stanford, 2008-2011: visiting researcher/postdoc. Currently CTO of Big Switch Networks. Research background: Internet security, peer-to-peer systems, Internet measurement, software-defined networking.

Sound Strategies: Big-money questions: Why this paper? (Hint: not because it's short.) Who is Vern Paxson? Measurement is a critical aspect of system design: "You can't improve what you can't measure." Nothing here is specific to the Internet; it applies more broadly to really large systems. War stories for wise people, and lots of them. It helped me with a lot of my research work. How does this paper differ from others this class has studied?

Why Measure The Internet? Isn't it man-made? Why not just model it? Partial answers: statistical models of packet arrivals and traffic matrices are inaccurate; the actual topology is unknown; many parts are intentionally obscured for commercial gain. So we apply natural-science principles instead.
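
One way to see why packet-arrival models deserve suspicion is to test a trace against the Poisson assumption before relying on it. The minimal sketch below, assuming you already have a list of packet timestamps in seconds, computes the coefficient of variation (CV) of the inter-arrival gaps: exponential gaps, as a Poisson process produces, have a CV of exactly 1, while bursty traffic pushes it well above 1. Both traces are synthetic and generated in the script; they are not real measurements, and CV is only a first sanity check, not a goodness-of-fit test.

```python
import random
import statistics

def interarrival_cv(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between timestamps.

    Needs at least two timestamps. Exponential (Poisson) gaps give CV near 1;
    bursty arrivals give CV well above 1.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.pstdev(gaps) / statistics.fmean(gaps)

# Synthetic Poisson arrivals at 100 packets/s (seeded so runs are repeatable).
rng = random.Random(42)
t, poisson_ts = 0.0, []
for _ in range(10_000):
    t += rng.expovariate(100.0)
    poisson_ts.append(t)

# Synthetic bursty arrivals: bursts of 10 back-to-back packets, 1 s idle between bursts.
t, bursty_ts = 0.0, []
for _ in range(1_000):
    t += 1.0
    for _ in range(10):
        t += 0.001
        bursty_ts.append(t)

print(f"Poisson CV: {interarrival_cv(poisson_ts):.2f}")
print(f"Bursty  CV: {interarrival_cv(bursty_ts):.2f}")
```

A CV far from 1 on a real trace is a strong hint that a Poisson model will mislead you; a CV near 1 is necessary but not sufficient evidence that it fits.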

You Said: Patrick Harvey, "Many of the potential pitfalls in data gathering and analysis that the paper notes--while relevant to non-Internet data--seem to be somewhat exacerbated by the heavily-layered Internet architecture. This may be especially true of ‘misconception’, the potential for which means that sound data analysis likely must not treat Internet abstractions as opaque, as is typical in the development of many actual systems, but instead take into account many layers and modules in addition to those of most immediate proximity to the measured data."

Accuracy versus Precision? What are the hard/formal definitions of each? Why is the distinction so important for measurement?
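
A common working definition: accuracy is how close measurements come to the true value (systematic error, or bias), while precision is how close repeated measurements come to each other (random error, or spread). The sketch below uses made-up RTT readings against an assumed known true RTT of 50 ms to show that the two are independent: one set is tightly clustered but biased, the other is unbiased but noisy. The helper name and the data are illustrative only.

```python
import statistics

TRUE_RTT_MS = 50.0  # assumed ground truth, e.g. from a calibrated reference path

def accuracy_and_precision(samples, true_value):
    """Return (bias, spread) for repeated measurements of a known quantity.

    bias   = mean(samples) - true_value  (accuracy: systematic error)
    spread = sample standard deviation   (precision: random error)
    """
    bias = statistics.fmean(samples) - true_value
    spread = statistics.stdev(samples)
    return bias, spread

# Hypothetical RTT readings in milliseconds.
precise_but_inaccurate = [55.1, 55.0, 54.9, 55.0, 55.0]
accurate_but_imprecise = [42.0, 58.0, 47.0, 53.0, 50.0]

for name, data in [("precise, inaccurate", precise_but_inaccurate),
                   ("accurate, imprecise", accurate_but_imprecise)]:
    bias, spread = accuracy_and_precision(data, TRUE_RTT_MS)
    print(f"{name}: bias={bias:+.2f} ms, spread={spread:.2f} ms")
```

Repeating a measurement only reveals precision; detecting a bias requires some independent reference, which is one reason calibration matters.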

SigFigs? "Real" science depends heavily on significant figures, e.g., c = 2.99792458 × 10^8 meters/second. How do we apply significant figures to computer systems? gettimeofday() == 1130322148.939977000
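
The gettimeofday() value is a significant-figures trap: it fills a struct timeval with seconds and microseconds, so the trailing "000" carries no information. A second trap is storing epoch timestamps as 64-bit floats: near 1.13 × 10^9 seconds, adjacent doubles are about 0.24 µs apart, so nanosecond digits cannot be represented at all. The sketch below uses a hypothetical helper, format_timestamp, that prints only as many decimal places as a stated clock resolution justifies. Python's time.get_clock_info() reports a resolution, but what the OS advertises is not necessarily the clock's true accuracy.

```python
import math
import time

def format_timestamp(ts, resolution):
    """Format ts (seconds) with only the decimal places the clock resolution supports.

    resolution is the clock's tick size in seconds, e.g. 1e-6 for gettimeofday().
    """
    decimals = max(0, round(-math.log10(resolution)))
    return f"{ts:.{decimals}f}"

# The timestamp from the slide, at gettimeofday()'s microsecond resolution.
print(format_timestamp(1130322148.939977, 1e-6))  # prints 1130322148.939977

# What this interpreter reports about its wall clock (OS-reported; may be optimistic).
info = time.get_clock_info("time")
print(info.implementation, info.resolution)
print(format_timestamp(time.time(), info.resolution))
```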

Implicit Assumptions? About Time? About TCP? About Routers/Switches?
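
One implicit assumption about time is that the wall clock only moves forward. It does not: NTP or an administrator can step the system clock during a run, so subtracting two time.time() readings can produce a negative or inflated interval. A minimal sketch of a safer habit, assuming Python: measure intervals with time.monotonic(), which cannot go backwards and is unaffected by system clock updates, and use time.time() only to label when a run happened. The helper name timed is hypothetical.

```python
import time

def timed(fn, *args):
    """Run fn(*args); return (result, wall_clock_start, elapsed_seconds).

    wall_clock_start comes from time.time() and only labels *when* the run
    happened. elapsed_seconds comes from time.monotonic(), which cannot go
    backwards even if the system clock is stepped mid-run.
    """
    wall_clock_start = time.time()
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, wall_clock_start, elapsed

result, started_at, elapsed = timed(sum, range(1_000_000))
print(f"sum={result} started_at={started_at:.6f} elapsed={elapsed:.6f}s")
```

The same question applies to timestamps captured elsewhere: if two hosts' clocks are involved, their offsets and drift become part of the measurement.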

Metadata: What is it? Why is it important? Critical tip: save the exact cut-and-paste command for every graph, because people will ask you to reproduce it. War stories: DNS data; OptAck (Nick's class last year).
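
One low-effort way to follow the "save the exact command" tip is to write a small provenance file next to every figure. The sketch below, assuming Python plotting scripts, records the full command line, working directory, UTC time, interpreter version, and host. The function write_provenance, the .meta.json naming convention, and the example input filename are all hypothetical choices, not a standard format.

```python
import json
import platform
import shlex
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def write_provenance(figure_path, extra=None):
    """Write <figure>.meta.json beside figure_path describing how it was made."""
    meta = {
        # Exact command, shell-quoted so it can be pasted back into a terminal.
        "command": shlex.join([sys.executable, *sys.argv]),
        "cwd": str(Path.cwd()),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "host": platform.node(),
    }
    meta.update(extra or {})
    figure_path = Path(figure_path)
    sidecar = figure_path.with_name(figure_path.name + ".meta.json")
    sidecar.write_text(json.dumps(meta, indent=2))
    return sidecar

# Usage: call right after saving the figure. The input filename is illustrative.
with tempfile.TemporaryDirectory() as d:
    sidecar = write_provenance(Path(d) / "rtt_cdf.png", {"input": "trace-2004-03-01.pcap"})
    print(sidecar.read_text())
```

Useful extras for the extra argument include the input dataset's hash and the version-control revision of the analysis code.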

You Said: Anonymous 1, "I feel like the first half of this paper could have been titled 'Reasons to Never Ever Use tcpdump'." Anonymous 2, "The author says this advice is drawn from his experiences, so now I am a bit skeptical of every chart I see."

Misconceptions vs. Calibration? Obviously misconceptions are bad. What can we do about them? Is "learn lots of domain knowledge" enough? What does calibration mean in practice? Answer, if you want to do it right: "measure twice (or more), publish once." It can be painstaking, but it is better than a retraction. Great Firewall of China visualization.
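
In practice, calibration often starts with checking the measurement apparatus against itself before trusting its output. The sketch below assumes a hypothetical probe log of (timestamp, sequence number) records in which the prober numbers its probes consecutively. It flags timestamps that run backwards (for example, a clock step during capture) and sequence numbers that never appear (records the logging pipeline may have dropped). Passing these checks does not prove the data is right; failing them means something needs explaining before anything is published.

```python
def calibration_report(records):
    """Sanity-check (timestamp, seq) records from a probe log.

    Returns:
      time_went_backwards: (prev_seq, seq) pairs where the timestamp decreased
      missing_seqs: sequence numbers between the min and max that never appear
    """
    if not records:
        return {"time_went_backwards": [], "missing_seqs": []}
    backwards = [
        (s0, s1)
        for (t0, s0), (t1, s1) in zip(records, records[1:])
        if t1 < t0
    ]
    seqs = {s for _, s in records}
    missing = sorted(set(range(min(seqs), max(seqs) + 1)) - seqs)
    return {"time_went_backwards": backwards, "missing_seqs": missing}

# Hypothetical log: the clock steps back between probes 2 and 3, and probe 4 is absent.
log = [(0.000, 1), (0.010, 2), (0.005, 3), (0.020, 5)]
print(calibration_report(log))
```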

Are Large Datasets Still Hard? The paper was published in 2004, and most of its lessons were learned before then. 10+ years later, we have Hadoop and AWS; is big-data management still an issue? My dissertation gathered 4+ TB (!!!) and needed a tuned RAID, MySQL, Condor, and 300+ machines to process.
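
Whatever the tooling, one basic discipline for large datasets is unchanged: stream the data instead of loading it all at once. A minimal sketch, assuming a hypothetical CSV of flow records with src_ip and bytes columns: memory grows with the number of distinct sources rather than the number of rows, so the same function works on a file far larger than RAM. The column names and sample rows are illustrative.

```python
import csv
import io
from collections import Counter

def bytes_per_src(lines):
    """Stream src_ip,bytes CSV rows and return total bytes per source IP.

    Rows are processed one at a time; only the per-source totals are kept.
    """
    totals = Counter()
    for row in csv.DictReader(lines):
        totals[row["src_ip"]] += int(row["bytes"])
    return totals

# In practice, pass an open file handle, e.g. open("flows.csv", newline="").
sample = io.StringIO("src_ip,bytes\n10.0.0.1,1500\n10.0.0.2,40\n10.0.0.1,1500\n")
print(bytes_per_src(sample).most_common())
```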

Why Is Reproduction Hard? Or important? Truth time: who has already had this problem?
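
One small habit that makes reproduction easier is recording a cryptographic fingerprint of every input dataset alongside the results, so anyone rerunning the analysis can confirm they have byte-for-byte the same data. The sketch below uses a hypothetical helper, file_sha256, that hashes a file in 1 MiB chunks so it also works on multi-terabyte traces without loading them into memory.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunk_size-byte pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    trace = Path(d) / "trace.bin"  # stand-in for a real dataset
    trace.write_bytes(b"abc")
    print(file_sha256(trace))
```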

Why Is Publishing Data Hard? Practical answer: privacy is important. It is very hard to get consent: how do you map IPs to people? People don't understand the cost or benefit. Most academic institutions take a "fail fast" approach to legal threats. Anonymizing and de-anonymizing data is a huge research topic, and a very interesting one.
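
A common first step before sharing traces is replacing IP addresses with keyed pseudonyms. The sketch below is one simple approach, assuming a secret key that stays with the data owner: HMAC-SHA256 maps each address to a stable token, so flows can still be joined across records. It is NOT prefix-preserving (schemes such as Crypto-PAn are), and if the key leaks, the 2^32 IPv4 space is small enough to brute-force. The function name and 16-hex-digit truncation are illustrative choices.

```python
import hashlib
import hmac
import ipaddress
import secrets

def pseudonymize_ip(ip, key):
    """Map an IP address to a stable pseudonym using HMAC-SHA256 under `key`.

    The address is canonicalized first, so "::1" and "0:0:0:0:0:0:0:1"
    receive the same pseudonym. Output is the first 16 hex digits of the MAC.
    """
    canonical = ipaddress.ip_address(ip).packed
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()[:16]

key = secrets.token_bytes(32)  # keep secret; never publish alongside the data
for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    print(addr, "->", pseudonymize_ip(addr, key))
```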

You Said: Anonymous, "I am a bit worried about the conflict that might appear between the two ideals of collecting an abundance of metadata and making datasets publicly available. If a lot of metadata is collected in order to allow for the reuse of the data in many settings, this only increases the complexity of dealing with privacy issues in regards to the use of the data."

Ethical Internet Measurement: Follow-up papers/IMC guidelines. "BotNet Labs"/password distribution. Open question: should Internet measurement go through IRB approval? War story: multiple accidental DoS experiments. Very unhappy people == unhappy advisor.
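
One guard against accidental DoS experiments is a hard cap on probe rate enforced inside the prober itself. Below is a minimal token-bucket sketch, assuming a single-threaded prober; the class name and the example budget are hypothetical, and a safe rate depends on the targets and on whoever operates them. The clock is injectable so the limiter can be tested without sleeping.

```python
import time

class TokenBucket:
    """Allow at most `rate` events/second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate        # tokens added per second
        self.capacity = burst   # maximum tokens the bucket can hold
        self.tokens = burst     # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Consume one token and return True if available; otherwise return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Demo with a simulated clock: budget of 10 probes/s, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(rate=10, burst=2, clock=lambda: fake_now[0])
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
fake_now[0] = 0.1                                # 0.1 s later, one token has refilled
print(bucket.try_acquire())                      # True
```

In a real prober, call try_acquire() before every probe and sleep briefly whenever it returns False; keeping the limit in code rather than in a command-line habit makes it harder to bypass by accident.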

You Said: Anonymous, "I feel the authors should have discussed the issue of intrusiveness of a measurement technique in the accuracy/misconception section -- the authors rightly describe the importance of collecting metadata especially for publicly made data. But how much metadata should one collect - and what if this metadata collection actually causes an overhead and skews the results?"

Conclusion: Very few of these concepts apply only to the Internet. Rob's claim: this paper made me a better scientist, which in turn made me a better system designer. Additional questions/comments?