Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool Justin F. Brunelle Michael L. Nelson Lyudmila Balakireva Robert Sanderson.

Slides:



Advertisements
Similar presentations
Basic Internet Terms Digital Design. Arpanet The first Internet prototype created in 1965 by the Department of Defense.
Advertisements

“SharePoint Farm Deployment & Configuration. Recommendations” Michael Nemtsev, Readify Pty Microsoft MVP.
PHP syntax basics. Personal Home Page This is a Hypertext processor It works on the server side It demands a Web-server to be installed.
An Introduction to the Internet and the Web Frank McCown COMP 250 – Internet Development Harding University.
1 Configuring Internet- related services (April 22, 2015) © Abdou Illia, Spring 2015.
1 Web Server Performance in a WAN Environment Vincent W. Freeh Computer Science North Carolina State Vsevolod V. Panteleenko Computer Science & Engineering.
Content  Overview of Computer Networks (Wireless and Wired)  IP Address, MAC Address and Workgroups  LAN Setup and Creating Workgroup  Concept on.
Hypertext Transfer Protocol Kyle Roth Mark Hoover.
Application Layer  We will learn about protocols by examining popular application-level protocols  HTTP  FTP  SMTP / POP3 / IMAP  Focus on client-server.
TCP connection my Computertelnet client web server remote computer 1 character per transmission Telnet uses TCP connection.
TCP connection my Computertelnet client web server remote computer 1 character per transmission * Telnet uses TCP connection * but Nagle's algorithm modifies.
Internet Technologies Networking / Internet Protocols (TCP/IP) Server/Client Software Communication via Ports Web Page Technology Recipe of Web Page Development.
 Proxy Servers are software that act as intermediaries between client and servers on the Internet.  They help users on private networks get information.
Fundamentals of Networking Discovery 1, Chapter 9 Troubleshooting.
22-Aug-15 | 1 |1 | Help! I need more servers! What do I do? Scaling a PHP application.
Web Servers Web server software is a product that works with the operating system The server computer can run more than one software product such as .
Slow Web Site Problem Analysis Last Update Copyright 2013 Kenneth M. Chipps Ph.D. 1.
CP476 Internet Computing Lecture 5 : HTTP, WWW and URL 1 Lecture 5. WWW, HTTP and URL Objective: to review the concepts of WWW to understand how HTTP works.
WADL 2013 July th Indianapolis, IN Martin SiteStory Archiving Done Differently
Memento Update CNI Task Force Meeting, Spring Memento Herbert Van de Sompel Robert Sanderson Michael L. Nelson Giant Leaps.
Archival HTTP Redirection Retrieval Policies Temporal Web Analytics Workshop 2013, Rio De Janiro Ahmed AlSum, Michael L. Nelson Old Dominion University.
XHTML Introductory1 Linking and Publishing Basic Web Pages Chapter 3.
Honeypot and Intrusion Detection System
WebWatch Ian Peacock UKOLN University of Bath Bath BA2 7AY UK
TCP/IP Networking Review Covered Subjects:  Packet Switched Network Structure  Issues of PSNs  Ports & IP Numbers  Delivery Services  Domain Name.
ITIS 1210 Introduction to Web-Based Information Systems Chapter 23 How Web Host Servers Work.
Internet  Major:Safety science and engineering  Author:jiangqian( 蒋乾 )
-II Outlook How To Topics CS-3505 Outlook 2000 Wb_ -II.ppt.
Mr C Johnston ICT Teacher BTEC IT Unit 05 - Lesson 05 Network Protocols.
1 Introductory material. This module illustrates the interactions of the protocols of the TCP/IP protocol suite with the help of an example. The example.
Application Block Diagram III. SOFTWARE PLATFORM Figure above shows a network protocol stack for a computer that connects to an Ethernet network and.
Data Communications and Computer Networks Chapter 2 CS 3830 Lecture 8 Omar Meqdadi Department of Computer Science and Software Engineering University of.
ECEN “Internet Protocols and Modeling”, Spring 2012 Course Materials: Papers, Reference Texts: Bertsekas/Gallager, Stuber, Stallings, etc Class.
Lectu re 1 Recap: “Operational” view of Internet r Internet: “network of networks” m Requires sending, receiving of messages r protocols control sending,
Profiling Web Archive Coverage for Top-Level Domain & Content Language Ahmed AlSum, Michele C. Weigle, Michael L. Nelson, and Herbert Van de Sompel International.
Introduction to Internet. Chapter 1 Objectives Origins of the Internet Packets and Routers TCP/IP DNS HTTP URL Client-Server.
9: Troubleshooting Your Network
S305 – Network Infrastructure Chapter 5 Network and Transport Layers.
Kemal Baykal Rasim Ismayilov
1 Part VII Component-level Performance Models for the Web © 1998 Menascé & Almeida. All Rights Reserved.
Measuring the Capacity of a Web Server USENIX Sympo. on Internet Tech. and Sys. ‘ Koo-Min Ahn.
1 Microsoft Windows 2000 Network Infrastructure Administration Chapter 4 Monitoring Network Activity.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 1 Fundamentals.
CS1001 Lecture 7. Overview Computer Networks Computer Networks The Internet The Internet Internet Services Internet Services Markup Languages Markup Languages.
Internet Essentials. The History of the Internet The Internet started when the Advanced Research Projects Agency (ARPA) of the United States Defense Department.
Internet Essentials.
CSI 3125, Preliminaries, page 1 Networking. CSI 3125, Preliminaries, page 2 Networking A network represents interconnection of computers that is capable.
17 Establishing Dial-up Connection to the Internet Using Windows 9x 1.Install and configure the modem 2.Configure Dial-Up Adapter 3.Configure Dial-Up Networking.
Firewalls. Intro to Firewalls Basically a firewall is a barrier to keep destructive forces away from your computer network.
IS Infrastructure Managing Infrastructure and Services Copyright © 2016 Curt Hill.
Introduction to Performance Testing Performance testing is the process of determining the speed or effectiveness of a computer, network, software program.
COMPUTER NETWORKS Hwajung Lee. Image Source:
Search Engine and Optimization 1. Introduction to Web Search Engines 2.
LAN Chat server BY: VIPUL GUPTA VIKESH SINGH SUKHDEEP SINGH.
© 2006 Cisco Systems, Inc. All rights reserved.Cisco Public 1 VLANs.
25/09/2016 INASP: Effective Network Management Workshops Unit 6: Solving Network Problems.
Chapter 5 Network and Transport Layers
Training Objectives About D2F Download Installation Configuration
Instructor: Ahmed Jafer
Using Transactional Web Archives To Handle Server Errors
CISC103 Web Development Basics: Web site:
Profiling Web Archive Coverage for Top-Level Domain & Content Language
Web Caching? Web Caching:.
TYPES OF SERVER. TYPES OF SERVER What is a server.
Module 2: Understanding Local Area Networks
Monkey See, Monkey Do A Tool for TCP Tracing and Replaying
Configuring Internet-related services
A tool for locating QoS failures on an Internet path
World Wide Web Uniform Resource Locator hostname [:port]/path
Internet Applications & Programming
Presentation transcript:

Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool Justin F. Brunelle Michael L. Nelson Lyudmila Balakireva Robert Sanderson Herbert Van de Sompel TPDL 2013, Sept

September 7, 2011

September 12, 2011

September 16, 2011

Problem People view ABC News all the time No mementos for “all the time” – Stories missing or incomplete Possible solutions: – archive.org: crawl more often (how often is often enough?) – abcnews.com: install a Transactional Web Archive

Agenda Traditional Archiving SiteStory Experiment Design Benchmark Results Conclusions 7

Traditional Web Archiving Active crawling Heritrix

Issues with Traditional Web Archiving Request can be rejected (robots.txt, user- agent, IP) Can be deceived (geo-location, user- agent) Can be trapped (crawl my calendar!) Resource-intense (bandwidth) Recrawl vs. change-rate

Missed Updates seen by humans: C 1, C 3, C 4 ; archived by crawler: C 1, C 3

Agenda Traditional Archiving SiteStory Experiment Design Benchmark Results Conclusions 11

for each HTTP response, the Apache web server sends (i.e., HTTP PUT) the same entity to SiteStory web server

Now we have them all seen by humans: C 1, C 3, C 4 ; archived by transactional archive: C 1, C 3, C 4

Agenda Traditional Archiving SiteStory Experiment Design Benchmark Results Conclusions 14

Benchmark with ab ApacheBench: ab – -n [Number of Connections] – -c [Concurrency] Benchmarked with SiteStory on & off

Benchmark with wget ws-dl-03.cs.odu.edu x99,…,, megalodon.lanl.gov

Agenda Traditional Archiving SiteStory Experiment Design Benchmark Results Conclusions 17

Testing LAN with ab

Benchmark with wget (unburdened)

Benchmark with wget (burdened)

Results Negligible difference SiteStory On vs Off Limited to local LAN Performance over WAN?

WAN Testbed Performance

Agenda Traditional Archiving SiteStory Experiment Design Benchmark Results Conclusions 26

Results Distributed: Higher variance Increased delay due to network On vs. Off Comparison still comparable

Conclusions Small performance difference No gaps in coverage -- archives every HTTP response sent (optimizations possible)

get started now by using this piece

SiteStory Testbed Use our SiteStory web archive on your server! 1.Install and configure mod_sitestory on your Apache Server 2.Send an containing: 1.Your contact info 2.Web server IP address 3.Web server domain name 3.Happy Sitestory’ing! mailto:

Backups

Sample ab output $ ab -n 10 -c 2 " This is ApacheBench, Version 2.3 … Server Software: Apache/ Server Hostname: Server Port: 80 Document Path: / Document Length: bytes Concurrency Level: 2 Time taken for tests: seconds Complete requests: 10 Failed requests: 0 Write errors: 0 Total transferred: bytes HTML transferred: bytes Requests per second: [#/sec] (mean) Time per request: [ms] (mean) Time per request: [ms] (mean, across all concurrent requests) Transfer rate: [Kbytes/sec] received … Connection Times (ms) min mean[+/-sd] median max Connect: Processing: Waiting: Total: Percentage of the requests served within a certain time (ms) 50% 45 66% 46 75% 46 80% 46 90% 63 95% 63 98% 63 99% % 63 (longest request)