AIT Crawler Anthony Johnson Igor Volynskiy Travis Rothlisberger.


Crawler Overview

Measurements

First assignment crawl
  CPU time: 15m 40s
  Pages visited: 236
  Links followed: 273
  MP3s found: 712
  Pages per MP3: 0.33
  CPU time per MP3: 1.32 sec

Guessing accuracy
  Overall: about 60%
  Major artists: about 80%
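The two per-MP3 figures follow directly from the raw counts. A minimal sketch of that arithmetic, assuming the CPU time is read as 15 minutes 40 seconds; the helper name parse_duration is made up for this illustration and is not part of the crawler:

```python
def parse_duration(text):
    """Convert a duration such as '15m 40s' into seconds (hypothetical helper)."""
    seconds_per_unit = {"h": 3600, "m": 60, "s": 1}
    total = 0
    for part in text.split():
        value, unit = int(part[:-1]), part[-1]
        total += value * seconds_per_unit[unit]
    return total


# Raw figures taken from the measurements slide.
cpu_seconds = parse_duration("15m 40s")
pages_visited = 236
mp3s_found = 712

# Derived figures: how many pages and how much CPU time each MP3 cost.
pages_per_mp3 = pages_visited / mp3s_found
cpu_per_mp3 = cpu_seconds / mp3s_found

print(f"Pages per MP3: {pages_per_mp3:.2f}")
print(f"CPU time per MP3: {cpu_per_mp3:.2f} sec")
```

Both values round to the figures reported on the slide (0.33 and 1.32 sec).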

Lessons learned

Teamwork
  Divide the project clearly, and communicate how each person's part interacts with the others.

Planning
  Start from a concrete, well-thought-out design.
  Anticipate potential issues and changes.
  Design the code to be flexible enough for any additions, reductions, or changes that may be needed in the future.

End Product
  Test the code thoroughly. Just because it works for a couple of hours doesn't mean it won't hit any snags.
  Users prefer simple and easy to neat and complex.

Next Time…

Better Crawling
  Learn from the past: track statistics on servers.
  Identify "hidden" MP3s (.zip, .class, …).
  Clean up afterwards: remove dead/bad links.

Better Planning
  More in-depth design of the website and UI before developing the database.
  More scalability: allow the structure to expand more easily.

Better Stuff
  Artist and file info, discographies, bios, …
  User interaction: help name files, report dead links, request and post files, "punch the monkey" banners…
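One way the "track statistics on servers" idea might look in practice is a small per-host tally that a later crawl could use to visit productive servers first. This is only a sketch under assumptions: the ServerStats class, its method names, and the example.com URLs are all hypothetical and do not describe the crawler the team built.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ServerStats:
    """Hypothetical per-server tally of pages crawled, MP3s found, and dead links."""

    def __init__(self):
        self.pages = defaultdict(int)
        self.mp3s = defaultdict(int)
        self.dead_links = defaultdict(int)

    def record(self, url, mp3_count=0, dead=False):
        """Record one visited URL against the server that hosts it."""
        host = urlparse(url).netloc
        self.pages[host] += 1
        self.mp3s[host] += mp3_count
        if dead:
            self.dead_links[host] += 1

    def mp3s_per_page(self, host):
        """Average number of MP3s found per page on this server."""
        pages = self.pages.get(host, 0)
        return self.mp3s.get(host, 0) / pages if pages else 0.0

    def ranked_hosts(self):
        """Servers ordered from most to least productive."""
        return sorted(self.pages, key=self.mp3s_per_page, reverse=True)


# Example with made-up URLs.
stats = ServerStats()
stats.record("http://music.example.com/index.html", mp3_count=12)
stats.record("http://music.example.com/more.html", mp3_count=4)
stats.record("http://blog.example.com/post.html", mp3_count=1)
stats.record("http://blog.example.com/gone.html", dead=True)

for host in stats.ranked_hosts():
    print(host, stats.mp3s_per_page(host), stats.dead_links.get(host, 0))
```

The dead-link count per server could also feed the clean-up step, for example by dropping stored links from servers where most recorded URLs are dead.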