Fall 2006, IAT 800: Recursion, Web Crawling

Today’s Nonsense
- Recursion – Why is my head spinning?
- Web Crawling – Recursing in HTML
Shortened class again today. Boy, your TA must be a total slacker, or something.

Recursion
- Recursion basically means calling a method from inside itself.

    int factorial(int n) {
      if (n > 1) {
        return n * factorial(n - 1);
      } else {
        return 1;
      }
    }

What the?!?
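The slide's code uses Processing (Java) syntax. As a minimal sketch, here is the same method wrapped in a standalone Java class so it can be compiled and run on its own; the class name `Factorial` and the `main` method are illustrative additions, not part of the slides.

```java
class Factorial {
    // Same logic as the slide: recurse while n > 1, otherwise return 1.
    // Note: int overflows for n > 12 (13! does not fit in an int).
    static int factorial(int n) {
        if (n > 1) {
            return n * factorial(n - 1); // recursive case: n! = n * (n - 1)!
        } else {
            return 1;                    // base case: covers n = 1 and n = 0
        }
    }

    public static void main(String[] args) {
        System.out.println(factorial(3)); // prints 6
        System.out.println(factorial(5)); // prints 120
    }
}
```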

Inside Itself?!
- Let’s step through what happens when we call factorial(3):

    factorial(3)          n = 3, and 3 > 1, so return 3 * factorial(2)
      factorial(2)        n = 2, and 2 > 1, so return 2 * factorial(1)
        factorial(1)      n = 1, which is not > 1, so return 1
      factorial(2)        return 2 * 1 = 2
    factorial(3)          return 3 * 2 = 6

- So factorial(3) returns 6.
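To watch the call-and-return order yourself, a sketch like the following prints each call indented by how deep the recursion is. The extra `depth` parameter and the print statements are hypothetical additions for tracing only; the arithmetic is the slide's.

```java
class TracedFactorial {
    // depth is only used to indent the trace; it does not affect the result.
    static int factorial(int n, int depth) {
        String indent = "  ".repeat(depth);
        System.out.println(indent + "factorial(" + n + ") called");
        int result;
        if (n > 1) {
            result = n * factorial(n - 1, depth + 1); // go one level deeper
        } else {
            result = 1;                               // base case
        }
        System.out.println(indent + "factorial(" + n + ") returns " + result);
        return result;
    }

    public static void main(String[] args) {
        factorial(3, 0);
        // Prints:
        // factorial(3) called
        //   factorial(2) called
        //     factorial(1) called
        //     factorial(1) returns 1
        //   factorial(2) returns 2
        // factorial(3) returns 6
    }
}
```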

Base Case
- The most important thing to include in a recursive function is the “base case”: a condition that guarantees the function stops calling itself at some point.
- In our example, factorial only calls itself when n is greater than 1, and each call passes n - 1, so n gets smaller every time. It must eventually reach a number less than or equal to 1, where the function returns 1 without recursing.
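To make the base case's job concrete, this sketch deliberately leaves it out. Without it, every call adds another frame to the call stack until the JVM runs out of stack space and throws `StackOverflowError`. The method name `badFactorial` is hypothetical, and the error is caught here only to demonstrate what happens.

```java
class NoBaseCase {
    // Broken on purpose: there is no base case, so it never stops calling itself.
    static int badFactorial(int n) {
        return n * badFactorial(n - 1);
    }

    // Returns true if calling badFactorial blows the stack (it always will).
    static boolean overflowsStack() {
        try {
            badFactorial(3);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Stack overflow: " + overflowsStack()); // prints "Stack overflow: true"
    }
}
```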

Web Crawling
- Let’s use recursion for something more interesting.
- Say we have some method “parsePage” that looks at a web page. Suppose we then want that method to follow the links on that page and parse the pages they lead to.
- We’d then call “parsePage” on those links from inside the parsePage method itself.
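A minimal sketch of the idea, assuming a small in-memory map stands in for the web so it runs without network access. `WEB`, the page names, and this `parsePage` signature are all hypothetical. The `visited` set is an added assumption, not from the slides, that keeps the crawler from looping forever when pages link back to each other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LinkCrawl {
    // Hypothetical stand-in for the web: page name -> pages it links to.
    static final Map<String, List<String>> WEB = Map.of(
        "index", List.of("about", "news"),
        "about", List.of("index"),            // links back to index: a cycle
        "news", List.of("story1", "story2"),
        "story1", List.of(),
        "story2", List.of("news"));

    static void parsePage(String page, Set<String> visited, List<String> order) {
        if (!visited.add(page)) {
            return;                           // seen this page already: stop here
        }
        order.add(page);                      // "parse" the page (here: just record it)
        for (String link : WEB.getOrDefault(page, List.of())) {
            parsePage(link, visited, order);  // the recursive call on each link
        }
    }

    static List<String> crawl(String start) {
        List<String> order = new ArrayList<>();
        parsePage(start, new HashSet<>(), order);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(crawl("index")); // [index, about, news, story1, story2]
    }
}
```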

Web Crawling
[Figure: visual example of following links outward from a start page]

Web Crawling
- You can see here the need for a base case: pages link to more pages (and often back to each other), so a crawler that always follows every link may never stop. One way of controlling our search is to place a limit on the depth of the links we follow.
- For instance, in the visual example, we followed the links from our start page (depth 1), and then the links from those pages (depth 2).
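The depth limit can be sketched the same way, again over a hypothetical in-memory link map. Following the slide's counting, the start page is depth 0, the pages it links to are depth 1, and their links are depth 2; the base case is simply "stop when depth reaches maxDepth". There is no visited set here, so a page can appear twice (as `start` does), but the depth limit alone still guarantees the recursion ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DepthLimitedCrawl {
    // Hypothetical link map: page -> pages it links to ("a" links back to "start").
    static final Map<String, List<String>> WEB = Map.of(
        "start", List.of("a", "b"),
        "a", List.of("c", "start"),
        "b", List.of("d"),
        "c", List.of("e"),
        "d", List.of(),
        "e", List.of());

    // Records each visit as "name@depth", then follows its links one level deeper.
    static void parsePage(String page, int depth, int maxDepth, List<String> log) {
        log.add(page + "@" + depth);
        if (depth >= maxDepth) {
            return; // base case: deep enough, don't follow any more links
        }
        for (String link : WEB.getOrDefault(page, List.of())) {
            parsePage(link, depth + 1, maxDepth, log);
        }
    }

    static List<String> crawl(String start, int maxDepth) {
        List<String> log = new ArrayList<>();
        parsePage(start, 0, maxDepth, log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(crawl("start", 2)); // [start@0, a@1, c@2, start@2, b@1, d@2]
    }
}
```

Note that page `e` is never reached with `maxDepth` 2, because it is three links away from `start`.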

Recursion
Fig. 3: Remember: base cases prevent infinite cats.

Parse HTML?
- “Parsing” means walking through the structure of a file (looking at it word by word)
  - Here, we look at an HTML file
- The structure of an HTML file is its tag structure
  - So parsing means walking through and interpreting the tags
- If you can parse HTML files, you can pull content out of web pages and do stuff with it
  - Procedural manipulation of web content

Basic approach
- Use two classes to parse
- One class, HTMLParser, reads info from a URL
- The other class is used by HTMLParser to process tags: a child (subclass) of HTMLEditorKit.ParserCallback
- HTMLParser recognizes when a tag appears and calls the appropriate methods on the ParserCallback class (start tags, end tags, simple tags, text, etc.)
- The programmer (i.e., you) fills in the ParserCallback methods to do whatever you want when you see different kinds of tags
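The course's own HTMLParser class isn't shown here, so this sketch substitutes the JDK's `javax.swing.text.html.parser.ParserDelegator`, which drives an `HTMLEditorKit.ParserCallback` with the same start-tag / end-tag / simple-tag / text callbacks. The `TagLogger` class and its `log` helper are hypothetical; the callback just records which method fired for each part of a small HTML string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// A ParserCallback child that records which callback fired, in order.
class TagLogger extends HTMLEditorKit.ParserCallback {
    final List<String> events = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        events.add("start " + tag);
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        events.add("end " + tag);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        events.add("simple " + tag);
    }

    @Override
    public void handleText(char[] data, int pos) {
        events.add("text " + new String(data));
    }

    // ParserDelegator plays the role of HTMLParser: it reads the HTML and
    // calls back into the methods above as it meets each tag or run of text.
    static List<String> log(String html) {
        TagLogger callback = new TagLogger();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.events;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hi<img src=\"x.png\"></p></body></html>";
        for (String event : log(html)) {
            System.out.println(event);
        }
    }
}
```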

Running the example
- We’ve written HTMLParser for you
- To access it, it must be in the data directory of your project
- The simplest thing is to copy the code from the course website and put the directory in your default sketchbook directory

handleSimpleTag
- public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos)
  - Called for tags like IMG, which have no closing tag
  - tag stores the name of the tag
  - attrib stores any attributes
  - pos is the position in the file
- Example:
  - The tag is img
  - The attributes are src, alt, align, width (with their respective values)
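A sketch of filling in handleSimpleTag to collect image URLs, again using `ParserDelegator` in place of the course's HTMLParser; `ImageFinder` and `findImages` are hypothetical names. Attribute values are looked up with `HTML.Attribute` keys such as `HTML.Attribute.SRC`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ImageFinder extends HTMLEditorKit.ParserCallback {
    final List<String> imageSources = new ArrayList<>();

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        if (tag == HTML.Tag.IMG) {
            Object src = attrib.getAttribute(HTML.Attribute.SRC); // null if no src
            if (src != null) {
                imageSources.add(src.toString());
            }
        }
    }

    static List<String> findImages(String html) {
        ImageFinder callback = new ImageFinder();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.imageSources;
    }

    public static void main(String[] args) {
        String html = "<p>Photos: <img src=\"cat.jpg\" alt=\"A cat\" width=\"100\"> <img src=\"dog.png\"></p>";
        System.out.println(findImages(html)); // [cat.jpg, dog.png]
    }
}
```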

handleStartTag
- public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos)
  - Called for tags like BODY, which have a matching end tag
  - tag stores the name of the tag
  - attrib stores any attributes
  - pos is the position in the file
- Example:
  - The tag is body
  - The attributes are bgcolor, topmargin, leftmargin, marginheight (with their respective values)
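Similarly, a hypothetical `BodyInspector` can fill in handleStartTag to read the page's background color from the BODY tag's bgcolor attribute. This is a sketch under the same assumption that `ParserDelegator` stands in for the course's HTMLParser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class BodyInspector extends HTMLEditorKit.ParserCallback {
    String bgcolor; // stays null if BODY has no bgcolor attribute

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        if (tag == HTML.Tag.BODY) {
            Object value = attrib.getAttribute(HTML.Attribute.BGCOLOR);
            if (value != null) {
                bgcolor = value.toString();
            }
        }
    }

    static String findBgcolor(String html) {
        BodyInspector callback = new BodyInspector();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.bgcolor;
    }

    public static void main(String[] args) {
        String html = "<html><body bgcolor=\"#ffcc00\"><p>Hello</p></body></html>";
        System.out.println(findBgcolor(html)); // #ffcc00
    }
}
```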

handleEndTag
- public void handleEndTag(HTML.Tag tag, int pos)
  - Called for end tags like </BODY>
  - tag stores the name of the tag
  - pos is the position in the file
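handleEndTag is most useful paired with handleStartTag, to know when you have left an element. As a sketch (class and method names are hypothetical, and `ParserDelegator` stands in for HTMLParser), this callback grabs the page title by switching a flag on at TITLE's start tag, off at its end tag, and keeping any text seen in between.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class TitleFinder extends HTMLEditorKit.ParserCallback {
    private boolean inTitle = false;
    private final StringBuilder title = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = true;  // entering <title>
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) {
            inTitle = false; // leaving </title>
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title.append(data); // only keep text found inside the title
        }
    }

    static String findTitle(String html) {
        TitleFinder callback = new TitleFinder();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.title.toString();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>My Page</title></head><body><p>Body text</p></body></html>";
        System.out.println(findTitle(html)); // My Page
    }
}
```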

handleText
- public void handleText(char[] data, int pos)
  - Handles anything that’s not a tag (the text between tags)
  - data is an array of characters containing the text
  - pos is the position in the file
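A minimal handleText sketch that counts the words on a page; `WordCounter` and `countWords` are hypothetical names, and `ParserDelegator` again stands in for HTMLParser. Because text can arrive in several pieces, it trims each piece and splits on whitespace rather than assuming one call per sentence. Text inside a title would be counted too.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class WordCounter extends HTMLEditorKit.ParserCallback {
    int words = 0;

    @Override
    public void handleText(char[] data, int pos) {
        String text = new String(data).trim();
        if (!text.isEmpty()) {
            words += text.split("\\s+").length; // count whitespace-separated words
        }
    }

    static int countWords(String html) {
        WordCounter callback = new WordCounter();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.words;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>one two three</p></body></html>";
        System.out.println(countWords(html)); // 4
    }
}
```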

Filling in these methods
- You fill in these methods to do whatever processing you want
- In the image collage example
  - handleSimpleTag looks for images
  - handleStartTag looks for the start of anchors (links) and follows them
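Putting the pieces together, here is a sketch of the image collage idea: handleSimpleTag collects image sources, handleStartTag collects link targets, and a recursive parsePage follows those links up to a depth limit. To keep it runnable offline, a hypothetical in-memory `PAGES` map replaces fetching real URLs, and `ParserDelegator` replaces the course's HTMLParser; none of these names come from the original example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CollageCrawler {
    // Hypothetical in-memory pages standing in for real URLs.
    static final Map<String, String> PAGES = Map.of(
        "home.html", "<html><body><img src=\"a.jpg\"><a href=\"gallery.html\">Gallery</a></body></html>",
        "gallery.html", "<html><body><img src=\"b.jpg\"><a href=\"home.html\">Home</a>"
            + "<a href=\"more.html\">More</a></body></html>",
        "more.html", "<html><body><img src=\"c.jpg\"></body></html>");

    // Collects image sources (simple IMG tags) and link targets (A start tags).
    static class Callback extends HTMLEditorKit.ParserCallback {
        final List<String> images = new ArrayList<>();
        final List<String> links = new ArrayList<>();

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
            Object src = attrib.getAttribute(HTML.Attribute.SRC);
            if (tag == HTML.Tag.IMG && src != null) {
                images.add(src.toString());
            }
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
            Object href = attrib.getAttribute(HTML.Attribute.HREF);
            if (tag == HTML.Tag.A && href != null) {
                links.add(href.toString());
            }
        }
    }

    static void parsePage(String url, int depth, int maxDepth, Set<String> images) {
        String html = PAGES.get(url);
        if (html == null) {
            return; // unknown page: nothing to parse
        }
        Callback callback = new Callback();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        images.addAll(callback.images);
        if (depth >= maxDepth) {
            return; // base case: depth limit reached, don't follow links
        }
        for (String link : callback.links) {
            parsePage(link, depth + 1, maxDepth, images); // recurse into each link
        }
    }

    static Set<String> collect(String start, int maxDepth) {
        Set<String> images = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        parsePage(start, 0, maxDepth, images);
        return images;
    }

    public static void main(String[] args) {
        System.out.println(collect("home.html", 1)); // [a.jpg, b.jpg]
        System.out.println(collect("home.html", 2)); // [a.jpg, b.jpg, c.jpg]
    }
}
```

Using a set for the images means a page reached twice (here `home.html` via the gallery's "Home" link) does not add duplicate pictures to the collage, while the depth limit is still what stops the recursion.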