1 IAT 800, Fall 2006: Recursion, Web Crawling

2 Today’s Nonsense
• Recursion – Why is my head spinning?
• Web Crawling – Recursing in HTML
Shortened class again today. Boy, your TA must be a total slacker, or something.

3 Recursion
• Recursion basically means calling a method from inside itself.

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }

What the?!?

4 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

5 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 2)

6 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 2)

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 1)

7 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 2)

    1

8 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

    int factorial(int n) {
      if(n > 1) {
        return n * 1;
      }
      else return 1;
    }
    (n = 2)

9 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

    int factorial(int n) {
      if(n > 1) {
        return 2 * 1;
      }
      else return 1;
    }
    (n = 2)

10 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * factorial(n-1);
      }
      else return 1;
    }
    (n = 3)

    2

11 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return n * 2;
      }
      else return 1;
    }
    (n = 3)

12 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    int factorial(int n) {
      if(n > 1) {
        return 3 * 2;
      }
      else return 1;
    }
    (n = 3)

13 Inside Itself?!
• Let’s step through what happens.

factorial(3);

    6
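The step-through on slides 4 to 13 can be reproduced by having each call log when it starts and what it returns. This is only a sketch: the `depth` and `log` parameters, the `indent` and `trace` helpers, and the class name `FactorialTrace` are additions for illustration and are not part of the slides' `factorial` method.

```java
class FactorialTrace {
    // Same logic as the factorial method on the slides, plus two extra
    // parameters (an assumption of this sketch) used only for logging.
    static int factorial(int n, int depth, StringBuilder log) {
        String indent = indent(depth);
        log.append(indent).append("factorial(").append(n).append(")\n");
        int result;
        if (n > 1) {
            result = n * factorial(n - 1, depth + 1, log); // recursive call
        } else {
            result = 1; // base case: no further calls
        }
        log.append(indent).append("returns ").append(result).append("\n");
        return result;
    }

    // Two spaces of indentation per level of recursion.
    static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    // Runs factorial(n) and returns the full call log as text.
    static String trace(int n) {
        StringBuilder log = new StringBuilder();
        factorial(n, 0, log);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace(3));
    }
}
```

Running `main` prints three nested calls followed by their return values in reverse order (1, then 2, then 6), which matches the order in which the frames disappear on the slides.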

14 Base Case
• The most important thing to include in a recursive method is the “base case”: a condition that ensures the method stops calling itself at some point.
• In our example, factorial only calls itself when n is greater than 1, and n gets smaller with every call, so n eventually reaches a value less than or equal to 1 and the recursion stops.
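To see why the base case matters, here is a sketch that compares the slides' `factorial` with a deliberately broken variant that has no base case. The names `BaseCaseDemo`, `brokenFactorial`, and `overflows` are made up for this example. In Java, runaway recursion normally ends with a `StackOverflowError`, which the sketch catches only to show that it happened.

```java
class BaseCaseDemo {
    // The version from the slides: stops recursing once n <= 1.
    static int factorial(int n) {
        if (n > 1) {
            return n * factorial(n - 1);
        }
        return 1; // base case
    }

    // Deliberately broken: there is no base case, so every call
    // makes another call until the call stack runs out of space.
    static int brokenFactorial(int n) {
        return n * brokenFactorial(n - 1);
    }

    // Returns true if the broken version ran out of stack space.
    static boolean overflows() {
        try {
            brokenFactorial(3);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("factorial(5) = " + factorial(5));
        System.out.println("broken version overflowed: " + overflows());
    }
}
```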

15 Web Crawling
• Let’s use recursion for something more interesting.
• Say we have some method “parsePage” that looks at a web page. Suppose we then want that method to follow the links on that page and parse the pages it links to.
• We’d then want to call the “parsePage” method on those links from inside the parsePage method we have.

16 Web Crawling

17 Web Crawling

18 Web Crawling
• You can see here the need for a base case. One way of controlling our search is by placing a limit on the depth of the links we follow.
• For instance, in the visual example, we followed the links from our start page (depth 1), and then the links from those pages (depth 2).
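The depth limit can be sketched without touching the network by letting a small in-memory map stand in for the web. Everything here is hypothetical: the page names, the `LINKS` map, the `maxDepth` parameter, and the list of visited pages. Only the method name `parsePage` comes from the slides. The start page counts as depth 0, so its links are at depth 1, as in the slide's description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DepthLimitedCrawl {
    // Hypothetical stand-in for the web: page name -> pages it links to.
    static final Map<String, List<String>> LINKS = new HashMap<>();
    static {
        LINKS.put("start", Arrays.asList("a", "b"));
        LINKS.put("a", Arrays.asList("c", "start")); // links back to the start page
        LINKS.put("b", Arrays.asList("c"));
        LINKS.put("c", Arrays.asList("d"));
        LINKS.put("d", Collections.<String>emptyList());
    }

    static void parsePage(String page, int depth, int maxDepth, List<String> visited) {
        if (visited.contains(page)) {
            return; // already parsed: avoids going around in circles
        }
        visited.add(page); // "parse" the page (here we only record it)
        if (depth >= maxDepth) {
            return; // base case: do not follow links any deeper
        }
        for (String link : LINKS.getOrDefault(page, Collections.<String>emptyList())) {
            parsePage(link, depth + 1, maxDepth, visited); // recursive call
        }
    }

    // Convenience wrapper: crawl from a start page and return the visit order.
    static List<String> crawl(String start, int maxDepth) {
        List<String> visited = new ArrayList<>();
        parsePage(start, 0, maxDepth, visited);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(crawl("start", 2)); // prints [start, a, c, b]
    }
}
```

With a limit of 2, page "d" is never reached because it sits three links away from the start page. The visited check is a second safeguard: without it, the link from "a" back to "start" would be parsed again on every pass.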

19 Recursion
Fig. 3: Remember, base cases prevent infinite cats. http://infinitecat.com/

20 Parse HTML?
• “Parsing” means walking through the structure of a file (looking at it word by word)
  – Look at an HTML file
• The structure of an HTML file is its tag structure
  – So parsing means walking through and interpreting the tags
• If you can parse HTML files, you can pull content out of web pages and do stuff with it
• Procedural manipulation of web content
Results 1 - 20 of about 202 for matrix red … some text == some text

21 Basic approach
• Use two classes to parse
• One class reads info from a URL – HTMLParser
• The other class is used by HTMLParser to process tags – a subclass of HTMLEditorKit.ParserCallback
• HTMLParser recognizes when a tag appears and calls the appropriate methods on the ParserCallback class (start-tags, end-tags, simple-tags, text, etc.)
• The programmer (i.e., you) fills in the ParserCallback methods to do whatever you want when you see different kinds of tags
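The course's HTMLParser class is not reproduced in this transcript, so the sketch below swaps in the JDK's own `ParserDelegator` to drive the callback and reads the HTML from a string rather than a URL. The class name `CallbackLogger` and the sample HTML are made up for illustration. The subclass simply records which callback method fires for each piece of the page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CallbackLogger extends HTMLEditorKit.ParserCallback {
    final List<String> events = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        events.add("start " + tag);
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        events.add("end " + tag);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        events.add("simple " + tag);
    }

    @Override
    public void handleText(char[] data, int pos) {
        events.add("text " + new String(data));
    }

    // Parses an HTML string and returns the list of callback events.
    static List<String> log(String html) {
        CallbackLogger logger = new CallbackLogger();
        try {
            // ParserDelegator stands in for the course's HTMLParser here.
            new ParserDelegator().parse(new StringReader(html), logger, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logger.events;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p><img src=\"cat.png\"></body></html>";
        for (String event : log(html)) {
            System.out.println(event);
        }
    }
}
```

The parser may also report tags it considers implied (such as a head element), so the printed list can contain a few more entries than the sample HTML has tags.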

22 Running the example
• We’ve written HTMLParser for you
• To access it, it must be in the data directory of your project
• The simplest approach is to copy the code from the website and put its directory in your default sketchbook directory

23 handleSimpleTag
• public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos)
  – Called for tags like IMG
  – tag stores the name of the tag
  – attrib stores any attributes
  – pos is the position in the file
• Example:
  – The tag is img
  – The attributes are src, alt, align, width (with their respective values)
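As a sketch of filling in handleSimpleTag, the callback below collects the src attribute of every img tag. The class name `ImageCollector`, the `find` helper, and the sample HTML are assumptions for this example, and the JDK's `ParserDelegator` again stands in for the course's HTMLParser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ImageCollector extends HTMLEditorKit.ParserCallback {
    final List<String> sources = new ArrayList<>();

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        if (tag == HTML.Tag.IMG) {
            // Attributes are looked up by key; src holds the image location.
            Object src = attrib.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                sources.add(src.toString());
            }
        }
    }

    // Parses an HTML string and returns the image sources in page order.
    static List<String> find(String html) {
        ImageCollector collector = new ImageCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.sources;
    }

    public static void main(String[] args) {
        String html = "<html><body><img src=\"cat.png\" alt=\"cat\" width=\"100\">"
                + "<p>text</p><img src=\"dog.png\"></body></html>";
        System.out.println(find(html)); // prints [cat.png, dog.png]
    }
}
```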

24 handleStartTag
• public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos)
  – Called for tags like BODY
  – tag stores the name of the tag
  – attrib stores any attributes
  – pos is the position in the file
• Example:
  – The tag is body
  – The attributes are bgcolor, topmargin, leftmargin, marginheight (with their respective values)
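A similar sketch for handleStartTag reads the bgcolor attribute when the body tag opens. The class name `BodyColorReader`, the `read` helper, and the sample page are hypothetical, and `ParserDelegator` from the JDK is used in place of the course's HTMLParser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class BodyColorReader extends HTMLEditorKit.ParserCallback {
    String bgcolor; // stays null if the page has no body bgcolor

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        if (tag == HTML.Tag.BODY) {
            Object value = attrib.getAttribute(HTML.Attribute.BGCOLOR);
            if (value != null) {
                bgcolor = value.toString();
            }
        }
    }

    // Parses an HTML string and returns the body's bgcolor value, or null.
    static String read(String html) {
        BodyColorReader reader = new BodyColorReader();
        try {
            new ParserDelegator().parse(new StringReader(html), reader, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader.bgcolor;
    }

    public static void main(String[] args) {
        String html = "<html><body bgcolor=\"#ffffff\"><p>Hi</p></body></html>";
        System.out.println("bgcolor = " + read(html));
    }
}
```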

25 handleEndTag
• public void handleEndTag(HTML.Tag tag, int pos)
  – Called for end tags, the closing half of a tag pair
  – tag stores the name of the tag
  – pos is the position in the file

26 handleText
• public void handleText(char[] data, int pos)
  – Handles anything that’s not a tag (the text between tags)
  – data is an array of characters containing the text
  – pos is the position
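handleEndTag and handleText are often used together with handleStartTag: the start and end callbacks track where we are in the page, and handleText keeps only the text seen in between. The sketch below collects the text inside anchor tags. The class name `LinkTextCollector`, the `insideLink` flag, and the sample HTML are assumptions, and the JDK's `ParserDelegator` stands in for the course's HTMLParser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkTextCollector extends HTMLEditorKit.ParserCallback {
    private boolean insideLink = false;
    final StringBuilder linkText = new StringBuilder();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
        if (tag == HTML.Tag.A) {
            insideLink = true; // an anchor just opened
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.A) {
            insideLink = false; // the anchor just closed
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (insideLink) {
            linkText.append(data); // keep only text between <a> and </a>
        }
    }

    // Parses an HTML string and returns the text found inside anchors.
    static String collect(String html) {
        LinkTextCollector collector = new LinkTextCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.linkText.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Intro <a href=\"page2.html\">next page</a> outro</p></body></html>";
        System.out.println(collect(html));
    }
}
```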

27 Filling in these methods
• You fill in these methods to do whatever processing you want
• In the image collage example
  – handleSimpleTag is looking for images
  – handleStartTag is looking for the start of anchors and follows links
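The image collage code itself is not included in this transcript, so the following is only a guess at its general shape: handleSimpleTag gathers images, handleStartTag gathers anchor links, and the crawl recurses into those links up to a depth limit. The in-memory `PAGES` map, the file names, and the class name `CollageCrawler` are made up, and the JDK's `ParserDelegator` replaces the course's HTMLParser. One deliberate difference from the slide's description: this sketch collects a page's links first and follows them after parsing finishes, rather than recursing from inside the callback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CollageCrawler {
    // Hypothetical site held in memory: URL -> HTML source.
    static final Map<String, String> PAGES = new HashMap<>();
    static {
        PAGES.put("index.html", "<html><body><img src=\"a.png\">"
                + "<a href=\"p1.html\">one</a><a href=\"p2.html\">two</a></body></html>");
        PAGES.put("p1.html", "<html><body><img src=\"b.png\">"
                + "<a href=\"index.html\">home</a></body></html>");
        PAGES.put("p2.html", "<html><body><img src=\"c.png\">"
                + "<a href=\"p3.html\">three</a></body></html>");
        PAGES.put("p3.html", "<html><body><img src=\"d.png\"></body></html>");
    }

    static void parsePage(String url, int depth, int maxDepth,
                          Set<String> seen, List<String> images) {
        String html = PAGES.get(url);
        if (html == null || !seen.add(url)) {
            return; // unknown page, or already parsed
        }
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
                Object src = attrib.getAttribute(HTML.Attribute.SRC);
                if (tag == HTML.Tag.IMG && src != null) {
                    images.add(src.toString()); // looking for images
                }
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrib, int pos) {
                Object href = attrib.getAttribute(HTML.Attribute.HREF);
                if (tag == HTML.Tag.A && href != null) {
                    links.add(href.toString()); // looking for the start of anchors
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (depth >= maxDepth) {
            return; // base case: depth limit reached
        }
        for (String link : links) {
            parsePage(link, depth + 1, maxDepth, seen, images); // follow the link
        }
    }

    // Crawls from a start URL and returns the image sources in visit order.
    static List<String> collectImages(String start, int maxDepth) {
        List<String> images = new ArrayList<>();
        parsePage(start, 0, maxDepth, new HashSet<String>(), images);
        return images;
    }

    public static void main(String[] args) {
        System.out.println(collectImages("index.html", 1)); // prints [a.png, b.png, c.png]
    }
}
```

With a depth limit of 1, the image on p3.html is never collected because that page is two links away from index.html.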

