1
Getting a Web Page (And what to do once you’ve got it)
2
Fetching a web page
- Core utility: java.net.URL
  - Abstract representation of a resource's name and location
  - Parses and stores the parts of a URL
  - Brokers different protocol types
- Accessing content: java.net.URLConnection
  - Opens, tracks, and closes network connections
  - Provides content data
  - Provides metadata
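A minimal sketch of the parsing half of java.net.URL, using the example URL from these slides. The class name UrlParts and its parts() helper are hypothetical, for illustration only; the getters getProtocol(), getHost(), getAuthority(), getFile() and getPath() are the real java.net.URL accessors. The sketch builds the URL with URI.create(spec).toURL() because the new URL(String) constructors are deprecated as of JDK 20.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlParts {
    // Hypothetical helper: convert a String to a URL and collect the parts
    // that java.net.URL breaks out and stores.
    static Map<String, String> parts(String spec) {
        try {
            URL u = URI.create(spec).toURL();
            Map<String, String> p = new LinkedHashMap<>();
            p.put("protocol", u.getProtocol());
            p.put("host", u.getHost());
            p.put("authority", u.getAuthority());
            p.put("file", u.getFile());   // path plus query string, if any
            p.put("path", u.getPath());   // path only
            return p;
        } catch (MalformedURLException e) {
            // Thrown e.g. when no protocol handler exists for the scheme.
            throw new IllegalArgumentException("not a usable URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        parts("http://www.cs.unm.edu/~terran/")
            .forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Note that getFile() and getPath() differ only when the URL carries a query string: for http://www.cs.unm.edu/a/b.html?q=1 the file is /a/b.html?q=1 and the path is /a/b.html.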
3
The web fetch cycle
- URL u = new URL("http://www.cs.unm.edu/~terran/");
  - Breaks out the parts of the URL:
    protocol: "http"
    host: "www.cs.unm.edu"
    authority: "www.cs.unm.edu"
    file: "/~terran/"
    path: "/~terran/"
  - Sets the protocol handler (handler: HttpHandler)
- URLConnection con = u.openConnection();
  - Generates a protocol-specific connection object:
    in URL: return handler.openConnection(this);
    in the handler: return new HttpURLConnection(this);
- con.connect();
  - Opens a socket connection to the host
- con.getHeaderFields();
  - Requests the metadata (response headers)
- con.getInputStream();
  - Requests the resource content; returns a stream view of the data
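The fetch cycle, sketched as one Java method that performs each step explicitly. The names FetchCycle, fetch() and startServer() are hypothetical; startServer() starts a throwaway local server with the JDK's com.sun.net.httpserver package so the example runs without network access. Pointing fetch() at http://www.cs.unm.edu/~terran/ would go through the same steps, assuming that page is still online. The sketch also assumes UTF-8 content for simplicity.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

public class FetchCycle {
    // Walks the fetch cycle one step at a time and returns the page body.
    static String fetch(String spec) throws IOException {
        URL u = URI.create(spec).toURL();        // break the String into URL parts
        URLConnection con = u.openConnection();  // protocol-specific connection; nothing sent yet
        con.connect();                           // open the socket to the host
        Map<String, List<String>> meta = con.getHeaderFields(); // metadata (response headers)
        System.out.println(meta.size() + " header fields; Content-Type = " + con.getContentType());
        try (InputStream in = con.getInputStream()) {            // stream view of the content
            // Assumes UTF-8; real code should honor the charset declared in Content-Type.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical helper: a local HTTP server on an ephemeral port that serves `html`.
    static HttpServer startServer(String html) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = html.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer("<html><body>Hello</body></html>");
        try {
            int port = server.getAddress().getPort();
            System.out.println(fetch("http://127.0.0.1:" + port + "/~terran/"));
        } finally {
            server.stop(0);
        }
    }
}
```

Calling connect() explicitly is optional in practice: getHeaderFields() and getInputStream() connect implicitly if needed. It is shown here only to mirror the steps of the cycle.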
4
Design principles of URL
- Encapsulation
  - Hides the details of decomposing a URL
  - Hides the details of connecting to the network, negotiating protocols, etc.
- Abstraction: URLConnection is an abstract version of many different concrete connection classes
- “Make common things easy”
  - #1 most common thing you want to do with a URL: convert a String to a URL
  - #2 most common: load the page
  - Supported by simple methods: the URL(String) constructor, openConnection(), getInputStream(), etc.
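For the two most common tasks, java.net.URL also offers openStream(), which its documentation describes as shorthand for openConnection().getInputStream(). A minimal sketch follows; EasyFetch and load() are hypothetical names. main() reads a temporary local file through a file: URL so it runs offline, and it assumes UTF-8 content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EasyFetch {
    // #1: String -> URL.  #2: load the page.  One call each.
    static String load(String spec) throws IOException {
        URL u = URI.create(spec).toURL();      // avoids the URL(String) constructor deprecated in JDK 20
        try (InputStream in = u.openStream()) { // same as openConnection().getInputStream()
            // Assumes UTF-8; a real client would read the charset from the metadata.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A local file: URL keeps the example offline; an http:// URL works the same way.
        Path page = Files.createTempFile("page", ".html");
        try {
            Files.writeString(page, "<html><body>Hello</body></html>");
            System.out.println(load(page.toUri().toString()));
        } finally {
            Files.delete(page);
        }
    }
}
```

The caller never sees which protocol handler or connection class was used; that is the encapsulation the slide describes.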
5
Design Patterns of URL
- Delegator: URL/URLConnection don’t really do the work themselves
  - Instead, URL figures out what the correct protocol is and delegates the hard work to that protocol’s handler
  - URL constructs an appropriate URLConnection object on the fly: a java.net.HttpURLConnection, a java.net.JarURLConnection, an FTP connection, etc.
  - That object knows how to handle its own type of content
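One way to watch the delegation happen: openConnection() on different kinds of URL returns different concrete URLConnection subclasses, and it does not contact the server or open the file yet. DelegatorDemo and connectionFor() are hypothetical names, and the file and jar paths are placeholders that need not exist. The printed class names are JDK-internal and may vary between releases; only the public supertypes such as java.net.HttpURLConnection and java.net.JarURLConnection are part of the API.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.JarURLConnection;
import java.net.URI;
import java.net.URLConnection;

public class DelegatorDemo {
    // URL picks the protocol handler from the scheme; the handler builds the connection object.
    // openConnection() does not connect, so nothing is fetched or opened here.
    static URLConnection connectionFor(String spec) throws IOException {
        return URI.create(spec).toURL().openConnection();
    }

    public static void main(String[] args) throws IOException {
        String[] specs = {
            "http://www.cs.unm.edu/~terran/",
            "file:///tmp/index.html",                // placeholder path
            "jar:file:///tmp/site.jar!/index.html"   // placeholder path
        };
        for (String spec : specs) {
            // Concrete class names are JDK-internal and may differ between releases.
            System.out.println(spec + " -> " + connectionFor(spec).getClass().getName());
        }
        System.out.println(connectionFor(specs[0]) instanceof HttpURLConnection); // true
        System.out.println(connectionFor(specs[2]) instanceof JarURLConnection);  // true
    }
}
```

Client code written against the abstract URLConnection type works unchanged for every one of these; only the handler-chosen subclass differs.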
6
Parsing pages
- Much harder design problem:
  - Want to offer a library API for parsing HTML
  - Want it to encode all knowledge about the structure of HTML (and the eccentricities of HTML!)
  - Want the user of the API not to have to think about the details of HTML much, or at all
- BUT also want it to be flexible
  - Want to allow the user of the API to customize it to handle any sort of data processing on the HTML pages
- The design goals are in conflict...
7
Resolving the conflict
- Key idea: note that there are two separate, but intertwined, tasks here:
  - Understanding the structure of HTML (parsing)
  - Knowing what to do with the content (analysis)
- Idea: separate them into different objects
  - The parsing object only knows how to break an HTML page down into its parts; it knows nothing about what to do with the data inside the parts
  - The analysis object only knows what to do with the data inside the parts when they’re fed to it; it knows nothing about parsing HTML
8
New problem
- The two objects have to work together very intimately
  - The parser needs to call methods on the analysis object
  - But we want the user to be able to customize the analysis object
9
The Callback Pattern
- Answer: provide an interface that defines the signatures of the analysis (data-handling) methods
  - This is called a callback interface: the parser code “calls back” into user-defined code
- Write the parser object in terms of this interface
- Invoke the parser with an instance of this interface
- As the parser recognizes (processes) each element of the HTML page, it passes the data part of that element to the appropriate callback method
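A minimal sketch of the callback pattern in Java. Everything here is hypothetical: HtmlHandler is the callback interface, TinyHtmlParser is a deliberately naive parser written only against that interface, and TitleFinder is one user-supplied analysis object. TinyHtmlParser is not a real HTML parser; it ignores comments, entities, quoted attributes containing '>', and self-closing tags.

```java
import java.util.ArrayList;
import java.util.List;

// Callback interface: the analysis (data-handling) methods the parser calls back into.
interface HtmlHandler {
    void startTag(String name);
    void endTag(String name);
    void text(String data);
}

// Parsing object: knows how to split HTML into parts, nothing about what to do with them.
class TinyHtmlParser {
    static void parse(String html, HtmlHandler handler) {
        int i = 0;
        while (i < html.length()) {
            int lt = html.indexOf('<', i);
            if (lt < 0) lt = html.length();
            if (lt > i) handler.text(html.substring(i, lt));  // hand text to the callback
            if (lt == html.length()) break;
            int gt = html.indexOf('>', lt);
            if (gt < 0) break;                                 // unterminated tag: stop
            String tag = html.substring(lt + 1, gt).trim();
            if (tag.startsWith("/")) {
                handler.endTag(tag.substring(1).trim().toLowerCase());
            } else if (!tag.isEmpty()) {
                // Tag name is everything before the first whitespace; attributes are skipped.
                handler.startTag(tag.split("\\s+", 2)[0].toLowerCase());
            }
            i = gt + 1;
        }
    }
}

// Analysis object (user-defined): knows what to do with the parts, nothing about parsing.
// This one records every start tag and collects the text inside <title>.
class TitleFinder implements HtmlHandler {
    private boolean inTitle = false;
    final StringBuilder title = new StringBuilder();
    final List<String> tags = new ArrayList<>();

    public void startTag(String name) {
        tags.add(name);
        if (name.equals("title")) inTitle = true;
    }

    public void endTag(String name) {
        if (name.equals("title")) inTitle = false;
    }

    public void text(String data) {
        if (inTitle) title.append(data);
    }
}

public class CallbackDemo {
    public static void main(String[] args) {
        TitleFinder finder = new TitleFinder();
        TinyHtmlParser.parse(
            "<html><head><title>Terran's page</title></head><body><p>Hi</p></body></html>", finder);
        System.out.println(finder.title); // Terran's page
        System.out.println(finder.tags);  // [html, head, title, body, p]
    }
}
```

Different analysis (counting words, collecting links) means writing another HtmlHandler; TinyHtmlParser never changes. The JDK’s Swing HTML support uses the same design: you subclass javax.swing.text.html.HTMLEditorKit.ParserCallback, override methods such as handleStartTag(), handleEndTag() and handleText(), and pass an instance to the parse() method of javax.swing.text.html.parser.ParserDelegator.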