

2 Proxy Servers
Winter 2001, C. Watters

3 What is a Proxy Server?
• Intermediary server between clients and the actual server
• Proxy processes the request
• Proxy processes the response
• An intranet proxy may restrict all outbound and inbound requests to the intranet server

4 What does a Proxy Server do?
• Sits between client and server
• Receives the client request
• Decides if the request will go on to the server
• May have a cache and may respond from the cache
• Acts as the client with respect to the server
• Uses one of its own IP addresses to get the page from the server
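The request flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `handle_request`, `is_allowed`, and `fetch_from_origin` are hypothetical names that do not come from the slides, and the cache is a plain dictionary with no expiry.

```python
from typing import Callable, Dict, List, Optional


def handle_request(url: str,
                   cache: Dict[str, bytes],
                   is_allowed: Callable[[str], bool],
                   fetch_from_origin: Callable[[str], bytes]) -> Optional[bytes]:
    """Return the response body for url, or None if the proxy refuses it."""
    # Decide whether the request may go on to the server at all.
    if not is_allowed(url):
        return None
    # Respond from the cache when a copy is available.
    if url in cache:
        return cache[url]
    # Otherwise act as the client toward the origin server and keep a copy.
    body = fetch_from_origin(url)
    cache[url] = body
    return body


# Usage with a fake origin server that records every call it receives.
calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"page for " + url.encode()


cache: Dict[str, bytes] = {}
first = handle_request("http://example.org/", cache, lambda u: True, fake_fetch)
second = handle_request("http://example.org/", cache, lambda u: True, fake_fetch)
blocked = handle_request("http://example.org/", cache, lambda u: False, fake_fetch)
```

The second request is answered from the cache, so the fake origin server is contacted only once; the third is refused before the cache is consulted.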

5 Usual Uses for Proxies
• Firewalls
• Employee web use control (email etc.)
• Web content filtering (kids)
  – Black lists (sites not allowed)
  – White lists (sites allowed)
  – Keyword filtering of page content

6 User Perspective
• Proxy is invisible to the client
• IP address of the proxy is the one used, or the browser is configured to go there
• Speeds up retrieval if using caching
• Can implement profiles or personalization

7 Main Proxy Functions
• Caching
• Firewall
• Filtering
• Logging

8 Web Cache Proxy
• Our concern is not with the browser cache!
• Store frequently used pages at the proxy rather than asking the server to find or create them again
• Why?
  – Reduce latency: faster to get from the proxy, which makes the server seem more responsive
  – Reduce traffic: reduces traffic to the actual server

9 Proxy Caches
• Proxy cache serves hundreds or thousands of users
• Corporations and intranets often use them
• Most popular requests are generated only once (remember Zipf)
• Good news: proxy cache hit rates often reach 50%
• Bad news: stale content (stock quotes!)
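To see why a shared cache pays off under Zipf-like popularity, here is a small simulation sketch. The assumptions are mine, not the slides': request probability is taken to be proportional to 1/rank, the cache is unbounded and never expires anything, and the page and request counts are arbitrary. It is not meant to reproduce the 50% figure.

```python
import random
from typing import List


def zipf_requests(n_pages: int, n_requests: int, seed: int = 0) -> List[int]:
    """Draw page ranks with probability proportional to 1/rank."""
    rng = random.Random(seed)
    pages = list(range(1, n_pages + 1))
    weights = [1.0 / rank for rank in pages]
    return rng.choices(pages, weights=weights, k=n_requests)


def hit_rate(requests: List[int]) -> float:
    """Fraction of requests that an unbounded cache could answer."""
    seen = set()
    hits = 0
    for page in requests:
        if page in seen:
            hits += 1          # already fetched once, served from cache
        else:
            seen.add(page)     # first request goes to the origin server
    return hits / len(requests)


requests = zipf_requests(n_pages=10_000, n_requests=5_000)
rate = hit_rate(requests)
print(f"hit rate: {rate:.2f}")
```

Each distinct page reaches the origin server exactly once, so the hit rate is one minus the share of first-time requests.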

10 How does a Web Cache work?
• Set of rules in either or both
  – Proxy admin
  – HTTP header

11 Don’t Cache rules
• HTTP header
  – Cache-control: max-age=xxx, must-revalidate
  – Expires: date…
  – Last-Modified: date…
  – Pragma: no-cache (doesn’t really work!)
• Object is authenticated or secure
• Fails proxy filter rules
  – URL
  – Meta data
  – MIME type
  – Contents
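A minimal sketch of how a proxy might apply rules like these before storing a response. `may_cache` and its parameters are hypothetical. The `no-store`, `no-cache`, and `private` directives are standard HTTP but are not listed on the slide, and treating every `https://` URL as "secure" is a simplifying assumption.

```python
from typing import Mapping


def may_cache(headers: Mapping[str, str], url: str,
              authenticated: bool = False,
              passes_filter: bool = True) -> bool:
    """Return False if any 'don't cache' rule applies to this response."""
    h = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower()
                  for d in h.get("cache-control", "").split(",") if d.strip()]
    # Header rules: explicit directives that forbid shared caching.
    if any(d in ("no-store", "no-cache", "private") for d in directives):
        return False
    if "max-age=0" in directives:
        return False
    # Honoured here even though the slide notes it is unreliable in practice.
    if "no-cache" in h.get("pragma", "").lower():
        return False
    # Authenticated or secure objects are not stored.
    if authenticated or url.lower().startswith("https://"):
        return False
    # Anything rejected by the proxy's own filter rules is not stored.
    return passes_filter


ok = may_cache({"Cache-Control": "max-age=3600, must-revalidate"},
               "http://example.org/a")
```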

12 Getting from Cache
• Use the cache copy if it is fresh
  – Within date constraint
  – Used recently and modified date is not recent
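The freshness test can be sketched as follows. The function and parameter names are hypothetical. The heuristic branch assumes that a copy stays fresh for a fixed fraction (10% here) of the time the page had gone unmodified when it was fetched; that fraction is a common convention, not something the slides specify.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_fresh(now: datetime, fetched_at: datetime,
             max_age: Optional[int] = None,
             expires: Optional[datetime] = None,
             last_modified: Optional[datetime] = None,
             heuristic_fraction: float = 0.1) -> bool:
    """Decide whether a cached copy may be served without contacting the server."""
    age = now - fetched_at
    # Date constraints supplied by the server take priority.
    if max_age is not None:
        return age <= timedelta(seconds=max_age)
    if expires is not None:
        return now <= expires
    # Heuristic: a page that had not changed for a long time before it was
    # fetched is assumed to stay unchanged for a while afterwards.
    if last_modified is not None:
        return age <= (fetched_at - last_modified) * heuristic_fraction
    return False


fetched = datetime(2001, 1, 10, 12, 0)
fresh_now = is_fresh(fetched + timedelta(hours=1), fetched, max_age=3600)
```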

13 2. Firewalls
• Proxies for security protection
• More on this later

14 3. Filtering at the Proxy
1. URL lists (black and white lists)
2. Meta data
3. Content filters

15 Filtering
[Diagram: a Web doc is filtered using URL lists, keywords, and ratings; a label base supplies ratings for URLs]

16 The Problem: the Web
• 1 billion documents (April 2000)
• Average query is 2 words (e.g., sara name)
• Continual growth
• Balancing global indexing and access against unintentional access to inappropriate material

17 Filtering Application Types
• Proxies
• Black and white lists
• Keyword profiles
• Labels

18 Black and White Lists
• Black list: URLs the proxy will not access
• White list: URLs the proxy will allow access to

19 How is filtering/selection done?
• Build a profile of preferences
• Match input against the profile using rules

20 Black and White Lists
• Black list of URLs
  – No access allowed
• White list of URLs
  – Access permitted
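A sketch of list-based filtering. The slides speak of lists of URLs; for brevity this example matches on the host name only, which is an assumption, and `allowed_by_lists` is a hypothetical name. When a white list is supplied, only hosts on it are permitted.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def allowed_by_lists(url: str, blacklist: Set[str],
                     whitelist: Optional[Set[str]] = None) -> bool:
    """Apply the black list first, then the white list if one is configured."""
    host = (urlsplit(url).hostname or "").lower()
    if host in blacklist:
        return False              # black list: no access allowed
    if whitelist is not None:
        return host in whitelist  # white list: access only to listed hosts
    return True


blocked_sites = {"www.xyz.com"}
result = allowed_by_lists("http://www.xyz.com/new.html", blocked_sites)
```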

21 LISTS in action
• 1 billion documents!
• Who builds the lists
• Who updates them
• Frequency of updates

22 Labels
• Meta data tags
• Rule driven: PICS rules, for example
• Labels are part of the document or separate
• Separate = Label Bureau

23 Labels
• Meta data (goes with page)
• Label Bureau (stored separately from page)

24 Meta Data as part of HTML doc
……
Browser and/or proxy interpret the meta data

25 Meta data apart from doc
• Label bureaus
• Request for a doc is also a request for labels from one or more label bureaus
• Who makes the labels
  – Text analysis
  – Community of users
  – Creator of document

26 Labels: Collaborative Filtering
[Diagram: Search Engine, Web Site, Label Bureau A, Label Bureau B; Rating Service Labels and Author Labels]

27 PICS and PICS rules
• Tools for communities to use profiles and control/direct access
• Structure designed by the World Wide Web Consortium (W3C)
• Content designed by communities of users

28 PICS Rating Data
(PICS-1.1 "http://www.abc.org/r1.5"
  by "John Doe"
  labels on "1998.11.05" until "2000.11.01"
  for "http://www.xyz.com/new.html"
  ratings (violence 2 blood 1 language 4))

29 Using a URL list Filtering
(PicsRule-1.1
  (
    Policy (RejectByURL ("http://www.xyz.com:*/*"))
    Policy (AcceptIf "otherwise")
  )
)

30 Using the PICS Data
(PicsRule-1.1
  (
    serviceinfo (
      "http://www.lablist.org/ratings/v1.html"
      shortname "PTA"
      bureauURL "http://www.lablist.org/ratings"
      UseEmbedded "N"
    )
    Policy (RejectIf "((PTA.violence > 3) or (PTA.language > 2))")
    Policy (AcceptIf "otherwise")
  )
)
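The effect of this rule can be illustrated by hand-translating its two policies into Python and applying them to the ratings from the PICS label shown earlier (violence 2, blood 1, language 4). This is not a PICSRules parser: `pta_policy` is a hypothetical name, and treating a missing category as 0 is an assumption.

```python
from typing import Dict


def pta_policy(ratings: Dict[str, int]) -> str:
    """Hand-written equivalent of the RejectIf / AcceptIf policies above."""
    # Assumption: a category missing from the label counts as 0.
    violence = ratings.get("violence", 0)
    language = ratings.get("language", 0)
    # RejectIf "((PTA.violence > 3) or (PTA.language > 2))"
    if violence > 3 or language > 2:
        return "reject"
    # AcceptIf "otherwise"
    return "accept"


label = {"violence": 2, "blood": 1, "language": 4}
decision = pta_policy(label)
print(decision)
```

The page is rejected because its language rating of 4 exceeds the threshold of 2, even though the violence rating is within bounds.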

31 Example: Medical PICS labels
• Su – UMLS vocab word: 0-9999999
• Aud – audience: 1 patient, 3 para, 5 GP, etc.
• Ty – information type: 5 scient, 3 patient, 3 prod
• C – country: 1 Can, 4 Afghan, etc.
• Etc.
• Ratings (su 0019186 aud 3:5 Ty 3 C 1)

32 User Profiles for Labels
• Rules for interpreting ratings
• Based on
  – User preferences
  – User access privileges
• Who keeps these
• Who updates these
• How fine is the granularity

33 Labels and Digital Signatures
Labels can also be used to carry digital signature and authority information

34 Example
("byKey" (("N" "aba21241241=") ("E" "abcdefghijklmnop=")))
("on" "1996.12.02T22:20-0000")
("SigCrypto" "aba1241241=="))
("Signature" "http://www.w3.org/TR/1998/REC-DSig-label/DSS-1_0"
  ("ByName" "plipp@iaik.tu-graz.ac.at")
  ("on" "1996.12.02T22:20-0000")
  ("SigCrypto" (("R" "aba124124156") ("S" "casdfkl3r489")))))

35 Proxy level (hidden)

36 Text analysis of Page content
• Proxy examines the text of a page before showing it
• Generally keyword based
• Profile of “black” and/or “white” keywords

37 Profiles for Text analysis
• Keywords (+ weights sometimes)
• “Reflect” interest of user or user group
• May be used to eliminate pages
  – “All but”
• May be used to select pages
  – “Only those”

38 Keyword matching algorithms
1. Extract keywords
2. Eliminate “noisy” words with stop list (1/3)
3. Stem (computer, compute, computation)
4. Match to profile
5. Evaluate “value” of match
6. Check against a threshold for match
7. Show or throw!
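The seven steps can be sketched end to end as below. Several choices are assumptions made for illustration. The stemmer is a crude suffix stripper rather than a real stemming algorithm. The score is the summed weight of the profile terms found on the page. The profile is treated as a "white" (select) profile, so a page is shown when its score reaches the threshold. The stop list is the 27-word list from the deck.

```python
import re
from typing import Dict, List

STOP_WORDS = {"the", "of", "and", "to", "in", "a", "be", "will", "from", "or",
              "been", "was", "have", "it", "for", "on", "is", "with", "by",
              "as", "this", "are", "that", "at", "an", "were", "has"}
SUFFIXES = ("ation", "ing", "er", "es", "s", "e")  # checked in this order


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def keywords(text: str) -> List[str]:
    """Steps 1-3: extract words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def match_score(text: str, profile: Dict[str, float]) -> float:
    """Steps 4-5: sum the weights of profile terms that occur in the page."""
    found = set(keywords(text))
    return sum(weight for term, weight in profile.items() if stem(term) in found)


def show_page(text: str, profile: Dict[str, float], threshold: float) -> bool:
    """Steps 6-7: show the page only if the score reaches the threshold."""
    return match_score(text, profile) >= threshold


profile = {"computer": 1.0, "proxy": 0.5}
page = "The computation is done at the proxy"
shown = show_page(page, profile, threshold=1.0)
```

For a "black" profile the final comparison would be inverted, throwing away pages whose score reaches the threshold.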

39 Stop List (35%)
the, of, and, to, in, a, be, will, from, or, been, was, have, it,
for, on, is, with, by, as, this, are, that, at, an, were, has
(27 words)

40 Matching Profile to Page
• Similarity??
• How many profile terms occur in the doc?
• How often?
• How many docs does the term occur in?
• How important is the term to the profile?

41 Cosine Similarity Measurement
• Profile terms weighted PW in (0,1): importance
• Document terms weighted TW in (0,1)
  – frequency in doc
  – frequency in whole set
• Overall closeness of doc to profile:

$$\mathrm{sim}(\mathrm{doc}, \mathrm{profile}) = \frac{\sum_{t \in \mathrm{profile}} TW_t \cdot PW_t}{\sqrt{\sum_{t \in \mathrm{profile}} TW_t^{2} \cdot \sum_{t \in \mathrm{profile}} PW_t^{2}}}$$
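A minimal sketch of this measure, with all sums taken over the profile terms as on the slide. The dictionaries mapping terms to weights and the function name are assumptions; how TW is derived from term frequencies is left out.

```python
import math
from typing import Dict


def cosine_similarity(profile: Dict[str, float], doc: Dict[str, float]) -> float:
    """Cosine of the angle between profile weights (PW) and document weights (TW)."""
    # Numerator: sum over profile terms of TW * PW (TW is 0 if the term is absent).
    dot = sum(pw * doc.get(term, 0.0) for term, pw in profile.items())
    # Denominator: square root of (sum of TW^2) * (sum of PW^2).
    norm_profile = math.sqrt(sum(pw * pw for pw in profile.values()))
    norm_doc = math.sqrt(sum(doc.get(term, 0.0) ** 2 for term in profile))
    if norm_profile == 0.0 or norm_doc == 0.0:
        return 0.0  # no overlap or empty profile: define similarity as 0
    return dot / (norm_profile * norm_doc)


profile = {"proxy": 1.0, "cache": 0.5}
same = cosine_similarity(profile, {"proxy": 1.0, "cache": 0.5})
unrelated = cosine_similarity(profile, {"stocks": 0.9})
```

A document whose weights are proportional to the profile's scores close to 1, and a document sharing no profile terms scores 0; the result can then be compared against a threshold.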

42 What works well?

43 What’s the problem?
• Site labels
  – Who does them
  – Are they authentic
  – Has the source changed
  – A billion docs?
• Black and white lists
  – Ditto
• Text analysis of page contents
  – Poor results

