LDAP crawlers: use cases, dangers and how to cope with them
2nd OpenLDAP Developers' Day, Vienna, July 18, 2003
Peter Gietz


2 © 2003 DAASI International Agenda
• What can crawlers do?
• Proposal for a crawler policy definition
• Crawler detection mechanisms that could be implemented in the server

3 What is an LDAP crawler?
• Just like a web crawler, it collects as much data as it can
• To circumvent size limits, a crawler does not do subtree searches but one-level searches
• If it still hits a size or time limit, it can do partial searches like (cn=a*), (cn=b*), etc.
  – If it still hits a limit, it can do (cn=aa*), (cn=ab*), etc.
  – And so on...
• It follows all referrals and can start crawling the whole naming context of the referred-to server
• It could use DNS SRV records to find additional servers

4 Good or bad?
• Crawlers can thus be a threat
  – when used by bad people, such as spammers
  – but only on a public server with anonymous access
  – or they could even hack non-public servers
• From a privacy perspective there is a difference between
  – reading single public entries and
  – getting all the data into one file
• Crawlers can be useful, though, e.g. in an indexing context
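The partial-search strategy an LDAP crawler uses against size limits, (cn=a*), then (cn=aa*), and so on, can be sketched as follows. This is only an illustration under stated assumptions: the directory is simulated by a plain set of cn values, the helper names `search` and `crawl` are hypothetical, and the simulated server returns nothing at all when the limit is exceeded (a real server would typically return a partial result together with an error code).

```python
import string

def search(entries, prefix, size_limit):
    """Simulated server: answer the filter (cn=<prefix>*).

    Returns the matching cn values, or None when more than
    size_limit entries match (standing in for a size limit error).
    """
    hits = [cn for cn in entries if cn.startswith(prefix)]
    return None if len(hits) > size_limit else hits

def crawl(entries, size_limit, prefix="", alphabet=string.ascii_lowercase):
    """Collect all cn values, refining the prefix whenever a limit is hit."""
    hits = search(entries, prefix, size_limit)
    if hits is not None:
        return hits
    found = []
    # An entry whose cn equals the prefix matches none of the longer prefixes.
    if prefix in entries:
        found.append(prefix)
    # Assumption: values only use characters from `alphabet`.
    for ch in alphabet:
        found.extend(crawl(entries, size_limit, prefix + ch, alphabet))
    return found

directory = {"alice", "alan", "albert", "bob", "al"}
everything = crawl(directory, size_limit=2)
```

Even with a size limit of 2, the crawler ends up with all five values, which is why a size limit alone does not protect a public directory from being copied.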

5 Who uses crawlers?
• So far, crawlers are mostly used by "good people" who work on indexing services
• Sometimes web crawlers hit a web2LDAP gateway and get data
• I have not yet seen an LDAP crawler that was used by a spammer
• E.g.: in the directory for the German research community (AMBIX, ambix2002.directory.dfn.de) we store fake entries with working e-mail addresses that were never published elsewhere
  – We never got any spam at those addresses!
• This might change in the future!

6 So what can be done against that threat?
• To distinguish between "good" and "bad" crawlers, we should use an approach similar to the one used on the Web: robots.txt
• A crawler policy tells the crawler which behaviour is wanted and which is not
• The LDAP crawler policy should be stored in the LDAP server
• To block "bad" crawlers that don't care about the policy, we additionally need crawler detection

7 Crawler policy proposal
• Originally part of the specs of the SUDALIS crawler (a crawler developed by DAASI for SURFnet)
• Implementation of the crawler policy is on its way
• The strategy was to have a very flexible way to define policy
• The current proposal is quite old and may need a little renovation, since a lot has changed in the LDAP world since then
  – I haven't had time yet to do this update
  – Your comments are most welcome
• I will put the specs document online

8 Root DSE Attributes
( sudalis-attributetypes.1 NAME 'supportedCrawlerPolicies'
  EQUALITY objectIdentifierMatch
  SYNTAX numericOID
  USAGE directoryOperation )
• Just in case there will be different crawler policy formats

9 Root DSE Attributes contd.
( sudalis-attributetypes.2 NAME 'indexAreas'
  EQUALITY distinguishedNameMatch
  SYNTAX DN
  USAGE directoryOperation )
• Pointer to the subtrees that are to be indexed
• All other parts of the DIT are to be ignored by the crawler
• If this attribute is empty, the naming contexts are crawled instead
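How a policy-aware crawler might pick its search bases from the root DSE can be sketched like this. It is a minimal sketch, assuming the root DSE has already been read into a dict that maps attribute names to lists of values; the function name `crawl_bases` is hypothetical, and `namingContexts` is the standard root DSE attribute the slide refers to.

```python
def crawl_bases(root_dse):
    """Return the DNs a policy-aware crawler should start from.

    root_dse: dict mapping attribute names to lists of values
    (an assumed in-memory representation of the root DSE).
    """
    index_areas = root_dse.get("indexAreas") or []
    if index_areas:
        # Only the listed subtrees are to be indexed.
        return list(index_areas)
    # indexAreas missing or empty: fall back to the naming contexts.
    return list(root_dse.get("namingContexts") or [])

bases = crawl_bases({
    "namingContexts": ["dc=example,dc=org"],
    "indexAreas": ["ou=People,dc=example,dc=org"],
})
```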

10 Object class indexSubentry
( sudalis-objectclasses.1 NAME 'indexSubentry'
  DESC 'defines index crawler policy'
  SUP top STRUCTURAL
  MUST ( cn )
  MAY ( indexCrawlerDN $ indexCrawlerAuthMethod $ indexObjectClasses $
        indexAttributes $ indexFilter $ indexAreaLevels $
        indexCrawlerVisitFrequency $ indexDescription ) )
• These entries specify the policy
• An alternative could be SUP subentry

11 Attribute Type indexCrawlerDN
( sudalis-attributetypes.3 NAME 'indexCrawlerDN'
  EQUALITY distinguishedNameMatch
  SYNTAX DN )
# USAGE directoryOperation )
• Defines for which crawler(s) this policy is meant
• Several subentries for different crawlers
• If this attribute is empty, the policy is meant for all crawlers
• If not empty, the crawler has to bind with this DN
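Selecting the policy subentries that apply to a given crawler could look roughly like the sketch below. The assumptions are that each indexSubentry is represented as a dict of attribute lists, that an anonymous crawler is represented by `None`, and that DN comparison is reduced to a crude normalisation that does not handle escaped commas (real distinguishedNameMatch is more involved). All function names are hypothetical.

```python
def normalize_dn(dn):
    """Crude DN normalisation: lower-case and trim each RDN.

    Simplification: DNs containing escaped commas are not handled.
    """
    return ",".join(part.strip().lower() for part in dn.split(","))

def applicable_policies(policies, bind_dn):
    """Return the policy subentries meant for the crawler bound as bind_dn.

    policies: list of dicts, one per indexSubentry (assumed representation).
    bind_dn:  DN the crawler is bound with, or None for anonymous.
    """
    selected = []
    for policy in policies:
        crawler_dns = policy.get("indexCrawlerDN") or []
        if not crawler_dns:
            # Empty indexCrawlerDN: the policy is meant for all crawlers.
            selected.append(policy)
        elif bind_dn is not None and normalize_dn(bind_dn) in {
            normalize_dn(dn) for dn in crawler_dns
        }:
            selected.append(policy)
    return selected

policies = [
    {"cn": ["public"], "indexCrawlerDN": []},
    {"cn": ["sudalis"], "indexCrawlerDN": ["cn=sudalis,ou=Crawlers,dc=example,dc=org"]},
]
for_anonymous = applicable_policies(policies, None)
```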

12 Attribute Type indexCrawlerAuthMethod
( sudalis-attributetypes.4 NAME 'indexCrawlerAuthMethod'
  SYNTAX directoryString
  EQUALITY caseIgnoreMatch )
# USAGE directoryOperation )
• Defines the authentication method the crawler has to use

13 Attribute Type indexObjectClasses
( sudalis-attributetypes.5 NAME 'indexObjectClasses'
  SYNTAX OID
  EQUALITY objectIdentifierMatch )
# USAGE directoryOperation )
• Defines which objectClass attribute values to include in the index
• These are not filter criteria!
• Models the LDIF entry that is to be put into the index
• Needed to prevent the crawler from storing internal object classes in the index

14 Attribute Type indexAttributes
( sudalis-attributetypes.6 NAME 'indexAttributes'
  SYNTAX OID
  EQUALITY objectIdentifierMatch )
# USAGE directoryOperation )
• Defines which attributes to crawl
• The crawler must not take any other attributes
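Taken together, indexObjectClasses and indexAttributes describe the shape of the entry that ends up in the index. A minimal sketch of how a crawler could apply them is given below. It assumes entries and policies are dicts of attribute lists, that the policy lists attributes and object classes by name rather than by numeric OID, and that an empty list means nothing is allowed; the slides do not specify these points. The function name `apply_policy` is hypothetical.

```python
def apply_policy(entry, policy):
    """Reduce an entry to what the policy allows in the index.

    entry:  dict mapping attribute names to lists of values.
    policy: dict with 'indexAttributes' and 'indexObjectClasses' lists
            (names, not numeric OIDs; an assumption for this sketch).
    """
    allowed_attrs = {a.lower() for a in policy.get("indexAttributes", [])}
    allowed_classes = {c.lower() for c in policy.get("indexObjectClasses", [])}
    reduced = {}
    for name, values in entry.items():
        if name.lower() == "objectclass":
            # Keep only the objectClass values named in the policy.
            kept = [v for v in values if v.lower() in allowed_classes]
            if kept:
                reduced[name] = kept
        elif name.lower() in allowed_attrs:
            reduced[name] = list(values)
    return reduced

entry = {
    "cn": ["Ann Example"],
    "mail": ["ann@example.org"],
    "userPassword": ["secret"],
    "objectClass": ["person", "internalAccount"],
}
policy = {"indexAttributes": ["cn", "mail"], "indexObjectClasses": ["person"]}
indexed = apply_policy(entry, policy)
```

Here `userPassword` and the internal object class `internalAccount` never reach the index, which is the purpose stated on the slides.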

15 Attribute Type indexFilter
( sudalis-attributetypes.7 NAME 'indexFilter'
  SYNTAX directoryString
  EQUALITY caseExactMatch
  SINGLE-VALUE )
# USAGE directoryOperation )
• Filter that MUST be used by the crawler

16 Attribute Type indexAreaLevels
( sudalis-attributetypes.8 NAME 'indexAreaLevels'
  SYNTAX INTEGER
  EQUALITY integerMatch
  SINGLE-VALUE )
# USAGE directoryOperation )
• Number of hierarchy levels to crawl
• If 0, the crawler MUST leave this subtree of the DIT
• If empty, there are no restrictions for the crawler as to the depth
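A depth check based on indexAreaLevels might be sketched as follows. The slide does not say how levels are counted, so this sketch assumes that the base entry of the index area counts as level 1, which makes a value of 0 exclude the whole subtree. Both function names are hypothetical, and the DN handling ignores escaped commas.

```python
def relative_depth(entry_dn, base_dn):
    """Number of RDNs by which entry_dn lies below base_dn (0 for the base).

    Simplification: DNs with escaped commas are not handled.
    """
    entry_rdns = [rdn.strip().lower() for rdn in entry_dn.split(",")]
    base_rdns = [rdn.strip().lower() for rdn in base_dn.split(",")]
    if entry_rdns[-len(base_rdns):] != base_rdns:
        raise ValueError("entry is not located under the base DN")
    return len(entry_rdns) - len(base_rdns)

def within_area_levels(depth, index_area_levels):
    """Decide whether an entry at the given relative depth may be crawled.

    index_area_levels: integer from the policy, or None if the attribute
    is empty (no depth restriction).
    Assumption: the base entry counts as level 1, so 0 excludes everything.
    """
    if index_area_levels is None:
        return True
    return depth < index_area_levels

depth = relative_depth("cn=Ann,ou=People,dc=example,dc=org", "dc=example,dc=org")
allowed = within_area_levels(depth, 3)
```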

17 Attribute Type indexCrawlerVisitFrequency
( sudalis-attributetypes.9 NAME 'indexCrawlerVisitFrequency'
  SYNTAX INTEGER
  EQUALITY integerMatch
  SINGLE-VALUE )
# USAGE directoryOperation )
• Defines how often the data of the specified subtree are to be crawled
• The value represents a time period in seconds
• The crawler MAY crawl less frequently
• but MUST NOT crawl more frequently than stated here
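On the crawler side, honouring indexCrawlerVisitFrequency comes down to a simple time comparison. The sketch below assumes timestamps in seconds (for example from `time.time()`) and uses `None` for "no previous visit" and for "attribute not set"; the function name `may_crawl_now` is hypothetical.

```python
def may_crawl_now(last_visit, now, visit_frequency):
    """Check whether a new crawl respects indexCrawlerVisitFrequency.

    last_visit:      timestamp of the previous crawl in seconds, or None.
    now:             current timestamp in seconds.
    visit_frequency: minimum period between crawls in seconds, or None
                     if the policy does not set the attribute.
    """
    if last_visit is None or visit_frequency is None:
        return True
    # Crawling later than required is allowed, crawling earlier is not.
    return now - last_visit >= visit_frequency

one_day = 24 * 60 * 60
too_early = may_crawl_now(last_visit=0, now=3600, visit_frequency=one_day)
```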

18 Attribute Type indexDescription
( sudalis-attributetypes.10 NAME 'indexDescription'
  SYNTAX directoryString
  EQUALITY caseExactMatch
  SINGLE-VALUE )
# USAGE directoryOperation )
• Human-readable description of the policy defined in the subentry

19 Crawler Policy and Access control
• Crawler policy is interpreted by the client
• Access control is interpreted by the server
• Access control should be used to enforce crawler policy

20 Crawler registration
• A crawler can register with a server by providing the following data:
  – Name of the crawler
  – Description of the index the crawler collects data for
  – URI where the index can be accessed
  – Pointer to a privacy statement about how the data will be used. This statement should comply with the P3P standard
  – E-mail address of the crawler manager
  – Method and needed data (public key) for encrypted e-mail (PGP or S/MIME)

21 Crawler registration contd.
• The data from the crawler will be entered in a dedicated entry, together with additional information:
  – Date of registration
  – Pointer to the person who made the decision
  – Date of last visit of the crawler
  – …
  – Password
• This entry will be used for the indexCrawlerDN

22 Crawler Detection
• So what about non-conforming crawlers?
• They have to be detected on the server side
• If crawler detection does not succeed, this might be the end of public directories

23 Crawler characteristics
• The IP address can be used for identification
• Regular requests over a certain period of time
• Humans are slower, so more than X requests per 10 seconds must be a crawler
• Patterns in searching:
  – One-level search at every entry returned in the previous result
  – "(cn=a*)", etc.
• Known spammer IPs could be blocked
• Known spamming countries could be blocked as well
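Two of the characteristics above, request rate per IP address and prefix patterns such as "(cn=a*)", lend themselves to a simple server-side check. The following is only an illustrative sketch, not OpenLDAP code: the class `RateDetector`, the function `looks_like_prefix_sweep`, the thresholds and the filter regular expression are all assumptions made for this example.

```python
import re
from collections import defaultdict, deque

# Matches simple prefix filters such as (cn=a*) or (cn=ab*).
PREFIX_FILTER = re.compile(r"^\(([A-Za-z][A-Za-z0-9-]*)=([^*()]+)\*\)$")

class RateDetector:
    """Flag clients that send more than max_requests per time window."""

    def __init__(self, max_requests, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._requests = defaultdict(deque)  # IP -> recent timestamps

    def record(self, ip, timestamp):
        """Register one request; return True if the IP looks like a crawler."""
        recent = self._requests[ip]
        recent.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_requests

def looks_like_prefix_sweep(filters, min_distinct=3):
    """True if one attribute is probed with many different prefixes."""
    prefixes = defaultdict(set)
    for search_filter in filters:
        match = PREFIX_FILTER.match(search_filter.strip())
        if match:
            prefixes[match.group(1).lower()].add(match.group(2).lower())
    return any(len(seen) >= min_distinct for seen in prefixes.values())

detector = RateDetector(max_requests=3)
flags = [detector.record("192.0.2.1", t) for t in (0.0, 1.0, 2.0, 3.0)]
sweep = looks_like_prefix_sweep(["(cn=a*)", "(cn=b*)", "(cn=c*)"])
```

A crawler that sleeps randomly between requests would slip under the rate check, so the pattern check and the rate check are best seen as complementary.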

24 The further future
• What if intelligent crawlers try to hide their characteristics?
  – Random sleep in between requests ...
• How publicly should this discussion take place?
• What else could we do?
• Ideas wanted and needed!
• Even more needed: implementation of simple crawler detection as described above in OpenLDAP!

25 BTW: Schema Registry Project update
• Work almost finished
• Bar BoF meeting at the IETF, deciding:
  – Either publish the old drafts as RFCs
  – Or Chris and Peter work on the drafts first to resolve some issues
  – There is currently no need for a new WG
• Project web site now is:
• Service still experimental
• Pilot service will start at the beginning of August
• Last document, on the business model, will come soon too

26 Thanks for your attention
• Any questions?
• More info at: