Ryerson University Library and Archives
Searching the Deep Web
Winter 2012
Virtual Parking Lot
If you have questions that are too time-consuming, theoretical, or technical to be addressed in this introductory session, send them to Jay Wolofsky … The answer to your question(s) will be shared with the group.
The Deep Web
The Deep Web is currently 400 to 500 times larger than the commonly defined Surface Web, or WWW (7,500 terabytes of information, compared with 19 terabytes in the Surface Web), and it is growing exponentially.
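The two figures in parentheses imply a size ratio that can be checked with simple arithmetic. The sketch below uses only the numbers quoted on the slide; the result, roughly 395, sits just below the low end of the stated 400-to-500 range, so that range is best read as an approximation.

```python
# Figures quoted on the slide, in terabytes (taken as given, not re-measured here)
DEEP_WEB_TB = 7_500
SURFACE_WEB_TB = 19

def size_ratio(deep_tb, surface_tb):
    """How many times larger the first collection is than the second."""
    return deep_tb / surface_tb

ratio = size_ratio(DEEP_WEB_TB, SURFACE_WEB_TB)
print(f"The Deep Web is about {ratio:.0f} times the size of the Surface Web")
# prints: The Deep Web is about 395 times the size of the Surface Web
```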
Deep Web/Surface Web
The Deep Web (a.k.a. the Invisible Web) contains high-quality information that is not accessible from conventional search engines such as Google.
Deep Web/Surface Web
Structured information contained in research databases cannot be accessed from the Surface Web.
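One way to picture why database content stays hidden: a crawler discovers pages only by following links, while a research database generates its result pages in response to a query typed into a search form. The toy model below is a simplification, not a description of how any particular search engine works; the site, its paths, and the `database_query` helper are all hypothetical.

```python
from collections import deque

# Hypothetical toy site: each static page lists the pages it links to.
# "/catalog/search" is a search form rather than a list of links.
SITE_LINKS = {
    "/": ["/about", "/catalog/search"],
    "/about": ["/"],
    "/catalog/search": [],
}

def database_query(term):
    """Hypothetical research database: a result page exists only once a query is submitted."""
    return f"/catalog/results?q={term}"

def crawl(start):
    """Breadth-first, link-following crawl, the way a surface-web spider discovers pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in SITE_LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

indexed = crawl("/")
print(sorted(indexed))                        # ['/', '/about', '/catalog/search']
print(database_query("pagerank") in indexed)  # False: no static link ever leads there
```

The crawler reaches the search form itself but never the records behind it, because those pages are produced only when someone submits a query.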
Deep Web/Surface Web
The real problem is the spidering and crawling technology used by conventional search engines, which return links based on popularity, not content. Surface Web search results are ranked by how frequently other documents link to them (PageRank). The first results are those that have been referenced most often by other documents, and not necessarily those with the most relevant or recent information or content.
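The popularity-over-content point can be illustrated with a minimal sketch of the classic PageRank idea (power iteration). The four-page link graph is hypothetical and the damping factor of 0.85 is the commonly used value, assumed here; production search engines combine many more signals than this. Note that the function never looks at page text: a page scores highly simply because many pages link to it.

```python
# Hypothetical link graph: each page maps to the pages it links to.
# Every page has at least one outgoing link, so no dangling-page handling is needed.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank: rank flows along links, never based on content."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, plus rank passed along inbound links.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page, outlinks in links.items():
            share = d * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))  # order printed: C, A, B, D
```

Page C, which three other pages link to, comes first; page D, which nothing links to, comes last regardless of what it contains.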
Federated Search Engines
Federated search engines execute simultaneous, real-time searches of the Deep Web using sophisticated software “connectors.” The results are collated and presented back to the user in a unified format.
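The connector model can be sketched as follows, assuming each connector is a function that queries one database and returns records in that database's native format. Both connectors, their record fields, and `federated_search` are hypothetical stand-ins; commercial federated search products use their own connectors, which are not described here. The sketch sends the query to every source concurrently with a thread pool, then maps each source's records into one common shape.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical connectors: each queries one database and returns its native record format.
def business_db_connector(query):
    return [{"Title": f"{query} market report", "Src": "BusinessDB"}]

def medical_db_connector(query):
    return [{"heading": f"{query} clinical review", "origin": "MedDB"}]

# Each connector is paired with a function that maps its records to the unified format.
CONNECTORS = [
    (business_db_connector, lambda r: {"title": r["Title"], "source": r["Src"]}),
    (medical_db_connector, lambda r: {"title": r["heading"], "source": r["origin"]}),
]

def federated_search(query):
    """Send the query to every connector at once, then collate results in one format."""
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        # Submit all searches before waiting on any, so they run simultaneously.
        futures = [(pool.submit(connector, query), normalize)
                   for connector, normalize in CONNECTORS]
        results = []
        for future, normalize in futures:
            results.extend(normalize(record) for record in future.result())
    return results

for hit in federated_search("obesity"):
    print(hit["source"], "-", hit["title"])
```

Keeping the per-source mapping next to each connector means adding a new database only requires a new connector and its mapping; the collation step stays unchanged.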
Federated Search Engines
One type, a ‘web spider’ variant, crawls information from as many databases as possible, creating a giant uniform index (e.g., Google Scholar). A more advanced type searches across each database’s own indexing AND crawls information (e.g., Biznar, Mednar, DeepDyve).
Federated Search Engines
There are three general types:
1. The first type searches across each database using its own indexing.
2. The second type, a ‘web spider,’ crawls information from as many databases as possible, creating a giant uniform index (e.g., Google Scholar, OpenDOAR).
3. The third type searches across each database’s own indexing AND crawls information (e.g., Biznar, Mednar, DeepDyve …).
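The difference between searching each database's own index at query time and crawling everything into one uniform index ahead of time can be shown with a toy model. The two databases, their records, and every function below are hypothetical; `native_search` stands in for a database's own search, and the third type is represented only by a naive merge of both result sets, not by how any named product actually combines or ranks them.

```python
from collections import defaultdict

# Two hypothetical databases: record ID -> record text.
DATABASES = {
    "db_a": {"a1": "deep web search tools", "a2": "business news archive"},
    "db_b": {"b1": "medical search engine", "b2": "deep web medical data"},
}

def native_search(db_name, term):
    """Stand-in for one database searching its own records at query time."""
    return {rid for rid, text in DATABASES[db_name].items() if term in text.split()}

def build_uniform_index():
    """Type 2: crawl every database in advance into one giant inverted index."""
    index = defaultdict(set)
    for records in DATABASES.values():
        for rid, text in records.items():
            for word in text.split():
                index[word].add(rid)
    return index

def type1_search(term):
    """Type 1: query each database through its own search, live."""
    return set().union(*(native_search(db, term) for db in DATABASES))

def type2_search(index, term):
    """Type 2: look the term up in the pre-built crawled index."""
    return set(index.get(term, set()))

def type3_search(index, term):
    """Type 3 (naive sketch): merge live native results with crawled-index results."""
    return type1_search(term) | type2_search(index, term)

index = build_uniform_index()                    # the crawl happens now
DATABASES["db_b"]["b3"] = "new deep web study"   # record added after the crawl

print(sorted(type1_search("deep")))         # ['a1', 'b2', 'b3']
print(sorted(type2_search(index, "deep")))  # ['a1', 'b2']
print(sorted(type3_search(index, "deep")))  # ['a1', 'b2', 'b3']
```

A pre-built index reflects the databases only as they were when they were crawled, which is one reason results from the crawling approach and the live-query approach can differ.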
Accessing Deep Web Content
- Biznar (Business)
- DeepDyve (Multidisciplinary)
- E-Print Network (Science and Technology)
- Google Scholar (Multidisciplinary)
- HighBeam (Multidisciplinary)
- HighWire (Multidisciplinary)
- Mednar (Medicine)
- MetaPress (Multidisciplinary)
- OpenDOAR (Multidisciplinary)
- Science.gov (Science and Technology)
- Scirus (Science and Technology)
- Social Science Research Network (Social Sciences)
- World Wide Science (Science and Technology)