A Web Services Search Engine CS 8803 [AIA] - Spring 2008 Roland Krystian Alberciak Piotr Kozikowski Sudnya Padalikar Tushar Sugandhi
Outline
Project Overview
Searching Web-services
 o Tools / APIs
 o How to figure out what information to show
Results: Working prototype
 o Locate, classify, rank, and present web-services
System Integration
 o Diversity! Languages (no joke): Python, Ruby on Rails, PHP, C#, Java, Perl. Databases: MySQL, MSSQL
Project Overview
Step 1 - There are web-services available on the web
Step 2 - (Challenges) Web-services are harder to find than web pages because:
 o Registering a service takes effort
 o Directories are disconnected
 o No clustering available
 o No ranking available
Step 3 - Profit
 o Should be beneficial for web developers
 o Should be beneficial for us
What is out there?
Swoogle - “10,000 ontologies” (more concerned with the “semantic web” and “metadata” than with web services)
Programmableweb - 726 (only APIs)
"Yellow pages" web-services
XMethods web-services
UDDI - Discontinued, but many web services used it to advertise themselves.
Survey of the Market
We found solutions for Step 2!
Step 1. Have web-services available on the web
Step 2. (Solutions) Crawler, database, web application, a bunch of clustering algorithms, and lots of "glue"
Step 3. Our proposed solution - Web Slogger!
 - for us: content based advertising
 - for users: easy way to search for web-services
System Architecture
Crawling
Yahoo!
Why not Google? Restricted extraction: could not extract many results
What about Alexa? Couldn't afford it! :-)
What did we crawl for? .wsdl and .asmx files
How is Webslogger different from the Yellow Pages project (last year's class project)? Multiple language support
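The crawl target above (.wsdl and .asmx files) can be sketched as a filter over search-result URLs. This is a minimal illustration, not the project's actual crawler: the function name `candidate_wsdl_urls` and the example URLs are hypothetical. The one external fact it relies on is that ASP.NET `.asmx` endpoints conventionally serve their service description at `?WSDL`.

```python
from urllib.parse import urlparse, urlunparse


def candidate_wsdl_urls(urls):
    """Keep only URLs that look like web-service descriptions.

    Hypothetical helper: .wsdl URLs are kept as-is, .asmx endpoints are
    rewritten to their conventional ?WSDL description URL, and everything
    else is dropped. Duplicates are removed, order is preserved.
    """
    seen = set()
    result = []
    for url in urls:
        parsed = urlparse(url)
        path = parsed.path.lower()
        if path.endswith(".wsdl"):
            wsdl = url
        elif path.endswith(".asmx"):
            # Replace any existing query (e.g. ?op=Foo) with ?WSDL.
            wsdl = urlunparse(parsed._replace(query="WSDL", fragment=""))
        else:
            continue
        if wsdl not in seen:
            seen.add(wsdl)
            result.append(wsdl)
    return result


if __name__ == "__main__":
    hits = [
        "http://example.com/a.wsdl",
        "http://example.com/index.html",
        "http://example.com/Svc.asmx?op=Foo",
    ]
    print(candidate_wsdl_urls(hits))
```

Matching on the URL path rather than the full string avoids false positives from query strings that merely mention ".wsdl".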
Categorization and Clustering
Glossaries
 o Hierarchical categorization (27 categories)
 o List of keywords for each category (2800 keywords)
Web Service Partitioning By Importance
 o Some sections of a web service are more important than others, e.g. Service Name / Operation Name is more important than message type name.
Affinity Vector
 o Weight assigned to each term in a web service based on its mapping to the glossary
 o Determines which web service belongs to which category
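One way to picture the affinity vector is the sketch below. It is an assumption-laden illustration, not the project's implementation: the section weights, the two-category glossary, and the function names are all hypothetical, and terms are assumed to be already tokenized and lowercased per WSDL section.

```python
# Hypothetical weights: service and operation names count more than
# message type names, as the slide suggests. Values are illustrative.
SECTION_WEIGHTS = {"service": 3.0, "operation": 2.0, "message": 1.0}

# Hypothetical miniature glossary (the real one had 27 categories).
GLOSSARY = {
    "weather": {"weather", "forecast", "temperature"},
    "finance": {"stock", "quote", "currency"},
}


def affinity_vector(terms_by_section, glossary=GLOSSARY, weights=SECTION_WEIGHTS):
    """Return {category: score}, summing section weights of matching terms."""
    vector = {category: 0.0 for category in glossary}
    for section, terms in terms_by_section.items():
        weight = weights.get(section, 0.0)
        for term in terms:
            for category, keywords in glossary.items():
                if term.lower() in keywords:
                    vector[category] += weight
    return vector


def categorize(terms_by_section, glossary=GLOSSARY, weights=SECTION_WEIGHTS):
    """Pick the category with the highest affinity, or None if nothing matched."""
    vector = affinity_vector(terms_by_section, glossary, weights)
    best = max(vector, key=vector.get)
    return best if vector[best] > 0 else None


if __name__ == "__main__":
    service = {
        "service": ["weather"],
        "operation": ["get", "forecast"],
        "message": ["quote"],
    }
    print(affinity_vector(service), categorize(service))
```

In this sketch a stray finance term in a message type name is outweighed by weather terms in the service and operation names, which is the effect the partitioning by importance is meant to produce.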
Ranking Insight
Fundamental Difference: Web page ranking is based on inlinks and outlinks. Web service ranking should be based on objects and web methods.
Recall: Our results are extracted from search engines.
Therefore: We don't know how many pages link to a particular WSDL file. Search engine algorithms [e.g. PageRank] have this data and can assert the 'popularity' and 'credibility' of hubs which locate sources.
Resolution: We must find alternate ways to rank content.
Ranking Options
1. Community Level: Collaborative Ranking
 o Users can leave comments and Likert scale rankings
 o Rank good users / bad users in the community: experts
2. User Level: Usage statistic ranking
 o How long you view a WSDL
 o Whether you go back to look at it again [since it is like an API...]
 o Inquire about which WSDL files users used to achieve a goal
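The community-level option could, for example, combine Likert ratings with a per-user reputation so that experts count more. The sketch below is only one possible reading of that idea: the reputation-weighted mean, the function name, and the sample reputations are assumptions, not something the project is known to have built.

```python
def weighted_likert_score(ratings, reputation, default_reputation=1.0):
    """Reputation-weighted mean of 1..5 Likert ratings.

    ratings:    list of (user, score) pairs, score in 1..5
    reputation: {user: weight}; hypothetical measure of how much the
                community trusts that user (experts get larger weights)
    Returns None when there are no usable ratings.
    """
    total = 0.0
    weight_sum = 0.0
    for user, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"Likert score out of range: {score}")
        weight = reputation.get(user, default_reputation)
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else None


if __name__ == "__main__":
    ratings = [("alice", 5), ("bob", 1)]
    reputation = {"alice": 3.0, "bob": 1.0}  # alice is treated as an expert
    print(weighted_likert_score(ratings, reputation))
```

Usage statistics such as view time or return visits could be folded in as additional weighted terms, but how to scale them against explicit ratings is left open here.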
Ranking Options (contd.)
3. Use page ranking provided by Google / Yahoo
4. File Level: Quality of file: "Do You Care if Your WSDL is W3C Compliant?"
 o Good format, thoroughness. Heuristics on model files.
5. Generate referral chain from WSDL
 o Understand citation network in order to determine valuable web services
 o Web services often use methods / objects from other web services. Use this linking to rank web services.
... element="xsd1:SubscriptionHeader"/>...
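The referral-chain idea (ranking option 5) can be sketched by reading the import elements of each WSDL file and counting how often each location is referenced by other files. This is a rough illustration under stated assumptions: it looks only at WSDL 1.1 `import` (`location` attribute) and XML Schema `import` (`schemaLocation` attribute), does not resolve relative locations, and the function names and example documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"   # WSDL 1.1 namespace
XSD_NS = "http://www.w3.org/2001/XMLSchema"    # XML Schema namespace


def referenced_locations(wsdl_text):
    """Return the set of locations a WSDL document imports."""
    root = ET.fromstring(wsdl_text)
    refs = set()
    for element in root.iter(f"{{{WSDL_NS}}}import"):
        location = element.get("location")
        if location:
            refs.add(location)
    for element in root.iter(f"{{{XSD_NS}}}import"):
        location = element.get("schemaLocation")
        if location:
            refs.add(location)
    return refs


def inbound_reference_counts(documents):
    """documents: {url: wsdl_text}. Count how many documents import each location."""
    counts = Counter()
    for url, text in documents.items():
        for target in referenced_locations(text):
            if target != url:  # ignore self-references
                counts[target] += 1
    return counts


if __name__ == "__main__":
    doc_a = (
        '<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">'
        '<import namespace="urn:c" location="http://example.com/common.wsdl"/>'
        "</definitions>"
    )
    print(inbound_reference_counts({"http://example.com/a.wsdl": doc_a}))
```

The resulting counts play the role that inlink counts play for web pages: a description imported by many services is a candidate for a higher rank.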
Future work Develop our own crawler Further improve clustering (there is always room for that!) Figure out an innovative (&& effective) way for ranking Location based clustering
Questions?