Search for Personal Information Using Yahoo! BOSS
by Evgeny Dosychev and Dmitry Kichin
Supervisor: Eddie Bortnikov


HomePage Project
Finding personal information on the web is not an easy task. We want to create an automatic tool that finds and presents personal information about a requested person.

Technical Issues  We need an effective way to find information in the web. We will use Yahoo BOSS.  Personal information on the web is not in a standart format. We focus on working with IEEE pdf documents.

Technical Issues  How will we parse the info and identify the differnt details? PDF to Text - using special Java package. Using the standrt structure of the IEEE documents.  How will we avoid confusion between different people with the same name (name ambiguity)? Divide the info to clusters. Let the user make the choise between the clusters*.

Technologies
• Java: used to build the Windows desktop application.
• Yahoo! BOSS: provides free access to the Yahoo! search index.
• PDFBox: a Java library used to extract text from PDF documents.

BOSS
Yahoo! Search BOSS (Build your Own Search Service) is a Yahoo! initiative that gives developers free access to the Yahoo! Search index. The results can be fed into the developer's own application, which can then process them as needed. Up to 500 results can be retrieved. (Source: Wikipedia)
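Assuming results are fetched in fixed-size pages, retrieving up to 500 of them means issuing several requests with increasing offsets. The sketch below shows only that paging arithmetic. The endpoint and the `q`, `start` and `count` parameter names are hypothetical placeholders, not taken from the BOSS documentation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Splits a result request into fixed-size pages.
 * The endpoint and the "q", "start" and "count" parameter names are HYPOTHETICAL
 * placeholders; they are not taken from the BOSS documentation.
 */
final class PagedQueryBuilder {

    /** Upper limit on retrievable results, as stated on the slide. */
    static final int MAX_RESULTS = 500;

    private PagedQueryBuilder() {}

    /** Returns one request URL per page, never asking for more than MAX_RESULTS in total. */
    static List<String> pageUrls(String endpoint, String query, int pageSize, int wanted) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int total = Math.min(Math.max(wanted, 0), MAX_RESULTS);
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8); // spaces become '+'
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < total; start += pageSize) {
            int count = Math.min(pageSize, total - start); // the last page may be shorter
            urls.add(endpoint + "?q=" + q + "&start=" + start + "&count=" + count);
        }
        return urls;
    }
}
```

For example, with a page size of 50 and 1000 results requested, the request is capped at 500 results, which produces 10 URLs with offsets 0, 50, ..., 450.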

HomePage Functionality
• Desktop Java application.
• Gets the search target from the user.
• Searches the web using Yahoo! BOSS.
• Downloads and parses PDF documents and images, and produces an HTML page with the information found. (Currently this includes: , publication titles, short publication summaries, images, and links to the full documents.)
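The final step, producing an HTML page from what was found, might look like the following sketch. The `Publication` fields mirror items listed on the slide (title, short summary, link to the full document); the class names and page layout are hypothetical. Escaping is included because text extracted from PDFs can contain characters such as `<` or `&`.

```java
import java.util.List;

/** One publication found on the web; the fields mirror items listed on the slide. */
final class Publication {
    final String title;
    final String summary;
    final String link;

    Publication(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

/** Renders found publications as a minimal HTML page; the markup layout is an assumption. */
final class HtmlPageWriter {

    private HtmlPageWriter() {}

    /** Escapes the five characters that are special in HTML text and attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the whole page: a heading with the person's name and one list item per publication. */
    static String render(String person, List<Publication> pubs) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>")
            .append(escape(person)).append("</title></head><body>\n")
            .append("<h1>").append(escape(person)).append("</h1>\n<ul>\n");
        for (Publication p : pubs) {
            html.append("<li><a href=\"").append(escape(p.link)).append("\">")
                .append(escape(p.title)).append("</a><p>")
                .append(escape(p.summary)).append("</p></li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }
}
```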

HomePage Functionality
• Divides the information into clusters (based on the key = ).
• Asks the user to choose which information fits.
• Produces an HTML page with all the details.

Screenshots

Clustering Algorithm
It is very hard for a computer to resolve name ambiguity, so we leave this task to the user. Each group of information items (a cluster) is defined by its key ( ), and the user makes the choice. The result page is produced from the chosen clusters.
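The clustering and selection steps described above could be sketched as grouping items by key in an insertion-ordered map, then keeping only the clusters whose keys the user picked. The slides do not name the key, so `InfoItem.key` is a hypothetical placeholder; keys are compared exactly, with no normalization.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A found information item; "key" stands in for the clustering key, which the slides do not name. */
final class InfoItem {
    final String key;   // HYPOTHETICAL clustering key
    final String text;  // the extracted detail itself

    InfoItem(String key, String text) {
        this.key = key;
        this.text = text;
    }
}

final class KeyClustering {

    private KeyClustering() {}

    /** Groups items by exact key, keeping clusters in the order their keys first appear. */
    static Map<String, List<InfoItem>> cluster(List<InfoItem> items) {
        Map<String, List<InfoItem>> clusters = new LinkedHashMap<>();
        for (InfoItem item : items) {
            clusters.computeIfAbsent(item.key, k -> new ArrayList<>()).add(item);
        }
        return clusters;
    }

    /** Returns the items of the clusters the user chose, in cluster order. */
    static List<InfoItem> selected(Map<String, List<InfoItem>> clusters, Set<String> chosenKeys) {
        List<InfoItem> result = new ArrayList<>();
        for (Map.Entry<String, List<InfoItem>> e : clusters.entrySet()) {
            if (chosenKeys.contains(e.getKey())) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }
}
```

Keeping first-seen order means the clusters are presented to the user in the order the search results arrived, and the result page is built only from what `selected` returns.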

Workflow

Class Diagram

Flow Diagram

Challenges
• PDFBox proved unreliable and problematic; it is not the best solution for PDF parsing.
• Perhaps the main challenge was semantic parsing (finding information in the text). We discovered that semantic parsing is in itself a very difficult task, requiring time and resources beyond the scope of the project.

Conclusions
• We learned the principles behind the BOSS project and used the capabilities it provides.
• We built a well-designed object-oriented infrastructure for the task.
• HomePage can serve as a good infrastructure for adding algorithms that find additional information in the texts.
• Extracting and identifying information in the text requires specific algorithms and methods.
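One way an infrastructure could accept additional extraction algorithms is a small plug-in interface. In this hypothetical sketch, each algorithm implements `InfoExtractor`, and a pipeline runs every registered extractor over the same text. `InfoExtractor`, `YearExtractor` and `ExtractorPipeline` are illustrative names, not classes from the project, and the year rule is deliberately simple.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** HYPOTHETICAL plug-in point: each implementation finds one kind of detail in a document's text. */
interface InfoExtractor {
    /** Label used for this kind of detail on the result page. */
    String name();

    /** Returns every match found in the text, in order of appearance. */
    List<String> extract(String text);
}

/** Example plug-in: standalone four-digit years from 1900 to 2099. */
final class YearExtractor implements InfoExtractor {
    private static final Pattern YEAR = Pattern.compile("\\b(?:19|20)\\d{2}\\b");

    @Override
    public String name() {
        return "year";
    }

    @Override
    public List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = YEAR.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}

/** Runs every registered extractor over the same text; adding an algorithm means registering it. */
final class ExtractorPipeline {
    private final List<InfoExtractor> extractors = new ArrayList<>();

    ExtractorPipeline register(InfoExtractor extractor) {
        extractors.add(extractor);
        return this;
    }

    /** Maps each extractor's name to its results, in registration order. */
    Map<String, List<String>> run(String text) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (InfoExtractor e : extractors) {
            results.put(e.name(), e.extract(text));
        }
        return results;
    }
}
```

With this design, a new algorithm is added by writing one class and one `register` call, without changing the search, clustering or page-generation code.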