Applying Grid Computing Research to Commercial IR Applications Presented by Carl Sylvia, SBIR Project Manager Deep Web Technologies, LLC GGF-14 – June.

Presentation transcript:

Applying Grid Computing Research to Commercial IR Applications Presented by Carl Sylvia, SBIR Project Manager Deep Web Technologies, LLC GGF-14 – June 28, 2005

Deep Web Technologies…
Founded in 2002 by Abe Lederman, an expert in the field of information retrieval
Specialize in federated search and ranking of “deep web” content
Build custom solutions based on proprietary Distributed Explorit™ software
Develop and maintain highly visible, innovative applications for DOE OSTI
–science.gov
–e-prints network
Developing next-generation search technologies for DOE OSTI

The Deep Web
Generally consists of high-quality managed content: scientific, technical, and business documents and data
Much larger than the “surface web”
Often requires interaction with a CGI script or web service
May require authentication and authorization for access
Unreachable by the standard web crawling / indexing approach

Who's Interested in the DW?
The Federal government spends over $127B a year on R&D
CENDI is an interagency working group of Scientific and Technical Information Managers from 12 federal agencies
CENDI members represent over 96% of the total federal R&D funding

Who's Interested (Cont'd)
Members of CENDI include
–DOE, DOD, EPA, DOI, DOC
Currently have tens of millions of technical documents stored in the deep web
Mandate for making results of publicly funded research available to citizens

OSTI (DOE Office of Scientific and Technical Information)
Tasked with making the vast quantities of quality R&D output available across organizations and to the “science attentive citizen”
Funds the development and maintenance of the e-prints and science.gov search portals
Currently funding our next-generation grid solution for deep web searching

Science.gov
Flagship search engine
High-quality managed content
Scientific, technical, and business information
Science.gov alliance
Members
–DOE, DOD, USGS, USFS, NASA

DWT’s DOE SBIR Project
Distributed Relevance Ranking in Heterogeneous Document Collections
Phase I – August 2003 to April 2004
Phase II – July 2004 to July 2006
Science.gov Fall '04
Science.gov 4.0 – Fall '05
Phase II consultant – Professor Geoffrey Fox

Project Goals
Perform precision searching across hundreds of heterogeneous sources
Return a higher percentage of the most relevant documents (recall and precision)
–Analyze a richer set of metadata
–Selectively download and index full-text documents
–Customize ranking algorithms to improve precision and recall
Support mining and analysis of search results
–Multi-level filtering

Problem Description
Distributed content
–We do not own the content being searched
–Resides at content owners' facilities
Heterogeneous content
–Service level
–Access methods
–Content quality
Large quantities of streaming data
–Tens of millions of documents (over 47 million currently searched by science.gov)

Our Grid-Based Solution
Uses open standards (Web Services, WSDL, SOAP, XML)
Runs on distributed nodes
Is platform independent (Java based)
Enables scalability
Very flexible, providing a framework for integration of various filtering and analysis tools
Powered by “hierarchical filter grids”

Distributing the Workload as Grid Services

Filter Services
FS = BFS
Multiple filters may exist at each level of processing to define composite filters, provide service replication, maximize throughput, and eliminate bottlenecks through each filtering layer

Science.gov 4.0 Vision
[Diagram not reproduced; surviving label: User]

Three-Pronged Approach
QuickRank – Ranks results based on occurrence of search terms in title and snippet (Science.gov 2.0 – May ’04)
MetaRank – Ranks results utilizing custom algorithms applied to metadata (Science.gov 3.0 – Fall ’05)
Deep Rank – Downloads and indexes full-text documents (Science.gov 4.0 – Fall ’06)
HEAVY LIFTING REQUIRED!
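The QuickRank idea above (rank by occurrence of search terms in title and snippet) can be sketched roughly as follows. This is a minimal illustration, not DWT's algorithm: the `Result` type, the tokenizer, and the choice to weight title hits twice as heavily as snippet hits are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical search hit carrying only the fields QuickRank looks at."""
    title: str
    snippet: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quick_score(query: str, result: Result, title_weight: float = 2.0) -> float:
    """Count query-term occurrences; title hits weigh more than snippet hits.

    The weighting is an assumption for illustration, not taken from the slides.
    """
    terms = set(tokenize(query))
    title_hits = sum(1 for tok in tokenize(result.title) if tok in terms)
    snippet_hits = sum(1 for tok in tokenize(result.snippet) if tok in terms)
    return title_weight * title_hits + snippet_hits


def quick_rank(query: str, results: list[Result]) -> list[Result]:
    """Return results sorted by descending score; ties keep their input order."""
    return sorted(results, key=lambda r: quick_score(query, r), reverse=True)


results = [
    Result("Annual budget report", "Funding for grid computing projects"),
    Result("Grid computing for search", "Federated search over grid services"),
    Result("Unrelated title", "Nothing relevant here"),
]
ranked = quick_rank("grid computing", results)
```

Because only the title and snippet returned by each source are scored, a ranking of this kind needs no download of the underlying documents, which is presumably why it was the first of the three stages to ship.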

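Science.gov Feature Timeline
[Table flattened in extraction; column headers were Version and Feature, with a date per version]
Features, in the order listed: Clustering, Summarization; Deep Rank; MetaRank; Fielded & Boolean Searching; Alerts; QuickRank; Federated Search
Surviving version and date fragments: 4.0; Fall ’; Fall ’; Feb ‘; May ’; Dec ’02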

Science.gov v3.0 Alpha 1.1
Powered by a 4-node grid distributed across three locations

General Use Case
Science.gov is an instance of a solution with much broader applicability
Large numbers of distributed streaming data sources with significant variation in service levels
Application of one or more filters (computation) to these data streams
Aggregation of filtered results
Clear, concise presentation of filtered output
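The general use case (query many sources with uneven service levels, filter each stream, aggregate the results) might look like the sketch below. Everything here is hypothetical: the source names, the simulated latencies, the substring filter, and the deadline policy are stand-ins for illustration, and real sources would be reached through web service calls rather than `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical sources: name -> (simulated latency in seconds, documents).
SOURCES = {
    "fast_source": (0.01, ["grid search", "budget memo"]),
    "medium_source": (0.05, ["grid services", "sensor data"]),
    "slow_source": (1.0, ["grid archive"]),
}


def query_source(name: str) -> list[str]:
    """Stand-in for a remote call; sleeps to imitate a source's service level."""
    latency, docs = SOURCES[name]
    time.sleep(latency)
    return docs


def keep_matching(docs: list[str], term: str) -> list[str]:
    """Filter step: keep only documents that mention the term."""
    return [d for d in docs if term in d]


def federated_search(term: str, deadline: float = 0.3) -> list[str]:
    """Query all sources in parallel, filter each stream, aggregate what arrives in time."""
    pool = ThreadPoolExecutor(max_workers=len(SOURCES))
    futures = [pool.submit(query_source, name) for name in SOURCES]
    done, _late = wait(futures, timeout=deadline)
    pool.shutdown(wait=False)  # do not block on sources that missed the deadline
    merged: list[str] = []
    for fut in done:
        merged.extend(keep_matching(fut.result(), term))
    return sorted(merged)  # deterministic order for presentation


hits = federated_search("grid")
```

In this sketch the slow source is simply dropped once the deadline passes; a production system would more likely stream late results in or report which sources were skipped.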

Use Case (Cont'd)
Property: F(F(a) + F(b)) == F(a + b)
Many streaming data analysis applications have this property
–Digital archives / libraries
–Bioinformatics
–Particle physics
–Sensor nets
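One filter that satisfies the property F(F(a) + F(b)) == F(a + b) is "keep the k best-scoring documents", with `+` read as combining two result streams. The sketch below checks this on made-up data; the `Doc` tuple layout, the scores, and the tie-break by document id are assumptions for the example, and the slides do not say which filters DWT actually uses.

```python
import heapq

Doc = tuple[float, str]  # (relevance score, document id), both hypothetical


def top_k(docs: list[Doc], k: int = 3) -> list[Doc]:
    """Filter F: keep the k best documents, ties broken by id for determinism."""
    return heapq.nsmallest(k, docs, key=lambda d: (-d[0], d[1]))


# Two result streams, e.g. from two grid nodes.
a = [(0.9, "a1"), (0.2, "a2"), (0.7, "a3"), (0.4, "a4")]
b = [(0.8, "b1"), (0.95, "b2"), (0.1, "b3"), (0.7, "b4")]

# Filter locally on each node, then filter the merged survivors.
hierarchical = top_k(top_k(a) + top_k(b))

# Filter everything in one place.
centralized = top_k(a + b)
```

Here `hierarchical` and `centralized` are equal, so each node can discard most of its stream before anything crosses the network. Filters that depend on global statistics of the whole collection (for example, scores normalized by a corpus-wide average) would not generally have this property without extra coordination.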

Necessity of Standards
Interoperable search
Workflow / scheduling
Security
Stateful resources
Notification
Addressing
Registry / Resource Discovery
Data Access
Monitoring

Beyond science.gov
DWT would be interested in formalizing this use case for the GGF community
GGF working group to address this problem domain?
Development of standards to facilitate interoperability for both search and results analysis
Actively searching for applications of this technology within both government and corporate organizations

Conclusions
Current R&D efforts at DWT provide a real-world application for grid computing that will be available to the general public
The architecture on which the next-generation science.gov portal is based may have broad applicability
This application may be formalized as a use case for related grid applications
Thank both DOE and OSTI for their investment in this project
Keep an eye on science.gov

For More Information, Contact Carl Sylvia Senior Software Engineer (505)