GEONSearch: From Searching to Recommending
GeoInformatics 2006, May 10-12, Reston, Virginia
Ullas Nambiar, Bertram Ludaescher
  Dept. of Computer Science, University of California, Davis
Ghulam Memon, Dogan Seber, Chaitan Baru
  San Diego Supercomputer Center, University of California, San Diego
NSF Large ITR project – collaborative effort
GEON is creating an IT infrastructure to “enable” interdisciplinary geoscience research
  – Not just one group of researchers: the entire community will benefit
Support efficient knowledge discovery from geoscientific data
  – GEONPortal is provided as a Web-based tool for knowledge discovery
GEON: GEOsciences Network
(Slide adapted from Dr Dogan Seber, SDSC)
[Architecture diagram; labels as they appear]
  – Mapping Services: ArcIMS, WMS, WFS
  – Logging Services: Usage Stats Collection & Analysis
  – Data Services: DB2, Postgres, mySQL, OpenDAP, SRB
  – Data Registration Services
  – Indexing Services: Spatial, Temporal, Conceptual
  – Data Integration Services: Ontology Enabled Integration
  – Computational & Modeling Services: Modeling, Analysis Tools
  – Metadata Services: GEON Catalog
  – Others
  – Registration, GEONsearch, GEONworkbench (workflow, visualization, HPC)
  – Web/Grid Services Interfaces (WSDL)
  – Physical Grid: RedHat Linux, ROCKS, OGSI, Internet, I2, OptIPuter (planned)
  – Other Core Services: GridFTP, OGSA-DAI, CSF
Scientific Knowledge Discovery
[Figure: knowledge discovery process; labels: GEON Network, GEONSearch, GEON Map/Query, Integration & Tools]
Source: Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining
Prototypical query over GEON
Query: “Find Gravity data for regions near Rocky Mountains where geologic age is Jurassic”
What is needed?
  – Definition of “regions near” Rocky Mountains
      Approximate “regions near” as “Rocky Mountain States”: Colorado, Idaho, Montana, Nevada, Utah and Wyoming
  – Geologic age data for the above regions
  – Gravity measurements taken in those regions
  – Map of the continental US or of the Rocky Mountain States
How to obtain it?
  – From GEON, with the help of GEONSearch
Current GEONSearch Features
GEONSearch is the resource discovery tool available under the GEON Portal
Allows users to retrieve datasets by querying their
  – Keywords, Title and Description
  – Metadata
      Search the Subject Taxonomy under which datasets are classified
  – Spatial Coverage
      A bounding box of the region covered by a registered dataset
  – Temporal Coverage
      Age of the objects evaluated, or the time when the evaluations were done
  – Concepts from Ontology
      Concepts to which a dataset, or the items it contains, are mapped during registration
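The search facets listed above can be illustrated with a minimal sketch. It is not GEONSearch's implementation: the `Dataset` record, its field names, the sample catalog, and the bounding-box values are all hypothetical, and the overlap test ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical registration record; field names are illustrative only."""
    title: str
    keywords: set
    bbox: tuple  # (west, south, east, north) in degrees
    concepts: set = field(default_factory=set)


def bbox_overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(datasets, keywords=(), bbox=None, concepts=()):
    """Return the datasets that satisfy every facet that was supplied."""
    wanted_kw = {k.lower() for k in keywords}
    wanted_concepts = set(concepts)
    hits = []
    for d in datasets:
        # Keyword facet: every requested keyword must be present.
        if wanted_kw and not wanted_kw <= {k.lower() for k in d.keywords}:
            continue
        # Spatial facet: the dataset's box must overlap the query box.
        if bbox is not None and not bbox_overlaps(d.bbox, bbox):
            continue
        # Ontology facet: at least one requested concept must be mapped.
        if wanted_concepts and not wanted_concepts & d.concepts:
            continue
        hits.append(d)
    return hits


# Made-up catalog entries and a rough, illustrative query box.
catalog = [
    Dataset("Colorado gravity survey", {"gravity", "Colorado"},
            (-109.0, 37.0, -102.0, 41.0), {"gravity"}),
    Dataset("Utah geologic map", {"map", "Utah"},
            (-114.0, 37.0, -109.0, 42.0), {"geologic age"}),
    Dataset("California gravity grid", {"gravity", "California"},
            (-124.5, 32.5, -114.1, 42.0), {"gravity"}),
]
rockies_box = (-112.0, 36.0, -104.0, 45.0)
result = search(catalog, keywords=["gravity"], bbox=rockies_box)
print([d.title for d in result])
```

Combining the keyword and spatial facets narrows the hits to the one gravity dataset whose box overlaps the query region, which is the kind of narrowing a keyword-only query cannot express.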
Keyword Search in GEON
  – Rockies gravity data or Colorado gravity or Montana gravity or gravity data ….
  – Rocky Mountain Region Map or Colorado Map or state maps….
  – Rockies Jurassic or Colorado Jurassic or Jurassic ……
Outcome: too few answers, or too many answers
Advanced GEONSearch
  – Enter the potential subject group of datasets?
  – Datasets containing a city in Colorado, Idaho, etc.
  – Data with age Jurassic
  – Datasets mapped to the concepts mountains, gravity, or geologic age
Improving GEONSearch
The challenge before GEONSearch
  – Reduce the iterative querying effort required of scientists
The solution
  – Suggest “similar queries”
      Queries with more or fewer keywords
      Queries that are likely to have highly similar answers
  – Suggest “related answers” for every result, based on
      Spatial proximity
      Temporal proximity
      Usage patterns
      Common Ontology mappings
Caveat
  – Suggested queries and answers must be ranked using corresponding distance measures
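As a rough illustration of ranking “related answers” by distance, the sketch below combines two of the listed signals, spatial proximity and common ontology mappings. The measures used (distance between bounding-box centers, Jaccard distance over concept sets), the weights, the `max_degrees` cutoff, and the sample records are all assumptions made for the example; the slides do not specify which measures GEONSearch uses. Temporal proximity and usage patterns would need their own distance terms.

```python
import math


def spatial_distance(a, b):
    """Distance in degrees between the centers of two (w, s, e, n) boxes (a crude proxy)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def concept_distance(a, b):
    """Jaccard distance between two sets of ontology concepts."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def rank_related(result, candidates, w_spatial=0.5, w_concept=0.5, max_degrees=10.0):
    """Order candidate titles by a weighted sum of normalized distances, nearest first."""
    scored = []
    for c in candidates:
        # Clamp the spatial term to [0, 1] so both terms share a scale.
        s = min(spatial_distance(result["bbox"], c["bbox"]) / max_degrees, 1.0)
        k = concept_distance(result["concepts"], c["concepts"])
        scored.append((w_spatial * s + w_concept * k, c["title"]))
    return [title for _, title in sorted(scored)]


# Made-up records: one search result and three candidate "related answers".
hit = {"title": "Colorado gravity survey",
       "bbox": (-109.0, 37.0, -102.0, 41.0), "concepts": {"gravity"}}
others = [
    {"title": "California gravity grid",
     "bbox": (-124.5, 32.5, -114.1, 42.0), "concepts": {"gravity", "magnetics"}},
    {"title": "Colorado geologic map",
     "bbox": (-109.0, 37.0, -102.0, 41.0), "concepts": {"geologic age"}},
    {"title": "Wyoming gravity survey",
     "bbox": (-111.0, 41.0, -104.0, 45.0), "concepts": {"gravity"}},
]
ranking = rank_related(hit, others)
print(ranking)
```

With equal weights, a nearby dataset sharing the same concept ranks ahead of a co-located dataset with a different concept, and both rank ahead of a distant one; changing the weights changes that trade-off.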
Knowledge Discovery using GEON
[Workflow diagram; labels as they appear]
  – User need: “Need Gravity data around Rocky Mountains ……”
  – GEON Network: Registered Datasets (RD1V1, RD2V2), Derived Datasets (DD2V2, DD2V4), Metadata, Ontology
  – GEONSearch: RD1V1, DD2V4
  – Query/Map Integration Tools: DD6V1, DD7V1
  – Integration Cart (MyGEON)
  – How to share the result with other scientists? Register DD6V1, DD7V1
Supporting Versions in GEON
The GEON network can be viewed as a virtual scientific database that stores the results of scientific inquiry
  – Every available dataset reflects a scientific process
      E.g. Dataset D1 contains gravity measurements around Davis
Datasets may change over time
  – Parameters of a scientific process can be changed, producing additional results
  – Updating a registered dataset could affect the outcome of other scientific inquiries
      E.g. Updating D1 with new data may make the results of processes that used D1 irreproducible
GEON Versioning System
Focus
  – Revision management for locally-hosted data
      ASCII, Excel, ESRI Shapefiles, GeoTIFF, OWL files, etc.
Operations supported
  – Revisions
  – Branching
Assumptions
  – The dataset provider decides between Branch and Revision
  – No support for eventual merging
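The two supported operations can be sketched as a small version tree in which the provider explicitly chooses between revising and branching, and in which no merge operation exists. The class, its method names, and the dotted numbering scheme below are assumptions made for illustration; the slides do not describe how the GEON Versioning System identifies or stores versions.

```python
class VersionedDataset:
    """Hypothetical version tree with revisions and branches, and no merging."""

    def __init__(self, name, content):
        self.name = name
        self.versions = {"1": content}  # version id -> content
        self.parent = {"1": None}       # version id -> id it was derived from
        self._branches = {}             # version id -> number of branches taken from it

    def revise(self, base, content):
        """Add the next revision on the same line as `base` (1 -> 2, 1.1.1 -> 1.1.2)."""
        head, _, last = base.rpartition(".")
        new_id = f"{head}.{int(last) + 1}" if head else str(int(last) + 1)
        if new_id in self.versions:
            # `base` already has a successor; the provider must branch instead.
            raise ValueError(f"{base} already has revision {new_id}; branch instead")
        return self._add(new_id, base, content)

    def branch(self, base, content):
        """Start a new line of development from `base` (1 -> 1.1.1, then 1.2.1)."""
        n = self._branches.get(base, 0) + 1
        self._branches[base] = n
        return self._add(f"{base}.{n}.1", base, content)

    def _add(self, new_id, base, content):
        if base not in self.versions:
            raise KeyError(f"unknown version {base}")
        self.versions[new_id] = content
        self.parent[new_id] = base
        return new_id

    def lineage(self, version):
        """Return the chain of versions from `version` back to the first one."""
        chain = []
        while version is not None:
            chain.append(version)
            version = self.parent[version]
        return chain


d1 = VersionedDataset("D1", "gravity measurements, original")
r2 = d1.revise("1", "gravity measurements, new stations added")
b1 = d1.branch("1", "gravity measurements, reprocessed")
b2 = d1.revise(b1, "gravity measurements, reprocessed and corrected")
print(r2, b1, b2, d1.lineage(b2))
```

Because old versions are never overwritten, a process that used version 1 of D1 can still be rerun against exactly that version after D1 is revised.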
Provenance Management
Provenance management becomes necessary once datasets are versioned
The provenance of a piece of data is data about the process involved in generating it
  – Who, What, Where, How, Assumptions
[Figure: a transformation, with items labeled a1 and a2]
Useful for verifying the quality of datasets recommended by GEONSearch
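A provenance record along the lines of “Who, What, Where, How, Assumptions” might be sketched as follows. The `Provenance` class, the added `inputs` field, the `trace` helper, and the sample records are hypothetical; the identifiers a1 and a2 echo the slide's figure labels, and a3 is added only to show a two-step chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of how a dataset was produced."""
    who: str
    what: str
    where: str
    how: str
    assumptions: tuple = ()
    inputs: tuple = ()  # ids of the datasets this one was derived from


def trace(dataset_id, records):
    """Return every upstream dataset id, nearest ancestors first."""
    seen = []
    pending = list(records[dataset_id].inputs)
    while pending:
        current = pending.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in records:
            # Follow this ancestor's own inputs before its siblings.
            pending = list(records[current].inputs) + pending
    return seen


# Made-up records: a2 is derived from a1 by a transformation, and a3 from a2.
records = {
    "a1": Provenance("field team", "gravity measurements", "Davis",
                     "gravimeter survey"),
    "a2": Provenance("analyst", "gridded gravity", "SDSC", "interpolation",
                     assumptions=("uniform station spacing",), inputs=("a1",)),
    "a3": Provenance("analyst", "gravity anomaly map", "SDSC", "regional trend removal",
                     inputs=("a2",)),
}
upstream = trace("a3", records)
print(upstream)
```

Walking the chain back from a recommended dataset exposes the inputs and assumptions behind it, which is the information a scientist would need to judge its quality.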
Summary
GEONSearch is a necessary tool for knowledge discovery using GEON
  – Allows both simple keyword search and advanced search
GEONSearch will soon be enhanced to provide “related content” and thus improve the search process
The GEON Versioning System is under development to support version management and provenance tracking
Thank You!
Contact Info: Search for “Ullas Nambiar” in your favorite search engine
Feedback is very welcome:
  – Questions
  – Suggestions
  – Specific Use Cases