
ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES Pearl Brazier Department of Computer Science University of Texas-Pan American November 2,


1 ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES Pearl Brazier Department of Computer Science University of Texas-Pan American November 2, 2010

2 Outline
- Motivation
- Research Goals and Objectives
- Significance of Contribution
- Background Information and Context
- Research Efforts
  - GEO-SEED Architecture
  - Scientific Computational Entity Discovery Ontology
  - RDF Repository
  - Usability and Performance Studies
- Conclusions and Future Work

3 Motivation: Geosciences Web Services
- The Web contains many scientific resources
  - Scientific data (shared datasets, experimental results)
  - Geosciences web services metadata
- Resources are currently shared via
  - publication
  - human contact
  - web portals
- Metadata annotations are needed to
  - assist collaboration
  - allow machine processing

4 Research Goal
To investigate an ontology-driven discovery approach that can be distributed on the Web and that can support the elicitation, documentation, and registration of computational entities and other resources.

5 BACKGROUND INFORMATION AND CONTEXT

6 Cyberinfrastructure/e-science
- Supports building new types of scientific and engineering knowledge environments and organizations
- Supports modern in-silico experiments that can lead to important scientific discoveries through scientific data repositories, semantic mediation services, and scientific workflows
- Describes computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets

7 Web Technologies-1
- Web 2.0 technologies
  - Include social networks and wiki technologies
  - Used by humans
- Semantic Web
  - Allows machines to understand the meaning of information on the Web
  - Used by machines and automated agents
  - Supports core standards such as RDF, SPARQL, and OWL
- Figure: Wine Ontology, the sample ontology used in the OWL specification documents (http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#)

8 Web Technologies-2
- Ontologies
  - Capture concepts and the relationships among them
  - Provide a standard vocabulary and classifications
- Web services metadata
  - WSDL
  - WSDL-S
  - OWL-S and SWSO

9 RESEARCH OBJECTIVES AND ACTIVITIES

10 Objective 1
Define an ontology for scientific computational entities that supports the development of a repository and a system that can retrieve computational entities.
Activities:
- Define use cases for the ontology.
- Determine the essential elements of an ontology that documents the features and relationships used to identify computational entities and distinguish one from another.

11 Objective 2
Define an architecture that supports an ontology-driven approach.
Activities:
- Investigate efficient approaches for storing information.
- Investigate the relationships of registration, annotation, and knowledge extraction.

12 Objective 3
Evaluate the usability of a system based on the ontology-driven discovery approach.
Activities:
- Design and implement a prototype system based on the ontology-driven discovery approach.
- Conduct a usability study of the prototype system with computer scientists and geoscientists (novices and experts).

13 Objective 4
Evaluate the performance of a system based on the ontology-driven approach.
Activities:
- Design the schema and implement a relational RDF repository that supports efficient storage and querying of documented scientific computational entities based on the ontology-driven discovery approach.
- Run a simulation to analyze the performance of the system.

14 Research Contributions and Significance
- Designed a new Scientific Computational Entity Discovery Ontology
  - More comprehensive and domain-specific than existing discovery ontologies, enabling scientists to share their computational entities more easily
- Created a novel design for organizing the RDF data
  - Uses SPARQL queries for the RDF representation
  - Supports more efficient query evaluation
- Developed the GEO-SEED wiki using Web 2.0 and Semantic Web technologies
  - Supports discovery and sharing of scientific computational entities

15 RESEARCH EFFORTS: GEO-SEED ARCHITECTURE AND ONTOLOGY

16

17 GEO-SEED ARCHITECTURE

18 GEO-SEED Scientific Computational Entity Discovery Ontology

19 Computational Entity Profiles
PROFILE NAME           PURPOSE
GeneralProfile         Basic identifying information
QoSProfile             Quality-of-service characteristics
InvocationProfile      Details needed to execute an entity
DeploymentProfile      Details needed to download and deploy an entity from the Web
ImplementationProfile  Information related to the implementation of a computational entity
GeoscienceProfile      Information obtained with a domain-specific ontology

20 General Profile Descriptors
Descriptors: Name, Authors, Contact Information, Contributors, Description, URIs, Unique Identifier, Version, Release Date, Languages, Subject, Type, Cost, License, Support

21 QoS Profile Descriptors
Descriptors: Trust, Reliability, Availability, Processing Time, Requests per Second, Custom Metric, Security, Known Failures, Overall Rating, User Reviews

22 Deployment Profile Descriptors
Descriptors: URLs, Hardware, Software Architecture, Installation, Operating Systems, Activation and Deactivation, Software Dependencies

23 Invocation Profile Descriptors
Descriptors: Processes and Operations, Types, Parameters and Messages, Preconditions, Effects, Bindings, WSDL Document, OWL-S Document

24 Implementation Descriptors
Descriptors: Source Repositories, Algorithms, Software Development Environment (SDE), Software Components, Languages and Paradigms, Documentation

25 Geoscience Descriptors
Descriptors: Ontologies, Domain-Ontology Dependent Annotations

26 Scientific Computational Entity Discovery Ontology

27 RESEARCH EFFORTS: RDF REPOSITORY

28 Schema Mapping Strategies
Five approaches to generating database schemas:
- Schema-Oblivious
- Schema-Aware
- Data-Driven
- User-Customizable
- Hybrid

29 Schema-Oblivious (Triple Table)
Extracted RDF triples (resource URIs were lost in extraction; surviving literals): "Pearl Brazier", "5", "0.9", "4"
Table Triple (s = subject, p = predicate, o = object):
s       p              o
:WS1    rdf:type       :WebService
:WS1    describedBy    :GP1
:WS1    describedBy    :QoSP1
:GP1    rdf:type       :GeneralProfile
:QoSP1  rdf:type       :QoSProfile
:GP1    subject        :Gridding
:GP1    author         Pearl Brazier
:QoSP1  trust          5
:QoSP1  availability   0.9
:QoSP1  overallRating  4

30 Schema-Aware (Property Table)
Extracted RDF triples (resource URIs were lost in extraction; surviving literals): "Pearl Brazier", "5", "0.9", "4"
Table Property_type:
s       o
:WS1    :WebService
:GP1    :GeneralProfile
:QoSP1  :QoSProfile
Table Property_author:
s       o
:GP1    Pearl Brazier
Table Property_describedBy:
s       o
:WS1    :GP1
:WS1    :QoSP1

31 User-Customizable (Profile Tables)
Extracted RDF triples (resource URIs were lost in extraction; surviving literals): "Pearl Brazier", "5", "0.9", "4"
Table GeneralProfile:
s       p              o
:GP1    rdf:type       :GeneralProfile
:GP1    subject        :Gridding
:GP1    author         Pearl Brazier
Table QoSProfile:
s       p              o
:QoSP1  rdf:type       :QoSProfile
:QoSP1  trust          5
:QoSP1  availability   0.9
:QoSP1  overallRating  4
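
Editor's sketch (not from the slides): the three storage layouts on slides 29 to 31 can be illustrated by loading the same example triples into SQLite in three ways. This is a minimal sketch under stated assumptions: the helper names (`table_name`, `load_triple_table`, `load_property_tables`, `load_profile_tables`, `list_tables`) are hypothetical, SQLite merely stands in for whatever relational database GEO-SEED used, and a subject is treated as a profile when its rdf:type ends in "Profile".

```python
import sqlite3

# Triples from the slides' running example (overallRating taken as 4,
# as on the profile-table slide).
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", "describedBy", ":GP1"),
    (":WS1", "describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", "subject", ":Gridding"),
    (":GP1", "author", "Pearl Brazier"),
    (":QoSP1", "trust", "5"),
    (":QoSP1", "availability", "0.9"),
    (":QoSP1", "overallRating", "4"),
]

def table_name(term):
    """Turn a term such as 'rdf:type' or ':QoSProfile' into a safe SQL identifier."""
    return "".join(ch if ch.isalnum() else "_" for ch in term).strip("_")

def load_triple_table(db, triples):
    """Schema-oblivious: every triple goes into one three-column table."""
    db.execute("CREATE TABLE Triple (s TEXT, p TEXT, o TEXT)")
    db.executemany("INSERT INTO Triple VALUES (?, ?, ?)", triples)

def load_property_tables(db, triples):
    """Schema-aware: one two-column (s, o) table per predicate."""
    for s, p, o in triples:
        name = "Property_" + table_name(p)
        db.execute(f"CREATE TABLE IF NOT EXISTS {name} (s TEXT, o TEXT)")
        db.execute(f"INSERT INTO {name} VALUES (?, ?)", (s, o))

def load_profile_tables(db, triples):
    """User-customizable: triples about a profile go into that profile type's table."""
    # Assumption: a subject is a profile if its rdf:type ends in "Profile".
    profile_type = {s: o for s, p, o in triples
                    if p == "rdf:type" and o.endswith("Profile")}
    for s, p, o in triples:
        if s in profile_type:
            name = table_name(profile_type[s])
            db.execute(f"CREATE TABLE IF NOT EXISTS {name} (s TEXT, p TEXT, o TEXT)")
            db.execute(f"INSERT INTO {name} VALUES (?, ?, ?)", (s, p, o))

def list_tables(db):
    """Return the names of all tables in the database, sorted."""
    rows = db.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [r[0] for r in rows]

db = sqlite3.connect(":memory:")
load_triple_table(db, TRIPLES)
load_property_tables(db, TRIPLES)
load_profile_tables(db, TRIPLES)
print(list_tables(db))
```

In this sketch the triple table stays a single table no matter how many predicates exist, the property layout grows by one table per predicate, and the profile layout grows by one table per profile type.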

32 SPARQL Query
Retrieves the quality-of-service descriptors of a Web service :WS1:
    SELECT ?profile ?pre ?obj
    WHERE {
      :WS1 :describedBy ?profile .
      ?profile rdf:type :QoSProfile .
      ?profile ?pre ?obj .
    }
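
Editor's sketch (not from the slides): to make the meaning of the query on slide 32 concrete, the three triple patterns can be evaluated over the example triples with a naive nested-loop matcher. The function names (`is_var`, `match`, `evaluate`) are hypothetical, predicates are written without the ":" prefix as in the slides' tables, and this only illustrates basic graph pattern matching, not how a SPARQL engine or GEO-SEED evaluates queries.

```python
# Example triples from the slides (overallRating taken as 4).
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", "describedBy", ":GP1"),
    (":WS1", "describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", "subject", ":Gridding"),
    (":GP1", "author", "Pearl Brazier"),
    (":QoSP1", "trust", "5"),
    (":QoSP1", "availability", "0.9"),
    (":QoSP1", "overallRating", "4"),
]

def is_var(term):
    """Pattern terms that start with '?' are variables."""
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new

def evaluate(patterns, triples):
    """Evaluate a basic graph pattern one triple pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

# The three triple patterns of the query on slide 32.
QUERY = [
    (":WS1", "describedBy", "?profile"),
    ("?profile", "rdf:type", ":QoSProfile"),
    ("?profile", "?pre", "?obj"),
]

for row in evaluate(QUERY, TRIPLES):
    print(row["?profile"], row["?pre"], row["?obj"])
```

Each additional triple pattern that shares a variable with an earlier one acts as a join, which is why the relational translations discussed on the following slides are counted in joins.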

33 Query Complexity Comparison
Triple Table: Two Joins
  Triple ⋈ Triple ⋈ Triple
  Note: Tables can get large
Property Table: Two Joins
  describedBy ⋈ type ⋈ (trust ∪ reliability ∪ availability ∪ ⋯ ∪ userReview)
  Note: Union result is not indexed

34 Property Table (alternative): Many Joins
  (describedBy ⋈ type ⋈ trust) ∪ (describedBy ⋈ type ⋈ reliability) ∪ (describedBy ⋈ type ⋈ availability) ∪ ⋯ ∪ (describedBy ⋈ type ⋈ userReview)
  Note: Indexed, but recomputes (describedBy ⋈ type) many times
Profile Table: One Join
  describedBy ⋈ QoSProfile
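
Editor's sketch (not from the slides): the join counts for the triple-table and profile-table layouts can be checked on the example data with SQLite. The SQL below is a hand-written illustration, not the dissertation's SPARQL-to-SQL translation; the table and column names follow the slides, while the query text and the names `TWO_JOINS` and `ONE_JOIN` are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Triple (s TEXT, p TEXT, o TEXT);
INSERT INTO Triple VALUES
  (':WS1', 'rdf:type', ':WebService'),
  (':WS1', 'describedBy', ':GP1'),
  (':WS1', 'describedBy', ':QoSP1'),
  (':GP1', 'rdf:type', ':GeneralProfile'),
  (':GP1', 'subject', ':Gridding'),
  (':GP1', 'author', 'Pearl Brazier'),
  (':QoSP1', 'rdf:type', ':QoSProfile'),
  (':QoSP1', 'trust', '5'),
  (':QoSP1', 'availability', '0.9'),
  (':QoSP1', 'overallRating', '4');

-- Profile-table layout: describedBy links plus one table per profile type.
CREATE TABLE describedBy (s TEXT, o TEXT);
INSERT INTO describedBy SELECT s, o FROM Triple WHERE p = 'describedBy';
CREATE TABLE QoSProfile (s TEXT, p TEXT, o TEXT);
INSERT INTO QoSProfile SELECT * FROM Triple WHERE s = ':QoSP1';
""")

# Triple table: three aliases of the same table, hence two joins.
TWO_JOINS = """
SELECT t1.o AS profile, t3.p AS pre, t3.o AS obj
FROM Triple t1
JOIN Triple t2 ON t2.s = t1.o
JOIN Triple t3 ON t3.s = t1.o
WHERE t1.s = ':WS1' AND t1.p = 'describedBy'
  AND t2.p = 'rdf:type' AND t2.o = ':QoSProfile'
ORDER BY pre
"""

# Profile table: membership in QoSProfile already implies the type,
# so a single join is enough.
ONE_JOIN = """
SELECT d.o AS profile, q.p AS pre, q.o AS obj
FROM describedBy d
JOIN QoSProfile q ON q.s = d.o
WHERE d.s = ':WS1'
ORDER BY pre
"""

triple_rows = db.execute(TWO_JOINS).fetchall()
profile_rows = db.execute(ONE_JOIN).fetchall()
print(triple_rows == profile_rows, len(profile_rows))
```

Both queries return the same rows on this data; the difference the slides point to is how much work the database does to get there, which this toy example does not measure.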

35 Empirical Comparison of the Three Approaches
- Created a GEO-SEED dataset that describes 10,000 web services
- Defined six common queries using SPARQL
- Ran the queries on a PC with a 3.00 GHz Intel Core 2 CPU, 4 GB RAM, and 750 GB of disk space, running [software not captured in the transcript]
- Evaluated the execution time

36 Performance Test Queries
1. Find web services that implement a computational entity with the name “gridding”.
2. Find web services, along with their user reviews and overall quality-of-service ratings, that implement a computational entity “gridding”.
3. Find web services that implement a computational entity with the name “gridding” and that have trust ≥ 4 and availability ≥ 0.8 ratings.
4. Retrieve a general profile of a particular Web service.
5. Retrieve a quality-of-service profile of a particular Web service.
6. Retrieve quality-of-service profiles of two Web services.
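
Editor's sketch (not from the slides): query 3 combines a name match with two numeric thresholds. One possible way to express it over a triple table in SQLite is shown below. Everything specific here is an assumption made for illustration: the `name` predicate, the second service `:WS2` and its ratings, and the query text itself. The dissertation's own translations are in its Appendix E and are not reproduced here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Triple (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO Triple VALUES (?, ?, ?)", [
    # :WS1 is the slides' example service; :WS2 is a hypothetical second service.
    (":WS1", "describedBy", ":GP1"), (":WS1", "describedBy", ":QoSP1"),
    (":GP1", "name", "gridding"),
    (":QoSP1", "trust", "5"), (":QoSP1", "availability", "0.9"),
    (":WS2", "describedBy", ":GP2"), (":WS2", "describedBy", ":QoSP2"),
    (":GP2", "name", "gridding"),
    (":QoSP2", "trust", "3"), (":QoSP2", "availability", "0.95"),
])

# Literals are stored as text in a triple table, so the thresholds need a cast.
QUERY_3 = """
SELECT DISTINCT g.s AS service
FROM Triple g
JOIN Triple n  ON n.s = g.o AND n.p = 'name' AND n.o = :name
JOIN Triple q  ON q.s = g.s AND q.p = 'describedBy'
JOIN Triple tr ON tr.s = q.o AND tr.p = 'trust'
JOIN Triple av ON av.s = q.o AND av.p = 'availability'
WHERE g.p = 'describedBy'
  AND CAST(tr.o AS REAL) >= :min_trust
  AND CAST(av.o AS REAL) >= :min_availability
ORDER BY service
"""

params = {"name": "gridding", "min_trust": 4, "min_availability": 0.8}
services = [row[0] for row in db.execute(QUERY_3, params)]
print(services)
```

The four self-joins in this sketch show why selective queries over a single triple table become join-heavy as more descriptors are involved.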

37 Performance Study Results

38 RESEARCH EFFORTS: GEO-SEED WIKI PROTOTYPE

39 Overview
GEO-SEED consists of two components: a wiki and an RDF repository.
- The wiki serves as a collaborative environment for sharing knowledge about geosciences web services and provides the interface for human interaction.
- The RDF repository serves as a metadata database readily accessible by machines and automated agents.

40 CONCLUSIONS

41 Conclusions
- The GEO-SEED architecture supports a new-generation Web portal
  - Metadata repository for sharing and discovering scientific computational entities in geosciences
- The ontology-driven profiles approach supports usability for
  - Humans
  - Machines
- The unique user-customizable profile table design for storing the RDF data allows efficient queries of large metadata collections

42 Future Work
- Explore user-guided metadata extraction algorithms for the wiki
- Explore coupling GEO-SEED with an existing scientific workflow management system (SWFMS)
- Extend the project to support annotation and discovery of scientific workflows and datasets in geosciences
- Refine the prototype to address user interface issues

43 Thank You! Questions?
Photos: Summer 2010; Spring 2010; Cactus from El Paso, 2005

44 Presentations
- Abraham, John; Brazier, Pearl; Chebotko, Artem; Navarro, Jaime; and Piazza, Anthony. "Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010. Acceptance rate: 18%.
- Brazier, Pearl; Chebotko, Artem; Gonzalez, Eric; Kashlev, Andrey; and Piazza, Anthony. "Supporting Geosciences Web Services Metadata Management and Discovery", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010.
- Brazier, Pearl; Chebotko, Artem; Gates, Ann Q.; Piazza, Anthony; and Salayandia, Leonardo (2009). "Web 2.0 and Semantic Web Portal for Annotation and Discovery of Web Services in Geosciences", presented and published in the 2009 International Conference on Semantic Web and Web Services (SWWS 2009), Las Vegas, Nevada, July 13-16, CSREA Press, USA.
- Brazier, Pearl; Chebotko, Artem; Gates, Ann Q.; and Salayandia, Leonardo (2009). "GEO-SEED: A Metadata Repository for Geosciences Web Service Discovery", presented and published in the IEEE 2009 Third International Workshop on Scientific Workflows (SWF 2009), Los Angeles, CA, July 6-10.
August, 2010 UTEP Computer Science Dissertation Defense

45 August 12, 2010 UTEP Dissertation Defense

46 RESEARCH EFFORTS: USABILITY STUDY

47 Usability Study Overview
- 31 invitations sent to Geology faculty and students and to Computer Science faculty and students (17 responses)
- Steps in the study:
  - Register
  - Log in
  - Submit a computational entity
  - Search for a computational entity
  - Add a user rating for an entity
  - Complete a survey rating the experience

48 Overall: GEO-SEED would be a useful tool for sharing

49 Other Usability Study Results

50 Descriptive Statistical Analysis
- BINOM-DIST
  - Grouped responses into two groups: (Strongly Disagree + Disagree) and (Agree + Strongly Agree)
  - Checked whether the p-value was below 0.05
- t test
  - Used four groups: Strongly Disagree, Disagree, Agree, Strongly Agree
  - Checked whether the p-value was below 0.05
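
Editor's sketch (not from the slides): the two analyses named on slide 50 can be illustrated with the Python standard library. The responses below are hypothetical, and so are the numeric coding of the four answer categories, the use of an upper-tail binomial probability against p = 0.5, and the comparison of the mean against 0. The slides do not state how the study configured either test, so this should be read as one plausible setup, not the study's procedure.

```python
from math import comb, sqrt
from statistics import mean, stdev

# Hypothetical survey answers on an assumed coding:
# -2 = Strongly Disagree, -1 = Disagree, 1 = Agree, 2 = Strongly Agree.
responses = [2, 1, 1, 2, 1, -1, 2, 1, 1, 2, -1, 1, 2, 1, 1]

def binomial_upper_tail(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

def one_sample_t(values, mu0=0.0):
    """t statistic for H0: mean == mu0 (p-value from a t table with n - 1 d.f.)."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / sqrt(n))

# Two-group view for the binomial test: agree vs. disagree.
agree = sum(1 for r in responses if r > 0)
p_binom = binomial_upper_tail(agree, len(responses))

print(f"agree={agree}/{len(responses)}  binomial p={p_binom:.4f}  "
      f"mean={mean(responses):.2f}  t={one_sample_t(responses):.2f}")
```

A binomial p-value below 0.05 would indicate that agreement is unlikely under a 50/50 split. Turning the t statistic into a p-value needs a t distribution, which the standard library does not provide.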

51 Statistical Results of BINOM-DIST and t test (Meeting Goal)
Register, log in, getting started, change, help | BINOM-DIST p-value | t test p-value | t test mean | Assessment
experienced no difficulty registering as a GEO-SEED user | 0.02 | (missing) | 1.07 | Support
experienced no difficulty logging in to GEO-SEED | 0.04 | 0.004 | 1.13 | Support
clear how to get started to register[submit] my information in GEO-SEED | 0.15 | 0.44 | 0.08 | Not Support
able to add to or change existing information in GEO-SEED | 0.39 | 0.23 | 0.36 | Not Support
easy to learn to use the system | 0.40 | 0.24 | 0.31 | Not Support
adequate information provided how to use the system | 0.50 | 0.63 | 0.37 | Not Support


53 Rate features of GEO-SEED Wiki
Helpfulness | BINOM-DIST p-value | t test p-value | t test mean | Assessment
successfully searched | 0.004 | 0.006 | 1.27 | Support
able to understand the fields | 0.21 | 0.13 | 0.50 | Not Support
able to find help when needed | 0.13 | 0.24 | 0.40 | Not Support
sharing processes a Geologist uses | 0.02 | 0.01 | 1.00 | Support
sharing existing datasets information | 0.02 | 0.01 | 1.00 | Support
sharing geology application software | 0.02 | 0.01 | 1.00 | Support
locating web portals a Geologist might use | 0.05 | (missing) | 0.83 | Support
helpful for Geology students | 0.03 | 0.02 | 0.93 | Support


55 Statistical Results of BINOM-DIST and t test (Meeting Goal)
Degree of difficulty entering new information in GEO-SEED profiles | BINOM-DIST p-value | t test p-value | t test mean | Assessment
General Information | 0.29 | 0.21 | 0.23 | Not Support
Invocation Information | 0.39 | 0.29 | 0.17 | Not Support
Deployment Information | 0.19 | 0.13 | 0.33 | Not Support
Quality of Service Information | 0.61 | 0.50 | 0.00 | Not Support
Implementation Information | 0.61 | 0.50 | 0.00 | Not Support
Domain Knowledge Information | 0.50 | 0.40 | 0.08 | Not Support

56 Data set for Performance Study (GeneralProfile & QoSProfile)
Literal values (resource URIs were lost in extraction): "gridding", "name of a web service", "author1", "contact1", "contributor1", "description1"

57 Dr. Pearl W. Brazier
Photos: Summer 2010; Spring 2010; Cactus from El Paso, 2005

58 Literal values (continued from slide 56): "url1", "identifier1", "version1", "releaseDate1", "language1", "cost1", "license1", "support1", "5", "1.0", "0.8", "250 ms", "40", "0", "unknown", "5", "User wrote something

59 Available in Appendix E of the dissertation
- Sample data sets
- SPARQL-to-SQL translations for Q1-Q6

60 Ontology Descriptors Sources
- Semantic Markup for Web Services (OWL-S)
- Semantic Web Services Ontology (SWSO)
- Web Services Description Language (WSDL)
- Web Service Semantics (WSDL-S)
- Additional Web services descriptors

61 Succulent Plant Ontology

