Harvard University, Boston, MA

1 Harvard University, Boston, MA
Going Under the Hood of Profiles Research Networking Software (RNS)
Griffin Weber, MD, PhD, Harvard University, Boston, MA
Eric Meeks, UCSF, San Francisco, CA

2 Searching and Extracting Data

3 Methods for Extracting Data
1. RDF Crawl
2. Search API
3. SPARQL API
4. Beta API

4 Method #1 RDF Crawl

5 Accessing Data: Method #1 – RDF Crawl
Semantic Web: every faculty HTML page has a corresponding RDF document (VIVO ontology).
The URI is the same for both; the request type determines what is returned: text/html for the HTML page, application/rdf+xml for the RDF.
RDF enables Profiles to link to other Semantic Web applications.
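
Here is a minimal Python sketch of requesting both representations from the same URI. It assumes the request type is carried in the standard HTTP `Accept` header. The profile URI and the helper `build_profile_request` are hypothetical, and the requests are only built here, not sent.

```python
import urllib.request

# Hypothetical profile URI; the same URI serves both HTML and RDF.
PROFILE_URI = "https://profiles.example.edu/profile/12345"

def build_profile_request(uri: str, want_rdf: bool) -> urllib.request.Request:
    """Build a GET request whose Accept header selects HTML or RDF/XML."""
    media_type = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": media_type})

html_req = build_profile_request(PROFILE_URI, want_rdf=False)
rdf_req = build_profile_request(PROFILE_URI, want_rdf=True)

# Sending would look like: urllib.request.urlopen(rdf_req).read()
print(html_req.get_header("Accept"))  # text/html
print(rdf_req.get_header("Accept"))   # application/rdf+xml
```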

6 RDF Triples
Each statement is a triple (Subject, Predicate, Object). The example uses three triples: (Person 1, firstName, “Griffin”), (Person 1, similarTo, Person 2), and (Person 2, lastName, “Halamka”).

7 RDF – Database Representation
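Table [RDF.].[Node] (NodeID, Value): every URI and literal is stored once as a node. For example, node 3 has the value “Griffin” and node 6 has the value “Halamka”.
Table [RDF.].[Triple] (TripleID, Subject, Predicate, Object): each triple stores the NodeIDs of its subject, predicate, and object.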
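
The two-table layout can be illustrated with Python dictionaries in which every triple column holds a NodeID. Only “Griffin” as node 3 and “Halamka” as node 6 come from the slide. The other NodeID assignments, the example.org URIs, and the helpers `resolve` and `objects_of` are hypothetical.

```python
# Node table: each URI or literal is stored once (hypothetical contents).
node = {
    1: "http://example.org/profile/1",          # a person (hypothetical URI)
    2: "http://xmlns.com/foaf/0.1/firstName",
    3: "Griffin",
    4: "http://example.org/prns#similarTo",     # hypothetical prns URI
    5: "http://example.org/profile/2",
    6: "Halamka",
    7: "http://xmlns.com/foaf/0.1/lastName",
}

# Triple table: TripleID -> (Subject, Predicate, Object), all NodeIDs.
triple = {
    1: (1, 2, 3),
    2: (1, 4, 5),
    3: (5, 7, 6),
}

def resolve(triple_id: int) -> tuple[str, str, str]:
    """Join a Triple row back to the Node table to recover its values."""
    s, p, o = triple[triple_id]
    return node[s], node[p], node[o]

def objects_of(subject_id: int, predicate_id: int) -> list[str]:
    """Values of every object for a given subject/predicate pair."""
    return [node[o] for s, p, o in triple.values()
            if s == subject_id and p == predicate_id]

for tid in triple:
    print(tid, resolve(tid))
```

Storing each value once keeps the triple table narrow and integer-only, which suits joins and indexing.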

8 RDF – XML Representation
<rdf:RDF xmlns:prns="…" xmlns:rdf="…" xmlns:foaf="…">
  <rdf:Description rdf:about="…">
    <foaf:firstName>Griffin</foaf:firstName>
    <prns:similarTo rdf:resource="…" />
  </rdf:Description>
  <rdf:Description rdf:about="…">
    <foaf:lastName>Halamka</foaf:lastName>
  </rdf:Description>
</rdf:RDF>
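
The following sketch flattens RDF/XML of this shape back into triples using Python's standard ElementTree. The RDF and FOAF namespace URIs are the standard ones. The prns namespace and the resource URIs are hypothetical stand-ins for the elided values. The parser handles only the two patterns in the example, literal text and `rdf:resource` references, and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
PRNS = "http://example.org/prns#"  # hypothetical stand-in for the prns namespace

doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:prns="{PRNS}">
  <rdf:Description rdf:about="http://example.org/profile/1">
    <foaf:firstName>Griffin</foaf:firstName>
    <prns:similarTo rdf:resource="http://example.org/profile/2" />
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/profile/2">
    <foaf:lastName>Halamka</foaf:lastName>
  </rdf:Description>
</rdf:RDF>"""

def triples_from_rdfxml(text: str) -> list[tuple[str, str, str]]:
    """Turn simple rdf:Description blocks into (subject, predicate, object)."""
    root = ET.fromstring(text)
    out = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}localName"; rejoin into a URI.
            ns, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF}}}resource")
            # rdf:resource means the object is another node; else it is a literal.
            obj = resource if resource is not None else (prop.text or "")
            out.append((subject, ns + local, obj))
    return out

for t in triples_from_rdfxml(doc):
    print(t)
```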

9 RDF – Namespaces
Each namespace prefix stands for a full URI (foaf = …, vivo = …), so a tag name can be written with a namespace prefix instead of the full URI, e.g., foaf:Person, vivo:personInPosition.
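
The next sketch shows how prefixed tag names expand to full URIs and compact back again. The RDF and FOAF URIs are the standard ones. The vivo URI is an assumption about what the elided "vivo =" entry refers to, and `expand` and `compact` are hypothetical helper names.

```python
# Prefix map: rdf and foaf are standard; the vivo URI is an assumption.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vivo": "http://vivoweb.org/ontology/core#",
}

def expand(prefixed_name: str) -> str:
    """Turn a tag name such as 'foaf:Person' into its full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local

def compact(uri: str) -> str:
    """Inverse of expand: use the longest matching namespace, else keep the URI."""
    best = max((p for p, ns in PREFIXES.items() if uri.startswith(ns)),
               key=lambda p: len(PREFIXES[p]), default=None)
    return uri if best is None else f"{best}:{uri[len(PREFIXES[best]):]}"

print(expand("foaf:Person"))
print(compact(expand("vivo:personInPosition")))
```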

10 RDF – Classes vs Properties
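DataType properties link a node to a literal value (e.g., foaf:firstName → “Griffin”). ObjectType properties link a node to another node (e.g., prns:similarTo → another person).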

11 VIVO Ontology: Selected Classes and Object Properties
[Diagram: selected VIVO ontology classes and the object properties that link them; main classes used by Profiles RNS are highlighted in bold. Classes include foaf:Person, vivo:FacultyMember, foaf:Organization, vivo:Position, vivo:Role, vivo:Authorship, vivo:InformationResource, bibo:AcademicArticle, skos:Concept, vivo:Course, and prns:FacultyRank. Property pairs include vivo:authorInAuthorship / vivo:linkedAuthor, vivo:personInPosition / vivo:positionForPerson, vivo:hasResearchArea / vivo:researchAreaOf, and vivo:linkedInformationResource / vivo:informationResourceInAuthorship.]

12 [RDF.].[GetDataRDF] @subject
1. Determine the class (rdf:type) of the subject.
2. Look up which properties to return for that class using [Ontology.].[ClassProperty].
3. Get all triples with those properties for the subject.
4. Look up which properties to “expand”.
5. Repeat the preceding steps for the objects of those properties until there is nothing left to expand.
6. Look up which properties should include “network statistics”, and calculate them on the fly.
7. Generate RDF/XML for all triples.
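
The expansion loop can be approximated in Python as a breadth-first walk. The real procedure runs in SQL, so this is only a sketch. The in-memory triples, the `CLASS_PROPERTIES` map standing in for [Ontology.].[ClassProperty], and the `EXPAND` set are all hypothetical. The network-statistics and RDF/XML generation steps are omitted.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object) using prefixed names.
TRIPLES = [
    ("person1", "rdf:type", "foaf:Person"),
    ("person1", "foaf:firstName", "Griffin"),
    ("person1", "vivo:authorInAuthorship", "authorship1"),
    ("person1", "vivo:phoneNumber", "555-0100"),
    ("authorship1", "rdf:type", "vivo:Authorship"),
    ("authorship1", "vivo:linkedInformationResource", "article1"),
    ("article1", "rdf:type", "bibo:AcademicArticle"),
    ("article1", "rdfs:label", "An example article"),
]

# Stand-in for [Ontology.].[ClassProperty]: properties to return per class.
CLASS_PROPERTIES = {
    "foaf:Person": {"foaf:firstName", "vivo:authorInAuthorship"},
    "vivo:Authorship": {"vivo:linkedInformationResource"},
    "bibo:AcademicArticle": {"rdfs:label"},
}
# Properties whose objects should themselves be described ("expanded").
EXPAND = {"vivo:authorInAuthorship", "vivo:linkedInformationResource"}

def get_data(subject: str) -> list[tuple[str, str, str]]:
    result, seen, queue = [], {subject}, deque([subject])
    while queue:
        node = queue.popleft()
        # Determine the node's class(es) via rdf:type.
        classes = {o for s, p, o in TRIPLES if s == node and p == "rdf:type"}
        # Look up which properties to return for those classes.
        wanted = set().union(*(CLASS_PROPERTIES.get(c, set()) for c in classes))
        for s, p, o in TRIPLES:
            if s == node and p in wanted:
                result.append((s, p, o))
                # Queue objects of "expand" properties that were not visited yet.
                if p in EXPAND and o not in seen:
                    seen.add(o)
                    queue.append(o)
    return result

for t in get_data("person1"):
    print(t)
```

In this sample, the phone number is left out because the hypothetical class map does not list it for foaf:Person.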

13 [Ontology.].[ClassProperty] (Class = foaf:Person)
IsDetail | Limit | Include Description | Include Network | View Security Group
foaf:firstName NULL -1; foaf:lastName; vivo: -20; vivo:overview 1; vivo:authorInAuthorship; vivo:hasResearchArea 5

14 Security Groups
Security Group ID | Label | Description
-50 | Admins | Limited to a restricted set of site administrators with special access permissions to configure the website.
-40 | Curators | Limited to a small number of users whose job is to manage content on the website.
-30 | Harvesters | Limited to authorized automated processes that sync data between this website and other systems.
-20 | Users | Limited to people who have logged into the website.
-10 | No Search | Open to the general public, but blocked from certain (but not all) search engines, such as Google.
-1 | Public | Open to the general public and may be indexed by search engines.
Undefined | | Cannot be accessed by any users.
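
The next sketch shows one way to check access against these groups. The rule it uses is an inference from the table, not something the slides state: lower IDs are more privileged, and a viewer can see content whose group is numerically greater than or equal to the viewer's own. Under that reading, an anonymous human visitor would be treated as -10, and a blocked crawler as -1. The name `can_view` is hypothetical.

```python
from typing import Optional

# Security group IDs from the table; lower (more negative) = more privileged.
ADMINS, CURATORS, HARVESTERS, USERS, NO_SEARCH, PUBLIC = -50, -40, -30, -20, -10, -1

def can_view(user_group: int, view_security_group: Optional[int]) -> bool:
    """Assumed rule: visible if the content's group is at least as open as
    (numerically >=) the viewer's group. Undefined (None) is never visible."""
    if view_security_group is None:
        return False
    return user_group <= view_security_group

print(can_view(NO_SEARCH, NO_SEARCH))  # anonymous visitor, "No Search" content
print(can_view(PUBLIC, NO_SEARCH))     # blocked crawler, "No Search" content
```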

15 Method #2 Search API

16 Accessing Data: Method #2 – Search API
Post an XML request message to the API; it returns RDF of the matching items (e.g., people, publications, courses).
<SearchOptions>
  <MatchOptions>
    <SearchString>decision making</SearchString>
  </MatchOptions>
  <OutputOptions>
    <Offset>100</Offset>
    <Limit>25</Limit>
    <SortByList>
      <SortBy Property="…" />
    </SortByList>
  </OutputOptions>
</SearchOptions>
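
The following sketch assembles and prepares a request shaped like this message using Python's standard library. The endpoint URL, the `text/xml` content type, and the sort property URI are hypothetical placeholders, since the slide does not give them. The request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_search_options(search: str, offset: int, limit: int, sort_property: str) -> bytes:
    """Assemble a SearchOptions message shaped like the slide's example."""
    root = ET.Element("SearchOptions")
    match = ET.SubElement(root, "MatchOptions")
    ET.SubElement(match, "SearchString").text = search
    output = ET.SubElement(root, "OutputOptions")
    ET.SubElement(output, "Offset").text = str(offset)
    ET.SubElement(output, "Limit").text = str(limit)
    sort_list = ET.SubElement(output, "SortByList")
    ET.SubElement(sort_list, "SortBy", Property=sort_property)
    return ET.tostring(root, encoding="utf-8")

# Hypothetical sort property and endpoint.
body = build_search_options("decision making", 100, 25, "http://xmlns.com/foaf/0.1/lastName")
req = urllib.request.Request(
    "https://profiles.example.edu/search-api",
    data=body,
    headers={"Content-Type": "text/xml"},
    method="POST",
)
# urllib.request.urlopen(req) would POST the message and return RDF/XML.
```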

17 Profiles RNS Search
- Searches all content (people, publications, concepts, etc.)
- Uses stemming and a thesaurus for term expansion
- Uses the ontology for search filters, faceting, and expanding and ranking search results
- Search matches literals, not profiles

18 Example #1
(Person) --overview--> “My research is in social network analysis and bibliometrics.”

19 Search for “Bibliometrics”
(Person) --overview--> “My research is in social network analysis and bibliometrics.”
What is the chance the person is an expert in “bibliometrics”?

20 Search Relevance Score
- Text Weight (node): probability that a literal node L is relevant to the search phrase S.
- Connection Weight (triple): probability that node N is connected to node L through property P.
- Search Weight (property): probability that N is relevant to the search phrase, assuming N is connected to L through P and L is relevant to the search phrase.
- Relevance Score = Search Weight * Connection Weight * Text Weight
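
As a one-line Python function, the score is just the product of the three weights. The sample values are the ones from the overview example in these slides: search 0.5, connection 1.0, text 0.2. The function name is illustrative.

```python
def relevance_score(search_weight: float, connection_weight: float, text_weight: float) -> float:
    """Relevance Score = Search Weight * Connection Weight * Text Weight."""
    return search_weight * connection_weight * text_weight

score = relevance_score(0.5, 1.0, 0.2)  # overview example
print(round(score, 2))  # 0.1
```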

21 Text Weight
(Person) --overview--> “My research is in social network analysis and bibliometrics.”
Text Weight of the overview literal: 0.2
How relevant is “bibliometrics” to this literal?

22 Connection Weight
(Person) --overview--> “My research is in social network analysis and bibliometrics.”
Connection Weight of overview: 1.0
Is this really this person’s overview?

23 Search Weight
(Person) --overview--> “My research is in social network analysis and bibliometrics.”
Search Weight of overview: 0.5
Is the person really an expert in the topics mentioned in her overview?

24 Relevance Score
(Person) --overview--> “My research is in social network analysis and bibliometrics.”
0.5 * 1.0 * 0.2 = 0.1
There is a 10% chance the person is an expert in “bibliometrics” based only on this overview.

25 Example #2
(Person) --overview--> “My research is in social network analysis and bibliometrics.”
(Person) --researchArea--> Bibliometric Analysis
What is the chance the person is an expert in “bibliometrics”?

26 Text Weight
overview literal “My research is in social network analysis and bibliometrics.”: 0.2
researchArea Bibliometric Analysis: 0.5
How relevant is “bibliometrics” to these literals?

27 Connection Weight
overview: 1.0
researchArea: 0.3
Is this really the person’s overview, and is this really the person’s research area?

28 Search Weight
overview: 0.5
researchArea: 0.8
Is this person an expert in the topics in her overview, and in the areas she actually publishes about?

29 Relevance Score
overview: 0.5 * 1.0 * 0.2 = 0.10
researchArea: 0.8 * 0.3 * 0.5 = 0.12
This person is an expert in “bibliometrics” with a probability of 10% based only on the overview, and 12% based only on the researchArea.

30 Relevance Score
overview: 0.5 * 1.0 * 0.2 = 0.10
researchArea: 0.8 * 0.3 * 0.5 = 0.12
Treating the two pieces of evidence as independent, the person is not an expert only if neither one holds:
P(Expert) = 1 - P(Not an Expert) = 1 - (1 - 0.10) * (1 - 0.12) = 0.208
There is a 20.8% chance the person is an expert based on both the overview and the researchArea.
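
The combination rule generalizes to any number of evidence paths. This Python sketch applies it to the two scores above. The function name `combine` is illustrative, and the rule assumes the paths are independent.

```python
from math import prod

def combine(scores: list[float]) -> float:
    """P(Expert) = 1 - product of (1 - score) over independent evidence paths."""
    return 1 - prod(1 - s for s in scores)

overview = 0.5 * 1.0 * 0.2       # search * connection * text = 0.10
research_area = 0.8 * 0.3 * 0.5  # = 0.12
print(round(combine([overview, research_area]), 3))  # 0.208
```

With no evidence the result is 0. A single certain path (score 1.0) drives the result to 1.0 no matter what the other paths contribute.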

31 Example #3 – Find “Weber”
(Person) --firstName--> “Griffin”
(Person) --lastName--> “Weber”
(Person) --similarTo--> (Person) --label--> “Weber, Smith”
(Person) --author--> (Authorship) --linkedIR--> (Article) --label--> “Weber G (1):147”
(Article) --subject--> (Concept) --label--> “Sturge-Weber Syndrome”

33 Text Weight
“Weber” (lastName): 1.0
“Weber, Smith” (label of the similar Person): 0.5
“Weber G (1):147” (label of the Article): 0.25
“Sturge-Weber Syndrome” (label of the Concept): 0.33

34 Connection Weight
firstName: 1.0; lastName: 1.0
similarTo: 0.3; label (similar Person): 1.0
author: 0.4; linkedIR: 0.5; label (Article): 1.0
subject: 0.5; label (Concept): 1.0

35 Search Weight
firstName: 0.5; lastName: 1.0
label (similar Person): 1.0
author: 0.01; linkedIR: 1.0; label (Article): 1.0
subject: 0.5; label (Concept): 1.0

36 Relevance Score (Search Weight * Connection Weight * Text Weight, or * the score of the node the property points to)
firstName: 0.5 * 1.0 * 0
lastName: 1.0 * 1.0 * 1.0
similarTo: 0 * 0.3 * (score of the similar Person); its label: 1.0 * 1.0 * 0.5
author: 0.01 * 0.4 * (score of the Authorship); linkedIR: 1.0 * 0.5 * (score of the Article); Article label: 1.0 * 1.0 * 0.25
subject: 0.5 * 0.5 * (score of the Concept); Concept label: 1.0 * 1.0 * 0.33

37 Relevance Score
(Person) with lastName “Weber”: 1.0
similar (Person) “Weber, Smith”: 0.5
(Concept) “Sturge-Weber Syndrome”: 0.33
(Article) “Weber G (1):147”: 0.31, combining its own label (0.25) with the subject path (0.5 * 0.5 * 0.33 = 0.0825): 1 - (1 - 0.25) * (1 - 0.0825) ≈ 0.31
(Authorship): 1.0 * 0.5 * 0.312 ≈ 0.16
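
The propagation in this example can be sketched as a recursive function over the graph. A literal scores its Text Weight. Any other node combines all of its outgoing paths with the same 1 - Π(1 - score) rule used in the bibliometrics example. This is a sketch of the probabilistic model these slides walk through, not the stored procedure's code. Node names such as `Person2` are hypothetical labels for the diagram's unnamed nodes, and the recursion assumes the graph has no cycles.

```python
from functools import lru_cache
from math import prod

# Literal text weights for the search phrase "Weber".
TEXT = {
    '"Griffin"': 0.0,
    '"Weber"': 1.0,
    '"Weber, Smith"': 0.5,
    '"Weber G (1):147"': 0.25,
    '"Sturge-Weber Syndrome"': 0.33,
}

# subject -> list of (property, object, search weight, connection weight)
EDGES = {
    "Person": [("firstName", '"Griffin"', 0.5, 1.0),
               ("lastName", '"Weber"', 1.0, 1.0),
               ("similarTo", "Person2", 0.0, 0.3),
               ("author", "Authorship", 0.01, 0.4)],
    "Person2": [("label", '"Weber, Smith"', 1.0, 1.0)],
    "Authorship": [("linkedIR", "Article", 1.0, 0.5)],
    "Article": [("label", '"Weber G (1):147"', 1.0, 1.0),
                ("subject", "Concept", 0.5, 0.5)],
    "Concept": [("label", '"Sturge-Weber Syndrome"', 1.0, 1.0)],
}

@lru_cache(maxsize=None)
def score(node: str) -> float:
    """Literals score by text weight; other nodes combine every outgoing path
    as 1 - prod(1 - search * connection * score(object))."""
    if node in TEXT:
        return TEXT[node]
    return 1 - prod(1 - s * c * score(obj) for _, obj, s, c in EDGES.get(node, []))

for n in ["Person", "Person2", "Authorship", "Article", "Concept"]:
    print(n, round(score(n), 2))
# Person 1.0, Person2 0.5, Authorship 0.16, Article 0.31, Concept 0.33
```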

38 Search Phrase Parsing
Search phrase: treatments for lung cancer
Compare to thesaurus:
1. Cancer; Neoplasm
2. Cancer of the Lung; Lung Cancer; Lung Neoplasm
Select the best parsing of “treatments for lung cancer”.

39 Search Phrase Parsing
Search phrase: treatments for lung cancer
1. Remove stop words not in recognized phrases: treatments lung cancer
2. Stem words not in recognized phrases: treatment* lung cancer
3. Expand using the thesaurus: “treatment*” AND (“cancer of the lung” OR “lung cancer” OR “lung neoplasm”)
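
Here is a rough Python sketch of these parsing steps. The stop-word list, the two thesaurus entries, the greedy longest-match phrase selection, and the naive plural-stripping "stemmer" are all simplified stand-ins. They are not the actual [Utility.NLP] tables or algorithms.

```python
import re

# Hypothetical stop words and thesaurus (each entry = equivalent phrases).
STOP_WORDS = {"for", "of", "the", "and", "in"}
THESAURUS = [
    ["cancer", "neoplasm"],
    ["cancer of the lung", "lung cancer", "lung neoplasm"],
]

def stem(word: str) -> str:
    """Naive stand-in for a real stemmer: drop a plural 's', add a wildcard."""
    return (word[:-1] if word.endswith("s") else word) + "*"

def parse(query: str) -> str:
    words = re.findall(r"[a-z]+", query.lower())
    terms, i = [], 0
    while i < len(words):
        # Prefer the longest thesaurus phrase that starts at position i.
        match = None
        for length in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + length])
            entry = next((e for e in THESAURUS if phrase in e), None)
            if entry:
                match = (length, entry)
                break
        if match:
            length, entry = match
            terms.append("(" + " OR ".join(f'"{p}"' for p in entry) + ")")
            i += length
        else:
            # Words outside recognized phrases: drop stop words, stem the rest.
            if words[i] not in STOP_WORDS:
                terms.append(f'"{stem(words[i])}"')
            i += 1
    return " AND ".join(terms)

print(parse("treatments for lung cancer"))
# "treatment*" AND ("cancer of the lung" OR "lung cancer" OR "lung neoplasm")
```

Longest match is what makes "lung cancer" win over the shorter entry "cancer" here.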

40 Search Options
- Pagination: Offset, Limit
- Filter by class. Example: only return people
- Filter by property. Examples: “cancer” and lastName = “Smith”; “cancer” and NOT facultyRank = “Full Prof.”
- Sort by property. Example: sort by lastName, firstName, middleName. Default: relevance score, then label
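
To see how these options interact, the following Python sketch applies them in memory to hits that are assumed to already match the keyword (e.g., “cancer”). The `Result` record, `apply_options`, and the sample data are hypothetical. The real API applies the options server-side against the RDF store.

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Hypothetical flattened search hit: class, label, relevance, properties.
    cls: str
    label: str
    score: float
    props: dict

def apply_options(results, cls=None, filters=(), sort_by=(), offset=0, limit=25):
    """filters: (property, value, negate) tuples; sort_by: property names."""
    hits = [r for r in results if cls is None or r.cls == cls]
    for prop, value, negate in filters:
        hits = [r for r in hits if (r.props.get(prop) == value) != negate]
    if sort_by:
        hits.sort(key=lambda r: tuple(r.props.get(p, "") for p in sort_by))
    else:
        # Default: relevance score (descending), then label.
        hits.sort(key=lambda r: (-r.score, r.label))
    return hits[offset:offset + limit]

results = [
    Result("foaf:Person", "Smith, Ann", 0.9, {"lastName": "Smith", "firstName": "Ann", "facultyRank": "Full Prof."}),
    Result("foaf:Person", "Smith, Bob", 0.4, {"lastName": "Smith", "firstName": "Bob", "facultyRank": "Assistant Prof."}),
    Result("foaf:Person", "Jones, Cy", 0.7, {"lastName": "Jones", "firstName": "Cy", "facultyRank": "Full Prof."}),
    Result("bibo:AcademicArticle", "A cancer paper", 0.95, {}),
]

people = apply_options(results, cls="foaf:Person")
smiths_not_full = apply_options(results, cls="foaf:Person",
                                filters=[("lastName", "Smith", False),
                                         ("facultyRank", "Full Prof.", True)])
by_name = apply_options(results, cls="foaf:Person", sort_by=("lastName", "firstName"))
```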

41 Search Request XML
<SearchOptions>
  <MatchOptions>
    <SearchString>treatments for lung cancer</SearchString>
    <ClassURI>…</ClassURI>
    <SearchFiltersList>
      <SearchFilter Property="…">…</SearchFilter>
    </SearchFiltersList>
  </MatchOptions>
  <OutputOptions>
    <Offset>0</Offset>
    <Limit>25</Limit>
    <SortByList>
      <SortBy IsDesc="0" Property="…" />
      <SortBy IsDesc="0" Property="…" />
    </SortByList>
  </OutputOptions>
</SearchOptions>

42 [Search.].[GetNodes]
1. Process the keyword string by removing stop words, stemming, and expanding terms with [Utility.NLP].[Thesaurus].
2. Use full-text search to find nodes whose values match the search phrases. Weight = Rank * 0.001
3. Find all nodes connected to the nodes from #2 via triples. Weight = #2 weight * predicate weight (from [Ontology.].[ClassProperty])
4. Repeat #3 until there are no more new nodes.
5. Select only nodes of class foaf:Person.
6. Get RDF for the nodes from #5.
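
A simplified Python sketch of this procedure follows. The sample triples, classes, predicate weights, and ranks are hypothetical. The Rank * 0.001 factor presumably rescales SQL Server full-text RANK values (0 to 1000) into 0 to 1. The slide does not say how multiple paths to the same node combine, so this sketch assumes the higher weight wins. With all predicate weights at or below 1, that rule also guarantees the loop terminates.

```python
from collections import deque

# Hypothetical graph: (subject, predicate, object) triples and node classes.
TRIPLES = [
    ("person1", "vivo:overview", "lit1"),
    ("person1", "vivo:authorInAuthorship", "auth1"),
    ("auth1", "vivo:linkedInformationResource", "article1"),
    ("article1", "rdfs:label", "lit2"),
]
NODE_CLASS = {"person1": "foaf:Person", "auth1": "vivo:Authorship",
              "article1": "bibo:AcademicArticle"}
# Hypothetical per-predicate search weights (cf. SearchWeight in ClassProperty).
PREDICATE_WEIGHT = {"vivo:overview": 0.5, "vivo:authorInAuthorship": 0.25,
                    "vivo:linkedInformationResource": 1.0, "rdfs:label": 1.0}

def get_nodes(fulltext_hits: dict[str, int]) -> dict[str, float]:
    # Full-text matches: weight = Rank * 0.001.
    weight = {node: rank * 0.001 for node, rank in fulltext_hits.items()}
    queue = deque(weight)
    # Walk triples from object back to subject until nothing new is found.
    while queue:
        obj = queue.popleft()
        for s, p, o in TRIPLES:
            if o != obj:
                continue
            candidate = weight[obj] * PREDICATE_WEIGHT.get(p, 0.0)
            # Assumption: keep the best weight when several paths reach a node.
            if candidate > weight.get(s, 0.0):
                weight[s] = candidate
                queue.append(s)
    # Keep only people.
    return {n: w for n, w in weight.items() if NODE_CLASS.get(n) == "foaf:Person"}

print(get_nodes({"lit1": 200, "lit2": 800}))
```

Here the article path (0.8 * 1.0 * 1.0 * 0.25 = 0.2) beats the overview path (0.2 * 0.5 = 0.1), so person1 ends with weight 0.2.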

43 [Ontology.].[ClassProperty] (Class = foaf:Person)
SearchWeight ViewSecurityGroup vivo: 0.1 -20 foaf:firstName -1 foaf:lastName 0.5 vivo:overview vivo:personInPosition vivo:authorInAuthorship vivo:hasResearchArea 0.25 prns:numberOfPublications

44 Method #3 SPARQL API

45 Accessing Data: Method #3 – SPARQL Query
SPARQL Protocol and RDF Query Language: ad-hoc queries for RDF (like SQL for relational data). Slower performance, but the most flexible.
Example: select the phone number of the person whose first name is “Griffin” and whose last name is “Weber”.
  PREFIX vivo: <…>
  PREFIX foaf: <…>
  SELECT ?phone
  WHERE {
    ?person foaf:firstName "Griffin" .
    ?person foaf:lastName "Weber" .
    ?person vivo:phoneNumber ?phone
  }
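
This sketch prepares the same query as a standard SPARQL protocol GET request, with the query in the `query` parameter and JSON results requested. The endpoint URL is hypothetical. The FOAF namespace is the standard one, and the VIVO core namespace is an assumption about the elided prefix. The request is built but not sent.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://profiles.example.edu/sparql"  # hypothetical endpoint
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vivo: <http://vivoweb.org/ontology/core#>
SELECT ?phone
WHERE {
  ?person foaf:firstName "Griffin" .
  ?person foaf:lastName "Weber" .
  ?person vivo:phoneNumber ?phone
}
"""

def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL protocol query via HTTP GET, with the query URL-encoded."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_sparql_request(ENDPOINT, QUERY)
# urllib.request.urlopen(req) would return JSON bindings for ?phone.
```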

46 Method #4 Beta API

47 Accessing Data: Method #4 – Beta API
- Provides backwards compatibility for Profiles RNS Beta
- XML request and XML response (no RDF)
- Does not search or return all content
- No longer supported, but people still like it because it is simple

