The Web-Enabled Research Commons: Applications, Goals, and Trends Thinh Nguyen October 2009
Use Case #1: NeuroCommons Project, a Science Commons project using the Semantic Web to link massive amounts of data
[Figure: paper counts of 27,266; 4,563; 41,985; 10,365; and 128,437]
[Figure: linked data sources: NeuronDB, BAMS, Literature, Homologene, SWAN, Entrez Gene, Gene Ontology, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MeSH, Reactome, Allen Brain Atlas. Credit: W3C HCLS]
From web page links (the WWW) to making computers understand linkages
receptor is located in Cell membrane (directed, contextual links)
receptor is located in Cell membrane (each named by a “URI”: unique names for things on the web)
receptor is located in Cell membrane; channel is located in Cell membrane; neuron has Cell membrane
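The directed, labeled links on these slides can be sketched as subject-predicate-object triples. The example below is a minimal illustration in plain Python, not the NeuroCommons implementation; the `http://example.org/` URIs and the helper `subjects_of` are hypothetical names chosen for this sketch.

```python
# Hypothetical URIs standing in for real identifiers (an assumption of this sketch).
EX = "http://example.org/"

RECEPTOR = EX + "receptor"
CHANNEL = EX + "channel"
NEURON = EX + "neuron"
CELL_MEMBRANE = EX + "cell_membrane"
IS_LOCATED_IN = EX + "is_located_in"
HAS = EX + "has"

# Each link is directed and labeled: (subject, predicate, object).
triples = {
    (RECEPTOR, IS_LOCATED_IN, CELL_MEMBRANE),
    (CHANNEL, IS_LOCATED_IN, CELL_MEMBRANE),
    (NEURON, HAS, CELL_MEMBRANE),
}


def subjects_of(predicate, obj):
    """Return every subject linked to `obj` by `predicate`, sorted for stable output."""
    return sorted(s for (s, p, o) in triples if p == predicate and o == obj)


located = subjects_of(IS_LOCATED_IN, CELL_MEMBRANE)
print(located)
```

Because the predicate is part of each link, asking "what is located in the cell membrane?" returns the receptor and the channel but not the neuron, which is related to the membrane by a different link.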
Cell membrane: “compartment”, “container”, “doohickey”. Using the web to integrate data and databases
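One way to read this slide is that different sources may use different local names for the same thing, and a shared URI lets their data be combined anyway. The sketch below assumes two made-up sources and a hypothetical URI; `merge_by_uri` is an illustrative helper, not part of any real toolkit.

```python
# Hypothetical URI for the shared concept (an assumption of this sketch).
CELL_MEMBRANE = "http://example.org/cell_membrane"

# Two made-up sources that use different local labels for the same thing.
source_a = [{"label": "compartment", "uri": CELL_MEMBRANE, "fact": "contains receptors"}]
source_b = [{"label": "container", "uri": CELL_MEMBRANE, "fact": "contains channels"}]


def merge_by_uri(*sources):
    """Group facts from all sources under the URI they share, ignoring local labels."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["uri"], []).append(record["fact"])
    return merged


merged = merge_by_uri(source_a, source_b)
print(merged[CELL_MEMBRANE])
```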
prefix go: <…>
prefix rdfs: <…df-schema#>
prefix owl: <…>
prefix mesh: <…mmons/record/mesh/>
prefix sc: <…>
prefix ro: <…>

select ?genename ?processname
where {
  graph <…> {
    ?paper ?p mesh:D… .
    ?article sc:identified_by_pmid ?paper.
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
  }
  graph <….org/commons/hcls/goa> {
    ?protein rdfs:subClassOf ?res.
    ?res owl:onProperty ro:has_function.
    ?res owl:someValuesFrom ?res2.
    ?res2 owl:onProperty ro:realized_as.
    ?res2 owl:someValuesFrom ?process.
    graph <http://purl.org/commons/hcls/2007…> {
      { ?process … go:GO_… } union { ?process rdfs:subClassOf go:GO_… }
    }
    ?protein rdfs:subClassOf ?parent.
    ?parent owl:equivalentClass ?res3.
    ?res3 owl:hasValue ?gene.
  }
  graph <…> { ?gene rdfs:label ?genename }
  graph <…purl.org/commons/hcls/…> { ?process rdfs:label ?processname }
}

MeSH: Pyramidal Neurons
PubMed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction

Better answers through better formats:
reformat what we already have
reformat into a commons, not a closed system
get the materials into the emerging research web
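The query on the earlier slide joins several sources: a MeSH term selects papers, the papers select genes, and the genes are filtered by biological process. A rough plain-Python sketch of that join follows. All identifiers and rows are invented toy data, and the function `genes_for` is hypothetical; it is meant to show the shape of the join, not to reproduce the SPARQL query or its results.

```python
# Toy stand-ins for the sources joined by the query; every value here is made up.
mesh_graph = [  # (paper, MeSH term)
    ("pmid:1", "Pyramidal Neurons"),
    ("pmid:2", "Pyramidal Neurons"),
    ("pmid:3", "Astrocytes"),
]
gene_mentions = [  # (gene, paper that mentions it)
    ("geneA", "pmid:1"),
    ("geneB", "pmid:2"),
    ("geneC", "pmid:3"),
]
gene_processes = [  # (gene, biological process of its product)
    ("geneA", "signal transduction"),
    ("geneB", "cell adhesion"),
    ("geneC", "signal transduction"),
]


def genes_for(mesh_term, process):
    """Join the three sources: term -> papers -> genes -> process filter."""
    papers = {paper for paper, term in mesh_graph if term == mesh_term}
    genes = {gene for gene, paper in gene_mentions if paper in papers}
    return sorted(
        gene for gene, proc in gene_processes if gene in genes and proc == process
    )


result = genes_for("Pyramidal Neurons", "signal transduction")
print(result)
```

The join only works because the three tables refer to papers and genes by the same identifiers, which is the role shared URIs play on the web.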
What data sharing protocol (legal and policy) best enables use of Web technology?
“Licensing” Archetypes
Public Domain: no restrictions on use or distribution, no contracts, copyright waived
Community Licenses: standard “open access” licenses, a range of rights, some rights reserved, available to all
Private Licenses: custom agreements, vary by institution, privately negotiated, may be offered only to some

Goals
Interoperable: data from many sources can be combined without restriction
Reusable: data can be repurposed into new and interesting contexts
Administrative Burden: low transaction costs and administrative costs over time
Legal Certainty: users can rely on legal usability of the data
Community Norms: consistent with community expectations and usages
Interoperability
Public Domain ****
–Can be combined with other data sources with ease
Community Licenses *** / **
–Depends on type of license: share-alike or copyleft licenses are unsuitable, but attribution-only licenses are less problematic
Private Licenses * / **
–Depends on restrictions, but not scalable; permutations too large
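The remark that private licenses are "not scalable" can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that every pair of data sources would need its own negotiated agreement; the slide does not state this model, and `pairwise_agreements` is a hypothetical helper.

```python
from math import comb


def pairwise_agreements(n_sources):
    """Number of distinct pairs among n sources, i.e. n * (n - 1) / 2."""
    return comb(n_sources, 2)


# 16 is the number of data sources named on the earlier NeuroCommons slide.
counts = {n: pairwise_agreements(n) for n in (2, 16, 100)}
print(counts)
```

Under that assumption the number of agreements grows roughly with the square of the number of sources, whereas public domain data needs none.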
Reusable
Public Domain ****
–No restrictions on subsequent use
Community Licenses ***
–Depends on license, but some licenses such as NC / ND can be restrictive
Private Licenses **
–Depends on license, but typically restrictive

Administrative Burden
Public Domain ****
–No paperwork or legal review needed
Community Licenses ***
–Little paperwork, but some legal review needed (attribution stacking issues)
Private Licenses *
–Large amounts of paperwork, frequent legal review needed
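The "attribution stacking" issue mentioned for community licenses can be sketched as follows: each attribution-licensed source adds a credit that the combined dataset must carry, while public domain sources add none. The dataset names, the `terms` values, and `required_credits` are all invented for this sketch.

```python
# Hypothetical datasets; names and terms are invented for illustration.
datasets = [
    {"name": "set_a", "terms": "public-domain", "credit": None},
    {"name": "set_b", "terms": "attribution", "credit": "Lab B"},
    {"name": "set_c", "terms": "attribution", "credit": "Consortium C"},
]


def required_credits(sources):
    """Collect every attribution a combined dataset would have to carry."""
    return [d["credit"] for d in sources if d["terms"] == "attribution"]


credits = required_credits(datasets)
print(credits)
```

With thousands of attribution-licensed sources the list of required credits grows accordingly, which is where the legal review burden comes from.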
Legal Certainty
Public Domain **** / ***
–Clear rights; generally irrevocable (copyright should be addressed)
Community Licenses ***
–Generally credible, good track record with open access and open source licenses
Private Licenses **
–Must be considered individually; few private licenses tested by time

Community Norms
Public Domain ***
–Traditional method for scientific data sharing (citation)
Community Licenses ***
–Relatively new, but familiar to computer scientists and open source community (attribution)
Private Licenses **
–Tendency to emphasize private / individual interests rather than community norms

Overall Grade
Public Domain ***
–Easiest and least restrictive form of sharing
Community Licenses **
–Can be used to implement community expectations, but can be burdensome / restrictive
Private Licenses *
–High transaction costs, burdensome, unpredictable
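As a rough cross-check of the ordering in the overall grades, the per-criterion star ratings can be averaged. This is only a sketch: the slides do not say how the overall grade was derived, and treating a range such as "*** / **" as its midpoint is an assumption made here. The criteria are listed in slide order (interoperability, reusability, administrative burden, legal certainty, community norms).

```python
# Star ratings transcribed from the slides. A range such as "*** / **" is
# recorded as (3, 2) and replaced by its midpoint, an assumption of this sketch.
ratings = {
    "Public Domain":      [(4,), (4,), (4,), (4, 3), (3,)],
    "Community Licenses": [(3, 2), (3,), (3,), (3,), (3,)],
    "Private Licenses":   [(1, 2), (2,), (1,), (2,), (2,)],
}


def mean_stars(cells):
    """Average the per-criterion ratings, using the midpoint of any range."""
    return sum(sum(c) / len(c) for c in cells) / len(cells)


averages = {name: round(mean_stars(cells), 2) for name, cells in ratings.items()}
ranking = sorted(averages, key=averages.get, reverse=True)
print(averages)
print(ranking)
```

The averages differ from the overall grades shown on the slide, which use their own scale, but the ordering is the same: public domain first, community licenses second, private licenses last.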
Convergence
CC0
Released by Creative Commons in 2009
Result of a 3-year policy exploration process
Not a license but a waiver of copyright

Why is it needed?
“Borderline” copyright
European sui generis database rights
Varying legal standards for copyright protection in different countries
CC0 [deed]
CC0
Waiver of copyright
Waiver of sui generis database rights
Waiver of “neighboring rights”
Does not affect trademarks or patents
Only affects rights of the person making the assertion
Use Case #2: Coordination and Sustainability of International Mouse Informatics Resources (CASIMIR) (EU Project)
Commentary in a letter to Nature (Sept 2009) recommends public domain (PD) release and use of CC0 for sharing mouse genomic data
Recommendations endorsed by scientists, NIH representatives, Jackson Labs, and editors of top scientific journals

Use Case #3: Personal Genome Project
Personalized medicine project from the George Church lab
Adopted CC0 to release sequence and medical data collected from volunteers
Summary
Solving some bioinformatics problems requires the ability to integrate massive quantities of data from diverse sources
Public Domain sharing best fits this need
The CC0 waiver can be used to enrich the public domain and provide clarity
Thank You Thinh Nguyen On the Web: