Identifiers: what are they and why are they useful? Anita Bandrowski, UCSD.


1 Identifiers: what are they and why are they useful? Anita Bandrowski, UCSD

2 WHEN PEOPLE RAN THE WORLD… People identify things based on location and group membership: Roman (individual) Kowalski (family), Nowa Gora (location). Graph theory points to social networks of roughly 100 relationships.

3 WHEN PEOPLE RAN THE WORLD… New York phone system: as complexity grows, people need to be replaced.

4 A UNIVERSAL TURING MACHINE Performing tasks that retrieve thousands or millions of data items is relatively unencumbered. Storing digits is 'native' to the machine. Can we use this to help humans make calls? 650 - 483 - 0698: (directional) (location) (specific).

5 WHY NOT A NAME? What is a cell? What is a nucleus? What is a Curie? Who is Alexander? Who is Olek? One to many, many to one.

6 AN ID Points uniquely to a single entity. Resolves information about the entity without ambiguity. In relational databases, a unique key. Can we have IDs that are unique across multiple systems? URI (like a URL, but resolves a single entity uniquely between systems).
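
A minimal Python sketch of the name-versus-ID point made on slides 5 and 6. The `ex:` identifiers, the records, and the `http://example.org/id/` base are all hypothetical, invented only to show that a name can match several entities while an ID resolves to exactly one.

```python
# Hypothetical records: the "ex:" identifiers are invented for illustration.
ENTITIES = {
    "ex:0001": {"label": "cell", "meaning": "basic unit of a living organism"},
    "ex:0002": {"label": "cell", "meaning": "room in a prison"},
    "ex:0003": {"label": "nucleus", "meaning": "organelle containing the genome"},
    "ex:0004": {"label": "nucleus", "meaning": "cluster of neurons in the brain"},
}


def lookup_by_name(name):
    """Return every ID whose label matches; one name may hit many entities."""
    return sorted(eid for eid, rec in ENTITIES.items() if rec["label"] == name)


def resolve(entity_id):
    """Return the single record for an ID; raises KeyError if the ID is unknown."""
    return ENTITIES[entity_id]


def to_uri(entity_id, base="http://example.org/id/"):
    """Build a URI from an ID; the base is a placeholder, not a real resolver."""
    return base + entity_id


print(lookup_by_name("cell"))         # two candidate IDs for one name
print(resolve("ex:0003")["meaning"])  # exactly one record for one ID
```

A dictionary key plays the role of the unique key in a relational table; the URI form is what would let two systems agree on the same entity.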

7 CAN WE PUSH ENTITIES FURTHER? Can identifiers be generated for everything? Can there be relationships between everything? Can there be a path that a computational system can traverse between unambiguous terms?

8 CAN WE PUSH ENTITIES FURTHER? Can identifiers be generated for everything? Can there be relationships between everything? Can there be a path that a computational system can traverse between unambiguous terms? Diagram: Neuron is_a Cell; Pyramidal cell is_a Neuron.

9 CAN WE PUSH ENTITIES FURTHER? Can identifiers be generated for everything? Can there be relationships between everything? Can there be a path that a computational system can traverse between unambiguous terms? Diagram: Neuron is_a Cell; Pyramidal cell is_a Neuron; Neocortex part_of Brain.

10 CAN WE PUSH ENTITIES FURTHER? Can identifiers be generated for everything? Can there be relationships between everything? Can there be a path that a computational system can traverse between unambiguous terms? Diagram: Neuron is_a Cell; Pyramidal cell is_a Neuron; Neocortex part_of Brain; Glutamate is_a Molecule, with a neurotransmitter_of relation.
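
The question of whether a machine can traverse a path between unambiguous terms can be sketched as a graph search. This is a small Python illustration, not the system used in the talk: the entity and relation names come from the slide's diagram, but the endpoints of the `neurotransmitter_of` edge and the `Pyramidal cell part_of Neocortex` edge are assumptions, since the transcript does not show how the diagram connects them.

```python
from collections import deque

# (subject, relation, object) triples. Names come from the slide diagram.
TRIPLES = [
    ("Pyramidal cell", "is_a", "Neuron"),
    ("Neuron", "is_a", "Cell"),
    ("Neocortex", "part_of", "Brain"),
    ("Glutamate", "is_a", "Molecule"),
    # Assumed edges: the transcript does not show these endpoints.
    ("Glutamate", "neurotransmitter_of", "Pyramidal cell"),
    ("Pyramidal cell", "part_of", "Neocortex"),
]


def find_path(start, goal, triples=TRIPLES):
    """Breadth-first search along subject -> object edges.

    Returns the list of triples on a shortest path, or None if no path exists.
    """
    outgoing = {}
    for subj, rel, obj in triples:
        outgoing.setdefault(subj, []).append((rel, obj))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, obj in outgoing.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, rel, obj)]))
    return None


for step in find_path("Glutamate", "Brain"):
    print(" ".join(step))
```

Because every node is an unambiguous term, the search never has to guess which "cell" or "nucleus" an edge refers to.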

11 ONTOLOGY Philosophical study of the nature of being: categories of being and their relations. What are entities, and how can they be grouped and related? Computer science: formal representation of knowledge in a domain. …But what do you code into an ontology?

12 A RESOURCE CATALOG: Must have a reasonable account of what is out there. http://neuinfo.org Diagram labels: Image repository, Database, Atlas; NITRC, Harvard, Nencki Inst.; Institution, Resource Type; relations: is_a, has_role.

13 A RESOURCE CATALOG IS IMPORTANT, BUT Software tools appear and disappear. Portals are created and change frequently. Databases update data annually, monthly, weekly. So how can you catalog the ephemeral?

14 HOW DO YOU KEEP A REGISTRY CURRENT? http://neuinfo.org

15 A SHARED RESOURCE REGISTRY http://neuinfo.org NIF

16 SOCIAL NETWORK OF RESOURCES? 3DVC – 182; Force11 – 88; Monarch – 88; OneMind – 609; GeneOntology Tools – 140 (figure labels: 1, 6, 1, 2, 12, 6). Which resources are shared by multiple communities? Structured data allows us to answer questions easily. http://neurolex.org/
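
Once each community's resource list is stored as structured data, "which resources are shared by multiple communities?" becomes a counting problem. The Python sketch below uses three community names from the slide, but the `res:` identifiers and the memberships are placeholders invented for illustration, not real registry contents.

```python
from collections import Counter

# Community names are from the slide; the memberships are invented placeholders.
COMMUNITY_RESOURCES = {
    "3DVC": {"res:A", "res:B", "res:C"},
    "Force11": {"res:B", "res:D"},
    "Monarch": {"res:C", "res:D", "res:E"},
}


def shared_resources(collections, min_communities=2):
    """Map each resource ID to the number of communities listing it,
    keeping only those listed by at least `min_communities`."""
    counts = Counter(rid for members in collections.values() for rid in members)
    return {rid: n for rid, n in counts.items() if n >= min_communities}


def overlap(collections, a, b):
    """Resources that two named communities have in common."""
    return collections[a] & collections[b]


print(shared_resources(COMMUNITY_RESOURCES))
print(overlap(COMMUNITY_RESOURCES, "3DVC", "Monarch"))
```

The pairwise overlap sizes are the kind of number that could label the edges of a community network figure.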

17 CAN WE MINE RELATIONSHIPS BETWEEN RESOURCES? http://neuinfo.org Human annotations give a different graph of relationships. Text mining gives a picture of the most used resources. PDB

18 …BUT DATABASES CAN CONTAIN A LOT OF DATA NOT EASILY FOUND BY SEARCHING KEYWORDS Databases continue to be opaque to search engines. They defy cataloguing efforts. They can update daily. There are over 2500 of them. Where is data relevant to me? The DISCO tool suite was built to incorporate data directly from databases into a unified index in NIF.

19 >200 data sources; >850M data records; >6M links to articles. neuinfo.org

20 DATA ARE DIVIDED INTO TYPES http://neuinfo.org

21 UNIFORM SEARCH BASED ON ONTOLOGIES http://neuinfo.org

22 DATA ABOUT THE SUBTHALAMUS http://neuinfo.org

23 CONNECTOME DATABASES Each resource implements a different model, which works well for the resource. http://neuinfo.org

24 UNIFORM RESOURCE LAYER ALSO MEANS UNIFORM DATA ACCESS disco.neuinfo.org Luis Marenco, Rixin Wang; Yale

25 LET'S PLAY A GAME

26 WHAT IS THIS? http://neuinfo.org

27 HOMUNCULUS http://neuinfo.org

28 HOMUNCULUS Careful mapping of the entire somatosensory cortex yields a representation of the amount of area devoted to sensing each body region. From the homunculus we learn that humans pay attention to the lips, hands and genitals. http://neuinfo.org

29 EACH ANIMAL HAS A SET OF BODY REGIONS THAT IT IS PARTICULARLY CONCERNED WITH http://neuinfo.org

30 Is there a data homunculus? If so, how can we know it? http://neuinfo.org

31 THE BRAIN AND ITS DATA Ontologies provide a semantic framework for understanding the data/resource landscape. Data sources included in NIF. Complete list: http://disco.neuinfo.org Services: http://neuinfo.org/developers Figure labels: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain; Brain region, Data source. Vadim Astakhov, Kepler Workflow Engine

32 Brain region popularity (most popular to least popular) http://neuinfo.org Brain Cerebral cortex Striatum Amygdala Thalamus Cerebellar cortex Cerebellum Hypothalamus Olfactory bulb Forebrain Nucleus accumbens Third ventricle Substantia nigra Midbrain Medulla oblongata Ventral tegmental area Pons Stria terminalis Subbrachial nucleus Commissural nucleus of vagus nerve Dorsal longitudinal fasciculus of medulla Medullary raphe nuclear complex Abducens nerve root Central tegmental tract of midbrain Spinothalamic tract of midbrain Superior cerebellar peduncle of midbrain Central tegmental tract of midbrain Medial longitudinal fasciculus of midbrain Spinothalamic tract of midbrain Superior cerebellar peduncle of midbrain White matter of the cerebellar cortex Accessory nerve root Vagus nerve root Oculomotor nerve root Trochlear nerve root Optic nerve root Olfactory nerve root

33 WHICH BRAIN REGIONS HAVE MOST ANNOTATIONS? Sum per Level; Sum for Major Brain Region. http://neuinfo.org

34 SO HOW CAN WE DO BETTER AT ANNOTATING DATA? Can identifiers help?

35 A SYSTEM TO IDENTIFY NOT JUST WHO PRODUCED A FINDING, BUT WHAT PRODUCED IT "Faulty Antibodies Continue to Enter US and European Markets, Warns Top Clinical Chemistry Researcher" - Genome Web Daily, October 11, 2013. "…of the findings in the literature about neuronal NF-κB are based on data garnered with antibodies that are not selective for the NF-κB …" --Herkenham et al.

36 WHAT STUDIES USED MY MONOCLONAL MOUSE ANTIBODY AGAINST ACTIN IN HUMANS? The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution, Sigma-Aldrich); - tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–441, 1:1000, Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et al., 2007); AT8 mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1 mAb (phospho-tau Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau Ser262 and Ser356, 1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C terminus, 1:1000, Santa Cruz Biotechnology)… mAb = monoclonal antibody

37 …SURELY YOU HAVE FOUND A TERRIBLE PAPER, THIS CAN'T BE THE NORM

38 Hypothesis: Resources in the published literature are not uniquely identifiable. Gather journal articles. 5 domains: Immunology, Cell biology, Neuroscience, Developmental biology, General biology. 3 impact factors: High, Medium, Low. 84 journals, 238 papers: 707 antibodies, 104 cell lines, 258 constructs, 210 knockdown reagents, 437 model organisms. Vasilevsky et al., PeerJ, 2013

39 The problem is general across multiple resource types and disciplines. Vasilevsky et al., PeerJ, 2013

40 RESOURCE IDENTIFICATION INITIATIVE Two pre-meetings with editors and publishers. Society for Neuroscience, 2012. NIH: June, 2013. Society for Neuroscience, 2013. Designed pilot project: Entities, Procedure, Infrastructure. Established working group through FORCE11. Signed up partners. Led by: Matt Brush, Nicole Vasilevsky, Anita Bandrowski, and more. https://www.force11.org/Resource_identification_initiative

41 PILOT PROJECT Authors to identify 3 types of research resources: software/databases, antibodies, model organisms. Include RRID in methods section. Voluntary for authors. Journals did not have to modify their submission system. Journals have flexibility in implementation: send request to author at submission, during review, or after acceptance. Launched February 2014: 3 month commitment and more…

42 RII PORTAL A single portal for authors: >10 databases; one search interface; simple directions; big "Cite This" button; uniform format for citation; help desk for authors. http://scicrun.ch/resources

43 WHAT STUDIES USED… >100 articles have appeared to date. 15 journals. 630 RRIDs: 3 removed by typesetting; 95% correct; 14% false negative rate. >200 antibodies were added. >75 software tools/databases were added. Database available at: https://www.force11.org/node/5635

44 An update of Vasilevsky et al.

45 WHAT CAN WE DO WITH AN RRID? A resolver service has been created. 3rd party tools are being created to provide linkage between resources and papers: Utopia prototype, ScienceDirect. http://scicrunch.com/resolver/RRID:nlx_144509
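
A rough Python sketch of what a third-party tool might do with RRIDs in a methods section: find them and turn each into a resolver link. The resolver base URL and `RRID:nlx_144509` come from this slide. The regular expression is an assumed, simplified pattern rather than an official RRID grammar, and the sample sentence with the `AB_0000000` identifier is a made-up placeholder.

```python
import re

# Resolver base taken from the slide's example URL.
RESOLVER_BASE = "http://scicrunch.com/resolver/"

# Assumed pattern: "RRID:" then a prefix, an underscore, and an accession.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z0-9]+_[A-Za-z0-9_]+)")

# Hypothetical methods text; AB_0000000 is a placeholder, not a real antibody ID.
SAMPLE_METHODS = (
    "Data were retrieved from a registered resource (RRID:nlx_144509). "
    "Blots were probed with an anti-actin antibody (Vendor X, RRID:AB_0000000). "
    "The same resource (RRID:nlx_144509) was used for validation."
)


def extract_rrids(text):
    """Return the distinct RRIDs found in text, in order of first appearance."""
    found = ("RRID:" + match for match in RRID_PATTERN.findall(text))
    return list(dict.fromkeys(found))


def resolver_url(rrid):
    """Build the resolver link for one RRID."""
    return RESOLVER_BASE + rrid


for rrid in extract_rrids(SAMPLE_METHODS):
    print(rrid, "->", resolver_url(rrid))
```

Because the identifier has a fixed, machine-readable prefix, plain text search is enough to link a paper to the resources it used.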

46 WHAT HAVE WE LEARNED? Authors are willing to adopt new types of citations. Authors were fairly accurate at performing the task. RRIDs are resolved by search engines without requiring specialized citation services. Citation drives registration. Clear role for repositories as authorities.

47 HOW CAN YOU HELP? Authors: Use IDs in YOUR next paper. At least 100 of your friends already have. scicrun.ch/resources Tool makers: Register your tools! Make authors' job easy. Display the proper citation ID format proudly. Reviewers: Ask authors to put identifiers in their methods; you know they will do almost anything to get you off their back. Editors: There is still time to join the RII; go to Force11 to download a sample letter to authors. Publishers: Central instructions to authors have been updated at Springer and Elsevier; where are yours? abandrowski@ucsd.edu

