Entity Summaries Jing Jiang and Xu Lin BeeSpace Programmers’ Meeting Sept. 6, 2006.



A quick review of the NER component
Use two types of information to make a prediction
– Word features and word surface features, e.g. p53, XXXless
– Contextual features, e.g. XXX expression, XXX mutants
Prediction of the same word/phrase is context-sensitive
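The two feature types above can be sketched as follows. This is only an illustration, not the BeeSpace NER code: the function names, the shape encoding, and the one-token context window are all assumptions.

```python
import re

def surface_shape(token: str) -> str:
    """Coarse surface pattern: uppercase -> 'A', lowercase -> 'a', digit -> '0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)

def token_features(tokens, i):
    """Return word, surface, and contextual features for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    feats = {
        "word=" + tok.lower(),            # word feature
        "shape=" + surface_shape(tok),    # surface feature, e.g. p53 -> a00
    }
    if tok.lower().endswith("less"):      # surface feature for names of the form XXXless
        feats.add("suffix=less")
    # Contextual features: the neighbouring words, e.g. "XXX expression", "XXX mutants"
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats.add("prev=" + prev_tok)
    feats.add("next=" + next_tok)
    return feats

print(sorted(token_features(["clock", "mutants"], 0)))
```

Because the neighbouring words are part of the feature set, the same token can receive different predictions in different sentences, which is what makes the prediction context-sensitive.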

Examples of Some Ambiguous Gene Names
foraging
– We assayed response decrement for natural and mutant rover and sitter alleles of the foraging (for) gene that encodes a Drosophila PKG. (FN)
– Hybrid disadvantage in the larval foraging behaviour of the two neotropical species of Drosophila pavani and Drosophila gaucha… (TN)

Examples of Some Ambiguous Gene Names
ss
– …SmZF1 binds both ds and ss DNA oligonucleotides,… (TN)
– Coexpression of Ss and Tgo in Drosophila SL2 cells… (TP)
– The origin of germline-limited chromosomes (Ks) as descendants of somatic chromosomes (Ss) and their… (FP)

Examples of Some Ambiguous Gene Names
black
– The purpose of this study was to investigate the black gene, and protein,… (FN)
– …beta-alanine biosynthesis is regulated by black. (FN)
– Screening a cDNA library prepared from silk-producing glands of the black widow spider,… (TN)

Examples of Some Ambiguous Gene Names
clock
– …a novel fitness-related phenotype may be linked to noncircadian expression of clock genes in the ovaries. (TP)
– …mPer1 could operate in the adaptation of the circadian clock of nocturnal mice to… (TN)

Examples of Some Ambiguous Gene Names
ERG
– To establish the predicted existence of a Drosophila gene in the erg subfamily and… (FN)
– The ERG analysis of the norpA mutants suggests that… (TN)
– Here we show that the electroretinogram (ERG), the extracellular recording… (FP)

Examples of Some Ambiguous Gene Names
pdf
– PDF is coded in a precursor protein together with another neuropeptide… (TP)
– …the Drosophila brain that express the period (per) and pigment dispersing factor (pdf) genes play… (TP)

Old System with Keyword Search
– Use synonym list from FlyBase, to increase recall
– String match in the whole abstract
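A minimal sketch of whole-abstract keyword matching with a synonym list. The `SYNONYMS` table is a hand-made stand-in for the FlyBase list (its entries are taken from the example sentences in these slides, not from FlyBase), and the function name is hypothetical.

```python
import re

# Stand-in synonym table; a real system would load this from FlyBase.
SYNONYMS = {
    "pdf": ["pdf", "pigment dispersing factor"],
    "black": ["black"],
}

def whole_abstract_match(abstract: str, names) -> bool:
    """True if any name occurs as a whole word or phrase anywhere in the abstract."""
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, abstract, flags=re.IGNORECASE):
            return True
    return False

hit = "the period (per) and pigment dispersing factor (pdf) genes play"
spider = "silk-producing glands of the black widow spider"
print(whole_abstract_match(hit, SYNONYMS["pdf"]))       # a real gene mention
print(whole_abstract_match(spider, SYNONYMS["black"]))  # also matches: a false positive
```

Matching anywhere in the abstract keeps recall high, but it cannot tell the gene black from the black widow spider.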

New System with NER
– Will be replaced by an automated normalizer if there is
– String match only in the tagged gene mentions, to increase precision

Changes with integration of NER
Recall
– Tokenization
– Keyword match (whole abstract => gene mentions)
– Synonym list
Precision
– Exact match => exact match but allowing crossing tag boundary

Example
Query: ABC-a
Without NER: matches all 4 cases
With NER: does not match the second case
– xxxxxxxxxxxxxx ABC a gene xxxxxxxxxxxxxxx
– xxxxxxxxxxxxxxxxx ABC a xxxxxxxxxxxxxxxxxxxxxxxx
– xxxxxxxxxxxxxx ABC a encodes xxxxxxxxxxxx
– xxx gene ABC a xxxxxxxxxxxxxxx
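The example can be sketched in code under some assumptions: the query and the text are tokenized the same way (so ABC-a becomes the tokens ABC and a), the NER output is a list of token spans, and "allowing crossing tag boundary" is read as "the matched tokens overlap a tagged mention". The span format and function names are hypothetical.

```python
import re

def tokenize(text: str):
    """Lowercase and split on non-alphanumerics, so 'ABC-a' -> ['abc', 'a']."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]

def find_matches(tokens, query_tokens):
    """Start indices where the query token sequence occurs exactly."""
    n = len(query_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == query_tokens]

def match_in_mentions(tokens, mentions, query_tokens) -> bool:
    """mentions: (start, end) token spans tagged as genes, end exclusive.

    A match counts if it overlaps any tagged span, so it may cross a tag boundary.
    """
    n = len(query_tokens)
    for start in find_matches(tokens, query_tokens):
        end = start + n
        if any(start < m_end and m_start < end for m_start, m_end in mentions):
            return True
    return False

query = tokenize("ABC-a")
tokens = tokenize("the ABC a gene is expressed")
print(match_in_mentions(tokens, [(1, 3)], query))  # mention tagged: match
print(match_in_mentions(tokens, [], query))        # nothing tagged: no match
```

In this reading, the second case in the example is rejected because nothing around "ABC a" led the tagger to mark it as a gene mention.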

Effects of NER on Gene Summarizer
FP → TN (increase precision)
– …mPer1 could operate in the adaptation of the circadian clock of nocturnal mice to… (TN)
TP → FN (decrease recall)
– …beta-alanine biosynthesis is regulated by black. (FN)
FP → FP (no effect, but not what we want)
– Here we show that the electroretinogram (ERG), the extracellular recording… (FP)
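A small worked example of how these transitions move the two metrics, using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts are invented purely for illustration and are not results from the system.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of retrieved sentences that really mention the gene."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of real gene mentions that were retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Invented counts for the keyword-only system (illustration only).
before = {"tp": 8, "fp": 4, "fn": 2}
# Suppose NER turns 2 FPs into TNs and 1 TP into an FN.
after = {"tp": before["tp"] - 1, "fp": before["fp"] - 2, "fn": before["fn"] + 1}

for label, c in (("keyword only", before), ("with NER", after)):
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

FP → FP cases change neither number, which is why they call for a different fix.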

A Different Approach
Motivation: to solve a simpler problem than NER because we already know the gene name and its synonyms
Approach: build a classifier that focuses on contextual features to identify FPs
– Only use contextual features because the term/phrase already matches a gene name
– Need some “good” negative examples (ambiguous gene names) in the training data
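One way such a classifier could look, as a sketch only: a perceptron over the words within two tokens of the matched name. The slides do not name a learning algorithm, so the perceptron, the window size, and the five-sentence training set (shortened from the example sentences above, with 1 = gene mention and 0 = false positive) are all assumptions.

```python
from collections import defaultdict

def context_features(tokens, idx, window=2):
    """Words within `window` tokens of the matched name; the name itself is not a feature."""
    feats = set()
    for j in range(max(0, idx - window), idx):
        feats.add("L:" + tokens[j].lower())
    for j in range(idx + 1, min(len(tokens), idx + 1 + window)):
        feats.add("R:" + tokens[j].lower())
    return feats

def predict(weights, bias, feats):
    """Return 1 (gene mention) if the linear score is positive, else 0."""
    return 1 if bias + sum(weights[f] for f in feats) > 0 else 0

def train_perceptron(examples, max_epochs=20):
    """examples: list of (tokens, index_of_name, label). Stops early once error-free."""
    weights, bias = defaultdict(int), 0
    for _ in range(max_epochs):
        errors = 0
        for tokens, idx, label in examples:
            feats = context_features(tokens, idx)
            delta = label - predict(weights, bias, feats)
            if delta != 0:
                errors += 1
                bias += delta
                for f in feats:
                    weights[f] += delta
        if errors == 0:
            break
    return weights, bias

# Toy training data; the negatives are ambiguous names used in a non-gene sense.
TRAIN = [
    (["investigate", "the", "black", "gene", "and", "protein"], 2, 1),
    (["biosynthesis", "is", "regulated", "by", "black"], 4, 1),
    (["glands", "of", "the", "black", "widow", "spider"], 3, 0),
    (["of", "the", "circadian", "clock", "of", "nocturnal", "mice"], 3, 0),
    (["expression", "of", "clock", "genes", "in", "the", "ovaries"], 2, 1),
]

weights, bias = train_perceptron(TRAIN)
for tokens, idx, label in TRAIN:
    print(tokens[idx], label, predict(weights, bias, context_features(tokens, idx)))
```

The matched name is deliberately left out of the features, since every candidate already matches a gene name; only the context can separate real mentions from false positives.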