EXTENDING GENE ANNOTATION WITH GENE EXPRESSION


1 EXTENDING GENE ANNOTATION WITH GENE EXPRESSION
Chris Stoeckert, Penn Center for Bioinformatics

2 Extending Gene Annotation with Gene Expression Patterns
Sources: genomic sequence; ESTs; clones, tags, cDNA, oligos
Methods: BLAST; arrays, SAGE, differential display
Genes:
  What is it? What does it do? (sequence similarity)
  How does it do it? Where/when does it do it? (gene expression)
  What happens when it doesn't do it right?

3 Data Flow for Annotation by Gene Expression
ESTs -> BLAST -> Database of Transcribed Sequences: What is the gene?
Clones, tags, cDNA, oligos -> arrays, SAGE, differential display -> RNA Abundance Database: Where is it expressed?
Pattern Generator and Analysis: Look for co-regulation. Look for networks.
Sequence Analysis

4 RAD Schema Enhances Annotation by Providing Better Understanding of Data
Data verification: consistency within an experiment
Reproducibility: consistency between experiments
Comparison between platforms: can track data for same mRNA
Integration with other resources: DOTS (gene info), Anatomy (sample info)

5 RAD Schema: Three sets of Tables Provide Flexibility
Table sets: Experiments, Data, Platforms
Tables shown: Anat_rel, Others..., SpotResult, ExperimentCondition, StanfordSpotFamily, SpotFamilyResult, Experiment, GenomeSystems SpotFamily, ExperimentyResult, SynteniSpotFamily

6 RAD:DOTS Interface
RNA Abundance Database; Database of Transcribed Sequences
Anatomy; Cellular role
Clones/PCR; filter/array
Experiment: probe, other parameters, hyb/wash conditions
Results: signal/% background, adjustments
SWISS-PROT neighbors (protein); EST clusters (RNA); Genomic sequence (DNA); Regulatory elements

7 RAD Web Interface choose through DOTS

8 RAD Web Interface example of retrieving by cell role or library

9 Expression Pattern Algorithm
Input:
  Files with identifier and value (e.g., IMAGE clone ID, percent of total signal) for each experiment
  Tolerated variance between replicates
  Dynamic range (e.g., top 15% of signals are meaningful)
  Dependencies between experiments
Output:
  Expression patterns based on ratios between experiments (list of bins)
  Variance between replicates
  Genes above background
  Distribution of ratios for use in statistical analysis
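The slide does not give the file layout, so the following is only a minimal sketch assuming a tab-delimited file with one identifier and one value per line. The function names `read_signals` and `percent_of_total` are hypothetical, and the normalization step applies only if the file holds raw signals rather than values already expressed as percent of total signal.

```python
import csv
import io


def read_signals(handle):
    """Read identifier/value rows from a tab-delimited file (assumed layout)."""
    rows = [(r[0], float(r[1])) for r in csv.reader(handle, delimiter="\t") if r]
    return dict(rows)


def percent_of_total(signals):
    """Express each raw signal as a percentage of the experiment's total signal."""
    total = sum(signals.values())
    return {ident: 100.0 * value / total for ident, value in signals.items()}


# Placeholder identifiers and values, for illustration only.
example = io.StringIO("clone_1\t50\nclone_2\t150\n")
print(percent_of_total(read_signals(example)))
```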

10 Expression Pattern Algorithm
1. Determine the minimum useful value for each group of replicate experiments based on the specified dynamic range. Raise all values below the minimum useful value to equal this value.
2. Determine the ratio (cutratio) that contains the specified percentage of ratios between replicates. Default = 2.
3. Take ratios between the average values for each group of replicates. Use the median value of the groups, or a specified reference group, in the denominator.
4. Bin the ratios. Use powers of the cutratio and the range of ratios to generate the cut-off ratios for each bin. Generate a second set of bins offset from the first to capture ratios which straddle the first set of bins.
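The four steps can be sketched as follows. This is an illustration under stated assumptions, not the original program: the minimum useful value is taken to be the smallest signal within the top fraction, the cutratio is taken as a percentile of larger/smaller replicate ratios, the denominator is the median of the group averages unless a reference group is given, and the second set of bins is shifted by half a bin on the log scale. All function names are hypothetical.

```python
import math
from statistics import mean, median


def floor_to_minimum(values, top_fraction=0.15):
    """Step 1: raise values below the minimum useful value up to that value."""
    ranked = sorted(values, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    minimum_useful = ranked[keep - 1]  # assumption: smallest value in the top fraction
    return [max(v, minimum_useful) for v in values], minimum_useful


def cut_ratio(rep_a, rep_b, fraction=0.95):
    """Step 2: ratio that contains the given fraction of replicate ratios."""
    ratios = sorted(max(a, b) / min(a, b) for a, b in zip(rep_a, rep_b))
    index = max(0, math.ceil(len(ratios) * fraction) - 1)
    return ratios[index]


def bin_ratio(ratio, cutratio=2.0, offset=0.0):
    """Step 4: bin index on a scale of powers of the cutratio.

    offset=0.5 gives the second, shifted set of bins (an assumed shift).
    """
    return math.floor(math.log2(ratio) / math.log2(cutratio) + offset)


def pattern(groups, cutratio=2.0, reference=None):
    """Steps 3 and 4: ratios of group averages, then one bin per group."""
    averages = [mean(g) for g in groups]
    denominator = median(averages) if reference is None else averages[reference]
    return tuple(bin_ratio(a / denominator, cutratio) for a in averages)


# Ratios 1.9 and 2.1 fall in different primary bins but share an offset bin.
print(bin_ratio(1.9), bin_ratio(2.1))
print(bin_ratio(1.9, offset=0.5), bin_ratio(2.1, offset=0.5))
```

The last two lines show why the second set of bins helps: two nearly equal ratios on either side of a cut-off are separated by the first set but grouped together by the offset set.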

11 Statistical Analysis of Patterns
Models the ratios as a multinomial experiment.
Null hypothesis is independence between genes, given a model describing dependencies between experiments (independent, reference, conditional, 1st-order Markov).
The expected number of genes in each pattern is calculated from the distribution of ratios and the experiment model.
The likelihood of the observed number of genes in a pattern can be calculated (or simulated) using the expected number.
Simulators throw a weighted die to generate a pattern for each gene and so obtain the number of genes in a specified pattern.
A score is generated using this expected number and the actual number of genes in a pattern.
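A minimal sketch of the simulation idea, assuming the simplest ("independent") experiment model, in which each position of a pattern is drawn independently from that experiment's observed bin frequencies. The slide does not define the score, so the empirical tail probability used here is an assumption, and all function names are hypothetical.

```python
import random
from collections import Counter


def bin_frequencies(patterns):
    """Per-experiment distribution of bins, estimated from the observed patterns."""
    n = len(patterns)
    width = len(patterns[0])
    return [
        {b: c / n for b, c in Counter(p[i] for p in patterns).items()}
        for i in range(width)
    ]


def expected_count(target, freqs, n_genes):
    """Expected number of genes showing `target` under the independent model."""
    prob = 1.0
    for position, bin_id in enumerate(target):
        prob *= freqs[position].get(bin_id, 0.0)
    return n_genes * prob


def simulate_counts(target, freqs, n_genes, n_trials=1000, seed=0):
    """Throw a weighted die per experiment for each gene; count hits per trial."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        hits = 0
        for _ in range(n_genes):
            drawn = tuple(
                rng.choices(list(f), weights=list(f.values()))[0] for f in freqs
            )
            hits += drawn == target
        counts.append(hits)
    return counts


def empirical_p(observed, simulated_counts):
    """Assumed score: fraction of trials with at least the observed count."""
    return sum(c >= observed for c in simulated_counts) / len(simulated_counts)
```

A pattern whose observed count sits far in the tail of the simulated counts is a candidate for co-regulation rather than chance agreement. The reference, conditional, and Markov models would change only how each position is drawn.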

12 Sample output of pattern program

13 Extended Annotation from Expression Patterns
RAD: Extract EST data in experiment groups.
Pattern: Find ESTs with same pattern.
DOTS: Map ESTs to transcribed sequences.
GenBank (GAIA): Get promoters (genomic sequence upstream of transcribed sequences).
TESS: Look for shared transcription factor binding sites.
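Two of these steps reduce to simple set operations, sketched below: grouping ESTs by pattern and intersecting the binding sites found in each promoter. The function names and all identifiers in the example data are hypothetical placeholders; the lookups against RAD, DOTS, GenBank, and TESS are not modeled.

```python
from collections import defaultdict


def group_by_pattern(pattern_by_est):
    """Collect ESTs that share the same expression pattern."""
    groups = defaultdict(list)
    for est, pattern in pattern_by_est.items():
        groups[pattern].append(est)
    return dict(groups)


def shared_sites(sites_by_promoter):
    """Binding sites present in every promoter of a co-expressed group."""
    site_sets = [set(sites) for sites in sites_by_promoter.values()]
    return set.intersection(*site_sets) if site_sets else set()


# Placeholder data, not results from the presentation.
patterns = {"est_1": (0, 0, -1), "est_2": (0, 0, -1), "est_3": (1, 0, 0)}
sites = {"promoter_1": ["site_A", "site_B"], "promoter_2": ["site_B", "site_C"]}
print(group_by_pattern(patterns)[(0, 0, -1)])
print(shared_sites(sites))
```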

14 Extended Annotation from Expression Patterns
Comparison of array data stored in RAD from: HEL, HEL+hemin, CD34, erythroblasts.
Significant cluster of clones down-regulated in erythroblasts: 10/24 clones coded for ribosomal proteins according to DOTS.
Obtained promoters for 4 of these ribosomal proteins from GenBank.
Used TESS to find shared transcription factor binding sites. All contain sites for PU.1, which is antagonistic to red cell growth.
Result: a subset of ribosomal proteins that are co-regulated, and a potential mechanism for co-regulation.

15 Summary
Genomic sequence, ESTs: candidate genes
DOTS: integrated info
RAD: facilitate analysis
Pattern: co-regulation, candidate function, candidate genes with related role

16 Acknowledgements

