Modeling Functional Genomics Datasets, CVM8890-101, Lesson 1, 13 June 2007, Bindu Nanduri.


Lesson 1: Data to Biological sense. What we are trying to achieve. Introduction to functional genomics modeling strategies.

Transcriptomics and Proteomics

Why study gene expression changes? Transcription is the predominant form of regulation.

Northern Blots. Source: Mol Vis Nov 4;2:11

Microarrays
- Basic concept: reverse Northern blot on a large scale.
- High throughput: control and experimental samples are hybridized simultaneously using distinct fluorescent dyes, so many assays can be carried out in parallel.
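Because both samples are read from the same spot, two-dye data are commonly summarized as a log2 ratio of the two intensities. The following is a minimal sketch of that calculation; the gene names, the intensity values, and the `floor` parameter are made up for illustration and do not come from the lesson.

```python
import math


def log2_ratio(experimental, control, floor=1.0):
    """Return log2(experimental / control) for one spot.

    `floor` is a hypothetical lower bound that avoids dividing by zero
    or taking the log of zero for very dim spots.
    """
    exp = max(experimental, floor)
    ctl = max(control, floor)
    return math.log2(exp / ctl)


# Illustrative intensities (made up): gene -> (experimental dye, control dye)
spots = {
    "geneA": (4000.0, 1000.0),
    "geneB": (500.0, 2000.0),
    "geneC": (1500.0, 1500.0),
}

ratios = {gene: log2_ratio(e, c) for gene, (e, c) in spots.items()}
for gene, value in ratios.items():
    print(f"{gene}: log2 ratio = {value:+.2f}")
```

A positive value means more signal in the experimental sample, a negative value means more in the control, and zero means no change.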

Affymetrix oligo array design: each gene is represented by a set of 25-mer probes (11 to 16), usually taken from the most 3′ region of the transcript, often the UTR.

Genomic Tiling Array Design: multiple probes tile the genome sequence from 5′ to 3′, with a center-to-center resolution of 38 bp.
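The tiling layout can be sketched as a list of probe start coordinates. In this sketch only the 38 bp step comes from the slide. The 25 nt probe length is an assumption borrowed from the Affymetrix expression arrays, and the 200 bp region is a made-up example.

```python
def tiling_probe_starts(region_length, probe_length=25, step=38):
    """Return 0-based start coordinates of probes tiled across a region.

    With equal-length probes, the center-to-center distance equals the
    distance between consecutive start coordinates, so `step` is the
    tiling resolution. `probe_length` is an assumed value.
    """
    last_start = region_length - probe_length
    return list(range(0, last_start + 1, step))


# Hypothetical 200 bp region
starts = tiling_probe_starts(region_length=200)
for s in starts:
    print(f"probe covers bases {s}..{s + 25 - 1}")
```

With a 38 bp step and 25 nt probes, 13 bases between neighboring probes are not covered by any probe.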

ISB Systems Biology Course 2006

Is mRNA level = protein level? Is there a correlation?
- Comparison of protein levels (MS, 2D gels) and RNA levels (SAGE) for 156 genes in yeast.
- mRNA levels unchanged, but protein levels varied by up to 20X.
- Protein levels unchanged, but mRNA levels varied by up to 30X.
- Highly expressed mRNAs correlate well with protein levels.
Gygi et al. (1999) Mol. Cell. Biol.
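One way to put a number on "is there a correlation" is a rank correlation between paired mRNA and protein measurements. The sketch below computes Spearman's coefficient with the standard library only. The five paired values are invented for illustration and are not data from Gygi et al.; the choice of Spearman rather than another statistic is also an assumption.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up paired abundances for five genes (arbitrary units)
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(mrna, protein)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used here because abundance data often span several orders of magnitude, which can let a few highly expressed genes dominate a Pearson coefficient computed on raw values.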

Expressed Sequence Tags
- ESTs are pieces of DNA sequence (usually 200 to 500 nt) generated by sequencing one or both ends of an expressed gene.
- They are bits of DNA that represent genes expressed in certain cells, tissues, or organs from different organisms.
- They can be useful "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs.

EST Sequence Clustering
- A gene can be expressed as mRNA many, many times, so ESTs derived from this mRNA may be redundant: many identical, or similar, copies of the same EST.
- Because of this redundancy and overlap, someone who searches dbEST for a particular EST may retrieve a long list of tags, many of which may represent the same gene.
- The UniGene database automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters.
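The idea of collapsing redundant ESTs into gene-oriented clusters can be illustrated with a toy example. The sketch below groups sequences that share at least one exact k-mer, using a union-find structure. It is not how UniGene works: it ignores reverse complements, sequencing errors, and alignment quality, and the sequences, names, and the choice of k = 8 are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmer(seqs, k=8):
    """Group sequence ids so that sequences sharing a k-mer are clustered.

    Single-linkage grouping implemented with union-find.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> first sequence id that contained it
    for name, seq in seqs.items():
        for kmer in kmers(seq, k):
            if kmer in first_seen:
                parent[find(name)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = name

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up ESTs: est1/est2 overlap, est3/est4 overlap
ests = {
    "est1": "ATGGCGTACGTTAGC",
    "est2": "CGTACGTTAGCTTAA",
    "est3": "GGGTTTCCCAAAGGG",
    "est4": "TTTCCCAAAGGGTTT",
}
clusters = cluster_by_shared_kmer(ests)
print(clusters)
```

Each resulting cluster stands in for one gene, so a search that would have returned four tags returns two groups instead.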

- ESTs: mapping to the genome, annotation, differential expression
- Transcriptome: clustering, differential expression analysis
- Proteome: differential expression analysis

Multiple data analysis platforms (proteomics, transcriptomics, EST analysis) each produce a list of elements.

Modeling Function
Modeling function requires:
- knowing the components of the system (structural annotation)
- knowing what these components do and how they interact (functional annotation)

Where do you begin? Specifics

Transcriptome Analysis

Clustering: similar expression patterns = similar regulation?
- Clustering algorithms help us identify patterns in complex data.
- Key goal: identify co-regulated groups of genes.
- Methods: hierarchical clustering, k-means clustering, self-organizing feature maps, principal component analysis.
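As an illustration of one of the listed methods, here is a minimal k-means sketch on expression profiles, written with the standard library only. The four profiles are made up, and initializing the centroids from the first k profiles is a simplification chosen to keep the example deterministic; real analyses usually use random or smarter initialization and a dedicated library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=20):
    """Plain k-means; returns one cluster label per profile.

    Centroids start as the first k profiles (a simplifying assumption).
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up expression profiles over four time points
genes = ["g1", "g2", "g3", "g4"]
profiles = [
    [0.1, 1.0, 2.0, 3.0],  # rising
    [3.0, 2.0, 1.0, 0.2],  # falling
    [0.0, 1.1, 2.1, 2.9],  # rising
    [2.9, 2.1, 0.9, 0.0],  # falling
]
labels = kmeans(profiles, k=2)
for gene, label in zip(genes, labels):
    print(gene, "-> cluster", label)
```

Genes that end up in the same cluster have similar profiles, which makes them candidates for co-regulation, not proof of it.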

Proteomics
- Qualitative: total number of identified proteins; data intersections
- Quantitative: changes in protein expression
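The qualitative comparison, meaning counts of identified proteins and the intersections between datasets, maps directly onto set operations. A small sketch follows, in which the sample names and protein identifiers are hypothetical.

```python
# Hypothetical protein identifications from two samples
control = {"P1", "P2", "P3", "P4"}
treated = {"P3", "P4", "P5"}

shared = control & treated        # identified in both samples
only_control = control - treated  # identified in control only
only_treated = treated - control  # identified in treated only

print("total identified:", len(control | treated))
print("shared:", sorted(shared))
print("control only:", sorted(only_control))
print("treated only:", sorted(only_treated))
```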

Proteomic data analysis tools

Use GO for:
- Grouping gene products by biological function
- Determining which classes of gene products are over-represented or under-represented
- Focusing on particular biological pathways and functions (hypothesis-driven data interrogation)
- Relating a protein's location to its function
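Over-representation of a GO class is often assessed with a hypergeometric (one-sided Fisher) test. The lesson does not say which test is used, so the sketch below is an assumption, and the counts in the example are made up.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the list of interest
    hits:       genes in the list that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, sample - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


# Made-up counts: 20 genes in total, 5 carry the term,
# and 3 of the 5 differentially expressed genes carry it.
p = enrichment_p_value(population=20, annotated=5, sample=5, hits=3)
print(f"p = {p:.4f}")
```

When many GO terms are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.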

Course Overview
- Introduction to functional annotation: orthologs and homologs; clusters of orthologous genes (COGs) and the Gene Ontology (GO); and how to find what functional annotation is available.
- Tools for functional annotation: accessing functional data; computational strategies to obtain more complete functional annotation; the AgBase GO annotation pipeline.
- Introduction to pathway analysis: theory and strategies for pathway analysis modeling in different species, and tools for pathway analysis.
- Functional genomics modeling: prokaryotic and eukaryotic examples.

Some Useful Links
- comprehensive access to information regarding complete and ongoing genome projects around the world
- provides a controlled vocabulary to describe gene and gene product attributes in any organism
- integrated protein informatics resource for genomics and proteomics
- protein database
- maintains a set of generic databases as well as the systematic comparative analysis of microbial, fungal, and plant genomes
- comprehensive resource for public databases, literature and tools
- system that maintains automatic annotation of large eukaryotic genomes
- expert protein analysis system
- BioCyc, a collection of 260 Pathway/Genome Databases: metabolic pathways
- "biological systems" database integrating both molecular building block information and higher-level systemic information

Some Useful Links
- functional genomics studies on a variety of pathogens for which genomic sequence information is currently, or will soon be, available
- comprehensive resource for microbial genomics
- high-throughput proteome annotations
- Arabidopsis resources
- systems biology portal
- mathematical models of biological interest
- species-specific collections of genes and annotation
- microarray analysis resources
- Database for Annotation, Visualization and Integrated Discovery
- swine genetics community

Some Useful Links
- pathways and tools for analysis
- database of human genes that includes automatically mined genomic, proteomic and transcriptomic information, as well as orthologies, disease relationships, SNPs, gene expression, gene function, and service links for ordering assays and antibodies
- proteomics tools
- open access institute
- a network of genes and proteins extends through the scientific literature
- comparative analysis of protein sequence
- genome-scale algorithm for grouping ortholog protein sequences
- ortholog prediction program
- transcription factor database

Some Useful Links
- curated knowledgebase of biological pathways
- Virtual Library of Biochemistry, Molecular Biology and Cell Biology
- Stanford genomic resources
- collection of tools for annotation and analysis of sequences
- prediction of transmembrane domains in proteins
- subcellular localization predictions
- prediction of membrane-spanning regions and their orientation
- functional analysis of agricultural plant and animal gene products