1
Modeling Functional Genomics Datasets
CVM8890-101, Lesson 1
13 June 2007
Bindu Nanduri
2
Lesson 1: Data to Biological sense. What we are trying to achieve. Introduction to functional genomics modeling strategies.
3
Transcriptomics and Proteomics
4
Why study gene expression changes? Transcription is the predominant form of regulation.
5
Northern Blots
Mol Vis. 1996 Nov 4;2:11
6
Microarrays
Basic concept: reverse Northern blot on a large scale
High throughput:
- hybridize control and experimental samples simultaneously using distinct fluorescent dyes
- many assays can be carried out in parallel
7
Affymetrix oligo array design
- 25-mer oligonucleotide probes (11 to 16)
- Usually from the most 3′ area of the transcript, often the UTR
http://www.affymetrix.com
8
Genomic Tiling Array Design
[Figure: multiple probes tiled along the genome sequence, 5′ to 3′; center-to-center resolution 38 bp]
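The tiling idea can be sketched in a few lines of Python. This is only an illustration: the 38 bp center-to-center spacing comes from the slide, while the 25-nt probe length, the function name `tiling_probes`, and the toy sequence are assumptions made for the example.

```python
def tiling_probes(genome, probe_len=25, step=38):
    """Return (start, probe_sequence) pairs tiled along one strand.

    With equal-length probes, spacing the start positions by `step`
    also spaces the probe centers by `step` (center-to-center resolution).
    probe_len=25 is an assumption; the slide states only the 38 bp spacing.
    """
    probes = []
    for start in range(0, len(genome) - probe_len + 1, step):
        probes.append((start, genome[start:start + probe_len]))
    return probes


# Toy 120-nt sequence (hypothetical, not a real genome).
genome = "ACGT" * 30
probes = tiling_probes(genome)
starts = [start for start, _ in probes]
```

With these settings, consecutive probes leave an untiled gap of `step - probe_len` bases; a step smaller than the probe length would give overlapping probes instead.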
9
ISB Systems Biology Course 2006
10
Is mRNA level = protein level? Is there a correlation?
Comparison of protein levels (MS, 2D gels) and RNA levels (SAGE) for 156 genes in yeast:
- mRNA levels unchanged, but protein levels varied by up to 20X
- protein levels unchanged, but mRNA levels varied by up to 30X
- Highly expressed mRNAs correlate well with protein levels
Gygi et al. (1999) Mol. Cell. Biol.
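One common way to ask whether mRNA and protein levels track each other is a rank correlation. The sketch below computes Spearman's rho in plain Python. The abundance values are hypothetical and are not data from Gygi et al.; the function names are ours.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances for five genes (same gene order in both lists).
mrna = [1.0, 2.0, 3.0, 50.0, 120.0]
protein = [30.0, 5.0, 12.0, 400.0, 900.0]
rho = spearman(mrna, protein)
```

In this toy set the two highly expressed genes agree in rank while the three low-abundance genes do not, which mirrors the pattern described on the slide.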
11
ISB Systems Biology Course 2006
17
Expressed Sequence Tags
- ESTs are pieces of DNA sequence (usually 200 to 500 nt) generated by sequencing either one or both ends of an expressed gene
- They are bits of DNA that represent genes expressed in certain cells, tissues, or organs from different organisms
- They can be useful "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs
http://www.ncbi.nlm.nih.gov/About/primer/est.html
19
EST Sequence Clustering
- A gene can be expressed as mRNA many, many times, so ESTs derived from this mRNA may be redundant: many identical, or similar, copies of the same EST
- Because of this redundancy and overlap, someone who searches dbEST for a particular EST may retrieve a long list of tags, many of which represent the same gene
- The UniGene database automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters
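To make the idea of collapsing redundant ESTs concrete, here is a toy sketch that groups sequences sharing at least one k-mer, using union-find for single-linkage grouping. It is not UniGene's procedure; the sequences, the value of `k`, and the function names are assumptions chosen for illustration, and reverse-complement matches and sequencing errors are ignored.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8):
    """Group ESTs that share at least one k-mer (single linkage)."""
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> first EST in which it was observed
    for name, seq in ests.items():
        for km in kmers(seq, k):
            if km in first_seen:
                parent[find(name)] = find(first_seen[km])
            else:
                first_seen[km] = name

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical ESTs: est1 and est2 overlap, est3 is unrelated.
ests = {
    "est1": "ATGGCGTACGTTAGC",
    "est2": "GTACGTTAGCCTAAGG",
    "est3": "TTTTCCCCGGGGAAAA",
}
clusters = cluster_ests(ests)
```

Here `clusters` places est1 and est2 together and leaves est3 on its own. Real pipelines use alignment-based similarity rather than exact k-mer sharing.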
20
ESTs: EST mapping to the genome, annotation, differential expression
Transcriptome: clustering, differential expression analysis
Proteome: differential expression analysis
21
Multiple data analysis platforms
- Proteomics
- Transcriptomics
- EST analysis
Output: LIST of elements
27
Modeling Function
Modeling function requires:
- knowing the components of the system (structural annotation)
- knowing what these components do and how they interact (functional annotation)
29
http://www.protonet.cs.huji.ac.il/ProToGO/Introduction.html
30
Where do you begin? Specifics
31
Transcriptome Analysis
32
Clustering
Similar expression patterns = similar regulation?
- Clustering algorithms help us identify patterns in complex data
- Key goal: identify co-regulated groups of genes
Methods:
- Hierarchical clustering
- K-means clustering
- Self-organizing feature maps
- Principal component analysis
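As a minimal sketch of the first method listed, the code below does average-linkage hierarchical clustering on expression profiles, using 1 minus the Pearson correlation as the distance so that genes with similar patterns group together regardless of magnitude. The gene names, expression values, and function names are hypothetical, and real analyses would normally use an established statistics package.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for flat profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering with average linkage.

    Starts with one cluster per gene and repeatedly merges the two
    closest clusters until n_clusters remain.
    """
    clusters = [[name] for name in profiles]

    def average_distance(a, b):
        return mean(
            1.0 - pearson(profiles[g], profiles[h]) for g in a for h in b
        )

    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],   # rising
    "geneB": [2.0, 4.0, 6.0, 8.5],   # rising, higher magnitude
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falling
    "geneD": [8.0, 6.0, 4.5, 2.0],   # falling, higher magnitude
}
groups = hierarchical_clusters(profiles, n_clusters=2)
```

The rising genes end up in one group and the falling genes in the other. Whether shared pattern implies shared regulation is the biological question the slide raises; the clustering only identifies candidates.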
33
Proteomics
- Qualitative: total number of identified proteins, data intersections
- Quantitative: changes in protein expression
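Both views can be illustrated with a few lines of Python. The protein identifiers and abundance values below are hypothetical (they could stand for any quantitative measure, such as spectral counts), and log2 fold change is used here only as one common way to express a change in expression.

```python
from math import log2

# Hypothetical protein abundances in two samples.
control = {"P1": 100.0, "P2": 50.0, "P3": 10.0}
treated = {"P2": 200.0, "P3": 10.0, "P4": 75.0}

# Qualitative: how many proteins were identified, and how the sets intersect.
n_control = len(control)
n_treated = len(treated)
shared = set(control) & set(treated)
only_control = set(control) - set(treated)
only_treated = set(treated) - set(control)

# Quantitative: log2 fold change for proteins seen in both samples.
log2_fold_change = {
    protein: log2(treated[protein] / control[protein])
    for protein in sorted(shared)
}
```

A fold change can only be computed for proteins present in both samples; proteins found in one sample only are reported through the set differences.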
35
Proteomic data analysis tools
38
Use GO for...
- Grouping gene products by biological function
- Determining which classes of gene products are over-represented or under-represented
- Focusing on particular biological pathways and functions (hypothesis-driven data interrogation)
- Relating a protein’s location to its function
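Over-representation of a GO term is often assessed with a hypergeometric (one-sided Fisher) test. The sketch below is one such calculation in plain Python, not the method of any particular tool. The function name, parameter names, and counts are assumptions for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes by chance.

    population: genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the list of interest (e.g. differentially expressed)
    hits:       genes in the list of interest carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 background genes, 50 with the term,
# 40 genes of interest, 8 of which carry the term.
p = enrichment_p_value(population=1000, annotated=50, sample=40, hits=8)
```

A small `p` suggests the term is over-represented in the list. Under-representation can be tested the same way by summing the lower tail instead.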
39
Course Overview
- Introduction to functional annotation. Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.
- Tools for functional annotation. Accessing functional data; computational strategies to obtain more complete functional annotation; the AgBase GO annotation pipeline.
- Introduction to pathways analysis. Theory and strategies for pathway analysis modeling in different species and tools for pathway analysis.
- Functional genomics modeling: prokaryotic and eukaryotic examples.
40
Some Useful Links
- http://www.genomesonline.org/ (comprehensive access to information regarding complete and ongoing genome projects around the world)
- http://www.geneontology.org/ (provides a controlled vocabulary to describe gene and gene product attributes in any organism)
- http://pir.georgetown.edu/ (integrated protein informatics resource for genomics and proteomics)
- http://www.pir.uniprot.org/ (protein database)
- http://mips.gsf.de/ (maintains a set of generic databases as well as the systematic comparative analysis of microbial, fungal, and plant genomes)
- http://www.ncbi.nlm.nih.gov/ (comprehensive resource for public databases, literature and tools)
- http://www.ebi.ac.uk/ensembl/ (system that maintains automatic annotation of large eukaryotic genomes)
- http://expasy.org/ (expert protein analysis system)
- http://www.biocyc.org/ (BioCyc is a collection of 260 Pathway/Genome Databases: metabolic pathways)
- http://www.genome.jp/kegg/ ("biological systems" database integrating both molecular building block information and higher-level systemic information)
41
Some Useful Links
- http://pfgrc.tigr.org/index.shtml (functional genomics studies on a variety of pathogens for which genomic sequence information is currently, or will soon be, available)
- http://www.tigr.org/ (comprehensive resource for microbial genomics)
- http://www.cs.ualberta.ca/~bioinfo/PA/ (high-throughput proteome annotations)
- http://garnet.arabidopsis.org.uk/systems_biology_tools.htm (Arabidopsis resources)
- http://www.systems-biology.org/002/ (systems biology portal)
- http://www.ebi.ac.uk/biomodels/ (mathematical models of biological interest)
- http://www.genmapp.org/current_databases.html (species-specific collections of genes and annotation)
- http://bioinfo.bgu.ac.il/bsu/microarrays/links/ (microarray analysis resources)
- http://david.abcc.ncifcrf.gov/ (Database for Annotation, Visualization and Integrated Discovery)
- http://www.animalgenome.org/pigs/community/links.html (swine genetics community)
42
Some Useful Links
- http://www.biocarta.com/FeaturedProducts/index.asp (pathways and tools for analysis)
- http://www.genecards.org/index.shtml (database of human genes that includes automatically mined genomic, proteomic and transcriptomic information, as well as orthologies, disease relationships, SNPs, gene expression, gene function, and service links for ordering assays and antibodies)
- http://www.proteomecommons.org/ (proteomics tools)
- http://harvester.embl.de/
- http://bioinformatics.org/ (open access institute)
- http://www.ihop-net.org/UniPub/iHOP/ (a network of genes and proteins extends through the scientific literature)
- http://www1.jcsg.org/psat/help/document.html (comparative analysis of protein sequence)
- http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi (genome-scale algorithm for grouping ortholog protein sequences)
- http://www.pathogenomics.ca/ortholuge/ (ortholog prediction program)
- http://www.gene-regulation.com/pub/databases.html (transcription factor database)
43
Some Useful Links
- http://www.reactome.org/ (curated knowledgebase of biological pathways)
- http://www.biochemweb.org/systems.shtml (The Virtual Library of Biochemistry, Molecular Biology and Cell Biology)
- http://genome-www.stanford.edu/ (Stanford genomic resources)
- http://www.softberry.com/berry.phtml (collection of tools for annotation and analysis of sequences)
- http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0E.html (prediction of transmembrane domains in proteins)
- http://www.psort.org/psortb/ (subcellular localization predictions)
- http://www.ch.embnet.org/software/TMPRED_form.html (prediction of membrane-spanning regions and their orientation)
- http://www.agbase.msstate.edu/ (functional analysis of agricultural plant and animal gene products)