Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2001.11 Probe selection for Microarrays Considerations and pitfalls.

Probe Selection for Microarrays Considerations and Pitfalls Kay Hofmann MEMOREC Stoffel GmbH Cologne/Germany.

Probe selection wish list
A probe selection strategy should ensure:
- Biologically meaningful results (the truth ...)
- Coverage, sensitivity (... the whole truth ...)
- Specificity (... and nothing but the truth)
- Annotation
- Reproducibility

Technology
Probe immobilization
- Oligonucleotide coupling: synthesis with linker, covalent coupling to surface
- Oligonucleotide photolithography
- ds-cDNA coupling: cDNA generated by PCR, nonspecific binding to surface
- ss-cDNA coupling: PCR with one modified primer, covalent coupling, 2nd strand removal
Spotting
- With contact (pin-based systems)
- Without contact (ink jet technology)

Technology-specific requirements
General
- Not too short (sensitivity, selectivity)
- Not too long (viscosity, surface properties)
- Not too heterogeneous (robustness)
- Degree of importance depends on method
Single strand methods (oligos, ss-cDNA)
- Orientation must be known
- ss-cDNA methods are not perfect
- ds-cDNA methods don't care
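The length requirements above can be checked mechanically. The following is a minimal sketch only: the slide gives no numeric limits, so `min_len` and `max_len` are placeholders to be set for the chosen platform, and reading "heterogeneous" as spread in probe length is an assumption.

```python
def check_probe_lengths(probes, min_len, max_len):
    """Split probes by length and report the length spread of the accepted set.

    probes: dict mapping probe name -> sequence string.
    min_len, max_len: platform-specific limits (hypothetical; not from the slides).
    """
    accepted = {
        name: seq for name, seq in probes.items()
        if min_len <= len(seq) <= max_len
    }
    rejected = sorted(set(probes) - set(accepted))
    lengths = [len(seq) for seq in accepted.values()]
    # Spread as a crude heterogeneity measure (an assumption about the slide's meaning).
    spread = max(lengths) - min(lengths) if lengths else 0
    return accepted, rejected, spread


if __name__ == "__main__":
    demo = {"a": "A" * 50, "b": "A" * 300, "c": "A" * 500, "d": "A" * 2000}
    ok, bad, spread = check_probe_lengths(demo, min_len=200, max_len=1000)
    print(sorted(ok), bad, spread)
```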

Probe selection approaches
[Figure: approaches placed along accuracy and throughput axes: selected gene regions, selected genes, anonymous ESTs, cluster representatives]

Non-selective approaches
Anonymous (blind) spotting
- Using clones from a library without prior sequencing
- Only clones with an interesting expression pattern are sequenced
- Normalization of the library highly recommended
- Typical uses:
  - HT-arrays of 'exotic' organisms or tissues
  - Large-scale verification of Differential Display clones
EST spotting
- Using clones from a library after sequencing
- Little justification, since sequence availability allows selection

Spotting of cluster representatives
Sequence clustering
- For human/mouse/rat EST clones: public cluster libraries
  - Unigene (NCBI)
  - THC (TIGR)
- For custom sequence: clustering tools
  - STACK_PACK (SANBI)
  - JESAM (HGMP)
  - PCP (Paracel, commercial)

A benign clustering situation

In the absence of 5'-3' links
Two clusters corresponding to one gene!

Overlap too short
Three clusters corresponding to one gene!

Chimeric ESTs
One cluster corresponding to two genes!!

Chimeric ESTs ... continued
- Chimeric ESTs are quite common
- Chimeric ESTs are a major nuisance for array probe selection
- One of the fusion partners is usually a highly expressed mRNA
- Double-picking of chimeric ESTs can fool even cautious clustering programs
- Unigene contains several chimeric clusters
- The annotation of chimeric clusters is erratic
- Chimeric ESTs can be detected by genome comparison
- There is one particularly bad class of chimeric sequences that will be the subject of the exercises
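Detection by genome comparison can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the lecture: the EST is assumed to have already been aligned to the genome by some external tool, the `Hit` record is a hypothetical container for one aligned block, and `max_span` is an arbitrary placeholder for the largest distance still plausible within a single gene.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One aligned block of an EST on the genome (hypothetical record layout)."""
    est_start: int
    est_end: int
    chrom: str
    genome_pos: int


def looks_chimeric(hits, max_span=500_000):
    """Flag an EST whose blocks map to different chromosomes or lie very far apart.

    max_span is a placeholder threshold, not a value from the slides.
    """
    if len(hits) < 2:
        return False
    if len({h.chrom for h in hits}) > 1:
        return True
    positions = [h.genome_pos for h in hits]
    return max(positions) - min(positions) > max_span


if __name__ == "__main__":
    est = [Hit(0, 200, "chr1", 1_000), Hit(200, 400, "chr7", 300)]
    print(looks_chimeric(est))
```

A flag from such a check is only a hint: long introns or gaps in the genome assembly can produce the same pattern, so flagged clones still need inspection.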

How to select a cluster representative
- If possible, pick a clone with completely known sequence
- Avoid problematic regions
  - Alu-repeats, B1, B2 and other SINEs
  - LINEs
  - Endogenous retroviruses
  - Microsatellite repeats
- Avoid regions with high similarity to non-identical sequences
- In many clusters, orientation and position relative to the ORF are unknown and cannot be selected for
- Test selected clone for sequence correctness
- Test selected clone for chimerism
- Some commercial providers offer sequence-verified Unigene cluster representatives
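Of the problematic regions listed above, microsatellite repeats are simple enough to find with a pattern match. The sketch below masks them with N; the unit length (1 to 6 bases) and the minimum of five copies are assumptions chosen for illustration. SINEs, LINEs and endogenous retroviruses cannot be found this way and need a dedicated repeat-masking tool with a repeat library.

```python
import re

# A unit of 1-6 bases followed by at least four more copies (five in total).
# Both limits are illustrative assumptions, not values from the slides.
MICROSATELLITE = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def mask_simple_repeats(seq):
    """Replace every microsatellite-like run with N characters of equal length."""
    return MICROSATELLITE.sub(lambda m: "N" * len(m.group(0)), seq.upper())


if __name__ == "__main__":
    print(mask_simple_repeats("TTGG" + "CA" * 6 + "TTGG"))
```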

Selection of genes
- If possible, use all of them
- Biased selection
  - Selection by tissue
  - Selection by topic
  - Selection by visibility
  - Selection by known expression properties
- Selection from unbiased pre-screen
- Use sources of expression information
  - EST frequency
  - Published array studies
  - SAGE data

Selection of gene regions
[Figure labels: 3' UTR, ORF, 5' UTR]

Alternative polyadenylation (figure)

Alternative polyadenylation
Constitutive polyA heterogeneity
- 3'-fragments: reduced sensitivity
- No impact on expression ratio
Regulated polyA heterogeneity
- Fragment choice influences expression ratio
- Multiple fragments necessary
Detection of cryptic polyA signals
- Prediction (AATAAA)
- Polyadenylated ESTs
- SAGE tags
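Prediction from the AATAAA hexamer amounts to a plain motif scan. The following is a minimal sketch: the function name is made up for illustration, only the canonical hexamer named on the slide is searched, and a hit is a candidate rather than proof that the site is used.

```python
def find_polya_signals(seq, signal="AATAAA"):
    """Return 0-based start positions of every occurrence of the signal hexamer.

    RNA input is accepted by converting U to T first.
    """
    seq = seq.upper().replace("U", "T")
    positions = []
    start = seq.find(signal)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping occurrences are also reported.
        start = seq.find(signal, start + 1)
    return positions


if __name__ == "__main__":
    utr = "CCAATAAAGGGGAATAAACC"
    print(find_polya_signals(utr))
```

More than one hit in a 3' UTR suggests checking polyadenylated ESTs or SAGE tags, as listed above, to see which sites are actually used.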

Alternative splicing (figure)

Alternative splicing
Constitutive splice form heterogeneity
- Fragment in alternative exon: reduced sensitivity
- No impact on expression ratio
Regulated splice form heterogeneity
- Fragment choice influences expression ratio
- Multiple fragments necessary
Detection of alternative splicing events
- Hard/impossible to predict
- EST analysis (beware of pre-mRNA)
- Literature
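The difference between constitutive and regulated heterogeneity can be made concrete with a toy calculation. The model is an assumption made for illustration: probe signal is taken to be proportional to the amount of transcript that contains the probed fragment, and all numbers are invented.

```python
def observed_ratio(total_a, frac_a, total_b, frac_b):
    """Expression ratio (condition B over A) seen by a probe in an alternative exon.

    total_*: total mRNA level of the gene in each condition (arbitrary units).
    frac_*: fraction of transcripts that contain the probed exon.
    Assumes signal is proportional to the amount of detectable transcript.
    """
    return (total_b * frac_b) / (total_a * frac_a)


if __name__ == "__main__":
    # The gene is truly induced twofold (100 -> 200 units).
    # Constitutive heterogeneity: the exon is in 25% of transcripts in both conditions.
    print(observed_ratio(100, 0.25, 200, 0.25))
    # Regulated heterogeneity: exon inclusion drops from 50% to 10%.
    print(observed_ratio(100, 0.5, 200, 0.1))
```

In the constitutive case the probe sees only a quarter of the signal but the twofold ratio is preserved. In the regulated case the same probe reports an apparent decrease although the gene is induced, which is why multiple fragments are needed. The same arithmetic applies to fragments downstream of an alternative polyA site.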

Alternative promoter usage (figure)

Alternative promoter usage
What is the desired readout?
- If promoter activity matters most: multiple fragments
- If overall mRNA level matters most: downstream fragment
Detection of alternative promoter usage
- Prediction difficult (possible?)
- EST analysis
- Literature

UDP-Glucuronosyltransferases
[Figure labels: UGT1A8, UGT1A7]

Selection of gene regions
Coding region (ORF)
- Annotation relatively safe
- No problems with alternative polyA sites
- No repetitive elements or other funny sequences
- Danger of close isoforms
- Danger of alternative splicing
- Might be missing in short RT products
3' untranslated region
- Annotation less safe
- Danger of alternative polyA sites
- Danger of repetitive elements
- Less likely to cross-hybridize with isoforms
- Little danger of alternative splicing
5' untranslated region
- Close linkage to promoter
- Frequently not available

A checklist
- Pick a gene
- Try to get a complete cDNA sequence
- Verify sequence architecture (e.g. cross-species comparison)
- Mask repetitive elements (and vector!)
- If possible, discard 3'-UTR beyond first polyA signal
- Look for alternative splice events
- Use remaining region of interest for similarity searches
- Mask regions that could cross-hybridize
- Use the remaining region for probe amplification or EST selection
- When working with ESTs, use sequence-verified clones
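Once repeats, vector and cross-hybridizing stretches have been masked, the last selection step reduces to finding what is left. A minimal sketch, assuming masked positions are written as N (a common convention, though not stated in the slides) and that the longest unmasked stretch is the preferred candidate:

```python
import re


def longest_unmasked_region(masked_seq, mask_char="N"):
    """Return (start, end) of the longest run without mask characters.

    Coordinates are 0-based, end-exclusive; (0, 0) means nothing is left unmasked.
    """
    pattern = re.compile(f"[^{re.escape(mask_char)}]+")
    best = (0, 0)
    for match in pattern.finditer(masked_seq):
        if match.end() - match.start() > best[1] - best[0]:
            best = (match.start(), match.end())
    return best


if __name__ == "__main__":
    masked = "ACGTNNNNACGTACGTNNAC"
    start, end = longest_unmasked_region(masked)
    print(start, end, masked[start:end])
```

Whether the longest region is also the best one depends on the technology-specific length limits and on where the region lies in the transcript, so the result is a starting point for primer design or EST selection rather than a final answer.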