Published by Ilene Parrish. Modified over 9 years ago.
1
Genomic Signal Processing
Dr. C.Q. Chang, Dept. of EEE
2
Outline
- Basic Genomics
- Signal Processing for Genomic Sequences
- Signal Processing for Gene Expression
- Resources and Co-operations
- Challenges and Future Work
3
Basic Genomics
4
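Genome
- A typical human cell contains about 6 feet (roughly 2 m) of double-stranded (ds) DNA.
- The genome has about 3,000,000,000 base pairs. Early estimates put the number of genes at 50,000-100,000; current estimates are closer to 20,000 protein-coding genes.
- This DNA contains our complete genetic code, or genome.
- DNA encodes the instructions behind all cell functions, including the response to disease, aging and development.
- Gene expression pattern: a snapshot of which genes are being expressed (transcribed into mRNA) in a cell at a given time.
- Gene expression profile: how gene expression levels change over time or across conditions.
- Genetic pathways: changes in gene activity that accompany metabolic and functional changes, e.g. disease or aging.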
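As a rough consistency check on the 6-feet figure, assume a diploid cell (two copies of the roughly 3 × 10^9 bp genome) and the B-DNA rise of about 0.34 nm per base pair:

```latex
L \approx 2 \times \left(3 \times 10^{9}\,\text{bp}\right) \times 0.34\,\text{nm/bp}
  = 2.04 \times 10^{9}\,\text{nm} \approx 2.0\,\text{m} \approx 6.7\,\text{ft}
```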
6
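Gene: protein-coding DNA
DNA → (transcription) → mRNA → (translation) → Protein
Figure example: the DNA CCTGAGCCAACTATTGATGAA is transcribed into the mRNA CCUGAGCCAACUAUUGAUGAA, whose seven codons are translated into the peptide PEPTIDE (Pro-Glu-Pro-Thr-Ile-Asp-Glu).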
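The transcription and translation steps can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. It assumes the standard genetic code, takes the coding (sense) strand as input, and translates reading frame 0 only. The function names `transcribe` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in U, C, A, G order
# (first base varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codons in reading frame 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

Calling `translate(mrna[1:])` or `translate(mrna[2:])` reads the other two frames, which is the simplest way to see why choosing the proper reading frame matters.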
7
In more detail (color ~state)
8
Signal Processing for Genomic Sequences
9
The Data Set
10
The Problem
- Genomic information is digital: a string over the letters A, T, C and G.
- Signal processing deals with numerical sequences, so character strings have to be mapped into one or more numerical sequences.
- Identification of protein-coding regions: predicting whether or not a given DNA segment is part of a protein-coding region.
- Prediction of the proper reading frame.
- Compared with traditional methods, signal processing methods can be much faster and, in some cases, even more accurate.
11
Sequence to signal mapping
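The slides do not state which mapping is used. One widely used choice, assumed here, is the binary indicator (Voss) mapping. Each base b in {A, C, G, T} gets its own 0/1 sequence u_b[n], which equals 1 exactly where the DNA has base b. The following is a minimal numpy sketch; `indicator_sequences` is a hypothetical helper name.

```python
import numpy as np


def indicator_sequences(seq: str) -> dict[str, np.ndarray]:
    """Binary indicator mapping: u_b[n] = 1.0 if seq[n] == b, else 0.0."""
    chars = np.array(list(seq.upper()))
    return {b: (chars == b).astype(float) for b in "ACGT"}


u = indicator_sequences("GATTACA")
print(u["A"].tolist())  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
```

For a sequence made only of A, C, G and T, the four indicator sequences sum to 1 at every position. Three of them therefore already carry all the information.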
12
Signal Analysis
- Spectral analysis (Fourier transform, periodogram)
- Spectrogram
- Wavelet analysis
- HMT: wavelet-based hidden Markov tree
- Spectral envelope (using an optimal mapping from strings to numerical values)
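A common way to use Fourier analysis for coding-region identification relies on the period-3 property. This approach is assumed here and is not taken from the slides. Protein-coding regions tend to show a peak in the total indicator spectrum S[k] = Σ_b |U_b[k]|² at k = N/3, where U_b is the DFT of the indicator sequence for base b. The sketch below computes the ratio of that peak to the average spectral level. The function names are hypothetical, and the sketch assumes the sequence length is a multiple of 3.

```python
import numpy as np


def indicator_sequences(seq: str) -> list[np.ndarray]:
    """Binary indicator sequences for A, C, G, T."""
    chars = np.array(list(seq.upper()))
    return [(chars == b).astype(float) for b in "ACGT"]


def total_spectrum(seq: str) -> np.ndarray:
    """S[k] = sum over bases of |U_b[k]|^2 (DFT of each indicator sequence)."""
    return sum(np.abs(np.fft.fft(u)) ** 2 for u in indicator_sequences(seq))


def period3_peak_ratio(seq: str) -> float:
    """S[N/3] divided by the mean of S[1..N-1] (the DC term is excluded)."""
    n = len(seq)
    if n == 0 or n % 3:
        raise ValueError("sequence length must be a positive multiple of 3")
    s = total_spectrum(seq)
    return float(s[n // 3] / s[1:].mean())


print(period3_peak_ratio("ATG" * 100))  # roughly 149.5 for this toy sequence
```

Evaluated over a sliding window, this ratio gives a spectrogram-like track along a sequence. The window length is a design choice that the slides do not fix.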
13
Figure: spectral envelope of the BNRF1 gene from the Epstein-Barr virus: (a) 1st section (1000 bp), (b) 2nd section (1000 bp), (c) 3rd section (1000 bp), (d) 4th section (954 bp).
Conjecture: the 4th quarter is actually non-coding.
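The spectral envelope of Stoffer, Tyler and McDougall picks, at each frequency, the numerical scaling of the four nucleotides that maximizes the normalized power. In practice, this is the largest eigenvalue of V^(-1/2) Re f(ω) V^(-1/2). Here f(ω) is the spectral matrix of the indicator vectors and V is their covariance. Below is a minimal numpy sketch, not the estimator behind the figure. It assumes three indicators (A, C, G, with T as the reference category), a simple moving-average smoother of odd width `span` (a hypothetical parameter), and that all four bases occur so that V is invertible. Applied to each section separately, a coding section would typically be expected to peak near 1/3 cycle per bp.

```python
import numpy as np


def spectral_envelope(seq: str, span: int = 5):
    """Return (frequencies in cycles/bp, spectral envelope), zero frequency dropped."""
    chars = np.array(list(seq.upper()))
    # Indicator vectors for A, C, G; T is the reference category (all zeros).
    y = np.column_stack([(chars == b).astype(float) for b in "ACG"])
    y -= y.mean(axis=0)
    n = len(y)

    # V^(-1/2) from the sample covariance (singular if a base never occurs).
    evals, evecs = np.linalg.eigh(y.T @ y / n)
    v_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # Real part of the periodogram matrices d(w_j) d(w_j)^*, one 3x3 per frequency.
    d = np.fft.rfft(y, axis=0) / np.sqrt(n)
    pgram = np.einsum("ji,jk->jik", d, d.conj()).real

    # Moving-average smoothing of every matrix entry across neighbouring frequencies.
    kernel = np.ones(span) / span
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, pgram
    )

    # Envelope: largest eigenvalue of V^(-1/2) f V^(-1/2) at each frequency.
    env = np.array(
        [np.linalg.eigvalsh(v_inv_sqrt @ f @ v_inv_sqrt)[-1] for f in smoothed]
    )
    freqs = np.arange(len(d)) / n
    return freqs[1:], env[1:]
```

The scaling that attains the maximum is given by the corresponding eigenvector, transformed by V^(-1/2). This is the "optimal string to numerical value mapping" mentioned above.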
14
Signal Processing for Gene Expression
15
Microarray Life Cycle (figure taken from Schena & Davis): Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modeling
16
cDNA microarray workflow (figure):
- Probes: cDNA clones → PCR product amplification and purification → printing onto the microarray (0.1 nl/spot)
- Target: mRNA → hybridise the target to the microarray
- Detection: excitation with laser 1 and laser 2 → emission → scanning
- Analysis: overlay images and normalise
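The slide leaves the normalisation method unspecified. A common first step for two-channel arrays, assumed here, converts the red and green intensities of each spot into a log-ratio M = log2(R/G) and an average log-intensity A = ½·log2(R·G). It then shifts M so that its median is zero, on the assumption that most genes are not differentially expressed. Intensities are assumed to be background-corrected and positive. The function names are hypothetical.

```python
import numpy as np


def ma_values(red, green):
    """M = log2(R/G) (log-ratio), A = 0.5*log2(R*G) (average log-intensity)."""
    r = np.asarray(red, dtype=float)
    g = np.asarray(green, dtype=float)
    return np.log2(r / g), 0.5 * np.log2(r * g)


def global_median_normalise(red, green):
    """Centre the log-ratios at zero using their median."""
    m, a = ma_values(red, green)
    return m - np.median(m), a


m, a = global_median_normalise([2.0, 8.0, 4.0], [1.0, 1.0, 1.0])
print(m.tolist())  # [-1.0, 1.0, 0.0]
```

Intensity-dependent methods such as lowess normalisation on the M-A plot are a common refinement of this global shift.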
18
Image Segmentation
- Simple way: fixed circle method
- Advanced: fast marching level set segmentation
[Figure: segmented spots, advanced method vs. fixed circle]
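The fixed circle method can be sketched as follows. A circle of the same radius is placed at each spot centre. The foreground is the mean of the pixels inside the circle, and the background is taken from the remaining pixels of a surrounding square. This sketch assumes the spot centre and radius are already known from grid alignment. The background window half-width (2 × radius by default) and the function name are hypothetical choices.

```python
import numpy as np


def fixed_circle_intensity(image, center, radius, half_width=None):
    """Return (mean foreground inside the circle, median background outside it)."""
    img = np.asarray(image, dtype=float)
    cy, cx = center
    hw = 2 * radius if half_width is None else half_width
    y0, y1 = max(cy - hw, 0), min(cy + hw + 1, img.shape[0])
    x0, x1 = max(cx - hw, 0), min(cx + hw + 1, img.shape[1])
    window = img[y0:y1, x0:x1]
    # Pixel coordinates of the window, broadcast against each other.
    yy, xx = np.ogrid[y0:y1, x0:x1]
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return float(window[inside].mean()), float(np.median(window[~inside]))


# Synthetic spot: a disk of radius 3 with intensity 100 on a zero background.
img = np.zeros((21, 21))
yy, xx = np.ogrid[:21, :21]
img[(yy - 10) ** 2 + (xx - 10) ** 2 <= 9] = 100.0
print(fixed_circle_intensity(img, (10, 10), 3))  # (100.0, 0.0)
```

The method breaks down when real spots vary in size or shape. Adaptive segmentation such as the level set approach is meant to address exactly that weakness.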
19
Clustering and filtering methods
Principal approaches:
- Hierarchical clustering (kdb trees, CART, gene shaving)
- K-means clustering
- Self-organizing (Kohonen) maps
- Support vector machines
- Gene filtering via multiobjective optimization
- Independent component analysis (ICA)
Validation approaches:
- Significance analysis of microarrays (SAM)
- Bootstrapping cluster analysis
- Leave-one-out cross-validation
- Replication (additional gene chip experiments, quantitative PCR)
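As an illustration of K-means clustering on expression data, here is a minimal numpy sketch of Lloyd's algorithm. Rows are genes and columns are samples or conditions. It uses Euclidean distance and random initialisation from k distinct genes. Both are assumptions; correlation-based distances are also common for expression profiles. The function name is hypothetical.

```python
import numpy as np


def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of `data`. Returns (labels, centroids)."""
    x = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct genes chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its genes (unchanged if the cluster is empty).
        new_centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


expr = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(expr, k=2)
print(labels)
```

The result depends on the initialisation, so in practice the algorithm is usually run from several seeds and the lowest within-cluster sum of squares is kept.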
20
ICA for B-cell lymphoma data
- Data: 96 samples of normal and malignant lymphocytes.
- Results: scatter plots of 12 independent components.
- Comparison: closely related to the results of hierarchical clustering.
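The slide does not say which ICA algorithm was used. One common choice, assumed here, is symmetric FastICA with the tanh nonlinearity. The data are whitened, and the fixed-point update W ← E{g(Wz)zᵀ} − diag(E{g′(Wz)})W is then iterated with symmetric decorrelation after every step. For the lymphoma data, `x` would hold the expression matrix. Whether samples or genes form the rows is a modelling choice the slide does not specify. The function names are hypothetical.

```python
import numpy as np


def _sym_decorrelate(w):
    """W <- (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(w @ w.T)
    return u @ np.diag(s ** -0.5) @ u.T @ w


def fast_ica(x, n_components, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA. x has shape (n_signals, n_samples).

    Returns (sources, unmixing) with sources = unmixing @ (x - row means).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    # Whitening: keep the leading eigenvectors and rescale to unit variance.
    d, e = np.linalg.eigh(x @ x.T / x.shape[1])
    order = np.argsort(d)[::-1][:n_components]
    whitening = np.diag(d[order] ** -0.5) @ e[:, order].T
    z = whitening @ x

    rng = np.random.default_rng(seed)
    w = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # Fixed-point step with g = tanh and g' = 1 - tanh^2.
        w_new = g @ z.T / z.shape[1] - np.diag((1 - g ** 2).mean(axis=1)) @ w
        w_new = _sym_decorrelate(w_new)
        # Converged when every row keeps its direction (up to sign).
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1)) < tol
        w = w_new
        if converged:
            break
    unmixing = w @ whitening
    return unmixing @ x, unmixing
```

The recovered components come back in arbitrary order, sign and scale. This is inherent to ICA, and scatter plots of component pairs, as on the slide, are unaffected by it.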
21
Resources and Co-operations
- Resources: databases on the internet, such as GenBank and ProteinBank, and some small databases of microarray data.
- Co-operations needed: first-hand microarray data; biological experiments for validation.
22
Challenges and Future Work
- Genomic signal processing opens a new signal processing frontier.
- Sequence analysis: DNA sequences are symbolic (categorical) signals, so classical signal processing methods are not directly applicable.
- The increasingly high dimensionality and complexity of genetic data sets call for fast, high-throughput implementations of genomic signal processing algorithms.
- Future work: spectral analysis of DNA sequences and clustering of microarray data; modifying classical signal processing methods and developing new ones.