Published by Kristian Stanley. Modified over 9 years ago.
1
Affymetrix GeneChips and Analysis Methods Neil Lawrence
2
Schedule
18th April: Introduction and Background
25th April: cDNA Microarrays
2nd May: No Lecture
9th May: Affymetrix GeneChips
16th May: Guest Lecturer – Dr Pen Rashbass
23rd May: Analysis methods and some of this
3
Photolithography
Photolithography (Affymetrix)
–Based on the same technique used to make microprocessors.
–Oligonucleotides are generated in situ on a silicon surface.
–Oligonucleotides up to 30bp in length.
–Array density of 10^6 probes per cm^2.
4
Affymetrix Stock Price
5
Affymetrix
Only one biological sample per chip.
Oligonucleotides represent a portion of a gene’s sequence.
Twenty sub-sequences present for each gene.
6
Perfect vs Mismatch For each oligonucleotide there is –A perfect match –A mismatch The perfect match is a sub-sequence of the true sequence. The mismatch is a sub-sequence with a ‘central’ base-pair replaced.
7
Affymetrix Analysis
Mismatch is designed to measure ‘background’.
Signal from each sub-sequence is I_PM − I_MM (perfect match intensity minus mismatch intensity).
Twenty of these sub-sequences are present.
Average of all these signals is taken.
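The averaging step on this slide can be sketched in a few lines of Python. This is only an illustration of the "average of perfect match minus mismatch" idea as stated on the slide. The function name `average_difference` and the intensity values are made up for the example, and four probe pairs are used instead of twenty to keep it short.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) over the probe pairs of one gene."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for four probe pairs (a real probe set has twenty).
pm = [120.0, 95.0, 140.0, 110.0]
mm = [40.0, 35.0, 60.0, 30.0]

signal = average_difference(pm, mm)  # differences are 80, 60, 80, 80
print(signal)
```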
8
Problems
Sometimes I_MM > I_PM (the mismatch intensity exceeds the perfect match intensity)
–Solution: set it to 20??!!!
Other issues
–Present/Absent call
 Based on the number of Signals > 0.
Proprietary Technology
–You don’t know what the subsequences are.
 Apparently this is changing!
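A rough sketch of the two fixes mentioned on this slide follows. The slide only says that the signal is "set to 20" when the mismatch exceeds the perfect match, and that the Present/Absent call depends on the number of positive signals. The function names and the `min_fraction` cut-off are assumptions made for illustration, not the vendor's actual rule.

```python
def floored_differences(pm, mm, floor=20.0):
    """PM - MM per probe pair, replaced by `floor` whenever MM exceeds PM."""
    return [floor if m > p else p - m for p, m in zip(pm, mm)]


def present_call(pm, mm, min_fraction=0.5):
    """Call a gene 'present' if enough probe pairs have a positive signal.

    min_fraction is a hypothetical threshold; the slide does not give one.
    """
    positives = sum(1 for p, m in zip(pm, mm) if p - m > 0)
    return positives / len(pm) >= min_fraction


# Hypothetical intensities for three probe pairs.
pm = [100.0, 50.0, 80.0]
mm = [60.0, 70.0, 90.0]

signals = floored_differences(pm, mm)
is_present = present_call(pm, mm)
```

Here two of the three pairs have a mismatch larger than the perfect match, so both are floored and the gene is called absent under the assumed cut-off.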
9
Scaling Factors – Maximum likelihood estimation
The data produced is still affected by undesirable variations that we need to remove.
We can assume that the variations are primarily multiplicative (no intensity-dependent or print-tip effect):
observed expression level = true expression level × error (chip variations) × random noise (biological noise)
10
Model Assumption
Organise the twelve values from three exogenous control species in a matrix: X = [N_Controls × N_Chips]
Error model:
Here m_i is associated with each control and r_j is associated with each chip or experiment.
Taking logs we have:
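The equations for this slide did not survive in the transcript. One form consistent with the multiplicative assumption and with the roles given to m_i and r_j would be the following. The symbols x_{ij} (the entry of X for control i on chip j) and \varepsilon_{ij} (the random noise term) are assumed names, not taken from the slide.

```latex
% Assumed reconstruction: x_{ij} is the observed value of control i on chip j,
% m_i the control-specific level, r_j the chip-specific factor,
% and \varepsilon_{ij} the multiplicative random noise.
\begin{align}
  x_{ij}      &= m_i \, r_j \, \varepsilon_{ij} \\
  \log x_{ij} &= \log m_i + \log r_j + \log \varepsilon_{ij}
\end{align}
```

Taking logs turns the multiplicative model into an additive one, which is what makes the estimation tractable.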
11
Scaling Factors
Calculating scaling factors using maximum likelihood estimation of the model parameters.
Likelihood:
Estimates are calculated solving
Scaling factors are thus:
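The likelihood and the resulting estimates are missing from the transcript, so the following is only a sketch under stated assumptions, not the lecture's derivation. It assumes the control value for control i on chip j is m_i × r_j × noise, that the log of the noise is Gaussian with equal variance (so maximum likelihood reduces to least squares on the logs), and that the log chip effects sum to zero to make the model identifiable. The function name `scaling_factors` and the data are hypothetical.

```python
import math


def scaling_factors(x):
    """x[i][j] is the intensity of control i on chip j.

    Returns one multiplicative factor per chip that removes the estimated
    chip effect r_j, i.e. factor_j = 1 / r_j.
    """
    logs = [[math.log(v) for v in row] for row in x]
    n_controls, n_chips = len(logs), len(logs[0])
    grand_mean = sum(sum(row) for row in logs) / (n_controls * n_chips)
    # Least-squares estimate of log r_j: column mean minus grand mean.
    log_r = [
        sum(logs[i][j] for i in range(n_controls)) / n_controls - grand_mean
        for j in range(n_chips)
    ]
    return [math.exp(-lr) for lr in log_r]


# Hypothetical noise-free data: control levels 100, 200, 400 and
# chip effects 1, 2, 0.5 (three controls on three chips).
x = [
    [100.0, 200.0, 50.0],
    [200.0, 400.0, 100.0],
    [400.0, 800.0, 200.0],
]
factors = scaling_factors(x)
```

Multiplying every value on chip j by `factors[j]` puts the chips on a common scale under these assumptions.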
12
You Should Know The Central Dogma (Gene Expression). cDNA chip overview. Noise in cDNA chips. Affymetrix GeneChip overview.
13
Analysis of Microarray Data Vanilla-flavour analysis: –Obtain temporal profiles (e.g. from last week’s mouse experiment). –‘Cluster’ profiles –Assume genes in the same cluster are functionally related.
14
Temporal Profiles
Successive points in a profile lack statistical independence.
Take temporal differences to recover it.
Justified by assuming an underlying Markov process.
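Taking temporal differences simply means replacing each profile by the changes between consecutive time points. A minimal sketch, with a made-up six-day profile and a hypothetical helper name:

```python
def temporal_differences(profile):
    """Differences between consecutive time points (day 2 - day 1, ...)."""
    return [later - earlier for earlier, later in zip(profile, profile[1:])]


# Hypothetical expression levels for one gene on days 1 to 6.
profile = [20, 60, 100, 40, 80, 120]
diffs = temporal_differences(profile)
print(diffs)
```

A profile with six time points yields five differences, one per pair of adjacent days.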
15
Analysis of Microarray Data
[Figure: two panels. "Original Temporal Profile": gene expression level (0 to 120) over Day 1 to Day 6. "Take Temporal Differences": change in expression level (-80 to 80) for the day pairs 2-1, 3-2, 4-3, 5-4, 6-5.]
16
Consider Clustering via MSE
These two similar profiles won’t cluster.
[Figure: two gene expression profiles over Day 1 to Day 6, one on a 0 to 120 axis and the other on a 20 to 140 axis.]
17
The Temporal Differences Will
[Figure: change in expression level for the day pairs 2-1, 3-2, 4-3, 5-4, 6-5 for both profiles, each on a -80 to 80 axis.]
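The point of the last two slides can be checked numerically. In this sketch two hypothetical profiles have the same shape but differ by a constant offset of 20. Their mean squared error (MSE) is large on the raw values and zero on the temporal differences. The numbers are invented for the example.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def temporal_differences(profile):
    """Differences between consecutive time points."""
    return [later - earlier for earlier, later in zip(profile, profile[1:])]


# Two hypothetical profiles with identical shape, offset by 20 units.
profile_a = [0, 40, 80, 120, 80, 40]
profile_b = [v + 20 for v in profile_a]

raw_error = mse(profile_a, profile_b)
diff_error = mse(temporal_differences(profile_a),
                 temporal_differences(profile_b))
```

A constant offset cancels when differences are taken, which is why the differenced profiles cluster together while the raw ones may not.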
18
Many Other Different Techniques
Hierarchical Clustering
Self-Organising Maps
ML-Group
–Generative Topographic Mappings (GTM)
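As an illustration of the first technique in this list, here is a very small agglomerative (hierarchical) clustering sketch. It uses single linkage with squared Euclidean distance, which is one choice among several and is not stated on the slide. The function names and the data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(sq_dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical temporal-difference profiles for four genes.
genes = [
    [40, 40, -40],
    [38, 42, -40],
    [-40, -40, 40],
    [-42, -38, 40],
]
groups = single_linkage(genes, n_clusters=2)
```

Stopping at different values of `n_clusters` corresponds to cutting the cluster tree at different heights.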
19
GTM Data lies in high dimensional space (>2). Model it with a lower embedded dimensionality (2). MATLAB Demo of embedded dimensions.
20
GTM on Gene Data MATLAB Demo.
21
Conclusions Take Temporal differences of Profiles. Attempt to Cluster. Test Hypothesis that clustered Genes are functionally related. Good luck in the Exam!