1 Introduction to Matlab & Data Analysis. Final Project: That’s all, Folks! (Yuval Hart, Weizmann 2010©)
2 Outline
- Parsing files
- Efficient programming: vectorization
- Correlation coefficients
- Passing extra parameters
- Image plotting
- Curve fitting & optimization
- Figure handling
3 “Rotation in 60 minutes”
4 Rotation in 60 minutes: During the past month you’ve measured promoter activity of 20 genes. Your PI wants you to present your results at the next group meeting.
5 To Do List
- Get the gene sequences from the GenBank and FASTA files and calculate their GC content
- Display all correlation coefficients of the measured promoter activity (PA) and their relation to GC content
- For the 4 highest genes, find how the correlation decays with distance from the initial gene in the pathway
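The GC content in the first item is the fraction of bases in a sequence that are G or C. A minimal vectorized sketch, assuming the sequence is already available as a character array; the sequence below is made up for illustration and is not taken from the course files:

```matlab
% Hypothetical example sequence (not from the course data)
seq = 'ATGCGCTTAGGC';

% Logical mask of the G/C positions; comparing a char array to a single
% character is evaluated element-wise, so no loop is needed
isGC = (upper(seq) == 'G') | (upper(seq) == 'C');

% The mean of a logical mask is the fraction of true entries
GCcontent = mean(isGC);
fprintf('GC content: %.3f\n', GCcontent);
```

Using upper makes the count insensitive to lower-case sequence files.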
7 GenBank file format
8 Step 3: Attach each gene name to its DNA sequence
Build the structure with all needed fields:
% Build the structure Genes with the desired genes and their data:
% name, startPosition, endPosition, sequence, complement (1/0), GCcontent
% This is also the way to preallocate for structures:
% Genes(1,sum(indGeneList))=struct('name',[],'complement',[],'sequence',[],...
%     'StartPosition',[],'EndPosition',[],'GCcontent',1);
Genes=struct('name',geneNames(indGeneList),...
    'complement',num2cell(indComplement(indGeneList)'),...
    'StartPosition',CDSpositionStartEndCelled(indGeneList,1)',...
    'EndPosition',CDSpositionStartEndCelled(indGeneList,2)',...
    'sequence',seq,'GCcontent',GCcontent);
a=Genes;
Note: struct fills a structure array element by element only when the field values are passed as cell arrays; each cell supplies the value for one element.
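The note about cell arrays can be illustrated with a small sketch. The gene names, values and the 'source' field below are hypothetical and only show how struct distributes its inputs:

```matlab
% Hypothetical names and values, for illustration only
names = {'geneA', 'geneB', 'geneC'};
gc    = [0.41 0.52 0.47];

% Cell-array values: one struct element per cell, giving a 1x3 struct array.
% num2cell turns the numeric vector into a cell array of scalars.
Genes = struct('name', names, 'GCcontent', num2cell(gc));

% A non-cell value is copied into every element of the array
Tagged = struct('name', names, 'source', 'demo');

% To store a whole cell array in ONE field, wrap it in another cell
Single = struct('names', {names});   % 1x1 struct; the field holds a 1x3 cell

disp(size(Genes));
disp(Genes(2).name);
```

This is why numeric vectors such as the complement flags are passed through num2cell before being handed to struct.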
10 Calculate and plot Correlation Matrix
Load the list of genes and measurements
% Input:
% measurement mat file contains:
% geneList - a cell array of the genes Names
% measurements - a matrix of 20 genes measurements at 1001 time points
% GenesGCcontent - a vector of the genes GCcontent values
% measurements has a row for each gene containing its measurements through
% 1001 time points and the geneList names
load measurements
11 Plot the dependence of mean PA on GC content
Plot the fit results on top of the previous graph.
Note: Smoothing the data can reduce the effect of outliers
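One possible way to overlay a fit on an existing plot and to smooth a noisy trace is sketched below. The slide does not say which fit or smoothing method was used, so the straight-line fit (polyfit) and the 3-point moving average (conv) are assumptions, and all numbers are made up:

```matlab
% Made-up stand-ins for the course variables
GenesGCcontent = [0.35 0.40 0.45 0.50 0.55 0.60];
meanPA         = [1.1  1.9  3.2  3.8  5.1  6.0];   % e.g. mean(measurements, 2)'

% Assumed model: straight line; polyfit returns [slope intercept]
p    = polyfit(GenesGCcontent, meanPA, 1);
xFit = linspace(min(GenesGCcontent), max(GenesGCcontent), 100);
yFit = polyval(p, xFit);

figure;
plot(GenesGCcontent, meanPA, 'o');   % the original graph
hold on;                             % keep it while adding the fit
plot(xFit, yFit, '-');
hold off;
xlabel('GC content'); ylabel('Mean PA');
legend('data', 'linear fit', 'Location', 'northwest');

% Smoothing: a 3-point moving average spreads out a single outlier
trace    = [1 1 1 10 1 1 1];
w        = 3;
smoothed = conv(trace, ones(1, w) / w, 'same');
```

In the smoothed trace the outlier of 10 is reduced to a plateau of about 4, at the cost of blurring its neighbours; the two end points are also pulled down because conv pads with zeros.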
12 Calculate and plot Correlation Matrix Calculate and display the corr. matrix
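A minimal sketch of this step, assuming measurements has one row per gene as described earlier. The data and gene names here are synthetic stand-ins, and imagesc is only one of several ways to display the matrix:

```matlab
% Synthetic stand-in for `measurements`: 4 genes (rows) x 1001 time points
t = linspace(0, 10, 1001);
measurements = [sin(t); sin(t) + 0.1*cos(3*t); cos(t); -sin(t)];
geneList = {'g1', 'g2', 'g3', 'g4'};   % hypothetical names

% corrcoef treats COLUMNS as variables, so transpose to put genes in columns
R = corrcoef(measurements');

figure;
imagesc(R, [-1 1]);    % fix the color limits to the full range of r
colorbar;
axis square;
set(gca, 'XTick', 1:numel(geneList), 'XTickLabel', geneList, ...
         'YTick', 1:numel(geneList), 'YTickLabel', geneList);
title('Correlation matrix of promoter activity');
```

Forgetting the transpose would produce a 1001x1001 matrix of correlations between time points instead of a gene-by-gene matrix.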
14 Step 2: Fit correlations to the desired function
Use an anonymous function to pass the extra parameter, then fit with lsqcurvefit:
% Function file:
function y_hat=FittingCurveExpGuess(c,x,init)
% This assumes an exponentially decreasing curve
y_hat=init+c(1)*exp(c(2).*x);
% Calling script:
initDis=-0.1;
c0=[.7 0.1]; %assigning the initial values for the fit search
paramfunc=@(c,x) FittingCurveExpGuess(c,x,initDis); %def. of the anonymous function
ExpParam=lsqcurvefit(paramfunc,c0,XdataPoints,correl,[0 -1],[1 1],options);
Arguments of lsqcurvefit, in order: function name, initial guess, X data, Y data, lower bound, upper bound.
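The extra-parameter pattern can be tried without the Optimization Toolbox. The sketch below keeps the same model form but swaps lsqcurvefit for fminsearch on the sum of squared errors, which is an unconstrained stand-in (no lower or upper bounds) and not the method on the slide. The data are synthetic and the names model, cTrue and sse are introduced only for this illustration:

```matlab
% Same functional form as FittingCurveExpGuess, written inline so the
% sketch runs as a single script
model = @(c, x, init) init + c(1) * exp(c(2) .* x);

% Synthetic, noise-free data generated from known parameters
initDis = -0.1;
cTrue   = [0.8 -0.5];
x       = 0:0.5:10;
y       = model(cTrue, x, initDis);

% The anonymous function captures the value of initDis when it is defined,
% leaving the two-argument (c, x) signature a fitting routine expects
paramfunc = @(c, xdata) model(c, xdata, initDis);

% Stand-in for lsqcurvefit: minimize the sum of squared residuals
sse  = @(c) sum((paramfunc(c, x) - y).^2);
opts = optimset('TolX', 1e-10, 'TolFun', 1e-12, ...
                'MaxFunEvals', 5000, 'MaxIter', 5000);
cFit = fminsearch(sse, [0.7 -0.1], opts);

fprintf('fitted c = [%.4f %.4f]\n', cFit(1), cFit(2));
```

Because the value of initDis is captured at definition time, changing initDis afterwards has no effect on paramfunc; the handle has to be redefined to pick up a new value.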
15 Step 3: Plot the correlation data and fit
16 Best of Luck in the Group Meeting !
18 This is the end, my friend, the end "Louis, I think this is the beginning of a beautiful friendship."