Yuval Hart, Weizmann 2010© Introduction to Matlab & Data Analysis, Tutorial 13: That’s all, Folks!


1 Introduction to Matlab & Data Analysis, Tutorial 13: That’s all, Folks!
Please change directory to E:\Matlab (cd E:\Matlab;). From the course website, download tFinal.zip.

2 Outline
- Parsing files
- Efficient programming: vectorization (profiling)
- Correlation coefficients
- Passing extra parameters
- Image plotting
- Curve fitting & optimization
- Figure handling

3 “Rotation in 60 minutes”

4 Rotation in 60 minutes: During the past month you’ve measured the promoter activity (PA) of 20 genes. Your PI wants you to present your results at the next group meeting.

5 To Do List
1. Get the sequences of the genes from the GenBank and FASTA files and calculate their GC content.
2. Display all correlation coefficients between the measured promoter activity (PA) profiles, and the relation of PA to GC content.
3. For the 4 highest genes, find how the correlation decays with distance from the initial gene in the pathway.


7 GenBank file format

8 Step 1: get data from files
Get the DNA sequence from the FASTA file:
% Extract the DNA sequence from the FASTA file
fid_fasta_data=fopen(fnamefasta,'r');
% check that the file was opened correctly
if fid_fasta_data<0
    error('FASTA file name is not correct, please issue the file name again');
end
celledFasta=textscan(fid_fasta_data,'%c'); % '%c' reads single characters
fclose(fid_fasta_data);
fasta=celledFasta{1}'; % fasta is a char array (row) of the sequence
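For comparison, the same step as a minimal Python sketch (Python is used only for illustration; `read_fasta_sequence` and `read_fasta_file` are hypothetical names). One assumption differs from the slide: `textscan` with '%c' would also read the characters of a '>' header line into `fasta` if the file has one, whereas this sketch skips header and blank lines.

```python
def read_fasta_sequence(text: str) -> str:
    """Join the sequence lines of a single-record FASTA text into one string.

    Header lines (starting with '>') and blank lines are skipped, and the
    result is upper-cased so that later G/C counting is case-insensitive.
    """
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank and header lines
        parts.append(line)
    return "".join(parts).upper()


def read_fasta_file(path: str) -> str:
    """Read a FASTA file; raise if it cannot be opened (like the fopen check)."""
    try:
        with open(path, "r") as handle:
            return read_fasta_sequence(handle.read())
    except OSError as exc:
        raise ValueError(f"FASTA file name is not correct: {path}") from exc


example = ">demo record\nATGC\nGGCC\n\nttaa\n"
print(read_fasta_sequence(example))  # ATGCGGCCTTAA
```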

9 Step 1: get data from files
Put the entire GenBank file in a cell array, one row per cell:
% open the gene (GenBank) file
fid_gene_input=fopen(fnamegene,'r');
% check that the file was opened correctly
if fid_gene_input<0
    error('GenBank file name is not correct, please issue the file name again');
end
% parse the file such that every row of the file is inside a cell element
celledData=textscan(fid_gene_input,'%s','delimiter','\n');
fclose(fid_gene_input);
% remove all white space from the beginning and end of rows
celledDataTrim=strtrim(celledData{1});

10 Step 2: Get gene names and sequence positions from the GenBank data
Get the gene names and sequence locations from the GenBank file:
% line format is 'CDS pos1..pos2' or 'CDS complement(pos1..pos2)',
% so CDSposition holds the two position tokens of each matching row
CDSposition=regexp(celledDataTrim,'^CDS\s+(?:complement\()?(\d+)\.\.(\d+)','tokens');
% indCDS marks all occurrences of CDS
indCDS=~cellfun('isempty',CDSposition);
% the gene name is one row below the CDS info, so shift the index one row down
indGene=circshift(indCDS,1);
% since we already looked for the right pattern, we only need to check whether
% there is a complement or not: indComplement indicates a complement (1) or a
% regular (0) sequence
indComplement=~cellfun('isempty',regexp(celledDataTrim(indCDS),'complement'));

11 Step 2: Get gene names and sequence positions from the GenBank data
Get the gene names and sequence locations from the GenBank file:
% list of genes corresponding to the CDS found; 'once' returns the token cell
% of the first match, so a single flattening gives a cell array of names
geneNames=regexp(celledDataTrim(indGene),'gene="(\w+)"','tokens','once');
geneNames=[geneNames{:}];
% consider only cell elements that had 'CDS' in them
onlyCDSposition=CDSposition(indCDS);
% flatten the tokens cell array such that each element of onlyCDSposition
% holds {start, end} (position 1 = start of gene, position 2 = end of gene)
onlyCDSposition=[onlyCDSposition{:}];
% concatenate as two columns and not in a single row (try cat(2,onlyCDSposition{:}))
CDSpositionStartEndCelled=cat(1,onlyCDSposition{:});
% convert the position strings to numbers for indexing into the sequence
CDSpositionStartEndNum=str2double(CDSpositionStartEndCelled);

12 Step 2: Get gene names and sequence positions from the GenBank data
Get the indices of only the genes we are interested in (listed in genePool):
% indGeneList marks all occurrences of genes in the file that are in the
% "pool" (the desired list)
indGeneList=ismember(geneNames,genePool);
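The parsing in slides 10 to 12 can be mirrored in a short Python sketch, shown only for comparison. `parse_cds`, `select_genes` and the sample lines are hypothetical, and like the slide's regular expression it handles only the simple `CDS start..end` and `CDS complement(start..end)` forms (e.g. `join(...)` locations are not matched).

```python
import re

# Same pattern as the slide's regexp: optional "complement(", then start..end
CDS_PATTERN = re.compile(r"^CDS\s+(complement\()?(\d+)\.\.(\d+)")
GENE_PATTERN = re.compile(r'gene="(\w+)"')


def parse_cds(lines):
    """Return (name, start, end, is_complement) for every CDS line.

    Lines are stripped first (like strtrim) and, as on the slides, the gene
    name is read from the line directly below the CDS line.
    """
    trimmed = [line.strip() for line in lines]
    records = []
    for i, line in enumerate(trimmed):
        match = CDS_PATTERN.match(line)
        if match is None or i + 1 >= len(trimmed):
            continue
        gene = GENE_PATTERN.search(trimmed[i + 1])
        if gene is None:
            continue  # no /gene qualifier on the next line: skip this CDS
        records.append((gene.group(1), int(match.group(2)),
                        int(match.group(3)), match.group(1) is not None))
    return records


def select_genes(records, gene_pool):
    """Keep only genes listed in gene_pool, like ismember(geneNames,genePool)."""
    pool = set(gene_pool)
    return [rec for rec in records if rec[0] in pool]


# Illustrative lines in GenBank layout
genbank_lines = [
    "     CDS             190..255",
    '                     /gene="thrL"',
    "     CDS             complement(5683..6459)",
    '                     /gene="yaaA"',
]
records = parse_cds(genbank_lines)
print(records)  # [('thrL', 190, 255, False), ('yaaA', 5683, 6459, True)]
print(select_genes(records, ["yaaA"]))  # [('yaaA', 5683, 6459, True)]
```

Unlike the vectorized MATLAB version, a CDS whose next line has no /gene qualifier is skipped here instead of shifting the alignment between names and positions.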

13 Step 3: Attach every gene name to its DNA sequence
Use the indices to build an array of gene sequences and calculate GC content:
% Initialize the gene list index
j=0;
% Note: i is the index of the vector searched (serial number of the gene in
% the GenBank list), j is the index of the specified genes. E.g. there could
% be only 2 genes whose serial numbers in the GenBank file are 151 and 352,
% therefore i=[151 352] but j=[1 2]
seq=cell(1,sum(indGeneList));
GCcontent=cell(1,sum(indGeneList));
for i=find(indGeneList)
    j=j+1;
    % get the sequence from the FASTA data by the start and end positions
    seq{j}=fasta(CDSpositionStartEndNum(i,1):CDSpositionStartEndNum(i,2));
    % GCcontent is the fraction of G or C bases in the sequence
    GCcontent{j}=length(regexp(seq{j},'[GC]'))/length(seq{j});
end
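A small Python sketch of the slicing and GC computation (hypothetical names) makes the coordinate convention explicit: GenBank positions are 1-based and inclusive, which MATLAB's `fasta(start:end)` matches directly, while a Python slice needs `start - 1`. Because reverse-complementing swaps G with C and A with T, the GC fraction of a complement gene is the same on either strand, so taking complement genes from the forward strand, as the loop does, still gives the right GC content (the stored sequence is then the forward strand, not the coding strand).

```python
def cds_sequence(genome: str, start: int, end: int) -> str:
    """Slice genome using GenBank's 1-based, inclusive coordinates."""
    return genome[start - 1:end]


def gc_fraction(seq: str) -> float:
    """Fraction (0..1) of G or C bases, like length(regexp(seq,'[GC]'))/length(seq)."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


genome = "ATGCGCATTA"
seq = cds_sequence(genome, 3, 6)
print(seq, gc_fraction(seq))   # GCGC 1.0
print(gc_fraction("ATGC"))     # 0.5
```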

14 Step 3: Attach every gene name to its DNA sequence
Build the structure with all needed fields:
% Build the structure Genes with the desired genes and their data:
% name, StartPosition, EndPosition, sequence, complement (1/0), GCcontent
% This is also the way to preallocate structures:
% Genes(1,sum(indGeneList))=struct('name',[],'complement',[],'sequence',[],...
%     'StartPosition',[],'EndPosition',[],'GCcontent',1);
Genes=struct('name',geneNames(indGeneList),...
    'complement',num2cell(indComplement(indGeneList)'),...
    'StartPosition',CDSpositionStartEndCelled(indGeneList,1)',...
    'EndPosition',CDSpositionStartEndCelled(indGeneList,2)',...
    'sequence',seq,'GCcontent',GCcontent);
a=Genes;
Note: struct creates one structure element per cell only when the field values are given as cell arrays (all of the same size); a non-cell value is copied into every element.
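For comparison, a hedged Python analogue of building the structure array: parallel lists are zipped into one record per gene, roughly what `struct` does with equal-size cell arrays. The `Gene` dataclass and `build_genes` are hypothetical, and the field names only loosely follow the slide.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Fields follow the MATLAB structure Genes on the slide
    name: str
    complement: bool
    start_position: int
    end_position: int
    sequence: str
    gc_content: float


def build_genes(names, complements, starts, ends, seqs, gc_values):
    """One Gene per index, like struct() distributing equal-size cell arrays."""
    columns = [names, complements, starts, ends, seqs, gc_values]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all field lists must have the same length")
    return [Gene(*row) for row in zip(*columns)]


genes = build_genes(["thrL", "yaaA"], [False, True], [190, 5683],
                    [255, 6459], ["ATGC", "GGCC"], [0.5, 1.0])
print(genes[1].name, genes[1].complement)  # yaaA True
```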

15 Profiling
Compare runs of these two files, GetGenesData.m and pars_gb_file.m, e.g. with the profiler (profile on; run the file; profile viewer). What are the pitfalls of each one? (Hint: efficiency vs. memory usage.)

16 To Do List
1. Get the sequences of the genes from the GenBank and FASTA files and calculate their GC content.
2. Display all correlation coefficients between the measured PA profiles, and the relation of PA to GC content.
3. For the 4 highest genes, find how the correlation decays with distance from the initial gene in the pathway.

17 Calculate and plot Correlation Matrix
Load the list of genes and measurements:
% Input: measurements.mat contains:
% geneList - a cell array of the gene names
% measurements - a matrix of 20 genes' measurements at 1001 time points
% GenesGCcontent - a vector of the genes' GC content values
% measurements has a row for each gene containing its measurements through
% 1001 time points, in the same order as the names in geneList
load measurements

18 Plot GC content and mean PA dependence
Plot mean PA vs. GC content with the correlation coefficient:
figure(1);
% corrcoef(x,y) returns a 2x2 matrix; entry (1,2) is the correlation of x and y
corrGCvsPA=corrcoef(ScaledGCcontent,MeanPA);
plot(ScaledGCcontent,MeanPA,'or','MarkerSize',8,'LineWidth',2);
set(gcf,'units','normalized','outerposition',[0 0 1 1]); % set the plot to full screen
title(sprintf('Mean Promoter Activity vs. GC content, Correlation is %2.4f',...
    corrGCvsPA(1,2)),'FontSize',14);
xlabel('Scaled GC content [% deviation from 0.5]','FontSize',14);
ylabel('Mean Promoter Activity [a.u.]','FontSize',14);
hold on;
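The correlation printed in the title is the Pearson coefficient, the off-diagonal entry of `corrcoef(x,y)`. A minimal Python sketch of that formula follows (hypothetical `pearson`; it does not handle constant inputs, for which `corrcoef` gives NaN while this raises ZeroDivisionError).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient, the off-diagonal entry of corrcoef(x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```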

19 Plot GC content and mean PA dependence
Plot the fit results on top of the previous graph:
% Check for a linear fit to the curve
fittedfunc=polyfit(ScaledGCcontent,MeanPA',1);
plot(ScaledGCcontent,polyval(fittedfunc,ScaledGCcontent),'r','LineWidth',2);
% Smooth the data with robust local regression ('rloess', span = 25% of the
% data points; smooth is part of the Curve Fitting Toolbox), then fit a line:
SmoothPA=smooth(ScaledGCcontent,MeanPA,0.25,'rloess');
% plot the robustly smoothed data set
plot(ScaledGCcontent,SmoothPA,'ob','MarkerSize',8,'LineWidth',2);
Smofittedfunc=polyfit(ScaledGCcontent,SmoothPA',1);
plot(ScaledGCcontent,polyval(Smofittedfunc,ScaledGCcontent),'b','LineWidth',2);
% annotate each fit with its equation (see text properties)
text(0.05,2.1,['\leftarrow',sprintf('y= %2.2f x+%2.2f',fittedfunc(1),fittedfunc(2))],...
    'HorizontalAlignment','left','FontSize',18,'Color',[1 0 0]);
text(-0.11,4,['\leftarrow',sprintf('y= %2.2f x+%2.2f',Smofittedfunc(1),Smofittedfunc(2))],...
    'HorizontalAlignment','left','FontSize',18,'Color',[0 0 1]);
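`polyfit(x,y,1)` is an ordinary least-squares line fit. A minimal Python sketch of the closed-form solution (hypothetical names) returns the coefficients in the same (slope, intercept) order and formats the label like the slide's `sprintf('y= %2.2f x+%2.2f',...)`.

```python
def linear_fit(x, y):
    """Least-squares line y = a*x + b; returns (a, b) in polyfit(x, y, 1) order."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values are all equal; slope is undefined")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def polyval_line(coeffs, x):
    """Evaluate the fitted line at each x, like polyval(coeffs, x)."""
    a, b = coeffs
    return [a * xi + b for xi in x]


coeffs = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
label = "y= {:2.2f} x+{:2.2f}".format(*coeffs)
print(coeffs)                     # (2.0, 1.0)
print(polyval_line(coeffs, [4]))  # [9.0]
print(label)                      # y= 2.00 x+1.00
```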

20 Plot GC content and mean PA dependence
Plot the fit results on top of the previous graph.
Note: robust smoothing ('rloess') gives low weight to outliers, so the line fitted to the smoothed data is less affected by them.

21 Calculate and plot Correlation Matrix
Calculate and display the correlation matrix:
figure(2);
% note that corrcoef works on columns, so we need to transpose measurements
% calculate the correlation matrix of all genes' measurements
corrMat=corrcoef(measurements');
colormap('hot'); % set the color scheme; popular choices are also 'jet', 'hsv'
imagesc(corrMat); % create the image; colors are scaled from the min to the max value of the matrix
colorbar; % also plot the color bar in the figure
set(gcf,'units','normalized','outerposition',[0 0 1 1]); % set the plot to full screen
% set the ticks to be the gene names and present them at the top of the figure
set(gca,'XTick',1:20,'XTickLabel',geneList,'FontSize',12,'XAxisLocation','top');
% set the ticks to be the gene names
set(gca,'YTick',1:20,'YTickLabel',geneList,'FontSize',12);
title('Gene correlations','FontSize',16);
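A hedged Python sketch of what `corrcoef(measurements')` computes: the Pearson correlation between every pair of rows. The transpose is needed in MATLAB because `corrcoef` treats columns as variables; the sketch works on rows directly. The toy `measurements` list is made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(rows):
    """Correlation between every pair of rows (one row per gene)."""
    k = len(rows)
    return [[pearson(rows[i], rows[j]) for j in range(k)] for i in range(k)]


measurements = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
for row in corr_matrix(measurements):
    print([round(v, 3) for v in row])
# [1.0, 1.0, -1.0]
# [1.0, 1.0, -1.0]
# [-1.0, -1.0, 1.0]
```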

22 Calculate and plot Correlation Matrix Calculate and display the corr. matrix

23 Calculate and plot Correlation Matrix
If the genes come in no particular order (simulated here by a random permutation), we first need to cluster the correlations from high to low:
GenesAmount=size(measurements,1);
measurementsPermuted=measurements(randperm(GenesAmount),:);
corrMatPerm=corrcoef(measurementsPermuted');
colormap('hot'); % set the color scheme; popular choices are also 'jet', 'hsv'
imagesc(corrMatPerm);
% Now we want to cluster them together by the mean correlation of
% each gene with all other genes:
MeanCorrMatPerm=mean(corrMatPerm);
[sortedCorr,indPerm]=sort(MeanCorrMatPerm,'descend');
imagesc(corrMatPerm(indPerm,indPerm));
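The reordering step can be sketched in Python as follows (hypothetical names, toy matrix). Because the correlation matrix is symmetric, the column means computed by `mean(corrMatPerm)` equal the row means used here, and the diagonal value 1 shifts every mean by the same amount, so it does not change the order.

```python
def order_by_mean_correlation(corr):
    """Gene indices sorted by mean correlation, highest first.

    Mirrors [sortedCorr,indPerm]=sort(mean(corrMatPerm),'descend') on a
    symmetric matrix, where row means equal column means.
    """
    k = len(corr)
    means = [sum(row) / k for row in corr]
    return sorted(range(k), key=lambda i: means[i], reverse=True)


def reorder(corr, order):
    """Permute rows and columns the same way, like corrMatPerm(indPerm,indPerm)."""
    return [[corr[i][j] for j in order] for i in order]


corr = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.1],
    [0.9, 0.1, 1.0],
]
order = order_by_mean_correlation(corr)
print(order)  # [0, 2, 1]
for row in reorder(corr, order):
    print(row)
# [1.0, 0.9, 0.2]
# [0.9, 1.0, 0.1]
# [0.2, 0.1, 1.0]
```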


25 To Do List
1. Get the sequences of the genes from the GenBank and FASTA files and calculate their GC content.
2. Display all correlation coefficients between the measured PA profiles, and the relation of PA to GC content.
3. For the 4 highest genes, find how the correlation decays with distance from the initial gene in the pathway.

26 Step 1: initialize and set parameters
Set the figure parameters and the external fit parameters of the curves:
figure(3);
set(gcf,'units','normalized','outerposition',[0 0 1 1]); % set the plot to full screen
% we want to check whether a vertical displacement helps, so we add the
% variable initDis, which is part of the fitting function formula
initDis=-0.1;
GenesAmount=size(measurements,1);

27 Step 2: Fit correlations to the desired function
Use an anonymous function to pass an extra parameter, and fit using lsqcurvefit:
for i=1:numGenesToPlot
    % the correlation values of row i, from the columns after the diagonal
    correl=corrMat(i,(1+i):end);
    % the function passed to lsqcurvefit can have only two inputs, yet we use
    % three: fitting parameters, x values and initial displacement, so the
    % anonymous function captures initDis
    paramfunc=@(c,x) FittingCurveExpGuess(c,x,initDis);
    c0=[.7 0.1]; % initial values for the fit search
    XdataPoints=(1+i):GenesAmount;
    % default TolFun is 1e-6; FittingCurveExpGuess returns no Jacobian,
    % so no gradient option is set
    options=optimset('TolFun',1e-8);
    % lsqcurvefit(function, initial guess, xdata, ydata, lower bound, upper bound, options)
    ExpParam=lsqcurvefit(paramfunc,c0,XdataPoints,correl,[0 -1],[1 1],options);
end

28 Step 2: Fit correlations to the desired function
Use an anonymous function to pass an extra parameter, and fit using lsqcurvefit.
The fitting function (FittingCurveExpGuess.m):
function y_hat=FittingCurveExpGuess(c,x,init)
% This assumes an exponentially decreasing curve, displaced vertically by init
y_hat=init+c(1)*exp(c(2).*x);
The call:
initDis=-0.1;
c0=[.7 0.1]; % initial values for the fit search
paramfunc=@(c,x) FittingCurveExpGuess(c,x,initDis); % anonymous function fixing init=initDis
% arguments: function, initial guess, x data, y data, lower bound, upper bound, options
ExpParam=lsqcurvefit(paramfunc,c0,XdataPoints,correl,[0 -1],[1 1],options);
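The anonymous function fixes the third argument of `FittingCurveExpGuess`, so `lsqcurvefit` sees a function of `(c, x)` only. In Python the same pattern is a closure; the sketch below (hypothetical names) also shows the sum of squared residuals that a least-squares fitter such as `lsqcurvefit` minimizes, but it does not perform the bounded fit itself. Like a MATLAB anonymous function, which stores the value of `initDis` when it is created, the closure keeps the `init` it was built with.

```python
import math


def fitting_curve_exp_guess(c, x, init):
    """y_hat = init + c[0] * exp(c[1] * x), the model in FittingCurveExpGuess.m."""
    return [init + c[0] * math.exp(c[1] * xi) for xi in x]


def make_param_func(init):
    """Fix the third argument, like @(c,x) FittingCurveExpGuess(c,x,initDis)."""
    return lambda c, x: fitting_curve_exp_guess(c, x, init)


def sum_squared_residuals(model, c, xdata, ydata):
    """Objective minimized over c by a least-squares fitter such as lsqcurvefit."""
    return sum((yh - y) ** 2 for yh, y in zip(model(c, xdata), ydata))


init_dis = 1.0                        # stands in for initDis
param_func = make_param_func(init_dis)
init_dis = 5.0                        # later changes do not affect param_func
print(param_func([2.0, 0.0], [0, 5]))  # [3.0, 3.0]
print(sum_squared_residuals(param_func, [2.0, 0.0], [0], [4.0]))  # 1.0
```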

29 Step 3: Plot the correlation data and fit
Plot the data as dots, each subplot with its own gene names and curve-fit parameters:
for i=1:numGenesToPlot
    % (fitting code from the previous slides)
    % plot the correlation graph with the found parameters:
    subplot(numGenesToPlot,1,i);
    plot(XdataPoints,correl,'ob',...
        XdataPoints,initDis+ExpParam(1)*exp(XdataPoints.*ExpParam(2)),'r','LineWidth',2);
    set(gca,'XTick',XdataPoints,'XTickLabel',geneList(XdataPoints),'FontSize',12);
    set(gca,'YLim',[0 max(correl)+0.1]);
    title(sprintf('%s Correlation Data, Fit parameters: c1=%2.2f, c2=%2.2f, Displacement=%2.2f',...
        geneList{i},ExpParam(1),ExpParam(2),initDis),'FontSize',14);
end

30 Step 3: Plot the correlation data and fit

31 Best of Luck in the Group Meeting!

32 Best of Luck in the Group Meeting! (and exam)

33 What did we learn?
- Matlab syntax
- Array manipulation, Cells, Structures
- Programming: Functions
- Writing efficient code
- Files & strings manipulation
- Data analysis and Signal Processing


35 This is the end, my friend, the end "Louis, I think this is the beginning of a beautiful friendship."