Tutorial: Expression analysis, parts Ⅰ to Ⅳ 2009-03-05 김경의
Importing array data Download the data set from the NCBI Gene Expression Omnibus (GEO) database: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6943&targ=gsm&form=text&view=data After downloading, save the data in a directory of your choice and import the file from the Toolbar. The file begins:
^SAMPLE = GSM160089
#ID_REF =
#VALUE = GCOS signal
#ABS_CALL = Present/absent per Affy software
!sample_table_begin
ID_REF VALUE ABS_CALL
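To make the file layout above concrete, here is a minimal sketch that reads one sample section into rows. The function name `parse_sample_table`, the probe IDs and the values are hypothetical; only the header fields come from the slide, and the closing `!sample_table_end` marker and tab-separated columns are assumed from the GEO text format.

```python
def parse_sample_table(text):
    """Return (sample_id, rows) for one GEO sample section in text form."""
    sample_id = None
    header = None
    rows = []
    in_table = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("^SAMPLE"):
            # "^SAMPLE = GSM160089" -> "GSM160089"
            sample_id = line.split("=", 1)[1].strip()
        elif line == "!sample_table_begin":
            in_table = True
        elif line == "!sample_table_end":
            in_table = False
        elif in_table and line:
            fields = line.split("\t")
            if header is None:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
    return sample_id, rows


# Made-up rows for illustration only.
example = "\n".join([
    "^SAMPLE = GSM160089",
    "#ID_REF = ",
    "#VALUE = GCOS signal",
    "#ABS_CALL = Present/absent per Affy software",
    "!sample_table_begin",
    "ID_REF\tVALUE\tABS_CALL",
    "probe_a\t120.5\tP",
    "probe_b\t8.1\tA",
    "!sample_table_end",
])
sample_id, rows = parse_sample_table(example)
```

The workbench does this parsing itself on import; the sketch is only meant to show which parts of the file carry the sample name, the expression value and the present/absent call.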
Import Annotation file Affymetrix web site: http://www.affymetrix.com Search for RAE230A and download the annotation file.
Grouping the samples Toolbox | Expression Analysis | Set up Experiment
Defining the number of groups Groups can be deleted, and new groups can be added with Add New Group.
Naming the groups
Assigning the samples to groups Select the first 6 samples, right-click and select Heart. Select the last 6 samples, right-click and select Diaphragm.
The experiment table Total present count: The number of present calls for all samples. IQR-Expression values: The interquartile range for all samples.
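The two columns described above can be sketched for a single feature as follows. This is an illustration under assumptions: the call code "P" for present and the quartile convention (`method="inclusive"`) are choices made here, and the workbench may compute quartiles differently.

```python
import statistics


def total_present_count(calls):
    """Count present calls ("P", an assumed code) across all samples."""
    return sum(1 for call in calls if call == "P")


def interquartile_range(values):
    """IQR = third quartile minus first quartile of the values."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Made-up values for one feature measured in five samples.
calls = ["P", "P", "A", "P", "M"]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
present = total_present_count(calls)
iqr = interquartile_range(values)
```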
Annotation level
Add annotations, Create experiment, Download sequence Add Array Annotations: adds array annotations. Create Experiment from selection: creates a sub-experiment from a selection. Download Sequence: downloads sequences from the experiment table.
Transformation Toolbox | Expression Analysis | General Plots | Create MA Plot
Scatter plot view of an experiment (side panel settings: Inside, Major ticks, X axis, Y axis)
MA plot before transformation M: log-intensity ratio = log₂R - log₂G. A: mean log-intensity = (log₂R + log₂G)/2. Plotting M against A gives the plot of log₂R versus log₂G rotated by 45°, so the gene data can be inspected against the baseline M = 0.
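The two definitions above translate directly into code. A minimal sketch, with the function name `ma_values` and the intensity pairs invented for illustration:

```python
import math


def ma_values(r, g):
    """Return (M, A) with M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2."""
    log_r = math.log2(r)
    log_g = math.log2(g)
    return log_r - log_g, (log_r + log_g) / 2


# Made-up intensity pairs (R, G).
pairs = [(8.0, 2.0), (4.0, 4.0), (1.0, 16.0)]
points = [ma_values(r, g) for r, g in pairs]
```

A feature with equal intensities lands on M = 0, which is why that line serves as the baseline. Up to a rescaling of the axes, each (A, M) point is the (log₂G, log₂R) point rotated by 45°.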
Transformation Toolbox | Expression Analysis | Transformation and Normalization | Transform
Normalization Toolbox | Expression Analysis | Transformation and Normalization | Normalize Select a number of samples or an experiment and click Next
Choose normalization method
Normalization settings
MA plot after transformation
Comparing spread and distribution Toolbox | Expression Analysis | Quality Control | Create Box Plot
Box plot of the 12 samples in the experiment
Toolbox | Expression Analysis | General Plots | Create Histogram
Selecting which values the histogram should be based on Show Table
Table view of a histogram
Group differentiation Toolbox | Expression Analysis | Quality Control | Principal Component Analysis
Principal component analysis colored by group
Naming the outlier Dot properties | select GSM160090 in the drop-down box | Show names
Hierarchical clustering Toolbox | Expression Analysis | Quality Control | Hierarchical Clustering of Samples Leave the parameters at their default and click Finish. Distance measures: Euclidean distance, 1 - Pearson correlation, Manhattan distance. Linkage: Single linkage, Average linkage, Complete linkage.
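The three distance measures and three linkage options listed above can be written out as a short sketch. These are the textbook definitions, not the workbench's own code, and all function names are chosen here for illustration.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1 - cov / (norm_x * norm_y)


def cluster_distance(cluster_a, cluster_b, dist, linkage):
    """Distance between two clusters under the chosen linkage."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError("unknown linkage: " + linkage)


# Made-up clusters of two-dimensional points.
left = [[0.0, 0.0]]
right = [[3.0, 4.0], [6.0, 8.0]]
single = cluster_distance(left, right, euclidean, "single")
complete = cluster_distance(left, right, euclidean, "complete")
average = cluster_distance(left, right, euclidean, "average")
```

Hierarchical clustering repeatedly merges the two clusters with the smallest such distance, so the choice of linkage decides whether compact or chained clusters are favoured.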
Sample clustering
Result of hierarchical clustering of samples Show Heat Map
Feature clustering Toolbox | Expression Analysis | Feature Clustering | Hierarchical Clustering of Features
Parameters for hierarchical clustering of features Distance measures: Euclidean distance, 1 - Pearson correlation, Manhattan distance. Linkage: Single linkage, Average linkage, Complete linkage.
Hierarchical clustering of features
K-means/medoids clustering Toolbox | Expression Analysis | Feature Clustering | K-means/medoids Clustering
Parameters for k-means/medoids clustering
Five clusters created by k-means/medoids clustering
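For readers who want to see what a k-means run does, here is a minimal sketch of the standard iteration (assign each feature to its nearest centroid, then recompute the centroids). It is not the workbench's implementation: the initialisation from the first k points, the fixed iteration count and the squared Euclidean distance are assumptions made here, and the k-medoids variant is not covered.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; returns (assignment, centroids)."""
    # Assumption: start from the first k points instead of a random choice.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up expression profiles (two samples per feature), two clusters.
profiles = [[1.0, 1.1], [5.0, 5.2], [1.2, 0.9], [5.1, 4.9]]
assignment, centroids = kmeans(profiles, k=2)
```

The slide's run corresponds to k = 5 on the full feature set.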
Statistical analysis – T-tests Toolbox | Expression Analysis | Statistical Analysis | Statistical Analysis
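As an illustration of what a two-group t-test computes per feature, the sketch below returns the t statistic and degrees of freedom. The slide does not say which variant is used, so the unequal-variance (Welch) form is an assumption here, and turning the statistic into a p-value (which needs the t distribution) is left out.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch t statistic and approximate degrees of freedom for two groups."""
    # Squared standard error of each group mean.
    se_a = statistics.variance(group_a) / len(group_a)
    se_b = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Made-up expression values for one feature in two groups.
heart = [1.0, 2.0, 3.0]
diaphragm = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(heart, diaphragm)
```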
Statistical analysis – ANOVA Applies when two or more groups are selected.
Corrected p-values
FDR p-values compared to Bonferroni-corrected p-values
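The difference between the two corrections can be seen in a short sketch. It assumes the FDR values are of the Benjamini-Hochberg kind, which the slide does not state; the function names and p-values are made up.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)
```

Each FDR-adjusted value is at most the Bonferroni-adjusted one, which is why filtering on FDR p-values keeps more features at the same threshold.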
Filtering on FDR p-values
Inspecting the volcano plot Hold down the Ctrl key and click the volcano plot to show the two views together. Dots for the selected data are shown in red.
Filtering absent/present calls and fold change Click the Add search criterion (+) button to add further criteria. Filter for genes where at least 5 out of 6 calls in each group are present. The absolute value of the group mean difference should be larger than 2.
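The two criteria above amount to a simple per-feature test. A minimal sketch, where the function name, the dictionary keys and the feature values are all hypothetical:

```python
def passes_filter(present_counts, mean_difference,
                  min_present=5, min_abs_difference=2.0):
    """True if every group has enough present calls and the difference is large."""
    enough_present = all(count >= min_present for count in present_counts)
    large_difference = abs(mean_difference) > min_abs_difference
    return enough_present and large_difference


# Made-up features: present calls per group (out of 6) and group mean difference.
features = [
    {"id": "feature_1", "present": (6, 5), "difference": -3.2},
    {"id": "feature_2", "present": (6, 4), "difference": 4.0},
    {"id": "feature_3", "present": (6, 6), "difference": 1.5},
]
kept = [f["id"] for f in features if passes_filter(f["present"], f["difference"])]
```

Taking the absolute value keeps both up- and down-regulated features, as with `feature_1` here.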
Saving the gene list
New experiment Save
Processes that are over-represented in the small list Toolbox | Expression Analysis | Annotation Test | Hypergeometric Tests on Annotations Highest IQR: the feature with the highest interquartile range (IQR) is kept. Highest value: the feature with the highest expression value is kept.
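The idea behind a hypergeometric test for over-representation can be sketched as the probability of seeing at least the observed number of annotated features in the small list by chance. This is the textbook upper-tail formula, not necessarily the exact computation the workbench performs; the function name and the numbers are invented.

```python
from math import comb


def hypergeom_upper_tail(total, annotated, list_size, observed):
    """P(X >= observed) when list_size features are drawn without replacement
    from `total` features, `annotated` of which carry the annotation."""
    ways = sum(
        comb(annotated, i) * comb(total - annotated, list_size - i)
        for i in range(observed, min(annotated, list_size) + 1)
    )
    return ways / comb(total, list_size)


# Made-up numbers: 10 features in the experiment, 4 annotated with a process,
# 3 features in the small list, 2 of them annotated.
p_value = hypergeom_upper_tail(total=10, annotated=4, list_size=3, observed=2)
```

A small value suggests the process occurs in the small list more often than a random list of the same size would explain.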
The result of testing on GO biological process
Gene Set Enrichment Analysis (GSEA) Toolbox | Expression Analysis | Annotation Test | Gene Set Enrichment Analysis (GSEA) Select the original full experiment.
Gene set enrichment analysis based on GO biological process
The result of a gene set enrichment analysis based on GO biological process
Toolbox | Annotations test | Add Array Annotations
Download Sequence Sequences can be downloaded for as many features as are selected.
Created sequence As many sequences are created as features were selected.
Saving sequence Drag the sequence names one at a time into the Navigation Area to save them.
Toolbox | BLAST Search | NCBI BLAST Select the sequences just saved. The 3 sequences can be submitted to BLAST Search at once.
Choose program and database
BLAST Search result