
Using geWorkbench: Working with Sets of Data Fan Lin, Ph. D. Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT.


1 Using geWorkbench: Working with Sets of Data Fan Lin, Ph. D. Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT and Harvard

2 Background
► geWorkbench makes extensive use of sets: the full set of markers or arrays/phenotypes can be divided into different subsets.
► Defining multiple subsets of the same data allows that data to be characterized and analyzed in different ways in geWorkbench.

3 Different types of data grouping
geWorkbench offers two different ways to group data:
1. Individual markers or arrays can be grouped into sets:
► Sets can be defined by the user, or may be created as a result of an analysis.
► Sets of arrays can be used to distinguish between different experimental states, for example as part of a statistical analysis.
♦ The t-test requires that two states, each represented by a set, be defined for comparison.
► Sets of markers are returned from various analysis routines. For example, the t-test returns a list of markers showing significant differential expression, and after hierarchical clustering, the markers in a subtree of the resulting dendrogram can be saved.
2. Sets of markers or arrays are grouped into collections. A collection named "Default" is automatically created by geWorkbench.

4 Overview
In this presentation you will learn:
► How to create a set of markers or arrays.
► How to mark a set of arrays as "Active".
► How to classify a set of arrays, e.g. as "case" vs. "control".
► How to deactivate a set so that it is excluded from data analysis.
► How to group markers or arrays in different ways with descriptive tags.

5 Sets of Markers or Arrays: Overview
► Individual markers (genes) or arrays can be grouped into sets.
► A set of markers or arrays can be used to divide potentially massive expression data into more manageable chunks.

6 Sets of Markers or Arrays: Sample data
► To demonstrate how to create sets of markers or arrays, we will use sample data from a congestive cardiomyopathy experiment, which can be found in the geWorkbench tutorial data section: http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Data
► We will load 10 individual Affymetrix MAS5 format files (all beginning with JB-) and merge them into a single dataset to use as our sample data.

7 Sets of Markers or Arrays: Sample data preparation
To load the sample data, follow the steps below:
1. Create a Project. All data must belong to a project. Right-click on the Workspace entry in the Project Folders window at upper left to create a new project.
2. Next, right-click on the new Project entry and select Open Files.

8 Sets of Markers or Arrays: Sample data preparation
3. Select file type Affymetrix MAS5/GCOS as shown.
4. Make sure to check the Merge files checkbox.
5. Select 10 MAS5 format text files from the tutorial data directory.
6. Click Open.
The chip type HG_U95Av2 is recognized...
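
The effect of the Merge files checkbox can be pictured with a small sketch. This is not geWorkbench code: it assumes each file has already been parsed into a marker-to-signal mapping (the parsing itself is not shown), and the function name, array names, and values are hypothetical.

```python
def merge_arrays(arrays):
    """Merge per-array {marker: value} mappings into one marker-by-array table.

    `arrays` maps an array name to its {marker: value} dict. This in-memory
    form is an assumption made for illustration only.
    """
    if not arrays:
        return [], {}
    names = sorted(arrays)
    # Keep only markers present in every array so that all rows are complete.
    common = set.intersection(*(set(arrays[n]) for n in names))
    table = {m: [arrays[n][m] for n in names] for m in sorted(common)}
    return names, table


# Hypothetical array names and signal values.
arrays = {
    "JB-n_1": {"1000_at": 120.0, "1001_at": 35.5},
    "JB-ccmp_1": {"1000_at": 98.2, "1001_at": 40.1},
}
names, table = merge_arrays(arrays)
```

Each row of `table` holds one marker's values in the column order given by `names`, which mirrors the single merged dataset that appears in the project.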

9 Sets of Markers or Arrays: Sample data preparation
The merged dataset is now listed in the Project folder. The data is displayed, in single array format, in the Microarray Viewer. Note that we have increased the intensity slider to maximum here.

10 Sets of Markers or Arrays: Assigning arrays to sets
In this example, we will create two sets of arrays, one for the disease state and one for the normal state, and leave them in the Default collection.
First, select and label the arrays that contain samples from the congestive cardiomyopathy disease state:
1. In the Arrays/Phenotypes component, select the six arrays beginning with JB-ccmp, which represent the samples from the congestive cardiomyopathy disease state.
2. Right-click and select Add to Set.

11 Sets of Markers or Arrays: Assigning arrays to sets
3. Enter "CCMP" in the input box and click OK.
4. Next, similarly label the arrays beginning with JB-n as "Normal".
The Array/Phenotype Sets component will now show the two sets added.
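
The grouping performed in steps 1 to 4 amounts to selecting arrays by name prefix and storing them under a set name. A minimal sketch of that idea follows; it is not the geWorkbench API. The helper `add_to_set` and the exact array names are hypothetical, with only the prefixes JB-ccmp and JB-n, the set names, and the six-of-ten split taken from the slides.

```python
def add_to_set(sets, set_name, array_names, prefix):
    """Add every array whose name starts with `prefix` to the named set."""
    members = sets.setdefault(set_name, [])
    for name in array_names:
        if name.startswith(prefix) and name not in members:
            members.append(name)
    return sets


# Hypothetical names: six disease arrays and four normal arrays (10 in total).
array_names = [f"JB-ccmp_{i}" for i in range(1, 7)] + [
    f"JB-n_{i}" for i in range(1, 5)
]

sets = {}
add_to_set(sets, "CCMP", array_names, "JB-ccmp")
add_to_set(sets, "Normal", array_names, "JB-n")
```

After these two calls, `sets` holds a "CCMP" set and a "Normal" set, matching the two entries shown in the Array/Phenotype Sets component.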

12 Sets of Markers or Arrays: Activating sets
The box next to a set name can be checked to indicate that the set of arrays is "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.
Note: if no array sets are explicitly activated, then all arrays are implicitly active. The same applies to markers.
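
The activation rule in the note can be written down as a short sketch. This illustrates the rule as stated on the slide and is not geWorkbench's implementation; the function name, the example names, and the choice to take the union of several activated sets are assumptions.

```python
def active_arrays(all_arrays, sets, active_set_names):
    """Return the arrays an analysis would use.

    If no set is activated, every array is implicitly active. Otherwise the
    arrays of all activated sets are used (taking their union is an assumption).
    """
    if not active_set_names:
        return list(all_arrays)
    selected = set()
    for name in active_set_names:
        selected.update(sets[name])
    # Preserve the original array order.
    return [a for a in all_arrays if a in selected]


# Hypothetical array and set names.
all_arrays = ["c1", "c2", "n1", "n2"]
sets = {"CCMP": ["c1", "c2"], "Normal": ["n1", "n2"]}

everything = active_arrays(all_arrays, sets, [])
only_ccmp = active_arrays(all_arrays, sets, ["CCMP"])
```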

13 Sets of Markers or Arrays: Classifying data sets for statistical tests
For statistical tests such as the t-test, Case and Control groups can be specified.
1. Left-click on the thumbtack icon in front of the phenotype name.
2. Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default considered Control.

14 Sets of Markers or Arrays: Classifying data sets for statistical tests
3. A red thumbtack indicates that an array set has been marked as "Case".
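
To show why a two-group test needs the Case/Control classification, here is a minimal sketch for a single marker. It treats arrays in the set marked Case as one group and all remaining arrays as Control, then computes Welch's unequal-variance t statistic. Welch's variant is chosen for this sketch only; the slides do not say which t-test variant geWorkbench applies, and the function names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_case_control(expression, sets, case_set):
    """Split one marker's values: arrays in `case_set` are Case, the rest Control."""
    case_members = set(sets[case_set])
    case = [v for a, v in expression.items() if a in case_members]
    control = [v for a, v in expression.items() if a not in case_members]
    return case, control


def welch_t(case_values, control_values):
    """Welch's t statistic (requires at least two values per group)."""
    n1, n2 = len(case_values), len(control_values)
    v1, v2 = variance(case_values), variance(control_values)
    return (mean(case_values) - mean(control_values)) / sqrt(v1 / n1 + v2 / n2)


# Hypothetical expression values for one marker, keyed by array name.
expression = {"c1": 10.0, "c2": 12.0, "c3": 14.0, "n1": 4.0, "n2": 6.0, "n3": 8.0}
sets = {"CCMP": ["c1", "c2", "c3"], "Normal": ["n1", "n2", "n3"]}

case, control = split_case_control(expression, sets, "CCMP")
t_stat = welch_t(case, control)
```

A positive `t_stat` here means the marker's mean expression is higher in the Case group than in the Control group.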

15 Sets of Markers or Arrays: Deactivating a set
► To deactivate a set, click on it so that it is highlighted, then perform one of the following actions:
♦ Right-click on the set and select Deactivate.
♦ Uncheck the checkbox next to the set.
♦ From the main menu, select Commands Panel > Deactivate Panel.

16 Collection of Sets: Overview
The same arrays or markers may need to be grouped in different ways in the Arrays/Phenotypes and Markers components. geWorkbench uses collections to hold sets of arrays or markers, which makes the data easier to manage.
► Different collections of sets can be made, both for markers and for arrays. They may differ in membership or in how members are named (e.g. amount of detail).
► Collections of sets give users a convenient way to manage sets of data with descriptive tags.

17 Collection of Sets: Creating a new collection
Both the Markers and Arrays/Phenotypes tabs have two sections in the GUI: the upper frame lists the full data set, and the lower frame lists any user-defined groupings. geWorkbench automatically creates a default collection named "Default" to hold sets of data.
To create a new collection for arrays, click on the New button in the Array/Phenotype Sets section at the lower left of the application (arrow labeled New). The drop-down collection list (arrow on the left) will be updated to include the new collection.

18 Collection of Sets: Examples of array collections
► Here we show how several different collections are defined in the example data file "Bcell-100.exp", which can be found in the geWorkbench tutorial data (Bcell-100.zip): http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Data
► After loading this file into geWorkbench as type "Affymetrix File Matrix", four collections of sets can be seen in the Arrays/Phenotypes group pull-down menu at right.

19 Collection of Sets: Examples of array collections
If we choose the collection called "Class", the sets of arrays at right are displayed.

20 Collection of Sets: Examples of array collections
If instead we choose the collection "Source detailed", a different collection of sets of the same arrays is seen.
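
The relationship between collections, sets, and arrays can be sketched as a nested mapping in which each collection groups the same arrays in its own way. The collection names "Class" and "Source detailed" come from the slides; the set names, array names, and the `membership` helper are hypothetical and do not reflect the actual contents of Bcell-100.exp or geWorkbench's internal data model.

```python
def membership(collections, array_name):
    """Map each collection to the names of its sets that contain `array_name`."""
    return {
        coll: [s for s, members in sets.items() if array_name in members]
        for coll, sets in collections.items()
    }


# Hypothetical set and array names; only the collection names are from the slides.
collections = {
    "Default": {},
    "Class": {
        "class_A": ["array_1", "array_2"],
        "class_B": ["array_3"],
    },
    "Source detailed": {
        "source_X": ["array_1"],
        "source_Y": ["array_2", "array_3"],
    },
}

tags = membership(collections, "array_2")
```

Looking up one array across collections shows how the same array can carry a different descriptive tag in each grouping.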

21 Need More Information?
► NCI is developing an extensive knowledge base to support various NCI molecular analysis tools. Visit NCI's Molecular Analysis Tools Knowledge Center at: https://cabig-kc.nci.nih.gov/MediaWiki/index.php/Main_Page
► For more information on how to use geWorkbench, please visit the geWorkbench section of the NCI Knowledge Center at: https://cabig-kc.nci.nih.gov/Molecular/KC/index.php/GeWorkbench
► Have a geWorkbench-related question? Find answers in the geWorkbench FAQ section at: https://cabig-kc.nci.nih.gov/Molecular/KC/index.php/GeWorkbench_FAQ
► Need more help? Post your question in the geWorkbench Forum at: https://cabig-kc.nci.nih.gov/Molecular/forums/viewforum.php?f=3

