Alistair Chalk, Elisabet Andersson Stem Cell Biology and Bioinformatic Tools, DBRM, Karolinska Institutet, 18-24 September 2007. 4-3 Chip-chip and handling.


Chip-chip and handling large datasets: exercises
You don't need to complete all exercises for this section. The aim is to appreciate what formulas can do for you in your current analysis.

Section 1: Excel

Excel referencing
  =A3 : the value in A3
  =A3/A4 : A3 divided by A4

Excel formulas
Common Excel formulas:
  =IF(condition, value_if_true, value_if_false)
  =MID(text, start, number_of_characters)
  =LEFT()
  =RIGHT()
Database and lookup:
  =VLOOKUP(key, table, result_column, exact_match?)
  =FIND()
Formula names are different in Swedish versions of Excel!
There are hundreds of formulas!
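
For readers who later move to a scripting language, the text formulas above map onto ordinary string slicing. The following is a minimal Python sketch, not part of the original slides: the helper names `mid`, `left`, `right` and `find` are hypothetical and only imitate the 1-based positions that Excel uses.

```python
def mid(text, start, number_of_characters):
    """Imitate Excel's MID: 'start' is 1-based, as in Excel."""
    return text[start - 1:start - 1 + number_of_characters]


def left(text, number_of_characters):
    """Imitate Excel's LEFT: the first characters of the text."""
    return text[:number_of_characters]


def right(text, number_of_characters):
    """Imitate Excel's RIGHT: the last characters of the text."""
    return text[-number_of_characters:] if number_of_characters > 0 else ""


def find(find_text, within_text):
    """Imitate Excel's FIND: 1-based position, or None when absent.

    Excel itself reports a #VALUE! error instead of returning None.
    """
    position = within_text.find(find_text)
    return position + 1 if position >= 0 else None


sequence = "ACGTGGGG"  # made-up example sequence
first_four = left(sequence, 4)
last_four = right(sequence, 4)
middle = mid(sequence, 3, 4)
motif_position = find("GGGG", sequence)
```

Note the argument order of `find`: the text to search for comes first and the text to search in comes second, which is also the order Excel expects.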

Excel formula copy
$ = fixed position. $A$2 will not change when formulas are copied.
Use $ before the row or column to keep it constant:
  $A1 : A stays fixed
  $A$1 : A and 1 stay fixed
  A$1 : 1 stays fixed

Data lookup in Excel
VLOOKUP command: find row information by matching the identifier. Used to combine datasets.
(Diagram: the subset of interest is searched against the data table, giving the resulting data for the subset.)
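
The lookup idea can be sketched outside Excel as well. The Python example below is an illustration only: the `vlookup` helper is hypothetical, it covers exact matching only, and the city and population values are made up rather than taken from the course files.

```python
# Made-up data table: (identifier, population). Not the real cities.txt values.
cities = [
    ("Boden", 18000),
    ("Stockholm", 800000),
]

# Made-up subset of interest: (name, city).
people = [
    ("Anna", "Boden"),
    ("Erik", "Stockholm"),
    ("Lisa", "Kiruna"),
]


def vlookup(key, table, result_column):
    """Return the value in result_column (1-based, as in Excel) of the
    first row whose first column equals key, or None if nothing matches."""
    for row in table:
        if row[0] == key:
            return row[result_column - 1]
    return None


# Combine the two datasets by matching on the city identifier.
combined = [(name, city, vlookup(city, cities, 2)) for name, city in people]
```

A key with no match gives `None` here; in Excel an exact-match VLOOKUP would show `#N/A` for that row.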

Excel exercise 1: Excel
Open the "shoes.txt" example (from day one) in Excel.
  a) Calculate the mean, lower quartile and upper quartile for height. (You must put the calculations in column B.)
  b) Do the same for shoe size by copying the cells from column A.
  c) In column F, use a formula to write "boden flicka" if the person is from Boden and is a girl.
  d) Modify c) to write a description of people who are not "boden flicka".
Open "cities.txt".
  Use VLOOKUP to add population size information to the table in "shoes".
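
As a cross-check for part a), the same summary statistics can be computed with Python's standard `statistics` module. This is a sketch under assumptions: the heights are made up (the real values are in shoes.txt), and `method="inclusive"` is chosen because it should correspond to the interpolation used by Excel's QUARTILE function.

```python
import statistics

# Made-up heights in cm; the real values come from shoes.txt.
heights = [150, 160, 165, 170, 180]

mean_height = statistics.mean(heights)

# n=4 splits the data into quartiles and returns the three cut points.
lower_quartile, median, upper_quartile = statistics.quantiles(
    heights, n=4, method="inclusive"
)
```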

Complex formulas
Formulas can be combined.
Give me the first 4 letters in the sequence only if it contains a GGGG motif:
  =IF(ISNUMBER(FIND("GGGG",D1)),LEFT(D1,4),FALSE)
Give me D2 only for sequences that contain a GGGG motif:
  =IF(ISNUMBER(FIND("GGGG",D1)),D2,FALSE)
FIND takes the text to search for first and the cell to search in second. It returns an error rather than FALSE when the motif is missing, so it is wrapped in ISNUMBER.
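
The same combined logic reads as follows in Python. It is a small illustrative sketch: the function name `first_four_if_motif` and the example sequences are made up.

```python
def first_four_if_motif(sequence, motif="GGGG"):
    """Return the first 4 letters when the motif occurs, otherwise False."""
    if motif in sequence:
        return sequence[:4]
    return False


with_motif = first_four_if_motif("ACGTGGGGA")    # made-up sequence
without_motif = first_four_if_motif("ACGTACGT")  # made-up sequence
```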

Excel exercise 2: working with sequences using formulas
Download a sequence from RefSeq in FASTA format (e.g. NM_001024).
Find all 25-mers for that sequence.
  Hint: use the MID() command.
How many contain the motif AAGCG (exact match)?
  Hint: use the FIND() command.
How many start with the motif AAGCG?
What is the length of your sequence? (Use Excel formulas only.)
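
The exercise is meant to be solved with Excel formulas, but a short script is a convenient way to check the answers. The Python sketch below assumes a single-record FASTA text; the record is made up (it is not NM_001024) and the helper names are hypothetical.

```python
def read_fasta_sequence(fasta_text):
    """Join the sequence lines of a single-record FASTA text, skipping the header."""
    lines = fasta_text.splitlines()
    return "".join(line.strip() for line in lines if line and not line.startswith(">"))


def kmers(sequence, k=25):
    """Return every window of length k, moving one position at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Made-up record of 30 bases, split over two lines as FASTA files usually are.
fasta_text = (
    ">made_up_example\n"
    + "AAGCG" + "T" * 10 + "\n"
    + "T" * 10 + "AAGCG" + "\n"
)

sequence = read_fasta_sequence(fasta_text)
all_25mers = kmers(sequence, 25)

motif = "AAGCG"
containing_motif = sum(1 for kmer in all_25mers if motif in kmer)
starting_with_motif = sum(1 for kmer in all_25mers if kmer.startswith(motif))
sequence_length = len(sequence)
```

A sequence of length n has n - 25 + 1 windows, which is why the 30-base example yields six 25-mers.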

Excel exercise 3: VLOOKUP for genomic data
"Where are the genes with specific GO terms in the human genome?"
Download the known genes table for human from Ensembl BioMart:
  1) 1 table with all genes and chromosome positions
  2) 1 table with genes with a GO term you are interested in
Load the data into Excel.
Use VLOOKUP() to find genomic coordinates for the genes from 2).
Use IF statements to find all genes in chromosome 1 within positions and
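
The lookup-then-filter pattern of this exercise can be sketched in Python as below. Everything in the example is an assumption made for illustration: the gene names, the coordinates and the window from 1000 to 5000 are made up, and the function name `genes_in_window` is hypothetical.

```python
# Made-up stand-in for table 1: gene -> (chromosome, start, end).
positions = {
    "GENE_A": ("1", 1500, 2500),
    "GENE_B": ("1", 9000, 9800),
    "GENE_C": ("2", 1200, 1900),
}

# Made-up stand-in for table 2: genes annotated with the GO term of interest.
go_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_X"]


def genes_in_window(names, gene_positions, chromosome, window_start, window_end):
    """Look up each gene's coordinates, then keep those inside the window."""
    hits = []
    for name in names:
        location = gene_positions.get(name)  # the lookup step
        if location is None:
            continue  # gene missing from the position table
        gene_chromosome, start, end = location
        # The filter step, equivalent to nested IF statements in Excel.
        if gene_chromosome == chromosome and start >= window_start and end <= window_end:
            hits.append(name)
    return hits


hits = genes_in_window(go_genes, positions, "1", 1000, 5000)
```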

Excel exercise 4: VLOOKUP for miRNA data
miRNA analysis: "What are the properties of the targets of my expressed miRNA?"
Download the known targets for miRNA from
Load the data into Excel.
Now download information about the targets from BioMart (EnsMart).
How many targets are on chromosome 1?

What to do when many-many relationships exist?
Excel is not well suited to many-many relationships (without jumping through some hoops).
Solution: use Unix or SQL databases.
Unix solutions:
  grep: looks for lines in a file that contain a specific pattern
    grep -e "NM_001024" filename
  looks for lines containing NM_001024.
While we won't teach command line tools, we recommend them for handling large data files, sorting data, manipulating data, and filtering data down to more meaningful datasets that can then be handled in Excel.
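
To make the many-many problem concrete, here is a small Python sketch of the kind of join a SQL database would perform. It is an illustration, not one of the tools named on the slide: the identifiers are made up and the function name `join_many_to_many` is hypothetical.

```python
from collections import defaultdict

# Made-up pairs: one miRNA can have many targets, one target many annotations.
mirna_targets = [("miR-A", "GENE_1"), ("miR-A", "GENE_2"), ("miR-B", "GENE_2")]
gene_terms = [("GENE_1", "term_x"), ("GENE_2", "term_x"), ("GENE_2", "term_y")]


def join_many_to_many(left_pairs, right_pairs):
    """Pair every left row with every right row that shares the same key."""
    index = defaultdict(list)
    for key, value in right_pairs:
        index[key].append(value)
    return [
        (left_id, key, value)
        for left_id, key in left_pairs
        for value in index.get(key, [])
    ]


joined = join_many_to_many(mirna_targets, gene_terms)
```

VLOOKUP returns only the first matching row, so the second annotation of GENE_2 would be lost in Excel; the join keeps all five combinations.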

Section 2: Cytoscape

PluriNet exercises
Go to www.stemcellmatrix.org and load the Cytoscape visualisation of the PluriNet.
Explore the PluriNet in Cytoscape:
  - How many nodes are in the network?
  - Export the complete node list, including all node information.
  - Colour nodes based on cellular location.

Cytoscape exercises cont.
Add additional data to the network from external sources:
  - Download some stem cell gene expression data from ArrayExpress and integrate it into the network (do it with your own data if you have it).
Colour nodes based on this external data:
  - Gene expression (up/down in a study)

Today's exercises
Add additional data to the network from:
  Characterizing the mouse ES cell transcriptome with Illumina sequencing. Ruben Rosenkranz, Tatiana Borodina, Hans Lehrach and Heinz Himmelbauer. Table 2:

  MGI symbol      RefSeq ID   Transcription specific for   No. of reads   Reads/kb
  Pou5f1 (Oct4)   NM_013633   Pluripotent stem cell
  Nanog           NM_028016   Pluripotent stem cell
  Sox2            NM_011443   Pluripotent stem cell
  Sox1            NM_009233   Ectoderm                     2              1.65
  Sox17           NM_011441   Endoderm                     7              2.24
  T (Brachyury)   NM_009309   Mesoderm                     14             6.84

Today's exercises
Run the nodes in the PluriNet through a DAVID and a GSEA enrichment analysis:
  - What pathways do you find?
  - What is the difference in pathways between DAVID and GSEA (if any)?
  - Find all the genes in the top pathways and add this information back into the Cytoscape network.
  - Colour the network based on the pathways from the previous question.
  - What other pathways would be informative here?

Section 3: Unix
For reference only, no exercises in this section!

gawk programming I
Delimiter: what separates columns.
  tab ("\t")
  comma (",")
Set the delimiter to tab with FS="\t"
Column naming:
  $1 = column 1
  $2 = column 2
  ...

gawk structure
BEGIN {}
  process these commands before the file
END {}
  process these commands after the file
/PATTERN/ {COMMANDS}
  for each line containing this pattern, do the following COMMANDS

gawk programming II
gawk 'BEGIN {FS = "\t";} // {print $2 "\t" $1;}' filename > filename.new
  Swap column 1 and column 2 for all lines.
/^PREFIX/ {print $0}
  Print all lines starting with PREFIX.
/^PREFIX/ {print ($1+$2)}
  Add column 1 and column 2 for all lines starting with PREFIX.
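
For comparison, the column swap and the PREFIX filter can be written in Python roughly as follows. This is a sketch that works on a list of lines rather than on a file; the function names and the sample lines are made up for illustration.

```python
def swap_first_two_columns(lines):
    """Mimic: gawk 'BEGIN {FS = "\\t";} // {print $2 "\\t" $1;}'.

    Only the first two tab-separated columns are printed, in swapped order.
    """
    swapped = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        first = fields[0]
        second = fields[1] if len(fields) > 1 else ""  # a missing $2 is empty in gawk
        swapped.append(second + "\t" + first)
    return swapped


def lines_starting_with(lines, prefix):
    """Mimic: /^PREFIX/ {print $0}."""
    return [line.rstrip("\n") for line in lines if line.startswith(prefix)]


# Made-up tab-separated input lines.
example_lines = ["NM_001024\tgeneA\n", "AF00245\tgeneB\n"]

swapped_lines = swap_first_two_columns(example_lines)
refseq_lines = lines_starting_with(example_lines, "NM_")
```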

gawk programming III
gawk 'BEGIN {FS = "\t";} /^PREFIX/ {sum = sum + $2*$1;} END {print sum;}' filename > filename.new
  Print the sum of column 1 * column 2 for all lines starting with PREFIX.

sed
Replace AF00245 with NM_001024 in a file:
  sed 's/AF00245/NM_001024/g;' filename > filename.new
s = substitute. g = global (all matches on a line); omitting the g replaces only the first occurrence on each line.
You can use pattern matching (example: s/[0-9]//;).
Special characters such as . ] } ; must be preceded by a \ to escape them.
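
The same substitutions can be tried interactively with Python's `re` module, which is a quick way to see the difference between replacing every match and only the first. The input strings below are made up for illustration.

```python
import re

line = "AF00245 maps to AF00245"  # made-up input line

# Like s/AF00245/NM_001024/g : replace every occurrence on the line.
all_replaced = re.sub(r"AF00245", "NM_001024", line)

# Like s/AF00245/NM_001024/ (no g): replace only the first occurrence.
first_only = re.sub(r"AF00245", "NM_001024", line, count=1)

# Like s/[0-9]// : a character class matches any single digit.
first_digit_removed = re.sub(r"[0-9]", "", "chr12", count=1)

# An unescaped . matches any character, so a literal dot is written as \.
dot_replaced = re.sub(r"\.", "_", "NM_001024.2")
```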

Combining statements with | ("pipe")
Compound statements:
  gawk '//{print $2 "\t" $1;}' filename | sed -E 's/Acc1/Acc2/g;s/\.[0-9]+//g;' > filename.new
| = pipe: passes the result to the next program on the command line.
[0-9]+ matches numbers (a string of one or more characters between 0 and 9). With sed, the -E option is needed so that + is treated as "one or more". [a-z] and [A-Z] are other examples.
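
The idea of a pipe, where each program consumes the output of the previous one, can be imitated in Python by chaining generator functions. This is a sketch under assumptions: the function names and the input line are made up, and columns are split on whitespace because the gawk command above does not set FS.

```python
import re


def swap_columns(lines):
    """First stage, like gawk '//{print $2 "\\t" $1;}'."""
    for line in lines:
        fields = line.split()  # gawk's default: split on runs of whitespace
        first = fields[0] if len(fields) > 0 else ""
        second = fields[1] if len(fields) > 1 else ""
        yield second + "\t" + first


def rename_and_strip_version(lines):
    """Second stage, like sed -E 's/Acc1/Acc2/g;s/\\.[0-9]+//g;'."""
    for line in lines:
        line = line.replace("Acc1", "Acc2")
        yield re.sub(r"\.[0-9]+", "", line)


# Made-up input; each stage reads the output of the stage before it.
input_lines = ["Acc1.2 geneA\n"]
result = list(rename_and_strip_version(swap_columns(input_lines)))
```

Because each stage handles one line at a time, the whole file never has to be held in memory, which is the same property that makes Unix pipes suitable for large data files.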

The bottom line
For analysing small to mid-sized datasets it is worthwhile to learn:
  Excel
For extensive data analysis it is worthwhile to learn:
  Unix
  SQL
Where to find more information?
  Excel online tutorials
  Programming in gawk/perl/sed/python/ruby
  Bioinformatics links / primers