1/18 Bioinformatics tools and techniques Into the heart of darkness Elaine Kenny Colm O’Dushlaine 15/11/07.

Presentation transcript:

1/18 Bioinformatics tools and techniques: Into the heart of darkness
Elaine Kenny, Colm O’Dushlaine
15/11/07

2/18 Summary
Simple overviews of some of the tools and methods used by EK and CO’D:
- TK notebook
- get_hapmap_snps.pl: retrieve HapMap (HM) genotype information for a list of SNPs
- GeneViewer.pl & cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks; score SNPs depending on how many of these landmarks they overlap with
- ld_expander.pl: find SNPs in LD with SNPs of interest, based on a user-specified r² and “LD window” (distance between SNPs)
- STATA
- VIM: command-line text editor
- Lab website

3/18 TK notebook
- Application for saving notes, to-do lists, daily logs and any other kind of textual information in one place, where you can find it all again and where related information is easy to locate
- Easy to edit and rapidly searchable
- DEMO – editing
- DEMO – search

4/18 get_hapmap_snps.pl
- Simple script that reads in a 1-column list of SNPs and retrieves HapMap genotypes
- Can select population and strand
- DEMO
- Retrieved data can be loaded into HaploView
- DEMO

5/18 cross_ref_scored.pl
Score SNPs based on how many putatively functional regions they overlap with:
- On a per-gene / per-chromosome basis
Gene basis:
- Type: perl cross_ref_scored.pl file_A file_B file_C...
  where
  file_A: 2-column file of SNPs (format = id, location)
  file_B: 3-column file of EXONS (format = id/name, start, stop)
  file_C...: whatever you want (format = id/name, start, stop), i.e. other regions like CpGs, TFBS, clusters. Any order.
…
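The scoring idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the cross_ref_scored.pl code itself: the function name `score_snps`, the inclusive start/stop comparison and the rule of one point per region file are all assumptions.

```python
from typing import Dict, List, Tuple

Snp = Tuple[str, int]          # (id, location), as in file_A
Region = Tuple[str, int, int]  # (id/name, start, stop), as in file_B, file_C...


def score_snps(snps: List[Snp], region_sets: List[List[Region]]) -> Dict[str, int]:
    """Score each SNP by the number of region sets that overlap it.

    Assumption: a region set contributes at most one point per SNP,
    and start/stop coordinates are inclusive.
    """
    scores: Dict[str, int] = {}
    for snp_id, pos in snps:
        scores[snp_id] = sum(
            any(start <= pos <= stop for _, start, stop in regions)
            for regions in region_sets
        )
    return scores


# Toy data (hypothetical identifiers and coordinates)
snps = [("rs1", 150), ("rs2", 420), ("rs3", 900)]
exons = [("exon1", 100, 200), ("exon2", 400, 500)]
cpgs = [("cpg1", 140, 160)]

scores = score_snps(snps, [exons, cpgs])
print(scores)
```

With the toy data, rs1 falls in both an exon and a CpG region, rs2 only in an exon, and rs3 in neither.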

6/18 cross_ref_scored.pl example output
- The output can then be merged with HapMap / Perlegen data to retrieve MAF data for SNPs

7/18 Merge cross_ref_scored data with HapMap / Perlegen data using merge_per_hap.pl
Type: perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt
Where:
  hapmap.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)
  perlegen.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)
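A merge of this kind amounts to a lookup by rsid. The Python sketch below shows the general shape under stated assumptions: the files are taken to be tab-delimited, the scored file is assumed to carry the rsid and score as its first two columns (the slide does not give its layout), and the names `read_freqs` and `merge_scores` are hypothetical. It is not the merge_per_hap.pl implementation.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

Freq = Tuple[str, float]  # (ref_allele, ref_allele_freq)


def read_freqs(handle) -> Dict[str, Freq]:
    """Read rsid, ref_allele, ref_allele_freq rows (assumed tab-delimited)."""
    return {
        row[0]: (row[1], float(row[2]))
        for row in csv.reader(handle, delimiter="\t")
        if row
    }


def merge_scores(
    scored: List[Tuple[str, int]],
    perlegen: Dict[str, Freq],
    hapmap: Dict[str, Freq],
) -> List[Tuple[str, int, Optional[Freq], Optional[Freq]]]:
    """Attach Perlegen and HapMap frequencies to each scored SNP (None if absent)."""
    return [(rsid, score, perlegen.get(rsid), hapmap.get(rsid)) for rsid, score in scored]


# Toy in-memory stand-ins for perlegen.txt and hapmap.txt (hypothetical values)
perlegen = read_freqs(io.StringIO("rs1\tA\t0.25\nrs2\tG\t0.40\n"))
hapmap = read_freqs(io.StringIO("rs1\tA\t0.30\nrs3\tC\t0.10\n"))
scored = [("rs1", 2), ("rs2", 1), ("rs3", 0)]

merged = merge_scores(scored, perlegen, hapmap)
for row in merged:
    print(row)
```

Keeping a SNP with `None` for a missing source, rather than dropping it, is a design choice made here for illustration; the original script may behave differently.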

8/18 cross_ref.pl applied to WGA data
cross_ref.pl: scoring SNPs throughout the genome
Data analysed on a coding / non-coding basis

(coding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 \
  WTCCC_T2D_chr22_without_inferred.forCrossRef \
  WGA_databases/coding_non_synon_SNPs_UCSC.clean=3 \
  WGA_databases/coding_synon_SNPs_UCSC.clean=2 \
  WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1 \
  WGA_databases/Triplexes_may2006.bed=2 \
  WGA_databases/splice_site_SNPs_UCSC.clean=2 \
  > Overlapped_regions_scored.WTCCC.chr22.coding.log &
(input-dependent, coding/non-coding dependent, arbitrary)

(noncoding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 \
  WTCCC_T2D_chr22_without_inferred.forCrossRef \
  WGA_databases/TFBS.chr22=1 \
  WGA_databases/CpG_islands_UCSC.uniqid=1 \
  WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean=1 \
  WGA_databases/promoters_knowngene_hg18.txt=1 \
  WGA_databases/sno_or_miRNA_UCSC.uniqid=1 \
  > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log &
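The `=N` suffix on each database argument looks like a per-database weight, although the slide does not say so explicitly. On that assumption, a weighted version of the overlap score could look like the Python sketch below. The helper names are hypothetical, and counting each database's weight at most once per SNP is a guess, not a description of cross_ref.pl.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop), inclusive


def parse_weighted_arg(arg: str) -> Tuple[str, int]:
    """Split an argument of the form path=weight on its last '='."""
    path, sep, weight = arg.rpartition("=")
    if not sep:
        raise ValueError(f"expected path=weight, got {arg!r}")
    return path, int(weight)


def weighted_score(pos: int, databases: List[Tuple[List[Interval], int]]) -> int:
    """Sum the weights of every database with a region containing pos."""
    return sum(
        weight
        for regions, weight in databases
        if any(start <= pos <= stop for start, stop in regions)
    )


# Arguments copied from the coding example on the slide
args = [
    "WGA_databases/coding_non_synon_SNPs_UCSC.clean=3",
    "WGA_databases/coding_synon_SNPs_UCSC.clean=2",
]
parsed = [parse_weighted_arg(a) for a in args]

# Toy regions standing in for the contents of those files (hypothetical)
databases = [([(100, 200)], 3), ([(150, 300)], 2)]
print(weighted_score(160, databases), weighted_score(250, databases))
```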

9/18 cross_ref.pl
cross_ref.pl output:
- Load into STATA
- If SNPs have e.g. association p-values, calculate an adjusted p-value (R. Anney) as -log10(P) + cross_ref_score
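As a worked example of the formula, a SNP with P = 0.0001 and a cross_ref score of 3 gets -log10(0.0001) + 3 = 4 + 3 = 7. The same calculation as a small Python function (the slides do this step in STATA; the function name `adjusted_score` is hypothetical):

```python
import math


def adjusted_score(p_value: float, cross_ref_score: float) -> float:
    """Return -log10(P) + cross_ref_score for an association p-value P."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -math.log10(p_value) + cross_ref_score


adjusted = adjusted_score(1e-4, 3)
print(adjusted)
```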

10/18 GeneViewer.pl
- Visualise overlapping features (e.g. exons, SNPs) along e.g. your gene of interest (HTML output)

11/18 ld_expander.pl
- Find proxies (SNPs in LD) for a list of SNPs
- User specifies the r² and “LD window”
- Currently configured to obtain proxies from HM CEU
- Result is a list of additional proxy SNPs that have been obtained by LD expansion
- DEMO
- Note: don’t LD expand > SNPs, or HapMap will ban you!
- CO’D has an alternative version that uses local pre-computed pairwise LD SNP files
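LD expansion against local pairwise LD data can be sketched as a simple filter. The Python below is an illustration only: the record layout (snp_a, snp_b, r2), the separate position table and the function name `ld_expand` are assumptions, not the format used by ld_expander.pl or by the pre-computed LD files.

```python
from typing import Dict, Iterable, List, Tuple

LdRecord = Tuple[str, str, float]  # (snp_a, snp_b, r2), hypothetical layout


def ld_expand(
    query_snps: Iterable[str],
    pairwise_ld: Iterable[LdRecord],
    positions: Dict[str, int],
    min_r2: float,
    window: int,
) -> List[str]:
    """Return SNPs in LD (r2 >= min_r2, distance <= window) with any query SNP."""
    query = set(query_snps)
    proxies = set()
    for snp_a, snp_b, r2 in pairwise_ld:
        if r2 < min_r2:
            continue
        if abs(positions[snp_a] - positions[snp_b]) > window:
            continue
        # A pair contributes a proxy when exactly one side is a query SNP
        if snp_a in query and snp_b not in query:
            proxies.add(snp_b)
        if snp_b in query and snp_a not in query:
            proxies.add(snp_a)
    return sorted(proxies)


# Toy data (hypothetical identifiers, positions and r2 values)
positions = {"rs1": 1000, "rs2": 1500, "rs3": 90000, "rs4": 2000}
pairwise_ld = [("rs1", "rs2", 0.95), ("rs1", "rs3", 0.90), ("rs1", "rs4", 0.50)]

proxies = ld_expand(["rs1"], pairwise_ld, positions, min_r2=0.8, window=50000)
print(proxies)
```

Here rs3 is excluded because it lies outside the LD window and rs4 because its r² is below the threshold.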

12/18 STATA
- Extremely powerful and flexible
- Handles >65k rows – shock horror!
- Can write scripts to automate tasks, e.g. read in a file, do the analysis, save the results
- When you use the GUI to run commands, the commands are shown in the command window, so they can be saved in a do file
- CO’D, EK and R. Anney strongly advocate this as a platform for both file manipulation and statistical analysis

13/18 STATA example using WTCCC data
- Bipolar Disorder
- Coronary Artery Disease
- Crohn's Disease
- Hypertension
- Rheumatoid Arthritis
- Type 1 Diabetes
- Type 2 Diabetes

14/18 DATA FORMAT
3 folders:
- Basic: each case collection against the pooled control groups 58C and UKBS
- Combined cases: combining other case collections as controls
- Combined controls: combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune)
Data are split by chromosome

15/18 Questions
- How do I get all of the chromosome data for my gene of interest into one file?
- How do I easily search all of the SNP information for my gene(s) of interest?
- Create a “.do” file for all the manipulations that you want to carry out on the data
- DEMO
- Good starting resource:

16/18 VIM
- “Vi Improved”. Mainly UNIX, but a cross-platform text editor (available for Windows)
- The full list of commands is outside the scope of this demonstration
- Very fast and efficient, esp. with search and replace functions on large datasets
- Regular expression pattern matching
- DEMO
- Integrates with Cygwin (a very useful UNIX emulator for Windows)

17/18 Group website
- Some useful stuff up there!
- Please send information about current projects etc.
- Good for our image as a group, and minimal effort required on your part
- DEMO

18/18 Conclusions
- Small summary of some things you can do
- Slides and video demonstrations will be online at: sychiatry/Protocols/
- CO’D & EK available for advice (Friday mornings)
- These things will help you in your work!!