1/18 Bioinformatics tools and techniques: Into the heart of darkness
Elaine Kenny, Colm O’Dushlaine
15/11/07
2/18 Summary
Simple overviews of some of the tools and methods used by EK and CO’D:
TK notebook
get_hapmap_snps.pl: retrieve HapMap (HM) genotype information for a list of SNPs
GeneViewer.pl & cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks, and score SNPs by how many of these landmarks they overlap
ld_expander.pl: find SNPs in LD with SNPs of interest, based on a user-specified r² and “LD window” (distance between SNPs)
STATA
VIM: command-line text editor
Lab website
3/18 TK notebook
Application for saving notes, to-do lists, daily logs and any other kind of textual information in one place, where you can find it all again and where related information is easily found
Easy to edit and rapidly searchable
DEMO – editing
DEMO – search
4/18 get_hapmap_snps.pl
Simple script that reads in a 1-column list of SNPs and retrieves HapMap genotypes
Can select population and strand
DEMO
Retrieved data can be loaded into HaploView
DEMO
5/18 cross_ref_scored.pl
Score SNPs based on how many putatively functional regions they overlap, on a per-gene or per-chromosome basis
Gene basis, type:
perl cross_ref_scored.pl file_A file_B file_C...
where
file_A: 2-column file of SNPs (format: id, location)
file_B: 3-column file of EXONS (format: id/name, start, stop)
file_C...: whatever you want (format: id/name, start, stop), i.e. other regions like CpGs, TFBS, clusters. Any order.
…
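The scoring idea on this slide can be sketched in Python (this is not the Perl of cross_ref_scored.pl itself). It is a minimal illustration, assuming one point per region file in which the SNP location falls inside at least one region, with inclusive coordinates. The function name score_snps, the scoring rule and the example data are assumptions made for illustration, not taken from the script.

```python
def score_snps(snps, region_sets):
    """Count, for each SNP, how many region sets contain its location.

    snps: list of (snp_id, location) pairs, as in file_A.
    region_sets: one list of (name, start, stop) tuples per region file
    (file_B, file_C, ...).
    Assumption: coordinates are inclusive and each region set adds at
    most one point, however many of its regions overlap the SNP.
    """
    scores = {}
    for snp_id, location in snps:
        score = 0
        for regions in region_sets:
            if any(start <= location <= stop for _name, start, stop in regions):
                score += 1
        scores[snp_id] = score
    return scores


# Hypothetical example data (not from the slides)
snps = [("rs1", 150), ("rs2", 900), ("rs3", 5000)]
exons = [("exon1", 100, 200), ("exon2", 850, 950)]
cpgs = [("cpg1", 120, 180)]
scores = score_snps(snps, [exons, cpgs])
print(scores)
```

Here rs1 lies in an exon and a CpG region, rs2 only in an exon, and rs3 in neither, so the scores fall in that order.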
6/18 cross_ref_scored.pl example output
The output can then be merged with HapMap / Perlegen data to retrieve MAF data for the SNPs
7/18 Merge cross_ref_scored data with HapMap / Perlegen data using merge_per_hap.pl
Type:
perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt
where
hapmap.txt: 3-column file (format: rsid, ref_allele, ref_allele_freq)
perlegen.txt: 3-column file (format: rsid, ref_allele, ref_allele_freq)
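The merge step can be pictured as a join on rsid. The Python sketch below only illustrates that idea and is not the logic of merge_per_hap.pl. The function name merge_by_rsid, the output column order and the choice to fill missing frequencies with None are all assumptions, and the data are invented.

```python
def merge_by_rsid(scored, hapmap, perlegen):
    """Attach HapMap and Perlegen reference allele frequencies to scored SNPs.

    scored: dict of rsid -> cross_ref score.
    hapmap, perlegen: dict of rsid -> (ref_allele, ref_allele_freq).
    Assumption: SNPs absent from a source get None for that frequency.
    """
    rows = []
    for rsid, score in scored.items():
        hm = hapmap.get(rsid)
        pl = perlegen.get(rsid)
        rows.append((rsid, score,
                     hm[1] if hm else None,
                     pl[1] if pl else None))
    return rows


# Hypothetical example data (not from the slides)
scored = {"rs1": 2, "rs2": 1}
hapmap = {"rs1": ("A", 0.25)}
perlegen = {"rs1": ("A", 0.30), "rs2": ("G", 0.10)}
merged = merge_by_rsid(scored, hapmap, perlegen)
for row in merged:
    print(row)
```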
8/18 cross_ref.pl applied to WGA data
cross_ref.pl: scoring SNPs throughout the genome
Data analysed on a coding / non-coding basis
(coding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/coding_non_synon_SNPs_UCSC.clean=3 WGA_databases/coding_synon_SNPs_UCSC.clean=2 WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1 WGA_databases/Triplexes_may2006.bed=2 WGA_databases/splice_site_SNPs_UCSC.clean=2 > Overlapped_regions_scored.WTCCC.chr22.coding.log &
(input-dependent, coding/non-coding dependent, arbitrary)
(noncoding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/TFBS.chr22=1 WGA_databases/CpG_islands_UCSC.uniqid=1 WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean=1 WGA_databases/promoters_knowngene_hg18.txt=1 WGA_databases/sno_or_miRNA_UCSC.uniqid=1 > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log &
9/18 cross_ref.pl output
Load the cross_ref.pl output into STATA
If SNPs have e.g. association p-values, calculate an adjusted p-value (R. Anney) as -log10(P) + cross_ref_score
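As a worked illustration of the formula above, here is a minimal Python sketch. The slides do this step in STATA; the function name adjusted_score and the example values are assumptions.

```python
import math


def adjusted_score(p_value, cross_ref_score):
    """Return -log10(P) + cross_ref_score for one SNP."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value) + cross_ref_score


# Hypothetical SNP: association p-value of 0.001, cross_ref score of 2
example = adjusted_score(0.001, 2)
print(round(example, 6))
```

With these values -log10(0.001) contributes 3 and the overlap score adds 2, so a SNP with a modest p-value but several functional overlaps moves up the ranking.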
10/18 GeneViewer.pl
GeneViewer.pl: visualise overlapping features (e.g. exons, SNPs) along e.g. your gene of interest (HTML output)
11/18 ld_expander.pl
Find proxies (SNPs in LD) for a list of SNPs
User specifies the r² threshold and the “LD window”
Currently configured to obtain proxies from HapMap (HM) CEU
Result is a list of additional proxy SNPs obtained by LD expansion
DEMO
Note: don’t LD expand >150000 SNPs, or HapMap will ban you!
CO’D has an alternative version that uses local pre-computed pairwise LD SNP files
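LD expansion against pre-computed pairwise LD data can be sketched as a simple filter. The Python below is an illustration only and is not ld_expander.pl or CO’D's local version. The function name ld_expand, the record layout (snp_a, pos_a, snp_b, pos_b, r2), the default thresholds and the example records are all assumptions.

```python
def ld_expand(snps_of_interest, pairwise_ld, min_r2=0.8, max_distance=250000):
    """Return proxy SNPs in LD with the SNPs of interest.

    pairwise_ld: iterable of (snp_a, pos_a, snp_b, pos_b, r2) records.
    A pair counts if r2 >= min_r2 and the two SNPs are no more than
    max_distance base pairs apart (the "LD window").
    """
    targets = set(snps_of_interest)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in pairwise_ld:
        if r2 < min_r2 or abs(pos_a - pos_b) > max_distance:
            continue
        # Keep the partner of any target SNP, skipping SNPs already in the list
        if snp_a in targets and snp_b not in targets:
            proxies.add(snp_b)
        elif snp_b in targets and snp_a not in targets:
            proxies.add(snp_a)
    return sorted(proxies)


# Hypothetical pairwise LD records (not from the slides)
pairs = [
    ("rs1", 1000, "rs2", 5000, 0.95),    # passes both filters
    ("rs1", 1000, "rs3", 900000, 0.99),  # outside the LD window
    ("rs4", 2000, "rs1", 1000, 0.50),    # r2 too low
    ("rs5", 3000, "rs1", 1000, 0.85),    # passes, target is second in the pair
]
proxies = ld_expand(["rs1"], pairs)
print(proxies)
```

Working from local files in this way avoids sending large SNP lists to the HapMap site.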
12/18 STATA
Extremely powerful and flexible
>65k rows handled – shock horror!
Can write scripts to automate tasks, e.g. read in a file, do the analysis, save the results
When you use the GUI to run commands, the commands are shown in the command window, so you can save them in a do file
CO’D, EK and R. Anney strongly advocate this as a platform for both file manipulation and statistical analysis
13/18 STATA example using WTCCC data
http://www.wtccc.org.uk/
Bipolar Disorder, Coronary Artery Disease, Crohn's Disease, Hypertension, Rheumatoid Arthritis, Type 1 Diabetes, Type 2 Diabetes
14/18 DATA FORMAT
3 folders:
Basic
  Each case collection against the pooled control groups 58C and UKBS
Combined cases
  Combining other case collections as controls
Combined controls
  Combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune)
Data are split by chromosome
15/18 Questions
How do I get all of the chromosome data for my gene of interest into one file?
How do I easily search all of the SNP information for my gene(s) of interest?
Create a “.do” file for all of the manipulations that you want to carry out on the data
DEMO
Good starting resource: http://www.ats.ucla.edu/stat/stata/
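The talk answers the first question with a STATA do file. For readers without STATA, the same idea can be sketched in Python: pull the rows that fall within a gene's coordinates out of several result files and write them to one file. The function name collect_gene_rows, the tab-delimited layout, the column names (rsid, pos, p) and the file names are all assumptions for illustration and do not describe the actual WTCCC file format.

```python
import csv
import os
import tempfile


def collect_gene_rows(paths, start, stop, out_path, pos_column="pos"):
    """Copy rows with start <= position <= stop from several tab-delimited
    files into one file, adding a 'source' column with the file name.

    Assumption: all input files share the same header.
    Returns the number of rows written.
    """
    written = 0
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as handle:
                for row in csv.DictReader(handle, delimiter="\t"):
                    if start <= int(row[pos_column]) <= stop:
                        if writer is None:
                            fields = ["source"] + list(row.keys())
                            writer = csv.DictWriter(out, fieldnames=fields,
                                                    delimiter="\t")
                            writer.writeheader()
                        row["source"] = os.path.basename(path)
                        writer.writerow(row)
                        written += 1
    return written


# Hypothetical input files written to a temporary directory
workdir = tempfile.mkdtemp()
paths = []
for name, lines in [("basic.txt", ["rs1\t100\t0.01", "rs2\t900\t0.5"]),
                    ("combined.txt", ["rs1\t100\t0.02"])]:
    path = os.path.join(workdir, name)
    with open(path, "w") as fh:
        fh.write("rsid\tpos\tp\n" + "\n".join(lines) + "\n")
    paths.append(path)

out_path = os.path.join(workdir, "gene.txt")
n_rows = collect_gene_rows(paths, 50, 200, out_path)
print(n_rows)
```

Keeping the source file name in each row makes it possible to search one combined file while still seeing which analysis a result came from.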
16/18 VIM
“Vi Improved”. Mainly a UNIX text editor, but cross-platform (available for Windows)
Full list of commands is outside the scope of this demonstration
Very fast and efficient, esp. with search and replace functions on large datasets
Regular expression pattern matching
DEMO
Integrates with Cygwin (www.cygwin.com), a very useful UNIX emulator for Windows
17/18 Group website
Some useful stuff up there!
Please send information about current projects etc.
Good for our image as a group, and minimal effort required on your part
DEMO
18/18 Conclusions
Small summary of some of the things you can do
Slides and video demonstrations will be online at: http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/
CO’D & EK available for advice (Fridays 9-9.02am)
These things will help you in your work!!