Polymorphism and Variant Analysis Lab v1 | Saurabh Sinha | PowerPoint by Casey Hanson



Exercise
In this exercise, we will do the following:
1. Gain familiarity with a graphical user interface to PLINK.
2. Run a Quality Control (QC) analysis on genotype data of 90 individuals from two ethnic groups (Han Chinese and Japanese) genotyped for ~230,000 SNPs.
3. Use our QC data to perform a genome wide association test (GWAS) across two phenotypes: case and control. We will compare the results of our GWAS with and without multiple hypothesis correction.

Step 0A: Shared Desktop Directory
For viewing and manipulating files on the classroom computers, we provide a shared directory in the following folder on the desktop: classes/mayo
In today’s lab, we will be using the following folder in the shared directory: classes/mayo/sinha2

Step 0B: Copying GWAS Directory to Desktop
1. Navigate to our shared folder directory: classes/mayo/sinha2/
2. Right click on the gwas folder and select Copy.
3. Right click on the Desktop and select Paste.

Dataset Characteristics
filename | meaning
plink.exe | An executable of the PLINK GWAS toolkit.
gPLINK.jar | A Java graphical user interface (GUI) that interfaces with plink.exe.
Haploview.jar | A haplotype analysis program written in Java. Used to view PLINK results and SNP analysis.
wgas1.ped | Genotype data for 228,694 SNPs on 90 people.
wgas1.map | Map file for the SNPs in wgas1.ped.
extra.ped | Genotype data for 29 SNPs on the same 90 people.
extra.map | Map file for the SNPs in extra.ped.
pop.cov | Population membership of the 90 people (1 = Han Chinese, 2 = Japanese).

The PED File Format
The PED file format specifies, for each individual, their genotype for each SNP and their phenotype.
Family ID begins with CH (Chinese) or JA (Japanese).
Paternal and Maternal IDs of 0 indicate missing.
Sex is coded 1 = male, 2 = female, any other value = unknown.
Phenotype is coded 0 = missing, 1 = unaffected, 2 = affected (PLINK's default coding).
Family ID | Individual ID | Paternal ID | Maternal ID | Sex | Phenotype | Genotype…
CH NA A A G..
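To make the column layout concrete, here is a minimal Python sketch that splits one PED line into the six leading fields and the per-SNP allele pairs. The function name `parse_ped_line` and the sample line are hypothetical illustrations, not taken from wgas1.ped; the sketch assumes whitespace-separated fields and `0` for a missing allele.

```python
def parse_ped_line(line):
    """Split one whitespace-delimited PED line into its fields."""
    tokens = line.split()
    if len(tokens) < 6:
        raise ValueError("a PED line needs at least six leading fields")
    fid, iid, pat, mat, sex, pheno = tokens[:6]
    alleles = tokens[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("each SNP needs two alleles")
    # Pair consecutive alleles: one (allele1, allele2) tuple per SNP.
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return {
        "family_id": fid,
        "individual_id": iid,
        "paternal_id": pat,
        "maternal_id": mat,
        "sex": {"1": "male", "2": "female"}.get(sex, "unknown"),
        "phenotype_code": int(pheno),  # kept as the raw code from the file
        "genotypes": genotypes,
    }


# Hypothetical line: two SNPs, the second one missing (0 0).
record = parse_ped_line("FAM1 IND1 0 0 2 1 A A 0 0")
print(record["sex"], record["phenotype_code"], record["genotypes"])
```

Keeping the phenotype as a raw code leaves its interpretation to the analysis step.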

The MAP File Format
The MAP file format specifies the location of each SNP.
Note: Morgans (M) are a unit of genetic distance derived from chromosomal recombination studies. Morgans can be used to reconstruct chromosomal maps.
chr | SNP ID | M | Base Pair Position
8 rs
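The four MAP columns can be read the same way. This is a small sketch, assuming whitespace-separated columns in the order shown above; the names `MapEntry` and `parse_map_line` and the sample SNP are hypothetical.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    chromosome: str
    snp_id: str
    genetic_distance: float  # in Morgans
    bp_position: int


def parse_map_line(line):
    """Parse one four-column MAP line into a MapEntry."""
    chrom, snp_id, morgans, bp = line.split()
    return MapEntry(chrom, snp_id, float(morgans), int(bp))


# Hypothetical SNP on chromosome 8 with no genetic distance recorded.
entry = parse_map_line("8 rs0000001 0 123456")
print(entry)
```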

Configuring gPLINK
In this exercise, we will configure gPLINK to work with our data. Additionally, we will perform a format conversion to speed up our QC analysis. Finally, we will validate our conversion and see which individuals and SNPs would be filtered out with default filters for QC analysis.

Step 1: Starting gPLINK
gPLINK is a graphical user interface, written in Java, to the command line program PLINK.
1. To start gPLINK, navigate to the gwas directory we copied to the Desktop.
2. Double click on gPLINK.jar.
A window should appear similar to the one on the right.

Step 2A: Configuring gPLINK
1. Click on the Project item on the Menu Bar.
2. Select Open from the drop down menu. The pop-up window should look similar to the screenshot below.
3. Click on Browse.

Step 2B: Configuring gPLINK
1. In the file browser, navigate to the Desktop.
2. Click on the gwas directory and click Open.
3. Click OK on the Open Project window.

Step 2C: Configuring gPLINK
You should see the files in the gwas folder in the Folder Viewer on the left hand side of gPLINK.

Step 3A: Creating a Binary Input File
1. Click the PLINK item on the Menu Bar.
2. Click Data Management.
3. Click Generate fileset.
4. In the next window, select Standard Input on the tab bar.
5. Select wgas1 under Quick Fileset.
6. Check Binary fileset.
7. Under Output File input wgas2.
8. Click OK.

Step 3B: Creating a Binary Input File
On the Execute Command window, click OK. This will convert our wgas1 files to a binary format.
Under the Operations Viewer, you will see wgas2 with an R next to it, indicating that it is running. Wait for it to turn GREEN.

Step 3C: Creating a Binary Input File
In the Folder Viewer, you should see several new wgas2 files created during the conversion.
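Since gPLINK is a front end to the command line program, Step 3 corresponds roughly to a PLINK call with `--file`, `--make-bed`, and `--out`. The sketch below only builds the argument list and the expected output names; the helper names are hypothetical, and the exact command gPLINK issues may include additional flags.

```python
def plink_make_bed(in_prefix, out_prefix, plink="plink"):
    """Build (but do not run) the arguments that convert PED/MAP input to a binary fileset."""
    return [plink, "--file", in_prefix, "--make-bed", "--out", out_prefix]


def expected_binary_files(prefix):
    """A PLINK binary fileset consists of a .bed, a .bim and a .fam file."""
    return [prefix + ext for ext in (".bed", ".bim", ".fam")]


cmd = plink_make_bed("wgas1", "wgas2")
print(" ".join(cmd))
print(expected_binary_files("wgas2"))
```

To actually run the command from the gwas directory, the list could be passed to `subprocess.run(cmd, check=True)`, assuming the PLINK executable is on the path.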

Step 4A: Validating the Conversion
1. Click the PLINK item on the Menu Bar.
2. Click Summary Statistics.
3. Click Validate Fileset.
4. In the next window, select Binary Input on the tab bar.
5. Select wgas2 under Quick Fileset.
6. Under Output File input validate.
7. Click OK.

Step 4B: Validating the Conversion
1. On the Execute Command window, click OK.
2. Wait for the command to finish (the icon next to validate changes, as in Step 3B).
3. Click on the validate track.

Step 4C: Validating the Conversion
Look in the Log viewer to see how many of the ~230,000 SNPs were removed because they failed the MAF filter. 623 SNPs were removed because they were not genotyped in enough individuals (minimum 90%).

Step 4D: Validating the Conversion
1. Click the + adjacent to the Validate track to expand it.
2. Click the + adjacent to the Output track to expand it.
3. Right click validate.irem and click Open in default viewer.
You should see the following:
JA19012 NA19012
The family ID is JA19012 (Japanese) and the individual ID is NA19012. This individual was removed because of a low genotyping rate.
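If the list of removed individuals grows beyond one line, it can be easier to read it programmatically. A minimal sketch, assuming each line of the .irem file holds a family ID and an individual ID separated by whitespace; the function name is hypothetical.

```python
def read_removed_individuals(text):
    """Return (family_id, individual_id) pairs from the text of a .irem file."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


# The single entry reported in this lab's validate.irem.
removed = read_removed_individuals("JA19012 NA19012\n")
print(removed)
```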

Quality Control Analysis
In this exercise, we will perform Quality Control (QC) analysis to filter our data according to a set of criteria.

Quality Control Filters
The validation tool will impose the following criteria on our data.
filter | meaning | threshold
Minor Allele Frequency (MAF) | The frequency of the minor allele of a SNP in the population must exceed this threshold for the SNP to be included in the analysis. | 1%
Individual genotyping rate | The proportion of SNPs successfully genotyped for an individual must exceed this threshold for the person to be analyzed. | 95%
SNP genotyping rate | The SNP must be successfully genotyped in at least this proportion of individuals. | 95%
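The quantities behind these filters are simple to compute by hand. The sketch below shows one way to do it for a single SNP, assuming genotypes are given as allele pairs with `0` marking a missing call; the function names and the five-person example are hypothetical, and PLINK's own implementation may differ in detail.

```python
def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among all non-missing alleles of one SNP."""
    counts = {}
    for pair in genotypes:
        if "0" in pair:  # skip missing calls
            continue
        for allele in pair:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or only one allele observed
    return min(counts.values()) / total


def genotyping_rate(genotypes):
    """Fraction of non-missing calls in a list of allele pairs."""
    called = sum(1 for pair in genotypes if "0" not in pair)
    return called / len(genotypes)


# Hypothetical SNP typed in five people, one call missing.
snp = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A"), ("0", "0")]
maf = minor_allele_frequency(snp)
rate = genotyping_rate(snp)
print(maf, rate)
print("passes MAF 1%:", maf > 0.01, "| passes SNP genotyping rate 95%:", rate >= 0.95)
```

The same `genotyping_rate` function applies to an individual if it is given that person's calls across all SNPs instead of one SNP's calls across all people.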

Step 5A: Quality Control Analysis
1. Click the PLINK item on the Menu Bar.
2. Click Data Management.
3. Click Generate Fileset.
4. In the next window, select Binary Input on the tab bar.
5. Select wgas2 under Quick Fileset.
6. Check Binary fileset.
7. Under Output File input wgas3.
8. Click Threshold.

Step 5B: Quality Control Analysis
On the Threshold window:
Set Minor allele frequency to
Set Maximum SNP missingness rate to
Set Maximum individual missingness rate to 0.05
Set Hardy Weinberg equilibrium to
Click OK.

Step 5C: Quality Control Analysis
1. Click OK.
2. On the Execute Command window, click OK.
This will create a new set of files prefixed wgas3 that are filtered according to the thresholds on the previous slide.
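On the command line, threshold filters of this kind map onto the PLINK options `--maf`, `--geno` (SNP missingness), `--mind` (individual missingness), and `--hwe`. The sketch below builds such a command without running it. Only the individual missingness value of 0.05 comes from Step 5B. The MAF and SNP missingness values are assumptions derived from the 1% and 95% thresholds in the filter table. The Hardy-Weinberg threshold is left unset because its value is not given here. The helper name is hypothetical.

```python
def plink_qc_command(in_prefix, out_prefix, maf=None, geno=None,
                     mind=None, hwe=None, plink="plink"):
    """Build (but do not run) a PLINK command that filters a binary fileset."""
    cmd = [plink, "--bfile", in_prefix]
    thresholds = (("--maf", maf), ("--geno", geno), ("--mind", mind), ("--hwe", hwe))
    for flag, value in thresholds:
        if value is not None:  # only pass thresholds that were specified
            cmd += [flag, str(value)]
    cmd += ["--make-bed", "--out", out_prefix]
    return cmd


# maf and geno are assumed values (see the note above); mind is from Step 5B.
qc_cmd = plink_qc_command("wgas2", "wgas3", maf=0.01, geno=0.05, mind=0.05)
print(" ".join(qc_cmd))
```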

Genome Wide Association Test (GWAS)
In this exercise, we will run a GWAS on our filtered data across two phenotypes: case and control. We will then compare the results between unadjusted p-values and multiple hypothesis corrected p-values.

Step 6A: GWAS
1. Click the PLINK item on the Menu Bar.
2. Click Association.
3. Click Allelic Association Tests.
4. In the next window, select Binary Input on the tab bar.
5. Select wgas3 under Quick Fileset.
6. Click Adjusted p-values.
7. Under Output File input assoc1.
8. Click OK.
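For intuition about what an allelic association test computes, the sketch below runs a 1-degree-of-freedom chi-square test on a 2x2 table of allele counts (minor and major alleles in cases and controls) for a single SNP. The allele counts are made up, the function name is hypothetical, and this is an illustration of the basic test rather than PLINK's exact implementation.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square test (1 df) on a 2x2 allele-count table; returns (chi2, p)."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a row or column is empty: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Made-up counts: 45 cases and 45 controls, two alleles each.
chi2, p = allelic_chi_square(30, 60, 15, 75)
print(round(chi2, 3), p)
```

A GWAS repeats a test like this once per SNP, which is why the multiple hypothesis correction in the later steps matters.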

Step 6B: GWAS
On the Execute Command window, click OK. This will perform the GWAS analysis on our data and store the results under assoc1 in the main window of gPLINK.
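The command line counterpart of Step 6 would use PLINK's `--assoc` option together with `--adjust` for the corrected p-values. As before, this sketch only builds the argument list; the helper name is hypothetical and gPLINK may pass extra flags.

```python
def plink_assoc_command(in_prefix, out_prefix, adjust=True, plink="plink"):
    """Build (but do not run) a PLINK allelic association command."""
    cmd = [plink, "--bfile", in_prefix, "--assoc"]
    if adjust:
        cmd.append("--adjust")  # also report multiple-testing adjusted p-values
    cmd += ["--out", out_prefix]
    return cmd


assoc_cmd = plink_assoc_command("wgas3", "assoc1")
print(" ".join(assoc_cmd))
```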

Step 7: GWAS Without Multiple Hypothesis Correction

Step 8: GWAS With Multiple Hypothesis Correction
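To see why corrected p-values are larger than unadjusted ones, here is a small sketch of two common corrections, Bonferroni and Benjamini-Hochberg (FDR). It illustrates the general methods on four made-up p-values; the function names are hypothetical, and which corrections appear in the gPLINK output should be checked there.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up unadjusted p-values for four SNPs.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
print(bonf)
print(bh)
```

With roughly 230,000 SNPs tested, a Bonferroni correction multiplies every p-value by about 230,000, so only very small unadjusted p-values remain significant.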

Step 9: P-Value Distribution Graph