Creating a Kinship Matrix using Microsatellite Analyzer (MSA) Zhifen Zhang The Ohio State University.

Slides:



Advertisements
Similar presentations
Applications of genome sequencing projects 1) Molecular Medicine 2) Energy sources and environmental applications 3) Risk assessment 4) Bioarchaeology,
Advertisements

applications of genome sequencing projects
Fathom Overview Workshop on using Fathom in School Improvement Planning (SIP)
UNLOCKING THE SECRETS HIDDEN IN YOUR DATA Part 3 Data Analysis.
Identification of markers linked to Selenium tolerance genes
Excel application for accounting principles. Contents (1) The content of Excel screen. (2) The Excel ribbon. (3) How to create new workbooks. (4) Excel.
GBS & GWAS using the iPlant Discovery Environment
Genome-wide association mapping Introduction to theory and methodology
Basics of Linkage Analysis
Signatures of Selection
DATA ANALYSIS Module Code: CA660 Lecture Block 2.
Generation and Analysis of AFLP Data
Human Migrations Saeed Hassanpour Spring Introduction Population Genetics Co-evolution of genes with language and cultural. Human evolution: genetics,
Constant Allele Frequencies Hardy-Weinberg Equilibrium.
Version 4 for Windows NEX T. Welcome to SphinxSurvey Version 4,4, the integrated solution for all your survey needs... Question list Questionnaire Design.
RIMS II Online Order and Delivery System Tutorial on Downloading and Viewing Multipliers.
Linkage Analysis in Merlin
Lecture 7 Desktop Publishing IV – Spreadsheet Software Introduction to Information Technology With thanks to Dr. A. Zhang, Dr. Haipeng Guo, and Dr. David.
Chapter 3 -- Genetics Diversity Importance of Genetic Diversity Importance of Genetic Diversity -- Maintenance of genetic diversity is a major focus of.
Using mutants to clone genes Objectives 1. What is positional cloning? 2.What is insertional tagging? 3.How can one confirm that the gene cloned is the.
Introduction to SPSS Edward A. Greenberg, PhD
Broad-Sense Heritability Index
Sample size vs. Error A tutorial By Bill Thomas, Colby-Sawyer College.
4/22/2017 5:36 PM EViews Training Creating Workfiles.
14 Population Genetics and Evolution. Population Genetics Population genetics involves the application of genetic principles to entire populations of.
IAP workshop, Ghent, Sept. 18 th, 2008 Mixed model analysis to discover cis- regulatory haplotypes in A. Thaliana Fanghong Zhang*, Stijn Vansteelandt*,
Targeted next generation sequencing for population genomics and phylogenomics in Ambystomatid salamanders Eric M. O’Neill David W. Weisrock Photograph.
Molecular identification of living things. Molecular Markers Single locus marker Multi-locus marker RFLP Microsatellite DNA Fingerprinting AFLP RAPD.
Spreadsheet Basics chapter 7
Announcements: Proposal resubmission deadline 4/23 (Thursday).
A lesson approach © 2011 The McGraw-Hill Companies, Inc. All rights reserved. a lesson approach Microsoft® Excel 2010 © 2011 The McGraw-Hill Companies,
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Finnish Genome Center Monday, 16 November Genotyping & Haplotyping.
Gramene: Interactions with NSF Project on Molecular and Functional Diversity in the Maize Genome Maize PIs (Doebley, Buckler, Fulton, Gaut, Goodman, Holland,
1 DNA Polymorphisms: DNA markers a useful tool in biotechnology Any section of DNA that varies among individuals in a population, “many forms”. Examples.
Comparison of different output options from Stata

Errors in Genetic Data Gonçalo Abecasis. Errors in Genetic Data Pedigree Errors Genotyping Errors Phenotyping Errors.
Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented.
Using geWorkbench: Working with Sets of Data Fan Lin, Ph. D. Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT.
Basics of Using Excel with Moodle. Remember Excel?
Simple-Sequence Length Polymorphisms SSLPs Short tandemly repeated DNA sequences that are present in variable copy numbers at a given locus. Scattered.
1 CA202 Spreadsheet Application Focusing on Specific Data using Filters Lecture # 5.
Chapter 22 - Quantitative genetics: Traits with a continuous distribution of phenotypes are called continuous traits (e.g., height, weight, growth rate,
Plant Breeding Shree Krishna Adhikari ©Shree Krishna Adhikari.
Types of genome maps Physical – based on bp Genetic/ linkage – based on recombination from Thomas Hunt Morgan's 1916 ''A Critique of the Theory of Evolution'',
Progress on TripalBIMS Breeding Information Management System in Tripal Sook Jung, Taein Lee, Chun-Huai Chen, Jing Yu, Ksenija Gasic, Todd Campbell, Kate.
Introduction to Excel EC 151 Principles of Microeconomics Block 3,
8 and 11 April, 2005 Chapter 17 Population Genetics Genes in natural populations.
Genetic mapping and QTL analysis - JoinMap and QTLNetwork -
1 Bioinformatics Tools for Genotyping Frances Tong Dr. Garry Larson, Ph.D City of Hope Department of Molecular Medicine Southern California Bioinformatics.
Simple-Sequence Length Polymorphisms
Fall HORT6033 Molecular Plant Breeding
DNA profiling of sweetpotato cultivars and clones
T3/Tutorials: Data Submission
Multi-Axis Tabular Loads in ANSYS Workbench
Molecular Marker Characterization of plant genotypes
DNA Marker Lecture 10 BY Ms. Shumaila Azam
Mapping Quantitative Trait Loci
Genome-wide Association Studies
تهیه کننده بهارا رستمی نیا بهار 94
A Step-by-Step Procedure for Preparing BBS Sheets with Manual Entry of Each Bar is shown here Select BarBeQue 2009 Icon From Desktop Start Program BarBeQue.
AGEseq: Analysis of Genome Editing by Sequencing
Introduction to Database Programs
Welcome to the Markers Database Tutorial
Eviews Tutorial for Labor Economics Lei Lei
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
Introduction to Database Programs
Lesson 13 Working with Tables
EViews Training Creating Workfiles. EViews Workfiles EViews main operating principles: Any work in EViews is created in workfiles – which are place-holders.
Presentation transcript:

Creating a Kinship Matrix using Microsatellite Analyzer (MSA) Zhifen Zhang The Ohio State University

Why create a kinship matrix? Visualize family relatedness Use in association mapping The Objective of this Tutorial: To introduce one approach to generate a kinship matrix for a population

Types of Populations for Association Mapping (Yu, et al. 2006): Background Population TypeProsCons “Ideal population”Minimal population structure and familial relatedness Gives the greatest statistical power Difficult to collect, small size Narrow genetic basis Family-based populationMinimal population structure Limited sample size and allelic diversity Population with population structure Larger size and broader allelic diversity False positives or loss in power due to familial relatedness. Population with familial relationships within structured population Larger size and broader allelic diversity Inadequate control for false positives due to population structure The highlighted population needs a kinship matrix for analysis.

Germplasm GenotypingPhenotyping (Y) Background markers Genome-wide scan Genome-wide polymorphisms (G) Population structure (Q), kinship (K) Association analysis (General Mixed Model): Y=G+Q+K+e Scheme of Mapping Zhu et al. 2006

Coefficient of Kinship Coefficient of kinship is used to measure relatedness DEFINITION: Coefficient of kinship is the probability that the alleles of a particular locus chosen randomly from two individuals are identical by descent (Lange, 2002) * Identical by descent: the identical alleles from two individuals arise from the same allele in an earlier generation

Kinship Matrix Calculation The kinship coefficient is the probability that an allele (a) taken randomly from population i (at a given locus) will be identical by descent to an allele taken randomly from population j at the same locus Kf= ∑ k ∑ a (f ai * f aj )/D Where ∑ k ∑ a (f ai * f aj )/D is the sum of all loci and all alleles; f ai is the frequency of allele a in population i, f aj is the frequency of allele a in population j, D is the number of loci. Cavalli-Sforza and Bodmer, MSA manual page 23

An Example PopulationLocus ALocus BLocus C 611R2Abc Abc 611R3ABc ABc Coefficient of kinship for populations 611R2 and 611R3: Kf= ∑ k ∑ a (f ai * f aj )/D = (1x1+1x0 + 0x1+1x1)/3=2/3 The frequencies of alleles A in populations 611R2 and 611R3 are 1; the frequency of allele b is 1 in population 611R2 but 0 in population 611R3; the frequency of allele B is 0 in population 611R2 but 1 in population 611R3; the frequency of allele c is 1 in each population. In population 611R2 and 611R3, three loci were studied and summarized.

Kinship Matrix In the unified mixed model (Yu et al. 2006), marker-based kinship coefficient matrices are included A marker-based kinship coefficient matrix can be generated using Microsatellite Analyzer (MSA)

Pipeline for Matrix Generation Genotyping with Markers Data Entry to Create a Spreadsheet Arrange Data in a Format Recognized by MSA Import the.dat File into MSA Run the Kinship Coefficient Analysis

A Case Study Bacterial spot data provided by Dr. David Francis, The Ohio State University

Data Preparation For genotyping, molecular markers can be: Microsatellite (Simple Sequence Repeat, SSR) SNP (Single Nucleotide Polymorphism) RFLP (Restriction Fragment Length Polymorphism) AFLP (Amplified Fragment Length Polymorphism) Other sequencing markers MSA was designed for SSR analysis For other marker types, a value is designated for each allele, e.g. For SNPs, A can be 11, C can be 12, G can be 13, and T can be 14.

Data Entry After genotyping, a spreadsheet of genotypic data can be generated as shown below. Marker Population Genotype Score Genotype Score: 0 is homozygous for allele a, 1 is homozygous for allele b, 2 is heterozygous.

Data Formatting Population # Mating scheme Group name: default “1” Give a new value to each allele for each marker Row 1 can be left empty. Columns a, b, and c have to be filled. In cell A1, enter ‘1’. Row 2 is marker name. Genotype data start in row 3. These 3 cells should be empty.

Data Formatting Two-row entry for each individual (we are treating individuals as populations) to specify heterozygotes and homozygotes Homozygotes have the same value in both rows. Heterozygotes have different values in each row.

After Formatting Save the file in a “Tab Delimited” text format, change the extension name “.txt” into “.dat”.

MSA (Microsatellite Analyzer) is available for free, provided by Daniel Dieringer.

MSA is compatible with different operation systems Choose the operating system you use and install it as other software

Install MSA For Mac, a folder for MSA will be formed after you extract the downloaded file

Use MSA Manual for MSA is available in the folder Sample data is available

Use MSA Menu of Commands Enter command “i” to import file

Use MSA Import your data file to MSA easily by dragging it to the MSA window

Use MSA Enter command “d” to access submenu of “Distance setting” After the data file is imported,

Use MSA Enter “4” to turn on kinship coefficient module.

Use MSA Each line will be treated as a population in this study Input command “c” to turn on “Pair-wise populations distances calc”

Use MSA After everything is set, input “!” command to run the analysis. NOTE: Dkf in the matrix will be 1-kf in default.

Use MSA After the analysis, a folder of results will be created automatically in the same folder as your data file

Use MSA The kinship matrix is in the folder of “Distance _data”

Use MSA The Kinship matrix is saved as a file named “KSC_Pop.txt” Open the file with Excel to read the the matrix

Kinship Matrix The population (line) code Number of individuals

Kinship Matrix The population (line) code Number of individuals Please remember that the value in the matrix is “1-kinship coefficient”.

Final Matrix Use “1-X” (X is the value of each cell in the matrix) to generate a new matrix that will be the kinship matrix in Excel.

The Mixed Model Y= αX+ γM+ βQ + υK + e Population structure Marker effect Fixed Effect rather than markers Residual or Error Phenotype score Yu et al. 2006

References Cited & External Link References Cited Cavalli-Sforza, L. L., and W. F. Bodmer The genetics of human populations, W.H. Freeman and Company, NY. Lange, K Mathematical and statistical methods for genetic analysis, 2 nd edition. Springer-Verlag, NY. Yu, J. G. Pressoir, W. H. Briggs, I. V. Bi, M. Yamasaki, J. F. Doebley, M. D. McMullen, B. S. Gaut, D. M. Nielsen, J. B. Holland, S. Kresovich, and E. S. Buckler A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38: (Available online at: (verified 8 July 2011). Zhu, C. M. Gore, E. S. Buckler, and J. Yu Status and prospects of association mapping in plants. The Plant Genome 1: (Available online at: (verified 8 July 2011). External Link Dieringer, D. Microsatellite analyzer (MSA) [Online]. Available at: (verified 8 July 2011).