Standard land plant barcoding requires a multi-locus approach?
Peter Gasson, Sujeevan Ratnasingham, Robyn Cowan
Mitochondrial DNA in land plants:
undergoes rearrangements
transfer of genes to nucleus
incorporation of foreign genes
substitution rates are VERY slow (with a few notable exceptions, e.g. Plantago, Cho & al.)
Partners
Instituto de Biologia UNAM, Mexico - Gerardo Salazar
Imperial College, UK - Timothy Barraclough
Natural History Museum, Denmark - Gitte Petersen
Natural History Museum (London), UK - Mark Carine
New York Botanical Garden, USA - Kenneth Cameron
Royal Botanic Garden Edinburgh, UK - Peter Hollingsworth
Royal Botanic Gardens, Kew, UK - Mark Chase
South African National Biodiversity Institute - Ferozah Conrad
University of Cape Town, South Africa - Terry Hedderson
U. Estadual de Feira de Santana, Brazil - Cássio van den Berg
Universidad de los Andes - Santiago Madriñán
U. of Wales Aberystwyth, UK (previously University of Reading, UK) - Mike Wilkinson
Alfred P. Sloan Foundation
Gordon and Betty Moore Foundation
To develop a universal approach to barcoding of all land plants
Phase 1: primer development (protein motifs); complete genome sequences; problems: ferns; 46 pairs of sister taxa from mosses, liverworts, hornworts, lycopods, ferns/fern allies, gymnosperms, angiosperms; percent PCR success and percent polymorphisms
Phase 2: in-depth trials of six markers identified in Phase 1 on a range of well-sampled taxa from across land plants
So what are the characteristics of a good barcode?
High inter-specific, low intra-specific sequence divergence
Universal amplification/sequencing with standard primers
Technically simple to sequence
Short enough to sequence in one reaction
Easily alignable (few insertions/deletions)
Readily recoverable from museum or herbarium samples and other degraded samples
**Universal + Variable**
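The first criterion (high inter-specific, low intra-specific divergence) can be checked numerically. The sketch below is only an illustration, not the method used in the project: it assumes pre-aligned sequences, uses uncorrected p-distance, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites; positions with a gap or ambiguity are skipped."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def barcode_gap(records):
    """records: list of (species, aligned_sequence) tuples.

    Returns (largest within-species distance, smallest between-species distance).
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return (max(intra) if intra else 0.0, min(inter) if inter else None)

# Hypothetical toy alignment: two samples for each of two species.
toy_records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTAC"),
    ("species_B", "ACGAACCTAC"),
]
max_intra, min_inter = barcode_gap(toy_records)
print(max_intra, min_inter)
```

A marker behaves well on a data set when the smallest between-species distance exceeds the largest within-species distance.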
What sort of marker should we use?
Mitochondrial DNA
Plastid
Ribosomal DNA (ITS)
Low-copy nuclear DNA (protein coding)
Length variable ?
Single locus
Multiple loci (one genomic compartment) ?
Multiple loci (two genomic compartments) ?
Advantages of plastid DNA (hence its use in phylogenetics)
Monomorphic (separation of different copies not required in hybrids)
High copy number (can even be amplified from highly degraded DNA)
Potentially highly diagnostic (in spite of its reputation to the contrary)
However, will not detect hybrids, introgression, paralogy
Coding or non-coding?
Non-coding regions:
sometimes more variable
microsatellites difficult to sequence through
numerous indels: impossible to align, length variable
cannot translate to check for pseudoproteins and to aid alignment
sometimes contain rearrangements and coding insertions (character based identification)
trnH-psbA spacer region
Criteria for locus selection
1. Species-level sequence divergence
2. Appropriate length ( bp)
3. Presence of conserved primer target sites
4. At least 200 bp exon sequence
Our Strategy
1. Identify suitable loci on the basis of in silico screens using Nicotiana cp sequence
2. Design universal primers (sets of 4 primers/locus) using amino acid and nucleic acid sequence data
3. Perform initial screen for universality (1 primer pair)
4. Screen for sequence variation using diverse species pairs
5. Improve universality (e.g. use all primer combinations)
6. Use statistical modelling approaches to identify optimal primer sets
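One consideration when designing primers from amino acid data (step 2) is how degenerate a primer becomes once a conserved protein motif is back-translated. The sketch below is an assumption about how that could be estimated, not a description of the project's pipeline: it counts codons per amino acid under the standard genetic code, and the motifs and function names are hypothetical.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def primer_degeneracy(motif):
    """Number of distinct nucleotide sequences that can encode the motif."""
    try:
        return prod(CODONS_PER_AA[aa] for aa in motif.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid: {err.args[0]}") from None

def rank_motifs(motifs):
    """Order candidate motifs from least to most degenerate."""
    return sorted(motifs, key=primer_degeneracy)

# Hypothetical conserved motifs, for illustration only.
candidates = ["LSRGA", "WMHDY"]
for motif in rank_motifs(candidates):
    print(motif, primer_degeneracy(motif))
```

Motifs rich in Trp, Met and two-codon residues give far fewer primer variants than motifs rich in Leu, Ser or Arg.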
Standard PCR Recipe
NH4 1x
Mg mM
dNTPs 0.2 mM
FW test primer 1 µM
RE test primer 1 µM
Taq DNA polymerase 2 units
BSA 0.1 mg/ml
Template 40 ng
Water to 20 µl
Results of First PCR
Genes: ndhK, ndhJ, rpoC1, rpoB, YCF2, accD, rpoC2, ndhA, YCF9, YCF5, matK, rpl22
Rows: Total success; % success
Number of Variable Sites
Genes: matK (11), YCF5 (10), accD (6), rpoC2 (7), rpoB (4), rpoC1 (3), YCF9 (9), ndhJ (2), ndhA (8), ndhK (1), YCF2 (5)
Rows: Variable sites; Length; % sites variable
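The three quantities tabulated on this slide (variable sites, length, % sites variable) can be derived from an alignment as in the sketch below. It is a minimal illustration with a hypothetical function name and toy data. It assumes equal-length aligned sequences and ignores gaps and ambiguity codes when judging a column.

```python
def variable_sites(alignment):
    """Return (variable sites, alignment length, % sites variable)."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    length = lengths.pop()
    n_variable = 0
    for column in zip(*alignment):
        # Only unambiguous bases count; gaps and N are treated as uninformative.
        bases = {b.upper() for b in column if b.upper() in "ACGT"}
        if len(bases) > 1:
            n_variable += 1
    return n_variable, length, 100.0 * n_variable / length

# Hypothetical toy alignment of three sequences.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACCTAC"]
print(variable_sites(toy_alignment))
```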
Trial regions
Selected seven genes that represent the different levels of universality and variability. Blue = high, green = medium, yellow = low.
Genes: ndhA, YCF9, rpoC2, accD, ndhK, YCF2, rpoB, ndhJ, rpoC1, YCF5, matK
Rows: Variability; Universality
Trial groups
Asterella, Anastrophyllum-Barbilophozia, Tortella, Bryum, Triquetrella, Homalothecium, Tortella, Elaphoglossum, Asplenium, Equisetum, Cupressus, Pinus, Araucaria, Labordia, Conostylis, Dactylorhiza maculata/incarnata, Mimetes, Inga, Hordeum, Scalesia, Crocus, Laelia, Cattleya, Mormodes, Deiregyne, Lauraceae
Group; Family; Primary genera; primer combinations (table columns: accD, matK, ndhJ, rpoB, rpoC1)
Entries are listed in the order they appear; where a row has fewer than five entries, the column each belongs to cannot be recovered.
Angio asterids; Asteraceae; Scalesia
Angio asterids; Loganiaceae; Labordia; 2+4, X
Angio eudicots; Proteaceae; Mimetes; 1+4, *, *, *
Angio magnoliids; Lauraceae; ; 2+4, X
Angio monocot; Agavaceae; Agave
Angio monocot; Haemodoraceae; Conostylis; 2+4, X
Angio monocot; Iridaceae; Crocus; 2+4, 2.1a
Angio monocot; Orchidaceae; Aulosepalum
Angio monocot; Orchidaceae; Cattleya; 2+4, 2.1a+5*
Angio monocot; Orchidaceae; Dactylorhiza; 2+4, X
Angio monocot; Orchidaceae; Sophronitis; 2+4, 2.1a
Angio monocot; Poaceae; Hordeum; Missing, 2.1a
Angio rosids; Fabaceae; Inga; 2+4, X
Fern; Aspleniaceae; Asplenium; *, *, LP1+LP5, *, *
Fern; Dryopteridaceae; Elaphoglossum; LP1+LP4, *, *, *, LP1+LP5
Fern ally; Equisetaceae; Equisetum; 1+LP3, FE+RE, LP1+LP4, LP1.1+LP4.3, LP1+LP5
Gymnosperm; Araucariaceae; Araucaria; 2+4, FE+RE?, 1+3, 2+LP3, 2+4
Gymnosperm; Cupressaceae; Cupressus; 1+4, *, *, 2+4
Gymnosperm; Pinaceae; Pinus; 2+4, FE+RE, Missing, 2+LP3, 2+4
Gymnosperm; Zamiaceae; Encephalartos; 1+4, FE+RE, *
Liverwort; Aytoniaceae; Asterella; 2+4, *, 1+3, *, 2+4
Liverwort; Lophoziaceae; Anastrophyllum; *, *, LP1+LP4, *, 2+4
Moss; Bryaceae; Bryum; *, *, LP1+LP4, LP1.1+LP
Moss; Pottiaceae; Tortella; *, *, *, LP1.1+LP3.2, LP1+4
Moss; Pottiaceae; Triquetrella; *, *, LP1+LP4, LP1.1+LP
Moss; Ptychomniaceae; ; *, 1+4, *, LP1.1+L P
Summary
rpoC1 5/25
accD 5/20
ndhJ 5/19
rpoB 8/20
matK 6/16
Agavaceae X 22 sp.
Crocus X 9 sp.
Aulosepalum X 8 sp. (?all)
Cattleya X 30 sp. (2 clades, approx. 43 sp.)
Dactylorhiza 15 sp. (species complex)
Sophronitis 27 sp. (approx. 37 sp.)
Scalesia X 4 (species complex)
Conostylis X 42 (?all)
Equisetum X 14
Pinus X 66
Hordeum X 10
Lauraceae
Samples with unique ‘barcode’
Column headings: Gaps as a 5th state; Gaps = missing data; With duplicates removed; Haplotypes; % Haplotypes
Rows: matK; rpoB; rpoC1; rpoB + matK; rpoB + rpoC1; rpoC1 + matK; rpoC1 + rpoB + matK
Individuals; Species 289
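The two gap treatments named on this slide give different counts of samples with a unique barcode. The sketch below illustrates why. It is an assumption about how such counts could be made, not the analysis behind the slide, and the function names and toy sequences are hypothetical.

```python
def differs(a, b, gaps_as_fifth_state):
    """True if two aligned sequences can be told apart under the chosen gap treatment."""
    for x, y in zip(a, b):
        if x == y:
            continue
        # As a fifth state, gap-versus-base is a real difference;
        # as missing data, a column with a gap in either sequence is skipped.
        if gaps_as_fifth_state or (x != "-" and y != "-"):
            return True
    return False

def unique_samples(seqs, gaps_as_fifth_state):
    """Names of samples whose sequence is distinguishable from every other sample."""
    names = list(seqs)
    return [n for n in names
            if all(differs(seqs[n], seqs[m], gaps_as_fifth_state)
                   for m in names if m != n)]

# Hypothetical toy data: s1 and s2 differ only by an indel.
toy = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "ACCTTA"}
print(unique_samples(toy, gaps_as_fifth_state=True))   # indel counts as a difference
print(unique_samples(toy, gaps_as_fifth_state=False))  # indel ignored
```

Combining loci, as in the rpoB + matK rows, would correspond to concatenating each sample's aligned sequences before calling `unique_samples`.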
Users of DNA Barcoding: ‘The Traffic Light approach’
Green - non-problematic taxa (current markers appropriate, silver standard)
Orange - need for gold standard (polyploidy, introgression, paralogy)
Red - barcoding needs investigation, species complex, etc.