HICF-based physical mapping of Mimulus guttatus and M. lewisii

Anna Blenda 1, John Willis 2, Todd Vision 3, Eric Fang 1, Barbara Blackmon 1, Jeanice Troutman 1, David Henry 1, Stephen Ficklin 1, Michael Atkins 1 and Jeff Tomkins 1

1 Clemson University Genomics Institute, Clemson, SC, 29634, USA
2 Dept. Biology, Duke University, Durham, NC, 27708, USA
3 Dept. Biology, University of North Carolina, Chapel Hill, NC, 27599, USA

This project is supported by NSF-FIBR grant No and NSF-MRI grant No

Figure: Mimulus guttatus (Monkeyflower); Mimulus lewisii (Pink Monkeyflower).

Genomic regions conferring reproductive barriers and adaptive differences between species in the genus Mimulus, a well-developed ecological plant model, have been identified by QTL mapping in two different populations. Further progress on understanding the molecular genetic basis of adaptation and speciation in Mimulus requires comparative genomic tools. Comprehensive structural, functional and comparative studies of a genome increasingly depend on the availability of integrated physical/genetic maps. Anchoring molecular markers associated with various traits, particularly QTLs, to specific large-insert DNA clones (BACs) is an important objective in developing a physical framework.

The high information content fingerprinting (HICF) technique, based on 5-enzyme digestion, the ABI SNaPshot labeling kit and ABI3730 capillary electrophoresis, was used to construct physical maps for two Mimulus species: M. guttatus and M. lewisii. Physical map data for both species is available at To integrate the genetic and physical maps, cDNA-derived markers were anchored to all three Mimulus BAC libraries by overgo hybridization using a high-throughput probe pooling approach.

Physical Mapping Status. A total of 56,624 (80%) clones from both M.
guttatus BAC libraries were used for contig construction after filtering the data set in the GeneMapper and GenoProfiler programs. Prior to FPC assembly, the Mimulus libraries were screened with chloroplast and mitochondrial DNA probes, and the organellar (non-nuclear) BAC clones were removed from the data set. The contig assembly produced 2,535 contigs containing 38,337 clones (68% of the clones used for FPC) with an average Sulston score of Similarly, 33,163 (91%) clones were used for construction of the M. lewisii physical map of 1,913 contigs. The integrated genetic/physical map is available at

Figure: integrated genetic/physical map display for chromosome 6 (panels: marker info, genetic map display, FPC contigs display).

References

Lazo GR, Lui N, Gu YQ, Kong X, Coleman-Derr D, Anderson OD (2005). Hybsweeper: a resource for detecting high-density plate gridding coordinates. Biotechniques (3):320,322,324.

Luo M-C, Thomas C, You FM, Hsiao J, Ouyang S, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J. High-throughput fingerprinting of bacterial artificial chromosomes using the SNaPshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics 82:

Nelson WM, Bharti AK, Butler E, Wei F, Fuks G, Kim H, Wing RA, Messing J, Soderlund C. Whole-genome validation of high-information-content fingerprinting. Plant Physiol (1):

Methods and Procedures. A fingerprinting pipeline was used for data processing and analysis. It incorporates the following software applications: ABI Data Collection v2.0 and ABI GeneMapper v3.7 (software packages for ABI DNA sequencers), GenoProfiler v1.1, and FPC v

For BAC-end sequences (BESs), CUGI uses the publicly available phred and crossmatch programs to perform base calling and to remove residual vector from sequences. In-house scripts filter low-quality sequences from the final data sets. Sequences with at least 100 base pairs with a phred value of 20 or higher and less than 5% ambiguous bases are considered successful. All others are removed from the data set.
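The success criterion above can be illustrated with a minimal sketch. It is not the in-house CUGI script: the function name and parameters are hypothetical, and it assumes that the 100 high-quality bases are counted over the whole read (not necessarily contiguous) and that any base other than A, C, G or T counts as ambiguous.

```python
def passes_quality_filter(seq, quals, min_hq_bases=100, min_phred=20,
                          max_ambiguous_frac=0.05):
    """Return True if a read meets the stated success criterion.

    seq   -- base calls as a string
    quals -- one phred score per base, same length as seq
    Assumptions (not confirmed by the poster): high-quality bases are
    counted over the whole read, and any non-ACGT call is ambiguous.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    if not seq:
        return False
    # Count bases whose phred value is at or above the threshold.
    high_quality = sum(1 for q in quals if q >= min_phred)
    # Count ambiguous base calls (N and other IUPAC codes).
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return (high_quality >= min_hq_bases
            and ambiguous / len(seq) < max_ambiguous_frac)
```

For example, a 120-base read with phred 30 throughout and no ambiguous calls passes, while the same read with six N calls (exactly 5%) does not, because the ambiguous fraction must be strictly below 5%.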
The final BESs submitted to GenBank contain only the longest contiguous non-vector sequence of bases, with low-quality bases trimmed off both the beginning and the end of the sequence.

Accession numbers: M. guttatus BAC-end sequences were submitted to GenBank under accession numbers ED ED806636, M. lewisii BAC-end sequences were submitted to GenBank under accession numbers ED ED

Integration of Genetic and Physical Map Data. Genetically mapped ESTs (containing SSRs) are being hybridized onto the fingerprinted Mimulus BAC libraries via overgo probe pooling (5 x 5 x 5 design). The HybSweeper program (Lazo et al., 2005) is used for high-throughput scoring of the hybridization data, which is then deconvoluted using a Perl script. At present, 523 and 376 ESTs have been anchored to the M. guttatus and M. lewisii physical maps, respectively. Using the FPC contig assemblies, ~5,000 clones for each species were selected, re-arrayed into minimum tile libraries for comparative studies, and BAC-end sequenced. The BES data is being used to develop additional genetic markers (SSRs). The goal is 1,000 STS markers anchored to both the M. guttatus and M. lewisii physical maps.

Figures: M. guttatus HICF raw data after GeneMapper processing; overgo design scheme; overgo hybridization scheme (Pools 1 to 5); overgo probe pooling design (5 x 5 x 5) used to hybridize STS markers onto Mimulus BAC libraries; CUGI website, Mimulus project pages; M. guttatus WebFPC v. 2.1, FPC contig with hybridization data; HybSweeper program for high-throughput hybridization data scoring.

Mimulus BES Homology Blast and SSR Analysis Results. To gain a glimpse into the Mimulus genome and to provide a resource for other researchers, the successful BESs were queried against the Swiss-Prot protein database, NCBI's non-redundant protein database, and the MIPS Arabidopsis database. The results of the December 2006 homology searches are provided in the form of downloadable Excel spreadsheets.
Alternatively, researchers can run BLAST or FASTA homology searches of their own sequences against the publicly available Mimulus BES library through CUGI's online FASTA and BLAST servers.

In-house scripts were used to mine SSRs in the Mimulus BESs (Table 1), and the Primer3 program was used to design primers. The SSRs are classified as dinucleotides (2 bp motifs), trinucleotides (3 bp motifs), tetranucleotides (4 bp motifs), pentanucleotides (5 bp motifs) and hexanucleotides (6 bp motifs). Only dinucleotides with at least five repeats, trinucleotides with at least four repeats, and tetra-, penta- and hexanucleotides with at least three repeats are included in the result set. Forward and reverse primers were also generated for the SSRs, mostly for those from sequences that have a GC content between 40% and 60% and at least 20 base pairs of sequence on either side of the SSR. The resulting Excel spreadsheet also indicates whether each SSR is located within a putative coding region.
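The repeat thresholds above can be expressed as a short regular-expression search. The sketch below is illustrative only and is not the in-house mining script: the function names are hypothetical, motifs that are themselves repeats of a shorter unit (for example AAAA or ATAT) are skipped by assumption, and the primer check assumes that GC content is measured over the whole BES.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (assumption)."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d)
                   for d in range(1, n))


def find_ssrs(seq):
    """Return (motif, repeats, start, end) tuples; end is exclusive."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // motif_len
                hits.append((motif, repeats, match.start(), match.end()))
    return hits


def eligible_for_primers(seq, start, end, flank=20, gc_min=0.40, gc_max=0.60):
    """Check the stated GC-content and flanking-sequence conditions."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return (gc_min <= gc <= gc_max
            and start >= flank
            and len(seq) - end >= flank)
```

As a usage example, `find_ssrs("GGG" + "AC" * 5 + "TTT")` reports a single dinucleotide SSR with motif AC and five repeats, whereas `"AC" * 4` yields no hits because it falls below the five-repeat minimum for dinucleotides.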