Visual Basic and Perl Applications for Genome Project Management. Svetlana N.Yurgel* 1, Brenda K. Schroeder 1, Hao Jin 2 and Michael L. Kahn 1,3. Institute of Biological Chemistry 1, School of Electrical Engineering and Computer Science 2 and School of Molecular Biosciences 3 Washington State University, Pullman WA Design, Ordering, and Organization of PCR Primers Perl Primer Design Update Table Access Database Access Database The design of our nested PCR required that all the gene specific primers be 20 base pairs long with 12 base pair tags attached to the forward and reverse primers. This was accomplished using the Visual Basic program ‘Primer Design’. Once the primers were constructed in silico, they were used to confirm the start and stop information for each in the database using a Perl routine. This information was then entered into the Microsoft® Access® Database. ‘Primer Compare’ is a Visual Basic program used to compare the file received after primers were constructed to confirm which primers were received from which order and if the primer sequences were correct. This information was then entered into the Access Database using the Visual Basic program ‘Update Table’. This program allowed an asynchronous process for ordering primers, selecting primers etc. to be carried out as the separate requirements for each step of each ORF was completed. Primer Compare PCR Amplification and Confirmation Access Database Create Gel Form Aliquot primers Run PCR Score Data Access Database The list of the next 94 ORFs to be PCR amplified was obtained by querying the Access ® Database, asking for ORFs that had primers diluted and ready to use. These diluted primers were then aliquoted into strip cap tubes and used to amplify the DNA of the respective ORFs using PCR. ‘Create Gel Form’ is a Visual Basic program which utilizes the list of 94 ORFs and their nucleotide start and stop information to construct forms to be used for recording data in the laboratory. This list of 94 ORFs remained fixed during the cloning process. Data about each ORF at each step in the process was entered into the Database using ‘Table Update’. Update Table Genomics research requires coordinating data analysis with experimental manipulations. Cloning more than 6300 Open Reading Frames (ORFs) from the bacterium, Sinorhizobium meliloti (Schroeder et al. 2005) required a database that could be used to schedule an asynchronous workflow to maintain a high throughput with minimum clerical effort. A Microsoft® Access® database operating on linked Microsoft® Excel® spreadsheets was implemented to track each ORF through the cloning process. The familiar Excel spreadsheet format decreased the time needed to train students to input and analyze data and increased the level of comfort and enthusiasm for automated tracking. For similar reasons, exchanging spreadsheets with vendors was relatively simple. Using Microsoft® Access® together with Visual Basic applications embedded in Microsoft® Excel®, allowed us to rapidly develop new utilities as they were needed. Visual Basic and Perl routines were mainly used for extracting or manipulating data on a global scale, to change formats or to check for the presence of certain patterns. The query language allowed us to select data using criteria specific to the particular stage of the procedures and to output the results as worksheets for subsequent experimental manipulation. Finally, data stored in a Microsoft® Access® database can be transferred easily to more powerful databases (mySQL or Oracle) for further analysis. This relatively simple database format was adequate for this task and we offer it as an example of customizable software that can significantly accelerate the manipulation of data and materials in a complex project. Abstract The utilization of a relational database in Microsoft®Access® using linked Microsoft®Excel® spreadsheets provided several advantages compared to other database applications. First, the startup time was immediate, no database design was required. Second, working in the format of familiar Excel spreadsheets decreased the time required to learn how to input and manipulate the data. Third, using Microsoft®Access® tools like the query language facilitated the data search process, allowing us to select data using criteria appropriate to the particular stage of the process and to configure the data output into formats useful for subsequent experimental manipulations. Fourth, manually executed routines written in Visual Basic were used for data manipulation, generation of data sheets, and analysis. Finally, the storage format for data in a Microsoft®Access® database is directly transferable to more powerful databases such as mySQL. There were numerous steps in the cloning process which were routinely followed with the database. These included generating primer orders, selecting primers for polymerase chain reaction (PCR), completing the PCR, determining which clones were cloned and tracking storage locations for each step in the process. Visual Basic routines were utilized in numerous ways including constructing the primer sequences, checking the sequence of the primers received, and generating worksheets that could be taken into the laboratory. This poster describes the use of a Microsoft®Access® database in which tables are linked to Microsoft®Excel® spreadsheets. A Perl routine checked the position of primers in the genome and extracted DNA sequences flanked by the primers for further manipulation. This approach allowed simple questions to be answered in order to move forward in the laboratory in a very efficient way. The partnering of a Microsoft®Access® database linked to Microsoft®Excel® spreadsheets and the complementary Visual Basic routines generally served as an “electronic notebook” for the project. Overview Transfer of data from “electronic notebook” to Web accessible database Plasmid Host strain Vector Grandparental strain Parental strain Strain ORF Strain PCR Website design Database schema The organization of the database schema is based on the data collected for the genomic project in the “electronic notebook” and the data was directly uploaded from the “electronic notebook” to the web accessible MYSQL database. The website design was based on how the pieces of data are interrelated. Below is an example of how to move through the different pages of the website based on their interrelated features to obtain information about specific ORFs that were cloned in various destination vectors. This information can include the sequence of the PCR amplified ORF, the sequence of the different destination vectors and strain information about the different constructs that have been constructed. Cloning Process Precipitate PCR Product Complete BP Clonase reaction Electroporation and plating Plasmid Isolation PCR Confirmation of plasmid DNA Access Database Use previously generated worksheet for this step. Score data Update Table After PCR, the PCR products were PEG precipitated and resuspended in buffer. A 3 µl aliquot was removed and placed in a new set of strip cap tubes. The BP Clonase reaction buffer and entry vector were added to each tube and the reaction was incubated ath room temperature. A 3 µl aliquot was then used for electroporation to introduce the cloned plasmid DNA into an E.coli host. Cells carrying the plasmid were then cultured and the plasmid DNA was isolated. This isolated DNA was then used as template for a PCR reaction to confirm that each plasmid was carrying the appropriate size insert. The amplified DNA was run on an electrophoresis gel and the gel was scored for the presence of the correct size bands using the information found on the previously generated worksheet for this set. The data was recorded and entered into the Excel file used to store the data. The data was then added to the database using the Visual Basic program ‘Update Table’. The database can then be queried for success with an individual ORF at each step. This information can then be used to start the process over at the appropriate step if necessary.