Access to Sequence Data and Related Information
From Bioinformatics and Functional Genomics, Third Edition, by Jonathan Pevsner (2015)
Learning objectives
Define the types of molecular databases.
Define accession numbers and explain the significance of RefSeq identifiers.
Describe the main genome browsers and use them to study features of a genomic region.
Use resources to study information about both individual genes (or proteins) and large sets of genes/proteins.
Introduction to Biological Databases
In 1995 the complete genome of a free-living organism, the bacterium Haemophilus influenzae, was sequenced for the first time.
DNA sequence data have been collected from over 300,000 different species of organisms.
1970s: dideoxynucleotide sequencing ("Sanger sequencing")
Since 2005: next-generation sequencing (NGS) technology
Centralized Databases Store DNA Sequences
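NCBI (National Center for Biotechnology Information)
EBI (European Bioinformatics Institute)
DDBJ (DNA Data Bank of Japan)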
Growth of DNA sequence data in repositories
Scales of DNA base pairs
Contents of DNA, RNA, and Protein Databases
GenBank data file divisions
Types of Data in GenBank/EMBL-Bank/DDBJ
Genomic DNA Databases
DNA-Level Data: Sequence-Tagged Sites (STSs)
DNA-Level Data: Genome Survey Sequences (GSSs)
DNA-Level Data: High-Throughput Genomic Sequence (HTGS)
Sequence-Tagged Sites
A sequence-tagged site (STS) is a short (200 to 500 base pair) DNA sequence that occurs only once in the genome and whose location and base sequence are known. Because STSs can be easily detected by the polymerase chain reaction (PCR) using specific primers, they are useful for constructing genetic and physical maps from sequence data reported by many different laboratories, and they serve as landmarks on the developing physical map of a genome. When STS loci contain genetic polymorphisms (e.g., simple sequence length polymorphisms (SSLPs) or single nucleotide polymorphisms (SNPs)), they become valuable genetic markers, that is, loci that can be used to distinguish individuals. STSs are also used in shotgun sequencing, specifically to aid sequence assembly. In addition, they are helpful for detecting microdeletions in some genes; for example, PCR screening with selected STSs can detect microdeletions in azoospermia factor (AZF) genes in infertile men.
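The single-occurrence requirement in the STS definition can be illustrated with a toy "in silico PCR" check. This is a minimal sketch, not a published tool: the function names `find_amplicons` and `is_sts` are hypothetical, the search assumes exact primer matches on the forward strand only, and the default size bounds are simply the 200 to 500 bp range from the definition. Under these assumptions, a primer pair behaves like an STS when it predicts exactly one product of the expected size.

```python
# Toy in silico PCR check for a sequence-tagged site (STS).
# Assumptions (hypothetical, for illustration only): primers are written 5'->3',
# matches must be exact, and only the forward strand of the genome is searched.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicons(genome: str, fwd: str, rev: str,
                   min_len: int = 200, max_len: int = 500) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of predicted PCR products."""
    genome = genome.upper()
    fwd = fwd.upper()
    # The reverse primer anneals to the opposite strand, so its binding site
    # appears on the forward strand as the reverse complement of the primer.
    rev_site = reverse_complement(rev.upper())
    products = []
    start = genome.find(fwd)
    while start != -1:
        end_site = genome.find(rev_site, start + len(fwd))
        while end_site != -1:
            end = end_site + len(rev_site)
            length = end - start
            if length > max_len:
                break  # later sites are even farther away
            if length >= min_len:
                products.append((start, end))
            end_site = genome.find(rev_site, end_site + 1)
        start = genome.find(fwd, start + 1)
    return products


def is_sts(genome: str, fwd: str, rev: str, **size_bounds: int) -> bool:
    """A usable STS primer pair should yield exactly one product."""
    return len(find_amplicons(genome, fwd, rev, **size_bounds)) == 1
```

Real in silico PCR tools also tolerate mismatches and search both strands; the sketch omits both so that the uniqueness test stays easy to follow.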
RNA Data
RNA-Level Data: cDNA Databases Corresponding to Expressed Genes
RNA-Level Data: Expressed Sequence Tags (ESTs)
RNA-Level Data: UniGene
Protein Databases
UniProt
Central Bioinformatics Resources: NCBI and EBI
Accession Numbers to Label and Identify Sequences
The Reference Sequence (RefSeq) Project
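RefSeq accessions have a two-letter prefix and an underscore, which together indicate the type of molecule, followed by digits and an optional version suffix such as `.5`. The hypothetical helper `parse_refseq` below splits an accession into these parts. Its prefix table is a partial list based on NCBI's published RefSeq prefixes and is not exhaustive; unlisted prefixes are reported as unknown rather than guessed.

```python
import re

# Partial table of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "genomic: complete genomic molecule",
    "NG": "genomic: incomplete genomic region",
    "NT": "genomic: contig or scaffold (clone-based or WGS)",
    "NW": "genomic: contig or scaffold (primarily WGS)",
    "NM": "mRNA (usually curated)",
    "NR": "non-coding RNA (usually curated)",
    "NP": "protein (usually curated)",
    "XM": "mRNA (predicted model)",
    "XR": "non-coding RNA (predicted model)",
    "XP": "protein (predicted model)",
}

# Two uppercase letters, underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq(accession: str) -> dict:
    """Split a RefSeq-style accession into prefix, number, version, and molecule type."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "number": number,
        "version": int(version) if version else None,
        "molecule": REFSEQ_PREFIXES.get(prefix, "unknown/unlisted prefix"),
    }
```

For example, `parse_refseq("NM_000518.5")` returns the prefix `"NM"`, the version `5`, and the molecule type `"mRNA (usually curated)"`. An accession without an underscore-separated prefix raises `ValueError`, because it does not follow the RefSeq pattern.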
Access to Information via Gene Resource at NCBI
Flatfile and FASTA Formats
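A GenBank flatfile carries annotation fields (such as LOCUS, DEFINITION, ACCESSION, FEATURES, and ORIGIN), whereas the FASTA format keeps only a header line beginning with `>` followed by the sequence. A minimal FASTA reader might look like the sketch below. It is written as a hypothetical `parse_fasta` function using only the Python standard library; production code would more likely rely on an established parser such as Biopython's `SeqIO`.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs.

    The header is the text after '>' on the definition line; sequence lines
    that follow it are concatenated until the next '>' line.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blank and old-style comment lines
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Because FASTA records may wrap a sequence over many lines, the reader collects the lines and joins them once per record rather than concatenating strings repeatedly.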
Command-Line Access to Data at NCBI
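NCBI's E-utilities expose Entrez databases over HTTPS, and command-line clients build on this service. The sketch below only constructs an `efetch` request URL with the standard library and does not contact NCBI. The helper name `efetch_url` is hypothetical, and the parameter values shown (`db="nuccore"`, `rettype="fasta"`, `retmode="text"`) are common choices, not the only valid ones.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, ids: list[str], rettype: str = "fasta",
               retmode: str = "text") -> str:
    """Build an E-utilities efetch URL for one or more record identifiers."""
    params = {
        "db": db,              # Entrez database, e.g. "nuccore" or "protein"
        "id": ",".join(ids),   # several IDs can be fetched in one request
        "rettype": rettype,    # output type, e.g. "fasta" or "gb" (flatfile)
        "retmode": retmode,    # output mode, e.g. "text"
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)
```

To download the record, the URL could be passed to `urllib.request.urlopen`. NCBI asks users to limit request rates (about three requests per second without an API key), so batching identifiers into one request is preferable to looping over them.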
Access to Information: Genome Browsers
The University of California, Santa Cruz (UCSC) Genome Browser
The Ensembl Genome Browser
The Map Viewer at NCBI
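A genomic region is typically entered in a genome browser as chromosome, start, and end. The UCSC Genome Browser conventionally uses a `chr` prefix and accepts thousands separators (e.g., `chr11:5,225,464-5,229,395`), while Ensembl typically omits the prefix (`11:5225464-5229395`). The hypothetical helpers below normalize either form, compute the region length assuming 1-based inclusive coordinates as displayed in the browsers, and format the region for each browser. The example coordinates are illustrative only.

```python
import re

# Accepts "chr11:5,225,464-5,229,395" or "11:5225464-5229395".
# Assumption: positions are 1-based and inclusive, as displayed by the browsers.
REGION_RE = re.compile(r"^(?:chr)?([0-9A-Za-z_]+):([\d,]+)-([\d,]+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Return (chromosome name without 'chr', start, end)."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or start > end:
        raise ValueError(f"invalid coordinates: {start}-{end}")
    return chrom, start, end


def region_length(chrom: str, start: int, end: int) -> int:
    """Length in base pairs of a 1-based, inclusive interval."""
    return end - start + 1


def to_ucsc(chrom: str, start: int, end: int) -> str:
    """UCSC-style position string with 'chr' prefix and thousands separators."""
    return f"chr{chrom}:{start:,}-{end:,}"


def to_ensembl(chrom: str, start: int, end: int) -> str:
    """Ensembl-style position string without the 'chr' prefix."""
    return f"{chrom}:{start}-{end}"
```

Because both endpoints are included, a region written as `11:100-199` spans 100 bp, not 99; forgetting the `+ 1` is a common off-by-one error when moving between browsers and scripts.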