Access to Sequence Data and Related Information

1 Access to Sequence Data and Related Information
From Bioinformatics and Functional Genomics, Third Edition, Jonathan Pevsner, 2015.

2 Learning objectives
define the types of molecular databases;
define accession numbers and the significance of RefSeq identifiers;
describe the main genome browsers and use them to study features of a genomic region; and
use resources to study information about both individual genes (or proteins) and large sets of genes/proteins.

3 Introduction to Biological Databases
In 1995 the complete genome of a free-living organism, the bacterium Haemophilus influenzae, was sequenced for the first time.
DNA sequence data have been collected from over 300,000 different species of organisms.
1970s: dideoxynucleotide sequencing ("Sanger sequencing").
Since 2005: next-generation sequencing (NGS) technology.

4 Centralized Databases Store DNA Sequences

5 NCBI

6 EBI

7 DDBJ

8 Growth of DNA sequence in repositories

9 Scales of DNA base pairs

10 Contents of DNA, RNA, and Protein Databases

11 GenBank data file division

12 Types of Data in GenBank/EMBL-Bank/DDBJ

13 Genomic DNA Databases
DNA-Level Data: Sequence-Tagged Sites (STSs)
DNA-Level Data: Genome Survey Sequences (GSSs)
DNA-Level Data: High-Throughput Genomic Sequence (HTGS)

14 Sequence-tagged site
A sequence-tagged site (STS) is a short (200 to 500 base pair) DNA sequence that occurs only once in the genome and whose location and base sequence are known. STSs can be easily detected by the polymerase chain reaction (PCR) using specific primers. For this reason they are useful for constructing genetic and physical maps from sequence data reported by many different laboratories, and they serve as landmarks on the developing physical map of a genome.
When STS loci contain genetic polymorphisms (e.g. simple sequence length polymorphisms, SSLPs, or single nucleotide polymorphisms), they become valuable genetic markers, i.e. loci that can be used to distinguish individuals. They are also used in shotgun sequencing, specifically to aid sequence assembly.
STSs are very helpful for detecting microdeletions in some genes. For example, some STSs can be used in PCR screening to detect microdeletions in azoospermia factor (AZF) genes in infertile men.
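
The idea that an STS is defined by a primer pair and a unique location can be illustrated with a small in-silico search. The following is only a minimal sketch under stated assumptions: it uses exact string matching on a single strand, and the names `reverse_complement` and `find_sts` are hypothetical helpers introduced for illustration, not part of any NCBI tool.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sts(genome: str, forward: str, reverse: str,
             min_len: int = 200, max_len: int = 500):
    """Find candidate amplicons bounded by a primer pair.

    Assumption: primers match exactly (no mismatches), and only the given
    strand of `genome` is searched. Returns a list of (start, length) tuples.
    """
    genome = genome.upper()
    forward = forward.upper()
    # On the forward strand the reverse primer appears as its reverse complement.
    rev_site = reverse_complement(reverse)
    hits = []
    start = genome.find(forward)
    while start != -1:
        end = genome.find(rev_site, start + len(forward))
        while end != -1:
            length = end + len(rev_site) - start
            if length > max_len:
                break  # later sites would only give longer products
            if length >= min_len:
                hits.append((start, length))
            end = genome.find(rev_site, end + 1)
        start = genome.find(forward, start + 1)
    return hits
```

Under this sketch, a primer pair behaves like an STS when `find_sts` returns exactly one hit; more than one hit would mean the site is not unique in the searched sequence.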

15 RNA data
RNA-Level Data: cDNA Databases Corresponding to Expressed Genes
RNA-Level Data: Expressed Sequence Tags (ESTs)
RNA-Level Data: UniGene

16 Protein Databases
UniProt

17 Central Bioinformatics Resources: NCBI and EBI

18 Accession Numbers to Label and Identify Sequences

19 The Reference Sequence (RefSeq) Project
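
RefSeq accessions carry a two-letter prefix followed by an underscore, which indicates the molecule type. The sketch below shows one way to read that prefix; the prefix table covers only a few commonly documented cases, and the function name `describe_accession` and the example accession strings are illustrative assumptions rather than material from the slides.

```python
import re

# A partial table of RefSeq prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA (model)",
    "XR_": "predicted non-coding RNA (model)",
    "XP_": "predicted protein (model)",
}

# Two uppercase letters, an underscore, digits, and an optional ".version".
_REFSEQ_PATTERN = re.compile(r"^([A-Z]{2}_)(\d+)(?:\.(\d+))?$")


def describe_accession(accession: str):
    """Return a dict describing a RefSeq-style accession, or None if the
    string does not look like one of the prefixes listed above."""
    match = _REFSEQ_PATTERN.match(accession.strip())
    if match is None:
        return None
    prefix, number, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        return None
    return {
        "prefix": prefix,
        "molecule": REFSEQ_PREFIXES[prefix],
        "number": number,
        "version": int(version) if version is not None else None,
    }
```

An accession without an underscore, such as a plain GenBank-style identifier, returns `None` here, which is a simple way to separate RefSeq records from other entries in a mixed list.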

20 Access to Information via Gene Resource at NCBI

21 Flat file and FASTA formats
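
In the FASTA format each record begins with a header line starting with ">" and is followed by one or more lines of sequence. A minimal parsing sketch, assuming well-formed input held in a string (the function name `parse_fasta` is a hypothetical helper, not a library call):

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary.

    Assumptions: headers are unique, and sequence lines belonging to a
    record are simply concatenated. Blank lines are ignored.
    """
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records
```

For real work a maintained parser such as the one in Biopython is usually preferable; this sketch is only meant to make the structure of the format concrete.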

22 Command-Line Access to Data at NCBI
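
One way to script access to NCBI records is through the Entrez Programming Utilities (E-utilities). The sketch below only builds an `efetch` request URL and does not contact NCBI; it is an illustration, not necessarily the approach shown on the slide. The function name `build_efetch_url` and the example accession are assumptions.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nuccore",
                     rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an efetch URL that requests one record in the given format.

    This only constructs the URL; fetching it (for example with
    urllib.request.urlopen) is left to the caller.
    """
    params = {
        "db": db,            # Entrez database to query
        "id": accession,     # accession number or UID
        "rettype": rettype,  # record type, e.g. "fasta" or "gb"
        "retmode": retmode,  # output encoding, e.g. "text" or "xml"
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"
```

Keeping URL construction separate from the network call makes the request easy to inspect and test before any data are downloaded.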

23 Access to Information: Genome Browsers
The University of California, Santa Cruz (UCSC) Genome Browser
The Ensembl Genome Browser
The Map Viewer at NCBI
