Biological Database Search (Group 8): 連威森 497542342, 王鼎 497542108, 黃智楹 497542469, 張鈞淵 497542720.



Outline & Division of Work
Exercise 1: Finding public biological databases: 黃智楹
Exercise 2: The GeneCards database: 連威森
Exercise 3: Identification of disease genes: 王鼎 + 張鈞淵

Exercise 1: Finding public biological databases

2011 NAR Database Summary Paper Category List
Nucleotide Sequence Databases
RNA sequence databases
Protein sequence databases
Structure Databases
Genomics Databases (non-vertebrate)
Metabolic and Signaling Pathways
Human and other Vertebrate Genomes
Human Genes and Diseases
Microarray Data and other Gene Expression Databases
Proteomics Resources
Other Molecular Biology Databases
Organelle databases
Plant databases
Immunological databases

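Nature of the MGI database: The database chosen here is MGI (Mouse Genome Informatics). It mainly provides biological data on laboratory mice, including genome, pathway, and expression data, which are used to study human cancers and other diseases and to suggest possible improvements.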

About MGI: MGI comprises several large databases: the Mouse Genome Database (MGD) Project, the Gene Expression Database (GXD) Project, the Mouse Tumor Biology (MTB) Database Project, the Gene Ontology (GO) Project at MGI, and the MouseCyc Project at MGI. Links to these can be found on the About MGI page (see the area marked in red in the figure below).

About MGI: It covers the gene database, gene expression, mouse tumor research, the Gene Ontology (GO) Project, and more.

About MGI (contents of each database)
MGD: contains gene characterization, nomenclature, mapping, gene homologies among mammals, sequence links, phenotypes, allelic variants and mutants, and strain data.
GXD: contains different types of gene expression information from the mouse and provides a searchable index of published experiments on endogenous gene expression during development.
MTB: contains data on the frequency, incidence, genetics, and pathology of neoplastic disorders, emphasizing data on tumors that develop characteristically in different genetically defined strains of mice.

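Distinctive features: What sets this database apart is that its data concern laboratory mice. In addition to general data, it provides data from tumor research, so the ultimate purpose of the database is to help address human diseases such as cancer.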

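Suggested search approach: We suggest entering the main page through Home and going directly from the icons to the database you want, which makes finding the relevant data faster (see the next two figures). Clicking Home leads to a page with many icons, each of which links to one kind of data.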

After clicking Home: the Home Page

Each icon links to one kind of data

Home Page: each icon links to one kind of data

Genes: After clicking through to Access Data, you can enter search restrictions on this page and run a query.

Exercise 2: The GeneCards database

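GeneCards: GeneCards is a knowledge platform that collects and presents integrated information on human genes, their products, and related diseases. It was developed jointly by the genome research center and the bioinformatics center of the Weizmann Institute in Israel. It offers a complete set of navigation aids, hints suggested by experts, and spell checking, which makes it a convenient biomedical resource. GeneCards has now reached version 3.0 and holds [number missing] gene entries, of which [number missing] have been approved by the HUGO Gene Nomenclature Committee.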

Searching GeneCards for the HBA2 gene
Summaries: The human alpha globin gene cluster located on chromosome 16 spans about 30 kb and includes seven loci: 5'- zeta - pseudozeta - mu - pseudoalpha-1 - alpha-2 - alpha-1 - theta - 3'. The alpha-2 (HBA2) and alpha-1 (HBA1) coding sequences are identical. These genes differ slightly over the 5' untranslated regions and the introns, but they differ significantly over the 3' untranslated regions. Two alpha chains plus two beta chains constitute HbA, which in normal adult life comprises about 97% of the total hemoglobin; alpha chains combine with delta chains to constitute HbA-2, which with HbF (fetal hemoglobin) makes up the remaining 3% of adult hemoglobin. Alpha thalassemias result from deletions of each of the alpha genes as well as deletions of both HBA2 and HBA1; some nondeletion alpha thalassemias have also been reported. (provided by RefSeq)

Function: Involved in oxygen transport from the lung to the various peripheral tissues.
Genomic Location
Start: 140,887 bp from pter
End: 141,750 bp from pter
Size: 864 bases
Orientation: plus strand
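The size quoted for HBA2 follows from its start and end coordinates when both endpoints are counted. The check below is a minimal sketch, assuming 1-based inclusive coordinates (an assumption, though it agrees with the 864-base figure on the slide); the function name `span_length` is purely illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length in bases of a genomic interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# HBA2 coordinates as quoted on the slide (bp from pter)
hba2_size = span_length(140_887, 141_750)
print(hba2_size)  # 864
```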

Proteins
Recommended Name: Hemoglobin subunit alpha
Size: 142 amino acids; Da
Subunit: Heterotetramer of two alpha chains and two beta chains in adult hemoglobin A (HbA); two alpha chains and two delta chains in adult hemoglobin A2 (HbA2); two alpha chains and two epsilon chains in early embryonic hemoglobin Gower-2; two alpha chains and two gamma chains in fetal hemoglobin F (HbF)
Miscellaneous: Gives blood its red color
Sequence caution: Sequence=BAD ; Type=Erroneous initiation

Exercise 3: Identification of disease genes

Problem A laboratory has generated an EST library from a hemochromatosis patient and wants to identify the gene(s) causing the phenotype.

Hemochromatosis Hemochromatosis is the most common form of iron overload disease. Primary hemochromatosis, also called hereditary hemochromatosis, is an inherited disease. Secondary hemochromatosis is caused by anemia, alcoholism, and other disorders. Without treatment, the disease can cause the liver, heart, and pancreas to fail.

Outline: the steps for solving the problem
1. Compare an EST from a hemochromatosis patient to the human genome (using BLAST).
2. Identify the gene(s) that align with the ESTs and download their sequences (using Map Viewer).
3. Identify whether the ESTs contain any known nucleotide variations (single nucleotide polymorphisms) (using dbSNP).
4. Determine whether a mutant form of the gene is known to cause a phenotype (using OMIM).

Step 1. Compare ESTs to the human genome (using BLAST)

Expressed sequence tag (EST): An expressed sequence tag, or EST, is a short sub-sequence of a transcribed cDNA sequence, obtained by sequencing cDNA clones. Because these clones consist of DNA that is complementary to mRNA, the ESTs represent portions of expressed genes.

Contig (in DNA sequencing) In shotgun DNA sequencing projects, a contig is a set of overlapping DNA segments derived from a single genetic source. A contig in this sense can be used to deduce the original DNA sequence of the source.
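How overlapping segments yield a longer sequence can be illustrated with a toy example. This is only a sketch under strong assumptions: the reads are invented, error-free, already in the right order, and all on the same strand. Real assemblers handle sequencing errors, repeats, and both strands. The names `overlap_length` and `merge_reads` are illustrative.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two reads on their exact overlap; raise if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        raise ValueError("reads do not overlap")
    return left + right[size:]


# Invented, error-free reads listed in left-to-right order (toy data)
reads = ["ATGGCGTAC", "GTACCTTGA", "TTGAGGCAT"]
contig = reads[0]
for read in reads[1:]:
    contig = merge_reads(contig, read)
print(contig)  # ATGGCGTACCTTGAGGCAT
```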

Step 1. (1/4)

Step 1. (2/4)

Step 1. (3/4)

Step 1. (4/4)

Step 1. Questions
On which chromosome is the EST located? On which contig?
Answer: The EST is located on the Homo sapiens chromosome 6 genomic contig, GRCh37.p2 reference primary assembly.
Is the EST sequence 100% identical to the genomic sequence?
Answer: No. There is one nucleotide difference (G/A) between the contig NT_ and the EST sequence.
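The identity question comes down to comparing two aligned sequences position by position. The sketch below uses invented toy sequences, not the actual EST or contig, and assumes an ungapped alignment of equal length; BLAST itself reports identities and mismatches directly in its alignment output. The function names are illustrative.

```python
from typing import List, Tuple


def find_mismatches(est: str, genomic: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, genomic base, EST base) for every mismatch."""
    if len(est) != len(genomic):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, g_base, e_base)
        for index, (e_base, g_base) in enumerate(zip(est, genomic))
        if e_base != g_base
    ]


def percent_identity(est: str, genomic: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    mismatches = find_mismatches(est, genomic)
    return 100.0 * (len(est) - len(mismatches)) / len(est)


# Toy sequences differing by a single G/A substitution (not real data)
genomic_toy = "ACGTGGCTAGCTAGGATCCA"
est_toy = "ACGTGGCTAGCTAGAATCCA"

print(find_mismatches(est_toy, genomic_toy))   # [(15, 'G', 'A')]
print(percent_identity(est_toy, genomic_toy))  # 95.0
```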

Step 2. Identify the gene(s) that align with the ESTs and download their sequences (using Map Viewer).

Single-nucleotide polymorphism (SNP): A SNP is a DNA sequence variation that occurs when a single nucleotide (A, T, C, or G) in the genome differs between members of a biological species or between the paired chromosomes of an individual.

Step 2. (1/4)

Step 2. (2/4)

Step 2. (3/4)

Step 2. (4/4)

Step 2. Question Note the orientation of the gene. Is the gene on the forward strand or the reverse strand?
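One way to think about strand orientation: a gene on the forward strand matches the reference sequence as written, whereas a gene on the reverse strand matches only after reverse-complementing. The sketch below illustrates this with invented toy sequences and exact string matching; it is not how Map Viewer determines orientation, and the names `reverse_complement` and `strand_of` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_of(query: str, reference: str) -> str:
    """Report whether `query` matches the reference as written or as its reverse complement."""
    query = query.upper()
    reference = reference.upper()
    if query in reference:
        return "plus"
    if reverse_complement(query) in reference:
        return "minus"
    return "not found"


# Toy reference and queries (not real data)
reference_toy = "TTGACCATGGCGTACCTTGAGG"
print(strand_of("ATGGCGTAC", reference_toy))  # plus
print(strand_of("GTACGCCAT", reference_toy))  # minus
```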

Step 3. Identify whether the ESTs contain any known nucleotide variations (using dbSNP).

Step 4. Determine whether a mutant form of the gene is known to cause a phenotype (using OMIM)

Sources: BlastGen.cgi?taxid=9606