Introduction to Bioinformatics, Topic 1: Introduction to Bioinformatics and Sequence Analysis
Session 1 Learning Outcomes: the scope of bioinformatics; the origins and growth of DNA databases; evidence of evolution from bioinformatics; and example sequence analysis and displays using human Factor IX.
Bioinformatics: concerns the generation, visualization, analysis, storage, and retrieval of large quantities of biological information.
GenBank growth: how much data are we talking about? The amount of DNA sequence data in public databases keeps growing. The main public DNA databases are run by NCBI (the US National Center for Biotechnology Information), DDBJ (the DNA Data Bank of Japan), and EBI (the European Bioinformatics Institute). The contents of these databases are synchronized.
What data? Beyond the Human Genome Project, sequencing projects now come from scientists in numerous fields of biology, medicine, agriculture, ecology, history, energy, and forensics. Here are some examples you can explore according to your own interests:
http://www.1000genomes.org The genomes of 1000 people, sequenced to identify genetic variants present in at least 1% of the human population.
www.1001genomes.org The genomes of 1001 strains of the plant Arabidopsis thaliana that differ in phenotype, including adaptation to growth in a wide variety of conditions.
https://genome10k.soe.ucsc.edu/ An effort to sequence the genomes of 10,000 vertebrate species, approximately one from each genus.
http://www.arthropodgenomes.org/wiki/i5K The i5K initiative to sequence the genomes of 5,000 insect and other arthropod species.
http://www.ncbi.nlm.nih.gov/genome/browse/ Metagenomics database
The Cancer Genome Atlas (TCGA)
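The 1000 Genomes project looks for variants that reach about 1% frequency in the population. As a minimal sketch of how such a frequency cut-off can be applied, the Python below computes an allele frequency from hypothetical diploid genotypes coded as 0, 1, or 2 copies of the alternative allele. The function names, the 0/1/2 coding, and the `threshold` parameter are illustrative assumptions, not the project's actual pipeline.

```python
def alt_allele_frequency(genotypes):
    """Frequency of the alternative allele.

    genotypes: per-person counts of the alternative allele (0, 1 or 2),
    assuming diploid individuals (hypothetical coding for illustration).
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    # Each diploid person carries two alleles at the site.
    return sum(genotypes) / (2 * len(genotypes))


def is_common_variant(genotypes, threshold=0.01):
    """True if the minor (less frequent) allele reaches the threshold frequency."""
    freq = alt_allele_frequency(genotypes)
    minor_allele_freq = min(freq, 1 - freq)
    return minor_allele_freq >= threshold


# Hypothetical cohort of 100 people with one heterozygous carrier.
rare = [1] + [0] * 99
# Hypothetical cohort of 50 people with one homozygous carrier.
common = [2] + [0] * 49
print(alt_allele_frequency(rare), is_common_variant(rare))
print(alt_allele_frequency(common), is_common_variant(common))
```

Using the minor allele frequency means the same rule works whichever allele happens to be labelled "alternative".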
ANNOTATION: The information describing the structures, similarities, functions, and predictions associated with genetic and protein sequences.
WITNESSING EVOLUTION THROUGH BIOINFORMATICS Random mutation in sequences is a common phenomenon. Each mutation falls into one of three classes: advantageous mutations are kept by the organism and passed on to future generations; deleterious mutations are quickly eliminated from the population; neutral mutations may or may not be retained.
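The fate of advantageous, deleterious, and neutral mutations can be illustrated with a deliberately simple model. The sketch below assumes deterministic haploid selection, where carriers of the mutation have relative fitness 1 + s; this textbook model, the starting frequency of 0.01, and the values of `s` are assumptions chosen for illustration, not part of the notes.

```python
def next_frequency(p, s):
    """One generation of deterministic haploid selection.

    p: current frequency of the mutant allele (0..1)
    s: selection coefficient; mutant fitness is 1 + s relative to 1
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Allele frequencies from generation 0 up to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


for label, s in [("advantageous", 0.05), ("deleterious", -0.05), ("neutral", 0.0)]:
    final = trajectory(0.01, s, 100)[-1]
    print(f"{label:>12}: 0.01 -> {final:.4f} after 100 generations")
```

In this deterministic model a neutral allele never changes frequency. In real, finite populations random genetic drift decides whether it spreads or disappears, which is why neutral mutations may or may not be retained.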
Recent evolutionary changes to plants & animals: about 10,000 years ago, humans shifted from a hunter-gatherer lifestyle to practicing agriculture, including the domestication of animals. Selected traits include: cows, milk production; horses, speed or strength; sheep, wool quality and quantity; poultry, more breast meat; fish, speed of maturation.
LARGE SOURCES OF HUMAN SEQUENCE VARIATION The first sequencing of the human genome was costly in both money and time. The cost of resequencing declined sharply because the first sequence could be used as a template. Resequencing reveals considerable differences between individual people.
Single nucleotide polymorphisms (SNPs): The human genome is about 3.2 billion bp long. Approximately 3 million nucleotides differ between the genomes of two individuals. Common SNPs are those found in at least about 1% of the population.
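The SNP figures above imply a simple piece of arithmetic: 3 million differences in 3.2 billion bp is roughly 0.09% of positions, or about one difference every 1,070 bp. The sketch below computes that ratio and counts single-base differences between two already-aligned sequences of equal length. The function name and the toy sequences are hypothetical, and the sketch ignores insertions and deletions.

```python
def count_differences(seq1, seq2):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


GENOME_SIZE_BP = 3.2e9  # approximate human genome size from the notes
DIFFERENCES = 3e6       # approximate differences between two people

fraction = DIFFERENCES / GENOME_SIZE_BP
bp_per_difference = GENOME_SIZE_BP / DIFFERENCES

print(f"fraction of positions that differ: {fraction:.5f}")
print(f"about one difference every {bp_per_difference:.0f} bp")
print(count_differences("ACGTACGT", "ACGTTCGA"))  # toy sequences
```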
Copy number variations (CNVs): When your DNA sequence is compared with that of the human “standard genome”, there are thousands of DNA segments, ranging from about 1,000 to several million nucleotides in length, that are present, present in multiple copies, or absent in your genome.
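One common way to detect CNVs from resequencing data is read depth: a segment present in more copies attracts proportionally more sequencing reads. The sketch below assumes a diploid genome, depth that scales linearly with copy number, and a known expected depth for a normal segment; the function names and depth values are hypothetical, and real CNV callers also correct for GC content, mappability, and noise.

```python
def estimate_copy_number(observed_depth, expected_depth, ploidy=2):
    """Estimate integer copy number of a segment from its read depth.

    Assumes depth scales linearly with copy number and that
    expected_depth is the depth of a normal (ploidy-copy) segment.
    """
    if expected_depth <= 0:
        raise ValueError("expected_depth must be positive")
    return round(ploidy * observed_depth / expected_depth)


def describe_segment(copy_number, ploidy=2):
    """Map a copy number onto the three outcomes described in the notes."""
    if copy_number == 0:
        return "absent"
    if copy_number > ploidy:
        return "present in multiple copies"
    return "present"


# Hypothetical mean read depths over three segments; normal depth is 30x.
for name, depth in {"seg_A": 30, "seg_B": 0, "seg_C": 46}.items():
    cn = estimate_copy_number(depth, 30)
    print(name, cn, describe_segment(cn))
```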
RECENT EVOLUTIONARY CHANGES TO HUMAN POPULATIONS [Map: human migration out of Africa (about 50,000 years ago) to the Middle East and Europe, including Eastern Europe (Lithuania); Neanderthals are also labelled.]
Examples of genetic changes associated with adaptation (diet and lifestyle): Skin color varies across populations (for example African, Indian, Southern European, and Northern European). Near the poles, paler skin helps the body make vitamin D; near the equator, darker skin blocks damaging UV light. Skin color is associated with sequence variation in a number of genes, one of which is SLC24A5.
Other examples (self-study): lactose intolerance; digestion of starch; malaria resistance and sickle cell anemia; life at high altitude.
DNA SEQUENCE IN DATABASES
Two types of DNA sequences are available in databases: genomic DNA and cDNA.
Genomic DNA assembly
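Genomic DNA is sequenced as many short reads that must be assembled into longer sequences. As a minimal sketch of the idea, the Python below uses a greedy overlap heuristic: repeatedly merge the two reads with the longest suffix/prefix overlap. This is a teaching simplification swapped in for illustration, not how production assemblers work (they typically use overlap or de Bruijn graphs); the reads, the `min_len` parameter, and the function names are hypothetical, and reads fully contained in other reads are not handled.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len bases somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether the rest of a from here is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCG", "GCGTGC", "TGCAAT"]))
```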
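cDNA: complementary DNA, copied from mRNA by reverse transcriptase, so it lacks the introns found in genomic DNA.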
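cDNA is copied from mRNA by reverse transcriptase, so the first DNA strand made is the reverse complement of the mRNA. The sketch below shows both the first strand and the mRNA-sense version of the cDNA (U replaced by T), which is the orientation in which sequence databases typically write such records. The function names and the toy sequence are hypothetical.

```python
# Base pairing from RNA bases to the DNA bases laid down opposite them.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First-strand cDNA written 5'->3': the reverse complement of the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def mrna_sense_cdna(mrna):
    """cDNA in the same orientation as the mRNA (U replaced by T)."""
    return mrna.upper().replace("U", "T")


toy_mrna = "AUGGCU"  # hypothetical mRNA fragment
print(first_strand_cdna(toy_mrna))
print(mrna_sense_cdna(toy_mrna))
```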
SEQUENCE ANALYSIS AND DATABASE DISPLAY The sequence of the mRNA for human Factor IX, accession number NM_000133.
Two rules are applied to describe the human Factor IX mRNA sequence: the coding region begins with ATG, and the coding region ends with one of three terminator (stop) sequences: TAA, TGA, or TAG.
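The two rules can be turned into a small search: find an ATG, then read triplets in the same frame until one of the three stop codons appears. The sketch below applies the rules naively, taking the first ATG that has an in-frame stop; real RefSeq coding-region annotations are curated and can involve more than these two rules. Coordinates are 0-based and end-exclusive (Python slicing), which differs from the 1-based coordinates shown in NCBI records. The function name and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_cds(mrna):
    """Return (start, end) of the first ATG...stop reading frame, or None.

    start is the index of the A in ATG; end is just past the stop codon.
    """
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")
    while start != -1:
        # Step through the sequence one codon (3 nt) at a time.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop: try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


print(find_cds("GGCATGAAACCCTAAGG"))  # toy mRNA-sense sequence
```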
The coding region is read in triplets (codons). The regions before and after it are the 5’ and 3’ untranslated regions (UTRs).
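Once the coding region's boundaries are known, the mRNA splits into three parts: the 5' UTR, the codons of the coding region, and the 3' UTR. The sketch below takes the boundaries as inputs (0-based, end-exclusive, as with Python slices); the function name and toy coordinates are hypothetical.

```python
def split_regions(mrna, cds_start, cds_end):
    """Split an mRNA into (5' UTR, list of codons, 3' UTR)."""
    utr5 = mrna[:cds_start]
    cds = mrna[cds_start:cds_end]
    utr3 = mrna[cds_end:]
    if len(cds) % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    # Read the coding region in non-overlapping triplets.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return utr5, codons, utr3


print(split_regions("GGCATGAAACCCTAAGG", 3, 15))
```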
Coding region triplets are translated into amino acids.
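Translation maps each codon to an amino acid using the genetic code. The sketch below builds the standard code (NCBI translation table 1) from its usual compact 64-letter string, with bases ordered T, C, A, G, and translates until the first stop codon (`*`). The function name and the `to_stop` parameter are illustrative choices; codons containing ambiguous bases such as N will raise a KeyError.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, to_stop=True):
    """Translate a coding sequence into one-letter amino acid codes."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break  # stop codon ends translation
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAACCCTAAGG"))
```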
The protein sequence of human Factor IX (461 amino acids).
Pairwise alignment: The Factor IX gene is over 38,000 nt long. A single mutation, changing a G to a T at coordinate 25531, results in hemophilia B, a severe bleeding disorder.
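A single-base change like the G-to-T mutation above is easy to model once you remember that database coordinates are 1-based while Python indexes from 0. The sketch below applies a substitution at a 1-based coordinate, checking the expected reference base first, and lists the differences between two equal-length sequences. The function names and the short toy sequence are hypothetical; they stand in for the real 38,000-nt gene.

```python
def apply_substitution(seq, position, ref, alt):
    """Return seq with the base at 1-based position changed from ref to alt."""
    if not 1 <= position <= len(seq):
        raise ValueError("position outside sequence")
    index = position - 1  # convert 1-based coordinate to 0-based index
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at {position}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


def list_substitutions(original, mutant):
    """(1-based position, original base, mutant base) for each difference."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(original, mutant), start=1)
        if a != b
    ]


toy_gene = "AAGAA"  # hypothetical stand-in for the real gene sequence
mutant = apply_substitution(toy_gene, 3, "G", "T")
print(mutant, list_substitutions(toy_gene, mutant))
```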
Alignment of human (Query) and chimpanzee (Subject/Subjct) Factor IX proteins
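Alignments like the human and chimpanzee comparison above come from BLAST, which performs local alignment and, for proteins, scores substitutions with a matrix (BLOSUM62 by default in blastp). To show the core dynamic-programming idea in a few lines, the sketch below swaps in a different technique: Needleman-Wunsch global alignment with simple, hypothetical scores (match +1, mismatch -1, linear gap -2). The peptide strings in the usage line are made up for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("MQRVNM", "MQRANM"))  # hypothetical peptides
```

Global alignment forces every residue into the alignment, which suits two full-length orthologous proteins of similar length; local alignment, as in BLAST, is better when only part of each sequence is expected to match.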
Factor IX has five major domains. The first directs the protein to the endoplasmic reticulum (ER) of liver cells, from where it is secreted into the blood, and is cleaved by signal peptidase. The second domain contains 12 Gla residues. An epidermal growth factor-like domain binds Ca++. Factor IX is activated by cleaving the protein into two peptides, and the activated protein cleaves Factor X in the clotting cascade pathway.
The entire 38000 nt gene is shown as the black arrow F9.
Location of the Factor IX gene on chromosome X.