1
Computer Applications and Bioinformatics
2
Generalised Picture of a Computer Motherboard
Labelled components (figure callout numbers 1 to 15 not recoverable from the image):
CPU Socket
DIMM (Dual In-line Memory Module)
IDE (Integrated Drive Electronics) Connector
Power Supply Connector 1
SATA Connectors
Front Panel Connectors
South Bridge
USB Connectors
PCI Express
Modem
Network
North Bridge
FireWire (IEEE 1394) / USB (Universal Serial Bus)
Mouse / Keyboard Connection
Power Supply Connector 2
3
What is bioinformatics?
Marriage between Computer Technology and Biological Data
4
Definition of bioinformatics
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and interpret biological data.
5
Broad Applications of Bioinformatics
Sequence retrieval
Sequence alignment
Structure determination
PCR primer design
Construction of phylogenetic trees
6
Today's topic of discussion
Sequence retrieval
Sequence alignment
7
Databases
Sequence Database URL: www.ncbi.nlm.nih.gov
Structural Database URL: www.rcsb.org
8
Repository URL:
9
Formats of sequence writing:
FastP: used previously for protein sequences
FastN: used previously for nucleotide sequences
FastA: the most common and recent format, used for both protein and nucleotide sequences
FASTA format
>ATCCATGCGCGATGCATGGTCATG
>gi| |HsHbA
ATCCTAGCATCCGGATATGGCATA
GenBank format
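To make the FASTA layout concrete, here is a minimal sketch of a reader for it. It assumes only the usual convention that a line starting with ">" is a header and the lines after it hold the sequence. The function name parse_fasta is hypothetical, and splitting the slide's sequence over two lines is done here purely for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Header and sequence taken from the slide; the line wrap is illustrative.
example = """>gi| |HsHbA
ATCCTAGCATCC
GGATATGGCATA
"""
records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence), sequence)
```

The wrapped lines are joined back into one sequence, so the record length is independent of how the file was line-wrapped.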
10
Sequence Alignment
SEQ 1: AATTAAT
SEQ 2: ATTTTAT
Score = 5
Score = 2
SEQ 1: ATTGCATAA
SEQ 2: ATGCTTAGC
Score = 2
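The slides do not state the scoring rule, but counting identical positions in an ungapped, end-to-end comparison gives 5 for the first pair and 2 for the second. A minimal sketch under that assumption (match_score is a hypothetical name):

```python
def match_score(seq1, seq2):
    """Count positions at which two ungapped sequences carry the same letter.

    Assumption: score = number of identical positions; mismatches cost nothing.
    """
    return sum(1 for a, b in zip(seq1, seq2) if a == b)


print(match_score("AATTAAT", "ATTTTAT"))      # first pair from the slide
print(match_score("ATTGCATAA", "ATGCTTAGC"))  # second pair from the slide
```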
11
Gap introduction (gi) and Gap Extensions (gex)
Alignment using Gaps
Without gaps:
SEQ 1: ATTGCATAA
SEQ 2: ATGCTTAG
With gaps:
SEQ 1: ATTGCATAA
SEQ 2: A-TGCTTAG
Without gaps:
SEQ 3: GCATGCGCTAA
SEQ 4: GATTAA
With gaps:
SEQ 3: GCATGCGCTAA
SEQ 4: G-AT----TAA
(gi = gap introduction, gex = gap extension)
12
However, gapped matches might not always be desirable.
Gap Penalty (Gp): Gp = gi + gex
Score(s) = S - Gp
Amino acid substitution matrices
BLOSUM (Blocks Substitution Matrix)
PAM (Point Accepted Mutation)
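The penalty formula can be tried on the gapped alignments from the previous slide. The sketch below is one possible reading, not the lecturer's stated method: it assumes S is the number of identical columns, that the first position of each gap costs gi, and that every further position of the same gap costs gex. The values gi = 2 and gex = 1 and the name gapped_score are hypothetical.

```python
def gapped_score(row1, row2, gi=2, gex=1):
    """Score a gapped alignment as (identical columns) minus (gap penalty).

    Assumptions: gi is charged once when a gap opens, gex for each
    additional position of that gap; gi=2 and gex=1 are illustrative.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    penalty = 0
    prev_gap_row = None  # which row the previous column's gap was in
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gap_row = 1 if a == "-" else 2
            # Continuing a gap in the same row is an extension.
            penalty += gex if gap_row == prev_gap_row else gi
            prev_gap_row = gap_row
        else:
            prev_gap_row = None
            if a == b:
                matches += 1
    return matches - penalty


print(gapped_score("ATTGCATAA", "A-TGCTTAG"))
print(gapped_score("GCATGCGCTAA", "G-AT----TAA"))
```

Under these assumed values the second alignment scores below zero even though it has six identical columns, which illustrates why gapped matches are not always desirable.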
15
Quantifying alignment results
P-value: The P-value of an identified similarity of score S is the probability that a score of at least S (i.e., S or greater) would have been obtained in a chance match between two unrelated sequences of similar composition and length.
E-value: For an identified similarity of score S, the E-value is the expected frequency of scores of at least S, i.e., the number of scores of at least S that would be expected to occur by chance.
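One way to get a feel for the P-value definition is to estimate it empirically: shuffle one sequence, which keeps its composition and length but destroys any real relationship, and count how often the shuffled score reaches the observed score. This is a rough sketch only. It assumes the simple identical-position score, and the names match_score and empirical_p_value are hypothetical; search tools such as BLAST use a statistical model rather than shuffling.

```python
import random


def match_score(seq1, seq2):
    """Count identical positions in an ungapped comparison (assumed score)."""
    return sum(1 for a, b in zip(seq1, seq2) if a == b)


def empirical_p_value(seq1, seq2, trials=1000, seed=0):
    """Estimate P(score >= observed) against shuffled copies of seq2."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = match_score(seq1, seq2)
    letters = list(seq2)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)  # same composition and length, random order
        if match_score(seq1, letters) >= observed:
            hits += 1
    return hits / trials


p = empirical_p_value("AATTAAT", "ATTTTAT")
print(p)
```

A small estimate suggests the observed score is unlikely to arise by chance; with sequences this short the estimate is coarse.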
16
Sequence alignment types
Pairwise alignment
Here one sequence is aligned with another sequence. Two types:
Local alignment
Global alignment
Multiple sequence alignment
Here more than two sequences are aligned with each other.
17
Global alignment and local alignment
Global alignment
SEQ 1: ATGCATGCATCGTAGC
SEQ 2: ATGCATGGCATGTAGT
The complete sequences are aligned. Short regions of similarity are not singled out. For ancestral or phylogenetic studies, global alignment is preferred.
Local alignment
SEQ 1: ATGCATGCATCGTAGC
SEQ 3: TATGTCGCATTCATCT
Only a part of each sequence is aligned. Short regions of similarity are picked out. It is not preferred for ancestral or phylogenetic studies.
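The difference between the two modes can be sketched with the standard dynamic-programming recurrences, Needleman-Wunsch for global and Smith-Waterman for local alignment. The slides do not name these algorithms, so they are swapped in here as the usual textbook choices. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions for illustration, and only the best score is computed, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best end-to-end alignment score (Needleman-Wunsch, linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # aligning a[:i] against nothing costs i gaps
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                           prev[j] + gap,     # gap in b
                           cur[j - 1] + gap)) # gap in a
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over all pairs of substrings (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            v = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(v)
            best = max(best, v)
        prev = cur
    return best


seq1 = "ATGCATGCATCGTAGC"
seq2 = "ATGCATGGCATGTAGT"
seq3 = "TATGTCGCATTCATCT"
print(global_score(seq1, seq2), local_score(seq1, seq2))
print(global_score(seq1, seq3), local_score(seq1, seq3))
```

Because the local recurrence may discard poorly matching flanks, its score is never lower than the global score for the same pair under the same scoring values.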
18
National Center for Biotechnology Information
19
Basic Local Alignment Search Tool (BLAST)
21
Next class on Multiple sequence alignment