Predict Protein Sequence by Fuzzy-Association Rules

Predict Protein Sequence by Fuzzy-Association Rules
Student Name: Shasha Luo
Instructor: Dr. Zhang
Course Name: CI (csc8710)
Georgia State University
Fall 2003

Introduction
  Bioinformatics
  Fuzzy-association rules system
  How to predict the sequence? (Algorithm)
  Prediction results
  Conclusion and future work

Bioinformatics
  Create and maintain databases of biological information.
  Analyze and interpret various types of data, including amino acid sequences, protein domains, and protein structures.
  Develop and implement tools to efficiently access and manage different types of information.

Protein Sequence Matching
  Develop methods to predict the structure and discover proteins and structural RNA sequences.
  Cluster protein sequences into families of related sequences and develop protein models.
  Align similar proteins and generate phylogenetic trees to examine evolutionary relationships.
  Find the genes in DNA sequences of various organisms.

Fuzzy-Association Rules System
  Fuzzy logic
    Mamdani fuzzy control model
    If ..., then ...
    Define membership functions, rule bases, linguistic values.
    Defuzzification
  Association rules
    Min-support value
    Min-confidence value
    Apriori algorithm

How to Predict the Type of Protein Sequences?
  Given a large protein sequence file containing 7 types of protein, divide it into two databases in which each sequence has five amino acid elements and a type element.
    Protein sequence: ABCCEFG
    Sequence type: BCEGHST
  A new fuzzy-association system
    Apply association rule mining (Apriori algorithm) to one of the databases to generate the rules for prediction.
    Rule example: 3 <- A B D (10.0%, 5.0%)
    Pass the other database through the fuzzy system as an input file to predict the type of each sequence.
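The rule-generation step can be illustrated with a minimal Python sketch. It is not the presenter's implementation: it enumerates element combinations by brute force instead of using Apriori's level-wise candidate pruning, and it assumes that support is the fraction of records containing both the elements and the type, and that confidence is that count divided by the number of records containing the elements. The function name `mine_class_rules` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(records, min_support, min_confidence, max_len=5):
    """Return rules (type, elements, support, confidence) of the form type <- elements.

    records: list of (sequence, type) pairs, e.g. ("ABDCE", 3).
    Brute-force enumeration; a real Apriori would prune infrequent candidates.
    """
    n = len(records)
    itemset_counts = Counter()  # records containing the elements
    rule_counts = Counter()     # records containing the elements AND having the type
    for seq, label in records:
        items = sorted(set(seq))  # like Apriori, ignores order and repetition
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    rules = []
    for (combo, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((label, combo, support, confidence))
    return rules


if __name__ == "__main__":
    toy = [("ABDCE", 3), ("ABDFG", 3), ("ABCEF", 1), ("GHIKL", 2)]
    for label, combo, supp, conf in mine_class_rules(toy, 0.5, 0.6):
        print(f"{label} <- {' '.join(combo)} ({supp:.1%}, {conf:.1%})")
```

The printed lines follow the slide's rule notation, with support first and confidence second.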

System Structure
  [Diagram. Labels: protein sequence to be predicted; association rules; compare; three inputs; fuzzy systems 1 to 7; outputs from fuzzy system; output: which type of protein sequence?]

Algorithm (Cont.)
  After generating the rules, input the sequences to be predicted and compare them with the rules to see whether there are matches.
  How to compare?
    Compare with the rules of one type at a time; call one such pass t.
    Repeat seven times; call the total time T.
    During t, there are various rules. We divide them into at most five categories: rules with 1, 2, 3, 4, or 5 elements, denoted type1, type2, type3, type4, type5.

Algorithm (Cont.)
  First, compare with type1, denoted C1. The other comparisons are denoted C2, C3…C7.
  If there are several matches, choose the one with the highest support and confidence values, and save it as an input to the sub-fuzzy system.
  Repeat this step three times (in our case).
    ABCDE => 3 <- ABD (10.0%, 5.0%)
  Repeat the last step seven times to get all the inputs for the fuzzy systems.
  Now we have seven different outputs from the seven fuzzy systems.
  Compare them and choose the largest output. The type with the largest output is the predicted type, because it has the largest probability.
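A rough Python sketch of this matching step follows. Several details are assumptions rather than statements from the slides: a rule "matches" when all of its elements occur in the sequence, the three inputs come from the 1-, 2-, and 3-element rule categories, ties are broken by support first and confidence second, and a category with no match contributes (0.0, 0.0). The names `best_match`, `collect_inputs`, and `predict` are hypothetical, and `score` stands in for one fuzzy subsystem.

```python
def best_match(sequence, rules, size):
    """Strongest (support, confidence) among matching rules with `size` elements.

    rules: list of (elements, support, confidence) for a single protein type.
    Returns (0.0, 0.0) when nothing matches (an assumption of this sketch).
    """
    present = set(sequence)
    matches = [
        (support, confidence)
        for elements, support, confidence in rules
        if len(elements) == size and set(elements) <= present
    ]
    # Tuple comparison: highest support wins, confidence breaks ties.
    return max(matches, default=(0.0, 0.0))


def collect_inputs(sequence, rules_by_type, sizes=(1, 2, 3)):
    """For each type, gather one (support, confidence) pair per rule category."""
    return {
        label: [best_match(sequence, rules, k) for k in sizes]
        for label, rules in rules_by_type.items()
    }


def predict(sequence, rules_by_type, score):
    """Score each type's three inputs and return the type with the largest output."""
    inputs = collect_inputs(sequence, rules_by_type)
    outputs = {label: score(triple) for label, triple in inputs.items()}
    return max(outputs, key=outputs.get)


if __name__ == "__main__":
    rules_by_type = {
        3: [(("A", "B", "D"), 10.0, 5.0)],  # 3 <- A B D (10.0%, 5.0%)
        2: [(("C", "G"), 5.7, 26.0)],       # 2 <- C G (5.7%, 26.0%)
    }
    # Stand-in scorer; in the slides this role is played by a fuzzy subsystem.
    naive_score = lambda triple: sum(s + c for s, c in triple)
    print(predict("ABCDE", rules_by_type, naive_score))
```

Here the sequence ABCDE contains A, B, and D, so the rule 3 <- A B D matches, while 2 <- C G does not because G is absent.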

How to Define the Fuzzy Logic System?
  Each subsystem has three inputs and one output.
  Each input, in the range [0, 40], is 25% * Supp + 75% * Conf.
  The output is in the range [0, 45].
  Rule bases: all subsystems use the same rule base.
    The input of C1 has 15% priority.
    The input of C2 has 25% priority.
    The input of C3 has 60% priority.
  27 rules for each fuzzy system (three linguistic values for each of the three inputs).
  The inputs and the output take the linguistic values low, mid, high.
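One way such a subsystem could look is sketched below in plain Python. Only the ranges, the 25%/75% input formula, the three priorities, the three linguistic values, and the count of 27 rules come from the slide. Everything else is an assumption made for illustration: the triangular membership functions, clipping the input at 40, min/max Mamdani inference, centroid defuzzification, and above all the rule consequents, which the slide does not list and which are derived here from the priority weights.

```python
from itertools import product

LEVELS = ("low", "mid", "high")
WEIGHTS = (0.15, 0.25, 0.60)  # priorities of C1, C2, C3 from the slide


def rule_input(support, confidence, top=40.0):
    """25% * Supp + 75% * Conf, clipped to [0, top] (clipping is an assumption)."""
    return min(max(0.25 * support + 0.75 * confidence, 0.0), top)


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def memberships(x, top):
    """Assumed partition of [0, top] into low, mid, high triangles."""
    half = top / 2
    return {
        "low": tri(x, 0.0, 0.0, half),
        "mid": tri(x, 0.0, half, top),
        "high": tri(x, half, top, top),
    }


def consequent(labels):
    """Hypothetical consequent: priority-weighted level, rounded to the nearest label."""
    level = sum(w * LEVELS.index(l) for w, l in zip(WEIGHTS, labels))
    return LEVELS[int(level + 0.5)]


# 3 inputs x 3 linguistic values = 27 rules, shared by all subsystems.
RULES = {labels: consequent(labels) for labels in product(LEVELS, repeat=3)}


def infer(inputs, in_top=40.0, out_top=45.0, steps=450):
    """Mamdani inference: min for AND, max for aggregation, centroid defuzzification."""
    fuzzified = [memberships(min(max(x, 0.0), in_top), in_top) for x in inputs]
    strength = dict.fromkeys(LEVELS, 0.0)
    for labels, out in RULES.items():
        fire = min(f[l] for f, l in zip(fuzzified, labels))
        strength[out] = max(strength[out], fire)

    num = den = 0.0
    for i in range(steps + 1):  # sample the output universe [0, out_top]
        y = out_top * i / steps
        mu_y = memberships(y, out_top)
        mu = max(min(strength[l], mu_y[l]) for l in LEVELS)
        num += y * mu
        den += mu
    return num / den if den else 0.0


if __name__ == "__main__":
    c1 = rule_input(10.0, 5.0)
    print(infer((c1, 0.0, 40.0)))
```

In the scheme described on the slides, one such subsystem would be evaluated per protein type, and the type with the largest output would be chosen.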

Prediction Results
  Use 11259 protein sequences to generate association rules.
  Use 11259 protein sequences as the input to be predicted.
  For sequence types 2, 3, 5, 6, 7: Min-support = 5% and Min-conf = 5%.
  For sequence types 1, 4: Min-support = 1% and Min-conf = 1%.

Prediction Results (Cont.)
  Association rules    Support and confidence values
  2 <- C S             (4.5%, 25.5%)
  2 <- C G             (5.7%, 26.0%)
  2 <- P V G           (4.1%, 33.3%)
  2 <- P L G           (4.1%, 31.4%)
  2 <- N L G           (4.3%, 22.8%)

Prediction Results (Cont.)

Conclusion and Future Work
  Conclusion: the result is not very accurate.
  Future work: because the Apriori algorithm does not consider the order or repetition of the data, our association rules are not very accurate. Design a compatible Apriori algorithm for the rule search.
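The limitation named above can be seen in a few lines of Python. Treating a sequence as a plain set of elements, as itemset mining does, makes two differently ordered sequences indistinguishable and drops repeated elements. The positional encoding shown next to it is only one hypothetical way to keep that information; it is not a design proposed in the slides, and both function names are made up for this sketch.

```python
def as_itemset(sequence):
    """Plain itemset view: order and repetition are lost."""
    return frozenset(sequence)


def as_positional_items(sequence):
    """Hypothetical alternative: each item is a (position, element) pair."""
    return frozenset(enumerate(sequence))


if __name__ == "__main__":
    first, second = "ABCCE", "ECBAC"
    print(as_itemset(first) == as_itemset(second))
    print(as_positional_items(first) == as_positional_items(second))
    print(len(as_itemset(first)), len(as_positional_items(first)))
```

With positional items, the repeated C in ABCCE survives as two distinct items, at the cost of a much larger item vocabulary for the rule search.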