1
Motif Space Database Design
Kiranjit Sidhu
2
Outline
  Schema Design
  Content of Database
  Functionality
  Future Plans
3
Sample PDB File
  Each PDB file represented as a text file (~60K lines)
  Inefficient for pattern matching
  Relational database required for most efficient solution
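To illustrate why a flat PDB text file is awkward to query, here is a minimal sketch that slices one fixed-width ATOM record into named fields and writes the result as CSV. The column positions follow the published PDB file format; the function names, the field names, and the sample coordinates are assumptions made for this illustration and are not taken from the presentation's Perl scripts.

```python
import csv
import io

# Hypothetical output columns; the real database schema is not shown in the slides.
FIELDS = ["serial", "atom_name", "res_name", "chain_id", "res_seq", "x", "y", "z"]

# One made-up ATOM record laid out in the fixed-width PDB column format.
SAMPLE_LINE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"


def parse_atom_line(line):
    """Slice a fixed-width ATOM record into a dict of typed fields."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


def atoms_to_csv(pdb_text):
    """Keep only ATOM records and return them as CSV text with a header row."""
    rows = [parse_atom_line(l) for l in pdb_text.splitlines() if l.startswith("ATOM  ")]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Every lookup against the raw file needs this kind of positional slicing over tens of thousands of lines, which is the cost that loading the fields into indexed tables is meant to avoid.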
4
Structure of Database
  DB divided into two major components:
    Protein Data
    Motif (Occurrence) Data
  Protein Data
    Obtained from PDB files (Protein Data Bank)
    Derived data
  Motif Data
    Obtained from Luke’s FFSM technique
    Derived data
5
Schema Design
6
Schema Design - Protein
7
Schema Design - Motif
8
Tools Used
  Obtaining Data
    Perl scripts
  Database:
    SQL Server 2000 and SQL Server 2005
    T-SQL (bulk import of data)
9
Obtaining Data
  PDB File -> (Extract) -> CSV File -> (Import) -> Temp Tables (T-SQL) -> (Convert and Derive: T-SQL Procedures) -> Final DB
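The staged flow on this slide (CSV, then temporary tables, then a convert step into the final tables) can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for SQL Server, plain INSERT statements stand in for a T-SQL bulk import, and the table names, column names, and sample rows (including the placeholder ID 1ABC) are hypothetical.

```python
import csv
import io
import sqlite3

# Made-up CSV content, as it might look after the extract step.
CSV_TEXT = (
    "pdb_id,chain_id,res_seq,res_name,x,y,z\n"
    "1ABC,A,1,MET,27.340,24.430,2.614\n"
    "1ABC,A,2,LYS,26.266,25.413,2.842\n"
)


def load_pipeline(csv_text):
    """Import CSV into an untyped staging table, then convert into a typed final table."""
    conn = sqlite3.connect(":memory:")

    # Import: every column lands as text in a temporary staging table.
    conn.execute(
        "CREATE TABLE temp_residue "
        "(pdb_id TEXT, chain_id TEXT, res_seq TEXT, res_name TEXT, x TEXT, y TEXT, z TEXT)"
    )
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
    conn.executemany("INSERT INTO temp_residue VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

    # Convert: cast the text columns to proper types in the final table.
    conn.execute(
        "CREATE TABLE residue "
        "(pdb_id TEXT, chain_id TEXT, res_seq INTEGER, res_name TEXT, "
        "x REAL, y REAL, z REAL, PRIMARY KEY (pdb_id, chain_id, res_seq))"
    )
    conn.execute(
        "INSERT INTO residue "
        "SELECT pdb_id, chain_id, CAST(res_seq AS INTEGER), res_name, "
        "CAST(x AS REAL), CAST(y AS REAL), CAST(z AS REAL) FROM temp_residue"
    )

    # The staging table is no longer needed once the final table is populated.
    conn.execute("DROP TABLE temp_residue")
    conn.commit()
    return conn
```

Keeping the import step free of type conversion means malformed values surface in the convert step, where they can be handled in one place.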
10
Uploading Protein Data
  Input dataset: ~70,000 PDB/chain combinations
  Entries in tables: e.g. approx. 800 million rows in the proteinchaindistance table
  Initial version imported 10 PDB files in 1 day
  Current version: under 3 minutes
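The row counts above are easier to picture with a small calculation. The slides do not say what the proteinchaindistance table stores, so the sketch below assumes, purely for illustration, that it holds one row per pair of residues within a chain; the function names and the toy coordinates are hypothetical.

```python
import itertools
import math


def pairwise_distances(residues):
    """Yield (res_a, res_b, distance) for every unordered pair of residues.

    residues is a list of (res_seq, (x, y, z)) tuples for a single chain.
    """
    for (a, pa), (b, pb) in itertools.combinations(residues, 2):
        yield a, b, math.dist(pa, pb)


def pair_count(n):
    """Number of unordered pairs among n residues: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Average rows per PDB/chain combination implied by the figures on the slide.
AVG_ROWS_PER_CHAIN = 800_000_000 / 70_000

# Toy chain with three residues at made-up coordinates.
TOY_CHAIN = [(1, (0.0, 0.0, 0.0)), (2, (3.0, 4.0, 0.0)), (3, (0.0, 0.0, 12.0))]
TOY_ROWS = list(pairwise_distances(TOY_CHAIN))
```

Under this assumption the pair count grows quadratically with chain length, which would be consistent with the table reaching hundreds of millions of rows: the slide's figures work out to roughly 11,400 rows per PDB/chain combination on average.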
11
Current Functionality
  Protein (PDB) data has been completely uploaded into both:
    Production Database (MotifSpace)
    Development Database (MotifSpaceDev)
  Visualize protein structure using data from the database (data available)
  Data can be obtained from the server using SOAP or web services
  Basic queries, such as: which different PDBs does a specific motif occur in?
  Histograms to compute statistics
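The two query types mentioned on this slide can be sketched against a toy table. This is not the MotifSpace schema: the motif_occurrence table, its columns, and the sample rows are hypothetical, and Python's sqlite3 module again stands in for SQL Server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a made-up motif occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE motif_occurrence (motif_id INTEGER, pdb_id TEXT, chain_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO motif_occurrence VALUES (?, ?, ?)",
        [
            (1, "1AAA", "A"),
            (1, "1AAA", "B"),
            (1, "2BBB", "A"),
            (2, "2BBB", "A"),
            (2, "3CCC", "A"),
            (3, "3CCC", "A"),
        ],
    )
    return conn


def pdbs_with_motif(conn, motif_id):
    """Return the distinct PDB entries in which the given motif occurs."""
    cur = conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence "
        "WHERE motif_id = ? ORDER BY pdb_id",
        (motif_id,),
    )
    return [row[0] for row in cur]


def occurrence_histogram(conn):
    """Map each motif to the number of distinct PDB entries that contain it."""
    cur = conn.execute(
        "SELECT motif_id, COUNT(DISTINCT pdb_id) "
        "FROM motif_occurrence GROUP BY motif_id"
    )
    return dict(cur.fetchall())
```

For example, `pdbs_with_motif(build_demo_db(), 1)` lists the entries containing motif 1, and `occurrence_histogram` gives per-motif counts that could feed a histogram plot.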
12
Demo