
1 ESTMD System -- A Web-based EST Model Database System
Yinghua Dong
MSE Presentation I, 3/1/2004

2 Outline
- Project Overview
- Requirements
- Cost Estimation
- Project Plan
- Potential Risks
- Demonstration
- References
- Acknowledgments

3 Project Overview -- Objective
Build a web-based, user-friendly Expressed Sequence Tags model database (ESTMD) system that helps biologists search expressed sequences and related information to support further decisions.

4 Project Overview -- Background
- ESTs (Expressed Sequence Tags) are partial sequences of randomly chosen cDNA, each obtained from a single DNA sequencing reaction.
- EST processing typically includes cleaning the raw sequences, assembling the cleaned sequences, and annotating and assigning functions to the unique sequences.
- Processing pipeline: Trace files --(Phred)--> Raw (clone) sequences --(Cross_match & Perl program)--> Cleaned (EST) sequences --(Cap3)--> Assembled (unique) sequences --(Blast)--> Unique sequences with hit

5 Project Overview -- Background (cont’d)
- Gene Ontology: a set of controlled vocabularies used to describe biological features within a specified domain of biological knowledge. Gene Ontology describes the molecular functions, biological processes, and cellular components of gene products.
- Pathway: the sequence of enzyme-catalyzed reactions by which an energy-yielding substance is utilized by protoplasm.

6 Project Overview -- System Architecture
Three-tier architecture:
- Client tier: responsible for presenting data and receiving user input
- Application-server tier: responsible for recording and abstracting business processes
- Data-server tier: responsible for data storage
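The separation of the three tiers can be sketched in plain Java. This is only an illustration of the layering, not the ESTMD code: the class and method names are hypothetical, and an in-memory map stands in for the database. The one data row is taken from the sample output on the "Search in Detail" slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class ThreeTierSketch {
    /** Data-server tier: storage only (hypothetical interface). */
    interface SequenceStore {
        Optional<String> findFlybaseId(String unisequenceId);
    }

    /** In-memory stand-in for the real database. */
    static class InMemoryStore implements SequenceStore {
        private final Map<String, String> rows = new LinkedHashMap<>();

        InMemoryStore() {
            rows.put("Contig1", "FBgn0004413"); // row from the sample output
        }

        @Override
        public Optional<String> findFlybaseId(String unisequenceId) {
            return Optional.ofNullable(rows.get(unisequenceId));
        }
    }

    /** Application-server tier: validates input and applies the lookup rules. */
    static class SearchService {
        private final SequenceStore store;

        SearchService(SequenceStore store) {
            this.store = store;
        }

        String lookup(String rawInput) {
            String id = rawInput == null ? "" : rawInput.trim();
            if (id.isEmpty()) {
                return "error: empty sequence ID";
            }
            return store.findFlybaseId(id)
                    .map(flybaseId -> id + " -> " + flybaseId)
                    .orElse(id + " -> no hit");
        }
    }

    /** Client tier: presentation only. */
    static String render(String result) {
        return "<p>" + result + "</p>";
    }

    public static void main(String[] args) {
        SearchService service = new SearchService(new InMemoryStore());
        System.out.println(render(service.lookup(" Contig1 ")));
    }
}
```

Because the service depends only on the `SequenceStore` interface, the in-memory store could be swapped for a JDBC-backed one without touching the other two tiers.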

7 Project Overview -- Technologies and Tools
- HTML with JavaScript will be used to build the client interfaces
- Java Servlets, JSP (JavaServer Pages), and JDBC will be used on the server side
- XML and XSLT will be used to describe and present the Gene Ontology tree structure
- MySQL 4.0 is the chosen database management system
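As a rough illustration of the XML/XSLT item, the sketch below describes a tiny ontology tree in XML and renders it as indented text with an XSLT stylesheet, using the standard `javax.xml.transform` API. The XML layout (nested `term` elements with `id` and `name` attributes) and the class name are assumptions made for this example; the slides do not show the format ESTMD actually uses. Only the three top-level Gene Ontology terms are included.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class GoTreeXsltSketch {
    // Hypothetical XML layout: nested <term> elements, one per ontology node.
    static final String GO_XML =
          "<term name=\"Gene Ontology\">"
        + "<term id=\"GO:0003674\" name=\"molecular_function\"/>"
        + "<term id=\"GO:0008150\" name=\"biological_process\"/>"
        + "<term id=\"GO:0005575\" name=\"cellular_component\"/>"
        + "</term>";

    // Prints each term on its own line, indented two spaces per ancestor term.
    static final String TREE_XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"term\">"
        + "<xsl:for-each select=\"ancestor::term\"><xsl:text>  </xsl:text></xsl:for-each>"
        + "<xsl:value-of select=\"@name\"/>"
        + "<xsl:if test=\"@id\"><xsl:text> (</xsl:text>"
        + "<xsl:value-of select=\"@id\"/><xsl:text>)</xsl:text></xsl:if>"
        + "<xsl:text>&#10;</xsl:text>"
        + "<xsl:apply-templates select=\"term\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the rendered text. */
    static String renderTree(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(renderTree(GO_XML, TREE_XSLT));
    }
}
```

Replacing the text output with nested HTML lists would only require a different stylesheet; the XML description of the tree stays the same.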

8 Project Overview -- Technologies and Tools (cont’d)
- JBuilder Enterprise 9 is used as the development tool
- Rational Rose is used to create UML models
- MS Project is used for the project plan
- Verification and validation software (such as Alloy, USE, or SPIN) will be used for formal requirements specification

9 Project Overview -- E-R Model

10 Requirements

11 Requirements (cont’d)
Search in Detail
- Users search for detailed information by gene name or symbol, sequence ID, FlyBase ID, or GenBank ID
- Users can choose which fields are shown in the result
- The output format is HTML or text

A sample output:
unisequenceID: Contig1
uniSeq:
CGCGGCCGCGTCGACGAGATTCGGAGGTTAG
AAACATGACTCGCAAACGCCGTAATGGAGGA
CGGGCTAAGCACGGCCGTGGCCACGTTAAGG
CGGTGAGATGCACCAACTGCGCGCGTTGCGT
GCCTAAGGACAAAGCTATCAAAAAGTTCGTG
ATCAGGAATATTGTCGAAGCGGCTGCCGTCA
GGGATATCAACGAAGCTTCCGTATATGCATC
ATTCCAGCTGCCGAAGCTGTATGCAAAGCTC
CACTACTGCGTCTCCTGCGCCATCCACAGCA
AAGTTGTGCGCAACAGGTCTAAGAAGGACAG
GAGAATCCGCACACCACCCAAGAGCACCTTC
CCCAGGGACATGCAGCGCCCACAGAATGTGC
AAAGGAAGTGAAGTGATTTACAATAAATTTT
AAGAAAACCC
flybaseID: FBgn0004413
evalue: 2.00E-49
hitLength: 114
bitScore: 190
identity: 93/115

12 Requirements (cont’d)
Search by Keyword
- Users search the sequences at each stage by keyword
- The output includes sequence ID, length (with a link to the sequence), gene name, symbol, and a link to the contig view image

A sample output:
| cloneID | Raw Length | Cleaned Length | Unisequence ID | Unisequence Length | Gene Name | symbol | Contig View |
|---|---|---|---|---|---|---|---|
| pb42ad-1_001_a07.pb42primer | 876 | 409 | Contig1 | 413 | Ribosomal protein S26 | RpS26 | View (link) |
| pb42ad-1_001_f07.pb42primer | 886 | 205 | Contig1 | 413 | Ribosomal protein S26 | RpS26 | View (link) |
| pyes2-ct_012_c12.p1ca | 291 | 286 | Contig1 | 413 | Ribosomal protein S26 | RpS26 | View (link) |
| pyes2-ct_034_h06.p1ca | 803 | 398 | Contig1 | 413 | Ribosomal protein S26 | RpS26 | View (link) |

13 Requirements (cont’d)
Gene Ontology Search
- Users search gene ontology information by gene names, symbols, IDs, or a text file
- The output is a table including GO ID, term, type, sequence ID, hit ID, and gene symbol
- The hyperlinks on terms show the gene ontology tree structure

A sample output:
| GO ID | Term | Type | Sequence ID | Hit ID | Gene Symbol |
|---|---|---|---|---|---|
| GO:0006412 | protein biosynthesis | Biological_process | Contig1 | FBgn0004413 | RpS26 |
| GO:0005843 | cytosolic small ribosomal subunit (sensu Eukarya) | Cellular_component | Contig1 | FBgn0004413 | RpS26 |
| GO:0005840 | ribosome | Cellular_component | Contig1 | FBgn0004413 | RpS26 |
| GO:0003735 | structural constituent of ribosome | Molecular_function | Contig1 | FBgn0004413 | RpS26 |

14 Requirements (cont’d)
Gene Ontology Classification
- Users input a batch of gene names or symbols, or a local text file containing sequence IDs
- Users can choose the gene ontology types they want to classify
- The output is a table including gene ontology type, subtype, sequence count, and percentage of sequences

A sample output:
| type | subtype | sequence_count | % |
|---|---|---|---|
| Cellular_component | cell | 3 | 75% |
| Biological_process | Cell growth and/or maintenance | 3 | 75% |
| Molecular_function | enzyme | 1 | 25% |
| Molecular_function | Protein tagging | 1 | 25% |
| Molecular_function | Structural molecule | 3 | 75% |
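The counts and percentages in such a table can be computed roughly as follows. This is a sketch under stated assumptions, not the ESTMD implementation: the percentage is taken as the share of all submitted sequences, which is consistent with the sample (3 sequences = 75% implies 4 in total), and the sequence IDs `seq1` to `seq4` and their annotations are made up so that the result reproduces the sample rows.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class GoClassificationSketch {
    /**
     * Maps each "type/subtype" key to "count (percent%)", where count is the
     * number of distinct sequences annotated with that category and percent is
     * relative to the total number of input sequences (an assumption).
     */
    static Map<String, String> classify(Map<String, List<String>> annotations) {
        Map<String, Set<String>> sequencesByCategory = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : annotations.entrySet()) {
            for (String category : entry.getValue()) {
                sequencesByCategory
                        .computeIfAbsent(category, key -> new HashSet<>())
                        .add(entry.getKey());
            }
        }
        int total = annotations.size();
        Map<String, String> report = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : sequencesByCategory.entrySet()) {
            int count = entry.getValue().size();
            long percent = Math.round(100.0 * count / total);
            report.put(entry.getKey(), count + " (" + percent + "%)");
        }
        return report;
    }

    /** Hypothetical input: four sequences with made-up annotations. */
    static Map<String, List<String>> sample() {
        List<String> common = Arrays.asList(
                "Cellular_component/cell",
                "Biological_process/Cell growth and/or maintenance",
                "Molecular_function/Structural molecule");
        Map<String, List<String>> annotations = new LinkedHashMap<>();
        annotations.put("seq1", common);
        annotations.put("seq2", common);
        annotations.put("seq3", common);
        annotations.put("seq4", Arrays.asList(
                "Molecular_function/enzyme",
                "Molecular_function/Protein tagging"));
        return annotations;
    }

    public static void main(String[] args) {
        classify(sample()).forEach((category, summary) ->
                System.out.println(category + ": " + summary));
    }
}
```

Using a set per category means a sequence that hits the same subtype through several GO terms is counted once.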

15 Cost Estimation
The effort of the project is estimated using:
- Function Point Analysis (FPA)
- COCOMO II model

16 Cost Estimation -- Function Point Analysis
Unadjusted Function Points
| Function Type | Simple Amount | Simple Weight | Average Amount | Average Weight | Complex Amount | Complex Weight | Total UFP |
|---|---|---|---|---|---|---|---|
| Inputs | 7 | 3 | 0 | 4 | 0 | 6 | 21 |
| Outputs | 2 | 4 | 5 | 5 | 0 | 7 | 33 |
| Inquiries | 11 | 3 | 0 | 4 | 2 | 6 | 43 |
| Files | 0 | 7 | 3 | 10 | 0 | 15 | 30 |
| Interfaces | 1 | 5 | 1 | 7 | 0 | 10 | 12 |
| Total UFP | | | | | | | 138 |
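Each row total in the table is the sum of amount × weight over the three complexity levels. The sketch below recomputes the rows from the counts as they appear in this transcript; the class and method names are hypothetical. With these counts the Inquiries row works out to 45 rather than 43 and the grand total to 141 rather than 138, so one of the transcribed figures is probably off. The later slides use 138.

```java
class UnadjustedFunctionPoints {
    static final String[] TYPES =
            {"Inputs", "Outputs", "Inquiries", "Files", "Interfaces"};

    // Amounts per function type; columns are simple, average, complex.
    static final int[][] COUNTS = {
            {7, 0, 0}, {2, 5, 0}, {11, 0, 2}, {0, 3, 0}, {1, 1, 0}};

    // Weights per function type; columns are simple, average, complex.
    static final int[][] WEIGHTS = {
            {3, 4, 6}, {4, 5, 7}, {3, 4, 6}, {7, 10, 15}, {5, 7, 10}};

    /** Sum of amount x weight across the three complexity levels of one row. */
    static int rowTotal(int row) {
        int sum = 0;
        for (int level = 0; level < 3; level++) {
            sum += COUNTS[row][level] * WEIGHTS[row][level];
        }
        return sum;
    }

    /** Sum of all row totals. */
    static int total() {
        int sum = 0;
        for (int row = 0; row < COUNTS.length; row++) {
            sum += rowTotal(row);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int row = 0; row < TYPES.length; row++) {
            System.out.println(TYPES[row] + ": " + rowTotal(row));
        }
        System.out.println("Total UFP: " + total());
    }
}
```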

17 Cost Estimation -- Function Point Analysis (cont’d)
- Total Unadjusted Function Points (UFP) = 138
- Product Complexity Adjustment (PC) = 0.65 + (0.01 × 40) = 1.05
- Total Adjusted Function Points (FP) = UFP × PC = 144.9
- Language Factor (LF) for Java is assumed to be 35
- Source Lines of Code (SLOC) = FP × LF = 5071.5
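The chain from UFP to SLOC is simple arithmetic and can be checked with a few lines of Java. The method and parameter names below are hypothetical; the inputs (138, 40, and 35) are the values given on the slide.

```java
import java.util.Locale;

class FunctionPointEstimate {
    /** PC = 0.65 + 0.01 x (complexity rating sum, 40 on the slide). */
    static double complexityAdjustment(int ratingSum) {
        return 0.65 + 0.01 * ratingSum;
    }

    /** FP = UFP x PC. */
    static double adjustedFunctionPoints(int ufp, int ratingSum) {
        return ufp * complexityAdjustment(ratingSum);
    }

    /** SLOC = FP x LF, where LF is the assumed lines of code per function point. */
    static double sourceLinesOfCode(double fp, int languageFactor) {
        return fp * languageFactor;
    }

    public static void main(String[] args) {
        double pc = complexityAdjustment(40);
        double fp = adjustedFunctionPoints(138, 40);
        double sloc = sourceLinesOfCode(fp, 35);
        System.out.println(String.format(Locale.ROOT,
                "PC = %.2f, FP = %.1f, SLOC = %.1f", pc, fp, sloc));
    }
}
```

Since SLOC scales linearly with the language factor, the assumed value of 35 for Java directly determines the size estimate that feeds the effort calculation.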

18 Cost Estimation -- COCOMO II
For application programs:
- Thousands of Delivered Source Instructions (KDSI) = 5.0715
- Programmer Effort (PM) = 2.4 × (KDSI)^1.05 = 13.2 person-months
- Development Time (TDEV) = 2.5 × (PM)^0.38 = 6.66 months
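The two formulas can be evaluated as below, using the slide's coefficients and the KDSI value of 5.0715. The class and method names are hypothetical. As a side note, the coefficients 2.4, 1.05, 2.5, and 0.38 are the same ones used by the organic mode of the basic COCOMO model.

```java
import java.util.Locale;

class CocomoEstimate {
    /** Effort in person-months: PM = 2.4 x KDSI^1.05. */
    static double effortPersonMonths(double kdsi) {
        return 2.4 * Math.pow(kdsi, 1.05);
    }

    /** Development time in months: TDEV = 2.5 x PM^0.38. */
    static double developmentMonths(double personMonths) {
        return 2.5 * Math.pow(personMonths, 0.38);
    }

    public static void main(String[] args) {
        double kdsi = 5071.5 / 1000.0; // SLOC from the function point analysis
        double pm = effortPersonMonths(kdsi);
        double tdev = developmentMonths(pm);
        System.out.println(String.format(Locale.ROOT,
                "PM = %.1f person-months, TDEV = %.2f months", pm, tdev));
    }
}
```

The results round to the 13.2 person-months and 6.66 months shown on the slide.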

19 Project Plan
- Phase I: Requirement (1/12/04 ~ 3/1/04)
- Phase II: Design (2/23/04 ~ 4/23/04)
- Phase III: Implementation and Test (4/26/04 ~ 7/30/04)

20 Project Plan (cont’d)

21 Potential Risks
- The requirements may change continually
- Some biology knowledge is needed
- Some new technologies, such as XML and XSLT, need to be learned

22 Demonstration
http://129.130.115.72:8080/estmd/index.html

23 References
- IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications, IEEE, 1998
- IEEE Std 730-1998, IEEE Standard for Software Quality Assurance Plans
- Walker Royce, Software Project Management -- A Unified Framework, 1998
- Marty Hall, Core Servlets and JavaServer Pages, 2000
- Roger S. Pressman, Software Engineering: A Practitioner’s Approach, 5th Edition
- Dr. Gustafson, CIS 540 lecture
- http://sunset.usc.edu/research/COCOMOII/index.html

24 Acknowledgments
Committee:
- Dr. Mitchell L. Neilsen
- Dr. Gurdip Singh
- Dr. Daniel Andresen

25 Suggestions and Comments
Thank You!

