Aquatic Bacteria Database Or: Using an established knowledge base to assist the Institute of Aquaculture’s diagnostic services Aquatic Bacteria Database.

Slides:



Advertisements
Similar presentations
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Advertisements

Data Transfer Data Import Data Export Database Backup and Restore Uninstalling and re-installing LIMS.
Use Mobile Guidebook to Evaluate this Session – M1.5 Allowing Students to Update Their Program of Study Online.
Overview of IS Controls, Auditing, and Security Fall 2005.
Data preprocessing before classification In Kennedy et al.: “Solving data mining problems”
Programming with Alice Computing Institute for K-12 Teachers Summer 2011 Workshop.
Chapter 2: Algorithm Discovery and Design
THE BRIEF PSYCHIATRIC RATING SCALE SYSTEM Senior Project by John Newman.
Academic Advisor: Prof. Ronen Brafman Team Members: Ran Isenberg Mirit Markovich Noa Aharon Alon Furman.
Input Validation For Free Text Fields ADD Project Members: Hagar Offer & Ran Mor Academic Advisor: Dr Gera Weiss Technical Advisors: Raffi Lipkin & Nadav.
Prototyping. CS351 - Software Engineering (AY2004)2 Scenario Customer: “We would like the word processor to check the spelling of what is typed in. We.
1 Software Requirements Specification Lecture 14.
School of Computer ScienceG53FSP Formal Specification1 Dr. Rong Qu Introduction to Formal Specification
Establishing Table Structures Chapter 7 Database Design for Mere Mortals.
Chapter 2: Algorithm Discovery and Design
Systems Analysis I Data Flow Diagrams
Web Searching. Web Search Engine A web search engine is designed to search for information on the World Wide Web and FTP servers The search results are.
Troy Eversen | 19 May 2015 Data Integrity Workshop.
Process Modeling SYSTEMS ANALYSIS AND DESIGN, 6 TH EDITION DENNIS, WIXOM, AND ROTH © 2015 JOHN WILEY & SONS. ALL RIGHTS RESERVED. 1 Roberta M. Roth.
1 CSI5388 Data Sets: Running Proper Comparative Studies with Large Data Repositories [Based on Salzberg, S.L., 1997 “On Comparing Classifiers: Pitfalls.
Software Development Unit 2 Databases What is a database? A collection of data organised in a manner that allows access, retrieval and use of that data.
1 Wireless Warehouse Management System Compsee’s M.A.T. Mobile Application Terminal.
OCLC Online Computer Library Center A Global OpenURL Resolver Registry Phil Norman OCLC Dlsr4lib Workshop March 23 rd, 2006 Arlington VA.
Objectives of the Lecture :
Today’s Lecture application controls audit methodology.
Dataface API Essentials Steve Hannah Web Lite Solutions Corp.
Component-Based Software Engineering Components and Interfaces Paul Krause.
2006 Palisade User ConferenceNovember 14 th, 2006 Inventory Optimization of Seasonal Products with.
RESEARCH A systematic quest for undiscovered truth A way of thinking
SWIS Digital Inspections Project (SWIS DIP) Chris Allen, Information Management Branch California Integrated Waste Management Board November 5, 2008 The.
Vancouver User Group Presented by Trevor Moore
1 Validation & Verification Chapter VALIDATION & VERIFICATION Very Difficult Very Important Conceptually distinct, but performed simultaneously.
The University of Akron Dept of Business Technology Computer Information Systems DBMS Functions 2440: 180 Database Concepts Instructor: Enoch E. Damson.
The GAVO Cross-Matcher Application Hans-Martin Adorf, Gerard Lemson, Wolfgang Voges GAVO, Max-Planck-Institut für extraterrestrische Physik, Garching b.
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Generic Approaches to Model Validation Presented at Growth Model User’s Group August 10, 2005 David K. Walters.
Disclosure risk when responding to queries with deterministic guarantees Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University.
SYSTEM TESTING AND DEPLOYMENT CHAPTER 8. Chapter 8: System Testing and Deployment 2 KNOWLEDGE CAPTURE (Creation) KNOWLEDGE TRANSFER KNOWLEDGE SHARING.
1 15 quality goals for requirements  Justified  Correct  Complete  Consistent  Unambiguous  Feasible  Abstract  Traceable  Delimited  Interfaced.
Database Application Design and Data Integrity AIMS 3710 R. Nakatsu.
Branches and Program Design
 So far in ICT we’ve covered how data is entered into computers (data capture) and how it’s checked (validation and verification).  In this section.
1 Biometric Databases. 2 Overview Problems associated with Biometric databases Some practical solutions Some existing DBMS.
Soft Computing Lecture 19 Part 2 Hybrid Intelligent Systems.
Chapter 2 Database System Concepts and Architecture Dr. Bernard Chen Ph.D. University of Central Arkansas.
CROSS-VALIDATION AND MODEL SELECTION Many Slides are from: Dr. Thomas Jensen -Expedia.com and Prof. Olga Veksler - CS Learning and Computer Vision.
A guide to... Safe Systems of Work.
Software Engineering Requirements + Specifications.
Safe Systems of Work. Legislation w HSWA Section 2 (2) (a): Provide and maintain plant and systems of work that are, so far as is reasonably practicable,
Integration integration of all the information flowing through a company – financial and accounting, human resource information, supply chain information,
Software Requirements Specification Document (SRS)
8.1 8 Algorithms Foundations of Computer Science  Cengage Learning.
FatMax Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 LicenseCreative Commons Attribution-NonCommercial-ShareAlike 2.5.
Chapter 11 Data Input and Output. Input Data Capture Forms Data can be collected using a data capture form or questionnaire that is printed on a piece.
Session 6: Data Flow, Data Management, and Data Quality.
Reviewing, Revising and Writing Effective Constructed- Response Mathematics Items 1 Writing and scoring test questions based on Ohio’s Academic Content.
© Copyright  People at Work Project - Overview  People at Work Project - Theoretical Underpinnings  People at.
Dillon: CSE470: ANALYSIS1 Requirements l Specify functionality »model objects and resources »model behavior l Specify data interfaces »type, quantity,
Classifications of Software Requirements
Lecture 7 Constructing hypotheses
System Design Ashima Wadhwa.
By Dr. Abdulrahman H. Altalhi
Unit 27 - Web Server Scripting
TPP Standard PCS Requirements
Rank Aggregation.
Coding Concepts (Basics)
6.1 Quality improvement Regional Course on
Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.
The ultimate in data organization
Presentation transcript:

Aquatic Bacteria Database Or: Using an established knowledge base to assist the Institute of Aquaculture’s diagnostic services Aquatic Bacteria Database - Frank O'Hanlon

The Institute & Diagnosis IoA maintains large database of bacteriological bio- chemical test results. Academics offer confidential consultancy worldwide to identify potential/specific bacteria given an organisation’s test results. Correct ID to inform health & safety concerns, e.g. disease-causing bacteria, hence diagnosis. Presently diagnosis is conducted manually(!). Aquatic Bacteria Database - Frank O'Hanlon

Test Results to Diagnosis: Examples StrainTest 1Test 2Test 3Test 4Test 5Test 6Test 7 A B C SampleTest 1Test 2Test 3Test 4Test 5Test 6Test 7 C1+++???? C2++???-- C3?+--+?? Knowledge Base Clients’ Samples Samples supplied are incomplete: Clients constrained by their own resources. We evaluate each sample in turn... Aquatic Bacteria Database - Frank O'Hanlon

Exact Match: Strain A StrainTest 1Test 2Test 3Test 4Test 5Test 6Test 7 A B C SampleTest 1Test 2Test 3Test 4Test 5Test 6Test 7 C1+++???? C2++???-- C3?+--+?? A precise, singular match! Aquatic Bacteria Database - Frank O'Hanlon Knowledge Base Clients’ Samples

Multiple Matches: Strains A or B StrainTest 1Test 2Test 3Test 4Test 5Test 6Test 7 A B C SampleTest 1Test 2Test 3Test 4Test 5Test 6Test 7 C1+++???? C2++???-- C3?+--+?? Supplied samples eliminate w/in KB, but not to singular results. Aquatic Bacteria Database - Frank O'Hanlon Knowledge Base Clients’ Samples

No Matches StrainTest 1Test 2Test 3Test 4Test 5Test 6Test 7 A B C SampleTest 1Test 2Test 3Test 4Test 5Test 6Test 7 C1+++???? C2++???-- C3?+--+?? Though some tests correspond, others do not, clearly no exact match! Aquatic Bacteria Database - Frank O'Hanlon ? Knowledge Base Clients’ Samples

Implementation Functions to include: I.Search/diagnose II.Maintain/add to KB III.Useable GUI (likely via Java/Python) IV.Admin & Inquirer user types V.Advise further tests to narrow to singular result VI.Vary ‘threshold of accuracy’ of diagnosis Aquatic Bacteria Database - Frank O'Hanlon

Goals & Extension Measures of Success: 1.It produces correct diagnosis 2.Product functionality (of I.-VI. noted). Extension: – Web pages w/PHP script to interface to DB/KB. – Case-based learning approach. Aquatic Bacteria Database - Frank O'Hanlon

Considerations on Diagnosis Diagnosis only as good as tests: Difficulties in progress: – Coherency of KB: Is it accurate to the biology? – Advising: not all ‘no matches’ are ‘new strains’ in no-match case. – Capturing expert-like insight in complex cases. How to recommend further tests, how to integrate cost/efficacy. Embedding significance of certain tests. – It is known ~6 particular tests are usually enough to ID Genus alone. – Effectively Boolean test results: Revisions to bio-chemical test specifications could effect/invalidate KB. Ideal would be to take raw numeric values and permit tuning of test conditions in the product. MORE = MERRIER Aquatic Bacteria Database - Frank O'Hanlon

ANY QUESTIONS? Image: Aquatic Bacteria Database - Frank O'Hanlon

Appendix A: Diagnosis Pseudocode Client tests input I[t] to be checked against knowledge base KB[r,t] Create M which will hold matched results ForAll t { IF isMatch(I, KB[r]) THEN add KB[r] to M } return M Def N : isMatch(I, KB[r]) = { True: forall t. IF ( (I[t] == KB[r,t]) OR (I[t] = '?') ) False: Otherwise } IF (M is “No matches found for your test results” ELSE IF (M is only one “Unique match found. Here are the details...” ELSE IF (M is “Multiple possible matches found.” FOR each entry in “ ” Aquatic Bacteria Database - Frank O'Hanlon t:Test r:‘Row’: genus-species-strain ?:Not supplied KB[r]:Strain KB[r,t]:Test result of t, strain r I[t]:Sample test result M:Holds all KB(r) which match

Appendix B: Assumptions i.The KB aspires to completeness: all entries have full, unambiguous test results. All new entries must obey this. i.This can be altered. It is possible to accommodate incomplete entries, to work with partial information. Awaiting confirmation from Dr Crumlish on the generality of such assumptions ii.Strains are coherent: a strain’s profile cannot be duplicated by another distinct strain. iii.Strains have one profile: the same strain cannot yield two distinct profiles. iv.Diagnosis is conducted one sample at a time. i.Again, it is possible to imagine methods of input (e.g. via correctly prepared standardised spreadsheets) which can be accessed easily allowing users to diagnose more than one sample at one time. (Harness the power of computing!) v.A match is a valid if all known and specified tests of the input tests are correctly identified to strain(s) in the KB. i.We can conceive of this being scalable: a user may elect for, say, ‘only 80% of inputs’ to qualify for the diagnosis, given some reason for the uncertainty. Moreover, output could be a scale of ‘most accurate matches’ down to a lower threshold (even as far as simply ordering the KB with ‘most accurate matches’ given first.) Even more, it could be imagine a client could ask “Can we be sure >sample subset of strains subset of strains<”. Aquatic Bacteria Database - Frank O'Hanlon