Open Source Solutions for Tissue Banking Informatics Jules J. Berman, Ph.D., M.D. INFORMATICS FOR REPOSITORIES Wednesday, May 21, 2008 3:30 pm – 4:05 pm.


1 Open Source Solutions for Tissue Banking Informatics Jules J. Berman, Ph.D., M.D. INFORMATICS FOR REPOSITORIES Wednesday, May 21, 2008 3:30 pm – 4:05 pm

2 Approaches to finding open source solutions 1. Generalize (don't specialize). Wherever possible, don't think of your tissue repository problems as unique. Try to think of them as instances of very general informatics problems. In most cases, the same open source solutions that work for bioinformaticians, astronomers, and factory inventories will likely work for you.

3 Approaches to finding open source solutions 2. Learn a popular open source programming language that is easy to learn and is supported by an enthusiastic biomedical community: Perl, Python, or Ruby.

4 Approaches to finding open source solutions 3. Use open source, unencumbered nomenclatures, codes, and syntactic formats. Otherwise, you can't share or post your data on the web. MeSH (standard, open source, free); UMLS (standard, encumbered); SNOMED (standard, encumbered); Neoplasm Classification (non-standard, open source, free, standard syntax: XML, RDF). http://www.julesberman.info/

5 Approaches to finding open source solutions 4. Use an open source and general data syntax: HTML (formatting and linking), XML (describing data), RDF (getting meaning from described data).

7 All data can be specified using RDF, developed by the W3C. RDF files are collections of statements expressed as data triples: “Jules Berman” “blood glucose level” “85”; “Mary Smith” “eye color” “brown”; “Samuel Rice” “eye color” “blue”; “Jules Berman” “eye color” “brown”. When you bind a key/value pair to a specified object, you're moving from the realm of data structure (i.e., XML) into the realm of data meaning.
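The triples on this slide can be sketched in a few lines of Python. This is only an illustration using plain tuples, not an RDF library or a real RDF serialization; the function names `describe` and `subjects_with` are hypothetical.

```python
# Each statement is a (subject, key, value) triple, copied from the slide.
triples = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "eye color", "brown"),
    ("Samuel Rice", "eye color", "blue"),
    ("Jules Berman", "eye color", "brown"),
]

def describe(subject, statements):
    """Collect every key/value pair bound to one subject.

    If a subject has the same key twice, the later value wins.
    """
    return {key: value for subj, key, value in statements if subj == subject}

def subjects_with(key, value, statements):
    """List, in sorted order, the subjects that carry a given key/value pair."""
    return sorted(subj for subj, k, v in statements if k == key and v == value)

jules = describe("Jules Berman", triples)
brown_eyed = subjects_with("eye color", "brown", triples)
```

Because every key/value pair stays attached to its subject, a question such as "who has brown eyes?" can be answered without knowing anything about the layout of the file the statements came from.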

8 RDF permits data to be merged between different files. Medical file: “Jules Berman” “blood glucose level” “85”; “Mary Smith” “eye color” “brown”; “Samuel Rice” “eye color” “blue”; “Jules Berman” “eye color” “brown”. Hat file: “Sally Frann” “hat size” “8”; “Jules Berman” “hat size” “9”; “Fred Garfield” “hat size” “9”; “Fred Garfield” “hat_type” “bowler”. Merged Jules Berman database: “Jules Berman” “blood glucose level” “85”; “Jules Berman” “eye color” “brown”; “Jules Berman” “hat size” “9”.
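A minimal sketch of the merge on this slide, assuming each file has already been read into a list of (subject, key, value) tuples. The function name `merge_for_subject` is hypothetical; a real RDF workflow would identify subjects by URI rather than by a name string.

```python
# The two source files from the slide, as lists of (subject, key, value) triples.
medical_file = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "eye color", "brown"),
    ("Samuel Rice", "eye color", "blue"),
    ("Jules Berman", "eye color", "brown"),
]
hat_file = [
    ("Sally Frann", "hat size", "8"),
    ("Jules Berman", "hat size", "9"),
    ("Fred Garfield", "hat size", "9"),
    ("Fred Garfield", "hat_type", "bowler"),
]

def merge_for_subject(subject, *files):
    """Gather every triple about one subject from any number of files,
    keeping first-seen order and dropping exact duplicates."""
    merged = []
    for statements in files:
        for triple in statements:
            if triple[0] == subject and triple not in merged:
                merged.append(triple)
    return merged

merged_jules = merge_for_subject("Jules Berman", medical_file, hat_file)
for subject, key, value in merged_jules:
    print(subject, "|", key, "|", value)
```

The merge needs no knowledge of what either file is "about"; it works only because each statement names its own subject.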

10 Approaches... 5. Use open source utilities, not software applications (open source or otherwise). Utilities are simple programs that do one type of job very well. They often work from the command line (i.e., no GUI). Once you've mastered a dozen or so utilities, you can handle most informatics tasks that you'll come across. Applications are often complex and seldom provide the functionality you need (now or in the future).

11 Approaches... 6. Learn the algorithms for your discipline. Algorithms are process descriptions that work every time. Most informatics algorithms can be implemented in under ten lines of software code. You can think of software applications as many algorithms working under a GUI. If you really understand algorithms, you can make important contributions to your field.

12 Approaches... 7. De-emphasize standards. Most standards are difficult to understand, and there are many of them, often covering obscure domains. Many standards are just bad. Data kept in a standard today may be non-standard legacy data tomorrow. Unlike physical standards, data standards are transformable (so why fuss over any one standard?). Standards can be encumbered.

14 Specifications are often a better solution than standards. Specifications are just descriptions of your data. A specification requires a common language for describing data (so that you and your computer can understand what it's trying to convey). Specifications give you enormous freedom to create and describe new and unconventional data objects. This is usually done in RDF. If you've specified your data well, you can port between standards when you need to.
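The last point, porting well-specified data into whatever format is required, can be sketched as follows. Everything here is a hypothetical illustration: the specimen record is invented, and the JSON and XML layouts are stand-ins for target formats, not real standards.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical specimen record, specified as (subject, key, value) triples.
specimen = [
    ("specimen-001", "tissue", "colon"),
    ("specimen-001", "diagnosis", "adenocarcinoma"),
]

def to_json(statements):
    """Port the triples to a JSON object keyed by subject."""
    record = {}
    for subject, key, value in statements:
        record.setdefault(subject, {})[key] = value
    return json.dumps(record, sort_keys=True)

def to_xml(statements):
    """Port the same triples to a flat XML document."""
    root = ET.Element("statements")
    for subject, key, value in statements:
        node = ET.SubElement(root, "statement", subject=subject, key=key)
        node.text = value
    return ET.tostring(root, encoding="unicode")

as_json = to_json(specimen)
as_xml = to_xml(specimen)
```

Both exporters read the same list of statements, so adding another target format means writing one more small function rather than re-entering the data.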

15 Example: Pathology image annotation

16 Important descriptors of an image might include: file information, image capture information, image format information, specimen information, patient information, pathology information, and region of interest information.

17 JPEG is an image format that is used by millions of people in all types of professions, including the medical profession. JPEG can now be used without worrying about IP issues. You can put any information you want into the header of a JPEG image (including an RDF document) so that specified clinical/pathological information can be conveyed with the image. Because images are non-physical, it is usually easy to interconvert image formats.
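One way to place text in a JPEG header is a comment (COM) segment. The sketch below, using only the Python standard library, is an assumption about how this could be done and is not the method described in the talk. It inserts one comment segment after any leading APPn segments and reads comments back. The function names are hypothetical, and the code assumes a well-formed header without fill bytes.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG stream
COM = b"\xff\xfe"  # marker for a comment segment

def _skip_app_segments(data):
    """Return the offset just past any APPn segments that follow SOI."""
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF and 0xE0 <= data[pos + 1] <= 0xEF:
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length  # 2 marker bytes plus the length-prefixed body
    return pos

def add_comment(jpeg_bytes, text):
    """Return a copy of the JPEG with `text` stored in a COM segment."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = text.encode("utf-8")
    if len(payload) > 65533:  # the 2-byte length field counts itself
        raise ValueError("comment too long for one segment")
    segment = COM + struct.pack(">H", len(payload) + 2) + payload
    pos = _skip_app_segments(jpeg_bytes)
    return jpeg_bytes[:pos] + segment + jpeg_bytes[pos:]

def read_comments(jpeg_bytes):
    """Return the text of every COM segment found before the image data."""
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(jpeg_bytes[pos + 4:pos + 2 + length].decode("utf-8"))
        pos += 2 + length
    return comments
```

A call such as `add_comment(open("slide.jpg", "rb").read(), rdf_text)` (with a hypothetical file name and RDF string) returns new bytes that can be written back to disk. A single comment segment holds at most 65,533 bytes, so a larger RDF document would need to be split across segments.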

18 By annotating our images, we can ensure that the image conveys meaning and value. By using RDF, we can ensure that the individual triples can be integrated with heterogeneous data sources beyond those of images. By using pre-existing, general, international standards for describing any kind of data, we attain interoperability and avoid the confusion and complexity that occur whenever a new standard is created. See: http://www.julesberman.info/spec2img.htm

23 Would you like to write a Tissue Repository/Tissue Informatics book? jjberman@alum.mit.edu

