© 2005 Bioinformatics Indiana University April, 27 2005::: Troy Campbell Advisors: Mehmet Dalkilic, Informatics Claudia Johnson, Paleontology Erika Elswick,


1 © 2005 Bioinformatics, Indiana University, April 27, 2005
Paleoinformatics: Bringing the Future to the Past
Troy Campbell
Advisors: Mehmet Dalkilic, Informatics; Claudia Johnson, Paleontology; Erika Elswick, Paleontology
School of Informatics, Indiana University, Bloomington, Indiana

2 Talk Outline
Motivation & Background
Problem Statement
Existing Solutions
PASTT
Conclusions & Future Work

3 1 Paleontology
Time x Space
(t_now, s_now) is annotated
(t_act, s_act) is what we want

4 The History of the Earth in 1 Hour

5 Paleontology Dimensions
Time (20 different scales exist)
Space (itself 3-dimensional)
Species (each with a unique set of descriptors)
The challenge is organizing and visualizing all 3 major dimensions together

6 Literature Resources Problem
No link between databases and relevant publications
Paleontology journals are slowly becoming available online
As in biology, one word has many meanings, which degrades the language
Only keyword search is currently available, so context is not considered

7 Type Collections
A Type Collection is a physical repository of newly discovered species
Def. Type: a published, new species
Primary mechanism that drives and validates discoveries in paleontology

8 IU Type Collection
Most important local data set
IU Type Collection contains information and fossils of over 17K discovered species
–UNIQUE
–Ex. Bryozoan
Specimens located in many places
No physical Tag
ID not maintained

9 Points to Consider
Important data is likely being lost
Data is too hard to find (too many locations)
Finding specimens takes many unnecessary, tedious hours of research
Ex. Inputting 4 example Types for demonstration purposes
–Required 3 faculty
–Lasted 3 hours
There needs to be a better way to research, store, and manage data.

10 Type Collection Problem
We must consolidate this information in a digital format to manage the collection
People need to be able to search this collection
–For discovery of other species
–For research

11 CHRONOS
“an interactive network of federated data and tools for sedimentary geology and paleobiology” – Chronos
Provides GIS and some search capability over several large databases.
Schema is basically one large table.
Cannot run “ad hoc” queries
Currently a collection of disconnected tools

12 CHRONOS

13 Current IU Type Collection
Memo’s Classic Movies

14 2 PASTT Architecture

15 PaleoKnOT
Paleontological Knowledge Ontology and TFIDF
Initial work was conducted by Mehmet Dalkilic and Jim Costello on last year’s Capstone, BioKnOT
Uses a local database populated from GeoRef
Over 6,100 articles with abstracts in the database.
Utilizes LUCAS, a web service provided by the School of Library and Information Science
–Developed by Javed Mostafa and Yueyu Fu

16 PaleoKnOT Flow Chart

17 Step by Step Filtering
Initial Search: Reduces the set of documents to only those where the keyword(s) are present in the title or abstract
TFIDF: The Term Frequency * Inverse Document Frequency web service generates the most relevant terms based on the keyword search
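
The two filtering steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LUCAS web service or the PaleoKnOT code: the function names (`tokenize`, `initial_search`, `rank_terms`), the sample articles, and the particular weighting (raw term count in the matching documents times log(N / document frequency) over the whole collection) are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def doc_tokens(doc):
    """Tokens from both the title and the abstract of one article."""
    return tokenize(doc["title"] + " " + doc["abstract"])


def initial_search(corpus, keyword):
    """Keep only articles whose title or abstract contains the keyword."""
    kw = keyword.lower()
    return [doc for doc in corpus if kw in doc_tokens(doc)]


def rank_terms(matches, corpus):
    """Score each term in the matching articles by TF * IDF.

    TF is the term count across the matching articles; IDF is
    log(N / df), with df counted over the whole corpus (assumed weighting).
    """
    tf = Counter(tok for doc in matches for tok in doc_tokens(doc))
    df = Counter(tok for doc in corpus for tok in set(doc_tokens(doc)))
    n_docs = len(corpus)
    scores = {term: count * math.log(n_docs / df[term])
              for term, count in tf.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented sample articles, not taken from GeoRef.
corpus = [
    {"title": "Bryozoan reefs of Indiana",
     "abstract": "Bryozoan colonies form reefs in Mississippian limestone."},
    {"title": "Trilobite molting",
     "abstract": "Trilobite fossils record molting in Cambrian shale."},
    {"title": "Limestone deposition",
     "abstract": "Limestone forms in shallow seas."},
]

matches = initial_search(corpus, "bryozoan")
ranked = rank_terms(matches, corpus)
print([term for term, _ in ranked[:2]])
```

Under this weighting, a word that occurs in every article (here "in") scores zero, so common words drop to the bottom of the list the user would choose from.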

18 Step by Step Filtering
Users choose the terms from the list that are important to them.
Users have the option of entering a short description of the search in sentence form.
Users can then choose to weight relationships that are found in the abstracts.

19 Results
Users can select a dynamically generated link to IUCAT to view the full text or find the hard copy
Term-relationship ontologies are also available
The full citation can also be displayed for each result

20 The Data
IU Type Collection database holds all descriptive information on the discovered species.
–Stratigraphy
–Characteristics
–Taxonomy
–Time of Existence

21 Type Collection Data Warehouse
Search is the most important aspect of the type collection
Space is not an issue
A Data Warehouse model is put in place to capture the dimensions of the type collection

22 Data Warehousing
Warehousing is used mostly in the business world
Online Analytical Processing (OLAP) generates a large amount of data that can be “mined” for decision support systems.
Most famous Data Warehouse: Wal-Mart
We use the warehousing “star schema” because it models the data in an easily searchable way.
In general: we have lots of data, and we need to get knowledge from it as easily as possible.

23 What is a Star Schema?
Star Schemas excel at search because they use a level of redundancy
This allows users to easily “drill down” without adding extra tables
–Drill Down means you access information by starting out general and becoming more specific, reducing the size of the results.
Update and Insert issues are not a priority, as updates and inserts are rare.
Not the best, but the most ubiquitous

24 A Look at a Star Schema
One table connects all the other tables together.
The main linking table is called the Fact table
All other tables are called Dimension tables
There may be several entries in the fact table to describe one discovery event.
Dimension tables will be very wide (many attributes) but not deep
The Fact table will be narrow, but could potentially be very deep (many rows)
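
A small, self-contained sketch of the fact/dimension layout and of "drilling down" follows, using Python's built-in sqlite3 module. It is not the PASTT schema: the table names, column names, the `drill_down` helper, and every sample row are invented for illustration, with the dimensions loosely modeled on the taxonomy, time, and stratigraphy categories named in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, few rows.
CREATE TABLE dim_taxonomy (taxon_id INTEGER PRIMARY KEY,
                           phylum TEXT, genus TEXT, species TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, era TEXT, period TEXT);
CREATE TABLE dim_stratigraphy (strat_id INTEGER PRIMARY KEY, formation TEXT);
-- Fact table: narrow (keys only), one row per recorded specimen.
CREATE TABLE fact_discovery (
    specimen_id INTEGER PRIMARY KEY,
    taxon_id INTEGER REFERENCES dim_taxonomy(taxon_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    strat_id INTEGER REFERENCES dim_stratigraphy(strat_id)
);
""")

# Invented sample rows, not data from the IU Type Collection.
conn.executemany("INSERT INTO dim_taxonomy VALUES (?, ?, ?, ?)", [
    (1, "Bryozoa", "Archimedes", "sp. A"),
    (2, "Bryozoa", "Fenestella", "sp. B"),
    (3, "Arthropoda", "Phacops", "sp. C"),
])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [
    (1, "Paleozoic", "Mississippian"),
    (2, "Paleozoic", "Devonian"),
])
conn.executemany("INSERT INTO dim_stratigraphy VALUES (?, ?)", [
    (1, "Formation A"),
    (2, "Formation B"),
])
conn.executemany("INSERT INTO fact_discovery VALUES (?, ?, ?, ?)", [
    (101, 1, 1, 1), (102, 2, 1, 1), (103, 3, 2, 2), (104, 1, 1, 2),
])

# Whitelist mapping filter names to columns, so only values are parameters.
FILTER_COLUMNS = {"phylum": "t.phylum", "genus": "t.genus",
                  "period": "p.period", "formation": "s.formation"}


def drill_down(conn, **filters):
    """Return specimen ids matching every given dimension filter."""
    clauses = [f"{FILTER_COLUMNS[name]} = ?" for name in filters]
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"""
        SELECT f.specimen_id
        FROM fact_discovery f
        JOIN dim_taxonomy t ON f.taxon_id = t.taxon_id
        JOIN dim_time p ON f.time_id = p.time_id
        JOIN dim_stratigraphy s ON f.strat_id = s.strat_id
        {where}
        ORDER BY f.specimen_id
    """
    return [row[0] for row in conn.execute(sql, list(filters.values()))]


# Start general, then become more specific; the result set shrinks.
print(len(drill_down(conn)))
print(len(drill_down(conn, phylum="Bryozoa")))
print(len(drill_down(conn, phylum="Bryozoa", genus="Archimedes")))
```

Each extra filter is just another condition on a dimension already joined to the fact table, which is why narrowing the search needs no additional tables.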

25 Our Schema

26 TC Web Interface
Allows users to run keyword searches on the type collection
Returns a list of matching specimens according to the dimensions
Users can get the full details of a specimen
Keywords are generated that link directly to PaleoKnOT for direct search of the available literature

27 Flow of Web Interface

28 Demonstration
PaleoKnOT
Type Collection Data Warehouse

29 Results: Strengths
PaleoKnOT can customize searches without any Boolean search knowledge
The Data Warehouse is the first effort at IU to digitize paleontological data
Unique Star Schema model will allow fast search

30 Issues
PaleoKnOT has a database of limited size
–Out of the 20,000 articles downloaded, only 6,073 have abstracts
Code needs to be more efficient
Need more entries in the Data Warehouse to experiment further with its uses (bottleneck)

31 Conclusions
The group we work with is very excited to launch paleoinformatics.org
We hope to gather some user feedback soon

32 Future Work
Awarded Multidisciplinary Ventures and Seminars Fund
–Pays for Type Collection data entry
–Funds future work on PASTT
Track physical locations of the Types.

33 Special Thanks
Memo, Claudia and Erika
Haixu Tang
Marty Siegel
Andrew Albrecht
Jim Costello

34 Be kind to your history
Avoid Fossil Abuse

