© 2005 Bioinformatics, Indiana University. April 27, 2005. Troy Campbell. Advisors: Mehmet Dalkilic, Informatics; Claudia Johnson, Paleontology; Erika Elswick, Paleontology

Presentation transcript:

Paleoinformatics: Bringing the Future to the Past
Troy Campbell
Advisors: Mehmet Dalkilic, Informatics; Claudia Johnson, Paleontology; Erika Elswick, Paleontology
School of Informatics, Indiana University, Bloomington, Indiana

Talk Outline
Motivation & Background
Problem Statement
Existing Solutions
PASTT
Conclusions & Future Work

1 Paleontology
Time x Space
(t_now, s_now) is annotated
(t_act, s_act) is what we want

The History of the Earth in 1 hour

Paleontology Dimensions
Time (20 different scales exist)
Space (itself three-dimensional)
Species (each with a unique set of descriptors)
The challenge is organizing and visualizing all 3 major dimensions together

Literature Resources Problem
No link between databases and relevant publications
Paleontology journals are only slowly becoming available online
As in biology, one word can have many meanings, which makes the terminology ambiguous
Only keyword search is currently available, so context is not considered

Type Collections
A Type Collection is a physical repository of newly discovered species
Def. Type: a published, new species
Primary mechanism that drives and validates discoveries in paleontology

IU Type Collection
Most important local data set
The IU Type Collection contains information and fossils of over 17,000 discovered species
  – UNIQUE
  – Ex. Bryozoan
Specimens are located in many places
No physical tag
ID not maintained

Points to Consider
Important data is likely being lost
Data is too hard to find (too many locations)
Finding specimens costs researchers many unnecessary, tedious hours
Ex. Inputting 4 example Types for demonstration purposes
  – Required 3 faculty
  – Lasted 3 hours
There needs to be a better way to research, store, and manage data.

Type Collection Problem
We must consolidate this information in a digital format to manage the collection
People need to be able to search this collection
  – For discovery of other species
  – For research

CHRONOS
“an interactive network of federated data and tools for sedimentary geology and paleobiology” – Chronos
Provides GIS and certain search capabilities on several large databases
Schema is basically one large table
Cannot run “ad hoc” queries
Currently a collection of disconnected tools

CHRONOS

Current IU Type Collection
Memo’s Classic Movies

2 PASTT Architecture

PaleoKnOT
Paleontological Knowledge Ontology and TFIDF
Initial work was conducted by Mehmet Dalkilic and Jim Costello on last year’s Capstone, BioKnOT
Uses a local database populated from GeoRef
Over 6,000 articles with abstracts in the database
Utilizes LUCAS, a web service provided by the School of Library and Information Science
  – Developed by Javed Mostafa and Yueyu Fu

PaleoKnOT Flow Chart

Step by Step Filtering
Initial Search: reduces the set of documents to only those where the keyword(s) are present in the title or abstract
TFIDF: the Term Frequency * Inverse Document Frequency web service generates the most relevant terms based on the keyword search
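The two filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LUCAS web service: the `ARTICLES` records, the function names, the "every keyword must appear" rule, and the plain `tf * log(N / df)` weighting are all hypothetical choices made for the sketch.

```python
import math
import re
from collections import Counter

# Hypothetical toy records; the real system reads a local database populated from GeoRef.
ARTICLES = [
    {"title": "Bryozoan faunas of the Ordovician",
     "abstract": "Bryozoan colonies from Ordovician limestone show branching growth."},
    {"title": "Trilobite stratigraphy",
     "abstract": "Trilobite zones refine the stratigraphy of Cambrian shale."},
    {"title": "Growth forms in fossil colonies",
     "abstract": "Encrusting bryozoan growth forms dominate shallow limestone facies."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def initial_search(articles, keywords):
    """Keep articles whose title or abstract contains every keyword (assumed rule)."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for article in articles:
        words = set(tokenize(article["title"] + " " + article["abstract"]))
        if all(k in words for k in wanted):
            hits.append(article)
    return hits


def top_terms(matched, corpus, limit=5):
    """Rank terms of the matched abstracts by TF * IDF, with IDF taken over the whole corpus."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for article in corpus:
        doc_freq.update(set(tokenize(article["abstract"])))
    term_freq = Counter()
    for article in matched:
        term_freq.update(tokenize(article["abstract"]))
    scores = {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


matched = initial_search(ARTICLES, ["bryozoan"])
ranked = top_terms(matched, ARTICLES)
```

Terms that occur in every abstract get an IDF of zero and fall to the bottom of the list, which is the property that makes the suggested terms more specific than the original keyword.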

Step by Step Filtering
Users choose the terms from the list that are important to them.
Users may optionally enter a short description of the search in sentence form.
Users can then choose to weight relationships that are found in the abstracts.
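The slides do not say how a relationship is defined or how the weights are applied, so the following is a sketch under a loud assumption: a relationship is taken to be a pair of user-chosen terms that co-occur in one abstract, and an abstract's score is the sum of the weights of the relationships it contains. The abstracts, terms, weights, and function names are hypothetical.

```python
import re
from itertools import combinations


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relationships(abstract, chosen_terms):
    """Assumed definition: pairs of chosen terms that co-occur in one abstract."""
    words = tokenize(abstract)
    present = sorted(t for t in chosen_terms if t in words)
    return set(combinations(present, 2))


def score(abstract, chosen_terms, weights, default=1.0):
    """Sum the user-assigned weight of every relationship found in the abstract."""
    return sum(weights.get(pair, default) for pair in relationships(abstract, chosen_terms))


# Hypothetical abstracts and user choices, for illustration only.
abstracts = {
    "A": "Bryozoan colonies encrust Ordovician limestone.",
    "B": "Ordovician limestone beds lack fossils.",
    "C": "Bryozoan growth forms vary.",
}
chosen = {"bryozoan", "limestone", "ordovician"}
weights = {("bryozoan", "limestone"): 3.0}  # the user emphasizes this pair

# Best-scoring abstract first; ties broken by key.
ranking = sorted(abstracts, key=lambda k: (-score(abstracts[k], chosen, weights), k))
```

Pairs are stored in alphabetical order, so a weight must be keyed the same way, for example `("bryozoan", "limestone")` rather than the reverse.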

Results
Users can select a dynamically generated link to IUCAT to view the full text or find the hard copy
Term-relationship ontologies are also available
The full citation can also be displayed for each result

The Data
The IU Type Collection database holds all descriptive information on the discovered species.
  – Stratigraphy
  – Characteristics
  – Taxonomy
  – Time of Existence

Type Collection Data Warehouse
Search is the most important aspect of the type collection
Storage space is not an issue
A Data Warehouse model is put in place to capture the dimensions of the type collection

Data Warehousing
Warehousing is used mostly in the business world
Online Analytical Processing (OLAP) works with large amounts of data that can be “mined” for decision support systems.
Most famous Data Warehouse: Wal-Mart
We use the warehousing “star schema” because it models the data in an easily searchable way.
In general: we have lots of data, and we need to get knowledge from it as easily as possible.

What is a Star Schema?
Star schemas excel at search because they use a level of redundancy
This allows users to easily “drill down” without adding extra tables
  – Drill down means you access information by starting out general and becoming more specific, reducing the size of the results.
Update and insert issues are not a priority, as updates and inserts are rare.
Not the best model, but the most ubiquitous
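Drill down over a redundant dimension table can be sketched with Python's built-in `sqlite3` module. The `taxonomy_dim` table, its columns, and the rows are hypothetical and are not taken from the IU Type Collection; the point is only that every level of the hierarchy sits in one wide table, so each narrowing step adds a condition instead of a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide, denormalized dimension table: phylum and class are repeated on every
# row (the "level of redundancy"), so no extra lookup tables are needed.
conn.execute(
    "CREATE TABLE taxonomy_dim ("
    "taxon_id INTEGER PRIMARY KEY, phylum TEXT, class_name TEXT, genus TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomy_dim (phylum, class_name, genus) VALUES (?, ?, ?)",
    [
        ("Bryozoa", "Stenolaemata", "Constellaria"),
        ("Bryozoa", "Stenolaemata", "Hallopora"),
        ("Bryozoa", "Gymnolaemata", "Bugula"),
        ("Brachiopoda", "Rhynchonellata", "Platystrophia"),
    ],
)


def drill_down(conn, **levels):
    """Return genera matching the given levels; more levels give fewer results."""
    allowed = ("phylum", "class_name", "genus")  # whitelist of column names
    clauses, params = [], []
    for column in allowed:
        if column in levels:
            clauses.append(f"{column} = ?")
            params.append(levels[column])
    where = " AND ".join(clauses) or "1 = 1"
    rows = conn.execute(
        f"SELECT genus FROM taxonomy_dim WHERE {where} ORDER BY genus", params
    )
    return [row[0] for row in rows]


everything = drill_down(conn)
by_phylum = drill_down(conn, phylum="Bryozoa")
by_class = drill_down(conn, phylum="Bryozoa", class_name="Stenolaemata")
```

Each call is more specific than the last, and the result list shrinks accordingly, which is the behavior the slide describes.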

A Look at a Star Schema
One table connects all the other tables together.
The main linking table is called the Fact table
All other tables are called Dimension tables
There may be several entries in the fact table to describe one discovery event.
Dimension tables will be very wide (many attributes) but not deep
The Fact table will be narrow, but could potentially be very deep (many rows)
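The fact/dimension layout described above might look like the following in Python's built-in `sqlite3`. The schema shown in the talk is not part of this transcript, so every table name, column, and row here is a hypothetical stand-in chosen only to show the shape: a narrow fact table of keys in the middle and descriptive dimension tables around it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, relatively few rows.
    CREATE TABLE taxonomy_dim (taxonomy_id INTEGER PRIMARY KEY, phylum TEXT, genus TEXT);
    CREATE TABLE stratigraphy_dim (stratigraphy_id INTEGER PRIMARY KEY, formation TEXT, lithology TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, period TEXT);

    -- Fact table: only keys, one row per combination of dimension values.
    CREATE TABLE discovery_fact (
        discovery_id    INTEGER NOT NULL,
        taxonomy_id     INTEGER NOT NULL REFERENCES taxonomy_dim(taxonomy_id),
        stratigraphy_id INTEGER NOT NULL REFERENCES stratigraphy_dim(stratigraphy_id),
        time_id         INTEGER NOT NULL REFERENCES time_dim(time_id)
    );

    INSERT INTO taxonomy_dim VALUES (1, 'Bryozoa', 'Constellaria'), (2, 'Brachiopoda', 'Platystrophia');
    INSERT INTO stratigraphy_dim VALUES (1, 'Formation A', 'limestone'), (2, 'Formation B', 'shale');
    INSERT INTO time_dim VALUES (1, 'Ordovician');

    -- Discovery 1 is described by two fact rows (two stratigraphic units).
    INSERT INTO discovery_fact VALUES (1, 1, 1, 1), (1, 1, 2, 1), (2, 2, 1, 1);
    """
)


def genera_in(conn, lithology):
    """Join the fact table to two dimensions and list genera found in a lithology."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.genus
        FROM discovery_fact AS f
        JOIN taxonomy_dim AS t ON t.taxonomy_id = f.taxonomy_id
        JOIN stratigraphy_dim AS s ON s.stratigraphy_id = f.stratigraphy_id
        WHERE s.lithology = ?
        ORDER BY t.genus
        """,
        (lithology,),
    )
    return [row[0] for row in rows]


limestone_genera = genera_in(conn, "limestone")
shale_genera = genera_in(conn, "shale")
rows_for_discovery_1 = conn.execute(
    "SELECT COUNT(*) FROM discovery_fact WHERE discovery_id = 1"
).fetchone()[0]
```

Every query joins the fact table directly to the dimensions it needs, never dimension to dimension, which is what gives the schema its star shape.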

Our Schema

TC Web Interface
Allows users to run keyword searches on the type collection
Returns a list of matching specimens according to the dimensions
Users can get the full details of a specimen
Keywords are generated that link directly to PaleoKnOT for direct search of the available literature
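A rough sketch of that interface flow in Python follows. The specimen records, the dimension names, and the rule for generating literature keywords (simply reusing a specimen's dimension values) are assumptions made for illustration; the transcript does not describe how the real interface builds its keywords or its PaleoKnOT links.

```python
# Hypothetical specimen records grouped by dimension; not the actual Type Collection schema.
SPECIMENS = [
    {"id": 1,
     "taxonomy": {"phylum": "Bryozoa", "genus": "Constellaria"},
     "stratigraphy": {"lithology": "limestone"},
     "time": {"period": "Ordovician"}},
    {"id": 2,
     "taxonomy": {"phylum": "Brachiopoda", "genus": "Platystrophia"},
     "stratigraphy": {"lithology": "shale"},
     "time": {"period": "Ordovician"}},
]
DIMENSIONS = ("taxonomy", "stratigraphy", "time")


def search(specimens, keyword):
    """Return (specimen id, dimensions that matched) for each matching specimen."""
    needle = keyword.lower()
    results = []
    for specimen in specimens:
        matched = [
            dim for dim in DIMENSIONS
            if any(needle in str(value).lower() for value in specimen[dim].values())
        ]
        if matched:
            results.append((specimen["id"], matched))
    return results


def literature_keywords(specimen):
    """Assumed rule: reuse the specimen's dimension values as literature search keywords."""
    return sorted(
        {str(value).lower() for dim in DIMENSIONS for value in specimen[dim].values()}
    )


by_period = search(SPECIMENS, "ordovician")
by_taxon = search(SPECIMENS, "bryo")
keywords = literature_keywords(SPECIMENS[0])
```

Reporting which dimension matched lets the result list be grouped "according to the dimensions", and the generated keywords are what would be handed to the literature search.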

Flow of Web Interface

Demonstration
PaleoKnOT
Type Collection Data Warehouse

Results: Strengths
PaleoKnOT lets users customize searches without any knowledge of Boolean search
The Data Warehouse is the first effort at IU to digitize paleontological data
The unique Star Schema model will allow fast search

Issues
PaleoKnOT has a database of limited size
  – Out of the 20,000 articles downloaded, only 6,073 (about 30%) have abstracts
Code needs to be more efficient
More entries are needed in the Data Warehouse to experiment further with its uses (bottleneck)

Conclusions
The group we work with is very excited to launch paleoinformatics.org
We hope to gather some user feedback soon

Future Work
Awarded Multidisciplinary Ventures and Seminars Fund
  – Pays for Type Collection data entry
  – Funds future work on PASTT
Track physical locations of the Types.

Special Thanks
Memo, Claudia and Erika
Haixu Tang
Marty Siegel
Andrew Albrecht
Jim Costello

Be kind to your history
Avoid Fossil Abuse