SHHS Data Dissemination to the Research/Academic Community: Needs and Opportunities
Kenneth A. Loparo, Nord Professor of Engineering, EECS Department, Case Western Reserve University

PSG Data Dissemination Site
Case Sleep and Epidemiology Research Center, Case Western Reserve University
SHHS Data Dissemination Project: The objective of the web-based SHHS Reading Center data dissemination project is to provide researchers with PSG and covariate data collected from participating SHHS centers over many years. The current implementation supports sleep research directed at (1) investigating clinical or epidemiological associations, (2) developing or testing scoring algorithms, or (3) cross-validating findings. Because the number of subjects is large, the web site lets researchers query and select the studies that meet their current needs. Electronic distribution makes the data accessible to researchers worldwide.

Challenges
(1) Data Format: PSG data from different studies are collected in a variety of proprietary data formats, depending on the collection and scoring software used. A common non-proprietary format for signal data is required.
(2) Matching PSG with Covariate Data: Covariate data are essential to ensure that the selected PSG subjects represent the population of interest. Selecting studies from a large dataset for a research project often involves applying complex inclusion and exclusion criteria.

Implementation
- Data are downloadable in EDF format, with or without annotations.
- The SHHS database contains raw PSG signals, summary PSG variables, covariate data, and additional data, including other physiological measures and questionnaire data.
- The dynamic query tool allows investigators to specify research criteria online using key covariate data and to determine how many available studies meet their criteria.

Web Site Operation:
Step 1: Create a query based on demographic and covariate variables (e.g. age, gender, race, BMI, …). For example, a researcher can select all male subjects aged 60 to 70 years with a BMI between 25 and 30 and a specified Arousal Index.
Step 2: Choose the data format for download. Currently the data can be obtained in the following formats:
a) EDF (European Data Format): Raw PSG data in binary format
b) XML: Study annotations
c) Individual Level Dataset (TXT): Covariate variables in text format
d) Individual Level Dataset (XML): Covariate variables in XML format
e) Summary Dataset (TXT): Statistical summary of the covariate variables in text format
f) Summary Dataset (XML): Statistical summary of the covariate variables in XML format
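The Step 1 example can be sketched as a simple filter over subject records. This is only an illustration: the `Subject` fields, the sample records, and the reading of "a specified Arousal Index" as an upper bound are assumptions, not the actual SHHS variable names or the site's query logic.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical record layout; real SHHS variable names are not given in the slides.
    subject_id: int
    age: int
    gender: str
    bmi: float
    arousal_index: float


def select_subjects(subjects, age_range, bmi_range, gender, max_arousal_index):
    """Return subjects that satisfy every criterion (all bounds inclusive)."""
    lo_age, hi_age = age_range
    lo_bmi, hi_bmi = bmi_range
    return [
        s for s in subjects
        if lo_age <= s.age <= hi_age
        and lo_bmi <= s.bmi <= hi_bmi
        and s.gender == gender
        and s.arousal_index <= max_arousal_index  # assumed meaning of the criterion
    ]


# Made-up records for illustration only.
subjects = [
    Subject(1, 65, "M", 27.5, 12.0),
    Subject(2, 72, "M", 26.0, 10.0),   # fails the age range
    Subject(3, 61, "F", 28.0, 9.0),    # fails the gender criterion
    Subject(4, 68, "M", 31.2, 15.0),   # fails the BMI range
    Subject(5, 60, "M", 25.0, 30.0),   # fails the arousal index bound
]

matches = select_subjects(subjects, (60, 70), (25.0, 30.0), "M", 20.0)
print(len(matches), [s.subject_id for s in matches])
```

Counting the matches before compiling any files mirrors what the dynamic query tool reports: how many available studies meet the criteria.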

Web Site Operation (cont.)
Step 3: Select variables for an individual level dataset or a summary dataset. Access to individual level data is restricted to SHHS collaborators; non-collaborators receive only a statistical summary of the dataset. Note: Covariate variables are selected hierarchically: variable classes, then variable sub-classes, then individual variables.
Step 4: After variable selection, a summary of the query is presented for verification before it is submitted for processing. After the query is processed and the data are compiled, users receive an email with the link to the FTP site to download their studies.
Step 5: With a username and password, a user can log into the FTP server and download their studies.

EDF Data Format for SHHS Data Dissemination:
- European Data Format (EDF) has been in use for PSG data since the 1987 international Sleep Congress in Copenhagen.
- EDF is a simple and straightforward format for storing and exchanging multi-channel biological signals over a variety of platforms.
- EDF does not easily handle annotations/events directly.

A Typical PSG Study and EDF Data Specification:
CHANNEL                                              SAMPLING RATE   RESOLUTION
SaO2 (Systemic arterial Oxygen saturation)           1 Hz            16 bit
H.R. (Heart Rate)                                    1 Hz            16 bit
EEG (Electroencephalogram)                           125 Hz          8 bit
EEG (second)                                         125 Hz          8 bit
ECG (Electrocardiogram)                              250 Hz          8 bit
EMG (Electromyogram) (Chin or Leg Movement)          125 Hz          8 bit
EOG (L) (Electrooculography) (Eye Movement)          50 Hz           8 bit
EOG (R)                                              50 Hz           8 bit
SOUND                                                10 Hz           8 bit
AIRFLOW                                              10 Hz           8 bit
THOR RES (Thoracic Respiratory)                      10 Hz           8 bit
ABDO RES (Abdominal Respiratory)                     10 Hz           8 bit
POSITION                                             1 Hz            16 bit
LIGHT                                                1 Hz            16 bit
NEWAIR                                               10 Hz           16 bit
OX stat                                              1 Hz            16 bit

General Format of an EDF file:
HEADER
- Version of the data format
- Local patient identification
- Local recording identification
- Start date of recording
- Start time of recording
- Number of bytes in header record
- Number of data records
- Duration of a data record, in seconds
- Number of signals in data record
HEADER (Specification of each individual signal)
- Label
- Transducer type
- Physical dimension
- Physical minimum
- Physical maximum
- Digital minimum
- Digital maximum
- Pre-filtering
- Number of samples in each data record
DATA RECORD (repeated for each data record)
- First signal in the data record
- Second signal in the data record
- ...
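The header layout above can be read with fixed-width ASCII slicing. The sketch below is a minimal illustration, not the project's own reader: the field widths (including the two reserved fields, which the slide does not list) come from the published EDF specification rather than from this presentation, and the demo header is synthetic.

```python
def parse_edf_header(raw: bytes) -> dict:
    """Parse the fixed 256-byte EDF header plus the per-signal header fields."""

    def text(start, length):
        return raw[start:start + length].decode("ascii").strip()

    header = {
        "version": text(0, 8),
        "patient_id": text(8, 80),
        "recording_id": text(88, 80),
        "start_date": text(168, 8),
        "start_time": text(176, 8),
        "header_bytes": int(text(184, 8)),
        # bytes 192..235 are a reserved field (not listed on the slide)
        "num_data_records": int(text(236, 8)),
        "record_duration_s": float(text(244, 8)),
        "num_signals": int(text(252, 4)),
    }
    ns = header["num_signals"]
    # Each field is stored for all signals in turn before the next field starts.
    fields = [
        ("label", 16), ("transducer", 80), ("physical_dimension", 8),
        ("physical_min", 8), ("physical_max", 8), ("digital_min", 8),
        ("digital_max", 8), ("prefiltering", 80), ("samples_per_record", 8),
        ("reserved", 32),
    ]
    signals = [{} for _ in range(ns)]
    offset = 256
    for name, width in fields:
        for i in range(ns):
            signals[i][name] = text(offset + i * width, width)
        offset += ns * width
    header["signals"] = signals
    return header


def _pad(value, width):
    """Left-justify a value in a space-padded ASCII field."""
    return str(value).ljust(width).encode("ascii")


# Synthetic two-signal header with made-up values, for illustration only.
fixed = b"".join([
    _pad("0", 8), _pad("X X X X", 80), _pad("Startdate X X X X", 80),
    _pad("01.01.01", 8), _pad("22.00.00", 8), _pad(256 + 2 * 256, 8),
    _pad("", 44), _pad(10, 8), _pad(1, 8), _pad(2, 4),
])
columns = [
    (("EEG", "SaO2"), 16), (("", ""), 80), (("uV", "%"), 8),
    (("-125", "0"), 8), (("125", "100"), 8),
    (("-32768", "-32768"), 8), (("32767", "32767"), 8),
    (("", ""), 80), (("125", "1"), 8), (("", ""), 32),
]
per_signal = b"".join(_pad(v, w) for values, w in columns for v in values)
raw_header = fixed + per_signal

header = parse_edf_header(raw_header)
print(header["num_signals"], [s["label"] for s in header["signals"]])
```

Because every field is plain ASCII at a fixed offset, header-only analyses (for example, counting channels or sampling rates across a corpus) do not require decoding any signal data.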

Drawbacks of the EDF Format:
- All signal data are saved as 16-bit two's-complement values. However, not all signals require 16-bit resolution. For example, light status is a binary variable.
- Annotations must be saved in a separate file (this problem is addressed in the EDF+ format).
- Data recording for EDF must be continuous (this problem is addressed in the EDF+ format).
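The cost of the first drawback can be estimated from the channel table of the typical PSG study. The sketch below assumes that the RESOLUTION column gives each channel's native resolution and that EDF then stores every sample in 2 bytes; the 8-hour recording length is an illustrative assumption.

```python
# (channel, sampling rate in Hz, native resolution in bits), from the channel table.
channels = [
    ("SaO2", 1, 16), ("H.R.", 1, 16), ("EEG", 125, 8), ("EEG (second)", 125, 8),
    ("ECG", 250, 8), ("EMG", 125, 8), ("EOG (L)", 50, 8), ("EOG (R)", 50, 8),
    ("SOUND", 10, 8), ("AIRFLOW", 10, 8), ("THOR RES", 10, 8),
    ("ABDO RES", 10, 8), ("POSITION", 1, 16), ("LIGHT", 1, 16),
    ("NEWAIR", 10, 16), ("OX stat", 1, 16),
]

samples_per_second = sum(rate for _, rate, _ in channels)

# EDF stores every sample as a 16-bit (2-byte) integer.
edf_bytes_per_second = samples_per_second * 2

# Hypothetical alternative: each sample stored at its native resolution.
native_bytes_per_second = sum(rate * bits // 8 for _, rate, bits in channels)

hours = 8  # assumed recording length, for illustration only
edf_total_bytes = edf_bytes_per_second * 3600 * hours

print(samples_per_second, edf_bytes_per_second, native_bytes_per_second)
print(edf_total_bytes)
```

Under these assumptions the signals occupy about 1560 bytes per second in EDF versus 795 bytes per second at native resolution, or roughly 45 MB of signal data for an 8-hour study.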

Challenges for Data Dissemination to the Research/Academic Community
- Enabling the new Systems Biology research agenda in sleep
- Translating this agenda into the data needs of the Research/Academic Community
- Translating these data needs into a data format/structure that can support the questions being addressed

Systems Biology: An integrative approach to link genotypes/phenotypes with environment and behaviors through physiological measurements to improve understanding
[Figure: Phenotype, Genes, Environment; Bioinformatics and Computational Genomics. External Stimuli: Positive and Negative Influences. Measurements: Sleep/Wake Patterns, Heart Rate, Blood Pressure, Temperature, … Understanding: Biology, Health, Disease.]

Example Research Problem
Investigate the relationship between time series features of sleep EEG and RDI across a population.
Needs:
(1) Search data according to demographic and covariate data
(2) Identify epochs across studies where RDI criteria are met, and label these epochs according to sleep state
(3) Collect all labeled epochs into one or more files for download
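Need (2) can be sketched as mapping scored events onto epochs and attaching the sleep stage of each epoch. The slides do not define how RDI criteria are evaluated per epoch, so this sketch makes loud assumptions: epochs are 30 seconds long, an epoch qualifies when a respiratory event of a chosen type overlaps it, and the event tuple layout and stage labels are hypothetical.

```python
import math

EPOCH_SECONDS = 30  # assumed epoch length


def label_event_epochs(stages, events, event_types):
    """Return sorted (epoch_index, stage) pairs for epochs overlapped by selected events.

    stages: list of stage labels, one per epoch (hypothetical hypnogram).
    events: iterable of (event_type, start_seconds, duration_seconds).
    """
    hit = set()
    for event_type, start, duration in events:
        if event_type not in event_types:
            continue
        first = int(start // EPOCH_SECONDS)
        # Exclusive end index; an event always covers at least its starting epoch.
        end = max(math.ceil((start + duration) / EPOCH_SECONDS), first + 1)
        for epoch in range(first, min(end, len(stages))):
            hit.add(epoch)
    return [(epoch, stages[epoch]) for epoch in sorted(hit)]


# Made-up study: six epochs (180 seconds) and three scored events.
stages = ["W", "N1", "N2", "N2", "REM", "REM"]
events = [
    ("Hypopnea", 40.0, 25.0),           # spans epochs 1 and 2
    ("Arousal", 70.0, 5.0),             # not a selected event type
    ("Obstructive Apnea", 125.0, 20.0), # inside epoch 4
]

labeled = label_event_epochs(stages, events, {"Hypopnea", "Obstructive Apnea"})
print(labeled)
```

The labeled pairs from many studies could then be concatenated into a single file for download, which corresponds to need (3).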

Select and Query the PSG studies based on Events

Current Status
Users can search and query the database based on demographic and covariate variables.
- Demographic variables, such as: Participant ID, Age, Education Level, Gender, Marital Status, Race
- Covariate variables, such as: Co-morbidities, any Additional Measurements, PSG Study Outcome Data, PSG Study Quality

Needs
Search according to events, such as:
- different sleep stages
- epochs marked in the studies for specific events such as Apnea, Hypopnea, etc.
Provide a tool to select and query the database based on events in conjunction with demographic and covariate variables.

Methodology
First Step: Develop an application for event extraction
Second Step: Design a database structure for events
Third Step: Develop an application to query the events, find the corresponding studies, and extract and compile data in an open exchange format for users
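The second and third steps can be sketched with an in-memory SQLite database. The table and column names below are hypothetical, since the slides do not show the actual database design; the point is only that an event table keyed by subject lets event criteria be combined with demographic criteria in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; the real SHHS database structure is not given in the slides.
    CREATE TABLE subjects (
        subject_id INTEGER PRIMARY KEY,
        age        INTEGER,
        gender     TEXT
    );
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES subjects(subject_id),
        event_type TEXT,
        start_s    REAL,
        duration_s REAL
    );
""")

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?, ?)",
    [(1, 65, "M"), (2, 58, "F"), (3, 69, "M")],
)
conn.executemany(
    "INSERT INTO events (subject_id, event_type, start_s, duration_s) VALUES (?, ?, ?, ?)",
    [
        (1, "Hypopnea", 40.0, 25.0),
        (1, "Hypopnea", 300.0, 18.0),
        (1, "Obstructive Apnea", 900.0, 20.0),
        (2, "Hypopnea", 60.0, 15.0),
        (2, "Hypopnea", 120.0, 22.0),
        (2, "Hypopnea", 500.0, 30.0),
        (3, "Hypopnea", 75.0, 12.0),
    ],
)

# Studies of subjects aged 60 to 70 with at least two scored hypopneas.
rows = conn.execute(
    """
    SELECT s.subject_id, COUNT(*) AS n_events
    FROM subjects AS s
    JOIN events AS e ON e.subject_id = s.subject_id
    WHERE s.age BETWEEN ? AND ? AND e.event_type = ?
    GROUP BY s.subject_id
    HAVING COUNT(*) >= ?
    ORDER BY s.subject_id
    """,
    (60, 70, "Hypopnea", 2),
).fetchall()
print(rows)
```

The matching subject IDs would then drive the extraction of the corresponding studies for compilation in an open exchange format.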

Current Situation: Annotations in XML file
Tree structure of a typical XML document
Figure taken from:

The process of structuring XML documents to database tables Figure from:
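Flattening an annotation tree into table rows can be sketched as follows. The XML element names and the table layout are hypothetical, because the slides do not show the actual annotation schema; the sketch only illustrates turning each event node of the tree into one database row.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical annotation document; element names are assumptions.
ANNOTATION_XML = """
<Study id="demo-001">
  <ScoredEvents>
    <ScoredEvent><Name>Hypopnea</Name><Start>40.0</Start><Duration>25.0</Duration></ScoredEvent>
    <ScoredEvent><Name>Arousal</Name><Start>70.0</Start><Duration>5.0</Duration></ScoredEvent>
  </ScoredEvents>
</Study>
""".strip()

root = ET.fromstring(ANNOTATION_XML)

# One row per event node: (study id, event name, start, duration).
event_rows = [
    (
        root.get("id"),
        event.findtext("Name"),
        float(event.findtext("Start")),
        float(event.findtext("Duration")),
    )
    for event in root.iter("ScoredEvent")
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scored_events (study_id TEXT, name TEXT, start_s REAL, duration_s REAL)"
)
conn.executemany("INSERT INTO scored_events VALUES (?, ?, ?, ?)", event_rows)

stored = conn.execute(
    "SELECT study_id, name, start_s, duration_s FROM scored_events ORDER BY start_s"
).fetchall()
print(stored)
```

Once the events are in a table, they can be queried together with demographic and covariate variables instead of being searched file by file.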