Rensselaer Polytechnic Institute Data Science, Fall 2010 Professor Peter Fox 6961-2010_A4_GROUP_A Tim Lebo, Chitti Shravya Ravi, Chad Ruhle, Brian Wang, Jia Zhang


1 Using Someone Else's Data: Living with the Dead
Rensselaer Polytechnic Institute, Data Science, Fall 2010, Professor Peter Fox
6961-2010_A4_GROUP_A: Tim Lebo, Chitti Shravya Ravi, Chad Ruhle, Brian Wang, Jia Zhang
http://logd.tw.rpi.edu/demo/living_dead_-_november_2010

2 Outline
1: Data and Metadata: Discovery, Formats, Use Goals
2: Two Questions, Data Analysis, Tools and Methods Used
3: Visual Data, Significance of Findings
4: Data Management Plan

3 1: Data Discovery (Shravya)
"Living with the Dead" data discovery: http://ads.ahds.ac.uk/catalogue/specColl/lwtd
Goal: verify and explore an archaeological data set from the study reported by Martin King in 2004.
Why "Living with the Dead"?
o Easy to find
o Interesting and easy to understand
o Easy to obtain
o Provided "good enough" documentation

4 1: Data and Metadata Formats (Shravya)
Data: three CSV files and four JPG files.
Metadata: (mostly) self-explanatory column headers.
CSV files were converted to RDF
o Better representative structure
o Allows for more distributed and analytical functions
o Metadata about the conversion was captured
The only difficulty was understanding the context of the data.
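The CSV-to-RDF step can be illustrated with a minimal sketch. It is not the group's converter: the `http://example.org/lwtd/` namespace, the column names, and the sample rows are all hypothetical, and the slides do not say which tool or parameters were actually used. It only shows the general idea of turning each row into a subject and each non-empty cell into a triple.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/lwtd/"  # hypothetical namespace, not the project's


def csv_to_ntriples(csv_text, id_column):
    """Turn each CSV row into N-Triples, one triple per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}row/{quote(row[id_column], safe='')}>"
        for header, value in row.items():
            # Skip overflow cells, the id column itself, and empty cells.
            if header is None or header == id_column or not value or not value.strip():
                continue
            name = header.strip().lower().replace(" ", "_")
            predicate = f"<{BASE}vocab/{quote(name, safe='')}>"
            # Escape backslashes and double quotes for the literal.
            literal = value.strip().replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# Made-up rows for illustration only.
SAMPLE = "id,site,treatment\n1,Example Cave,disarticulated\n2,Example Cist,\n"
for line in csv_to_ntriples(SAMPLE, "id"):
    print(line)
```

Empty cells produce no triple, which is one way a converted graph can represent missing values without inventing a placeholder.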

5 2. Data Analysis: Background (Brian)
Data:
o Nominal level of measurement
o Unordered entries
Dataset:
o Difficult to form relationships between data fields
o Observational values
o Data entries did not differ greatly
o Hard to identify significance

6 2. Data Analysis: Questions (Brian)
"Does the displacement between the bodies and tombs display patterns?"
o Empty tomb locations
o Unburied body locations
o Determine whether vandalism may have removed the bodies from the tombs
"How did the treatment and context of the bodies change over time?"
o Two data fields; look for change over time

7 2. Data Analysis: Data Validation (Jia)
The group discussed the original data files
Worked to understand their meaning
Identified discrepancies and outliers
Considered missing values, nulls, and error values
Uninterpretable values ("class" = 1,4)
o The lack of relevant metadata makes their meaning unclear
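A validation pass of this kind can be sketched as a per-column profile that counts missing cells and tallies the remaining values, which makes unexplained codes easy to spot. The set of "missing" markers, the column names, and the sample rows below are assumptions for illustration; the slides do not list the markers the group actually checked.

```python
import csv
import io
from collections import Counter

# Assumed markers for missing data; adjust to the data set at hand.
MISSING = {"", "null", "na", "n/a", "?"}


def profile_columns(csv_text):
    """Return {column: (missing_count, Counter of the remaining values)}."""
    missing = Counter()
    values = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column in reader.fieldnames:
            cell = (row.get(column) or "").strip()
            if cell.lower() in MISSING:
                missing[column] += 1
            else:
                values.setdefault(column, Counter())[cell] += 1
    return {c: (missing[c], values.get(c, Counter())) for c in reader.fieldnames}


# Made-up rows for illustration only.
SAMPLE = "site,class,context\nA,1,cave\nB,4,\nC,1,NULL\n"
for column, (n_missing, seen) in profile_columns(SAMPLE).items():
    print(column, "missing:", n_missing, "values:", dict(seen))
```

A column such as "class" that shows only a few bare codes in the tally is a candidate for the "uninterpretable without metadata" category.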

8 2. Data Analysis: Tools and Methods (Jia)
SPARQL queries were used to extract the required data from the existing data sets.
The queries made it possible to discover "interesting" patterns that are difficult or impossible to observe directly in the original data set.
Histograms of chronology (time period), context of remains, and how the dead person was processed.
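The query-then-histogram workflow can be sketched as follows. The endpoint URL, the predicate URI, and the sample response are hypothetical stand-ins; only the response layout (`results.bindings[...].<var>.value`) follows the standard SPARQL JSON results format. The sketch does no network access, so the histogram is computed from a canned response.

```python
import json
from collections import Counter
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, not the project's

# Hypothetical predicate; the project's real vocabulary is not given in the slides.
QUERY = """SELECT ?period WHERE {
  ?body <http://example.org/lwtd/vocab/chronology> ?period .
}"""


def query_url(endpoint, query):
    """Build a SPARQL protocol GET URL (ask for JSON via the Accept header)."""
    return endpoint + "?" + urlencode({"query": query})


def histogram(results_json, variable):
    """Count how often each value of `variable` occurs in SPARQL JSON results."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return Counter(b[variable]["value"] for b in bindings if variable in b)


# Canned response with made-up period labels.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["period"]},
    "results": {"bindings": [
        {"period": {"type": "literal", "value": "period A"}},
        {"period": {"type": "literal", "value": "period B"}},
        {"period": {"type": "literal", "value": "period A"}},
    ]},
})

counts = histogram(SAMPLE_RESPONSE, "period")
for label, n in counts.most_common():
    print(f"{label:10s} {'#' * n}")
```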

9 2. Data Analysis: Analysis Validation (Jia)
Data analysis process:
o The original data from ADS
o A parameterized conversion of the data
o Query construction and execution
o Query results processing
o Visualization of results
A third party can validate the above steps by reviewing three types of artifacts that the group created during this project:
o Inspection of data
o Reviewing conversion parameters
o Reviewing processing code (JavaScript)

10 3: Visual Data, Significance of Findings (Tim)
Google Map plotting Tombs and Bodies
Timelines plotting
o Occurrences of bodies' Treatment types
o Occurrences of bodies' Context types

11 3: Map: Tomb and Person (Tim)
Browse region via map
Select site from list to focus in map
Color and letter symbols distinguish tombs from persons
"Site Data" link leads to Linked Data

12 3: Timelines (Chad)
Four graphs were constructed:
o Probability density function for treatment data
o Cumulative distribution function for treatment data
o Probability density function for context data
o Cumulative distribution function for context data
These allowed us to see trends over time; the cumulative distribution functions proved to be more useful.
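The relationship between the two kinds of curve can be shown with a small sketch: per-period counts are normalized to fractions, and the running sum of those fractions gives the cumulative curve. The counts and years below are made up, and this is only one plausible reading of how the graphs were built; the slides do not give the exact computation.

```python
from itertools import accumulate


def density_and_cumulative(counts_by_period):
    """counts_by_period: list of (period, count) pairs in chronological order.

    Returns (density, cumulative): per-period fractions of the total and
    their running sum, which ends at 1.0.
    """
    total = sum(count for _, count in counts_by_period)
    if total == 0:
        raise ValueError("no occurrences to normalize")
    density = [count / total for _, count in counts_by_period]
    cumulative = list(accumulate(density))
    return density, cumulative


# Made-up counts for one category, keyed by year (negative = BC).
SAMPLE = [(-4000, 2), (-3500, 1), (-3000, 5)]
density, cumulative = density_and_cumulative(SAMPLE)
for (year, _), d, c in zip(SAMPLE, density, cumulative):
    print(year, d, c)
```

Because the cumulative curve only ever rises, sparse or noisy per-period counts are smoothed out, which may be why it is easier to compare categories on it.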

13 3: Treatment Timelines (Chad) http://logd.tw.rpi.edu/demo/living_dead_-_november_2010

14 3: Context Timelines (Chad) http://logd.tw.rpi.edu/demo/living_dead_-_november_2010

15 3: Visual Data Results (Chad)
Visual inspection of the map indicates little correlation between the locations of Bodies and Tombs
o Distances too large for "vandalism"
o Generally uniform distribution over the UK
o (with slightly higher density of both types in the south)
Visual inspection of the context timelines indicates little correlation between them
o The most interesting feature is the sudden rise of "Cists" towards the end of the timeline, overtaking "Caves"
o "Occupation debris" also spikes around 4000 BC
Visual inspection of the treatment timelines indicates that disarticulation is consistently more common
o Articulation and cremation grow at roughly the same pace
o Little correlation between them
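The "distances too large" judgment was made visually, but it could be checked numerically along these lines. The sketch assumes each site is available as a (latitude, longitude) pair in degrees, which the slides do not confirm, and the coordinates below are made up. It uses the haversine great-circle formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_tomb_km(body, tombs):
    """Distance from one body location to the closest tomb location."""
    return min(haversine_km(body[0], body[1], t[0], t[1]) for t in tombs)


# Made-up coordinates for illustration only.
bodies = [(51.0, -1.0), (54.0, -2.5)]
tombs = [(51.0, -1.0), (55.0, -3.0)]
for body in bodies:
    print(body, round(nearest_tomb_km(body, tombs), 1), "km to nearest tomb")
```

If most nearest-tomb distances came out at tens of kilometres or more, that would support the visual impression that the bodies were not simply moved out of nearby tombs.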

16 3: Visual Data Management (Tim)
Final demonstration hosted on logd.tw.rpi.edu
o Map component developed by Tim on his laptop
o Timeline components developed by Chad on his laptop
JavaScript used for all visuals
o Google Maps and Google Annotated Timeline APIs
Data queried dynamically via SPARQL from LOGD's triple store
o Provides connection to relevant processing and source

17 4: Data Management Plan
Logical collection
Physical data handling
Persistence
Interoperability support
Security support
Data ownership
Data dissemination and publication
Metadata collection, management, and access
Knowledge and information discovery

18 4: Data Management Plan: Logical collections (Tim)
Diagram labels: dct:isReferencedBy; 3 original CSVs; "column partition"
We logically organized the data by source, dataset, and version
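The source/dataset/version organization is visible in the RDF dataset URI listed under "Essential links" at the end of the deck. A minimal sketch of composing and decomposing such a URI follows; the function names are hypothetical, and the path pattern is inferred from that one published URI rather than from any documented LOGD rule.

```python
BASE = "http://logd.tw.rpi.edu"


def dataset_version_uri(source, dataset, version, base=BASE):
    """Compose a URI of the form /source/<s>/dataset/<d>/version/<v>."""
    return f"{base}/source/{source}/dataset/{dataset}/version/{version}"


def parse_dataset_version_uri(uri):
    """Recover the source, dataset, and version identifiers from such a URI."""
    parts = uri.rstrip("/").split("/")
    # Each identifier is the path segment that follows its keyword.
    return {key: parts[parts.index(key) + 1] for key in ("source", "dataset", "version")}


uri = dataset_version_uri("ads-ahds-ac-uk", "living-with-the-dead", "2008-Mar-26")
print(uri)
print(parse_dataset_version_uri(uri))
```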

19 4: Data Management Plan: Physical data handling (Tim)
LOGD is the center of physical data handling
o Downloaded CSV files
o Authored interpretation parameters
o Converted and published RDF versions
o SPARQL endpoint offers data
o Visualization JavaScript hosted by LOGD
Google Docs is the center of the Data Management Plan
o Single document for meeting notes, assignment writeup, data exploration notes, and technical documentation

20 4: Data Management Plan: Persistence (Tim)
LOGD is the center of physical data handling
o Managed by TWC's LOGD group
o Will persist beyond the course project
o Group is planning backups
o Popularity of the project aids continued maintenance
Google Docs is the center of the Data Management Plan
o Relying on Google's massive data centers
o Can archive results to LOGD at the end of class
Submitting results back to the Archaeology Data Service (ADS)

21 4: Data Management Plan: Interoperability support (Jia)
Convert CSV format to RDF
o Application independent
o Follows the W3C's 1999 RDF Recommendation
o Many tools have been developed, supporting a wide range of operations
o Served through a SPARQL endpoint, so applications in any implementation language can access it over standard HTTP
JavaScript source code available via web browser for inspection, reuse, and repurposing

22 4: Data Management Plan: Security support (Jia)
Original data security
o Read-only access to the Archaeology Data Service site
o Account access required to submit or modify data
Analyzed data security
o Cached versions and converted forms stored and served by the Tetherless World Constellation
o Read access only; the public cannot make changes

23 4: Data Management Plan: Data Ownership (Brian)
Common Access Agreement
ADS owns all of the data
o Allows anyone to use it
o Allows analysis and interpretation of it
o For non-commercial research or teaching purposes
We reproduced, re-hosted, and transformed the data, all within the two listed purposes.

24 4: Data Management Plan: Data Dissemination and Publication (Brian)
ADS needs to be credited for the data
o In all forms of dissemination and publication
ADS does not want to be linked to, or held responsible for, any further analysis or interpretation of their work
If any article or document is published, ADS must be given a copy.

25 4: Data Management Plan: Metadata Collection, Management, and Access (Chad)
ADS provided one paragraph in HTML about how the data was collected, and little else
Provenance data captured from the CSV and JPG files
RDF associated with the CSVs
Our metadata is contained within the data dump downloads
It can be queried in the same way as the data (same SPARQL endpoint, same named graph)

26 4: Data Management Plan: Knowledge and information discovery (Shravya)
The dataset will be added to LOGD's dataset collection soon.
Information discovery through the presentation in this class.
Email Martin King with the findings and data, so that they can be passed on to the ADS community.
A web site containing the visualizations, information about the data, and other findings has been created.

27 Questions?
Essential links
o Original dataset URL: http://ads.ahds.ac.uk/catalogue/specColl/lwtd
o RDF dataset URI: http://logd.tw.rpi.edu/source/ads-ahds-ac-uk/dataset/living-with-the-dead/version/2008-Mar-26
o Demo URL: http://logd.tw.rpi.edu/demo/living_dead_-_november_2010
o Data management plan: https://docs.google.com/document/d/1UqQ45Cz7BJBHGLIv11EmKrr9tobuSHj6grG2bciYgrs
o This presentation: https://docs.google.com/present/edit?id=0AbTeDpS4-nUDZGNkYzQydm5fMzdnYnduZGpnNw

