DATA INTEGRATION FOR LANGUAGE DOCUMENTATION


DATA INTEGRATION FOR LANGUAGE DOCUMENTATION Under the guidance of: Dr. Jan Chomicki and Dr. Jeff Good. Presented by: Sumit Agrawal

INTRODUCTION This project aims to integrate a large amount of data that is spread across many files and folders and stored in different formats. The data covers 7-9 languages and comes from a linguistics project under way in Cameroon. The data also contains metadata about the files.

DATA FORMATS Data is available in different formats:
- Audiovisual: audio recordings, video recordings, photographs, scanned images
- Textual: transcriptions (some time-aligned, XML), unstructured text (various formats), questionnaire data, lexical data (e.g., vocabulary items in a database)
- Metadata

CHALLENGES
- Every file should have associated metadata, but some files have none.
- Each researcher writes files in a different format.
- Different researchers sometimes worked with the same people.
- There are more than 200 different file types.

AIM A system that can query the data by:
- Author name
- Speaker name
- Date, language name, etc.
E.g., records pertaining to the language 'Naki', or all records dated '2011-08-09'.
Clean the data, remove duplicates, and build a database.
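The kind of field-based lookup described above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the field names (`language`, `date`, `file`) and the `query` helper are assumptions, and the records are copied from the cleaned sample shown later in this presentation, with dates left in the form used there.

```python
# Records copied from the cleaned sample in this presentation.
# The field names are assumptions made for this sketch.
RECORDS = [
    {"language": "Naki", "date": "12-11-05", "file": "Naki-12-11-05-1-JCG.wav"},
    {"language": "Naki", "date": "12-11-05", "file": "Naki-12-11-05-2-JCG.wav"},
    {"language": "Naki", "date": "14-11-05", "file": "Naki-14-11-05-1-JCG.wav"},
    {"language": "Naki", "date": "15-11-05", "file": "Naki-15-11-05-1-JCG.wav"},
]


def query(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# All records for one language, then all records for one date.
naki_records = query(RECORDS, language="Naki")
dated_records = query(RECORDS, date="14-11-05")
```

Criteria are combined with a logical AND, so `query(RECORDS, language="Naki", date="14-11-05")` narrows the result to records matching both fields.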

AIM
- Link each file to its metadata.
- Query the RDF data using SPARQL.
- Integrate the database and the file system.
- Develop a user interface for queries.
- Know the density of the data.
- Database management.

ORIGINAL DATA: FOLDERS

ORIGINAL DATA: FILES

PARSING The files were parsed using Python scripts.
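The parsing scripts themselves are not shown in the slides. As a minimal sketch, assuming file names follow the pattern seen in the cleaned sample (for example `Naki-12-11-05-1-JCG.wav`, read here as language, day-month-year, sequence number, initials), metadata could be pulled out of a path like this. The regular expression, the `parse_filename` helper, and the output field names are all assumptions made for illustration.

```python
import re
from datetime import datetime
from pathlib import PureWindowsPath

# Assumed pattern: <Language>-<DD>-<MM>-<YY>-<sequence>-<initials>
FILENAME_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)-(?P<date>\d{2}-\d{2}-\d{2})"
    r"-(?P<seq>\d+)-(?P<initials>[A-Z]+)$"
)


def parse_filename(path):
    """Return a metadata dict for a recognised file name, or None."""
    p = PureWindowsPath(path)
    match = FILENAME_RE.match(p.stem)
    if match is None:
        return None  # unrecognised name; to be handled separately
    # Normalise DD-MM-YY to an ISO 8601 date so records sort and compare cleanly.
    date = datetime.strptime(match["date"], "%d-%m-%y").date()
    return {
        "language": match["language"],
        "date": date.isoformat(),
        "extension": p.suffix.lower(),
        "sequence": int(match["seq"]),
        "initials": match["initials"],
        "folder": str(p.parent),
        "file": p.name,
    }
```

Returning `None` for names that do not fit the pattern keeps unidentified files apart from the parsed ones instead of guessing at their metadata.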

INITIAL RESULT

CLEANING & LINKING
- The different data formats were identified.
- The identified files were grouped by file extension.
- The related metadata for each file (e.g., language, date, and extension) were extracted.
- Duplicate files were identified.
- The unidentified files were grouped in a separate file.
- The identified files were linked to the existing metadata.
- There are two types of metadata: one that we extracted and one that was provided.
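Two of the steps above, grouping by extension and identifying duplicates, can be sketched as follows. The slides do not say how duplicates were detected; comparing SHA-256 digests of file contents is an assumption made here, and both helper names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_extension(paths):
    """Map each lower-cased file extension to the paths that carry it."""
    groups = defaultdict(list)
    for path in paths:
        groups[Path(path).suffix.lower()].append(path)
    return dict(groups)


def find_duplicates(paths, chunk_size=1 << 20):
    """Return lists of paths whose contents are byte-for-byte identical.

    Assumption: a duplicate means identical content, detected by SHA-256.
    """
    by_digest = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in chunks so large audio and video files fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Hashing catches copies that were renamed or moved between folders, which a comparison of file names alone would miss.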

AFTER CLEANING - RESULT A sample of the data constructed after cleaning and linking the data with metadata:
Naki | 12-11-05 | .wav | F:\DataIntegration\GoodBackup1-Obang\Naki\Jeff_Good\Cameroon2005 | Naki-12-11-05-1-JCG.wav | George Ngong | NAKI-NOTEBOOK-2005-1 | Jeff Good
Naki | 12-11-05 | .wav | F:\DataIntegration\GoodBackup1-Obang\Naki\Jeff_Good\Cameroon2005 | Naki-12-11-05-2-JCG.wav | George Ngong | NAKI-NOTEBOOK-2005-1:26 | Jeff Good
Naki | 14-11-05 | .wav | F:\DataIntegration\GoodBackup1-Obang\Naki\Jeff_Good\Cameroon2005 | Naki-14-11-05-1-JCG.wav | George Ngong | NAKI-NOTEBOOK-2005-1:78 | Jeff Good
Naki | 15-11-05 | .wav | F:\DataIntegration\GoodBackup1-Obang\Naki\Jeff_Good\Cameroon2005 | Naki-15-11-05-1-JCG.wav | George Ngong | NAKI-NOTEBOOK-2005-1:914 | Jeff Good

XML SCHEMA

RDF DATA

BUILDING AN RDF DATABASE USING SESAME Triples of the RDF model
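To make the idea of triples concrete, here is a minimal sketch that turns one cleaned record into subject-predicate-object statements in N-Triples syntax. It does not use Sesame, and it is not the project's schema: the `http://example.org/langdoc/` namespace, the predicate names, and the helper functions are placeholders chosen for illustration. The record values come from the sample shown earlier.

```python
from urllib.parse import quote

# Hypothetical namespace; the project's real vocabulary is not given in the slides.
BASE = "http://example.org/langdoc/"


def literal(value):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(record):
    """Emit one triple per metadata field, with the file as the subject."""
    subject = f"<{BASE}file/{quote(record['file'])}>"
    return [
        f"{subject} <{BASE}{field}> {literal(value)} ."
        for field, value in record.items()
        if field != "file"
    ]


record = {
    "file": "Naki-12-11-05-1-JCG.wav",
    "language": "Naki",
    "date": "12-11-05",
    "extension": ".wav",
}
for line in record_to_ntriples(record):
    print(line)
```

Each printed line is one triple: the file is the subject, the metadata field is the predicate, and the field's value is the object. A triple store such as Sesame can load statements in this form.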

RDF GRAPH

CURRENT GOALS
- Provide SPARQL querying for the RDF data.
- Link the remaining metadata to the parsed metadata.
- Build a database for the unidentified files.
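A SPARQL query is built from triple patterns in which variables stand for unknown terms. The sketch below is a plain-Python stand-in for that matching step, not a SPARQL engine and not the Sesame API. The `ex:` names are hypothetical, and the file names and values are taken from the cleaned sample. The comment shows the SPARQL query the first call corresponds to, under an assumed `ex:` prefix.

```python
# Triples as (subject, predicate, object); the "ex:" names are hypothetical.
TRIPLES = [
    ("ex:Naki-12-11-05-1-JCG.wav", "ex:language", "Naki"),
    ("ex:Naki-12-11-05-1-JCG.wav", "ex:date", "12-11-05"),
    ("ex:Naki-14-11-05-1-JCG.wav", "ex:language", "Naki"),
    ("ex:Naki-14-11-05-1-JCG.wav", "ex:date", "14-11-05"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples that fit the pattern; None plays the role of a variable."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly equivalent SPARQL, assuming a prefix "ex:" is declared:
#   SELECT ?file WHERE { ?file ex:language "Naki" . }
naki_files = [s for s, _, _ in match(TRIPLES, p="ex:language", o="Naki")]

# Roughly: SELECT ?file WHERE { ?file ex:date "14-11-05" . }
files_on_date = [s for s, _, _ in match(TRIPLES, p="ex:date", o="14-11-05")]
```

A real SPARQL endpoint additionally joins several patterns on shared variables, which is what allows a single query to combine language, date, and speaker conditions.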

LONG-TERM GOALS
- Create a multimedia server to store all of the data along with its metadata and RDF data.
- Automate the dumping of data into the repository.
- Build a user interface.
- Provide Linked Data for the Semantic Web.

THANK YOU!

REFERENCES
- http://www.w3.org/TR/rdf-schema/
- http://www.delaman.org/docs/meeting06/good-metadata.pdf
- http://www.acsu.buffalo.edu/~jcgood/jcgood-CUPHEL.pdf
- http://www.w3.org/RDF/Validator/