05.11.2006 | XClean in Action Melanie Weis, HPI Potsdam, Germany Ioana Manolescu, INRIA Futurs, France CIDR 2007.



Melanie Weis, Hasso Plattner Institut Potsdam

What is XClean?
■ XClean is an XML data cleaning system.
■ Types of errors that require data cleaning:
  □ Typos
  □ Different data formats (e.g., date, abbreviations, language)
  □ Missing data
  □ Contradictory data
  □ Duplicates
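Two of the error types listed above, typos and duplicates, often occur together: a misspelled title makes one real-world object appear twice. The following is a minimal Python sketch of that situation, not XClean's own method. The sample XML, the element names (`movies`, `movie`, `title`, `year`), the helper functions, and the 0.9 threshold are all assumptions made for illustration, and `difflib` stands in for whatever similarity measure a real cleaning system would use.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical dirty data: element names and values are made up for illustration.
DIRTY_XML = """
<movies>
  <movie><title>The Matrix</title><year>1999</year></movie>
  <movie><title>The Matrx</title><year>1999</year></movie>
  <movie><title>Signs</title><year>2002</year></movie>
</movies>
"""


def normalize(text):
    """Lower-case and collapse whitespace so formatting differences do not matter."""
    return " ".join((text or "").lower().split())


def similarity(a, b):
    """String similarity in [0, 1], computed by difflib on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(root, threshold=0.9):
    """Return pairs of titles whose years match and whose titles are near-identical."""
    movies = root.findall("movie")
    pairs = []
    for i, first in enumerate(movies):
        for second in movies[i + 1:]:
            same_year = first.findtext("year") == second.findtext("year")
            score = similarity(first.findtext("title"), second.findtext("title"))
            if same_year and score >= threshold:
                pairs.append((first.findtext("title"), second.findtext("title")))
    return pairs


root = ET.fromstring(DIRTY_XML)
duplicates = find_duplicates(root)
print(duplicates)
```

Comparing every pair of elements is quadratic in the number of elements, so this only suits small inputs. It is meant to show the kind of decision a duplicate detection step has to make, not how to scale it.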

Where do we find duplicates?
[Figure; surviving labels: "False", "Duplicate"]

How do we get rid of dirty data?
■ Quick fix (get glasses)
■ Start over again next year (get new, expensive glasses)
■ Clear methodology (Clearly defined processing stages that combine)
■ Possibility to reuse (parts of) a solution

Data Cleaning with XClean
■ Set of clearly defined cleaning operators.
■ XClean/PL
  □ Declarative
  □ Modular
  □ Readable
[Diagram; surviving labels: XClean/PL, XQuery, XQuery Processor, Dirty XML data, Clean XML data]

Come see the demo!
■ XClean Java plugin
■ Supports
  □ Writing XClean/PL
  □ Compiling XClean/PL to XQuery
  □ Executing XQuery to obtain clean data