Published by Jayson Booker. Modified over 9 years ago.
1
Peter Fox
Data Science - CSCI-6961-01
Week 13, November 30, 2010
Webs of Data and Data on the Web, the Deep Web, Data Discovery, Data Integration
2
Contents
Review of reading assignment
Webs of data and semantic web
Data on the web, linked data
Deep web
Data discovery
Data integration
Summary
Next week
3
Reading
Introduction to Data Management
Changing software, hardware a nightmare for tracking scientific data
Overview of Scientific Workflow Systems, Gil (AAAI08 Tutorial)
Comparison of workflow software products, Krasimira Stoilova, Todor Stoilov
Scientific Workflow Systems for 21st Century, New Bottle or New Wine? Yong Zhao, Ioan Raicu, Ian Foster
4
Webs of data
Early Web - Web of pages
http://www.ted.com/index.php/talks/tim_berners_lee_on_the_next_web.html
Semantic web started as a way to facilitate “machine accessible content”
–Initially it was available only to those familiar with the languages and tools, e.g. your parents could not use it
Webs of data grew out of this
–One specific example is W3C’s Linked Open Data
5
Semantic Web
http://www.w3.org/2001/sw/
“The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF). See also the separate FAQ for further information.”
6
Terminology
Semantic Web
–An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org
–Primer: http://www.ics.forth.gr/isl/swprimer/
Semantic Grid
–Semantic services to use the resources of many computers connected by a network to solve large scale computational/data problems
Provenance
–Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility.
Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
–An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
7
7 Semantic Web Layers http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
8
Application Areas for SW
Smart search
Annotation (even simple forms), smart tagging
Geospatial
Implementing logic (rules), e.g. in workflows
Data integration
Verification
Web services
Web content mining with natural language parsing
User interface development (portals)
Semantic desktop
Wikis - OntoWiki, SemanticMediaWiki
Sensor Web
Software engineering
Explanation
… and the list goes on
9
Semantic Web Basics
The triple: {subject-predicate-object}
–Interferometer is-a optical instrument
–Optical instrument has focal length
W3C is the primary (but not sole) governing org.
–RDF
–OWL 1.0 and 2.0 - Web Ontology Language
RDF
–programming environments for 14+ languages, including C, C++, Python, Java, JavaScript, Ruby, PHP, ... (no Cobol or Ada yet ;-( )
OWL
–programming for Java
Closed World - where complete knowledge is known (encoded); AI relied on this
Open World - where knowledge is incomplete/evolving; SW promotes this
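The triple and the closed/open world distinction can be sketched in a few lines of plain Python. This is only an illustration, not RDF tooling: the graph is a set of tuples, the identifiers `OpticalInstrument` and `focalLength` are spellings chosen here for the slide's two examples, and the `mirror` statement is a made-up unknown fact.

```python
# A graph is modeled as a set of (subject, predicate, object) tuples.
graph = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "focalLength"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def closed_world(graph, triple):
    """Closed world: anything not stated is taken to be false."""
    return triple in graph

def open_world(graph, triple):
    """Open world: anything not stated is unknown (None), not false."""
    return True if triple in graph else None

# Hypothetical statement that the graph says nothing about.
unknown = ("Interferometer", "has", "mirror")

print(match(graph, p="is-a"))
print(closed_world(graph, unknown), open_world(graph, unknown))
```

The only difference between the two readings is what the absence of a triple is taken to mean.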
10
Ontology Spectrum
[Figure: spectrum of ontology types, from lightweight to formal]
Catalog/ID
Terms/glossary
Thesauri “narrower term” relation
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, …)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
11
SW != ontologies on the web (!)
Ontologies are important, but use them only when necessary as identified by use cases
The Semantic Web is about integrating data on the Web; ontologies (and/or rules) are tools to achieve that when necessary
SW ontologies != some big (central) ontology
–The ethos of the Semantic Web is sharing, i.e., sharing possibly many small ontologies
–A huge, central ontology could be difficult to maintain
–Semantic web languages such as OWL contain primitives for equivalence and disjointness of terms and meta primitives for versioning info
The practice:
–SW applications using ontologies mix a large number of ontologies and vocabularies (FOAF, DC, and others)
–the real advantage comes from this mix: that is also how new relationships may be discovered
One readable background article from the metadata world is available at: http://www.metamodel.com/article.php?story=20030115211223271
12
Semantic Web Myths
‘the Semantic Web is a reincarnation of Artificial Intelligence on the Web’ (closed world versus open world)
‘it relies on giant, centrally controlled ontologies for "meaning" (as opposed to a democratic, bottom-up control of terms)’
‘one has to add metadata to all Web pages, convert all relational databases, and XML data to use the Semantic Web’
‘one has to learn formal logic, knowledge representation techniques, description logic, etc, to use it’
‘it is, essentially, an academic project, of no interest for industry’
13
Integrating Multiple Data Sources
The Semantic Web lets us merge statements from different sources
The RDF Graph Model allows programs to use data uniformly regardless of the source
Figuring out where to find such data is a motivator for Semantic Web Services
[Figure: RDF graph. Nodes: #Ionosphere, #magnetic, “100”, “Terrestrial Ionosphere”, “km”. Edges: name, hasCoordinates, hasLowerBoundaryValue, hasLowerBoundaryUnit. Different line and text colors represent different data sources.]
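Merging statements from different sources amounts to a set union once everything is a triple. The sketch below reuses the labels from the figure, but how they pair up into triples and which source holds which statement are assumptions made for illustration.

```python
# Two independent sources describing the same resource (assumed split).
source_a = {
    ("#Ionosphere", "name", "Terrestrial Ionosphere"),
    ("#Ionosphere", "hasCoordinates", "#magnetic"),
}
source_b = {
    ("#Ionosphere", "hasLowerBoundaryValue", "100"),
    ("#Ionosphere", "hasLowerBoundaryUnit", "km"),
}

# Merging is a set union: shared identifiers line up automatically.
merged = source_a | source_b

def describe(graph, subject):
    """Collect every predicate/object pair known about a subject."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description

print(describe(merged, "#Ionosphere"))
```

A program reading `merged` does not need to know which source contributed which statement, which is the uniformity the slide refers to.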
14
Drill Down / Focused Perusal
The Semantic Web uses Uniform Resource Identifiers (URIs) to name things
These can typically be resolved to get more information about the resource
This essentially creates a web of data analogous to the web of text created by the World Wide Web
Ontologies are represented using the same structure as content
–We can resolve class and property URIs to learn about the ontology
[Figure: linked resources on the Internet. Nodes: …#NeutralTemperature, …#ISR, …#Norway, …#EISCAT, …#FPI, …#MilllstoneHill. Edges: measuredby, type, locatedIn, operatedby.]
15
Statements about Statements
The Semantic Web allows us to make statements about statements
–Timestamps
–Provenance / Lineage
–Authoritativeness / Probability / Uncertainty
–Security classification
–…
This is an unsung virtue of the Semantic Web
[Figure: annotated statement. Nodes: #Aurora, Red, #Danny’s, 20031031. Edges: hascolor, hasSource, hasDateTime.]
Ontologies Workshop, APL May 26, 2006
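One way to picture a statement about a statement is to let a whole triple be the subject of further triples. The sketch below uses the labels from the figure; which label attaches to which is an assumption, and the `Statement` class is a hypothetical helper, not an RDF API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    """A single triple, hashable so it can itself be a subject."""
    subject: str
    predicate: str
    obj: str

# The base statement (assumed reading of the figure).
aurora_is_red = Statement("#Aurora", "hascolor", "Red")

# Statements whose subject is the statement above.
annotations = {
    (aurora_is_red, "hasSource", "#Danny's"),
    (aurora_is_red, "hasDateTime", "20031031"),
}

def about(statement, annotations):
    """Return what has been said about a given statement."""
    return {p: o for s, p, o in annotations if s == statement}

print(about(aurora_is_red, annotations))
```

Provenance, timestamps, or uncertainty can all be attached this way without changing the base statement.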
16
‘Collecting’ the ‘data’
Part of the (meta)data information is present in tools ... but thrown away at output
–e.g., a business chart can be generated by a tool: it ‘knows’ the structure, the classification, etc. of the chart, but, usually, this information is lost
–storing it in web data would be easy!
SW-aware tools are around (even if you do not know it...), though more would be good:
–Photoshop CS stores metadata in RDF in, say, jpg files (using XMP)
–RSS 1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!)
17
‘Collecting’ the ‘data’
Scraping - different tools, services, etc., come around every day:
–get RDF data associated with images, for example: service to get RDF from flickr images
–service to get RDF from XMP
–XSLT scripts to retrieve microformat data from XHTML files
–RSS scraping in use in VO projects in Japan
–scripts to convert spreadsheets to RDF
18
‘Collecting’ the ‘data’
SQL - A huge amount of data is in relational databases
–Although tools exist, it is not feasible to convert that data into RDF
–Instead, SQL ⇋ RDF ‘bridges’ are being developed: a query against RDF data is transformed into SQL on the fly
–Reading for this week, article by Berners-Lee and Sahoo et al.
–RDB2RDF W3C working group - http://www.w3.org/2001/sw/rdb2rdf/
–D2RQ / D2R Server
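The bridge idea, translating a triple-style query into SQL at request time instead of converting the database, can be sketched with the standard `sqlite3` module. This is a toy and not how D2RQ or the RDB2RDF work is specified: the `instrument` table, its rows, and the `MAPPING` dictionary are all hypothetical.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "name": ("instrument", "id", "name"),
    "locatedIn": ("instrument", "id", "site"),
}

def triple_pattern_to_sql(predicate, subject=None):
    """Translate a (subject?, predicate, ?object) pattern into SQL."""
    table, s_col, o_col = MAPPING[predicate]
    sql = f"SELECT {s_col}, {o_col} FROM {table}"
    params = ()
    if subject is not None:
        sql += f" WHERE {s_col} = ?"
        params = (subject,)
    return sql, params

def query(conn, predicate, subject=None):
    """Run the translated SQL and hand the rows back as triples."""
    sql, params = triple_pattern_to_sql(predicate, subject)
    return [(s, predicate, o) for s, o in conn.execute(sql, params)]

# Made-up relational data; it is never exported to RDF.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instrument (id TEXT PRIMARY KEY, name TEXT, site TEXT)")
conn.execute("INSERT INTO instrument VALUES ('#inst1', 'Example instrument', '#siteA')")

print(triple_pattern_to_sql("name"))
print(query(conn, "locatedIn", "#inst1"))
```

The data stays in the relational store; only the query and the shape of the answer cross the bridge.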
19
More Collecting
RDFa (formerly known as RDF/A) extends XHTML by:
–extending the link and meta elements to include child elements
–adding metadata to any element (a bit like the class in microformats, but via dedicated properties)
It is very similar to microformats, but with more rigor:
–it is a general framework (instead of an ‘agreement’ on the meaning of, say, a class attribute value)
–terminologies can be mixed more easily
GRDDL - Gleaning Resource Descriptions from Dialects of Languages
ATOM (follow-on to RSS)
20
Linked open data
http://linkeddata.org/guides-and-tutorials
http://tomheath.com/slides/2009-02-austin-linkeddata-tutorial.pdf (we will look at some of these slides now, #1-25 and 30-37)
And of course:
–http://logd.tw.rpi.edu/
–http://data-gov.tw.rpi.edu/wiki
21
2009-03-05 (Chris Bizer) 21
22
(Class 2) Management
Creation of logical collections
Physical data handling
Interoperability support
Security support
Data ownership
Metadata collection, management and access
Persistence
Knowledge and information discovery
Data dissemination and publication
23
Data Management and WOD
Is this the grand solution?
How is the data managed? Found? Curated?
What about the metadata?
What problems are introduced?
24
Data on the Web, Internet
Data behind web services
Data files on web sites
We have covered data as service approaches
Thinking you have found data when you have really only found information and metadata
The real difference between this topic and the next one is:
–Access and dissemination
–Level of curation (and often description)
25
Data on the internet
http://www.dataspaceweb.org/
Data files on other protocols
–FTP
–RFTP
–GridFTP
–SABUL
–XMPP/AMQP
–Others…
26
Deep web
Data behind web services
Data behind query interfaces (databases or files)
Introduces a different curation problem
27
The loose definition
Something that a crawler cannot find and/or index
–Creates the other definition of shallow web
Has many implications for discovery, access and use
Curation is more complex to satisfy this definition, i.e. not a matter of just putting files ‘on the web’
50, 100, 1000 times the ‘shallow web’?
28
Managing (in) the deep web
Sometimes, the deep web aspect of a data source can be due to extreme obscurity, language peculiarities, NO metadata, NO documentation
There are no known studies of how effective data management (what you are learning) could change the percentage of deep/shallow
Semantics are often put forward as a solution
http://www.mkbergman.com/458/new-currents-in-the-deep-web/
29
Internet impacts on management
Management of data that is..
–Web – ‘stateless’
–Curation, Preservation – highly stateful (by definition)
You will hear terms such as digital curation and digital preservation (search on these) but what about internet curation and internet preservation (Internet Archive?)
What others??
30
(Class 2) Management
Creation of logical collections
Physical data handling
Interoperability support
Security support
Data ownership
Metadata collection, management and access
Persistence
Knowledge and information discovery
Data dissemination and publication
31
Thus data frameworks are appearing
Many – meaning they go beyond web sites, they incorporate many of the data management functions
Initially syntactic – e.g. OPeNDAP, ADDE
Application oriented – e.g. virtual observatories
Semantic – e.g. Virtual Solar-Terrestrial Observatory
ALL of these are changing the nature of data management and role of data ‘providers’
cf. ?
32
32
33
BOM, Melbourne, VIC 20071015 (Fox)
Some Definitions
DAP = Data Access Protocol
–Model used to describe the data;
–Request syntax and semantics; and
–Response syntax and semantics.
OPeNDAP
–The software;
–Numerous reference implementations;
–Core/libraries and services (servers and clients).
OPeNDAP Inc.
–OPeNDAP is a 501(c)(3) non-profit corporation;
–Formed to maintain, evolve and promote the discipline neutral DAP that was the DODS core infrastructure.
34
Considerations with regard to the development of DAP and OPeNDAP
Many data providers
Many data formats
Many different semantic representations of the data
Many different security requirements
Many different client types
35
Broad Vision
A world in which a single data access protocol is used for the exchange of data between network based applications regardless of discipline.
A layer above TCP/IP providing for syntactic and semantic consistency not available in existing protocols such as FTP.
36
Practical Considerations
The broad vision:
–Is syntactically achievable, but
–Is not semantically achievable, at least not in the near term.
37
OPeNDAP Inc. Mission Statement
To maintain, evolve and promote a data access protocol (DAP) and reference implementation software (OPeNDAP) for the syntactically consistent exchange of data over the network.
The DAP should provide syntactic interoperability across disciplines and allow for semantic interoperability within disciplines.
38
The Data Access Protocol (DAP)
The DAP has been designed to be as general as possible without being constrained to a particular discipline or world view.
The DAP is a discipline neutral data access protocol; it is being used in astronomy, medicine, earth science, …
Provides data format and location, and data organization transparency
Is metadata neutral
39
DAP comparisons
File-based
–GridFTP/FTP
–HTTP
–SRB
Service-based
–Open Geospatial Consortium: WCS, WMS, WFS, …
–Virtual Observatory (Astronomy): SIAP, SSAP, STAP, …
40
Who is using DAP/OPeNDAP?
Science examples
–PMEL with their Tsunami inundation modeling
–Ocean regional modelers to extract open boundary conditions
–Visualization of data sets using MATLAB/IDL/…
Service examples
–Live Access Server
–Mapserver – OGC services and OPeNDAP data access (future)
–Digital Library Service - metadata and catalogue info
41
Data Access Protocol (DAP2) - Current
DAP2 is currently a NASA/ESE ‘Standard’
Current servers implement DAP2
DAP2 + XML responses (implemented)
DAP3
42
DAP4
DAP4 improvements over DAP3:
Additional datatypes
–Swath
–Blob - GIF, MPEG, …
Additional functionality
–Check sum
–Modulo
The additional datatypes will enable the DAP to be used in a wider variety of circumstances and are a direct response to users’ requests.
43
What DAP means to me
Data access and transport
Response types: DAP objects versus file type
–A DAP URL is essentially an HTTP URL with additional restrictions placed on the abs-path component.
–Grammar:
DAP2-URL = "http://" host [ ":" port ] [ abs-path ]
abs-path = server-path data-source-id [ "." ext [ "?" query ] ]
server-path = [ "/" token ]
data-source-id = [ "/" token ]
ext = "das" | "dds" | "dods"
–The server-path is the pathname to the server, whereas data-source-id is the pathname to the data.
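The URL grammar can be turned into a small builder that refuses combinations the grammar does not allow. This is a sketch that follows only the rules on the slide; the host, paths, and query string in the usage lines are hypothetical, and no real server is contacted.

```python
VALID_EXTENSIONS = ("das", "dds", "dods")

def dap2_url(host, server_path, data_source_id, ext=None, query=None, port=None):
    """Assemble a DAP2 URL from its grammar components.

    server_path is the pathname to the server and data_source_id the
    pathname to the data; both are expected to start with "/".
    """
    if ext is not None and ext not in VALID_EXTENSIONS:
        raise ValueError(f"ext must be one of {VALID_EXTENSIONS}")
    if query is not None and ext is None:
        # In the grammar the query is nested inside the optional ext part.
        raise ValueError("a query is only allowed after an ext")
    url = "http://" + host
    if port is not None:
        url += f":{port}"
    url += server_path + data_source_id
    if ext is not None:
        url += "." + ext
    if query is not None:
        url += "?" + query
    return url

# Hypothetical server and dataset names, for illustration only.
print(dap2_url("example.org", "/opendap", "/data/sst.nc", ext="dds"))
print(dap2_url("example.org", "/opendap", "/data/sst.nc", ext="dods", query="sst", port=8080))
```

The same data source is addressed three ways, with the `das`, `dds`, and `dods` extensions selecting the response type.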
44
OPeNDAP V3 Architecture
CGI-style access
Uses web server HTTP protocol
Several request and response types
Reads data files, databases, etc., returns info
May return DAP2 objects or other data
Client can be application, web browser or specialized server/service
[Figure labels: Data, Client]
45
OPeNDAP V4 (Hyrax) Architecture
OPeNDAP Lightweight Front end Server (OLFS)
–Receives requests and asks the BES to fill them
–Uses Java Servlets
–Does not directly ‘touch’ data
–Multi-protocol
Back End Server (BES)
–Reads data files, databases, etc., returns info
–May return DAP2 objects or other data
–Does not require web server
[Figure labels: Client, OLFS, BES, Data]
46
Binaries Generated
There are approximately 80 binaries built on a nightly basis. They are built for the following platforms/operating systems:
–Linux FC4, FC5
–MacOS-X (universal binaries when possible)
–Windows XP, win32
–Java 1.5 (Tomcat 5.5)
–IRIX (in four variants), Solaris, AIX, OSF
47
OPeNDAP System Elements
The OPeNDAP data access protocol is used by a variety of system elements.
Clients
Browser Interfaces
Data System Integrators (ODC)
Servers
Processing Servers
Aggregating Servers - OPeNDAP chains
Ancillary Information Services
48
Clients
Clients make requests and receive responses via the DAP.
Clients convert data from the OPeNDAP data model to the form required in the client application.
49
OPeNDAP Clients
[Figure: client applications connecting over the Internet. Labels: netCDF C, Ferret, GrADS, netCDF Java, IDV, VisAD, ncBrowse, Matlab Client, Access, Excel, IDL Client, ArcGIS, pyDAP, NCL Client, Web Browser, OPeNDAP Data Connector.]
50
OCAPI -> OC in 2009
A pure OPeNDAP C API (OCAPI) for the client-side
Applications:
–DAP-aware ‘commands’ for commercial analysis programs (e.g., IDL)
–Scripting tools (e.g., Perl)
51
OPeNDAP System Elements
The OPeNDAP data access protocol is used by a variety of system elements.
Clients
Browser Interfaces
Data System Integrators (ODC)
Servers
Processing Servers
Aggregating Servers - OPeNDAP chains
Ancillary Information Services
52
Browser interfaces
53
OPeNDAP System Elements
The OPeNDAP data access protocol is used by a variety of system elements.
Clients
Browser Interfaces
Data System Integrators (ODC)
Servers
Processing Servers
Aggregating Servers - OPeNDAP chains
Ancillary Information Services
54
Servers
Servers receive requests and provide responses via the DAP.
Servers convert the data from the form in which they are stored to the DAP.
Servers provide for subsetting of the data and more.
55
OPeNDAP Servers
[Figure: OPeNDAP servers on the Internet in front of data stores. Labels: netCDF, HDF4, HDF5, JDBC, FreeForm, FITS, CDF, CEDAR, DSP, JGOFS, Tables, SQL, Flat Binary, General, ESML, CDM.]
56
OPeNDAP Servers (specialized processing)
[Figure: specialized OPeNDAP servers on the Internet in front of data stores. Labels: GRIB, BUFR, GDS, CODAR, FDS, netCDF, General, pyDAP, DAPPER, TDS, ESG.]
57
Servers
Servers may also provide other services
–Directory traversal.
–Browser-based form to build URL.
–ASCII or other representations of data.
–Metadata associated with the data.
–Server side functions.
58
OPeNDAP Aggregation Servers
[Figure: aggregation servers on the Internet in front of other OPeNDAP servers. Labels: General, pyDAP, GRIB, BUFR, GDS, CODAR, FDS, netCDF, DAPPER, TDS, JGOFS, ESG.]
59
The Aggregation Server: An Example
[Figure: an aggregation server combining several sources for one client. Labels: Aggregation Server, File, DSP Data Set, netCDF Data Set, Matlab, Local, OPeNDAP, HTML, GIF, Matlab Client, DSP.]
60
OPeNDAP’s Hyrax (‘Server4’)
Uses a modular architecture to support different application-level protocols
–Data access using DAP2 (DAP3)
–Catalogs using THREDDS
–Browsing using HTML and ASCII
Modules for data access
–Different file types
–Potential for database and scripting
Modules for commands
–Commands provide varying operations for different protocols
61
OPeNDAP V4 (Hyrax) Architecture
OPeNDAP Lightweight Front end Server (OLFS)
–Receives requests and asks the BES to fill them
–Uses Java Servlets
–Does not directly ‘touch’ data
–Multi-protocol
Back End Server (BES)
–Reads data files, databases, etc., returns info
–May return DAP2 objects or other data
–Does not require web server
[Figure labels: Client, OLFS, BES, Data]
62
OPeNDAP Lightweight Front end Server
[Figure: request and response paths through the OLFS. Labels: Request from client, Response to client, Request Formulation**, GridFTP DAP2, HTTP DAP2, ASCII output, HTML form, Info output, THREDDS, BES, SOAP-DAP (HTTP), DAP2 (GridFTP, HTTP).]
63
BES
[Figure: BES internals. Labels: Network Protocol and Process start/stop activities, Data Store Interfaces, BES Framework, PPT*, Initialization/Termination, DAP2 Access, NetCDF3, HDF4, FreeForm, …, Data Catalogs, Commands**, BES Commands/XML Documents, Data.]
*PPT is built in (other protocols)
**Some commands are built in
64
Ancillary Information Service
The AIS enables users to augment the metadata for a data source in a controlled way without requiring write access to the original data. By using the DAP, users are also isolated from data format issues.
Current capability:
–Attributes only
–Client-side only
–Local and remote resources
–Local resource databases
65
AIS Server
[Figure: client linked with DAP software, AIS server, data source, and AIS resource, with arrows numbered 0 to 3]
0. The client requests metadata from the AIS server (which appears no different from any other DAP server).
1. The AIS server gets metadata from the data source.
2. The AIS server gets the matching AIS resource using the AIS database and merges it into the metadata.
3. The AIS server returns the resulting metadata object.
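Steps 1 to 3 can be pictured as a merge of two attribute dictionaries that leaves the original untouched. This is a conceptual sketch, not the OPeNDAP AIS code: the variable name `sst`, the attribute values, and the rule that AIS values win on conflict are all assumptions.

```python
def ais_merge(source_metadata, ais_resource):
    """Return source metadata augmented with ancillary attributes.

    The source dictionary is copied first, so the original data source
    is never written to. On a conflict the AIS value wins (an assumed
    policy for this sketch).
    """
    merged = {var: dict(attrs) for var, attrs in source_metadata.items()}
    for var, extra in ais_resource.items():
        merged.setdefault(var, {}).update(extra)
    return merged

# Step 1: metadata as served by the data source (made-up values).
source_metadata = {"sst": {"units": "degC"}}

# Step 2: the matching ancillary resource (made-up values).
ais_resource = {"sst": {"long_name": "Sea surface temperature"}}

# Step 3: the merged metadata object returned to the client.
print(ais_merge(source_metadata, ais_resource))
print(source_metadata)
```

Because the merge works on a copy, the augmentation needs no write access to the original data, which is the point made on the preceding slide.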
66
9/8/2015 Bureau of Meteorology, Melbourne Australia
Lessons (Re)Learned
1. Modularity provides for flexibility
The more modular the underlying infrastructure, the more flexible the system. This is particularly important for network based systems, for which the technology, both software and hardware, is changing rapidly.
67
Lessons (Re)Learned
2. Data of interest will be stored in a variety of formats.
Regardless of how much one might want to define the format to be used by system participants, in the end the data will be stored in a variety of formats.
2a. The same is true of use metadata!
68
Lessons Learned
3. Structural representation of sequence data sets is a major obstacle to interoperability
Care must be given to the organizational structure (as opposed to the format) of the data.
69
Lessons Learned
7. The lack of a consistent structure for data inventories is a major obstacle to the use of distributed systems.
70
Lesser Lessons Learned
9. Some surprises/observations encountered in the OPeNDAP effort
–Metadata focus in the past has been on data discovery not on data use, but metadata for use is where it’s at.
–Number of variables increases almost linearly with the number of data sets.
–Users will take advantage of all of the flexibility offered by a system, sometimes to the disadvantage of all.
–Incredible variability in the structural organization of data.
71
Lessons Learned
10. Time to maturity is of order 10 years, not 3
Developing new infrastructure takes time, particularly to iron out all of the %^*% little details.
72
Data discovery
Free text search on the internet/web
Data portals
What makes discovery work?
73
Three level ‘metadata’ solution
Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products
Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
[Figure labels: Data Discovery; Data Integration; Ontology based Data Integration using scientific workflows; Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved (schema based integration)]
A.K. Sinha, Virginia Tech, 2006
74
Data discovery
What makes discovery work?
–Metadata
–Logical organization
–Attention to the fact that someone would want to discover it
–It turns out that file types are a key enabler or inhibitor to discovery
What does not work?
–Result ranking using *any* conventional algorithms
75
Smart search
Semantically aware search, e.g. http://noesis.itsc.uah.edu
Faceted search, e.g. mspace (http://mspace.fm), Earth System Grid (http://esg.prototype.ucar.edu)
76
NOESIS 76
77
Faceted search
Semantically aware search, e.g. http://noesis.itsc.uah.edu
Faceted search, e.g. mspace (http://mspace.fm), Earth System Grid (http://esg.prototype.ucar.edu)
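Faceted search narrows a result set by metadata fields and shows how many hits remain under each remaining value. The sketch below is a generic illustration, not how mspace or Earth System Grid work; the records and facet names are invented.

```python
from collections import Counter

# Invented catalogue records; each key other than "title" is a facet.
records = [
    {"title": "A", "parameter": "temperature", "format": "netCDF"},
    {"title": "B", "parameter": "temperature", "format": "HDF5"},
    {"title": "C", "parameter": "salinity", "format": "netCDF"},
]

def filter_by(records, **selected):
    """Keep records whose facets equal every selected value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]

def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)

hits = filter_by(records, parameter="temperature")
print([r["title"] for r in hits])
print(facet_counts(hits, "format"))
```

The counts only make sense when the records carry consistent metadata fields, which is why metadata heads the list of what makes discovery work.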
78
Federated search
“is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
Libraries have been doing this for a long time (Z39.50, ISO 23950)
Key is consistent search metadata fields (keywords)
E.g. Geospatial One Stop http://www.geodata.gov
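The definition above, one query sent to several sources at once with results combined, can be sketched with a thread pool. The two catalogues are in-memory stand-ins invented for illustration; a real federation would query remote services, for example over Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

def make_source(name, records):
    """Build a search function for one (in-memory, invented) catalogue."""
    def search(keyword):
        # Tag each hit with the source it came from.
        return [dict(r, source=name) for r in records if keyword in r["keywords"]]
    return search

sources = [
    make_source("catalog_a", [
        {"title": "Sea surface temperature", "keywords": ["ocean", "temperature"]},
    ]),
    make_source("catalog_b", [
        {"title": "Air temperature", "keywords": ["atmosphere", "temperature"]},
        {"title": "Ozone", "keywords": ["atmosphere"]},
    ]),
]

def federated_search(sources, keyword):
    """Query every source concurrently and concatenate the hits."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda search: search(keyword), sources))
    return [hit for hits in result_lists for hit in hits]

for hit in federated_search(sources, "temperature"):
    print(hit["source"], hit["title"])
```

The sketch only works because both catalogues use the same `keywords` field, which is the slide's point about consistent search metadata.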
79
Data integration
“involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example).”
“Data integration appears with increasing frequency as the volume and the need to share existing data explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. In management circles, people frequently refer to data integration as "Enterprise Information Integration" (EII)” (Wikipedia)
Is this a data science/management challenge? (rhetorical question)
80
Aiding data integration
Standards – formats for sure but also
–Metadata
–Semantics
As such any integration capability is HIGHLY curated or left entirely to the end user
If left to the user, results in a new data product which is rarely managed or shared
What would you do?
81
Summary
Theme of data management in the chaotic and enabling environment of the web, internet
Emergence of frameworks that encompass some aspects of data management
Unlocking data in a preservable way is an immense challenge
Anything/everything you can do will help
82
What is next
Dec. 7 – project presentations
Final assignment to be handed in
Reading for this week:
–Semantic Deep Web, James Geller, Soon Ae Chun, and Yoo Jung An
–The Deep Web (Internet Tutorials)
–Digital Image Resources on the Deep Web