
1 Final Report of Working Group 5
Interoperation
G. Simons (chair), H. Aristar-Dry, D. Iannucci, E. Richter, H. Sicard, N. Thieberger, P. Wittenburg
ELIIP Workshop, Salt Lake City, 12-14 Nov 2009

2 Interoperation
What is it?
  – Interoperability is the ability for two or more systems to exchange information or services and to make satisfactory use of what is exchanged.
What does it take for this to happen:
  – The systems agree on standardized definitions of the concepts about which they want to share
  – The systems use a standardized format and protocol for information interchange

3 Why interoperate?
It prevents a centralized service from duplicating the efforts of others
It maximizes data freshness since updates are propagated when made by the owner
It makes a centralized service more sustainable since others bear the cost of providing data
It allows multiple centralized services to add value to the same basic information

4 Ways to build a web information service
Centralized database curation
  – The service is self-contained: the service defines the database, users edit the data directly, the service curates the information
Centralized database aggregation
  – The service has no data of its own: it uses an interoperation protocol to populate the database from other sources that curate the desired information

5 The hybrid approach
The service uses an interoperation protocol to aggregate all information it can get from elsewhere.
The service develops a database to handle new information it will curate (whether missing data or alternative values). As a “good citizen” the service shares its unique data with others via the same protocol.
End users see a combination of the aggregated and the curated data.
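The combined view described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Value` class, the `merge_hybrid` function, the field names, and the provider name are all hypothetical and do not come from the report. The sketch lists a curated value first and keeps the aggregated values next to it, so an alternative value never erases what the original provider said.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """One value for a field, tagged with where it came from (hypothetical model)."""
    value: str
    source: str  # a provider name, or "curated" for the service's own data


def merge_hybrid(aggregated: dict[str, list[Value]],
                 curated: dict[str, Value]) -> dict[str, list[Value]]:
    """Combine harvested values with locally curated ones.

    For each field the curated value (if any) comes first, followed by
    every aggregated value for that field.
    """
    combined: dict[str, list[Value]] = {}
    for field in sorted(set(aggregated) | set(curated)):
        values: list[Value] = []
        if field in curated:
            values.append(curated[field])
        values.extend(aggregated.get(field, []))
        combined[field] = values
    return combined


# Placeholder data: one field harvested from elsewhere, one alternative
# value and one missing field supplied by the service's own curation.
aggregated = {"population": [Value("1200", "provider-a")]}
curated = {
    "population": Value("950", "curated"),    # alternative value
    "orthography": Value("Latin", "curated"),  # missing data
}

view = merge_hybrid(aggregated, curated)
for field, values in view.items():
    print(field, [(v.value, v.source) for v in values])
```

Everything in `curated` is also what the service would expose back to others through the same protocol.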

6 What does this mean for ELIIP?
For each kind of information that the centralized ELIIP service wants to offer, it must decide whether to:
  – Aggregate it,
  – Curate it, or
  – Do both
The answer can be different for different kinds of information

7 What kinds of information?
1. Web pages about a language
2. Existing language documentation
3. Summary index of documentation level
4. Projects and people
5. Training and revitalization programs
6. The language situation
7. The genetic classification
OUT OF SCOPE: Interoperation over language data (like dictionaries and interlinear texts)

8 1. Web pages on languages
Two low-bar approaches to interoperation:
  – Microformats: Harvestable metadata is embedded in the HTML coding of a page.
  – Predictable URL: A web site that offers information about many languages has a main page for each language with a base URL parameterized by the ISO 639-3 code
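A minimal sketch of the predictable-URL idea, assuming a site publishes a base URL with a `{code}` placeholder. The function name, the placeholder syntax, and the `example.org` address are illustrative assumptions, not part of the report. The only fact relied on is that ISO 639-3 identifiers are three lowercase letters.

```python
import re

# ISO 639-3 identifiers are exactly three lowercase ASCII letters.
ISO_639_3 = re.compile(r"[a-z]{3}")


def language_page_url(base_url: str, code: str) -> str:
    """Fill a site's base URL template with an ISO 639-3 code.

    `base_url` is assumed to contain a literal "{code}" placeholder.
    """
    if not ISO_639_3.fullmatch(code):
        raise ValueError(f"not a well-formed ISO 639-3 code: {code!r}")
    return base_url.replace("{code}", code)


# Hypothetical site; "abc" stands in for any language code.
BASE = "https://example.org/language/{code}"
print(language_page_url(BASE, "abc"))
```

With such a template, a harvester needs to know only the base URL and the list of codes to reach every language page on the site.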

9 ELIIP could …
Define microformats and provide a service for crawling pages on sites that use them
Identify web sites that should implement predictable URLs and provide funding to incentivize needed changes on those sites
Provide a service for registering base URLs and boilerplate metadata so that OLAC records are generated for all language codes that yield a page
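The registration service in the last point might work roughly as follows. This is a sketch under assumptions: the shape of a registration (a `base_url` template plus a `boilerplate` dictionary), the record fields, and the `page_exists` callback are all hypothetical. A real service would implement `page_exists` with an HTTP request; here a fixed set of URLs stands in for the web so the example runs offline.

```python
from typing import Callable, Iterable


def generate_records(registration: dict,
                     codes: Iterable[str],
                     page_exists: Callable[[str], bool]) -> list[dict]:
    """Emit one metadata record per language code that yields a page.

    `registration` holds a "base_url" template with a "{code}" placeholder
    and a "boilerplate" dict of metadata shared by every record.
    """
    records = []
    for code in codes:
        url = registration["base_url"].replace("{code}", code)
        if not page_exists(url):
            continue  # the site has no page for this language
        record = dict(registration["boilerplate"])  # copy the shared fields
        record["language"] = code
        record["identifier"] = url
        records.append(record)
    return records


# Hypothetical registration for a hypothetical site.
registration = {
    "base_url": "https://example.org/language/{code}",
    "boilerplate": {"publisher": "Example Site", "type": "Text"},
}

# Stand-in for real HTTP checks: pretend only these pages exist.
existing_pages = {
    "https://example.org/language/aaa",
    "https://example.org/language/ccc",
}

records = generate_records(registration, ["aaa", "bbb", "ccc"],
                           existing_pages.__contains__)
for record in records:
    print(record)
```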

10 2. Existing documentation
A working interoperation infrastructure already exists in OLAC
ELIIP should aggregate from OLAC to avoid duplicating work
But there are huge gaps in the OLAC coverage
Thus ELIIP needs a hybrid approach: as an OLAC data provider to fill the gaps and as an OLAC service provider to aggregate

11 Filling the gaps
Since … | ELIIP could …
Many language archives don’t participate in OLAC | Help those archives become OLAC data providers
Many resources are being put in generic OAI-based institutional repositories | Run a service that harvests those resources and assigns linguistic metadata to them
Many resources are conventionally published or posted directly to the web | Curate a database in which linguists can enter metadata for those resources
Many linguists don’t have a place to deposit their work | Curate a digital repository of language documentation

12 3. Documentation index
A numerical index that summarizes level of language documentation (as at AUSTLANG) is desirable
The OLAC aggregator (especially after ELIIP fills the gaps) provides a list of all the resources by linguistic data types
What’s needed is a way to convert those to a measure of extent

13 ELIIP could …
Participate in the OLAC process to refine the linguistic data type vocabulary as needed
  – E.g. add “language instruction”
Participate in the OLAC process to add a new recommendation for
  – E.g. lexicon/0, lexicon/1, lexicon/2, lexicon/3
Promote its adoption by all OLAC participants and add curated judgments where that fails
Develop an overall numerical index that combines results over all the data types
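One way such an overall index could be computed is sketched below. The slides give only the tag shapes (lexicon/0 through lexicon/3); everything else here is an assumption made for illustration: that the number after the slash is an extent level from 0 to 3, that the best level per data type is what counts, and that the index is the unweighted average of those levels scaled to the range 0 to 1. The three data type names are the existing OLAC linguistic data types.

```python
MAX_LEVEL = 3  # assumed top of the extent scale, as in "lexicon/3"


def parse_extent(tag: str) -> tuple[str, int]:
    """Split a tag such as "lexicon/2" into ("lexicon", 2)."""
    data_type, _, level_text = tag.rpartition("/")
    level = int(level_text)  # raises ValueError if the level is not a number
    if not data_type or not 0 <= level <= MAX_LEVEL:
        raise ValueError(f"malformed extent tag: {tag!r}")
    return data_type, level


def documentation_index(tags: list[str], data_types: list[str]) -> float:
    """Average the best extent level per data type, scaled to 0..1.

    A data type with no resources counts as level 0. Tags for data types
    outside `data_types` are ignored.
    """
    best = {data_type: 0 for data_type in data_types}
    for tag in tags:
        data_type, level = parse_extent(tag)
        if data_type in best:
            best[data_type] = max(best[data_type], level)
    return sum(best.values()) / (MAX_LEVEL * len(best))


DATA_TYPES = ["lexicon", "primary_text", "language_description"]

# Hypothetical tags harvested for one language.
tags = ["lexicon/2", "lexicon/1", "primary_text/3"]
index = documentation_index(tags, DATA_TYPES)
print(round(index, 3))
```

A weighted average, or a scale with more levels, would fit the same structure; the choice is a matter for the OLAC process the slide refers to.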

14 4. People and projects
The OLAC infrastructure can support this
DCMI Type vocabulary:
  – Event: A time-bounded occurrence
A project can be described in an OLAC record using elements like Contributor, Language, Linguistic data type, Description
An advantage of this approach is that projects appear with all other resources in any OLAC-based service

15 ELIIP could …
Propose a metadata refinement to distinguish a project from other kinds of “events”
Curate records that allow linguists to describe their own projects
Help players like funding agencies with databases of relevant projects to become OLAC data providers

16 5. Training and revitalization
The OLAC infrastructure can support this
A training course or revitalization program can be described in an OLAC record with DCMI Type = “Event” + OLAC resource type = “language instruction” + Language, Description, Identifier for a URL
This approach allows these programs to appear with all other resources for the language in any OLAC-based service
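The record described on this slide might look like the output of the sketch below, built with Python's standard `xml.etree.ElementTree`. Several points are assumptions and should be checked against the OLAC metadata specification before use: the code value `language_instruction` is hypothetical (slide 13 only proposes adding it to the vocabulary), the language is encoded here as a subject language, and the example URL and description are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {
    "olac": "http://www.language-archives.org/OLAC/1.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, local: str) -> str:
    """Return the qualified name in ElementTree's {uri}local notation."""
    return f"{{{NS[prefix]}}}{local}"


def program_record(language_code: str, description: str, url: str) -> str:
    """Serialize a record for a training course or revitalization program."""
    # "dcterms" appears only inside attribute values, so ElementTree would
    # not declare it on its own; declare it explicitly on the root element.
    root = ET.Element(q("olac", "olac"), {"xmlns:dcterms": NS["dcterms"]})

    dcmi_type = ET.SubElement(root, q("dc", "type"),
                              {q("xsi", "type"): "dcterms:DCMIType"})
    dcmi_type.text = "Event"

    # Hypothetical code value: not part of the current OLAC vocabulary.
    ET.SubElement(root, q("dc", "type"),
                  {q("xsi", "type"): "olac:linguistic-type",
                   q("olac", "code"): "language_instruction"})

    # Assumption: the language is recorded as the subject language.
    ET.SubElement(root, q("dc", "subject"),
                  {q("xsi", "type"): "olac:language",
                   q("olac", "code"): language_code})

    ET.SubElement(root, q("dc", "description")).text = description

    identifier = ET.SubElement(root, q("dc", "identifier"),
                               {q("xsi", "type"): "dcterms:URI"})
    identifier.text = url
    return ET.tostring(root, encoding="unicode")


# Placeholder values for a hypothetical program.
xml_text = program_record("abc", "Summer immersion course",
                          "https://example.org/programs/1")
print(xml_text)
```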

17 ELIIP could …
Curate records that allow these programs to describe themselves
Help players who are curating databases of training events to become OLAC data providers

18 6. Language situation
No suitable interoperation standard yet exists for population data, etc.
Are there other projects already curating this kind of information such that interoperation is desirable?
  – E.g. UNESCO Atlas, Ethnologue, AUSTLANG
But interoperation will only work if all the players agree to do it

19 ELIIP could …
During proposal phase, identify the projects that should interoperate and secure agreement in principle to participate
During the project phase, foster the process among those players to agree on standard definitions, format, and protocol
Could use the OAI protocol
  – “olac” payload for the metadata
  – “eliip” payload for the language information
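Under the OAI protocol (OAI-PMH), the two payloads would be requested from the same repository by changing only the `metadataPrefix` argument. The sketch below builds such request URLs. The repository address is a placeholder, and the "eliip" prefix is the one proposed on this slide, not an existing format; `verb`, `metadataPrefix`, and `from` are standard OAI-PMH request arguments.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    `from_date` (e.g. "2009-11-12") limits the harvest to records changed
    on or after that date, which supports incremental updates.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical data provider endpoint.
BASE_URL = "https://example.org/oai"

metadata_url = list_records_url(BASE_URL, "olac")
situation_url = list_records_url(BASE_URL, "eliip", from_date="2009-11-12")
print(metadata_url)
print(situation_url)
```

Reusing one protocol for both payloads means a provider that already serves OLAC metadata needs only to add a second metadata format, not a second service.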

20 ELIIP could also …
Provide a feedback mechanism that allows a user to report an error back to the provider of the aggregated data
Provide a publicly viewable tracking mechanism to ensure accountability of the data providers, e.g.
  – Is a population in Ethnologue or UNESCO wrong because they won’t fix it when someone reports the right data, or because the person who knows won’t tell them?

21 Nota Bene
None of the “ELIIP could” proposals up to this point would require the overhead of a governing body or regional captains to vet individual data points (though they would still have a role in recommending and vetting aggregation sources).
That threshold is crossed if ELIIP chooses to:
  – Curate its own version of language situation data that it judges to be the most correct

22 7. Genetic classification
Same story as for “language situation” information
If the set of data providers is the same as for the situation information, then this could be included in the interoperation standard as a kind of situation information
If there is a different set of players, ELIIP could foster the same process to develop an interoperation standard for classification

23 Thought for the day
The aggregator lies at the sweet spot in the value chain of today’s web economy.
  – E.g. Google, Amazon, iTunes, Netflix
  – Cf. Chris Anderson, The Long Tail (2006)

24 Conclusion
There are many things that ELIIP could do:
  – To exploit the power of interoperation
  – For mobilizing our community to share information about endangered languages
  – While minimizing what it must centrally curate
The task for the ELIIP planners is to decide which of these things they want to do

