No application is an island: Using topes to transform strings during data transfer
Atipol Asavametha, Prashanth Ayyavu, Christopher Scaffidi
School of Electrical Engineering and Computer Science, Oregon State University
2 Problem: Data heterogeneity among software components
Software components
–Created by autonomous stakeholders
–Differing data formats
–May switch to new formats without prior notice
Programmers
–Need to move data between elements automatically
End users
–Need to move data between elements manually
problem approach evaluation
3 Example: Exchanging person names
John Smith (today)
Smith, John (tomorrow) – unexpected format! Unanticipated need for “glue code” to reformat
Lincolnshire MCC (tomorrow) – questionable! Need to validate data, maybe trigger fail-over
Similar issues for data from users, external datasets, or the web.
4 Other examples of data format heterogeneity
Room Numbers
–NSH 3103 vs Newell Simon Hall 3103
Stocks
–GOOG vs Google vs Google Corporation
Address Lines
–101 Main St. vs 101 MAIN STREET vs 101 Main Str.
Phone Numbers
– vs vs (888)
State Names
–California vs CA vs Calif.
5 Insight: Exchange kinds of data (rather than particular formats)
John Smith Main St. Pittsburgh, PA
Doe, Jane Brooke Lane PITTSBURGH Pennsylvania
RAY TILL (404) PITT ST PGH, Penna.
MR. ART COR RED RUN RD. pittsburgh PA
JOHN SMITH (303) MAIN ST Pittsburgh, PA
6 Insight: Exchange kinds of data (rather than particular formats)
Three loci for reformatting…
–Before transmitting (from source component)
–After receiving (at receiving component)
–Or along the way (in the connector itself)
A component could be a database, web site, XML web service, desktop application, …
7 Use topes to reformat!
A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data
“Tope” is the Greek word for “place,” because each tope corresponds to a data category with a natural place in the problem domain
Examples:
–Tope for person names
–Tope for university names (and abbreviations)
–Tope for North American phone numbers
–Tope for Oregon State University phone numbers
8 A tope is a graph. Node = format, edge = transformation
Notional representation for an OSU room number tope…
–Formal building name & room number: Kelley Engineering Center 1148
–Colloquial building name & room number: Kelley 1148
–Building abbreviation & room number: KEC 1148
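The graph view of the room number tope can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the format names, the one-building lookup tables, the choice of which edges exist, and the `reformat` helper are all assumptions made for the example.

```python
from collections import deque

# Hypothetical format names (the nodes of the graph).
FORMAL, COLLOQUIAL, ABBREV = "formal", "colloquial", "abbreviation"

# Illustrative lookup tables; a real tope would cover every building.
FORMAL_TO_COLLOQUIAL = {"Kelley Engineering Center": "Kelley"}
COLLOQUIAL_TO_ABBREV = {"Kelley": "KEC"}


def _swap_building(table):
    """Build a transformation that rewrites the building part of 'building room'."""
    def transform(value):
        building, room = value.rsplit(" ", 1)
        return f"{table[building]} {room}"
    return transform


def _invert(table):
    """Reverse a lookup table so an edge can run in the other direction."""
    return {v: k for k, v in table.items()}


# Edges of the graph: (source format, target format) -> transformation.
# Which edges exist is an assumption; the slide only shows the three nodes.
EDGES = {
    (FORMAL, COLLOQUIAL): _swap_building(FORMAL_TO_COLLOQUIAL),
    (COLLOQUIAL, FORMAL): _swap_building(_invert(FORMAL_TO_COLLOQUIAL)),
    (COLLOQUIAL, ABBREV): _swap_building(COLLOQUIAL_TO_ABBREV),
    (ABBREV, COLLOQUIAL): _swap_building(_invert(COLLOQUIAL_TO_ABBREV)),
}


def reformat(value, source, target):
    """Breadth-first search for a chain of transformations from source to target."""
    queue = deque([(source, value)])
    seen = {source}
    while queue:
        fmt, current = queue.popleft()
        if fmt == target:
            return current
        for (src, dst), transform in EDGES.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, transform(current)))
    raise ValueError(f"no path from {source} to {target}")


print(reformat("Kelley Engineering Center 1148", FORMAL, ABBREV))
```

Because there is no direct edge between the formal and abbreviated formats in this sketch, the search chains two transformations through the colloquial format.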
9 A tope is a conceptual abstraction. A tope implementation is code.
Each tope implementation has executable functions:
–One $\mathrm{isa}: \mathrm{string} \to [0,1]$ function per format, for recognizing instances of the format (a fuzzy set)
–Zero or more $\mathrm{trf}: \mathrm{string} \to \mathrm{string}$ functions linking formats, for transforming values from one format to another
Validation function: $\phi(str) = \max_f \mathrm{isa}_f(str)$, where $f$ ranges over the tope’s formats
–Valid when $\phi(str) = 1$
–Invalid when $\phi(str) = 0$
–Questionable when $0 < \phi(str) < 1$
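As a rough sketch of the validation rule above, the following Python defines two `isa` functions for a person name tope and takes their maximum. The regular expressions, the 0.5 score for oddly capitalized names, and the names `phi` and `classify` are assumptions for illustration, not part of the topes library.

```python
import re


def isa_first_last(s):
    """Score membership in a hypothetical 'First Last' format."""
    if re.fullmatch(r"[A-Z][a-z]+ [A-Z][a-z]+", s):
        return 1.0
    # Right shape but unusual capitalization: treat as questionable.
    if re.fullmatch(r"[A-Za-z]+ [A-Za-z]+", s):
        return 0.5
    return 0.0


def isa_last_first(s):
    """Score membership in a hypothetical 'Last, First' format."""
    if re.fullmatch(r"[A-Z][a-z]+, [A-Z][a-z]+", s):
        return 1.0
    if re.fullmatch(r"[A-Za-z]+, [A-Za-z]+", s):
        return 0.5
    return 0.0


# One isa function per format of the tope.
ISA_FUNCTIONS = [isa_first_last, isa_last_first]


def phi(s):
    """Validation function: the best score over all formats."""
    return max(isa(s) for isa in ISA_FUNCTIONS)


def classify(s):
    """Map the validation score onto the three outcomes on the slide."""
    score = phi(s)
    if score == 1:
        return "valid"
    if score == 0:
        return "invalid"
    return "questionable"


for text in ["John Smith", "Smith, John", "Lincolnshire MCC", "12345"]:
    print(text, "->", classify(text))
```

A caller could accept valid strings silently, reject invalid ones, and ask a user to confirm (or trigger fail-over for) questionable ones.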
10 But will it really work?
For a range of different kinds of components, e.g.…
–Web service → application
–Application → web service
–Web site → web site
–Desktop application → web site
–… and other combinations?
How to specify which tope functions to invoke?
How much work will it be, in practice?
11 Case study propositions
–Most of the difficulties encountered will result from technologies other than topes.
–Topes will be able to perform the string transformations needed in a variety of situations.
–Topes will be useful at all three loci (before/during/after data transfer), though not necessarily in every combination of locus and architectural style.
–Using topes will simplify the code required to perform string transformations.
12 Case #1: Enhanced Windows clipboard
13 Case #2: Enhanced web macro tool
go to “
enter “Prashanth Ayyavu” into the “Full name” textbox
copy the “Full name” textbox
go to “
paste in “DAVID JAMES” format from “person name” into the “your name” textbox
(The CoScripter web macro tool already had copy/paste functionality; we just added the clauses for reformatting.)
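The reformatting clause in the macro names its target format by example ("DAVID JAMES") rather than by a format identifier. One way this could work is sketched below in Python. The three formats, their regular expressions, and the `format_of` and `paste_in_format` helpers are hypothetical; this is not CoScripter or topes library code.

```python
import re

# Hypothetical formats of a "person name" tope.
# Each entry pairs a recognizer with a function that renders (first, last).
FORMATS = {
    "First Last": (
        re.compile(r"(?P<first>[A-Z][a-z]+) (?P<last>[A-Z][a-z]+)"),
        lambda first, last: f"{first.capitalize()} {last.capitalize()}",
    ),
    "FIRST LAST": (
        re.compile(r"(?P<first>[A-Z]+) (?P<last>[A-Z]+)"),
        lambda first, last: f"{first.upper()} {last.upper()}",
    ),
    "Last, First": (
        re.compile(r"(?P<last>[A-Z][a-z]+), (?P<first>[A-Z][a-z]+)"),
        lambda first, last: f"{last.capitalize()}, {first.capitalize()}",
    ),
}


def format_of(text):
    """Return the name of the first format whose pattern matches the whole string."""
    for name, (pattern, _render) in FORMATS.items():
        if pattern.fullmatch(text):
            return name
    raise ValueError(f"no known format matches {text!r}")


def paste_in_format(value, example):
    """Reformat value into whichever format the example string is written in."""
    source_pattern, _ = FORMATS[format_of(value)]
    _, render = FORMATS[format_of(example)]
    parts = source_pattern.fullmatch(value).groupdict()
    return render(**parts)


print(paste_in_format("Prashanth Ayyavu", "DAVID JAMES"))
```

Selecting the format by example keeps format identifiers out of the macro text, so an end user only needs to show what the result should look like.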
14 Case #3: Web service library
XML
Jan-96 (203) /30/2007
TopeSheet
xpath:/mydoc/whatever/date{tope:url(
xpath:/mydoc/whatever/tel{tope:url(
Client Code
ItemLoader loader = ItemLoader.FromXml(xml);
ItemSet items = loader.Load("xpath:/*/tel");
List values = items.FormatAs(" ");
// overloaded methods let you override the topes and/or validate the data
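The load-then-format pattern in the client code can be approximated with the Python standard library. This is only an analogy under stated assumptions: the sample XML, the fictional phone numbers, the dashed target format, and the `to_dashed_phone` and `load_and_format` helpers are invented for illustration and do not reproduce the `ItemLoader` API or the TopeSheet syntax.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical response from a web service, with phone numbers in mixed formats.
SAMPLE_XML = "<mydoc><tel>(203) 555-0100</tel><tel>203.555.0101</tel></mydoc>"


def to_dashed_phone(text):
    """Stand-in for a tope transformation: rewrite a 10-digit number as ###-###-####."""
    digits = re.sub(r"\D", "", text)
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit phone number: {text!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def load_and_format(xml_text, path, transform):
    """Select nodes with an ElementTree path and reformat each node's text."""
    root = ET.fromstring(xml_text)
    return [transform(node.text or "") for node in root.findall(path)]


# "./tel" selects the <tel> children of the root, similar in spirit to /*/tel.
values = load_and_format(SAMPLE_XML, "./tel", to_dashed_phone)
print(values)
```

Here the reformatting happens at the receiver of the data; the same `transform` argument could equally be applied by the sender before the XML is produced.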
15 Summary of findings
Main sources of difficulty
–1. Clipboard: Windows API
–2. Web macros: Reading the CoScripter code; interfacing to our topes library
–3. Web services: Web services becoming unavailable
Topes can handle the kinds of strings
–Yes
Topes useful at all three loci
–1. Clipboard: Connector
–2. Web macros: CoScripter component (acts as connector between websites)
–3. Web services: Sender or receiver of data
Topes simplify reformatting code
–1. Clipboard: Yes
–2. Web macros: No… needed interface code
–3. Web services: Yes
16 Conclusion
Software elements can use varying formats
–No explicit references to format identifiers
–No need for ontology consensus
Topes are reusable for data in…
–XML nodes
–Database tuples
–HTML tags
–Webform fields
–Spreadsheet cells
–…and more
Main challenge is interfacing to library across languages
17 Thank You… To ICISA for this opportunity to participate