TANGO (RPI, June 2009)
George Nagy, Mukkai Krishnamoorthy, Sharad Seth
Raghav Padmanabhan, Ramana C. Jandhyala, Sean Kelley
Max Muthalathu, William Silversmith

June 15, 2009 TANGO PROGRESS REPORT

Completed Stuff
WNT (Piyushee, MS May 2008)
TAT (Raghav, MS May 2009)
Pubs:
  ICPR08, WNT, PJ & GN, Dec
  ICPR08, QBT, RP & GN, Dec
  MKM09, Tessellations, RJ, RP, MK, GN, SS, WS, July 2009
  GREC09, TAT results, RP, RP, MK, GN, SS, WS, July 2009

Software
TAT (demo)
EX2XY, XY2EX (Ramana)
OO2XY, XY2OO (Sean, in progress)
XY2LN (SS, MK)
XY2WN (Bill)
TAT stat analysis (RB & GN, in progress)

Partial grammar for X-Y trees (MK & SS)

Example table header:
  Employment Status:  Unemployed | Employed
  Education:          High School or Less | College | High School or Less | College
  Under each College: BS/BA | Graduate Degree | BS/BA | Graduate Degree

SXY = { c [ c c ] c [ c { c [ c c ] } c { c [ c c ] } ] }

Grammar G1 for parsing all layout-equivalent tessellations of this kind is:
  S := A
  A := { B }
  B := c [ X ] B | c [ X ]
  X := c X | A X | A | c
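
Grammar G1 can be checked with a small recursive-descent recognizer. The sketch below is only an illustration, not the project's parser: it assumes tokens are separated by whitespace, it treats B as one or more `c [ X ]` units and X as one or more `c` or A items (equivalent to the right-recursive rules), and it takes the example string with a closing `}`, which rule A requires. The function names are hypothetical.

```python
TOKENS = {"{", "}", "[", "]", "c"}

def tokenize(text):
    """Split a whitespace-separated X-Y string into tokens (assumed format)."""
    tokens = text.split()
    for tok in tokens:
        if tok not in TOKENS:
            raise ValueError(f"unexpected token: {tok!r}")
    return tokens

def parse_a(tokens, i):
    """A := { B }. Return the index after A, or None if no match."""
    if i < len(tokens) and tokens[i] == "{":
        j = parse_b(tokens, i + 1)
        if j is not None and j < len(tokens) and tokens[j] == "}":
            return j + 1
    return None

def parse_unit(tokens, i):
    """One 'c [ X ]' unit. Return the index after it, or None."""
    if i + 1 < len(tokens) and tokens[i] == "c" and tokens[i + 1] == "[":
        j = parse_x(tokens, i + 2)
        if j is not None and j < len(tokens) and tokens[j] == "]":
            return j + 1
    return None

def parse_b(tokens, i):
    """B := c [ X ] B | c [ X ], i.e. one or more units."""
    j = parse_unit(tokens, i)
    if j is None:
        return None
    while True:
        k = parse_unit(tokens, j)
        if k is None:
            return j
        j = k

def parse_x(tokens, i):
    """X := c X | A X | A | c, i.e. one or more items, each 'c' or A."""
    j, count = i, 0
    while j < len(tokens):
        if tokens[j] == "c":
            j += 1
        elif tokens[j] == "{":
            k = parse_a(tokens, j)
            if k is None:
                return None
            j = k
        else:
            break
        count += 1
    return j if count > 0 else None

def accepts(text):
    """True if the whole string derives from S := A."""
    tokens = tokenize(text)
    return parse_a(tokens, 0) == len(tokens)

example = "{ c [ c c ] c [ c { c [ c c ] } c { c [ c c ] } ] }"
print(accepts(example))
```

Because every alternative is distinguished by its first token (`c` versus `{`), no backtracking is needed, so the check runs in a single left-to-right pass.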

A’ and A’’ table formats
[Figure panels: A’, A’’, Hybrid]

Appearance-based distance (WS?)
Each table cell is described by a vector: width, type size, typeface, indent, justification, alpha/num, color, #_of_chars, …
Compute differences between horizontally and vertically adjacent cells
From the resulting “gradient map”, determine row header, column header, and delta cell regions.
(Show GN’s Excel example)
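
The adjacent-cell differencing can be sketched as follows. This is a minimal illustration under stated assumptions: each cell is a dict of appearance features, and the difference between two cells is simply the number of features on which they disagree. The feature names, the `cell_distance` and `gradient_map` helpers, and the toy table are all hypothetical; the slide does not specify a metric.

```python
def cell_distance(a, b):
    """Count the appearance features on which two cells differ (assumed metric)."""
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k) != b.get(k))

def gradient_map(grid):
    """Return (horizontal, vertical) difference maps for a rectangular grid.

    horizontal[r][c] compares cell (r, c) with its right neighbor (r, c + 1);
    vertical[r][c] compares cell (r, c) with the cell below it (r + 1, c).
    """
    rows, cols = len(grid), len(grid[0])
    horizontal = [[cell_distance(grid[r][c], grid[r][c + 1])
                   for c in range(cols - 1)] for r in range(rows)]
    vertical = [[cell_distance(grid[r][c], grid[r + 1][c])
                 for c in range(cols)] for r in range(rows - 1)]
    return horizontal, vertical

# Toy 3x3 table with made-up values: a bold header row, a left-justified
# text column on the left, and right-justified numeric data cells.
header = {"typeface": "bold", "justification": "center", "alpha_num": "alpha"}
stub = {"typeface": "regular", "justification": "left", "alpha_num": "alpha"}
data = {"typeface": "regular", "justification": "right", "alpha_num": "num"}
grid = [[header, header, header],
        [stub, data, data],
        [stub, data, data]]

horizontal, vertical = gradient_map(grid)
for row in horizontal:
    print("H", row)
for row in vertical:
    print("V", row)
```

In this toy table the nonzero entries line up along the boundary below the first row and to the right of the first column, which is the kind of edge a header/delta segmentation would look for. A weighted or per-feature distance could be substituted without changing the structure.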

Prediction of TAT-time
Multiple regression of interaction time from:
  Size of table (#cols, #rows, or #cells)
  Number of aggregates
  Number of footnotes
  Number of units
  Other?
(GN has tried it with 20 tables – have Excel ‘GN_Data_Analysis’)
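
A multiple regression of this kind can be sketched in a few lines, assuming ordinary least squares with an intercept. The code below solves the normal equations in pure Python; `fit_ols` is a hypothetical helper, and the predictor rows and times are made-up placeholders constructed from known coefficients so the fit can be checked. They are not the 20-table data from the Excel file.

```python
def fit_ols(rows, times):
    """Least-squares fit of times ~ b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, times))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Placeholder predictors per table: (#cells, #aggregates, #footnotes, #units).
tables = [(12, 0, 1, 1), (20, 2, 0, 1), (35, 1, 2, 2),
          (48, 3, 1, 2), (60, 0, 3, 3), (80, 4, 2, 1)]
# Synthetic times generated from known coefficients (intercept first).
true_coeffs = [5.0, 0.5, 3.0, 2.0, 1.0]
times = [true_coeffs[0] + sum(b * v for b, v in zip(true_coeffs[1:], row))
         for row in tables]

coeffs = fit_ols(tables, times)
print([round(c, 6) for c in coeffs])
```

With real timing data the fit will not be exact, and with only 20 tables and several predictors the coefficient estimates would need to be read with caution.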

Table similarity
May be useful to determine similar edit sequences.
Tree distance between X-Y representations
  symmetry?
Edit distance between linear P-notation for X-Y trees
Metric for parse sequences??
Tree distance between Wang category forests? (new)
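
One of the candidate measures, edit distance between linear notations, can be sketched with the standard Levenshtein dynamic program over token sequences. This assumes the linear notation is a whitespace-separated token string like the bracketed X-Y strings used earlier in this report (whether that matches P-notation exactly is an assumption) and unit costs for insertion, deletion, and substitution. `edit_distance` is a hypothetical helper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences, unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

t1 = "{ c [ c c ] }".split()
t2 = "{ c [ c c c ] }".split()
t3 = "{ c [ c { c [ c c ] } ] }".split()

print(edit_distance(t1, t2), edit_distance(t2, t1))
print(edit_distance(t1, t3), edit_distance(t3, t1))
```

With unit costs this distance is symmetric and satisfies the triangle inequality, which bears on the "symmetry?" question above; asymmetric costs would give up that property.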

Learning ???
Retain edit sequences from TAT
Make X-Y tree from each imported but not edited table
Find distance of X-Y tree from new table to all previous
Execute edit sequences of nearest neighbor(s)
Check algorithmically if resulting X-Y tree corresponds to correct WN
Check visually if table corresponding to resulting X-Y tree is equivalent to original table. If not, edit
Concatenate further edits and associate with X-Y tree of new table, then add to reference set
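
The retrieval part of this loop can be sketched as a nearest-neighbor lookup over a reference set of (X-Y tree, edit sequence) pairs. Everything here is an assumption for illustration: trees are token lists, the distance is a stand-in built on `difflib.SequenceMatcher` from the Python standard library, and the edit operation names are hypothetical. The algorithmic WN check and the visual check are not modeled.

```python
from difflib import SequenceMatcher

def tree_distance(a, b):
    """Stand-in distance in [0, 1] between two token sequences.

    Assumption: any X-Y tree distance could be plugged in instead.
    """
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def nearest_edit_sequence(new_tree, reference_set, distance=tree_distance):
    """Return (distance, edit_sequence) of the closest reference tree, or None."""
    if not reference_set:
        return None
    best_tree, best_edits = min(reference_set,
                                key=lambda entry: distance(new_tree, entry[0]))
    return distance(new_tree, best_tree), best_edits

def add_to_reference(reference_set, tree, borrowed_edits, further_edits):
    """Store the new tree with the borrowed edits followed by any manual fixes."""
    reference_set.append((tree, list(borrowed_edits) + list(further_edits)))

# Hypothetical reference set; the edit names are placeholders.
reference_set = [
    ("{ c [ c c ] }".split(), ["merge_rows"]),
    ("{ c [ c { c [ c c ] } ] }".split(), ["split_col", "merge_rows"]),
]

new_tree = "{ c [ c { c [ c c c ] } ] }".split()
dist, edits = nearest_edit_sequence(new_tree, reference_set)
print(dist, edits)

# After replaying the borrowed edits and correcting by hand, keep the result.
add_to_reference(reference_set, new_tree, edits, ["fix_header"])
```

Returning the distance alongside the edits leaves room for a threshold: when even the nearest neighbor is far away, it may be better to edit from scratch than to replay a poorly matched sequence.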

Discussion Items
Lists & Ordering
XML format and verification
Augmentations (spotting and processing)
Open Office
Table ontology
XY tree to WN via lexical parse (checks?)
Use of parse trees for XY2WN
Learning?
Overall TANGO evaluation for final report
Critique draft slides for GREC and MKM
Tools: RPI: OO, VBA, Matlab, Python; BYU: ??
Other RPI projects: PERFECT, CERVITOR, CAVIAR

Survival Plans
NSF TANGO Final Report!
New NSF proposal (Maria)
Other possible sponsors?
Confs
Archival Journals
Collaborators
Demos and dissemination
Next visit