Automation and Quality in Image Digital Libraries with Annotations Edward Fox, Uma Murthy and Ricardo Torres Florence, Italy 17 February 2007.

1 Automation and Quality in Image Digital Libraries with Annotations Edward Fox, Uma Murthy and Ricardo Torres Florence, Italy 17 February 2007

2 Outline: Acknowledgements; Digital Libraries; Scenarios, Requirements; Superimposed Information; Content-Based Information Retrieval; CBISC, SIERRA; Theory, Quality; References; Summary

3 Acknowledgements: Students: Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Doug Gorton, Nithiwat Kampanya, Rohit Kelapure, S.H. Kim, Neill Kipp, Aaron Krowne, Bing Liu, Ming Luo, Roberto Marchesini, Paul Mather, Sudarshan Murthy, Uma Murthy, Sanghee Oh, Ananth Raghavan, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo da Silva Torres, Srinivas Vemuri, Wensi Xi, Seungwon Yang, Baoping Zhang, Qinwei Zhu, …

4 Acknowledgements: Faculty, Staff: Lillian Cassel, Lois Delcambre, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Sandy Grant, Eric Hallerman, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Douglas Knight, Deborah Knox, Alberto Laender, David Maier, Gail McMillan, Claudia Medeiros, Manuel Perez-Quinones, Jeff Pomerantz, Naren Ramakrishnan, Layne Watson, Barbara Wildemuth, …

5 Other Collaborators (Selected): Brazil: FUA, UFMG, UNICAMP; Case Western Reserve University; Emory, Notre Dame, Oregon State; Germany: Univ. Oldenburg; Mexico: UDLA (Puebla), Monterrey; College of NJ, Hofstra, Penn State, Villanova; Portland State University; University of Arizona, University of Florida, Univ. of Illinois, University of Virginia; VTLS (slides on digital repositories, NDLTD)

6 Acknowledgements: Support: ACM, Adobe, AOL, CAPES, CNI, CNPq, CONACyT, DFG, FAEPEX, FAPESP, IBM, IMLS, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0532825, 0535057, 0535060; ITR-0325579; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS, …

8 Digital Libraries: Objectives
World Lit.: 24hr / 7day / from desktop
Integrated “super” information systems: 5S (table of related areas and their coverage)
Ubiquitous, Higher Quality, Lower Cost
Education, Knowledge Sharing, Discovery
Disintermediation -> Collaboration
Universities Reclaim Property
Interactive Courseware, Student Works
Scalable, Sustainable, Usable, Useful

10 Alliteration
5S
–Societies: Users; Collaboration, Web 2.0
–Scenarios: Workflow, Stories; Services, Components
–Spaces: GIS
–Structures: DBMS
–Streams: DSMS
3C
–Content: Content Management Systems
–Context: Link Structure; NLP; Mental models
–Criticism, commentary: Annotation, Talmud; Cataloging, indexing; Abstracting; Summarizing; Secondary literature

13 Consider this scenario:
1. Ingrid is a graduate student in the Fisheries department doing research on freshwater fish.
2. On a field visit, she finds a unique-looking fish and wants to know more.
3. She wants to search the department database for related information based on others’ observations. She also wants to enter new information about the fish into the database.
Source: http://umd.edu/

14 EKEY: The electronic key for identifying freshwater fishes

15 Next, Ingrid works on an assignment to gain familiarity with the capabilities of a new Biodiversity Information System. She is required to make the system help her with her complex integrated information need: “Retrieve fish descriptions of all fish whose shape is similar to that shown in the figure below, which belong to genus ‘Notropis’, which have ‘large eyes’ and ‘dorsal stripe’, and have been observed within the catchments of the ‘Tennessee’ river.”

16 Here is another scenario … An archeologist wants to write commentaries on artifacts discovered in the field. Using an archeology digital library in his study, he wants to be able to:
–Manually annotate images (and parts)
–Search for images (and parts), and annotations
–Automatically annotate/tag similar images (and parts)
–Share annotations and images
Sources: http://www.dorsetforyou.com, http://www.archaeology.org, http://www.bewegende-plaatjes.net

17 Functionality required
Digital Library (DL) users need help with the following tasks, but currently get little assistance:
–Selecting and annotating images and parts of images
   –Preserving the original context of information
   –Manual and automated annotation
–Content-based image retrieval of images and parts of images (+ GIS + metadata + text …), with machine learning of the proper set of descriptors
–Sharing selections and annotations

18 New Microsoft Research grant: Virginia Tech and UNICAMP (Brazil); Fisheries & Wildlife, Computer Science; Tablet PCs: Content-Based Image Retrieval + Superimposed Information

20 Superimposed information (SI): New interpretation of existing information
–New content, new structures
Focuses on
–Information at sub-document granularity
–Information from heterogeneous sources (multimedia content)
–Working with information in situ

21 Origin of SI
This basic need has been addressed in diverse ways, with varying degrees of success, for many years:
–concordances, annotations, comments
–bookmarks, concept maps, digital annotations, …
The term “SI” was coined in 1999 by researchers now at Portland State University, who currently collaborate with us:
–Lois Delcambre
–David Maier

22 Layers in an SI system [figure]
Source: ICDE04 presentation by Murthy et al.

23 Benefits
Specificity of reference
Flexibility
–Identifying interesting (parts of) objects
–Making connections between selections
–Managing collections of selections
References sub-document information
–Preservation of context
–Facilitates easy sharing of information

24 Superimposed Applications: SIMPEL (A SuperImposed Multimedia Presentation Editor and pLayer); Enhanced CMapTools [figures]

25 Combining CBIR and SI
–Associate images and parts of images with related information such as annotations, hyperlinks, metadata records, etc.
–Perform CBIR on images and parts of images that have been annotated
–Combine text-based search (on annotations and other associated text) and content-based search (on image content) for more effective retrieval of images and parts of images
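One way to realize the combined text- and content-based search described above is late fusion: score each annotated image (or part) separately by its annotation text and by its image content, rescale both score lists to [0, 1], and blend them. The sketch below is an illustration under assumptions, not SIERRA's method: both inputs are similarity scores where higher is better (a CBIR distance would first need to be inverted), and the function names and the weight `alpha` are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, every item gets 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_scores(text_scores: Dict[str, float],
                image_scores: Dict[str, float],
                alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Hypothetical linear fusion: alpha * text + (1 - alpha) * image.

    An item missing from one result list contributes 0 for that modality.
    Results are sorted by fused score (descending), ties broken by id.
    """
    t = min_max_normalize(text_scores)
    v = min_max_normalize(image_scores)
    ids = set(t) | set(v)
    fused = {i: alpha * t.get(i, 0.0) + (1 - alpha) * v.get(i, 0.0) for i in ids}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

With `alpha=0.5`, an image found by both searches (here `img2`) can outrank one that matches only the annotation text strongly; raising or lowering `alpha` shifts emphasis between annotations and image content.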

27 Content-Based Image Retrieval (CBIR)
Retrieve images similar to a user-defined specification or pattern (e.g., shape sketch, image example)
Goal: To support image retrieval based on content properties (e.g., shape, color, or texture), usually encoded into feature vectors

28 Textual information retrieval: query on Google using “Sunset” and “Rio de Janeiro” [figure: query result]

29 Content-Based Information Retrieval

30 Effective Image Description + Feature Extraction: Feature Vector [0.98, 0.91, 0.73, …] [figure axes: R, G, B]

31 Image descriptors [figure: Image Descriptor]

32 Example: Histogram
[figure: image and corresponding histogram]
Frequency count of each individual color
Most commonly used color feature representation
Source: Andrade, D.
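The histogram idea on this slide can be sketched in a few lines. Assumptions, none taken from the authors' implementation: the image is a flat list of RGB pixels, each channel is quantized into `bins` levels so that similar shades share a cell, and histogram intersection (a classic comparison measure, swapped in here for illustration) compares two histograms.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized color histogram with `bins` levels per channel (bins**3 cells)."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts: Counter = Counter()
    for r, g, b in pixels:
        # Map each 0..255 channel value to a quantization level in 0..bins-1.
        qr, qg, qb = r * bins // 256, g * bins // 256, b * bins // 256
        counts[(qr * bins + qg) * bins + qb] += 1
    total = len(pixels)
    # Frequency count of each quantized color, normalized to sum to 1.
    return [counts[i] / total for i in range(bins ** 3)]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

With `bins=2`, an image of two reddish and two bluish pixels puts half its mass in the "red" cell and half in the "blue" cell; every other cell is 0.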

33 Texture Descriptors

34 Contour Saliences

35 Contour Segment Saliences

36 Multiscale Fractal Dimension
Complex geometric shapes
Defined by simple algorithms
Non-integer dimension
Invariant under scaling
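The slide does not say how the multiscale fractal dimension is computed. As a simpler stand-in (box counting, not the authors' descriptor), the sketch below counts the boxes of side s that touch a shape's foreground pixels, N(s), and takes the slope of log N(s) against log(1/s). Fitting over all scales gives one global estimate; the slope between each pair of consecutive scales gives a per-scale curve, which conveys the "multiscale" idea. The function names are hypothetical, and at least two distinct box sizes are assumed.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[int, int]  # (x, y) coordinates of a foreground (shape) pixel


def box_count(points: Set[Point], size: int) -> int:
    """Number of size x size grid boxes that contain at least one shape pixel."""
    return len({(x // size, y // size) for x, y in points})


def _log_pairs(points: Set[Point], sizes: Sequence[int]) -> List[Tuple[float, float]]:
    # (log(1/s), log N(s)) for each box size s
    return [(math.log(1.0 / s), math.log(box_count(points, s))) for s in sizes]


def box_counting_dimension(points: Set[Point], sizes: Sequence[int]) -> float:
    """Least-squares slope of log N(s) versus log(1/s) over all sizes."""
    pairs = _log_pairs(points, sizes)
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = sum((x - mx) ** 2 for x, _ in pairs)
    return num / den


def dimension_per_scale(points: Set[Point], sizes: Sequence[int]) -> List[float]:
    """Local slope between consecutive box sizes: a crude dimension-versus-scale curve."""
    pairs = _log_pairs(points, sizes)
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]
```

For a straight line the estimate is 1 and for a filled square it is 2; an irregular fish contour would typically fall in between, and its per-scale curve would vary with scale.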

37 Multiscale Fractal Dimension (Experiments)

38 Tensor Scale Descriptor
Introduced by Punam K. Saha et al. in 2003. For a pixel p, the tensor scale is the largest ellipse centered at p that lies within the same homogeneous region. It extracts local structure information (thickness, orientation, and anisotropy).

39 Tensor Scale Image [figure; orientation legend: 0°, 90°, 180°]

40 Tensor Scale Image

41 Tensor Scale Descriptor

42 Tensor Scale Descriptor

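44 A typical CBIR system [figure: the Interface (Query Specification, Visualization) sends a Query Pattern to the Query-processing Module (Feature Vector Extraction, Similarity Computation, Ranking), which compares it against the Feature Vectors in the Image Database and returns Similar Images; Data Insertion extracts Feature Vectors from Images and stores them in the database]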
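The architecture on this slide can be mirrored by a small in-memory sketch: a Data Insertion path that extracts and stores one feature vector per image, and a Query-processing path that extracts the query pattern's vector, computes similarities, and ranks. Following a common formalization, a descriptor is modeled as a pair of a feature-extraction function and a distance function. The toy mean-color descriptor, the L1 distance, and all class and method names are assumptions for illustration, not CBISC's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
FeatureVector = List[float]


@dataclass
class Descriptor:
    """An image descriptor: how to extract a feature vector and how to compare two."""
    extract: Callable[[Sequence[Pixel]], FeatureVector]
    distance: Callable[[FeatureVector, FeatureVector], float]


def mean_color(pixels: Sequence[Pixel]) -> FeatureVector:
    # Toy extraction algorithm (hypothetical): average R, G and B values.
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(3)]


def l1_distance(a: FeatureVector, b: FeatureVector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class CBIRSystem:
    descriptor: Descriptor
    feature_vectors: Dict[str, FeatureVector] = field(default_factory=dict)

    def insert(self, image_id: str, pixels: Sequence[Pixel]) -> None:
        """Data insertion: extract once and store the feature vector."""
        self.feature_vectors[image_id] = self.descriptor.extract(pixels)

    def query(self, pattern: Sequence[Pixel], k: int = 5) -> List[Tuple[str, float]]:
        """Query processing: extract, compute distances, rank (smallest first)."""
        q = self.descriptor.extract(pattern)
        scored = [(image_id, self.descriptor.distance(q, v))
                  for image_id, v in self.feature_vectors.items()]
        scored.sort(key=lambda item: (item[1], item[0]))
        return scored[:k]
```

Swapping in a color histogram or a shape descriptor only changes the `Descriptor` passed to `CBIRSystem`; the insertion and query paths stay the same, which is the point of keeping descriptors separate from the system.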

46 CBISC: An OAI-compliant component that supports queries on image collections using content-based image retrieval. It may be customized to support different image collections.

47 CBISC in ETANA

48 CBISC Descriptor Training

49 System’s Architecture [figure: Interface; Mediator; Data Insertion Module; Query Processing Module; GIS; DBMS; Databases: Geo. DB, Metadata, Image DB]

50 [figure: Biodiversity Information System architecture. Interface (Query Specification, Visualization); Query Mediator (Analysis, Merging, Execution); BIS Manager; Content-Based Image Search Component (CBISC) over the Image Collection (Image Metadata, Image Descriptors, Images), with HTTP Request (ListDescriptors) and HTTP Request (GetImages); Metadata-Based Search Component (ESSEX) over the OAI Eco Collection (Metadata, Taxonomic Trees), with HTTP Request (keywords); Geographic Data Search Component (GDSC) using a Web Feature Server (WFS) over the Geo Collection (Metadata, Maps), with HTTP Request (GetCapabilities), HTTP Request (GetFeatureType), and HTTP Request (GetFeature)]
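To make the mediator's merging step concrete, the sketch below answers Ingrid's integrated query from slide 15 by combining three partial answers: a shape-similarity ranking (standing in for CBISC), a genus-plus-keyword filter (standing in for ESSEX), and a catchment filter (standing in for GDSC/WFS). The `Record` type, the in-memory filters, and the merge rule (intersect the two filters, keep the image-ranking order) are hypothetical simplifications; in the system above the components are separate services reached through HTTP requests.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class Record:
    """Hypothetical metadata for one fish specimen."""
    specimen_id: str
    genus: str
    description: str
    catchment: str


def metadata_search(records: Sequence[Record], genus: str, keywords: Sequence[str]) -> Set[str]:
    # Stand-in for the metadata-based component: genus match and every keyword present.
    return {r.specimen_id for r in records
            if r.genus == genus
            and all(k.lower() in r.description.lower() for k in keywords)}


def geographic_search(records: Sequence[Record], catchment: str) -> Set[str]:
    # Stand-in for the geographic component: observed within the given catchment.
    return {r.specimen_id for r in records if r.catchment == catchment}


def mediate(image_ranking: Sequence[Tuple[str, float]], records: Sequence[Record],
            genus: str, keywords: Sequence[str], catchment: str) -> List[Tuple[str, float]]:
    """Merge: keep specimens that pass both filters, in shape-similarity order."""
    allowed = metadata_search(records, genus, keywords) & geographic_search(records, catchment)
    return [(sid, dist) for sid, dist in image_ranking if sid in allowed]
```

The image ranking is filtered rather than re-sorted, so the most shape-similar specimens that also match genus, traits, and location come first.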

51 CBISC Configuration Tool

53 Integrated support for SI applications in Biomedical Information Systems

54 SIERRA: A tool that allows users to select parts of images and associate them with text annotations. It performs information retrieval over annotations and their associated marks in two ways, searching for either:
–images or marks similar (in content) to a specified image or mark, or
–annotations containing specified query terms
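A minimal data model for the two SIERRA search modes: each mark is a rectangular region of an image with its text annotation and a feature vector computed from that region. The `Mark` fields, the word-based annotation match, and the L1 distance over region features are assumptions for illustration; the slide does not describe SIERRA's internal representation.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class Mark:
    """Hypothetical mark: a selected region of an image plus its annotation."""
    mark_id: str
    image_id: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    annotation: str
    features: List[float]           # e.g. a color histogram of the selected region


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def search_annotations(marks: Sequence[Mark], terms: Sequence[str]) -> List[str]:
    """Text mode: marks whose annotation contains every query term as a word."""
    wanted = {t.lower() for t in terms}
    return [m.mark_id for m in marks if wanted <= _words(m.annotation)]


def search_similar_marks(marks: Sequence[Mark], query: Sequence[float], k: int = 3) -> List[str]:
    """Content mode: the k marks whose region features are closest (L1) to the query."""
    def l1(m: Mark) -> float:
        return sum(abs(a - b) for a, b in zip(m.features, query))
    return [m.mark_id for m in sorted(marks, key=lambda m: (l1(m), m.mark_id))[:k]]
```

Keeping the region (`box`) next to the annotation preserves the original context: a hit can always be traced back to the exact part of the source image it describes.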

55 Annotating an image

56 Searching over annotations

57 Searching over images/sub-images

58 DL services and tools drive quality; Formal frameworks

60 The 5S framework
A DL framework that defines constructs leading to the definition of a minimal digital library
Then, an archaeological DL
Then, a practical DL
Then, a DL handling superimposed information...
Plus, theory-based Quality Models and the Digital Librarian’s Quality Toolkit

61 The 5 S’s
S | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content, such as encoding and language for textual material, or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers (responsible for running DL services), actors (who use those services), and the relationships among them

63 5S and DL formal definitions and compositions (April 2004 TOIS)

64 A Minimal DL in the 5S Framework [figure: concepts built from Streams, Structures, Spaces, Scenarios, and Societies: Structured Stream, hypertext, Structural Metadata Specification, Descriptive Metadata Specification, Digital Object, Metadata Catalog, Collection, Repository, services (indexing, browsing, searching), Minimal DL]

65 A Minimal ArchDL in the 5S Framework [figure: the minimal DL constructs (Streams, Structures, Spaces, Scenarios, Societies; indexing, browsing, searching services; hypertext; Structured Stream; Descriptive Metadata specification) together with archaeology-specific constructs: SpaTemOrg, StraDia, Arch Descriptive Metadata specification, ArchDO, ArchObj, ArchColl, Arch Metadata catalog, ArchDColl, ArchDR, Minimal ArchDL]

66 Formalizing CBIR services in DLs

67 Information model

68 Tools/Applications

69 5SQual: A Quality Assessment Tool for Digital Libraries

70 5SQual - Dimensions [figure: quality dimensions for Digital Objects, Metadata, and Services (Completeness, Conformance, Accessibility, Similarity, Significance, Timeliness, Efficiency, Reliability), expressed as Numeric Indicators]
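As an example of turning a quality dimension into a numeric indicator, metadata completeness can be measured as the share of schema fields a record actually fills, averaged over a collection. This follows the general idea of completeness in the 5S quality model, but the field list, the rule for what counts as missing, and the function names are assumptions; 5SQual's exact formulas may differ.

```python
from typing import Mapping, Sequence


def is_missing(value: object) -> bool:
    # Assumption: absent keys, None, and blank strings all count as missing.
    return value is None or (isinstance(value, str) and not value.strip())


def record_completeness(record: Mapping[str, object], schema: Sequence[str]) -> float:
    """1 - (missing fields / fields in the schema), in [0, 1]."""
    missing = sum(1 for f in schema if is_missing(record.get(f)))
    return 1 - missing / len(schema)


def collection_completeness(records: Sequence[Mapping[str, object]],
                            schema: Sequence[str]) -> float:
    """Mean record completeness over a collection."""
    return sum(record_completeness(r, schema) for r in records) / len(records)
```

A record with only a title out of four schema fields scores 0.25; averaging such scores gives a single collection-level number that can be tracked over time or charted.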

71 5SQual Architecture

72 Evaluations – XML Report

73 Evaluations – Charts

76 References (selected)
Uma Murthy, Ricardo da Silva Torres, Edward A. Fox: SIERRA - A Superimposed Application for Enhanced Image Description and Retrieval. ECDL 2006: 540-543.
Uma Murthy, Ricardo da Silva Torres, Edward A. Fox: Integrated Support for Superimposed Applications in Biomedical Information Systems. Virginia Tech, 2006 (for the National Library of Medicine). http://si.dlib.vt.edu/publications/NLMWhitePaperSI2.pdf
M. A. Gonçalves: Streams, Structures, Spaces, Scenarios, and Societies: A Formal Framework for Digital Libraries and Its Applications: Defining a Quality Model for Digital Libraries (Chapter 8). PhD thesis, Virginia Tech CS Dept., Blacksburg, VA, 2004. http://scholar.lib.vt.edu/theses/available/etd_12052004_135923/
M. A. Gonçalves, B. L. Moreira, E. A. Fox, L. T. Watson: What is a good digital library? - Defining a quality model for digital libraries. To appear in Information Processing and Management, 2007. http://fox.cs.vt.edu/cv.htm

77 Summary: Acknowledgements; Digital Libraries; Scenarios, Requirements; Superimposed Information; Content-Based Information Retrieval; CBISC, SIERRA; Theory, Quality; References; Summary

