Unstructured Information, Information Audit / Workflow and Discovery

Peter Fox, Xinformatics 4400/6400, Module 9, March 27, 2017

Contents
Information Audit
Unstructured Information

Businessdictionary.com: analysis and evaluation of a firm's information system (whether manual or computerized) to detect and rectify blockages, duplication, and leakage of information.

Objective?
The objectives of this audit are to improve accuracy, relevance, security, and timeliness of the recorded information.

What is an information audit?
An information audit is a process that effectively determines the current information environment within an organization by identifying and mapping:
What information is currently available?
Where does the information live?
(Adapted directly from Npoki.org.)

Results/ format (e.g.)
The results of an information audit are twofold. There is a detailed report, which includes:
What information do staff acquire? Where from? At what cost? How is it used?
What information do staff create? What happens to it? Where does it go?
What information is stored and why? What purpose will it serve?
What information is passed on or delivered? To whom? For what purpose? In what form?
Is there a gap, or a match, between that which is available and that which is needed?
What are the skills and responsibilities of the people who carry out these tasks?
What equipment and tools do they have available (hardware, software, filing cabinets, web sites, etc.)?
Are there any control documents, such as policy statements, guidelines, service level agreements, procedures, manuals?
Is any of the information (produced, acquired, processed, re-delivered, or stored) superfluous to needs?
Are any of the information-handling activities non-productive?
(Npoki.org)
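
The report questions above amount to an inventory plus a gap check. A minimal sketch of that idea follows, assuming a hypothetical `InfoItem` record and invented example entries; none of these names come from Npoki.org, and a real audit would record far more (cost, form, recipients, control documents).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoItem:
    """One row of a hypothetical audit inventory."""
    name: str             # what information is it?
    source: str           # where does it come from?
    used_by: tuple = ()   # who uses it, if anyone?


def audit(inventory, needed):
    """Return (gaps, superfluous) for an inventory and a list of needs."""
    available = {item.name for item in inventory}
    # Gap: information that is needed but not available.
    gaps = sorted(set(needed) - available)
    # Superfluous: information that is held but neither used nor needed.
    superfluous = sorted(
        item.name for item in inventory
        if not item.used_by and item.name not in needed
    )
    return gaps, superfluous


# Invented example data, for illustration only.
inventory = [
    InfoItem("budget report", "finance system", ("managers",)),
    InfoItem("old fax log", "filing cabinet"),
]
gaps, superfluous = audit(inventory, needed=["budget report", "donor list"])
print("gaps:", gaps)
print("superfluous:", superfluous)
```

Keeping the inventory as plain records makes it easy to rerun the check whenever the audit is repeated.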

Results/ format (e.g.)
There is also a detailed flow chart: a visual map that shows the areas, processes, functions and activities through which information passes, clarifying gaps or fault-lines that need to be plugged, or bottlenecks and overflows that need to be unblocked.
Sound familiar?
(Npoki.org)

How to use?
An information audit can be used as a baseline for making major improvements to the business process of an organization. It is extremely helpful in identifying, buying, and implementing enterprise systems: finance systems, portfolio management systems, document management systems, learning and knowledge management systems, etc.
(Npoki.org)

Remember the use case doc?
Present the tabular form of the Burnett format, since it shows the integrative information and is the main difference between the formats.
[Table: use case template, Event/application. Developed for NASA TIWG.]

Remember
It never hurts to know what you have.
Build it into the routine and do not leave it as an afterthought (yep, just like documenting your code!).
To help: MIF, the Management Information Format (supported by MSDN applications, like SMS).

Sources and uses of unstructured information
- audio, video, graphics, social media messages, etc.: that which falls outside the purview of traditional databases

Data<->Information<->Knowledge
Where is the structure?
[Figure labels: Experience, Data, Information, Knowledge, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]

Informatics
Oh, wait: people structure information!
Cognitive processes, semiotics, mental representation, intuition, expertise.
But not in the same way computers can!

http://www. dashboardinsight

So what happens if a structured representation of fundamentally unstructured information is useless?
Why would it be?
What role does visual representation play in structuring information?
Hint:

More than 10 years ago…
Unstructured Information Management Architecture (UIMA) from IBM
“Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, and so on) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies. IBM's Unstructured Information Management Architecture (UIMA) is an architectural and software framework that supports creation, discovery, composition, and deployment of a broad range of analysis capabilities and the linking of them to structured information services, such as databases or search engines. The UIMA framework provides a run-time environment in which developers can plug in and run their UIMA component implementations, along with other independently-developed components, and with which they can build and deploy UIM applications.”

From way back…

Data<->Information<->Knowledge
Future?
[Figure labels: Experience, Data, Information, Knowledge, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context]

Reading for this week http://en.wikipedia.org/wiki/Information_audit
UIMA - SPAR -

Logical Collections
The primary goal of a management system is to abstract the physical collection into logical collections. The resulting view is a uniform, homogeneous collection.
Note the analogy with logical models and information integration, so EARLY ON:
Identifying naming conventions and organization
Aligning cataloguing and naming to facilitate search, access, use (who uses?)
Provision of **contextual** information

Physical Handling
Note the analogy to physical models.
Map between physical and logical.
Where and who does it come from?
Is there a transfer into a physical form?
Is it backed up, archived, cached? …
What formats?
Naming conventions: do they change?
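
The map between logical and physical can be sketched as a small registry. This is an illustration only: the `LogicalCollection` class, the logical names, and the storage locations below are all hypothetical, chosen to show one uniform naming convention sitting on top of heterogeneous storage.

```python
class LogicalCollection:
    """A uniform, logical view over items stored in different places."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def register(self, logical_name, physical_location, **context):
        """Bind a logical name to a physical location plus contextual information."""
        if logical_name in self._entries:
            raise ValueError(f"{logical_name!r} is already registered")
        self._entries[logical_name] = {
            "location": physical_location,
            "context": dict(context),
        }

    def resolve(self, logical_name):
        """Return the physical location behind a logical name."""
        return self._entries[logical_name]["location"]

    def context(self, logical_name):
        """Return the contextual information recorded for a logical name."""
        return self._entries[logical_name]["context"]

    def names(self):
        """List logical names in a stable order, regardless of where items live."""
        return sorted(self._entries)


# Hypothetical entries: one local file and one remote archive copy.
project = LogicalCollection("class-project")
project.register("survey/2017/raw", "file:///data/disk2/srv_17_v3.csv",
                 format="csv", owner="team A")
project.register("survey/2017/audio", "https://example.org/archive/a91.wav",
                 format="wav")
print(project.names())
```

If a file is moved or re-archived, only the registered location changes; users of the collection keep the same logical name.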

Interoperability Support

Security
Access authorization and change verification. This is the basis of trusting your information.
What mechanisms exist for securing? Who performs this task?
Change and versioning (yes, the information may change): who does this, and how?
Who has access? How are access methods controlled, audited?
Who and what: authentication and authorization?
Encryption and integrity
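
Change verification is the easiest of these to illustrate. The sketch below uses a SHA-256 digest from Python's standard `hashlib` as a recorded fingerprint; the function names and sample bytes are made up for the example, and it covers integrity only, not authentication, authorization, or encryption.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest to store alongside the information."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded: str) -> bool:
    """Check the current content against a previously recorded fingerprint."""
    return hmac.compare_digest(fingerprint(data), recorded)


# Invented sample content.
original = b"station,temp\nA,12.5\n"
recorded = fingerprint(original)          # stored when the file is registered

tampered = b"station,temp\nA,21.5\n"
print(is_unchanged(original, recorded))   # content matches the record
print(is_unchanged(tampered, recorded))   # content has changed
```

A fingerprint only tells you that something changed; who changed it and whether they were allowed to is the job of versioning and access control.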

Ownership
Who is responsible for quality and meaning?
Science commons
Rights and policies: definition and enforcement
Limitations on access and use
Requirements for acknowledgement and use
Who defines and ensures quality, and how?
To whom may ownership migrate?
How to address replication?
How to address revised/ derivative products?

Metadata
Recall that metadata are data about data. Metainformation? Metainformation are information about information.
How to know what conventions, standards, best practices exist?
How to use them, what tools?
Understanding costs of incomplete and inconsistent metadata
Understanding the line between metadata and data, and when it is blurred
Knowing where and how to manage metadata and where to store it (and where not to)
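
One cheap way to see the cost of incomplete metadata is to check records against a required-field list. In this sketch the field list is an assumption, loosely borrowed from Dublin Core element names, and the record is invented; substitute whatever convention your project adopts.

```python
# Assumed required fields (illustrative; loosely based on Dublin Core element names).
REQUIRED_FIELDS = ("title", "creator", "date", "format", "rights")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented record: 'date' is blank and 'rights' is absent.
record = {
    "title": "Swallow recordings",
    "creator": "Team A",
    "date": "  ",
    "format": "audio/wav",
}
print(missing_fields(record))
```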

Persistence
Deployment of mechanisms to counteract technology obsolescence.
Where will you put your information so that someone else (e.g. one of your class members) can access it?
What happens after the class, the semester, after you graduate?
What other factors are there to consider?

Discovery
Ability to identify useful relations and information inside the collection. More on this later in this class.
If you choose (see ownership and security), how does someone find your information?
How would you provide discovery of collections, versus files, versus ‘bits’?
How to enable the narrowest/ broadest discovery?

Dissemination
Mechanisms to make interested parties aware of changes and additions to the collections.
Do you rely on information retrieval? The Web?
Who should do this? How, and what needs to be put in place?
How to advertise?
How to inform about updates?
How to track use, significance?

Summary of Information Management
Creation of logical collections
Physical handling
Interoperability support
Security support
Ownership
Metadata collection, management and access
Persistence
Knowledge and information discovery
Dissemination and publication

Note for your project writeup!
Information management! Cover the 9 areas.

Information Workflow
What is a workflow?
Why would you use it?
Key considerations for information, cf. data
Some pointers to workflow systems

What is a workflow?
General definition: “series of tasks performed to produce a final outcome” (taxes?)
Information workflow: involves people, but potentially want to
Automate jobs that a person traditionally performed manually
Process large volumes of information faster than one could do by hand
NB difference from data workflows: it reaches out to encompass the user (e.g. ‘unrecorded actions’)
Adapted from slides by Bill Howe, UW, 2006

Background: Business Workflows
Example: planning a trip
Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.
Each task may depend on the outcome of a previous task:
Days you reserve the hotel depend on days of the flight
If the hotel has a shuttle service, you may not need to rent a car
Prior information, experience, preferences…
Adapted from slides by Bill Howe, UW, 2006
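
The trip example can be written down as a dependency graph and ordered automatically. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later); the task names and the `plan_trip` helper are made up for illustration and are not from the original slides.

```python
from graphlib import TopologicalSorter


def plan_trip(hotel_has_shuttle):
    """Return the trip tasks in an order that respects their dependencies."""
    # Each task maps to the set of tasks that must finish before it.
    depends_on = {
        "book_flight": set(),
        "reserve_hotel": {"book_flight"},   # hotel dates depend on flight dates
    }
    # The outcome of an earlier task changes the shape of the workflow.
    if not hotel_has_shuttle:
        depends_on["rent_car"] = {"reserve_hotel"}
    return list(TopologicalSorter(depends_on).static_order())


print(plan_trip(hotel_has_shuttle=True))
print(plan_trip(hotel_has_shuttle=False))
```

The point is that the order is derived from declared dependencies rather than hard-coded, which is what workflow systems do at larger scale.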

Tripit.com?

What about information workflows?
Perform a set of transformations/ operations on information source(s)
Examples:
Generating images from raw data
Identifying areas of interest from a large information source (e.g. word cloud)
Classifying a set of objects
Querying a web service for more information on a set of objects
Many others…
Adapted from slides by Bill Howe, UW, 2006
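
As a toy instance of "a set of transformations on an information source", the sketch below chains three steps that produce the term counts behind a word cloud. The step functions, the stop-word list, and the sample sentence are all invented for the example; real workflow systems add scheduling, monitoring, and provenance on top of this basic chaining.

```python
import re
from collections import Counter
from functools import reduce

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "is"})


def tokenize(text):
    """Step 1: split raw text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def drop_stop_words(tokens):
    """Step 2: remove words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def count_terms(tokens):
    """Step 3: count how often each remaining term occurs."""
    return Counter(tokens)


def run_workflow(source, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda value, step: step(value), steps, source)


counts = run_workflow(
    "The audit maps the information; the information flows.",
    [tokenize, drop_stop_words, count_terms],
)
print(counts.most_common(1))
```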

More on Workflows
Can process many information types:
Archives
Web pages
Streaming/ real time
Images
Semiotic systems
Robust workflows depend on formal (conceptual and logical) models of the flow of information among components
May be simple and linear or very complex
Adapted from slides by Bill Howe, UW, 2006

Challenges
Questions:
What are some challenges for users in implementing workflows?
What are some challenges to executing these workflows?
What are limitations of writing a program?
Mastering a programming language
Visualizing workflow
Sharing/exchanging workflow
Formatting issues
Locating datasets, services, or functions
Adapted from slides by Bill Howe, UW, 2006

Workflow Management Systems

Benefits of Workflows
Documentation of aspects of analysis
Visual communication of analytical steps
Ease of testing/debugging
Reproducibility
Reuse of part or all of workflow in a different project
Adapted from slides by Bill Howe, UW, 2006

Additional Benefits
Integration of and between multiple computing environments
‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies
System functionality to assist with information integration of heterogeneous components and sources
Adapted from slides by Bill Howe, UW, 2006

Why not just use a script?
A script does not specify low-level task scheduling and communication
May be platform-dependent
Can’t be easily reused
May not have sufficient documentation to be adapted for another purpose
Based on Howe (2006)

Why can a GUI be useful?
No need to learn a programming language
Visual representation of what the workflow does
Allows you to monitor workflow execution
Enables user interaction (though not necessarily collaboration)
Facilitates sharing of workflows
Adapted from Howe (2006)

Some workflow systems
Kepler, SCIRun, Sciflo, Triana, Taverna, Pegasus
Some commercial tools: Windows Workflow Foundation, Mac OS X Automator
See reading for this week

Discovery
How does someone find your information?
How would you provide discovery of collections, files, ‘bits’?
How would you find ->

Discovery
Search (Federated Search)
Helped by:
Folksonomies (user contributed)
Intelligent Agents
Search Engines
Taxonomies
Find photos of Kim
Boy or girl?

Use cases
Find a sound recording of a swallow.
Excuse me?

Use cases
Find a sound recording of an African Swallow
Find a sound recording of a bird that sounds like an African Swallow
Media types: how can you discover them?

Use cases
Find the movie that Jeanne Tripplehorn first starred in/ that was her most successful/ in which she was lead actress
Has anyone gene sequenced a mouse?
Find images of primary productivity in the North Atlantic
Discovery can often involve information integration (or is it *almost always*?)

Three level ‘metadata’ solution for DATA
Data Discovery and Data Integration
Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products
Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema-based integration
Ontology-based data integration using scientific workflows
From GEON, ack. Krishna Sinha (A.K. Sinha, Virginia Tech, 2006)

Three level ‘metadata’ solution?
Information Integration and Information Discovery
Level 1: Registration at the Discovery Level, e.g. find the upper-level entry point to a source
Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization
Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging
Catalog/ Index
Schema-based integration
Integration using mapping management
From GEON, ack. Krishna Sinha (A.K. Sinha, Virginia Tech, 2006)
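
One way to picture the three registration levels is as a nested registry that a searcher drills into. The structure, source names, datasets, and tags below are all invented for illustration; this is not the GEON implementation.

```python
# Hypothetical registry: Level 1 entry points, Level 2 inventories, Level 3 tagged items.
registry = {
    "volcano-observatory": {
        "description": "volcano location and activity",     # Level 1: discovery
        "inventory": {                                      # Level 2: inventory
            "eruptions-2006": {
                "items": {                                  # Level 3: item detail
                    "plume_height": {"tags": ["ash", "height"]},
                    "so2_flux": {"tags": ["gas"]},
                },
            },
        },
    },
}


def discover(registry, keyword):
    """Level 1: find entry points whose description mentions the keyword."""
    keyword = keyword.lower()
    return sorted(
        name for name, source in registry.items()
        if keyword in source["description"].lower()
    )


def list_inventory(registry, source):
    """Level 2: list the datasets registered under one source."""
    return sorted(registry[source]["inventory"])


def find_items(registry, source, dataset, tag):
    """Level 3: find individual items annotated with a given tag."""
    items = registry[source]["inventory"][dataset]["items"]
    return sorted(name for name, item in items.items() if tag in item["tags"])


print(discover(registry, "volcano"))
print(list_inventory(registry, "volcano-observatory"))
print(find_items(registry, "volcano-observatory", "eruptions-2006", "gas"))
```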

Information discovery
What makes discovery work?
Metadata
Logical organization
Attention to the fact that someone would want to discover it
It turns out that file types are a key enabler or inhibitor to discovery
Result ranking using a *tuned* algorithm
What does not work?
Result ranking algorithms that depend on unconventional information types (icon, index, symbol)

Federated search
“is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
Libraries have been doing this for a long time (Z39.50, ISO 23950)
Key is consistent search metadata fields (keywords)
E.g. Geospatial One Stop:
Implement metatags on your and your partners' web sites
Update content frequently
Utilize annotation, e.g. RDFa, or support schema.org
Register your site with the major search engines (tools exist to aid in this process)
Perform a basic study of where your site appears in the results of the major search engine providers
Do not spam the search engine providers
Re-evaluate your web site directory structure to ensure information is appropriately categorized/ described within your URL strings
Look through your server log files to determine what users are trying to find on your site and/or the path they are using to find information
Perform basic usability testing of your site to determine what users expect and can easily gather from your site. This may also determine why users go to an Internet search engine provider rather than accessing your site directly.
Realize that Internet search engines don't all act the same or index on the same schedule, and one vendor's product often values a particular metatag, document date, etc. more than another's.
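
The core of federated search, one query sent to several sources at once with results merged, can be sketched with in-memory stand-ins for the remote databases. Everything here is hypothetical (the sources, records, and `make_source` helper); real federated search would speak a protocol such as Z39.50 to each provider. The sketch does show why consistent keyword fields matter: every source must answer the same question in the same shape.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, records):
    """Build a stand-in for one searchable database with a shared 'keywords' field."""
    def search(keyword):
        keyword = keyword.lower()
        return [
            (name, record["title"])
            for record in records
            if keyword in (k.lower() for k in record["keywords"])
        ]
    return search


def federated_search(sources, keyword):
    """Query every source simultaneously and merge the hits in source order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        result_lists = pool.map(lambda source: source(keyword), sources)
        return [hit for hits in result_lists for hit in hits]


# Invented sources and records.
library = make_source("library", [
    {"title": "Swallow calls", "keywords": ["bird", "audio"]},
])
archive = make_source("archive", [
    {"title": "Volcano map", "keywords": ["geology"]},
    {"title": "Bird atlas", "keywords": ["Bird"]},
])
hits = federated_search([library, archive], "bird")
print(hits)
```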

Smart search
Semantically aware search, e.g.
Faceted search, e.g. exhibit (MIT), S2S (RPI)
Open search is how you can choose search engine providers in your browser, but it can be a lot smarter
Elastic search: JSON based (flattened and indexed), RESTful
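
Faceted search over flattened, JSON-like records comes down to two operations: counting values per facet and filtering by the facets a user has selected. The sketch below uses plain Python dictionaries with invented records; it is not the API of exhibit, S2S, or Elasticsearch.

```python
from collections import Counter


def facet_counts(records, field):
    """Count how many records fall under each value of one facet."""
    return Counter(record[field] for record in records if field in record)


def apply_facets(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in selected.items())
    ]


# Invented flat records, as they might look after JSON documents are flattened.
records = [
    {"type": "audio", "species": "swallow", "region": "Africa"},
    {"type": "audio", "species": "swallow", "region": "Europe"},
    {"type": "image", "species": "swallow", "region": "Africa"},
]
print(facet_counts(records, "type"))
print(apply_facets(records, type="audio", region="Africa"))
```

Showing the counts next to each facet value is what lets a user narrow "a swallow" down to "an African Swallow recording" without knowing the vocabulary in advance.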

NOESIS

Faceted search (logd.tw.rpi.edu)

Summary - discovery
Useful to write a few discovery use cases to drive how your design is developed
Evolution of your role in facilitating discovery, and of what/ how others implement access to your information

Reading for this week
Is retrospective

Check in for Project Assignment
Prototype deployment of an information system content and architecture, design, etc.
A new use case

What is next
Today: project group meetings/ check in
April 03: Information Discovery and Retrieval, Knowledge Representation
April 10: Information Quality and Bias
April 17: Global Change Information System
April 24: final project presentations (BE ON TIME, i.e. 5-10 mins BEFORE 3PM)
Final project due
Be prepared to be asked (and answer) questions
