1
BRIDGING THE GAP BETWEEN UNSTRUCTURED DATA AND STRUCTURED DATA
A presentation by W H Inmon
2
The informal systems of the corporation (Email.Txt.Doc):
- unstructured data
- .doc files
- .txt files
- .xls files
- email
- transcribed telephone conversations

The formal systems of a corporation (Program):
- structured systems
- structured data
- corporate transactions
- corporate reports
- corporate databases
- customer files
- audit reports
3
It is estimated that less than 20% of corporate systems are structured.
- Email.Txt.Doc (unstructured): 80%
- Program (structured): 20%
4
The unstructured world (Email.Txt.Doc): search engines, legal discovery, email archive, taxonomy, ontology, document mgmt, web content.
The structured world (Program): dbms, business intelligence, applications, transactions, OLTP, ERP, compliance.
Imagine what would happen if the two worlds could be integrated: the world of dbms, analytics, and other processing opens up.
5
The same two worlds, Email.Txt.Doc (search engines, legal discovery, email archive, taxonomy, ontology, document mgmt, web content) and Program (dbms, business intelligence, applications, transactions, OLTP, ERP, compliance), now with tight integration between the two types of data.
6
There is a gulf between the two worlds (Email.Txt.Doc and Program):
- technology
- business practice
- organizational
- historical
7
Think of the possibilities! Email.Txt.Doc Program
8
Imagine this: reports and visualization show a lot. Have you ever wondered why you can’t hook up your Business Objects to email, or to telephone conversations?
9
Email.Txt.Doc holds text; Business Intelligence works on numbers. There is a fundamental disconnect between unstructured data and business intelligence. So what would happen if we had powerful visualization for text?
11
Terms found in text (liver cancer, skin cancer, thirst, diabetes, blood pressure), visualized together: correlative information becomes very easy to spot.
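One simple way to surface correlative information from text is to count how often pairs of terms are mentioned in the same document. The sketch below is an assumption about how such counts could be produced, not the method behind the slide's visualization; the term list is taken from the slide and matching is plain substring search.

```python
from collections import Counter
from itertools import combinations

# Terms from the slide; substring matching is a simplifying assumption.
TERMS = ["liver cancer", "skin cancer", "thirst", "diabetes", "blood pressure"]

def term_pairs(documents, terms=TERMS):
    """Count how many documents mention each (unordered) pair of terms together."""
    pair_counts = Counter()
    for doc in documents:
        text = doc.lower()
        present = sorted(t for t in terms if t in text)
        # Each pair of co-mentioned terms is counted once per document.
        pair_counts.update(combinations(present, 2))
    return pair_counts

if __name__ == "__main__":
    notes = [
        "Patient reports thirst; history of diabetes.",
        "Diabetes follow-up, blood pressure elevated, thirst noted.",
        "Skin cancer screening, no other findings.",
    ]
    for pair, n in term_pairs(notes).most_common(3):
        print(pair, n)
```

Pairs with high counts are the candidates a visualization would highlight.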
12
Doing analysis on subpopulations of women: for the general population; for women; for women who smoke, over the age of 50.
13
The contrast between the correlations found in different populations (for example, the general population versus women who smoke, over the age of 50) leads to great insight.
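Comparing subpopulations amounts to computing the same correlation measure over different filtered slices of the records. Below is a minimal sketch, assuming a hypothetical `Record` layout (sex, smoker flag, age, free text) and using "share of records that mention both terms" as the measure; neither the layout nor the measure comes from the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Record:
    sex: str      # "F" or "M" (hypothetical coding)
    smoker: bool
    age: int
    text: str     # free-text note or email body

def cooccurrence_rate(records: Iterable[Record], term_a: str, term_b: str,
                      keep: Callable[[Record], bool] = lambda r: True) -> float:
    """Share of records in the subpopulation whose text mentions both terms."""
    subset = [r for r in records if keep(r)]
    if not subset:
        return 0.0
    both = sum(1 for r in subset
               if term_a in r.text.lower() and term_b in r.text.lower())
    return both / len(subset)

# Subpopulations named on the slides, expressed as filters.
POPULATIONS = {
    "general population": lambda r: True,
    "women": lambda r: r.sex == "F",
    "women who smoke, over 50": lambda r: r.sex == "F" and r.smoker and r.age > 50,
}

if __name__ == "__main__":
    data = [
        Record("F", True, 62, "Blood pressure high; diabetes confirmed."),
        Record("F", False, 45, "Routine visit."),
        Record("M", True, 58, "Diabetes, thirst."),
        Record("M", False, 30, "Blood pressure normal."),
    ]
    for name, keep in POPULATIONS.items():
        print(f"{name}: {cooccurrence_rate(data, 'diabetes', 'blood pressure', keep):.2f}")
```

Putting the rates for each population side by side is what exposes the contrast.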
14
What about looking at customer feedback, such as complaints? Categories: service, delivery late, broken, installation, salesman attitude, wait too long, did not fit. Now you can see the broader picture of what is happening.
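A rough picture of complaint categories can be built by tagging each complaint with every category whose keywords it contains and then counting. In this sketch the category names are from the slide, while the keyword lists are hypothetical placeholders for a real taxonomy.

```python
from collections import Counter

# Category names come from the slide; the trigger keywords are hypothetical.
CATEGORIES = {
    "service": ["service", "support"],
    "delivery late": ["late", "delayed"],
    "broken": ["broken", "cracked", "damaged"],
    "installation": ["install"],
    "salesman attitude": ["rude", "salesman"],
    "wait too long": ["wait", "on hold"],
    "did not fit": ["wrong size", "did not fit", "too small", "too big"],
}

def categorize(complaint: str) -> list[str]:
    """Return every category whose keywords appear in the complaint (substring match)."""
    text = complaint.lower()
    return [cat for cat, words in CATEGORIES.items() if any(w in text for w in words)]

def complaint_profile(complaints) -> Counter:
    """Count complaints per category across a whole collection."""
    profile = Counter()
    for c in complaints:
        profile.update(categorize(c))
    return profile

if __name__ == "__main__":
    feedback = [
        "I placed an order last week and when it arrived it was the wrong size.",
        "Delivery was late again and the box was damaged.",
        "I had to wait an hour on hold.",
    ]
    for category, n in complaint_profile(feedback).most_common():
        print(category, n)
```

Substring matching is crude (for instance, "late" also occurs inside longer words); it is used here only to keep the sketch short.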
15
But there are plenty of other places where the technology applies:
- manufacturing warranties (what patterns of defects are there?)
- Weblogs (marketing: who is saying what?)
- customer complaints (what are the problem products?)
- general email (what’s the buzz? what is on people’s minds?)
- insurance claims (what are the circumstances of accidents?)
16
Another possibility is the monitoring of email (Email.Txt.Doc) and the transport of email to the structured environment.
17
Monitoring emails and other corporate conversations (Email.Txt.Doc) for Sarbanes Oxley, HIPAA, and BASEL II compliance, making sure that email is being used properly:
- compliance
- corporate standard for language
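At its simplest, monitoring means scanning each message against watch lists and recording which topics it triggers. The sketch below assumes hypothetical watch lists; the topic names echo the slide, but the terms under them are illustrations only and are not drawn from any actual regulation.

```python
import re
from dataclasses import dataclass

# Hypothetical watch lists: topic names echo the slide, terms are illustrative only.
WATCH_LISTS = {
    "Sarbanes Oxley": ["revenue", "quarter", "forecast", "restate"],
    "HIPAA": ["diagnosis", "patient", "medical record"],
    "corporate language standard": ["barn burner", "tanking"],
}

@dataclass
class Email:
    sender: str
    recipient: str
    body: str

def flag(email: Email) -> dict[str, list[str]]:
    """Return {topic: [matched terms]} for every watch list the body triggers."""
    text = email.body.lower()
    hits = {}
    for topic, terms in WATCH_LISTS.items():
        # Whole-word match so that "quarter" does not fire on "headquarters".
        found = [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]
        if found:
            hits[topic] = found
    return hits

if __name__ == "__main__":
    msg = Email("vp", "vp", "This is going to be a real barn burner of a quarter....")
    print(msg.sender, "->", msg.recipient, flag(msg))
```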
18
A bunch of emails and conversations: what do you do with them?
- Jan 3, vp to vp: “This is going to be a real barn burner of a quarter….”
- Jan 5, finance to vp: “It looks like we are going to do $9,000,000 this quarter…”
- Jan 5, president to analyst: “This quarter looks like we are going to break new records…”
- Feb 1, employee to employee: “Did you see the stock market? Everything is going down…”
- Feb 3, president to vp: “What is happening to sales in the midwest? We didn’t expect this…”
- Feb 3, vp to vp: “The sales cycle looks like it is extending. The economy is tanking…”
- Feb 4, sales manager to vp: “It looks like we are going to be a little short this quarter…”
- Feb 6, president to vp: “What are we going to do to get sales up? Do we need to do some discounting?”
- Mar 2, sales person to vp: “Demand has dried up. We aren’t going to close as many sales this quarter as we thought…”
19
The same emails and conversations, examined (“combed”) for important corporate information using a set of Sarbanes Oxley external categories: quarter, stock, sales, discount, demand, sales cycle.
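“Combing” can be pictured as building an index from each external category to the messages that mention it. This is a minimal sketch, assuming plain substring matching and a hypothetical `Message` record (date, medium, text); a real system would use the text-editing steps listed later in the presentation rather than raw substrings.

```python
from collections import defaultdict
from typing import NamedTuple

class Message(NamedTuple):
    date: str     # e.g. "Feb 3"
    medium: str   # "email", "phone conversation", "meeting notes", ...
    text: str

# External categories named on the slide.
EXTERNAL_CATEGORIES = ["quarter", "stock", "sales", "discount", "demand", "sales cycle"]

def comb(messages):
    """Build {category: [(medium, date), ...]} for every message mentioning the category."""
    index = defaultdict(list)
    for m in messages:
        text = m.text.lower()
        for cat in EXTERNAL_CATEGORIES:
            if cat in text:  # substring match: "discounting" counts as "discount"
                index[cat].append((m.medium, m.date))
    return dict(index)

if __name__ == "__main__":
    msgs = [
        Message("Jan 3", "email", "This is going to be a real barn burner of a quarter"),
        Message("Feb 3", "email", "The sales cycle looks like it is extending."),
        Message("Feb 6", "email", "Do we need to do some discounting?"),
    ]
    for category, refs in comb(msgs).items():
        print(category, refs)
```

Note that a message about the sales cycle is also indexed under "sales", which is usually what an analyst wants.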
20
Structured Environment
- sales: email, Feb 2; email, Mar 5; phone, Mar 8; …
- quarter: email, Jan 2; email, Jan 4; email, Feb 5; …
- discount: phone conversation, Jan 6; email, Jan 12; email, Jan 14; …
- sales cycle: email, Feb 24; phone conversation, Mar 14; meeting notes, Mar 18; …
The “combed” information is brought over to the structured environment. Now you can use standard tools, such as Cognos, Business Objects, Crystal Reports, and MicroStrategy, to do analysis.
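Once combed, the references are ordinary rows that any relational database can hold and any SQL-speaking reporting tool can query. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the structured environment (the presentation mentions DB2); the table name `text_reference` and its columns are hypothetical, and the sample rows are the entries shown on the slide.

```python
import sqlite3

def load(conn, index):
    """Store {category: [(medium, date), ...]} as rows in a relational table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS text_reference (
                        category TEXT NOT NULL,
                        medium   TEXT NOT NULL,
                        ref_date TEXT NOT NULL)""")
    rows = [(cat, medium, date) for cat, refs in index.items() for medium, date in refs]
    conn.executemany("INSERT INTO text_reference VALUES (?, ?, ?)", rows)
    conn.commit()

def counts_by_category(conn):
    """A typical BI-style aggregate: how many references per category."""
    return conn.execute("""SELECT category, COUNT(*) FROM text_reference
                           GROUP BY category ORDER BY category""").fetchall()

if __name__ == "__main__":
    combed = {
        "sales": [("email", "Feb 2"), ("email", "Mar 5"), ("phone", "Mar 8")],
        "quarter": [("email", "Jan 2"), ("email", "Jan 4"), ("email", "Feb 5")],
        "discount": [("phone conversation", "Jan 6"), ("email", "Jan 12"), ("email", "Jan 14")],
        "sales cycle": [("email", "Feb 24"), ("phone conversation", "Mar 14"),
                        ("meeting notes", "Mar 18")],
    }
    conn = sqlite3.connect(":memory:")
    load(conn, combed)
    print(counts_by_category(conn))
```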
21
Emails and telephone conversations can be linked to customer data (CDI/CRM) through a probabilistic match. But there are other ways that communications can be used.
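The slide does not say how the probabilistic match is computed. As a stand-in, the sketch below scores each candidate customer record with a string-similarity ratio (`difflib.SequenceMatcher`) and accepts the best candidate above a threshold; formal record-linkage models weigh many fields probabilistically, so treat this as a simplified illustration. The customer records and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical CDI/CRM records: customer_id -> (name, email address)
CUSTOMERS = {
    101: ("Mary Jones", "mjones@example.com"),
    102: ("Mark Jonas", "mark.jonas@example.com"),
    103: ("Ann Smith", "asmith@example.com"),
}

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_customer(sender_name: str, sender_address: str, threshold: float = 0.8):
    """Return (customer_id, score) for the best candidate, or None below threshold."""
    best = None
    for cid, (name, address) in CUSTOMERS.items():
        # An exact address match is treated as certain; otherwise compare names.
        if sender_address.lower() == address:
            score = 1.0
        else:
            score = similarity(sender_name, name)
        if best is None or score > best[1]:
            best = (cid, score)
    return best if best is not None and best[1] >= threshold else None

if __name__ == "__main__":
    print(match_customer("Mrs Mary Jones", "mary.j@home.example"))
    print(match_customer("Zed Zulu", "zz@nowhere.example"))
```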
22
A true 360-degree view of the customer can be formed. “I placed an order last week and when it arrived it was the wrong size. And then your company would not take it back. I’m mad.” How easy is it going to be to engage Mrs Jones until she is satisfied with her order?
23
A true 360-degree view of the customer can be formed: communications plus demographics, delivering on the promise of CDI.
24
Can’t I just use a search engine to link the two worlds (Email.Txt.Doc and Program)? No: integration is needed, and search engines do not integrate textual information.
25
Integration of Email.Txt.Doc with Program: text doesn’t need to be searched; it needs to be integrated.
26
Integration example: the same abbreviation, “ha”, can mean “headache”, “heart attack”, or “Hepatitis A”.
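Resolving a homograph like “ha” requires context outside the word itself. One plausible source of context is the kind of document or author it came from. In the sketch below, the document classes and the mapping are hypothetical assumptions; unresolved words are left untouched.

```python
import re

# Hypothetical mapping: the reading of "ha" depends on the class of document
# (or author) it appears in. These classes are illustrative assumptions.
HOMOGRAPHS = {
    "ha": {
        "cardiology": "heart attack",
        "general practice": "headache",
        "infectious disease": "Hepatitis A",
    },
}

def resolve_homographs(text: str, doc_class: str) -> str:
    """Replace each known homograph with its reading for this document class."""
    def replace(match):
        word = match.group(0)
        readings = HOMOGRAPHS.get(word.lower())
        if readings and doc_class in readings:
            return readings[doc_class]
        return word  # unknown word or no reading for this class: leave as-is
    return re.sub(r"\b\w+\b", replace, text)

if __name__ == "__main__":
    for cls in ("cardiology", "general practice", "infectious disease"):
        print(cls, "->", resolve_homographs("pt had ha last week", cls))
```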
27
Integration example: “oblique fractured ulna”, “oblique fractured tibia”, and “obliq fractured tarsi” all need to be recognized as a “broken bone”.
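Two of the steps listed on the following slide cover this example: alternate spelling resolution (“obliq” becomes “oblique”) and rolling a specific finding up to a broader category. A minimal sketch with hypothetical, hand-written vocabularies:

```python
# Hypothetical vocabularies for illustration only.
ALTERNATE_SPELLINGS = {"obliq": "oblique"}               # alternate spelling resolution
BROKEN_BONE_TERMS = {"fracture", "fractured", "broken"}  # roll-up to a broader category

def normalize(phrase: str) -> str:
    """Lower-case the phrase and replace known alternate spellings word by word."""
    return " ".join(ALTERNATE_SPELLINGS.get(w, w) for w in phrase.lower().split())

def categorize_finding(phrase: str) -> str:
    """Map a specific finding to 'broken bone' when it names a fracture; else keep it."""
    normalized = normalize(phrase)
    if any(w in BROKEN_BONE_TERMS for w in normalized.split()):
        return "broken bone"
    return normalized

if __name__ == "__main__":
    for p in ["oblique fractured ulna", "oblique fractured tibia", "obliq fractured tarsi"]:
        print(f"{p!r} -> {normalize(p)!r} -> {categorize_finding(p)!r}")
```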
28
What is meant by editing and integrating text (integrating Email.Txt.Doc with Program)?
1 – stop word editing
2 – stemming
3 – synonym replacement
4 – synonym concatenation
5 – homograph resolution
6 – alternate spelling resolution
7 – external category classification
8 – theming
9 – probabilistic matching
10 – negation exclusion
11 – concept clustering
12 – mid process editing
13 – change sensitivity
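To make a few of these steps concrete, here is a toy pipeline covering four of them: negation exclusion, stop word editing, synonym replacement, and stemming. The word lists are tiny hypothetical examples, the stemmer is a crude suffix stripper standing in for a real one (such as Porter's), and negation exclusion here drops only the next content word, which is a deliberate simplification.

```python
import re

# Tiny, hypothetical resources; a real system would use curated vocabularies.
STOP_WORDS = {"a", "an", "the", "of", "and", "was", "is", "to", "with"}
SYNONYMS = {"bp": "blood pressure", "pt": "patient"}
NEGATIONS = {"no", "not", "denies", "without"}

def stem(word: str) -> str:
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def edit_text(text: str) -> list[str]:
    """Turn raw text into a list of edited terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    out, skip_next = [], False
    for tok in tokens:
        if tok in NEGATIONS:      # negation exclusion: drop the next content word
            skip_next = True
            continue
        if tok in STOP_WORDS:     # stop word editing
            continue
        if skip_next:
            skip_next = False
            continue
        tok = SYNONYMS.get(tok, tok)                         # synonym replacement
        out.append(" ".join(stem(w) for w in tok.split()))   # stemming
    return out

if __name__ == "__main__":
    print(edit_text("Patient reports high bp and no fever."))
```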
29
For a detailed description of how the unstructured environment (Email.Txt.Doc) should be linked to the structured environment (Program), go to www.inmoncif.com and look for DW 2.0™, or go to www.inmondatasystems.com.
30
Diagram: Unstructured Data → probabilistic match → Structured Environment (DB2) → query and visualization with Business Objects, Cognos, MicroStrategy, Crystal Reports.