Creating Probabilistic Databases from IE Models. Olga Mykytiuk, 21 July 2011. M. Theobald.


Outline  Motivation for probabilistic databases  Model for automatic extraction  Different representation  One-row model  Multi-row model  Approximation methods  One-row model approximation  Enumeration-based approach  Structural approach  Merging  Evaluation 2

Motivation
Ambiguity:
- Is Smith single or married?
- What is the marital status of Brown?
- What is Smith's social security number: 185 or 785?
- What is Brown's social security number: 185 or 186?

Motivation
Probabilistic database:
- Here: 2 × 4 × 2 × 2 = 32 possible readings → can easily store all of them
- At scale: 200M people, 50 questions, 1 in … ambiguous (2 options) → far too many possible readings to enumerate explicitly

Sources of uncertainty

Certain Data | Uncertain Data
The temperature is … C. | Sensor reported 25 +/- 1 C.
Bob works for Yahoo. | Bob works for Yahoo or Microsoft.
UDS is located in Saarbrücken. | UDS is located in Saarland.
Mary sighted a crow. | Mary sighted either a crow (80%) or a raven (20%).
It will rain in Saarbrücken tomorrow. | There is a 60% chance of rain in Saarbrücken tomorrow.
Olga's age is 18. | Olga's age is in [10, 30].
Paul is married to Amy. | Amy is married to Frank.

Kinds of uncertainty: precision, ambiguity, uncertainty about the future, anonymization, inconsistent data, coarse-grained information, lack of information.

Sources of uncertainty
- Information extraction → from probabilistic models
- Data integration → from background knowledge & expert feedback
- Moving objects → from particle filters
- Predictive analytics → from statistical models
- Scientific data → from measurement uncertainty
- Filling in missing data → from data mining
- Online applications → from user feedback

Or-set tables

Name | Bird | Species
Besnik (t1) | Bird-1 | Finch: 0.8 || Toucan: 0.2
Niket (t2) | Bird-2 | Nightingale: 0.65 || Toucan: 0.35
Stephan (t3) | Bird-3 | Humming bird: 0.55 || Toucan: 0.45

Observed Species:
Species | Condition
Finch | (t1,1)
Toucan | (t1,2) ∨ (t2,2) ∨ (t3,2)
Nightingale | (t2,1)
Humming bird | (t3,1)

Pc-table

FID | SSN | Name | Condition
1 | 185 | Smith | X = 1
1 | 785 | Smith | X ≠ 1
2 | 185 | Brown | Y = 1 ∧ X ≠ 1
2 | 186 | Brown | Y ≠ 1 ∨ X = 1

V | D | P
X | 1 | 0.2
X | 2 | 0.8
Y | 1 | 0.3
Y | 2 | 0.7

Possible worlds and their probabilities:
{X → 1, Y → 1}: 0.2 × 0.3 = 0.06
{X → 1, Y → 2}: 0.2 × 0.7 = 0.14
{X → 2, Y → 1}: 0.8 × 0.3 = 0.24
{X → 2, Y → 2}: 0.8 × 0.7 = 0.56
Each world contains exactly the tuples whose conditions the assignment satisfies, e.g. {X → 1, Y → 1} yields (1, 185, Smith) and (2, 186, Brown). A small enumeration sketch follows below.
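To make the possible-worlds semantics concrete, here is a minimal Python sketch (the table encoding and names are my own, not from the slides) that enumerates the worlds of this pc-table and prints each assignment, its probability, and the tuples it contains:

```python
from itertools import product

# Independent discrete variables and their distributions (from the slide).
dist = {"X": {1: 0.2, 2: 0.8}, "Y": {1: 0.3, 2: 0.7}}

# Conditioned tuples: (FID, SSN, Name) plus a condition on the variable assignment.
tuples = [
    ((1, 185, "Smith"), lambda a: a["X"] == 1),
    ((1, 785, "Smith"), lambda a: a["X"] != 1),
    ((2, 185, "Brown"), lambda a: a["Y"] == 1 and a["X"] != 1),
    ((2, 186, "Brown"), lambda a: a["Y"] != 1 or a["X"] == 1),
]

# Every assignment of the variables defines one possible world.
for values in product(*(dist[v].keys() for v in dist)):
    assignment = dict(zip(dist, values))
    prob = 1.0
    for var, val in assignment.items():
        prob *= dist[var][val]
    world = [t for t, cond in tuples if cond(assignment)]
    print(assignment, round(prob, 2), world)
```

Running it reproduces the four worlds above with probabilities 0.06, 0.14, 0.24, and 0.56.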

Tuple-independent databases

Birds:
Species | P | Var
Finch | 0.80 | X1
Toucan | 0.71 | X2
Nightingale | 0.65 | X3
Humming bird | 0.55 | X4

- P(Finch) = P(X1) = 0.8
- Is there a finch? Q ← Birds(Finch); P(Q) = 0.8
- Is there some bird? Q ← Birds(s); Q = X1 ∨ X2 ∨ X3 ∨ X4; P(Q) = 99.1% (a quick check follows below)
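As a quick check of the 99.1% figure, a minimal sketch (the helper below is my own, not part of the slides) that evaluates the disjunction of independent tuple events:

```python
# P(X1 ∨ ... ∨ Xn) for independent events = 1 - Π(1 - P(Xi)).
def prob_some_tuple(probs):
    none_exists = 1.0
    for p in probs:
        none_exists *= (1.0 - p)
    return 1.0 - none_exists

birds = {"Finch": 0.80, "Toucan": 0.71, "Nightingale": 0.65, "Humming bird": 0.55}
print(prob_some_tuple(birds.values()))  # ≈ 0.9909, i.e. about 99.1%
```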

Outline  Motivation for probabilistic databases  Model for automatic extraction  Different representation  One-row model  Multi-row model  Approximation methods  One-row model approximation  Enumeration-based approach  Structural approach  Merging  Evaluation 10

Semi-CRF  Input: sequence of tokens  Output: segmentation s With a label  Y consists of K attribute labels And a special “Other” A probability distribution over s: 11

Semi-CRF
(Figure: the address "52-A Goregaon West Mumbai PIN …" segmented into labeled segments: House_no = "52-A", Area = "Goregaon West", City = "Mumbai", Other = "PIN", Zip = "…".)

Semi-CRF
(Figure: the segmentation lattice for the same address; every candidate segment may take any of the labels House_no, Area, City, Zip, or Other.)

Number of segmentations required

Outline  Motivation for probabilistic databases  Model for automatic extraction  Different representation  One-row model  Multi-row mode l  Approximation methods  One-row model approximation  Enumeration-based approach  Structural approach  Merging  Evaluation 15

Segmentation per row
(Figure: the segmentation lattice over the address tokens; each row of the extracted table corresponds to one segmentation.)

One Row Model
- Let p(y, s) be the probability stored for segment s in column y.
- The probability of a query is the product of the corresponding segment probabilities:
  Pr(Area = 'Goregaon West', City = 'Mumbai') = 0.6 × 0.6 = 0.36 (see the sketch below)
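A minimal sketch of query scoring under the one-row model. Only the two 0.6 values for the queried segments come from the example; the alternative segments and their 0.4 probabilities are illustrative assumptions:

```python
# One-row model: one independent distribution over segments per column.
one_row = {
    "Area": {"Goregaon West": 0.6, "Goregaon": 0.4},
    "City": {"Mumbai": 0.6, "West Mumbai": 0.4},
}

def query_prob(model, conditions):
    # A conjunctive query multiplies the per-column segment probabilities.
    p = 1.0
    for column, value in conditions.items():
        p *= model[column].get(value, 0.0)
    return p

print(query_prob(one_row, {"Area": "Goregaon West", "City": "Mumbai"}))  # ≈ 0.36
```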

One Row Model
Pr(Area = 'Goregaon West', City = 'Mumbai') = …

Multi-row Model
- Let r_q denote the row probability of row q.
- For each row q and column y, a multinomial parameter gives the probability of each candidate segment of column y in that row.
- Example: Pr(Area = 'Goregaon West', City = 'Mumbai') = 1 × 1 × 0.6 + 0 × 0 × 0.4 = 0.6 (see the sketch below)
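A minimal sketch of the multi-row computation above. The row weights 0.6 and 0.4 and the 1/0 segment probabilities mirror the example; the concrete contents of the second row are an assumption based on the alternative segmentation used later in the talk:

```python
# Multi-row model: a mixture of one-row models.
# Each row has a weight and, per column, its own distribution over segments.
rows = [
    (0.6, {"Area": {"Goregaon West": 1.0}, "City": {"Mumbai": 1.0}}),
    (0.4, {"Area": {"Goregaon": 1.0}, "City": {"West Mumbai": 1.0}}),
]

def multi_row_query_prob(rows, conditions):
    total = 0.0
    for weight, columns in rows:
        p = weight
        for column, value in conditions.items():
            p *= columns[column].get(value, 0.0)
        total += p
    return total

print(multi_row_query_prob(rows, {"Area": "Goregaon West", "City": "Mumbai"}))  # ≈ 0.6
```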

Outline  Motivation for probabilistic databases  Model for automatic extraction  Different representation  One-row model  Multi-row model  Approximation methods  One-row model approximation  Enumeration-based approach  Structural approach  Merging  Evaluation 20

Approximation Quality
- Measured by the Kullback–Leibler divergence between the extraction model and the database representation (see below)
- The parameters for the One-Row model are chosen to minimize this divergence
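For reference, the KL divergence between the semi-CRF distribution P and an approximating database distribution Q, summed over segmentations s (standard definition; the notation is assumed here):

\[
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{s} P(s)\,\log \frac{P(s)}{Q(s)}
\]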

Parameters for One Row Model
- Probability of a segmentation s in the one-row model: the product of its per-column segment probabilities
- The marginal probability of a segment: obtained from the semi-CRF (see the reconstruction below)
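A plausible reconstruction of the two formulas under the standard one-row construction (q_y(l, u) is an assumed notation for the probability that column y takes the segment spanning tokens l..u):

\[
Q(s) \;=\; \prod_{j=1}^{|s|} q_{y_j}(l_j, u_j),
\qquad
q_y(l, u) \;=\; \Pr\big[\, (l, u, y) \in s \mid x \,\big],
\]

i.e. the KL-minimizing one-row parameters are the segment marginals of the semi-CRF.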

Computing Marginals
- Forward pass: compute α values over prefixes of the token sequence
- Backward pass: compute β values over suffixes
- Combine α and β to obtain the marginal probability of each labeled segment (see the recursions below)
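Assuming the standard semi-CRF forward-backward recursions (L is the maximum segment length, n the number of tokens, and ψ(y, y', x, l, u) = exp(w · f(y, y', x, l, u)) the segment potential), the three steps are:

\[
\alpha(i, y) \;=\; \sum_{d=1}^{L} \sum_{y'} \alpha(i-d, y')\, \psi(y, y', x, i-d+1, i),
\qquad
\beta(i, y) \;=\; \sum_{d=1}^{L} \sum_{y'} \psi(y', y, x, i+1, i+d)\, \beta(i+d, y'),
\]
\[
\Pr\big[(l, u, y) \in s \mid x\big] \;=\; \frac{1}{Z(x)} \sum_{y'} \alpha(l-1, y')\, \psi(y, y', x, l, u)\, \beta(u, y),
\qquad
Z(x) \;=\; \sum_{y} \alpha(n, y).
\]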

Computing Marginals
(Figure: a lattice from start node S to end node E over the labels H_no, area, city, Zip, other; the sum of path probabilities into a node gives α, the sum out of a node gives β.)

Parameters for Multi-Row model
- m: the number of rows
- Compute the row probabilities and the per-row distribution parameters
- The objective is again to minimize the divergence from the extraction model

Enumeration-based Approach
- Let s_1, …, s_N be an enumeration of all segmentations
- Objective: fit the multi-row parameters to this enumeration with the Expectation-Maximization (EM) algorithm, alternating an E step and an M step (see the sketch below)
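A minimal EM sketch for fitting an m-row model to an enumerated list of segmentations (the data layout and all names are my own assumptions; each segmentation is given as a dict column → segment together with its probability under the extraction model):

```python
import random
from collections import defaultdict

def fit_multi_row(segmentations, m, iters=50, seed=0):
    """segmentations: list of (prob, {column: segment}) pairs."""
    rng = random.Random(seed)
    # Start from random responsibilities to break the symmetry between rows.
    resp = []
    for prob, _ in segmentations:
        raw = [rng.random() for _ in range(m)]
        z = sum(raw)
        resp.append([prob * x / z for x in raw])
    row_w, theta = [], []
    for _ in range(iters):
        # M step: re-estimate row weights and per-row multinomials from responsibilities.
        row_w = [sum(r[q] for r in resp) for q in range(m)]
        total = sum(row_w) or 1.0
        row_w = [w / total for w in row_w]
        theta = [defaultdict(lambda: defaultdict(float)) for _ in range(m)]
        for (prob, seg), r in zip(segmentations, resp):
            for q in range(m):
                for col, val in seg.items():
                    theta[q][col][val] += r[q]
        for q in range(m):
            for col in theta[q]:
                z = sum(theta[q][col].values()) or 1.0
                for val in theta[q][col]:
                    theta[q][col][val] /= z
        # E step: responsibility of each row for each (weighted) segmentation.
        resp = []
        for prob, seg in segmentations:
            scores = []
            for q in range(m):
                s = row_w[q]
                for col, val in seg.items():
                    s *= theta[q][col].get(val, 1e-9)
                scores.append(s)
            z = sum(scores) or 1.0
            resp.append([prob * s / z for s in scores])
    return row_w, theta
```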

Structural Approach
- Components cover disjoint sets of segmentations
- Build a binary decision tree
- Each segmentation corresponds to exactly one path in the tree

Structural Approach
- Three kinds of variables
- For a given condition c, an entropy measure over the segmentations (see below)
- The condition with the highest information gain is chosen for the split
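A standard form of the entropy and information-gain criterion these bullets point to (the exact variant in the paper may differ; here D is a set of segmentations with their probabilities, p(c) the probability mass satisfying condition c, and D_c, D_¬c the renormalized restrictions):

\[
H(D) \;=\; -\sum_{s \in D} \Pr(s)\,\log \Pr(s),
\qquad
\mathrm{IG}(c) \;=\; H(D) \;-\; \Big[\, p(c)\, H(D_c) \;+\; \big(1 - p(c)\big)\, H(D_{\neg c}) \,\Big].
\]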

Computing parameters
(Figure: the same S-to-E lattice with α and β sums, now restricted to segmentations that satisfy condition c.)

Structural Approach
(Figure: a binary decision tree with internal nodes A, B, C testing conditions such as whether '52-A' is labeled House_no or how 'West' is labeled; the yes/no branches lead to the disjoint segmentation sets s1, s2, s3, s4 at the leaves.)

Merging structures
Run the EM algorithm over all paths until it converges:
- M-step
- E-step
- Columns of a row are independent
- Each label defines a multinomial distribution over its possible segments → one multinomial distribution can be generated from another

Merging structures example
Two disjoint segmentations:
- s1 = {'52-A', 'Goregaon West', 'Mumbai', …}
- s2 = {'52', 'Goregaon', 'West Mumbai', …}
For m = 2 rows, the row responsibilities are:
- R[1, s1] = 0.2, R[2, s1] = 0.8
- R[1, s2] = 0.1, R[2, s2] = 0.9
→ both s1 and s2 are assigned mainly to row 2.

Outline  Motivation for probabilistic databases  Model for automatic extraction  Different representation  One-row model  Multi-row model  Approximation methods  One-row model approximation  Enumeration-based approach  Structural approach  Merging  Evaluation 33

Evaluation  Two datasets  Cora  Address dataset  Strong(30%, 50%), Weak CRF (10%) 34

Comparing Models
Comparing the divergence of the two models with the same number of parameters.

Comparing Models
(Figure: variation of k with m_0, ξ = 0.005.)

Impact on Query Result

Impact on Query Result
(Figure: correlation between KL divergence and inversion score, for the StructMerge approach with m = 2, ξ = ….)

Questions?

References
1. Rahul Gupta, Sunita Sarawagi: "Creating Probabilistic Databases from Information Extraction Models", VLDB 2006.
2. Rainer Gemulla: Lecture notes on Scalable Uncertainty Management.
3. Wikipedia: Kullback–Leibler divergence.