Published by Rosalind Allison. Modified over 9 years ago.
Shaoxu Song¹, Aoqian Zhang¹, Lei Chen², Jianmin Wang¹
¹ Tsinghua University, China
² Hong Kong University of Science & Technology, China
VLDB 2015
Outline
  Motivation
  Imputation Methods
  Preliminary
  Exact Solutions
  Approximation Method
  Experiments
  Conclusion
Fill the missing data
  With values of neighbors
    Editing rules
    Statistically by relational dependency networks
Sparsity
  Values with variances
Extensive similarity neighbors
  Using similarity rules
  Tolerance to small variations
Equality rules
  Functional dependency (FD)
  Rather rare
Similarity rules
  Differential dependency (DD)
  Plenty of them
Examples
  DD1: (Name, Street → Address, [0,1], [0,9], [0,2])
  DD2: (Street → Address, )
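A minimal sketch of how DD1 could be used to collect candidate fillings for a missing Address. It assumes the usual reading of a differential dependency: if two tuples are within edit distance [0,1] on Name and [0,9] on Street, their Address values should be within [0,2]. Edit distance as the metric, the dictionary encoding of the rule, the helper names, and the toy rows are all assumptions made for illustration; they are not taken from the slides.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Bounds = Tuple[int, int]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical encoding of DD1: attribute -> (min distance, max distance).
DD1_LHS: Dict[str, Bounds] = {"Name": (0, 1), "Street": (0, 9)}
DD1_RHS: Tuple[str, Bounds] = ("Address", (0, 2))


def within(a: Optional[str], b: Optional[str], bounds: Bounds) -> bool:
    """True if both values are present and their distance lies in bounds."""
    if a is None or b is None:
        return False
    lo, hi = bounds
    return lo <= edit_distance(a, b) <= hi


def is_neighbor(t: Row, s: Row, lhs: Dict[str, Bounds]) -> bool:
    """s is a similarity neighbor of t if every left-hand attribute is close."""
    return all(within(t[attr], s[attr], b) for attr, b in lhs.items())


def candidates(t: Row, rows: List[Row], lhs: Dict[str, Bounds], rhs_attr: str) -> List[str]:
    """Right-hand values offered by the similarity neighbors of t."""
    return [s[rhs_attr] for s in rows if s[rhs_attr] is not None and is_neighbor(t, s, lhs)]


def satisfies(t: Row, s: Row, lhs: Dict[str, Bounds], rhs: Tuple[str, Bounds]) -> bool:
    """A pair violates the DD only if it is close on the left but not on the right."""
    attr, bounds = rhs
    return (not is_neighbor(t, s, lhs)) or within(t[attr], s[attr], bounds)


# Toy rows (invented): the first differs slightly in Name and Street, the second in Name only.
complete = [
    {"Name": "Alise", "Street": "Central Av.", "Address": "5 Central Ave"},
    {"Name": "Bob", "Street": "Central Ave", "Address": "9 Central Ave"},
]
incomplete = {"Name": "Alice", "Street": "Central Ave", "Address": None}

print(candidates(incomplete, complete, DD1_LHS, "Address"))  # ['5 Central Ave']
```

Under an equality rule the incomplete tuple would have no neighbor at all, since no complete row matches it exactly on both Name and Street; the distance ranges are what admit the first row as a neighbor.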
Outline
  Motivation
  Cleaning Methods
  Preliminary
  Global Optimum
  Local Optimum
  Experiments
  Conclusion
Real datasets
  RESTAURANT: name, address, type, and city information of 864 restaurants
  UIS¹: database generator; at most 100k generated tuples are used
Errors
  Injected into RESTAURANT and UIS by removing values
Criteria
  Recall, precision, and F-measure
1. http://www.cs.utexas.edu/users/ml/riddle/data.html
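The three criteria can be sketched as follows, assuming that a filled cell counts as correct only when it exactly matches the value that was removed, and that a method may leave some cells unfilled. The function name, the cell keying by (row id, attribute), and the sample values are illustrative assumptions rather than the evaluation code used in the experiments.

```python
from typing import Dict, Tuple

Cell = Tuple[int, str]  # (row id, attribute name)


def imputation_scores(truth: Dict[Cell, str], filled: Dict[Cell, str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for a set of imputed cells.

    truth  : removed cells mapped to their original values
    filled : cells the method chose to fill, mapped to the imputed values
    """
    correct = sum(1 for cell, value in filled.items() if truth.get(cell) == value)
    precision = correct / len(filled) if filled else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented example: four values were removed, three were filled, two of them correctly.
truth = {
    (1, "city"): "LA",
    (2, "city"): "NY",
    (3, "type"): "Thai",
    (4, "type"): "French",
}
filled = {
    (1, "city"): "LA",
    (2, "city"): "SF",
    (3, "type"): "Thai",
}

precision, recall, f_measure = imputation_scores(truth, filled)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

Leaving a cell unfilled lowers recall but not precision, which is why the two are reported together with their harmonic mean.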
Equality based
  CERTAIN¹
  ERACER²
Similarity based
  MIBOS³
  CMI⁴
1. W. Fan, J. Li, S. Ma, N. Tang, and W. Yu. Towards certain fixes with editing rules and master data. PVLDB, 3(1):173–184, 2010.
2. C. Mayfield, J. Neville, and S. Prabhakar. ERACER: a database approach for statistical inference and data cleaning. In SIGMOD Conference, pages 75–86, 2010.
3. S. Wu, X. Feng, Y. Han, and Q. Wang. Missing categorical data imputation approach based on similarity. In SMC, pages 2827–2832, 2012.
4. S. Zhang, J. Zhang, X. Zhu, Y. Qin, and C. Zhang. Missing value imputation based on data clustering. Trans. on Computational Science, 1:128–138, 2008.
Imputing missing values
  Explore the extensive similarity neighbors
Exact approaches
Approximation approaches
  Efficient approximation algorithms are devised with certain performance guarantees
More room for DDs
Thanks!