1
Statistical Analysis of Transaction Dataset
91.541 Data Visualization, Homework 2
Hongli Li
2
Dataset Introduction
- Generated by the IBM Quest Synthetic Data Generation Code
- A transaction dataset intended for mining association rules
- Generation parameters:
  - Number of transactions = 1000
  - Average transaction length = 10 (default)
  - Number of items = 30
3
Transaction Dataset
4
Metadata
- No missing values
- Actual number of transactions = 980
- Actual average transaction length = 9.24
- Actual number of items = 30
- The most frequent item is Item 12 (64%)
- The second most frequent item is Item 9 (62%)
- Other information
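The figures on this slide (transaction count, average length, item count, item frequencies) can be reproduced with a few lines of code. The following is a minimal sketch, assuming each transaction is available as a set of item ids; the function name `summarize` and the toy data are hypothetical and are not taken from the original homework.

```python
from collections import Counter

def summarize(transactions):
    """Return basic metadata for a list of transactions (each a set of item ids)."""
    n = len(transactions)
    avg_len = sum(len(t) for t in transactions) / n
    # Count in how many transactions each item appears.
    counts = Counter(item for t in transactions for item in t)
    # Frequency of an item = fraction of transactions that contain it.
    freq = {item: c / n for item, c in counts.items()}
    # Sort by descending frequency, breaking ties by item id.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "n_transactions": n,
        "avg_length": avg_len,
        "n_items": len(counts),
        "top_items": ranked[:2],
    }

# Toy data for illustration only (not the Quest dataset).
toy = [{1, 2, 3}, {2, 3}, {2}, {1, 2}]
stats = summarize(toy)
```

On the real dataset, `top_items` would be expected to list Item 12 and Item 9, if the percentages on the slide are fractions of transactions.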
5
Pearson Correlation – Item × Item
- A measure of the degree of linear relation between two variables
- Pearson correlation matrix of Item × Item
- The two most correlated items are Item 24 and Item 1 (0.138)
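One way to obtain an Item × Item result like the one above is to encode each item as a 0/1 indicator column over the transactions and compute Pearson's r for every pair of columns. This is a sketch under that assumption, not necessarily the tool used for the slides; `pearson`, `most_correlated_items` and the toy data are hypothetical names introduced for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson r of two equal-length numeric sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant column
    return sxy / sqrt(sxx * syy)

def most_correlated_items(transactions, items):
    """Return (item_a, item_b, r) with the largest r over all item pairs."""
    # One 0/1 indicator column per item, in transaction order.
    cols = {i: [1 if i in t else 0 for t in transactions] for i in items}
    best = None
    for a, b in combinations(items, 2):
        r = pearson(cols[a], cols[b])
        if r is not None and (best is None or r > best[2]):
            best = (a, b, r)
    return best

# Toy data for illustration only: items 1 and 2 always occur together.
toy = [{1, 2}, {1, 2, 3}, {3}, {1, 2}, {3}]
best_pair = most_correlated_items(toy, [1, 2, 3])
```

A maximum of only 0.138 on the real data indicates that no pair of items has a strong linear relation. That is consistent with the conclusion that correlation alone is a weak tool here.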
6
Pearson Correlation – TID × TID
- Pivot the dataset to get an Item × TID matrix
- Pearson correlation matrix of TID × TID
- The most correlated transactions are TID 9 and TID 857, with a correlation coefficient of 1
7
Conclusion
- Using only statistical tools is hard!
- Mining algorithms are needed
- Visualization could help