Statistical Analysis of Transaction Dataset 91.541 Data Visualization Homework 2 Hongli Li.

1 Statistical Analysis of Transaction Dataset 91.541 Data Visualization Homework 2 Hongli Li

2 Dataset Introduction
Generated by the IBM Quest Synthetic Data Generation Code.
It is a transaction dataset, intended for mining association rules.
Generation parameters:
- Number of transactions = 1000
- Average transaction length = 10 (default)
- Number of items = 30

3 Transaction Dataset

4 Metadata
- No missing values
- Actual number of transactions = 980
- Actual average transaction length = 9.24
- Actual number of items = 30
- The most frequent item is item 12 (64%)
- The second most frequent item is item 9 (62%)
Other Information
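The metadata on this slide (transaction count, average length, number of distinct items, most frequent items) can be recomputed directly from the transactions. A minimal sketch, assuming the transactions are already loaded into memory as Python sets of integer item ids; the slide does not describe the file format or loading step, and the function name `summarize` and the toy data are hypothetical.

```python
from collections import Counter

def summarize(transactions):
    """Basic metadata for a list of transactions, each an iterable of item ids."""
    n_tx = len(transactions)
    avg_len = sum(len(set(t)) for t in transactions) / n_tx
    # Support count: number of transactions in which each item appears.
    counts = Counter(item for t in transactions for item in set(t))
    # Relative frequency per item, highest first (ties broken by item id).
    freq = sorted(((item, c / n_tx) for item, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))
    return {
        "transactions": n_tx,
        "avg_length": avg_len,
        "distinct_items": len(counts),
        "top_two": freq[:2],
    }

# Toy data for illustration only (not the homework dataset).
toy = [{1, 2, 3}, {2, 3}, {2, 4}, {1, 2}]
print(summarize(toy))
# {'transactions': 4, 'avg_length': 2.25, 'distinct_items': 4, 'top_two': [(2, 1.0), (1, 0.5)]}
```

Comparing this output with the generation parameters is one way to see gaps like the one reported above (1000 requested transactions versus 980 actual).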

5 Pearson Correlation – Item × Item
A measure of the degree of linear relationship between two variables.
Pearson correlation matrix of Item × Item.
The two most correlated items are item 24 and item 1 (0.138).
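Item × Item correlation treats each item as a 0/1 variable over the transactions (1 if the item occurs) and computes Pearson's r = cov(X, Y) / (σ_X σ_Y) for every pair of items; on 0/1 data this is the same as the phi coefficient. A minimal pure-Python sketch, assuming "most correlated" means the largest positive coefficient (the slide does not say whether absolute values were used); the helper names `pearson` and `most_correlated_items` are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's r for two equal-length sequences; None if either one is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None  # r is undefined when a variable never varies
    return cov / (sx * sy)

def most_correlated_items(transactions, items):
    """Return (item_a, item_b, r) for the item pair with the largest r."""
    # One 0/1 column per item: 1 if the item occurs in that transaction.
    columns = {i: [1 if i in t else 0 for t in transactions] for i in items}
    best = None
    for a, b in combinations(items, 2):
        r = pearson(columns[a], columns[b])
        if r is not None and (best is None or r > best[2]):
            best = (a, b, r)
    return best

# Toy data for illustration only (not the homework dataset).
toy = [{1, 2}, {1, 2}, {3}, {1, 3}]
print(most_correlated_items(toy, items=[1, 2, 3]))  # pair (1, 2), r of about 0.577
```

Swapping `r > best[2]` for `abs(r) > abs(best[2])` would rank strong negative correlations as well.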

6 Pearson Correlation – TID × TID
Pivot the dataset to get an Item × TID matrix.
Pearson correlation matrix of TID × TID.
The most correlated transactions are TID 9 and TID 857; the correlation coefficient between them is 1.
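Pivoting turns each transaction into a 0/1 vector over the items, so transactions can be correlated with each other in the same way items were. A minimal sketch, assuming transactions are held in a dict from TID to a set of item ids; the function names and toy TIDs are hypothetical. With 980 transactions there are 479,710 pairs, which this loop handles slowly; a vectorized routine such as `numpy.corrcoef` would be the usual faster choice.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's r for two equal-length sequences; None if either one is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None  # e.g. an empty transaction, or one containing every item
    return cov / (sx * sy)

def most_correlated_transactions(transactions, items):
    """transactions: dict TID -> set of items. Return (tid_a, tid_b, r) with the largest r."""
    # Pivot: each transaction becomes a 0/1 vector over all items
    # (one column of the Item x TID matrix).
    vectors = {tid: [1 if i in t else 0 for i in items]
               for tid, t in transactions.items()}
    best = None
    for a, b in combinations(sorted(vectors), 2):
        r = pearson(vectors[a], vectors[b])
        if r is not None and (best is None or r > best[2]):
            best = (a, b, r)
    return best

# Toy data for illustration only: TIDs 1 and 3 contain the same items.
toy = {1: {1, 4, 5}, 2: {2, 3}, 3: {1, 4, 5}}
print(most_correlated_transactions(toy, items=range(1, 7)))  # TIDs (1, 3), r of about 1.0
```

For two non-constant 0/1 vectors, r equals 1 exactly when the vectors are identical, so a coefficient of 1 between two transactions means they contain the same set of items.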

7 Conclusion
Using only statistical tools is hard!
Mining algorithms are needed.
Visualization could help.
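The dataset was generated for association-rule mining, and the conclusion calls for mining algorithms. As an illustration only (the slides do not name an algorithm), the sketch below computes support and confidence for every item pair by brute force; the function name `pair_rules` and the `min_support` threshold are hypothetical. Algorithms such as Apriori work with the same two measures but prune the search instead of counting every pair.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support):
    """Brute-force support/confidence for every item pair meeting min_support.

    Returns a sorted list of ((a, b), support, conf(a -> b), conf(b -> a)).
    """
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    pair_counts = Counter(p for t in transactions
                          for p in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both a and b
        if support >= min_support:
            rules.append(((a, b), support,
                          count / item_counts[a],   # confidence of a -> b
                          count / item_counts[b]))  # confidence of b -> a
    return sorted(rules)

# Toy data for illustration only (not the homework dataset).
toy = [{1, 2}, {1, 2}, {3}, {1, 3}]
print(pair_rules(toy, min_support=0.5))
# [((1, 2), 0.5, 0.6666666666666666, 1.0)]
```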