
1 A Classification Approach for Movie Recommender System Advisor: Professor 黃三益 Students: M964020007 黃于珊, M964020011 李界寬, M964020022 程尚文

2 Agenda Introduction Motivation and background Determination of data set The Data Mining Procedure Conclusion and Limitation

3 Introduction 1. Motivation and background 2. Determination of data set

4 Motivation and background The dataset comes from GroupLens ◦ (a research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/) Online movie recommender system: MovieLens (http://www.movielens.org/) ◦ After becoming a member and rating several randomly selected movies, a user receives five movie recommendations from the site, each with a prediction of how much the user will like that movie. We all love movies Find the rule

5

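6 Determination of data set We use one of the two datasets that MovieLens currently provides. ◦ It contains 1,682 movies, 943 users, and 100,000 ratings in total. ◦ It offers a large enough sample for us to build and test the model properly.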

7 The Data Mining Procedure 1. Data mining procedure: 10 steps 2. Conclusion and limitation

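8 Step 1. Translate the business problem into a data mining problem With so many movies and genres available, how can people quickly find the movies that match their own preferences? ◦ Movie recommender system ◦ Shorten search time ◦ Find the rule: which kinds of movies are preferred according to age, occupation, gender, and so on ◦ Potential customers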

9 Step 2. Select appropriate data Online movie recommender system: MovieLens (research lab in the Department of Computer Science and Engineering at the University of Minnesota; http://www.grouplens.org/) The data come from the movie ratings made by members who joined the site, together with the members' personal information. The dataset provided contains 1,682 movies, 943 users, and 100,000 ratings in total.

10 Step 3. Get to know the data (1/2) This data has been cleaned up by removing: ◦ users who had fewer than 20 ratings ◦ users who did not have complete demographic information
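The two cleanup rules above can be sketched in a few lines of Python. This is only an illustration: the slides say the published data had already been cleaned, and the record layout, the `filter_users` name, and the sample values below are all hypothetical.

```python
from collections import Counter

MIN_RATINGS = 20  # threshold stated on the slide


def filter_users(ratings, demographics, min_ratings=MIN_RATINGS):
    """Keep ratings from users with enough ratings and a complete profile.

    ratings: list of (user_id, movie_id, rating) tuples (assumed layout).
    demographics: dict of user_id -> dict with 'age', 'gender', 'occupation'.
    """
    counts = Counter(user_id for user_id, _, _ in ratings)
    required = ("age", "gender", "occupation")
    keep = {
        user_id
        for user_id, n in counts.items()
        if n >= min_ratings
        and all(demographics.get(user_id, {}).get(f) is not None for f in required)
    }
    return [row for row in ratings if row[0] in keep]


# Tiny made-up sample; the threshold is lowered to 2 to keep it readable.
ratings = [(1, 10, 4), (1, 11, 5), (2, 10, 3), (3, 10, 2), (3, 12, 1)]
demographics = {
    1: {"age": 25, "gender": "F", "occupation": 2},
    2: {"age": 35, "gender": "M", "occupation": 0},     # only one rating
    3: {"age": 18, "gender": "M", "occupation": None},  # incomplete profile
}
kept = filter_users(ratings, demographics, min_ratings=2)
```

In this sample only user 1 survives both rules, so `kept` holds that user's two ratings.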

11 Step 3. Get to know the data (2/2)
Attribute name | Description | Domain
Age | User's age | 1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44", 45: "45-49", 50: "50-55", 56: "56+"
Gender | User's gender | "M" for male, "F" for female
Occupation | User's occupation | 0: "other" or not specified, 1: "academic/educator", 2: "artist", 3: "clerical/admin", 4: "college/grad student", and so on
Movie Kind | Movie genre | Action, Adventure, Animation, Children's, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western

12 Step 4. Create a model set Data Source – MovieLens (The GroupLens Research Project at the University of Minnesota) Data Characteristics: – 100,000 ratings (1-5) from 943 users on 1,682 movies – Each user has rated at least 20 movies – Collected over a seven-month period from September 19th, 1997 through April 22nd, 1998 – Every user has complete demographic information

13 Step 5. Fix problems with the data Variables with too many values ◦ Movie kind ◦ Occupation ◦ We do not consider variables such as ZipCode and rate
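One plausible way to assemble instances with the attributes kept in this step is sketched below. The slides do not describe how the rows were actually built, so everything here is an assumption: the `build_instances` helper, the dictionary layouts, and the choice to emit one row per rating and matching genre. The three target genres are the classes that appear in Table 1.

```python
TARGET_GENRES = {"Romance", "Thriller", "War"}  # classes shown in Table 1


def build_instances(ratings, users, movie_genres, target_genres=TARGET_GENRES):
    """Return one (age, gender, occupation, genre) row per rating and target genre."""
    instances = []
    for user_id, movie_id, _rating in ratings:  # the rating value is ignored
        user = users[user_id]
        for genre in movie_genres[movie_id]:
            if genre in target_genres:
                # The zip code is deliberately left out of the row.
                instances.append(
                    (user["age"], user["gender"], user["occupation"], genre)
                )
    return instances


# Made-up sample data for illustration only.
users = {1: {"age": 25, "gender": "F", "occupation": 2, "zip": "00000"}}
movie_genres = {10: ["Romance", "Comedy"], 11: ["War", "Thriller"]}
ratings = [(1, 10, 4), (1, 11, 2)]
instances = build_instances(ratings, users, movie_genres)
```

Each resulting row carries only age, gender, occupation, and the movie kind as the class label, which matches the attributes the slides keep.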

14 Step 6. Transform data to bring information to the surface We skip this step because transforming the data into different formats would not add useful information

15 Step 7. Build models Data mining tool: ◦ Weka Explorer 3.4.12 Classifier: ◦ Decision tree methods ◦ using C4.5 algorithm: performs well on both accuracy and speed
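C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The following is a minimal sketch of that criterion for categorical attributes only. It is not Weka's implementation, it leaves out continuous attributes, missing values, and pruning, and the sample rows are made up.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attr_index, label_index=-1):
    """Gain ratio of splitting rows on the attribute at attr_index."""
    labels = [row[label_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    total = len(rows)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute values themselves.
    split_info = entropy([row[attr_index] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical rows: (gender, age code, movie kind).
rows = [
    ("F", 25, "Romance"),
    ("F", 35, "Romance"),
    ("M", 25, "Thriller"),
    ("M", 35, "Thriller"),
]
gender_ratio = gain_ratio(rows, 0)
age_ratio = gain_ratio(rows, 1)
```

In this toy sample gender separates the two classes perfectly while age tells nothing about them, so a tree built with this criterion would split on gender first.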

16 Weka: the software

17 Step 8. Assess Model: Confusion Matrix
Table 1. Confusion Matrix of Classifier C4.5 from Training Set
The Kind of Movie | Romance | Thriller | War
Romance | 2,576 | 7,465 | 38
Thriller | 1,742 | 15,643 | 53
War | 1,095 | 6,428 | 90
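The overall accuracy can be recomputed directly from Table 1. The sketch below reads rows as actual classes and columns as predicted classes, which is Weka's usual layout and agrees with the per-class rates in Table 2. The variable names are illustrative.

```python
CLASSES = ["Romance", "Thriller", "War"]

# Rows are actual classes, columns are predicted classes (values from Table 1).
MATRIX = [
    [2576, 7465, 38],
    [1742, 15643, 53],
    [1095, 6428, 90],
]

total = sum(sum(row) for row in MATRIX)
correct = sum(MATRIX[i][i] for i in range(len(CLASSES)))  # diagonal entries
accuracy = correct / total

# How many instances belong to each class, and how many were predicted as each.
actual_totals = {name: sum(row) for name, row in zip(CLASSES, MATRIX)}
predicted_totals = {
    name: sum(row[j] for row in MATRIX) for j, name in enumerate(CLASSES)
}
```

The column totals show that most instances are predicted as Thriller (29,536 of 35,130), which explains why that class has a high TP rate and a high FP rate at the same time.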

18 Step 8. Assess Model: Detailed Accuracy
Table 2. Detailed Accuracy of Classifier C4.5 from Training Set
Class | TP Rate | FP Rate | Precision | Recall | F-Measure
Romance | 0.256 | 0.113 | 0.476 | 0.256 | 0.333
Thriller | 0.897 | 0.785 | 0.53 | 0.897 | 0.666
War | 0.012 | 0.003 | 0.497 | 0.012 | 0.023
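The figures in Table 2 follow from the confusion matrix in Table 1 using the standard definitions of TP rate (recall), FP rate, precision, and F-measure. A short sketch, assuming rows are actual classes and columns are predicted classes; the function name is illustrative.

```python
CLASSES = ["Romance", "Thriller", "War"]

# Rows are actual classes, columns are predicted classes (values from Table 1).
MATRIX = [
    [2576, 7465, 38],
    [1742, 15643, 53],
    [1095, 6428, 90],
]


def per_class_metrics(matrix, index):
    """Return TP rate, FP rate, precision, and F-measure for one class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[index][index]
    actual = sum(matrix[index])                    # instances truly in the class
    predicted = sum(row[index] for row in matrix)  # instances predicted as the class
    recall = tp / actual                           # TP rate
    fp_rate = (predicted - tp) / (total - actual)
    precision = tp / predicted
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
    }


metrics = {name: per_class_metrics(MATRIX, i) for i, name in enumerate(CLASSES)}
```

Rounded to three decimals, `metrics` reproduces every row of Table 2.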

19 Step 8. Assess Model: Other Information
Table 3. The Results of Classifier C4.5 from Training Set
Correctly Classified Instances | 18,309 | Rate: 52.1178%
Incorrectly Classified Instances | 16,821 | Rate: 47.8822%
Kappa statistic | 0.1089
Mean absolute error | 0.4023
Root mean squared error | 0.4485
Relative absolute error | 96.6655%
Root relative squared error | 98.3189%
Total Number of Instances | 35,130
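The kappa statistic in Table 3 compares the observed agreement with the agreement expected by chance from the row and column totals of Table 1. A minimal sketch of that calculation follows; the function name is illustrative. The error measures in Table 3 depend on the per-instance class probabilities, so they cannot be recomputed from the confusion matrix alone.

```python
# Rows are actual classes, columns are predicted classes (values from Table 1).
MATRIX = [
    [2576, 7465, 38],
    [1742, 15643, 53],
    [1095, 6428, 90],
]


def cohen_kappa(matrix):
    """Cohen's kappa computed from a square confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of each class's row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


kappa = cohen_kappa(MATRIX)
```

A kappa this close to zero means the tree agrees with the true classes only slightly more often than a chance predictor with the same class proportions would.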

20 Step 8. Assess Model Decision Tree ◦ Number of Leaves : 118 ◦ Size of the tree : 216

21

22 Step 9. Deploy Model The model is difficult to deploy because: ◦ The available computing resources are insufficient ◦ It is difficult to implement

23 Conclusion and Limitation Classification Approach: C4.5 → Decision Tree Data Set: 35,130 instances Limitation ◦ Our hardware and software could not handle mining more data, which would be needed to find more interesting and complete rules.

24 Thanks For Your Attention.

