Can we build recommender system for artwork evaluation?

Presentation transcript:

Can we build recommender system for artwork evaluation?
Zbigniew W. Ras & Anna Gelich
KDD Laboratory, Computer Science Department, University of North Carolina, Charlotte
Research Sponsored by

Assigning Price Tags for Art - BIG DATA Problem

Semantic WEB – Very rich source of data/knowledge about artists & art
Websites with information about artists: ArtPrice (649,129 artists), National Gallery, Artists Signatures
Websites with information about art: ArtPrice (detailed auction results, market trends), Saatchiart, Artlog, DeviantArt (26 million members, 1.5 million comments daily), …

Recommender System: Partitioning Artists into Semantically Similar Buckets & treating buckets as a WEB Crowd
(1) Data Collection: Extracting information from the WEB using crawlers, spiders [Apache Nutch based on Apache Hadoop]
(2) Data Processing: Extracting features including Price (discretization cuts: 480, 1020, 2335, 5100, 10650, 19500, …)
(3) Data Mining /Map Reduce/

Price Distribution
20,543 paintings; price range: 0 – 40,000
Sample attribute: Price. Has to be discretized.
Price discretized: [0,480), [480,1020), [1020,2335), [2335,5100), [5100,10650), [10650,19500), [19500, -)
[Figure: price distribution with a dense area and a sparse area marked; cut points at 480, 1020, 2335, 5100, 10650, 19500, with "x2" labels between them]
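The discretization on this slide can be illustrated with a minimal sketch. It assumes the half-open intervals listed above and uses Python's standard `bisect` module; the function name `price_bin` is hypothetical and not part of the presented system.

```python
from bisect import bisect_right

# Cut points as listed on the slide; the last bin [19500, -) is open-ended.
CUTS = [480, 1020, 2335, 5100, 10650, 19500]

def price_bin(price: float) -> str:
    """Return the half-open interval [lo, hi) that contains the price."""
    i = bisect_right(CUTS, price)          # number of cut points <= price
    lo = 0 if i == 0 else CUTS[i - 1]
    hi = "-" if i == len(CUTS) else CUTS[i]
    return f"[{lo}, {hi})"

print(price_bin(1500))  # a price between the cuts 1020 and 2335
```

Using `bisect_right` puts a price that equals a cut point into the bin that starts at that cut, which matches the `[lo, hi)` notation.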

Confusion Matrix
Columns (classified as): f a b d e c g. Rows with fewer than seven counts are incomplete in the transcript.

Class                 Counts                                Number
f = [0,480)           1863 731 264 71                       2933
a = [480, 1020)       646 2401 1054 384 53 3                4543
b = [1020, 2335)      250 1060 2568 1236 134 12 8           5268
d = [2335, 5100)      109 374 1135 2915 430 41 15           5019
e = [5100, 10650)     26 88 229 822 676 73 25               1939
c = [10650, 19500)    6 14 57 166 123 78 9                  453
g = [19500, -)        4 18 85 44 11 214                     388

Decision Tree Classifier (Precision)

Price Range         Correctness   Left Ext. Correctness   Right Ext. Correctness   Overpriced (?)
[0,480)             64%           -                       88%
[480, 1020)         53%           67%                     76%                      NO
[1020, 2335)        49%           69%                     72%
[2335, 5100)        58%           81%                                              YES
[5100, 10650)       35%           77%                     39%
[10650, 19500)      17%           44%                     19%
[19500, -)          55%

Average Precision: 52%; Left Ext. Average Precision: 71%; Right Ext. Average Precision: 69%

It looks like paintings within the range $5,100 – $19,500 are overpriced.
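The slide does not define "Correctness" or its left and right extensions. The sketch below is an assumption: Correctness is the diagonal count divided by the row total, and the Left/Right Ext. variants also count the neighboring lower/higher price bin as correct. Under that reading, rows b and e of the decision tree matrix reproduce the percentages reported above. The function name is hypothetical.

```python
def extended_correctness(row, diag):
    """Return (correctness, left_ext, right_ext) in whole percent for one
    confusion-matrix row; `diag` is the index of the true class.
    Assumed definitions, not taken from the slide:
      correctness = row[diag] / sum(row)
      left_ext    = (row[diag] + row[diag - 1]) / sum(row)
      right_ext   = (row[diag] + row[diag + 1]) / sum(row)
    """
    total = sum(row)

    def pct(count):
        return round(100 * count / total)

    hit = row[diag]
    left = pct(hit + row[diag - 1]) if diag > 0 else None
    right = pct(hit + row[diag + 1]) if diag + 1 < len(row) else None
    return pct(hit), left, right

# Complete rows from the decision tree matrix (column order f a b d e c g).
row_b = [250, 1060, 2568, 1236, 134, 12, 8]   # b = [1020, 2335)
row_e = [26, 88, 229, 822, 676, 73, 25]       # e = [5100, 10650)

print(extended_correctness(row_b, 2))
print(extended_correctness(row_e, 4))
```

Under this reading, a left extension that is much larger than the right extension means the classifier tends to place paintings of that class one bin lower than their listed price.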

Confusion Matrix
Columns (classified as): f a b d e c g. Rows with fewer than seven counts are incomplete in the transcript.

Class                 Counts                                Number
f = [0,480)           1857 676 286 99 11 3 1                2933
a = [480, 1020)       576 2467 1026 400 61 6 7              4543
b = [1020, 2335)      229 982 2668 1175 184 21 9            5268
d = [2335, 5100)      77 336 986 3063 493 43                5019
e = [5100, 10650)     19 76 227 709 812 70 26               1939
c = [10650, 19500)    12 49 151 118 106                     453
g = [19500, -)        5 18 36 13 239                        388

Random Forest (Precision)

Price Range         Correctness   Left Ext. Correctness   Right Ext. Correctness   Overpriced (?)
[0,480)             63%           -                       86%
[480, 1020)         54%           67%                     77%                      NO
[1020, 2335)        51%           69%                     73%
[2335, 5100)        61%           81%                     71%                      YES
[5100, 10650)       42%           78%                     45%
[10650, 19500)      23%           49%                     26%
[19500, -)          62%           65%

Average Precision: 55%; Left Ext. Average Precision: 71%; Right Ext. Average Precision: 71%
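The reported "Average Precision" figures appear consistent with a count-weighted average, that is, the sum of correctly classified paintings divided by all 20,543 paintings. This is an assumption rather than something the slides state; the sketch below checks it using the class sizes and the diagonal counts taken from the two confusion matrices. All variable names are hypothetical.

```python
# Class sizes from the "Number" column (order f a b d e c g).
TOTALS = [2933, 4543, 5268, 5019, 1939, 453, 388]

# Diagonal (correctly classified) counts read from the two matrices.
DECISION_TREE_DIAG = [1863, 2401, 2568, 2915, 676, 78, 214]
RANDOM_FOREST_DIAG = [1857, 2467, 2668, 3063, 812, 106, 239]

def overall_percent(diagonal, totals):
    """Count-weighted average: correct predictions over all instances."""
    return round(100 * sum(diagonal) / sum(totals))

dt_avg = overall_percent(DECISION_TREE_DIAG, TOTALS)
rf_avg = overall_percent(RANDOM_FOREST_DIAG, TOTALS)
print(dt_avg, rf_avg)
```

An unweighted mean of the seven per-class percentages would give a lower figure, because the small high-price classes have the weakest results.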

WEB (Source of Data & Knowledge)

Step 1: Extracting artworks (paintings), relevant features, comments about artworks, artists' biographies.
Result: Initial Dataset of Artists, Initial Dataset of Artworks

Step 2: Converting data to a format ready for mining. New features construction using text mining, sentiment mining, guided folksonomy.
Result: Dataset of Artists, Dataset of Artworks

Step 3: Clustering based on semantic distance.
Result: Disjoint Buckets of Semantically Similar Artists

Step 4: For each bucket, identifying artworks done by artists listed in that bucket.
Result: Collection of Datasets of Artworks

Step 5: Data Mining and Crowd Construction.
Result: Recommender Systems for Artwork Evaluation and Pricing

System Demo (One RS)
System Interface
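Step 4 of the pipeline above can be sketched as a simple grouping operation. This is only an illustration under assumed data shapes: buckets are given as a mapping from a bucket name to a set of artist names, and artworks as (title, artist) pairs. The names and sample values are hypothetical and do not come from the presented system.

```python
from collections import defaultdict

def artworks_by_bucket(buckets, artworks):
    """Build one artwork dataset per bucket of semantically similar artists.

    buckets:  dict mapping bucket name -> set of artist names (disjoint sets)
    artworks: iterable of (title, artist) pairs
    """
    # Buckets are disjoint, so each artist maps to exactly one bucket.
    artist_to_bucket = {
        artist: name for name, artists in buckets.items() for artist in artists
    }
    datasets = defaultdict(list)
    for title, artist in artworks:
        bucket = artist_to_bucket.get(artist)
        if bucket is not None:      # skip artworks by artists outside all buckets
            datasets[bucket].append(title)
    return dict(datasets)

# Hypothetical sample data for illustration only.
sample_buckets = {"bucket_1": {"Artist A", "Artist B"}, "bucket_2": {"Artist C"}}
sample_artworks = [
    ("Painting 1", "Artist A"),
    ("Painting 2", "Artist C"),
    ("Painting 3", "Artist B"),
    ("Painting 4", "Artist D"),
]
grouped = artworks_by_bucket(sample_buckets, sample_artworks)
print(grouped)
```

Each resulting list would then serve as the training data for one bucket-specific recommender in Step 5.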

Q & A