Similarity Evaluation Techniques for Filtering Problems?
Vagan Terziyan, University of Jyvaskyla

Evaluating the distance between various domain objects and concepts is one of the basic abilities of an intelligent agent.
Are these two the same? … No! The difference is equal to 0.234.

Contents
- Goal
- Basic Concepts
- External Similarity Evaluation
- An Example
- Internal Similarity Evaluation
- Conclusions

Reference
Puuronen S., Terziyan V., A Similarity Evaluation Technique for Data Mining with an Ensemble of Classifiers, In: A.M. Tjoa, R.R. Wagner and A. Al-Zobaidie (Eds.), Proc. of the 11th Intern. Workshop on Database and Expert Systems Applications, IEEE CS Press, Los Alamitos, California, 2000, pp

Goal
- The goal of this research is to develop a simple similarity evaluation technique to be used for social filtering.
- The result of social filtering here is a prediction of a customer's evaluation of a certain product, based on known opinions about this product from other customers.

Basic Concepts: Virtual Training Environment (VTE)
The VTE is a quadruple $\langle D, C, S, P \rangle$, where:
- $D$ is the set of goods $D_1, D_2, \dots, D_n$ in the VTE;
- $C$ is the set of evaluation marks $C_1, C_2, \dots, C_m$ that are used to rank the products;
- $S$ is the set of customers $S_1, S_2, \dots, S_r$ who select evaluation marks to rank the products;
- $P$ is the set of semantic predicates that define relationships between $D$, $C$ and $S$.
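To make the quadruple concrete, here is a minimal Python container for a VTE. It assumes the predicate is stored as a dense n x m x r array of integers in {-1, 0, 1}, an encoding consistent with the worked example later in the slides; the class name `VTE` and this storage layout are illustrative choices, not taken from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VTE:
    """A Virtual Training Environment <D, C, S, P> (illustrative sketch).

    P[i][j][k] holds the predicate value for good D[i], evaluation mark C[j]
    and customer S[k]; values are assumed to be -1, 0 or 1.
    """
    D: List[str]              # goods D_1..D_n
    C: List[str]              # evaluation marks C_1..C_m
    S: List[str]              # customers S_1..S_r
    P: List[List[List[int]]]  # semantic predicate, shape n x m x r

    def __post_init__(self) -> None:
        n, m, r = len(self.D), len(self.C), len(self.S)
        # Check the n x m x r shape and the value range of every entry.
        if len(self.P) != n:
            raise ValueError("P needs one entry per good")
        for row in self.P:
            if len(row) != m:
                raise ValueError("P needs one entry per evaluation mark")
            for cell in row:
                if len(cell) != r or any(v not in (-1, 0, 1) for v in cell):
                    raise ValueError("P needs one value in {-1, 0, 1} per customer")
```

For example, `VTE(D=["Reel"], C=["Reliable", "Safe"], S=["Fox", "Wolf"], P=[[[1, 0], [-1, 1]]])` builds a one-good environment; validating in `__post_init__` catches malformed vote tables before any relation is derived from them.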

Basic Concepts: Semantic Predicate P
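The slide's formula for P did not survive extraction. The following reconstruction is consistent with the worked example (where a refused mark is written as -1 and some marks are neither used nor refused), but it should be read as an assumed definition rather than the author's verbatim one:

```latex
P(D_i, C_j, S_k) =
\begin{cases}
 1,  & \text{if } S_k \text{ selects } C_j \text{ to evaluate } D_i,\\
-1,  & \text{if } S_k \text{ refuses to select } C_j \text{ to evaluate } D_i,\\
 0,  & \text{otherwise.}
\end{cases}
```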

Problem 1: Deriving External Similarity Values

External Similarity Values
External Similarity Values (ESV): binary relations $DC$, $SC$ and $SD$ between the elements of (sub)sets of $D$ and $C$, $S$ and $C$, and $S$ and $D$, respectively. ESV are based on the total support among all the customers for voting for the appropriate connection (or refusing to vote for it).

Problem 2: Deriving Internal Similarity Values

Internal Similarity Values
Internal Similarity Values (ISV): binary relations between two subsets of $D$, two subsets of $C$, and two subsets of $S$. ISV are based on the total support among all the customers for voting for the appropriate connection (or refusing to vote for it).

Why Do We Need Similarity Values (or a Distance Measure)?
- The distance between products is used to advertise a new product to customers based on their evaluations of already known similar products.
- The distance between evaluations is needed to estimate the evaluation error when necessary, e.g. when adaptive filtering technologies are used.
- The distance between customers is useful for evaluating the weights of all customers when necessary, e.g. to integrate their opinions by weighted voting.
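The slides do not give a formula for weighted voting. As a hedged illustration, the sketch below combines the customers' votes on one (product, mark) pair with per-customer weights, dividing by the total weight so the result stays in [-1, 1]. The function name `weighted_vote` and this normalization are assumptions; how the weights are derived from distances between customers is left open.

```python
from typing import Sequence


def weighted_vote(votes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted support for one (product, mark) pair.

    votes[k] is customer k's predicate value (1 select, -1 refuse, 0 neither);
    weights[k] >= 0 is that customer's weight. The result lies in [-1, 1].
    """
    if len(votes) != len(weights):
        raise ValueError("need exactly one weight per customer")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        return 0.0  # nobody carries any weight: no verdict
    return sum(v * w for v, w in zip(votes, weights)) / total
```

With equal weights this reduces to the plain average of the votes; giving more competent customers larger weights lets their opinions dominate the combined verdict.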

Deriving External Relation DC: how well an evaluation mark fits the product
[Figure: customers, products and evaluation marks]
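A minimal sketch of DC, assuming $DC_{i,j}$ is the plain sum of predicate values over all customers, $DC_{i,j} = \sum_{k=1}^{r} P(D_i, C_j, S_k)$ (votes for minus votes against), which matches the slides' "total support" wording. The predicate is stored as a nested list `P[i][j][k]`; `derive_dc` is a hypothetical helper name and the vote table is made up.

```python
from typing import List

Predicate = List[List[List[int]]]  # P[i][j][k]: good i, mark j, customer k


def derive_dc(P: Predicate) -> List[List[int]]:
    """DC[i][j] = sum over customers k of P(D_i, C_j, S_k).

    Positive values mean more customers selected mark C_j for good D_i
    than refused it; negative values mean the opposite.
    """
    return [[sum(votes) for votes in row] for row in P]


# Toy environment (made-up votes): 2 goods, 2 marks, 3 customers.
P = [
    [[1, 1, 0], [-1, -1, 1]],
    [[0, 1, -1], [1, 1, 1]],
]
DC = derive_dc(P)  # [[2, -1], [0, 3]]
```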

Deriving External Relation SC: measures a customer's competence in the use of evaluation marks
- The value of the relation $(S_k, C_j)$ represents, in a way, the total support that the customer $S_k$ obtains when selecting (or refusing to select) the mark $C_j$ to evaluate all the products.
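One reading of "total support that the customer obtains" is to weight each of the customer's votes by the group's total support for that (product, mark) pair, assuming $DC_{i,j} = \sum_k P(D_i, C_j, S_k)$: $SC_{k,j} = \sum_{i=1}^{n} P(D_i, C_j, S_k)\, DC_{i,j}$. This formula and the helper name `derive_sc` are assumptions, not quoted from the slides; the vote table is made up.

```python
from typing import List

Predicate = List[List[List[int]]]  # P[i][j][k]: good i, mark j, customer k


def derive_sc(P: Predicate, DC: List[List[int]]) -> List[List[int]]:
    """SC[k][j] = sum over goods i of P(D_i, C_j, S_k) * DC[i][j].

    A vote that agrees with the group's total support for (D_i, C_j)
    adds to the customer's score for mark C_j; a vote against it subtracts.
    """
    n, m, r = len(P), len(P[0]), len(P[0][0])
    return [
        [sum(P[i][j][k] * DC[i][j] for i in range(n)) for j in range(m)]
        for k in range(r)
    ]


# Toy environment (made-up votes): 2 goods, 2 marks, 3 customers.
P = [
    [[1, 1, 0], [-1, -1, 1]],
    [[0, 1, -1], [1, 1, 1]],
]
DC = [[2, -1], [0, 3]]  # per-pair sums of the votes above
SC = derive_sc(P, DC)   # [[2, 4], [2, 4], [0, 2]]
```

The third customer, who often votes against the majority, ends up with the lowest scores.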

Example of SC Relation
[Figure: customers, products and evaluation marks]

Deriving External Relation SD: measures a customer's competence in the products
- The value of the relation $(S_k, D_i)$ represents the total support that the customer $S_k$ receives when selecting (or refusing to select) all the evaluation marks to evaluate the product $D_i$.
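By analogy with SC, and under the same assumption that $DC_{i,j} = \sum_k P(D_i, C_j, S_k)$, SD can be sketched as the customer's votes on one good, summed over all marks and weighted by DC: $SD_{k,i} = \sum_{j=1}^{m} P(D_i, C_j, S_k)\, DC_{i,j}$. The formula and the name `derive_sd` are assumptions; the vote table is made up.

```python
from typing import List

Predicate = List[List[List[int]]]  # P[i][j][k]: good i, mark j, customer k


def derive_sd(P: Predicate, DC: List[List[int]]) -> List[List[int]]:
    """SD[k][i] = sum over marks j of P(D_i, C_j, S_k) * DC[i][j]."""
    n, m, r = len(P), len(P[0]), len(P[0][0])
    return [
        [sum(P[i][j][k] * DC[i][j] for j in range(m)) for i in range(n)]
        for k in range(r)
    ]


# Toy environment (made-up votes): 2 goods, 2 marks, 3 customers.
P = [
    [[1, 1, 0], [-1, -1, 1]],
    [[0, 1, -1], [1, 1, 1]],
]
DC = [[2, -1], [0, 3]]  # per-pair sums of the votes above
SD = derive_sd(P, DC)   # [[3, 3], [3, 3], [-1, 3]]
```

Note that each customer's row sum of SD equals the row sum of SC computed from the same votes, since both add up the same products $P \cdot DC$ in a different order.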

Example of SD Relation
[Figure: products, evaluation marks and customers]

Normalizing External Relations to the Interval [0,1]
Notation: $n$ is the number of products, $m$ is the number of evaluation marks, and $r$ is the number of customers.
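The normalization formulas themselves were lost. One natural choice, assumed here, divides each raw value by its largest possible magnitude and shifts it onto [0, 1]. If DC is a sum of votes over customers and SC, SD are sums of votes weighted by DC (an assumption), then $|DC_{i,j}| \le r$, $|SC_{k,j}| \le nr$ and $|SD_{k,i}| \le mr$, giving for example $\widetilde{DC}_{i,j} = (DC_{i,j} + r)/(2r)$. The helper name `normalize` is hypothetical.

```python
from typing import List


def normalize(rel: List[List[int]], bound: int) -> List[List[float]]:
    """Map raw values from [-bound, bound] linearly onto [0, 1]."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    return [[(v + bound) / (2 * bound) for v in row] for row in rel]


n, m, r = 2, 2, 3                          # toy sizes: goods, marks, customers
DC_norm = normalize([[2, -1], [0, 3]], r)  # [[5/6, 1/3], [1/2, 1]]
```

SC would be normalized with `bound=n * r` and SD with `bound=m * r`; 0.5 then always means "no net support either way".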

Competence of a Customer
[Figure: a customer's competence in the goods (conceptual pattern of goods features, $D_i$) and competence in the evaluation marks (conceptual pattern of evaluation mark definitions, $C_j$)]

Customer Evaluation: Competence Quality in Products

Customer Evaluation: Competence Quality in the Use of Evaluation Marks
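A sketch of the two competence measures, assuming $Q^D(S_k)$ is the mean of a customer's normalized SD values over all goods and $Q^C(S_k)$ the mean of the normalized SC values over all marks, with SD and SC normalized by their largest possible magnitudes, $mr$ and $nr$. These definitions and the function names are assumptions, not recovered from the slides.

```python
from typing import List


def _mean_normalized(row: List[int], bound: int) -> float:
    # Map each raw value from [-bound, bound] onto [0, 1], then average.
    return sum((v + bound) / (2 * bound) for v in row) / len(row)


def competence_in_products(SD: List[List[int]], m: int, r: int) -> List[float]:
    """Q^D(S_k): mean normalized SD value of each customer over all goods."""
    return [_mean_normalized(row, m * r) for row in SD]


def competence_in_marks(SC: List[List[int]], n: int, r: int) -> List[float]:
    """Q^C(S_k): mean normalized SC value of each customer over all marks."""
    return [_mean_normalized(row, n * r) for row in SC]
```

For a toy environment with made-up numbers (n = m = 2, r = 3, SD = [[3, 3], [3, 3], [-1, 3]], SC = [[2, 4], [2, 4], [0, 2]]), both functions return [0.75, 0.75, 7/12] up to floating-point rounding, illustrating the balance stated next.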

Quality Balance Theorem
The evaluation of a customer's competence (ranking, weighting, quality evaluation) does not depend on the competence area, whether the virtual world of products or the conceptual world of evaluation marks, because both competence values are always equal.

Proof...
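The proof itself was not preserved. Under assumed definitions ($SC_{k,j} = \sum_i P(D_i, C_j, S_k)\,DC_{i,j}$, $SD_{k,i} = \sum_j P(D_i, C_j, S_k)\,DC_{i,j}$, normalization by the bounds $nr$ and $mr$, and competence as the mean normalized value), the theorem follows from exchanging the order of a finite double sum:

```latex
\begin{aligned}
Q^D(S_k) &= \frac{1}{n}\sum_{i=1}^{n}\frac{SD_{k,i}+mr}{2mr}
          = \frac{1}{2}+\frac{1}{2nmr}\sum_{i=1}^{n}\sum_{j=1}^{m}P(D_i,C_j,S_k)\,DC_{i,j},\\
Q^C(S_k) &= \frac{1}{m}\sum_{j=1}^{m}\frac{SC_{k,j}+nr}{2nr}
          = \frac{1}{2}+\frac{1}{2nmr}\sum_{j=1}^{m}\sum_{i=1}^{n}P(D_i,C_j,S_k)\,DC_{i,j}.
\end{aligned}
```

Both double sums range over the same finite set of terms, so $Q^D(S_k) = Q^C(S_k)$ for every customer $S_k$.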

An Example
- Let us suppose that four customers have to evaluate three products from a virtual shop using five different available evaluation marks.
- The customers should define their selection of the appropriate mark for every product.
- The final goal is to obtain a cooperative evaluation result of all the customers concerning the quality of the products.

C Set (Evaluation Marks) in the Example
Evaluation mark   Notation
Nicely designed   $C_1$
Expensive         $C_2$
Easy to use       $C_3$
Reliable          $C_4$
Safe              $C_5$

S Set (Customers) in the Example
Customer ID   Notation
Fox           $S_1$
Wolf          $S_2$
Cat           $S_3$
Hare          $S_4$

D Set (Products) in the Example
$D_1$: Ultra Cast Spinning Reel
$D_2$: Nokia Communicator 9110
$D_3$: iGrafx Process Management Software

Evaluations Made for the Good Reel ($D_1$)
[Table: $P(D_1, C_j, S_k)$ for marks $C_1$ to $C_5$ and customers $S_1$ to $S_4$]
Customer Wolf prefers to select the mark Reliable to evaluate the Reel, and refuses to select Expensive or Safe. Wolf neither uses nor refuses to use the Nicely designed or Easy to use marks for this evaluation.
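The note about Wolf can be turned into one row of the predicate table. The sketch below assumes the encoding 1 = select, -1 = refuse, 0 = neither; `encode_row` and `MARKS` are hypothetical names, with the marks in the example's $C_1$ to $C_5$ order.

```python
from typing import List, Set

# Evaluation marks of the example, in the order C_1..C_5.
MARKS = ["Nicely designed", "Expensive", "Easy to use", "Reliable", "Safe"]


def encode_row(selected: Set[str], refused: Set[str],
               marks: List[str] = MARKS) -> List[int]:
    """Encode one customer's verdicts on one good as P values per mark.

    1 = mark selected, -1 = mark refused, 0 = neither (assumed encoding).
    """
    unknown = (selected | refused) - set(marks)
    if unknown:
        raise ValueError(f"unknown marks: {sorted(unknown)}")
    if selected & refused:
        raise ValueError("a mark cannot be both selected and refused")
    return [1 if c in selected else -1 if c in refused else 0 for c in marks]


# Wolf (S_2) on the Reel (D_1), as described on the slide.
wolf_on_reel = encode_row(selected={"Reliable"}, refused={"Expensive", "Safe"})
# [0, -1, 0, 1, -1]
```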

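Evaluations Made for the Good Communicator ($D_2$)
[Table: $P(D_2, C_j, S_k)$ for marks $C_1$ to $C_5$ and customers $S_1$ to $S_4$]

Evaluations Made for the Good Software ($D_3$)
[Table: $P(D_3, C_j, S_k)$ for marks $C_1$ to $C_5$ and customers $S_1$ to $S_4$]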

Example: Calculating the Value $DC_{3,4}$
[Table: $P(D_3, C_j, S_k)$ for marks $C_1$ to $C_5$ and customers $S_1$ to $S_4$]
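The vote values for the Software were not preserved, so only the symbolic form of this calculation can be shown. Assuming DC is the sum of predicate values over all customers, the support for rating the Software ($D_3$) as Reliable ($C_4$) collects the votes of Fox, Wolf, Cat and Hare:

```latex
DC_{3,4} = \sum_{k=1}^{4} P(D_3, C_4, S_k)
         = P(D_3, C_4, S_1) + P(D_3, C_4, S_2) + P(D_3, C_4, S_3) + P(D_3, C_4, S_4).
```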

Resulting DC relation

Normalized and Thresholded DC relation
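The thresholds used on the slide are not recoverable. The sketch below shows a three-valued thresholding with hypothetical parameters `low` and `high`, plus a helper that reads a thresholded DC row back as statements like those on the next slide; both function names are illustrative.

```python
from typing import List


def threshold(rel: List[List[float]], low: float, high: float) -> List[List[int]]:
    """Three-valued thresholding of a normalized relation.

    Values above `high` become 1, values below `low` become -1,
    everything in between becomes 0 (no verdict).
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    return [[1 if v > high else -1 if v < low else 0 for v in row] for row in rel]


def describe(row: List[int], marks: List[str]) -> List[str]:
    """Turn one thresholded DC row into statements like 'reliable', 'not expensive'."""
    return [m if t == 1 else "not " + m for t, m in zip(row, marks) if t != 0]
```

For example, `threshold([[5/6, 1/3], [0.5, 1.0]], low=0.35, high=0.65)` gives `[[1, -1], [0, 1]]`; the middle band is what lets the result slide say nothing at all about some marks for some goods.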

Result of Cooperative Goods Evaluation Based on the DC Relation
- $D_1$ (Reel) is nicely designed, reliable and not expensive, but not easy to use.
- $D_2$ (Communicator) is reliable, safe and not expensive, but not easy to use.
- $D_3$ (Software) is easy to use and safe, but not reliable.

An Example: Calculating the Value $SD_{1,1}$
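The slide's numbers are lost. Assuming $SD_{k,i} = \sum_j P(D_i, C_j, S_k)\,DC_{i,j}$, the support Fox ($S_1$) receives for his evaluation of the Reel ($D_1$) runs over all five marks:

```latex
SD_{1,1} = \sum_{j=1}^{5} P(D_1, C_j, S_1)\, DC_{1,j}.
```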

An Example: Calculating the Value $SC_{4,4}$
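Likewise, assuming $SC_{k,j} = \sum_i P(D_i, C_j, S_k)\,DC_{i,j}$, the support Hare ($S_4$) receives for using the mark Reliable ($C_4$) runs over all three goods:

```latex
SC_{4,4} = \sum_{i=1}^{3} P(D_i, C_4, S_4)\, DC_{i,4}.
```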

Resulting SD and SC relations

Normalized and Thresholded SD Relation
[Table: rows Fox, Wolf, Cat, Hare]
- Evaluations obtained from the customer Fox should be accepted if he evaluates goods similar to Reels or similar to Software.
- Fox's evaluations should be rejected if they concern goods similar to the Communicator.
- Only evaluations from the customer Cat can be accepted if they concern goods similar to the Communicator.
- All four customers are expected to give acceptable evaluations concerning Software-related goods.

Normalized and Thresholded SC Relation
[Table: rows Fox, Wolf, Cat, Hare; columns Nicely designed, Expensive, Easy to use, Reliable, Safe]
- Evaluations obtained from the customer Fox should be accepted if they concern the usability (easy to use) or reliability of a good.
- Fox's evaluations should be rejected if they concern the design of goods.

Deriving Internal Similarity Values
[Figure: derivation via one intermediate set and via two intermediate sets]
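The slides do not preserve the internal-similarity formulas. A common way to compare two elements through one intermediate set, assumed here, is one minus the mean absolute difference of their rows in a normalized relation: customers compared through goods use their rows of the normalized SD relation, goods compared through marks use rows of the normalized DC relation, and marks compared through goods use the transposed DC relation. The names `internal_similarity` and `transpose` are hypothetical, and the two-intermediate-set variant is not covered.

```python
from typing import List

Relation = List[List[float]]  # normalized values in [0, 1]


def transpose(rel: Relation) -> Relation:
    """Swap rows and columns, e.g. turn DC (goods x marks) into marks x goods."""
    return [list(col) for col in zip(*rel)]


def internal_similarity(rel: Relation) -> Relation:
    """Similarity of the row elements of `rel`, compared via its columns.

    sim(a, b) = 1 - mean over columns c of |rel[a][c] - rel[b][c]|,
    so identical rows score 1 and maximally opposed rows score 0.
    """
    width = len(rel[0])
    return [
        [1 - sum(abs(x - y) for x, y in zip(ra, rb)) / width for rb in rel]
        for ra in rel
    ]
```

For example, `internal_similarity([[1.0, 0.0], [1.0, 1.0]])` returns `[[1.0, 0.5], [0.5, 1.0]]`: the two elements agree on one column and differ fully on the other.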

Internal Similarity for Customers: Goods-Based Similarity
[Figure: goods and customers]

Internal Similarity for Customers: Evaluation Marks-Based Similarity
[Figure: evaluation marks and customers]

Internal Similarity for Customers: Evaluation Marks-Goods-Based Similarity
[Figure: customers, evaluation marks and goods]

Internal Similarity for Evaluation Marks
- Customers-based similarity
- Goods-based similarity
- Goods-customers-based similarity

Internal Similarity for Goods
- Customers-based similarity
- Evaluation marks-based similarity
- Evaluation marks-customers-based similarity

Normalized and Thresholded $DD^C$ Relation
[Table: goods compared through evaluation marks; each pair labelled similar, neutral or different]

Conclusion
- We discussed methods of deriving the total support of each binary similarity relation. These can be used, for example, to derive the most supported goods evaluation and to rank the customers according to their competence.
- We also discussed relations between elements taken from the same set: goods, evaluation marks, or customers. These can be used, for example, to divide customers into groups of similar competence relative to the goods evaluation environment.