Artificial Intelligence and Authorship: When Computers Learn to Read
Kristin Betancourt, COSC 480

What this presentation will cover:
- Bayes' theorem
- The Naive Bayes algorithm
- An authorship program using the Naive Bayes algorithm
- Smoothing techniques: Add-1

Bayes' Theorem
In its simplest form, Bayes' theorem can be stated as:
P(A|B) = P(B|A) * P(A) / P(B)
(The probability of A given B is equal to the probability of B given A, multiplied by the probability of A and divided by the probability of B.)

Bayes' Theorem Example
You see someone in the classroom. This person has long hair (L). What is the likelihood that the person is female (F)?
P(F|L) = P(L|F) * P(F) / P(L)
Known facts:
- Probability of seeing a female: 20%
- Probability of a female having long hair: 60%
- Probability of any person having long hair: 30%
Conclusion?

Bayes' Theorem Example
P(F|L) = 0.6 * 0.2 / 0.3 = 0.4
The probability that the person you saw is female is 40%.
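As a quick sanity check, the same calculation in Python (a minimal sketch; the variable names are mine, and the probabilities are the ones given in the example):

```python
# Bayes' theorem: P(F|L) = P(L|F) * P(F) / P(L)
p_female = 0.2               # P(F): chance a person in the room is female
p_long_given_female = 0.6    # P(L|F): chance a female has long hair
p_long = 0.3                 # P(L): chance any person has long hair

p_female_given_long = p_long_given_female * p_female / p_long
print(round(p_female_given_long, 2))  # 0.4, i.e. a 40% chance the person is female
```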

Naive Bayes Algorithm
The Naive Bayes algorithm is a classifier that borrows heavily from Bayes' theorem. Instead of relating two individual events, we relate a set of features to a class:
P(C|F1, F2, F3) = P(F1, F2, F3|C) * P(C) / P(F1, F2, F3)
A class in this case can be almost anything, so long as there are distinct features to set it apart from the other classes.

Naive Bayes Algorithm
In practice, we use this algorithm to decide between classes, and the decision depends on the features. When we compare classes against the same set of features, the denominator is the same constant for every class, so we can drop it. So, this:
P(C|F1, F2, F3) = P(F1, F2, F3|C) * P(C) / P(F1, F2, F3)
effectively becomes this:
P(C|F1, F2, F3) ∝ P(F1, F2, F3|C) * P(C)
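A tiny numerical check of why dropping the shared denominator is safe (the numbers here are made up for illustration): dividing every class's score by the same constant cannot change which class comes out on top.

```python
# Hypothetical joint scores P(F1, F2, F3|C) * P(C) for two classes:
joint = {"C1": 0.03, "C2": 0.01}
evidence = 0.04                    # P(F1, F2, F3), identical for every class
posteriors = {c: score / evidence for c, score in joint.items()}

# The winning class is the same whether or not we divide by the evidence.
print(max(joint, key=joint.get) == max(posteriors, key=posteriors.get))  # True
```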

Naive Bayes Algorithm
Expanding the joint likelihood with the chain rule means that every feature we add makes the equation more cumbersome. Fortunately, we are working with a naive classifier, which means that we assume strong (conditional) independence between the features. What does this mean?

Naive Bayes Algorithm
Instead of this:
P(C) * P(F1|C) * P(F2|C, F1) * P(F3|C, F1, F2) ... and so on for larger sets of features,
we get to use this:
P(C) * P(F1|C) * P(F2|C) * P(F3|C) ... and so on.
(I cannot emphasize enough how much of a relief this is.)
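To make the simplification concrete, here is a minimal sketch of the "naive" score for a single class: the prior multiplied by each per-feature likelihood on its own. The function name and numbers are mine, purely for illustration.

```python
from math import prod

def naive_score(prior, likelihoods):
    """Compute P(C) * P(F1|C) * P(F2|C) * ..., proportional to P(C|F1, F2, ...)."""
    return prior * prod(likelihoods)

# Hypothetical values for one class and three features:
print(naive_score(0.5, [0.1, 0.3, 0.2]))  # 0.5 * 0.1 * 0.3 * 0.2 ≈ 0.003
```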

Artificial Intelligence: Authorship
So, we have our algorithm. What now? We have two people: Bob and Alice. Each has sent us a collection of letters that they wrote themselves. Among the letters, we have an anonymous note that one of them wrote. Who wrote it?

AI Authorship: The Breakdown
For this problem, Bob and Alice are our classes. The words they used in their letters are our set of features. Let me remind you of our equation:
P(C|F1, F2, F3) ∝ P(C) * P(F1|C) * P(F2|C) * P(F3|C)
In this example, we will compare the score that results for Bob with the score that results for Alice.

AI Authorship: The Breakdown
First, we gather the training data: the data that we are going to "teach" the program with. These are the letters that we know were written by Bob and by Alice, respectively. We make a table for each person containing every word they've used and how many times they used it. Each table is typically the size of a small dictionary.

AI Authorship: The Breakdown
Once all of the data is fed into the tables, we calculate the probability of each word being used. For each author, this is the number of times that author used the word divided by the total number of words that author wrote. (This is the P(F|C) for each word.) The P(C) of our equation is the prior probability that the author wrote a letter in the first place: the number of letters by that author divided by the total number of letters.
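A minimal sketch of this training step in Python. The function and variable names are my own, not from the presentation: count each author's words, turn the counts into P(word|author), and take the share of letters as P(author).

```python
from collections import Counter

def train(letters_by_author):
    """letters_by_author maps an author's name to a list of letter strings.
    Returns per-author word probabilities P(word|author) and priors P(author)."""
    word_probs, priors = {}, {}
    total_letters = sum(len(letters) for letters in letters_by_author.values())
    for author, letters in letters_by_author.items():
        counts = Counter(w for letter in letters for w in letter.lower().split())
        total_words = sum(counts.values())
        word_probs[author] = {w: c / total_words for w, c in counts.items()}
        priors[author] = len(letters) / total_letters
    return word_probs, priors
```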

AI Authorship: The Breakdown
To figure out who wrote the letter:
- Start with P(C) for each person.
- For each word contained in the letter, including repeated words, multiply the running value by P(F|C) for that word.
- When the entire letter has been processed, compare the resulting values.
- The higher value indicates the most likely author.
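Continuing the sketch above (same made-up names), the scoring step follows those bullets directly. One practical caveat: multiplying many small probabilities can underflow, so real implementations usually sum log-probabilities instead; plain multiplication is kept here to match the slides.

```python
def most_likely_author(note, word_probs, priors):
    """Return the author whose Naive Bayes score for the note is highest."""
    scores = {}
    for author, prior in priors.items():
        score = prior                              # start with P(C)
        for word in note.lower().split():          # repeated words count every time
            # A word this author never used contributes 0 here -- exactly the
            # problem the smoothing slides below address.
            score *= word_probs[author].get(word, 0.0)
        scores[author] = score
    return max(scores, key=scores.get)
```

With the hypothetical training function above, the whole pipeline would look something like most_likely_author(anonymous_note, *train(letters_by_author)).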

Smoothing
There is a glaring problem with this algorithm: what happens when one person uses a word and the other person doesn't? We get a P(F|C) of zero. Whoops. A single zero wipes out that author's entire product, so we resort to "smoothing".

Add-1
One solution is to add a count of one to every word in the vocabulary, so no count is ever zero. This skews the probabilities a little, but the relative proportions stay roughly the same. It is the simplest method, but also the least accurate.
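One common way to write this (often called Laplace or add-one smoothing) adds one to every count and adds the vocabulary size to the denominator, so the probabilities still sum to one. A minimal sketch, reusing the made-up names from the training step above:

```python
def smoothed_word_probs(counts, vocabulary):
    """Add-one smoothing: pretend every word in the shared vocabulary was seen
    one extra time, so no P(word|author) is ever zero."""
    total = sum(counts.values()) + len(vocabulary)
    return {w: (counts.get(w, 0) + 1) / total for w in vocabulary}
```

In the training sketch earlier, this would replace the plain counts-divided-by-total step, with vocabulary being the union of every author's words.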

Conclusion
The Naive Bayes algorithm is an involved but efficient and accurate way to carry out a very human process: making educated guesses. This presentation has been a general overview of the algorithm and of one fundamental application of it, in both theory and practice. I hope you've found this as interesting as I did. Thank you.

References
Dr. Craig Martel, Naval Postgraduate School: Monterey, CA