Learning Sets of Rules (学习规则集合). December 2004. Edited by Wang Yinglin, Shanghai Jiaotong University.

2 Note: Library Resources
Library collections and electronic databases: Kluwer, Elsevier, IEEE, …; China National Knowledge Infrastructure (中国期刊网), Wanfang Data (万方数据), …

3 Introduction
Sets of "if-then" rules: the hypothesis is easy to interpret.
Goal: look at a new method to learn rules.
Rules:
- Propositional rules (rules without variables)
- First-order predicate rules (rules with variables)

4 Introduction
So far:
- Method 1: learn a decision tree, then convert it to rules.
- Method 2: genetic algorithm, encoding the rule set as a bit string.
From now on, a new method: learning first-order rules using sequential covering.
First-order rules are difficult to represent using a decision tree or other propositional representation, e.g.:
  IF Parent(x, y) THEN Ancestor(x, y)
  IF Parent(x, z) AND Ancestor(z, y) THEN Ancestor(x, y)

5 Contents
- Introduction
- Sequential covering algorithms
- First-order rules
- First-order inductive learning (FOIL)
- Induction as inverted deduction
- Summary

6 Sequential Covering Algorithms
Algorithm:
1. Learn one rule that covers a certain number of positive examples.
2. Remove the examples covered by that rule.
3. Repeat until no positive examples are left.

7 Sequential Covering Algorithms
Require that each rule have high accuracy, but allow low coverage:
- High accuracy → when the rule makes a prediction, it should be correct.
- Accepting low coverage → the rule need not make a prediction for every training example.

8 Sequential Covering Algorithms
Sequential-Covering (Target_attribute, Attributes, Examples, Threshold)
  Learned_rules ← {}
  Rule ← Learn-One-Rule (Target_attribute, Attributes, Examples)
  WHILE Performance (Rule, Examples) > Threshold DO
    Learned_rules ← Learned_rules + Rule        // add new rule to set
    Examples ← Examples − {examples correctly classified by Rule}
    Rule ← Learn-One-Rule (Target_attribute, Attributes, Examples)
  Sort-By-Performance (Learned_rules, Target_attribute, Examples)
  RETURN Learned_rules
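
A minimal Python sketch of the loop above, assuming a rule is a dict mapping attributes to required values; learn_one_rule and performance are assumed callables (they are only defined abstractly here and on the following slides):

    def covers(rule, example):
        # a rule covers an example when every precondition matches
        return all(example.get(attr) == val for attr, val in rule.items())

    def sequential_covering(target, attributes, examples,
                            learn_one_rule, performance, threshold):
        all_examples = list(examples)
        remaining = list(examples)
        learned_rules = []
        rule = learn_one_rule(target, attributes, remaining)
        while remaining and performance(rule, remaining, target) > threshold:
            learned_rules.append(rule)
            # drop the covered examples (the slide removes those correctly
            # classified), then learn the next rule on what remains
            remaining = [e for e in remaining if not covers(rule, e)]
            rule = learn_one_rule(target, attributes, remaining)
        # sort so the best-performing rules are consulted first
        learned_rules.sort(key=lambda r: performance(r, all_examples, target),
                           reverse=True)
        return learned_rules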

9 Sequential Covering Algorithms
- One of the most widespread approaches to learning disjunctive sets of rules.
- It reduces learning a disjunctive set of rules to a sequence of single-rule problems (each rule: a conjunction of attribute values).
- It performs a greedy search (no backtracking); as such, it may not find an optimal rule set.
- It sequentially covers the set of positive examples until the performance of a rule falls below a threshold.

10 Sequential Covering Algorithms: General-to-Specific Beam Search
How do we learn each individual rule?
Requirements for Learn-One-Rule: high accuracy, but not necessarily high coverage.
One approach: proceed as in decision tree learning (ID3), BUT follow only the single branch with the best performance at each search step.

11 Sequential Covering Algorithms: General-to-Specific Beam Search
IF {} THEN Play-Tennis = Yes
  IF {Humidity = Normal} THEN Play-Tennis = Yes
  IF {Humidity = High} THEN Play-Tennis = No
  IF {Wind = Strong} THEN Play-Tennis = No
  IF {Wind = Light} THEN Play-Tennis = Yes
  …
    IF {Humidity = Normal, Outlook = Sunny} THEN Play-Tennis = Yes
    IF {Humidity = Normal, Wind = Strong} THEN Play-Tennis = Yes
    IF {Humidity = Normal, Wind = Light} THEN Play-Tennis = Yes
    IF {Humidity = Normal, Outlook = Rain} THEN Play-Tennis = Yes

12 Sequential Covering Algorithms: General-to-Specific Beam Search
Idea: organize the hypothesis-space search in a general-to-specific fashion. Start with the most general rule precondition, then greedily add the attribute constraint that most improves performance measured over the training examples.

13 Sequential Covering Algorithms: General-to-Specific Beam Search
Greedy search without backtracking carries the danger of a suboptimal choice at any step. The algorithm can be extended to a beam search:
- Keep a list of the k best candidates at each step.
- On each search step, generate descendants for each of these k best candidates.
- Reduce the resulting set again to the k best candidates.

14 Sequential Covering Algorithms: General-to-Specific Beam Search
Learn-One-Rule (Target_attribute, Attributes, Examples, k)
  Best_hypothesis ← Ø          (the most general hypothesis: empty precondition)
  Candidate_hypotheses ← {Best_hypothesis}
  WHILE Candidate_hypotheses is not empty DO
    1. Generate the next, more specific candidate hypotheses
    2. Update Best_hypothesis:
       FOR ALL h in New_candidate_hypotheses:
         IF Performance(h) > Performance(Best_hypothesis) THEN Best_hypothesis ← h
    3. Update Candidate_hypotheses:
       Candidate_hypotheses ← best k members of New_candidate_hypotheses
  RETURN the rule "IF Best_hypothesis THEN Prediction"
    (Prediction: the most frequent value of Target_attribute among the examples covered by Best_hypothesis)
  Performance (h, Examples, Target_attribute)
    h_examples ← the subset of Examples covered by h
    RETURN −Entropy(h_examples)
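
A runnable sketch of this beam search, assuming examples are dicts and a hypothesis is a frozenset of (attribute, value) constraints; −Entropy serves as the performance measure, as in the pseudocode above. All names are illustrative:

    import math
    from collections import Counter

    def neg_entropy(examples, target):
        # Performance(h) = -Entropy of the target values of the covered examples
        counts = Counter(e[target] for e in examples)
        total = sum(counts.values())
        return sum((c / total) * math.log2(c / total) for c in counts.values())

    def learn_one_rule(target, attributes, examples, k=5):
        def covered(h):
            return [e for e in examples if all(e[a] == v for a, v in h)]

        def performance(h):
            ex = covered(h)
            return neg_entropy(ex, target) if ex else float("-inf")

        best = frozenset()              # most general: empty precondition
        candidates = [best]
        constraints = {(a, e[a]) for e in examples for a in attributes}
        while candidates:
            # all one-constraint specializations still covering >= 1 example
            new = {h | {c} for h in candidates for c in constraints
                   if c not in h and covered(h | {c})}
            for h in new:
                if performance(h) > performance(best):
                    best = h
            candidates = sorted(new, key=performance, reverse=True)[:k]
        prediction = Counter(e[target] for e in covered(best)).most_common(1)[0][0]
        return dict(best), prediction   # IF best THEN target = prediction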

15 Sequential Covering Algorithms: General-to-Specific Beam Search
Generate the next, more specific candidate hypotheses:
  All_constraints ← the set of all constraints of the form (a = v), where a is a member of Attributes and v is a value of a that occurs in the current set of Examples
  New_candidate_hypotheses ←
    FOR EACH h in Candidate_hypotheses:
      FOR EACH c in All_constraints:
        create a specialization of h by adding the constraint c
  Remove from New_candidate_hypotheses any hypotheses that are duplicates, inconsistent, or not maximally specific

16–20 Learn-One-Rule Example
Training data (PlayTennis):

  Day  Outlook   Temp  Humid  Wind    PlayTennis
  D1   Sunny     Hot   High   Weak    No
  D2   Sunny     Hot   High   Strong  No
  D3   Overcast  Hot   High   Weak    Yes
  D4   Rain      Mild  High   Weak    Yes
  D5   Rain      Cool  Low    Weak    Yes
  D6   Rain      Cool  Low    Strong  No
  D7   Overcast  Cool  Low    Strong  Yes
  D8   Sunny     Mild  High   Weak    No
  D9   Sunny     Cool  Low    Weak    Yes
  D10  Rain      Mild  Low    Weak    Yes
  D11  Sunny     Mild  Low    Strong  Yes
  D12  Overcast  Mild  High   Strong  Yes
  D13  Overcast  Hot   Low    Weak    Yes
  D14  Rain      Mild  High   Strong  No

Trace:
1. Best_hypothesis = IF T THEN PlayTennis(x) = Yes
2. Candidate_hypotheses = {Best_hypothesis}
3. All_constraints = {Outlook(x)=Sunny, Outlook(x)=Overcast, Temp(x)=Hot, …}
4. New_candidate_hypotheses = {IF Outlook=Sunny THEN PlayTennis=Yes, IF Outlook=Overcast THEN PlayTennis=Yes, …}
5. Best_hypothesis = IF Outlook=Sunny THEN PlayTennis=Yes

21 Sequential Covering Algorithms: Variations
Learn only rules that cover positive examples:
- When the fraction of positive examples is small, we can modify the algorithm to learn only from those rare examples.
- Instead of entropy, use a measure that evaluates the fraction of positive examples covered by the hypothesis.
AQ algorithm:
- A different covering algorithm: it searches rule sets for each particular target value.
- A different single-rule algorithm: the search is guided by uncovered positive examples, and only attribute values satisfied by those examples are used.

22 Summary: Points for Consideration
Key design issues for learning sets of rules:
- Sequential or simultaneous? Sequential: isolate components of the hypothesis. Simultaneous: the whole hypothesis at once.
- General-to-specific or specific-to-general? G→S: Learn-One-Rule. S→G: Find-S.
- Generate-and-test or example-driven? G&T: search through syntactically legal hypotheses. E-D: Find-S, Candidate-Elimination.
- Post-pruning of rules? A popular method for recovering from overfitting.

23 Summary: Points for Consideration
What statistical evaluation method?
- Relative frequency: n_c / n (n: examples matched by the rule; n_c: examples the rule classifies correctly)
- m-estimate of accuracy: (n_c + m·p) / (n + m)
  p: the prior probability that a randomly drawn example has the classification assigned by the rule
  m: weight (the number of examples for weighting this prior)
- Entropy
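
The three measures as small Python functions (a sketch; the variable names follow the slide):

    import math

    def relative_frequency(n_c, n):
        # n examples matched by the rule, n_c of them classified correctly
        return n_c / n

    def m_estimate(n_c, n, p, m):
        # smooths relative frequency toward the prior p with m virtual examples,
        # e.g. m_estimate(6, 8, p=0.5, m=2) == 0.7
        return (n_c + m * p) / (n + m)

    def entropy(value_counts):
        # value_counts: number of covered examples in each target class
        total = sum(value_counts)
        return -sum((c / total) * math.log2(c / total)
                    for c in value_counts if c)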

24 Learning First-Order Rules
From now on we consider learning rules that contain variables (first-order rules).
Inductive learning of first-order rules is called inductive logic programming (ILP); it can be viewed as automatically inferring Prolog programs.
Two methods are considered:
- FOIL
- Induction as inverted deduction

25 Learning First-Order Rules
First-order rules are rules that contain variables. Example:
  Ancestor(x, y) ← Parent(x, y)
  Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)   (recursive)
First-order rules are more expressive than propositional rules:
  IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True
  IF Father(y, x) ∧ Female(y) THEN Daughter(x, y)

26 Learning First-Order Rules: Terminology
- Constants (常量): e.g., John, Kansas, 42
- Variables (变量): e.g., Name, State, x
- Predicates (谓词): e.g., Father-Of, Greater-Than
- Functions (函数): e.g., age, cosine
- Term (项): a constant, a variable, or function(term)
- Literals/atoms (文字): Predicate(term) or its negation, e.g., Greater-Than(age(John), 42), Female(Mary), ¬Female(x)
- Clause (子句): a disjunction of literals with implicit universal quantification, e.g., ∀x: Female(x) ∨ Male(x)
- Horn clause (Horn 子句): a clause with at most one positive literal: H ∨ ¬L1 ∨ ¬L2 ∨ … ∨ ¬Ln

27 Learning First-Order Rules: Terminology
First-order Horn clauses: rules that have one or more preconditions and a single consequent; predicates may have variables.
The following forms of a Horn clause are equivalent:
  H ∨ ¬L1 ∨ … ∨ ¬Ln
  H ← (L1 ∧ … ∧ Ln)
  IF L1 ∧ … ∧ Ln THEN H
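
The equivalence can be checked by brute force over all truth assignments; a tiny Python verification for n = 2, purely illustrative:

    from itertools import product

    for H, L1, L2 in product([False, True], repeat=3):
        clause = H or (not L1) or (not L2)        # H ∨ ¬L1 ∨ ¬L2
        implication = (not (L1 and L2)) or H      # IF L1 ∧ L2 THEN H
        assert clause == implication
    print("H ∨ ¬L1 ∨ ¬L2 is equivalent to IF L1 ∧ L2 THEN H")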

28 Learning First-Order Rules: First-Order Inductive Learning (FOIL)
A natural extension of "sequential covering + Learn-One-Rule".
FOIL rules are similar to Horn clauses, with two exceptions:
- Syntactic restriction: no function symbols.
- More expressive than Horn clauses: negation is allowed in rule bodies.

29 Learning First-Order Rules: First-Order Inductive Learning (FOIL)
FOIL (Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is true
  Neg ← those Examples for which Target_predicate is false
  Learned_rules ← {}
  WHILE Pos is not empty DO
    NewRule ← the rule that predicts Target_predicate with no preconditions
    NewRuleNeg ← Neg
    WHILE NewRuleNeg is not empty DO
      Candidate_literals ← candidate new literals for NewRule
      Best_literal ← argmax_L Foil_Gain(L, NewRule)
      add Best_literal to the preconditions of NewRule
      NewRuleNeg ← the subset of NewRuleNeg that satisfies NewRule's preconditions
    Learned_rules ← Learned_rules + NewRule
    Pos ← Pos − {members of Pos covered by NewRule}
  RETURN Learned_rules

30 Learning First-Order Rules: First-Order Inductive Learning (FOIL)
FOIL learns rules only for the case when the target literal is true (cf. sequential covering, which learns rules for both the true and false cases).
Outer loop: add a new rule to the disjunctive hypothesis — a specific-to-general search.
Inner loop: find the conjunction for one rule by a general-to-specific search, starting with an empty precondition and adding one literal at a time (hill climbing). Cf. sequential covering's Learn-One-Rule, which performs a beam search.

31 FOIL: Generating Candidate Specializations in FOIL
Generate new literals, each of which may be added to the rule's preconditions.
Current rule: P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln
Add a new literal L(n+1) to get a more specific Horn clause. A literal may take one of the forms:
- Q(v1, v2, …, vk), where Q is a predicate in Predicates and at least one vi already occurs in the rule
- Equal(xj, xk), where xj and xk are variables already present in the rule
- The negation of either of the above forms

32 FOIL: Guiding the Search in FOIL
Consider all possible bindings (substitutions) of the rule's variables: prefer rules that possess more positive bindings.
Foil_Gain(L, R), for a candidate literal L to add to rule R:
  p0 ← number of positive bindings of R
  n0 ← number of negative bindings of R
  p1 ← number of positive bindings of R + L
  n1 ← number of negative bindings of R + L
  t ← number of positive bindings of R that are still covered by R + L
  Foil_Gain(L, R) = t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
The gain is based on the numbers of positive and negative bindings covered before and after adding the new literal.
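
The gain as a Python function — a direct transcription of the formula above (callers must ensure p0 > 0 and p1 > 0, since the log terms are undefined otherwise):

    import math

    def foil_gain(p0, n0, p1, n1, t):
        # t positive bindings of R remain covered after adding L; the log terms
        # measure how much the density of positive bindings improves
        return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))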

33 FOIL: Examples
Target literal: GrandDaughter(x, y)
Training examples:
  GrandDaughter(Victor, Sharon)
  Father(Sharon, Bob), Father(Tom, Bob), Father(Bob, Victor)
  Female(Sharon)
Initial step: GrandDaughter(x, y) ←
  positive binding: {x/Victor, y/Sharon}; negative bindings: all others

34 FOIL: Examples
Candidate additions to the rule preconditions: Equal(x, y), Female(x), Female(y), Father(x, y), Father(y, x), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and their negations.
For each candidate, calculate Foil_Gain. If Father(y, z) has the maximum Foil_Gain, select it and add it to the rule's precondition:
  GrandDaughter(x, y) ← Father(y, z)
Iterate: add the best candidate literal and continue adding literals until we generate a rule like the following:
  GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
At this point, remove all positive examples covered by the rule and begin the search for a new rule.
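
A small sketch that reproduces the binding counts in this example by enumerating every substitution over the four known constants (the fact encoding and the brute-force enumeration are assumptions for illustration, not part of FOIL itself):

    from itertools import product

    facts = {("GrandDaughter", ("Victor", "Sharon")),
             ("Father", ("Sharon", "Bob")), ("Father", ("Tom", "Bob")),
             ("Father", ("Bob", "Victor")), ("Female", ("Sharon",))}
    constants = ["Victor", "Sharon", "Bob", "Tom"]

    def bindings(body, variables):
        # yield every substitution over the constants satisfying all body literals
        for values in product(constants, repeat=len(variables)):
            theta = dict(zip(variables, values))
            if all((pred, tuple(theta[v] for v in args)) in facts
                   for pred, args in body):
                yield theta

    def counts(body, variables):
        pos = neg = 0
        for theta in bindings(body, variables):
            if ("GrandDaughter", (theta["x"], theta["y"])) in facts:
                pos += 1
            else:
                neg += 1
        return pos, neg

    print(counts([], ("x", "y")))                             # (1, 15): empty rule
    print(counts([("Father", ("y", "z"))], ("x", "y", "z")))  # (1, 11): after Father(y, z)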

35 FOIL: Learning Recursive Rule Sets
The target predicate may occur in the rule head and body. Example:
  Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
  i.e., the rule: IF Parent(x, z) ∧ Ancestor(z, y) THEN Ancestor(x, y)
Learning recursive rules from relations:
- Given an appropriate set of training examples, they can be learned with a FOIL-based search, provided Ancestor ∈ Predicates.
- Recursive rules still have to outscore competing candidates on Foil_Gain.
- How do we ensure termination (i.e., no infinite recursion)? [Quinlan, 1990; Cameron-Jones and Quinlan, 1993]

36 Induction as Inverted Deduction
Induction: inference from the specific to the general. Deduction: inference from the general to the specific.
Induction can be cast as a deduction problem:
  (∀⟨xi, f(xi)⟩ ∈ D)  (B ∧ h ∧ xi) ⊢ f(xi)
where
  D: a set of training data
  B: background knowledge
  xi: the i-th training instance
  f(xi): the target value of xi
  X ⊢ Y: "Y follows deductively from X", or "X entails Y"
That is, for every training instance xi, the target value f(xi) must follow deductively from B, h, and xi.

37 Induction as Inverted Deduction
Learning target: Child(u, v) — the child of u is v.
Positive example: Child(Bob, Sharon)
Given instance: Male(Bob), Female(Sharon), Father(Sharon, Bob)
Background knowledge: Parent(u, v) ← Father(u, v)
Hypotheses satisfying (B ∧ h ∧ xi) ⊢ f(xi):
  h1: Child(u, v) ← Father(v, u)   (does not need B)
  h2: Child(u, v) ← Parent(v, u)   (needs B)
The role of background knowledge: it expands the set of hypotheses; new predicates (Parent) can be introduced into hypotheses (h2).

38 Induction as Inverted Deduction
Viewing induction as the inverse of deduction, an inverse entailment operator is required:
  O(B, D) = h such that (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
Input: training data D = {⟨xi, f(xi)⟩} and background knowledge B.
Output: a hypothesis h.

39 Induction as Inverted Deduction
Attractive features of this formulation of the learning task:
1. It subsumes the common definition of learning.
2. By incorporating the notion of background knowledge B, it allows a richer definition of when a hypothesis fits the data.
3. By incorporating B, it invites learning methods that use this background knowledge to guide the search for h.

40 Induction as Inverted Deduction
Practical difficulties of this formulation:
1. The requirement does not naturally accommodate noisy training data.
2. The language of first-order logic is so expressive that the number of hypotheses satisfying the formulation is very large.
3. In most ILP systems, the complexity of the hypothesis-space search increases as B is increased.

41 Inverting Resolution
Resolution rule (L: a literal; P, R: clauses):
  P ∨ L
  ¬L ∨ R
  ──────
  P ∨ R
Resolution operator (propositional form):
Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2. Form the resolvent C by including all literals from C1 and C2 except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is
  C = (C1 − {L}) ∪ (C2 − {¬L})

42 Inverting Resolution
Example 1:
  C1: PassExam ∨ ¬KnowMaterial
  C2: KnowMaterial ∨ ¬Study
  C:  PassExam ∨ ¬Study
Example 2:
  C1: A ∨ B ∨ C ∨ ¬D
  C2: ¬B ∨ E ∨ F
  C:  A ∨ C ∨ ¬D ∨ E ∨ F
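
A propositional sketch of the operator, representing a clause as a frozenset of (symbol, polarity) literals (an assumed encoding); applied to Example 1 above:

    def resolve(c1, c2):
        # return every resolvent obtained by cancelling L in c1 with ¬L in c2
        out = []
        for (sym, pol) in c1:
            if (sym, not pol) in c2:
                out.append((c1 - {(sym, pol)}) | (c2 - {(sym, not pol)}))
        return out

    C1 = frozenset({("PassExam", True), ("KnowMaterial", False)})
    C2 = frozenset({("KnowMaterial", True), ("Study", False)})
    print(resolve(C1, C2))   # [PassExam ∨ ¬Study]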

43 Inverting Resolution
O(C, C1): performs inductive inference.
Inverse resolution operator (propositional form):
Given the resolvent C and initial clause C1, find a literal L that occurs in clause C1 but not in clause C. Form the second clause C2 by including the following literals:
  C2 = (C − (C1 − {L})) ∪ {¬L}

44 Inverting Resolution
Example 1:
  C1: PassExam ∨ ¬KnowMaterial
  C:  PassExam ∨ ¬Study
  C2: KnowMaterial ∨ ¬Study
Example 2:
  C1: B ∨ D,  C: A ∨ B  ⇒  C2: A ∨ ¬D  (but C2: A ∨ ¬D ∨ B is also possible)
Inverse resolution is nondeterministic.
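
The inverse operator in the same encoding, applied to Example 2. This minimal sketch is nondeterministic only in the choice of L; alternatives such as A ∨ ¬D ∨ B arise from other choices the formula on the previous slide fixes away, and are not enumerated here:

    def inverse_resolve(c, c1):
        # one C2 = (C - (C1 - {L})) ∪ {¬L} for each literal L of c1 absent from c
        results = []
        for (sym, pol) in c1:
            if (sym, pol) not in c:
                results.append((c - (c1 - {(sym, pol)})) | {(sym, not pol)})
        return results

    C1 = frozenset({("B", True), ("D", True)})
    C  = frozenset({("A", True), ("B", True)})
    print(inverse_resolve(C, C1))   # [A ∨ ¬D]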

45 Inverting Resolution: First-Order Resolution
First-order resolution:
- Substitution (置换): a mapping of variables to terms, e.g., θ = {x/Bob, z/y}.
- Unifying substitution (合一置换): for two literals L1 and L2, a substitution θ such that L1θ = L2θ.
  Example: θ = {x/Bill, z/y}, L1 = Father(x, y), L2 = Father(Bill, z); then L1θ = L2θ = Father(Bill, y).
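
Substitutions can be sketched in a few lines, with a literal encoded as (predicate, argument-tuple) — an assumed encoding matching the earlier sketches:

    def substitute(literal, theta):
        # apply theta (variable -> term) to each argument; constants pass through
        pred, args = literal
        return (pred, tuple(theta.get(a, a) for a in args))

    L1, L2 = ("Father", ("x", "y")), ("Father", ("Bill", "z"))
    theta = {"x": "Bill", "z": "y"}
    assert substitute(L1, theta) == substitute(L2, theta) == ("Father", ("Bill", "y"))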

46 Inverting Resolution: First-Order Resolution
Resolution operator (first-order form):
Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ. Form the resolvent C by including all literals from C1θ and C2θ except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is
  C = (C1 − {L1})θ ∪ (C2 − {L2})θ

47 Inverting Resolution: First-Order Resolution
Example:
  C1 = White(x) ← Swan(x), i.e., C1 = White(x) ∨ ¬Swan(x);  C2 = Swan(Fred)
  L1 = ¬Swan(x), L2 = Swan(Fred)
  Unifying substitution: θ = {x/Fred}; then L1θ = ¬L2θ = ¬Swan(Fred)
  (C1 − {L1})θ = White(Fred)
  (C2 − {L2})θ = Ø
  ∴ C = White(Fred)

48 Inverting Resolution: First-Order Case
Inverse resolution, first-order case. From the resolution rule
  C = (C1 − {L1})θ1 ∪ (C2 − {L2})θ2   (where θ = θ1θ2, a factorization)
it follows that
  C − (C1 − {L1})θ1 = (C2 − {L2})θ2   (where L2 = ¬L1θ1θ2⁻¹)
  ∴ C2 = (C − (C1 − {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

49 Inverting Resolution: First-Order Case
Multistep inverse resolution (the resolution tree; read bottom-up for induction):
  GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y)   with Father(Tom, Bob), θ = {Bob/y, Tom/z}
    ⇓ resolves to
  GrandChild(Bob, x) ∨ ¬Father(x, Tom)   with Father(Shannon, Tom), θ = {Shannon/x}
    ⇓ resolves to
  GrandChild(Bob, Shannon)

50 Inverting Resolution
  C = GrandChild(Bob, Shannon)
  C1 = Father(Shannon, Tom), L1 = Father(Shannon, Tom)
Suppose we choose the inverse substitutions θ1⁻¹ = {} and θ2⁻¹ = {Shannon/x}. Then
  (C − (C1 − {L1})θ1)θ2⁻¹ = (Cθ1)θ2⁻¹ = GrandChild(Bob, x)
  {¬L1θ1θ2⁻¹} = {¬Father(x, Tom)}
  ∴ C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom),
or equivalently, GrandChild(Bob, x) ← Father(x, Tom)

51 Summary of Inverse Resolution
Inverse resolution provides a general way of automatically generating hypotheses h that satisfy (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).
Differences from FOIL:
- Inverse resolution is example-driven; FOIL is generate-and-test.
- At each step, inverse resolution considers only a small fraction of the available data; FOIL considers all of it.
- A search based on inverse resolution looks more targeted and efficient, but in practice this is not necessarily so.

52 Summary
Learning rules from data.
Sequential covering algorithms:
- They learn a disjunctive set of rules by first learning a single accurate rule, then removing the positive examples covered by that rule, and repeating the process on the remaining examples.
- They provide an efficient, greedy algorithm for learning rule sets and an alternative to top-down decision tree learning; decision tree algorithms can be seen as simultaneous covering, in contrast to sequential covering.
- Learning single rules by search: beam search.
- Alternative covering methods differ in the strategy with which they explore the space of rule preconditions.
- Learning rule sets.
First-order rules:
- Learning single first-order rules.
- Representation: first-order Horn clauses.
- Extending Sequential-Covering and Learn-One-Rule with variables in rule preconditions: first-order rule sets are learned by extending the sequential covering algorithm of CN2 from propositional to first-order representations.

53 Summary
FOIL: learning first-order rule sets, including simple recursive rule sets.
- Idea: induce logical rules from observed relations.
- Guiding the search in FOIL.
- Learning recursive rule sets.
Induction as inverted deduction:
- Another approach to learning first-order rule sets: search for hypotheses by applying inverse operators of well-known deductive inference rules.
- Idea: induce logical rules by inverting deduction: O(B, D) = h such that (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).
- Inverse resolution is the inversion of the resolution operator, an inference rule widely used in automated theorem proving.

54 End-of-chapter exercises