
Towards CI Foundations
Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
Google: W. Duch
WCCI'08 Panel Discussion

Questions
Nature of CI
Current state of CI
Promoting CI
CI and Smart Adaptive Systems
CI and Nature-inspiration
Future of CI

CI definition
Computational Intelligence: An International Journal (1984) + 10 other journals with "Computational Intelligence" in the title; D. Poole, A. Mackworth & R. Goebel, Computational Intelligence: A Logical Approach (OUP 1998) is a GOFAI book devoted to logic and reasoning.
CI should: be problem-oriented, not method-oriented; cover all that the CI community is doing now and is likely to do in the future; include AI – they also think they are CI...
CI: the science of solving (effectively) non-algorithmizable problems. A problem-oriented definition, firmly anchored in computer science and engineering.
AI: focused on problems requiring higher-level cognition; the rest of CI is more focused on problems related to perception, action and control.

Are we really so good? Surprise! Almost nothing can be learned using current CI tools! Examples: complex logic, natural language, natural perception.

How much can we learn? Linearly separable or almost separable problems are relatively simple: deform the data or add dimensions to make it separable. But how should "slightly non-separable" be defined? There is only separable and the vast realm of the rest.
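A minimal sketch in Python of the "add dimensions" idea (the product feature x1*x2 is one standard choice, assumed here for illustration): XOR is non-separable in 2D but becomes linearly separable in 3D.

```python
import numpy as np

# XOR is not linearly separable in 2D, but appending the product feature x1*x2
# embeds it in 3D, where the projection w = (1, 1, -2) reproduces the labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
X3 = np.c_[X, X[:, 0] * X[:, 1]]     # append the extra dimension
w = np.array([1.0, 1.0, -2.0])
print(X3 @ w, y)                     # projections [0, 1, 1, 0] match the labels
```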

Boolean functions
n=2: 16 functions, 12 separable, 4 not separable.
n=3: 256 functions, 104 separable (41%), 152 not separable.
n=4: 2^16 = 65536 functions, only 1880 separable (3%).
n=5: 2^32 ≈ 4G functions, with << 1% separable... bad news!
Existing methods may learn some non-separable functions, but most functions cannot be learned!
Example: the n-bit parity problem, the subject of many papers in top journals. No off-the-shelf systems are able to solve such problems; on parity problems SVM may even go below the base rate! Such problems are solved only by special neural architectures or special classifiers, and only if the type of function is known in advance. But parity is still trivial: it is solved by a single projection onto W = (1,...,1), which counts the number of 1 bits.
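To make the parity claim concrete, a short Python sketch (assuming the counting projection W = (1,...,1) described above) verifying that 9-bit parity splits into n+1 pure clusters along one projection:

```python
import itertools
import numpy as np

# n-bit parity: a single projection counting the 1 bits splits the data into
# n+1 clusters of alternating class, so an interval code solves it exactly.
n = 9
X = np.array(list(itertools.product([0, 1], repeat=n)))
y = X.sum(axis=1) % 2                      # parity labels
p = X @ np.ones(n, dtype=int)              # projection = number of 1 bits
assert all(len(set(y[p == v])) == 1 for v in np.unique(p))  # pure clusters
print({int(v): int(y[p == v][0]) for v in np.unique(p)})    # cluster -> class
```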

kD case
3-bit functions: X = [b1, b2, b3], from [0,0,0] to [1,1,1].
f(b1,b2,b3) and ¬f(b1,b2,b3) are symmetric (color change), so consider the 8 cube vertices and the 2^8 = 256 Boolean functions defined on them. With 0 to 8 red vertices there are 1, 8, 28, 56, 70, 56, 28, 8, 1 functions, respectively.
For an arbitrary direction W, the projection y = W·X gives:
k=1 in 2 cases: all 8 vectors in 1 cluster (all black or all white);
k=2 in 14 cases: 8 vectors in 2 clusters (linearly separable);
k=3 in 42 cases: clusters B R B or W R W;
k=4 in 70 cases: clusters R W R W or W R W R;
symmetrically, k=5 to 8 in 70, 42, 14 and 2 cases.
Most logical functions have 4- or 5-separable projections. Learning = find the best projection for each function. The number of k=1, 2, 3, 4-separable functions is 2, 102, 126 and 26, so about 90% of all functions may be learned using 3-separability.
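These counts can be checked numerically. The Python sketch below uses a randomized search over projection directions (so the counts are estimates that stabilize with enough trials) and enumerates all 256 3-bit Boolean functions, finding the smallest number of alternating clusters along any sampled projection:

```python
import itertools
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
verts = np.array(list(itertools.product([0, 1], repeat=3)))  # 8 cube vertices

def min_k(labels, trials=2000):
    """Smallest number of alternating clusters along any sampled direction."""
    best = len(labels)
    for _ in range(trials):
        w = rng.normal(size=3)                  # random direction; ties have measure zero
        ordered = labels[np.argsort(verts @ w)]
        best = min(best, 1 + int((ordered[1:] != ordered[:-1]).sum()))
    return best

counts = Counter()
for bits in itertools.product([0, 1], repeat=8):   # all 256 Boolean functions
    counts[min_k(np.array(bits))] += 1
print(sorted(counts.items()))  # expected: [(1, 2), (2, 102), (3, 126), (4, 26)]
```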

RBF for XOR
Is an RBF solution with 2 hidden Gaussian nodes possible? Typical architecture: 2 inputs – 2 Gaussians – 1 linear output; ML training gives 50% errors, yet there is perfect separation in the hidden space, just not a linear separation! The network knows the answer but cannot say it... A single Gaussian output node may solve the problem. The output weights provide reference hyperplanes (red and green lines), not separating hyperplanes as in the case of MLP.
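A minimal Python sketch of the hidden-space geometry (the centers at (0,0) and (1,1) and the unit dispersion are assumptions made for illustration, not taken from the slide's trained network):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
centers = np.array([[0, 0], [1, 1]], dtype=float)

def gauss(P, C, s=1.0):
    """Gaussian activations of points P for centers C with dispersion s."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

H = gauss(X, centers)
print(np.round(H, 3), y)  # the two class-1 inputs collapse onto one hidden point
```

Both XOR-positive inputs map to the same hidden point, so the clusters are perfectly distinct by inspection even when a trained linear output fails to express the separation.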

3-bit parity in 2D and 3D
The output is mixed and errors stay at the base level of 50%, but in the hidden space... Conclusion: separability in the hidden space is perhaps too much to ask for; inspection of the clusters is sufficient for perfect classification. Options: add a second Gaussian layer to capture this activity, or train a second RBF on the data (stacking), reducing the number of clusters.
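The second-layer idea can be sketched in Python (the first-layer centers at [0,0,0] and [1,1,1] and the dispersions are illustrative assumptions):

```python
import itertools
import numpy as np

X = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
y = (X.sum(axis=1) % 2).astype(int)              # 3-bit parity

def gauss(P, C, s=1.0):
    """Gaussian activations of points P for centers C with dispersion s."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

H = gauss(X, np.array([[0., 0., 0.], [1., 1., 1.]]))  # first layer: 2 Gaussians
# In hidden space the 8 inputs collapse to 4 points (one per bit count) of
# alternating class; center a narrower second Gaussian layer on the class-1 clusters:
C2 = np.unique(H[y == 1], axis=0)
Z = gauss(H, C2, s=0.2).sum(axis=1)              # responds only near class-1 clusters
print((Z > 0.5).astype(int), y)                  # predictions match parity
```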

Spying on networks
After the initial transformation, what still needs to be done? Conclusion: separability in the hidden space is perhaps too much to ask for; use rules, similarity or linear separation, depending on the case.

Parity n=9
Simple gradient learning of a single projection; the quality index is shown in a figure (not reproduced in this transcript).
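As an illustration, here is one simple projection quality index in Python (a Parzen-window purity measure invented for this sketch, not necessarily the index plotted in the slide's figure): it rewards same-class points that project close together and penalizes close projections of different classes, so the parity-solving direction W = (1,...,1) scores far above a random one:

```python
import itertools
import numpy as np

def quality(w, X, y, s=0.5):
    """Parzen-style purity of the projection p = X.w (higher is better)."""
    p = X @ w
    g = np.exp(-np.subtract.outer(p, p) ** 2 / (2 * s * s))  # pairwise closeness
    same = np.equal.outer(y, y)
    return (g[same].sum() - g[~same].sum()) / y.size ** 2

n = 9
X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
y = X.sum(axis=1) % 2
rng = np.random.default_rng(0)
print(quality(np.ones(n), X, y), quality(rng.normal(size=n), X, y))
```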

More meta-learning
Meta-learning: learning how to learn, replacing the experts who search for the best models by running many experiments. The search space of models is too large to explore exhaustively, so the system architecture is designed to support knowledge-based search:
abstract view, uniform I/O, uniform results management;
directed acyclic graphs (DAGs) of boxes representing scheme placeholders and particular models, interconnected through I/O;
a configuration level for meta-schemes, expanded at the runtime level.
An exercise in software engineering for data mining!
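A minimal sketch of the placeholder-expansion idea in Python (the Box class, the candidate lists and the generator field are illustrative inventions, not Intemi's actual API):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Box:
    """One node of the scheme DAG: a concrete model or a placeholder."""
    name: str
    inputs: List[str] = field(default_factory=list)
    generator: Optional[Callable[[], List[str]]] = None  # placeholder expander

scheme = {
    "data":      Box("data"),
    "transform": Box("transform", ["data"], lambda: ["PCA", "kernel features"]),
    "model":     Box("model", ["transform"], lambda: ["SVM", "kNN", "decision tree"]),
}

# Runtime expansion: every placeholder spawns one concrete machine per candidate.
for name, box in scheme.items():
    if box.generator is not None:
        print(name, "<-", box.inputs, "->", box.generator())
```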

Intemi, Intelligent Miner
Meta-schemes: templates with placeholders. They may be nested, with the role of each decided by its input/output types. Machine-learning generators are based on meta-schemes, and the granulation level allows novel methods to be created.
Complexity control: length + log(time).
A unified description of meta-parameters...
Intemi, the intelligent miner, coming "soon".
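The complexity measure can be read as a ranking rule; a hedged Python sketch (the lengths and times are made-up illustrative numbers):

```python
import math

def complexity(length: float, train_seconds: float) -> float:
    """Slide's index: model description length + log(training time)."""
    return length + math.log(train_seconds)

candidates = {"kNN": (50, 0.2), "SVM": (300, 4.0), "MLP": (800, 30.0)}
for name in sorted(candidates, key=lambda m: complexity(*candidates[m])):
    print(name, round(complexity(*candidates[name]), 2))  # simplest tried first
```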

Biological justification
Cortical columns may learn to respond to stimuli with complex logic, resonating in different ways. A second column will learn without problems that such different reactions have the same meaning: the inputs x_i and the training targets y_j are the same => Hebbian learning ΔW_ij ~ x_i y_j => identical weights.
Effect: the same projection line y = W·X, but inhibition turns off one perceptron when the other is active.
Simplest solution: oscillators based on a combination of two neurons, σ(W·X − b) − σ(W·X − b′), give localized projections! We have used them in the MLP2LN architecture for extraction of logical rules from data.
Note: k-separability learning is not a multistep output neuron; the targets are not known, and vectors of the same class may appear in different intervals! We need to learn how to find the intervals and how to assign them to classes; new algorithms are needed for this!
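A Python sketch of the two-neuron window (the slope and interval values are illustrative):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def window(y, b, b_prime, slope=10.0):
    """sigma(slope*(y - b)) - sigma(slope*(y - b')): ~1 inside (b, b'), ~0 outside."""
    return sigma(slope * (y - b)) - sigma(slope * (y - b_prime))

y = np.linspace(-1.0, 3.0, 9)
print(np.round(window(y, 0.5, 1.5), 2))  # localized response along the projection
```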