
Machine Learning, Sebastiano Galazzo


1 Machine Learning: best practices and vulnerabilities
Sebastiano Galazzo, Microsoft MVP, A.I. Category

2

3 Sebastiano Galazzo Microsoft MVP
@galazzoseba

4 Best practices

5 The perceptron
In machine learning, the perceptron is a binary classifier: a function that can decide whether or not an input, represented by a vector of numbers, belongs to some specific class.

f(x) = χ(⟨w, x⟩ + b)

w is a vector of real-valued weights, the operator ⟨·,·⟩ is the scalar product, b is the 'bias' (a constant not tied to any input value), and χ(y) is the output function.
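As a sketch, the formula above can be written directly in Python; χ is taken as a step function and the example weights (a perceptron computing logical AND) are hypothetical, not from the slides:

```python
# A direct sketch of the perceptron formula, with chi as a step function
# and hypothetical example weights (a perceptron computing logical AND).

def perceptron(x, w, b):
    y = sum(wi * xi for wi, xi in zip(w, x)) + b   # <w, x> + b
    return 1 if y >= 0 else 0                      # chi: step output

w, b = [1.0, 1.0], -1.5
print(perceptron([1, 1], w, b))  # 1
print(perceptron([0, 1], w, b))  # 0
```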

6 Main evolutions
Easy way: Logistic Regression, Support Vector Machines
Pros: easy and fast to use
Cons: lower accuracy (compared to neural networks)
Hard way: Neural Networks
Pros: if you reach convergence you gain very high accuracy (state of the art)
Cons: very difficult to model; a lot of experience is required

7 Easy way
Pseudo-equation: x·α + y·β + c·δ + … + z·ω → (0, 1)
#logisticregression #svm
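A minimal sketch of that pseudo-equation as a logistic regression: a weighted sum of the inputs squashed by a sigmoid into the interval (0, 1). The weights below are hypothetical:

```python
import math

# Sketch of the "easy way": a weighted sum of inputs mapped by a
# sigmoid into (0, 1). Weights and bias here are hypothetical.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias):
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(z)   # a probability strictly between 0 and 1

p = predict([0.3, 0.25], [2.0, -1.0], 0.1)
print(0.0 < p < 1.0)  # True
```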

8 Hard way #neuralnetwork

9 Advanced modelling of Neural Networks
Use case: predict a customer's willingness to vote for a political party.

Age | Gender            | Income | City          | Political party
30  | Male              | 38,000 | New York      | Democrat
39  | Female            | 42,000 | Page          | Republican
24  | Other             | 39,000 | San Francisco |
51  | Prefer not to say | 71,000 | Seattle       |

10 Advanced modelling of Neural Networks
Naive encoding: one input node per category of every field.
Age buckets: [0,17] [18,24] [25,35] [36,45] [46,60] [>60]
Gender: [male] [female] …
City: [urban] [rural] [suburban] …
Party: … [Democrat] [Republican]
> 20 parameters

11 Advanced modelling of Neural Networks
Compact encoding:
Age: divided by 100, so the range [0,100] maps into [0,1]
Gender: divided by 4 → 0 = Male, 0.25 = Female, 0.5 = Other, 0.75 = Prefer not to say, 1 = Unknown
Example row: [0.25][<20,000][ ][ ][ ]…
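A sketch of this compact encoding; the gender mapping values come from the slide, while the function and dictionary names are mine:

```python
# Sketch of the compact encoding described above. The 0.25-step gender
# values are from the slide; the names here are illustrative.

GENDER = {"Male": 0.0, "Female": 0.25, "Other": 0.5,
          "Prefer not to say": 0.75, "Unknown": 1.0}

def encode(age, gender):
    # age scaled from [0, 100] into [0, 1]; gender in steps of 0.25
    return [age / 100.0, GENDER[gender]]

print(encode(30, "Male"))    # [0.3, 0.0]
print(encode(39, "Female"))  # [0.39, 0.25]
```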

12 Advanced modelling of Neural Networks
Method: 1-of-(C-1) effects-coding
Standard deviation: σ = √( (1/N) Σᵢ₌₁ᴺ (xᵢ − μ)² ), where μ is the average of all values

13 Advanced modelling of Neural Networks
Example on the Age column:
mean age μ = ( … ) / 4 = 40.0
σ = √( ((x₁−40)² + (x₂−40)² + (x₃−40)² + (x₄−40)²) / 4 ) = 8.12

14 Advanced modelling of Neural Networks
V′ = (V − mean) / (std dev)
The V′ value will be used as input in place of the original value.
Given that the age average is 40.0, the standard deviation is 8.12, and our current value is 30.0:
V′ = (30 − 40) / 8.12 = −1.23
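The same standardization step, sketched in Python with the slide's example numbers (mean 40.0, std dev 8.12, value 30.0):

```python
import math

# Sketch of the standardization step V' = (V - mean) / std_dev,
# using the population standard deviation formula from the slides.

def std_dev(values):
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def standardize(v, mean, sd):
    return (v - mean) / sd

print(round(standardize(30.0, 40.0, 8.12), 2))  # -1.23
```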

15 Advanced modelling of Neural Networks
One of the parameters: Italian cities (about 8,000): Milano, Torino, Roma, …, Catania
Binary compression: 2¹³ = 8192

City    | Value
Milano  | 0,0,0,0,0,0,0,0,0,0,0,0,0,0
Torino  | 0,0,0,0,1,1,0,0,0,0,0,1,0,0
Catania | 0,1,0,0,1,0,0,0,0,1,0,1,1,0

With 13 nodes we can map 8192 values (having the same meaning/context).
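A sketch of the binary compression: 13 binary nodes can represent 2¹³ = 8192 distinct cities. The index assigned to each city here is hypothetical:

```python
# Sketch of binary compression: encode a city's index as 13 bits.
# The city-to-index assignment is hypothetical.

def to_bits(index, n_bits=13):
    # most-significant bit first
    return [(index >> i) & 1 for i in reversed(range(n_bits))]

print(2 ** 13)        # 8192
print(to_bits(0))     # e.g. Milano -> all zeros
print(to_bits(8191))  # the last representable city -> all ones
```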

16 Advanced modelling of Neural Networks
Full encoded row: [−1.23][0.25]…
The model has a 1:1 mapping between concepts and the number of neurons. Only 5 parameters! It could even be handled without a neural network, by an IF/THEN sequence in the code.

17 Advanced modelling of Neural Networks
Data must be manipulated and made understandable by the machine, not by humans!

18 Vulnerabilities

19 Vulnerabilities
Let’s imagine that we run an auction website like eBay. On our website, we want to prevent people from selling prohibited items. Enforcing these kinds of rules is hard if you have millions of users. We could hire hundreds of people to review every auction listing by hand, but that would be expensive.

20 Vulnerabilities Instead, we can use deep learning to automatically check auction photos for prohibited items and flag the ones that violate the rules. This is a typical image classification problem.

21 Vulnerabilities – Image Classification
We repeat this thousands of times with thousands of photos until the model reliably produces the correct results with an acceptable accuracy.

22 Vulnerabilities - Convolutional neural networks
Convolutional neural networks are powerful models that consider the entire image when classifying it. They can recognize complex shapes and patterns no matter where they appear in the image. In many image recognition tasks, they can equal or even beat human performance.

23 Vulnerabilities - Convolutional neural networks
With a fancy model like that, changing a few pixels in the image to be darker or lighter shouldn’t have a big effect on the final prediction, right? Sure, it might change the final likelihood slightly, but it shouldn’t flip an image from “prohibited” to “allowed”. Those are the “expectations”.

24 Vulnerabilities - Convolutional neural networks
It was discovered that this isn’t always true.

25 Vulnerabilities - Convolutional neural networks
If you know exactly which pixels to change and exactly how much to change them, you can intentionally force the neural network to predict the wrong output for a given picture without changing the appearance of the picture very much. That means we can intentionally craft a picture that is clearly a prohibited item but which completely fools our neural network.

26 Vulnerabilities - Convolutional neural networks
Why is this?

27 Vulnerabilities - Convolutional neural networks
A machine learning classifier works by finding a dividing line between the things it’s trying to tell apart. Here’s how that looks on a graph for a simple two-dimensional classifier that’s learned to separate green points (acceptable) from red points (prohibited) Right now, the classifier works with 100% accuracy. It’s found a line that perfectly separates all the green points from the red points.

28 Vulnerabilities - Convolutional neural networks
But what if we want to trick it into mis-classifying one of the red points as a green point? What’s the minimum amount we could move a red point to push it into green territory? If we add a small amount to the Y value of a red point right beside the boundary, we can just barely push it over into green territory.
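The nudge can be sketched in a few lines, assuming a hypothetical linear boundary x + y = 1 separating green (above) from red (below):

```python
# Tiny sketch of pushing a point across a linear decision boundary.
# The boundary x + y = 1 and the point are hypothetical.

def classify(x, y, w=(1.0, 1.0), b=-1.0):
    return "green" if w[0] * x + w[1] * y + b > 0 else "red"

x, y = 0.4, 0.55                 # a red point right beside the boundary
print(classify(x, y))            # red
print(classify(x, y + 0.06))     # green: a tiny nudge to Y flips it
```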

29 Vulnerabilities - Convolutional neural networks
In image classification with deep neural networks, each “point” we are classifying is an entire image made up of thousands of pixels. That gives us thousands of possible values that we can tweak to push the point over the decision line. If we make sure that we tweak the pixels in the image in a way that isn’t too obvious to a human, we can fool the classifier without making the image look manipulated. Global AI Nights - London 2019

30 Vulnerabilities - Convolutional neural networks
[photo of people] + [crafted perturbation] = classified as “Squirrel”

31 Perturbation of math model


39 Vulnerabilities – The steps
1. Feed in the photo that we want to hack.
2. Check the neural network’s prediction and see how far off the image is from the answer we want to get for this photo.
3. Tweak our photo using back-propagation to make the final prediction slightly closer to the answer we want to get.
4. Repeat steps 1–3 a few thousand times with the same photo until the network gives us the answer we want.
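The steps above can be sketched on a toy model. This is not the Keras script from the deck: the "network" is a linear model with a sigmoid output, the weights are hypothetical, and a sign-of-the-gradient step (the FGSM idea) stands in for full back-propagation:

```python
import numpy as np

# Toy sketch of the attack loop: nudge the input until the prediction
# flips. Model, weights and step size are all hypothetical.

rng = np.random.default_rng(0)
w, b = rng.normal(size=100), 0.0       # toy model weights
x = rng.normal(size=100)               # the "photo" we want to hack

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# The answer we want is the opposite of the current prediction:
target_green = predict(x) <= 0.5
for _ in range(1000):
    if (predict(x) > 0.5) == target_green:
        break                          # the network now gives our answer
    # for a linear model the gradient sign w.r.t. the input is sign(w);
    # move every "pixel" a tiny step in that direction
    x += 0.01 * np.sign(w) if target_green else -0.01 * np.sign(w)

# x is now a slightly perturbed input classified as the target class
```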

40 Vulnerabilities – Snippet of a Python script using Keras

41 How can we protect ourselves against these attacks?
Simply create lots of hacked images and include them in your training data set going forward; that seems to make your neural network more resistant to these attacks. This is called Adversarial Training and is probably the most reasonable defense to consider adopting right now.
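The adversarial-training idea can be sketched on toy data; everything below (data, labels, and the random-sign perturbation standing in for gradient-crafted noise) is illustrative, not from the deck:

```python
import numpy as np

# Sketch of adversarial training: add slightly perturbed copies of the
# training examples, keeping their ORIGINAL correct labels.

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))             # toy training inputs
y = (X.sum(axis=1) > 0).astype(float)      # toy labels

eps = 0.05
# stand-in for gradient-crafted noise: small random sign perturbations
X_adv = X + eps * np.sign(rng.normal(size=X.shape))

X_train = np.vstack([X, X_adv])            # originals + hacked copies
y_train = np.concatenate([y, y])           # labels stay the truth

print(X_train.shape)  # (400, 10)
```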

42 How can we protect ourselves against these attacks?
Pretty much every other idea researchers have tried so far has failed to be helpful in preventing these attacks.

43

44 Thanks! Sebastiano Galazzo Microsoft MVP @galazzoseba

