Presentation on theme: "Algorithms - A technical perspective Are they really a black box?"— Presentation transcript:

1 Algorithms - A technical perspective: Are they really a black box?
Ansgar Koene, Algorithms Workshop, 15 February 2017

2 Algorithms in the news

3 Origins of the ‘black box’
Technical issues: fundamental and practical.
Business/management interests, e.g. trade secrets.
How a decision was reached can in principle be revealed, if sufficient data about the state of the system at the time of operation is available (and we have access to the code).
Why a particular chain of operations was performed is much more difficult to establish (especially with Machine Learning).

4 Fundamental properties
[Diagram: Machine Learning (ML) vs. hand coded; output formula O1 = f(w1, H1, w2, H2, w3, H3)]
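The slide does not spell out the form of f; below is a minimal sketch that assumes the common reading in which an output unit applies some function f to a weighted combination of the H values (the sigmoid here is an arbitrary, illustrative choice, not taken from the slides):

```python
import math

def f(z):
    # Assumed activation: logistic sigmoid (the slide does not specify f).
    return 1.0 / (1.0 + math.exp(-z))

def output(weights, hidden):
    # Reading O1 = f(w1, H1, w2, H2, w3, H3) as one output unit combining
    # values H1..H3 with weights w1..w3 (an assumption).
    return f(sum(w * h for w, h in zip(weights, hidden)))

# In a hand-coded system the weights are chosen by an engineer;
# in a machine-learned system they are fitted to training data.
print(output([0.4, -0.2, 0.7], [1.0, 0.5, 0.3]))
```

Either way the forward computation is the same; the difference lies in where the weights come from, which is the point of the next slide.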

5 Fundamental transparency: how vs. why
Machine Learning: If parameters & data are known, we can trace how the output is computed. If the history of the data is known, we can in principle trace how the parameters were set. Explaining why certain parameters are optimal can be very difficult. => Explaining why the output is produced is difficult.
Hand Coded: If parameters & data are known, we can trace how the output is computed. We know the parameters were set by engineers, so we can ask the engineers why certain parameters were chosen. => Explaining why the output is produced depends on the engineers.
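To make the how/why split concrete, here is a small sketch using scikit-learn's LogisticRegression as a stand-in learned model (the slides name no particular algorithm; the data is invented). The forward computation can be traced exactly, while the fitted coefficients only admit a "why" through the training data and optimisation target:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.5]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# HOW: with the fitted parameters and an input, the computation is fully traceable.
x_new = np.array([1.5, 0.5])
score = model.intercept_[0] + model.coef_[0] @ x_new
prob = 1.0 / (1.0 + np.exp(-score))           # matches model.predict_proba for class 1
print(score, prob, model.predict_proba([x_new])[0, 1])

# WHY: explaining why coef_ took these values requires the training data,
# the optimisation target and its history, not just the code above.
print(model.coef_, model.intercept_)
```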

6 High dimensionality, a.k.a. when an explanation is not transparent
High dimensionality of Big Data algorithms can make interpretation of the 'explanation' problematic, e.g. Google's page-ranking algorithm is estimated to involve 200+ parameters.
Approximate transparency through dimensionality reduction, e.g. Principal Component Analysis (PCA): this requires case-by-case analysis depending on the input data, and a 'general' solution is only valid for the 'majority case' conditions.
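A minimal sketch of the dimensionality-reduction idea, using scikit-learn's PCA on synthetic data; the 200 input dimensions merely echo the slide's Google example, and all the numbers are invented:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))               # 200 'parameters', as in the Google example
X[:, :3] += 5 * rng.normal(size=(500, 3))     # the first three inputs get much larger variance

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)          # share of the variation the 3 components capture

# Caveat from the slide: the reduced 'explanation' only describes the majority
# of cases; atypical inputs still need case-by-case analysis.
X_reduced = pca.transform(X)
print(X_reduced.shape)                        # (500, 3)
```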

7 Practical issues: non-static algorithms
Machine Learning: If Machine Learning algorithms use 'in situ' continuous or intermittent learning, the parameter settings change over time. Re-creating a past system behaviour requires knowledge of the past parameter states.
Hand Coded: Hand-coded systems are also frequently updated, especially if there is an 'arms race' between the service provider and users trying to 'game' the system (e.g. Google search vs. Search Engine Optimization).
In some cases, randomness might be built into an algorithm's design, meaning its outcomes can never be perfectly predicted.
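A sketch of the resulting audit problem, using an invented online update rule: unless past parameter states are snapshotted, the state that produced an earlier decision cannot be re-created from the current parameters alone.

```python
import copy

params = {"w": [0.0, 0.0], "b": 0.0}
history = []                                  # parameter snapshots over time

def predict(p, x):
    return p["w"][0] * x[0] + p["w"][1] * x[1] + p["b"]

def online_update(p, x, y, lr=0.1):
    # Hypothetical 'in situ' learning step: nudge the parameters toward the target.
    err = y - predict(p, x)
    p["w"] = [w + lr * err * xi for w, xi in zip(p["w"], x)]
    p["b"] += lr * err

stream = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
for x, y in stream:
    history.append(copy.deepcopy(params))     # keep the state used for this decision
    online_update(params, x, y)

# Re-creating the behaviour at step 1 requires history[1], not the current params.
print(predict(history[1], [1.0, 1.0]), predict(params, [1.0, 1.0]))
```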

8 The challenge of translating a task/problem into an algorithm
Define precisely what the task/problem is (logic).
Break that down into a precise set of instructions, factoring in any contingencies, such as how the algorithm should perform under different conditions (control).
"Explain it to something as stonily stupid as a computer" (Fuller 2008).
Many tasks and problems are extremely difficult or impossible to translate into algorithms and end up being hugely oversimplified. Mistranslating the problem and/or solution will lead to erroneous outcomes and random uncertainties.
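As a toy illustration of the logic and control steps, and of the oversimplification the slide warns about, consider a hypothetical "flag abusive comments" task; the keyword list and rules below are invented and deliberately crude:

```python
# A fuzzy task ("flag abusive comments") forced into precise instructions.
ABUSIVE_TERMS = {"idiot", "stupid"}           # crude proxy for "abusive" (logic)

def flag_comment(text: str) -> bool:
    words = {w.strip(".,!?").lower() for w in text.split()}
    if not words:                             # contingency: empty input (control)
        return False
    return bool(words & ABUSIVE_TERMS)

# Oversimplification in action: context, sarcasm and word variants are lost.
print(flag_comment("You absolute idiot!"))               # True
print(flag_comment("Calling people idiots is wrong."))   # False: 'idiots' not in the list
```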

9 System design in the real world

10 Algorithm creation
Algorithms are created through: trial and error, play, collaboration, discussion, and negotiation. They are teased into being: edited, revised, deleted and restarted, shared with others, passing through multiple iterations stretched out over time and space. They are always somewhat uncertain, provisional and messy, fragile accomplishments. Algorithmic systems are not standalone little boxes, but massive, networked ones with hundreds of hands reaching into them, tweaking and tuning, swapping out parts and experimenting with new arrangements.

11 Examining pseudo-code/source code
Deconstructing and tracing how an algorithm is constructed in code and mutates over time is not straightforward. Code often takes the form of a “Big Ball of Mud”: “[a] haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle”.

12 Reverse engineering
Reverse engineering is the process of articulating the specifications of a system through a rigorous examination, drawing on domain knowledge, observation, and deduction, to unearth a model of how that system works.
By examining what data is fed into an algorithm and what output is produced, it is possible to start to reverse engineer how the recipe of the algorithm is composed (how it weights and preferences some criteria) and what it does.
For example, researchers might search Google using the same terms on multiple computers in multiple jurisdictions to get a sense of how its PageRank algorithm is constructed and works in practice (Mahnke and Uprichard 2014); they might experiment with posting and interacting with posts on Facebook to try and determine how its EdgeRank algorithm positions and prioritises posts in user timelines (Bucher 2012); or they might use proxy servers and feed dummy user profiles into e-commerce systems to see how prices might vary across users and locales (Wall Street Journal, detailed in Diakopoulos 2013).
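A minimal sketch of this input/output probing: vary one input at a time against a baseline and observe how the output shifts, to infer which criteria a black-box scorer weights most heavily. The scoring function here is a made-up stand-in, not any real platform's algorithm:

```python
def black_box_rank(post):
    # Hidden scoring rule the auditor cannot see (hypothetical).
    return 3.0 * post["likes"] + 0.5 * post["comments"] - 2.0 * post["age_days"]

baseline = {"likes": 10, "comments": 4, "age_days": 2}

def probe(feature, delta=1):
    # Change one input by 'delta' and measure how the output moves.
    varied = dict(baseline, **{feature: baseline[feature] + delta})
    return black_box_rank(varied) - black_box_rank(baseline)

# The observed output differences suggest the relative weighting of each criterion.
for feature in baseline:
    print(feature, probe(feature))            # likes: +3.0, comments: +0.5, age_days: -2.0
```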

13 Conclusion
HOW: With access to the code, data and parameter settings, how the output was produced can be 'explained'. High dimensionality can make the 'explanation' difficult to understand. Dimensionality reduction can help to generate an approximate explanation that is understandable.
WHY: Can be (very) difficult to determine, especially if Machine Learning methods are used. An approximate explanation based on the manually set optimization targets can help.

14 UnBias project

