1
Fifth Workshop on Link Analysis, Counterterrorism, and Security
Antonio Badia, David Skillicorn
2
Open Problems
An individualized list (with some feedback from workshop participants)
3
Process improvements:
- Better overall processes. Defence in depth is the key to lower error rates: what is genuinely good/normal should look normal from every direction (a small sketch follows this list).
- Handling multiple kinds of data at once (attributed together with relational). We know very few algorithms that exploit more than one type of data within the same algorithm.
- Using graph analysis techniques more widely. Although there are good reasons to expect a graph approach to be more robust than a direct approach, this is hardly ever done, for understandable reasons: it is harder and messier.
- Better ways to exploit the fact that normality implies internal consistency. This only makes sense in an adversarial setting, so it has received little attention, but it is a good, basic technique.
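A minimal sketch of the defence-in-depth idea, using invented data: each record is scored from two independent directions, a simple attribute view and a simple relational view, and a record is cleared only if it looks normal in both. The z-score detectors, the 3-sigma thresholds, and the data are assumptions made purely for illustration.

```python
# A sketch of defence in depth with invented data: each record is scored from
# two independent views, and it is cleared only if it looks normal in BOTH.
import numpy as np

rng = np.random.default_rng(0)
n = 200
amounts = rng.normal(100.0, 15.0, n)        # attribute view: transaction size
partners = rng.poisson(5, n).astype(float)  # relational view: distinct contacts

# a record engineered to look normal in the attribute view only
amounts[0], partners[0] = 102.0, 40.0

def zscore(x):
    return np.abs((x - x.mean()) / x.std())

attr_score = zscore(amounts)
rel_score = zscore(partners)

# a single-view screen misses record 0 ...
print("attribute view flags:", np.where(attr_score > 3.0)[0])
# ... but requiring normality from every direction catches it
combined = np.maximum(attr_score, rel_score)
print("defence-in-depth flags:", np.where(combined > 3.0)[0])
```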
4
- Legal and social frameworks for preemptive data analysis. The arguments for widespread data collection, and ways to mitigate its downsides, need to be developed further and explained by the knowledge-discovery community to those who have legitimate concerns about the cost/benefit tradeoff.
- Challenges of open virtual worlds. New virtual worlds, such as the Multiverse, make it much harder to gather data by any kind of surveillance; the consequences need to be understood.
- Focus on emergent properties rather than collected ones. Attributes derived from the collective properties of many individual records are much more resistant to manipulation than those collected directly in individual records (see the sketch after this list).
- Collaboration with linguists, sociologists, anthropologists, etc. Applying technology well depends on a deeper understanding of context, and computing people do not necessarily do this well.
- Better use of visualization, especially multiple views.
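A toy illustration of the collected-versus-emergent distinction: a self-reported tie count is trivial for one person to falsify, whereas an in-degree computed from everyone else's call records cannot be faked without manipulating many records the subject does not control. The names, records, and fields below are invented.

```python
# Collected vs. emergent attributes on a toy call dataset.
from collections import Counter

calls = [                       # (caller, callee) records collected system-wide
    ("ann", "bob"), ("ann", "cho"), ("bob", "cho"),
    ("dee", "cho"), ("dee", "bob"), ("eve", "cho"),
]
declared_ties = {"ann": 1, "bob": 2, "cho": 1, "dee": 2, "eve": 0}  # collected, self-reported

in_degree = Counter(callee for _, callee in calls)                  # emergent, derived

for person in declared_ties:
    print(f"{person}: declared {declared_ties[person]}, "
          f"observed in-degree {in_degree.get(person, 0)}")
```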
5
“Easy” technical advances:
- Hardening standard techniques against manipulation (by insiders and outsiders). Most existing algorithms are seriously vulnerable to manipulation by, for example, adding a few particular data records (a small demonstration follows this list).
- Distinguishing the bad from the unusual. It is straightforward to identify the normal in a dataset, but once those records have been removed it still remains to separate the bad from the merely unusual; little has been done to attack this problem.
- Getting graph techniques to work as well as they should. Although graph algorithms have known theoretical advantages, it has been surprisingly difficult to turn these into practical advantages.
- Strong but transparent predictors. We know predictors that are strong, and predictors that are transparent (they explain their predictions), but we do not know any that are both at once.
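The short demonstration below shows the kind of fragility the first item refers to: a plain z-score outlier detector stops flagging a target record after an adversary inserts a handful of extreme records that inflate the variance. The data, the target value, and the 3-sigma rule are illustrative assumptions, not a claim about any particular deployed system.

```python
# How a few inserted records can defeat a standard z-score outlier detector.
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, 500)
target = 4.0                                   # the record the adversary wants hidden

def is_outlier(x, data, k=3.0):
    return abs(x - data.mean()) / data.std() > k

print("before poisoning:", is_outlier(target, clean))          # True

poisoned = np.concatenate([clean, np.full(5, 12.0)])           # 5 extreme inserts
print("after poisoning: ", is_outlier(target, poisoned))       # False: variance inflated
```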
6
- Detecting when models need to be updated because the setting has changed. In adversarial settings there is a constant arms race, and so a greater need to update models regularly; automatic ways to know when to do this are not really known (a toy sketch follows this list).
- Clustering to find ‘fringe’ records. In adversarial settings, the records of interest are likely to be close to the normal data rather than outliers; techniques for detecting such fringe clusters are needed.
- Better 1-class prediction techniques. In many settings only normal data is available, and existing 1-class prediction is unusably fragile.
- Temporal change detection (trend/concept drift in every analysis). One way to detect manipulation is to see change for which there seems to be no explanation; detecting this would be useful.
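A toy sketch of automatic update detection on a simulated error stream: the error rate of a deployed model is tracked over a sliding window and compared with the rate observed at training time, and an alert is raised when the gap exceeds a tolerance. The window size, tolerance, and simulated stream are invented; real concept-drift detectors are considerably more sophisticated.

```python
# A toy drift monitor: compare the sliding-window error rate with the
# training-time error rate and raise a retraining alert on a large gap.
import random

random.seed(2)
baseline_error = 0.05          # error rate measured when the model was built
window, tolerance = 200, 0.05  # both chosen arbitrarily for the illustration

# simulate a stream whose true error rate drifts upward halfway through
stream = [random.random() < (0.05 if i < 1000 else 0.12) for i in range(2000)]

recent = []
for i, wrong in enumerate(stream):
    recent.append(wrong)
    if len(recent) > window:
        recent.pop(0)
    if len(recent) == window:
        rate = sum(recent) / window
        if rate - baseline_error > tolerance:
            print(f"possible drift at record {i}: window error {rate:.2f} "
                  f"vs baseline {baseline_error:.2f} -> refresh the model")
            break
```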
7
- Keyless fusion algorithms, and an understanding of the limits of fusion. Most fusion uses key attributes that are thought of as describing identity, but, anecdotally, almost any set of attributes can play this role, and we need to understand the theory and the limits (a small sketch follows this list).
- Better symbiotic knowledge discovery, with humans and algorithms coupled together. Many analysis systems have a loop between analyst and knowledge-discovery tools, but there seem to be interesting ways to make this loop more productive.
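A minimal sketch of keyless fusion between two hypothetical sources that share no identity key: candidate matches are proposed wherever two records agree on enough ordinary attributes, illustrating the point that almost any attribute combination can act as a quasi-identifier. The records, fields, and agreement threshold are made up.

```python
# Keyless fusion sketch: link records across sources by attribute agreement.
phone_records = [
    {"city": "leeds", "age_band": "30-39", "language": "es", "device": "nokia"},
    {"city": "leeds", "age_band": "20-29", "language": "en", "device": "nokia"},
]
travel_records = [
    {"city": "leeds", "age_band": "30-39", "language": "es", "device": "nokia"},
    {"city": "york",  "age_band": "30-39", "language": "en", "device": "moto"},
]

def agreement(a, b):
    """Fraction of shared attributes on which two records agree."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

for i, p in enumerate(phone_records):
    for j, t in enumerate(travel_records):
        score = agreement(p, t)
        if score >= 0.75:          # enough agreement to act as a quasi-identifier
            print(f"phone record {i} likely matches travel record {j} ({score:.2f})")
```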
8
Difficult technical advances:
- Finding larger structures in text. Very little structure above the level of named entities is extracted at present, but there are opportunities to extract larger structures, both to check for normality and to understand content better.
- Authorship detection from small samples. The web has become a place where authors are plentiful, and it would be useful to detect that the same person has written in this blog and that forum (a toy sketch follows this list).
- Unusual region detection in graphs. Most graph algorithms focus either on clustering or on exploring the region around a single node; it is also interesting to find regions that are somehow anomalous.
- Performance improvements to allow scaling to very large datasets. Changes of three orders of magnitude in quantity require changes in the qualitative properties of algorithms; scalability issues need more attention.
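A toy sketch of authorship matching on short samples: each snippet is reduced to a character 3-gram frequency profile, and profiles are compared by cosine similarity, so stylistically similar texts score higher than unrelated ones. The snippets and the comparison are invented; real attribution from small samples needs far more care than this.

```python
# Character n-gram profiles compared with cosine similarity, as a crude
# stand-in for authorship matching on short texts.
from collections import Counter
from math import sqrt

def profile(text, n=3):
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    return dot / (sqrt(sum(v * v for v in p.values())) *
                  sqrt(sum(v * v for v in q.values())))

blog_post = "I reckon the whole affair was, frankly, rather more subtle than reported."
forum_post = "Frankly, I reckon the reporting was rather less subtle than the affair."
unrelated = "BUY NOW!!! best prices on replica watches, free shipping worldwide."

print("blog vs forum:    ", round(cosine(profile(blog_post), profile(forum_post)), 2))
print("blog vs unrelated:", round(cosine(profile(blog_post), profile(unrelated)), 2))
```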
9
- Better use of second-order algorithms. Approaches in which an algorithm is run repeatedly under different conditions, and it is the change from one run to the next that is significant, have potential but are hardly ever used (a minimal sketch follows this list).
- Systemic functional linguistics for content/mental-state extraction from text. SFL takes into account the personal and social dimensions of language, and brings together texts that look very different on the surface; this will have payoffs in several dimensions of text exploitation.
- Adversarial parsing (cf. error correction in compilers). When text has been altered for concealment, compiler techniques may help to spot where the changes have occurred and what they might have been.
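A minimal sketch of a second-order analysis on synthetic data: a one-component PCA is recomputed with each record left out in turn, and what is examined is not any single model but how much the fitted direction changes between runs; records whose removal moves the principal direction furthest exert unusual influence. The data and the influence measure are illustrative assumptions, not the specific second-order methods discussed at the workshop.

```python
# Second-order sketch: rerun a one-component PCA with each record removed and
# treat the CHANGE in the fitted direction as the signal of interest.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 5))
data[0] = [6, -6, 6, -6, 6]          # one record engineered to be influential

def first_component(x):
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[0]                      # unit vector of the leading direction

baseline = first_component(data)
influence = np.array([
    1.0 - abs(np.dot(baseline, first_component(np.delete(data, i, axis=0))))
    for i in range(len(data))
])
print("records whose removal changes the model most:",
      np.argsort(influence)[-3:][::-1])
```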