Document Clustering on Supercomputers. Presented by Yu (Cathy) Jiao, Ph.D., Applied Software Engineering Research Group, Computational Sciences and Engineering.

Presentation transcript:

Document Clustering on Supercomputers
Presented by Yu (Cathy) Jiao, Ph.D.
Applied Software Engineering Research Group
Computational Sciences and Engineering Division

2 Yu_Potok_Clustering_0611 National challenge
Data: binary, text, image, multimedia, sensors
Example text in figure: "One small step for man"
- Data are everywhere
- Sources are unreliable
- Data are difficult to merge
- Merging cannot be done manually

3 Key technologies
Intelligent agents
- Peer-to-peer communication
- Encapsulated messages
- Computation distribution
- Adaptive and collaborative behavior
- Fault tolerance
High-performance computing
- Red/White Oak clusters
- 135 Dell computers
- Largest cluster computer at ORNL
- 1.7 Tflops
- 270 GB memory
- 11.3 TB disk
[Figure: multiple Intelligent Agents and a Black Board]

4 What to do with this?
Sources: paper records, old disks, legacy databases
- What is in there?
- Are there any threats?
- What am I missing?

5 How ORNL can help
Steps: raw documents; organize the information; connect the information; take action
Questions: What do we have? Find the threats. What are the connections? How credible is the threat? What are they planning?
Topic labels in figure: Iraq, Nuclear Materials, Chemical Weapons, Threats, Potential Targets, Money Laundering, Training Camps

6 The technical problem
Powerful but expensive
Vector space model (words to documents): 10,000 documents, 100 terms, 1 second
Term        Doc 1  Doc 2  Doc 3
Army          1      0      0
Sensor        1      1      1
Technology    1      1      0
Help          1      0      0
Find          1      0      0
Improvise     1      0      0
Explosive     1      0      1
Device        1      0      1
ORNL          0      1      0
Develop       0      1      1
Homeland      0      1      1
Defense       0      1      1
Mitre         0      0      1
Won           0      0      1
Contract      0      0      1
Term weight: $W_{ij} = \log_2(f_{ij} + 1) \cdot \log_2\left(\frac{N}{n}\right)$
Similarity matrix (documents to documents): 10,000 documents, 1.6 minutes
        Doc 1  Doc 2  Doc 3
Doc 1   100%    17%    21%
Doc 2          100%    36%
Doc 3                 100%
Distance: $d_2(x_i, x_j) = \left( \sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2 \right)^{1/2}$
Cluster analysis (most similar documents)
Complexity shown on slide: $O(n^2 \log n)$
[Figure labels: D1, D2, D3]
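The slide does not name the measure behind its similarity matrix. As an assumption, the sketch below uses the Jaccard coefficient on each document's set of terms (shared terms divided by all distinct terms), which does reproduce the 17%, 21%, and 36% entries from the slide's three-document example. The names TERM_DOC, term_sets, jaccard, and similarity_matrix are illustrative, not taken from the presentation.

```python
from itertools import combinations

# Binary term-document matrix from the slide: term -> (Doc 1, Doc 2, Doc 3).
TERM_DOC = {
    "Army": (1, 0, 0),
    "Sensor": (1, 1, 1),
    "Technology": (1, 1, 0),
    "Help": (1, 0, 0),
    "Find": (1, 0, 0),
    "Improvise": (1, 0, 0),
    "Explosive": (1, 0, 1),
    "Device": (1, 0, 1),
    "ORNL": (0, 1, 0),
    "Develop": (0, 1, 1),
    "Homeland": (0, 1, 1),
    "Defense": (0, 1, 1),
    "Mitre": (0, 0, 1),
    "Won": (0, 0, 1),
    "Contract": (0, 0, 1),
}


def term_sets(term_doc):
    """Return one set of terms per document column."""
    n_docs = len(next(iter(term_doc.values())))
    return [
        {term for term, row in term_doc.items() if row[d]}
        for d in range(n_docs)
    ]


def jaccard(a, b):
    """Jaccard coefficient (assumed measure): shared terms / distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(term_doc):
    """Build the symmetric document-to-document similarity matrix."""
    docs = term_sets(term_doc)
    n = len(docs)
    sim = [[1.0] * n for _ in range(n)]
    # Every pair of documents is compared once.
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(docs[i], docs[j])
    return sim


SIM = similarity_matrix(TERM_DOC)
for row in SIM:
    print("  ".join(f"{value:4.0%}" for value in row))
```

Because the loop visits every pair of documents, the number of comparisons grows quadratically with the size of the collection, which is consistent with the slide's point that the approach is powerful but expensive.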

7 Breakthrough: inverse corpus frequency
Look at the forest, not the trees
- We analyzed nearly 1 million documents from six major research corpora.
- We found 229,023 unique terms (a large dictionary contains around 70,000 terms).
- We use this term frequency distribution as our “global” term frequency.
Term weight: $W_{ij} = \log_2(f_{ij} + 1) \cdot \log_2\left(\frac{C + 1}{c + 1}\right)$
[Figure: unique term count versus number of documents (K)]
Reed et al., “Multi-Agent System for Distributed Cluster Analysis,” Third International Workshop on Software Engineering for Large-Scale Multi-Agent Systems (SELMAS'04), May 24-25, 2004, Edinburgh, Scotland.
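A small sketch of the weight on this slide, read as W_ij = log2(f_ij + 1) * log2((C + 1) / (c + 1)). The slide does not define its symbols, so the code assumes that f_ij is the count of a term in a document, C is the number of documents in the reference corpus, and c is the number of those corpus documents that contain the term. The function name tf_icf_weight and the sample numbers are hypothetical.

```python
import math


def tf_icf_weight(term_count, corpus_size, corpus_doc_freq):
    """Weight one term in one document using reference-corpus statistics.

    Assumed meanings (not defined on the slide):
      term_count      -- f_ij, occurrences of the term in the document
      corpus_size     -- C, documents in the reference corpus
      corpus_doc_freq -- c, reference-corpus documents containing the term
    """
    if term_count < 0 or not 0 <= corpus_doc_freq <= corpus_size:
        raise ValueError("counts must satisfy 0 <= c <= C and f_ij >= 0")
    local = math.log2(term_count + 1)
    # The +1 terms keep the ratio finite for terms absent from the corpus.
    rarity = math.log2((corpus_size + 1) / (corpus_doc_freq + 1))
    return local * rarity


# Hypothetical numbers: a term seen 3 times, found in 500 of 1,000,000 documents.
print(tf_icf_weight(3, 1_000_000, 500))
```

Under these assumptions the weight depends only on the document at hand and on fixed corpus statistics, so each document can be weighted independently of the rest of the collection being clustered.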

8 Distributed clustering
Reed et al., “An Agent-based Method for Distributed Clustering of Textual Information,” patent pending, licensed to industry
[Figure: a Head Node and Computers 1 through 11, with Intelligent Agents and a Black Board]

9 Breakthrough: bio-inspired distributed solution
- Ant colony optimization
- Bird flocking model: alignment, separation, cohesion
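The three flocking rules named on this slide can be illustrated with a generic two-dimensional flocking step. This is a minimal sketch of the classic model only, not the presenters' document clustering method: the Boid class, the step function, and every radius and weight below are hypothetical choices, and nothing here ties agents to documents.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def step(boids, radius=5.0, min_dist=1.0, w_align=0.05,
         w_cohere=0.01, w_separate=0.1, max_speed=2.0):
    """Advance every boid one time step and return the new list."""
    updated = []
    for b in boids:
        neighbors = [
            o for o in boids
            if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius
        ]
        vx, vy = b.vx, b.vy
        if neighbors:
            n = len(neighbors)
            # Alignment: steer toward the neighbors' average velocity.
            vx += w_align * (sum(o.vx for o in neighbors) / n - b.vx)
            vy += w_align * (sum(o.vy for o in neighbors) / n - b.vy)
            # Cohesion: steer toward the neighbors' center of mass.
            vx += w_cohere * (sum(o.x for o in neighbors) / n - b.x)
            vy += w_cohere * (sum(o.y for o in neighbors) / n - b.y)
            # Separation: steer away from neighbors that are too close.
            for o in neighbors:
                if math.hypot(o.x - b.x, o.y - b.y) < min_dist:
                    vx += w_separate * (b.x - o.x)
                    vy += w_separate * (b.y - o.y)
        # Cap the speed so the flock stays stable.
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        updated.append(Boid(b.x + vx, b.y + vy, vx, vy))
    return updated


flock = [
    Boid(0.0, 0.0, 1.0, 0.0),
    Boid(2.0, 1.0, 0.0, 1.0),
    Boid(3.0, -1.0, -1.0, 0.0),
]
for _ in range(10):
    flock = step(flock)
print(flock)
```

Each boid reads only its nearby neighbors, so the update needs no central coordinator, which is one reason this kind of model is a natural fit for a distributed solution.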

10 Summary
- Current technology cannot solve emerging national challenges.
- Intelligent software agents are a significant breakthrough technology.
- Results indicate high potential to help solve these national challenges.
- We have a track record of successfully deployed agent systems and research to our credit.

11 Contact
Yu (Cathy) Jiao, Ph.D.
Applied Software Engineering Research Group
Computational Sciences and Engineering Division
(865)