USC Information Sciences Institute, Yolanda Gil. AAAI-08 Tutorial, July 13, 2008. Part VI: AI Workflows. AAAI-08 Tutorial on Computational Workflows.

1 Part VI: AI Workflows. AAAI-08 Tutorial on Computational Workflows for Large-Scale Artificial Intelligence Research

2 Machine Learning Workflows

3 Simple Examples of Workflows Available for Experimentation in Wings/Pegasus

4 A Workflow for a New Method Combining Ensemble Machine Learning with k-Fold Cross-Validation [Caruana 07]

5 More Machine Learning Workflows
Beacon Warning: Finding suspicious groups in the Hats simulator
Joshua Moody, Center for Research on Unexpected Events, USC Information Sciences Institute

6 Hats Simulator [Cohen et al 02]
Hats is a society in a box.
- hats: {benign | terrorist | covert}
- organizations: groups of hats
- capabilities: information or item(s)
- beacons: ports, buildings, landmarks
- meetings: {malicious | benign}
- trades: exchange of information or item(s)
Beacon attack: a malicious meeting on a beacon in which the participants carry the capabilities that match the beacon's vulnerabilities.

7 Workflow Inputs
- Hats meeting data: meeting participants (~500 Mb - ~5 G)
- Watch list: known terrorists
- Capabilities
- Hats trade data: giver, taker, capability (~500 Mb - ~5 G)
- Beacon vulnerabilities

8 Components
- Third party software: GDA/k-groups [Kubika et al 04], BC [Neuman 00], KOJAK [Adibi et al 04]
- Glue components: merge, filter, translate

11 Running Search Algorithms on Randomly Generated CSPs
Yaling Zheng and Berthe Choueiry, University of Nebraska at Lincoln

12 Task
A CSP is specified as:
- Given:
  - A set of variables
  - Their domains
  - Constraints that restrict the allowed combinations of values for variables
- Find: one (or more) solutions, i.e., an assignment of values to variables such that all constraints are satisfied
Task: Run 12 backtrack-search algorithms on randomly generated CSP instances
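
The CSP definition on this slide can be illustrated with a minimal sketch of chronological backtracking in Python. This is not one of the 12 algorithms used in the experiments. The names `consistent` and `backtrack`, the three-variable toy problem, and the representation of constraints as predicates over variable pairs are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A binary constraint: (variable x, variable y, predicate over their values).
Constraint = Tuple[str, str, Callable[[int, int], bool]]


def consistent(var: str, value: int, assignment: Dict[str, int],
               constraints: List[Constraint]) -> bool:
    """Check `var = value` against every constraint whose other variable is assigned."""
    for x, y, allowed in constraints:
        if x == var and y in assignment and not allowed(value, assignment[y]):
            return False
        if y == var and x in assignment and not allowed(assignment[x], value):
            return False
    return True


def backtrack(variables: List[str], domains: Dict[str, List[int]],
              constraints: List[Constraint],
              assignment: Optional[Dict[str, int]] = None) -> Optional[Dict[str, int]]:
    """Return one solution as a dict, or None if no solution exists."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        return dict(assignment)
    # Static variable ordering: pick the first unassigned variable.
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment, constraints):
            assignment[var] = value
            solution = backtrack(variables, domains, constraints, assignment)
            if solution is not None:
                return solution
            del assignment[var]  # undo and try the next value
    return None


# Toy problem (hypothetical): three variables, domain {1, 2, 3}.
variables = ["A", "B", "C"]
domains = {v: [1, 2, 3] for v in variables}
constraints = [
    ("A", "B", lambda a, b: a != b),
    ("B", "C", lambda b, c: b < c),
    ("A", "C", lambda a, c: a != c),
]
solution = backtrack(variables, domains, constraints)
```

Counting calls to `consistent` would give a constraint-check measure of search effort.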

13 Task (cont’d)
A random CSP instance is characterized by:
- Number of variables (considered fixed, n=50)
- Domain size of each variable (considered fixed, a=12)
- Constraint density (p in [0.1, 0.2, …, 0.9], sometimes more)
- Constraint tightness (t in [0.1, 0.2, …, 0.9], sometimes more)
We generate about 400 instances for each combination of p and t.
We measure various output parameters reflecting search effort. Each experiment is run twice:
- once to measure CPU time
- once to measure all other parameters (nodes visited, constraint checks, etc.)
(Workflow constraint) When measuring CPU time, experiments must run on equivalent processors.
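
The parameter grid above can be sketched in Python as follows. The slides do not say which random-generation model was used, so this sketch makes two assumptions: density p is the fraction of all variable pairs that carry a constraint, and tightness t is the fraction of value pairs that each constraint forbids. The function names `generate_random_csp` and `parameter_grid` are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def generate_random_csp(n: int, a: int, p: float, t: float,
                        seed: int) -> Dict[Pair, Set[Pair]]:
    """Map each constrained variable pair to its set of forbidden value pairs.

    Assumed model (not confirmed by the slides): a fixed number of constraints,
    round(p * n(n-1)/2), each forbidding round(t * a*a) value pairs.
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n), 2))
    tuples = list(itertools.product(range(a), repeat=2))
    num_constraints = round(p * len(pairs))
    num_forbidden = round(t * len(tuples))
    constraints: Dict[Pair, Set[Pair]] = {}
    for pair in rng.sample(pairs, num_constraints):
        constraints[pair] = set(rng.sample(tuples, num_forbidden))
    return constraints


def parameter_grid() -> List[Tuple[float, float]]:
    """All (p, t) combinations for p, t in 0.1, 0.2, ..., 0.9."""
    values = [round(0.1 * i, 1) for i in range(1, 10)]
    return list(itertools.product(values, values))


grid = parameter_grid()
instance = generate_random_csp(n=50, a=12, p=0.2, t=0.3, seed=0)
```

With the base 9 x 9 grid and about 400 instances per (p, t) combination, this comes to roughly 81 x 400 = 32,400 instances before any extra parameter values are added.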

14 Workflow Instance
[Figure: each problem instance P1, P2, … is run through algorithms A1, A2, …, A12, producing results S1,1, S1,2, …, S1,12, S2,1, S2,2, …, S2,12, …; the results are passed to SAS and Excel, which produce outputs g1, g2, …, gx and r1, r2, …, rx. Legend: Problem instance, Algorithm.]

15 Computing Resources
Cluster: PrairieFire.unl.edu
Composition: 128-node production-mode Linux cluster
Processors: 2 Opteron 248 (2.2 GHz, 64-bit) per node
RAM: 4 GB PC2700 per node
Connection: Myrinet (2 Gb/s), Gigabit Ethernet
Storage: 1 TB SCSI RAID (XFS over NFS), 6 TB SATA RAID (ReiserFS over NFS)
Submit jobs to PBS; currently available queues:
- 2 long jobs (maximum duration = 120 hours)
- 28 short jobs (maximum duration = 72 hours)
- Rest in the waiting queue

16 A Sample PBS File
#PBS -W group_list=consystlab
#PBS -N fcp0.2cpu
#PBS -l nodes=1,walltime=72:00:00
#PBS -q csl_short
#PBS -M
#PBS -m e
#PBS -j oe
cd /home/consystlab/yzheng/Codes/gredcpu
/util/opt/ACL/alisp -#! scripts/fc/script-fcp0.2.lisp > results-fcp0.2.txt
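
Since one such file is needed per job, the files could be generated from a template rather than written by hand. The Python sketch below is one possible way to do that, not the authors' method. It assumes that the job name `fcp0.2cpu` encodes an algorithm (`fc`), a density (`p0.2`), and a CPU-time run (`cpu`); that reading is a guess from the file names. The function `render_pbs_script` is hypothetical, and the mail directives are left out because the address is not given in the sample.

```python
# Template built only from the directives and paths shown in the sample PBS file.
# The -M / -m mail directives are omitted because the sample gives no address.
PBS_TEMPLATE = """#PBS -W group_list=consystlab
#PBS -N {job_name}
#PBS -l nodes=1,walltime={walltime}
#PBS -q {queue}
#PBS -j oe
cd /home/consystlab/yzheng/Codes/gredcpu
/util/opt/ACL/alisp -#! scripts/{algorithm}/script-{algorithm}p{density}.lisp > results-{algorithm}p{density}.txt
"""


def render_pbs_script(algorithm: str, density: float,
                      queue: str = "csl_short",
                      walltime: str = "72:00:00") -> str:
    """Fill the template for one (algorithm, density) pair.

    The naming scheme <algorithm>p<density>cpu is an assumption inferred
    from the sample job name, not something the slides state.
    """
    job_name = f"{algorithm}p{density}cpu"
    return PBS_TEMPLATE.format(job_name=job_name, walltime=walltime, queue=queue,
                               algorithm=algorithm, density=density)


# One script per density value for the algorithm labelled "fc" in the sample.
densities = [round(0.1 * i, 1) for i in range(1, 10)]
scripts = {p: render_pbs_script("fc", p) for p in densities}
script = scripts[0.2]
```

Each rendered string could then be written to a file and submitted to the queue.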

17 A Sample Script File
;; Repeated for every p value
(load "make.lisp")
(setf *variable-ordering-heuristic* 'select-singleton-or-var-degd)
(setf *global-gc-behavior* :auto)
(setf *timelimit* )
(dolist (tightness '( ))
  (solve-randomly-generated-csp tightness ))
Slide callouts: Algorithm, Parameters of Problem Instance, Instance Index

18 What SAS and Excel Do
- SAS: analyzes the results using Friedman's test (on ranked data) to compare the 12 algorithms at each tightness for each given density
- Excel: plots average/median values for tightness in [ ] for each density
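
For readers unfamiliar with Friedman's test, the statistic can be sketched in a few lines of Python. This is a plain illustration of the textbook formula and not the SAS procedure used in the experiments. Each row is one problem instance, each column one algorithm, and the values are any search-effort measure. The function names and the small data set are made up, and the correction for ties is left out for brevity.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def friedman_statistic(results: List[List[float]]) -> float:
    """Friedman chi-square statistic, without the correction for ties.

    Q = 12 / (n k (k+1)) * sum_j R_j^2 - 3 n (k+1), where n is the number of
    instances (rows), k the number of algorithms (columns), and R_j the sum
    of the ranks of algorithm j over all instances.
    """
    n = len(results)
    k = len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical measurements: 4 instances (rows) x 3 algorithms (columns).
effort = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 3.5],
    [2.0, 1.0, 3.0],
    [1.0, 2.0, 4.0],
]
q = friedman_statistic(effort)
```

Under the null hypothesis that all algorithms perform alike, Q is compared against a chi-square distribution with k - 1 degrees of freedom.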

19 Experiences
Ideal situation:
- Each job reads one CSP instance, solves it, and produces the result
- Each job finishes in 10 msec, up to 1 hour
- x 12 x 2 jobs, dream!
Current situation:
- Each job reads a group of CSP instances, solves them, and produces the results
- Write a script for each job
- Takes about 2 weeks to accomplish this

20 Chores
- Periodically (manually) I check the status of each job (on error, rewrite the script and resubmit)
- After understanding the priority policy on the queues, I reorganized the jobs to avoid being assigned a low priority
- After improving the code, I had to rerun all the experiments
- In the future, when we have new algorithms or new problem instances, I have to repeat the procedure
We need to automate the whole process:
- Be able to check on the progress of experiments
- Be able to easily rerun some or parts of the experiments
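
The bookkeeping behind these chores can be sketched in Python as a small record of job states. This is a hypothetical illustration and does not describe Wings/Pegasus or any PBS command. The `JobRecord` class, the status strings, the `code_version` field, and every job name other than `fcp0.2cpu` are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class JobRecord:
    name: str          # e.g. "fcp0.2cpu", following the sample PBS job name
    status: str        # assumed states: "queued", "running", "done", "failed"
    code_version: int  # version of the solver code the job was run with


def progress_summary(jobs: List[JobRecord]) -> Counter:
    """Count jobs per status, to check on the progress of the experiments."""
    return Counter(job.status for job in jobs)


def jobs_to_rerun(jobs: List[JobRecord], current_code_version: int) -> List[str]:
    """Names of jobs that failed or that used an older version of the code."""
    return [job.name for job in jobs
            if job.status == "failed" or job.code_version < current_code_version]


# Hypothetical snapshot of four jobs after the code was improved to version 2.
jobs = [
    JobRecord("fcp0.1cpu", "done", 2),
    JobRecord("fcp0.2cpu", "failed", 2),
    JobRecord("fcp0.3cpu", "done", 1),
    JobRecord("fcp0.4cpu", "running", 2),
]
summary = progress_summary(jobs)
rerun = jobs_to_rerun(jobs, current_code_version=2)
```

Here `rerun` selects only the failed job and the job whose results predate the code change, so the remaining experiments do not have to be repeated.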

21 Tera-Scale Machine Translation
Jens Voeckler and the Machine Translation group at USC Information Sciences Institute

27 Rule Pruning Workflow in Wings
