1
Follow-the-moon optimization with Condor-enabled genetic algorithms
Condor Week Presentation, April 2006
Brooklin J. Gore, Senior Fellow, Advanced Computing
2004 Micron Technology, Inc. All rights reserved. Information is subject to change without notice.
2
Condor Week 2006
Agenda
  Introductions
  A GA Optimization Problem
  A ‘perfect’ Grid app
  Follow-the-moon computing
  Q/A
12/3/2018
3
Overview of Micron’s Grid
  11k+ processors in 11 “pools”
  Linux, Solaris, Windows
  ~65th Top500 rank
  5.7 TeraFLOPS
  Built in-house
  Open Source Grid
  Centralized governance
  Distributed management
  20+ applications
  Self developed
Figure: Micron’s Global Grid
4
Example Grid Application at Micron
Probe Card Optimization
Given:
  A wafer map
  A probe card
  Max DUTs on card (DUT = Device Under Test)
Find:
  A probe card configuration
  Probe vector
That minimizes:
  DUTs on card
  Touchdowns
  Reprobed die
  Maximum die contacts
5
Example Grid Application at Micron
Probe Card Optimization is a HARD Problem
Given:
  A wafer W of size Wx, Wy
  A card C of size Cx, Cy and V probe vectors
$$\text{Configurations} = \sum_{k=0}^{C} \frac{C!}{k!\,(C-k)!} = 2^{C}$$
$$\text{Vectors} = \frac{W!}{V!\,(W-V)!} \sim W^{V}$$
$$\text{Total Complexity} = 2^{C} W^{V}$$
$$\text{Search Space}(100, 500, 5) = 2^{100} \cdot 500^{5} \sim 4 \times 10^{43}$$
So, maybe a Genetic Algorithm would be good.
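The arithmetic on this slide can be checked directly. The following is a minimal Python sketch; the function names `configurations` and `search_space` are illustrative and do not come from the talk.

```python
from math import comb


def configurations(c: int) -> int:
    """Sum of C(c, k) for k = 0..c, which equals 2**c."""
    return sum(comb(c, k) for k in range(c + 1))


def search_space(c: int, w: int, v: int) -> int:
    """The slide's estimate 2**c * w**v, where w**v approximates C(w, v)."""
    return 2**c * w**v


if __name__ == "__main__":
    # The binomial sum collapses to a power of two.
    assert configurations(100) == 2**100
    # Prints the estimate in scientific notation (about 4 x 10^43).
    print(f"{search_space(100, 500, 5):.2e}")
```

Python integers are arbitrary precision, so the 44-digit result is exact rather than a floating-point approximation.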
6
Follow-the-Moon Computing
For a single execution of the GA:
  1-5 ‘tries’ to find optimal solution (minimize fitness function)
  Shoot for minute run time (thorough yet responsive)
  Only output ‘good’ solutions (to avoid clutter) with ‘fitness’ filename
Since GAs are probabilistic with variable run-times:
  The more ‘copies’ we run, the better and faster we explore the solution space
So, let’s run a bunch ( ) overnight:
  3 tries/job * 600 jobs * 2/hour * 12 hours = 43,200, or roughly 40,000 tries
So, submit 600 jobs:
  Leave in queue (run over and over), remove after 12 hours
  Good solutions just ‘show up’ in submit directory; view the ‘Top 10’
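A rough Python sketch of the overnight arithmetic and of a ‘Top 10’ view follows. It assumes, hypothetically, that each solution file is named after its fitness value with a `.sol` suffix (for example `12.5.sol`); the talk does not give the actual naming scheme, and `rank_by_fitness` is an illustrative helper.

```python
def total_tries(tries_per_job: int, jobs: int, runs_per_hour: int, hours: int) -> int:
    """Number of GA tries completed if every job keeps re-running in the queue."""
    return tries_per_job * jobs * runs_per_hour * hours


def rank_by_fitness(filenames, n=10, suffix=".sol"):
    """Return the n lowest-fitness (fitness, filename) pairs.

    Assumes hypothetical names of the form '<fitness><suffix>'; anything
    else in the directory listing is skipped.
    """
    scored = []
    for name in filenames:
        if not name.endswith(suffix):
            continue
        try:
            scored.append((float(name[: -len(suffix)]), name))
        except ValueError:
            continue  # suffix matched, but the stem is not a number
    return sorted(scored)[:n]


if __name__ == "__main__":
    print(total_tries(3, 600, 2, 12))  # 43200, i.e. roughly 40,000 tries
    print(rank_by_fitness(["12.5.sol", "3.sol", "notes.txt", "7.25.sol"], n=2))
```

In practice the list of names would come from something like `os.listdir()` on the submit directory.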
7
Follow-the-Moon Computing
Let’s take a look at an actual run…
8
Example Grid Application at Micron
Probe Card Optimization with CGA Genetic Algorithm
m49a:
  29x14 wafer map
  23x14 probe card
  126 DUTs on card
  2 Re-probes
  3 Touchdowns
9
Follow-the-Moon Computing
‘Perfect’ Grid app:
  Low data in/out
  Compute bound
So:
  Direct jobs to sites
  Where the workers aren’t
10
Follow-the-Moon Computing
  STARTD_CRON_JOBS = $(STARTD_CRON_JOBS) flockmgr:MU_:$(MODULES)/MU_FlockMgr:30m
The STARTD_CRON_JOBS entry above runs on ALL systems, but the existence of certain files triggers specific behavior, like dynamic flocking below:
  -rw-r--r condor condor Jul bgore2-lnx.ClassAds
  -rw-r--r condor condor Dec 9 14:59 bgore2-lnx.Flocking
  -rw-r--r condor condor Feb 21 07:42 bgore2-lnx.local
11
Follow-the-Moon Computing
  # Flocking schedule for bgore2-lnx
  # 12/09/2005/BJGore
  #
  # First column is seconds after midnight. Seconds after midnight key:
  # Midnight: 0      1am: 3600    2am: 7200    3am: 10800   4am: 14400   5am: 18000
  # 6am: 21600       7am: 25200   8am: 28800   9am: 32400   10am: 36000  11am: 39600
  # Noon: 43200      1pm: 46800   2pm: 50400   3pm: 54000   4pm: 57600   5pm: 61200
  # 6pm: 64800       7pm: 68400   8pm: 72000   9pm: 75600   10pm: 79200  11pm: 82800
  # Subsequent columns are comma-separated list of pools to flock to
  # This is a good flocking schedule for Boise-based submitters.
  # We flock to pools where it's between 6pm and 6am.
  # Midnight
  0 condor-mava.mava, condor-mndc, condor-backend, condor-is, condor-rnd, condor-lehi.lehi
  condor-mava.mava, condor-mndc, condor-backend, condor-is, condor-rnd, condor-lehi.lehi, condor-nijp.nijp
  : (etc)
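A schedule in this format could be read along the following lines. This is a Python sketch, not the MU_FlockMgr module itself (which the next slide shows in Perl). It assumes each data line is a seconds-after-midnight value followed by a comma-separated pool list, and that the entry in force is the latest one at or before the current time; the lookup rule and the 64800 entry in `SAMPLE` are assumptions made for illustration.

```python
from bisect import bisect_right

# Pool names are taken from the slide; the 64800 entry is hypothetical.
SAMPLE = """\
# Midnight
0 condor-mava.mava, condor-mndc
# 6pm (hypothetical entry)
64800 condor-nijp.nijp
"""


def parse_schedule(text):
    """Return a time-sorted list of (seconds_after_midnight, [pools])."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        pools = parts[1].split(",") if len(parts) > 1 else []
        entries.append((int(parts[0]), [p.strip() for p in pools if p.strip()]))
    return sorted(entries, key=lambda entry: entry[0])


def pools_at(entries, seconds_after_midnight):
    """Pools from the latest entry at or before the given time (assumed rule)."""
    times = [t for t, _ in entries]
    i = bisect_right(times, seconds_after_midnight) - 1
    return entries[i][1] if i >= 0 else []


if __name__ == "__main__":
    schedule = parse_schedule(SAMPLE)
    # Candidate FLOCK_TO value for 1am (3600 seconds after midnight).
    print(", ".join(pools_at(schedule, 3600)))
```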
12
Follow-the-Moon Computing
  if ( $current_flist ne $new_flist ) {
      # Need to update FLOCK_TO
      $cmd_output = `$ccv_cmd -rset 'FLOCK_TO=$new_flist'`;
      if ( $cmd_output =~ m/Successfully/ ) {
          # and reconfig the schedd to take the change
          $cmd_output = `$cr_cmd`;
          print "$ca_prefix = \"FLOCK_TO updated\"\n";
      } else {
          print "$ca_prefix = \"FLOCK_TO update failed\"\n";
      }
  } else {
      # Lists match, so nothing to change
      print "$ca_prefix = \"FLOCK_TO is current\"\n";
  }
13
Example Grid Application at Micron
Solving Probe Card Optimization
What about the ‘GA Knobs’?
  Parameter                       Reasonable Values
  Population, P                   200, 400, 800
  Tournament Size, Ts             2, 4, 8
  Probability of Crossover, Pc    0.00, 0.33, 0.66, 1.00
  Probability of Mutation, Pm
Parameter Sweep:
  133 unique combinations (Pc=Pm=0 degenerate case)
  Did 40 six-hour runs for each combination
  31,920 hours, or 3.6 years on one CPU
  Ran in 7 days on Grid!
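The cost figures for the sweep follow from simple arithmetic, sketched here in Python. The average-CPU figure is derived from the slide's numbers and is not stated in the talk; `sweep_cpu_hours` is an illustrative name.

```python
HOURS_PER_YEAR = 24 * 365


def sweep_cpu_hours(combinations: int, runs_per_combination: int, hours_per_run: int) -> int:
    """Total single-CPU hours needed for the whole parameter sweep."""
    return combinations * runs_per_combination * hours_per_run


if __name__ == "__main__":
    cpu_hours = sweep_cpu_hours(133, 40, 6)
    print(cpu_hours)                             # 31920
    print(round(cpu_hours / HOURS_PER_YEAR, 1))  # 3.6 (years on one CPU)
    # Average number of CPUs kept busy to finish in 7 days of wall-clock time.
    print(cpu_hours / (7 * 24))                  # 190.0
```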
14
Example Grid Application at Micron
Solving Probe Card Optimization
Derived Defaults:
  P = 400
  Ts = 4
  Pc = 0.66
  Pm = 1.00 *
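To show where the four knobs act, here is a generic tournament-selection GA in Python. It is a textbook-style sketch on bit lists, not Micron's CGA. The one-point crossover, the single-bit-flip mutation, the `generations` and `seed` parameters, and the toy fitness function are all assumptions made for illustration.

```python
import random


def tournament(population, fitness, ts, rng):
    """Pick ts random individuals and return the one with the lowest fitness."""
    return min(rng.sample(population, ts), key=fitness)


def crossover(a, b, rng):
    """One-point crossover of two equal-length bit lists (assumed operator)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rng):
    """Return a copy with one randomly chosen bit flipped (assumed operator)."""
    child = genome[:]
    child[rng.randrange(len(child))] ^= 1
    return child


def run_ga(fitness, length, p=400, ts=4, pc=0.66, pm=1.00, generations=50, seed=0):
    """Minimize fitness over bit lists; defaults mirror the derived values above."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(p)]
    best = min(population, key=fitness)
    for _ in range(generations):
        offspring = []
        for _ in range(p):
            child = tournament(population, fitness, ts, rng)
            if rng.random() < pc:  # crossover with probability Pc
                child = crossover(child, tournament(population, fitness, ts, rng), rng)
            if rng.random() < pm:  # mutation with probability Pm
                child = mutate(child, rng)
            offspring.append(child)
        population = offspring
        best = min(population + [best], key=fitness)  # keep the best seen so far
    return best


if __name__ == "__main__":
    # Toy problem: minimize the number of 1 bits in a 20-bit genome.
    print(sum(run_ga(sum, length=20, p=40, generations=30, seed=1)))
```

With Pm = 1.00 every child receives exactly one bit flip in this sketch, so tracking the best individual seen so far keeps a good solution from being lost to mutation.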
15
Thank you! Questions? Micron and the Micron logo are trademarks and/or service marks of Micron Technology, Inc. All other trademarks are the property of their respective owners.