Published by Justin Pope. Modified over 9 years ago.
1
Large, Fast, and Out of Control: Tuning Condor for Film Production
Jason A. Stowe
Software Engineer Lead - Condor
CORE Feature Animation
2
CORE's Farm & Middleware
Middleware (User Facing / Back End): Submitter, Session Manager, FAMDB, Condor View
Farm
● 1000 2.8 GHz Processors, Linux, 4 GB RAM
● 64 Mac Procs
● 4 Managing Machines
● Several Filers, 70-100 Terabytes
● 50 Million Renders so far (Vanilla Universe)
Processes: Condor_schedd, Condor_startd, starter, Condor_render
3
Goals and Software
Goals
● High Throughput & Efficiency
● Easy Condor Submission and Integration
Priority Management – Key to Throughput
4
Initial Configuration
Software/Policies
● User Priority
● Behavior Flags - STARTD
Issues
● NFS Issues
● Out of Order Execution
● Priority Management
Farm: 320 Procs, 1 Main Filer, RenderMan Schedd Server, Workstation Schedds (Sched Everything Else), Middleware, Central Mgr
5
How CG Productions Work
Traditionally, Movie scripts = Group of Sequences
Movie's Sequences ~ Play's Scenes
Sequence = Group of Shots
Assets = Sets/Characters/Props/...
Two Pipelines
● Assets: Design, Model, Texture, Surfacing
● Shots: Design, Layout, Animation, Lighting, Composite
Prioritize work-units instead of users?
6
Accounting Groups: Take 1
Software/Policies
● Contracted Wisconsin: Accounting Groups (AG)
● Job = Unique AG
● Added Filers, Fixed Drivers
Issues
● Accountant Overload
● Slow Finishing...
Farm: 360 Procs, 16 Mac Procs, Many Filers, General Schedd Server, Workstation Schedds (Sched Certain Jobs), Middleware, Central Mgr
7
Accounting Groups: Take 1
Every job got some resources, but not enough to finish fast for Production.
Moved quickly to Take 2...
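To make the slow-finishing effect concrete, here is a minimal sketch of what an equal split across per-job accounting groups does to turnaround. It assumes every group has equal priority and that frames take a fixed time, which is an idealization rather than Condor's actual accounting; the 360-processor figure is from the slide, while the job sizes, function names, and group counts are hypothetical.

```python
# Hypothetical illustration only: an idealized equal split, not Condor's accountant.
FARM_PROCS = 360  # farm size from the "Take 1" slide


def equal_share(total_procs: int, active_groups: int) -> float:
    """Processors each accounting group receives under an equal split."""
    if active_groups <= 0:
        raise ValueError("need at least one active group")
    return total_procs / active_groups


def hours_to_finish(frames: int, hours_per_frame: float, procs: float) -> float:
    """Wall-clock hours for one job, assuming its frames run in parallel on `procs`."""
    return frames * hours_per_frame / procs


# Assumed workload: a 100-frame job at 1 hour per frame.
few_groups = hours_to_finish(100, 1.0, equal_share(FARM_PROCS, 10))
many_groups = hours_to_finish(100, 1.0, equal_share(FARM_PROCS, 120))
print(f"10 groups: {few_groups:.1f} h, 120 groups: {many_groups:.1f} h")
```

Total farm throughput is the same in both cases; what changes is how long any single job waits to complete, which is the problem the slide describes.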
8
Accounting Groups: Take 2
Software/Policies
● Shots Get Unique AG
● Unify Schedds to fix out of order cases
Issues
● Wanted: Farm % Priority
● Classic Schedd Overload: “Claimed Idle”s
Farm: 360 Procs, 32 Mac Procs, Many Filers, General Schedd Server, Fewer Workstation Schedds (Sched Certain Jobs), Middleware, Central Mgr
9
Accounting Groups: Final?
Software/Policies
● “Priority User” - p1 p2 p3
● Multiple Server & Schedds
● ASAP & Department Flags
Issues
● Department “Pools”
● Preemption = Bad
Farm: 500 Procs, 32 Mac Procs, Many Filers, 3 Schedd Servers, Middleware, Central Mgr
10
Accounting Groups: Final?
Sharing Power is a difficult task for anyone, especially users with deadlines.
Need a Quality of Service guarantee: resources will always be available without preemptive department pools...
11
Group Quotas save the day
Software/Policies
● Department Groups: g_lfx, g_mdl, g_chr, etc.
● Quality Of Service
● Nighttime Priority
Issues
● Long negotiation Cycles
  Total Cycle: 6 minutes
  Server loads >6
Farm: 1000 Procs, 64 Mac Procs, Many Filers, 3 Schedd Servers, Middleware, Central Mgr
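The quality-of-service idea behind group quotas can be sketched as a two-pass allocation: each department first receives up to its guaranteed quota, and only unused capacity is handed out afterwards. This is an illustrative simplification, not the Condor negotiator's algorithm. The group names come from the slide; the quota values, demand figures, the `allocate` function, and the name-order surplus rule are all assumptions.

```python
# Hypothetical sketch of quota-first allocation; NOT Condor's negotiator logic.
def allocate(quotas: dict, demand: dict, total: int) -> dict:
    """Give each group min(demand, quota), then share leftover slots by name order."""
    if sum(quotas.values()) > total:
        raise ValueError("quotas exceed farm size")
    # Pass 1: guaranteed share, never more than the group actually wants.
    alloc = {g: min(demand.get(g, 0), q) for g, q in quotas.items()}
    surplus = total - sum(alloc.values())
    # Pass 2: hand unused slots to groups with unmet demand (assumed ordering).
    for g in sorted(quotas):
        if surplus <= 0:
            break
        extra = min(demand.get(g, 0) - alloc[g], surplus)
        alloc[g] += extra
        surplus -= extra
    return alloc


# Assumed numbers for a 1000-processor farm.
quotas = {"g_lfx": 400, "g_mdl": 200, "g_chr": 400}
demand = {"g_lfx": 900, "g_mdl": 50, "g_chr": 300}
result = allocate(quotas, demand, total=1000)
print(result)
```

No running job has to be preempted to honor a guarantee in this model: a busy group only borrows slots that other departments are not asking for.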
12
Middleware Performance Optimization
Goal: Speed Negotiator
● Remove Many Groups
● Significant Attributes (SIGNIFICANT_ATTRIBUTES)
● Schedd Submit Algorithm
● Separate Middleware & Central Manager Servers
● Negotiator Cycle 20 sec delay => 3 sec (NEGOTIATOR_CYCLE_DELAY)
Farm: 1000 Procs, 64 Mac Procs, Many Filers, 2 Schedd Servers, Central Mgr
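One way to see why trimming significant attributes speeds up negotiation: jobs whose significant attributes are identical can be treated as one unit, so fewer distinct attribute combinations means fewer units to match. The sketch below shows only that grouping idea. The job records, attribute names, and the `auto_cluster` helper are hypothetical, and the real schedd's behavior is more involved.

```python
# Hypothetical sketch of grouping jobs by their significant attributes.
from collections import defaultdict


def auto_cluster(jobs: list, significant: list) -> dict:
    """Map each distinct tuple of significant attribute values to its job ids."""
    clusters = defaultdict(list)
    for job in jobs:
        key = tuple(job.get(attr) for attr in significant)
        clusters[key].append(job["id"])
    return dict(clusters)


# Assumed job records; "Frame" differs per job but does not affect matching.
jobs = [
    {"id": 1, "Group": "g_lfx", "OS": "LINUX", "Frame": 1},
    {"id": 2, "Group": "g_lfx", "OS": "LINUX", "Frame": 2},
    {"id": 3, "Group": "g_chr", "OS": "LINUX", "Frame": 1},
    {"id": 4, "Group": "g_chr", "OS": "LINUX", "Frame": 2},
]

narrow = auto_cluster(jobs, ["Group", "OS"])          # few attributes
wide = auto_cluster(jobs, ["Group", "OS", "Frame"])   # one extra attribute
print(len(narrow), "clusters vs", len(wide), "clusters")
```

Adding a per-job attribute such as the assumed "Frame" to the significant list makes every job its own unit, which is the kind of growth the optimization aims to avoid.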
13
Optimization Results
Performance Before => After:
● Removed Groups: 6 => 5.5 min
● Significant Attributes: 5.5 => 3 min
● Schedd Algorithm: 3 => 1.5 min
● Separate Servers: 1.5 => 0.6 min
● Cycle Delay: 0.6 => 0.33 min
● Server Loads: <1 Middleware, <2 Central Manager
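The before/after figures above can be turned into per-step and overall speedups with a few lines of arithmetic. The cycle times are taken from the slide; the variable and function names are only for illustration.

```python
# Negotiation cycle times in minutes, (step, before, after), from the slide.
STEPS = [
    ("Removed groups", 6.0, 5.5),
    ("Significant attributes", 5.5, 3.0),
    ("Schedd algorithm", 3.0, 1.5),
    ("Separate servers", 1.5, 0.6),
    ("Cycle delay", 0.6, 0.33),
]


def speedup(before: float, after: float) -> float:
    """Ratio of old cycle time to new cycle time."""
    return before / after


per_step = {name: speedup(b, a) for name, b, a in STEPS}
overall = speedup(STEPS[0][1], STEPS[-1][2])   # first "before" over last "after"
biggest = max(per_step, key=per_step.get)      # step with the largest ratio

for name, ratio in per_step.items():
    print(f"{name}: {ratio:.2f}x")
print(f"Overall: {overall:.1f}x, largest single step: {biggest}")
```

Taken together, the five changes cut the cycle from 6 minutes to roughly 20 seconds.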
14
Lessons Learned
● Remove preemption where possible
● Simplify Startd/Negotiator (Control) policies:
  ● Make consistent/remove special cases
  ● Understandable farm behavior
● Keep Server Functions Simple
● Use Accounting Groups to guarantee relative percentage allocation of resources
● Use Group Quotas instead of machine-specific RANK policies for better throughput
15
Thank you
Condor Team, University of Wisconsin
CORE
Any Questions?
stowe@corefa.com