Scaled Agility & Distributed Collaboration: Improvising a Grid for Particle Physics
Dr Yingqin Zheng, Dr Will Venters, Dr Tony Cornford
This research was undertaken as part of Pegasus (EPSRC Grant No: EP/D049954/1)
The Scale
Currently constructing the world's most powerful particle accelerator: the Large Hadron Collider (LHC)
Searching for the Higgs boson – “1 person in 1000 worlds, or a needle in 20 million haystacks”
million gigabytes per year. 100,000 CPUs. 40PB disk, 40PB tape.
“World's biggest Grid”
CD stack with 1 year LHC data (~ 20 km)
Grids: Technology
Emerging platform for coordinated resource sharing and problem solving on a global scale for data-intensive and compute-intensive applications (Foster, 2001)
As Internet protocols enable the sharing and integration of information on the Web, so Grid protocols aim to allow the integration of … sensors, applications, data-storage, computer processors and most other IT resources (Wladawsky-Berger, 2004)
Centred around standard protocols and middleware.
1: No central control. 2: Standard open protocols. 3: Non-trivial level of service.
Layer diagram: Experiment layer; Application Middleware; Grid Middleware; Facilities and Fabrics
What Defines a (Computing) Grid?
The LHC Computing Grid
Building the LHC Computing Grid (LCG): a highly distributed, complex and poorly defined systems development task.
Cutting-edge hardware and software used.
New software standards being negotiated.
Middleware and support software being developed in a range of languages.
Grid must be distributed and proceed at different paces because of funding.
Particle physics has a long tradition of such large-scale global collaborations (Traweek 1988).
GridPP: A Distributed Collaboration
Collaboration of 230 people in 19 UK universities, RAL and CERN.
Decisions are made democratically and consensually, and implemented by influence and persuasion.
Network rather than hierarchy.
Virtual, federated, overlapping and interconnected.
Virtual meetings, wikis, blogs, mailing lists.
How Often Do You Travel For Work?
Frequency and effectiveness of communication methods
Frequency: 1 - Never, 2 - Rarely, 3 - Occasionally, 4 - Regularly, 5 - Very often
Effectiveness: 1 - Very ineffective, 2 - Ineffective, 3 - Average, 4 - Effective, 5 - Very effective
Distributed Management
It is hard to work in GridPP because it is so distributed?
The collaboration works because there is a high level of trust in the community
In GridPP, valuable knowledge is sometimes lost due to personnel turnover
I don't pay much attention to who does what in GridPP apart from those I directly work with
Developing duplicate solutions is a waste of resources
Competition among parallel technical solutions is necessary in order to find the best one
How do you know what needs to be done in your job?
Frequency: 1 - Never, 2 - Rarely, 3 - Occasionally, 4 - Regularly, 5 - Very often
Comment: using meetings to update others
Bricolage
Careful planning is necessary to develop effective solutions
I enjoy a high level of autonomy at work
I wish I had more authority and control over people and resources
I have a pretty good idea of what's going on in GridPP at a general level
The limited control over issues like available resources, hardware, and technical solutions is a big challenge for GridPP
Practical solutions are more important than rigorous methodologies
Essential skills for individual roles
Organizational Improvisation
Metaphors: Jazz (Weick 1992, 1999; Barrett 1998; Hatch 1999); Improvisational Theatre (Crossan, 1998)
Cunha (1999): “the conception of action as it unfolds, by an organisation and/or its members drawing on available material, cognitive, affective and social resources”
Convergence in time of conception and execution
Bricolage – finding solutions from available rather than optimal resources
Analytical Framework
Columns: improvisation paradox; related theoretical constructs; sources (organizational improvisation)

Paradoxes of Learning (Lewis 2000)
Pragmatic Creativity
  Constructs: environmental turbulence; task uncertainty; unplanned-for occurrences; task complexity; drop your tools; visions
  Sources: (Moorman and Miner, 1998, Ciborra, 1996); (Dahlbom and Mathiassen, 1993) (Miner et al., 2001) (Hutchins, 1995, Weick and Roberts, 1993) (Weick, 1993a) (Hatch, 1999, Mintzberg and McHugh, 1985, Hutchins, 1991, Weick, 1993b)
Retrospective Order
  Constructs: retrospective sense-making; ex post interpretation; transient constructs & persistent structure
  Sources: (Weick, 1993b) (Lanzara, 1999)

Paradoxes of Organizing (Lewis 2000)
Oriented Drifting
  Constructs: convergence of planning and execution; mixing the pre-composed and the spontaneous; magnetic fields; minimal structure; plan to improvise; artful planning
  Sources: (Moorman and Miner, 1998) (Weick, 1998) (Weick, 1993a) (Cunha et al., 1999) (Miner et al., 2001) (Baskerville, 2006)
Managed Serendipity
  Constructs: organized anarchy; collateral structure; experimental culture; the aesthetic of imperfection; a sense of urgency
  Sources: (Cohen et al., 1972) (Cunha et al., 1999) (Weick, 1999) (Crossan, 1998, Hutchins, 1991, Mirvis, 1998)

Paradoxes of Belonging (Lewis 2000)
Collective Individuality (Mirvis, 1998)
  Constructs: facilitative leadership; trust and kinship; emotional communication; hanging out; fluid communication
  Sources: (Crossan, 1998) (Crossan, 1998, Weick, 1993a) (Hatch, 1999) (Barrett, 1998) (Orlikowski, 1996, Miner et al., 2001)
Anxious Confidence (Mirvis, 1998)
  Constructs: individual skills and creativity; formative context; organizational memory; moods
  Sources: (Hutchins, 1991, Moorman and Miner, 1998, Orlikowski, 1996) (Ciborra and Lanzara, 1994) (Moorman and Miner, 1998) (Ciborra, 2002)
Pragmatic Creativity
GridPP faces many unplanned-for occurrences and environmental turbulence in funding, human resources, external and internal technological changes, hardware and software configurations, user requirements from the experiments, computer market conditions, and other institutional and political factors.
The project is “committed to something that it isn’t quite funded” (PMB member).
“… we have somehow learned how to organize things, at project management level and how to get things, to take the pragmatic view and to, faced with a problem, how to get from here to the solution... not just in GridPP but in building hardware and building detectors... There’s this background in problem solving and project management and the sort of pragmatic approach”.
Retrospective Ordering
A significant part of GridPP’s activity, achieved by various means both formal and less formal, lies in monitoring, accounting for, and making sense of the behavior and performance of the system so far.
With a range of different service challenges undertaken regularly, statements such as “we have to understand what is causing this phenomenon” or “find out what is behind the data” are commonly heard during meetings.
There is then Knorr-Cetina’s (1999) “humming” of collaboration “with itself, about itself”, which maintains a constant collective reflexivity, exemplifying Giddens’ (1984) “monitored character of the ongoing flow of social life”, and which makes retrospective sense-making an inherent and natural component of their process of system development.
Oriented Drifting
“I think you… need to keep enough of an idea of the general direction which represents progress, and the very specific goals which advance you… You need your head in the clouds to see the big picture, but you very much need your feet on the ground because you have to put one foot in front of the other, and day to day we keep putting one foot in front of the other … and different people, depending on their role in the project are more oriented towards the ultimate goal or more oriented towards the little concrete footsteps that need to be taken...”
“We wanted to establish the fact that we had the right to change our deliverables. So we set up this project map and we set up the formality of change forms. So this was to formalise our freedom to change the project and at the next Oversight Committee we managed to get this sort of structure through to them that yes, we had a set of milestones but you know, we had a mechanism to change them because we have to be responsive.” (PMB member).
Managed Serendipity
“… physicists are happier with an ad hoc solution just to get the job done and push them through”. A physicist also highlighted this, saying that computer scientists “will put together the most elegant thing in the universe, but it will never work… Physicists will come up with the most hacked solution in the world… but it will work”.
Management in GridPP does not rely on vertical lines of command. While there is an extensive structure of management boards, committees, and technical groups, they serve more as communication channels than as authority hierarchies.
Managerial roles in the collaboration serve most of the time as representatives, spokespersons, or coordinating facilitators.
Different solutions often compete with each other within the collaboration for a while, until one of them wins by forming more alliances or the others die out in the natural course of events, e.g. due to technical failures, low uptake, lack of funding or other circumstances.
The technical systems then emerge from “contests of unfolding” (Knorr-Cetina 1999).
Collective Individuality
“This environment is based on, if you want, charismatic leadership and people doing things relatively independent but also having the freedom to do them, and not having to report every two minutes on what they are doing”.
“Everyone trusts each other to be doing the best they can… That fundamental trust drives our particle physics group.”
“You have to trust that people will step up… and do the dirty work as well as doing the glamorous work”.
“Going to the pub” together when they meet, for example, is one aspect of it. “It fosters a bond between people … many aspects of working in this project are frustrating because it's so large. And so if you can go out together… you can identify the problems and let out steam about them…”
Anxious Confidence
The project is constantly fire-fighting, discovering problems, managing crises, and negotiating solutions.
Yet almost everybody we asked in the collaboration firmly believes that the Grid will work; it may not work perfectly, but it will work.
There is a high level of confidence despite the sense of urgency and chaos on the surface.
Sources of confidence:
Individual competence
HEP history of cutting-edge computing
HEP tradition of distributed collaboration
“Aesthetics of imperfection”
Scaled Agility
Scaled agility as an organizational capability: distributed, innovative, flexible, under constraints of time and resources, and more or less decentralized.
GridPP is a case of scaled agility which on the surface seems chaotic, haphazard, unplanned, and full of tension, but is underpinned by leadership, planning, strong commitment, emergent order, and a collective dynamic capability.
The framework of paradoxes of improvisation allows us to examine system development practices in the project with regard to aspects that are often pushed to the background in discussions of system development methodologies, such as environmental conditions, individual skills, organizational structure, communication patterns, interpersonal relationships…
Most studies of improvisation have stated that it is more easily performed in a small group, such as a jazz band. Our case shows that it is possible in a large group, when the “ambience” is right.