The GitLab Case
Case #14
Angelos Karageorgos, Camila Moreira, Viktoria Chalkidou
Background
Open source coding platform founded in 2011
• 1,800 contributors worldwide
• 39 countries
• customers: 100,000 organizations and millions of users
Software development and operations lifecycle includes:
• planning
• coding
• testing
• getting feedback from the community
• releasing
• configuring
• monitoring
GitLab Community Edition: free for the first year, designed for users
GitLab Enterprise Edition: premium paid version for B2B customers
CBIM
Value proposition: Efficient coding platform, free for users, and premium solution for B2B
Relationship: Support their community; transparent and open relationship
Position: Leadership in the open source coding market
Expression: Informal, contemporary, transparent
Brand Core: Everyone can contribute. CREDIT: Collaboration, Results, Efficiency, Diversity, Iteration, and Transparency
Personality: Friendly, open, honest
Mission & Vision: Mission: change all creative work from read-only to read-write so that everyone can contribute. Vision: allow everyone to collaborate on all digital content so people can cooperate effectively and achieve better results, faster
Culture: Collaborative; supportive community
Competences: Efficiency; improve the speed of coding projects
Explain the elements of the matrix, focusing on Core, Culture, Mission & Vision and Relationship, which will be used to relate to the reputation elements later.
©CBIM Urde 2014
Incident
17:20 UTC A software engineer is trying to fix a problem that blocks users from posting comments on the platform.
23:00 UTC It’s late and he is tired. He is about to sign off.
23:25 UTC He has an idea on how to solve the problem: remove a directory from a secondary database.
23:30 UTC He accidentally executes the command on a primary database! Bang! 300GB of clients’ data on GitLab.com are deleted!
23:35 UTC Backup fails.
12:20 UTC GitLab.com has to be taken down to try to fix the problem.
Storytelling: On January 31st, 2017, at approximately 23:00, a software engineer at GitLab was frustrated: he had been trying to fix a database overload on GitLab.com since 17:20 without success. Because of this problem, many users were unable to post comments on issues and merge requests on the platform. It was late and he was very tired, so he told his colleagues on Slack that he was signing off. But he did not sign off; instead, he had an idea to fix the problem by deleting a database. He executed the command on a primary database and bang! He accidentally deleted 300GB of clients’ data on GitLab.com! He and other engineers tried to locate backups and could not find them. They realized they would have to take the website down while trying to fix this huge mistake.
Crisis
Question
How should GitLab communicate this incident to their community of users, and would this affect their reputation among their stakeholders?
This question is to guide the presentation of the scenarios. Do not start the discussion here.
Scenario 1
Manage the crisis silently and internally
• hide the outage and communicate that the website is down for maintenance
• try to fix the problem as fast as possible
• take the junior engineer off the case and hand it over to senior developers
Scenario 2
Communicating product failure
• take the website down, stating that there is a database problem you are trying to restore
• communicate the real problem in more detail once the issue is solved, without revealing the cause of the incident
• do not expose the employee or the vulnerability of the system
Scenario 3
Go full transparent
• share the situation in real time with the community, asking them for suggestions on how to fix the problem
• keep the junior engineer involved in the case
Scenarios
Taking the role of GitLab’s managerial team, how should GitLab communicate this incident to their community of users, and would this affect their reputation among their stakeholders?
Scenario 1: Manage the crisis silently and internally
Scenario 2: Communicating product failure
Scenario 3: Go full transparent
On the board, list the positive impact and risks of each scenario to guide the discussion.
Assisting questions:
• Should GitLab fire the junior engineer who deleted the database?
• What would be the consequences of each alternative course of action for their brand reputation?
• How would the incident be perceived by their stakeholders?
• Should the outage be shared with the community in real time, at a later time, or never?
• Should they take a different approach to communicating the incident with each stakeholder group?
Action
Management Decisions
Step 1: Act fast and now
Step 2: Explain all the details and keep the public updated
Step 3: Monitor your brand mentions
Step 4: Transparent all the way
Step 1
Act fast and now
The company decided to take GitLab.com down and inform their followers about it on social media such as Twitter, using the hashtag #HugOps. They also told followers that they would be performing emergency database maintenance.
Step 2
Explain all the details and keep the public updated
• Created a Google Doc explaining in detail what they were doing to fix the problem
• Livestreamed their engineering efforts on YouTube
• Constantly updated their community on social media
Step 3
Monitor your brand mentions
• Online monitoring of all mentions of the GitLab brand
• Immediate response to every comment
Step 4
Transparent all the way
• 2 days later GitLab published a detailed explanation of the problem
• The CEO of GitLab apologized personally for the incident, in which 300 GB of data were lost and 5000 projects were affected
Community Reaction
Explain the engagement graph: the first post on Twitter on Feb 1st and the postmortem published on their blog on Feb 10th.
Core Values
GitLab’s core values guided their actions and had a positive impact on their reputation
C-ollaboration R-esults E-fficiency D-iversity I-teration T-ransparency
Reflecting on GitLab’s actions and how they stayed committed to their core values during the entire crisis:
• Collaboration: Ask for help from their community, which is made up of software developers
• Results: Let the community follow the problem-fixing process live
• Efficiency: Full effort on restoring the database as soon as possible
• Diversity: Keep the junior engineer responsible involved in the process
• Iteration: Learn from mistakes and implement new procedures to avoid this type of mistake; control access to primary databases
• Transparency: Communicate the problem in detail to their community; apologize in public
Reputation Elements
Reputation: Willingness to support, Trustworthiness, Relevance, Differentiation, Recognizability, Credibility, Responsibility, Performance
Identity: Value proposition, Relationship, Position, Expression, Brand Core, Personality, Mission & Vision, Culture, Competences
Willingness to support: The community committed and provided their valuable spare time to retrieve the lost database
Responsibility: Co-creators and developers showed the real importance of GitLab’s core values, proving their commitment and accountability
Trustworthiness: Reflected through its community’s collaborative support and the company’s effective decisions
Willingness to support: The community’s actions showed their willingness to support the company during the crisis, by committing and providing their valuable spare time to retrieve the lost database.
Responsibility: Working as a unified team with one common goal, co-creators and developers showed the real importance of GitLab’s core values, on which their community is built. This proved the commitment and accountability of both the community and the company, enhancing the responsibility element of GitLab’s reputation.
Trustworthiness: Through its community’s collaborative support and the company’s effective decisions, GitLab managed to reflect and enhance its trustworthiness to all its stakeholders.
©CBIMR Urde & Greyser 2016
THANK YOU! Questions?