1
Recent Advances in Software Engineering in Microsoft Research
Judith Bishop, Microsoft Research, jbishop@microsoft.com
University of Nanjing, 28 May 2015
2
Statistics Trends WER, CRANE Testing IntelliTest Code Hunt Z3 And Friends Prevention Education Hardware Maintenance
3
Software runs on hardware – lots of it. Worldwide shipments of personal devices increased by 5% year over year in 1Q14, with sales of basic and utility tablets in emerging markets, plus smartphones, driving total device market growth during the quarter. (Gartner, June 2014)
4
Connected Devices and The Cloud
5
Most recent technology shift
6
Desktop operating system market share Source: www.netmarketshare.com
7
Mobile/tablet market share Source: www.netmarketshare.com
8
Market share of operating systems in the United States from January 2012 to September 2014. [chart callout: Not Windows]
9
Statistics Trends WER, CRANE Testing IntelliTest Code Hunt Z3 And Friends Prevention Education Hardware Maintenance
11
The Challenge for Microsoft
Microsoft ships software to 1 billion users around the world. We want to:
fix bugs regardless of source – application or OS, software, hardware, or malware
prioritize bugs that affect the most users
generalize the solution so it can be used by any programmer
get the solutions out to users most efficiently
try to prevent bugs in the first place
12
Debugging in the Large with WER… [diagram: 23,450,649 error reports funneled through bucketing stages down to a handful of minidumps]
13
The huge database can be mined to prioritize work:
Fix bugs from the most (not the loudest) users
Correlate failures to co-located components
Show when a collection of unrelated crashes all contain the same culprit (e.g. a device driver)
WER has proven itself "in the wild": it found and fixed 5,000 bugs in beta releases of Windows after programmers had found 100,000 with static analysis and model checking tools.
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt. Debugging in the (Very) Large: Ten Years of Implementation and Experience. SOSP '09, Big Sky, MT, October 2009.
14
Bucketing Mostly Works
One bug can hit multiple buckets: up to 40% of error reports; duplicate buckets must be hand-triaged.
Multiple bugs can hit one bucket: up to 4% of error reports; harder to isolate each bug.
But what if bucketing is wrong 44% of the time? Solution: scale is our friend. With billions of error reports, we can afford to throw away a few million.
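The bucketing idea above can be illustrated with a minimal sketch. The signature fields below (module, version, fault offset) are a simplification for illustration; real WER applies many more heuristics. Note how the same driver bug in two builds lands in two buckets – the "duplicate buckets" problem the slide mentions.

```python
# Hedged sketch of WER-style bucketing: crashes are grouped ("bucketed")
# by a heuristic signature so that one bucket ideally maps to one bug.
from collections import Counter

def bucket_id(report):
    # Simplified signature: failing module, its version, and the offset
    # of the faulting instruction within that module.
    return (report["module"], report["version"], report["offset"])

reports = [
    {"module": "driver.sys", "version": "1.0",  "offset": 0x1A2B},
    {"module": "driver.sys", "version": "1.0",  "offset": 0x1A2B},
    {"module": "driver.sys", "version": "1.1",  "offset": 0x1A2B},  # same bug,
    # new build: lands in a second bucket (a duplicate bucket)
    {"module": "word.exe",   "version": "14.0", "offset": 0x99},
]

buckets = Counter(bucket_id(r) for r in reports)

# Triage fixes the bucket with the most reports first: most users helped.
top_bucket, hits = buckets.most_common(1)[0]
print(top_bucket, hits)
```

Sorting buckets by report count is what lets a fix for a handful of bugs help the largest number of users.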
15
Top 20 Buckets for MS Word 2010
3-week internal deployment to 9,000 users. Just 20 buckets account for 50% of all errors: fixing a small number of bugs will help many users. [chart: bucket # vs. CDF of error reports]
16
Hardware: Processor Bug
WER helped fix a hardware error; the manufacturer could have caught this earlier with WER. [chart: error reports by day]
17
WER works because … bucketing mostly works.
Windows Error Reporting (WER) is the first post-mortem reporting system with automatic diagnosis, and the largest client-server system in the world (by installs). It has helped 700 companies fix thousands of bugs and billions of errors, and it fundamentally changed software development at Microsoft. http://winqual.microsoft.com
18
CRANE: Risk Prediction and Change Risk Analysis Goal: to improve hotfix quality and response time
19
CRANE adoption in Windows
Retrospective evaluation of CRANE on Windows; categorization of fixes that failed in the field.
Recommendation: make metrics simple, empirical and insightful; project- and context-specific; non-redundant; and actionable.
Jacek Czerwonka, Rajiv Das, Nachiappan Nagappan, Alex Tarvo, Alex Teterev: CRANE: Failure Prediction, Change Analysis and Test Prioritization in Practice – Experiences from Windows. ICST 2011: 357-366.
20
Improving Testing Processes
Release cycles impact the verification process; testing becomes a bottleneck for development. How much testing is enough? How reliable and effective are tests? When should we run a test?
Kim Herzig, Michaela Greiler, Jacek Czerwonka, Brendan Murphy: The Art of Testing Less without Sacrificing Code Quality. ICSE 2015.
21
Engineering Process [diagram: from the engineer's desktop through the integration process]
22
System and Integration Testing
Quality gates: developers have to pass quality gates (with no control over test selection), which check system constraints, e.g. compatibility or performance.
Failures are not isolated: they require human inspection and cause a development freeze for the corresponding branch.
23
System and Integration Testing
Software testing is expensive: 10k+ gates executed, 1M+ test cases, across different branches, architectures, languages, … It aims to find code issues as early as possible, but it slows down product development.
24
Research Objective
Only run effective and reliable tests. Not every test performs equally well; it depends on the code base. Reduce the execution frequency of tests that cause false test alarms (failures due to test and infrastructure issues).
Do not sacrifice code quality: run every test at least once on every code change, and eventually find all code defects – taking the risk of finding some defects later is acceptable.
Running fewer tests increases code velocity. We cannot run all tests on all code changes anymore; identify the tests that are more likely to find defects (not coverage).
25
Historic Test Failure Probabilities
Analyzing past test runs yields failure probabilities from the execution history. These probabilities depend on the execution context!
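A minimal sketch of the kind of skip decision studied in this work: estimate, from a test's execution history, how often a failure is genuine versus a false alarm, and skip the test when the expected cost of running it exceeds the expected cost of skipping. The cost figures and the `should_skip` function are invented for illustration, not the paper's actual model.

```python
# Hedged sketch: cost-based test selection from historic failure data.

def failure_probs(history):
    """history: list of 'pass' | 'false_alarm' | 'true_failure' outcomes."""
    n = len(history)
    return (history.count("true_failure") / n,
            history.count("false_alarm") / n)

def should_skip(history, cost_exec=1.0, cost_inspect=50.0, cost_escape=500.0):
    # Illustrative cost units: machine time to run, human time to triage a
    # false alarm, and the (much larger) cost of a defect escaping to later.
    p_true, p_false = failure_probs(history)
    cost_run = cost_exec + p_false * cost_inspect   # run + triage noise
    cost_skip = p_true * cost_escape                # defect found later
    return cost_skip < cost_run

flaky = ["pass"] * 90 + ["false_alarm"] * 10   # never finds real bugs
useful = ["pass"] * 95 + ["true_failure"] * 5  # does find real defects

print(should_skip(flaky))    # True: false alarms make running it a net cost
print(should_skip(useful))   # False: skipping risks an escaped defect
```

The asymmetry between triage cost and escape cost is what makes it rational to keep running a rarely-failing but genuine test while skipping a noisy one.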
26
Does it Pay Off?
Fewer test executions reduce cost; taking risk increases cost.
Studied periods: ~11 months, >30 million test executions, multiple branches; ~3 months, >1.2 million test executions, single branch; ~12 months, >6.5 million test executions, multiple branches.
27
Across All Products
Results vary with the branching structure and the runtime of tests, but we save cost on all products. Fine-tuning is possible – better results, but not general.
28
Dynamic & Self-Adaptive
Probabilities are dynamic (they change over time), and skipping tests influences the risk factors of higher-level branches. Tests are re-enabled when code quality drops: a feedback loop between decision points and a training period automatically enable tests again.
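The self-adaptive behaviour can be sketched as a sliding-window estimate plus a forced re-enable after a training period. The window size, threshold, and class below are invented for illustration; the paper's actual mechanism differs in detail.

```python
# Hedged sketch: dynamic, self-adapting skip decision for a single test.
from collections import deque

class AdaptiveTest:
    def __init__(self, window=20, skip_threshold=0.02, max_skips=5):
        self.outcomes = deque(maxlen=window)  # recent 'pass'/'fail' results
        self.skipped_in_a_row = 0
        self.skip_threshold = skip_threshold
        self.max_skips = max_skips

    def p_fail(self):
        # Failure probability over the sliding window only: old history
        # ages out, so the estimate tracks current code quality.
        if not self.outcomes:
            return 1.0  # no history yet: always run
        return self.outcomes.count("fail") / len(self.outcomes)

    def should_run(self):
        # Training period: force a re-run after too many consecutive skips,
        # otherwise run only if recent history shows enough failures.
        if self.skipped_in_a_row >= self.max_skips:
            return True
        return self.p_fail() >= self.skip_threshold

    def record(self, ran, outcome=None):
        if ran:
            self.outcomes.append(outcome)
            self.skipped_in_a_row = 0
        else:
            self.skipped_in_a_row += 1

t = AdaptiveTest()
for _ in range(20):
    t.record(ran=True, outcome="pass")
print(t.should_run())   # False: no recent failures, below threshold
for _ in range(5):
    t.record(ran=False)
print(t.should_run())   # True: forced re-run after 5 consecutive skips
```

The forced re-run is the feedback loop: a skipped test cannot stay skipped forever, so a quality drop is eventually observed and the probabilities update.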
29
Impact on Development Process
Secondary improvements:
Machine setup – we may lower the number of machines allocated to the testing process.
Developer satisfaction – removing false test failures increases confidence in the testing process.
Development speed – the impact on development speed is hard to estimate through simulation.
Product teams invest because they believe that removing tests:
increases code velocity (at least a lower bound)
avoids additional changes due to merge conflicts
reduces the number of required integration branches, as their main purpose is to test the product
"We used the data your team has provided to cut a bunch of bad content and are running a much leaner BVT system […] we're panning out to scale about 4x and run in well under 2 hours" (Jason Means, Windows BVT PM)
30
Statistics Trends WER, CRANE Testing IntelliTest Code Hunt Z3 And Friends Prevention Education Hardware Maintenance
31
Prevention
32
Continual abstraction
33
Z3: Automated Theorem Prover
Won 19/21 divisions in the SMT 2011 competition; named the most influential tool paper in the first 20 years of TACAS (2014).
Z3 reasons over a combination of logical theories: Boolean algebra, bit vectors, linear arithmetic, floating point, first-order axioms, non-linear arithmetic over the reals, algebraic data types, sets/maps/…
Leonardo de Moura and Nikolaj Bjørner. Satisfiability modulo theories: introduction and applications. Commun. ACM, 54(9):69-77, 2011.
34
SAGE: Binary File Fuzzing
Symbolic execution of x86 traces to generate new input files; automated test generation and safety/termination checking. Z3 theories: bit vectors and arrays.
[chart: fuzzing bugs found in Win7 (over 100s of file parsers) – SAGE vs. random + regression vs. all others]
Corral: Whole-Program Analysis
Finds assertion violations using stratified inlining of procedures and calls to Z3. Z3 theories: arrays, linear arithmetic, bit vectors, uninterpreted functions.
As of Windows Threshold, Corral is the program analysis engine for SDV (Static Driver Verifier).
35
Validating Network ACLs in the Datacenter
Problem: 1000s of devices; low-level access control lists for different policies; updates to an edge ACL can break policies; complexity is "inhumane".
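The validation problem can be illustrated with a minimal sketch: check whether an edited ACL still behaves like the original by looking for a packet the two treat differently. The rule format and the brute-force search below are invented for illustration; the real work encodes rules as bit-vector constraints and asks an SMT solver such as Z3 for a counterexample over the full header space.

```python
# Hedged sketch: differencing two ACLs over a tiny sampled packet space.

# A rule: (action, src_range, dst_port_range); first match wins.
OLD_ACL = [
    ("deny",  range(10, 20), range(0, 1024)),   # block low ports from srcs 10-19
    ("allow", range(0, 256), range(0, 65536)),
]
NEW_ACL = [  # an edit that accidentally narrows the deny rule
    ("deny",  range(10, 15), range(0, 1024)),
    ("allow", range(0, 256), range(0, 65536)),
]

def decide(acl, src, port):
    for action, srcs, ports in acl:
        if src in srcs and port in ports:
            return action
    return "deny"  # default deny

def diff(acl_a, acl_b, srcs=range(0, 32), ports=range(0, 2048)):
    """Return packets the two ACLs treat differently (counterexamples)."""
    return [(s, p) for s in srcs for p in ports
            if decide(acl_a, s, p) != decide(acl_b, s, p)]

cex = diff(OLD_ACL, NEW_ACL)
print(len(cex) > 0)   # True: the edit changed behaviour
print(cex[0])         # (15, 0): this packet was denied, now it is allowed
```

Brute force only works for toy spaces; the point of the solver-based approach is that the same question can be answered symbolically over all 2^104 packet headers.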
36
Education
37
Statistics Trends WER, CRANE Testing IntelliTest Code Hunt Z3 And Friends Prevention Education Hardware Maintenance
38
IntelliTest in Visual Studio 2015
Available in Visual Studio since 2010 (as Pex and Smart Unit Tests).
Nikolai Tillmann, Jonathan de Halleux, Tao Xie: Transferring an automated test generation tool to practice: from Pex to Fakes and Code Digger. ASE 2014: 385-396.
39
Working and learning for fun
Enjoyment adds to long-term retention on a task. Discovery is a powerful driver, in contrast with direct instruction. Gaming joins these two, and is hugely popular. Can we add these elements to coding? Code Hunt can! www.codehunt.com
40
Code Hunt
A serious programming game. It works in C# and Java (Python coming), and appeals to coders wishing to hone their programming skills as well as to students learning to code. Code Hunt has had over 300,000 users since launching in March 2014, with around 1,000 users a day. Stickiness (loyalty) is very high.
45
Gameplay
1. User writes code in the browser.
2. Cloud analyzes the code – test cases show differences from the secret code.
As long as there are differences, the user must adapt the code and repeat. When there are no more differences, the user wins the level!
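The gameplay loop above can be sketched in a few lines: the engine compares the player's attempt against a hidden reference on generated inputs and reports a mismatch as a failing test case until none remain. The puzzle, the naive input sampler, and all names below are invented for illustration; real Code Hunt derives inputs with IntelliTest/Pex rather than sampling.

```python
# Hedged sketch of the Code Hunt check: player code vs. hidden secret code.

def secret(x):
    # The hidden puzzle solution: the x-th triangular number.
    return x * (x + 1) // 2

def find_mismatch(player, inputs=range(-50, 51)):
    """Return a failing test case (input, got, expected), or None if solved."""
    for x in inputs:
        if player(x) != secret(x):
            return x, player(x), secret(x)
    return None  # no differences: level won

attempt1 = lambda x: x * x   # first guess: wrong
attempt2 = lambda x: sum(range(x + 1)) if x >= 0 else x * (x + 1) // 2

print(find_mismatch(attempt1))   # a concrete counterexample to show the player
print(find_mismatch(attempt2))   # None: matches the secret on all inputs
```

Showing only concrete failing inputs, never the secret code itself, is what turns the exercise into a guessing-and-refinement game.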
46
Dynamic Symbolic Execution

void CoverMe(int[] a) {
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 1234567890)
            throw new Exception("bug");
}

Execute & monitor, solve the flipped constraint, choose the next path:

Input        | Observed constraints                       | Constraint to solve → next input
null         | a==null                                    | a!=null → {}
{}           | a!=null && !(a.Length>0)                   | a!=null && a.Length>0 → {0}
{0}          | a!=null && a.Length>0 && a[0]!=1234567890  | a!=null && a.Length>0 && a[0]==1234567890 → {123…}
{123…}       | a!=null && a.Length>0 && a[0]==1234567890  | Done: there is no path left.
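The loop in the table can be sketched end to end in Python. This is a toy illustration of dynamic symbolic execution (the technique behind Pex/IntelliTest): run concretely, record the branch decisions, flip one decision, and "solve" for an input that takes the new path. The hand-rolled `solve` below stands in for a real SMT solver such as Z3, and all names are invented for illustration.

```python
# Hedged sketch of a DSE loop for the CoverMe example.

def cover_me(a):
    """Python port of the C# CoverMe method from the slide."""
    if a is None:
        return "early"
    if len(a) > 0:
        if a[0] == 1234567890:
            raise Exception("bug")
    return "ok"

def run_and_observe(a):
    """Concretely mirror cover_me and record the branch conditions taken."""
    path = []
    if a is None:
        path.append(("a==null", True))
        return path
    path.append(("a==null", False))
    if len(a) > 0:
        path.append(("len>0", True))
        path.append(("a0==magic", a[0] == 1234567890))
    else:
        path.append(("len>0", False))
    return path

def solve(constraints):
    """Tiny hand-rolled 'solver': map a constraint set to a concrete input.
    A real DSE engine hands these constraints to an SMT solver."""
    env = dict(constraints)
    if env.get("a==null", False):
        return None
    if not env.get("len>0", False):
        return []
    return [1234567890] if env.get("a0==magic") else [0]

# DSE loop: start from a seed input, then repeatedly flip a branch
# decision on an observed path to steer execution down a new path.
inputs, explored, worklist = [], set(), [None]
while worklist:
    a = worklist.pop()
    path = run_and_observe(a)
    if tuple(path) in explored:
        continue
    explored.add(tuple(path))
    inputs.append(a)
    for i in range(len(path)):
        flipped = path[:i] + [(path[i][0], not path[i][1])]
        worklist.append(solve(flipped))

# The generated inputs cover all four paths; one of them exposes the bug.
bugs = []
for a in inputs:
    try:
        cover_me(a)
    except Exception:
        bugs.append(a)
print(inputs)   # [None, [], [0], [1234567890]]
print(bugs)     # [[1234567890]]
```

Four runs suffice to cover every path, including the one that throws – exactly the four rows of the table above.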
47
Code Hunt – the APCS (default) zone
Opened in March 2014; 129 problems covering the Advanced Placement Computer Science course. By August 2014, over 45,000 users had started it.
48
Effect of difficulty on drop-off in sectors 1-3. [chart legend: yellow – division; blue – operators; green – sectors]
49
Aug 2014 and Feb 2015

Puzzle                               Level   Aug   Feb-A
Compute -X                           1.1     17    22
Compute 4 / X                        1.6     18    21
Compute X-Y                          1.7     18    22
Compute X/Y                          1.11    32    38
Compute X%3+1                        1.13    15    18
Compute 10%X                         1.14    12    16
Construct a list of numbers 0..N-1   2.1     37    48
Construct a list of multiples of N   2.2     19    23
Compute x^y                          3.1     11    18
Compute X! the factorial of X        3.2     16    19
Compute sum of i*(i+1)/2             3.5     17    22
50
Towards a Course Experience
51
Public data release in open source
For ImCupSept: 257 users × 24 puzzles × approx. 10 tries = about 13,000 programs, for experimentation on how people program and reach solutions.
Total Try Count / Average Try Count / Max Try Count / Total Solved / Users: 1337436313061581
github.com/microsoft/code-hunt
52
Upcoming events
PLOOC 2015 at PLDI 2015, June 14, 2015, Portland, OR, USA
CHESE 2015 at ISSTA 2015, July 14, 2015, Baltimore, MD, USA
Worldwide intern and summer school contests
Public Code Hunt contests are over for the summer
Special ICSE attendees contest – register at aka.ms/ICSE2015
Code Hunt workshop, February 2015
53
Summary – Code Hunt: A Game for Coding
1. Powerful and versatile platform for coding as a game.
2. Unique in working from unit tests, not specifications.
3. Contest experience fun and robust.
4. Large contest numbers with public data sets from cloud data – enables testing hypotheses and drawing conclusions about how players master coding, and what holds them up.
5. Has potential to be a teaching platform – collaborators needed.
57
Websites
Game: www.codehunt.com
Project: research.microsoft.com/codehunt
Community: research.microsoft.com/codehuntcommunity
Data release: github.com/microsoft/code-hunt
Blogs: linked on the Project page
Office Mix: mix.office.com
58
Conclusions
1. Software runs on hardware, and hardware is increasingly varied.
2. The hardware sector that is growing (mobile) is the trickiest.
3. Maintenance increases in complexity with the number of deployments.
4. Addressing human factors in large maintenance teams pays off.
5. Prevention is a hugely valuable aid to maintenance.
6. Gaming is a way of practicing software engineering skills.
Thank you! Questions?