
Master's Thesis Defense: Amir Hossein Bavand


Date & time
Friday, August 20, 2021
9:30 a.m. – 11:30 a.m.

Cost
This event is free

Where
Online

Candidate: Amir Hossein Bavand

Thesis Title: The Impact of Parallel and Batch Testing in Continuous Integration Environments

Date & Time: Friday, August 20, 2021 at 9:30 a.m.

Location: Zoom

Examining Committee:

Dr. Joey Paquet (Chair)
Dr. Peter Rigby (Supervisor)
Dr. Weiyi (Ian) Shang (Examiner)
Dr. Joey Paquet (Examiner)

Abstract

Testing is a costly, time-consuming, and challenging part of modern software development. In continuous integration, each submitted change is tested automatically to ensure that it does not break the system’s functionality. A common approach to reducing the number of test case executions is to batch changes together for testing. For example, given four changes to test, if we group them into a single batch and the batch passes, one execution has tested all four changes. However, if the batch fails, additional executions are required to find the culprit change responsible for the failure.
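
To make the idea concrete, here is a minimal sketch of batch testing with the simplest possible culprit search (retest every change individually when the batch fails). The `run_tests` oracle and the change records are hypothetical illustrations, not the thesis's implementation.

```python
# Minimal sketch of batch testing with a naive culprit search.
# run_tests() is a hypothetical stand-in for one CI test execution.

def run_tests(changes):
    """One test execution over the combined changes; True if all pass."""
    return all(change["passes"] for change in changes)

def batch_test(changes):
    """Test a batch with one execution; on failure, retest each change alone."""
    executions = 1
    if run_tests(changes):
        return executions          # one execution cleared the whole batch
    # The batch failed: fall back to one execution per change to find
    # the culprit(s).
    executions += len(changes)
    return executions

# Four passing changes cost a single execution instead of four.
changes = [{"id": i, "passes": True} for i in range(4)]
print(batch_test(changes))  # -> 1
```
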

In this study, we first investigate the impact of batch testing at the build level. We evaluate five batch culprit finding approaches: Dorfman, double pool testing, BatchBisect, BatchStop4, and our novel BatchDivide4.
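
As a rough illustration of one of these strategies, BatchBisect-style culprit finding recursively splits a failing batch in half until the failing changes are isolated. The sketch below is only in the spirit of that approach; `run_tests` and the fault flags are hypothetical.

```python
# Hedged sketch of bisection-style culprit finding (in the spirit of
# BatchBisect). run_tests() is a hypothetical one-execution test oracle.

def run_tests(changes):
    """One test execution; fails if any change in the batch is faulty."""
    return not any(change["faulty"] for change in changes)

def batch_bisect(changes):
    """Return (culprits, executions) for a batch of changes."""
    executions = 1
    if run_tests(changes):
        return [], executions
    if len(changes) == 1:
        return list(changes), executions   # isolated a culprit
    mid = len(changes) // 2
    left_culprits, left_execs = batch_bisect(changes[:mid])
    right_culprits, right_execs = batch_bisect(changes[mid:])
    return left_culprits + right_culprits, executions + left_execs + right_execs

changes = [{"id": i, "faulty": i == 2} for i in range(8)]
culprits, executions = batch_bisect(changes)
print([c["id"] for c in culprits], executions)  # -> [2] 7
```
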

All prior work on batching uses a constant batch size. In this work, we propose a dynamic batch size technique based on the weighted historical failure rate of the project.
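
The abstract does not spell out the weighting, so the following is only a plausible sketch: an exponentially weighted failure rate over recent builds, with the batch size shrinking as that rate rises. The decay factor, the size bound, and the mapping from rate to size are assumptions for illustration.

```python
# Hedged sketch of a dynamic batch size driven by a weighted historical
# failure rate. The exponential weighting and the size bounds are
# illustrative assumptions, not the thesis's exact formula.

def weighted_failure_rate(build_outcomes, decay=0.8):
    """Exponentially weighted failure rate; recent builds weigh more.

    build_outcomes: booleans, True = build failed, ordered oldest first.
    """
    weight, total_weight, weighted_failures = 1.0, 0.0, 0.0
    for failed in reversed(build_outcomes):       # newest build first
        weighted_failures += weight * (1.0 if failed else 0.0)
        total_weight += weight
        weight *= decay
    return weighted_failures / total_weight if total_weight else 0.0

def dynamic_batch_size(build_outcomes, max_size=16):
    """Shrink the batch as the recent failure rate grows."""
    rate = weighted_failure_rate(build_outcomes)
    # High failure rates make large batches likely to fail and need
    # culprit finding, so fall back toward testing changes individually.
    return max(1, round(max_size * (1.0 - rate)))

history = [False, False, True, False, True]
print(dynamic_batch_size(history))  # smaller batches when failures are recent
```
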

We simulate each of the batching strategies across 12 large projects on Travis with varying failure rates. We find that dynamic batching coupled with BatchDivide4 outperforms the other approaches. Compared to TestAll, this approach decreases the number of executions by 47.49% on average across the Travis projects. It outperforms the current state-of-the-art constant batch size approach, i.e., Batch4, by 5.17 percentage points.

Our historical weighting approach leads us to a metric, FailureSpread, that describes the number of consecutive build failures. We find that the correlation between batch savings and FailureSpread is r = -0.97 with p << 0.0001. This metric allows developers to easily determine the potential of batching on their project.
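
The exact definition of FailureSpread is not given in the abstract; as a loosely related, hypothetical proxy, one could measure the lengths of consecutive-failure runs in the build history, as sketched below.

```python
# Hypothetical proxy for "consecutive build failures": the lengths of
# maximal runs of failing builds. The thesis's FailureSpread metric may
# be defined differently.

def failure_runs(build_outcomes):
    """Lengths of maximal runs of consecutive failing builds.

    build_outcomes: booleans, True = build failed, ordered oldest first.
    """
    runs, current = [], 0
    for failed in build_outcomes:
        if failed:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

history = [False, True, True, True, False, True, False]
print(failure_runs(history))  # -> [3, 1]
```
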

However, we then show that when a batch fails, re-running all of the test cases is inefficient. Moreover, for companies with notable resource constraints, e.g., Ericsson, running all of the tests on a single machine is neither realistic nor possible. To address these issues, we extend our work to an industrial application at Ericsson.

We first evaluate the effect of parallel testing on a project at Ericsson. We find that the relationship between the number of machines available for parallelization and the FeedbackTime is nonlinear. For example, increasing the number of machines by 25% can reduce the FeedbackTime by 53%.

We then examine three batching strategies at the test level: ConstantBatching, TestDynamicBatching, and TestCaseBatching, and we evaluate their performance while varying the number of parallel machines. For ConstantBatching, we experiment with batch sizes from 2 to 32. The majority of the saving is achieved with batch sizes smaller than 8; however, ConstantBatching increases the feedback time when more than 6 parallel machines are available.

To solve this problem, we propose TestDynamicBatching, which batches all of the queued changes whenever resources become available. Compared to TestAll, TestDynamicBatching reduces the AvgFeedback time by between 15.78% and 80.38% and the AvgCPU time by between 3.13% and 48.78%, depending on the number of machines.

Batching all of the changes in the queue can increase the test scope. To address this issue, we propose TestCaseBatching, which performs batching at the level of individual tests instead of changes. TestCaseBatching reduces the AvgFeedback time by between 19.84% and 84.20% and the AvgCPU time by between 5.65% and 50.92%, depending on the number of machines available for parallel testing. TestCaseBatching is highly effective, and we hope other companies will adopt it.
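
As a rough sketch of the scheduling idea behind TestDynamicBatching, the simulation below batches every change that is queued by the time a machine frees up. The arrival times, fixed test duration, and machine pool are hypothetical assumptions, not Ericsson's setup.

```python
# Hedged sketch of the TestDynamicBatching idea: whenever a machine becomes
# free, every change currently in the queue is tested as one batch.
# Arrival times and the one-unit test duration are illustrative only.

import heapq

def simulate_dynamic_batching(arrival_times, machines=2, test_duration=1.0):
    """Return the batches formed when each free machine drains the queue."""
    free_at = [0.0] * machines          # time each machine next becomes free
    heapq.heapify(free_at)
    pending = sorted(arrival_times)     # changes not yet tested, oldest first
    batches = []
    while pending:
        now = heapq.heappop(free_at)    # earliest available machine
        if now < pending[0]:
            now = pending[0]            # machine idles until a change arrives
        # Batch every change that has arrived by the time the machine is free.
        batch = [t for t in pending if t <= now]
        pending = pending[len(batch):]
        batches.append(batch)
        heapq.heappush(free_at, now + test_duration)
    return batches

arrivals = [0.0, 0.2, 0.4, 1.1, 1.2, 2.5]
print(simulate_dynamic_batching(arrivals, machines=1))
# -> [[0.0], [0.2, 0.4], [1.1, 1.2], [2.5]]
```
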
