In today's software development industry, high-level tests such as Graphical User Interface (GUI) based system and acceptance tests are mostly performed with manual practices that are often costly, tedious and error prone. Test automation has been proposed to solve these problems but most automation techniques approach testing from a lower level of system abstraction. Their suitability for high-level tests has therefore been questioned. High-level test automation techniques such as Record and Replay exist, but studies suggest that these techniques suffer from limitations, e.g. sensitivity to GUI layout or code changes, system implementation dependencies, etc.
Visual GUI Testing (VGT) is an emerging technique in industrial practice with perceived higher flexibility and robustness to certain GUI changes than previous high-level (GUI) test automation techniques. The core of VGT is image recognition which is applied to analyze and interact with the bitmap layer of a system's front end. By coupling image recognition with test scripts, VGT tools can emulate end user behavior on almost any GUI based system, regardless of implementation language, operating system or platform. However, VGT is not without its own challenges, problems and limitations (CPLs) but, like for many other automated test techniques, there is a lack of empirically based knowledge of these CPLs and how they impact industrial applicability. Crucially, there is also a lack of information on the cost of applying this type of test automation in industry.
This manuscript reports an empirical, multi-unit case study performed at two Swedish companies that develop safety-critical software. It studies their transition from manual system test cases into tests automated with VGT. In total, four different test suites that together include more than 300 high-level system test cases were automated for two multi-million lines of code systems. The results show that the transitioned test cases could find defects in the tested systems and that all applicable test cases could be automated. However, during these transition projects a number of hurdles had to be addressed; a total of 58 different CPLs were identified and then categorized into 26 types. We present these CPL types and an analysis of the implications for the transition to and use of VGT in industrial software development practice. In addition, four high-level solutions are presented that were identified during the study, which would address about half of the identified CPLs.
Furthermore, collected metrics on cost and return on investment of the VGT transition are reported together with information about the VGT suites' defect finding ability. Nine of the identified defects are reported, 5 of which were unknown to testers with extensive experience from using the manual test suites. The main conclusion from this study is that even though there are many challenges related to the transition and usage of VGT, the technique is still valuable, flexible and considered cost-effective by the industrial practitioners. The presented CPLs also provide decision support in the use and advancement of VGT and potentially other automated testing techniques similar to VGT, e.g. Record and Replay.