AI Can Substitute Developers but Not Testers

AI will replace many jobs. We show that AI is not a good test designer: it produces wrong test cases and also omits necessary ones. However, the gap between manual testers and AI is small, so testers should improve their test design skills, or better still, apply new and more efficient test design techniques.

It’s a common belief that testing is easier than implementing code. A musician, police officer, or teacher can become a tester in a very short time. Companies require passing the ISTQB Foundation Level exam, and that’s all. If you know how to pass exams, within a few weeks you are a tester and can get a job. In addition, a tester should know a test management tool with some bug-tracking features, which can be learned in two weeks at most. Finally, some soft skills are necessary, but most jobs require soft skills anyway.

Developers, on the other hand, need a lot of work to master the following:

Programming languages with frameworks and developer tools
Data Structures and Algorithms
Source Control (Git)
IDEs (IntelliJ IDEA or VS Code)
Database and SQL
Networking Basics

It takes five years to be a good senior developer.

This suggests that a developer’s work is more difficult than a tester’s. If so, you might think that testers are easier to replace with AI. However, this is false. AI is a very good developer: given a good prompt, it can generate excellent code. I tried only small tasks, but the AI’s code was more elegant and bug-free.

Interestingly, however, AI is not good at software testing. When I provide the requirements, AI sometimes misunderstands them. Maybe the requirements are incomplete or ambiguous, but testers understand the same requirements without difficulty. AI usually creates one test case per requirement; even when a single test case could cover several requirements, AI creates separate ones. The biggest problem is that AI is very bad at test design, so important test cases are missed.

Here is an example:

Requirements for Extra holiday

R1 An employee can do overtime (overtime++).

R2 Doing overtime three times results in a free day (overtime = 0, free day++).

R3 An employee can take a free day by pressing – (free day--).

R4 If an employee is late, then one overtime is withdrawn (overtime--, if overtime = 0 → free day--, overtime = 2).

R5 If an employee takes three free days without lateness, then for the subsequent one-time lateness her accepted overtime remains unchanged.

R6 If an employee is late and overtime ≤ 0 and free days ≤ 0 then two overtimes are withdrawn (overtime = -2, free day = 0 → overtime = -1, free day = -1).

R7 If the number of free days is reduced to minus three, then GAME OVER.
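To make the rules easier to follow, here is a minimal Python sketch of R1 to R4, R6 and R7. It is one possible reading of the requirements, not a reference implementation. Assumptions: overtime and free days are kept as a single balance of overtime units (one free day equals three overtimes), which reproduces R2, the borrowing rule of R4 and the R6 example; R6 applies instead of R4 whenever overtime and free days are both ≤ 0; a free day can only be taken if one is available; R5 is not modelled. The class and method names are hypothetical.

```python
class ExtraHoliday:
    """One possible model of R1-R4, R6 and R7 (R5 is not modelled).

    Assumption: overtime and free days are kept as one balance of overtime
    units, where one free day equals three overtimes.
    """

    GAME_OVER_FREE_DAYS = -3  # R7

    def __init__(self) -> None:
        self.units = 0  # 3 * free_days + overtime

    @property
    def free_days(self) -> int:
        # Truncate toward zero so overtime keeps the sign of the balance:
        # -4 units -> free_days = -1, overtime = -1 (the R6 example).
        sign = 1 if self.units >= 0 else -1
        return sign * (abs(self.units) // 3)

    @property
    def overtime(self) -> int:
        return self.units - 3 * self.free_days

    @property
    def game_over(self) -> bool:
        return self.free_days <= self.GAME_OVER_FREE_DAYS

    def do_overtime(self) -> None:
        # R1 and R2: the third overtime becomes a free day via the balance.
        self.units += 1

    def take_free_day(self) -> None:
        # R3, plus the assumed rule that only an available free day can be taken.
        if self.free_days > 0:
            self.units -= 3

    def be_late(self) -> None:
        if self.overtime <= 0 and self.free_days <= 0:
            self.units -= 2  # R6: two overtimes are withdrawn
        else:
            self.units -= 1  # R4: at overtime 0 this borrows from a free day


employee = ExtraHoliday()
employee.be_late()
print(employee.overtime, employee.free_days)  # -2 0
employee.be_late()
print(employee.overtime, employee.free_days)  # -1 -1, as in the R6 example
```

Keeping a single balance avoids separate carry and borrow logic for R2, R4 and R6; under this reading, five latenesses in a row from an empty account reach minus three free days and trigger R7.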

AI (here Copilot) generated the following 8 test cases:

Test Case 1: Overtime Accumulation (R1)

Precondition: Employee has 0 overtime.
Action: Employee does overtime.
Expected Result: Employee’s overtime increases by 1.

Test Case 2: Free Day Award (R2)

Precondition: Employee has done overtime twice (overtime = 2).
Action: Employee does overtime again.
Expected Result: Employee’s overtime resets to 0, and free day increases by 1.

Test Case 3: Taking a Free Day (R3)

Precondition: Employee has at least 1 free day.
Action: Employee takes a free day.
Expected Result: Employee’s free day decreases by 1.

Test Case 4: Lateness Penalty (R4)

Precondition: Employee has 1 overtime and no free days.
Action: Employee is late.
Expected Result: Employee’s overtime decreases by 1.

Test Case 5: Lateness with Sufficient Overtime (R4)

Precondition: Employee has 3 overtime and no free days.
Action: Employee is late.
Expected Result: Employee’s overtime decreases by 1 (overtime = 2).

Test Case 6: Unchanged Overtime After Free Days (R5)

Precondition: Employee takes 3 free days without being late, then is late once.
Action: Employee is late.
Expected Result: Overtime remains unchanged.

Test Case 7: Double Penalty for Lateness (R6)

Precondition: Employee has -1 overtime and no free days.
Action: Employee is late.
Expected Result: Employee’s overtime decreases by 2 (overtime = -3).

Test Case 8: Game Over Scenario (R7)

Precondition: Employee has -2 free days.
Action: Employee takes another free day.
Expected Result: GAME OVER!

Some test cases are wrong. In test case 5, the precondition ‘Employee has 3 overtime and no free days.’ is impossible, as 3 overtimes are automatically converted into a free day. In test case 6, the precondition already includes one lateness, and the action (input) is another lateness. The expected result is wrong, because this second lateness does reduce the overtime. Test case 8 has the action ‘Employee takes another free day’, which is impossible with -2 free days: free days can only become more negative by adding lateness.

For the wrong test cases, we followed the preconditions and actions as written; for example, in test case 5 we added 3 overtimes.

The AI-generated test cases detected only 65% of the potential errors. To insert the errors, we used our mutation framework with intelligent mutants.
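The mutation framework itself is not described here, so the following is only a generic sketch of how mutants can measure the detection percentage of a test set. The reference function `do_overtime` (R1 and R2) and the four mutants are hypothetical and hand-written for illustration; each mutant injects one plausible fault, and a mutant counts as detected ("killed") if any test input gives a result different from the reference.

```python
from typing import Callable

# A state is (overtime, free_days); an action maps one state to the next.
State = tuple[int, int]
Action = Callable[[State], State]


def do_overtime(state: State) -> State:
    """Hypothetical reference behaviour for R1/R2."""
    overtime, free_days = state
    overtime += 1
    if overtime == 3:  # R2: the third overtime becomes a free day
        return (0, free_days + 1)
    return (overtime, free_days)


# Hand-written mutants: each injects one plausible fault.
def mutant_early_conversion(state: State) -> State:
    overtime, free_days = state
    overtime += 1
    if overtime == 2:  # fault: converts after two overtimes
        return (0, free_days + 1)
    return (overtime, free_days)


def mutant_no_reset(state: State) -> State:
    overtime, free_days = state
    overtime += 1
    if overtime == 3:
        return (overtime, free_days + 1)  # fault: overtime is not reset to 0
    return (overtime, free_days)


def mutant_no_free_day(state: State) -> State:
    overtime, free_days = state
    return (overtime + 1, free_days)  # fault: R2 is never applied


def mutant_ignores_debt(state: State) -> State:
    overtime, free_days = state
    overtime = max(overtime, 0) + 1  # fault: negative overtime is dropped
    if overtime == 3:
        return (0, free_days + 1)
    return (overtime, free_days)


MUTANTS: list[Action] = [
    mutant_early_conversion,
    mutant_no_reset,
    mutant_no_free_day,
    mutant_ignores_debt,
]


def detection_percentage(start_states: list[State]) -> float:
    """Percentage of mutants whose result differs from the reference
    on at least one start state."""
    killed = sum(
        any(mutant(s) != do_overtime(s) for s in start_states)
        for mutant in MUTANTS
    )
    return 100 * killed / len(MUTANTS)


# Start states of the AI's test cases 1 and 2 (overtime 0 and 2, no free days).
ai_tests = [(0, 0), (2, 0)]
print(detection_percentage(ai_tests))              # 75.0
print(detection_percentage(ai_tests + [(-2, 0)]))  # 100.0
```

With only the start states of test cases 1 and 2, the mutant that mishandles negative overtime survives; one extra test from a negative state, such as the R6 starting point, kills it. This illustrates how a missing test case lets a fault go undetected.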

The biggest problem with AI is test design. AI tests what the requirements state, but it should also test the cases in which the condition of a requirement doesn’t hold. If something is not explicitly stated in the requirements, testers know it and test it, but AI doesn’t. Here are some examples:

R5 should also be tested for the case when lateness occurs before all three free days have been taken. The related test is:

Test case 9: Add 9 overtimes, take one free day, add lateness, add one overtime, take 2 free days, and finally add lateness again. Output: the second lateness decreases the overtime or the number of free days.

This test takes 3 free days in total, but with a lateness in between.

R5 should be tested with three test cases. Besides test case 9, we should test that taking 3 free days without lateness results in one lateness without punishment. We also have to test that the next lateness after that is punished by reducing overtime or free days. As test case 6 is wrong, we don’t know which of these two test cases the AI intended to create, but we know that it created only one of them.
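The three R5 tests can be written compactly against a small tracker for the R5 rule alone. This is a hypothetical sketch: `LatenessExemption` and the helper `run` are illustrative names, and it is assumed that only free days taken since the last lateness count, that any lateness (exempt or not) restarts the count, that overtime does not affect R5, and that R3 (a free day must be available) is checked elsewhere.

```python
class LatenessExemption:
    """Hypothetical tracker for R5 only.

    Assumptions: only free days taken since the last lateness count, any
    lateness (exempt or not) restarts the count, and the caller has already
    checked R3 (a free day is actually available).
    """

    def __init__(self) -> None:
        self.free_days_since_lateness = 0

    def take_free_day(self) -> None:
        self.free_days_since_lateness += 1

    def late(self) -> bool:
        """Register a lateness; return True if it must be punished."""
        exempt = self.free_days_since_lateness >= 3
        self.free_days_since_lateness = 0  # any lateness restarts the count
        return not exempt


def run(events: str) -> list[bool]:
    """Replay 'F' (take a free day) and 'L' (late); report each punishment."""
    tracker = LatenessExemption()
    punished = []
    for event in events:
        if event == "F":
            tracker.take_free_day()
        elif event == "L":
            punished.append(tracker.late())
    return punished


# The three R5 tests described above.
assert run("FFFL") == [False]         # 3 free days, then lateness: no penalty
assert run("FFFLL") == [False, True]  # the next lateness is punished
assert run("FLFFL") == [True, True]   # test case 9: lateness in between
```

The overtimes of test case 9 are left out of the event string because, under the assumptions above, they do not influence R5.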

It’s obvious but not stated that a free day can be taken only if free days are available. This should be tested by the following test case:

Test case 10: With no free days available, take one free day. Output: the number of free days remains 0.

A good tester would add these test cases. The following table compares three approaches on three applications. It shows that manual testers are better than or equal to the AI tester. DDP is the Defect Detection Percentage; besides GPT-4 and manual testing, we also included our professional method, action-state test design.

Application (DDP, %)	GPT-4	Manual testing	Action-state testing
Pizza	47	47	100
Extra holiday	60	80	95
Car rental	67	70	100
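The claim that manual testing is better than or equal to GPT-4 can be checked directly against the published numbers. The snippet below only does arithmetic on the DDP values from the table above (per-approach averages and a per-application comparison); the variable names are illustrative.

```python
# DDP values in percent, copied from the table above.
# Order of each list: Pizza, Extra holiday, Car rental.
DDP = {
    "GPT-4": [47, 60, 67],
    "manual testing": [47, 80, 70],
    "action-state testing": [100, 95, 100],
}


def mean(values: list[int]) -> float:
    return sum(values) / len(values)


# Average DDP per approach across the three applications.
averages = {approach: mean(values) for approach, values in DDP.items()}

# Check the claim: manual testing is >= GPT-4 for every application.
manual_never_worse = all(
    manual >= ai for manual, ai in zip(DDP["manual testing"], DDP["GPT-4"])
)

for approach, avg in averages.items():
    print(f"{approach}: {avg:.1f}%")
print("manual >= GPT-4 for every application:", manual_never_worse)
```

On these numbers the averages are 58.0% for GPT-4, 65.7% for manual testing and 98.3% for action-state testing, and manual testing is never below GPT-4 (it ties on Pizza).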

The conclusion is that even if testers’ jobs seem simpler, we testers don’t need to worry about being replaced by AI. However, we need to know test design better and apply more efficient test design techniques. To do this, we suggest reading these two books:

Practical Test Design

Modern Software Testing Techniques

and using Harmony.
