How not to run an A/B test

injb · on July 11, 2023

If I understand correctly, the issue is when you track the result as a rolling value and stop the test the moment it first crosses the significance threshold.
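One way to see how much this matters is a small simulation. This is a minimal sketch under stated assumptions, not the article's method. It assumes an A/A test, where both variants share the same true 10% conversion rate, so every "significant" result is a false positive. It also assumes a two-proportion z-test recomputed after every batch of visitors. The function names and parameters (`run_aa_test`, `false_positive_rate`, 20 batches of 50 visitors per arm, alpha = 0.05) are hypothetical choices for illustration.

```python
import math
import random

def two_sided_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # all or no conversions so far: no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def run_aa_test(rate, batches, batch_size, alpha, peek, rng):
    """Return True if one simulated A/A test gets declared 'significant'."""
    conv_a = conv_b = n = 0
    for _ in range(batches):
        for _ in range(batch_size):
            conv_a += rng.random() < rate  # a True comparison counts as 1 conversion
            conv_b += rng.random() < rate
        n += batch_size
        if peek and two_sided_p(conv_a, n, conv_b, n) < alpha:
            return True  # stop the moment the rolling p-value crosses alpha
    # Without peeking, judge only once, after all batches are in.
    return two_sided_p(conv_a, n, conv_b, n) < alpha

def false_positive_rate(peek, trials=1000, seed=1):
    """Share of A/A tests wrongly declared significant (hypothetical settings)."""
    rng = random.Random(seed)
    hits = sum(
        run_aa_test(rate=0.10, batches=20, batch_size=50,
                    alpha=0.05, peek=peek, rng=rng)
        for _ in range(trials)
    )
    return hits / trials

print("check once at the end:  ", false_positive_rate(peek=False))
print("stop at first p < 0.05: ", false_positive_rate(peek=True))
```

With a single look at the end, the false positive rate should land near the nominal 5%. Stopping at the first crossing out of 20 looks should push it well above that, even though the two variants are identical.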

So, for instance, say your test is to roll a die 6 times. You say "if I get a 6 on 100% of the rolls, then I'll consider that significant". You do the first roll and get a 6. Well, so far that's a 6 on 100% of the rolls, so you end the test and claim a significant result.
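The dice example can be checked exactly. This sketch assumes a fair die, so each roll is a 6 with probability 1/6, and it treats "significant" as "the share of 6s reaches the threshold". The function `prob_declared_significant` and its parameters are made up for illustration. It enumerates every six/not-six pattern for the 6 rolls and adds up the probability of the patterns that get declared a success.

```python
from fractions import Fraction
from itertools import product

P_SIX = Fraction(1, 6)  # assumption: a fair die

def prob_declared_significant(n_rolls, threshold, stop_early):
    """Exact probability that the test is declared 'significant'.

    threshold:  required share of sixes (1 means 100% of the rolls).
    stop_early: if True, stop at the first roll where the running share
                of sixes reaches the threshold; otherwise judge only
                after all n_rolls.
    """
    total = Fraction(0)
    for outcome in product((True, False), repeat=n_rolls):  # True = rolled a 6
        # Probability of this particular six/not-six pattern.
        prob = Fraction(1)
        for is_six in outcome:
            prob *= P_SIX if is_six else 1 - P_SIX
        sixes_so_far = 0
        success = False
        for i, is_six in enumerate(outcome, start=1):
            sixes_so_far += is_six
            if stop_early and Fraction(sixes_so_far, i) >= threshold:
                success = True  # peeked, liked what we saw, stopped
                break
        if not stop_early:
            success = Fraction(sixes_so_far, n_rolls) >= threshold
        if success:
            total += prob
    return total

print(prob_declared_significant(6, 1, stop_early=False))  # 1/46656
print(prob_declared_significant(6, 1, stop_early=True))   # 1/6
```

Judged after all six rolls, the rule fires only when every roll is a 6, a chance of 1 in 46,656. Stopping early, it fires whenever the first roll is a 6, which is 1 in 6: the same rule becomes thousands of times easier to "pass" just by peeking.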