It could be random…
You test everything. You want to isolate the best text, the best headline, the best creative. Which combination works best? You do the same with targeting, though I still question whether you should.
My biggest issue with this is when you overreact to results that may not mean anything. You test three different text versions, and one gets better results.
Why are the results better? Is the difference statistically significant, or is the sample size simply too small? Would you get the same results if you ran the same test 10 times?
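If you want a gut check on whether a gap between two ads means anything, a basic two-proportion z-test is one way to do it. Here's a minimal sketch in Python; the numbers are hypothetical, and this is a simplification that ignores things like multiple comparisons:

```python
from math import sqrt, erfc

def two_proportion_p_value(clicks_a, impressions_a, clicks_b, impressions_b):
    """Two-sided p-value for the difference between two click-through rates."""
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    # Pooled rate under the assumption that both ads actually perform the same
    p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
    z = (p_a - p_b) / se
    return erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal approximation

# Hypothetical results: Ad A "looks better," but is the difference real?
p = two_proportion_p_value(clicks_a=42, impressions_a=2000, clicks_b=31, impressions_b=2000)
print(f"p-value: {p:.3f}")  # lands well above 0.05 here, so this gap could easily be noise
```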
Randomness
Don’t discount the possibility of randomness in results.
I often wonder about this. Who is shown our ad is determined by a long list of complicated factors, and some of it is effectively random. Many factors impact not only whether a person is shown the ad, but whether they actually see it and act on it.
This shouldn’t be a surprise. But, once you start nitpicking results to isolate individual ads, headlines, and creative, that potential randomness becomes more impactful. The smaller the pool of data you’re acting on, the less reliable it is.
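To put a rough number on that, here's a hypothetical sketch of how the margin of error around a measured CTR grows as the number of impressions shrinks, using the standard normal-approximation confidence interval:

```python
from math import sqrt

MEASURED_CTR = 0.015  # hypothetical 1.5% click-through rate

# 95% margin of error for a proportion: 1.96 * sqrt(p * (1 - p) / n)
for impressions in (50_000, 5_000, 500):
    margin = 1.96 * sqrt(MEASURED_CTR * (1 - MEASURED_CTR) / impressions)
    print(f"{impressions:>6} impressions: 1.5% CTR, +/- {margin:.2%}")

# With 50,000 impressions the measured CTR is pinned down fairly tightly;
# with 500 impressions, anything from roughly 0.4% to 2.6% is consistent with the data.
```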
Do This
I encourage you to try something. Run a split test of three ad sets. Each one is exactly the same. The optimization, targeting, placements, ads, everything. They’re duplicates.
Will you get results that are exactly the same? Probably not. Will you get one ad set that noticeably outperforms the others? Maybe.
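Just to show what randomness alone can do, here's a hypothetical simulation: three "ad sets" with the exact same true click-through rate, each getting the same number of impressions. Run it a few times and you'll often see one "winner" that did nothing differently.

```python
import random

TRUE_CTR = 0.015      # every ad set has the exact same underlying performance
IMPRESSIONS = 3000    # hypothetical impressions per ad set

def simulate_ad_set() -> int:
    """Count clicks when each impression converts with the same probability."""
    return sum(1 for _ in range(IMPRESSIONS) if random.random() < TRUE_CTR)

for name in ("Ad Set A", "Ad Set B", "Ad Set C"):
    clicks = simulate_ad_set()
    print(f"{name}: {clicks} clicks ({clicks / IMPRESSIONS:.2%} CTR)")

# One possible run (yours will differ):
# Ad Set A: 38 clicks (1.27% CTR)
# Ad Set B: 52 clicks (1.73% CTR)
# Ad Set C: 44 clicks (1.47% CTR)
```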
If you get noticeably different results, what would that mean? What would it say about the decisions you've based on statistically insignificant results or small sample sizes?
I’m planning to try exactly this. Stay tuned…