It seems that testing is the flavour of the month in business these days. Every presentation I go to talks about A/B split testing and multivariate Taguchi methods. The guiding principle of testing is a good one, but I think it gives some people the misguided notion that business is a purely deterministic process and that persistent testing provides an algorithm for success (or for quick, cheap failure, which is also good). There are some useful parallels between this testing mindset and the debate between empiricism and its critics.
What am I actually testing? The process seems simple enough: run A/B tests on your Google ads, your landing pages, your email blasts, your automated workflows and so on, eking out success one word change at a time. But how do you know you are isolating the one thing you want to test? How do you know you are not just locally optimising in totally the wrong place?
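For the mechanics at least, the arithmetic is straightforward. Here is a minimal sketch of the standard two-proportion z-test that sits underneath most A/B tools (the function name and the example numbers are my own invention, for illustration only):

```python
from math import sqrt, erf

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for an A/B split.

    conv_a, conv_b: conversions in each variant
    n_a, n_b: visitors shown each variant
    Returns (z_score, two_sided_p_value) under the normal approximation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)      # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf, then two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# A hypothetical test: 50/1000 conversions vs 65/1000
z, p = ab_significance(50, 1000, 65, 1000)
```

Note that a 30% relative lift on a thousand visitors per arm still comes out with a p-value of about 0.15, i.e. not "significant" by the usual 5% convention. The sums are easy; knowing what the number means for your business is the hard part.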
The empiricists and positivists thought the only source of knowledge is experience. It is a fundamental part of the scientific method that all hypotheses and theories must be tested against observations of the natural world, rather than resting solely on a priori reasoning, intuition, or revelation. Sounds reasonable. Quine illustrated problems with this view in “Two Dogmas of Empiricism”. He argued for a holistic theory of testing: you cannot understand a particular thing without looking at its place in a larger whole. Holism about testing says that we cannot test a single hypothesis in isolation; we can only test complex networks of claims and assumptions. To test one claim you need to make assumptions about many other things, e.g. measurement equipment, data quality and so on. So whenever you think you are testing a single idea, what you are really testing is a long, complicated conjunction of statements. If a test has an unexpected result, then something in that conjunction is false, but the failure of the test itself does not tell you where the error is.
Take the example of ‘test the business model over a period of one year’: the background assumptions and conjunction of interdependencies are legion. Two things can happen. You can declare that the model doesn’t work when the real cause of failure is a simple, easily changed element in the web of dependencies (a wrong pricing decision, for example), i.e. you get a false negative. Or you can ‘forgive’ a fundamental problem by blaming something else in the chain, i.e. a false positive. For any complex business decision the theory is always underdetermined by the available evidence: there will always be a range of alternative theories compatible with the same set of evidence. So what good is my test if it doesn’t tell me something definitive?
‘It didn’t work this time’ is different from ‘it doesn’t work’. People are also very keen on the notion of failing fast and failing cheap. Once again admirable, but how do you know when you have failed? Karl Popper thought science progressed by a process of falsification: because of the problem of induction you can never say that a general statement is true from a handful of observations, but you can say it is false if an observation contradicts it. The issue of underdetermination rears its head again: you can never force someone to logically conclude that a theory is false, because it may be a background assumption that is at fault. Falsification also struggles with probabilistic statements. Take the example of proton decay: some grand unified theories predict that the proton should decay, via new heavy X bosons. During the 1980s there were a lot of experiments and they never saw a proton decay. They were able to put a lower limit on the proton half-life of 6.6×10^33 years, but they were not able to say that it doesn’t decay. Most people may conclude that it doesn’t decay, but the key thing is that they have to choose to believe so; it does not follow logically from observation. Doing a split test on a low-volume search term feels a bit like waiting for proton decay.
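The proton-decay feeling is easy to reproduce numerically. Here is a rough Monte Carlo sketch (rates and traffic numbers are made up for illustration): variant B genuinely converts better than A, but on low-volume traffic the test only reaches significance a small fraction of the time, so "no significant result" tells you very little.

```python
import random
from math import sqrt, erf

def p_value(conv_a, conv_b, n):
    """Two-sided two-proportion z-test p-value, equal arms of size n."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = sqrt(max(p_pool * (1 - p_pool) * (2 / n), 1e-12))
    z = (conv_b - conv_a) / (n * se)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(1)
n, trials = 200, 2000                # 200 visitors per arm: a "low volume" term
significant = 0
for _ in range(trials):
    a = sum(random.random() < 0.05 for _ in range(n))  # A truly converts at 5%
    b = sum(random.random() < 0.07 for _ in range(n))  # B truly converts at 7%
    if p_value(a, b, n) < 0.05:
        significant += 1
power = significant / trials         # fraction of tests that "see the decay"
```

With these numbers only somewhere around one test in eight detects the real improvement; the rest look just like "it doesn't work". Not seeing the effect is not the same as the effect not being there.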
Now take an example like James Dyson: he made 5,126 prototypes of his vacuum cleaner before hitting the big time. Why did he not declare that he had failed quickly and cheaply after the first 10 tries? Often it is difficult to know whether you have the admirable quality of persistence or are just a nutter.
Putting things to the test is a good idea, but it only really works in a well-bounded context; most of the success stories come from web-based businesses with a user base large enough to derive useful conclusions. For the majority of businesses there will be other things that matter a great deal more. A business has a huge number of knobs you can turn; the only problem is that you can’t turn them all independently of each other. Basically, I don’t think people should spend a lot of their time obsessing over analytics. Doing things intuitively has served a lot of people well for a very long time. If anyone can figure out how to do an A/B split test on the ‘cut of your jib’, please let me know.
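As a postscript on what "a large enough user base" actually means, here is a back-of-the-envelope sketch using the textbook normal-approximation sample-size formula (the function name and the example rates are mine, purely illustrative):

```python
from math import ceil

def sample_size_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per arm to detect a shift in
    conversion rate from p1 to p2 at 5% significance with ~80% power
    (normal approximation; z_alpha and z_beta are the usual critical values)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 20% relative lift on a 5% baseline conversion rate
n = sample_size_per_arm(0.05, 0.06)
```

That works out at roughly 8,000 visitors per arm, around 16,000 in total, just to reliably spot one modest improvement. If your traffic is measured in hundreds, intuition may be all you have.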