Experimentation theory
For any founders shivering as they read this, thinking I’m about to take you back to high school chemistry class: not all validation experiments take place with test tubes in a lab (unless you’re working on something like a biotech startup, that is).
When I talk about an experiment, I’m simply referring to a situation in which you try something new and see whether it has a positive or negative effect on the environment you’ve been observing. If you try something new and the variable you’ve been monitoring changes noticeably, and you’ve been working in a good experimental environment (no other major changes), chances are the new thing caused the effect you just witnessed. That’s the gist of an experiment.
In his highly influential book, The Lean Startup, Eric Ries outlines an experimental method called Build, Measure, Learn. You run an experiment by building something new, making sure you can measure the effect this change will have, and then learning from whether there has or hasn’t been any noticeable change. If there has been a noticeable change and it was positive, run more experiments in that direction; if not, move away from it.
This approach matters because it suggests that you can build an entire product by introducing one new feature at a time, using the results of each experiment to decide whether to keep it. After a year of iterating on this process, you will have a product in which almost every remaining feature has been heavily validated. Build it all in one go instead, and you may be unable to pinpoint why no one is using your product.
Even though each iteration of the method plays out in the order of building, measuring, and learning, the thought process leading up to the experiment (the experimental design) actually runs in reverse: Learn, Measure, Build.
Learn: You first need to decide which question you want to answer. The question you are answering could speak to desirability, which is what Ries calls the ‘value hypothesis’ and answers whether people would want to buy/use what you are building. It could speak to feasibility, which answers whether you can even build the product and any technology it relies upon that does not already exist. Or it could speak to viability, which Ries calls the ‘growth hypothesis’, and answers whether you could build a sustainable business (revenue > costs) out of your idea.
Measure: Once you know what you want to learn, you need to identify a metric (read: number) that you can track throughout your experiment. Decide in advance what a change in this number would mean for your question: some changes will validate your hypothesis, some will invalidate it, and some will be too small to conclude anything. The metric also needs to be relevant to the question you are answering, rather than what we call a ‘vanity metric’. If you are introducing a change to see whether sales will increase, focus on how the change has affected your sales funnel (ad clicks, website visits, call-to-action clicks, people reaching checkout, etc.), not on loosely related numbers such as Instagram followers.
Build: This word could easily be replaced by ‘develop’, ‘do’, or ‘change’, as you won’t always be working on new features. In this stage you do something new, hoping it will produce a noticeable change in the thing you are measuring. This could be building out an entirely new part of your business, a change in strategy, a difference in the way you treat your employees, a major or minor feature, or anything else suspected to have a causal relationship with your chosen metric. Later, in the case studies, you will see that the first thing you have to build in a business is sometimes the hardest and riskiest of them all, because it tests your leap-of-faith assumption. Once that assumption is validated, though, you have a good foundation on which to make smaller, incremental changes that can each be measured and analyzed.
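To make the Measure step more concrete, here is a minimal sketch of tracking a sales funnel step by step instead of watching a vanity metric. The stage names follow the funnel described under Measure, but every count is a made-up illustration, and `step_conversion_rates` is a hypothetical helper rather than part of any analytics tool. Comparing each step’s conversion rate before and after a change shows where the change actually had its effect.

```python
# Hypothetical weekly counts for each funnel stage, before and after a change.
# All numbers are illustrative, not real data.
before = [
    ("ad clicks", 2000),
    ("website visits", 1500),
    ("call-to-action clicks", 300),
    ("reached checkout", 90),
    ("purchases", 45),
]
after = [
    ("ad clicks", 2000),
    ("website visits", 1500),
    ("call-to-action clicks", 360),
    ("reached checkout", 108),
    ("purchases", 54),
]


def step_conversion_rates(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty upstream stage to avoid dividing by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


# Compare each step of the funnel before and after the change.
for (src, dst, r_before), (_, _, r_after) in zip(
    step_conversion_rates(before), step_conversion_rates(after)
):
    print(f"{src} -> {dst}: {r_before:.1%} -> {r_after:.1%}")
```

With these illustrative numbers only the visit-to-call-to-action step moves (from 20% to 24%), while the later steps convert at the same rates, so the extra purchases trace back to that single step.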
Here’s an example: your subscription-based startup has high user churn (say, a monthly churn rate above 50%), and most churned customers contacted support a couple of times before canceling their subscriptions. Your hypothesis is that improving the quality of your support team would reduce churn. You run a CX (customer experience) workshop for your support team and then monitor churn for the next month. If churn for that month is significantly lower than the average churn of the previous three months, you deem the experiment a success.
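Whether churn is “significantly lower” can be checked with a standard statistical test. The sketch below uses a one-sided two-proportion z-test, which is one reasonable choice among several; the function name `churn_drop_test` and every number in it are hypothetical, and it assumes each customer-month is an independent trial so that the three baseline months can be pooled into one sample.

```python
from math import sqrt
from statistics import NormalDist


def churn_drop_test(base_churned, base_customers, test_churned, test_customers, alpha=0.05):
    """One-sided two-proportion z-test: is churn in the test month lower than the baseline?

    Simplifying assumption: every customer-month is treated as an independent trial,
    so the three baseline months are pooled into a single sample.
    """
    p_base = base_churned / base_customers
    p_test = test_churned / test_customers
    # Pooled churn rate under the null hypothesis that the workshop changed nothing.
    pooled = (base_churned + test_churned) / (base_customers + test_customers)
    se = sqrt(pooled * (1 - pooled) * (1 / base_customers + 1 / test_customers))
    z = (p_test - p_base) / se
    # Lower-tail probability: a small value means the drop is unlikely to be chance.
    p_value = NormalDist().cdf(z)
    return p_base, p_test, p_value, p_value < alpha


# Hypothetical numbers: 330 of 600 customer-months churned over the previous
# three months (55%), versus 90 of 200 customers in the month after the workshop (45%).
p_base, p_test, p_value, success = churn_drop_test(330, 600, 90, 200)
print(f"baseline {p_base:.0%}, test month {p_test:.0%}, p = {p_value:.3f}, success = {success}")
```

With these made-up numbers the p-value is well under 0.05, so the drop would count as significant at the 5% level. A before-and-after comparison like this still relies on there being no other major changes (seasonality, a price change, a new competitor) during the test month.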
Hopefully this demystifies experiments a little. They’re not a brand-new type of activity for a founder. Many founders already try new approaches when things aren’t going well; the difference here is how the new approach is framed. You make sure there is a clear hypothesis linked to something measurable, then you make a specific change over a defined period so that, afterwards, you can compare the metric before and after the change. Only when these conditions are met can a founder’s new idea be called an experiment.
At The Delta, we always design our experiments using assets called ‘test cards’, where we state: our hypothesis (what we want to learn), our steps to validate the hypothesis (what we will build), and then the metric we will track and the success criteria for it (what we will measure).
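A test card maps naturally onto a small data structure. The following is a hypothetical Python representation with one field per part of the card; the class name `TestCard`, its field names, and the fixed success threshold are assumptions for illustration, not The Delta’s actual template. A fixed threshold is a simplification: in the churn example above, the real criterion is a significant drop against a three-month average.

```python
from dataclasses import dataclass


@dataclass
class TestCard:
    """Hypothetical structure mirroring the parts of a test card."""
    hypothesis: str                # what we want to learn
    steps: list[str]               # what we will build
    metric: str                    # what we will measure
    success_threshold: float       # success criterion for the metric
    lower_is_better: bool = False  # e.g. True for churn, False for sales

    def is_success(self, observed: float) -> bool:
        """Check an observed metric value against the success criterion."""
        if self.lower_is_better:
            return observed <= self.success_threshold
        return observed >= self.success_threshold


# Example card for the (hypothetical) churn experiment.
card = TestCard(
    hypothesis="Improving the quality of our support team will reduce churn.",
    steps=["Run a CX workshop for the support team", "Monitor churn for one month"],
    metric="monthly churn rate",
    success_threshold=0.45,
    lower_is_better=True,
)
print(card.is_success(0.42))
```

Writing the threshold down before the experiment starts is the point of the card: it stops the team from redefining success after seeing the results.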
There are some important comments I’d like to make about how to run experiments correctly in your startup:
On evidence: Not all experiments are created equal. It should come as no surprise that a person saying they would buy something is very different from them actually buying something. There are many factors that can affect human behavior, such as kinship with the seller, knowing they are under experimental conditions, knowing that they don’t have to make a purchasing decision right now, etc.
So whenever you run an experiment, you need a solid grasp of how strong the evidence it yields will be. There’s nothing better than a paying customer; it’s the highest form of validation. But if getting a paying customer requires building an early product, and you are trying to validate your general product direction before building anything, you have to compromise and run weaker-evidence experiments that still indicate whether there is merit in what you want to build.
For example, imagine showing customers an explainer video and then asking them to sign up as a beta tester. This will at least show you that someone is willing to give you their email address and be contacted in the future when you have built your product. While handing over an email is different from paying money, it's certainly better than a flat-out refusal to engage with your idea.
To work out the strength of evidence for the experiment you intend to run, I recommend thinking about the quality and quantity of your data.
Quality speaks to how much the behaviour you have witnessed signals genuine desirability, feasibility, or viability. Handing over an email signals more desirability than a verbal promise to follow up on the progress of the product. Developing a PoC (Proof-of-Concept) of the product signals more feasibility than finding a research paper that says such a product is possible. Having customers organically refer their friends to your website signals more viability than every website visitor having come from you sending the link out to friends and family.
Quantity speaks to the number of data points you have gathered. It goes without saying that one customer will convince an investor less than one hundred customers. The positive feedback of one beta tester might not hold across a group of ten; in fact, it might be a minority view, and if your next build-measure-learn cycle is based purely on that one piece of feedback, you’re heading in a worrying direction.
Hence, a strong experiment is one that yields high-quality signals of desirability, feasibility, or viability, and a high quantity of those signals. It’s up to you as a founder to define a ranking scale for your experiments (e.g. very weak, weak, okay, strong, very strong), and then to decide what level of evidence would justify your startup proceeding to its next phase (e.g. time to build a product, time to go to market, time to seek investment).
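One way to make such a scale explicit is sketched below. The five labels come from the example scale in the text; the 1-to-5 quality scores, the quantity buckets, and the function names are hypothetical and should be tuned to your own context. Taking the minimum of the two scores reflects the point that strong evidence needs both high quality and high quantity, so neither can compensate for the other; a founder who disagrees could use a weighted average instead.

```python
# Five-step evidence scale, using the example labels from the text.
LABELS = ["very weak", "weak", "okay", "strong", "very strong"]


def quantity_score(n_data_points: int) -> int:
    """Map a number of data points to 1..5 (hypothetical bucket boundaries)."""
    for minimum, score in ((100, 5), (30, 4), (10, 3), (3, 2)):
        if n_data_points >= minimum:
            return score
    return 1


def evidence_rating(quality: int, n_data_points: int) -> str:
    """quality: 1 (e.g. a verbal promise) .. 5 (e.g. a paying customer)."""
    if not 1 <= quality <= 5:
        raise ValueError("quality must be between 1 and 5")
    # Take the minimum so strong evidence needs BOTH high quality and high quantity.
    return LABELS[min(quality, quantity_score(n_data_points)) - 1]


def passes_gate(rating: str, required: str = "strong") -> bool:
    """Stage-gate check: is the evidence at least as strong as the level we set?"""
    return LABELS.index(rating) >= LABELS.index(required)


print(evidence_rating(5, 1), evidence_rating(5, 120), passes_gate(evidence_rating(5, 120)))
```

For example, a single paying customer (`evidence_rating(5, 1)`) rates “very weak”, while 120 paying customers rate “very strong” and pass a “strong” gate.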
The key takeaway is that you should only enter a high-effort phase of your business once you have strong evidence that the phase is warranted. Create a sequence of experiments that increase in both complexity and strength of evidence; if your startup idea makes it through every ‘stage-gate’, you’re on your way to validating a great new venture. If it doesn’t pass a particular gate, either modify the idea or scrap it entirely. We call these stage-gates the ‘persevere, pivot, or perish’ decisions.
On hypothesis sequencing: You should answer the desirability question first because, even if you could technically build your product, you cannot build a viable business out of it if no one wants it. That’s why, although Ries calls it the ‘value hypothesis’, I like to think of it as the ‘existential hypothesis’.
Only once you have validated demand for a particular solution do you need to go through the pain of working out whether you have the skills or know-how to make it work. And if you’ve found demand for your product but then discover you can’t build it (it’s not feasible), you’ve still validated a customer need and can explore other, more feasible solutions for those customers.
Once the existential hypothesis has been validated, you can test a range of feasibility hypotheses. It’s important to do this before worrying about viability because, again, there’s no point designing a solid business model that depends on a working product if you cannot actually build that product.
So, finally, only once you have a fairly good idea that there is demand for a product (desirability) and that you could supply or deliver it (feasibility) should you worry about validating the right business model for matching that supply with demand (viability). Ries refers to this set of hypotheses as the ‘growth hypothesis’ because desirability and feasibility are usually validated in a fairly small-scale environment. The next phase of a business is therefore figuring out which engines of growth to use to take the product to a larger, more sustainable scale.
On the right time to run experiments: What you’ll discover as your venture grows is that the experimental method can always be applied to answer your next biggest unknown.
In the early stages, the unknown may be who the customer is and how much they’d be willing to pay. When you first go to market, you may experiment with different features to build something that will spread like wildfire once it’s in users’ hands, a state called product-market fit. You may then realize that this only pays off if you can reach customers, and so run different channel experiments to find product-channel fit.
Once you’ve found the right channels, your focus may be on experiments to make incremental product improvements that make your existing customers even happier, creating a sticky user experience. Or you may focus on new ways to reach new customers, or ways to get existing customers to help refer new customers to your product.
Even when your business starts to slow down on adding new features and pursuing new customers, you may want to explore completely new product lines or markets, which is almost like starting another startup within your company. This is where experimentation comes in especially handy again. Ries points out how even large corporations have maintained an edge over their competitors by adopting lean, experimental approaches, in industries ranging from computer software to car manufacturing.
The point is, experimentation isn’t just for founders, and it certainly isn’t just for the first few years of your business. It can be used by any team (design, sales, customer support, etc.), and at any stage of your business (ideation, development, growth, etc.). Experiments are your friends.
On running multiple experiments at once: As your startup grows and multiple teams embrace the experimental method, it’s absolutely fine to have experiments running in parallel, provided you follow one rule: don’t introduce two changes to the same area of the business at once.
This is possibly the most important point to make. Bad experimental design will set your team up for failure, because you won’t be able to make any sense of the results.
When two changes are made to the same area, influencing the same metric, it becomes extremely difficult to isolate the effect of each change. Suppose you run two different experiments over the same period, both aimed at improving website sales. If website sales go up, how can you tell which change caused it? Did both increase sales slightly? One more than the other? One almost entirely? With the results confounded, you can’t conclude anything for sure, and you’ll wish you had run the two experiments over separate periods, or paired one experiment on sales with another on, say, team productivity, rather than running two on sales or two on productivity.
There is some nuance here. You can set up an experiment that changes one variable in several ways (say, six different configurations of the information on a website). Technically there are then many changes from the base version, but each version’s effect can still be isolated, because each differs from the base in just that one variable. Contrast this with changing two variables at once (different information configurations as well as different images), where it becomes increasingly difficult to tell what any single version of either variable contributed to the overall metric.
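This one-variable, many-versions setup is commonly called an A/B/n test. A minimal sketch of the assignment side follows, assuming a website where you control which information layout each visitor sees: each visitor is hashed into exactly one of six hypothetical layout variants, so every visitor experiences one version of one variable while everything else stays fixed. The variant names, the experiment name, and `assign_variant` are placeholders.

```python
import hashlib
from collections import Counter

# Six configurations of ONE variable (the information layout); everything else stays fixed.
VARIANTS = [f"info_layout_{i}" for i in range(1, 7)]


def assign_variant(user_id: str, experiment: str = "homepage-info-layout") -> str:
    """Deterministically assign a visitor to exactly one variant.

    Hashing (experiment, user_id) means the same visitor always sees the same
    version, and a different experiment name gives an effectively unrelated split.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


# Example: split 6,000 hypothetical visitors across the six layouts.
counts = Counter(assign_variant(f"user-{i}") for i in range(6000))
for variant in VARIANTS:
    print(variant, counts[variant])
```

Hashing rather than sampling at random on every page view keeps a returning visitor on the same version; including the experiment name in the hash means an experiment in a different area of the business gets its own independent split.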
A helpful framework: The amount of theory on experimentation can be overwhelming, and with all these rules about when and how to run experiments correctly, introducing them to your team may seem daunting. Fortunately, there are a number of frameworks your team can use to manage the growing list of experiments that could be run at any given time. We like to use Growth Tribe’s GROWS framework:
- Gather: This is where your team lists all the ideas they have for experiments touching on different parts of the business. You can collect these ideas in a workshop format or have them submitted to an agenda leading up to your next sync/meeting.
- Refine: Owing to time and resource constraints, as well as the principle of not running multiple experiments on the same area at once, your team needs to triage and prioritize the list of experiments. A useful method is to map each experiment on a matrix with ‘level of information’ and ‘importance to business’ as its axes. Naturally, you will want to focus on assumptions where you currently have very little information but whose truth is critical to the success of your venture.
- Outline: Your team will need to take the shortlisted experiments and design proper experiments around them, following the Learn, Measure, Build approach or using our test cards.
- Work, work, work: Now your team needs to go out and make the required changes for the duration of the experiment. Some experiments are long and costly, others quick and cheap, but make sure enough effort goes in to gather sufficient data on your chosen metric to draw statistically significant conclusions. This is very much the Build & Measure phase of Ries’s lean experimentation cycle.
- Study: Sit with your team to look at any changes in the metrics across your experiments, and see what you can Learn. The conclusions from this phase may add new experiments to your backlog, and answer questions that remove the need for certain experiments to be run in the future. Remain data-driven, hold no bias towards a certain result, and prepare to start with the next cycle of experiments.
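The Refine step’s matrix can be approximated with a simple score. The sketch below is a hypothetical way to do it, not part of Growth Tribe’s GROWS framework itself: each idea gets a 1-to-5 rating for importance and for current level of information, priority is importance multiplied by (6 minus information) so that critical unknowns rise to the top, and the shortlist skips any idea that touches a business area already covered, following the rule on running multiple experiments at once. The `Idea` class, `shortlist` function, and backlog entries are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """A hypothetical backlog entry produced in the Gather step."""
    name: str
    area: str          # business area / metric the experiment touches
    importance: int    # 1 (nice to know) .. 5 (critical to the venture)
    information: int   # 1 (we know almost nothing) .. 5 (well understood)

    @property
    def priority(self) -> int:
        # High importance and little information push an idea up the list.
        return self.importance * (6 - self.information)


def shortlist(ideas, max_experiments=3):
    """Pick the highest-priority ideas, at most one per business area."""
    chosen, used_areas = [], set()
    for idea in sorted(ideas, key=lambda i: i.priority, reverse=True):
        if idea.area in used_areas:
            continue  # never run two experiments on the same area at once
        chosen.append(idea)
        used_areas.add(idea.area)
        if len(chosen) == max_experiments:
            break
    return chosen


backlog = [
    Idea("New checkout copy", "website sales", importance=4, information=2),
    Idea("Free-shipping banner", "website sales", importance=3, information=3),
    Idea("CX workshop for support", "churn", importance=5, information=1),
    Idea("Referral reward", "referrals", importance=3, information=2),
    Idea("Logo refresh", "brand", importance=1, information=4),
]
for idea in shortlist(backlog, max_experiments=4):
    print(idea.priority, idea.name)
```

Here the free-shipping banner is skipped because the checkout-copy experiment already occupies the website-sales area; it would wait for a later cycle.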
Hopefully all of the above has illustrated that experimentation is a multifaceted process that goes beyond building, measuring, and learning. While the theory is simple, the nuance lies in ensuring that your evidence is good, that things are done in the right sequence, and that an appropriate framework is used to introduce experimentation across your organisation.
Mastering this nuance, however, gives you a superpower.
Because when you truly understand experimentation, you realise that a business you thought would take months or years to build can be approximated in a matter of days or weeks through lean approaches.
In the next guide I’ll walk you through the wide array of experiments available to founders, many of which were inspired by the brilliant research done by Strategyzer.
We’ve validated hundreds of ideas for early-stage entrepreneurs and unicorns alike. It’s what has helped us curate a venture portfolio worth more than €3.4bn.
If you’re a founder, startup, or scaleup with a great idea for your next business, feature, or product, and are unsure about how you can get running with it today, talk to us and get a free consultation with one of our top strategists.