Kalicube Tuesdays with Kevin Indig and Jason Barnard

Split Test SEO Experiments

How to run and think about SEO Experiments

Scheduled for 21 September 2021 at 16:00 CEST (Paris)

The event is 100% free:

Book your place (free) >>

Save to your calendar >>

See on Google My Business >>

Set a reminder on YouTube >>

Organised by Kalicube in partnership with Wordlift.

Part of the Kalicube Tuesdays series.

Listen to the podcast episode >>

Transcript:

[00:03:24] Jason: We’re now going to move on to split testing SEO experiments, and I purposefully didn’t learn anything about it so that you can teach me what I need to know about split test SEO experiments. So, can we start off? What do you mean?

[00:03:37] Kevin Indig: I love it. Yes, absolutely.

[00:03:39] So generally, split testing, I think a lot of people are familiar with user testing, right?

[00:03:45] Where you split a user group into a control group and a variant group. You make the change to the version that is exposed to the variant group, and then compare the reaction basically to the control group.

[00:03:56] That’s like the very high level… That’s really what we’re talking about, and there’s a lot of nuance in there, and there’s also a bit of extra nuance when it comes to SEO testing specifically.

[00:04:08] So generally, just to set the stage a little bit, and to be a bit bold here: I think that if you don’t do SEO testing these days, your chances of succeeding are very, very low. The reason I say that is because, for one, SEO has become really competitive in my mind, especially in certain verticals.

[00:04:29] There are brands with million dollar budgets with huge SEO teams with great knowledge, and it’s really hard to compete if you don’t adopt the testing mindset.

[00:04:39] Number two is that the Google ranking algorithm, or what we consider the Google ranking algorithm, has become so complex that there’s probably not a single person who really understands all of it.

[00:04:51] It’s almost like the tax system.

[00:04:54] Jason Barnard: But isn’t it also that, with machine learning, things have moved so far that even the people at Google don’t really understand? They’re saying basically, we’ve got this big package. You need to please the package.

[00:05:07] The package is aiming there and so you need to aim where the package is going. I don’t know why I’m calling it a package. It’s an algorithm.

[00:05:14] Kevin Indig: Package is fine.

[00:05:16] You’re absolutely on point here, Jason. And the thing is also that this package changes over time, right? It’s not just, “Hey, this is where you have to aim for it.” It’s like this is where you have to aim for now and next month it could be slightly different.

[00:05:28] And the target constantly changes. There’s a constant process of evolution and adaptation going on.

[00:05:34] Here’s a funny anecdote. I’m Jewish, and currently there are lots of Jewish holidays, so a week ago I was at a Jewish event here in Chicago, at the beach. It was a completely new group of people, and I got talking to some of them. Of course, it turns out that one of them is a Google engineer who works on the core algorithm. And you know how many secrets I learned? Exactly zero. ’Cause there’s not a single person that knows everything.

[00:06:04] There are dedicated teams that just work on the recipe SERPs, or on the SERPs for a specific vertical, so it’s very fragmented. Of course there are some teams that bring it closer together. There is a very shielded team in Mountain View.

[00:06:23] Jason Barnard: But that means that if you’re in the recipe space, the team working on the specific tweaking of the recipe algorithm within the blue link algorithm, the core algorithm, is going to be doing something slightly different to the shoe vertical, because they’ve got a team working on the shoe vertical, or whatever it might be. So when we generalize, we’re really barking up the wrong tree.

[00:06:49] Kevin Indig: Right. Absolutely. That’s what it comes down to, and that’s why testing is so important, right? Things in your vertical might be slightly different. A long time ago, I was working at Searchmetrics, and back then they still did these ranking factor studies. One of the outcomes was that the travel vertical is very different from, say, the retail vertical or the finance vertical when it comes to ranking signals. So you have to test to find out what works for you, and you have to test because the algorithm constantly changes.

[00:07:20] Jason Barnard: Right. And talking about verticals. One thing with Kalicube that I’ve been doing is diving down into entities on a geo, entity type and industry level… and it is stunning. What I do is pull up the dominating blue links, the dominating social channels, the dominating news, dominating video, dominating trusted sources for the knowledge graph and it differs enormously between industries, entity type and geo.

[00:07:50] It’s stunning. Sorry. I’ve just gone off on a rant.

[00:07:53] Kevin Indig: No, it is stunning. It is absolutely stunning. I think the Brand SERP you showed earlier is a good example of that. But yeah, entity, knowledge graph, I mean, all these things are not the same, globally and vertically, as you said and so…

[00:08:07] Jason Barnard: But, the point you were making, let’s come back to that, is that because it varies so much across industries, because it varies so much also in terms of the types of results that will appear, if you’re not split testing, you’re shooting in the dark.

[00:08:21] Kevin Indig: You’re shooting in the dark, absolutely. You have this old mindset, which I call “The Blueprint SEO Mindset”, and I grew up in that world. There was a catalog of things and how they should be, and you go to a site and you ask, “Is that the case or not? Okay.” Then you change things, right.

[00:08:36] Today, not that easy anymore. We have a map, a high level map of what is important, content, titles, backlinks, all these kinds of things. But, to go a level deeper, we need to test what works and so there are different types of tests.

[00:08:52] Jason Barnard: Sorry, just before we go on, one point: there are some fundamentals that never change, like being crawlable and being indexable.

[00:08:59] You’ve got your basics. Once you’ve got your foundations (just to make sure nobody goes off and starts split testing indexing and crawling), then you can start the split testing, which is where we come in.

[00:09:09] Kevin Indig: Absolutely. Yeah. There are four steps basically of the ranking process in my mind.

[00:09:16] As you mentioned, there’s the crawling part, rendering, indexing, and then ranking, right.

[00:09:22] And the first three steps basically don’t change. It’s information retrieval. We’re just talking about the ranking part of the process. Good call out, Jason. And so when it comes to…

[00:09:33] Jason Barnard: We just keep being charming to each other for half an hour. We’re doing a very good job!  

[00:09:37] Kevin Indig: We just appreciate each other so much.

[00:09:40] But there are basically three types of testing in my mind, right?

[00:09:45] We call this episode split testing. That is the most common one, and is most applicable when it comes to aggregator sites. An aggregator is basically a site like a marketplace or an e-commerce store where you have a lot of pages with the same template.

[00:10:04] eBay’s a great example, Amazon of course, these e-commerce players… but then also sites like Facebook or LinkedIn. They have a profile page, they have a company page, maybe a couple of others, but basically they have a manageable number of page templates that all have the same elements and components.

[00:10:23] And that makes it very testable because it’s very simple to take a large number of pages with the same template, split them into two or several groups, and then make changes and measure the impact. Very basic. Very easy.
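That grouping step can be sketched very simply. Here is an illustrative Python snippet (the example.com URLs are invented) that deterministically splits a set of templated pages into a control and a variant group by hashing each URL:

```python
import hashlib

def assign_bucket(url: str, variant_share: float = 0.5) -> str:
    """Deterministically assign a URL to 'control' or 'variant'.

    Hashing (rather than random assignment) means the same URL always
    lands in the same bucket, even across re-runs of the analysis.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # Map the first 8 hex chars to a number in [0, 1].
    score = int(digest[:8], 16) / 0xFFFFFFFF
    return "variant" if score < variant_share else "control"

# Example: split a batch of templated product URLs into two groups.
urls = [f"https://example.com/product/{i}" for i in range(1000)]
buckets = {"control": [], "variant": []}
for u in urls:
    buckets[assign_bucket(u)].append(u)
```

You would then make the change (say, a new title pattern) only on the variant bucket and compare the two groups' performance over the test window.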

[00:10:38] Jason Barnard: Sorry. I have a statement, just really quickly.

[00:10:41] There’s a webinar tomorrow about indexing and crawling with Dawn Anderson and Fabrice Canel on Duda. Sign up for that.

[00:10:47] The second point: about that split testing, if you take a profile page, Fabrice Canel from Bing was saying that Bing really relies on sites having this templated system, where it can rely on the way the page is built in each of these different templates.

[00:11:02] So it will go all the way through LinkedIn and expect to see the same template all the time. If you’re split testing, aren’t you throwing a big curve ball?

[00:11:10] Kevin Indig: Yeah, absolutely. And we’re going to get to the counterpart in just a second, but yes… by the way, Dawn and Fabrice on a webinar, it sounds pretty tempting.

[00:11:19] But, I mean, it’s really this kind of templatization that makes things easier to test, and we have a high degree of that on the web, in part because of CMSs like WordPress, or maybe Shopify. But anyway, long story short: much easier to test. Very straightforward.

[00:11:38] The counterpart of aggregators are integrators. And those are usually the sites that rely heavily on content: SaaS sites, maybe publishers. And that’s where every page pretty much looks different, right? The content is very different. You don’t have a product catalog, you don’t have the same elements. You have a variety.

[00:11:56] And that demands a fundamentally different type of testing. That’s where we often speak about before and after testing, where basically you make a change and then you just watch the traffic go up or down, the click-through rate go up or down, clicks, impressions, those kinds of things. Those are valuable, but we often disregard that we can also do a form of split testing on non-templatized pages.

[00:12:22] If there’s one more controversial statement that you want to take away from our conversation, I think it would be that these before and after tests are actually not very valuable, and instead you want to adopt a split testing mindset even when it comes to non-templatized pages.

[00:12:38] Jason Barnard: Right. But if it’s non templated, it’s an enormous amount of work to go through, to actually split test, because you need to apply them on a one by one basis or be incredibly smart with your regex in your database.

[00:12:51] Kevin Indig: That’s right. And so here’s what you want to do about this. So first of all, you can still look for control groups, even if not all pages are templatized, it is more work and best case you have some sort of telemetry or a tool that helps you do that.

[00:13:05] But even manually, you can just see, “Okay, which pages have the same traffic patterns over a long span of time?” This is where we’re going to get into testing and designing your test (we’ll look at what factors are important in a moment), but just know that you can still pick control groups even if pages don’t have the exact same template.

[00:13:24] Jason Barnard: Right. Okay. And in that case, how many pages do you need to actually take into account? I mean, I would just say, two or three, let’s see what happens, but you’re not going to accept that? Are you?

[00:13:35] Kevin Indig: I might actually. I think it’s much more about the amount of traffic, right? So here’s the thing: what matters much more than the number of control pages is, one, “Are they following the same traffic patterns?” And two, “Is there a significant amount of traffic?” So if you have a page that ranks for two or three keywords in position five, the noise is going to be so high that you’re not going to get anything meaningful out of it.

[00:14:01] So we’re talking about pages that get at least a hundred visits a day, or at least have a somewhat steady traffic pattern over time. Now, the kicker, where 90% of people fail when it comes to SEO testing, is that they don’t revert the change that they made. What happens most of the time is that you make a change on a page, let’s say you add the current year to the meta title, and you roll it out on the page.

[00:14:28] You see, “Oh, the page gains more traffic over time. Awesome, successful test!” But it’s not true. That is not a successful test. It is only successful if you revert the change, you take the year out of the title again, and then you see the traffic coming back down to baseline. Then you know it has an actual impact and then you can also understand the incremental value of adding the year in the title.
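That up-then-back-to-baseline pattern can be checked numerically. A minimal sketch with invented daily click counts for the three phases (baseline, change live, change reverted):

```python
from statistics import mean

# Hypothetical daily clicks around a title change and its later revert.
before = [100, 104, 98, 102, 101, 99, 103]        # baseline
during = [118, 122, 119, 121, 117, 123, 120]      # change live
after_revert = [101, 99, 103, 100, 102, 98, 104]  # change removed

lift = mean(during) - mean(before)
# Did traffic fall back to (roughly) baseline once the change was reverted?
returned = abs(mean(after_revert) - mean(before)) < lift * 0.25

# Only call the test a win if the lift disappeared after the revert.
if lift > 0 and returned:
    print(f"Likely causal: about {lift:.0f} extra clicks/day from the change")
```

The 25% tolerance on the return-to-baseline check is an arbitrary choice for the illustration; in practice you would set it based on the page's normal day-to-day noise.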

[00:14:54] Jason Barnard: Right! You just made me think of something else as well. But right! That point is saying, we’ve got this, it goes up here, and then it goes back down or maybe it goes down and then it goes back up. And if you haven’t done that, you don’t actually know. I mean, most people, as you say just think, “Yeah, it’s gone up. So that’s a success and we’ll move everything over to that.” And you’re saying that that’s a foolish mistake that we shouldn’t make. I probably would have made (that mistake) if you hadn’t told me.

[00:15:16] Kevin Indig: Well, it’s tempting. It’s also very painful to revert the change, right?

[00:15:21] Because “hey, I just got all this traffic, should I shoot myself in the foot?” But if you want to really determine the impact, you have to do it. And why do you have to do it? Because SEO does not deal with laboratory conditions. We cannot isolate a single factor and say it’s a hundred percent that.

[00:15:42] You can only say there’s a very likely chance that it is that single factor that moved the needle and by reverting the change, we increase the robustness of the test or the statistical significance. Now, just as an aside, for all the Statistics fans who are tuning in right now, every SEO test is a quasi experiment.

[00:16:00] The super rigid statisticians here in the audience would probably say “Oh, it’s not real SEO. It’s not real split testing,” and yeah, it’s quasi testing. It’s still valuable, everybody should do it but the thing to take away here is to revert the change, to really understand the incrementality of the change.

[00:16:22] Jason Barnard: Right. ’Cause I actually did Statistics at university alongside Economics. You wouldn’t believe it when you hear me talking, but one of the things is statistical reliability, or whatever it’s called, I can’t remember anymore, it was 30 years ago: it comes after 7,000 of anything; under 7,000… I seem to remember that as a number my teachers were throwing at me. It basically says, “If you’ve got 7,000 of something, you can probably rely on the fact that the result is statistically relevant”, that was the word. Have you ever heard the number 7,000?

[00:16:53] Kevin Indig: In a different context, yes. And to be fair, if you want to run SEO experiments on a large site and you actually have a data scientist with a degree in Math or Statistics, there’s a lot of depth to go into and a lot of things they can help with. So yeah, the number 7,000, or just this idea of 95% or 98% statistical significance, there are ways to get there, and in the ideal case these people can help you with that.

[00:17:23] Jason Barnard: The 95 percentile thingy, whatever that’s called, well it IS called the 95th percentile: you chop off the top five and the bottom five because they’re outliers, and you take that middle 90%, and that’s going to be more reliable.

[00:17:36] Is that something you guys do?
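The trimming Jason describes (dropping the extreme tails before averaging) might look like this in Python, with invented daily click data that includes two obvious outliers:

```python
from statistics import mean

def trimmed(values, cut=0.05):
    """Drop the top and bottom `cut` fraction of values (outliers)."""
    s = sorted(values)
    k = int(len(s) * cut)
    return s[k:len(s) - k] if k else s

# Hypothetical daily clicks: mostly ~100/day, plus one spike and one outage.
daily_clicks = [98, 102, 100, 97, 103, 101, 99, 950, 100, 104,
                96, 2, 101, 98, 100, 103, 97, 99, 102, 100]

core = trimmed(daily_clicks)       # the middle 90% of days
stable_average = mean(core)        # outlier-resistant average
```

The 5% cut on each side is just the value from Jason's description; without trimming, the single 950-click spike would drag the average well away from the page's typical level.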

[00:17:38] Kevin Indig: Yes. Yeah. I’m blessed with a fantastic data science team who helps with that, and we’ve built some tooling around that as well, but there are also some third-party tools that you can rely on to help you find the right control groups, or basically help you set up a robust test design. I think it’s important to understand that this is one of the messages that I have for everybody out there is that you can take it to that level of depth, but you should not think that either you go that far or else your tests don’t make sense.

[00:18:11] Everybody should test, no matter what. If you’re talking about the…

[00:18:14] Jason Barnard: That was the 95th percentile and Anton’s just put it on screen for people listening to the audio podcast. Carry on Kevin, I’m sorry.

[00:18:25] Kevin Indig: No problem. You definitely want to test no matter what, even if your test isn’t statistically perfect.

[00:18:32] It’s better than no testing. What Anton showed, and thanks for showing it here ’cause it’s super helpful, is the so-called p-value. The p-value basically works like this: you have two hypotheses, right? You have what we call the null hypothesis, which is basically the current state, and then you have a new hypothesis for your change. The p-value tells you how likely it is that you would see a result like yours purely by chance if the null hypothesis, the current state, were actually true.

[00:19:03] It’s a bit more complicated than that, but just know that a low p-value generally means you can be confident your change really made a difference.
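For a typical SEO comparison, two groups of pages with different click-through rates, one common way to get a p-value is a pooled two-proportion z-test. A minimal pure-Python sketch with invented numbers (by convention, a low p-value means the observed difference would be unlikely if the two CTRs were really the same):

```python
import math

def ctr_p_value(clicks_a, impr_a, clicks_b, impr_b):
    """Two-sided p-value for 'the CTRs of A and B differ',
    using a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = clicks_a / impr_a, clicks_b / impr_b
    pooled = (clicks_a + clicks_b) / (impr_a + impr_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # area in both normal tails beyond |z|

# Control: 500 clicks / 10,000 impressions. Variant: 600 / 10,000.
p = ctr_p_value(500, 10_000, 600, 10_000)
```

With these made-up numbers p comes out well under 0.05, so the variant's higher CTR would be unlikely to be pure noise. The normal approximation only holds with reasonably large click counts, which loops back to Kevin's point about needing enough traffic.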

[00:19:10] Jason Barnard: Right. Okay. And if we come back to the idea of do- it- yourself, for people who don’t have a team like yours…

[00:19:16] Thank you, Brian. “Great chat so far guys.” from Brian Moseley. Thank you very much. I know you know a lot about it because you’ve got a tool that does this kind of thing at Semrush.

Now we were talking about smaller things. I’d say clients of Semrush would probably be using Semrush for that, I don’t really know. But as a do-it-yourself thing, where you think, “I’ll just pick my highest traffic pages, divide them into two groups with about the same amount of traffic, split test one, and then look at Search Console to see what the results were”.

[00:19:48] Would that be a fair way of going about it or is that too haphazard and stickytape-and-sellotape-and-string?

[00:19:55] Kevin Indig: If you just do that, you already have a leg up, right? That’s what most people don’t even start doing. And, I think now when we speak about test design, there are a couple of things to look after before we even go into what you can test, right?

[00:20:12] When it comes to good test design, we talked about a high traffic page, or high traffic pages in general. Number two is the duration of the test, and there are two ways to approach the duration. There’s actual math you can put behind this, based on how much traffic you get and how impactful you think the change you want to make will be. So more impactful changes, say in our case the meta title, might not have to run as long on a high traffic page as something very subtle, like maybe a small internal linking module, or a small change to the body copy, which takes a while to pick up and for the changes to really become visible.

[00:20:57] There’s this phenomenon in Statistics, which is reversion to the mean. What it basically means is that when you start testing, in the beginning you see a lot of volatility, and over time that volatility calms down and reverts to the mean, which is the real impact that you wanted to achieve, the real change.

[00:21:17] And so when you have highly impactful things that you want to test, like a meta title, it reverts to the mean relatively quickly. If you have a more subtle factor that you want to test, it will take a longer time. So generally, I think the best duration (for an SEO split test) is at least three to four weeks.

[00:21:35] With anything under that, it’s typically very tough for the experiment to become clear, but there’s also an approach where you can calculate the exact number of days it takes to run an actual statistical test.
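That calculation is usually a sample-size estimate. A rough sketch using the standard formula for comparing two proportions (z ≈ 1.96 for 95% confidence, z ≈ 0.84 for 80% power; the traffic numbers and lifts are invented):

```python
import math

def days_needed(daily_impressions, base_ctr, lift, alpha_z=1.96, power_z=0.84):
    """Rough number of days of traffic needed per group to detect a
    relative CTR lift, via the two-proportion sample-size formula."""
    p1 = base_ctr
    p2 = base_ctr * (1 + lift)
    delta = p2 - p1
    p_bar = (p1 + p2) / 2
    # Impressions needed per group for the given confidence and power.
    n = ((alpha_z + power_z) ** 2 * 2 * p_bar * (1 - p_bar)) / delta ** 2
    return math.ceil(n / daily_impressions)

# A big change (+20% CTR, e.g. a title rewrite) resolves much faster
# than a subtle one (+5%, e.g. an internal-linking tweak).
big = days_needed(daily_impressions=2_000, base_ctr=0.05, lift=0.20)
subtle = days_needed(daily_impressions=2_000, base_ctr=0.05, lift=0.05)
```

This mirrors Kevin's point exactly: the subtler the expected effect, the longer the test has to run before the result is trustworthy.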

[00:21:50] Jason Barnard: Right. And in that three weeks, do you take into account the time for Google to crawl, reindex and reevaluate?

[00:21:57] Kevin Indig: That’s such a good point, cause that’s also something that you want to make sure about. Good test design means you look at the server log files and you make sure that Google has crawled both the control page and the variant page.
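A minimal sketch of that log check, assuming combined-format access logs and hypothetical page paths (a real check should also verify Googlebot's IP via reverse DNS, since the user agent string can be spoofed):

```python
import re

# Hypothetical access-log lines (combined log format, truncated).
log_lines = [
    '66.249.66.1 - - [20/Sep/2021:10:01:00 +0000] "GET /control-page HTTP/1.1" 200 '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [20/Sep/2021:10:02:00 +0000] "GET /variant-page HTTP/1.1" 200 '
    '"Mozilla/5.0 (Windows NT 10.0)"',
]

test_urls = {"/control-page", "/variant-page"}
crawled = set()
for line in log_lines:
    m = re.search(r'"GET (\S+) HTTP', line)
    if m and "Googlebot" in line and m.group(1) in test_urls:
        crawled.add(m.group(1))

not_yet = test_urls - crawled  # pages Googlebot hasn't fetched since the change
```

Until `not_yet` is empty, the clock on the test hasn't really started: Google may not have seen the variant at all.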

[00:22:11] Jason Barnard: Because, I’ve been looking at the speed of indexing…

[00:22:13] Brian’s asked a really complicated question. We can come back to that ’cause I don’t understand it… A year ago I did a video for Semrush about the speed of indexing, and I could get a new page indexed in a couple of seconds or a minute. I actually did it live, and within a minute it was showing the new meta description, which was really cool, but it’s not like that anymore.

[00:22:38] I updated a meta title and a meta description a few days ago. The meta title has been digested and used; the meta description hasn’t. And it’s been a few days. That’s also a problem, as you say: I started the test today, or is it tomorrow? Even if it gets crawled, even if I’ve submitted it through Search Console and checked that Googlebot has been around, is it actually looking at the data?

[00:22:59] Is it actually updating it and has it actually taken it into account? And the answer is you don’t know.

[00:23:04] Kevin Indig: Yes. There’s a lot of these like subtle factors that can make experiments go sour. As you said the crawl frequency, Google picking up the change. When we test things like internal linking at broader scale, that takes even longer.

[00:23:19] The more pages you test on, the longer it takes Google to crawl all of them. So all these things should go into test design and also means that you need a certain amount of telemetry. You need to have that understanding. If you don’t have access to your log files, it’s getting more tricky out there, right?

[00:23:34] Again, it shouldn’t prevent you from testing, but yes, these are all things that you should, in best case, take into account and measure over time. So another interesting observation would be when you make a change to a variant group of URLs, does it impact Google crawl frequency? Does Google actually come back more often?

[00:23:53] And what does that say? Has the backlink profile changed to these pages over time? These are all factors that you want to take into account as well as we can.

[00:24:03] Jason Barnard: We are just going back to… The thing is, you’re making this one change, but there are so many other factors coming into play that immediately you start saying, “Well, that doesn’t really count”. But if you can implement it, keep it going for several weeks or several months depending on the subtlety of the change, as you said earlier on, then revert, you’ve got a strong split testing environment where you can actually start to think, “I might not be sure, but I certainly learned something and I’ve moved forward in what I’m trying to do within my vertical”, if we come back to that, within my specific vertical. And with Shopify, you must have gazillions of verticals. So that’s my question: are you split testing on your clients?

[00:24:45] Kevin Indig: No, we’re not split testing on our clients. We are testing all sorts of things, we have our own testing environments to some degree too, but no, we’re not split testing on our clients, but there’s a lot of anecdotal stuff that we pick up and at scale, there’s interesting patterns that emerge.

[00:25:04] But no, there would be…

[00:25:07] Jason Barnard: From my perspective, it comes back to what Russ Jefferey from Duda was saying: they’ve got all of these clients, and they’ve got all this data, so they can actually say, “here you go”, the Core Web Vitals was his example, “after all that noise, it didn’t actually change anything”.

[00:25:20] Kevin Indig: Exactly. Yeah, absolutely. At this scale you see a lot of interesting data, a lot of patterns, but there’s also a lot of noise. Some merchants use different themes, or they have a different level of SEO in-house, and so on. So there is actually a lot of noise out there.

[00:25:41] You can do quite a lot at that scale, even with some of these larger sites. At G2, for example, we did a lot of testing. It was a very templatized site, and it was feasible because we had a large number of pages in the Google index. So at a certain scale, no matter what type of business you are, whether you’re a platform, a SaaS business, or a marketplace, it gets really interesting.

[00:26:04] Jason Barnard: And basically you’ve been doing this for years. You’ve been doing it on massive sites, so you really know what you’re talking about. But Peter Mead just asked a question about paid search, where this is really easy. You say, “Right, I’ll do this split test.” And now he says, “I immediately realise that I have done split testing on paid traffic. Why did I never do it on SEO?”

[00:26:23] Kevin Indig: Yeah, Peter. Thanks for asking the question. It’s good seeing you again. We met in Sydney a couple of years ago. Peter is a great lad and it’s funny because there are actually things that you can learn from Paid split testing and then transfer them over to your organic strategy, right? Like title testing, for example, the copy itself that can have a true impact and you can measure how people react to a page. Jason, you and I both know the actual experience of the page and how helpful it is in solving your problem matters more and more in SEO.

[00:26:54] And you can test it with paid traffic.

[00:26:58] Jason Barnard: Sorry!, I was agreeing. But in fact, in paid traffic, I like to do DSA. The DSA is Dynamic Search Ads, which is basically Google Ads running off the search algorithm. So if you’re a great SEO, DSA is brilliant. And in fact, that’s why I went, “ooooh” at one point, is that if you’re running DSA campaigns off your SEO, you can change some things in DSA and then do split testing without having to get involved in your SEO. But you can also do the split test on the SEO and measure it in Google Ads by pure numbers: “How much money am I making?” And that just gave me a great idea for a client because we’ve got a brilliant DSA campaign and that’s something I will now be doing as of tomorrow morning.

[00:27:40] Kevin Indig: I love it. There’s a lot of cool stuff you can do with organic and paid results. This could be a whole conversation on its own. We’ll maybe do a follow-up at some point of time. But when it comes to just sheer testing: you can also test different value propositions on organic versus paid on the same SERP and see how that changes the game.

[00:28:00] But now I’m venturing off too far. Coming back to good test design: we spoke about the duration, we spoke about the importance of having a lot of traffic on these pages and a good control group. There’s also this idea of a cooldown period. This is another thing that I learned: when you run a test on a set of URLs, you revert the changes, and you see, “Okay, this change brought a 5% increase in clicks, that’s the incrementality,” you want to give these URLs a couple of weeks, or at least one week, of a cooldown period before you run another test on them, simply to make sure that nothing else is changing that you can’t see. This is also something that I learned the hard way and that I forgot initially.

[00:28:42] And of course, it should go without saying, but actually, in practicality it is often overlooked, you don’t want to run several tests on the same URLs. You want to make sure there’s only one test running on a set of pages at the same time.

[00:28:56] Jason Barnard: When you said cool down period, I thought of an angry teenager, but in fact, you just mean that Google needs time to settle down and digest and not be freaked out. I look at Google in those terms: you can throw a big change at it, and it can digest it, but if you do that too often, it just freaks out and you will see this big dip.

[00:29:19] And I just kind of say that the machine’s just thinking, “I’m not sure anymore and to be safe, I’ll just get rid of it while I think about it and then bring it back.” So I tend to warn clients to be very careful about changing multiple things very quickly over time, because you’re going to freak the machine out because it is a little bit sensitive.

[00:29:34] And it does work on confidence a great deal. I think we fail to realize that there is this idea of confidence in an algorithm: if it’s confident, it will push things forward and to the top, and if it’s not, it will think, “If I’m not sure, I’m not going to do it”.

[00:29:49] Kevin Indig: Yeah. It’s funny that you say that. All you have to do is look at page 4, 5, 6 of the Google search results for any given keyword and you see the results.

[00:29:57] Google is not confident enough to promote those in any way, and you find the craziest things on those pages. But yes, you want to give it a cooldown period. You also want to see, “Is anything else changing unexpectedly?” And if the test has been over for, say, a week and then all of a sudden there’s a major dip or spike in traffic, you have to ask yourself, “Okay, what happened here? Is that unrelated to my test? Was there some sort of lagging factor that played a role? Maybe my test didn’t run long enough.” So the cooldown period is incredibly helpful for making sure that your own confidence in a test result is incredibly high.

[00:30:34] Jason Barnard: And how do you deal with stuff like trending news that might affect things, or seasonality? Obviously, you need to take seasonality into account, because you should know the seasonality for your own products, your own business and your own industry. But trending news you can’t predict, and it suddenly changes the landscape. For example, more impressions that you didn’t usually get, or a news box turning up at the top of the SERP, which suddenly changes the entire SERP.

[00:31:00] Kevin Indig: Yeah, exactly, or, as Hans mentions here, a “Google update in the middle of the test”. All of these things…

[00:31:04] Jason Barnard: The most obvious one is the one I forgot.

[00:31:09] Kevin Indig: But they all kill your test. It’s dead. You can start from scratch and especially during a core update or larger Google updates, you want to wait until things “cooldown.”

[00:31:19] Now, the exception is when you picked a control that was impacted the same way; then you’re actually good. The same goes for seasonality. That’s one of the reasons why you want to pick a control: to normalize for seasonality. But there are factors that can kill your test, like a query all of a sudden deserving freshness and a news box showing up… or a couple of very impactful backlinks going to one of the pages that you’re testing. All of these kinds of things just kill your test. Which leads to another important thing, and that is retesting the same thing. It’s not the first thing that you want to do.
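The "control impacted the same way" idea is essentially a difference-in-differences calculation: subtract what happened to the control from what happened to the variant. A sketch with invented numbers, where a core update hits both groups but only the variant carries the test change:

```python
from statistics import mean

# Hypothetical daily clicks; a core update hits BOTH groups in the "after" window.
control = {"before": [100, 102, 98, 101], "after": [80, 82, 79, 81]}   # update only
variant = {"before": [100, 99, 101, 102], "after": [88, 90, 87, 89]}   # update + test change

update_drift = mean(control["after"]) - mean(control["before"])  # what the update alone did
raw_change = mean(variant["after"]) - mean(variant["before"])
test_effect = raw_change - update_drift  # difference-in-differences
```

Here the variant's raw traffic went down, which would look like a failed test; netting out the control's drop shows the change itself actually added clicks. This only works if the control really was hit the same way, which is exactly why control selection matters.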

[00:32:05] I think when you just get started with SEO tests, you just want to test a couple of things and get a bit of skin in the game and gain some experience. But over time, especially with things that you tested that seemed to move the needle quite a lot, it is smart to test them again and repeat the same test to make sure that it still results in the same impact.

[00:32:26] You do this with very high-impact factors, but also, if a significant amount of time has passed since the last time you tested something, you want to test it again, because these factors change over time. It’s very interesting, testing in general.

[00:32:46] There’s this replication crisis, where some of the biggest experiments, for example in Psychology or Economics, actually yield very different results when you run them again these days, which has led to a crisis in the scientific community. Long story short, you just want to be conscious that you might want to retest some things over again, just to make sure that they still work the way you think they would.

[00:33:11] Jason Barnard: Right. It could just be a one-off and you’ve got to be careful about that. Now, the question from Brian that he asked earlier on was “Can you explain the difference between neural networks and Google Causal Impact for analyzing the results?”

[00:33:24] Now, I don’t even know what Google Causal Impact is. So I don’t understand the question because I don’t understand the most important word in it. Do you understand?

[00:33:32] Kevin Indig: Yes. Google's CausalImpact is actually pretty cool. It is an open-source library for R which you can use to run and evaluate tests; it basically uses Bayesian time-series models for testing.

[00:33:53] So here's what this means. R, this really cool tool that you can use (you can also use Python), is very popular in the data science community for evaluating large datasets, and CausalImpact is Google's open-source library that allows you to run these Bayesian tests. The idea of Bayesian testing is actually brilliant. I wrote an article in my book about this a while ago, because you can run Bayesian tests, but you can also just adopt Bayesian thinking in itself. So here's what this means: we spoke earlier about how some tests, in the beginning, might not be super reliable or super robust, but you're slowly iterating your way to the optimal solution over time by running several tests, and in the best case, these tests are all connected with each other.
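To make the CausalImpact idea concrete, here is a minimal Python sketch (not the actual CausalImpact API, and all click numbers are invented): fit the test series against a control series during the pre-period, use that fit to predict the counterfactual for the post-period, and compare it to what actually happened.

```python
# Minimal sketch of the counterfactual idea behind CausalImpact
# (hypothetical numbers, simple least-squares instead of a Bayesian
# structural time-series model).

def fit_line(xs, ys):
    # Ordinary least-squares fit of ys against xs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Daily clicks before the change (pre-period): control vs. test pages.
control_pre = [100, 110, 105, 120, 115]
test_pre    = [ 50,  55,  52,  60,  58]

slope, intercept = fit_line(control_pre, test_pre)

# After the change (post-period): the control keeps moving with
# seasonality; the test pages hopefully beat the counterfactual.
control_post = [118, 125, 122]
test_post    = [ 70,  74,  72]

counterfactual = [slope * c + intercept for c in control_post]
lift = sum(t - c for t, c in zip(test_post, counterfactual))
print(f"estimated lift over the post-period: {lift:.1f} clicks")
```

Because the counterfactual is driven by the control, seasonality that hits both series equally washes out of the lift estimate, which is exactly why picking a good control matters.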

[00:34:40] And that's the idea of Bayesian testing. So Bayes, a couple of hundred years ago, was this guy who became super smart, not in his lifetime; he became super smart after his death, like most people during that time. But basically, what he said is: "Okay, if I have an apple in my hands and I throw it over my shoulder without seeing where it should land, and I'm aiming to throw the apple into a bowl, the first time I throw it and it doesn't hit the bowl, I know where the bowl is not. So the more often I throw the apple over my shoulder, the better my understanding of where the bowl is not and what not to do."
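The apple-and-bowl story can be sketched as a toy Bayesian update (positions and throws are all made up): start with a uniform belief over where the bowl might be, and let every miss rule out the position where the apple landed.

```python
# Toy Bayesian updating: each miss eliminates a candidate bowl
# position and the remaining belief is renormalised.

positions = list(range(10))                           # candidate positions 0..9
belief = {p: 1 / len(positions) for p in positions}   # uniform prior

def observe_miss(belief, landed_at):
    # The apple landed at `landed_at` and missed, so the bowl is
    # not there: zero out that position and renormalise the rest.
    belief = {p: (0.0 if p == landed_at else w) for p, w in belief.items()}
    total = sum(belief.values())
    return {p: w / total for p, w in belief.items()}

for throw in [3, 7, 1, 4]:                            # four misses
    belief = observe_miss(belief, throw)

print(sorted(p for p, w in belief.items() if w > 0))  # → [0, 2, 5, 6, 8, 9]
```

Each throw narrows the posterior; after enough misses, the probability mass concentrates on the few positions where the bowl can still be.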

[00:35:18] So you can iterate your way towards the optimal solution. You can do the same in SEO testing, and there's a simple but elegant method to it where you say, "Okay, if I made, for example, a change to my title tag, and I see that the impact is 5%, then I can run another test within the same confines, within the title tag as well, that might be slightly related but is different."

[00:35:43] And that initial test result can be the basis for the next test. So you can say, for example, "Hey, making a change to a title might result in a 10% increase in clicks. Now let's test something else in the title, use those 10% as the basis, and see how the other test shakes out." And so you run five or ten of these tests in a row.

[00:36:04] All of a sudden you have a really good understanding of title testing and causal impact can help you get there in a technical way.

[00:36:12] Jason Barnard: Right. But my only problem with that is that with five tests, with the cooling-off period, well, the crawling period, the testing period, the reverting period and the cooling-off period, whatever you were calling it, you're looking at a year to do those five tests, more or less, by which time the algorithms have changed again.

[00:36:30] Kevin Indig: Potentially, yes, absolutely. That's why SEO testing never stops. It's a continuous process, and the best SEO teams in the world, in my mind, have a high degree of testing because they are aware of that. But yeah, you're right: it takes a lot of time and a lot of effort, and I don't think most companies spend enough resources on it because it is such a high effort.

[00:36:54] Jason Barnard: Right. And would it be incredibly foolish, if you’ve got a massive site to do four different split tests with one single baseline at one time? Is that completely foolish?

[00:37:07] Kevin Indig: No, it’s not foolish. It will help you iterate over time and just get smarter about this. It’s not technically Bayesian testing.

[00:37:14] But that doesn't matter. It's still interesting and still valid, because if two of those tests turn out positive, two turn out negative, and you can pin it back down to exactly what you changed, all of a sudden you've got smarter and you can run some follow-up tests. So yes, in essence, the more tests you can run simultaneously, the faster you can iterate, right? It's the same thing in user testing. In fact, most startups measure test velocity as one indicator of growth and progress: how many tests have we run in a certain period of time, and can we increase it over time? So it is not foolish at all.

[00:39:19] Jason Barnard: So that's going to be phenomenally interesting, and I am already stunned by what Gennaro shared in an email. I won't share it with you because it would spoil the surprise, but it's going to be brilliant. Gennaro is delightful. Kevin, you were wonderful, charming, and you get the outro song.

[00:39:35] Jason Barnard: Brilliant, Kevin. (Sings) “A quick goodbye to end the show. Thank you, Kevin.”

[00:39:41] Kevin Indig: Thank you so much, Jason. That was brilliant. That was awesome. I had a great time.

[00:39:44] Jason Barnard: Thank you and thank you everyone for watching.