Now on Now: How we run 40 million tests every day at enterprise scale

Unknown source · May 12, 2024 · video
  • Hi everyone, thanks for coming to see our Now on Now session. Today we're going to talk about how at ServiceNow we run approximately 40 million tests every day at enterprise scale. We actually orchestrate all of this from our ServiceNow instance, and we do it to validate the product that you all use every day. These tests run when developers check in code, to make sure everything is looking good, and we're going to talk a little bit about how we do that. One of the takeaways we want you to get from this today is that, while we have all these nice out-of-the-box applications, you can also build your own applications inside the platform that are really tailored to your needs. You can do big things with this platform: we run a lot of work every day, and we scale it out using just the tools available on the platform. All good? - Yeah, mic was the wrong way. - Okay. All right, so this is our safe harbor notice. You've probably seen this a thousand times. Don't make any financial decisions based on anything we tell you; that's generally a good idea if it's coming from me anyway. My name's Scott Marshall. I'm a principal software engineer in our developer productivity organization, and I've been with the company for about 14 years. - Need a special mic. - There you go. So this is my co-presenter. - All right, is everyone having fun at Knowledge? - Yeah! - All right, so I'm Anitha Loganathan. I'm a director of engineering at ServiceNow, and I work with Scott. At ServiceNow, I manage BuildTools1, an internal ServiceNow instance that is used to run our engineering operations. Everything from product management, release management, and SDLC tooling to getting our platform out the door runs on this instance. When it all started, we had a good integration going between BuildTools1, Jenkins, and Git. It enabled our engineers to write code, commit, build, and get their test results on the instance. All of this happened in a reasonable amount of time, the feedback loop was acceptable, and everyone was happy. Until we grew. We grew in the number of engineers we had working, and we grew in the number of products we were building as well. What an interesting problem to have, right? The system that worked for one team wasn't scaling for 400 teams. It was becoming hard for us to pinpoint, when a failure occurred, whether it was because of code written by team A or team B. It also meant that we were running a lot more tests than the product actually required. Teams were waiting for their test results, and the feedback loop suffered. This mud pit of an environment wasn't scaling, and it was directly impacting our ability to grow. What would have helped us is something like this: each team has its own individual workspace, the tests run closer to their product, and the feedback loop stays acceptable. Once we knew the problem at hand, we put together a project with the key focus areas listed on the screen. The first was to deliver products at a faster pace. As most of you in this room know our release process: we have release programs where our platform releases occur twice a year, and when it comes to our application releases, they can go alongside a platform release or even as quarterly releases. On top of that, we also have innovation labs. To meet the commitments we have with you, we had to make sure we addressed this problem. 
The second thing was to focus on our infrastructure and the feedback loop. Jenkins was in the middle of the problem: our ability to scale was directly dependent on the Jenkins queue management system, and we were already pushing it to its limits. In the process, we were also learning that there were a lot of constraints with Jenkins, which Scott will talk about when he covers the implementation. When it comes to the feedback loop, we measure ourselves with a KPI called time to feedback. I'm sure many of you have a similar metric in your organization; by definition, it is the time between when a developer commits code and when they get their test results. I'll touch on the improvement we made to this particular metric towards the end of the presentation, so we'll come back to it. A faster feedback loop, what does that enable? It enables teams to collaborate early on and avoid rework. Let's take an example. I am a developer who comes in, writes one line of code, and checks it in, and I have to wait two days for the test results to come back. Come day two, I pick up another story to make progress in my sprint, and in the process I've now touched five services. If the test result comes back as a failure, I have already caused rework in six different places. This could have been easily avoided if I had gotten the indicators early on. I could have had those conversations, collaborated with my designers and my product managers, argued with them and gotten away with what I'd done, or fixed the bug if it actually was a bug. All of this would have made our developers' lives better. We also had to focus on giving our management visibility, the visibility that would give them the intelligence required to assess the risk in our releases. All of this meant that we had to unleash the power of our platform to build something in the platform's ecosystem: a system that could scale, be flexible, and, above all, be transparent to our users and to our leadership team. I'll hand it over to Scott, who will talk about the implementation details. In the process, Scott will also touch on some of the platform features that were crucial in rolling out this solution. - We're literally handing it over. Okay, so as Anitha mentioned, we started out with this sort of big mud pit that everybody was working in, and people were stepping on each other's toes. It wasn't the best experience once we started to scale up. So we moved on to these individual environments that were attached to each team, which was great for them, because if a validation failed, there was a very high probability that the team responsible for that environment had checked in something, and that check-in caused the failure. They didn't have to go searching around to see who was causing what problem. One of the side effects, though, was that we were still running all of these validations for the whole platform in every one of these environments. So we went from running 3,000 test projects a day to 3,000 times however many environments we had. And as Anitha mentioned, we had some concurrency issues with Jenkins, which was our build orchestrator. The vendor informed us that these things can run about 400 concurrent tasks before the controllers just start to top out. So the next logical step really is, okay, set up more controllers and scale that horizontally. And that was the first thing that we did. 
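As a rough illustration of the time-to-feedback KPI Anitha describes above, here is a minimal sketch, assuming a record per commit with its commit timestamp and the timestamp of the completed test results. The function names and data shapes are hypothetical, not the actual BuildTools1 schema.

```python
# Sketch: compute "time to feedback" from (commit, results) timestamp pairs.
from datetime import datetime
from statistics import mean

def time_to_feedback_hours(commit_time: datetime, results_time: datetime) -> float:
    """Hours between a developer's commit and the completed test results."""
    return (results_time - commit_time).total_seconds() / 3600.0

def average_ttf(runs: list[tuple[datetime, datetime]]) -> float:
    """Average time-to-feedback across a set of (commit, results) pairs."""
    return mean(time_to_feedback_hours(c, r) for c, r in runs)

# Example: a commit whose results only came back the next evening.
print(average_ttf([(datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 2, 18, 0))]))  # 33.0
```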
We started building out more of these Jenkins controllers, and that was okay while the configurations were all pretty much the same for each team. But we realized pretty quickly that we were running the same tests over and over again, even though a lot of teams weren't changing code that those tests even touched. That's when we decided we needed to make this a bit more flexible for these teams: we want them to be able to pick and choose the tests that run for their parts of the product, but we still need to be able to do a full validation of the entire platform before the code goes into our mainline, so we still need both types of testing. What we found when we tried to implement this is that all of the per-team configurations that made each Jenkins environment a bit special became difficult for us to implement effectively. So what we realized pretty early on was that we were going to have to take control of this and come up with our own solution. And then, of course, we're a growing company, and we're still growing pretty quickly; there are always more tests, more products, more developers, so this problem was only going to get worse. What we realized is that it was time to move this into our platform, and that's exactly what we ended up doing. We know the ServiceNow platform can scale out tremendously; we're tiny compared to a lot of the big customers you see here doing massive implementations. So our goal was to take all of that queue management and all of the intelligence where Jenkins was saying, okay, the build completed, now we need to go run these tests, now we need to do this other validation, and let our ServiceNow instance handle those things. We did that using all the tools you have available to yourselves: scheduled jobs, workflows, business rules, the UI, and so on. But we really built a completely bespoke solution for this problem that was tailored to us. It's not a core part of the platform; it's just a new application that we built that did what we needed. So we ended up making these Jenkins jobs really generic. Instead of Jenkins trying to tightly orchestrate every last step of the process, we have a handful of these jobs: one that says go run a test project, one that says go build the platform, one that says go create a database dump so that we can pre-seed the test environment. Those are the same across all of these controllers, but our ServiceNow instance is where all the intelligence is, and it says, okay, for this team, I want you to do this build. And then we're going to test this on Oracle, so I want you to do an Oracle database dump. And then I want you to take that Oracle database dump, and I want you to go run these tests on it. So the instance is now where all that intelligence lies, and those Jenkins controllers just wait for jobs. ServiceNow goes, hey, do you have capacity? And Jenkins says, yeah, I've got a few slots. We throw out some jobs. Then we go to the next one: no, you're busy. Okay, we'll find another one. And we repeat this process as we work through the queue of jobs we need to deal with. And then the other challenge that we had, as Anitha mentioned as well, is that over the course of our release, the test load ebbs and flows. At the beginning of a release cycle for a family release, teams are maybe doing a little bit of experimentation. 
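The dispatch model Scott describes, where the instance owns the queue and the intelligence while Jenkins controllers only run a handful of generic jobs, might look roughly like the sketch below. The controller class, job names, and queue handling are illustrative assumptions, not the real BuildTools1 implementation.

```python
# Sketch: instance-side queue draining against controllers that expose only generic jobs.
from dataclasses import dataclass, field
from collections import deque

GENERIC_JOBS = {"build_platform", "create_db_dump", "run_test_project"}

@dataclass
class Controller:
    name: str
    slots: int                               # concurrent tasks this controller can take
    assigned: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.assigned) < self.slots

    def submit(self, job: dict) -> None:
        assert job["type"] in GENERIC_JOBS   # controllers only know the generic steps
        self.assigned.append(job)            # in reality this would trigger the Jenkins job

def dispatch(queue: deque, controllers: list[Controller]) -> None:
    """Drain the queue, asking each controller in turn whether it has free slots."""
    while queue:
        target = next((c for c in controllers if c.has_capacity()), None)
        if target is None:
            break                            # everyone is busy; a later scheduled run retries
        target.submit(queue.popleft())

# Example: the instance decides what to run (team, database flavor, test set);
# the controllers just execute generic steps in order.
work = deque([
    {"team": "platform-core", "type": "build_platform"},
    {"team": "platform-core", "type": "create_db_dump", "db": "oracle"},
    {"team": "platform-core", "type": "run_test_project", "db": "oracle"},
])
dispatch(work, [Controller("jenkins-01", slots=2), Controller("jenkins-02", slots=1)])
```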
They're still finalizing some of their designs; they're not really in the thick of getting everything ready to go out the door, so there's less testing we have to do. But as you move toward the end of the release, when we're in release readiness and we're verifying everything, dotting our i's and crossing our t's, we need a tighter loop on these results. A team needs to be able to say, oh, I have a string that isn't internationalized, let me go fix that, and then I need another test run to get that checked off the list. So we need more resources to accommodate that. That's where we integrate with Kubernetes and with AWS, so that we can spin up and down and shift these resources to where they're needed, and dynamically adjust to what's going on with our test burn. And of course, we're ServiceNow, right? We always want things to work great for people, and we're able to leverage a lot of the platform functionality to make this a great experience for our users. Inside of our UI that everyone's familiar with, familiarity breeds content, right? They're able to go in there and say, okay, for my team, I want to run these tests. And then, oh, we realized we're actually impacting this other thing, let me add one more project in there. All of that is able to happen through the UIs that everybody's familiar with. And then people need to take action on these things, right? It's not just about running tests. So we take all of those test results and all of the validations we run around string internationalization and third-party libraries, and we digest that down and summarize it, and we present our users with actionable insights they can use to go resolve the problems they need to fix. If there's a string that's not internationalized, that shows up on the list. If there's a new library they brought in, we automatically create an approval request with legal and with security to get that library approved. We try to minimize the activation energy that people have to spend on solving these problems. Then they fix them, rinse and repeat, everything gets knocked off the list, and they're good to go. Oh, and we also keep references to all the log data for these runs, so if there's a test failure, an engineer can go in there and find all the information they need to debug it really effectively. The other thing is that we're dealing with a fair amount of data here, and with any database-driven system, the more data you put in there, the slower things can get. Now, ServiceNow gives you a lot of tools you can use to tune queries, and you can rework things to accommodate that, but that is work. We were fortunate enough to be customer zero for ServiceNow's new database architecture, which I believe is going to be available to you all fairly soon as well. The query planner and query execution are a lot more intelligent in that framework, so it's able to do a lot of those optimizations for us, and we don't have to spend time going in and tuning each one of these parameters to make sure a team can get their results in a timely manner. So with this, I'm going to hand it back over to Anitha. She's going to talk a little bit about how we roll this up to our release managers and our leadership so they can keep an eye on the overall health of the release and make sure things are ready to get out the door for you all to use. 
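The "ebb and flow" capacity adjustment Scott mentions, spinning test resources up and down with Kubernetes and AWS as the release cycle progresses, could be driven by something as simple as the queue-depth rule sketched below. The thresholds and the idea of scaling purely on queue depth are assumptions for illustration; the actual Kubernetes/AWS integration is not shown.

```python
# Sketch: pick a worker count from current demand, clamped to a floor and ceiling.
def desired_workers(queued_jobs: int, jobs_per_worker: int = 10,
                    min_workers: int = 5, max_workers: int = 400) -> int:
    """Scale the test-worker fleet with demand, within the allowed range."""
    needed = -(-queued_jobs // jobs_per_worker)   # ceiling division
    return max(min_workers, min(max_workers, needed))

# Early in a release, light load keeps the fleet small; during release
# readiness, the same rule scales it out.
print(desired_workers(queued_jobs=40))     # 5   (floor applies)
print(desired_workers(queued_jobs=3500))   # 350
```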
- Thank you, Scott. That was a lot of work and a ton of change management to roll this out within our company, but we were able to achieve this view. We were able to get away from the mud pit of an environment setup we had and give our teams individual sandboxes where they could hand-pick the tests closest to their product, and all the test results came back in a reasonable feedback cycle. Just to set some context on the size of our engineering organization so you can relate to this: we have close to 5,000 engineers, broken down into 2,000 platform engineers, 1,700 app engineers, and 1,200 QE engineers, supported by researchers, designers, program managers, and product managers, all organized into 450-plus Scrum teams who make this happen. This view, this system, was great for developers; they could immediately look at their reports and know what to do next. But this visualization wasn't directly helpful to our leadership team. For them, the test results were not as important as having the visibility to identify where in the release process the risk lies and which teams needed help immediately. A view like this removes all the low-level details of test results and provides a higher-level picture: if I look at it, I can say there are five teams working on this release, and two of them are putting my release at risk. As soon as we were able to provide this visibility, leadership was almost always able to extend help. They could use this information to do risk assessment, reprioritize if needed, and even provide resource help, adding more people to get teams out of those failure modes. When the right visibility is there and the help is extended, teams were almost always able to turn things around, and my report would look all green. When a report looks all green, it means only one thing: we are ready to ship this product to you. There's no ambiguity, no second-guessing, no chasing around. It was quite clear. Going back to the KPI I was talking about earlier: as a side effect of all this implementation, we were able to bring our time to feedback down from 33 hours to 24 hours. That is close to a 30% savings in time, time that we were able to give back to our developers, who could use it to do actual work instead of chasing people around. Just to put this number in context, this is not the time savings for running unit tests; it is the time to test the entire platform's features against the product they were building. That's a huge savings. And as most of you in this room can relate, every project is a journey, and this is ours, and this is where we are. We don't stop here; we will continue to improve this metric as we go. With that, we'll open up for questions. If you have questions, you can line up on the side of the stage and we will help you. Thank you for staying and listening to us. - Thanks, everybody. (crowd applauds)
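The leadership roll-up Anitha describes, collapsing low-level test results into a per-team status so release managers can see which teams put the release at risk, could be summarized along the lines of this sketch. The status labels and the pass-rate threshold are illustrative assumptions, not the actual report logic.

```python
# Sketch: collapse a team's raw test results into a release-risk status.
def team_status(passed: int, failed: int, pass_threshold: float = 0.95) -> str:
    """Summarize one team's results as 'green' (on track) or 'at risk'."""
    total = passed + failed
    if total == 0:
        return "no results"
    return "green" if passed / total >= pass_threshold else "at risk"

results = {
    "Team A": (980, 20),    # 98% pass rate
    "Team B": (700, 300),   # 70% pass rate -> needs help now
}
for team, (passed, failed) in results.items():
    print(team, team_status(passed, failed))
```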
View original source

https://players.brightcove.net/5703385908001/zKNjJ2k2DM_default/index.html?videoId=ref:SES1413-K24