Scaling QA Infra using headless Selenium and Kubernetes

Rajat Jindal

Aug 14th 2024, 02:00PM

In this blog, I will walk you through how we scaled QA Infra from 3 Windows hosts to on-demand infra using Kubernetes.

First a personal backstory and thank you note

Back in 2009, I switched from being a C++/Perl developer to a QA engineer at a small consulting firm. First day at work, my mentor/lead introduced me to browser automation using Selenium (we were at Selenium 1.x and Firefox 3.x). Then helped me play around with Selenium IDE and my first Hello World browser automation. Yes, the same famous steps:

  • Open Firefox
  • Navigate to https://google.com
  • Search for "hello world"
  • Verify that the title is "hello world"

This was my first time using browser automation. I was amazed by the tool and how it worked. I didn't know at the time that this tool would become part of my daily life for the next decade or so. I owe a lot of my career success to Selenium, the folks behind Selenium, my first QA automation project (the one I am going to talk about in this post) and my awesome team.

Origin and Destination

Origin

When I joined the team, we had this automation running on three Windows hosts, with no support for Linux or IE. It would take a few days to set up a new test host. The release cycle for the project was 12-18 months (❗❗) and there was a weekly on-call schedule to keep the automation tests running.

Destination

Over time, we moved a lot of QA Infra to on-demand infra using Kubernetes and used tools such as ElasticSearch, Vue.js, Kibana and Redis to automate and optimize the tooling to support monthly (🎉🎉) release cycles. All test execution and monitoring were automated. Multiple triage reports were generated from analytics on the test results. The application/test logs and commit/build details were indexed into ElasticSearch.

I also received an outstanding technical contributions award, from our company's CEO, for my contributions to the Infrastructure enhancements.

Journey

We achieved the above in a few phases. While these phases were not pre-planned, they got us to the point where we could set up new test boxes in 10s (❤️ Kubernetes), down from a few days, and provide feedback on new builds (using smoke tests) in 1 hour, down from about a day (that hour included ~20 mins of new application box setup) (❤️ Docker/Containers).

Supporting headless browser

During my short stint at Google (that is a separate story), one of my colleagues worked on automation for Google Flights pricing validation, and that was my first encounter with headless browsers. Fast forward a year, I rejoined my team as an employee in the US, and after a short wait, I was given the responsibility of improving our harness infra. One of the first things I did there was to add support for running the browser tests on Linux in headless mode. For that, I added a few CentOS-based VMs to our on-premise datacenter (built on top of OpenStack and Puppet).
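For readers who have not run a browser headless before, here is a minimal sketch of what that looks like with today's Selenium Python bindings. It is not the harness we actually ran: the choice of Chrome, the specific flags, and the function names are assumptions made for illustration.

```python
def headless_chrome_args(width=1920, height=1080):
    """Return Chrome command-line switches for running without a display."""
    return [
        "--headless=new",            # no visible browser window
        "--no-sandbox",              # often needed when running as root in a VM or container
        "--disable-dev-shm-usage",   # avoid crashes when /dev/shm is small
        f"--window-size={width},{height}",  # fixed viewport so layouts are reproducible
    ]


def run_smoke_test(url="https://google.com"):
    """Open a page headlessly and return its title.

    Requires the selenium package and a local Chrome install.
    """
    # Imported lazily so headless_chrome_args() has no third-party dependency.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser process
```

Calling run_smoke_test() on a Linux box with no display attached should return the page title, which is the whole point of headless mode: the same test runs on a server VM that has no desktop session.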

First version of QA Dashboard

Once on Linux, I was able to run Apache server on those boxes, and wrote a whole bunch of CGI scripts to automate some tasks which were manual:

  • Building a new VM using ESX boxes and vSphere APIs
  • Configuring the Cobbler system using its APIs
  • Configuring the host machine itself (to ensure tests and framework are always up-to-date)
  • Storing the test results and app/test logs in an ElasticSearch instance
  • Providing all these abilities + test-reporting on a centralized dashboard (bootstrap + whole bunch of CGI scripts)
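To make the "store test results in ElasticSearch" step concrete, here is a rough sketch of how results can be packed into the newline-delimited body that the ElasticSearch bulk endpoint accepts. The index name and the result fields (test, status, build, duration_s) are hypothetical; our real documents looked different.

```python
import json


def to_bulk_payload(index, results):
    """Build an NDJSON body for the ElasticSearch _bulk endpoint.

    Each document is preceded by an action line that names the target index.
    """
    lines = []
    for result in results:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(result))                        # document line
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical test results, for illustration only.
sample_results = [
    {"test": "login_smoke", "status": "passed", "build": "1234", "duration_s": 12.4},
    {"test": "checkout_smoke", "status": "failed", "build": "1234", "duration_s": 30.1},
]

payload = to_bulk_payload("qa-test-results", sample_results)
```

The resulting string can be sent with an HTTP POST to /_bulk using the Content-Type application/x-ndjson. Once the results are indexed, a dashboard like Kibana can slice them by build, status or test name.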

Selling the idea to my Manager and VP

After demonstrating the value that this dashboard brought to the team, I asked to hire an intern to rebuild it from scratch using more modern tooling. Within the first week in the office, our intern delivered our first tool, and our project's leadership team almost jumped out of their seats and were like: THIS !!!

Enter Kubernetes/Containers

Around this time, our cloud architect gave a presentation on containers/Kubernetes and how they could help us reduce time-to-production. I was immediately sold on the value and started exploring it for QA Infra. I rewrote the infra tooling to run the tests using Containers/Kubernetes, and thanks to the earlier enhancements, within a month we had almost all the QA Infra running in Containers on Kubernetes. At this stage, we started to identify some blockers in the QA test scripts and other supporting infra. Fixing them was not trivial given the nature of the system under test.

End of my QA Journey

At this stage, my VP called me one evening (while I was on vacation) and asked me to join a new team we were forming to adopt Kubernetes for all our microservices. That is how I became the founding engineer for the Cloud-15 team. We were responsible for providing an internal platform that let our developers take a new service from an idea to running in production in 15 minutes or less. (Of course, that didn't include the time to code the actual business logic, just the skeleton service.) As part of this team, I came across the Cloud Native ecosystem and have made several open-source contributions in that area since.

You can see my Kubernetes Contributor card here

Next Steps for QA Infra

Even after I moved on from the team, I always had a wishlist of things we could improve on for the QA Infra to scale things even better:

  • Move from CGI scripts to Golang based tooling for QA Infra. (I actually started this work before leaving the company, and it was at a stage where it was working but not completely tested. The team built on top of that, and some of those scripts have since been rewritten in Golang.)
  • Rebuild some of our dated core-infra such as Cobbler and DNS Servers to scale well.
  • Reduce the amount of time it takes to set up a new host with the "application under test".
  • Reduce the duplicate tests (even if they are just set up for other tests) to dramatically reduce the execution time for those tests.
  • Remove the need for a jump host and move that to a Kubernetes job to allow for faster scaling.
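As an illustration of the last item, a test run can be described as a Kubernetes Job and submitted directly to the cluster, with no long-lived jump host in between. The sketch below only builds the manifest; the image name, the TEST_SUITE environment variable and the function name are hypothetical and not something our team shipped.

```python
import json


def build_test_job(name, image, suite, ttl_seconds=3600):
    """Return a batch/v1 Job manifest that runs one test suite in a container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"app": "qa-tests"}},
        "spec": {
            "backoffLimit": 0,                       # do not retry a failed test run
            "ttlSecondsAfterFinished": ttl_seconds,  # let the cluster clean up old runs
            "template": {
                "spec": {
                    "restartPolicy": "Never",        # Jobs require Never or OnFailure
                    "containers": [
                        {
                            "name": "tests",
                            "image": image,
                            # Hypothetical variable read by the test runner in the image.
                            "env": [{"name": "TEST_SUITE", "value": suite}],
                        }
                    ],
                }
            },
        },
    }


job = build_test_job(
    name="smoke-build-1234",
    image="registry.example.com/qa/selenium-tests:latest",  # placeholder image
    suite="smoke",
)
manifest_json = json.dumps(job, indent=2)
```

Since kubectl accepts JSON as well as YAML, the output can be written to a file and submitted with kubectl apply -f. Each build then gets its own short-lived Job, and scaling out becomes a matter of creating more Jobs rather than provisioning more hosts.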

If you have any questions about this journey or need help (or consulting) with scaling your infrastructure, please feel free to contact me.