Our customers find increasing value as we continue to add clinical decision support and shared decision-making tools to the HealthDecision platform. In addition to the core guidelines that our tools support, we offer services to customize the tools and add institution-specific guidelines and information. Both dimensions increase the complexity of developing and maintaining the software and make automated testing a necessity. In this post, I will cover our team’s journey to simplify that testing while maintaining the quality of our offerings.
As we continue to grow, automated testing has become critical. Initially, the focus was on unit tests, but once we started offering customer-specific guidelines and functionality customization, end-to-end tests became just as important. We currently use AngularJS as our front-end framework (more on that in a later post), and AngularJS’ preferred end-to-end testing framework is Protractor. So we started with Protractor, but quickly realized that we needed another layer on top of it to keep up the development pace.
After some exploration, we ended up writing a relatively simple Excel-based Protractor test generator. This approach allowed our developers to focus on software development while the tool analysts created the automated tests that covered the tool- and customer-specific algorithms. Then came the next challenge.
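To make the generator idea concrete, here is a minimal sketch of how spreadsheet rows could be turned into a Protractor spec. It is not our actual generator: the column names, the sample rows, and the `generate_spec` helper are all hypothetical, and the sketch reads a CSV export rather than an Excel file so that it needs only the Python standard library. The emitted JavaScript uses standard Protractor calls (`browser.get`, `element(by.model(...))`, `element(by.id(...))`).

```python
import csv
import io
import json

# Hypothetical spreadsheet layout, exported as CSV: one row per test case.
SAMPLE_ROWS = """name,url,input_model,input_value,result_id,expected
low risk patient,/tools/example,patient.age,45,riskLabel,Low
high risk patient,/tools/example,patient.age,80,riskLabel,High
"""


def js(value: str) -> str:
    """Quote a Python string as a JavaScript string literal."""
    return json.dumps(value)


def generate_spec(csv_text: str, suite_name: str) -> str:
    """Turn CSV rows into the source text of one Protractor (Jasmine) spec."""
    lines = [f"describe({js(suite_name)}, function () {{"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines += [
            f"  it({js(row['name'])}, function () {{",
            # Open the tool, enter the input, then check the displayed result.
            f"    browser.get({js(row['url'])});",
            f"    element(by.model({js(row['input_model'])}))"
            f".sendKeys({js(row['input_value'])});",
            f"    expect(element(by.id({js(row['result_id'])})).getText())"
            f".toEqual({js(row['expected'])});",
            "  });",
        ]
    lines.append("});")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_spec(SAMPLE_ROWS, "example tool"))
```

The point of a design like this is that an analyst only edits rows in a spreadsheet; each new row becomes one more `it(...)` block without anyone writing JavaScript by hand.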
Given the number of parameters and data sources, we ended up with a huge test suite that covered a good breadth of the workflows that our tools perform. That breadth gave us confidence that the tools were working as expected, but the thousands of tests took hours to complete, often a full business day, with only a few Selenium Chrome instances per machine. The number of instances per machine was limited by some odd Chrome and Selenium/Protractor behavior: beyond a certain number of instances, the tests would start failing unexpectedly with timeouts and other errors. After some further exploration, a Docker-based Selenium grid solution seemed to make sense.
There were many open-source Docker images that accomplished what we wanted, so all we needed was a good orchestration solution. Azure has some relevant offerings, e.g., ACS or Azure Batch, but we have always been intrigued by Kubernetes, and it seemed better aligned with our future goals. So we started exploring Kubernetes as the orchestrator. Even though HealthDecision is hosted in Azure, Azure did not offer a managed Kubernetes service at the time, so we decided to try Google Cloud’s managed Kubernetes offering; that way we could focus on proving the concept instead of first learning how to set up and manage Kubernetes. Our initial experiments were successful and, fortunately, by the time we had to make a commitment, Azure had announced its managed Kubernetes offering.
That brings us to today. We are now running an Azure-managed Kubernetes cluster that we can easily scale up and down to tens of nodes within minutes. Once the cluster is scaled to the needed size, Kubernetes does its magic and, again within minutes, scales up the needed Selenium browser containers, which all seamlessly connect to the Selenium hub. In addition to this optimization, we have also broken the test suite into two sets: a selective set that is still fairly large and runs on all relevant code changes, and a full set that has much broader coverage and runs at different milestones. Given our scalable Kubernetes solution, we can now run both of these within a couple of hours. Furthermore, the additional cost of this scaling is negligible.
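The two-tier split and the effect of adding browsers can be sketched in a few lines. This is only an illustration under stated assumptions: the `SpecCase` type, its `selective` flag, and both helper functions are hypothetical, and the estimate assumes every test takes the same time and that tests run in even waves across the available browsers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecCase:
    name: str
    selective: bool  # hypothetical flag: also belongs to the per-change set


def pick_suite(tests, suite):
    """Return the tests for a 'selective' (per change) or 'full' (milestone) run."""
    if suite == "full":
        return list(tests)
    if suite == "selective":
        return [t for t in tests if t.selective]
    raise ValueError(f"unknown suite: {suite!r}")


def estimate_minutes(n_tests, seconds_per_test, parallel_browsers):
    """Rough wall-clock estimate: tests run in waves of `parallel_browsers`."""
    waves = math.ceil(n_tests / parallel_browsers)
    return waves * seconds_per_test / 60
```

With made-up numbers (not our measurements), `estimate_minutes(3000, 30, 4)` gives 375 minutes, while `estimate_minutes(3000, 30, 100)` gives 15 minutes. That is the basic reason a grid that scales out to many browser containers shortens the run so much.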
Our team has been impressed by Kubernetes and Azure’s managed Kubernetes offering. It’s not perfect, but it has simplified many things for us and allowed the team to continue focusing on new tools.
Looking ahead, one change we are considering is either moving Jenkins onto the Kubernetes cluster or migrating to Concourse CI on the same cluster.