Infrastructure as code: How we do it at GoHealth Slovakia
At GoHealth we focus on helping our customers choose the best healthcare plan for their needs, and we provide care to address health insurance issues they might face after enrollment. We rely on our technology platform to cover the customer journey processes, collect data for analysis, and much more. The platform consists of several applications, database systems, storage, and other services, most of them running in the cloud. Managing the entire infrastructure requires a clear picture of all the rules and available configurations. We use an Infrastructure as Code (IaC) approach to maintain the configuration in one place, checked into the source code management (SCM) system. All the configuration is managed via Terraform, our tool of choice for implementing IaC concepts.
We picked Terraform because it is a cloud-independent, open-source tool and one of the most widely adopted IaC tools in the IT community worldwide. It also provides many useful features, such as automatic cleanup of unused resources and dependency resolution, which allows it to apply changes in the right order.
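To make the idea of dependency resolution more concrete, here is a minimal Python sketch of ordering resources so that each one is created only after the resources it depends on. It is a conceptual model, not Terraform's implementation, and the resource names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical resources; each one maps to the set of resources it depends on.
dependencies = {
    "network": set(),
    "storage_bucket": set(),
    "database": {"network"},
    "application": {"network", "database", "storage_bucket"},
}


def creation_order(deps):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = creation_order(dependencies)
print(order)
```

The exact position of independent resources may vary, but a resource never appears before something it depends on.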
How we do it…
The Terraform code is logically split into Git repositories, usually one repository per configured service, although some repositories cover several services. Any software engineer can implement infrastructure changes, and in more complex cases we get support from our SRE team. When we are done with the changes, we raise a pull request, which is then reviewed by SREs. This is an important step to avoid misconfiguration or, even worse, an outage of our services. Even if the changes are fine, it is always great to get feedback on code structure or other aspects from more experienced colleagues. Once the pull request is approved, we rely on the Jenkins pipeline to validate the changes against the existing infrastructure state and deliver them to the target environment.
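Validating changes against the existing state boils down to comparing what is recorded as deployed with what the code declares. The Python sketch below illustrates that comparison under simplifying assumptions: both sides are plain dictionaries, and the resource names and settings are hypothetical. It is not how Terraform or our pipeline is actually implemented.

```python
def plan(current, desired):
    """Compare two {resource_name: settings} dicts and list the required actions."""
    actions = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in current:
            actions.append(("create", name))   # declared in code, not deployed yet
        elif name not in desired:
            actions.append(("delete", name))   # deployed, but no longer declared
        elif current[name] != desired[name]:
            actions.append(("update", name))   # deployed with different settings
    return actions


# Hypothetical recorded state of the environment.
current_state = {
    "topic_orders": {"partitions": 3},
    "bucket_reports": {"versioning": True},
    "bucket_legacy": {"versioning": False},
}

# Hypothetical configuration declared in the repository.
desired_config = {
    "topic_orders": {"partitions": 6},
    "bucket_reports": {"versioning": True},
    "topic_payments": {"partitions": 3},
}

changes = plan(current_state, desired_config)
for action, name in changes:
    print(f"{action:6} {name}")
```

Reviewing such a list of planned actions before anything is applied is what lets a reviewer spot an unintended delete or update early.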
Changes are deployed automatically. We don’t waste time on manual changes to complex infrastructure configurations. Manual changes are error-prone and difficult to track. They also put time and responsibility pressure on individuals: people with the required permissions must be available at the very moment you make your infrastructure change. Terraform also works great when we need to deploy many changes at once in the shortest possible timeframe; doing the same manually would take much longer. With modularization, we can fine-tune the desired setup in a testing environment and easily apply the same state to other environments.
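The modularization idea can be pictured as one parameterized definition that is reused for every environment. The following Python sketch is only an analogy for a reusable module; the service name, parameters, and values are hypothetical.

```python
def service_module(environment, instance_count, memory_mb):
    """Hypothetical reusable 'module': one definition, parameterized per environment."""
    return {
        "name": f"example-service-{environment}",
        "instance_count": instance_count,
        "memory_mb": memory_mb,
        # Settings below are shared, so every environment gets the same setup.
        "health_check_path": "/health",
        "port": 8080,
    }


# The same definition is applied to each environment; only the sizing differs.
environments = {
    "test": service_module("test", instance_count=1, memory_mb=512),
    "production": service_module("production", instance_count=3, memory_mb=2048),
}

for env_name, config in environments.items():
    print(env_name, config)
```

Because the shared settings live in one place, a fix verified in the test environment reaches production without being retyped.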
Sometimes you need to find out who made a change to the configuration and why. Having all the changes stored in the source code management tool lets us determine the owner of the change and the reason for it, as captured in the commit message. Every commit is linked to a Jira ticket, where we can get the full context of a particular change.
With the infrastructure expressed as code, the desired configuration state is documented and easily accessible. You do not need to rely on the memory or individual knowledge of people. If the people who hold all the knowledge of your infrastructure leave the company or become unavailable, you could end up in a situation where no one else knows how something should work or why it was set up the way it is. IaC significantly reduces this risk.
Bad, unpredictable things happen more often than we would expect. It is good to be prepared in case your service suddenly breaks because your infrastructure got corrupted or misconfigured. In that situation it is great to have an infrastructure blueprint ready to restore the previous working state.
Infrastructure setup can be owned by the same team that implements the new application or feature. “Infra” can become one of the stories in the epic, or one of the tasks within the user story, just like any other application development task.
Copy-paste pattern in action
Because every team can see the complete set of changes that other teams have made in SCM for similar use cases, teams can use a copy-paste pattern. There is no shame in following an example provided by others that is already proven in the production environment. It reduces the risk of forgetting some important part of a complex configuration and saves a lot of implementation time. Of course, even if we copy existing code, understanding each line of it is a must.
Typical use cases:
1. Configure application environment
When we implement new applications or services, we typically deploy them to cloud container instances. For the services to work correctly, we need to set up many parts of the infrastructure. We set reasonable hardware resources, including memory, CPU, and instance count, to ensure the high availability of our services. We define the network configuration, including service URI, load balancing, accessibility, service ports, and protocols. We also define access permissions to other services, including various cloud services. There is still a lot more to consider, such as monitoring and log management. Well-defined code helps us not to forget any important part of the environment setup and speeds up the delivery of new services.
2. Update permissions for cloud storage locations
We often store sensitive data that needs strict access management. Some objects can be read, modified, created, or deleted only by a specific group of people. Adding or removing permissions and extending or narrowing groups is an easy task with IaC at hand. We just need to add or update a few lines of code, deploy the changes, and that’s it.
3. Version control system rules
We also use Terraform to manage rules and permissions for our SCM repositories. We use Terraform modules to define common repository settings, such as review rules and merge permissions, and apply these rules automatically across all related repositories. Each team can easily tweak the settings or manage team members by editing configuration files.
4. Message broker setup
Another frequent use case is the configuration of Kafka topics. Apache Kafka is a distributed event streaming platform well suited to publish-subscribe messaging between services. Sometimes we need to create a new Kafka topic, including the configuration of its data retention strategy, partition count, ACLs, and more. It is a task suitable even for less experienced team members, as infrastructure code is easy to read and understand.
5. SFTP management
We maintain multiple SFTP (SSH File Transfer Protocol) locations running on our infrastructure to enable file exchange between us and our partners. For each of them, we need to define the physical storage location for the files and associate a strong public key with the partner account, since we use SSH key authentication to meet industry security standards. With many partners, key and storage management can become complex. This is where IaC helps, allowing all partner accounts to be configured in a single place in a well-structured and clear way.
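As an illustration of the SFTP case, the Python sketch below keeps all partner accounts in one declarative list and runs a couple of simple consistency checks over it. The account names, directories, keys, and the list of allowed key types are placeholders and assumptions made for the example, not our real configuration.

```python
from dataclasses import dataclass

# Assumption for this sketch: only these public key types are accepted.
ALLOWED_KEY_TYPES = ("ssh-ed25519", "ssh-rsa")


@dataclass(frozen=True)
class PartnerAccount:
    username: str
    home_directory: str   # storage location for the partner's files
    public_key: str       # public key in the usual "<type> <data>" text form


def validate(accounts):
    """Return a list of human-readable problems found in the account list."""
    problems = []
    owner_by_directory = {}
    for account in accounts:
        key_type = account.public_key.split(" ", 1)[0]
        if key_type not in ALLOWED_KEY_TYPES:
            problems.append(f"{account.username}: key type '{key_type}' is not allowed")
        # Each storage location should belong to exactly one partner.
        owner = owner_by_directory.setdefault(account.home_directory, account.username)
        if owner != account.username:
            problems.append(f"{account.username}: directory already used by {owner}")
    return problems


# Hypothetical partner accounts with placeholder keys.
partners = [
    PartnerAccount("partner-a", "/sftp/partner-a", "ssh-ed25519 AAAA-placeholder-a"),
    PartnerAccount("partner-b", "/sftp/partner-b", "ssh-ed25519 AAAA-placeholder-b"),
]

problems = validate(partners)
print(problems or "all partner accounts look consistent")
```

Keeping every account in a single reviewed list makes mistakes such as a shared directory or a weak key type visible in the pull request, before anything is deployed.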
In the beginning, it takes time to define infrastructure resources using code, but this effort pays off later. It is worth investing time in it.
Written by Juraj Grigeľ