Helix Engineering: The art of configuration, part 2
This is part of a blog series written by the Helix Engineering staff. To follow along with the team as they publish, keep tabs on the Helix Engineering blog category.
In our last installment, we covered some of the more conventional forms of configuration and their pros and cons. In today’s conclusion, we’ll explore further, including some more modern techniques.
Configuration of cloud applications
Modern cloud applications are composed of multiple components running on multiple machines, often packaged as a set of containers. How do you configure such systems? Well, that depends on your CI/CD pipeline and on how you eventually deploy and launch your services and applications. With Docker, you can use all of the options covered in our last installment: you can pass command-line arguments to your Docker command, pass environment variables, and add configuration files to your Docker image. But those options are often unsatisfactory. They typically require baking a new image or changing a deployment script and restarting every instance, which means going through the whole change-management process, just as you would for a code change. You may as well hard-code your configuration as constants. There is also the question of distributing the new configuration across all instances, which will not be immediate.
Is there another way? Can running applications be configured dynamically? I’m glad you asked! The answer is “yes.” You can keep your configuration in a database that applications can access or use a dedicated configuration service. Let’s examine both options.
Storing your configuration in a database gives you immense power and flexibility:
• All service instances can be updated at once (almost, depending on how they refresh their configuration)
• You can use all the DB goodness to have versioning, a history of previous configurations, and to keep track of who changed what and when
• You can limit each service to access only its own configuration (e.g., keep a separate table for each service)
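To make those bullets concrete, here is a minimal Go sketch of the data model, assuming an in-memory map as a stand-in for real database tables. The names `VersionedStore`, `Set`, `Latest`, and `History` are hypothetical, not a Helix or library API. Each service gets its own table, and every write appends a new version tagged with who made the change and when, instead of overwriting the previous value.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ConfigVersion is one row of a hypothetical per-service history table:
// the value plus who changed it and when.
type ConfigVersion struct {
	Version   int
	Value     string
	ChangedBy string
	ChangedAt time.Time
}

// VersionedStore is an in-memory stand-in for a configuration database.
// Each service has its own "table", and writes append rather than overwrite.
type VersionedStore struct {
	mu     sync.Mutex
	tables map[string]map[string][]ConfigVersion // service -> key -> history
}

func NewVersionedStore() *VersionedStore {
	return &VersionedStore{tables: make(map[string]map[string][]ConfigVersion)}
}

// Set appends a new version of key for the given service and returns its number.
func (s *VersionedStore) Set(service, key, value, user string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	table, ok := s.tables[service]
	if !ok {
		table = make(map[string][]ConfigVersion)
		s.tables[service] = table
	}
	v := ConfigVersion{
		Version:   len(table[key]) + 1,
		Value:     value,
		ChangedBy: user,
		ChangedAt: time.Now(),
	}
	table[key] = append(table[key], v)
	return v.Version
}

// Latest returns the current value of key, reading only that service's table.
func (s *VersionedStore) Latest(service, key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	history := s.tables[service][key]
	if len(history) == 0 {
		return "", false
	}
	return history[len(history)-1].Value, true
}

// History returns a copy of every recorded version of key, oldest first.
func (s *VersionedStore) History(service, key string) []ConfigVersion {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]ConfigVersion(nil), s.tables[service][key]...)
}

func main() {
	store := NewVersionedStore()
	store.Set("billing", "max_retries", "3", "alice")
	store.Set("billing", "max_retries", "5", "bob")
	v, _ := store.Latest("billing", "max_retries")
	fmt.Println("current:", v)
	for _, h := range store.History("billing", "max_retries") {
		fmt.Printf("v%d=%s by %s\n", h.Version, h.Value, h.ChangedBy)
	}
}
```

In a real database, the same shape is usually an append-only table with a version column, so rolling back is just reading an older row.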
This immense power and flexibility is also the problem with the DB approach. Exposing a “raw” database to the services requires you to ensure that services can’t accidentally or maliciously tamper with other services’ configurations, or DDoS the configuration DB. You’ll need to define and manage DB credentials for every service. If the configuration DB schema changes, you’ll need to upgrade all your services. It also breaks the pure share-nothing concept of microservices.
Some of these problems can be alleviated with a configuration library that exposes a nice interface to your services and does all the heavy lifting of talking to the configuration DB itself.
Remote Configuration Service
A remote configuration service has similar attributes to the database solution. It is also a central repository of configuration information; however, by adding a layer of indirection, you get better control over the interaction. The configuration service can expose a precise interface that ensures services get exactly what they need without the risk of breaking anything. The service can still use a database to store the configuration.
Kubernetes Config Maps
The problem with the database or remote configuration service approach is that it is bespoke. Environment variables and configuration files are low-level and ubiquitous. Large organizations may have hundreds or even thousands of legacy applications and services that are configured using environment variables and/or configuration files. How do you bring them into the fold?
Kubernetes has a very interesting solution called ConfigMap. A ConfigMap is stored for you by the Kubernetes API server, which functions as a remote configuration service (except you don’t have to implement, deploy, and manage a separate service). The interesting part is that the contents of the configuration are exposed to services and applications as environment variables or files. This makes it easy to migrate existing applications to Kubernetes. (If you want to know more about Kubernetes in general and ConfigMaps in particular, check out my book Mastering Kubernetes – 2nd Edition.)
Using AWS Parameter Store as a remote configuration service
If your system is running on AWS, the Parameter Store is a great option for a remote configuration service. It has the following going for it:
• Hierarchical organization
• Path-based access control
• Built-in encryption
AWS manages it for you. That means reliability, availability, and backup/restore. You don’t have to develop it, and it comes with a UI, CLI support, and client libraries for many programming languages. You control access using standard AWS IAM, and since you are running your system on AWS anyway, your services will already have credentials to access Parameter Store. If you want to keep secrets there, the built-in encryption is a huge time-saver. Finally, it is very cheap compared to other services.
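Hierarchical names are what make path-based access control practical. The following Go sketch is a local illustration only, not the AWS SDK: it builds names under a hypothetical `/env/service/key` convention and selects a whole subtree by prefix. The same prefix (say, `/prod/billing/`) is what you would scope a service’s IAM policy to.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// paramName builds a hierarchical parameter name such as
// /prod/billing/max_retries. The env/service/key layout is a hypothetical
// convention, not something Parameter Store requires.
func paramName(env, service, key string) string {
	return "/" + strings.Join([]string{env, service, key}, "/")
}

// underPath returns the sorted names in params that live below path.
// Appending "/" ensures /prod/billing does not also match /prod/billingx.
func underPath(params map[string]string, path string) []string {
	prefix := strings.TrimSuffix(path, "/") + "/"
	var names []string
	for name := range params {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	params := map[string]string{
		paramName("prod", "billing", "max_retries"): "5",
		paramName("prod", "billing", "timeout"):     "30s",
		paramName("prod", "search", "index"):        "v2",
		paramName("staging", "billing", "timeout"):  "5s",
	}
	fmt.Println(underPath(params, "/prod/billing"))
}
```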
But it’s not all rosy in Parameter Store land. Since it’s so cheap, you get no guarantees about scale and concurrent requests. If you access it with too much vigor, you will get a throttling error; the rate limit is pretty low, at a few tens of requests per second. Another issue we ran into at Helix is the limited history of parameters. Parameter Store keeps a history of 100 versions of each parameter. That alone is not a problem, but once a parameter reaches 100 versions, any further attempt to modify it fails with an error. Finally, the size of values is limited to 4,096 bytes.
At Helix, we minimized the impact of the rate limit by serializing multiple values into a single JSON blob, so one request fetches many settings. We worked around the versioning limit by deleting old versions when reaching the 100-version limit. For large values, we added a splitting scheme to our tools and code that splits them across multiple parameters. Each parameter contains a chunk of 4,096 bytes (except the last one, which may be smaller).
Software configuration is an important and delicate aspect of developing, deploying, and evolving software at scale. I hope this blog post gives you some ideas about the different options and how to mix and match them to improve your system. In future blog posts, we will dive deeper into the design and implementation of some of the concepts mentioned here, as well as additional topics such as secret management and generic serialization into strongly typed Golang structs. Until then, here are a couple of great resources for further reading about AWS Parameter Store:
Helix is the leading population genomics and viral surveillance company operating at the intersection of clinical care, research, and data analytics.