I’m currently on a short sabbatical from my role at KPMG UK, which gives me plenty of time to get back into tech blogging. Today I’m in Airlie Beach, Australia (the gateway to the Whitsunday Islands), writing a short, opinionated article about my three golden rules for delivering cloud-based technology.
Now, “delivering technology” in this context could range from an ad hoc script that pulls some metadata from a cloud environment, to a new feature, or even building an application from scratch.
Regardless of the scope of the solution, these three core principles remain the same and form the basis of any design thinking I do.
Simply put, maintainability can be described as the ease with which a tool can be kept doing its job.
Aiming for “No Ops” – I love the term “no-ops”, where a solution lives its life with little or no manual intervention. Reaching this nirvana often requires some forward thinking and investment. Done right, it lets your team focus on new features without having to worry about keeping legacy solutions running. Avoid manual intervention wherever possible and automate like your life depends on it!
Complexity, the nemesis of maintainability – I always try to keep solutions as simple as possible. The legendary adage “Don’t reinvent the wheel” applies here. Use a PaaS/SaaS service when requirements allow, and opt for well-maintained open source libraries rather than starting from scratch.
Low barrier to entry – Junior and newer team members should be able to contribute to the product seamlessly. This can be achieved through solid documentation, contribution guidelines, and a backlog of “good first issues” that are tagged and ready for new engineers to cut their teeth on. A solution that only one person can service is a single point of failure, and it will definitely slow your team down.
Everything as code!! – By far the best documentation is clean code. Make sure all infrastructure is written in Terraform/Bicep, so engineers can easily reference the topology in a language they understand. Other examples I favour: machine images (Packer / Ansible), policy (YAML / JSON), K8s (Helm) and, of course, the source code itself! Ideally, a solution should be immutable, i.e. easily rebuilt from scratch. If that’s not possible, identify the manual steps and work through some backlog tickets!
Almost daily, my LinkedIn and Medium feeds are full of news about companies falling victim to a cloud-based data breach, usually through social engineering or accidental misconfiguration. Whatever the solution you run in the cloud, it pays to keep things secure!
Invest early in guardrails – All major cloud providers have a wealth of built-in policies that can protect your organization from serious cloud misconfigurations. Many of these are ready to use out of the box, so configure them as a baseline as soon as possible. As your business matures, it pays to deploy these cloud policies as code so you can easily keep up with newer standards. Check out an Azure-based example below.
Security is everyone’s responsibility – Yearly privileged-access e-learning is not enough (if your organization even has it!). The threat landscape is constantly changing, and cybercriminals are getting smarter by the day. Security should be built into every engineer’s goals. Encourage cloud security certifications, pair your threat intelligence capability with your engineers, and encourage regular reading of threat reports, such as those from the NCSC identified below. Learn where others have fallen short and fill in the gaps.
Watch out for overly permissive accounts – Least privilege is gospel for anyone working in cybersecurity, but I’ve seen some questionable configurations throughout my career. Only assign the permissions that the tool/solution needs. If a high level of permissions is required, see if you can combine it with mitigating controls like a conditional access policy, which controls when the credentials can be used, for example only from a trusted IP range or device. I’ve included a super cool preview feature from Microsoft below.
Keep track of secrets – If the solution is based on a shared service account, ensure keys are rotated regularly, especially when engineers leave the company, as they can easily take the keys with them. A better alternative is credential-less access using AWS IAM roles/Azure Managed Identities. Finally, you should implement robust secret scanning in your SCM toolsets. An accidentally committed access key in a git repo can wreak havoc in the wrong hands.
Secure enough – It is worth noting that security features can come with additional cost and complexity. It pays to have a quick, standardized way to assess the risk of your technology and apply a reasonable level of control. Don’t over-secure, or you’ll sacrifice maintainability and, in some cases, even reliability.
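To make the secret-scanning advice above a little more concrete, here is a minimal Python sketch of what such a check does under the hood. It is an illustration only, not a replacement for the scanning built into your SCM platform: the rule names, the two patterns and the scan_text helper are my own assumptions, and real scanners ship far larger rule sets. The sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# Illustrative rules only (assumed for this sketch); real scanners cover
# many more credential formats.
PATTERNS = {
    # AWS access key IDs: a 4-letter prefix followed by 16 uppercase
    # letters or digits.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM private key headers, with or without an algorithm name.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return a (line_number, rule_name) tuple for every rule a line matches."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


if __name__ == "__main__":
    sample = 'region = "eu-west-2"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    for line_number, rule_name in scan_text(sample):
        print(f"line {line_number}: possible {rule_name}")
```

Something like this could run as a pre-commit hook or a pipeline step, so a leaked key is caught before it reaches the remote repository.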
Reliability is the likelihood that your solution keeps working, rather than breaking and ruining a user’s day and an on-call engineer’s evening.
Design around entropy – Entropy is the scientific measure of uncertainty, and it is at its highest when you make changes to your system. So make sure you have a solid test suite that tells you when a change will break or degrade the system. Couple this with a deployment pipeline so you can easily roll back failed changes.
Monitor/Respond to Key Symptoms – Unfortunately, most systems have unreliable components, so make sure you are able to monitor key indicators of failure. Ideally, combine this with an automated runbook that resolves the symptom before it becomes an outage.
Opt for reliable components – This may be stating the obvious, but some components and services are simply more reliable than others. Consult the documentation and make sure the stated availability meets your business needs.
Alert on vital signs – At the end of the day, it is better for you to detect a system failure than for the end user to find it. Ensure you can detect critical system failures, for example through health checks or by catching critical job failures. Link this to an alerting mechanism of your choice and ensure engineers know how to respond, so that you maximize uptime.
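The health check, automated runbook and alerting ideas above fit together roughly as in the Python sketch below. Treat it as a sketch under stated assumptions rather than a production monitor: HealthCheck, run_checks and the alert callback are hypothetical names of my own, and in practice the probe would call a real endpoint or job status API while the alert would go to your paging tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HealthCheck:
    """Hypothetical check: a probe plus the runbook step that should fix it."""
    name: str
    probe: Callable[[], bool]      # returns True when the component is healthy
    remediate: Callable[[], None]  # automated runbook step, e.g. restart a worker


def _is_healthy(check: HealthCheck) -> bool:
    # A probe that raises is treated the same as a failed probe.
    try:
        return bool(check.probe())
    except Exception:
        return False


def run_checks(checks: List[HealthCheck], alert: Callable[[str], None]) -> List[str]:
    """Probe each check, try one automated fix, alert only if still unhealthy."""
    unresolved = []
    for check in checks:
        if _is_healthy(check):
            continue
        check.remediate()
        if not _is_healthy(check):
            alert(f"{check.name} is unhealthy after automated remediation")
            unresolved.append(check.name)
    return unresolved


if __name__ == "__main__":
    state = {"worker": False}
    checks = [
        # Fails at first, but the runbook step fixes it.
        HealthCheck("worker", lambda: state["worker"], lambda: state.update(worker=True)),
        # Stays broken, so a human gets alerted.
        HealthCheck("db", lambda: False, lambda: None),
    ]
    run_checks(checks, alert=print)
```

The design choice here is that a human is only paged when the automated fix has already been tried and has failed, which keeps alert noise down while still surfacing real outages.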
Both business criticality and data sensitivity will, of course, determine how much you invest in each of these areas. In my experience, systems become more critical and sensitive over time, so start thinking about these areas from day one and iterate as you go.
That’s a wrap folks! I hope you enjoyed reading this article. As with anything, this is not an exhaustive list, but it has served me well over the years in making cloud design decisions of any scale.