Well-Architected Framework (for Databases)

If you work mainly on-premises and focus on databases, there’s a good chance that you are not familiar with the concept of a Well-Architected Framework (WAF). However, while you might not be familiar with the term, you are probably already doing things related to it. How do I know that, you ask? Because, ultimately, a WAF is a collection of best practices and architectural guidelines aimed at improving the quality of workloads.

[Image: toy building blocks lying next to a phone. Caption: It’s just about putting things together in the right way, so easy, right?]

Personally, I have yet to meet a database professional who wouldn’t consider that to be one of their main objectives. And while WAFs are defined with cloud infrastructure in mind, many of the same principles apply to on-premises database workloads.

In this post, my goal is to give you a high-level view of the topics covered by a Well-Architected Framework, from the database perspective. I will also give you some key questions to consider when applying the principles of a Well-Architected Framework to your own database estate.

What is a Well-Architected Framework?

The Well-Architected Framework is, as mentioned, a set of architectural guidance and industry best practices with the goal of improving the quality of cloud-based workloads. At the heart of the WAF, there are five pillars, each focusing on a specific topic.

The list below comes from the Microsoft Well-Architected Framework. AWS has a rather similar framework of its own, with one notable difference: a sixth pillar, Sustainability. As environmental concerns are quite a hot topic today, I wouldn’t be surprised to see Microsoft add this one to their framework in the near future.

  • Reliability
    • Focusing on the system’s ability to recover from failures.
  • Security
    • Focusing on how the systems are protected against security threats, and how the data is protected.
  • Cost optimization
    • Focusing on cost-effectiveness of the deployed systems.
  • Operational excellence
    • Focusing on the operations of the systems.
  • Performance efficiency
    • Focusing on the system’s ability to adapt to changing workloads.

Looking at the list above, I feel confident in saying that these are all topics that DBAs and other database professionals work with on an almost daily basis, on-premises or in the cloud. And for a good reason too! For many organizations, the production database workloads are some of the most business-critical ones they have.

So, how to get started implementing any of these?

There’s a metric ton of documentation and videos available about each of these topics, so I will not try to cram all of that into a single blog post. What I can and will do, though, is give some examples of how to identify the core requirements from these pillars, and then point you in the right direction for the details on implementation and more thorough guidance.

Sounds good? If so, read on.


Reliability

I like to start with Reliability, as that’s the one thing everyone is always interested in, especially the DBAs, the application owners and the operations teams. Reliability is also the concept we’re typically most familiar with. When I need to understand what is required for Reliability, the key questions for me are:

  • Have you defined reliability targets and metrics for the database services?
  • Is the design of the database services resilient to failures?
  • Do you have the capacity and ability to scale up or down, based on demand?
  • Do you have an up-to-date Disaster Recovery plan?
  • Have you tested the failover and failback of the databases, and do you do it regularly?
  • What measures are in place for ensuring reliable operations?
  • What security measures are in place to protect the data and limit access to databases?
  • Do you have break-glass accounts, and have you tested them?
  • How are database services and databases monitored?
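To make the first question a bit more concrete: an availability target is easier to reason about once it is translated into a downtime budget. The following is a minimal Python sketch of that arithmetic, not something taken from the framework itself. The function names, and the choice of a 30-day period as the default, are my own assumptions for illustration.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float,
                    period: timedelta = timedelta(days=30)) -> timedelta:
    """Maximum downtime allowed within `period` for an availability target.

    The 30-day default period is an assumption; use whatever window your
    own reliability targets are defined over.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


def meets_target(observed_downtime: timedelta,
                 availability_pct: float,
                 period: timedelta = timedelta(days=30)) -> bool:
    """True if the observed downtime fits inside the budget for the target."""
    return observed_downtime <= downtime_budget(availability_pct, period)


if __name__ == "__main__":
    # Print the budget for a few commonly quoted targets.
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% over 30 days allows {downtime_budget(target)} of downtime")
```

Comparing the budget against the downtime you actually observed, for example with `meets_target(timedelta(minutes=30), 99.9)`, is one simple way to turn a reliability target into a metric you can track.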

For me, the nature of the public cloud emphasizes the importance of Reliability. Unlike on-premises, where we are used to having premium, enterprise-grade hardware, the clouds are mostly built on commodity hardware. Assume that everything can break, and be prepared for it. To learn more about the Reliability pillar and its implementation, see the following links.


Security

I did say Reliability is one of the more interesting topics, but Security definitely comes very high on the list as well. Today, it might actually rank even higher, depending on what kind of services use the database. Unlike Reliability, I don’t categorize Security questions. The typical ones I ask are:

  • What has been done to ensure compliance and governance?
  • What measures exist to protect sensitive data?
  • How are identity and authorization managed?
  • How is the client access to databases secured?
  • How are the critical accounts protected?

Traditionally, security has focused a lot on encryption and keeping logins secure. However, with regulations like the GDPR, having compliance and governance in place is equally important. To learn more about the actual implementations, follow these links:

Cost optimization

This is the one topic that interests the people who are paying the bills. While you still need to optimize licenses and hardware on-premises, this is the one pillar that I consider mostly relevant to the public cloud. The cloud also makes cost optimization easy, by providing all the billing details, along with services for setting up, monitoring and alerting on planned budgets. The key questions around cost optimization, for me, are:

  • What cost optimization actions are you taking on the database services?
  • How is the organization modelling the cloud costs?
  • What measures are in place for proper provisioning of the cloud resources?
  • How are the database services costs being monitored?
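The last question, on monitoring costs, usually comes down to comparing the spend so far against a planned budget and alerting at certain thresholds. Cloud providers offer their own budget and alerting services for this, and the sketch below does not call any of them. It is only a small Python illustration of the underlying logic, with hypothetical names and threshold values (50 %, 80 % and 100 % of the budget) that I picked as assumptions.

```python
from typing import NamedTuple, Sequence, Tuple


class BudgetStatus(NamedTuple):
    spent_fraction: float          # spend so far divided by the budget
    triggered: Tuple[float, ...]   # thresholds already reached, ascending


def check_budget(spend_to_date: float,
                 monthly_budget: float,
                 thresholds: Sequence[float] = (0.5, 0.8, 1.0)) -> BudgetStatus:
    """Report which alert thresholds the current spend has reached.

    The default thresholds are illustrative assumptions, not values taken
    from any cloud provider's budget service.
    """
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    fraction = spend_to_date / monthly_budget
    triggered = tuple(t for t in sorted(thresholds) if fraction >= t)
    return BudgetStatus(fraction, triggered)


if __name__ == "__main__":
    status = check_budget(spend_to_date=850.0, monthly_budget=1000.0)
    print(f"Spent {status.spent_fraction:.0%} of budget; "
          f"thresholds reached: {status.triggered}")
```

In practice you would feed this kind of check from the billing data the cloud platform already exposes, rather than tracking the spend by hand.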

Running database services in the cloud without proper budget controls is an easy way to spend huge amounts of extra money for no real benefit. To learn more about budgets, cost alerts, and so on, follow the links below.

Operational excellence

The Operational excellence pillar deals with the hands-on operations for the databases and database services. It’s not, however, only about deploying and managing the infrastructure. The Operational excellence pillar also covers parts of database development, and the continuous improvement of operations. The key questions for me here are:

  • How are the database services deployed and configured?
  • How are database changes being deployed?
  • Are there test and development environments that match production?
  • Are the infrastructure and database deployments tested?
  • Is there monitoring and alerting for deployments?
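One way to approach the question about environments matching production is to compare their settings and list the differences. Here is a minimal Python sketch of such a drift check. The function name and the example settings (`engine_version`, `collation`, `max_connections`) are hypothetical, and how you collect the settings from each environment is left out entirely.

```python
from typing import Any, Dict, Iterable, Tuple


def config_drift(production: Dict[str, Any],
                 other: Dict[str, Any],
                 ignore: Iterable[str] = ()) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (production_value, other_value)} for differing settings.

    A setting missing from one side is reported with None for that side.
    Settings listed in `ignore` are skipped, for example sizing values that
    are intentionally smaller outside production.
    """
    keys = (production.keys() | other.keys()) - set(ignore)
    return {
        key: (production.get(key), other.get(key))
        for key in sorted(keys)
        if production.get(key) != other.get(key)
    }


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    prod = {"engine_version": "15.4", "collation": "en_US.UTF-8", "max_connections": 500}
    test_env = {"engine_version": "15.2", "collation": "en_US.UTF-8", "max_connections": 100}
    for setting, (prod_value, test_value) in config_drift(
            prod, test_env, ignore={"max_connections"}).items():
        print(f"{setting}: production={prod_value!r}, test={test_value!r}")
```

A check like this fits naturally into an automated deployment pipeline, where it can fail a run when test has drifted too far from production.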

The Operational excellence pillar directly measures the maturity of your operations. This means the use of source control, automated deployments, runbooks, and so on. Moreover, this is also the only pillar that applies, with almost no changes, to both on-premises and the cloud. To learn more about Operational excellence, follow these links:

Performance efficiency

The last of the Well-Architected Framework pillars is Performance efficiency. For a database professional, the name might be slightly misleading. The topics related to Performance efficiency tend to focus more on having adequate resources and understanding the workload, rather than on performance tuning. The key questions for me, regarding Performance efficiency, are:

  • How are the services designed to scale, based on the workload?
  • How is scaling considered during the database design?
  • What are the planned actions to handle increased workloads?
  • What are the planned key actions for ensuring the performance of the databases?
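The planned actions for handling increased workloads often start out as a simple rule: scale up when utilization stays high, scale down when it stays low. Below is a deliberately simplified Python sketch of such a rule. The function name and the 80 % and 30 % CPU thresholds are assumptions for illustration, and a real policy would normally look at more than average CPU.

```python
from statistics import mean
from typing import Sequence


def scaling_recommendation(cpu_percent_samples: Sequence[float],
                           scale_up_at: float = 80.0,
                           scale_down_at: float = 30.0) -> str:
    """Suggest a scaling action from recent CPU utilization samples.

    Uses the plain average of the samples, which is a simplification.
    The thresholds are illustrative assumptions, not recommended values.
    """
    if not cpu_percent_samples:
        raise ValueError("at least one sample is required")
    if scale_down_at >= scale_up_at:
        raise ValueError("scale_down_at must be lower than scale_up_at")
    average = mean(cpu_percent_samples)
    if average >= scale_up_at:
        return "scale up"
    if average <= scale_down_at:
        return "scale down"
    return "hold"


if __name__ == "__main__":
    print(scaling_recommendation([85.0, 90.0, 95.0]))
    print(scaling_recommendation([50.0, 60.0, 55.0]))
```

Keeping a gap between the two thresholds is a design choice: it avoids flipping back and forth between scaling up and down when the workload hovers around a single value.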

The Performance efficiency pillar, by default, focuses heavily on the scalability of the underlying infrastructure, and on testing it to handle seasonality and other changes in the workloads. However, personally, I do like to poke around the database design decisions too. Depending on the database system, there are things you can do there as well to make the databases more scalable. To learn more about the details of the Performance efficiency pillar, follow the links below:

Some closing thoughts.

Despite the fancy name, for us database professionals many of these things are part of our daily work. And, while the framework is clearly written for the public clouds, there’s nothing stopping you from implementing the majority of these things on-premises. Moreover, this post is just scratching the surface. If you really want to learn more about the Well-Architected Frameworks, I urge you to look at the linked materials in this post.

Just make sure that you have enough time, and coffee, at hand.
