
Microsoft recently released new SLA, RPO and RTO guarantees for Azure SQL Database and oh boy, those are really something else. So much so that, at the time of writing, no other cloud provider promises the same level of business continuity for its Platform-as-a-Service database services.
Besides offering the highest SLA currently on the market, Microsoft has gone one step further and now also guarantees RPO and RTO for the service. This is a very interesting approach, because neither Google nor AWS gives RPO or RTO guarantees for their own services.
Service Level Agreements (SLA)
Let’s start with a brief introduction to Service Level Agreements. As the name suggests, they define the responsibilities between a service provider and the users of that service. In IT, this usually means overall availability, which is commonly described in “nines”. The more nines you have in the SLA, the higher the availability you’re getting (and paying for). For Azure SQL Database the highest availability guarantee is 99.995%, which translates to roughly 26 minutes of unplanned downtime a year.
The keyword here is unplanned. Planned downtime rarely, if ever, counts against the SLA.
99.995% is still quite a promise to make, even though you have to meet some requirements to get there. The table below lists the deployment combinations and their promised SLAs.
SLA | Tier | Requirements |
---|---|---|
99.995% | Business Critical + Premium | Zone Redundant Deployment |
99.99% | Business Critical + Premium | N/A |
99.99% | General Purpose, Standard, Basic, Hyperscale | 2 or more replicas |
99.9% | Hyperscale | 0 replicas |
Besides the Business Critical tier, the other tiers get quite a nice number of nines as well, as long as the specific requirements are met.
If you’re looking at those numbers and wondering how much 99.995% vs 99.99% really matters, the answer is quite a lot when you’re talking about business critical applications. 99.99% availability means that your application can have about 52 minutes of unplanned downtime in a year, which is twice the 26 minutes allowed at 99.995%.
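The downtime figures above come from simple arithmetic: take the share of time the SLA does not cover and multiply it by the length of the period. Here is a minimal Python sketch of that calculation. The function name `allowed_downtime_minutes` is my own, and it assumes a 365-day year.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 365) -> float:
    """Return the unplanned downtime (in minutes) that an SLA leaves room for."""
    minutes_in_period = period_days * 24 * 60
    return (1 - sla_percent / 100) * minutes_in_period


for sla in (99.995, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes per year")
```

Passing `period_days=30` gives the monthly budget instead, which is the period service credits are calculated over.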
Microsoft commitment
Numbers are nice and all, but how is Microsoft prepared to back its claim? The answer is “money”. If the service fails to uphold the agreed SLA, you are entitled to a service credit. The amount depends on how much downtime there was during the month. For the highest tier, see the numbers below.
Monthly Uptime | Downtime per Month (30 days) | Service Credit Returned |
---|---|---|
< 99.995% | 2min 10sec | 10% |
< 99% | 7h 12min | 25% |
< 95% | 1d 12h | 100% |
Source: SLA Uptime Calculator
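To see how the credit tiers in the table above play out, here is a rough Python sketch that turns a month’s downtime into an uptime percentage and looks up the matching credit. The thresholds are copied from the table; the function names and the 30-day month are my own assumptions, not anything taken from Microsoft’s SLA text.

```python
# Thresholds and credits copied from the table above (highest tier only).
CREDIT_TIERS = [
    (95.0, 100),    # uptime below 95%     -> 100% credit
    (99.0, 25),     # uptime below 99%     -> 25% credit
    (99.995, 10),   # uptime below 99.995% -> 10% credit
]

MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def monthly_uptime_percent(downtime_minutes: float) -> float:
    """Convert minutes of downtime in a month into an uptime percentage."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_MONTH)


def service_credit_percent(uptime_percent: float) -> int:
    """Return the service credit (in percent) for a given monthly uptime."""
    # The lowest threshold is checked first so the largest credit wins.
    for threshold, credit in CREDIT_TIERS:
        if uptime_percent < threshold:
            return credit
    return 0  # SLA met, no credit


for downtime in (1, 5, 8 * 60, 2 * 24 * 60):
    uptime = monthly_uptime_percent(downtime)
    print(f"{downtime} min down -> {uptime:.3f}% uptime, "
          f"{service_credit_percent(uptime)}% credit")
```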
SLA Comparison to Cloud SQL and RDS
Both Google and Amazon have also defined SLAs for their SQL Server PaaS options. Amazon promises that its RDS Multi-AZ instances are available 99.95% of the time. Google Cloud SQL has the same service level objective, 99.95%. While the difference to 99.995% doesn’t look that big, on a yearly level 99.95% allows roughly 4 hours and 20 minutes of downtime, compared to about 26 minutes!
RPO and RTO
RPO stands for Recovery Point Objective, and it’s the maximum expected data loss, measured in time. RTO stands for Recovery Time Objective, and it’s the amount of time within which the service becomes available again after a failure. Since Microsoft is currently the only hyperscaler guaranteeing RPO and RTO for its PaaS database services, there is nothing to compare these against. Instead, let’s take a look at what they’re promising if you have a Business Critical tier database with geo-replication.
- RPO of 5 seconds
- RTO of 30 seconds
So on top of high availability, Microsoft is promising that in a failure you’ll lose at most 5 seconds of data and your service will be available again within 30 seconds. That is not bad, not bad at all.
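To make the two numbers concrete, here is a small Python sketch that checks one failover against those targets. The timestamps are made up for illustration and `check_failover` is a hypothetical helper of mine; how you would actually obtain these times from Azure is outside the scope of this sketch.

```python
from datetime import datetime, timedelta

# Targets quoted above for Business Critical with geo-replication.
RPO = timedelta(seconds=5)
RTO = timedelta(seconds=30)


def check_failover(last_replicated_commit, failure_time, service_restored):
    """Compare one (hypothetical) failover against the RPO and RTO targets."""
    data_loss_window = failure_time - last_replicated_commit
    recovery_time = service_restored - failure_time
    return {
        "data_loss_window": data_loss_window,
        "recovery_time": recovery_time,
        "rpo_met": data_loss_window <= RPO,
        "rto_met": recovery_time <= RTO,
    }


failure = datetime(2024, 1, 1, 12, 0, 0)  # made-up incident timestamps
result = check_failover(
    last_replicated_commit=failure - timedelta(seconds=3),
    failure_time=failure,
    service_restored=failure + timedelta(seconds=42),
)
print(result["rpo_met"], result["rto_met"])  # prints: True False
```

In this made-up incident only 3 seconds of changes were lost, so the RPO holds, but the service took 42 seconds to come back, so the RTO was missed.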
My thoughts about it
I’ve been thinking about Microsoft’s promises for SLA, RPO and RTO. They are the first to give guarantees for the latter two, and those guarantees are based on what they know about their own services. Microsoft collects a huge amount of telemetry data from Azure and the services running there, so when they promise a 5-second RPO, it’s something they have learned from all the data they’ve collected.
While it’s really nice to have these guarantees, it’s equally nice to know that they’re based not on the optimistic nature of salespeople, but on facts and numbers.