I have been looking, for various reasons, into the purpose-built database space recently. Purpose-built databases, as you can imagine, are databases that specialize in providing just a single (well, in some cases it’s two) type of data store. They are also a great fit when you’re building modern, cloud-native applications, which has led to the birth of some interesting, fully managed purpose-built database offerings. AWS especially has done good work in this area, so I figured I’d explore the available options there.
Since there’s actually a bunch of these databases available from AWS, I’ve decided to split the post into 3 parts. In the first part, we’ll look into the Amazon offerings for Redis. Redis is an open-source, in-memory database that is very popular with developers. It is also one for which AWS provides two alternatives: Amazon ElastiCache and Amazon MemoryDB.
Why two? Read on, and I’ll tell you…
What are purpose-built databases?
So, what exactly is a purpose-built database? The best clue is in the name: it’s a database that is designed to serve a specific purpose. While we all love (or at least I do) relational databases, there are definitely cases where they come with more features, and more constraints, than your application needs. The most common types of purpose-built databases include:
- Document (sometimes also called NoSQL)
- Time series
- Wide column
- Ledger (you may have heard of the blockchain, mentioned somewhere)
And yes, some of these database engines are purpose-built for more than one purpose.
AWS ElastiCache for Redis
AWS has two options for your in-memory database workloads, with slightly different approaches to how they’re implemented. The first one I’ll cover is ElastiCache for Redis, which is a fully managed database offering. As the name, not so subtly, suggests, it’s mainly aimed at being a managed alternative to running Redis yourself. This makes sense, as Redis is currently one of the most popular in-memory databases, especially among developers.
AWS ElastiCache for Redis offers very high compatibility with Redis, and it provides both cluster and non-cluster deployment models. As is typical of managed databases, it also provides a high level of security, reliability, and performance without the need to actively manage it, thanks to auto-scaling features. It also scales up to 1 petabyte in data size, and to hundreds of nodes and shards.
From a performance perspective (and this is where you can typically see purpose-built databases stand out), ElastiCache for Redis can provide microsecond latency for most read/write operations. It can also (automatically) scale from thousands to millions of requests per second. From the security perspective (where I often don’t see purpose-built databases stand out), ElastiCache for Redis is backed by AWS engineering genius: RBAC authentication, at-rest and in-transit encryption, VPC endpoints, etc.
Typical use cases for ElastiCache for Redis include:
- Real-time analytics
- Gaming services dashboards
And quite a few others. One thing that purpose-built databases come with is trade-offs. The scaling and performance come at the cost of durability and consistency. Yes, I also felt a slight disturbance in the Force, having said that. While ElastiCache for Redis does a good job of replicating data across multiple regions and replicas, it does this asynchronously, which can lead to data lag. Data lag is a situation where some replicas are not up-to-date with the primary, for various reasons, such as network issues.
If you are very unlucky, data lag can also lead to a situation where you lose some data. This can happen, for instance, when your primary is way ahead of the replicas and crashes. When that happens, the service recovers by automatically promoting the most up-to-date replica to the role of primary. Now, what you need to keep in mind is that most up-to-date isn’t the same as up-to-date: any writes the promoted replica had not yet received are gone.
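To make the failure mode a bit more concrete, here is a minimal sketch of it in plain Python. It is a toy model, not how ElastiCache is implemented: the `AsyncReplicatedCache` class, the replica names, and the `upto` lag parameter are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy in-memory node; `applied` counts how many writes it has received."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0


class AsyncReplicatedCache:
    """Hypothetical model of a primary with asynchronously updated replicas."""

    def __init__(self, replica_names):
        self.primary = Node("primary")
        self.replicas = [Node(name) for name in replica_names]
        self.log = []  # writes in the order the primary accepted them

    def write(self, key, value):
        # The primary acknowledges right away; replicas are NOT updated here.
        self.primary.data[key] = value
        self.log.append((key, value))
        self.primary.applied = len(self.log)

    def replicate(self, replica, upto):
        # Ship writes to one replica, but only up to position `upto` (simulated lag).
        upto = min(upto, len(self.log))
        for key, value in self.log[replica.applied:upto]:
            replica.data[key] = value
        replica.applied = max(replica.applied, upto)

    def fail_over(self):
        # The primary crashes: promote the replica that has applied the most writes
        # and return the acknowledged writes that it never received.
        best = max(self.replicas, key=lambda r: r.applied)
        lost_writes = self.log[best.applied:]
        self.replicas.remove(best)
        self.primary = best
        self.log = self.log[:best.applied]
        return lost_writes


cache = AsyncReplicatedCache(["replica-a", "replica-b"])
for i in range(5):
    cache.write(f"score:{i}", i)
cache.replicate(cache.replicas[0], upto=3)  # replica-a has 3 of 5 writes
cache.replicate(cache.replicas[1], upto=2)  # replica-b has 2 of 5 writes
lost = cache.fail_over()
print(cache.primary.name, "promoted; lost writes:", lost)
```

In this run the most up-to-date replica wins the promotion, yet the last two acknowledged writes never reached it, which is exactly the gap between "most up-to-date" and "up-to-date".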
So if you’re considering ElastiCache for Redis in AWS, keep in mind that you will also likely want to have:
- Data that is ephemeral in nature
- Separate primary database for the data that needs durability
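One common way to combine the two points above is the cache-aside pattern: the cache only holds short-lived copies, and the primary database stays the source of truth. The sketch below is an assumption-laden illustration in plain Python. `TTLCache` stands in for the in-memory cache and a plain dict stands in for the primary database; neither is a real client library.

```python
import time


class TTLCache:
    """Stand-in for an in-memory cache; entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)


def read_through(key, cache, primary_db):
    """Cache-aside read: try the cache first, fall back to the durable store."""
    value = cache.get(key)
    if value is None:
        value = primary_db[key]  # the authoritative copy lives here
        cache.set(key, value)    # repopulate the cache for later reads
    return value


primary_db = {"user:1": "Ada"}   # hypothetical durable store
cache = TTLCache(ttl=60)
print(read_through("user:1", cache, primary_db))  # miss, loaded from primary_db
print(read_through("user:1", cache, primary_db))  # hit, served from the cache
```

If the cache loses its contents, for example after a failover, the next read simply falls through to the primary database, so nothing that matters is lost.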
AWS MemoryDB for Redis
MemoryDB is sometimes called the fastest database in AWS, and it, too, provides a fully managed, Redis-compatible database. Some of you might wonder why on earth AWS provides two alternatives for Redis. This was a question I asked myself, too, before I took a closer look at the differences between the two. The big difference is: MemoryDB has a transaction log. And you know what a transaction log stands for? Yes, indeed, it stands for durability and consistency! The two things that ElastiCache doesn’t give you.
It comes with all the same benefits as ElastiCache, and the same typical use cases, but with a small hit to write performance (write latency is measured in single-digit milliseconds). But rather than being just something you’d use alongside a primary database, MemoryDB can also take that role by providing data durability alongside consistency. So, how does MemoryDB provide durability and consistency with asynchronous replicas? The answer is: by having a transaction log that spans multiple Availability Zones. Below, a somewhat simplified architecture.
The similarity to ElastiCache is that data read from the secondary replicas can still be stale, as it’s replicated to them asynchronously. The considerable difference is that a transaction is acknowledged as completed only after it’s written to the transaction log. Unlike with ElastiCache, there’s no risk of losing acknowledged writes, even if the replicas are not up-to-date when the primary crashes. If the primary crashes, the service looks at the replicas and determines which one is closest to the previous primary in terms of transactions. It then hydrates that replica from the transaction log and promotes it to primary once it reaches up-to-date status.
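The recovery flow described above can be sketched in a few lines of plain Python. This is a toy model under my own assumptions, not the MemoryDB implementation: `DurableLogStore`, the replica names, and the `upto` lag parameter are invented for illustration, and a simple list stands in for the multi-AZ transaction log.

```python
class DurableLogStore:
    """Hypothetical model: a write is acknowledged only once it is in the log."""

    def __init__(self, replica_names):
        self.txlog = []    # stands in for the multi-AZ transaction log
        self.primary = {}
        self.replicas = {name: {"data": {}, "applied": 0} for name in replica_names}

    def write(self, key, value):
        self.txlog.append((key, value))  # 1. persist to the transaction log first
        self.primary[key] = value        # 2. apply on the primary
        return True                      # 3. only now acknowledge to the client

    def replicate(self, name, upto):
        # Replicas still receive data asynchronously, so they can lag behind.
        replica = self.replicas[name]
        upto = min(upto, len(self.txlog))
        for key, value in self.txlog[replica["applied"]:upto]:
            replica["data"][key] = value
        replica["applied"] = max(replica["applied"], upto)

    def fail_over(self):
        # Pick the replica closest to the old primary...
        name = max(self.replicas, key=lambda n: self.replicas[n]["applied"])
        replica = self.replicas.pop(name)
        # ...hydrate it with whatever it is missing from the transaction log...
        for key, value in self.txlog[replica["applied"]:]:
            replica["data"][key] = value
        # ...and only then promote it to primary.
        self.primary = replica["data"]
        return name


store = DurableLogStore(["replica-a", "replica-b"])
for i in range(5):
    store.write(f"order:{i}", i)
store.replicate("replica-a", upto=3)  # lagging by two writes
store.replicate("replica-b", upto=1)  # lagging by four writes
promoted = store.fail_over()
print(promoted, "promoted with", len(store.primary), "of 5 writes")
```

The replicas lag here just as they would with purely asynchronous replication, but because every acknowledged write is already in the log, the promoted replica ends up with all five writes.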
Migrating to Amazon ElastiCache or MemoryDB
So, what should you do if you’re already running Redis on EC2 or on-premises and would like to move to a fully managed service? There are a couple of ways to do it; the two most typical are:
- Offline: BGSAVE or SAVE your data, upload the resulting backup to Amazon S3, and use that to build a new Redis cluster in AWS
- Online: Logical replication of data with AWS Database Migration Service
The first option, typically called offline migration, requires downtime, and how much depends on the amount of data you’re about to transfer. It is also the simpler method of the two, as you can use the Redis backup to seed a new database cluster. The second option, sometimes referred to as online migration, takes a little more work. You need to set up the Database Migration Service and replication between the source and the target, but it also allows for a near-zero-downtime cutover to AWS.
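For the offline route, the seeding step could look roughly like the sketch below. It assumes you have already uploaded the RDB file to S3 and that ElastiCache has been granted read access to it. The bucket, key, group ID, and node type are placeholders of mine, and `build_seed_request` and `seed_cluster` are hypothetical helper names. The only AWS call involved is `create_replication_group` on a boto3 ElastiCache client, which you would create yourself with `boto3.client("elasticache")` and pass in.

```python
def build_seed_request(group_id, bucket, rdb_key, node_type="cache.r6g.large"):
    """Build the arguments for seeding a new replication group from an RDB file in S3."""
    snapshot_arn = f"arn:aws:s3:::{bucket}/{rdb_key}"  # S3 object ARN of the backup
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": f"Seeded from {rdb_key}",
        "Engine": "redis",
        "CacheNodeType": node_type,    # placeholder size, pick one that fits your data
        "NumCacheClusters": 2,         # one primary plus one replica
        "SnapshotArns": [snapshot_arn],
    }


def seed_cluster(elasticache_client, **kwargs):
    """Send the request; `elasticache_client` is assumed to be a boto3 ElastiCache client."""
    request = build_seed_request(**kwargs)
    return elasticache_client.create_replication_group(**request)


# Inspect the request without touching AWS (all names below are made up):
request = build_seed_request(
    group_id="migrated-redis",
    bucket="my-redis-backups",
    rdb_key="dump.rdb",
)
print(request["SnapshotArns"])
```

Keeping the request building separate from the API call makes it easy to review what will be sent before anything is created in your account.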
Wrapping it up
If you found this post about AWS Purpose-Built Databases useful, I would also recommend reading the two other posts in this series.