AWS Database Services
There are two major database offerings within AWS. The first is RDS, which is designed to house relational databases. The second is DynamoDB, which is for non-relational databases.
RDS (Relational Database Service)
As described above, RDS is used for relational databases. It supports several database engines, all of which offer free tier usage except Aurora:
- Aurora (a MySQL-compatible engine that AWS says delivers up to five times the throughput of standard MySQL)
- MySQL
- MariaDB
- PostgreSQL
- Oracle
- Microsoft SQL Server
RDS is a cost-efficient and scalable way to launch industry-standard relational databases. For this service, you'll be charged for:
- The specific DB engine you choose (e.g. MySQL)
- The instance class (like instance type in EC2)
- The purchasing terms you’ve selected (on-demand or reserved)
- The storage you’re using
- Data transfer in and out of RDS
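The charges listed above combine into a simple monthly estimate. The sketch below adds them up. Every rate in it is a hypothetical placeholder, not a real AWS price, so look up the actual figures for your engine, instance class, region and purchasing term before relying on a number.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for one month of on-demand usage


@dataclass
class RdsUsage:
    # All rates are hypothetical placeholders, not real AWS prices.
    hourly_instance_rate: float   # depends on engine, instance class, on-demand vs reserved
    storage_gb: float
    storage_rate_per_gb: float    # per GB-month
    transfer_gb: float            # billable data transfer
    transfer_rate_per_gb: float


def estimate_monthly_cost(usage: RdsUsage) -> float:
    """Sum the instance, storage and data-transfer components for one month."""
    instance = usage.hourly_instance_rate * HOURS_PER_MONTH
    storage = usage.storage_gb * usage.storage_rate_per_gb
    transfer = usage.transfer_gb * usage.transfer_rate_per_gb
    return round(instance + storage + transfer, 2)


if __name__ == "__main__":
    example = RdsUsage(
        hourly_instance_rate=0.02,
        storage_gb=20,
        storage_rate_per_gb=0.10,
        transfer_gb=10,
        transfer_rate_per_gb=0.09,
    )
    print(estimate_monthly_cost(example))
```

Swapping the on-demand hourly rate for a reserved-instance rate shows how the purchasing term changes the total.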
RDS instances do not provide a GUI for working with the database inside the AWS console; you connect to the database with your own client.
RDS has the added benefit of being a fully managed service, which means you get:
- Automatic minor version upgrades
- Automatic backups (point-in-time snapshots)
- Multi-AZ deployments with a single click
- Automatic recovery in the event of a failure
Note: by default, automated point-in-time backups are deleted when the database instance is deleted. Manual snapshots are kept until you delete them yourself.
Multiple Availability Zone (AZ) Deployments
RDS enables you to launch Multi-AZ deployments in a single click. Data is then synchronously replicated to a standby instance in a different availability zone (but in the same region). If there is a service outage, a primary node failure or a software update, AWS automatically changes the CNAME record behind the database endpoint to point to the standby instance, so applications keep using the same endpoint.
Further to this, the backups in a multi-AZ deployment are taken against the standby instance to reduce load on the primary instance.
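The key point of the CNAME-based failover described above is that the application's endpoint never changes; only what it resolves to does. The toy model below illustrates this. The class and host names are hypothetical, and the role swap is a simplification: in practice RDS also provisions a replacement standby after a failover.

```python
class MultiAzEndpoint:
    """Toy model of a Multi-AZ endpoint: one DNS name that is repointed on failover.

    Names are hypothetical; this only illustrates the behaviour, not the RDS API.
    """

    def __init__(self, endpoint: str, primary: str, standby: str):
        self.endpoint = endpoint      # the name your application connects to
        self.primary = primary
        self.standby = standby
        self.cname_target = primary   # DNS initially resolves to the primary

    def resolve(self) -> str:
        """Return the instance a client connecting to the endpoint actually reaches."""
        return self.cname_target

    def fail_over(self) -> None:
        """Promote the standby and repoint the CNAME at it."""
        self.primary, self.standby = self.standby, self.primary
        self.cname_target = self.primary


if __name__ == "__main__":
    db = MultiAzEndpoint("mydb.example.internal", "instance-in-az-a", "instance-in-az-b")
    print(db.endpoint, "->", db.resolve())
    db.fail_over()
    print(db.endpoint, "->", db.resolve())
```

Connection strings built from the endpoint name keep working after a failover. This is why applications should never hard-code an instance's IP address.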
For Multi-AZ to work, your RDS instance must be launched into a DB subnet group that contains subnets in at least two availability zones.
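Before creating a subnet group, you can sanity-check that its subnets cover at least two availability zones. The helper below is a hypothetical sketch that takes a plain mapping of subnet ID to AZ. It does not call AWS, and the subnet IDs and AZ names in the usage are placeholders.

```python
from typing import Dict, List


def validate_subnet_group(subnet_azs: Dict[str, str]) -> List[str]:
    """Return the sorted AZs covered by a subnet-id -> AZ mapping.

    Raises ValueError if fewer than two AZs are covered, since the Multi-AZ
    standby must live in a different AZ from the primary.
    """
    azs = set(subnet_azs.values())
    if len(azs) < 2:
        raise ValueError(f"subnet group spans {len(azs)} AZ(s); at least 2 are required")
    return sorted(azs)


if __name__ == "__main__":
    print(validate_subnet_group({"subnet-aaa": "eu-west-2a", "subnet-bbb": "eu-west-2b"}))
```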
Read Replicas
Read replicas are copies of the primary database used for read-only purposes. When data is written to the primary database, AWS asynchronously replicates it to the read replica.
Read traffic is not redirected automatically, however. Each read replica has its own endpoint, and your application must send its read-only queries to that endpoint. Doing so reduces load on the primary database and improves its performance.
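Each replica exposes its own endpoint, so the application (or a proxy in front of it) decides where each query goes. The sketch below shows one way to do this with a hypothetical router. Writes go to the primary, and SELECT statements rotate across replica endpoints. The endpoint names are placeholders, and the SELECT check is deliberately crude.

```python
import itertools
from typing import Iterable


class ReadWriteRouter:
    """Send writes to the primary endpoint and spread reads across replica endpoints."""

    def __init__(self, primary: str, replicas: Iterable[str]):
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin over replicas; None when there are no replicas to use.
        self._next_replica = itertools.cycle(self.replicas) if self.replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Crude classification: anything starting with SELECT counts as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._next_replica is not None:
            return next(self._next_replica)
        return self.primary  # writes, and reads when no replica exists


if __name__ == "__main__":
    router = ReadWriteRouter("primary.example.internal",
                             ["replica-1.example.internal", "replica-2.example.internal"])
    for q in ["SELECT * FROM orders", "INSERT INTO orders VALUES (1)", "SELECT 1"]:
        print(router.endpoint_for(q), "<-", q)
```

Replication to replicas is asynchronous, so a read sent to a replica immediately after a write may not see that write yet. Queries that must read their own writes should go to the primary.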
Read replicas are particularly useful if your database serves a lot of reports. You can reduce load on the primary database by running all MI (management information) reports against read replicas, and you can scale the number of replicas up or down. For example, if all reports are pulled on a Monday morning, you might run four read replicas then and get by with a single replica later in the week.
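The Monday-versus-rest-of-week example above amounts to a small scheduling rule. The sketch below encodes it. The function names and default numbers are illustrative assumptions taken from the example, and the code only decides how many replicas to add or remove. Acting on that decision would be a separate step, for instance with boto3's `create_db_instance_read_replica` and `delete_db_instance` calls.

```python
import datetime
from typing import Tuple


def desired_replica_count(day: datetime.date, peak_weekday: int = 0,
                          peak_replicas: int = 4, baseline_replicas: int = 1) -> int:
    """Replicas to run on a given day: more on the reporting peak day (Monday = 0)."""
    return peak_replicas if day.weekday() == peak_weekday else baseline_replicas


def scaling_action(current: int, desired: int) -> Tuple[str, int]:
    """Describe the change needed to go from current to desired replicas."""
    if desired > current:
        return ("add", desired - current)
    if desired < current:
        return ("remove", current - desired)
    return ("none", 0)


if __name__ == "__main__":
    today = datetime.date.today()
    print(today, scaling_action(current=1, desired=desired_replica_count(today)))
```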
You can promote a read replica to a standalone instance. This is useful because you can run heavy jobs on the read replica, such as rebuilding indexes (which is CPU-intensive) or importing and exporting data into or out of RDS, and then promote the replica to become the primary instance. This means the primary instance never suffers degraded performance.
Read replicas are best suited to high-volume, non-cached read traffic.
Creating an RDS instance
When you create an RDS instance, you'll need to launch it into a DB subnet group, which you can configure from the RDS console. If you're launching your instance into a public subnet, group the public subnets you have into a single subnet group. You do this because Multi-AZ deployments are only possible when instances are launched into a subnet group.
You can then launch the instance into that subnet group. Remember, if it's in a private subnet, make sure 'Publicly accessible' is set to 'No' during instance creation.
You'll be asked which availability zone to launch into. Because you've already selected a subnet group and each subnet resides in a single availability zone, you can select 'No preference'. The instance will then launch into one of the availability zones defined in your subnet group.
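The console choices above map onto parameters of boto3's `rds.create_db_instance` call. The sketch below only assembles the keyword arguments and does not contact AWS. The identifier, instance class, storage size, username and security group ID are placeholder assumptions. Leaving out `AvailabilityZone` corresponds to choosing 'No preference'.

```python
from typing import Any, Dict


def build_create_db_instance_params(identifier: str, subnet_group: str,
                                    security_group_id: str, password: str,
                                    private: bool = True) -> Dict[str, Any]:
    """Keyword arguments for boto3's rds.create_db_instance (values are placeholders)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.micro",         # placeholder instance class
        "AllocatedStorage": 20,                   # GiB, placeholder size
        "MasterUsername": "admin",
        "MasterUserPassword": password,           # keep real passwords out of source code
        "DBSubnetGroupName": subnet_group,        # the subnet group created beforehand
        "VpcSecurityGroupIds": [security_group_id],
        "PubliclyAccessible": not private,        # 'No' for an instance in a private subnet
        "MultiAZ": True,
        # No "AvailabilityZone" key: equivalent to choosing 'No preference'.
    }


if __name__ == "__main__":
    params = build_create_db_instance_params(
        "reports-db", "my-subnet-group", "sg-0123456789abcdef0", "example-password")
    print(params)
    # With credentials configured, you would pass these to boto3:
    #   import boto3
    #   boto3.client("rds").create_db_instance(**params)
```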
For a MySQL instance, the RDS security group must have an inbound rule that opens port 3306 (the default MySQL port). If you don't have a security group with this configuration, select 'Create new security group' and AWS will add one for you.
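If you add the port 3306 rule programmatically rather than through the console, it takes the shape of an `IpPermissions` entry for boto3's `ec2.authorize_security_group_ingress`. The helper below is a hypothetical sketch that builds that entry and validates the CIDR with the standard library. The CIDR and group ID in the usage are placeholders.

```python
import ipaddress
from typing import Any, Dict

MYSQL_PORT = 3306  # default MySQL port


def mysql_ingress_permission(source_cidr: str, description: str = "MySQL access") -> Dict[str, Any]:
    """Build one IpPermissions entry allowing TCP 3306 from source_cidr."""
    ipaddress.ip_network(source_cidr)  # raises ValueError for a malformed CIDR
    return {
        "IpProtocol": "tcp",
        "FromPort": MYSQL_PORT,
        "ToPort": MYSQL_PORT,
        "IpRanges": [{"CidrIp": source_cidr, "Description": description}],
    }


if __name__ == "__main__":
    perm = mysql_ingress_permission("10.0.1.0/24")
    print(perm)
    # With credentials configured:
    #   import boto3
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="sg-0123456789abcdef0", IpPermissions=[perm])
```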
To connect to an RDS instance in a private subnet, you'll need an SSH tunnel through an instance you can reach over SSH, such as a bastion host in a public subnet. A client such as MySQL Workbench can set this up for you: choose 'Standard TCP/IP over SSH' as the connection method.
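The same tunnel that MySQL Workbench opens can be created with the OpenSSH client, using local port forwarding (`-L`). The sketch below builds that command. The bastion host, RDS endpoint, key file and `ec2-user` login are placeholder assumptions. Once the tunnel is running, a MySQL client connects to `127.0.0.1` on the local port.

```python
import shlex
from typing import List


def ssh_tunnel_command(bastion_host: str, rds_endpoint: str, key_file: str,
                       user: str = "ec2-user", local_port: int = 3306,
                       remote_port: int = 3306) -> List[str]:
    """Build an OpenSSH command that forwards a local port to RDS via a bastion.

    -N  : don't run a remote command, just forward the port
    -i  : private key used to log in to the bastion
    -L  : local_port:rds_endpoint:remote_port local port forwarding
    """
    return [
        "ssh", "-N",
        "-i", key_file,
        "-L", f"{local_port}:{rds_endpoint}:{remote_port}",
        f"{user}@{bastion_host}",
    ]


if __name__ == "__main__":
    cmd = ssh_tunnel_command("bastion.example.com",
                             "mydb.example.eu-west-2.rds.amazonaws.com",
                             "my-key.pem", local_port=3307)
    print(shlex.join(cmd))
    # To actually open the tunnel: subprocess.Popen(cmd), then connect to 127.0.0.1:3307
```

A local port other than 3306 avoids clashing with a MySQL server already running on your own machine.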
If you receive a failed connection message, you’ll need to check the following to ensure all the settings are correct & enable connectivity: Security Groups; Network Access Control Lists; Route Tables and Internet Gateways.
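When working through that checklist, it helps to know how the connection fails. Security groups, NACLs and missing routes typically drop packets silently, which shows up as a timeout. An immediate "connection refused" means the packets arrived but nothing was listening on that port. The probe below uses only the standard library, and its name is hypothetical.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome.

    "open"    - the connection succeeded
    "refused" - host reached, but nothing accepted the connection on that port
    "timeout" - no answer; often a security group, NACL or routing problem
    "error"   - anything else (DNS failure, network unreachable, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


if __name__ == "__main__":
    # Placeholder endpoint; substitute your instance's endpoint and port.
    print(probe("mydb.example.internal", 3306, timeout=2.0))
```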
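Note: when AWS creates a security group for you, it may restrict the inbound source to a specific IP address (typically the one you were using at the time), which will cause connectivity issues from anywhere else. You can change the source to 0.0.0.0/0, although that opens the port to every IPv4 address.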
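A security group rule matches a client only when the client's address falls inside the rule's source CIDR. The check below uses Python's `ipaddress` module to show why a single-address (/32) rule blocks every other client, while 0.0.0.0/0 matches any IPv4 address. The addresses are from the reserved documentation ranges and are placeholders.

```python
import ipaddress


def rule_allows(source_cidr: str, client_ip: str) -> bool:
    """True if client_ip falls inside the rule's source CIDR."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(source_cidr)


if __name__ == "__main__":
    for cidr in ["203.0.113.10/32", "0.0.0.0/0"]:
        for ip in ["203.0.113.10", "198.51.100.7"]:
            print(f"{cidr:>16} allows {ip}: {rule_allows(cidr, ip)}")
```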
DynamoDB
Unlike RDS, DynamoDB does not offer existing database engines for you to adopt. Instead, you adopt the DynamoDB model, which is intended as a replacement for MongoDB, Cassandra and Oracle NoSQL. DynamoDB does offer free tier usage.
DynamoDB is a fast and flexible service that provides consistent performance and single-digit millisecond latency at any scale.
If using DynamoDB, you will be charged for:
- Provisioned throughput capacity
- Indexed data storage
- DynamoDB streams
- Reserved capacity
- Data transfer in / out of DynamoDB
The service is fully managed, which means that AWS handles the provisioning and scaling of hardware on your behalf. DynamoDB is fully distributed & scales automatically with demand and growth. All you need to do is specify the required throughput capacity.
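In provisioned capacity mode, the throughput you specify is expressed in capacity units. One read capacity unit (RCU) covers one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB. One write capacity unit (WCU) covers one write per second of an item up to 1 KB, and item sizes are rounded up. The sketch below applies those rules. It ignores transactional reads and writes, which cost double.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: one unit per strongly consistent 4 KB read per second;
    eventually consistent reads need half as many."""
    units_per_read = math.ceil(item_size_kb / 4)   # round item size up to 4 KB blocks
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """WCUs needed: one unit per 1 KB write per second (size rounded up)."""
    return math.ceil(item_size_kb) * writes_per_second


if __name__ == "__main__":
    # A 6 KB item read 10 times per second, and a 1.5 KB item written 10 times per second.
    print(read_capacity_units(6, 10), read_capacity_units(6, 10, strongly_consistent=False))
    print(write_capacity_units(1.5, 10))
```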
DynamoDB is fault tolerant because data is replicated across multiple availability zones within a region.
ElastiCache
ElastiCache is a fully managed, in-memory cache engine that improves database performance by caching the results of queries. This means fewer repeated requests to the database, which reduces its load.
ElastiCache is powered by either Memcached or Redis. MySQL has a Memcached plugin, which makes it easier to take advantage of ElastiCache's features.
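The usual way an application uses a cache like this is the cache-aside pattern. The application checks the cache first, and on a miss it runs the query and stores the result with a time-to-live (TTL). In the sketch below, a plain dictionary stands in for the Memcached or Redis cluster so that the example runs anywhere. The class name and the `db_calls` counter are illustrative only.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class QueryCache:
    """Cache-aside with a TTL; a dict stands in for an ElastiCache cluster."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.db_calls = 0  # counts how often the database was actually queried

    def get(self, key: Hashable, load_from_db: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                          # hit: the database is not touched
        self.db_calls += 1
        value = load_from_db()                       # miss: run the query once...
        self._store[key] = (now + self.ttl, value)   # ...and cache the result
        return value


if __name__ == "__main__":
    cache = QueryCache(ttl_seconds=60)
    run_query = lambda: [("alice", 3), ("bob", 2)]   # placeholder for a real SQL query
    for _ in range(3):
        cache.get("top_users", run_query)
    print("database queried", cache.db_calls, "time(s)")
```

The TTL bounds how stale a cached result can become. Shorter TTLs trade some database load for fresher data.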
Redshift
Redshift is a fully managed, scalable, petabyte-scale data warehousing service. It's used for big data analytics and integrates with popular business intelligence tools such as MicroStrategy and Tableau.