Last Updated on January 13, 2021
I had been using gp2 type Elastic Block Store (EBS) Volumes by default, so when the gp3 type was launched I was really curious about the difference between the two.
This led me down a rabbit hole, trying to look beyond Amazon Web Services' (AWS) statement that “gp3 is 20% cheaper than gp2 EBS Volume types”.
I realized that I had only a limited understanding of Throughput compared to IOPS, so in this post I will go into the details of the Throughput of EBS Volumes.
- What is Throughput?
- Does the amount of data to read/write per second affect Throughput and IOPS?
- Wrong assumption on I/O size, IOPS and Throughput
- What happens if I exceed the Throughput of my EBS Volumes?
What is Throughput?
For an EBS Volume, throughput is the total amount of data that a storage can read/write per second.
In the AWS documentation, the unit for Throughput is MiB/s.
Here are some simple examples of how to compute Throughput.
Example 1.1: If I have a 100 MiB file and it gets written to my storage in 1 second, then my Throughput for writing that file to my volume is 100 MiB/s.
100 MiB / 1 second = 100 MiB/s
Example 1.2: Reading a 200 MiB file in 4 seconds, results in a Throughput of 50 MiB/s.
200 MiB / 4 seconds = 50 MiB/s
Example 1.3: If I have a 300 MiB file to write to my storage and another 200 MiB file that I need to read, and both tasks finish in 2 seconds, then my Throughput for both operations is 250 MiB/s.
(300 MiB + 200 MiB) / 2 seconds = 250 MiB/s
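The three examples above boil down to one formula: total data moved divided by elapsed time. A minimal sketch in Python (the function name is my own, not from AWS):

```python
def throughput_mib_s(total_mib, seconds):
    """Throughput = total data read/written divided by elapsed time."""
    return total_mib / seconds

# Example 1.1: 100 MiB written in 1 second
print(throughput_mib_s(100, 1))        # 100.0 MiB/s
# Example 1.2: 200 MiB read in 4 seconds
print(throughput_mib_s(200, 4))        # 50.0 MiB/s
# Example 1.3: 300 MiB written + 200 MiB read in 2 seconds
print(throughput_mib_s(300 + 200, 2))  # 250.0 MiB/s
```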
Does the amount of data to read/write per second affect Throughput and IOPS?
To answer the question above, we need to learn first the concept of I/O size.
I/O size is the amount of data that volumes consider as 1 Input or Output (I/O) operation.
In the AWS Documentation, Solid State Drive (SSD) volumes and Hard Disk Drive (HDD) Volumes have a different capped I/O size.
| Volume Type | Capped I/O Size | EBS Volume Types |
| --- | --- | --- |
| SSD Volumes | 256 KiB | gp2, gp3 (General Purpose SSD); io1, io2, io2 Block Express (Provisioned IOPS SSD) |
| HDD Volumes | 1,024 KiB | st1 (Throughput Optimized HDD); sc1 (Cold HDD) |
1 MiB = 1024 KiB
I’ll stress the word capped in the capped I/O size statement. What this means is that 256 KiB is the maximum size of a single I/O operation for SSD EBS Volumes (gp2, gp3, io1, io2), while the HDD Volume types, like st1 and sc1, have a maximum I/O size of 1,024 KiB per I/O operation.
Since I’m interested in gp2 and gp3 volumes, the sections below focus on the Throughput of SSD Volumes, but the same concepts still apply to HDD Volumes.
If we have an I/O operation for gp2 or gp3 that is less than 256 KiB, then this will be considered as 1 I/O operation.
But if we have an I/O operation that is greater than 256 KiB, then the I/O operation will then be divided into 256 KiB parts.
To understand this better, here are some examples below.
Example 2.1: If I need to read a file that is only 16 KiB in size, then how many I/O operations is this?
Since 16 KiB is less than 256 KiB, then to read the file will only be 1 I/O operation.
If the file was read in 1 second then the Throughput is 16 KiB/s or 0.01563 MiB/s.
Throughput = (16 KiB) * (1 MiB / 1024 KiB) / (1 second)
Throughput = (0.01563 MiB) / (1 second)
Throughput = 0.01563 MiB/s
Example 2.2: If I need to write a data that is 1000 KiB in size, then how many I/O operations is this?
Since 1000 KiB is greater than 256 KiB, then we will need to divide file into 256 KiB chunks.
| I/O Operation | Size |
| --- | --- |
| 1 I/O operation | 256 KiB |
| 1 I/O operation | 256 KiB |
| 1 I/O operation | 256 KiB |
| 1 I/O operation | 232 KiB |
From the table above, writing the 1000 KiB data requires 4 I/O operations:
3 I/O operations with a size of 256 KiB and 1 I/O operation with a size of 232 KiB.
If the data was written in 1 second then the Throughput is 1000 KiB/s or 0.9766 MiB/s.
Throughput = (1000 KiB) * (1 MiB / 1024 KiB) / (1 second)
Throughput = (0.9766 MiB) / (1 second)
Throughput = 0.9766 MiB/s
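Counting the I/O operations in Examples 2.1 and 2.2 is just a ceiling division by the capped I/O size. A quick sketch (the function name is mine, not an AWS API):

```python
import math

SSD_CAP_KIB = 256  # capped I/O size for SSD volumes (gp2, gp3, io1, io2)

def io_operations(size_kib, cap_kib=SSD_CAP_KIB):
    """An operation larger than the cap is split into cap-sized chunks."""
    return math.ceil(size_kib / cap_kib)

print(io_operations(16))    # 1  (Example 2.1: under the 256 KiB cap)
print(io_operations(1000))  # 4  (Example 2.2: 3 x 256 KiB + 1 x 232 KiB)
```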
Another feature of EBS Volumes is that if there are small, physically sequential I/O operations with a total size of less than 256 KiB, Amazon EBS will merge them into 1 I/O operation.
The AWS Documentation uses the term contiguous, which means next to or together in sequence.
Example 2.3: Let’s say that I have 6 pieces of data to read, each 40 KiB in size. All 6 are physically adjacent, or contiguous, in the volume.
Total Size = 6 * 40 KiB
Total Size = 240 KiB
Since the total size of 240 KiB is less than 256 KiB, and the data is physically sequential in the volume, the SSD EBS Volume will merge the operations and consider them as only 1 I/O operation.
If all 6 pieces of data were read in 1 second, then the Throughput is 240 KiB/s or 0.2344 MiB/s.
Throughput = (6 * 40 KiB) * (1 MiB / 1024 KiB) / (1 second)
Throughput = (240 KiB) * (1 MiB / 1024 KiB) / (1 second)
Throughput = (0.2344 MiB) / (1 second)
Throughput = 0.2344 MiB/s
Example 2.4: Let’s say that I have 5 pieces of data to read, each 50 KiB in size. All 5 are not physically sequential, or non-contiguous, in the volume.
It does not matter that the total size of all 5 pieces of data (250 KiB) fits within the maximum I/O size of SSD Volumes (256 KiB). Since the 5 pieces of data are not located beside each other, gp2 and gp3 volumes will treat this as 5 I/O operations.
If all 5 pieces of data were read in 1 second, then the Throughput is 250 KiB/s or 0.2441 MiB/s.
Throughput = (250 KiB) * (1 MiB / 1024 KiB) / (1 second)
Throughput = (0.2441 MiB) / (1 second)
Throughput = 0.2441 MiB/s
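The contrast between Examples 2.3 and 2.4 can be sketched as a small counting function. This is a simplified model covering just these two cases (the function name is mine; real EBS merging behavior has more nuance):

```python
SSD_CAP_KIB = 256  # capped I/O size for SSD volumes

def merged_io_count(sizes_kib, contiguous):
    """Contiguous small operations totalling <= 256 KiB merge into 1 I/O;
    non-contiguous operations are each counted individually."""
    if contiguous and sum(sizes_kib) <= SSD_CAP_KIB:
        return 1
    return len(sizes_kib)

print(merged_io_count([40] * 6, contiguous=True))   # 1  (Example 2.3: 240 KiB merged)
print(merged_io_count([50] * 5, contiguous=False))  # 5  (Example 2.4: no merging)
```

Note that the Throughput in both examples is still just the total KiB divided by the elapsed second, regardless of how many I/O operations were counted.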
Going back to the question – Does the amount of data to read/write per second affect Throughput and IOPS?
For IOPS, the answer is yes.
The amount of data per I/O operation does affect the number of IOPS. This can be seen in examples 2.2 and 2.3.
For Throughput, the answer is no.
In Example 2.4, where we had 5 pieces of data of 50 KiB each that were read in 1 second, we simply summed all the data and divided it by 1 second to get the Throughput (0.2441 MiB/s).
It did not matter that the task resulted in 5 I/O operations. I just needed to sum all the data sizes read in 1 second to get the Throughput.
Wrong assumption on I/O size, IOPS and Throughput
In Example 2.4, it’s tempting to think that since each of the 5 pieces of data is only 50 KiB, which resulted in 5 I/O operations in 1 second, and the capped I/O size for each I/O operation is 256 KiB, then the Throughput should be (256 KiB * 5) / 1 second = 1280 KiB/s.
This assumption is wrong. I’ll stress the word ‘capped’ again in the 256 KiB capped I/O size for SSD Volumes. It’s the maximum per I/O operation, not the amount every operation actually moves.
Throughput does not care about the capped I/O size, nor the number of I/O operations, nor the IOPS. Throughput only cares about the total size of data it reads/writes in 1 second.
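The wrong and right calculations side by side, using the numbers from Example 2.4:

```python
# Example 2.4 revisited: 5 non-contiguous 50 KiB reads in 1 second
ops = 5
cap_kib = 256          # capped I/O size, a maximum, not a guarantee
actual_kib = 5 * 50    # data actually read

wrong = ops * cap_kib / 1   # 1280.0 KiB/s -- assumes each op moves a full 256 KiB
right = actual_kib / 1      # 250.0 KiB/s  -- only the data actually read counts

print(wrong, right)  # 1280.0 250.0
```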
I had to write this here because I fell into this branch of the rabbit hole and spent around half a day getting back out of it.
What happens if I exceed the Throughput of my EBS Volumes?
If we do exceed the Throughput of our EBS Volume, for example when copying a very large file at a rate higher than the volume’s Throughput limit, then we should expect throttling of the read and write operations.
So how do we detect throttling?
We can detect this by checking 3 metrics of our EBS Volume:
- Average Read Latency,
- Average Write Latency, and
- Average Queue Length.
If any of those 3 metrics goes up, then we are experiencing throttling.
The issue might not necessarily be Throughput; it might be IOPS instead. It really depends on the use case.
I hope the above helped you understand what Throughput is with EBS Volumes.