---
slug: aws-storage-class
title: Choosing the Best AWS S3 Storage Class for Corso Backups
description: "Recently when writing about the storage options for Corso, I found myself going pretty far in the weeds on storage classes in S3. I thought I'd make a list of all the storage options and why they might, or might not, work for backups."
authors: nica
tags: [corso, microsoft 365, AWS, backups]
date: 2023-04-04
image: ./images/box_sizes.jpeg
---

![multiple box sizes](./images/box_sizes.jpeg)

Recently when writing about the storage options for Corso, I found myself going pretty far in the weeds on storage classes in S3. I thought I'd make a list of all the storage options and why they might, or might not, work for backups.

## First, some assumptions

If we're talking about backups, we can assume far more writes than reads, and that most objects that are written will never be read.

The older an object gets, the greater the chance that it will never be read.

And we can't afford to lose anything! One-zone options that carry a small chance of data loss, like ONEZONE_IA, won't work.

Finally, there will be index data and metadata that may well be overwritten frequently. For more detail on this, and an exploration of non-AWS alternatives to S3, see our past article on where to store your Corso data.

If your backup solution breaks one of these expectations, for example if you're restoring from backups every single day, the advice in this article may not be applicable to you.

## Best practices no matter your storage class

Using a dedicated backup tool rather than a naive file copy is the first step towards an efficient backup process. Before you drag that folder over to that network drive icon, consider the following requirements:

- Compression - don't use more network bandwidth than you have to
- De-duplication - backing up a team's email shouldn't mean storing 50 identical copies of Presentation_FINAL.pptx
- Incremental Backups - Ideally, your second backup should only include updated objects
- Bundling - creating millions of 2KB objects on each backup will add to costs and hurt performance (see the sketch after this list)
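
To make the bundling and compression points concrete, here's a minimal, hypothetical sketch of packing a directory of many small files into one compressed archive before upload. This is not Corso's actual implementation (which also handles de-duplication and incremental tracking); the paths are placeholders:

```python
import tarfile
from pathlib import Path

def bundle_for_upload(src_dir: str, archive_path: str) -> None:
    """Pack many small files into one gzip-compressed tar archive.

    Uploading a few large objects instead of millions of tiny ones
    means fewer PUT requests and less network overhead.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in Path(src_dir).rglob("*"):
            if path.is_file():
                tar.add(path, arcname=str(path.relative_to(src_dir)))

bundle_for_upload("/data/mailbox-export", "backup-2023-04-04.tar.gz")
```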

## Storage Classes, considered

The AWS S3 storage classes are `STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | OUTPOSTS | GLACIER_IR`, of which we won't consider REDUCED_REDUNDANCY (it's outdated, and Standard is now cheaper) or OUTPOSTS (if you need on-prem S3, you're using it out of necessity, not for cost or efficiency).
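
Whichever class you pick, it's set per object at write time. Here's a minimal boto3 sketch (the bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# The StorageClass argument accepts any of the class names above.
with open("backup-2023-04-04.tar.gz", "rb") as body:
    s3.put_object(
        Bucket="my-backup-bucket",  # hypothetical bucket name
        Key="backups/2023-04-04/bundle.tar.gz",
        Body=body,
        StorageClass="GLACIER_IR",
    )
```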

### STANDARD

The S3 Standard storage class should work for all backup implementations, as long as you're not using something that can't really work with object storage latencies (for example, a backup application that tries to do fine-grained, low-latency, database-style queries using indices stored in S3).

For Corso, Standard is a great place to start testing your setup, letting you perform regular backups, restores, and deletions. We also recommend storing all your non-blob data in Standard; how to do this automatically is covered at the end of this list.

### STANDARD_IA and ONEZONE_IA

These are the storage classes AWS recommends for backups! But it's likely that Glacier Instant Retrieval will be cheaper. Also, Infrequent Access charges for a minimum object size of 128KB and a minimum storage duration of 30 days. If your backups create many small objects, or if you have incremental backups constantly updating most objects, Infrequent Access may come out more expensive than Standard.
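
To see how the 128KB minimum bites, here's a rough back-of-the-envelope sketch. The per-GB rates are the published us-east-1 prices at the time of writing; check current pricing before relying on these numbers:

```python
STANDARD_PER_GB = 0.023       # USD per GB-month, us-east-1
STANDARD_IA_PER_GB = 0.0125
MIN_BILLABLE_KB = 128         # Standard-IA bills each object as at least 128KB

def monthly_storage_cost(num_objects: int, avg_kb: float) -> tuple[float, float]:
    actual_gb = num_objects * avg_kb / 1024**2
    billed_ia_gb = num_objects * max(avg_kb, MIN_BILLABLE_KB) / 1024**2
    return actual_gb * STANDARD_PER_GB, billed_ia_gb * STANDARD_IA_PER_GB

# A million 16KB index objects: Standard wins despite its higher rate.
standard, ia = monthly_storage_cost(1_000_000, 16)
print(f"Standard: ${standard:.2f}/mo, Standard-IA: ${ia:.2f}/mo")
# Standard: $0.35/mo, Standard-IA: $1.53/mo
```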

For Corso, it's not likely that this storage class will make the most sense. Maybe a setup where periodic restores are expected with some frequency would benefit from this class, but restores that frequent leave me unsure "backup" is the right term. If you find this is the best class for you, please join our Discord and tell us about it.

### INTELLIGENT_TIERING

Intelligent Tiering is the most appealing of AWS's newer S3 offerings for backups. As objects age, they'll move down to cheaper and cheaper tiers, finally dropping to the same per-GB storage cost as Glacier Instant Retrieval.

Two considerations should give you pause before using Intelligent Tiering for backups: first, there's a small per-object monitoring cost to Intelligent Tiering, and second, you probably do know the usage pattern of these backups: almost all will never be touched.

With Intelligent Tiering, you'll pay for your backups to sit in more expensive tiers for 60 days before you get the pricing you probably could have picked out for yourself in the first place.
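
As a rough sketch, following the 60-day framing above and the published us-east-1 rates at the time of writing (the per-object monitoring fee is omitted for simplicity):

```python
FREQUENT_PER_GB = 0.023      # Intelligent Tiering, roughly days 0-30
INFREQUENT_PER_GB = 0.0125   # Intelligent Tiering, roughly days 30-60
GLACIER_IR_PER_GB = 0.004    # Glacier Instant Retrieval, from day one

intelligent = FREQUENT_PER_GB + INFREQUENT_PER_GB  # first two months, per GB
direct = GLACIER_IR_PER_GB * 2
print(f"Intelligent Tiering: ${intelligent:.4f}/GB, straight to Glacier IR: ${direct:.4f}/GB")
# Intelligent Tiering: $0.0355/GB, straight to Glacier IR: $0.0080/GB
```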

Intelligent Tiering probably only makes sense if you're using backups in a nonstandard way, for example restoring from backups every morning. If you're not sure how your data will be used, Intelligent Tiering is a safe bet.

### GLACIER and DEEP_ARCHIVE

Glacier (not Instant Retrieval, which is discussed below) is a great way to archive data, which is a slightly different idea than backup. If you have a reason to store data and not touch it (for compliance, for example) and can tolerate extremely high latencies (hours to days) for recovery, you may want to use Glacier Archive. However, high-performance backup tools like Corso usually contain smart optimizations, like incremental backups and backup indexes, that won't work if the latency for all requests is measured in minutes. Further, for cost and efficiency, deduplicating object stores will often compact data as the primary data source churns. Default Glacier or Glacier Deep Archive is a poor fit for that workload.
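
Retrieval from these classes is an asynchronous, two-step process, which is exactly why fine-grained index lookups don't work against them. A minimal boto3 sketch (bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Step 1: ask S3 to restore a temporary copy. Depending on the
# retrieval tier ('Standard' vs. 'Bulk'), this takes hours or longer.
s3.restore_object(
    Bucket="my-backup-bucket",
    Key="backups/2023-04-04/bundle.tar.gz",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
)

# Step 2: poll until the Restore header reports ongoing-request="false",
# then GET the object as usual.
head = s3.head_object(Bucket="my-backup-bucket", Key="backups/2023-04-04/bundle.tar.gz")
print(head.get("Restore"))  # e.g. 'ongoing-request="true"'
```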

### GLACIER_IR

Likely to be your best option for backups, Glacier IR is cheaper for storage than any non-Glacier option, with low request latency. Corso's de-duplication, bundling, and compression will help ensure that you're paying as little as possible for storage.
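
And, as promised above, one way to get blob data into Glacier IR automatically while everything else stays in Standard is a prefix-scoped lifecycle rule. This is a hedged sketch: the `blobs/` prefix and bucket name are hypothetical, and your backup tool's key layout may differ:

```python
import boto3

s3 = boto3.client("s3")

# Transition only the bulk blob data to Glacier IR; objects outside
# the prefix (indexes, metadata) remain in Standard.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "blobs-to-glacier-ir",
                "Status": "Enabled",
                "Filter": {"Prefix": "blobs/"},
                "Transitions": [{"Days": 0, "StorageClass": "GLACIER_IR"}],
            }
        ]
    },
)
```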

## Glacier Instant Retrieval is the best choice for Corso backups

With these considerations, and with the best practices mentioned above, you should be able to build reliable backups with a minimal cost impact. If you're ready to give Corso a try, check out our Quickstart Guide, or take a look at a recent article on backing up large Exchange instances with Corso.