new blog post on large backup best practices (#2446)

## Description

new blog post on large backup best practices

## Does this PR need a docs update or release note?

- [ ]  Yes, it's included
- [ ] 🕐 Yes, but in a later PR
- [x]  No 

## Type of change

<!--- Please check the type of change your PR introduces: --->
- [ ] 🌻 Feature
- [ ] 🐛 Bugfix
- [x] 🗺️ Documentation
- [ ] 🤖 Test
- [ ] 💻 CI/Deployment
- [ ] 🧹 Tech Debt/Cleanup

## Issue(s)

<!-- Can reference multiple issues. Use one of the following "magic
words" - "closes, fixes" to auto-close the Github issue. -->
* #<issue>

## Test Plan

<!-- How will this be tested prior to merging.-->
- [ ] 💪 Manual
- [ ]  Unit test
- [ ] 💚 E2E

---------

Co-authored-by: Niraj Tolia <ntolia@users.noreply.github.com>
Nočnica Mellifera 2023-02-21 12:07:59 -08:00 committed by GitHub
parent 533164744b
commit d7c0f61105
2 changed files with 65 additions and 0 deletions


@@ -0,0 +1,65 @@
---
slug: large-microsoft-365-exchange-backups
title: "Backing up large Microsoft 365 Exchange mailboxes with Corso"
description: "A guide to using Corso to back up very large Exchange mailboxes in Microsoft 365"
authors: nica
tags: [corso, microsoft 365, backups, S3]
date: 2023-2-21
image: ./images/heavy-mover.jpg
---
![heavy earth mover By Lowcarb23 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=114344394](./images/heavy-mover.jpg)
Over the last few months it's been amazing sharing Corso with more and more users. One pleasant surprise has been users
who operate large, often multi-tenant deployments of Microsoft 365 and want to use Corso to back up all their
data. In our discussions on the [Corso User Discord](https://discord.gg/63DTTSnuhT), we've found some best practices for
backing up large Exchange mailboxes with Corso.
<!-- truncate -->
### Make sure you're using the latest version of Corso
We've recently done a lot of work to harden Corso against transient network outages and Graph API timeouts. This
hardening work has the most impact during large backups, as their long runtime increases the probability of running
into transient errors.
Our recent work has also included support for incremental backups, which you'll definitely need for larger data sets.
This means that while your first backup of a user with a large mailbox can take some time, all subsequent backups
will be quite fast as Corso will only capture the incremental changes while still constructing a full backup.
### Don't be afraid to restart your backups
Fundamentally, Corso is a consumer of the Microsoft Graph API, which, like all complex APIs, isn't 100% predictable.
Even when a backup fails, Corso will often have stored multiple objects before the failure. Corso
will work hard to reuse these stored objects in the next backup, meaning your next backup isn't starting from
zero. A second attempt is likely to run faster with a better chance of completing successfully.
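
If you script your backups, the restart can be automated. The following is a minimal Python sketch, not part of Corso itself: `run_with_retries` is a hypothetical helper, and it assumes that the `corso` process exits with a non-zero status when a backup fails.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, wait_seconds=60,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts run out.

    Returns the number of the attempt that succeeded, or 0 if every
    attempt failed. Assumption: a failed backup yields a non-zero exit code.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        # Pause before retrying so transient throttling has time to clear.
        if attempt < attempts:
            sleep(wait_seconds)
    return 0
```

Calling `run_with_retries(["corso", "backup", "create", "exchange", "--user", "alice@example.com"])` would then rerun the same backup up to three times, with each retry able to reuse objects stored by the previous attempt.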
### Batch your users
If many of your users have large file attachments (or if you have more than a few hundred users), you'll want to batch
your users for your first backup. A tool like [Microsoft365dsc](https://microsoft365dsc.com/) can help you get a list
of all user emails ready for parsing. After that, you can back up a few users or even a single user at a time with the
Corso command `corso backup create exchange --user "alice@example.com,bob@example.com"`.
Why can't you just run them all in one go with `--user '*'`? Again, we're limited by Microsoft's Graph API, which
often times out, returns 5xx errors, and throttles its clients.
The good news is that with Corso's robust ability to do incremental backups, after your first backup, you can
absolutely use larger batches of users, as all future backups to the same repository will run **much** faster.
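
As an illustration of batching, here is a small Python sketch that splits a list of user emails into fixed-size groups and builds one backup command per group. The helper names `batch_users` and `backup_command` are hypothetical, and the batch size is something you would tune for your own tenant. The sketch only builds the argument lists; it doesn't run anything.

```python
def batch_users(users, batch_size):
    """Split users into consecutive lists of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [users[i:i + batch_size] for i in range(0, len(users), batch_size)]


def backup_command(batch):
    """Build the argument list for backing up one batch of mailboxes.

    Users are joined with commas, matching the --user example above.
    """
    return ["corso", "backup", "create", "exchange", "--user", ",".join(batch)]
```

Each list returned by `backup_command` can be passed to `subprocess.run` one at a time, so a failure in one batch doesn't block the others.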
### Use multiple repositories for different tenants
If you're a managed service provider or otherwise running a multi-tenant architecture, you should use multiple separate
repositories with Corso. Two ways to pursue this:
- Point to separate buckets
- Place other repositories in subfolders of the same bucket with the `prefix` option
In both cases, the best way to keep these settings tidy is by using multiple `.corso.toml`
[configuration files](../../docs/setup/configuration/#configuration-file). Use the
`--config-file` option to point to separate config files.
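
One way to keep per-tenant settings apart in a script is to derive the config file path from the tenant name. This Python sketch is only an illustration: the `<tenant>.corso.toml` naming scheme and the `tenant_backup_command` helper are assumptions, and the flag is assumed to be spelled `--config-file`, so check `corso --help` for your version.

```python
from pathlib import Path


def tenant_backup_command(config_dir, tenant, users):
    """Build a backup command that uses a tenant-specific config file.

    Assumes one config file per tenant, named <tenant>.corso.toml,
    stored in config_dir (a hypothetical layout).
    """
    config_file = Path(config_dir) / f"{tenant}.corso.toml"
    return [
        "corso", "backup", "create", "exchange",
        "--config-file", str(config_file),
        "--user", ",".join(users),
    ]
```

Keeping one config file per tenant means the bucket or `prefix` for each repository lives in exactly one place, and the script never mixes settings between tenants.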
### Have questions?
Corso is under active development, and we expect our support for this type of use case to improve rapidly.
If you have feedback for us, please [join our Discord](https://discord.gg/63DTTSnuhT) and talk directly with the team!

Binary file not shown.
