This PR introduces a URL cache for OneDrive items. Integration with the existing code and additional unit/integration tests will be added in follow-up PRs to keep this PR short.
Why do we need a URL cache?
* This is a perf improvement for long-running backups.
* If more than 1 hour passes between fetching a download URL and downloading the content, the download fails with a `401: Unauthorized` error. This is because the JWT tokens attached to download URLs are short-lived (1 hour).
* Currently we refresh download URLs after a `401` with a per-item `GET`. This should be optimized because it costs an extra round trip for every item (one to get the 401, then one to renew).
How does the URL cache help?
* The URL cache does a full delta re-enumeration when we hit a 401. This is cheaper (in terms of Graph requests/tokens) than calling GetItem for every remaining item.
* It relies on lazy refresh. The URLs are only refreshed once we hit a 401 and the client explicitly asks the cache for the download URL. Any following 401s are served from the cache.
**Cache flow**
```mermaid
flowchart TD
A[Download content] -->|401 error| B(Fetch from URL cache)
B --> |cache.getItemProperties| C{Cache needs refresh?}
C -->|Yes| D{Refresh in progress?}
D -->|Yes, Block| D
C -->|No| E[Read from cache]
D -->|No| M{Cache needs refresh? 2nd check}
M -->|Yes|F[Delta Query]
M -->|No| E
F --> |success|H[Update cache]
E --> |not found| I[Return error]
E --> |item deleted| J[Return Deleted]
F --> |failure|I
I --> |fallback|L[GET item]
H --> E
```
---
#### Does this PR need a docs update or release note?
- [ ] ✅ Yes, it's included
- [x] 🕐 Yes, but in a later PR
- [ ] ⛔ No
#### Type of change
<!--- Please check the type of change your PR introduces: --->
- [x] 🌻 Feature
- [ ] 🐛 Bugfix
- [ ] 🗺️ Documentation
- [ ] 🤖 Supportability/Tests
- [ ] 💻 CI/Deployment
- [ ] 🧹 Tech Debt/Cleanup
#### Issue(s)
<!-- Can reference multiple issues. Use one of the following "magic words" - "closes, fixes" to auto-close the Github issue. -->
* https://github.com/alcionai/corso/issues/3069
#### Test Plan
<!-- How will this be tested prior to merging.-->
- [ ] 💪 Manual
- [ ] ⚡ Unit test
- [x] 💚 E2E