17 Commits

Author SHA1 Message Date
Keepers
fd6dff3270
retry on http/2 stream errors (#3802)
http/2 stream error failures don't allow
us to leverage client middleware.  As a
result we're lacking a retry handler.  This
change adds retries for http_wrapper
in the explicit case of a http/2 stream
error.

---

#### Does this PR need a docs update or release note?

- [x]  Yes, it's included

#### Type of change

- [x] 🐛 Bugfix

#### Test Plan

- [x] 💪 Manual
- [x] 💚 E2E
2023-07-14 22:57:51 +00:00
Keepers
ce72acbcc1
auto-log recoverable errors with stack (#3598)
automatically log when we add a recoverable error or a skipped item to fault.  This log will include a stack trace of the call from the location of the logged recoverable.  Clues does not have a method for pulling a stack trace out of an error yet; that can be added at a future date.

---

#### Does this PR need a docs update or release note?

- [x]  No

#### Type of change

- [x] 🤖 Supportability/Tests

#### Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-06-13 22:18:18 +00:00
Keepers
6febc3ce5b
add namespace to fault items (#3415)
prevent fault items and skips from clobbering eachother by adding a namespace to the fault item that defines a deduplication boundary.  This allows multiple items in a single service operation to have identical ids so long as their namespace differs.

---

#### Does this PR need a docs update or release note?

- [x]  No

#### Type of change

- [x] 🐛 Bugfix
- [x] 🤖 Supportability/Tests

#### Issue(s)

* #3283

#### Test Plan

- [x] 💪 Manual
- [x]  Unit test
- [x] 💚 E2E
2023-05-16 17:18:51 +00:00
Abhishek Pandey
ec064e539b
Add misc typo fixes (#3006)
<!-- PR description-->

---

#### Does this PR need a docs update or release note?


- [ ]  No

#### Type of change

<!--- Please check the type of change your PR introduces: --->

- [ ] 🧹 Tech Debt/Cleanup

#### Issue(s)

<!-- Can reference multiple issues. Use one of the following "magic words" - "closes, fixes" to auto-close the Github issue. -->
* #<issue>

#### Test Plan

<!-- How will this be tested prior to merging.-->
- [ ] 💪 Manual
2023-04-12 23:30:38 +00:00
Keepers
fa2cf046bb
fixes, logging improvements for fault.Errors (#2848)
A last-second change in 2708 caused us to
pass along the wrong fault.Errors into backup
persistence, thus slicing the count of skipped
items.  That's been fixed, along with improved
end-of-operation logging of fault errors.

---

#### Does this PR need a docs update or release note?

- [x]  No

#### Type of change

- [x] 🐛 Bugfix

#### Issue(s)

* #2708

#### Test Plan

- [x] 💪 Manual
- [x] 💚 E2E
2023-03-20 06:01:04 +00:00
Keepers
8dfb00f308
add fault errors streamstore (#2731)
Adds a new streamstore controller for fault.Errors. This provides large scale, extensible file storage for fault errors to be persisted, much like we do for backup details.

---

#### Does this PR need a docs update or release note?

- [x]  No

#### Type of change

- [x] 🌻 Feature

#### Issue(s)

* #2708

#### Test Plan

- [x] 💚 E2E
2023-03-17 01:07:07 +00:00
Keepers
871f0b406d
introduce fault items (#2710)
Adds the item struct to the fault package for tracking serializable and dedupliatable error sources.

---

#### Does this PR need a docs update or release note?

- [x]  No

#### Type of change

- [ ] 🌻 Feature

#### Issue(s)

* #2708

#### Test Plan

- [x]  Unit test
2023-03-10 17:23:19 +00:00
Keepers
9e783efe3a
fault package funcs rename (#2583)
## Description

Renaming the funcs in the fault
package to be more clear about
their purpose and behavior.  Largely
just find&replace changes, except
for fault.go and the fault examples.

## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-02-25 03:29:02 +00:00
Keepers
1459e1406c
1970 11 tracker fails redo (#2624)
#### Type of change

- [x] 🐛 Bugfix

#### Issue(s)

* #1970

#### Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-02-24 15:57:50 +00:00
Keepers
b95231d85f
add fault.tracker for error additions (#2510)
## Description

Realized we had a race condition: in an async
runtime it's possible for an errs.Err() to be
returned by multiple functions, even though that
Err() was only sourced by one of them.  The
addition of a tracker contains the returned
error into the scope of that func so that only
the error produced in the current iteration is
returned.

## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-02-22 04:51:06 +00:00
Vaibhav Kamra
c3f5e50b9f
Fix how ErrrorData is marshaled (#2565)
## Description

The error type does not get marshaled which can lead to errors when we try to un-marshal a manifest
that had errors stored in it.

Repro here: https://go.dev/play/p/tgj8oq5CGFd

For now - this disables JSON marshaling and also fixes the error un-marshaling. I have added a regression test
to verify both behaviors.

As a follow-up, I believe we can implement a custom marshaler.

## Does this PR need a docs update or release note?

- [ ]  Yes, it's included
- [ ] 🕐 Yes, but in a later PR
- [x]  No 

## Type of change

<!--- Please check the type of change your PR introduces: --->
- [ ] 🌻 Feature
- [x] 🐛 Bugfix
- [ ] 🗺️ Documentation
- [ ] 🤖 Test
- [ ] 💻 CI/Deployment
- [ ] 🧹 Tech Debt/Cleanup

## Test Plan

<!-- How will this be tested prior to merging.-->
- [ ] 💪 Manual
- [x]  Unit test
- [ ] 💚 E2E
2023-02-18 01:43:59 +00:00
Keepers
43e5ccdb5b
add fault/clues throughout exchange backup (#2382)
## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-02-14 01:30:55 +00:00
Keepers
f33dbc6351
deprecate adder interface for only fault.Errors (#2378)
## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-02-11 00:17:42 +00:00
Keepers
667d2d8e29
add fault/clues to graph_conector.go (#2376)
## Description

Refactors error handling in graph_connector.
Also begins some error refactoring in support by
moving StackTraceErrror style funcs into a more
normalized handler in graph/errors.go.  And
removes the (Non)Recoverable error wraps which
we weren't using anyway.

## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-02-09 02:50:07 +00:00
Keepers
0436e0d128
extra panic protection in operations (#2383)
## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Test Plan

- [x] 💚 E2E
2023-02-03 02:01:42 +00:00
Keepers
070b8fddee
update operation/backup errs (#2239)
## Description

Begins updating operations/backup with the new
error handling procedures.  For backwards
compatibility, errors are currently duplicated in
the old stats.Errs and the new Errors struct.

## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
- [x] 💚 E2E
2023-01-31 22:25:02 +00:00
Keepers
4852667468
Add fault pkg and errors struct (#2236)
## Does this PR need a docs update or release note?

- [x]  No 

## Type of change

- [x] 🧹 Tech Debt/Cleanup

## Issue(s)

* #1970

## Test Plan

- [x]  Unit test
2023-01-25 22:00:23 +00:00