Restore API: Fix file settings handling#137585
Conversation
When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing. Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings. Fixes elastic#122429
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
Hi @mamazzol, I've created a changelog YAML for you. |
...rnalClusterTest/java/org/elasticsearch/reservedstate/service/SnapshotsAndFileSettingsIT.java
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
Outdated
Show resolved
Hide resolved
| .metadata() | ||
| .reservedStateMetadata() | ||
| .get(FileSettingsService.NAMESPACE); | ||
| assertThat(reservedState, notNullValue()); |
There was a problem hiding this comment.
Is there any possibility of a race condition here? Since reserved state is process async, is there a possibility that the "clearing" hasn't been finished before we assert on it here?
There was a problem hiding this comment.
<sigh> We're overdue for a straightforward, sanctioned way to wait for various things to be done. Eventual consistency sucks for testing (and for users, but that's another topic). I suppose a callback on a CountdownLatch or similar could do it, though it somewhat sucks to have to rediscover this for each test.
There was a problem hiding this comment.
(Also the CountdownLatch technique only works if your callback is only called at the appropriate time. Since "exactly once" semantics are sometimes hard to implement in systems, they often settle for "at least once", and then the callback can occasionally have extra bonus calls, which can make the CountdownLatch technique flaky in various ways.)
There was a problem hiding this comment.
I tried adding the CountdownLatch technique present in other parts of this test class, but I wasn't able to find the sweet spot to make it work. The fact that the file processing happens right at startup means I cannot bind the listener before or after the restart as that is too early or too late.
I added the assertBusy as a way of handling it but I am happy to evaluate different ways if this is not good enough.
server/src/test/java/org/elasticsearch/reservedstate/service/FileSettingsServiceTests.java
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
Outdated
Show resolved
Hide resolved
💔 Backport failed
You can use sqren/backport to manually backport by running |
* Restore API: Fix file settings handling When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing. Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings. Fixes elastic#122429
* Restore API: Fix file settings handling When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing. Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings. Fixes elastic#122429
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
* Restore API: Fix file settings handling When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing. Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings. Fixes #122429
* Restore API: Fix file settings handling When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing. Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings. Fixes #122429
* Restore API: Fix file settings handling When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing. Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings. Fixes #122429
When restoring a snapshot including global state, do not remove the reserved state, but initialise an empty one. This prevents the readiness service from crashing.
Also, initialising an empty state now correctly removes pre-existing reserved state settings and file settings.
Fixes #122429