Skip to content

feat: add support for using sqlite for storing delete requests #16437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Feb 28, 2025

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does / why we need it:
Add support for using sqlite instead of boltdb for storing delete requests.
I have added a flag to choose between boltdb vs SQLite for storing the requests, which defaults to boltdb.
When sqlite is selected for storing the requests, the factory function will copy the data from boltdb to sqlite to populate the existing data.
I have also added support for using a backup store to reduce the stress of migration. It would use a tee to perform all the write operations on both the stores. Currently, the config only allows using boltdb as a backup store since users will be moving from boltdb to SQLite.

Checklist

  • Documentation added
  • Tests updated
@sandeepsukhani sandeepsukhani requested review from trevorwhitney and a team as code owners February 25, 2025 12:05
@sandeepsukhani sandeepsukhani removed the request for review from trevorwhitney February 25, 2025 12:05
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 25, 2025
Copy link
Contributor

github-actions bot commented Feb 25, 2025

💻 Deploy preview deleted.

}

s.connPool, err = sqlitex.NewPool(s.path, sqlitex.PoolOptions{})
return err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears we are generally handling errors related to the sqlite db in a consistent manner, that is, returning err to the caller.

With that said, do we need to handle a database corruption in a more resilient manner? Or, is the answer to something like this "delete the database and restart the compactor"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the answer to database corruption is "delete the database and restart the compactor" because our code cannot handle it. The other option is to find a working version(assuming versioning is enabled on object storage) and restore it. Also, it would be a serious issue, so it warrants the operators' attention.

processed_shards INT DEFAULT 0,
query TEXT NOT NULL
);`
sqlCreateDeleteRequestShardsTable = `CREATE TABLE IF NOT EXISTS shards (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should also add some indicies?

CREATE INDEX idx_shards_id ON shards(id);
CREATE INDEX idx_shards_user_id ON shards(user_id);
CREATE INDEX idx_shards_time ON shards(start_time, end_time);
CREATE INDEX idx_requests_user_id ON requests(user_id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. I have added indices considering the most frequently run queries.

})
if err != nil {
return nil, err
func createDeleteRequestsStore(DeleteRequestsStoreDBType DeleteRequestsStoreDBType, workingDirectory string, indexStorageClient storage.Client, now func() model.Time) (DeleteRequestsStore, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of passing a now() function here solely for the test case that uses it? It appears that generally this is just a wrapped Time.Now() call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we use it to have a fixed creation time for delete requests to make it easier to compare them in tests.
Here is an example

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wonder if it would be cleaner to not have that parameter, and mock the store in the test case to have this behavior. I don't feel that adding a parameter that is only used by test cases is necessarily the cleanest.

@sandeepsukhani sandeepsukhani enabled auto-merge (squash) February 28, 2025 12:06
@sandeepsukhani sandeepsukhani merged commit 5b33e65 into main Feb 28, 2025
64 checks passed
@sandeepsukhani sandeepsukhani deleted the sqlite-for-delete-requests-store branch February 28, 2025 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
2 participants