
Ensure frontend response status codes are correct. #2122


Merged: 2 commits into cortexproject:master on Feb 18, 2020

Conversation

cyriltovena (Contributor)

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

There were some places where the frontend and middlewares were not using httpgrpc.Errorf, which caused some 500s instead of 400s.

I'm also wondering whether the frontend should really send back a 500 in case of cancellation. I couldn't find an established convention on this.
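To illustrate the mechanism (a minimal sketch, not code from this PR, assuming the github.com/weaveworks/common/httpgrpc package Cortex uses; the statusCode helper below is hypothetical): only errors built with httpgrpc.Errorf carry a status code that can be recovered downstream, so everything else degrades to a 500.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"

	"github.com/weaveworks/common/httpgrpc"
)

// statusCode is a hypothetical helper mirroring what a frontend does when
// turning an error into an HTTP response: recover the embedded status if
// the error is an httpgrpc error, otherwise fall back to 500.
func statusCode(err error) int {
	if resp, ok := httpgrpc.HTTPResponseFromError(err); ok {
		return int(resp.Code)
	}
	return http.StatusInternalServerError
}

func main() {
	plain := errors.New("invalid query")                               // status is lost
	wrapped := httpgrpc.Errorf(http.StatusBadRequest, "invalid query") // status travels with the error

	fmt.Println(statusCode(plain))   // 500
	fmt.Println(statusCode(wrapped)) // 400
}
```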

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@jtlisi (Contributor) left a comment


LGTM, one minor nit

I also recommend adding a note about any changes a user might notice in the changelog.

```diff
@@ -42,7 +46,7 @@ func (r retry) Do(ctx context.Context, req Request) (Response, error) {
 	var lastErr error
 	for ; tries < r.maxRetries; tries++ {
 		if ctx.Err() != nil {
-			return nil, ctx.Err()
+			return nil, errCanceled
```
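The hunk references errCanceled, which is defined elsewhere; a plausible shape (an assumption, since the definition is not shown here and the package name is a guess) is a package-level sentinel error:

```go
package queryrange // assumption: the middleware package's name

import "errors"

// Hypothetical definition of errCanceled (the real definition is outside
// this hunk). A package-level sentinel gives the strict test assertions
// mentioned below a stable error value to compare against.
var errCanceled = errors.New("the request was cancelled")
```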
Contributor

If we aren't going to propagate the error up, can we log it?

Contributor

I'm not sure logging it is correct, since we may end up with double logs for one cancelled request.

Why not do httpgrpc.Errorf(http.StatusInternalServerError, ctx.Err().Error())? I think it's just because of the assertions in the tests, but we could change them not to use strict comparison.
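As a sketch of what relaxing those assertions could look like (a hypothetical test, not code from the PR; doRequest stands in for the middleware call under test), the test could compare the status code carried by the error rather than requiring strict error equality:

```go
package queryrange_test

import (
	"net/http"
	"testing"

	"github.com/weaveworks/common/httpgrpc"
)

// doRequest is a hypothetical stand-in for the middleware call under test.
func doRequest() error {
	return httpgrpc.Errorf(http.StatusInternalServerError, "context canceled")
}

// Instead of asserting strict equality against a sentinel (for example,
// require.Equal(t, errCanceled, err)), assert on the status the error
// carries, so the exact error value no longer matters.
func TestCancellationMapsTo500(t *testing.T) {
	err := doRequest()
	resp, ok := httpgrpc.HTTPResponseFromError(err)
	if !ok || resp.Code != http.StatusInternalServerError {
		t.Fatalf("expected an httpgrpc 500 error, got %v", err)
	}
}
```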

@cyriltovena (Contributor, Author), Feb 13, 2020

You know, I actually can drop this line. It will work the same, since non-gRPC errors end up as 500s with the error string.

But I wanted to discuss whether it makes sense to return a 500 here. I think a cancelled request should be a 4xx from the server's point of view, while the upstream client (gateway) that decided to cancel could return a 502.

The problem I foresee is that some requests could be cancelled very quickly by a page refresh and count toward our error budget. This is not a big deal for Cortex, since the average latency is 1s and it's pretty hard to cancel a request before that, but in Loki someone could send a 20s query and then decide to cancel it; that should not count as an error from the gateway's point of view.

WDYT @jtlisi ?

TL;DR: I'm fine with removing this line, since it doesn't do much right now, but I'd like to discuss how we should handle cancellation in our API.

@cyriltovena (Contributor, Author)

I think I'm going to die on this hill.

Upstream context cancellation should not return a 500.

Contributor

I briefly discussed this topic offline with @cyriltovena. My personal suggestion:

  1. Roll back this change in this PR (to unblock the merge without a longer discussion).
  2. Open a new PR to switch the response status code from 500 to 400 in case of context.Canceled (but not on DeadlineExceeded); a sketch of that mapping follows below.
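A sketch of what that follow-up could look like (an assumption about the eventual implementation, not merged code; errorToStatus is a hypothetical helper): map context.Canceled to a 4xx while leaving DeadlineExceeded on the 5xx path, and let httpgrpc errors keep their embedded status.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"

	"github.com/weaveworks/common/httpgrpc"
)

// errorToStatus is a hypothetical mapping implementing the suggestion:
// a client-initiated cancellation becomes a 4xx, a server-side deadline
// stays a 5xx, and httpgrpc errors keep their embedded status.
func errorToStatus(err error) int {
	switch {
	case errors.Is(err, context.Canceled):
		return http.StatusBadRequest // the client gave up; not a server error
	case errors.Is(err, context.DeadlineExceeded):
		return http.StatusInternalServerError // deadline stays on the 5xx path
	default:
		if resp, ok := httpgrpc.HTTPResponseFromError(err); ok {
			return int(resp.Code)
		}
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(errorToStatus(context.Canceled))         // 400
	fmt.Println(errorToStatus(context.DeadlineExceeded)) // 500
}
```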
@cyriltovena (Contributor, Author)

> LGTM, one minor nit
>
> I also recommend adding a note about any changes a user might notice in the changelog.

I think this was introduced by my many refactorings; I didn't realize httpgrpc.Errorf was a requirement, so I'm kind of fixing something that should already have been like this. Do you still want a changelog entry?

We need to update this behaviour somewhere else.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
@pracucci (Contributor) left a comment


Thanks @cyriltovena for addressing my feedback and sorry for the very long delay! LGTM

@pracucci merged commit ea08e33 into cortexproject:master on Feb 18, 2020