Skip to content

Conversation

@ashwanthgoli
Copy link
Contributor

@ashwanthgoli ashwanthgoli commented Dec 4, 2024

What this PR does / why we need it:

  • Block scheduler is updated to implement BlockBuilderServiceServer rpc interface and registered with grpc server.
  • Builders now poll for Jobs from the scheduler
  • GetJob is now a server-side streaming rpc which allows builders to wait on the server for pending jobs.
  • JobQueue uses a cond var to wake up pending routines if there are any new pending jobs

minor improvements:

  • uses int32 for partition field in types.Job to maintain consistency
  • updates loadRecords to return any backoff errors

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR
panic("unimplemented")
func (s *BlockScheduler) CompleteJob(_ context.Context, req *proto.CompleteJobRequest) (*proto.CompleteJobResponse, error) {
s.queue.MarkComplete(req.Job.Id)
return nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should return non-nil response types to satisfy caller expectations even though they're basically empty structs.

},
})

return nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should return non-nil response types to satisfy caller expectations even though they're basically empty structs.

}

resp, err := t.GetJob(ctx, protoReq)
client, err := t.GetJob(ctx, protoReq)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think creating streaming clients for every RPC is any better than using the non-streaming RPCs we had prior. These are generally run in a loop

@owen-d
Copy link
Member

owen-d commented Dec 5, 2024

Some examples of how we generally use streaming methods. e.g. scheduler.proto shows a few bidirectional streaming:

service SchedulerForFrontend {
  // After calling this method, both Frontend and Scheduler enter a loop. Frontend will keep sending ENQUEUE and
  // CANCEL requests, and scheduler is expected to process them. Scheduler returns one response for each request.
  //
  // Long-running loop is used to detect broken connection between frontend and scheduler. This is important for both
  // parties... if connection breaks, frontend can cancel (and possibly retry on different scheduler) all pending
  // requests sent to this scheduler, while scheduler can cancel queued requests from given frontend.
  rpc FrontendLoop(stream FrontendToScheduler) returns (stream SchedulerToFrontend) {}
}

// Scheduler interface exposed to Queriers.
service SchedulerForQuerier {
  // After calling this method, both Querier and Scheduler enter a loop, in which querier waits for
  // "SchedulerToQuerier" messages containing HTTP requests and processes them. After processing the request,
  // querier signals that it is ready to accept another one by sending empty QuerierToScheduler message.
  //
  // Long-running loop is used to detect broken connection between scheduler and querier. This is important
  // for scheduler to keep a list of connected queriers up-to-date.
  rpc QuerierLoop(stream QuerierToScheduler) returns (stream SchedulerToQuerier) {}

  // The querier notifies the query-scheduler that it started a graceful shutdown.
  rpc NotifyQuerierShutdown(NotifyQuerierShutdownRequest) returns (NotifyQuerierShutdownResponse);
}

https://grpc.io/docs/languages/go/basics/#defining-the-service is also a good resource

It might be better to leave it as unary for now and dedicate a single PR to bidirectional streaming GRPC later

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Dec 5, 2024
@ashwanthgoli ashwanthgoli marked this pull request as ready for review December 5, 2024 07:26
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner December 5, 2024 07:26
@owen-d owen-d merged commit c519ab6 into main Dec 9, 2024
59 checks passed
@owen-d owen-d deleted the scheduler-rpc branch December 9, 2024 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories

2 participants