Conversation
This pull request is now in conflicts. Could you fix it? 🙏
This pull request does not have a backport label. To fix this, add the backport labels for the needed branches.
🔍 Preview links for changed docs
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
…ing up the global one (which is set automatically), which may not be correct (because there may be several).
…through CEL's HTTP client.
…cessing of env vars.
Co-authored-by: Janeen Mikell Roberts <57149392+jmikell821@users.noreply.github.com>
Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
orestisfl left a comment:
Use of https://pkg.go.dev/go.opentelemetry.io/contrib/exporters/autoexport to interpret OTel environment variables and set up the exporter was removed in favor of manual handling, which seems to be standard when using the Go SDK (unlike implementations in some other languages).
I was not aware that this is the accepted standard. What is that based on?
Typically OTel tracing considers the whole process to be the "resource".
However, in this case the resource is the input instance. For that
reason a trace provider is created specifically for the input instance
and it is not explicitly set as the global tracer provider.
I would perhaps expect "filebeat" to be the resource; otherwise I would be concerned that we would spam the service inventory with every single input name.
OTel traces for the CEL Input in Elastic Observability
Any idea why it's listed as "unknown" on the path above?
```go
metrics, reg := newInputMetrics(env.MetricsRegistry, env.Logger)

ctx := ctxtool.FromCanceller(env.Cancelation)
otelTracerProvider, err := otel.NewTracerProvider(ctx, getResourceAttributes(env, cfg), i.Name())
```
The Shutdown method is never called on the provider. Could that lead to unexpected data loss?
On the same `otel.NewTracerProvider` call:
Q: Could this lead to significant overhead for multiple inputs? Could we instead make this call once and set any run-specific attributes in the span level?
Yes, that's what I'm seeing.
andrewkroh left a comment:
I ran the PR locally. Works as I expected, minus the few things I commented on. 👍
- URL query parameter redaction works
- Header redaction works
- Default sensitive-word detection works
- File tracer (`resource.tracer`) includes `trace.id` and `span.id`
- Filebeat debug logs include `trace.id` and `span.id` fields
- Resource attributes are populated
- `BEATS_OTEL_TRACES_DISABLE=cel` disables trace export as expected
```go
case <-waitCtx.Done():
	runSpan.SetStatus(codes.Unset, waitCtx.Err().Error())
	return waitCtx.Err()
```
When <-waitCtx.Done() fires, the function returns without calling waitSpan.End(). Please add waitSpan.End() before the return.
```go
if !ok {
	metricsRecorder.AddProgramRunDuration(execCtx, time.Since(start))
	continue
```
The loop continues without ending execSpan. It looks like we are leaking the span?
```go
errorSpans(err, end{execSpan}, runSpan)
return errors.New("unexpected missing events array from evaluation")
```
IIUC, err may be nil at this point. A fresh error should be used instead:

```go
err := errors.New("unexpected missing events array from evaluation")
errorSpans(err, end{execSpan}, runSpan)
return err
```
```go
err := fmt.Errorf("unexpected type returned for evaluation cursor element: %T", cursors[0])
metricsRecorder.AddProgramRunDuration(pubCtx, time.Since(start))
errorSpans(err, end{pubSpan}, end{execSpan}, runSpan)
return fmt.Errorf("unexpected type returned for evaluation cursor element: %T", cursors[0])
```
Duplicated error.
```diff
-return fmt.Errorf("unexpected type returned for evaluation cursor element: %T", cursors[0])
+return err
```
```go
}

func (rt *ExtraSpanAttribsRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
```
I think this file wants to be `gofumpt -w -extra`-ed. 😄
```go
span.SetAttributes(attribute.StringSlice(
	"url.full",
	[]string{sanitizedURLString(r.URL, rt.shouldRedact)},
```
Per OTel semantic conventions, `url.full` is a string type, not an array. This should be `attribute.String("url.full", sanitizedURLString(r.URL, rt.shouldRedact))`.
https://opentelemetry.io/docs/specs/semconv/registry/attributes/url/#url-full
```go
func (rt *ExtraSpanAttribsRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {

	span := trace.SpanFromContext(r.Context())
	if span != nil && span.SpanContext().IsValid() {
```
`trace.SpanFromContext` never returns nil. It may return a no-op span, but never nil.
```diff
-if span != nil && span.SpanContext().IsValid() {
+if span.SpanContext().IsValid() {
```
```go
return resp, err
}

if span != nil && span.SpanContext().IsValid() {
```
```diff
-if span != nil && span.SpanContext().IsValid() {
+if span.SpanContext().IsValid() {
```
```go
// TraceIDKey is key used to add a trace.id value to the context of HTTP
// requests. The value will be logged by LoggingRoundTripper.
const TraceIDKey = contextKey("trace.id")
```
```go
	return false
}

var sensitiveWords = map[string]struct{}{
```
The word "credentials" in `Access-Control-Allow-Credentials` is a CORS concept, not a secret. This is a common header, and we should avoid redacting it all the time. Users and developers would need to add it to `otel.trace.unredacted` to see the value, which is unlikely to occur to them.
Maybe we need a set of known safe headers...?
```go
var knownSafeNames = map[string]struct{}{
	"access-control-allow-credentials": {},
	// etc.
}
```
We will have to be vigilant in our code reviews for packages to make sure that we are setting `otel.trace.unredacted` for things like `sort_key`, `country_code`, etc. We can probably put something about this into our code review wiki page for developing packages, and hopefully AI tools can help keep us straight.
Proposed commit message
There were a couple of things for which the initial approach changed:

- `ContextInjector`: rewrite requests in the client used by CEL, which also solves the problem for OAuth2 requests.

There are some differences from the attribute and other names given in the planning document:

- `cel.periodic.program_count` → Changed to `cel.periodic.execution_count` to match `cel.program.execution`.
- `cel.program.batch_count` → Removed. It would only indicate whether an execution returned any events or not. Any other batching is internal to the CEL evaluation.
- `cel.{periodic,program}.success` → Removed, in favor of span status.
- `cel.program.error_message` → Not set. Uses `SetStatus` and `RecordError` instead.
- `BEATS_OTEL_TRACING_DISABLE` → Changed to `BEATS_OTEL_TRACES_DISABLE` to match `OTEL_TRACES_EXPORTER` and `OTEL_EXPORTER_OTLP_TRACES_*`.

Handling of span-specific context and loggers is somewhat cumbersome. Refactoring to extract separate functions from `run` for separate stages of processing will help to tidy this up and is planned as follow-up work: #48464.

Checklist
- `stresstest.sh` script to run them under stress conditions and race detector to verify their stability.
- Added an entry in `./changelog/fragments` using the changelog tool.

How to test this PR locally
You can use `otel-desktop-viewer` as a simple receiver and viewer of OTel traces. In the terminal with those environment variables set, you can run the input with an example that includes OAuth2 and multiple requests per period.
You can also use Elastic Observability to receive and view OTel traces, but it involves a bit more setup.
Bring up the Elastic Stack:
In Kibana, go to "Management > Integrations" and go to the "APM" integration page. Click "Manage APM integration in Fleet", then "Add Elastic APM". Under "Configure integration > Integration settings > General > Server configuration", change the Host and URL settings to use '0.0.0.0' instead of 'localhost'. Under "Where to add this integration?", choose "Existing hosts > Elastic Agent (elastic-package)". Then click "Save and continue".
Now, back in the terminal, find the IP address of the agent container.
Use that as the destination for OTel traces:
Then, from the terminal with those settings, you can run the input using the example Filebeat configuration as above.
To view the exported traces in Kibana, go to "Observability > Applications > Traces".
Related
Use cases
This tracing is to be used for troubleshooting, particularly for Agentless.
Screenshots
OTel traces for the CEL Input in Elastic Observability:
