perf: Reuse Values in dataobj.Reader to reduce allocs #16988


Merged
benclive merged 27 commits into main from reuse-values-in-dataobjReader on Apr 14, 2025

Conversation

@benclive (Contributor) commented Apr 1, 2025

What this PR does / why we need it:
Updates the dataobj readers to re-use memory where possible.

  • This uses the []Value slices passed in by the caller to store and reuse memory for batch operations (see the sketch after this list).
  • I had to do this at 3 layers: the Values themselves, copying Values into logs.Record during Decode, and then copying the internal logs.Record over to the dataobj.Record for output to the user.
    • From my testing, we can gain another 5-10% perf increase by eliminating that last copy, but it means we won't have a clean separation between internal and external types, so I left it in for now.
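
To make the reuse pattern concrete, here is a minimal sketch of the idiom applied at each of those layers (the helper name and shape are illustrative, not code from this PR): copy into the caller's existing backing array, allocating only when it is too small.

```go
// reuseBytes copies src into dst, reusing dst's backing array when it is
// large enough and letting append grow it otherwise. Each of the three copy
// layers applies this idea to its own buffers (the Values, the logs.Record
// fields, and the dataobj.Record fields).
func reuseBytes(dst, src []byte) []byte {
	dst = dst[:0]              // keep the allocation, discard old contents
	return append(dst, src...) // allocates only if cap(dst) < len(src)
}
```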

Benchmark Results:

goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/dataobj
cpu: Apple M3 Max
              │ before_logs_reader.txt │        after_logs_reader.txt        │
              │         sec/op         │   sec/op     vs base                │
LogsReader-14              2.648m ± 1%   2.325m ± 2%  -12.19% (p=0.000 n=10)

              │ before_logs_reader.txt │         after_logs_reader.txt         │
              │          B/op          │     B/op       vs base                │
LogsReader-14            1282.8Ki ± 5%   218.0Ki ± 22%  -83.01% (p=0.000 n=10)

              │ before_logs_reader.txt │       after_logs_reader.txt        │
              │       allocs/op        │ allocs/op   vs base                │
LogsReader-14             60472.5 ± 0%   455.0 ± 0%  -99.25% (p=0.000 n=10)

The majority of remaining allocs come from downloading the pages themselves. I'll look at those separately to see if we can improve them, but I wanted to get this PR reviewed first.

Which issue(s) this PR fixes:
Fixes https://github.com/grafana/loki-private/issues/1471

Timestamp time.Time
Metadata labels.Labels
Line []byte
MdValueCaps []int
@benclive (Contributor Author) commented Apr 1, 2025

I had to store capacities in a few places in this PR. Is there a better way to return buffers to their original capacity when reusing them?

Every time something turns into a string, we lose the original capacity of the underlying slice, so I added this variable to be able to resize them again later.
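
To illustrate the problem (a standalone example, not code from the PR): a []byte carries both a length and a capacity, but a string carries only a pointer and a length, so any spare capacity is unrecoverable after a round trip through string unless it is recorded separately, which is what MdValueCaps does.

```go
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	buf := make([]byte, 4, 64) // len=4, cap=64

	// No-copy conversions (Go 1.20+); a plain string(buf) copy discards the
	// original backing array entirely, with the same end result.
	s := unsafe.String(unsafe.SliceData(buf), len(buf))
	back := unsafe.Slice(unsafe.StringData(s), len(s))

	fmt.Println(cap(buf), cap(back)) // prints "64 4": the spare 60 bytes are unreachable
}
```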

Member

Can you point me at where we lose the original capacity of the slice? It's not obvious to me from reviewing the PR.

@benclive (Contributor Author)

We lose the underlying capacity whenever we turn a []byte into a string.
This one is in Decode: pkg/dataobj/internal/sections/logs/iter.go:141
Then we have to do the same thing for the data in the logs_reader because we copy from an internal logs.Record to an external dataobj.Record.

@benclive benclive changed the title from Reuse values in dataobj reader to perf: Reuse Values in dataobj.Reader to reduce allocs on Apr 1, 2025
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Apr 2, 2025
@benclive benclive force-pushed the reuse-values-in-dataobjReader branch from 5a11309 to f6ae7de on April 2, 2025 15:51
@benclive benclive marked this pull request as ready for review April 2, 2025 16:49
@benclive benclive requested a review from a team as a code owner April 2, 2025 16:49
@benclive benclive requested review from rfratto and ashwanthgoli April 2, 2025 16:49
MdValueCaps []int
}

func (r *Record) DeepCopy() Record {
@benclive (Contributor Author)

I'm not sure if I should have this or not. It's supposed to be a helper so a caller can copy elements out of the buffer they passed in and then re-use that buffer. Maybe a better solution is for the client to create new buffers whenever something is returned, and never pass the same buffer into a Read call?

Member

This feels reasonable for now (we can optimize or change it as necessary in the future). It seems natural to me that we provide the disclaimer: "The iterator assumes control of the passed slice; results are only valid until the next iteration. If the caller needs to store results for later use (across iterations), they must deep copy them".
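
For illustration, a minimal caller-side sketch of that contract (the Read signature and a DeepCopy helper are assumed from the discussion above; everything else is illustrative):

```go
func collect(ctx context.Context, reader *dataobj.LogsReader) ([]dataobj.Record, error) {
	var saved []dataobj.Record
	buf := make([]dataobj.Record, 64)
	for {
		n, err := reader.Read(ctx, buf)
		for i := range buf[:n] {
			// Records are only valid until the next Read; deep copy to keep.
			saved = append(saved, buf[i].DeepCopy())
		}
		if errors.Is(err, io.EOF) {
			return saved, nil
		}
		if err != nil {
			return nil, err
		}
	}
}
```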

// GrowToCap grows the slice to at least n elements total capacity.
// It is an alternative to slices.Grow that increases the capacity of the slice instead of allowing n new appends.
// This is useful when the slice is expected to have nil values.
func GrowToCap[Slice ~[]E, E any](s Slice, n int) Slice {
@benclive (Contributor Author) commented Apr 2, 2025

I found myself using this calculation in quite a few places, so I extracted it. Calling slices.Grow without subtracting the current len would double the slice on every call, since we tend to have len==cap and assign to elements directly throughout the readers.
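
For reference, a plausible body consistent with the doc comment above (the PR's actual implementation isn't shown in this excerpt), built on the standard library's slices.Grow:

```go
package slicegrow

import "slices"

// GrowToCap grows the slice to at least n elements total capacity, leaving
// the length unchanged. slices.Grow(s, m) guarantees cap(s) >= len(s)+m, so
// passing m = n-len(s) guarantees cap(s) >= n without doubling on every call.
func GrowToCap[Slice ~[]E, E any](s Slice, n int) Slice {
	if n > cap(s) {
		s = slices.Grow(s, n-len(s))
	}
	return s
}
```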

@owen-d (Member) left a comment

I like where this is heading, but I'm seeing panics while running logql/bench that need to be looked into.

@@ -69,7 +69,7 @@ func IterSection(ctx context.Context, dec encoding.StreamsDecoder, section *file
})
defer r.Close()

- var rows [1]dataset.Row
+ var rows [3]dataset.Row
Member

Why was this specifically changed? There may be nice benefits from batching during iteration, but IMO that can be an isolated change in another PR applied to all our iterator types.

r.buf = r.buf[:len(s)]

// Fill the row buffer with empty values so they can re-use the memory we pass in.
for i := range r.buf {
Member

nit: this could be extracted so it's only called when necessary, since *LogsReader.Read is repeatedly called with the same buffer. We can take advantage of this to minimize re-initialization of r.buf:

bufLen := len(r.buf)
sLen := len(s)
if bufLen < sLen {
  r.buf = slicegrow.GrowToCap(r.buf, sLen)[:sLen]
  r.populate(r.buf[bufLen:]) // initialize only the newly grown tail
}

pseudo-code, but hopefully it gets the idea across.

}

// Pre-allocate memory for metadata
for i := range s {
Member

nit (unrelated, future): It looks like we're initializing metadata columns even if they may be unused. (a) This can likely be minimized so we only pay initialization costs when necessary, this time on the s []Record argument instead of the internal buf; it's a bit more complicated because caching that would likely require a type wrapper over []Record. (b) It also looks like we're initializing capacity based on the total metadata columns (which can be very large), rather than only those requested by the query.

Neither optimization needs to be applied today, but it looks like there's some significant opportunity here.

n, err := r.reader.Read(ctx, r.buf)
if err != nil && !errors.Is(err, io.EOF) {
return 0, fmt.Errorf("reading rows: %w", err)
} else if n == 0 && errors.Is(err, io.EOF) {
return 0, io.EOF
}

// Pre-allocate memory for metadata
for i := range s {
Member

nit: same optimization opportunities as explained in logs reader

@rfratto (Member) left a comment

Thank you for working on this! I really appreciate the effort here, since allocations are our biggest pain point right now.

I have a few primary concerns at the moment before we merge:

  • Creating a dataset.StringValue and using it as part of a read can cause obscure memory bugs, even though doing so is technically allowed by the API.

  • It also looks like some of our internal optimizations are starting to leak to callers, such as clearing capacities in tests or LogsReaders/StreamReaders preallocating/copying memory.

I think there's still a little bit of work left to reduce how much complexity is being pushed downstream. If you're looking for someone to pair with, I'm happy to help.

// Which type this value was previously used for dictates how we reference the memory.
switch v.any.(type) {
case stringptr:
dst = unsafe.Slice(v.any.(stringptr), int(v.cap))
Member

I'm concerned with us doing this for stringptr:

If a caller passes in a string Value created via StringValue, its memory would get overwritten here and violate Go's memory model, which can cause bugs that are very difficult to track down, if the program doesn't segfault outright.

IMO, if it's a stringptr, we should consider eating the cost and allocating a new slice, just to save ourselves from dealing with obscure memory corruption bugs. (And then we should also avoid using string values wherever possible)
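
A standalone illustration of the hazard (not code from the PR): exposing a string's bytes as a writable slice breaks the immutability guarantee the compiler and runtime rely on.

```go
package main

import "unsafe"

func main() {
	s := "hello"                 // string contents are immutable by contract
	p := unsafe.StringData(s)    // *byte into the string's backing array
	b := unsafe.Slice(p, len(s)) // writable view over that memory
	b[0] = 'H'                   // undefined behavior: typically a segfault,
	                             // since the literal lives in read-only memory
}
```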

@benclive (Contributor Author)

I think we have two options here:

  1. Allocate when creating a StringValue so the referenced memory is always owned by the Value (see the sketch after this list)
  2. Remove strings completely and just use bytearray values. I think that would clear up a bunch of the handling with labels since converting []byte to strings is what loses the slice capacity information.
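
A minimal sketch of option 1 (hypothetical constructor names; the dataset package's real API may differ): copy on construction, so the Value always owns its bytes and buffer reuse can never clobber a caller's string.

```go
// StringValue copies s into memory the Value owns, so reuse inside the
// readers cannot touch the caller's string. ByteArrayValue is assumed to
// wrap a Value around a []byte it takes ownership of.
func StringValue(s string) Value {
	b := make([]byte, len(s))
	copy(b, s)
	return ByteArrayValue(b)
}
```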
Member

If we can remove strings completely, I'm definitely in favour. That would be a breaking change to the format (we wouldn't be able to read old data objects that use strings); I assume that's ok given how early on we are?

@benclive (Contributor Author)

I agree. I've added a commit to remove string types. It's a big change so this PR will be XXL, but it should be self-contained.
Without changing the dataobj.Record to stop using labels.Labels, it doesn't have a big impact on the capacities that need to be maintained. logs.Decode is simpler, though, because the internal logs.Record type can be changed to use []byte instead of string for values.
Let me know what you think. I'm happy to revert this commit and tackle it in a future PR if that would be clearer!

@@ -82,7 +82,9 @@ func Test(t *testing.T) {
for result := range logs.Iter(context.Background(), dec) {
record, err := result.Value()
require.NoError(t, err)
- actual = append(actual, record)
+ next := record.DeepCopy()
+ next.MdValueCaps = nil
Member

Why does the caller need to zero out MdValueCaps? 🤔

@benclive (Contributor Author)

This is just for testing, but require.Equal also compares the MdValueCaps values. I don't want the test to depend on them because they are an implementation detail, so I zero them out.
I'm happy to move this comparison to a helper method which ignores them. They would also disappear if we remove strings, so I'll trial that first.


@benclive benclive requested a review from rfratto April 8, 2025 12:36
@rfratto (Member) left a comment

This is just a pass over the most recent three commits; I'll give this an overall pass shortly.

@benclive benclive force-pushed the reuse-values-in-dataobjReader branch from 643abda to 6454b3f on April 9, 2025 17:05
@benclive (Contributor Author) commented Apr 9, 2025

Latest results on a no-op pipeline run:

goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/logql/bench
cpu: Apple M3 Max
                                                     │ before.txt  │              after.txt              │
                                                     │   sec/op    │   sec/op     vs base                │
LogQL/dataobj/{region="ap-southeast-1"}_[FORWARD]-14   25.40 ± 27%   19.25 ± 34%  -24.22% (p=0.019 n=10)

                                                     │  before.txt   │               after.txt               │
                                                     │     B/op      │     B/op       vs base                │
LogQL/dataobj/{region="ap-southeast-1"}_[FORWARD]-14   7.695Gi ± 11%   3.601Gi ± 33%  -53.21% (p=0.001 n=10)

                                                     │  before.txt  │              after.txt              │
                                                     │  allocs/op   │  allocs/op   vs base                │
LogQL/dataobj/{region="ap-southeast-1"}_[FORWARD]-14   138.04M ± 0%   56.35M ± 0%  -59.18% (p=0.000 n=10)

Latest results on LogsReader alone:

goos: darwin
goarch: arm64
pkg: github.com/grafana/loki/v3/pkg/dataobj
cpu: Apple M3 Max
              │ before_logs_reader.txt │       after_logs_reader.txt        │
              │         sec/op         │   sec/op     vs base               │
LogsReader-14              2.648m ± 1%   2.462m ± 1%  -7.03% (p=0.000 n=10)

              │ before_logs_reader.txt │        after_logs_reader.txt         │
              │          B/op          │     B/op      vs base                │
LogsReader-14            1282.8Ki ± 5%   223.3Ki ± 1%  -82.59% (p=0.000 n=10)

              │ before_logs_reader.txt │       after_logs_reader.txt        │
              │       allocs/op        │ allocs/op   vs base                │
LogsReader-14             60472.5 ± 0%   555.0 ± 0%  -99.08% (p=0.000 n=10)
@benclive benclive requested a review from rfratto April 9, 2025 17:18
@rfratto (Member) left a comment

🎉 Great work! I'm really happy with how this turned out.

I won't have any more rounds of feedback after this, but I'll approve as soon as I'm able to run the LogQL chunk-to-dataobj comparison tests over this.

Comment on lines +33 to +34
presenceReader *bufio.Reader
valuesReader *bufio.Reader
Member

feedback for a future PR (I'm also happy to contribute this): over the weekend I found that it's not really necessary for us to wrap the readers in a bufio.Reader here; we could update the underlying readers to implement the interfaces we want.

definitely not something I think this PR should do though
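
A rough sketch of that direction (hypothetical pageReader type, not from this PR): have the underlying reader implement io.Reader and io.ByteReader itself, making the bufio.Reader wrapper unnecessary.

```go
import "io"

// pageReader serves a decoded page from memory and implements io.Reader and
// io.ByteReader directly.
type pageReader struct {
	data []byte
	off  int
}

func (r *pageReader) Read(p []byte) (int, error) {
	if r.off >= len(r.data) {
		return 0, io.EOF
	}
	n := copy(p, r.data[r.off:])
	r.off += n
	return n, nil
}

func (r *pageReader) ReadByte() (byte, error) {
	if r.off >= len(r.data) {
		return 0, io.EOF
	}
	b := r.data[r.off]
	r.off++
	return b, nil
}
```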

Comment on lines 123 to 140
// Copies the label names from src to dst, while re-using the memory for values
func copyLabelNames(dst labels.Labels, src labels.Labels) labels.Labels {
for i, label := range src {
dst[i].Name = strings.Clone(label.Name)
dst[i].Value = label.Value
}
return dst
}

// Copies the label values into a new slice. We re-use the label names as they were previously copied by Process.
func copyLabelValues(in labels.Labels) labels.Labels {
lb := make(labels.Labels, len(in))
for i, label := range in {
lb[i] = labels.Label{Name: label.Name, Value: strings.Clone(label.Value)}
}
return lb
}

@rfratto (Member) commented Apr 10, 2025

looks like these are unused (unfortunately it looks like we don't use the linter which helps you remove unused unexported code)

(there might be other unused functions we introduced in earlier commits, but they didn't stand out to me)

Comment on lines +102 to +103
stream.LbValueCaps = slicegrow.GrowToCap(stream.LbValueCaps, labelColumns)
stream.LbValueCaps = stream.LbValueCaps[:labelColumns]
Member

It'd be nice to get rid of these for simplicity; is it possible for us to do something similar to what we did with RecordMetadata in logs.Decode?

@benclive (Contributor Author) commented Apr 11, 2025

We could, but I don't want to do it now.
We use the same streams.Stream object & labels.Labels functionality quite a bit in the getOrAddStream method (hash, equality, sorting, etc.) and I didn't want to replace it all, at least not in this PR.
In the logs section we don't use that functionality so it was easy to replace labels.Labels with a custom RecordMetadata which references a []byte field.

@rfratto (Member) left a comment

Storage equality tests look good! :shipit:

@benclive benclive merged commit e5784d7 into main Apr 14, 2025
61 checks passed
@benclive benclive deleted the reuse-values-in-dataobjReader branch April 14, 2025 12:56
rfratto added a commit that referenced this pull request May 21, 2025
…7762)

Originally, the dataobj package was a higher-level API around sections. This
design caused it to become a bottleneck:

* Implementing any new public behaviour for a section required bubbling it up
  to the dataobj API for it to be exposed, making it tedious to add new
  sections or update existing ones.

* The `dataobj.Builder` pattern was focused on constructing dataobjs for
  storing log data, which will cause friction as we build objects around other
  use cases. 

This PR builds on top of the foundation laid out by #17704 and #17708, fully
inverting the dependency between dataobj and sections:

* The `dataobj` package has no knowledge of what sections exist, and can now be
  used for writing and reading generic sections. Section packages now create
  higher-level APIs around the abstractions provided by `dataobj`.

* Section packages are now public, and callers interact directly with these
  packages for writing and reading section-specific data.

* All logic for a section (encoding, decoding, buffering, reading) is now fully
  self-contained inside the section package. Previously, the implementation of
  each section was spread across three packages
  (`pkg/dataobj/internal/encoding`, `pkg/dataobj/internal/sections/SECTION`,
  `pkg/dataobj`).

* Cutting a section is now a decision made by the caller rather than the
  section implementation. Previously, the logs section builder would create
  multiple sections. 

For the most part, this change is a no-op, with two exceptions:

1. Section cutting is now performed by the caller; however, this shouldn't
   result in any issues. 

2. Removing the high-level `dataobj.Stream` and `dataobj.Record` types will
   temporarily reduce the allocation gains from #16988. I will address this after
   this PR is merged.