
Conversation

@tbzhang (Contributor) commented Dec 2, 2019

This patch adds multiple survivor spaces to the SVM GC to reduce the frequency of full GCs.
In a micro benchmark it reduces the frequency of full GCs (the full GC count changes from ~17/260 to ~9/250).
In a real workload at Alibaba, it also reduces the frequency of full GCs (the full GC count changes from ~30/100 to ~1/200).

@graalvmbot (Collaborator)

Hello tongbao.ztb, thanks for contributing a PR to our project!

We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address tongbao -(dot)- ztb -(at)- alibaba-inc -(dot)- com. You can sign it at that link.

If you think you've already signed it, please comment below and we'll check.

@graalvmbot (Collaborator)

tongbao.ztb has signed the Oracle Contributor Agreement (based on email address tongbao -(dot)- ztb -(at)- alibaba-inc -(dot)- com) so can contribute to this repository.

@christianwimmer left a comment

Thanks for your contribution! Tenuring in the young generation is an important feature that was so far missing.

I added several more detailed comments, but here is a high-level summary:

  • The age information should be stored in each HeapChunk.Header for fast access (and it should be consistently called age); a rough illustration follows after this list.
  • Objects in aligned and unaligned chunks should be handled the same way, i.e., large arrays should use survivor spaces too.
  • The decision of how many survivor spaces to use should be factored out into a policy class, and the default policy should be based on historic promotion rates.
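To make the first point concrete, here is a minimal sketch of keeping an age counter directly in the chunk header. The names (ChunkHeaderSketch, PromotionSketch, MAX_SURVIVOR_AGE) are hypothetical and are not the actual HeapChunk.Header or SubstrateVM API:

```java
// Hypothetical sketch: keep the age directly in the chunk header so the collector
// can read and update it without touching the objects stored in the chunk.
final class ChunkHeaderSketch {
    private int age; // number of collections the chunk's objects have survived

    int getAge() {
        return age;
    }

    void incrementAge() {
        age++;
    }
}

final class PromotionSketch {
    static final int MAX_SURVIVOR_AGE = 4; // illustrative limit; would come from an option/policy

    // Decide where a surviving chunk goes, based solely on the age stored in its header.
    static String promoteTarget(ChunkHeaderSketch header) {
        header.incrementAge();
        return header.getAge() <= MAX_SURVIVOR_AGE
                ? "survivor space " + header.getAge()
                : "old generation";
    }
}
```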


This seems to be quite an ineffective policy for deciding whether to keep an object in the young generation. When the threshold is crossed while a collection is running, all remaining objects (even the ones from eden) suddenly end up in the old generation.

I think we should instead collect some statistics about survivor space sizes (by age) and promotion rates, and at the beginning of a collection estimate how many survivor spaces should be used. That works because the chunked heap model does not impose strict size limits on the survivor spaces. If the estimate was too optimistic, the young generation is just a bit larger than expected after a collection, but no other harm is done. The next young generation collection will then use a smaller limit for survivor spaces, i.e., promote earlier to the old generation.

And if the promotion statistics say that the promotion rate from space n to n + 1 is very high (say > 80% or so), then it is also not beneficial to use space n + 1 anymore and instead objects should be promoted to the old generation.

All of that can and should be abstracted in a separate policy class so that it is easy to experiment with different promotion strategies.
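A minimal sketch of such a statistics-driven policy, with made-up names and simplified bookkeeping rather than the real SubstrateVM heap policy classes:

```java
// Hypothetical tenuring policy driven by per-space promotion statistics.
final class TenuringPolicySketch {
    private static final int MAX_SURVIVOR_SPACES = 4;
    private static final double PROMOTION_RATE_CUTOFF = 0.8; // "> 80% or so"

    // rateFromSpace[n] = fraction of bytes in survivor space n that moved on to
    // space n + 1 during the previous collection (spaces are 1-based, index 0 unused).
    private final double[] rateFromSpace = new double[MAX_SURVIVOR_SPACES + 1];

    // Estimate, at the beginning of a collection, how many survivor spaces to use.
    int survivorSpacesToUse() {
        for (int n = 1; n < MAX_SURVIVOR_SPACES; n++) {
            // If almost everything in space n survives into space n + 1 anyway,
            // stop at space n and promote to the old generation instead.
            if (rateFromSpace[n] > PROMOTION_RATE_CUTOFF) {
                return n;
            }
        }
        return MAX_SURVIVOR_SPACES;
    }

    // Record the measured sizes at the end of a collection.
    void recordCollection(long[] bytesInSpace, long[] bytesMovedToNextSpace) {
        for (int n = 1; n < MAX_SURVIVOR_SPACES; n++) {
            rateFromSpace[n] = bytesInSpace[n] == 0
                    ? 0.0
                    : (double) bytesMovedToNextSpace[n] / bytesInSpace[n];
        }
    }
}
```

If an estimate turns out to be too optimistic, the chunked heap model tolerates it: the young generation is just somewhat larger after that collection, and the next collection uses a smaller estimate.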

@tbzhang (Contributor, Author) commented Dec 3, 2019

Thanks for your contribution! Tenuring in the young generation is an important feature that was so far missing.

I added several more detailed comments, but here is a high-level summary:

  • The age information should be stored in each HeapChunk.Header for fast access (and it should be consistently called age).
  • Objects in aligned and unaligned chunks should be handled the same way, i.e., large arrays should use survivor spaces too.
  • The decision of how many survivor spaces to use should be factored out into a policy class, and the default policy should be based on historic promotion rates.

Hi Christian, thank you for the suggestions!

It makes sense to me that the age information should be stored in HeapChunk.Header; I will do that and update this PR soon.

I didn't see many unaligned chunks in our scenarios, which is why unaligned chunks are not supported in the original patch. But indeed, adding age to unaligned chunks will make this a more complete solution; I will do that.

Regarding the survivor policy: although the current implementation is a little rigid, we still see significant improvements after applying it to our workloads. How about optimizing the policy in another pull request? Then I can keep this patch more focused and easier to review.

@christianwimmer

Regarding the survivor policy: although the current implementation is a little rigid, we still see significant improvements after applying it to our workloads. How about optimizing the policy in another pull request? Then I can keep this patch more focused and easier to review.

I'm fine with having a single simple policy in the initial PR and refining it later. But then the first policy should really be the simplest one: always use all tenuring spaces (with a sensible default of 2 or so) without any size checks. That can then serve as a baseline for all potentially better policies.
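A sketch of this baseline, again with hypothetical names rather than the actual SubstrateVM classes: surviving objects age through all configured survivor spaces before being promoted, with no size checks at all.

```java
// Hypothetical baseline policy: always use all tenuring spaces, no size checks.
final class AlwaysUseAllSurvivorSpacesSketch {
    private final int maxSurvivorSpaces;

    AlwaysUseAllSurvivorSpacesSketch(int maxSurvivorSpaces) {
        this.maxSurvivorSpaces = maxSurvivorSpaces; // e.g. a sensible default of 2
    }

    // Objects move to the next survivor space until they have aged through all of
    // them, and only then to the old generation.
    boolean promoteToOldGeneration(int age) {
        return age >= maxSurvivorSpaces;
    }

    int survivorSpacesToUse() {
        return maxSurvivorSpaces; // never fewer, regardless of current space sizes
    }
}
```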

@tbzhang (Contributor, Author) commented Dec 4, 2019

Regarding the survivor policy: although the current implementation is a little rigid, we still see significant improvements after applying it to our workloads. How about optimizing the policy in another pull request? Then I can keep this patch more focused and easier to review.

I'm fine with having a single simple policy in the initial PR and refining it later. But then the first policy should really be the simplest one: always use all tenuring spaces (with a sensible default of 2 or so) without any size checks. That can then serve as a baseline for all potentially better policies.

I agree with using that simplest policy as the baseline; I will do that and then update this PR.

@christianwimmer

Thanks for the changes! @christianhaeubl please also have a detailed look at the PR.

@christianhaeubl (Member) left a comment

Thanks for your contribution! I added a couple of comments. Most of them are minor requests for cleanups.

However, I think that there is one larger issue remaining: the current implementation does not limit the young generation size (HeapPolicy.getMaximumYoungGenerationSize()) correctly. Instead, the size limit for the young generation rather acts as a size limit for eden, see HeapPolicy.Sometimes.maybeCauseCollection().

@graalvmbot (Collaborator)

Hello tbzhang, thanks for contributing a PR to our project!

We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address best -(dot)- tbzhang -(at)- gmail -(dot)- com. You can sign it at that link.

If you think you've already signed it, please comment below and we'll check.

@tbzhang (Contributor, Author) commented Jan 16, 2020

Thanks for your contribution! I added a couple of comments. Most of them are minor requests for cleanups.

However, I think that there is one larger issue remaining: the current implementation does not limit the young generation size (HeapPolicy.getMaximumYoungGenerationSize()) correctly. Instead, the size limit for the young generation rather acts as a size limit for eden, see HeapPolicy.Sometimes.maybeCauseCollection().

Yes, the survivor space usage should be taken into account in HeapPolicy.Sometimes.maybeCauseCollection().
So I changed maybeCauseCollection and kept getMaximumYoungGenerationSize as in the original version.
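Roughly, the fix means that the collection trigger compares the combined eden and survivor usage against the unchanged young generation limit, instead of eden alone. A sketch with illustrative names (not the actual HeapPolicy code):

```java
// Illustrative sketch of the corrected trigger: count survivor usage, not just eden.
final class YoungGenerationCheckSketch {
    private final long maximumYoungGenerationSize; // the limit itself stays unchanged

    YoungGenerationCheckSketch(long maximumYoungGenerationSize) {
        this.maximumYoungGenerationSize = maximumYoungGenerationSize;
    }

    // Before the fix, only edenUsedBytes was compared against the limit, so the
    // limit effectively applied to eden rather than to the whole young generation.
    boolean shouldCauseCollection(long edenUsedBytes, long survivorUsedBytes) {
        long youngUsedBytes = edenUsedBytes + survivorUsedBytes;
        return youngUsedBytes >= maximumYoungGenerationSize;
    }
}
```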

@christianhaeubl (Member) left a comment

Thanks for doing the changes. I reviewed the whole PR one more time.

@tbzhang force-pushed the master branch 2 times, most recently from d19857c to 8fb5658 on February 3, 2020 08:48
@tbzhang (Contributor, Author) commented Feb 11, 2020

@christianhaeubl Thank you for your review comments; I look forward to your new feedback. Meanwhile, I wonder if I need to fix the "travis-ci" problem to make it pass.

@christianhaeubl (Member)

Thanks for doing the changes. I added one more minor comment. Besides that, the changes look good now. I will run our internal checks and let you know if any further changes are needed.

Regarding "travis-ci": you should fix the code formatting issue that was found by one of the failed Travis jobs.

Regarding the performance results that you mentioned in your very first comment: did the results change now that the size of the young generation is limited correctly?

@tbzhang (Contributor, Author) commented Feb 13, 2020

Thanks for doing the changes. I added one more minor comment. Besides that, the changes look good now. I will run our internal checks and let you know if any further changes are needed.

Regarding "travis-ci": you should fix the code formatting issue that was found by one of the failed Travis jobs.

Regarding the performance results that you mentioned in your very first comment: did the results change now that the size of the young generation is limited correctly?

Thanks for your reply. I addressed the comment you mentioned.

I also fixed the "travis-ci" code formatting issue and will keep an eye on the Travis jobs.

I tested the build at each update of this PR and did not see a significant regression in the frequency of full GCs. Previously, the survivor space usage was not counted toward the young generation size. After that fix, the frequency of incremental GCs increases, but in our tests the survivor spaces did not take up much of the young generation and could be collected in the next incremental GC. So this fix did not change the performance results much.

@christianhaeubl (Member)

Thanks. I started the process of integrating your PR into master. Our internal checks revealed a couple more issues but I decided to directly fix those in separate commits on top of your PR. If everything goes as expected, your PR will get merged into master in the next couple of days.

@adinn (Collaborator) commented Feb 13, 2020

Thanks. I started the process of integrating your PR into master

Hurrah!

@tbzhang (Contributor, Author) commented Feb 14, 2020

Thanks. I started the process of integrating your PR into master. Our internal checks revealed a couple more issues but I decided to directly fix those in separate commits on top of your PR. If everything goes as expected, your PR will get merged into master in the next couple of days.

That's great. Thank you!

@graalvmbot merged commit 48755ed into oracle:master on Feb 15, 2020
@fniephaus (Member)

Fantastic, thanks for the contribution! 🚀

smarr added a commit to smarr/SOMns that referenced this pull request Feb 28, 2020
Commit right before merging the SVM GC

See oracle/graal#1912

Signed-off-by: Stefan Marr <git@stefan-marr.de>
@christianhaeubl (Member)

This PR caused quite a significant slow-down in many of our internal benchmarks as it does not work particularly well with the current default heap policy that tries to balance the time spent in young vs. old GCs: survivor spaces will increase the time spent in young GCs, which then also increases the number of full GCs.

So, as part of a change that improves overall GC performance significantly, I also changed the default from -H:MaxSurvivorSpaces=4 to -H:MaxSurvivorSpaces=0. This effectively disables survivor spaces by default until we have a better heap policy.
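For anyone who wants to experiment with survivor spaces despite the new default, the option can be passed when building the image; the invocation below is only an illustrative example (classpath and main class are made up):

```
# Re-enable survivor spaces (4 was the previous default) at image build time.
native-image -H:MaxSurvivorSpaces=4 -cp app.jar com.example.Main
```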
