
Performance regression when enumerating objects (for shallow cloning) v12.15.0 to v12.16.1 #32049

Closed
dominykas opened this issue Mar 2, 2020 · 18 comments
Labels
performance: Issues and PRs related to the performance of Node.js.

Comments

@dominykas
Member

dominykas commented Mar 2, 2020

  • Version: 12.16.1
  • Platform: Linux; OSX
  • Subsystem: V8

What steps will reproduce the bug?

The following code became slightly slower going from v12.15.0 to v12.16.1 (the regression likely appeared in v12.16.0).

'use strict';

const NUM_METHODS = 250;
const ITERATIONS = 10000;

const obj = {};

for (let i = 0; i < NUM_METHODS; ++i) {
    obj[`item${i}`] = {};
}

console.time();

for (let i = 0; i < ITERATIONS; ++i) {
    const res = Object.assign({}, obj); // { ...obj } performs worse by a similar margin
}

console.timeEnd();
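For comparison, here is a variant of the same benchmark that also times the spread and a manual for...in clone (a sketch; the timing helper and labels are my own, not part of the original repro):

'use strict';

const obj = {};
for (let i = 0; i < 250; ++i) obj[`item${i}`] = {};

// hypothetical helper to time one cloning strategy over many iterations
const time = (label, fn) => {
    console.time(label);
    for (let i = 0; i < 10000; ++i) fn();
    console.timeEnd(label);
};

time('Object.assign', () => Object.assign({}, obj));
time('spread', () => ({ ...obj }));
time('for-in', () => {
    const res = {};
    for (const key in obj) res[key] = obj[key];
    return res;
});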

Additional information

I'm still not 100% sure this accounts for all of the slowdown I'm seeing when looping through object keys, and I'm working on a further reduced test case, but the results here are very consistent.

Adding more items into the object makes things slower, seemingly linearly.

Note that this is quite similar to #31961, however the fix on master in 3d894d0 resolves #31961, but not this issue.

@mmarchini added the performance label (Issues and PRs related to the performance of Node.js) on Mar 2, 2020
@mmarchini
Contributor

I ran some tests and found about 5-7% overhead on v12.16.1 (compared to v12.15.0). Is that consistent with your results? If so, this is a small regression and it would be hard to perceive in a real-world application (in comparison, #31961 makes the operations hundreds of times slower, which is much easier to perceive in an application).

Have you noticed a regression on an application after upgrading from v12.15.0 to v12.16.1? If so, can you share how much overhead you noticed? It might be something else (not covered by this test case).

@dominykas
Member Author

dominykas commented Mar 3, 2020

Have you noticed a regression on an application after upgrading from v12.15.0 to v12.16.1? If so, can you share how much overhead you noticed? It might be something else (not covered by this test case).

Yes, we have noticed a slowdown in an application, which I have pinpointed to looping through object keys for shallow cloning. When I run one of the unit tests, it slows down from ~2100ms to ~2800ms. So the difference is really perceptible, and in places it seems like orders of magnitude.

That said, the test case I produced here is really simple, and the result I'm seeing can be influenced by a multitude of other factors which I'm still trying to work out. So yes, there could definitely be something else at play, but it is also definitely related to shallow cloning, as changing that to a plain assignment makes the issue go away.

I'll be working to build up a better reduced case...

@mmarchini
Contributor

the unit test has a slowdown from ~2100ms to ~2800ms

That's a 33% slowdown, so even if you're affected by this issue, it's unlikely to be causing all the overhead.

If you can provide a test case closer to your unit tests (without sensitive code), we can help pinpoint the issue. Or maybe a CPU profile of your unit tests (either with --cpu-prof, --prof or Linux perf).
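For reference, a CPU profile can also be captured programmatically with the built-in inspector module; a minimal sketch (the output filename is arbitrary):

'use strict';

const fs = require('fs');
const inspector = require('inspector');

const session = new inspector.Session();
session.connect();

session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
        // ... run the workload to be profiled here ...

        session.post('Profiler.stop', (err, { profile }) => {
            if (!err) {
                // the resulting file can be loaded into Chrome DevTools
                fs.writeFileSync('./workload.cpuprofile', JSON.stringify(profile));
            }
            session.disconnect();
        });
    });
});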

@dominykas
Member Author

dominykas commented Mar 5, 2020

That's a 33% slowdown, so even if you're affected by this issue, it's unlikely to be causing all the overhead.

Not exactly. The unit test in question (call it an integration test if you will; it exercises an endpoint and goes through the whole app) does a lot more than just the shallow cloning, and there are definitely slowdowns in other parts of it.

However, there is one method, which does shallow cloning of two objects (each with 100-200 keys). That method gets called ~1000 times. The grand total (sum of durations based on process.hrtime()) for the time spent just inside that method is:

  • 12.15.0: 35ms
  • 12.16.1: 170ms

(For comparison: if I remove the shallow cloning, the method in both cases takes 2-3ms total)

So that's a 5x slowdown just there. I think the rest of the slowdown is fairly spread out across the rest of the system as it isn't obvious from the flame charts, but I'll have to investigate that too at some point.
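(For context, the per-call durations are accumulated roughly like this; a simplified sketch, where cloneBoth is a hypothetical stand-in for the real method:)

let totalNs = 0n;

// hypothetical stand-in for the real method that shallow-clones two objects
function cloneBoth(a, b) {
    const start = process.hrtime.bigint();
    const result = { ...a, ...b };
    totalNs += process.hrtime.bigint() - start;
    return result;
}

// after the test run:
// console.log(`time spent cloning: ${Number(totalNs) / 1e6} ms`);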

If you can provide a test case closer to your unit tests

That's the tricky part, right? :) I can't figure out what is special about the objects being cloned, because the reduced case does not exhibit the same behavior. There is no deopt in the method, nothing to grab onto... It might also be a consequence of some other conditions? Possibly GC kicking in aggressively or something like that?

Anyways, working on this...

Or maybe a CPU profile of your unit tests

I'm definitely not able to share the CPU profile - any tips on what I should be looking for inside? That's the same data as you can see in dev tools or clinic flame, right?

@dominykas
Member Author

dominykas commented Mar 5, 2020

OK, so --trace_gc does log 3x more messages on 12.16.1 and memoryUsage() is also higher:

12.15.0:

{
  rss: 188 964 864,
  heapTotal: 133 545 984,
  heapUsed: 111 016 104,
  external: 1 729 661
}

12.16.1:

{
  rss: 271 081 472,
  heapTotal: 225 595 392,
  heapUsed: 200 679 632,
  external: 1 771 952
}

If I skip the shallow cloning and return the same instance, I get roughly the same number of GC messages logged, and the memory usage is a lot closer:

12.15.0:

{
  rss: 188 424 192,
  heapTotal: 131 973 120,
  heapUsed: 107 757 448,
  external: 1 737 081
}

12.16.1:

{
  rss: 189 775 872,
  heapTotal: 144 068 608,
  heapUsed: 118 096 208,
  external: 1 780 389
}

I think the script in the top post also exhibits higher memory usage. Not sure if any of this is related, but I figured it's worth testing/sharing...


I also started looking at various V8 options I could play with. I've no idea what --trace-for-in-enumerate does, but it produces 2x more logs on 12.16.1 than on 12.15.0.
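A minimal way to capture comparable numbers (not my exact harness) is to log process.memoryUsage() at the end of a run under --trace_gc:

'use strict';

const obj = {};
for (let i = 0; i < 250; ++i) obj[`item${i}`] = {};

for (let i = 0; i < 10000; ++i) {
    const res = Object.assign({}, obj);
}

// run with `node --trace_gc script.js` and compare the GC output
// and the heap numbers between 12.15.0 and 12.16.1
console.log(process.memoryUsage());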

@mmarchini
Contributor

I wonder if this is a memory leak. You could take a heap snapshot on each version and compare using Chrome DevTools to see which objects are being retained.
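If attaching DevTools is awkward, v8.writeHeapSnapshot() can produce a snapshot file directly from the process (a sketch; call it at a comparable point in both versions and diff the two files in the Memory tab):

const v8 = require('v8');

// ... run the workload ...

// writes a .heapsnapshot file to the current directory and returns its name
const snapshotPath = v8.writeHeapSnapshot();
console.log(`heap snapshot written to ${snapshotPath}`);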

Also, are you able to test on v13.10.1? It would be good to double-check whether the issue is already fixed there.

@dominykas
Member Author

v13.10.1 still has the problem, and so does latest master.

Next week I'll try to work on a better case to reproduce this, but I've spotted a slowdown (~3x) in other areas where we use various methods of cloning. The one place I did poke at also shows increased time spent in GC.

I have not yet checked the memory snapshot diffs.

I have however tried a couple of experiments:

  • There are some awaits in our unit test; removing those makes the memory usage more stable (expected, I suppose), so @mcollina suggested it could have to do with objects getting moved between new/old space (if I got it right) due to something pushing the object lifetime over some threshold.
  • So I tried deliberately slowing down the shallow clone operation (by running it 50 times; see the sketch after this list). Doing that makes 12.15.0 memory usage and execution time behave the same as 12.16.1, i.e. I can make 12.15.0 be slow (whereas 12.16.1 is slow even without that).
  • I tried running things with --max-semi-space-size=1. This slows down 12.16.1 (and 13.10.1) significantly, but does not affect 12.15.0.
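A rough sketch of the repeated-clone experiment from the second bullet (the real code wraps the application's own method, which isn't shown here):

// artificially repeat the shallow clone to inflate short-lived allocations
// and see whether 12.15.0 starts behaving like 12.16.1
function cloneSlowly(src) {
    let res;
    for (let i = 0; i < 50; ++i) {
        res = Object.assign({}, src);
    }
    return res;
}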

So it feels like this has less to do with GC and more with how/where the objects get allocated? I'm sorry if I'm not explaining this well; V8 memory management is totally not my area of competence :)

I'm also not really able to repro much of that with the reduced case I have at the top, so I'll keep on working on that next week.

@mcollina
Member

mcollina commented Mar 6, 2020

I've tested the snippet above on the following versions:

  • 12.15.0: 603ms
  • 12.16.1: 664ms
  • 13.10.1: 639ms
  • master: 680ms

The trend is not really helping.


I wonder if this is a memory leak. You could take a heap snapshot on each version and compare using Chrome DevTools to see which objects are being retained.

We looked into that. However, forcing gc() actually frees up the memory.

@mmarchini
Contributor

Note that 13.10.1 and master are using the same V8 version, which shows the results are unstable (consistent with what I've seen so far), but there's a small overhead. @mcollina can you try the latest node-v8 build as well?

Premature promotion to old space makes sense. If you haven't already, you can use v8.getHeapSpaceStatistics or --trace-gc-verbose to monitor how each space is growing over time. --trace-gc-verbose also shows the total time the program spent on GC, which can help understand if the difference we're seeing is coming from GC or something else.
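A minimal way to sample that from inside the process (a sketch; the interval and the spaces to log are up to you):

const v8 = require('v8');

// periodically print how much of each relevant heap space is in use
setInterval(() => {
    for (const space of v8.getHeapSpaceStatistics()) {
        if (['new_space', 'old_space', 'large_object_space'].includes(space.space_name)) {
            console.log(space.space_name,
                        `${(space.space_used_size / 1024 / 1024).toFixed(1)} MB used`);
        }
    }
}, 1000).unref();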

@mcollina
Member

mcollina commented Mar 6, 2020

@mmarchini in node-v8 it gets even worse: 728.514ms (same dedicated server).

This is the output from --trace-gc-verbose (one round; they are all more or less the same):

[11337:0x56129b20e3e0] Fast promotion mode: false survival rate: 0%
[11337:0x56129b20e3e0]      785 ms: Scavenge 6.0 (10.7) -> 2.1 (10.7) MB, 0.1 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[11337:0x56129b20e3e0] Memory allocator,       used:  10960 KB, available: 4232496 KB
[11337:0x56129b20e3e0] Read-only space,        used:    115 KB, available:      0 KB, committed:    116 KB
[11337:0x56129b20e3e0] New space,              used:     44 KB, available:   4047 KB, committed:   8192 KB
[11337:0x56129b20e3e0] New large object space, used:      0 KB, available:   4091 KB, committed:      0 KB
[11337:0x56129b20e3e0] Old space,              used:   1649 KB, available:    170 KB, committed:   1928 KB
[11337:0x56129b20e3e0] Code space,             used:     64 KB, available:      0 KB, committed:    332 KB
[11337:0x56129b20e3e0] Map space,              used:    221 KB, available:      0 KB, committed:    260 KB
[11337:0x56129b20e3e0] Large object space,     used:    128 KB, available:      0 KB, committed:    132 KB
[11337:0x56129b20e3e0] Code large object space,     used:      0 KB, available:      0 KB, committed:      0 KB
[11337:0x56129b20e3e0] All spaces,             used:   2222 KB, available: 4240805 KB, committed:  10960 KB
[11337:0x56129b20e3e0] Unmapper buffering 0 chunks of committed:      0 KB
[11337:0x56129b20e3e0] External memory reported:      9 KB
[11337:0x56129b20e3e0] Backing store memory:    719 KB
[11337:0x56129b20e3e0] External memory global 0 KB
[11337:0x56129b20e3e0] Total time spent in GC  : 12.4 ms

The only conclusion I could draw from this is that the object is allocated in the Large Object Space.
I can upload the full thing if you are interested, but it's very easy to generate from the example.

@mcollina
Member

mcollina commented Mar 6, 2020

Ok, I think I've identified part of the problem. In Node 12.15.0 the object did not go into large object space. In Node 12.16.1 and beyond, it moves into large object space.

@gsathya
Member

gsathya commented Mar 10, 2020

cc @camillobruni

@dominykas
Member Author

dominykas commented Mar 18, 2020

Found another repo where we can see a ~10% total slowdown in the unit test suite when changing from 12.15.0 to 12.16.1. Running clinic flame on 12.16.1 shows the following hottest frames (nowhere to be seen on 12.15.0):

Once again - looping through an object to construct a new object...

@camillobruni
Contributor

Tracked in V8 as well: https://bugs.chromium.org/p/v8/issues/detail?id=10339
From what we've measured this is not a GC regression but rather a V8 runtime issue.

@mcollina
Member

Closed, this seems fixed.

@dominykas
Member Author

Does this need any action to get into v14.x?

@mcollina
Member

mcollina commented Jun 19, 2020

I thought it was already shipped in v14; normally PRs are backported after 2 weeks into a current release.

@dominykas
Member Author

dominykas commented Jun 19, 2020

My bad, it seems to actually be there: 748720e, not sure how I missed it. Thanks.
