This repository has been archived by the owner on Aug 11, 2022. It is now read-only.

investigating npm 3 perf #10380

Closed
samccone opened this issue Nov 13, 2015 · 74 comments

Comments

@samccone

npm3 perf 🔍

TL;DR: We've got a network 📉 utilization regression that's at the root of the npm3 install performance slowdown. Here's the bird's 🐦 eye view:

How'd we get here? Read on…


After using npm3 for a while now, it is clear that there has been a slowdown in the time to install packages. Although a level of slowdown was to be expected given the new complexity around the flat tree and deduping dependencies, a 2 - 3x (and sometimes worse) slowdown across installs has become somewhat painful across the node community.

Instead of just yelling at the world, I decided to hop in and do some profiling work to better understand what is happening under the hood, observe (via profiling) the cause, and just maybe find some ideas to make it fast again. (yay OSS ✨)

Profiling NPM's JS Activity

To start answering the question of why something is slow, the first thing I wanted to do was gather some numbers and metrics. As anyone who has ever tried to profile in node.js knows, a basic google search immediately throws you into a dystopian world of outdated stackoverflow answers, gists that redirect to stackoverflow questions that end in WIP, assorted comments buried deep within threads on github, and enterprise debugging tools :( ….

However, if you are persistent you will eventually stumble upon two magical and useful things. First, the ability to get v8 logs out of node by passing the --prof flag, which gives us a dandy v8 trace you can load in chrome://tracing… In our case this is not so useful.

The second thing that you will find is this repo node-inspector/v8-profiler which gives us exactly what we want… a CPU profile of our node process. A small patch then hooks in the instrumentation: npm 3 patch: (samccone@64cc5ca) npm 2 patch (kdzwinel@3978acc)

Profiling NPM's Network Activity

There was one major piece of the perf story that I was missing and that I knew was really important to get, on top of just the cpu profile, and that was the network activity. I had the hypothesis that one of the reasons why the new version of npm got slower was not due to a CPU bound issue #8826 (comment) but rather a change in how the requests to the npm registry were being made. To get a HAR dump of the process I got in touch with @kdzwinel who has been working on just this thing https://github.com/kdzwinel/betwixt . He was able to send over raw HAR dumps from both npm 2 and npm 3 🎉

For my test case I chose to use the package yo https://github.com/yeoman/yo as the example. For both npm2 and npm3 I made sure to install the package with no local npm cache, and also in forced http mode (due to a limitation around the current HAR dumping)

After gathering both of the results I was left with CPU profiles and the HAR dumps for each version of npm installing the same project under the same kind of constraints. It was now time to take a look at the numbers inside of chrome devtools profiler and a handy HAR viewer to try and prove my initial hypothesis about network perf vs cpu bound perf.

First let’s take a look at the CPU profile to see where time is being spent, and see if we can visually see a difference between what is going on in npm2 and npm3.

Results: NPM2 JS Activity

screen shot 2015-11-13 at 8 40 35 am

Results: NPM3 JS Activity

screen shot 2015-11-13 at 8 40 47 am

Well that is interesting, npm3 seems to be a much more sparse flame chart when installing the exact same package (yo), with large gaps in between brief periods of javascript execution. When we zoom in closer to get a look at exactly what is happening between the blank spots we get our first lead

screen shot 2015-11-13 at 8 56 44 am

It looks like each of the spikes in-between the blank space on the flame chart is when npm is handling the result from the registry. This is repeated over and over again until all of the packages have been resolved….

Now this pattern of downloading things and moving on seems totally 100% reasonable, but at this point I am still not sure why there is such a big difference between npm2 and npm3. Why are there gaps in the timeline of npm3 and not npm2? What exactly is different?

At this point, I am pretty sure the answer to these questions is all within the HAR dump, so let's take a look.

Results: NPM2 Network Activity

screen shot 2015-11-13 at 9 31 39 am

Results: NPM3 Network Activity

screen shot 2015-11-13 at 9 31 28 am

Right off the bat you notice the difference in how the network requests are being made. Instead of groups of parallel requests as in npm2, npm3 seems to be doing its requests in a serial fashion (thus taking around 4x as long), where it waits for previous requests to complete before continuing the chain of requests. I further confirmed this by zooming into the waterfall of requests in npm3, where you can pretty clearly see this exact stair-stepping of dependent requests.

On the initial request for yo everything lines up: we see a batch of requests for all of the root level dependency metadata. After this point, though, npm3 and npm2 diverge significantly.

screenshot from 2015-11-13 09 29 48

Further down in the chain of requests we start to see this pattern, where we make a request for a package's metadata, then make a request for the package, and then repeat the process for all of the package's dependencies. This approach of walking through the dependencies really starts to add up, because it looks like every batch of requests depends on the previous batch finishing. Thus, if a single request takes some time (like rx in this case), it pushes back the start time of many other packages' requests.

screenshot from 2015-11-13 09 30 41

The end result here is that we end up with a very sparse chain of dependent network requests, which for packages with many dependencies, such as yo, is very slow for the end user.
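To make the cost of that stair-stepping concrete, here is a minimal sketch with made-up numbers. The tree below is not yo's real dependency tree, and the latencies are illustrative assumptions rather than values read from the HAR dumps. It compares a schedule where each level of the tree waits for the slowest request of the previous level against one where each request starts as soon as its own parent has resolved.

```javascript
// Hypothetical dependency tree and per-request latencies in ms (illustrative only).
const tree = {
  yo: ['rx', 'chalk'],
  rx: [],
  chalk: ['ansi-styles'],
  'ansi-styles': ['color-convert'],
  'color-convert': [],
};
const latency = { yo: 100, rx: 900, chalk: 100, 'ansi-styles': 100, 'color-convert': 100 };

// Batched: a whole level must finish before the next level starts,
// so every level costs as much as its slowest request.
function batchedFinishTime(root) {
  let level = [root];
  let clock = 0;
  while (level.length > 0) {
    clock += Math.max(...level.map((name) => latency[name]));
    level = level.flatMap((name) => tree[name]);
  }
  return clock;
}

// Eager: a request starts the moment its own parent has resolved,
// so a slow sibling does not delay unrelated branches.
function eagerFinishTime(name, startedAt = 0) {
  const doneAt = startedAt + latency[name];
  return Math.max(doneAt, ...tree[name].map((dep) => eagerFinishTime(dep, doneAt)));
}
```

With these assumed numbers, `batchedFinishTime('yo')` comes out at 1200 ms and `eagerFinishTime('yo')` at 1000 ms: the slow rx request delays the chalk branch only in the batched schedule.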


Closing thoughts and ideas.. 💭

npm3 under the hood is operating in a totally new way, and while it brings awesome new features such as a deduped tree and a flat tree structure, there is a pretty significant cost 💰 when it comes to install time. At first glance, it seems like investigating ways to prebundle on the server and/or batch requests to the registry would help: there is a fixed cost per network request that npm makes, and the sheer volume of said requests adds up quickly. Figuring out a way to cut this number would boost performance significantly 🚀 .

The other big takeaway here was how hard it was to get this kind of perf metrics out of node. One day this will be fixed, when nodejs/node#2546 lands…

Hopefully this was somewhat helpful and can help others who know the code way better than I look into the causes of these perf hits 🐎 📈 .


special thanks to @paulirish and @kdzwinel 💖

@samccone samccone changed the title investigating the npm 3 perf investigating npm 3 perf Nov 13, 2015
@addyosmani

cc @othiym23. Thanks for digging into the performance regressions in such detail, @samccone. Super appreciated.

@paulirish
Contributor

While the above network details are from forced HTTP, I was able to get one chart of insight from npm3 on HTTPS (the default).

npm@3 install -g gulp with Microsoft Visual Round Trip Analyzer

image

Basically indicates we have ~122KB/sec average throughput over a 22s install time. And the lulls in network utilization match the above network traces.

Great writeup, @samccone !

@glenjamin
Contributor

This is some excellent analysis! It looks very similar to profiling results I've seen from a previous project. I'd built a crawler in Node which had to crawl a deeply nested tree structure somewhat similarly to npm. The code itself is unfortunately proprietary, but perhaps some of the approaches I took would be relevant here.

From the looks of the npm3 graph, there's a starvation of work while waiting for network responses. I'll have to poke around in the code to confirm, but to me this implies that npm is doing a depth-first traversal rather than a breadth-first one.

I think it should be possible to model the downloading/tree-walking process as a work queue with a certain amount of concurrency - I used https://github.com/glenjamin/async-pool-cue to achieve this previously.

The rough sketch looks something like this:

  1. Create a queue for outgoing requests with a limited concurrency (say, 10 reqs at once)
  2. Add each top-level dependency to this queue with priority 100
  3. When a response is received for each of these, add to queue with priority 99
  4. Repeat ad nauseam; new items receive a priority of max - depth

The reason I believe this to be effective is that it prioritizes requests which are more likely to produce more requests, therefore keeping the queue topped up and avoiding work droughts.
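As a rough illustration of the steps above, here is a minimal virtual-time simulation of such a queue. It does not use async-pool-cue or any real network code. The function name, option names and the tree format are all assumptions made for this sketch, and no requests are actually sent.

```javascript
// Virtual-time simulation of a priority work queue with bounded concurrency.
// `tree` maps a package name to its dependencies (hypothetical input format).
function simulateCrawl(tree, roots, options = {}) {
  const { concurrency = 10, maxPriority = 100, latencyOf = () => 100 } = options;
  const pending = [];   // waiting requests
  const inFlight = [];  // running requests, each with a virtual completion time
  const started = [];   // order in which requests were started
  const seen = new Set();
  let now = 0;
  let seq = 0;

  const enqueue = (name, depth) => {
    if (seen.has(name)) return; // fetch each package only once
    seen.add(name);
    pending.push({ name, depth, priority: maxPriority - depth, seq: seq++ });
  };

  roots.forEach((name) => enqueue(name, 0));

  while (pending.length > 0 || inFlight.length > 0) {
    // Fill free slots, highest priority first (FIFO among equal priorities).
    pending.sort((a, b) => b.priority - a.priority || a.seq - b.seq);
    while (inFlight.length < concurrency && pending.length > 0) {
      const job = pending.shift();
      started.push(job.name);
      inFlight.push({ ...job, doneAt: now + latencyOf(job.name) });
    }
    // Advance the clock to the next completion and enqueue its dependencies.
    inFlight.sort((a, b) => a.doneAt - b.doneAt || a.seq - b.seq);
    const finished = inFlight.shift();
    now = finished.doneAt;
    (tree[finished.name] || []).forEach((dep) => enqueue(dep, finished.depth + 1));
  }
  return { totalMs: now, started };
}
```

Because priority is `maxPriority - depth`, a shallow package that is discovered late (for example, a dependency of a slow top-level package) jumps ahead of deeper packages that are already waiting, which is the behaviour the sketch above is meant to show.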

It might also be beneficial to use https://www.npmjs.com/package/agentkeepalive on the "connection pool", and maintain a separate pool with a distinct concurrency limit for the tarball downloads.
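A minimal sketch of the two-pool idea, using Node's built-in `http.Agent` as a stand-in rather than agentkeepalive (whose API is not assumed here). The `keepAlive` option only exists in newer Node versions, the limits are placeholder values, and `agentFor` is a hypothetical helper.

```javascript
const http = require('http');

// Two separate connection pools, each with its own concurrency limit.
// The limits (10 and 4) are placeholders, not tuned recommendations.
const metadataAgent = new http.Agent({ keepAlive: true, maxSockets: 10 });
const tarballAgent = new http.Agent({ keepAlive: true, maxSockets: 4 });

// Pick the pool based on what is being fetched (hypothetical helper that
// assumes tarball URLs end in ".tgz").
function agentFor(url) {
  return url.endsWith('.tgz') ? tarballAgent : metadataAgent;
}
```

A request would then pass `agent: agentFor(url)` in its options, so that slow tarball downloads cannot occupy every socket that metadata lookups need.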

@kdzwinel

You can explore HAR files that Sam refers to here (give it a sec to load the timeline):

@othiym23
Contributor

Thank you for the thorough and detailed investigation, @samccone (and for the HTTPS followup, @paulirish – I'm glad to have so much of Google's help here). I agree that this information is still frustratingly hard to pull together, and I'm impressed by the determination that it took to get this far.

Background & context

One of the major goals @iarna and I (and @isaacs, who was the original impetus for the installer rewrite) had for npm@3 was to rework the installer to do a couple things:

  1. break the install process into a set of discrete phases
  2. have a way of capturing the "real" dependency tree, the "ideal" dependency tree, and the set of operations necessary to convert the one into the other
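To illustrate the second point, here is a deliberately simplified sketch of deriving a set of operations from a "real" and an "ideal" tree. It is not npm's actual algorithm or data model: both trees are flattened to plain `{ location: version }` maps, and the action names are assumptions made for this example.

```javascript
// Both trees are flattened to { 'path/under/node_modules': 'version' } maps.
// Returns the operations needed to turn realTree into idealTree.
function diffTrees(realTree, idealTree) {
  const actions = [];
  for (const [location, version] of Object.entries(idealTree)) {
    if (!(location in realTree)) {
      actions.push({ action: 'add', location, version });
    } else if (realTree[location] !== version) {
      actions.push({ action: 'update', location, from: realTree[location], to: version });
    }
  }
  for (const location of Object.keys(realTree)) {
    if (!(location in idealTree)) {
      actions.push({ action: 'remove', location, version: realTree[location] });
    }
  }
  return actions;
}
```

The useful property is that the whole plan exists as data before anything touches the disk, which makes the install easier to inspect and reason about.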

This is because it was very hard to follow the operation of the installer before, even if you were intimately familiar with the code (there are maybe a handful of people who can claim that familiarity, and I'm not sure I can count myself among them). Because much (if not most) of the functionality of the installer accreted organically over time, features like bundledDependencies, Git and local dependencies, npm dedupe, and npm shrinkwrap interacted in surprising (even confounding) ways, the process was often non-deterministic, and race conditions and edge cases were everywhere. So, the first purpose of npm@3 was to get ourselves to a place where the installer was behaving in a way we could understand, and to do so repeatably and reliably.

You'll note that "deduping by default" and "flattening the tree" aren't in the list above – @iarna noticed while she was doing exploratory work (a year ago – software is hard) that these would be easy features to add, and would greatly improve the usefulness of npm for both Windows and front-end developers. More significantly, they don't have a really major impact on performance – the bulk of the work is in resolving dependencies, and building up the two trees in a form such that they can be used to produce the final order of execution for the install. So the changes to performance characteristics seen here are not due to those changes, but the deeper redesign of the install process.

Everyone involved in this project knew that there was a certain amount of slowdown inherent in this work. There are two primary things that are inherently slower in npm@3 vs npm@2, they're related, and @samccone has pointed straight at one of them.

  1. When checking the status of already-installed packages in node_modules, npm@2 and earlier only looked at the top level of dependencies in the tree. Because the new install algorithm is realizing the real tree, it ends up looking at much more of the dependency tree. Unless you were a really avid user of npm dedupe, this is probably going to result in a significant (and more or less unavoidable, without complex workarounds) performance hit.
  2. Many of the operations that were previously done in a catch-as-catch-can order, where operations were triggered as soon as they were needed, are now serialized. This is a natural consequence of the multi-stage installer, and makes it much easier to understand what npm is doing. However, because there are a lot of possible ways for things to get into the dependency tree (which can have significant knock-on effects), this serialization ends up slowing things down. The team cares very deeply about maintaining backwards compatibility whenever possible, and we're probably going to be chasing down and fixing unintended behavior regressions well into next year. Frequently, this causes us to have to do more network requests in even more of a staggered order.

One last consideration to keep in mind is that we're starting to see significant problems with installs failing due to network issues. This is something that happens regardless of the npm version, with only loose correlation to the underlying Node version. It's particularly bad for users on Windows, users with less robust broadband connections or cheap DSL routers, and users of containerization software, and while we haven't isolated the cause with total certainty, right now the finger seems to point at Node and npm's propensity to open many, many network connections at once and hammer out tremendous numbers of requests in short order.

One of the consequences of the small module philosophy is that the amount of work being requested at the network level has grown tremendously over the last couple of years, and it makes the caching process much more brittle. Also, it magnifies the effects of whatever latency may exist between the installing process and whatever registries or git hosts npm may be talking to.

Where we are now

So, with the above in mind, and looking at @samccone's results, I'm left in a bit of a quandary, mostly because I don't see obvious or simple fixes. We do have a number of ideas we're mulling over, and none of them are the sort of thing that we can do as a quick patch (both the existence of and the patch to the deepClone issue were anomalous in this respect). That said, I think it would be helpful to lay them all out here, so that everyone has a better idea of what we have in mind, so y'all can provide feedback or alternative suggestions:

Minimize round trip latency to the registry

According to @seldo, request latency for simple metadata fetches to the registry averages ~500ms at present. This means that even if npm's cache is just doing a freshness check (i.e. it will end up being logged as a 304 and the cached metadata & tarball will be used for an install), there's a 500ms wait for each package involved in the install. @soldair (with some help from @chrisdickinson) is looking into this, and @seldo seems confident that we can cut this number in half. I think that will have a pretty significant impact.

Cache more metadata

One of the trickiest pieces of the install process is that packages need to be unpacked and their package.json manifests checked for bundledDependencies, so the installer can figure out if the bundled versions of the dependencies are up to date (so they can be added to the list of packages to have their metadata fetched). There is alllll kinds of weird stuff inside published packages, and npm ignores this reality at the peril of producing strange, invalid installed trees. We can't handle this at present without fetching the package tarball and examining its contents. I hope it's clear how this imposes a certain amount of serialization on the process of realizing the dependency tree.

Another thing it needs to do is scan the package tarballs for npm-shrinkwrap.json files, because those, if present, will override any dependencies for that package. I could probably write a whole other response as long as this one about the complications presented by shrinkwrap files and how the community uses them, but for the purposes of this discussion, all I need to say is that this has to be pulled out of the tarball in sequence now, which is part of what serializes the process of metadata fetching.

Were we to add a serialized dump of what the bundledDependencies are, as well as the versions present in the cached tarball; and cache any extracted npm-shrinkwrap.json files, it would reduce the number of network requests we'd need to make for tarballs in the metadata fetching phase, as well as the number of tarballs we'd need to unpack. This would only affect users with warm npm caches, though, and so it wouldn't necessarily affect initial installs.
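One way to picture such a cached record is the sketch below. The record shape and the `buildCacheRecord` helper are hypothetical; npm's cache does not necessarily store anything like this.

```javascript
// Build a hypothetical per-tarball cache record from data that is only
// available after the tarball has been unpacked once:
//   manifest         - the package's own package.json
//   bundledManifests - { depName: package.json of the bundled copy }
//   shrinkwrap       - parsed npm-shrinkwrap.json, if the tarball had one
function buildCacheRecord(manifest, bundledManifests, shrinkwrap) {
  const bundled = {};
  for (const name of manifest.bundledDependencies || []) {
    // Record the version actually present inside the tarball, if known.
    bundled[name] = bundledManifests[name] ? bundledManifests[name].version : null;
  }
  return {
    name: manifest.name,
    version: manifest.version,
    bundled,
    shrinkwrap: shrinkwrap || null,
  };
}
```

With a record like this in a warm cache, the metadata phase could check bundled versions and shrinkwrap overrides without fetching or unpacking the tarball again.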

Push more of the tree-building work onto the registry

This one could have a very significant impact, but it's trickier than it looks (for reasons very similar to the complications mentioned above). We've long had hopes that SPDY or HTTP2 could, along with some reworking of the npm cache APIs, greatly reduce the number of round trips necessary to produce a finished install. That said, this implies a much more sophisticated API between the CLI and the registry than currently exists, and as such is going to take a while so that both the CLI and registry teams have the time and resources available to build this out.

Be smarter about how git is used

This is an important piece, and probably too large for this issue, but it's worth keeping in mind that the way that git-hosted dependencies are downloaded and cached is very different from packages hosted on a registry, and is largely done by the Git toolchain itself, over which npm has very little control.

Replace npm's download code

Although it may not look like it, a fair amount of attention has already been given to how npm downloads packages. The networking code is some of the most delicate and important in the CLI, because it has to work in an astonishing array of pretty adverse circumstances. It needs to work under Node all the way back to 0.8, it has to support a wide variety of proxies (and is still barely adequate in many Windows environments, given the lack of built-in NTLM or Kerberos support) and custom SSL configuration for same, deal with lots of fairly dicey corners of the internet (or, as @sebmck has recently noted, with the horrors of hotel or conference wifi). It needs to take advantage of HTTP Keep-Alive whenever possible. It needs to deal with connection resets and timeouts and other failures with some degree of resiliency.

Not to put too fine a point on it, Node's built-in HTTP client code is not nearly as high quality as its server implementation. request has come a long way, and npm has piggybacked on that to its great benefit, but it's still got rough spots. Also, Node-style concurrency when it comes to network access kind of breaks down under the level of traffic something like npm can generate, and simply limiting the number of requests you can originate at once (which @iarna has been experimenting with) isn't sufficient to handle all of these issues.

I think npm has reached a point where it needs its own download manager that is optimized for its peculiar and somewhat extreme needs. I have a really rough cut at a spec for what that thing might look like, but it's early days for that still, and I don't want to understate the difficulty or complexity of that work. This work needs to serve two masters: reducing the frequency with which installs fail due to excessive network traffic, and also ensuring that downloads do happen as quickly as possible once they've started.

What's next?

  • @soldair is going to work on reducing the latency of normal requests to the primary registry network
  • I'm going to keep working on refining the requirements and defining the API for the replacement for the networking layer (and then, probably, @zkat will be responsible for getting that integrated into the CLI, and / or building that package itself)
  • @iarna is going to document the install algorithm in all its complicated glory. I feel that this is an essential tool to have when talking about the performance of the CLI, because it's really easy to suggest what looks like straightforward optimizations that come with disastrously unfortunate complications, due to the details of the install algorithm.
  • The team will experiment with things like metadata caching and reordering when and where various steps in the algorithm happen, so we can maybe reduce the size and frequency of the CPU stalls shown above.
  • The CLI team is probably going to spend the rest of this year getting the npm test suite into better shape, largely for the purposes of getting Travis green and npm tested under Windows CI, but also with an eye towards adding unit tests, increasing meaningful test coverage, and making it easier to use as a diagnostic for performance testing.

How you can help

  • More in-depth performance analysis like this. Visualizations and specific metrics (along with discussions of the use cases and workloads used to generate them) are extremely helpful in guiding the prioritization of performance work.
  • It's likely that @iarna and I have overlooked opportunities to save work within the installer and in the interface between the installer and fetchPackageMetadata (which is where most, if not all, of the network requests in the installer go through). Also, it would be extremely useful to have more expertise in the operation of the installer. This isn't a simple request, because the installer is the most complex piece of npm, but if you wanted to spend some time getting to know it, your ability to make substantive improvements to the performance would increase dramatically.
  • If somebody really wanted to help us out, baking an image of some with testing tools and scripts installed and configured would be amazing. Especially if that were done in such a way that it could be deployed to Windows as well as Linux and OS X, or into a virtualized or containerized setup. It feels like every time we touch the networking code and make the networking code better for one class of users, we degrade it for another. Having a way to meaningfully benchmark and QA changes to the network code would be hugely helpful.

Thanks again, everybody! I apologize for the length of this, but I want to tweet this around and get more eyes on it, and I've had a lot of these thoughts in my head for a while, but not written down in one place. I hope you find the detail more useful than distracting.

@lxe
Contributor

lxe commented Nov 13, 2015

I've made a flamegraph of running an uncached, un-shrinkwrapped npm install on hapi from a cpuprofile collected using node-inspector.

I notice a lot of idle (waiting on network/io?), and a huge chunk of time taken by fs. I am unsure what that means.

It makes sense that the network reqs will not be parallelized ideally in an unshrinkwrapped project, but I'm curious to see how much things like tarball unzipping, url.parse, and gc affect the install.

SVG and cpuprofile source here:
https://gist.github.com/lxe/7b1d178f8df941e5f597

@bengl bengl mentioned this issue Nov 14, 2015
7 tasks
@devongovett

I imagine this would require a significant amount of work, but one idea that would significantly improve performance, and reduce the number of requests npm needs to do when installing things, would be to do a lot more work on the server side instead of on the client. For example, when requesting a module, the server could bundle the module, and its entire dependency tree into a single tarball. This way, the npm client would only need to request the modules at the root level, and would get the rest of the tree for free. Going a step further, when requesting multiple modules, the requests could be batched so that only a single tarball of the entire dependency tree (pre-deduped by the server) could be downloaded.

Doing this on the server side instead of on the client has several benefits. It would be faster (since the data is local to the server), and require fewer HTTP requests for one. But for another, it would be much easier to add future optimizations on top of. For example, the npm server could pre-render common install trees for popular modules into static tarballs that could be hosted on a CDN, requiring no work to bundle each time they are installed.

@pthrasher

@devongovett The issue with this approach is that modules allow for version ranges. Multiple modules may ask for the same dependency. One might ask for "4.3.1" exactly, and another might ask for ">= 4.0.0" -- In this case, npm.com can't know which to provide. The only other way would be to ask npm.com to fulfill the entire dependency tree for a whole project. Further complicating this, npm tries to do the right thing in the case that you already have some but not all of the required packages installed locally.

@devongovett

The client would receive tarballs, each containing a root dependency and all sub-dependencies. Then it would merge the trees. It could receive multiple copies of the same module (different versions, or the same version), due to shared dependencies, but it could dedup them on the client side while merging the trees. While this might seem like a waste, my guess is that it would be faster to download a few extra bytes (most modules aren't that large) than make a million http requests from the client in series, especially if some of the popular modules were pre-rendered and cached on a CDN. For ranges, the server would bundle the latest version matching that range in the tarball (the same as if you ran npm install module in an empty directory, and then tar node_modules).
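A toy sketch of the client-side merge described here, assuming each server-built tarball has already been flattened into a list of `{ name, version }` entries. The `mergeBundles` name and the input format are hypothetical, and the sketch ignores where in the tree each copy would have to live.

```javascript
// Each bundle is the flattened contents of one hypothetical server-built
// tarball: a list of { name, version } entries.
function mergeBundles(bundles) {
  const merged = new Map();
  let duplicates = 0;
  for (const bundle of bundles) {
    for (const pkg of bundle) {
      const key = `${pkg.name}@${pkg.version}`;
      if (merged.has(key)) {
        duplicates += 1; // same name and version was already received
      } else {
        merged.set(key, pkg);
      }
    }
  }
  return { packages: [...merged.keys()].sort(), duplicates };
}
```

Two different versions of the same module both survive the merge. Only exact name-and-version repeats are dropped, and those repeats are the "few extra bytes" that were downloaded more than once.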

If batching were supported, the deduping could happen on the server side (npm install all the modules, then tar node_modules). It could be smart and send a list of already installed modules to the server in the request or something too...

This is just an idea at this point. Would require further experimentation to determine actual viability.

@rektide

rektide commented Nov 15, 2015

The network usage graphs point to a case where a number of TCP connections are fighting among one another. Rather than fundamentally reworking the sending strategy or revising downloading, note that the new HTTP spec has a number of new techniques to allow efficient parallel transfers to occur on a single TCP connection. Simply switching to HTTP2 is a clear, very low coupling (no need to restructure how npm works) way to improve things in a radical way that ought to get a lot of progress. Adding some basic prioritization to that is a simple refinement that should be investigated well before any restructuring work is attempted, to get a baseline of what HTTP2 is capable of doing on a widely-parallel problem.

@othiym23
Contributor

"Simply switching to HTTP 2" is not something that can be done right now:

  1. HTTP 2 is not supported by Node's core APIs.
  2. I'm not aware of a pure JS HTTP 2 client that is known to work with versions of Node < 0.12.
  3. npm's registry and content-delivery network will need significant work to support HTTP 2.

HTTP 2 offers a lot of possibilities to improve network performance and download reliability, but it's not a simple, drop-in change, nor does it address the architectural serialization issues I discussed in my response. It does bring with it a whole host of interoperability challenges and opportunities for new sources of fragility and regressions. It's the future, but it's going to take some sustained effort to get there.

@baelter

baelter commented Nov 16, 2015

How about npm keeping a local registry to do lookups in and only go to server if there is no match and on npm update?

This is how a lot of other well performant package managers work.

@othiym23
Contributor

How about npm keeping a local registry to do lookups in and only go to server if there is no match and on npm update?

That's what the npm cache ($HOME/.npm) is for, and it makes a huge difference to how long installs take once it's populated / warmed. When I talk about reducing the latency of freshness checks, that's intended to speed up fetching packages from the cache.

Because there's a wide variety of registry software in use, and because the semantics of the registry API are ill-defined, the CLI can't make too many assumptions. For instance, it can't assume that a package at a given version is never going to change on the registry side, even though package versions are immutable on npm's own registry. That's why the CLI's caching code makes a request with a cached etag (and if-modified-since) in the headers – to give registries an opportunity to return 304 and let the cache know it has nothing new to download.
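The freshness check described here is a standard HTTP conditional request. A minimal sketch of the two halves follows; it is not npm's caching code, and the cache-entry shape (`{ etag, lastModified, body }`) and function names are assumptions.

```javascript
// Build conditional-request headers from a cached entry
// (hypothetical shape: { etag, lastModified, body }).
function conditionalHeaders(cached) {
  const headers = {};
  if (cached && cached.etag) headers['if-none-match'] = cached.etag;
  if (cached && cached.lastModified) headers['if-modified-since'] = cached.lastModified;
  return headers;
}

// Decide what to use once the registry has answered.
function resolveFromResponse(cached, response) {
  if (response.statusCode === 304 && cached) {
    // Not modified: the cached copy is still good, nothing new to download.
    return { body: cached.body, fromCache: true };
  }
  return { body: response.body, fromCache: false };
}
```

Note that even the 304 path still costs a full round trip to the registry, which is why the latency of these checks matters for warm-cache installs.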

One thing that some package managers do to speed things up for local operations is to store basic information about every available package locally. This is what apt, rubygems, and FreeBSD's pkg and ports systems do. It works well up to a point, but it doesn't scale well to hundreds of thousands of packages in an environment where there's the level of activity that there is on npm. Just the metadata is a couple of gigabytes, and it changes continuously. It's possible to follow that stream (mostly) in (mostly) real time, but it's not something you're going to want running on your laptop. As an example: npm search pulls down an incrementally updated JSON blob with metadata for the latest version of every unscoped package. It's huge, and just running a search can cause smaller vms to run out of RAM. Everything about it is naïve, but writing a fast database engine in pure JavaScript is difficult, so a central local package index would also have to be pretty naïve.

Finally, it is possible to run a registry mirror locally, and there are several packages (including npm On-Site) that will make it relatively easy to run and maintain a mirror. That said, it's still not a trivial undertaking, and the performance gains are less significant than the architectural changes I discuss above.

@ashaffer

For instance, it can't assume that a package at a given version is never going to change on the registry side, even though package versions are immutable on npm's own registry

Would it be possible to special-case this for the npm registry? Or better yet add a header or something that specifies that package versions are immutable?

@baelter

baelter commented Nov 16, 2015

... For instance, it can't assume that a package at a given version is never going to change on the registry side

Maybe it should assume that anyways? package managers will have to learn to bump version. This can't be true for git dependencies but for packages in the registry I can't see why not.

@dominykas

500ms to download one tarball? At the risk of sounding naive... Would it make sense to stick extracted package.json's (and shrinkwraps et al) on a CDN and be done with it?

@iamstarkov

packages in the registry I can't see why not.

because npm consumers can publish already existing version to the registry

@iamstarkov

cdn for package.json of all versions of all packages?

@dominykas

because npm consumers can publish already existing version to the registry

no they can't, that was disabled over a year ago, wasn't it? You can only unpublish, not overwrite. But even so - stick a new package.json in a CDN and download tarballs later.

@iamstarkov

You can only unpublish, not overwrite

isn't it the same from the discussed perspective? package versions are not immutable

@dominykas

@iamstarkov that's trivial to solve by having a cached "all versions for this package" file, with narrowed down "all minor versions for this major version for this package" file. On a CDN.

(edit) even an npm view react takes almost 0.5s where I am now - it should be 0.016s tops.

@seldo
Contributor

seldo commented Nov 16, 2015

@dominykas @iamstarkov package JSONs are already served from the cache 99% of the time; the 500ms latency is the result primarily of the average size of the data, which these days is pretty big, especially for old packages with lots of versions. There's a lot of low-hanging fruit on this front and so we think we can rapidly bring it down. Look for results on this front before the end of the year.
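
As a rough illustration of why document size matters, the sketch below strips a package metadata document down to the fields an installer would plausibly need. The helper `trimMetadata` and the list of kept fields are assumptions made for this example, not a description of what the registry team planned or shipped.

```javascript
// Hypothetical sketch: keep only the per-version fields needed to resolve and
// fetch dependencies, dropping things like readmes and scripts.
// KEPT_FIELDS is an assumption for illustration.
const KEPT_FIELDS = ['name', 'version', 'dependencies', 'optionalDependencies', 'dist'];

function trimMetadata(doc) {
  const versions = {};
  for (const [version, manifest] of Object.entries(doc.versions || {})) {
    const slim = {};
    for (const field of KEPT_FIELDS) {
      if (manifest[field] !== undefined) slim[field] = manifest[field];
    }
    versions[version] = slim;
  }
  return { name: doc.name, 'dist-tags': doc['dist-tags'], versions };
}

// Made-up document: compare serialized sizes before and after trimming.
const fullDoc = {
  name: 'example',
  'dist-tags': { latest: '1.0.0' },
  versions: {
    '1.0.0': {
      name: 'example',
      version: '1.0.0',
      readme: 'A long readme... '.repeat(50),
      scripts: { test: 'tap test/*.js' },
      dependencies: { other: '^2.0.0' },
      dist: { shasum: 'abc123', tarball: 'https://registry.example.invalid/example-1.0.0.tgz' },
    },
  },
};
const slimDoc = trimMetadata(fullDoc);
console.log(JSON.stringify(fullDoc).length, JSON.stringify(slimDoc).length);
```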

@paulirish
Contributor

@othiym23 thank you mucho for the in depth response and thoughts!

On the possible actions to take…

Minimize round trip latency to the registry

In the above HARs, and other recordings, I haven't seen network latency as a significant factor. Some of the requests in the HAR manage to round-trip in 30ms. So, basically I'm seeing the CDN performance as great and overall NPM registry response time as good enough.

Push more of the tree-building work onto the registry

Yup, I think this could have a pretty big impact, but totally understand that it's not only a large engineering effort, but likely has dramatic impact on resource requirements. Huge potential here, but I think it isn't worth exploring just yet.

HTTP2

Based on the above results, I do think HTTP2 does give you the biggest bang for your buck.
HTTP2 wouldn't need to be a hard switch, but rather used if the node version supported it and no interesting proxy concerns to grapple with. (Much like how HTTP2 is used on the web; only with browsers that support it).
Easy for me to say, but it does appear that doing a rough spike of HTTP2 support may be fairly quick to try out. It would certainly provide great valuable data on the potential upside.

@glenjamin
Contributor

Would HTTP2 be a huge gain vs better keepalives + pipelining? AIUI it would mostly add header compression?

The npm3 network graph looks to me more like a case of work starvation than the network being a bottleneck - the requests seem quite short but are spread throughout the process' run time.

@dominykas

@seldo damn it, I knew it... I'm not the only smart person in the world... good luck.

@othiym23
Contributor

Would it be possible to special-case this for the npm registry? Or better yet add a header or something that specifies that package versions are immutable?

Doing this would introduce as many complications as it would remove, unfortunately. Also, remember that npm@2 is significantly faster than npm@3 when doing installs, so it's not like latency or bandwidth are the only gating factors here.

The npm3 network graph looks to me more like a case of work starvation than the network being a bottleneck - the requests seem quite short but are spread throughout the process' run time.

Yes. This is why reducing latency, and also caching more metadata could be big wins, and why I'm comfortable waiting until both the CLI and registry teams have the free time necessary to make the registry and CLI use HTTP 2 effectively.

I really can't stress enough that moving to HTTP 2 is a nontrivial task - our CDN, which pretty much makes npm's current scale thinkable, still doesn't do IPv6, and that's a rollout that's been underway for, what, 10 years now? Also, coming up with effective logic for things like figuring out which versions of dependencies to bundle, and how to tell the server what the client already has, is eminently doable but full of gotchas and corner cases of its own.

It's dangerous to optimize the network flow too heavily toward either end of the connectivity spectrum. Optimize for a fast pipe by bundling everything into big wads, and you're going to make life miserable for people on slow or intermittent connections. Optimize for making the payloads as finely-grained and concurrent as possible (which is more or less what npm@2 does), and you're going to hammer network stacks and firewalls on big connections.
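
One common middle ground between those two extremes is to cap the number of requests in flight. The following is a minimal sketch of that general technique, not npm's fetch logic; `runWithLimit` and the simulated requests are hypothetical names introduced for this example.

```javascript
// Hypothetical sketch: run async tasks with a fixed concurrency limit.
// `tasks` is an array of functions that each return a promise.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Usage: simulated "requests" that resolve after a short delay,
// with at most two in flight at a time.
const fakeFetch = (name) => () =>
  new Promise((resolve) => setTimeout(() => resolve(`${name}.tgz`), 10));

runWithLimit(['a', 'b', 'c', 'd'].map(fakeFetch), 2).then((files) => {
  console.log(files);
});
```

The limit is the tuning knob: a low value is gentler on slow links and firewalls, a high value approaches the fully concurrent behavior described above.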

a rough spike of HTTP2 support may be fairly quick to try out. It would certainly provide great valuable data on the potential upside.

People are welcome to give this a shot - most of the network logic is confined to npm-registry-client, and there are various OSS registry stacks, so it's easy enough to play around with, but nothing will land in npm proper unless it's both pure JS and works with Node 0.8 through latest.

@othiym23
Contributor

Well put, @wraithan. I don't foresee a model where the CLI yields control over telling the registry what it needs, both because of what you've outlined, and because only a client can know what's already in its cache.

@wraithan

@othiym23 I feel the cache problem is easily solved (which probably means I'm not understanding some complexity) by not having the server return all of the tarballs directly, but instead just returning the recommended tree (maybe even with tolerable semver ranges). The CLI could then make the determination which tarballs it still needs and fire requests for them. Theoretically the tree could include hashes so the client wouldn't need to do freshness checks, but then you are bloating the structure and maybe it becomes less worth it again.
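
A minimal sketch of that client-side step, assuming the server returned a recommended tree annotated with hashes: the client walks the tree and collects only the tarballs it does not already have. The tree shape (`name`, `version`, `shasum`, `dependencies`) and the cache representation (a `Set` of shasums) are assumptions for illustration, not an existing registry response format.

```javascript
// Hypothetical sketch: given a server-recommended tree, work out which
// tarballs still need to be fetched.
function findMissingTarballs(node, cachedShasums, missing = new Map()) {
  if (!cachedShasums.has(node.shasum)) {
    // Key by shasum so a package shared by several parents is fetched once.
    missing.set(node.shasum, `${node.name}@${node.version}`);
  }
  for (const child of node.dependencies || []) {
    findMissingTarballs(child, cachedShasums, missing);
  }
  return missing;
}

// Made-up tree and cache contents for illustration.
const recommendedTree = {
  name: 'top', version: '1.0.0', shasum: 'aaa',
  dependencies: [
    { name: 'left', version: '2.1.0', shasum: 'bbb' },
    {
      name: 'right', version: '0.3.0', shasum: 'ccc',
      dependencies: [{ name: 'left', version: '2.1.0', shasum: 'bbb' }],
    },
  ],
};
const cache = new Set(['aaa', 'ccc']);
console.log([...findMissingTarballs(recommendedTree, cache).values()]);
```

Because the comparison is on hashes, no freshness check is needed for anything already in the cache, at the cost of the larger tree payload mentioned above.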

Multiple registries still makes this pretty complex though and possibly not worth it.

The problem with having a team of high quality engineers working on the CLI and registry is that those gut instinct optimizations have been at least partially thought out, if not completely. I'll try to keep churning this problem in my head though and see if I can come up with anything.

@fab1an

fab1an commented Nov 21, 2015

I don't know about the inner workings of npm, but isn't it possible to do optimistic prefetching of commonly used libraries like lodash?

@vigneshshanmugam

@othiym23 Recommend reading this blog while moving to HTTP/2 http://engineering.khanacademy.org/posts/js-packaging-http2.htm

@iamstarkov

Is it possible for the CLI to get package.json and npm-shrinkwrap.json without downloading the whole tarball?

As far as I understand, there is already some way to do that for README.md, to show it on the npm package's page.

@Qix-

Qix- commented Nov 26, 2015

@iamstarkov isn't that what's happening already? When you go into ~/.npm/cache, the tarballs and the package.json's are there for all of them, unless they download the tarball and then extract package.json afterwards, which just seems silly.

@iamstarkov

@Qix- dunno, thats why im asking

@Qix-

Qix- commented Nov 26, 2015

@iamstarkov if not, that seems like it should be the first thing to happen.

Is there a diagram anywhere that outlines exactly what happens during NPM's operations? That would be incredibly useful.

@mprobst

mprobst commented Dec 14, 2015

/CC @IgorMinar

@IgorMinar

Awesome work, @samccone. This is super cool! Angular 2 is migrating to npm3 in a week or two, so this is a great timing for us actually. Thank you!

@kevinSuttle

@kidwm

kidwm commented Jan 4, 2016

if decompressing brotli is also fast enough…

@kevinSuttle

@halhenke
Contributor

...yeah this seems hard. In my naive thinking:

Server: If nobody used semver then it would be a lot easier to construct/store a dependency tree for each package on the server and you could send down all that info in the first request so that the CLI could then sort out the ideal tree and fetch packages as quickly as possible. But semver means that the set of possible trees branches out crazily as you work back through each level...which I imagine means that you more or less have to do the entire ideal tree building algorithm on the server, or do some heuristics in terms of what the ideal tree is and have the client fix things later (which could be pretty slow and in general would make things harder to understand which is itself a problem).
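
To illustrate how semver ranges make the set of possible trees branch, here is a minimal sketch that counts how many combinations of published versions satisfy a set of dependency ranges at a single level of the tree. The `satisfiesCaret` helper is a deliberately simplified stand-in (it handles only `^MAJOR.MINOR.PATCH` with a major version of 1 or more, and no prerelease tags); it is not the real `semver` package, and the package names and version lists are made up.

```javascript
// Simplified caret matching: same major, and not lower than the range's base.
// Assumption: only "^MAJOR.MINOR.PATCH" with MAJOR >= 1, no prerelease tags.
function satisfiesCaret(version, range) {
  const [major, minor, patch] = version.split('.').map(Number);
  const [rMajor, rMinor, rPatch] = range.slice(1).split('.').map(Number);
  if (major !== rMajor) return false;
  if (minor !== rMinor) return minor > rMinor;
  return patch >= rPatch;
}

// Count the version combinations for one level of the tree:
// the product of the number of matching versions per dependency.
function countCombinations(dependencies, publishedVersions) {
  let combinations = 1;
  for (const [name, range] of Object.entries(dependencies)) {
    const matches = publishedVersions[name].filter((v) => satisfiesCaret(v, range));
    combinations *= matches.length;
  }
  return combinations;
}

// Made-up data for illustration.
const published = {
  a: ['1.0.0', '1.1.0', '1.2.0', '2.0.0'],
  b: ['3.0.0', '3.0.1', '3.1.0'],
};
console.log(countCombinations({ a: '^1.0.0', b: '^3.0.1' }, published));
```

Each matching version can bring its own dependency ranges, so the count multiplies again at every further level of the tree.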

Client: Storing all dependency information on the client sounds good but yeah, I imagine it's impractical at this scale and you'd have to update everything from the main registry before any install anyway.

So it seems hard to think of something that

  1. is correct in terms of delivering the optimal tree and not making a best guess
  2. is efficient in terms of not sending down a lot of extraneous stuff
  3. doesn't make the whole install algorithm/process a lot more complex and mysterious

Apart from suggestions like HTTP2 the only thing I can think of off the top of my head is a kind of npm --fast install which might involve pre-calculated trees that are not ideal (might have outdated dependencies, duplication) or an option that pretty much installs things npm 2 style which you can then dedupe later.

Just my (probably misguided) thoughts.

@fab1an

fab1an commented Jan 27, 2016

Some questions:

  1. I feel this is unanswered: investigating npm 3 perf #10380 (comment)
    Wouldn't it be easy and ok to add a simple setting that assumes packages to be immutable if they come from https://registry.npmjs.org? It's a good and valid assumption, what are the downsides? It could only break if somebody would redirect the requests to a different host, no?

    Or is the problem that the cache cannot be reused in case someone switches registry between installs? I suppose this could be fixed by storing the origin of a tarball in the cache.

  2. If I understand correctly, the entire tarballs need to be downloaded to get the shrinkwrap.json and package.json inside them? Why are those files not pre-extracted while publishing on the server, or are they? If they are, a cache like Varnish could even keep serving them from memory.
  3. The registry must have a top-list of packages and versions that are most often depended upon? The client could optionally download/compare that list and keep packages like lodash in cache, maybe even multiple versions of it.
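
The two ideas in the first question (treating packages from the official registry as immutable, and storing a tarball's origin in the cache) can be sketched as follows. This is a minimal illustration under stated assumptions: `cacheKey`, `canSkipFreshnessCheck`, and the key format are hypothetical and do not describe npm's actual cache layout.

```javascript
// Hypothetical sketch: key cached tarballs by registry origin as well as
// name and version, and only skip freshness checks for a trusted origin.
const IMMUTABLE_ORIGINS = new Set(['https://registry.npmjs.org']);

function cacheKey(registryUrl, name, version) {
  // Including the origin keeps entries from different registries apart.
  const origin = new URL(registryUrl).origin;
  return `${origin}|${name}@${version}`;
}

function canSkipFreshnessCheck(registryUrl) {
  return IMMUTABLE_ORIGINS.has(new URL(registryUrl).origin);
}

console.log(cacheKey('https://registry.npmjs.org/lodash', 'lodash', '3.10.1'));
console.log(canSkipFreshnessCheck('https://registry.example.com/'));
```

With the origin in the key, switching registries between installs would miss the cache rather than reuse a tarball from a different source.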

@schmod
Contributor

schmod commented Jan 27, 2016

I feel like we've gotten far afield here. This thread was started in relation to a performance regression that appeared between npm@2 and npm@3.

Many of the proposed solutions appear to be targeting bottlenecks that would (in theory) equally affect npm2. For now, I think that we need to focus on profiling, and determining why npm@3 is so much more prone to work starvation.

Also, remember again that any radical changes to NPM's hosting infrastructure (particularly, anything that would break NPM's ability to host almost everything as static files on a CDN) are going to be a non-starter.

@wraithan

Also see my replies in November about why server built trees would be much more difficult than you'd expect.

@fab1an

fab1an commented Jan 27, 2016

@seldo
Contributor

seldo commented Jan 27, 2016

That bug was fixed later that same day: iarna/gauge@a7ab9c9 but is not the focus of this issue.

@halhenke
Contributor

I've been doing a bit of work trying to trace the code path followed by a typical install in npm3 and representing it in a MindMap diagram as very rough pseudocode. Will do the same for npm2 and, if it seems useful, will post links to them here. If nothing else, by giving people a more concrete idea of what's happening where/when, it could hopefully help people (myself included) understand which proposed solutions are viable, which aren't, and why things are the way they are atm.

@othiym23
Contributor

This has been an illuminating and useful discussion, but it reached a point of diminishing returns a while ago, and so I'm going to close and lock it in the hopes that folks will start more focused and actionable discussions about specific aspects of npm's performance, similar to #11283.

For those who have been following this thread, here's the CLI team's current road map:

  1. First up is Rewrite legacy tests #11292, which is part of the team's ongoing effort to get npm's test suite where it needs to be.
  2. After that, the team will be fixing up the test suite to pass on stock Windows using cmd.exe.
  3. Then, get a Windows CI environment stood up so that the team has the assurance it needs that npm is working correctly on Windows.
  4. After that, we turn our attention to the pile of important known issues (labeled big-bug) in npm@3.
  5. Finally, and the part that's relevant to this issue, systematically address the performance of npm@3's installer, with the goal of having performance that meets or exceeds npm@2's for most common use cases.

1-4 are important but unglamorous technical debt to be paid, and the faster the team can get through those, the faster we'll be able to start working on performance in a serious way. Help with any of those (especially anything marked big-bug) would be much appreciated.

Thanks again to @samccone for his detailed analysis, and to everyone for their participation in this discussion.

@othiym23 othiym23 changed the title investigating npm 3 perf investigating npm 3 perf Jan 29, 2016
@npm npm locked and limited conversation to collaborators Jan 29, 2016