Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: Provide standards around integrity between source code and published package #77

Open
shaunwarman opened this issue Dec 10, 2018 · 85 comments
Labels
discussion The decision process is still ongoing vendor Some vendor need to be onboard

Comments

@shaunwarman
Copy link

shaunwarman commented Dec 10, 2018

Right now, it seems that most maintainers may publish their packages from their local environment. There should be a way to verify what is published against the public source code or specific git sha to maintain transparency of what is being published. Not only will this mitigate out of sync issues or accidents, but will provide greater confidence that additions aren't added as they are published (potentially malicious).

Not sure if this is the best place for this, but after reading through other issues and recent resources I thought I better put this down somewhere. And it brings up the discussion of maintainers permissions to not only package registry, but SCM as well.

@ljharb
Copy link
Member

ljharb commented Dec 10, 2018

npm doesn’t require a repo, and any package with a prepublish build process (read: every Babel user) will correctly have the published package not matching any sha in their git repo.

I’m not sure how we’d be able to do any sort of verification in a consistent and automated way.

@mcollina
Copy link
Member

The only way to guarantee this is to compute an hash of all the content of a module (in prepublish) and cryptographically sign it. However, there is no easy way to know:

a. what are the keys that are allowed to sign a given package
b. associate those keys with npm profiles

People have been asking for a similar feature to npm for the last 2-3 years.

@shaunwarman
Copy link
Author

Thanks @ljharb @mcollina and sorry for the late reply.

This article by @skonves gave me a nice reminder of this topic. And it seems like there are tools in his tbv and another in npm-verified that attempt to do similar.

There would need to be a blessed, yet optional process to add some sort of "trusted": true || false or "verified": true || false metadata that npm or other package managers would need to set. Optional because of the common cases of transpiled code (e.g. babel, etc) and because this is an expensive task with trade offs.

@sompylasar
Copy link

My 1 cent out of 2 (don't have time for more): npm doesn't require a repo, but a package verification service on top of npm might require whatever is needed, and package authors that want to conform and get the "verified" badge will try and make sure their package is good. For example there are similar efforts to standardize open source repositories: https://github.com/todogroup/repolinter — there could be tools that help package authors to manage packages efficiently (I personally miss the kind of tool that would automatically set up CI and releases for me reproducibly and reliably).

@ljharb
Copy link
Member

ljharb commented Jan 9, 2019

Indeed, that’s something npm can solve - but i don’t think it’s something node, and thus we, can.

@niftylettuce
Copy link

niftylettuce commented Feb 13, 2019

Chiming in here, we really need something like this... koa-router was just transferred to a relatively unknown user on GitHub and the package name was apparently sold

ZijianHe/koa-router@bd780c9#diff-04c6e90faac2675aa89e2176d2eec7d8R3

Screenshot (in case commit is force deleted):

screen shot 2019-02-13 at 2 38 39 pm

I've version locked koa-router, which is downloaded 135K+ times per week, and subscribed on https://libraries.io/npm/koa-router to get a notification when a new version is published of this package.

HN: https://news.ycombinator.com/item?id=19156707

@ljharb
Copy link
Member

ljharb commented Feb 13, 2019

@niftylettuce i'd suggest reporting that to npm; i doubt their TOS permits the sale of a package name.

@niftylettuce
Copy link

niftylettuce commented Feb 13, 2019

I did report that to NPM, and here was there response...

screen shot 2019-02-13 at 2 47 19 pm

I also reached out to the author of the package koa-router and received a very negative response which I don't wish to share publicly out of respect since I see no malicious version of the package published yet.

@justinmchase
Copy link

@niftylettuce You did the right thing here even if nothing negative comes of it. Its a red flag for sure and others deserve to be alerted by this information. Good job on being alert.

@niftylettuce
Copy link

niftylettuce commented Feb 13, 2019

To anyone reading this, I'm building a tool to automate this nonsense, at least until Node/NPM do something about it. Email me at niftylettuce@gmail.com if you want to get notified once it's up. I'll notify everyone that posted in this thread and/or left reactions as well. It will be free and open-source.

@Enrico204
Copy link

Maybe it's time to think to an NPM alternative.

@iarna
Copy link
Member

iarna commented Feb 14, 2019

What's more, even if npm validated against the git repo at publish time, the user can just force push after publication. Checking this kind of thing does nothing.

If you want to validate at install time, well, I hope you enjoy your multi-hour install times. =p (Seriously: In modern npm or yarn, install time per package is under 10ms—adding a git clone to that mix would massively increase overall run time.)

This kind of action gets you no security whatsoever. It solves no actual problem. It validates nothing.

@freewil
Copy link

freewil commented Feb 14, 2019

@iarna what do you think about a new command for npm, I've been thinking about a npm diff that you would be able to use while upgrading a package to see actual changes. Haven't thought about it super deep, but npm update or npm install mypkg@latest followed by a npm diff or similar.

This would effectively allow you to review the code of new/updated packages, similar to how adding node_modules to version control would allow for deeper code reviews.

@jaredhirsch
Copy link

@niftylettuce Hey, if you're going to work on a solution, it would be way better to start a repo on github, so people can contribute ideas/feedback via issues

@iarna
Copy link
Member

iarna commented Feb 14, 2019

@freewil That seems much more useful, but I am concerned how you scale that out to the thousands of deps in a typical modern deployment. 'cause yeah, the one thing you asked for may have a reasonable diff, but what about the dozen transitive deps that also updated? Still, I think this would be an excellent place to begin experimenting with, to see how it feels (--dry-run --json can get you what an action would have done, in machine readable format).

@niftylettuce
Copy link

@6a68 yes it will be on GitHub, I will post the link here once I have a proof of concept up

@niftylettuce
Copy link

To anyone reading this - please do not harass, email, or contact the original maintainer of the package mentioned in the above discussion. It was never my intention for anyone to harass them. I simply wanted to raise awareness about this issue with NPM and the potential of this becoming a security issue in general. This is not the only package like this.

@niftylettuce
Copy link

Another update on the koa-router issue for anyone subscribed to this thread. The new maintainer @ZijianHe has provided us with an update ZijianHe/koa-router#494 (comment). Hopefully this eases concern and it looks like we have a new contributor in open source land.

For the CLI npm diff command, would it just accept two different versions of a package? npm diff <package> <version-a> <version-b> and diff compare tarball?

basti1302 added a commit to instana/nodejs that referenced this issue Feb 14, 2019
See
nodejs/package-maintenance#77 (comment)

Might be that nothing malicious is done with the package after a
transfer but let's rather be safe than sorry.
@Enrico204
Copy link

Maybe the best "way" to handle the change of a maintainer is that the new maintainer should open its own repository on NPM. They should not allow a "takeover" procedure: instead, the installation may fails with a message like "hi, this library is not maintained anymore by olddev, there is a new version in newdev". At least the developer knows that there is something going on..

@mcollina
Copy link
Member

I think the current way npm does this is by forcing a bump in a major release. That's enough to protect users form a malicious "takeover" in the case of a non-responsive maintainer.

@panva
Copy link
Member

panva commented Feb 14, 2019

I think the current way npm does this is by forcing a bump in a major release. That's enough to protect users form a malicious "takeover" in the case of a non-responsive maintainer.

Is that so? Any way to confirm?

@mcollina
Copy link
Member

I've done this several time, and that's how it works. I looked in https://www.npmjs.com/policies/disputes but there is no mention about that.

@ljharb
Copy link
Member

ljharb commented Feb 14, 2019

They certainly don’t; and can’t, because a legitimate new maintainer should be able to backport fixes as needed anyways.

Any owner can always and forever publish to any previously unused version number, and that’s how it must stay.

@sompylasar
Copy link

@iarna

What's more, even if npm validated against the git repo at publish time, the user can just force push after publication. Checking this kind of thing does nothing.

I assume you assume the evil user force pushes to remove malicious code?

If such force push leads to file changes, git commit ids will change. Npm would record the commit id it verified against, and a standalone checksum of published files. If there's no such commit in the repo, it's a red flag. If running the same checksum at the same commit id files results in a different checksum, it's a red flag.

I believe this verification has to be a background task in npm; if one of the red flags has triggered, this package downloads are paused to prevent further spread until they are resolved. There has to be cache invalidation mechanism that would notify downstream caches to not use the flagged package.

I'm in no way a security expert though, just brainstorming. What do the npm security people that npm aquihired say?

@sompylasar
Copy link

The way I experimented in npm-verified POC is I needed a standardized package build+publish process (I used npm prepare for that). The packages that conform to the process can be marked as verified if the checks pass. The packages that do not — never. Like in Chrome, the sites that do not use HTTPS are marked as "Not Secure" by default, and even more, colored in red if they collect some user input via a form (for npm, this can be translated to using some OS APIs like process, network, and filesystem).

@freewil
Copy link

freewil commented Feb 14, 2019

@niftylettuce

For the CLI npm diff command, would it just accept two different versions of a package? npm diff <package> <version-a> <version-b> and diff compare tarball?

Yeah, I think that would be one version/use-case of the command. The way I'm currently thinking, it'd be nice to also have a no argument version similar to git diff where the current unstaged/uncommitted changes are shown, but that may be more complex than necessary for an initial version as it might require git/version control integration. I'd be hesitant in entangling npm with version control-specific code, but I think there is already a precedent with some commands - one that I'm aware of is npm version, which creates git tags for you.

@freewil
Copy link

freewil commented Feb 14, 2019

@freewil

but that may be more complex than necessary for an initial version as it might require git/version control integration.

One alternative to prevent the need for tracking changes/state would be just to add a --diff flag to npm install that would output a diff. That should be much easier since npm install already mutates package.json (which means it needs to/could be aware of the version change while running) and prevents needing to track changes/state across two commands, as I originally proposed (npm install followed by npm diff).

@mcollina
Copy link
Member

lockfiles are a very good idea for applications, which is the high majority of users. However, they are not really the target of this initiative.

@dominykas
Copy link
Member

@mcollina this is not about lockfiles. I don't think they make sense as part of the package or part of the repo. They do make sense as a snapshot in time to achieve reproducible builds.

Once again - they should not be published on git or npm - only kept around as a convenient metadata format.

@ljharb
Copy link
Member

ljharb commented Feb 15, 2019

@dominykas package-lock is already not published on npm; it'd be fine to keep a lockfile around if it didn't constrain npm install, but the only way to do that would be to rename it and commit it to git.

That might be reasonable, however - a sort of package-release.json that you'd update in-place in git, and would effectively be npm install --package-lock-only --package-lock && mv package-{lock,release}.json right before publishing a release.

@dominykas
Copy link
Member

dominykas commented Feb 15, 2019

@ljharb that is also an option, but it depends on the toolchain. I've literally spent this week setting up semantic-release, because we have locked masters and therefore can't have version commits. This also means that we need to push release meta data outside of git repo. We also have a pattern where we push shrinkwraps into dangling tags (i.e. a shrinkwrap-X.Y.Z for every vX.Y.Z). Basically anything but master (or npm, for various reasons) :)

I wish npm had a way to store this, and some other, release meta data, but not necessarily distribute it with the package.

eysi09 added a commit to garden-io/garden that referenced this issue Feb 19, 2019
@styfle
Copy link
Member

styfle commented Feb 19, 2019

A possible solution to this problem (integrity between source code and published package) is to provide a "builder" service that take the source code, builds it, then publishes the output.

ZEIT Now is probably 80% of the way there because builds are immutable, every build has a unique URL, the source code is captured during the build process, and you don't need a repo to publish. But if you do have a repo, you can connect it to the build system so that pushing to master (or any branch) automatically publishes.

There's a couple changes that would need to be implemented in npm:

  1. The first missing piece is some way to link the published npm package back to the builder source so that npm (and users) can verify integrity (probably add a field in the generated package.json).

  2. The second missing piece is some sort of allow-list that npm uses to trust certain builders (if anyone could host their own builder service, they could circumvent this integrity).

  3. The third piece is an optional flag that users can use to opt-in to this type of integrity check. Something like npm install --builder-integrity that will stop with an error if any package is missing this new builder field in the package.json or the builder field is not in the allow-list.

@Bene-Graham
Copy link

A possible solution to this problem (integrity between source code and published package) is to provide a "builder" service that take the source code, builds it, then publishes the output.

ZEIT Now is probably 80% of the way there because builds are immutable, every build has a unique URL, the source code is captured during the build process, and you don't need a repo to publish. But if you do have a repo, you can connect it to the build system so that pushing to master (or any branch) automatically publishes.

There's a couple changes that would need to be implemented in npm:

1. The first missing piece is some way to link the [published npm package](https://react-tsx.now.sh) back to the [builder source](https://react-tsx.now.sh/_src) so that npm (and users) can verify integrity (probably add a field in the generated `package.json`).

2. The second missing piece is some sort of _allow-list_ that npm uses to trust certain builders (if anyone could host their own builder service, they could circumvent this integrity).

3. The third piece is an optional flag that users can use to opt-in to this type of integrity check. Something like `npm install --builder-integrity` that will stop with an error if any package is missing this new builder field in the `package.json` or the builder field is not in the _allow-list_.

I was initially thinking the same thing.

But what stops someone from having a build task that inject malicious code into the "built package"?


Wouldn't a per module sandbox / permission system to protect against certain actions (Network, Disk Access, etc) help?

@freewil
Copy link

freewil commented Feb 20, 2019

@styfle

A possible solution to this problem (integrity between source code and published package) is to provide a "builder" service that take the source code, builds it, then publishes the output.

I was also thinking this. Basically the idea is to have a trusted third-party do the packaging from source.

Docker Hub might be a good place to look for inspiration. With Docker, it can be an even bigger issue, as docker images are built into binary files and pushed up to a public repository. Although, Dockerfiles/images are generally much more simple.

https://blog.docker.com/2013/11/introducing-trusted-builds/

@sompylasar
Copy link

But what stops someone from having a build task that inject malicious code into the "built package"?

There shouldn't be any user-provided build task code while using one of the trusted builders, only a build pipeline configuration, for example, like https://github.com/pikapkg/pack does.

Wouldn't a per module sandbox / permission system to protect against certain actions (Network, Disk Access, etc) help?

Yes, that, too, may increase confidence, and complexity. That is needed, too, but is orthogonal to this particular discussion about package build-from-source integrity.

@ljharb
Copy link
Member

ljharb commented Feb 20, 2019

The build process itself needs to be able to contain user-written transforms, if it's babel, or a bundler, to support common and valid use cases.

eysi09 added a commit to garden-io/garden that referenced this issue Feb 20, 2019
@sompylasar
Copy link

The build process itself needs to be able to contain user-written transforms, if it's babel, or a bundler, to support common and valid use cases.

It doesn't if all the common and valid use cases like babel and bundler are standardized into trusted builders. I guess there should be either no way, or a hard and impractical way, to tamper with the executable content (e.g. JavaScript code) of a trusted build by providing a malicious build step.

But again, I'm not a security expert. I pinged the Node.js SecurityWG to take a look at this discussion as I think it is rather opitionated, not engineered as a document with clear sets of requirements, tradeoffs and solutions.

@travi
Copy link

travi commented Feb 20, 2019

It doesn't if all the common and valid use cases like babel and bundler are standardized into trusted builders

i think you missed a very important detail in the point you quoted:

contain user-written transforms

part of the valid use of babel is to apply custom transformations beyond those provided by babel itself or even official plugins. these are often included as devDependencies. locking down approved builders that didnt allow the inclusion of those custom transformations would eliminate a significant portion of the value of tools like this, but allowing them does allow for changes that would be difficult to ensure are not malicious.

@sompylasar
Copy link

part of the valid use of babel is to apply custom transformations beyond those provided by babel itself or even official plugins. these are often included as devDependencies. locking down approved builders that didnt allow the inclusion of those custom transformations would eliminate a significant portion of the value of tools like this, but allowing them does allow for changes that would be difficult to ensure are not malicious.

I'd like to see some data about how popular these are, and whether they can be standardized. They can be published as a trusted package that itself does not use them, thus building a chain of trust. I hear you, but there's always a tradeoff between security and convenience.

@ljharb
Copy link
Member

ljharb commented Feb 20, 2019

@sompylasar packages are all already implicitly trusted; the attack vector i thought you're trying to address here is when a trusted package is hijacked, for which the same risks exist for any package, including ones in the build system.

@sompylasar
Copy link

@sompylasar packages are all already implicitly trusted; the attack vector i thought you're trying to address here is when a trusted package is hijacked, for which the same risks exist for any package, including ones in the build system.

Yeah I guess you're right. Well that's what I mentioned above. We all would benefit from a document (like an RFC) with the list of potential attack vectors, and how they are addressed or can be.

@skonves
Copy link

skonves commented Feb 20, 2019

I have actually already put together a draft of a graph of the NPM ecosystem for just that purpose. Each edge and vertex definitely has its own unique set of challenges. For the past few evenings, I have been working on a writeup, but I wouldn't mind opening it up for community feedback/contribution.

Here is the draft of the graph I have so far:
ecosystem draft

I have a draft on Medium right now, but I will move that to a more public place in a heartbeat if it means more eyes on it. Thoughts?

FWIW, my work with TBV/verifynpm and @sompylasar's similar work with npmverified seems reasonably confined to the "publishes to" edge near the top; however, I strongly agree with Ivan that understanding the full picture helps inform any proposed solution (particularly what is in and out of scope).

@JamesMGreene
Copy link
Contributor

JamesMGreene commented Feb 21, 2019

Wouldn't a per module sandbox / permission system to protect against certain actions (Network, Disk Access, etc) help?

I saw an interesting post about this idea from the NPM perspective not too long ago: https://hackernoon.com/npm-package-permissions-an-idea-441a02902d9b

CC @davidgilbertson

@tlhunter
Copy link

FWIW, we built Package Diff to gain visibility into npm package updates by directly examining the package instead of looking at the potentially-misleading source code repository:
https://diff.intrinsic.com

@sompylasar
Copy link

FWIW, I built package diff for the purpose of comparing next prepared version with the latest published one for the review and version bump purposes https://github.com/sompylasar/zmey-gorynych/blob/9029972/bin/cli.js#L512

@justinmchase
Copy link

@tlhunter very nice.

I had one thought, one way to get around it. What if you were using node-pre-gyp which fetches pre-compiled binaries rather than builds them from the source in the package and someone malicious had uploaded a pre-compiled binary from something other than the source that is published, how could we verify that they match?

@tlhunter
Copy link

@tlhunter very nice.

I had one thought, one way to get around it. What if you were using node-pre-gyp which fetches pre-compiled binaries rather than builds them from the source in the package and someone malicious had uploaded a pre-compiled binary from something other than the source that is published, how could we verify that they match?

For the system to be perfect we'd need to throw away the concept of npm install scripts. One could perform an npm install of every package ever made, on a set schedule, and constantly diff the artifacts left behind, but even that wouldn't be perfect. Like, the fetching of pre-compiled binaries might include information about the host machine. The event-stream incident, for example, would only inject malicious code if certain criteria were met.

I'd like to see some sort of package manager service which does the following:

  1. Distributes both a tarball AND is the git repo
    1. The user wouldn't be able to generate tarballs
    2. The server would guarantee correlation between git tags and tarballs
  2. Disable all installation scripts

@justinmchase
Copy link

Yeah, I agree that sounds like the only way. I could imagine npm flagging releases that were published that way and you could then opt to only accept verified dependencies if you wanted.

It would be tricky for pre-compiled binary modules as they could have a large build matrix for platforms and versions but it would be enormously helpful for confidence and security.

@shaunwarman
Copy link
Author

Interesting and related proposal in the golang ecosystem:
https://go.googlesource.com/proposal/+/master/design/25530-notary.md

@vweevers vweevers mentioned this issue Jun 17, 2019
@Eomm Eomm added discussion The decision process is still ongoing vendor Some vendor need to be onboard labels Sep 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion The decision process is still ongoing vendor Some vendor need to be onboard
Projects
None yet
Development

No branches or pull requests