Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix --preserve-symlinks, Enhancement to exploit #10132

Closed
wants to merge 4 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Dec 6, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

module

Description of change

please refer to this issue for details.

fix --preserve-symlinks memory bloat, add-on crashing, "main.js" symlink preservation. enhance module search paths to interleave module+node_modules adjacent directories

(includes fix-preserve-symlinks)

dependencies can be located like:
/node_modules
  /mod
    /node_modules
      /dep
practically speaking, this prevents package managers from
exploiting symlinks to a machine module store. but by adding
and additional search path:
/node_modules
  /mod
  /mod.node_modules
    /dep
modules can be symlinked to a machine store but still allow
local trees to have dependency versions specific to the tree
change path character for adjacent-node-modules
directory from '.' to '+' to ensure directory
names can never clash with module names
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. module Issues and PRs related to the module subsystem. labels Dec 6, 2016
@ghost ghost changed the title Enh adj nm Fix --preserve-symlinks, Enhancement to exploit Dec 6, 2016
if (p !== nmLen)
paths.push(from.slice(0, last) + '\\node_modules');
if (code === 92/*\*/ || code === 47/*/*/ || code === 58/*:*/
|| (adjacentNodeModules && code === 46/*.*/)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operators go at the end of the previous line. Also, the next line should match the indentation of the start of the first argument on the previous lines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if (adjacentNodeModules) {
paths.push(parent + '+node_modules');
if (code === 46/*.*/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Braces are missing for this conditional (even if they are syntactically optional, they're required for the project's style).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if(!preserveSymlinks) return request

request = Module._resolveFilename(request, null)
if(!fs.lstatSync(request).isSymbolicLink()) return request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a space between if and ( here and all other instances.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

request = Module._resolveFilename(request, null)
if(!fs.lstatSync(request).isSymbolicLink()) return request

let isExecutable = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing semicolon here and all other instances.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const execExts = (process.env.PATHEXT || '').split(path.delimiter)
const requestExt = path.extname(request).toLowerCase()
isExecutable =
execExts.some(execExt => requestExt === execExt.trim().toLowerCase())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Braces should be included for arrow functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const requestExt = path.extname(request).toLowerCase()
isExecutable =
execExts.some(execExt => requestExt === execExt.trim().toLowerCase())
} else try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, please just include the try-catch inside a normal else {} block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jasnell jasnell added the semver-major PRs that contain breaking changes and should be released in the next major version. label Dec 6, 2016
@jasnell
Copy link
Member

jasnell commented Dec 6, 2016

Quick comment... can you pull the at-mention to mscdex off the title of the commit to keep him from getting pinged every time there's an action taken on that commit ;-)

@jasnell
Copy link
Member

jasnell commented Dec 6, 2016

Also @phestermcs .. you mentioned test cases over in the other comment thread... this PR would definitely need to include some new test cases to validate the new behavior. Those will also help us to review the functionality here. Can you update the PR to include those tests?

@addaleax
Copy link
Member

addaleax commented Dec 6, 2016

Besides tests, it might be good to see documentation on this feature and/or a full example of how it’s supposed to be used (there’s a good chance you’ll set the latter up as part of a test case anyway – in that case, feel free to ignore me!).

@mscdex
Copy link
Contributor

mscdex commented Dec 6, 2016

@phestermcs Just rebase, editing the message of the commit containing the '@ mention' and then force push to your PR branch once the rebase is complete.

@ghost ghost force-pushed the enh-adj-nm branch from 84b8800 to 51757d3 Compare December 6, 2016 02:07
@isaacs
Copy link
Contributor

isaacs commented Dec 6, 2016

(As a point of process, would it be ok to ask that we hold off on reviewing syntax until after substantive review takes place? "No" is an ok answer, but it does make it a little harder to dig into the meat.)

I'd like to focus on two things first: "adjacent node_modules", and the changes to the main module handling.

// when preserving symlinks, if the main file is both a file symlink AND
// has execute permission (on windows, its extension is in PATHEXT), then
// its target is simply read and resolved, potentially relative to the
// symlink's dirname. in all other cases the original path is preserved.

I don't understand why it only does the special behavior if the main file is executable. This will be very rare on Windows (since it doesn't support shebang files, .JS is not in PATHEXT by default, and it's rare to name a Node program something like program.cmd if it's JavaScript).

It seems like this is perhaps a nod to npm's idiosyncratic use of symlinked bin files? If that's true, though, it'd be better to avoid the special case, and build something that works in general without requiring a special loophole for npm.

It seems extremely surprising to me that the module environment of running node foo.js will be changed after doing chmod 0755 foo.js.

Second:

dependencies can be located like:
/node_modules
  /mod
    /node_modules
      /dep
practically speaking, this prevents package managers from
exploiting symlinks to a machine module store. but by adding
and additional search path:
/node_modules
  /mod
  /mod.node_modules
    /dep
modules can be symlinked to a machine store but still allow
local trees to have dependency versions specific to the tree

I don't think it's true that the existing node_modules folder prevents package managers from exploiting symlinks to a machine module store.

For example, consider the following folder structure:

.store
├── app
│   └── 2.2.2
│       ├── index.js
│       └── node_modules
│           ├── baz -> ../../../baz/1.2.5
│           └── foo -> ../../../foo/1.2.3
├── bar
│   ├── 2.3.4
│   │   └── index.js
│   └── 9.8.7
│       └── index.js
├── baz
│   └── 1.2.5
│       ├── index.js
│       └── node_modules
│           └── bar -> ../../../bar/9.8.7
└── foo
    └── 1.2.3
        ├── index.js
        └── node_modules
            └── bar -> ../../../bar/2.3.4

In this case, you can even have conflicting versions of meta-dependencies in play. Note that the dep graph looks like:

app
├── baz
│   └── bar @ 9.8.7
└── foo
    └── bar @ 1.2.3

We do this naively in npm 2.x and before using nested folders, and somewhat less naively in npm 3 (still using nested folders for conflicts only), but there's still exactly one copy of everything in this module store, and no duplication ever. (This is essentially a very stripped down version of what ied and pnpm are doing, with many edge cases left out for clarity.)

Note that this only works precisely because we resolve packages to their realpath locations. If we wanted to link .store/app/2.2.2/bin.js to /usr/local/bin/app, it'd work just fine but only because we're resolving symbolic links to their realpaths, 100% of the time.

I suggested a few other ways to address the --resolve-symlinks goal of peer dep resolution, but alas it was likely too little and too late. Feel free to pilfer patches from https://github.com/isaacs/node6-module-system-change though it's unlikely they'll land cleanly now.

I'd personally recommend not adding foo+node_modules to the module lookup path. The last thing we need is more complexity there, and I don't really see what it buys us above what nested node_module folders provide.

@zkochan
Copy link

zkochan commented Dec 6, 2016

@isaacs when using a machine store with node_modules inside the store, a package will always have the same dependency resolution per machine. What if 2 packages require the exact same package but have different dependency lockfiles? @phestermcs has opened my eyes in this thread 😄. Here's a good explanation from him:

Having looked at alexanderGugel/ied#188,

Say three months ago I was working on prjA that used modA@v1, that had a dependency on depB. When I installed prjA's dependencies for the first time, depB came in as depB@v1, and I created a lock file, locking depB@v1 for prjA.

A week ago another dev started prjB, that also used modA@v1, but when they installed, depB came in at [email protected]. They created a lock file.

Today I need to work on prjB. Assuming modA@v1 was in my machine store, it would sill be using depB@v1, so now what happens? I can update it so it use [email protected], but then prjA's lock file is no longer being honored.

Can you see the dilemma? adjacent.node_modules fixes all of that so simply... I hope you guys take a look and see the value, then get vocal with nodejs :)

@sam-github
Copy link
Contributor

It seems extremely surprising to me that the module environment of running node foo.js will be changed after doing chmod 0755 foo.js

extremely surprising is an understatement

The current implementation, by always converting "main.js" to its realpath, does this very thing already by 'accident', and is why the global npm executable file-symlink works. It's just by luck the target of the link itself does not usually have symlinks in its path. If it did, the current implementation would fail, but the behavior in this PR would work as expected. There are several tests around this in the test-project I created for this whole effort.

Your explanation does not have any obvious relationship to the comment... what does calling realpath have do with executable mode?

@isaacs
Copy link
Contributor

isaacs commented Dec 6, 2016

@zkochan

@isaacs when using a machine store with node_modules inside the store, a package will always have the same dependency resolution per machine. What if 2 packages require the exact same package but have different dependency lockfiles?

That situation could be addressed using any of the approaches listed here, with a minimal change to the module system or current behavior, and without the hazards introduced by the --resolve-symlinks switch.

@isaacs
Copy link
Contributor

isaacs commented Dec 6, 2016

(As a point of process, would it be ok to ask that we hold off on reviewing syntax until after substantive review takes place? "No" is an ok answer, but it does make it a little harder to dig into the meat.)
I'm not sure what the source of this is? The code style changes @mscdex recommended, I implemented immediately.

I was suggesting that mscdex and others hold off on talking about code style until we're done with the substantive discussion, because it makes it harder to read the code in the online diffs on github.

@phestermcs I'm not complaining about you. Please try your best to not take these things personally. This process takes time and requires diligence and patience, especially when considering a change to the most critical part of node's system. Expect that 90% or more of proposed changes to the module system are rejected because they have some unexpected downsides, and of that 10% that get through, 90% of those turn out to be unfortunate mistakes.

So, without suggesting a solution yet, the issue is that you want to have to projects A and B, which both symlink a dependency foo from a central store. But, they wish to use different versions of foo's dependencies, which are potentially also linked from that central store.

I see how the adjacent node modules folder can do that, but it would also be done without any command line flags or switches, and with less semantic change to the module system semantics, using this approach: https://github.com/isaacs/node6-module-system-change#use-parent-paths-secondary-introspectable-proposal I'm not attached to that proposal, per se, I'm just pointing out that there exist smaller ways to fix the problem being identified. I think that the trade-offs of all of these solutions are not yet adequately explored, and there is thus more work to do.

My concern about the "pitchy" language of the issue and PR description is that it demonstrates a very enthusiastic attachment to a specific solution. Until we have explored a few different solutions to the problem space, I recommend no changes be made to the node module system. It is clearly good enough for millions of people to be successful with it today, and any change that could disrupt the interaction of a quarter of all developers in the world should be very carefully considered. We must tread very carefully, and diligently comb through every change in a disciplined and critical manner.

@sam-github
Copy link
Contributor

You can't even file mode bits on a symlink in Linux, I guess there you are relying on the fact its always reported as executable? Either way, the resolution of __dirname shouldn't depend on the perceived executability of the symlink, it sounds like there is an apple-specific special case being built in.

I'm still -1 on this, and that its so subtle its almost impossible to describe what its purpose is makes me -more-than-1, node's module resolution is sufficiently complex enough, I'm not eager for more complex debugging and support.

@sam-github
Copy link
Contributor

you most certainly can get the mode bits from a file symlink

Yes, you can get those bits.

And they are always the same for every single symlink, on almost all systems, making them a poor input to conditional behaviour in the module system.

On Linux, they are always set, on Windows they are always clear, and on OS X they can be set with lchmod() (which doesn't exist on most Unixes).

@sam-github
Copy link
Contributor

Then you aren't looking at the permissions of the symlink, you are looking at the permissions of the file it is pointing to.

@jasnell
Copy link
Member

jasnell commented Dec 6, 2016

@isaacs @phestermcs and others... I have opened nodejs/node-eps#46 in the nodejs/node-eps repository as a home for discussing the larger architectural issue here. The EPS repository is the right place to discuss these larger issues. Let's leave this PR thread to discuss the code in this particular PR only. The goal here is to keep the discussion organized and productive and those of us who want to spend some time with our hands in the code to understand the concrete changes being proposed will have a more difficult time doing so if the bulk of the discussion happens here.

@sam-github
Copy link
Contributor

> fs.writeFileSync('a-file', 'hi')
undefined
> fs.symlinkSync('a-file', 'a-link')
undefined
> fs.lstatSync('a-link').mode
41471
> fs.lstatSync('a-file').mode
33204
> fs.unlinkSync('a-file')
undefined
> fs.lstatSync('a-link').mode
41471

All mode bits are set for symlinks on Linux, and cannot be changed.

@ghost
Copy link
Author

ghost commented Dec 6, 2016

@sam-github I am in error.

@sam-github
Copy link
Contributor

I'm not interested in personalities or fault. This minor detail in caught my eye, but doesn't change my overall impression, which I've stated previously, multiple times.

@sam-github
Copy link
Contributor

Do you have a github issue ref for that issue? Or is it an issue only in this PR, which I am -1 on, so don't have any suggestions for fixing?

@ghost ghost force-pushed the enh-adj-nm branch 2 times, most recently from ec1e088 to 21fb86e Compare December 6, 2016 20:57
remove check of execute bit for file-symlink as
it's always set. if file-symlink, just follow its target
@ghost ghost force-pushed the enh-adj-nm branch from 21fb86e to 3b7fca6 Compare December 8, 2016 21:29
@ghost
Copy link
Author

ghost commented Dec 11, 2016

I dont belong here :) @isaacs has got things covered

@ghost ghost closed this Dec 11, 2016
@ghost ghost deleted the enh-adj-nm branch December 11, 2016 00:57
@ghost
Copy link
Author

ghost commented Dec 11, 2016

ftr. i removed several comments as they were irrelevant to the issue and inappropriate.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. module Issues and PRs related to the module subsystem. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants