Speed up tasks by only running on changed files #50

sindresorhus · 2013-01-27T18:40:56Z (edited by satazor)

I don't know the best way to implement it, but having been involved in grunt early on, I know we got that request a lot.

It makes sense too. It would be way faster to only execute on a changed file instead of everything.

Let's say you have a Sass and a CoffeeScript task. You run Automaton, and it takes some time. You then change one of the CoffeeScript files and run Automaton again. It now recompiles all Sass and CoffeeScript files for this one little change. This is extremely inefficient.

What it should have done instead, somehow, is recompile only the changed CoffeeScript file.

This is a fairly common pattern, so it shouldn't require a lot of boilerplate in the task.

Related grunt issue: gruntjs/grunt#212


marcooliveira · 2013-01-28T23:24:19Z

I see what you mean. This shouldn't be that hard to achieve.

Off the top of my head, and as long as files can be processed independently (i.e. without side effects across files, like multiple files being merged into a single file, which would become invalid if only one of them changed), we could leverage what is being discussed in #46.

When a list of files specified through an expression is expanded, we would store each file's last modification timestamp in a meta file, which the next expansion would use to filter out files that haven't changed since then. Enabling this sort of filtering could be done with something like src/**/*.js,!*.min.js,!stale.

I should also note that I'm not all that crazy about the sort of static exclusion rules I'm suggesting. I think it's fine to provide a few useful ones, but ultimately this should be as simple as declaring function (v) { if (v /* is an acceptable value */) return true; else return false; } and passing it as some sort of exclusion rule to minimatch, but this is a discussion for #46.
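A minimal sketch of the timestamp idea, assuming the meta file holds a JSON object that maps each path to the modification time (in milliseconds) recorded on the previous expansion. The function name `changedSince` and the manifest shape are hypothetical, not part of automaton or minimatch. The staleness check itself is written as a plain predicate, in the spirit of the function (v) exclusion rule above.

```javascript
// Hypothetical sketch, not an actual micromatch/minimatch API.
// `previous`: manifest read from the meta file, { [path]: mtimeMs } from the last run.
// `current`: modification times observed during this expansion, same shape.
function changedSince(previous, current) {
  // The staleness check as a plain predicate: keep a file if it is new
  // or its timestamp differs from the one recorded last time.
  const isChanged = (file) =>
    !Object.prototype.hasOwnProperty.call(previous, file) ||
    previous[file] !== current[file];

  return {
    changed: Object.keys(current).filter(isChanged),
    // What to write back to the meta file for the next expansion.
    manifest: { ...current }
  };
}

// Only b.coffee was modified since the last run, and c.coffee is new.
const previous = { 'a.coffee': 1000, 'b.coffee': 2000 };
const current = { 'a.coffee': 1000, 'b.coffee': 2500, 'c.coffee': 3000 };
const { changed, manifest } = changedSince(previous, current);
console.log(changed); // [ 'b.coffee', 'c.coffee' ]
```

In a real task, `current` could be filled with `fs.statSync(file).mtimeMs` for each expanded path and the returned manifest saved as JSON. Because the new manifest is built from `current`, files deleted since the last run simply drop out of it.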

With all that said, I like your suggestion, and I think a lot of people will find it useful. 👍

marcooliveira · 2013-01-31T03:32:28Z

Well, @satazor and I had a long talk today about several things in automaton, and we came up with a possible strategy for doing this, and much more.

In fact, when we talk about only running something on files that have changed, what we're actually saying is that we only want the task to run on the set of files that match some criteria. Those criteria can vary a lot depending on the use case, so we needed a way to give the user that flexibility.

This is where micromatch comes into play. We thought about creating a superset of minimatch with enhanced exclusion features. As an example, consider the expression *.js,!stale(id:minify). Here's a breakdown of what it means (I'll skip the *.js):

- !stale refers to excluding files using the stale filter (the stale filter being the filter for files that have not changed).
- Filters are functions that receive the list of files and do whatever they need internally to decide whether each file should be included or not.
- Filters can receive a set of options that modify their behaviour, in this case id: minify. The id identifies the expansion (and its respective metadata file with timestamps), so that future expansions with the same id can check whether a file changed since the last run. The id allows the filter to keep, for example, an expansion for minification purposes separate from another for uglification. As long as the user provides different ids, they won't affect each other.
- micromatch will come with a few typical filters built in, like stale.
- micromatch allows users to create their own filters by registering a new filter from the filter name and a closure:

```javascript
micromatch.addFilter('my_filter', function (opt) {
    // options are the attributes between parentheses in the expression
    // (in the example above, this would receive { id: 'minify' })

    return function (files, next) {
        // remove files using whatever criteria
        next(null, filtered_files);
    };
});
```

The example above would allow me to do something like *.js,!my_filter.
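To make the expression syntax more concrete, here is a rough sketch of how a filter registry and an evaluator for expressions such as *.js,!stale(id:minify) might fit together. Everything here is hypothetical: `addFilter` only mirrors the proposed call above, `splitTopLevel`, `parseFilter` and `run` are invented names, option values are kept as strings, and plain patterns are delegated to a caller-supplied `expand` function instead of real glob matching.

```javascript
// Hypothetical sketch of a filter registry and expression evaluator.
const filters = new Map();

// factory(options) must return a function (files, next).
function addFilter(name, factory) {
  filters.set(name, factory);
}

// Split at top-level commas only, so "!f(a:1,b:2)" stays one part.
function splitTopLevel(expr) {
  const parts = [];
  let depth = 0;
  let part = '';
  for (const ch of expr) {
    if (ch === '(') depth++;
    if (ch === ')') depth--;
    if (ch === ',' && depth === 0) {
      parts.push(part.trim());
      part = '';
    } else {
      part += ch;
    }
  }
  if (part.trim()) parts.push(part.trim());
  return parts;
}

// "!stale(id:minify)" -> { name: 'stale', options: { id: 'minify' } }.
// Returns null for anything that is not a registered filter, e.g. "!*.min.js".
function parseFilter(part) {
  const m = /^!(\w+)(?:\((.*)\))?$/.exec(part);
  if (!m || !filters.has(m[1])) return null;
  const options = {};
  for (const pair of (m[2] || '').split(',').filter(Boolean)) {
    const [key, ...rest] = pair.split(':');
    options[key.trim()] = rest.join(':').trim();
  }
  return { name: m[1], options };
}

// Plain patterns go to `expand` (a real glob implementation in practice);
// filters then run in order, each receiving the previous filter's output.
function run(expr, expand, done) {
  const patterns = [];
  const steps = [];
  for (const part of splitTopLevel(expr)) {
    const filter = parseFilter(part);
    if (filter) steps.push(filters.get(filter.name)(filter.options));
    else patterns.push(part);
  }
  let files = expand(patterns);
  let i = 0;
  (function next(err, result) {
    if (err) return done(err);
    if (result) files = result;
    if (i === steps.length) return done(null, files);
    steps[i++](files, next);
  })();
}

// Usage: a filter that drops files whose path contains opt.skip.
addFilter('my_filter', function (opt) {
  return function (files, next) {
    next(null, files.filter((f) => !f.includes(opt.skip)));
  };
});

// Stand-in for real globbing: only understands "*.ext" against a fixed list.
const all = ['a.js', 'b.js', 'vendor.js', 'style.css'];
const expand = (patterns) =>
  all.filter((f) => patterns.some((p) => p.startsWith('*.') && f.endsWith(p.slice(1))));

run('*.js,!my_filter(skip:vendor)', expand, (err, files) => {
  console.log(files); // [ 'a.js', 'b.js' ]
});
```

The depth counter exists because filter options may themselves contain commas. A negated glob such as !*.min.js does not name a registered filter, so it is passed through to `expand` as an ordinary exclusion pattern.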

As for distributing these filters in the community, which we can support later, we can leverage the npm registry. The strategy we discussed involves the following:

- Filters are automatically fetched from npm using a prefix, something like micromatch-filter-.
- If the user wants a specific version of a filter, they can just write *.js,!some_filter:0.2.x, which would fetch the latest 0.2.x version of micromatch-filter-some_filter.
- micromatch would not install these filters into node_modules; instead it would use an internal cache laid out as .cache/filter_name/version/actual_module.
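The naming and cache layout in this list can be sketched as a small resolver, assuming a filter spec of the form name or name:range (as in some_filter:0.2.x). The micromatch-filter- prefix and the .cache/filter_name/version/actual_module layout come from the proposal; `resolveFilterSpec`, `cachePath`, the 'latest' default and the assumption that actual_module is the package directory are all hypothetical. The sketch does not contact npm.

```javascript
const path = require('path');

// Prefix from the proposal; everything else here is a hypothetical sketch.
const PREFIX = 'micromatch-filter-';

// "some_filter:0.2.x" -> npm package name plus requested version range.
// A spec without ":" is assumed to mean the latest version.
function resolveFilterSpec(spec) {
  const [name, range = 'latest'] = spec.split(':');
  return { name, packageName: PREFIX + name, range };
}

// Cache layout from the proposal: .cache/filter_name/version/actual_module.
// `version` is the concrete version chosen for the range (for example
// "0.2.3"); a real implementation would obtain it from the npm registry.
function cachePath(root, name, version) {
  return path.join(root, '.cache', name, version, PREFIX + name);
}

const spec = resolveFilterSpec('some_filter:0.2.x');
console.log(spec.packageName); // micromatch-filter-some_filter
console.log(spec.range); // 0.2.x
console.log(cachePath('/project', spec.name, '0.2.3'));
// on POSIX: /project/.cache/some_filter/0.2.3/micromatch-filter-some_filter
```

Keying the cache by concrete version rather than by range lets two expressions asking for 0.2.x and 0.3.x coexist without touching node_modules.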

By the way, sorry about the long post; I just wanted to be thorough.

Tell me what you think! Keep in mind that even though it might look like a complex solution at first glance, once you take away the internals of how everything works, these features are pretty simple for the user.

CC: @satazor @sindresorhus

satazor · 2013-02-02T21:03:06Z

Below I will describe another solution to the problem. In tasks that have a files option, we decided that keys are sources and values are destinations. Grunt does the opposite, which we find counter-intuitive. The solution I will present needs files to be specified the way grunt does it, with destinations as keys and sources as values.

```javascript
var lib = require('automaton-lib');

// ... then, when calling a task
options: {
  files: {
    'tmp/src/': 'src/**/*',
    'tmp/lib': ['lib/**/*', lib.file.excludeStale('minify')]
  }
}
```

lib.file.excludeStale('minify') would return a filter function that filters the files matched by the preceding patterns, just like the filters in @marcooliveira's proposal:

```javascript
lib.file.excludeStale = function (id) {
  return function (files, next) {
    // Implementation should return only changed files for the `id` group/identifier.
    next(null, filtered_files);
  };
};
```
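One way the excludeStale body might be filled in, as a minimal sketch: it keeps one timestamp table per id, so that a 'minify' run and, say, a 'lint' run do not interfere, and it reads modification times with Node's `fs.statSync(file).mtimeMs`. The in-memory `stores` object is an assumption standing in for the meta file, and `next` is called synchronously to keep the sketch short.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Timestamps per id: { [id]: { [path]: mtimeMs } }. Kept in memory here;
// a real version would load and save it as a meta file between runs.
const stores = {};

// Hypothetical body for lib.file.excludeStale: keeps only files whose
// modification time differs from the one recorded for the same id.
function excludeStale(id) {
  return function (files, next) {
    const seen = stores[id] || (stores[id] = {});
    let changed;
    try {
      changed = files.filter((file) => {
        const mtime = fs.statSync(file).mtimeMs;
        const isChanged = seen[file] !== mtime;
        seen[file] = mtime; // recorded now, i.e. before the task has run
        return isChanged;
      });
    } catch (err) {
      return next(err); // e.g. ENOENT if a matched file disappeared
    }
    next(null, changed); // called synchronously in this sketch
  };
}

// Usage with a throwaway file in the OS temp directory.
const file = path.join(os.tmpdir(), 'exclude-stale-demo.js');
fs.writeFileSync(file, '1');
excludeStale('minify')([file], (err, files) => console.log(files.length)); // 1: first run
excludeStale('minify')([file], (err, files) => console.log(files.length)); // 0: unchanged
excludeStale('lint')([file], (err, files) => console.log(files.length)); // 1: separate id
```

Recording the timestamp when the filter runs means that if the task fails afterwards, those files would not be retried next time; saving the table only after the task succeeds avoids that.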

Advantages of this approach:

- Easier to implement
- Easier to understand (at least for me, from a JavaScript dev's point of view)
- No problems with versions, since require is used to load whatever library provides the filter
- No problems with filter name conflicts, for the same reason
- More similar to grunt regarding the files option, making it easier for users to understand

Disadvantages:

- No direct way to pass values to filters as placeholders (a solution for this still needs to be discussed)

Also note that ! could be supported internally to exclude files by pattern, so it would not need to be a filter:

```javascript
options: {
  files: {
    'tmp/src/': ['src/**/*', '!test/']
  }
}
```
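A rough sketch of how a task could resolve such a files object, with destinations as keys and an ordered list of sources as values: plain patterns add matches, ! patterns remove them, and functions act as filters with the same (files, next) signature as excludeStale. To stay self-contained it matches against a given list of candidate paths and uses a tiny glob-to-RegExp converter supporting only `*`, `**` and `?`; a real implementation would use minimatch or node-fileset instead. `resolveFiles` and `globToRegExp` are hypothetical names.

```javascript
// Minimal glob -> RegExp for "*", "**" and "?" only; a stand-in for minimatch.
function globToRegExp(glob) {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      if (glob[i + 2] === '/') {
        re += '(?:.*/)?'; // "**/" matches zero or more directories
        i += 2;
      } else {
        re += '.*'; // trailing "**" matches anything
        i += 1;
      }
    } else if (ch === '*') {
      re += '[^/]*';
    } else if (ch === '?') {
      re += '[^/]';
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp('^' + re + '$');
}

// Hypothetical resolver for { dest: source | [sources] }. Sources are applied
// in order: patterns add matches, "!" patterns remove them, and functions are
// filters with the (files, next) signature.
function resolveFiles(files, candidates, done) {
  const result = {};
  const dests = Object.keys(files);
  (function nextDest(d) {
    if (d === dests.length) return done(null, result);
    const dest = dests[d];
    const sources = [].concat(files[dest]);
    let matched = [];
    (function nextSource(s) {
      if (s === sources.length) {
        result[dest] = matched;
        return nextDest(d + 1);
      }
      const src = sources[s];
      if (typeof src === 'function') {
        return src(matched, (err, filtered) => {
          if (err) return done(err);
          matched = filtered;
          nextSource(s + 1);
        });
      }
      if (src.startsWith('!')) {
        const re = globToRegExp(src.slice(1));
        matched = matched.filter((f) => !re.test(f));
      } else {
        const re = globToRegExp(src);
        for (const f of candidates) {
          if (re.test(f) && !matched.includes(f)) matched.push(f);
        }
      }
      nextSource(s + 1);
    })(0);
  })(0);
}

const candidates = ['src/app.js', 'src/app.min.js', 'src/test/app.spec.js', 'lib/util.js'];
const onlyUtil = (files, next) => next(null, files.filter((f) => f.endsWith('util.js')));

resolveFiles({
  'tmp/src/': ['src/**/*', '!src/**/*.min.js', '!src/test/**'],
  'tmp/lib': ['lib/**/*', onlyUtil]
}, candidates, (err, out) => console.log(out));
// { 'tmp/src/': [ 'src/app.js' ], 'tmp/lib': [ 'lib/util.js' ] }
```

Processing sources strictly in order is what lets a filter such as excludeStale see only the files that survived the earlier patterns and exclusions.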

Feedback on both proposals needed!

millermedeiros · 2013-02-02T21:19:53Z

I would try to implement it as a separate lib: this feature is useful outside automaton, and not all tasks will support or need it.

marcooliveira · 2013-02-02T23:13:02Z

I also like that solution, and I think both solutions have their advantages and disadvantages.

There is something, though, that should also be pointed out as an advantage of the first solution: its ability to specify a rich pattern consisting of both inclusion and exclusion statements right in the string. This is a big advantage, since it can be used anywhere a string is allowed (i.e. object keys). It also makes micromatch a potential drop-in replacement for minimatch, which would bring additional value to any tool that opts to switch.

As for versioning and name conflicts, I think they are non-issues, since we already have a solution for them; it just requires implementation.

I guess it mostly comes down to whether the gain is worth the implementation complexity, as both strategies have disadvantages in terms of ease of use and flexibility.

I agree this should be built into a separate lib, which is why we are considering creating it under the name micromatch.

Anyone else have any thoughts on the strategies?

sindresorhus · 2013-02-05T19:21:13Z

I do see the benefits with the first solution, but I think globbing patterns are far too confusing as they are, no need to make them even more so. So I would prefer the second one.

(The ! character is already the negation prefix for excluding files.)

Also, please don't do any magic with fetching filters from npm and putting version numbers in the pattern. This should be explicit, like micromatch.addFilter('my_filter', require('my-filter')), with the dependency added to package.json manually.

How about a third solution, using underscore templates instead? That way you can pass values.

```javascript
micromatch.addFilter('my_filter', function (opt) {
    // options are the attributes between parentheses in the expression
    // (in the example above, this would receive { id: 'minify' })

    return function (files, next) {
        // remove files using whatever criteria
        next(null, filtered_files);
    };
});
```


```javascript
options: {
  files: {
    'tmp/src/': 'src/**/*',
    'tmp/lib': ['lib/**/*', '<% filters.my_filter(val) %>']
  }
}
```

satazor · 2013-02-17T11:24:26Z

@marcooliveira We need to come up with a final decision on this. I'll need to work on micromatch in order to solve #46 and #54. These need to be done to release 0.2.0.

satazor · 2013-02-17T11:26:53Z

Please note that for 0.2.0 we won't add filters yet. So, micromatch will only offer inclusion and exclusion. Having this in mind, micromatch could be a tiny wrapper around https://github.com/mklabs/node-fileset.

satazor · 2013-02-23T16:43:16Z

I've implemented this in: https://github.com/IndigoUnited/node-gloth

Feedback welcome.
It allows hooks to filter or include files, which makes it possible to develop a hook that filters on changed files, as suggested by @sindresorhus and many others.

We now need to decide whether we also want to be able to specify hooks as strings, like:

'some_hook!params_to_pass_to_hook'

or to leave them as functions only.
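If hooks could also be given as strings, the string form might be turned into the function form with a small resolver like the one below. This is a sketch only: the `hooks` registry, `resolveHook`, the example `ext` hook, and the convention that everything after the first ! is passed verbatim as a single string are assumptions, not node-gloth's actual API.

```javascript
// Hypothetical registry: hook name -> factory(params) returning (files, next).
const hooks = {
  // Example hook: keep only files ending with the given extension.
  ext: (params) => (files, next) => next(null, files.filter((f) => f.endsWith(params)))
};

// Accept either a function (used as-is) or a 'some_hook!params' string.
function resolveHook(hook) {
  if (typeof hook === 'function') return hook;
  const bang = hook.indexOf('!');
  const name = bang === -1 ? hook : hook.slice(0, bang);
  const params = bang === -1 ? undefined : hook.slice(bang + 1);
  if (!Object.prototype.hasOwnProperty.call(hooks, name)) {
    throw new Error('Unknown hook: ' + name);
  }
  return hooks[name](params);
}

resolveHook('ext!.js')(['a.js', 'b.css'], (err, files) => console.log(files)); // [ 'a.js' ]
```

The string form works anywhere a string is allowed, such as JSON config, at the cost of a global name registry; the function form needs no registry and gets name and version isolation from require.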

marcooliveira mentioned this issue Jan 31, 2013 in Improve exclusion of files in minimatch #46 (Open)

satazor mentioned this issue Feb 12, 2013 in Files option standardization #54 (Open)
