
Announcer 'epochs' don't match actual running cadence well #519

Closed
lukebjerring opened this issue Sep 5, 2018 · 13 comments

@lukebjerring (Contributor)

Currently, Safari Tech Preview is ~9.5h, Chrome is ~5h, etc. We are seeing a mismatch of SHA selections because of the time of day we're requesting 8h / 4h SHAs.

@lukebjerring (Contributor, Author)

@jugglinmike - can you elaborate on how this is manifesting?
Also, as an FYI for the reader: what adjustment would the callers (runners) need in order to correct the issue?

@lukebjerring (Contributor, Author)

@foolip said he originally suggested 24 / 8 / 4 / 2 / 1.
I'd suggest that 24 / 12 / 6 / 3 / 1.5 would be better, since each epoch is exactly twice the next (whereas 24 / 8 is a ratio of 3).
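The "multiples of 2" point can be made concrete with a quick sketch (plain Python; the function name is mine) comparing the consecutive ratios of the two epoch ladders:

```python
def ratios(ladder):
    """Ratio between each epoch and the next-shorter one (epochs in hours)."""
    return [longer / shorter for longer, shorter in zip(ladder, ladder[1:])]

print(ratios([24, 8, 4, 2, 1]))     # [3.0, 2.0, 2.0, 2.0]
print(ratios([24, 12, 6, 3, 1.5]))  # [2.0, 2.0, 2.0, 2.0]
```

With a uniform ratio of 2, moving from any epoch to the next-slower one always halves the announcement rate in the same predictable way; the 24 / 8 step breaks that pattern.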

@Hexcles (Member) commented Sep 5, 2018

@lukebjerring I can see the suggested alternative has an advantage because 12 > 9.5 and 6 > 5, but why does it matter that they are all multiples of 2?

@jugglinmike (Contributor)

The issue is apparent in the most recent results, where we missed alignment (see Safari Technology Preview) despite following the advice of the revision announcer:

[Screenshot from 2018-09-05: recent results showing the missed alignment for Safari Technology Preview]

We're capable of running Firefox and Chrome every 6 hours, and we can run Safari Technology Preview (STP) once every 12 hours. This doesn't map directly to the available cadences, so we choose "4 hourly" for Firefox and Chrome and "8 hourly" for STP. We expect to miss some announcements, but the alternative is collecting fewer results and under-utilizing our resources.

It's been a while since I got to play with ASCII, so here it is graphically:

          00:00       06:00       12:00       18:00       00:00
8 hourly    A               C               E               G
4 hourly    A       B       C       D       E       F       G
            ^           ^           ^           ^           ^
         Chrome:A    Chrome:B    Chrome:D    Chrome:E    Chrome:G
         Firefox:A   Firefox:B   Firefox:D   Firefox:E   Firefox:G
           STP:A                   STP:C                   STP:G

One issue is that twice a day, we query the revision announcer at the moment of change. That's clearly a race condition: depending on network timing and clock drift, we could receive a stale revision (either "C" or "D"). This is worse at midnight UTC because STP is managed by a Buildbot "master" running on an independent machine. It's possible that the 00:00 build for Firefox and Chrome received the previous day's revision and that the same build for STP received the current day's revision.

This will be an issue no matter what cadences are available. To avoid it, the results collection project should query a few minutes past the hour. We can track that in a separate issue: web-platform-tests/results-collection#600
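For reference, "a few minutes past the hour" could look like this in a crontab (the offset and command path are entirely hypothetical):

```
# Hypothetical crontab entry: trigger collection 5 minutes after each
# 4-hour epoch boundary instead of exactly on it, dodging the race.
5 0,4,8,12,16,20 * * *  /usr/local/bin/collect-results
```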

That said, we're probably experiencing a much more common problem here. At UTC noontime, the revision announcer is supplying STP with commit "C". That commit was only available on the "4 hourly" track since 08:00, so the previous Firefox and Chrome collections missed it entirely.
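The noontime mismatch falls out of simple bucket arithmetic. A sketch (the function name and the UTC-midnight alignment convention are my assumptions, not the announcer's actual code):

```python
from datetime import datetime, timedelta, timezone

def epoch_start(t, hours):
    """Start of the epoch bucket containing t, with buckets aligned to UTC midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    bucket = int((t - midnight).total_seconds() // (hours * 3600))
    return midnight + timedelta(hours=hours * bucket)

noon = datetime(2018, 9, 5, 12, 0, tzinfo=timezone.utc)
print(epoch_start(noon, 8).hour)  # 8  -> STP's noon query maps to the 08:00 bucket ("C")
print(epoch_start(noon, 4).hour)  # 12 -> Chrome/Firefox's noon bucket ("D"), so "C" is skipped
```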

@mdittmer asked about this while reviewing the results-collection/revision-announcer integration:

  • Is there a plan to ensure that longer intervals are hit in case these go down at a bad time? (E.g., suppose builders owned by another vendor update weekly and our daily bot goes down Sunday - Tuesday; we may miss a whole week of hash aligned runs.)

I was only considering this concern as a recovery mechanism for a (hopefully) rare failure. My response:

Nope. That seems like a fine idea, but maintaining that state will require a fair amount of planning and new code.

However, it's become clear that this is a regular concern. That certainly changes my priorities, and it also means we could resolve it in a stateless way. The Chrome and Firefox builds could switch to the "8 hourly" epoch just for UTC noontime, intentionally skipping "D" in favor of "C". Doing that would address the alignment issue without any change to the revision announcer.
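The stateless workaround could be as small as a time-of-day check in the runner configuration. A hypothetical sketch (runner names and epoch labels are mine):

```python
def epoch_for(runner, utc_hour):
    """Pick the announcer epoch for a runner. Chrome/Firefox drop to
    "8 hourly" at UTC noon so they pick up "C" alongside STP instead of "D"."""
    default = {"chrome": "4 hourly", "firefox": "4 hourly", "stp": "8 hourly"}
    if runner in ("chrome", "firefox") and utc_hour == 12:
        return "8 hourly"
    return default[runner]

print(epoch_for("chrome", 12))  # 8 hourly
print(epoch_for("chrome", 16))  # 4 hourly
```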

Alternatively, we could also address it by changing the available epochs as @lukebjerring suggested above. We might want to find more motivation for that solution, though. Fundamentally, I don't think "12/6/3" is any better than "8/4/2".

@foolip (Member) commented Sep 6, 2018

Fundamentally, I don't think "12/6/3" is any better than "8/4/2"

I'd say "We're capable of running Firefox and Chrome every 6 hours, and we can run Safari Technology Preview (STP) once every 12 hours" is the justification here :)

@mdittmer (Contributor) commented Sep 6, 2018

We're capable of running Firefox and Chrome every 6 hours, and we can run Safari Technology Preview (STP) once every 12 hours. This doesn't map directly to the available cadences, so we choose "4 hourly" for Firefox and Chrome and "8 hourly" for STP.

#522 just landed, so we should be able to move to 6 and 12 hours for Firefox+Chrome and Safari, respectively. Please note the deprecation warnings that have been added to the 4- and 8-hour cadences.

In the meantime we should probably fix the runners by using cadences that are longer than expected run times. IIUC, Firefox+Chrome should be on an 8-hour cadence because they cannot achieve a 4-hour deadline, and Safari should be on a daily cadence because it cannot achieve an 8-hour deadline. Using cadence-locked runners will only give us alignment if all cadences are multiples of each other and all runners meet their deadlines.
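The alignment condition stated above (all cadences multiples of each other) can be checked mechanically. A small sketch of mine, not project code:

```python
def cadences_nest(hours):
    """True if each cadence divides the next-longer one, so every boundary
    of a longer cadence coincides with a boundary of each shorter one.
    (Divisibility is transitive, so checking adjacent pairs suffices.)"""
    ordered = sorted(hours)
    return all(longer % shorter == 0
               for shorter, longer in zip(ordered, ordered[1:]))

print(cadences_nest([6, 12, 24]))  # True
print(cadences_nest([6, 8, 24]))   # False: 8 is not a multiple of 6
```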

@jugglinmike (Contributor)

In that case, I would also want to slow the build schedule and scale back the infrastructure (otherwise, we'd end up repeatedly testing the same revision or under-utilizing resources). That's a little extra work, though, so it might not be worth it depending on when the new code will be available in production. Do you know when we'll deploy next?

@mdittmer (Contributor) commented Sep 6, 2018

Just deployed! https://wpt.fyi/api/revisions/latest

@foolip (Member) commented Sep 6, 2018

I would also want to slow the build schedule and scale back the infrastructure (otherwise, we'd end up repeatedly testing the same revision or under-utilizing resources)

I think you've seen this already, but if you have a look at https://staging.wpt.fyi/test-runs you'll see some SHAs that are tested many times, with ba4921d holding the record. That's from the pre-revision-announcer 6-hour cadence, I believe, and it happens especially during weekends with no changes to wpt.

It'd be nice to fix this, presumably by doing nothing if the sha hasn't changed since last?

@mdittmer (Contributor) commented Sep 6, 2018

It'd be nice to fix this, presumably by doing nothing if the sha hasn't changed since last?

Another thing that would help with wasted resources, which @lukebjerring and I were discussing today, is adding the bounds (or "bucket") of the associated epoch to the part of the payload that contains the hash and commit time. This would tell clients when the current epoch started and ends, allowing a client that is pinned to a particular epoch (as the Buildbot runners are) to sleep until the epoch changes. WDYT of this idea, @jugglinmike? Would you make use of it?
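As a sketch of how a pinned client might use the proposed epoch bounds (the "epoch_end" field name and the timestamps are purely hypothetical):

```python
def seconds_until_next_epoch(payload, now):
    """Given an announcer payload carrying the current epoch's end
    (hypothetical "epoch_end" field, in Unix seconds), return how long
    an epoch-pinned client could sleep before re-querying."""
    return max(0.0, payload["epoch_end"] - now)

payload = {"hash": "ba4921d", "epoch_end": 1536152400}  # made-up timestamp
print(seconds_until_next_epoch(payload, 1536148800))    # 3600.0 (one hour left)
```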

jugglinmike added a commit to web-platform-tests/results-collection that referenced this issue Sep 7, 2018
@jugglinmike (Contributor)

Just deployed! https://wpt.fyi/api/revisions/latest

Thanks! Incorporated downstream: web-platform-tests/results-collection@964fd46

WDYT of this idea, @jugglinmike? Would you make use of it?

Since clients are aware of the epoch they have selected, they can use the commit_time to determine if the announced revision is "fresh." If I understand the idea of a "bucket" correctly, the feature would simplify the calculation a bit, but it would not enable (or drastically simplify) the behavior @foolip has requested.
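The commit_time check described above might look like this on the client side (a sketch under my own assumptions about types; not the collector's actual code):

```python
from datetime import datetime, timedelta, timezone

def is_fresh(commit_time, epoch_hours, now):
    """An announced revision is "fresh" for an epoch-pinned client if it
    landed within the most recent epoch interval."""
    return now - commit_time < timedelta(hours=epoch_hours)

now = datetime(2018, 9, 7, 12, 0, tzinfo=timezone.utc)
stale = datetime(2018, 9, 6, 9, 0, tzinfo=timezone.utc)   # 27h old
print(is_fresh(stale, 24, now))                                            # False
print(is_fresh(datetime(2018, 9, 7, 10, 0, tzinfo=timezone.utc), 4, now))  # True
```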

A more circumscribed solution would be a query string parameter like ?fresh. When present, the announcer could emit nulls during lulls (not sorry) in WPT.

In either case, though, it doesn't seem like we're meaningfully reducing complexity in the client, so it might not be worth expanding the revision announcer's API surface.

@jugglinmike (Contributor)

It'd be nice to fix this, presumably by doing nothing if the sha hasn't changed since last?

I've filed an issue against the results collector to track that: web-platform-tests/results-collection#601

@foolip (Member) commented Sep 8, 2018

Fixed in #522, thanks @mdittmer!

@foolip closed this as completed on Sep 8, 2018