
Announcer 'epochs' don't match actual running cadence well #519

Closed
lukebjerring opened this issue Sep 5, 2018 · 13 comments

@lukebjerring (Contributor)

Currently, Safari Tech Preview is ~9.5h, Chrome is ~5h, etc. We are seeing a mismatch of SHA selections because of the time of day we're requesting 8h / 4h SHAs.

@lukebjerring (Contributor, Author)

@jugglinmike - can you elaborate on how this is manifesting?
Also, as an FYI for the reader: what adjustment would the callers (runners) need in order to correct the issue?

@lukebjerring (Contributor, Author)

@foolip said he originally suggested 24 / 8 / 4 / 2 / 1.
I'd suggest that 24 / 12 / 6 / 3 / 1.5 would be better, since each epoch is exactly twice the next (whereas 24 / 8 is a ratio of 3).
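The "multiples of 2" point can be made concrete with a quick sketch (plain Python; the function name is mine) comparing the consecutive ratios of the two epoch ladders:

```python
def ratios(ladder):
    """Ratio between each epoch and the next-shorter one (epochs in hours)."""
    return [longer / shorter for longer, shorter in zip(ladder, ladder[1:])]

print(ratios([24, 8, 4, 2, 1]))     # [3.0, 2.0, 2.0, 2.0]
print(ratios([24, 12, 6, 3, 1.5]))  # [2.0, 2.0, 2.0, 2.0]
```

With a uniform ratio of 2, moving from any epoch to the next-slower one always halves the announcement rate in the same predictable way; the 24 / 8 step breaks that pattern.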

@Hexcles (Member) commented Sep 5, 2018

@lukebjerring I can see the suggested alternative has an advantage because 12 > 9.5 and 6 > 5, but why does it matter that they are all multiples of 2?

@jugglinmike (Contributor)

The issue is apparent in the most recent results, where we missed alignment (see Safari Technology Preview) despite following the advice of the revision announcer:

[Screenshot from 2018-09-05: recent results showing the missed alignment for Safari Technology Preview]

We're capable of running Firefox and Chrome every 6 hours, and we can run Safari Technology Preview (STP) once every 12 hours. This doesn't map directly to the available cadences, so we choose "4 hourly" for Firefox and Chrome and "8 hourly" for STP. We expect to miss some announcements, but the alternative is collecting fewer results and under-utilizing our resources.

It's been a while since I got to play with ASCII, so here it is graphically:

          00:00       06:00       12:00       18:00       00:00
8 hourly    A               C               E               G
4 hourly    A       B       C       D       E       F       G
            ^           ^           ^           ^           ^
         Chrome:A    Chrome:B    Chrome:D    Chrome:E    Chrome:G
         Firefox:A   Firefox:B   Firefox:D   Firefox:E   Firefox:G
           STP:A                   STP:C                   STP:G

One issue is that twice a day, we query the revision announcer at the moment of change. That's clearly a race condition: depending on network timing and clock drift, we could receive a stale revision (either "C" or "D"). This is worse at midnight UTC because STP is managed by a Buildbot "master" running on an independent machine. It's possible that the 00:00 build for Firefox and Chrome received the previous day's revision and that the same build for STP received the current day's revision.

This will be an issue no matter what cadences are available. To avoid it, the results collection project should query a few minutes past the hour. We can track that in a separate issue: web-platform-tests/results-collection#600
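For reference, "a few minutes past the hour" could look like this in a crontab (the offset and command path are entirely hypothetical):

```
# Hypothetical crontab entry: trigger collection 5 minutes after each
# 4-hour epoch boundary instead of exactly on it, dodging the race.
5 0,4,8,12,16,20 * * *  /usr/local/bin/collect-results
```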

That said, we're probably experiencing a much more common problem here. At UTC noontime, the revision announcer is supplying STP with commit "C". That commit was only available on the "4 hourly" track since 08:00, so the previous Firefox and Chrome collections missed it entirely.
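The noontime mismatch falls out of simple bucket arithmetic. A sketch (the function name and the UTC-midnight alignment convention are my assumptions, not the announcer's actual code):

```python
from datetime import datetime, timedelta, timezone

def epoch_start(t, hours):
    """Start of the epoch bucket containing t, with buckets aligned to UTC midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    bucket = int((t - midnight).total_seconds() // (hours * 3600))
    return midnight + timedelta(hours=hours * bucket)

noon = datetime(2018, 9, 5, 12, 0, tzinfo=timezone.utc)
print(epoch_start(noon, 8).hour)  # 8  -> STP's noon query maps to the 08:00 bucket ("C")
print(epoch_start(noon, 4).hour)  # 12 -> Chrome/Firefox's noon bucket ("D"), so "C" is skipped
```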

@mdittmer asked about this while reviewing the results-collection/revision-announcer integration:

  • Is there a plan to ensure that longer intervals are hit in case these go down at a bad time? (E.g., suppose builders owned by another vendor update weekly and our daily bot goes down Sunday - Tuesday; we may miss a whole week of hash aligned runs.)

I was only considering this concern as a recovery mechanism for a (hopefully) rare failure. My response:

Nope. That seems like a fine idea, but maintaining that state will require a fair amount of planning and new code.

However, it's become clear that this is a regular concern. That certainly changes my priorities, and it also means we could resolve it in a stateless way. The Chrome and Firefox builds could switch to the "8 hourly" epoch just for UTC noontime, intentionally skipping "D" in favor of "C". Doing that would address the alignment issue without any change to the revision announcer.
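The stateless workaround could be as small as a time-of-day check in the runner configuration. A hypothetical sketch (runner names and epoch labels are mine):

```python
def epoch_for(runner, utc_hour):
    """Pick the announcer epoch for a runner. Chrome/Firefox drop to
    "8 hourly" at UTC noon so they pick up "C" alongside STP instead of "D"."""
    default = {"chrome": "4 hourly", "firefox": "4 hourly", "stp": "8 hourly"}
    if runner in ("chrome", "firefox") and utc_hour == 12:
        return "8 hourly"
    return default[runner]

print(epoch_for("chrome", 12))  # 8 hourly
print(epoch_for("chrome", 16))  # 4 hourly
```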

Alternatively, we could also address it by changing the available epochs as @lukebjerring suggested above. We might want to find more motivation for that solution, though. Fundamentally, I don't think "12/6/3" is any better than "8/4/2".

@foolip (Member) commented Sep 6, 2018

Fundamentally, I don't think "12/6/3" is any better than "8/4/2"

I'd say "We're capable of running Firefox and Chrome every 6 hours, and we can run Safari Technology Preview (STP) once every 12 hours" is the justification here :)

@mdittmer (Contributor) commented Sep 6, 2018

We're capable of running Firefox and Chrome every 6 hours, and we can run Safari Technology Preview (STP) once every 12 hours. This doesn't map directly to the available cadences, so we choose "4 hourly" for Firefox and Chrome and "8 hourly" for STP.

#522 just landed, so we should be able to move to 6 and 12 hours for Firefox+Chrome and Safari, respectively. Please note the deprecation warnings that have been added to the 4- and 8-hour cadences.

In the meantime we should probably fix the runners by using cadences that are longer than expected run times. IIUC, Firefox+Chrome should be on an 8-hour cadence because they cannot achieve a 4-hour deadline, and Safari should be on a daily cadence because it cannot achieve an 8-hour deadline. Using cadence-locked runners will only give us alignment if all cadences are multiples of each other and all runners meet their deadlines.
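The alignment condition stated above (all cadences multiples of each other) can be checked mechanically. A small sketch of mine, not project code:

```python
def cadences_nest(hours):
    """True if each cadence divides the next-longer one, so every boundary
    of a longer cadence coincides with a boundary of each shorter one.
    (Divisibility is transitive, so checking adjacent pairs suffices.)"""
    ordered = sorted(hours)
    return all(longer % shorter == 0
               for shorter, longer in zip(ordered, ordered[1:]))

print(cadences_nest([6, 12, 24]))  # True
print(cadences_nest([6, 8, 24]))   # False: 8 is not a multiple of 6
```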

@jugglinmike (Contributor)

In that case, I would also want to slow the build schedule and scale back the infrastructure (otherwise, we'd end up repeatedly testing the same revision or under-utilizing resources). That's a little extra work, though, so it might not be worth it depending on when the new code will be available in production. Do you know when we'll deploy next?

@mdittmer (Contributor) commented Sep 6, 2018

Just deployed! https://wpt.fyi/api/revisions/latest

@foolip (Member) commented Sep 6, 2018

I would also want to slow the build schedule and scale back the infrastructure (otherwise, we'd end up repeatedly testing the same revision or under-utilizing resources)

I think you've seen this already, but if you have a look at https://staging.wpt.fyi/test-runs you'll see some SHAs that are tested many times, with ba4921d holding the record. That's from the pre-revision-announcer 6-hour cadence, I believe, and it happens especially during weekends with no changes to wpt.

It'd be nice to fix this, presumably by doing nothing if the sha hasn't changed since last?

@mdittmer (Contributor) commented Sep 6, 2018

It'd be nice to fix this, presumably by doing nothing if the sha hasn't changed since last?

Another thing that would help with wasted resources, which @lukebjerring and I were discussing today, is adding the bounds (or "bucket") of the associated epoch to the part of the payload that contains the hash and commit time. This would tell clients when the current epoch started and ends, allowing a client that is pinned to a particular epoch (as the Buildbot runners are) to sleep until the epoch changes. WDYT of this idea, @jugglinmike? Would you make use of it?
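As a sketch of how a pinned client might use the proposed epoch bounds (the "epoch_end" field name and the timestamps are purely hypothetical):

```python
def seconds_until_next_epoch(payload, now):
    """Given an announcer payload carrying the current epoch's end
    (hypothetical "epoch_end" field, in Unix seconds), return how long
    an epoch-pinned client could sleep before re-querying."""
    return max(0.0, payload["epoch_end"] - now)

payload = {"hash": "ba4921d", "epoch_end": 1536152400}  # made-up timestamp
print(seconds_until_next_epoch(payload, 1536148800))    # 3600.0 (one hour left)
```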

jugglinmike added a commit to web-platform-tests/results-collection that referenced this issue Sep 7, 2018
@jugglinmike (Contributor)

Just deployed! https://wpt.fyi/api/revisions/latest

Thanks! Incorporated downstream: web-platform-tests/results-collection@964fd46

WDYT of this idea, @jugglinmike? Would you make use of it?

Since clients are aware of the epoch they have selected, they can use the commit_time to determine if the announced revision is "fresh." If I understand the idea of a "bucket" correctly, the feature would simplify the calculation a bit, but it would not enable (or drastically simplify) the behavior @foolip has requested.
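The commit_time check described above might look like this on the client side (a sketch under my own assumptions about types; not the collector's actual code):

```python
from datetime import datetime, timedelta, timezone

def is_fresh(commit_time, epoch_hours, now):
    """An announced revision is "fresh" for an epoch-pinned client if it
    landed within the most recent epoch interval."""
    return now - commit_time < timedelta(hours=epoch_hours)

now = datetime(2018, 9, 7, 12, 0, tzinfo=timezone.utc)
stale = datetime(2018, 9, 6, 9, 0, tzinfo=timezone.utc)   # 27h old
print(is_fresh(stale, 24, now))                                            # False
print(is_fresh(datetime(2018, 9, 7, 10, 0, tzinfo=timezone.utc), 4, now))  # True
```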

A more circumscribed solution would be a query string parameter like ?fresh. When present, the announcer could emit nulls during lulls (not sorry) in WPT.

In either case, though, it doesn't seem like we're meaningfully reducing complexity in the client, so it might not be worth expanding the revision announcer's API surface.

@jugglinmike (Contributor)

It'd be nice to fix this, presumably by doing nothing if the sha hasn't changed since last?

I've filed an issue against the results collector to track that: web-platform-tests/results-collection#601

@foolip (Member) commented Sep 8, 2018

Fixed in #522, thanks @mdittmer!

@foolip closed this as completed on Sep 8, 2018