tests: measure state lock #14874
base: master
0d1c64f to f46e537
question
overlord/state/state.go
Outdated
@@ -139,6 +142,7 @@ func (s *State) Modified() bool {

// Lock acquires the state lock.
func (s *State) Lock() {
	s.lockStart = osutil.GetLockStart()
Shouldn't this be done with the lock taken? Otherwise goroutines that are merely trying to acquire the lock will race on, and overwrite, lockStart.
Right, I'll update this.
These are the times after the fix proposed by @pedronis. The state is locked for about 820ms when the snapd daemon starts (this happens in all the tests).
overlord/state/state.go
Outdated
@@ -159,6 +163,7 @@ func (s *State) writing() {
func (s *State) unlock() {
	atomic.AddInt32(&s.muC, -1)
	s.mu.Unlock()
	osutil.MaybeSaveLockTime(s.lockStart)
I think this only needs to happen with the lock taken.
This may be OK to do with the state lock released if we flock the log file. With the state lock held, there would likely be no contention for the flock on the log file. OTOH, with the lock released we'd rely on flock to serialize writes, but the state lock would not be held unnecessarily.
FWIW, unlocking may reschedule, so, assuming we use flock to serialize writes, the measured time may be inaccurate because the end time is captured inside MaybeSaveLockTime(). The end time will likely need to be captured like so:
lockEnd := time.Now()
s.mu.Unlock()
osutil.MaybeSaveLockTime(s.lockStart, lockEnd)
Note: I missed that s.lockStart is used after releasing the lock; it should of course be read into a local variable before the lock is released.
Updated.
@pedronis These are the results for the main suite, 145 tests (on a machine with 2 cores and a threshold of 800ms). I see some worse times for the following: should we test those cases specifically?
osutil/statelock.go
Outdated
	defer lockfile.Close()

	pc := make([]uintptr, 10)
	n := runtime.Callers(0, pc)
skip should likely be set to 2, to skip traceCallers and MaybeSaveLockTime.
Sure.
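A small sketch for checking the skip value empirically. Per the runtime.Callers documentation, skip 0 identifies the frame for Callers itself and 1 identifies its caller; so, counted from inside traceCallers, skip=2 starts the trace at traceCallers' caller (MaybeSaveLockTime in the PR), and skip=3 would drop MaybeSaveLockTime as well. The functions below are hypothetical stand-ins that mimic the PR's call chain; //go:noinline keeps the frames distinct.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// traceCallers is a hypothetical stand-in for the PR's helper: it calls
// runtime.Callers directly and returns the first function it recorded.
//
//go:noinline
func traceCallers(skip int) string {
	pc := make([]uintptr, 16)
	n := runtime.Callers(skip, pc)
	if n == 0 {
		return ""
	}
	frame, _ := runtime.CallersFrames(pc[:n]).Next()
	return frame.Function
}

// maybeSaveLockTime and unlock mimic the PR's call chain:
// unlock -> MaybeSaveLockTime -> traceCallers.
//
//go:noinline
func maybeSaveLockTime(skip int) string { return traceCallers(skip) }

//go:noinline
func unlock(skip int) string { return maybeSaveLockTime(skip) }

// shortName strips the package path, e.g. "main.unlock" -> "unlock".
func shortName(fn string) string { return fn[strings.LastIndex(fn, ".")+1:] }

func main() {
	for skip := 0; skip <= 3; skip++ {
		fmt.Printf("skip=%d starts at %s\n", skip, shortName(unlock(skip)))
	}
}
```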
osutil/statelock.go
Outdated
	for i := 0; i < n; i++ {
		f := runtime.FuncForPC(pc[i])
		file, line := f.FileLine(pc[i])
Per the docs at https://pkg.go.dev/runtime?utm_source=godoc#Callers, this should use runtime.CallersFrames().
// MaybeSaveLockTime saves the lock time when it exceeds the threshold
// defined by the SNAPD_STATE_LOCK_THRESHOLD_MS environment variable.
func MaybeSaveLockTime(lockStart int64) {
	lockEnd := time.Now().UnixNano() / int64(time.Millisecond)
This can probably be done after both Getenv calls.
osutil/statelock.go
Outdated
		fmt.Fprintf(os.Stderr, "could not retrieve log file, SNAPD_STATE_LOCK_FILE env var required")
	}

	lockfile, err := os.OpenFile(lockfilePath, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)
This could flock the file after opening it, so that writes are serialized while the state lock can be released earlier.
osutil/statelock.go
Outdated
		f := runtime.FuncForPC(pc[i])
		file, line := f.FileLine(pc[i])
		formattedLine = fmt.Sprintf("%s:%d %s\n", file, line, f.Name())
		if _, err = lockfile.WriteString(formattedLine); err != nil {
I wonder if we could use runtime.Profile to achieve the same thing in a more standard way.
)

func traceCallers(description string) {
	lockfilePath := os.Getenv("SNAPD_STATE_LOCK_FILE")
I'd avoid using lock* in the name, so as not to confuse our future selves or other readers: this is not a lock file but a log of state lock times (and their context). So maybe logFilePath.
	if threshold <= 0 {
		return
	}

"SNAPD_STATE_LOCK_FILE" could be checked here too.
@@ -111,6 +112,8 @@ type State struct {
	// task/changes observing
	taskHandlers   map[int]func(t *Task, old, new Status) (remove bool)
	changeHandlers map[int]func(chg *Change, old, new Status)

	lockStart int64
Suggested change
-	lockStart int64
+	lockWait  int64
+	lockStart int64
overlord/state/state.go
Outdated
	s.mu.Lock()
	atomic.AddInt32(&s.muC, 1)
	s.lockStart = osutil.GetLockStart()
Suggested change
-	s.mu.Lock()
-	atomic.AddInt32(&s.muC, 1)
-	s.lockStart = osutil.GetLockStart()
+	lockWait := osutil.LockTimestamp()
+	s.mu.Lock()
+	atomic.AddInt32(&s.muC, 1)
+	s.lockStart = osutil.LockTimestamp()
+	s.lockWait = lockWait
osutil/statelock.go
Outdated

// MaybeSaveLockTime saves the lock time when it exceeds the threshold
// defined by the SNAPD_STATE_LOCK_THRESHOLD_MS environment variable.
func MaybeSaveLockTime(lockStart int64) {
Suggested change
-func MaybeSaveLockTime(lockStart int64) {
+func MaybeSaveLockTime(lockWait, lockStart, lockEnd int64) {
overlord/state/state.go
Outdated
	s.mu.Unlock()
	osutil.MaybeSaveLockTime(s.lockStart)
Suggested change
-	s.mu.Unlock()
-	osutil.MaybeSaveLockTime(s.lockStart)
+	lockWait, lockStart := s.lockWait, s.lockStart
+	s.lockWait, s.lockStart = 0, 0
+	lockEnd := osutil.LockTimestamp()
+	s.mu.Unlock()
+	osutil.MaybeSaveLockTime(lockWait, lockStart, lockEnd)
osutil/statelock.go
Outdated
	elapsedMilliseconds := lockEnd - lockStart
	if elapsedMilliseconds > threshold {
		formattedLine := fmt.Sprintf("Elapsed Time: %d milliseconds", elapsedMilliseconds)
		traceCallers(formattedLine)
	}
Suggested change
-	elapsedMilliseconds := lockEnd - lockStart
-	if elapsedMilliseconds > threshold {
-		formattedLine := fmt.Sprintf("Elapsed Time: %d milliseconds", elapsedMilliseconds)
-		traceCallers(formattedLine)
-	}
+	heldMs := lockEnd - lockStart
+	waitedMs := lockStart - lockWait
+	if heldMs > threshold || waitedMs > threshold {
+		formattedLine := fmt.Sprintf("waited %d ms, held %d ms", waitedMs, heldMs)
+		traceCallers(formattedLine)
+	}
This will prevent races.
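A minimal sketch of the wait/held threshold check as a pure function, assuming millisecond timestamps as in the PR (lockWait when Lock() is called, lockStart when the mutex is acquired, lockEnd just before release); lockReport is a hypothetical name. Keeping clock reads and file I/O out of it makes the threshold logic easy to unit test.

```go
package main

import "fmt"

// lockReport returns the line to log for one state lock acquisition and
// whether the wait or the hold time exceeded thresholdMs.
func lockReport(lockWait, lockStart, lockEnd, thresholdMs int64) (string, bool) {
	waitedMs := lockStart - lockWait
	heldMs := lockEnd - lockStart
	if heldMs <= thresholdMs && waitedMs <= thresholdMs {
		return "", false
	}
	// One %d per argument; go vet reports Sprintf calls where they differ.
	return fmt.Sprintf("waited %d ms, held %d ms", waitedMs, heldMs), true
}

func main() {
	line, ok := lockReport(1000, 1005, 1900, 800)
	fmt.Println(line, ok)
}
```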
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #14874      +/-   ##
==========================================
+ Coverage   78.20%   78.22%    +0.01%
==========================================
  Files        1151     1154        +3
  Lines      151396   152804     +1408
==========================================
+ Hits       118402   119528     +1126
- Misses      25662    25895      +233
- Partials     7332     7381       +49

Flags with carried forward coverage won't be shown.
This change makes it possible to collect how long the snapd state is locked. To collect the results, include the snapd-state-lock test and make sure only 1 worker is used.
SPREAD_SNAPD_STATE_LOCK_THRESHOLD_MS=15 SPREAD_USE_SNAPD_SNAP_URL= SPREAD_USE_PREBUILT_SNAPD_SNAP=false ./run-spread -artifacts .artifacts google:ubuntu-22.04-64:tests/main/snapd-state-lock google:ubuntu-22.04-64:tests/smoke/
This is an example run of the smoke suite with threshold=15ms: https://paste.ubuntu.com/p/vkcMdcGTnM/
This is an example run of the smoke suite with threshold=100ms: https://paste.ubuntu.com/p/XTn8jZ62pJ/