Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: measure state lock #14874

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,7 @@ jobs:
- { go-build-tags: withbootassetstesting, skip-coverage: false, snapd-debug: false, go-test-race: false}
- { go-build-tags: nosecboot, skip-coverage: false, snapd-debug: false, go-test-race: false}
- { go-build-tags: faultinject, skip-coverage: false, snapd-debug: false, go-test-race: false}
- { go-build-tags: statelock, skip-coverage: false, snapd-debug: false, go-test-race: false}
- { go-build-tags: snapdusergo, skip-coverage: false, snapd-debug: false, go-test-race: false}
- { go-build-tags: "", skip-coverage: true, snapd-debug: false, go-test-race: true }

Expand Down
4 changes: 2 additions & 2 deletions build-aux/snap/snapcraft.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -355,14 +355,14 @@ parts:
TAGS+=(nomanagers)
case "${VERSION}" in
1337.*)
TAGS+=(withtestkeys faultinject)
TAGS+=(withtestkeys faultinject statelock)
;;
esac
;;
*)
case "${VERSION}" in
1337.*)
TAGS+=(withtestkeys withbootassetstesting faultinject)
TAGS+=(withtestkeys withbootassetstesting faultinject statelock)
;;
esac
;;
Expand Down
83 changes: 83 additions & 0 deletions osutil/statelock.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
// -*- Mode: Go; indent-tabs-mode: t -*-
//go:build statelock

/*
* Copyright (C) 2021 Canonical Ltd
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 3 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*/

package osutil

import (
"fmt"
"os"
"runtime"
"time"
)

func traceCallers(description string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func traceCallers(description string) {
func traceCallers(description string) error {

and then you can update MaybeSave to do:

	if heldMs > threshold || waitMs > threshold {
		formattedLine := fmt.Sprintf("lock: held %d ms wait %d ms", heldMs, waitMs)
		if err := traceCallers(formattedLine); err != {
			logger.Infof("cannot trace callers: %v", err)
		}
	}

and change fmt.Fprintf() to use return fmt.Errorf("cannot this or that: %w", err)

lockfilePath := os.Getenv("SNAPD_STATE_LOCK_FILE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid using lock* in the name, to not confuse future self or other readers that this is not a lock file, but a file with log of state lock times (and their context). So maybe logFilePath.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

if lockfilePath == "" {
fmt.Fprintf(os.Stderr, "could not retrieve log file, SNAPD_STATE_LOCK_FILE env var required")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Fprintf(os.Stderr, "could not retrieve log file, SNAPD_STATE_LOCK_FILE env var required")
fmt.Fprintf(os.Stderr, "could not retrieve log file, SNAPD_STATE_LOCK_FILE env var required")
return

}

lockfile, err := os.OpenFile(lockfilePath, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0600)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could flock the file after opening it, so that we serialize the writes but the state lock could be unlocked earlier

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err != nil {
fmt.Fprintf(os.Stderr, "could not open/create log traces file: %v", err)
return
}
defer lockfile.Close()

pc := make([]uintptr, 10)
n := runtime.Callers(0, pc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip should likely be set to 2, to skip traceCallers and MaybeSaveLockTime

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

formattedLine := fmt.Sprintf("##%s\n", description)
if _, err = lockfile.WriteString(formattedLine); err != nil {
fmt.Fprintf(os.Stderr, "internal error: could not write trace callers header to tmp file: %v", err)
return
}
for i := 0; i < n; i++ {
f := runtime.FuncForPC(pc[i])
file, line := f.FileLine(pc[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per docs of https://pkg.go.dev/runtime?utm_source=godoc#Callers, this should use runtime.CallersFrames()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

formattedLine = fmt.Sprintf("%s:%d %s\n", file, line, f.Name())
if _, err = lockfile.WriteString(formattedLine); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could maybe use runtime.Profile to achieve the same thing but do it in more standard way.

fmt.Fprintf(os.Stderr, "internal error: could not write trace callers to tmp file: %v", err)
return
}
}
}

func GetLockStart() int64 {
return time.Now().UnixNano() / int64(time.Millisecond)
}

// MaybeSaveLockTime allows to save lock times when this overpass the threshold
// defined by through the SNAPD_STATE_LOCK_THRESHOLD_MS environment settings.
func MaybeSaveLockTime(lockStart int64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func MaybeSaveLockTime(lockStart int64) {
func MaybeSaveLockTime(lockWait, lockStart, lockEnd int64) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

lockEnd := time.Now().UnixNano() / int64(time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can prob be done after both getenvs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update


if !GetenvBool("SNAPPY_TESTING") {
return
}
threshold := GetenvInt64("SNAPD_STATE_LOCK_THRESHOLD_MS")
if threshold <= 0 {
return
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SNAPD_STATE_LOCK_FILE" could be tested here

elapsedMilliseconds := lockEnd - lockStart
if elapsedMilliseconds > threshold {
formattedLine := fmt.Sprintf("Elapsed Time: %d milliseconds", elapsedMilliseconds)
traceCallers(formattedLine)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elapsedMilliseconds := lockEnd - lockStart
if elapsedMilliseconds > threshold {
formattedLine := fmt.Sprintf("Elapsed Time: %d milliseconds", elapsedMilliseconds)
traceCallers(formattedLine)
}
heldMs := lockEnd - lockStart
waitedMs := lockStart - lockWait
if heldMs > threshold || waitedMs > threshold {
formattedLine := fmt.Sprintf("waited %d ms held", waitedMs, heldMs)
traceCallers(formattedLine)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
28 changes: 28 additions & 0 deletions osutil/statelock_fake.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
// -*- Mode: Go; indent-tabs-mode: t -*-
//go:build !statelock

/*
* Copyright (C) 2021 Canonical Ltd
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 3 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*/

package osutil

func GetLockStart() int64 {
return int64(0)
}

func MaybeSaveLockTime(lockStart int64) {
}
5 changes: 5 additions & 0 deletions overlord/state/state.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ import (
"time"

"github.com/snapcore/snapd/logger"
"github.com/snapcore/snapd/osutil"
)

// A Backend is used by State to checkpoint on every unlock operation
Expand Down Expand Up @@ -111,6 +112,8 @@ type State struct {
// task/changes observing
taskHandlers map[int]func(t *Task, old, new Status) (remove bool)
changeHandlers map[int]func(chg *Change, old, new Status)

lockStart int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lockStart int64
lockWait int64
lockStart int64

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// New returns a new empty state.
Expand Down Expand Up @@ -141,6 +144,7 @@ func (s *State) Modified() bool {
func (s *State) Lock() {
s.mu.Lock()
atomic.AddInt32(&s.muC, 1)
s.lockStart = osutil.GetLockStart()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s.mu.Lock()
atomic.AddInt32(&s.muC, 1)
s.lockStart = osutil.GetLockStart()
lockWait := osutil.LockTimestamp()
s.mu.Lock()
atomic.AddInt32(&s.muC, 1)
s.lockStart = osutil.LockTimestamp()
s.lockWait = lockWait

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

func (s *State) reading() {
Expand All @@ -159,6 +163,7 @@ func (s *State) writing() {
func (s *State) unlock() {
atomic.AddInt32(&s.muC, -1)
s.mu.Unlock()
osutil.MaybeSaveLockTime(s.lockStart)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only needs to happen with the lock taken I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be ok to do with the state lock released if we flock the log file. With the state lock held, there would likely be no contention for flock on the log file. OTOH with the lock released, we'd rely on flock to serialize writes, but this would not keep the state lock unnecessarily held.

FWIW, unlocking may reschedule, so assuming we use flock for serializing writes, the timing may end up being incorrect given that the time is captured inside MaybeSaveLockTime(). The end time will likely need to be captured like so:

lockEnd := time.Now()
s.mu.Unlock()
osutil.MaybeSaveLockTime(s.lockStart, lockEnd)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note, I missed that s.lockStart is used after releasing the lock, it should ofc be read into a variable before the lock is released

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

udpated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s.mu.Unlock()
osutil.MaybeSaveLockTime(s.lockStart)
lockWait, lockStart := s.lockWait, s.lockStart
s.lockWait, s.lockStart = 0, 0
s.mu.Unlock()
osutil.MaybeSaveLockTime(lockWait, lockStart)

}

type marshalledState struct {
Expand Down
4 changes: 2 additions & 2 deletions packaging/ubuntu-16.04/rules
Original file line number Diff line number Diff line change
Expand Up @@ -103,8 +103,8 @@ _SNAP_TAGS=
# check if we need to include the testkeys in the binary
ifneq (,$(filter testkeys,$(DEB_BUILD_OPTIONS)))
# if enabled also enable bootloader assets testing and fault injection
_TAGS := withtestkeys,withbootassetstesting,faultinject
_SNAP_TAGS := nomanagers,withtestkeys,faultinject
_TAGS := withtestkeys,withbootassetstesting,faultinject,statelock
_SNAP_TAGS := nomanagers,withtestkeys,faultinject,statelock
else
_SNAP_TAGS=nomanagers
endif
Expand Down
2 changes: 2 additions & 0 deletions spread.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,8 @@ environment:
SNAPD_DEB_FROM_REPO: '$(HOST: echo "${SPREAD_SNAPD_DEB_FROM_REPO:-true}")'
# Directory where the built snapd snaps and other assets are stored
SNAPD_WORK_DIR: '$(HOST: echo "${SPREAD_SNAPD_WORK_DIR:-/var/tmp/work-dir}")'
# Threshold used to determine which stake locks have to be collected during runtime
SNAPD_STATE_LOCK_THRESHOLD_MS: '$(HOST: echo "${SPREAD_SNAPD_STATE_LOCK_THRESHOLD_MS:-0}")'

# Directory where the nested images and test assets are stored
NESTED_WORK_DIR: '$(HOST: echo "${NESTED_WORK_DIR:-/var/tmp/work-dir}")'
Expand Down
4 changes: 3 additions & 1 deletion tests/lib/prepare-restore.sh
Original file line number Diff line number Diff line change
Expand Up @@ -697,14 +697,16 @@ prepare_suite_each() {
fi

if [[ "$variant" = full ]]; then
# shellcheck source=tests/lib/prepare.sh
. "$TESTSLIB"/prepare.sh
# shellcheck source=tests/lib/prepare.sh
. "$TESTSLIB"/prepare.sh
if os.query is-classic; then
prepare_each_classic
else
prepare_each_core
fi

prepare_state_lock "$SPREAD_JOB"
fi

case "$SPREAD_SYSTEM" in
Expand Down
33 changes: 33 additions & 0 deletions tests/lib/prepare.sh
Original file line number Diff line number Diff line change
Expand Up @@ -460,6 +460,7 @@ prepare_classic() {
fi

prepare_reexec_override
prepare_state_lock "SNAPD PROJECT"
prepare_memory_limit_override
disable_refreshes

Expand Down Expand Up @@ -1520,6 +1521,37 @@ EOF
rm -rf "$UNPACK_DIR"
}

prepare_state_lock(){
TAG=$1
CONF_FILE="/etc/systemd/system/snapd.service.d/state-lock.conf"
LOCKS_FILE="$TESTSTMP"/snapd_lock_traces
RESTART=false

if [ "$SNAPD_STATE_LOCK_THRESHOLD_MS" -gt 0 ]; then
echo "###START: $TAG" >> "$LOCKS_FILE"

# Generate the config file when it does not exist and when the threshold has changed different
if ! [ -f "$CONF_FILE" ] || ! grep -q "SNAPD_STATE_LOCK_THRESHOLD_MS=$SNAPD_STATE_LOCK_THRESHOLD_MS" < "$CONF_FILE"; then
echo "Prepare snapd for getting state lock time"
cat <<EOF > "$CONF_FILE"
[Service]
Environment=SNAPPY_TESTING=1 SNAPD_STATE_LOCK_THRESHOLD_MS="$SNAPD_STATE_LOCK_THRESHOLD_MS" SNAPD_STATE_LOCK_FILE="$LOCKS_FILE"
EOF
RESTART=true
fi
elif [ -f "$CONF_FILE" ]; then
rm -f "$CONF_FILE"
RESTART=true
fi

if [ "$RESTART" = "true" ]; then
# the service setting may have changed in the service so we need
# to ensure snapd is reloaded
systemctl daemon-reload
systemctl restart snapd
fi
}

# prepare_ubuntu_core will prepare ubuntu-core 16+
prepare_ubuntu_core() {
# we are still a "classic" image, prepare the surgery
Expand Down Expand Up @@ -1633,6 +1665,7 @@ prepare_ubuntu_core() {
# or restore will break
remove_disabled_snaps
prepare_memory_limit_override
prepare_state_lock "SNAPD PROJECT"
setup_experimental_features
systemctl stop snapd.service snapd.socket
save_snapd_state
Expand Down
16 changes: 16 additions & 0 deletions tests/main/snapd-state-lock/task.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
summary: smoke test used to retrieve the lock state times

details: |
Test used to collect artifacts

priority: -1

artifacts:
- snapd_lock_traces

execute: |
if [ -f "$TESTSTMP"/snapd_lock_traces ]; then
cp -f "$TESTSTMP"/snapd_lock_traces .
else
touch snapd_lock_traces
fi
Loading