Skipping MCAD CPU Preemption Test #696
base: main
For future reference, the root cause analysis of the test's failure has been conducted by @dgrove-oss, and it can be found here:
0a8c451 to 5c3d3ed
5c3d3ed to 701e8e9
Thanks, @ronensc. That's good to know!
I don't think it's worth backporting, but I did redo these test cases for mcad v2 to be robust against different cluster sizes in project-codeflare/mcad#83
More investigation is required as to why these tests are failing. Closing this PR.
Looks good. I like the move to more generic tests. One question.
//aw := createDeploymentAWwith550CPU(context, appendRandomString("aw-deployment-2-550cpu"))
cap := getClusterCapacitycontext(context)
resource := cpuDemand(cap, 0.275).String()
aw := createGenericDeploymentCustomPodResourcesWithCPUAW(
What happens if the cluster has many smaller nodes, resulting in a high cap but an inability to schedule AppWrappers because they do not fit on the individual nodes? Do we care about that at all in this test case?
From a test case perspective, the cluster is assumed to have homogeneous nodes, and the test requests deployments that fit on a single node in the cluster in the CPU dimension.
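To make the concern in this thread concrete, here is a minimal sketch, not the PR's actual helpers. It assumes CPU is tracked in millicores and that a single pod carries the whole request; `nodeCPU`, `totalMilliCPU`, `demandMilliCPU`, and `fitsOnSingleNode` are hypothetical names introduced only for illustration. It shows how a request sized as a fraction of aggregate capacity (such as the 0.275 in the diff) can fit on a cluster of a few large nodes but not on a cluster with the same total capacity spread over many small nodes.

```go
package main

// nodeCPU holds one node's allocatable CPU in millicores.
// Hypothetical type for this sketch; the real test reads Kubernetes Node objects.
type nodeCPU struct {
	Name     string
	MilliCPU int64
}

// totalMilliCPU sums CPU across all nodes, analogous to an aggregate cluster capacity.
func totalMilliCPU(nodes []nodeCPU) int64 {
	var total int64
	for _, n := range nodes {
		total += n.MilliCPU
	}
	return total
}

// demandMilliCPU returns the given fraction of the total capacity,
// truncated toward zero to whole millicores.
func demandMilliCPU(total int64, fraction float64) int64 {
	return int64(float64(total) * fraction)
}

// fitsOnSingleNode reports whether the request fits on at least one node.
// A pod cannot span nodes, so aggregate capacity alone does not guarantee this.
func fitsOnSingleNode(nodes []nodeCPU, request int64) bool {
	for _, n := range nodes {
		if request <= n.MilliCPU {
			return true
		}
	}
	return false
}
```

With homogeneous nodes, as the reply assumes, the check reduces to comparing the request against the capacity of any one node.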
This looks great, so happy to move forward with this improvement. Just a couple of small comments.
@@ -793,6 +795,36 @@ func createDeploymentAWwith550CPU(context *context, name string) *arbv1.AppWrapp
	return appwrapper
}

func getClusterCapacitycontext(context *context) *clusterstateapi.Resource {
	capacity := clusterstateapi.EmptyResource()
	nodes, _ := context.kubeclient.CoreV1().Nodes().List(context.ctx, metav1.ListOptions{})
We should handle the error here.
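One way to act on this comment is to return the listing error to the caller instead of discarding it with `_`. The sketch below is an assumption-laden illustration, not the PR's code: `nodeLister`, `clusterMilliCPU`, `fakeLister`, and `errUnavailable` are hypothetical names, and the interface stands in for the Kubernetes client call so that the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeLister is a hypothetical stand-in for the Kubernetes node listing call.
// It returns the CPU capacity of every node in millicores.
type nodeLister interface {
	ListNodeMilliCPU() ([]int64, error)
}

// clusterMilliCPU sums node CPU capacity. Unlike the `nodes, _ :=` form,
// it propagates the listing error so a failed API call is not mistaken
// for an empty cluster.
func clusterMilliCPU(l nodeLister) (int64, error) {
	nodes, err := l.ListNodeMilliCPU()
	if err != nil {
		return 0, fmt.Errorf("listing nodes: %w", err)
	}
	var total int64
	for _, m := range nodes {
		total += m
	}
	return total, nil
}

// fakeLister is a test double that returns canned data or a canned error.
type fakeLister struct {
	nodes []int64
	err   error
}

func (f fakeLister) ListNodeMilliCPU() ([]int64, error) { return f.nodes, f.err }

// errUnavailable simulates a failed API call in this sketch.
var errUnavailable = errors.New("api server unavailable")
```

In a test helper, the caller could then fail the test on a non-nil error rather than computing a CPU demand from a zero capacity.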
podList, err := context.kubeclient.CoreV1().Pods("").List(context.ctx, metav1.ListOptions{FieldSelector: labelSelector})
// TODO: when no pods are listed, do we send entire node capacity as available
// this will cause false positive dispatch.
if err != nil {
Should the error be caught like this instead?
if err != nil {
	Expect(err).NotTo(HaveOccurred())
Skipping the MCAD CPU Preemption Test, which is failing intermittently on PRs, so that we can get some outstanding PRs merged.