Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: https://github.com/sonic-net/sonic-mgmt/pull/15563 could not skip qos sai test on mellanox asic of dualtor-aa topology #16200

Open
echuawu opened this issue Dec 23, 2024 · 11 comments
Assignees
Labels

Comments

@echuawu
Copy link
Contributor

echuawu commented Dec 23, 2024

Issue Description

Qos sai test could not be skipped on mellanox asic platform with dualtor-aa topology. The PR #15563 should be updated.
The script could not match the condition in PR #15563.

  • "topo_name not in ['t0', 't0-64', 't0-116', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"
    Please check and update the skip condition.

Results you see

The script qos/test_qos_sai.py keep running at dualtor-aa topology at mellanox platform.

Results you expected to see

The script should be skipped.

Is it platform specific

generic

Relevant log output

mellanox platform

Output of show version

No response

Attach files (if any)

No response

@yxieca
Copy link
Collaborator

yxieca commented Jan 8, 2025

@bingwang-ms for viz

@yutongzhang-microsoft the condition list should be evaluated by 'or' according to line 1405? In that sense, mellanox platforms should be skipped by the first condition already?

https://github.com/sonic-net/sonic-mgmt/pull/15563/files#diff-ab3274ff0521b1296970af58aea75edf72472ce5a2b29506f54c9ef0534fe1f2R1408

@bingwang-ms
Copy link
Collaborator

@echuawu Can you let me know which change in PR #15563 causes the issue?

@yxieca yxieca added the Triaged label Jan 8, 2025
@yutongzhang-microsoft
Copy link
Contributor

Hi, @yxieca , @bingwang-ms , PR #15563 only copies the condition from shorter entry
https://github.com/sonic-net/sonic-mgmt/pull/15563/files#diff-ab3274ff0521b1296970af58aea75edf72472ce5a2b29506f54c9ef0534fe1f2R1399
to longer entries to fix the issue caused by #14912. And in each entey, we have conditions_logical_operator: or, so it won't affect anything.

For this issue, I can arrange the condition of qos test with case owner.

@echuawu
Copy link
Contributor Author

echuawu commented Jan 9, 2025

@echuawu Can you let me know which change in PR #15563 causes the issue?

Hi @bingwang-ms , it's the skip condition:

  • "topo_name not in ['t0', 't0-64', 't0-116', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"

We are running on dualtor-aa topology, it could meet the condition "topo_name not in ..." but our DUT is mellanox asic, so it could not meet the second condition, so the test could not be skipped correctly.

@yutongzhang-microsoft
Copy link
Contributor

@echuawu Can you let me know which change in PR #15563 causes the issue?

Hi @bingwang-ms , it's the skip condition:

  • "topo_name not in ['t0', 't0-64', 't0-116', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"

We are running on dualtor-aa topology, it could meet the condition "topo_name not in ..." but our DUT is mellanox asic, so it could not meet the second condition, so the test could not be skipped correctly.

This condition is just a copy from entry qos/test_qos_sai.py::TestQosSai:. So I think this issue has exist for a long time, right?

@echuawu
Copy link
Contributor Author

echuawu commented Jan 9, 2025

@echuawu Can you let me know which change in PR #15563 causes the issue?

Hi @bingwang-ms , it's the skip condition:

  • "topo_name not in ['t0', 't0-64', 't0-116', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"

We are running on dualtor-aa topology, it could meet the condition "topo_name not in ..." but our DUT is mellanox asic, so it could not meet the second condition, so the test could not be skipped correctly.

This condition is just a copy from entry qos/test_qos_sai.py::TestQosSai:. So I think this issue has exist for a long time, right?

Hi @yutongzhang-microsoft , the first time I found qos sai test could not be run pass at dualtor-aa setup, I created the issue #15205.

@yutongzhang-microsoft
Copy link
Contributor

@echuawu Can you let me know which change in PR #15563 causes the issue?

Hi @bingwang-ms , it's the skip condition:

  • "topo_name not in ['t0', 't0-64', 't0-116', 't0-35', 't0-56', 't0-standalone-32', 't0-standalone-64', 't0-standalone-128', 't0-standalone-256', 'dualtor-56', 'dualtor-120', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-64-lag', 't1-56-lag', 't1-backend', 't2', 't2_2lc_36p-masic', 't2_2lc_min_ports-masic'] and asic_type not in ['mellanox']"

We are running on dualtor-aa topology, it could meet the condition "topo_name not in ..." but our DUT is mellanox asic, so it could not meet the second condition, so the test could not be skipped correctly.

This condition is just a copy from entry qos/test_qos_sai.py::TestQosSai:. So I think this issue has exist for a long time, right?

Hi @yutongzhang-microsoft , the first time I found qos sai test could not be run pass at dualtor-aa setup, I created the issue #15205.

Emmm.... I think there are two issues. In issue #15205, the ask is to skip dualtor-aa topology. The copy from shorter entry to longer entry will resolve this issue, this is what #15563 does.

For mellanox, can you help me check the time when this script doesn't skip on mellanox?

@echuawu
Copy link
Contributor Author

echuawu commented Jan 9, 2025

@yutongzhang-microsoft , #15563 could not make qos sai test skip on our setup, our switch asic is mellanox. On mellanox asic & dualtor-aa, it could not be skipped.

@yutongzhang-microsoft
Copy link
Contributor

This condition was introducted by me in
98752da#diff-ab3274ff0521b1296970af58aea75edf72472ce5a2b29506f54c9ef0534fe1f2R471
which is "topo_name not in ['t0', 't0-64', 't0-116', 't0-35', 'dualtor-56', 'dualtor', 't0-80', 't0-backend', 't1-lag', 't1-64-lag', 't1-backend'] or asic_type not in ['mellanox']" at that time. It can skip your mellanox asic & dualtor-aa using the first condition.

I think the root cause is introducted by PR #6361, in this PR, it can't skip mellanox anymore. @wsycqyz can you take a look?

@wsycqyz
Copy link
Contributor

wsycqyz commented Jan 9, 2025

Discussed with @yutongzhang-microsoft. Will raise PR to address the issue.

@yutongzhang-microsoft
Copy link
Contributor

yutongzhang-microsoft commented Jan 9, 2025

@bingwang-ms After syncing with @wsycqyz offline, can you help us confirm if qos should run on mellanox testbeds? If it needs, we can remove and asic_type not in ['mellanox'] to fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants