-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Analyze reduced coverage when Epetra is disabled #13522
Comments
I agree that reduction of coverage is a concern. We've been discussing this and trying to get packages to address this for the past couple of years. This is something we should continue to work to improve. However, I don't believe your data (as shown above) supports your statement about "massive reductions in coverage." Significant for some packages, yes. Massive reduction in coverage is an exaggeration. |
@mmwolf, until now, people were just guessing as the reduction in coverage. Now there is some real data.
That is a matter of opinion. Personally, I could consider a reduction in coverage from 57.25% to 43.3% or 74.55% to 58.45% to be "massive". But these coverage numbers don't really tell you what you want to know. What you really want to know is if there major features that are not being covered anymore. (And a small amount of code could break a major feature.) Reductions in these coverage numbers just tells you that you might be in trouble. Right now, now one can say with any confidence what feature coverage is being lost when Epetra is disabled. |
The percent coverage might tell you some information but not much without additional information, so it is too early to label changes as significant or massive. Looking at Tempus data, there seems to be some inconsistencies in what is being displayed (e.g., Tempus's Epetra tests are not being displayed for the with-Epetra build/test). There are probably others. I am assuming that the no-epetra case not only is turning off the Epetra tests but also is not counting the lines ifdef'ed for Epetra code. Thus we are changing the tests used AND the lines of code counted. The percentage therefore can change a lot and not in a clear way without additional information. I think it is better to look at the raw line-coverage data and see if important functions/capabilities have coverage (as determined by developers and stakeholders). The Epetra coverage can indicate which features/code were previously covering some lines of code, but it can't tell you its importance (e.g., pivotal, minor, or dead code). An aside: This is one reason I think we should look at feature coverage, but that is a harder problem (how do you define a feature?). :) |
Agreed
That is why we need to add labels to fully differentiate library code from test-only code.
Agreed, but that could be a tedious effort. The idea is that differences in bulk line coverage (at the package level, directly level, file level, etc.) will help us prioritize our scrutinization. (For example, you should likely first look at packages and files that see more than a 10% decrease in line coverage before you look at files packages and files with less than a 1% decrease in line coverage.)
That may be true in some cases, but I think most of the features in downstream packages should be equally valid for both Epetra and Tpetra backend implementation of the linear algebra data-structures.
Exactly! How do you enumerate features in something like library code like Trilinos in a systematic way? It is much easier with an application code that is driven through an input deck. (That is what Sierra is doing to define features, I believe.) |
@ccober6, you know what would likely be a better approach? What if all of the important Trilinos customers built and installed coverage-enabled configurations of Trilinos for all of their build configurations and use them when they run their application test suites. Then you gather up all of those Trilinos coverage files and post them to the Trilinos CDash site in special builds (Kitware could help us with that). Then you diff the coverage between what is produced by the Trilinos test suite (for all downstream packages and just the native package's test suite). Then you would clearly see where there were gaps Trilinos testing were. The only problem with that approach is that many the Trilinos apps may also have poor coverage of their features which will therefore be poor coverage of Trilinos features. The moral of the story is to do good requirements management and write strong high-coverage tests for production code. That takes us back to the TriBITS Lifecycle Model 👍 |
CC: @ccober6, @cgcgcg, @rppawlo, @sebrowne, @jwillenbring, @mmwolf
Description
With Epetra (and downstream packages and code that has required dependencies on Epetra) set to be removed in Trilinos 17.0 before the end of FY25, we need to be concerned about the loss of coverage of downstream packages that have tests based on Epetra. (Loss of coverage is bad.)
Up until recently, it was not possible to post coverage results to CDash and compare coverage of Trilinos packages with and without Epetra enabled. But recently Kitware has fixed CDash to be able to upload and display coverage results and the Trilinos framework team has been posting coverage builds with Epetra on and Epetra off to an test CDash test site https://trilinos-cdash-qual.sandia.gov.
So now we can start to compare the coverage of builds of Trilinos with and without Epetra enabled. Currently, the Trilinos Framework team is running the builds:
gnu-8.5.0-openmpi-4.1.6_debug_shared_coverage_no-epetra
(see results on results on 2024-10-10)gnu-8.5.0-openmpi-4.1.6_debug_shared_coverage
(see results on 2024-10-10)There is some evidence that these two builds are not 100% equivalent. For example, the coverage numbers for the package Teuchos is not exactly the same:
If these builds were identical, then the coverage for all of Teuchos should be identical, not just for some of the labels. (But we will need to check that the coverage results are consistent from day to day to make sure there is not a defect in CTest or CDash.)
But even these early results show some significant drops in coverage in some cases. For example, we see noticeable drops in:
Obviously more analysis needs to be done there, but these would appear to be massive reductions in coverage in some cases when Epetra is disabled. That shows that some Trilinos packages are still depending on Epetra-based tests for a lot of their coverage. Also note that this coverage includes all usage in downstream Trilinos packages. This does not show the drop in coverage in the native package test suite. (One would expect the drop in coverage for the native package test suites to be much more pronounced when Epetra is disabled.)
Also, we need to carefully separate coverage of library code from test code. We are most concerned with coverage of library code (since that is what users depend on). (NOTE: We can most likely do that with labels on source files)
NOTE: Kitware needs to implement filtering by directory and a download API for the
viewCoverage.php
page so that we can download, analyze and compare this data in more detail (and line-by-line), and then we will be able to see where tests need to be added the most. (The SNL Kitware contract will have Kitware implementing these features later in FY25.)Tasks
no-epetra
and with-epetra builds to be identical (accept for disabling Epetra)viewCoverage.php
viewCoverage.php
no-epetra
and with-epetra builds that only show coverage based on the package's native test suite. (NOTE: Thetribits_ctest_driver()
function already supports package-by-package builds and uploads to CDash.)The text was updated successfully, but these errors were encountered: