
Running pyright on a full workspace does not seem to report all existing errors. #9642

Open
rgoya opened this issue Dec 29, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@rgoya

rgoya commented Dec 29, 2024

Describe the bug
I am seeing some puzzling behaviour in pyright where the number of errors reported varies depending on how checks are launched. That is, launching pyright on a workspace with ~6K files will miss errors that are found when the same files are checked in small batches or individually. In the tests below, I saw more than 360 errors reported when checking files independently that are missed when launching pyright on the workspace.

The expectation is that a typing error in a file would be reported regardless of whether that file was being checked independently, or as part of a larger list of files.

Code or Screenshots
I apologize in advance that I cannot share details of our code, but I will try my best to describe the symptoms and maybe get some insight into how to debug it further. (I also apologize for the long writeup, I had some time to kill...)

Context
We have a large monorepo with the bulk of code in two folders, let's call them org and org_dev. There are some restrictions in importing relationships:

  • Code in org can import from org, not org_dev.
  • Code in org_dev can import from org and org_dev.
  • Code in org and org_dev import from third party libraries (core python, pandas, numpy, etc).

The number of Python files in org and org_dev is 3600+ and 2400+, respectively.

Originally we were running pyright on everything:

$ pyright org org_dev | grep informations
216 errors, 0 warnings, 0 informations

To speed things up we separated the run into org and org_dev, run in parallel:

$ pyright org | grep informations
107 errors, 0 warnings, 0 informations
$ pyright org_dev | grep informations
123 errors, 0 warnings, 0 informations

We got our speed increase, but we also got more errors reported (bulk 216, split 107+123=230). When running pyright directly on the files with newly reported errors, pyright indeed reported them. (Note: the error counts are high because I am testing on 1.1.391 for this report; with our running version, 1.1.365, we went from 0 errors to 20-ish.)

It would seem that some files were not being checked when running pyright on the whole codebase at once, and only checked when fewer files were being given to pyright. Is this expected? Does pyright have an upper limit on files it can check?

I then ran the checks with --verbose --stats, and the stats report does suggest that all runs are inspecting the expected number of files (6175 = 3681 + 2494):

Analysis stats                  both   org    org_dev
Errors                          216    107    123
Total files parsed and bound    16536  11856  11823
Total files checked             6175   3681   2494

Selecting a file with errors reported only when running on org, and inspecting how it appears in each run log shows something like:

$ # On org and org_dev together
$ grep FILE.py pyright-1.1.391.test.org.org_dev.verbose.stats.txt
38ms: file://FILE.py
$ # On org alone
$ grep FILE.py pyright-1.1.391.test.org.verbose.stats.txt
FILE.py
  FILE.py:94:34 - error: Unnecessary "# type: ignore" comment (reportUnnecessaryTypeIgnoreComment)
39ms: file://FILE.py
$ # Run pyright on file itself confirms report
$ pyright FILE.py
FILE.py
  FILE.py:94:34 - error: Unnecessary "# type: ignore" comment (reportUnnecessaryTypeIgnoreComment)
1 error, 0 warnings, 0 informations

The above seems to suggest that the file is being checked (time of 38ms), but the report is not being output.
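The per-file comparison above can be automated with a small helper along these lines (a sketch; `extract_diagnostics` and `missing_from_first` are hypothetical names of mine, and the regex assumes the `path:line:col - severity: message` format shown in the output above):

```python
import re

# Matches pyright's plain-text diagnostic lines, e.g.
#   FILE.py:94:34 - error: Unnecessary "# type: ignore" comment (reportUnnecessaryTypeIgnoreComment)
DIAG_RE = re.compile(r"^\s*(?P<loc>\S+:\d+:\d+) - (?P<sev>error|warning): (?P<msg>.*)$")

def extract_diagnostics(output: str) -> set[str]:
    """Return the set of 'path:line:col - severity: message' diagnostics in a pyright run."""
    diags = set()
    for line in output.splitlines():
        m = DIAG_RE.match(line)
        if m:
            diags.add(f"{m['loc']} - {m['sev']}: {m['msg'].strip()}")
    return diags

def missing_from_first(run_a: str, run_b: str) -> set[str]:
    """Diagnostics reported in run_b but absent from run_a."""
    return extract_diagnostics(run_b) - extract_diagnostics(run_a)
```

Feeding it the captured output of the workspace run and a single-file run would list exactly the diagnostics that the workspace run dropped.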

More experiments
To dig into this further, I ran three more experiments:

  • Increase the max heap size to detect differences between scanning org, org_dev or both.
    • Result: heap size is not the issue.
  • Spread files between multiple CPUs by using --threads (only on org tree).
    • Result: file check distribution during thread assignment allows pyright to detect more errors.
  • Spread files across multiple pyright runs, split by folder (only on org tree).
    • Result: checking files independently detects 360+ errors missed when running on the workspace; error detection is similar to the 10-thread run.
Heap size test


Our production heap size is set to 8GB using NODE_OPTIONS="--max-old-space-size=8192". I ran the three checks (both = org org_dev, org alone, and org_dev alone) with heap sizes of 8192, 16384 and 24576. Interestingly, the both run had a reduction of 6 errors; the rest went unchanged. That is, checking org and org_dev separately always resulted in more errors reported than checking them both in one pyright run:

With heap of 8GB
both     216  errors,  0  warnings,  0  informations
org      107  errors,  0  warnings,  0  informations
org_dev  123  errors,  0  warnings,  0  informations
With heap of 16GB
both     210  errors,  0  warnings,  0  informations
org      107  errors,  0  warnings,  0  informations
org_dev  123  errors,  0  warnings,  0  informations
With heap of 24GB
both     210  errors,  0  warnings,  0  informations
org      107  errors,  0  warnings,  0  informations
org_dev  123  errors,  0  warnings,  0  informations

Although heap size did not account for the missing errors, it did make for some interesting plots of memory management:

Memory behaviour with 8GB heap size Image
Memory behaviour with 24GB heap size Image
---
Threads test


Code block for test
for RUN in 1 2 3
do
    for THREADS in 1 5 10
    do
        echo Run $RUN, $THREADS threads
        pyright --threads $THREADS org > pyright-1.1.391.thread_test.threads_${THREADS}.run_${RUN}.txt
    done
done
echo "---"
echo "Errors by thread group:"
for THREADS in 1 5 10
do
    for FILE in pyright-1.1.391.thread_test.threads_${THREADS}.run_*.txt
    do
        echo $FILE `grep informations $FILE`;
    done
    echo "---"
done | column -t

The results seem puzzling. With one thread, they are consistent at 107 errors reported; but when multithreading, not only do I get more reports than single-threaded, but the number of reports can vary between runs! This is the output of the script above:

Run 1, 1 threads
Run 1, 5 threads
Run 1, 10 threads
Run 2, 1 threads
Run 2, 5 threads
Run 2, 10 threads
Run 3, 1 threads
Run 3, 5 threads
Run 3, 10 threads
---
Errors by thread group:
pyright-1.1.391.thread_test.threads_1.run_1.txt   107  errors,  0  warnings,  0  informations
pyright-1.1.391.thread_test.threads_1.run_2.txt   107  errors,  0  warnings,  0  informations
pyright-1.1.391.thread_test.threads_1.run_3.txt   107  errors,  0  warnings,  0  informations
---
pyright-1.1.391.thread_test.threads_5.run_1.txt   354  errors,  0  warnings,  0  informations
pyright-1.1.391.thread_test.threads_5.run_2.txt   358  errors,  0  warnings,  0  informations
pyright-1.1.391.thread_test.threads_5.run_3.txt   398  errors,  0  warnings,  0  informations
---
pyright-1.1.391.thread_test.threads_10.run_1.txt  460  errors,  0  warnings,  0  informations
pyright-1.1.391.thread_test.threads_10.run_2.txt  476  errors,  0  warnings,  0  informations
pyright-1.1.391.thread_test.threads_10.run_3.txt  460  errors,  0  warnings,  0  informations
---

Here are some Venn diagrams showing the differences:

More threads result in more errors reported

Image

In three runs, using one thread had consistent results.

Image

Using five threads resulted in more errors reported with variability between runs.

Image

Using 10 threads resulted in even more errors, but variability did not increase.

Image

---
Check folders and files test

Further separating folders
Since scanning both org and org_dev gave slightly different results, I wanted to know whether splitting org into smaller chunks would have further effects.

I did two runs, one launching pyright targeting each subdirectory of org/ (40+ subdirs, "dirscan") and one targeting every individual file in the org/ tree (3500+ files, "filescan").
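The filescan run could be scripted roughly like this (a sketch under my own naming; `parse_summary` and `filescan` are hypothetical helpers, and running pyright once per file is assumed to be slow):

```python
import re
import subprocess
from pathlib import Path

# Matches pyright's summary line, e.g. "107 errors, 0 warnings, 0 informations"
SUMMARY_RE = re.compile(r"(\d+) errors?, (\d+) warnings?, (\d+) informations?")

def parse_summary(output: str) -> tuple[int, int, int]:
    """Extract (errors, warnings, informations) from a pyright run's output."""
    m = SUMMARY_RE.search(output)
    if m is None:
        raise ValueError("no pyright summary line found")
    return tuple(int(g) for g in m.groups())

def filescan(root: str) -> int:
    """Run pyright once per .py file under `root` and sum the reported error counts."""
    total = 0
    for path in sorted(Path(root).rglob("*.py")):
        proc = subprocess.run(["pyright", str(path)], capture_output=True, text=True)
        total += parse_summary(proc.stdout)[0]
    return total
```

Calling `filescan("org")` from the repo root would mirror the filescan experiment (modulo per-file summary counts not being de-duplicated across runs).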

In short, org check reported more errors than dirscan, but filescan reported even more errors. Here is a Venn diagram of the overlap:

Image

The large number of errors found with filescan seemed reminiscent of the run with 10 threads. Indeed, most of the errors (although not all) are found by both the 10-thread scan and filescan; here is the Venn diagram of the overlap:

Image

Conclusion
The Venn diagrams describing the overlap between filescan and 10 threads suggest that launching pyright on a workspace is indeed failing either to detect or to report typing errors in the codebase (failure to report seems possible, since --stats does indicate a check time for the under-reported files).

VS Code extension or command-line
The tests in this report were run using pyright command-line version 1.1.391, although similar behaviour was seen with previous versions.

@rgoya rgoya added the bug Something isn't working label Dec 29, 2024
@erictraut
Collaborator

erictraut commented Dec 29, 2024

Thanks for the detailed analysis.

Here are a few additional questions:

  1. Are you using a pyrightconfig.json or pyproject.toml file with a pyright configuration at the root of your project? If so, are you using an "include", "exclude", "ignore", "extraPaths", or any other config options that potentially affect which files are included in the project or how imports are resolved?
  2. Are you configuring any execution environments in your config file?
  3. Is there more than one pyright config file in your project?
  4. What directory are you running the command-line from? If there is no config file, the working directory defines the "project root". This affects import resolution behaviors and can lead to inconsistent errors if you're running pyright from different directories.
  5. Have you looked in detail at any of the errors that are reported in one run but not another? Do these errors have anything in common? For example, are they all coming from the same diagnostic rule? If you're not able to see any commonality, perhaps you could provide me with a few such errors to see if I can see some commonality.
  6. Do all of the diagnostics that you see appear to be legitimate? In other words, do you suspect that there are false negatives in the runs with fewer diagnostics, or false positives in the runs with more diagnostics?

@erictraut erictraut added the question Further information is requested label Dec 29, 2024
@erictraut
Collaborator

Oh, I think I see what's going on here. You mentioned that org_dev can import from org but not the other way around. That means when you tell pyright to include the files in org_dev, it's going to also implicitly include any files in org that are imported by org_dev and are under your project root. It will report errors for these files. If you then check org separately, you'll see the same errors again. This is expected behavior.

My recommendation is that you don't attempt to manually split up the type checking for your repo. Just check the entire repo and use --threads so pyright leverages the multiple cores in your system to reduce type checking times.

If you do split up type checking, you'll need to de-duplicate the resulting diagnostics yourself. You can do this by using the --outputjson command line switch and writing a small script to de-dup diagnostics, then report them in whatever format you prefer.
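A de-dup script along those lines might look like this (a sketch, not an official tool; it assumes --outputjson's `generalDiagnostics` array with `file`, `range`, `rule`, and `message` fields, and `dedup_diagnostics` is a name of mine):

```python
import json

def dedup_diagnostics(*json_outputs: str) -> list[dict]:
    """Merge several pyright --outputjson results, keeping each diagnostic once.

    A diagnostic is identified by (file, start position, rule, message).
    """
    seen = set()
    merged = []
    for raw in json_outputs:
        report = json.loads(raw)
        for diag in report.get("generalDiagnostics", []):
            start = diag.get("range", {}).get("start", {})
            key = (diag.get("file"), start.get("line"), start.get("character"),
                   diag.get("rule"), diag.get("message"))
            if key not in seen:
                seen.add(key)
                merged.append(diag)
    return merged
```

For example, after `pyright --outputjson org > org.json` and `pyright --outputjson org_dev > org_dev.json`, passing both files' contents to `dedup_diagnostics` would yield the combined, de-duplicated diagnostic list.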

I'm going to close the issue because I'm pretty confident this explains the behaviors you're seeing, and pyright is working as intended here. If you think that I've missed something, feel free to reply.

@erictraut erictraut closed this as not planned Dec 29, 2024
@erictraut erictraut added as designed Not a bug, working as intended and removed question Further information is requested labels Dec 29, 2024
@rgoya
Author

rgoya commented Dec 29, 2024

Oh, I think I see what's going on here. You mentioned that org_dev can import from org but not the other way around. That means when you tell pyright to include the files in org_dev, it's going to also implicitly include any files in org that are imported by org_dev and are under your project root. It will report errors for these files. If you then check org separately, you'll see the same errors again. This is expected behavior.

I considered this reason as well, but I don't think that is the case. Most of the testing was done comparing only launching tests on org. For example, the org_8G/dirscan/filescan and the filescan/t5_r1/t10_r1 Venn diagrams I attached as part of the "Check folders and files test" section were obtained running pyright only on org, its subfolders and files; so no org_dev files were being tested.

If you do split up type checking, you'll need to de-duplicate the resulting diagnostics yourself. You can do this by using the --outputjson command line switch and writing a small script to de-dup diagnostics, then report them in whatever format you prefer.

I had checked for duplicate entries of the sort you suggested and did not find them. Here's the Venn diagram for errors found running pyright on both, org and org_dev:

Image

You can see that running org and org_dev on their own catches more errors than running on both together.

Additionally, addressing your point, double-checking the subtrees where errors are reported confirms that each error is reported only in the run that is checking that subtree. During my analysis I loaded all the errors into a data frame and labelled them by which test reported them (boolean columns both_8G, org_8G, org_dev_8G, etc.), as well as other features like the tree column, which indicates which folder the file is in.

No errors are "crossing boundaries": both reports on both, org on org, and org_dev on org_dev:

> len(df_pivot.query("both_8G == True and tree == 'org'"))
98
> len(df_pivot.query("both_8G == True and tree == 'org_dev'"))
118

> len(df_pivot.query("org_8G == True and tree == 'org'"))
107
> len(df_pivot.query("org_8G == True and tree == 'org_dev'"))
0

> len(df_pivot.query("org_dev_8G == True and tree == 'org_dev'"))
123
> len(df_pivot.query("org_dev_8G == True and tree == 'org'"))
0

Confirming that the tree column matches the filepath (file):

> sum(df_pivot.query("tree == 'org'")["file"].apply(lambda x: x.startswith("org")))
488
> sum(df_pivot.query("tree == 'org'")["file"].apply(lambda x: x.startswith("org_dev")))
0

> sum(df_pivot.query("tree == 'org_dev'")["file"].apply(lambda x: x.startswith("org_dev")))
123
> sum(df_pivot.query("tree == 'org_dev'")["file"].apply(lambda x: x.startswith("org")))
0

I'm going to close the issue because I'm pretty confident this explains the behaviors you're seeing, and pyright is working as intended here. If you think that I've missed something, feel free to reply.

I think there's more to it.

(I'm getting/arranging the data to address the questions in your first response)

@rgoya
Author

rgoya commented Dec 30, 2024

Thanks for the very prompt response, @erictraut .

Here are a few additional questions:

  1. Are you using a pyrightconfig.json or pyproject.toml file with a pyright configuration at the root of your project? If so, are you using an "include", "exclude", "ignore", "extraPaths", or any other config options that potentially affect which files are included in the project or how imports are resolved?

We are using a sole pyproject.toml file at the root of the project. I don't believe any of the options we have there would affect imports:

[tool.pyright]
extraPaths = [
  "third_party/folder",
]

exclude = [
  "**/.ipynb_checkpoints",
  "**/__pycache__",
  ".mypy_cache",
  ".conda*",
  ".git",
  ".jupyterlab",
  "**/node_modules",
  "bazel*"
]

typeCheckingMode = "basic"
strictParameterNoneValue = false
reportUnusedExpression = false
reportPrivateImportUsage = false
reportUnnecessaryTypeIgnoreComment = true
stubPath = "third_party/stubs"
  2. Are you configuring any execution environments in your config file?

No.

  3. Is there more than one pyright config file in your project?

No.

  4. What directory are you running the command-line from? If there is no config file, the working directory defines the "project root". This affects import resolution behaviors and can lead to inconsistent errors if you're running pyright from different directories.

I run pyright from the root folder, and made sure all tests run to gather the data for this issue report were launched from the root folder.

  5. Have you looked in detail at any of the errors that are reported in one run but not another? Do these errors have anything in common? For example, are they all coming from the same diagnostic rule? If you're not able to see any commonality, perhaps you could provide me with a few such errors to see if I can see some commonality.

The bulk of the errors reported only when checking files independently (filescan) are of type reportArgumentType. That said, the enrichment of this error type seems to be only coincidental (see bottom).

Here is the table grouping all errors in org found by the org_8G, dirscan, and filescan runs, counting how many of each type are detected:

> df_pivot.query("org_8G == True | dirscan == True | filescan == True")[
    ["report_class", "org_8G", "dirscan", "filescan"]
].value_counts().sort_index()

report_class                        org_8G  dirscan  filescan
reportArgumentType                  False   False    True        345
                                            True     True          2
                                    True    False    True         23
                                            True     True          5
reportAttributeAccessIssue          True    True     True         40
reportCallIssue                     False   False    True          7
                                    True    True     True          2
reportGeneralTypeIssues             True    True     True          1
reportIndexIssue                    True    True     True          1
reportMissingImports                True    True     True          1
reportOptionalCall                  True    True     True          2
reportOptionalSubscript             True    True     True          1
reportReturnType                    True    True     True          1
reportUnnecessaryTypeIgnoreComment  False   True     True          2
                                    True    True     True         30

Note: as in my previous reply, these runs check only files in org, so there are no org vs org_dev duplicates.

Looking at the actual files where the errors were reported shows that 108 files had errors reported only when scanning independently. Here's a Venn diagram of the following data:

> df_pivot.query("org_8G == True | dirscan == True | filescan == True")[
    ["file", "org_8G", "dirscan", "filescan"]
].drop_duplicates()
Image
  6. Do all of the diagnostics that you see appear to be legitimate? In other words, do you suspect that there are false negatives in the runs with fewer diagnostics, or false positives in the runs with more diagnostics?

This is what ultimately made me run pyright on each file independently (filescan test in main issue). They do seem to make sense.

The large number of reportArgumentType errors seems to be connected to an ongoing issue we are having related to this report, but I also see errors of this type when checking both, just fewer of them.

[Edit:] I found something interesting in 5 files:

  • I have a file A.py that dirscan misses entirely (no errors reported), filescan finds 8 errors, and org_8G does report it, but only reports 2 errors out of the 8.
  • All errors are of the same general type reportArgumentType, some are duplicated in different lines, meaning there is a total of 4 different errors of counts [3, 1, 2, 2]
  • The 2 errors that org_8G detects are of the same type.

@erictraut
Collaborator

Thanks for the additional details. Reopening because it does appear there's something unexpected going on here.

@erictraut erictraut reopened this Dec 30, 2024
@erictraut erictraut removed the as designed Not a bug, working as intended label Dec 30, 2024