Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add low-level debugging #256

Merged
merged 1 commit into from
Jan 16, 2024
Merged

Add low-level debugging #256

merged 1 commit into from
Jan 16, 2024

Conversation

yankevn
Copy link
Collaborator

@yankevn yankevn commented Jan 12, 2024

This change adds critical low-level debugging logs, most importantly node-level memory utilization. This change will help greatly in catching OOM failures that otherwise go undetected.

Testing:

  • Tested on local stack

@yankevn yankevn requested a review from raghumdani January 12, 2024 20:18
@yankevn yankevn marked this pull request as ready for review January 12, 2024 20:18
Copy link
Collaborator

@raghumdani raghumdani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good. We may have a much simpler way to add node level logs without tinkering with Ray tasks.

deltacat/compute/compactor_v2/steps/hash_bucket.py Outdated Show resolved Hide resolved
deltacat/compute/compactor_v2/steps/hash_bucket.py Outdated Show resolved Hide resolved
deltacat/compute/compactor_v2/steps/merge.py Outdated Show resolved Hide resolved
deltacat/utils/resources.py Outdated Show resolved Hide resolved
@yankevn yankevn force-pushed the low_level_debug_logs branch from 290783e to 8c7b44d Compare January 12, 2024 22:49
@yankevn yankevn requested a review from raghumdani January 12, 2024 23:12
@yankevn yankevn force-pushed the low_level_debug_logs branch from 8c7b44d to 352af08 Compare January 12, 2024 23:49
@yankevn yankevn requested a review from raghumdani January 13, 2024 00:20
Copy link
Collaborator

@raghumdani raghumdani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think this will be lot of logs. For a standard 100 node cluster running for 2hr, we will print 720 * 3200 * 72 = 165MB of additional logs. Hence, we should be able to disable this feature in prod if required (maybe as part of scaling params).

@yankevn yankevn merged commit 350719a into main Jan 16, 2024
5 checks passed
@yankevn yankevn deleted the low_level_debug_logs branch January 16, 2024 21:45
@yankevn
Copy link
Collaborator Author

yankevn commented Jan 16, 2024

CW costs about $0.50/GB collected, so thankfully that's only about an extra 8 cents per job. Definitely still something to keep in mind though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants