Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new overwrite_files S3 write mode for atomic overwrites #3074

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

pvieito
Copy link

@pvieito pvieito commented Jan 16, 2025

Feature or Bugfix

  • Feature

Detail

This pull request introduces a new mode, overwrite_files, when writing to S3. This new mode ensures atomic replacement of the files uploaded to S3 and removes the possible race-conditions created by the overwrite & overwrite_partitions modes as explained in this issue.

This initial implementation has the following limits by design:

  • The max_rows_by_file argument should not be set.
  • The bucketing_info argument should not be set.

These limitations will ensure that the number of output files is always 1 per partition. This way, the files will always be correctly overwritten by subsequent uploads. In the future, we may allow multi-part uploads and optionally add a final phase to delete any extra files in the partition or root.

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@pvieito pvieito changed the title New overwrite_files S3 write mode for atomic overwrites feat: new overwrite_files S3 write mode for atomic overwrites Jan 16, 2025
@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: e23e1a8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: e23e1a8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 1b6f892
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 10e30c1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: cb7fb67
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 3206a2b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 10e30c1
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: cb7fb67
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 1b6f892
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 2d00362
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 2d00362
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: f86b848
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 49df50c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: f86b848
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 2fe891d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 49df50c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@pvieito
Copy link
Author

pvieito commented Jan 17, 2025

Hi @jaidisido! This PR is now ready for review. Let me know if the main idea is consistent and I'll add some tests 👍🏻

// cc. @kukushking

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 2fe891d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 1b879bf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: ce2e362
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: d5c6b78
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 18ac992
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: ce2e362
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 1b879bf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 18ac992
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants