[FLINK-37133][table] Support Submitting Refresh Job of Materialized Table to Yarn/K8s #25988

Merged
4 commits merged on Jan 20, 2025

hackergin
Contributor

What is the purpose of the change

Support submitting the refresh job of a materialized table to YARN/K8s.

Brief change log

Support submitting the refresh job of a materialized table to YARN/K8s.

Verifying this change

Some e2e test cases are added to verify this feature.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not documented)

@hackergin hackergin force-pushed the support-session-mode-mt branch 4 times, most recently from 4e75a35 to efec0ab January 15, 2025 09:10
@flinkbot
Collaborator

flinkbot commented Jan 15, 2025

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@hackergin hackergin force-pushed the support-session-mode-mt branch 4 times, most recently from d04dd76 to 1a18003 January 15, 2025 13:42
Contributor

@lsyldliu lsyldliu left a comment

@hackergin Thanks for your contribution, I left some comments.

Since the e2e test only covers one case, I think we need to test YARN and k8s manually. We should cover the following cases:

  • yarn-application: continuous mode and full mode, create & suspend & resume & drop actions
  • yarn-session: continuous mode and full mode, create & suspend & resume & drop actions
  • k8s-application: continuous mode and full mode, create & suspend & resume & drop actions
  • k8s-session: continuous mode and full mode, create & suspend & resume & drop actions

# 2. suspend & resume materialized table in continuous mode
execute_statement $session_handle "alter materialized table my_materialized_table_in_continuous_mode suspend"

kubectl delete deployment $APPLICATION_CLUSTER_ID
Contributor

Why do we need to manually delete clusters?

Contributor Author

In my tests, I found that after "stop with savepoint" the deployment does not terminate. I need some time to verify this again; in theory, it should exit automatically.

Contributor Author

@hackergin hackergin Jan 19, 2025

After further testing, it seems the deployment fails to exit normally because of a bug in how SqlDriver splits SQL: when the SQL statement does not end with a semicolon, the splitter enters an infinite loop and keeps submitting new jobs. Since fixing SqlDriver will take time, for now we can append a semicolon by default when generating the statement of the refresh job to avoid this problem.
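A minimal sketch of the proposed workaround, assuming the refresh statement is built as a plain string before being handed to SqlDriver. RefreshStatements and withTrailingSemicolon are hypothetical names for illustration, not Flink code:

```java
/** Hypothetical helper illustrating the proposed workaround; not Flink code. */
final class RefreshStatements {
    private RefreshStatements() {}

    // Ensure the generated statement ends with ';' so a splitter that only
    // advances its position on a semicolon still consumes the last statement.
    static String withTrailingSemicolon(String statement) {
        String trimmed = statement.trim();
        return trimmed.endsWith(";") ? trimmed : trimmed + ";";
    }

    public static void main(String[] args) {
        System.out.println(withTrailingSemicolon("INSERT INTO t SELECT * FROM s"));
    }
}
```

One limitation of this naive approach: if the generated text ended with a "--" line comment, the appended semicolon would land inside the comment.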

Contributor Author

@hackergin hackergin Jan 19, 2025

After a discussion with @fsk119, we decided to fix this directly in this PR; the commit is: de3675e

Contributor

> After a discussion with @fsk119, we decided to fix this directly in this PR; the commit is: de3675e

BTW, the CI failed.

Contributor Author

@hackergin hackergin Jan 20, 2025

It seems there are some issues with executing DML statements, so I will switch to executing a simple DDL statement instead.

@lsyldliu
Contributor

We can make these test cases part of the cross-team testing, but we should try our best to start the cross-team testing as soon as possible.

@davidradl
Contributor

Reviewed by Chi on 16/01/2025. Go back to the submitter with review comments.

Configuration executionConfig,
OperationExecutor operationExecutor,
OperationHandle operationHandle) {
if (isApplicationMode(operationExecutor.getSessionContext().getSessionConf())) {
Contributor

For MiniCluster, is it not remote?

@hackergin hackergin changed the title [FLINK-37133][table] Support Submitting Refresh Tasks of Materialized Table to Yarn/K8s [FLINK-37133][table] Support Submitting Refresh Job of Materialized Table to Yarn/K8s Jan 18, 2025
@hackergin hackergin force-pushed the support-session-mode-mt branch 3 times, most recently from 191165a to bf1ae6d January 19, 2025 13:11
public Optional<String> getRestorePath() {
return Optional.ofNullable(restorePath);
}

@Override
public String asSummaryString() {
return String.format(
"{\njobId=%s,\n executionTarget=%s%s\n}",
"{\n jobId=%s,\n executionTarget=%s,\n clusterId=%s%s\n}",
Contributor

Suggested change
"{\n jobId=%s,\n executionTarget=%s,\n clusterId=%s%s\n}",
"{\n executionTarget=%s,\n clusterId=%s,\n jobId=%s%s\n}",
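To illustrate the suggested field order, here is a small sketch of a summary string that puts where the job runs (execution target, cluster id) before which job it is. RefreshHandlerSketch is a hypothetical class, and the assumption that the trailing %s carries an optional restore path is a guess based on getRestorePath() above, not confirmed by the diff:

```java
import java.util.Optional;

/** Hypothetical handler used only to illustrate the suggested field order. */
final class RefreshHandlerSketch {
    private final String executionTarget;
    private final String clusterId;   // may be null when the target has no cluster id
    private final String jobId;
    private final String restorePath; // may be null before any savepoint exists

    RefreshHandlerSketch(String executionTarget, String clusterId, String jobId, String restorePath) {
        this.executionTarget = executionTarget;
        this.clusterId = clusterId;
        this.jobId = jobId;
        this.restorePath = restorePath;
    }

    Optional<String> getRestorePath() {
        return Optional.ofNullable(restorePath);
    }

    String asSummaryString() {
        // Suggested order: executionTarget, clusterId, then jobId.
        return String.format(
                "{\n executionTarget=%s,\n clusterId=%s,\n jobId=%s%s\n}",
                executionTarget,
                clusterId,
                jobId,
                // Assumption: the trailing %s renders an optional restore path.
                getRestorePath().map(p -> ",\n restorePath=" + p).orElse(""));
    }

    public static void main(String[] args) {
        System.out.println(
                new RefreshHandlerSketch("yarn-session", "app_1", "job_1", null).asSummaryString());
    }
}
```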

}
}

private static @Nullable String getClusterIdKeyName(String targetName) {
Contributor

I think we could use Optional<String> as the return type; that would be better.
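A sketch of what the Optional-returning version could look like. The target-to-key mapping is an assumption for illustration: "yarn.application.id" and "kubernetes.cluster-id" are Flink's YARN and Kubernetes cluster-id options, but the actual mapping in this PR may differ, and ClusterIdKeys is a hypothetical class name:

```java
import java.util.Optional;

/** Hypothetical holder for the helper under review. */
final class ClusterIdKeys {
    private ClusterIdKeys() {}

    // Assumed mapping from execution target to the config key holding its cluster id.
    static Optional<String> getClusterIdKeyName(String targetName) {
        if (targetName.startsWith("yarn-")) {
            return Optional.of("yarn.application.id");
        }
        if (targetName.startsWith("kubernetes-")) {
            return Optional.of("kubernetes.cluster-id");
        }
        // Targets such as "remote" or "local" have no cluster id key.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The caller must handle the absent case explicitly instead of null-checking.
        getClusterIdKeyName("remote")
                .ifPresentOrElse(
                        key -> System.out.println("cluster id key: " + key),
                        () -> System.out.println("no cluster id for this target"));
    }
}
```

Compared with a @Nullable String, the Optional return type makes the "no cluster id" case visible in the signature.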

@hackergin hackergin force-pushed the support-session-mode-mt branch from bf1ae6d to 095a54f January 20, 2025 01:36
@@ -194,6 +194,7 @@ public boolean hasNext() {
return true;
}
}
position = script.length();
Contributor

Since I'm not familiar with the related code logic, I'm curious why adding this line fixes the issue.

Contributor Author

The position keeps track of the current location in the SQL text being parsed. Previously, the logic only updated the position when a semicolon was encountered. If the text does not end with a semicolon, all remaining text is processed without updating the position, so the next split starts from the previous position again and the same final statement is returned over and over.
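To make the explanation concrete, here is a minimal sketch of a delimiter-based splitter. It is a hypothetical simplification (no handling of quotes or comments), not SqlDriver's actual code; ScriptSplitter and split are illustrative names. The else branch mirrors the one-line fix:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical, simplified splitter: ignores quotes and comments. */
class ScriptSplitter implements Iterator<String> {
    private final String script;
    private int position = 0; // offset of the first unconsumed character

    ScriptSplitter(String script) {
        this.script = script;
    }

    @Override
    public boolean hasNext() {
        // Another statement exists if any non-blank text remains.
        return position < script.length() && !script.substring(position).isBlank();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int semicolon = script.indexOf(';', position);
        String statement;
        if (semicolon >= 0) {
            statement = script.substring(position, semicolon);
            position = semicolon + 1; // move past the delimiter
        } else {
            // Final statement without a trailing semicolon: consume the rest.
            statement = script.substring(position);
            // Without this line, position never advances, hasNext() stays true,
            // and the same final statement is returned forever.
            position = script.length();
        }
        return statement.trim();
    }

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        ScriptSplitter splitter = new ScriptSplitter(script);
        while (splitter.hasNext()) {
            statements.add(splitter.next());
        }
        return statements;
    }

    public static void main(String[] args) {
        // Prints [CREATE TABLE t (a INT), DESCRIBE t]
        System.out.println(split("CREATE TABLE t (a INT); DESCRIBE t"));
    }
}
```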

'connector' = 'datagen'
); -- This is another comment

DESCRIBE src
Contributor

Why change the SELECT statement to a DESCRIBE statement?

Contributor Author

The SELECT statement still requires a MiniCluster to execute. Therefore, following the other tests in this unit test, I only verified DDL-related operations that do not rely on the MiniCluster.

@hackergin hackergin force-pushed the support-session-mode-mt branch 2 times, most recently from 5840cd8 to 0bd4f8b January 20, 2025 02:39
@hackergin hackergin force-pushed the support-session-mode-mt branch from 0bd4f8b to b34eeb3 January 20, 2025 03:49
@hackergin
Contributor Author

@flinkbot run azure

@hackergin
Contributor Author

test_ci connect failed because of https://issues.apache.org/jira/browse/FLINK-36290
test_ci table failed because of https://issues.apache.org/jira/browse/FLINK-36167

@hackergin
Contributor Author

@flinkbot run azure

@hackergin
Contributor Author

@flinkbot run azure

@hackergin
Contributor Author

@flinkbot run azure

Contributor

@lsyldliu lsyldliu left a comment

LGTM

@lsyldliu
Contributor

The related tests have been verified manually, so I merged it.

@lsyldliu lsyldliu merged commit 4a2535c into apache:master Jan 20, 2025

4 participants