-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] First-class catalog and schema setting #1979
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for having first-class support for catalog/schema. I don't have a strong opinion if it should be under presets
or a global setting. I think both could work.
Test Details: go/deco-tests/12253819228 |
2ea7b0a
to
8eb96cc
Compare
…esets-catalog-schema-as-params' by 67 commits. # (use "git push" to publish your local commits) # # Changes to be committed: # modified: dbt-sql/databricks_template_schema.json # modified: default-python/databricks_template_schema.json # modified: default-python/template/{{.project_name}}/databricks.yml.tmpl # modified: default-python/template/{{.project_name}}/resources/{{.project_name}}.job.yml.tmpl # modified: default-python/template/{{.project_name}}/resources/{{.project_name}}.pipeline.yml.tmpl # modified: default-python/template/{{.project_name}}/scratch/exploration.ipynb.tmpl # modified: default-python/template/{{.project_name}}/src/notebook.ipynb.tmpl # modified: default-python/template/{{.project_name}}/src/{{.project_name}}/main.py.tmpl # modified: default-sql/databricks_template_schema.json # # Untracked files: # ../../../.cursorrules # ../../../bundle/config/resources/:tmp:tmp.py # ../../../delme.py # ../../../pr-cache-current-user-me # ../../../pr-cleanup-warnings.md # ../../../pr-contrib-templates.md # ../../../pr-cp-diag-ids-for-all.md # ../../../pr-cp-serverless-templates.md # ../../../pr-presets-catalog-schema-using-params.md # ../../../pr-update-sync-command-help.md # Revert template changes for now
5efeaa6
to
6d5cb1d
Compare
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
@@ -22,6 +22,8 @@ type applyPresets struct{} | |||
|
|||
// Apply all presets, e.g. the prefix presets that | |||
// adds a prefix to all names of all resources. | |||
// | |||
// Note that the catalog/schema presets are applied in ApplyPresetsCatalogSchema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apply_presets
is becoming a bit too large. For the new addition, I moved everything to a separate file. Lots of boring but easy to read/test code there.
As a followup we should look into a further refactoring: either factoring the logic by preset
feature or by resource. For some of the overarching tests it's certainly been better to do it by feature, but we could end up with some combination for the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend splitting out the GetLocalPath
changes into a separate PR. This change should be purely additive and not refactor an unrelated piece of code.
As for the core of the change, I'm not yet comfortable with the magic of this change. I can imagine it being a positive in 90% of cases, but worried about the long tail where it isn't. E.g. where setting these fields to non-empty values has unintended side effects on the backend.
if pl.Schema == "" && pl.Target == "" { | ||
// As of 2024-12, the Schema field isn't broadly supported yet in the pipelines API. | ||
// Until it is, we set the Target field. | ||
pl.Target = p.Schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a breaking change when swapped out?
if p.Catalog == "" && p.Schema == "" { | ||
return diags | ||
} | ||
if (p.Schema == "" && p.Catalog != "") || (p.Catalog == "" && p.Schema != "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify:
if (p.Schema == "" && p.Catalog != "") || (p.Catalog == "" && p.Schema != "") { | |
if !(p.Catalog != "" && p.Schema != "") { |
// Until it is, we set the Target field. | ||
pl.Target = p.Schema | ||
} | ||
if allSameCatalog && pl.Catalog == p.Catalog { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky because the code above changes pl.Catalog
. Recommendations based on existing config should really be done outside the loop that mutates it.
// Check for existing catalog/schema parameters | ||
hasCatalog := false | ||
hasSchema := false | ||
if job.Parameters != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Superfluous; nil slice is equivalent to empty slice.
|
||
if !fileIncludesPattern(ctx, localPath, expected) { | ||
diags = diags.Extend(diag.Diagnostics{{ | ||
Summary: "Use the 'catalog' and 'schema' parameters provided via 'presets.catalog' and 'presets.schema' using\n\n" + fix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary can be "Add 'catalog' and 'schema' parameters to task %q"
With the multi-line details and fix suggestion in the Detail
section.
return true | ||
} | ||
|
||
matched, err := regexp.MatchString(expected, string(content)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match
takes a []byte
if p == "" { | ||
return "", "", fmt.Errorf("path cannot be empty") | ||
} | ||
if path.IsAbs(p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing comment from move:
// We assume absolute paths point to a location in the workspace
if err != nil { | ||
// Apparently this path is not a URL; this can happen for paths that | ||
// have non-URL characters like './profit-%.csv'. | ||
log.Warnf(ctx, "Failed to parse path as a URL '%s': %v", p, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks more like a debug level message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing return.
// If the file path has scheme, it's a full path and we don't need to transform it | ||
if url.Scheme != "" { | ||
if localPath == "" { | ||
// Skip absolute paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know that it is absolute?
recordedPaths[path[:i]] = struct{}{} | ||
path = path[i+1:] | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the intent of this, nor the unbounded loop.
Changes
This is the latest, experimental way to add first-class 'catalog' and 'schema' notion.
The basic idea is that databricks.yml can say
which will then configure the default schema for all resources in the bundle (pipelines, jobs, model serving endpoints, etc.)
A caveat exists for notebooks, which need use parameters to configure the catalog and schema. While the
catalog
andschema
parameter values are automatically passed to all job tasks, notebooks need to consume the parameter values. We check whether they do this, and otherwise show a recommendation:Note that the code above also helps for interactive notebook development scenarios: users can use the parameter widgets to set the catalog and schema they use during development.
Similarly, for Python and Wheel tasks, users must add some extra code to process a
catalog
andschema
parameter. For Python tasks we show a similar recommendation; for wheel tasks we can't directly check for this.Tests
Tests based on reflection, making sure we have coverage for all current/future resources:
catalog
,schema
,parameters
, etc. must have one test case or ignore rule