
databricks bundle generate job --existing-job-id flattens notebook directory structure #4503

@josepmartinez-vista

Description

Describe the issue

When using databricks bundle generate job --existing-job-id, the CLI flattens the directory structure of the imported notebooks. Instead of mirroring the workspace folder hierarchy within the local src directory, it places all notebooks directly into the root of src.

This leads to a discrepancy between the original workspace organization and the generated local project structure. The notebook_path references within the generated job YAML also point to the flattened path rather than to the organized subfolders.

Configuration

The issue occurs when a Databricks Job targets a notebook located within a nested folder in the Databricks Workspace (e.g., /Workspace/Users/example@company.com/my_folder/my_notebook).
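
For illustration, the relevant task in the workspace job settings looks roughly like this (the task key is a placeholder; the path matches the example above):

    tasks:
      - task_key: run_my_notebook
        notebook_task:
          notebook_path: /Workspace/Users/example@company.com/my_folder/my_notebook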

Steps to reproduce the behavior

  1. Ensure you have a job in the Databricks workspace that executes a notebook located in a subfolder (not in the root).
  2. Authenticate the CLI: databricks auth login --profile DEFAULT
  3. Navigate to your bundle directory (a minimal databricks.yml sketch follows this list): cd my_bundle_project
  4. Run the generation command:
    databricks bundle generate job --existing-job-id <my_job_id> --debug
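
For reference, a bundle with a minimal databricks.yml at its root is enough to reproduce; the bundle name and workspace host below are placeholders:

    bundle:
      name: my_bundle_project

    targets:
      dev:
        default: true
        workspace:
          host: https://my-workspace.cloud.databricks.com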

Expected Behavior

The CLI should recreate the workspace folder structure locally to maintain organization.

If the notebook is at .../my_folder/my_notebook in Databricks, it should be saved locally as src/my_folder/my_notebook.py.

The generated job YAML should also reference this relative path correctly.
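
Concretely, for the example above, the notebook would land at:

    src/
      my_folder/
        my_notebook.py

and the generated job YAML would reference it along these lines (resource and task keys are illustrative, a sketch of the expected output rather than anything the CLI produced):

    resources:
      jobs:
        my_job:
          name: dev.my_repo.my_job
          tasks:
            - task_key: run_my_notebook
              notebook_task:
                notebook_path: ../src/my_folder/my_notebook.py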

Actual Behavior

The CLI ignores the parent folders and saves the notebook directly to the src root.

In the debug logs shared below, we can see where it saves the file:

File successfully saved to src/my_notebook.py
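
And the generated job YAML references this flattened location, along the lines of (illustrative sketch):

    notebook_task:
      notebook_path: ../src/my_notebook.py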

OS and CLI version

  • OS: Linux (Docker python:3.11-alpine)
  • Databricks CLI Version: v0.288.0

Is this a regression?

Unknown, but consistently reproducible since at least v0.283.0.

Debug Logs

root@docker-desktop:/workspaces/my_repo/databricks# databricks bundle generate job --existing-job-id my_job_id --debug
16:35:49 Info: start pid=31739 version=0.288.0 args="databricks, bundle, generate, job, --existing-job-id, my_job_id, --debug"
16:35:49 Debug: Found bundle root at /workspaces/my_repo/databricks (file /workspaces/my_repo/databricks/databricks.yml) pid=31739
16:35:49 Info: Phase: load pid=31739
16:35:49 Debug: Apply pid=31739 mutator=EntryPoint
16:35:49 Debug: Apply pid=31739 mutator=scripts.preinit
16:35:49 Debug: No script defined for preinit, skipping pid=31739 mutator=scripts.preinit
16:35:49 Debug: Apply pid=31739 mutator=ProcessRootIncludes
16:35:49 Debug: Apply pid=31739 mutator=ProcessRootIncludes mutator=ProcessInclude(resources/my_job_a.job.yml)
16:35:49 Debug: Apply pid=31739 mutator=ProcessRootIncludes mutator=ProcessInclude(resources/my_job_b.job.yml)
16:35:49 Debug: Apply pid=31739 mutator=VerifyCliVersion
16:35:49 Debug: Apply pid=31739 mutator=EnvironmentsToTargets
16:35:49 Debug: Apply pid=31739 mutator=ComputeIdToClusterId
16:35:49 Debug: Apply pid=31739 mutator=InitializeVariables
16:35:49 Debug: Apply pid=31739 mutator=DefineDefaultTarget(default)
16:35:49 Debug: Apply pid=31739 mutator=validate:unique_resource_keys
16:35:49 Debug: Apply pid=31739 mutator=SelectDefaultTarget
16:35:49 Debug: Apply pid=31739 mutator=SelectDefaultTarget mutator=SelectTarget(dev)
16:35:49 Debug: Loading profile DEFAULT because of host match pid=31739
16:35:49 Debug: GET /api/2.2/jobs/get?job_id=my_job_id
< HTTP/2.0 200 OK
< {
<   "created_time": 1676545149756,
<   "creator_user_name": "myself@company.com",
<   "job_id": my_job_id,
<   "run_as_owner": false,
<   "run_as_user_name": "databricks_service_principal",
<   "settings": {
<     "email_notifications": {},
<     "format": "MULTI_TASK",
<     "max_concurrent_runs": 10,
<     "name": "dev.my_repo.my_job",
<     "run_as": {
<       "service_principal_name": "databricks_service_principal"
<     },
<     "tasks": [
<       {
[...]
<       }
<     ],
<     "webhook_notifications": {}
<   }
< } pid=31739 sdk=true
16:35:49 Debug: GET /api/2.0/workspace/get-status?path=/my_data_product/dev/my_folder/my_notebook&return_export_info=true
< HTTP/2.0 200 OK
< {
<   "created_at": 1769625493524,
<   "language": "PYTHON",
<   "modified_at": 1769625493524,
<   "object_id": 3741859288336300,
<   "object_type": "NOTEBOOK",
<   "path": "/personalized-promotion-53f8ad91-70d5-44ff-a160-988495724a56/dev/my_folder/my_notebook",
<   "repos_export_format": "SOURCE",
<   "resource_id": "3741859288336300"
< } pid=31739 sdk=true
16:35:49 Debug: GET /api/2.0/workspace/export?direct_download=true&format=SOURCE&path=/my_data_product/dev/my_folder/my_notebook
< HTTP/2.0 200 OK
< <Streaming response> pid=31739 sdk=true
File successfully saved to src/my_notebook.py
Job configuration successfully saved to resources/dev_my_data_product_my_notebook.job.yml
16:35:49 Info: completed execution pid=31739 exit_code=0
16:35:49 Debug: no telemetry logs to upload pid=31739
