dltflow.cli.initialize

This module contains the init command for the dltflow CLI, which initializes a dltflow project. At a minimum, it creates a dltflow config file in the current directory.

The code, configuration, and workflow directory names can be customized by the user. The default values are:

  • code: my_project

  • config: conf

  • workflows: workflows

When these directories are created during initialization, a .gitkeep file is placed in each one so that the otherwise empty directories are tracked by git. A setup.py and a pyproject.toml file are also created in the root directory; these are used to package the project as a Python package.

The pyproject.toml file is set up to use the bumpver package to manage versioning.

Module Contents

Functions

_get_path()

Get the path to the Databricks config file.

_parse_databricks_cfg()

Parse the Databricks config file.

_environments_from_databricks_profile(username, ...)

Get the environments from the Databricks profile.

init(profile[, project_name, config_path, ...])

Initialize a new dltflow project.

Attributes

CONFIG_FILE_ENV_VAR

_home

dltflow.cli.initialize.CONFIG_FILE_ENV_VAR = 'DATABRICKS_CONFIG_FILE'

dltflow.cli.initialize._home

dltflow.cli.initialize._get_path()

Get the path to the Databricks config file.
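As a rough illustration of how such a lookup might work, the sketch below resolves the config path from the DATABRICKS_CONFIG_FILE environment variable and otherwise falls back to ~/.databrickscfg, the Databricks CLI's conventional default. The function name get_databricks_cfg_path is hypothetical, and the fallback behavior is an assumption rather than a description of what _get_path() actually does.

```python
import os
from pathlib import Path

# Same value as the module attribute documented above.
CONFIG_FILE_ENV_VAR = "DATABRICKS_CONFIG_FILE"


def get_databricks_cfg_path() -> Path:
    """Hypothetical helper: locate the Databricks config file.

    Assumption: the environment variable, when set, takes precedence
    over the conventional default of ~/.databrickscfg.
    """
    override = os.environ.get(CONFIG_FILE_ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path.home() / ".databrickscfg"
```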

dltflow.cli.initialize._parse_databricks_cfg()

Parse the Databricks config file.
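The Databricks config file is INI-formatted, with one section per profile, so parsing along these lines can be sketched with the standard library's configparser. The function name parse_databricks_cfg and the returned dictionary shape are hypothetical; the real _parse_databricks_cfg() may read and return the data differently.

```python
import configparser
from pathlib import Path
from typing import Dict


def parse_databricks_cfg(path: Path) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: read profiles as {profile: {key: value}}."""
    # ConfigParser.read() silently skips missing files, so check first.
    if not path.is_file():
        raise FileNotFoundError(f"No Databricks config file at {path}")
    parser = configparser.ConfigParser(
        # Treat [DEFAULT] as an ordinary profile so that its keys are
        # not inherited by every other section.
        default_section="__unused_default__",
        # Tokens may contain '%', so disable value interpolation.
        interpolation=None,
    )
    parser.read(path)
    return {name: dict(parser[name]) for name in parser.sections()}
```

A profile name passed to init could then be validated by checking that it is a key of the returned dictionary.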

dltflow.cli.initialize._environments_from_databricks_profile(username, project_name, shared)

Get the environments from the Databricks profile.

dltflow.cli.initialize.init(profile: str, project_name: str = 'my_project', config_path: str = 'conf', workflows_path: str = 'workflows', build_template: bool = False, overwrite: bool = True, dbfs_location: str = None, shared: bool = True)

Initialize a new dltflow project.

This CLI command initializes a dltflow project. At a minimum, it creates a dltflow config file in the current directory.

The code, configuration, and workflow directory names can be customized by the user. The default values are:

  • code: my_project

  • config: conf

  • workflows: workflows

When these directories are created during initialization, a .gitkeep file is placed in each one so that the otherwise empty directories are tracked by git. A setup.py and a pyproject.toml file are also created in the root directory; these are used to package the project as a Python package.

The pyproject.toml file is set up to use the bumpver package to manage versioning.

If the user opts in to creating the directories, dltflow lays them out as follows:

```text
git-root/
    my_project/       # code goes here.
    conf/             # configuration to drive your pipelines.
    workflows/        # json or yml definitions for workflows in databricks.
    dltflow.yml       # dltflow config file.
    setup.py          # setup file for python packages.
    pyproject.toml    # pyproject file for python packages.
```
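The directory portion of this layout could be produced roughly as follows. This is a minimal sketch, not dltflow's implementation: scaffold_project is a hypothetical name, its defaults mirror those of init, and it covers only the three directories and their .gitkeep files, not dltflow.yml, setup.py, or pyproject.toml.

```python
from pathlib import Path
from typing import List


def scaffold_project(
    root: Path,
    project_name: str = "my_project",
    config_path: str = "conf",
    workflows_path: str = "workflows",
) -> List[Path]:
    """Hypothetical helper: create the three project directories."""
    created = []
    for name in (project_name, config_path, workflows_path):
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories; an empty placeholder
        # file ensures each one is included in the repository.
        (directory / ".gitkeep").touch()
        created.append(directory)
    return created
```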

Parameters:
  • profile (str) – Databricks profile to use

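  • project_name (str) – Name of the code directory. Defaults to 'my_project'.

  • config_path (str) – Name of the configuration directory. Defaults to 'conf'.

  • workflows_path (str) – Name of the workflows directory. Defaults to 'workflows'.

  • build_template (bool) – Defaults to False.

  • overwrite (bool) – Defaults to True.

  • dbfs_location (str) – Defaults to None.

  • shared (bool) – Defaults to True.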