stouputils.data_science.mlflow_utils module

This module contains utility functions for working with MLflow, including functions for:

  • Getting the artifact path from the current mlflow run

  • Getting the weights path

  • Getting the runs by experiment name

  • Logging the history of the model to the current mlflow run

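  • Starting a new mlflow run

  • Getting the runs by model name

  • Getting the best run by a specific metric

  • Loading a model from an mlflow run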

get_artifact_path(from_string: str = '', os_name: str = 'posix') → str

Get the artifact path from the current mlflow run (without the file:// prefix).

Handles the different path formats for Windows and Unix-based systems.

Parameters:
  • from_string (str) – Path to the artifact (optional, defaults to the current mlflow run)

  • os_name (str) – OS name (optional, defaults to os.name)

Returns:

The artifact path

Return type:

str
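The prefix handling described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: `strip_file_uri` is a hypothetical helper, and its behaviour is inferred only from the documented `file://` prefix and the POSIX/Windows distinction.

```python
def strip_file_uri(uri: str, os_name: str = "posix") -> str:
    """Hypothetical helper: turn a file:// artifact URI into a plain path."""
    prefix = "file://"
    # Drop the scheme if present; otherwise keep the string unchanged.
    path = uri[len(prefix):] if uri.startswith(prefix) else uri
    # On Windows, "file:///C:/dir" leaves "/C:/dir", so drop the leading slash.
    if os_name == "nt" and path.startswith("/"):
        path = path[1:]
    return path


posix_path = strip_file_uri("file:///path/to/artifact", os_name="posix")
windows_path = strip_file_uri("file:///C:/path/to/artifact", os_name="nt")
```

Passing `os_name` explicitly, rather than reading `os.name` inside the helper, keeps both branches testable on any platform.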

get_weights_path(from_string: str = '', weights_name: str = 'best_model.keras', os_name: str = 'posix') → str

Get the weights path from the current mlflow run.

Parameters:
  • from_string (str) – Path to the artifact (optional, defaults to the current mlflow run)

  • weights_name (str) – Name of the weights file (optional, defaults to “best_model.keras”)

  • os_name (str) – OS name (optional, defaults to os.name)

Returns:

The weights path

Return type:

str

Examples

>>> get_weights_path(from_string="file:///path/to/artifact", weights_name="best_model.keras", os_name="posix")
'/path/to/artifact/best_model.keras'
>>> get_weights_path(from_string="file:///C:/path/to/artifact", weights_name="best_model.keras", os_name="nt")
'C:/path/to/artifact/best_model.keras'

get_runs_by_experiment_name(experiment_name: str, filter_string: str = '', set_experiment: bool = False) → list[Run]

Get the runs by experiment name.

Parameters:
  • experiment_name (str) – Name of the experiment

  • filter_string (str) – Filter string to apply to the runs

  • set_experiment (bool) – Whether to set the experiment

Returns:

List of runs

Return type:

list[Run]

get_runs_by_model_name(experiment_name: str, model_name: str, set_experiment: bool = False) → list[Run]

Get the runs by model name.

Parameters:
  • experiment_name (str) – Name of the experiment

  • model_name (str) – Name of the model

  • set_experiment (bool) – Whether to set the experiment

Returns:

List of runs

Return type:

list[Run]

log_history(history: dict[str, list[Any]], prefix: str = 'history', **kwargs: Any) → None

Log the history of the model to the current mlflow run.

Parameters:
  • history (dict[str, list[Any]]) – History of the model (usually the history attribute of a Keras History object, i.e. history.history)

  • prefix (str) – Prefix used when logging the history (default: “history”)

  • **kwargs (Any) – Additional arguments to pass to mlflow.log_metric
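A rough sketch of how a Keras-style history dictionary could be logged one value per epoch. The function name `log_history_sketch`, the injected `log_metric` callable, and the `"{prefix}_{name}"` key format are all assumptions made for illustration; the actual key naming used by `log_history` is not stated in this page. In real use, `mlflow.log_metric` could be passed as the callable.

```python
from typing import Any, Callable


def log_history_sketch(
    history: dict[str, list[Any]],
    log_metric: Callable[..., None],
    prefix: str = "history",
    **kwargs: Any,
) -> None:
    """Hypothetical sketch: emit one metric point per epoch and per series."""
    for name, values in history.items():
        for epoch, value in enumerate(values):
            # Assumed key format; the epoch index is used as the metric step.
            log_metric(f"{prefix}_{name}", float(value), step=epoch, **kwargs)


# Record the calls in a list instead of contacting a tracking server.
recorded: list[tuple[str, float, int]] = []
log_history_sketch(
    {"loss": [0.9, 0.5], "accuracy": [0.6, 0.8]},
    log_metric=lambda key, value, step: recorded.append((key, value, step)),
)
```

Injecting the logging callable keeps the sketch runnable without an active MLflow run.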

start_run(mlflow_uri: str, experiment_name: str, model_name: str, override_run_name: str = '', **kwargs: Any) → str

Start a new mlflow run.

Parameters:
  • mlflow_uri (str) – MLflow URI

  • experiment_name (str) – Name of the experiment

  • model_name (str) – Name of the model

  • override_run_name (str) – Override the run name (if empty, it will be set automatically)

  • **kwargs (Any) – Additional arguments to pass to mlflow.start_run

Returns:

Name of the run (suffixed with the version number)

Return type:

str
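The sketch below illustrates one way a version-suffixed run name could be derived and a run started. It is an assumption-laden illustration: `next_run_name`, `start_run_sketch`, and the `"<model_name>_v<N>"` format are hypothetical, since the page does not specify the suffix format. Only `mlflow.set_tracking_uri`, `mlflow.set_experiment`, and `mlflow.start_run` are real MLflow calls.

```python
from typing import Any


def next_run_name(model_name: str, existing_names: list[str]) -> str:
    """Hypothetical helper: choose the next '<model_name>_v<N>' name."""
    prefix = f"{model_name}_v"
    # Count earlier runs of the same model to pick the next version number.
    version = 1 + sum(1 for name in existing_names if name.startswith(prefix))
    return f"{prefix}{version}"


def start_run_sketch(mlflow_uri: str, experiment_name: str, run_name: str, **kwargs: Any) -> str:
    """Hypothetical sketch: point MLflow at a server, select the experiment, start a run."""
    import mlflow  # imported lazily so the name helper works without MLflow installed

    mlflow.set_tracking_uri(mlflow_uri)
    mlflow.set_experiment(experiment_name)
    mlflow.start_run(run_name=run_name, **kwargs)
    return run_name


run_name = next_run_name("resnet", ["resnet_v1", "resnet_v2", "vgg_v1"])
```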

get_best_run_by_metric(experiment_name: str, metric_name: str, model_name: str = '', ascending: bool = False, has_saved_model: bool = True) → Run | None

Get the best run by a specific metric.

Parameters:
  • experiment_name (str) – Name of the experiment

  • metric_name (str) – Name of the metric to sort by

  • model_name (str) – Name of the model (optional, if empty, all models are considered)

  • ascending (bool) – Whether to sort in ascending order (default: False, i.e. maximum metric value is best)

  • has_saved_model (bool) – Whether the model has been saved (default: True)

Returns:

The best run or None if no runs are found

Return type:

Run | None
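The selection rule (highest value wins unless `ascending` is set, `None` when nothing matches) can be illustrated with plain dictionaries standing in for MLflow `Run` objects. The function `best_run_by_metric` and the dictionary layout are assumptions for this sketch; the `model_name` and `has_saved_model` filters are left out.

```python
from typing import Optional


def best_run_by_metric(
    runs: list[dict], metric_name: str, ascending: bool = False
) -> Optional[dict]:
    """Hypothetical sketch: pick the run with the best value of one metric."""
    # Ignore runs that never logged the metric.
    candidates = [run for run in runs if metric_name in run["metrics"]]
    if not candidates:
        return None
    # ascending=True means the smallest value is best (e.g. a loss).
    pick = min if ascending else max
    return pick(candidates, key=lambda run: run["metrics"][metric_name])


runs = [
    {"run_id": "a", "metrics": {"val_accuracy": 0.81}},
    {"run_id": "b", "metrics": {"val_accuracy": 0.93}},
    {"run_id": "c", "metrics": {}},
]
best = best_run_by_metric(runs, "val_accuracy")
```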

load_model(run_id: str, model_type: Literal['keras', 'pytorch'] = 'keras') → Any

Load a model from MLflow.

Parameters:
  • run_id (str) – ID of the run to load the model from

  • model_type (Literal["keras", "pytorch"]) – Type of model to load (default: “keras”)

Returns:

The loaded model

Return type:

Any
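As a hedged sketch of the dispatch on `model_type`: MLflow addresses models logged in a run through `runs:/<run_id>/<artifact_path>` URIs, and both `mlflow.keras.load_model` and `mlflow.pytorch.load_model` accept such a URI. The helper names and the `"model"` artifact path below are assumptions; the path actually used by this module is not stated here.

```python
from typing import Any


def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Hypothetical helper: build a runs:/ URI (the artifact path is an assumption)."""
    return f"runs:/{run_id}/{artifact_path}"


def load_model_sketch(run_id: str, model_type: str = "keras", artifact_path: str = "model") -> Any:
    """Hypothetical sketch: load a logged model with the matching MLflow flavor."""
    uri = model_uri_for_run(run_id, artifact_path)
    if model_type == "keras":
        import mlflow.keras  # lazy import: only needed when a model is actually loaded

        return mlflow.keras.load_model(uri)
    if model_type == "pytorch":
        import mlflow.pytorch

        return mlflow.pytorch.load_model(uri)
    raise ValueError(f"Unsupported model_type: {model_type!r}")


uri = model_uri_for_run("abc123")
```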