ebm.migrations module

drop_unnamed(df: DataFrame) → DataFrame

Remove columns starting with ‘Unnamed:’ from a DataFrame, and log a warning if any are not sequential.

Parameters

df (pandas.DataFrame): The input DataFrame from which to drop ‘Unnamed:’ columns.

Returns

pandas.DataFrame: A copy of the input DataFrame with ‘Unnamed:’ columns removed.

Notes

A column is considered sequential if the difference between consecutive values is constant. If any ‘Unnamed:’ columns are found to be non-sequential, a warning is logged.
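
The constant-difference rule above can be sketched with `Series.diff`. This is only an illustration of the stated definition, not the module's actual code; the helper name `is_sequential` is hypothetical, and treating a column with fewer than two values as sequential is an assumption.

```python
import pandas as pd


def is_sequential(values: pd.Series) -> bool:
    """Hypothetical helper: True when consecutive differences are all equal."""
    # diff() yields NaN for the first row, so drop it before comparing.
    diffs = values.diff().dropna()
    # Assumption: zero or one difference counts as sequential.
    return bool(diffs.nunique() <= 1)


print(is_sequential(pd.Series([0, 1, 2])))   # True, constant step of 1
print(is_sequential(pd.Series([5, 7, 9])))   # True, constant step of 2
print(is_sequential(pd.Series([5, 7, 12])))  # False, steps of 2 and 5
```

Under this definition, both ‘Unnamed:’ columns in the example below are sequential, so no warning would be expected there.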

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'Unnamed: 0': [0, 1, 2],
...     'Unnamed: 1': [5, 7, 9],
...     'data': [10, 20, 30]
... })
>>> drop_unnamed(df)
   data
0    10
1    20
2    30

rename_columns(df: DataFrame, translation: dict[str, str]) → DataFrame

Rename columns in a DataFrame using a translation dictionary.

Parameters

df (pandas.DataFrame): The input DataFrame whose columns are to be renamed.
translation (dict of str to str): A dictionary mapping existing column names (keys) to new column names (values).

Returns

pandas.DataFrame: A new DataFrame with columns renamed according to the translation dictionary. If the translation dictionary is empty, the original DataFrame is returned unchanged.
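
A minimal sketch of the behaviour described above, assuming the function is a thin wrapper around `DataFrame.rename`. The name `rename_columns_sketch` is hypothetical and the real implementation may differ.

```python
import pandas as pd


def rename_columns_sketch(df: pd.DataFrame, translation: dict[str, str]) -> pd.DataFrame:
    """Hypothetical stand-in for rename_columns."""
    # An empty translation returns the original object, not a copy.
    if not translation:
        return df
    # DataFrame.rename returns a new DataFrame and ignores unknown keys by default.
    return df.rename(columns=translation)


data = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
same = rename_columns_sketch(data, {})
renamed = rename_columns_sketch(data, {'A': 'Alpha'})
print(same is data)           # True
print(list(renamed.columns))  # ['Alpha', 'B']
```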

Examples

>>> import pandas as pd
>>> data = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> rename_columns(data, {'A': 'Alpha', 'B': 'Beta'})
   Alpha  Beta
0      1     3
1      2     4

drop_columns(df: DataFrame, columns: list[str]) → DataFrame

Drop specified columns from a DataFrame with logging and validation.

Parameters

df (pandas.DataFrame): The input DataFrame from which columns will be dropped.
columns (list of str): A list of column names to drop from the DataFrame.

Returns

pandas.DataFrame: A new DataFrame with the specified columns removed. If none of the columns are found, the original DataFrame is returned unchanged.

Logs

Logs a debug message if no columns are provided.
Logs a warning if any specified columns are not found in the DataFrame.
Logs a debug message listing the columns that will be dropped.
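
The three logging cases above could be arranged roughly as follows. This is a sketch under assumptions: the name `drop_columns_sketch` and the use of the standard `logging` module are hypothetical, and only the warning text is taken from the example in this section; the debug messages are invented for illustration.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def drop_columns_sketch(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical stand-in for drop_columns."""
    if not columns:
        logger.debug('No columns to drop')  # assumed message text
        return df
    missing = [name for name in columns if name not in df.columns]
    if missing:
        logger.warning('Column %s missing from dataframe', missing)
    present = [name for name in columns if name in df.columns]
    if not present:
        # None of the requested columns exist: return the original object.
        return df
    logger.debug('Dropping columns %s', present)  # assumed message text
    return df.drop(columns=present)


frame = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
print(list(drop_columns_sketch(frame, ['B', 'D']).columns))  # ['A', 'C']
```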

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
>>> drop_columns(df, ['B', 'D'])
WARNING: Column ['D'] missing from dataframe
   A  C
0  1  3

translate_heating_system_efficiencies(df: DataFrame) → DataFrame

Translate and drop columns in heating_system_efficiencies.csv.

Translate column names from Norwegian to English.

Drop redundant columns.
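
The two steps can be pictured as a rename followed by a drop. The sketch below is an assumption about how they compose; the actual translation table and the list of redundant columns are not given in this section, so every column name here is a placeholder and not a real column of heating_system_efficiencies.csv.

```python
import pandas as pd

# Placeholder names only; the real mapping is not documented here.
PLACEHOLDER_TRANSLATION = {'kolonne_a': 'column_a', 'kolonne_b': 'column_b'}
PLACEHOLDER_REDUNDANT = ['column_b']


def translate_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical composition: translate first, then drop."""
    renamed = df.rename(columns=PLACEHOLDER_TRANSLATION)
    # errors='ignore' skips columns that are already absent.
    return renamed.drop(columns=PLACEHOLDER_REDUNDANT, errors='ignore')


out = translate_sketch(pd.DataFrame({'kolonne_a': [1], 'kolonne_b': [2]}))
print(list(out.columns))  # ['column_a']
```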

migrate_input_directory(directory: Path, migration: Callable) → None

Migrates heating system efficiency data in a given directory using a specified transformation function.

This function renames legacy input files if necessary, validates the presence of the expected input file, reads the data, applies a migration/transformation function, and writes the result back to the same file.

Parameters

directory (pathlib.Path): The path to the directory containing the input CSV file.
migration (Callable[[pd.DataFrame], pd.DataFrame]): A function that takes a pandas DataFrame and returns a transformed DataFrame.

Raises

FileNotFoundError: If the expected input file does not exist or is not a file.
Exception: If reading, transforming, or writing the file fails.

Notes

If a legacy file named ‘heating_systems_efficiencies.csv’ exists and the target file ‘heating_system_efficiencies.csv’ does not, the legacy file will be renamed.
The transformation is applied in-place and overwrites the original file.
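
The rename, validate, read, transform, and write sequence described above might look like the following. This is a sketch, not the module's code: `migrate_sketch` is a hypothetical name, the error message is invented, and writing with `index=False` is an assumption. The example runs in a temporary directory so that no real data is touched.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd

LEGACY_NAME = 'heating_systems_efficiencies.csv'
TARGET_NAME = 'heating_system_efficiencies.csv'


def migrate_sketch(directory: Path,
                   migration: Callable[[pd.DataFrame], pd.DataFrame]) -> None:
    """Hypothetical stand-in for migrate_input_directory."""
    legacy = directory / LEGACY_NAME
    target = directory / TARGET_NAME
    # Rename the legacy file only when the target does not already exist.
    if legacy.is_file() and not target.exists():
        legacy.rename(target)
    if not target.is_file():
        raise FileNotFoundError(f'{target} does not exist or is not a file')
    df = pd.read_csv(target)
    # Overwrite the original file with the transformed data.
    migration(df).to_csv(target, index=False)  # index=False is an assumption


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / LEGACY_NAME).write_text('a,b\n1,2\n')
    migrate_sketch(folder, lambda df: df.rename(columns={'a': 'alpha'}))
    legacy_left = (folder / LEGACY_NAME).exists()
    result = (folder / TARGET_NAME).read_text()

print(result.splitlines()[0])  # alpha,b
```

Because the file is overwritten, keeping a backup or working under version control is a sensible precaution before running a migration on real input.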

Examples

>>> from pathlib import Path
>>> migrate_input_directory(Path("data"), translate_heating_system_efficiencies)