ebm.migrations module

drop_unnamed(df: DataFrame) → DataFrame

Remove columns starting with ‘Unnamed:’ from a DataFrame, and log a warning if any are not sequential.

Parameters

df : pandas.DataFrame

The input DataFrame from which to drop ‘Unnamed:’ columns.

Returns

pandas.DataFrame

A copy of the input DataFrame with ‘Unnamed:’ columns removed.

Notes

A column is considered sequential if the difference between consecutive values is constant. If any ‘Unnamed:’ columns are found to be non-sequential, a warning is logged.
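
The sequential check described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper `is_sequential`, the function name `drop_unnamed_sketch`, and the wording of the warning are all assumptions, and non-numeric columns are simply treated as non-sequential.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def is_sequential(series: pd.Series) -> bool:
    """Return True if consecutive values differ by one constant step."""
    if not pd.api.types.is_numeric_dtype(series):
        # Assumption: differences are only meaningful for numeric columns.
        return False
    steps = series.diff().dropna()
    # Zero or one distinct step means every consecutive difference is equal.
    # The comparison is exact, so float columns may need a tolerance instead.
    return steps.nunique() <= 1


def drop_unnamed_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for drop_unnamed, for illustration only."""
    unnamed = [c for c in df.columns if str(c).startswith('Unnamed:')]
    irregular = [c for c in unnamed if not is_sequential(df[c])]
    if irregular:
        # The message text is an assumption, not the library's wording.
        logger.warning('Non-sequential unnamed columns: %s', irregular)
    # DataFrame.drop returns a new frame and leaves the input untouched.
    return df.drop(columns=unnamed)
```

Under this sketch, a column such as `[5, 7, 9]` counts as sequential (constant step of 2), while `[3, 1, 8]` would trigger the warning.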

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'Unnamed: 0': [0, 1, 2],
...     'Unnamed: 1': [5, 7, 9],
...     'data': [10, 20, 30]
... })
>>> drop_unnamed(df)
   data
0    10
1    20
2    30

rename_columns(df: DataFrame, translation: dict[str, str]) → DataFrame

Rename columns in a DataFrame using a translation dictionary.

Parameters

df : pandas.DataFrame

The input DataFrame whose columns are to be renamed.

translation : dict[str, str]

A dictionary mapping existing column names (keys) to new column names (values).

Returns

pandas.DataFrame

A new DataFrame with columns renamed according to the translation dictionary. If the translation dictionary is empty, the original DataFrame is returned unchanged.
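
The return behaviour described above, including the empty-dictionary case, might look like the sketch below. The name `rename_columns_sketch` is hypothetical, and whether the real function delegates to `DataFrame.rename` is an assumption.

```python
import pandas as pd


def rename_columns_sketch(df: pd.DataFrame, translation: dict[str, str]) -> pd.DataFrame:
    """Hypothetical stand-in for rename_columns, for illustration only."""
    if not translation:
        # Nothing to rename: return the very same object, unchanged.
        return df
    # DataFrame.rename returns a new frame; keys that match no column
    # are ignored by default (errors='ignore').
    return df.rename(columns=translation)
```

With this sketch, `rename_columns_sketch(data, {}) is data` holds, while a non-empty dictionary yields a separate DataFrame and leaves `data` untouched.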

Examples

>>> import pandas as pd
>>> data = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> rename_columns(data, {'A': 'Alpha', 'B': 'Beta'})
   Alpha  Beta
0      1     3
1      2     4

drop_columns(df: DataFrame, columns: list[str]) → DataFrame

Drop specified columns from a DataFrame with logging and validation.

Parameters

df : pandas.DataFrame

The input DataFrame from which columns will be dropped.

columns : list of str

A list of column names to drop from the DataFrame.

Returns

pandas.DataFrame

A new DataFrame with the specified columns removed. If none of the columns are found, the original DataFrame is returned unchanged.

Logs

  • Logs a debug message if no columns are provided.

  • Logs a warning if any specified columns are not found in the DataFrame.

  • Logs a debug message listing the columns that will be dropped.
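
The validation and logging steps listed above could be arranged roughly as in this sketch. It is an assumption-laden illustration rather than the module's code: `drop_columns_sketch` is a hypothetical name, and the debug message texts are invented. Only the warning format follows the example output shown for this function.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def drop_columns_sketch(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical stand-in for drop_columns, for illustration only."""
    if not columns:
        logger.debug('No columns to drop')  # assumed wording
        return df
    missing = [c for c in columns if c not in df.columns]
    if missing:
        logger.warning('Column %s missing from dataframe', missing)
    present = [c for c in columns if c in df.columns]
    if not present:
        # None of the requested columns exist: return the original frame.
        return df
    logger.debug('Dropping columns %s', present)  # assumed wording
    # DataFrame.drop returns a new frame and leaves the input untouched.
    return df.drop(columns=present)
```

Filtering to the columns that are actually present before calling `DataFrame.drop` avoids the `KeyError` that pandas raises for unknown labels by default.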

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
>>> drop_columns(df, ['B', 'D'])
WARNING: Column ['D'] missing from dataframe
   A  C
0  1  3

translate_heating_system_efficiencies(df: DataFrame) → DataFrame

Translate and drop columns in heating_system_efficiencies.csv:

  • Translate column names from Norwegian to English

  • Drop redundant columns

migrate_input_directory(directory: Path, migration: Callable) → None

Migrates heating system efficiency data in a given directory using a specified transformation function.

This function renames legacy input files if necessary, validates the presence of the expected input file, reads the data, applies a migration/transformation function, and writes the result back to the same file.

Parameters

directory : pathlib.Path

The path to the directory containing the input CSV file.

migration : Callable[[pd.DataFrame], pd.DataFrame]

A function that takes a pandas DataFrame and returns a transformed DataFrame.

Raises

FileNotFoundError

If the expected input file does not exist or is not a file.

Exception

If reading, transforming, or writing the file fails.

Notes

  • If a legacy file named ‘heating_systems_efficiencies.csv’ exists and the target file ‘heating_system_efficiencies.csv’ does not, the legacy file will be renamed.

  • The transformation is applied in-place and overwrites the original file.
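
The rename, validate, read, transform, and write sequence described above can be sketched as follows. This is a hedged illustration only: `migrate_input_directory_sketch` is a hypothetical name, the use of `pd.read_csv` and `to_csv(index=False)` is an assumption about how the file is read and written, and the error message text is invented.

```python
from pathlib import Path
from typing import Callable

import pandas as pd

LEGACY_NAME = 'heating_systems_efficiencies.csv'
TARGET_NAME = 'heating_system_efficiencies.csv'


def migrate_input_directory_sketch(
    directory: Path,
    migration: Callable[[pd.DataFrame], pd.DataFrame],
) -> None:
    """Hypothetical stand-in for migrate_input_directory."""
    legacy = directory / LEGACY_NAME
    target = directory / TARGET_NAME
    # Rename only when the legacy file exists and the target does not.
    if legacy.is_file() and not target.exists():
        legacy.rename(target)
    if not target.is_file():
        raise FileNotFoundError(f'{target} does not exist or is not a file')
    df = pd.read_csv(target)
    migrated = migration(df)
    # Overwrite the original file; dropping the index is an assumption.
    migrated.to_csv(target, index=False)
```

Because the file is overwritten, keeping a backup of the input directory before running a migration is a sensible precaution.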

Examples

>>> from pathlib import Path
>>> migrate_input_directory(Path("data"), translate_heating_system_efficiencies)