ebm.migrations module
- drop_unnamed(df: DataFrame) -> DataFrame
Remove columns starting with ‘Unnamed:’ from a DataFrame, and log a warning if any are not sequential.
Parameters
- df : pandas.DataFrame
The input DataFrame from which to drop ‘Unnamed:’ columns.
Returns
- pandas.DataFrame
A copy of the input DataFrame with ‘Unnamed:’ columns removed.
Notes
A column is considered sequential if the difference between consecutive values is constant. If any ‘Unnamed:’ columns are found to be non-sequential, a warning is logged.
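The constant-difference rule above can be sketched in plain Python as follows. This is a minimal illustration rather than the module's actual implementation: `is_sequential` is a hypothetical helper name, and treating a column with fewer than two values as sequential is an assumption.

```python
from collections.abc import Iterable


def is_sequential(values: Iterable[float]) -> bool:
    """Return True if consecutive values differ by a constant step.

    Hypothetical helper. Columns with fewer than two values are
    treated as sequential (an assumption, not documented behaviour).
    """
    items = list(values)
    if len(items) < 2:
        return True
    # Differences between each pair of neighbouring values.
    steps = [b - a for a, b in zip(items, items[1:])]
    # Sequential means every step equals the first one.
    return all(step == steps[0] for step in steps)


print(is_sequential([0, 1, 2]))  # True: the step is always 1
print(is_sequential([5, 7, 9]))  # True: the step is always 2
print(is_sequential([1, 2, 4]))  # False: the steps are 1 and 2
```

Because the helper accepts any iterable, it could be applied to a DataFrame column directly, for example `is_sequential(df['Unnamed: 0'])`.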
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'Unnamed: 0': [0, 1, 2],
...     'Unnamed: 1': [5, 7, 9],
...     'data': [10, 20, 30]
... })
>>> drop_unnamed(df)
   data
0    10
1    20
2    30
- rename_columns(df: DataFrame, translation: dict[str, str]) -> DataFrame
Rename columns in a DataFrame using a translation dictionary.
Parameters
- df : pandas.DataFrame
The input DataFrame whose columns are to be renamed.
- translation : dict of str
A dictionary mapping existing column names (keys) to new column names (values).
Returns
- pandas.DataFrame
A new DataFrame with columns renamed according to the translation dictionary. If the translation dictionary is empty, the original DataFrame is returned unchanged.
Examples
>>> import pandas as pd
>>> data = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
>>> rename_columns(data, {'A': 'Alpha', 'B': 'Beta'})
   Alpha  Beta
0      1     3
1      2     4
- drop_columns(df: DataFrame, columns: list[str]) -> DataFrame
Drop specified columns from a DataFrame with logging and validation.
Parameters
- df : pandas.DataFrame
The input DataFrame from which columns will be dropped.
- columns : list of str
A list of column names to drop from the DataFrame.
Returns
- pandas.DataFrame
A new DataFrame with the specified columns removed. If none of the columns are found, the original DataFrame is returned unchanged.
Logs
Logs a debug message if no columns are provided.
Logs a warning if any specified columns are not found in the DataFrame.
Logs a debug message listing the columns that will be dropped.
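The validation and logging flow described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the module's source: the function name `drop_columns_sketch` is hypothetical, and the debug message texts are invented for the example. Only the warning wording follows the documented example output.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def drop_columns_sketch(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical re-creation of the documented drop_columns behaviour."""
    if not columns:
        # Nothing requested: log at debug level and return the input as-is.
        logger.debug("No columns to drop")  # message text is an assumption
        return df

    missing = [c for c in columns if c not in df.columns]
    if missing:
        logger.warning("Column %s missing from dataframe", missing)

    present = [c for c in columns if c in df.columns]
    if not present:
        # None of the requested columns exist: return the original unchanged.
        return df

    logger.debug("Dropping columns %s", present)  # message text is an assumption
    return df.drop(columns=present)


df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
result = drop_columns_sketch(df, ['B', 'D'])
print(list(result.columns))  # ['A', 'C']
```

Filtering to the columns that are actually present before calling `DataFrame.drop` avoids the `KeyError` that pandas raises by default for unknown labels.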
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
>>> drop_columns(df, ['B', 'D'])
WARNING: Column ['D'] missing from dataframe
   A  C
0  1  3
- translate_heating_system_efficiencies(df: DataFrame) -> DataFrame
Translate and drop columns in heating_system_efficiencies.csv.
Translate column names from Norwegian to English.
Drop redundant columns.
- migrate_input_directory(directory: Path, migration: Callable) -> None
Migrate heating system efficiency data in a given directory using a specified transformation function.
This function renames legacy input files if necessary, validates the presence of the expected input file, reads the data, applies a migration/transformation function, and writes the result back to the same file.
Parameters
- directory : pathlib.Path
The path to the directory containing the input CSV file.
- migration : Callable[[pd.DataFrame], pd.DataFrame]
A function that takes a pandas DataFrame and returns a transformed DataFrame.
Raises
- FileNotFoundError
If the expected input file does not exist or is not a file.
- Exception
If reading, transforming, or writing the file fails.
Notes
If a legacy file named ‘heating_systems_efficiencies.csv’ exists and the target file ‘heating_system_efficiencies.csv’ does not, the legacy file will be renamed.
The transformation is applied in-place and overwrites the original file.
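The legacy-file handling described in these notes can be sketched with `pathlib` alone. This is a minimal illustration, assuming the two file names quoted above; `rename_legacy_file` is a hypothetical helper name, and the error message text is invented for the example.

```python
import tempfile
from pathlib import Path

LEGACY_NAME = "heating_systems_efficiencies.csv"
TARGET_NAME = "heating_system_efficiencies.csv"


def rename_legacy_file(directory: Path) -> Path:
    """Rename the legacy file if needed and return the validated target path.

    Hypothetical helper illustrating the documented behaviour.
    """
    legacy = directory / LEGACY_NAME
    target = directory / TARGET_NAME
    # Rename only when the legacy file exists and the target does not,
    # so an existing target file is never overwritten.
    if legacy.is_file() and not target.exists():
        legacy.rename(target)
    if not target.is_file():
        raise FileNotFoundError(f"{target} does not exist or is not a file")
    return target


# Demonstrate on a temporary directory containing only the legacy file.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    (directory / LEGACY_NAME).write_text("a,b\n1,2\n")
    target = rename_legacy_file(directory)
    renamed_ok = (
        target.name == TARGET_NAME
        and not (directory / LEGACY_NAME).exists()
    )
    print(renamed_ok)  # True
```

Once the target path is validated, the remaining steps would be to read the CSV, apply the `migration` callable, and write the result back to the same path.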
Examples
>>> from pathlib import Path
>>> migrate_input_directory(Path("data"), translate_heating_system_efficiencies)