Batch

class surfalize.batch.Batch(files, additional_data=None)

Bases: object

The Batch class is used to perform operations and calculate quantitative surface parameters for a batch of topography files. It registers operations and parameters for lazy calculation when methods defined by the Surface class are invoked on it. Every operation method defined by Surface can be invoked on the Batch class, which then registers the method and the passed arguments for later execution. Similarly, every roughness parameter can be called on the Batch class. The methods of the Surface class do not appear in this class’s documentation. However, their docstrings can be accessed through help(Batch.method) or Batch.method? in Jupyter.

All methods can be chained, since they implement the builder design pattern, where every method returns the object itself. For example, the operations levelling, filtering and aligning, as well as the calculation of the roughness parameters Sa, Sq and Sz, can be registered for later calculation in the following manner:

>>> batch = Batch(filepaths)
>>> batch.level().filter(filter_type='lowpass', cutoff=10).align().Sa().Sq().Sz()

Or on separate lines:

>>> batch.level().filter(filter_type='lowpass', cutoff=10).align()
>>> batch.Sa()
>>> batch.Sq()
>>> batch.Sz()

Upon invoking the execute method, all registered operations are applied and all registered parameters are calculated.

>>> batch.execute()
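The lazy registration described above can be pictured with a small stand-in class. This is only a sketch of the general technique (recording calls and replaying them on execute), not the surfalize implementation: the LazyBatch class, its steps list and the use of plain strings as items are all hypothetical.

```python
class LazyBatch:
    """Toy stand-in (hypothetical) that records method calls instead of running them."""

    def __init__(self, items):
        self.items = list(items)
        self.steps = []  # (method name, args, kwargs) in call order

    def __getattr__(self, name):
        # __getattr__ is only consulted for attributes that do not exist,
        # so every unknown public name is treated as a step to register.
        if name.startswith('_'):
            raise AttributeError(name)

        def register(*args, **kwargs):
            self.steps.append((name, args, kwargs))
            return self  # returning self is what makes chaining possible

        return register

    def execute(self):
        # Replay every registered step, in order, on every item.
        results = []
        for item in self.items:
            for name, args, kwargs in self.steps:
                item = getattr(item, name)(*args, **kwargs)
            results.append(item)
        return results


# Strings stand in for surfaces; str methods stand in for operations.
batch = LazyBatch(['  Rough Surface  ', ' Smooth '])
batch.strip().lower()          # chained registration
batch.replace(' ', '_')        # registration on a separate line
processed = batch.execute()    # nothing runs before this call
```

Nothing is computed while the methods are chained; the work happens only inside execute, which is why chaining and separate calls register the same sequence of steps.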

If the caller wants to supply additional parameters for each file, such as fabrication data, they can specify the path to an Excel file containing that data using the ‘additional_data’ keyword argument. The Excel file must contain a column ‘file’ holding the filename in the format ‘name.extension’. Beyond that, an arbitrary number of additional columns can be supplied.

Parameters:
files: list[pathlib.Path | str | FileInput]

List of filepaths or FileInput objects. For file-like objects, a FileInput object must be constructed that holds a name and the file-like object.

additional_data: str | pathlib.Path, optional

Path to an Excel file containing additional parameters, such as input parameters. Excel file must contain a column ‘file’ with the filename including the file extension. Otherwise, an arbitrary number of additional columns can be supplied.

Examples

>>> from pathlib import Path
>>> files = list(Path.cwd().glob('*.vk4'))
>>> batch = Batch(files, additional_data='additional_data.xlsx')
>>> batch.level().filter('lowpass', 10).Sa().Sq().Sdr()
add_dir(dir_path, file_extensions=None)

Add all files in a directory to Batch after initialization.

Parameters:
dir_path: str | pathlib.Path

Path to the directory containing the files

file_extensions: str | list-like, optional

File extension or list of file extensions to be searched for, e.g. ‘.vk4’, ‘.plu’. Each file extension must be prefixed by a dot. If no file extensions are specified, all files whose extension corresponds to a supported file format are added to the batch.

Returns:
self
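The extension filtering described for file_extensions can be sketched with pathlib alone. This is an illustration under assumptions, not the library's code: the collect_files helper is hypothetical, and the supported tuple only contains the two extensions named in this reference rather than the full list of supported formats.

```python
import tempfile
from pathlib import Path


def collect_files(dir_path, file_extensions=None, supported=('.vk4', '.plu')):
    """Hypothetical helper: list files in dir_path whose suffix is allowed."""
    if isinstance(file_extensions, str):
        file_extensions = [file_extensions]  # accept a single extension
    # Fall back to the (assumed) supported formats when nothing is specified.
    allowed = set(file_extensions or supported)
    return sorted(
        path for path in Path(dir_path).iterdir()
        if path.is_file() and path.suffix in allowed
    )


# Demonstrate on a throwaway directory with three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.vk4', 'b.plu', 'notes.txt'):
        (Path(tmp) / name).touch()
    only_vk4 = [path.name for path in collect_files(tmp, '.vk4')]
    all_supported = [path.name for path in collect_files(tmp)]
```

Because Path.suffix includes the leading dot, the comparison only works when the extensions are given with the dot, which matches the requirement stated for file_extensions.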
add_files(files)

Add files to Batch after initialization.

Parameters:
files: str | pathlib.Path | FileInput | list-like[str | pathlib.Path | FileInput]

Files to add to the Batch.

Returns:
self
custom_operation(func)

Add a custom operation in the form of a simple function to the batch calculation. The function must take the surface object as its only parameter and modify the surface object in place, returning None.

Parameters:
func: callable

Function to be executed. Must take a surface object as the only argument and return None.

Returns:
self

Examples

An example function might look like this:

>>> import numpy as np
>>> def remove_specific_outliers(surface):
...    outlier_value = 1001
...    surface.data[surface.data == outlier_value] = np.nan
custom_parameter(func)

Add a custom parameter calculation in the form of a simple function to the batch calculation. The function must take the surface object as its only parameter and return a dictionary that maps parameter names to parameter values. If the parameter consists of only one value, the dictionary should have only one entry. The keys of the returned dictionary are used as the column names in the resulting DataFrame.

Parameters:
func: callable

Function to be executed. Must take a surface object as the only argument and return a dictionary.

Returns:
self

Examples

An example function might look like this:

>>> import numpy as np
>>> def median(surface):
...    median = np.median(surface.data)
...    return {'height_median': median}

Or with multiple parameters:

>>> def mean_std(surface):
...    mean = np.mean(surface.data)
...    std = np.std(surface.data)
...    return {'mean_value': mean, 'std_value': std}
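How the returned dictionaries turn into columns can be sketched without the library. The FakeSurface class and the collect_row helper below are hypothetical stand-ins, and the standard-library statistics module replaces NumPy so the snippet runs on its own; pstdev is used because it is the population standard deviation, which is also what np.std computes by default.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev


@dataclass
class FakeSurface:
    """Hypothetical stand-in exposing only a data attribute."""
    data: list


def height_median(surface):
    return {'height_median': median(surface.data)}


def mean_std(surface):
    return {'mean_value': mean(surface.data), 'std_value': pstdev(surface.data)}


def collect_row(filename, surface, parameter_funcs):
    """Merge the dictionaries of all parameter functions into one result row."""
    row = {'file': filename}
    for func in parameter_funcs:
        row.update(func(surface))  # each key becomes one column
    return row


row = collect_row('sample.vk4', FakeSurface([1.0, 2.0, 3.0, 6.0]),
                  [height_median, mean_std])
```

Since the keys end up side by side in one row, two custom parameters that return the same key would overwrite each other in this sketch, so distinct key names are the safer choice.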
execute(multiprocessing=True, ignore_errors=True, saveto=None, on_file_complete=None, preserve_chaining_order=True)

Executes the Batch processing and returns the obtained data as a pandas DataFrame. The dataframe can be saved as an Excel file.

Parameters:
multiprocessing: bool, default True

If True, distributes the tasks across CPU cores; otherwise, computes the tasks sequentially.

ignore_errors: bool, default True

Errors that are raised during the calculation of parameters are ignored if True. Missing parameter values are filled with nan values. If False, the batch processing is interrupted when an error is raised.

saveto: str | pathlib.Path, default None

Path to an Excel file to which the data is saved. If the Excel file already exists, it will be overwritten.

on_file_complete: Callable

Hook for a Callable that is executed for every surface that has finished processing. The Callable must take a results parameter that is passed a dictionary of the results of the surface calculation. The dictionary will at least hold a key ‘file’ with the respective filename.

preserve_chaining_order: bool, default True

Whether to preserve the order in which the different operations and parameter calculations are called on the batch object. If True, operations and parameters can be applied in arbitrary order (e.g. batch.operation().parameter().operation()). If False, all operations will be performed before the parameter calculations, irrespective of the order in which they were called on the batch. The order within the operations and within the parameters will be preserved nonetheless.

Returns:
pd.DataFrame
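The effect of preserve_chaining_order can be pictured as a stable partition of the registered steps. The following is a minimal sketch, assuming each registered step is tagged as either an operation or a parameter; the tuple representation and the order_steps helper are hypothetical and not part of surfalize.

```python
def order_steps(steps, preserve_chaining_order=True):
    """Return the execution order for a list of (kind, name) steps."""
    if preserve_chaining_order:
        # Run the steps exactly in the order they were registered.
        return list(steps)
    # Otherwise run all operations first, then all parameters, while
    # keeping the relative order inside each group (stable partition).
    operations = [step for step in steps if step[0] == 'operation']
    parameters = [step for step in steps if step[0] == 'parameter']
    return operations + parameters


registered = [
    ('operation', 'level'),
    ('parameter', 'Sa'),
    ('operation', 'filter'),
    ('parameter', 'Sq'),
]
as_chained = order_steps(registered, preserve_chaining_order=True)
operations_first = order_steps(registered, preserve_chaining_order=False)
```

With the chained order, Sa in this example would be evaluated before the filter step; with operations first, both parameters see the fully processed surface.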

Examples

>>> batch.execute(saveto='C:/users/example/documents/data.xlsx')
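A callable for on_file_complete might look like the sketch below. It relies only on what is stated for that parameter: the hook receives a results dictionary that holds at least the key ‘file’. The report_progress name, the completed list and the sample values are hypothetical, and the last line merely simulates the call a batch run would make.

```python
completed = []


def report_progress(results):
    """Hypothetical hook: remember the finished file and print a short summary."""
    completed.append(results['file'])
    # Every key other than 'file' is counted as one computed value.
    print(f"{results['file']}: {len(results) - 1} value(s) computed")


# Simulate what would be passed to the hook for one finished surface.
report_progress(results={'file': 'sample_01.vk4', 'Sa': 0.42, 'Sq': 0.55})
```

Such a function would be passed as batch.execute(on_file_complete=report_progress).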
extract_from_filename(pattern)

Extracts parameters that are encoded in filenames into their own columns. For instance, a filename might encode different fabrication parameters of the measured surface:

filename: ‘Sample1_P50_N12_F1.23_FREP10kHz.vk4’

The pattern can encode parameters by specifying their name, datatype, prefix (optional) and suffix (optional). The name is used to label the resulting column in the dataframe. The patterns have the general syntax:

<name|datatype|prefix|suffix>

Both prefix and suffix can be omitted. If only a suffix is defined, the prefix must be indicated as an empty string. A pattern to match the above filename could look like this:

pattern: ‘<power|float|P>_<pulses|int|N>_<fluence|float|F>_<frequency|float|FREP|kHz>’

This pattern is parsed and constructs a regex that searches the filename for the defined parameters. The parameters are extracted and converted to their respective datatype. The values are added as new columns to the dataframe.
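The translation from pattern to regex can be sketched as follows. This is an illustration of the idea and not necessarily how surfalize builds its regex: the build_regex and extract helpers and the CONVERTERS table are hypothetical, while the TYPES patterns are the ones listed for FilenameParser.TYPES in this reference.

```python
import re

# Type patterns as listed for FilenameParser.TYPES.
TYPES = {'float': r'\d+(?:(?:\.|,)\d+)?', 'int': r'\d+', 'str': r'.+'}
# Hypothetical converters; floats may use a decimal comma.
CONVERTERS = {'float': lambda s: float(s.replace(',', '.')), 'int': int, 'str': str}


def build_regex(template):
    """Turn '<name|datatype|prefix|suffix>' tokens into a regex with named groups."""
    parts, dtypes, position = [], {}, 0
    for match in re.finditer(r'<([^<>]+)>', template):
        # Literal text between two tokens acts as a separator.
        parts.append(re.escape(template[position:match.start()]))
        fields = match.group(1).split('|')
        name, dtype = fields[0], fields[1]
        prefix = fields[2] if len(fields) > 2 else ''
        suffix = fields[3] if len(fields) > 3 else ''
        parts.append(f'{re.escape(prefix)}(?P<{name}>{TYPES[dtype]}){re.escape(suffix)}')
        dtypes[name] = dtype
        position = match.end()
    parts.append(re.escape(template[position:]))
    return re.compile(''.join(parts)), dtypes


def extract(filename, template):
    """Return the converted parameter values, or None if the filename does not match."""
    regex, dtypes = build_regex(template)
    match = regex.search(filename)
    if match is None:
        return None
    return {name: CONVERTERS[dtypes[name]](value)
            for name, value in match.groupdict().items()}


values = extract(
    'Sample1_P50_N12_F1.23_FREP10kHz.vk4',
    '<power|float|P>_<pulses|int|N>_<fluence|float|F>_<frequency|float|FREP|kHz>',
)
# values == {'power': 50.0, 'pulses': 12, 'fluence': 1.23, 'frequency': 10.0}
```

The regex is searched for anywhere in the filename, which is why the leading ‘Sample1_’ and the trailing ‘.vk4’ do not need to appear in the pattern.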

Parameters:
pattern: str | None

Pattern with which to extract parameters from filename.

Returns:
self
classmethod from_dir(dir_path, file_extensions=None, additional_data=None)

Alternative constructor for Batch class that takes a directory path as well as a string or list of strings of file extensions as positional arguments.

Parameters:
dir_path: str | pathlib.Path

Path to the directory containing the files

file_extensions: str | list-like, optional

File extension or list of file extensions to be searched for, e.g. ‘.vk4’, ‘.plu’. Each file extension must be prefixed by a dot. If no file extensions are specified, all files whose extension corresponds to a supported file format are added to the batch.

additional_data: str | pathlib.Path, optional

Path to an Excel file containing additional parameters, such as input parameters. Excel file must contain a column ‘file’ with the filename including the file extension. Otherwise, an arbitrary number of additional columns can be supplied.

Returns:
Batch

Examples

>>> directory = r'C:\topography_files'
>>> batch = Batch.from_dir(directory)
roughness_parameters(parameters=None)

Registers multiple roughness parameters for later execution. Corresponds to Surface.roughness_parameters. If parameters is None, all available roughness and periodic parameters are registered. Otherwise, a list of parameters can be passed, where each entry is a parameter identifier that must equal the method name of the parameter in the Surface class. A parameter given as a string is registered with its default keyword argument values. To specify keyword arguments for a parameter, either register that parameter explicitly by calling Batch.parameter(args, kwargs), or pass a Parameter object to this method instead of a string.

Parameters:
parameters: list[str | surfalize.batch._Parameter], optional

List of parameters to be registered, either as a string identifier or as a Parameter class.

Returns:
self

Examples

Here, only the specified parameters will be calculated.

>>> batch = Batch(filepaths)
>>> batch.roughness_parameters(['Sa', 'Sq', 'Sz', 'Sdr', 'Vmc'])

In this case, all available parameters will be calculated.

>>> batch = Batch(filepaths)
>>> batch.roughness_parameters()

Here, we create a Parameter object that allows for the specification of keyword arguments. Note that we are passing the Parameter object to the method instead of the string version.

>>> from surfalize.batch import _Parameter
>>> Vmc = _Parameter('Vmc', kwargs=dict(p=5, q=95))
>>> batch.roughness_parameters(['Sa', 'Sq', 'Sz', 'Sdr', Vmc])
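Accepting strings and Parameter objects in the same list amounts to normalizing every entry to one common form. The sketch below shows that idea only; ParameterSpec and normalize are hypothetical names and do not reproduce the internals of surfalize.batch._Parameter.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterSpec:
    """Hypothetical stand-in for a parameter with optional arguments."""
    identifier: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def normalize(parameters):
    """Convert a mixed list of strings and ParameterSpec objects to specs."""
    specs = []
    for parameter in parameters:
        if isinstance(parameter, str):
            # A bare string means: use the default keyword arguments.
            specs.append(ParameterSpec(parameter))
        else:
            specs.append(parameter)
    return specs


specs = normalize(['Sa', 'Sq', ParameterSpec('Vmc', kwargs={'p': 5, 'q': 95})])
identifiers = [spec.identifier for spec in specs]
```

After normalization, every entry carries an identifier and a (possibly empty) set of keyword arguments, so the later calculation does not need to distinguish between the two input forms.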
class surfalize.batch.BatchResult(df)

Bases: object

Class that wraps the DataFrame returned by Batch.execute. Provides a method to get the underlying DataFrame object and a method to apply filename extraction on the DataFrame.

Parameters:
df: pd.DataFrame

Pandas DataFrame object.

extract_from_filename(pattern)

Extracts parameters that are encoded in filenames into their own columns. For instance, a filename might encode different fabrication parameters of the measured surface:

filename: ‘Sample1_P50_N12_F1.23_FREP10kHz.vk4’

The pattern can encode parameters by specifying their name, datatype, prefix (optional) and suffix (optional). The name is used to label the resulting column in the dataframe. The patterns have the general syntax:

<name|datatype|prefix|suffix>

Both prefix and suffix can be omitted. If only a suffix is defined, the prefix must be indicated as an empty string. A pattern to match the above filename could look like this:

pattern: ‘<power|float|P>_<pulses|int|N>_<fluence|float|F>_<frequency|float|FREP|kHz>’

This pattern is parsed and constructs a regex that searches the filename for the defined parameters. The parameters are extracted and converted to their respective datatype. The values are added as new columns to the dataframe.

Parameters:
pattern: str | None

Pattern with which to extract parameters from filename.

Returns:
None
get_dataframe()

Returns the underlying DataFrame object

class surfalize.batch.FileInput(name: str, data: IOBase, format: str | None = None)

Bases: object

Class that wraps a file-like object, adding a name and an optional format specifier for use in batch processing.

data: IOBase
format: str | None = None
name: str
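Wrapping an in-memory file could look like the sketch below. To keep the snippet runnable without the library, it uses a stand-in dataclass that mirrors the three documented fields (name, data, format); FileInputSketch and the sample bytes are hypothetical, and with surfalize installed the real FileInput would be imported from surfalize.batch instead.

```python
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInputSketch:
    """Stand-in mirroring the documented FileInput fields (hypothetical)."""
    name: str
    data: io.IOBase
    format: Optional[str] = None


# Bytes that might come from an upload or a database instead of a file on disk.
raw_bytes = b'\x00\x01\x02\x03'
wrapped = FileInputSketch(name='upload_01.vk4', data=io.BytesIO(raw_bytes))
```

The name gives the file-like object an identity that a bare stream lacks, since an io.BytesIO object carries no filename of its own.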
class surfalize.batch.FilenameParser(template_str)

Bases: object

Parser class that parses filenames according to a template string.

The template can specify parameters by specifying their name, datatype, prefix (optional) and suffix (optional). The name is used to label the resulting column in the dataframe. The patterns have the general syntax:

<name|datatype|prefix|suffix>

Both prefix and suffix can be omitted. If only a suffix is defined, the prefix must be indicated as an empty string. A pattern to match a filename could look like this:

filename: ‘P90_N10_F1.21_FREP10kHz.vk4’

pattern: ‘<power|float|P>_<pulses|int|N>_<fluence|float|F>_<frequency|float|FREP|kHz>’

TYPES = {'float': '\\d+(?:(?:\\.|,)\\d+)?', 'int': '\\d+', 'str': '.+'}
apply_on(df, column, insert_after_column=True)

Extracts the parameters from a column of a dataframe and adds them to the dataframe. Each parameter in the filename will be represented by a new column.

Parameters:
df: pd.DataFrame

DataFrame object that contains a column with filenames

column: str

Name of the column which contains the filenames

insert_after_column: bool, default True

If True, inserts the new columns directly after the filename column; if False, appends them at the end of the dataframe.

Returns:
pd.DataFrame

Original dataframe with added columns
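The two placements controlled by insert_after_column can be illustrated on a plain list of column names. This is a sketch of the resulting column order only; the place_columns helper is hypothetical and works on lists rather than on a DataFrame.

```python
def place_columns(columns, anchor, new_columns, insert_after_column=True):
    """Return the column order after adding new_columns (hypothetical helper)."""
    columns = list(columns)
    if insert_after_column:
        # Put the new columns directly behind the filename column.
        index = columns.index(anchor) + 1
        return columns[:index] + list(new_columns) + columns[index:]
    # Otherwise append them at the end.
    return columns + list(new_columns)


existing = ['file', 'Sa', 'Sq']
extracted = ['power', 'pulses']
inserted = place_columns(existing, 'file', extracted, insert_after_column=True)
appended = place_columns(existing, 'file', extracted, insert_after_column=False)
```

Inserting directly after the filename column keeps the extracted inputs next to the file they came from, ahead of the calculated roughness values.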

construct_regex(tokens, separators)

Construct a regex from the tokens and separators to match a filename.

Parameters:
tokens: list[_Token]

List of tokens obtained from parsing the template string.

separators: list[str]

List of separator strings obtained from parsing the template string.

Returns:
regex: str
extract_from(df, column)

Extracts the parameters from a column of a dataframe into a new dataframe, where each column represents one parameter.

Parameters:
df: pd.DataFrame

DataFrame object that contains a column with filenames

column: str

Name of the column which contains the filenames

Returns:
pd.DataFrame
parse_template()

Parses the template string into separate tokens and constructs a regex to match the filename from these tokens.

Returns:
tokens, separators: list[_Token], list[str]

List of tokens and list of string separators