src.preprocessor module

Functionality for transforming model data into the format expected by PODs once it’s been downloaded to local storage.

src.preprocessor.copy_as_alternate(old_v, data_mgr, **kwargs)[source]

Wrapper for replace() that creates a copy of an existing VarlistEntry old_v and sets appropriate attributes to designate it as an alternate variable.

src.preprocessor.edit_request_wrapper(wrapped_edit_request_func)[source]

Decorator implementing the most typical (so far) use case for PreprocessorFunctionBase.edit_request(), in which we look at each variable request in the varlist separately and, optionally, add a new alternate VarlistEntry based on that request.

This decorator wraps a function which either constructs and returns the desired new alternate VarlistEntry, or returns None if no alternates are to be added for the given variable request. The decorator adds the logic for updating the list of alternates for the pod’s varlist.
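
The pattern described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the toy classes, the `alternates` attribute and the `varlist` list are hypothetical stand-ins for the real VarlistEntry and POD objects.

```python
import functools


class ToyVar:
    """Hypothetical stand-in for a VarlistEntry."""
    def __init__(self, name):
        self.name = name
        self.alternates = []  # assumed attribute holding fallback requests


class ToyPod:
    """Hypothetical stand-in for a POD that owns a varlist."""
    def __init__(self, names):
        self.varlist = [ToyVar(n) for n in names]


def edit_request_sketch(wrapped_func):
    """Call wrapped_func on each variable and record any alternate it returns."""
    @functools.wraps(wrapped_func)
    def wrapper(self, data_mgr, pod):
        for v in pod.varlist:
            new_v = wrapped_func(self, v, pod, data_mgr)
            if new_v is not None:
                v.alternates.append(new_v)
    return wrapper


class ToyFunction:
    @edit_request_sketch
    def edit_request(self, v, pod, data_mgr):
        # Offer an alternate only for the request named "pr_rate".
        if v.name == "pr_rate":
            return ToyVar("pr_flux")
        return None


pod = ToyPod(["pr_rate", "tas"])
ToyFunction().edit_request(None, pod)  # data_mgr is unused in this sketch
```

The wrapped function sees one variable at a time, while callers still use the `(data_mgr, pod)` signature of `edit_request()`.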

class src.preprocessor.PreprocessorFunctionBase(data_mgr, pod)[source]

Bases: abc.ABC

Abstract interface for implementing a specific preprocessing functionality. We prefer to put each set of operations in its own child class, rather than dumping everything into a general Preprocessor class, in order to keep the logic easier to follow.

It’s up to individual Preprocessor child classes to select which functions to use, and in what order to perform them.

edit_request(data_mgr, pod)[source]

Edit the data requested in the POD’s Varlist queue, based on the transformations the functionality can perform. If the function can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.

abstract process(var, dataset)[source]

Apply functionality to the input dataset.

Parameters
  • var: VarlistEntry instance describing POD’s data request (the desired end result of preprocessing work).

  • dataset: xarray.Dataset instance.
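
As an illustration of the interface, the sketch below mirrors the method names documented here with the standard library's `abc` module. The class names, the no-op default for `edit_request()` and the dict used in place of an xarray.Dataset are assumptions made for the example.

```python
import abc


class FunctionBaseSketch(abc.ABC):
    """Hypothetical mirror of the documented interface."""
    def __init__(self, data_mgr, pod):
        pass

    def edit_request(self, data_mgr, pod):
        """Assumed default: leave the POD's request unchanged."""
        return None

    @abc.abstractmethod
    def process(self, var, dataset):
        """Return the transformed dataset."""


class UppercaseKeys(FunctionBaseSketch):
    """Toy child class; here a 'dataset' is a plain dict, not an xarray.Dataset."""
    def process(self, var, dataset):
        return {key.upper(): value for key, value in dataset.items()}


result = UppercaseKeys(None, None).process(None, {"pr": [1.0, 2.0]})
```

Because `process()` is abstract, the base class cannot be instantiated directly; each child class supplies its own implementation.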

_abc_impl = <_abc_data object>
class src.preprocessor.CropDateRangeFunction(data_mgr, pod)[source]

Bases: src.preprocessor.PreprocessorFunctionBase

A PreprocessorFunctionBase which trims the time axis of the dataset to the user-requested analysis period.

static cast_to_cftime(dt, calendar)[source]

HACK to cast python datetime to cftime.datetime with given calendar.

process(var, ds)[source]

Parse quantities related to the calendar for time-dependent data. In particular, date_range was set from user input before we knew the model’s calendar. HACK here to cast those values into cftime.datetime objects so they can be compared with the model data’s time axis.

_abc_impl = <_abc_data object>
class src.preprocessor.PrecipRateToFluxFunction(data_mgr, pod)[source]

Bases: src.preprocessor.PreprocessorFunctionBase

Convert the dependent variable of var between a precipitation rate and the corresponding precipitation flux, in whichever direction is needed to match the VarlistEntry. This applies only to variables whose standard_name appears in the pairs listed below.

_std_name_tuples = [('precipitation_rate', 'precipitation_flux'), ('convective_precipitation_rate', 'convective_precipitation_flux'), ('large_scale_precipitation_rate', 'large_scale_precipitation_flux')]
_rate_d = {'convective_precipitation_rate': 'convective_precipitation_flux', 'large_scale_precipitation_rate': 'large_scale_precipitation_flux', 'precipitation_rate': 'precipitation_flux'}
_flux_d = {'convective_precipitation_flux': 'convective_precipitation_rate', 'large_scale_precipitation_flux': 'large_scale_precipitation_rate', 'precipitation_flux': 'precipitation_rate'}
edit_request(v, pod, data_mgr)[source]

Edit the POD’s Varlist prior to query. If v has a standard_name in the list above, insert an alternate varlist entry whose translation requests the complementary type of variable (i.e., if given rate, add an entry for flux; if given flux, add an entry for rate).
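
The lookup of the complementary variable can be sketched with the three name pairs listed above. The helper name `complementary_standard_name` is hypothetical; only the name pairs come from this page.

```python
_std_name_tuples = [
    ("precipitation_rate", "precipitation_flux"),
    ("convective_precipitation_rate", "convective_precipitation_flux"),
    ("large_scale_precipitation_rate", "large_scale_precipitation_flux"),
]
# Forward (rate -> flux) and reverse (flux -> rate) lookups.
_rate_d = dict(_std_name_tuples)
_flux_d = {flux: rate for rate, flux in _std_name_tuples}


def complementary_standard_name(standard_name):
    """Return the complementary name, or None if it is not a precipitation rate or flux."""
    if standard_name in _rate_d:
        return _rate_d[standard_name]
    return _flux_d.get(standard_name)
```

A return value of None corresponds to the case where no alternate is added for the request.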

process(var, ds)[source]

Apply functionality to the input dataset.

Parameters
  • var: VarlistEntry instance describing POD’s data request (the desired end result of preprocessing work).

  • ds: xarray.Dataset instance.

_abc_impl = <_abc_data object>
class src.preprocessor.ConvertUnitsFunction(data_mgr, pod)[source]

Bases: src.preprocessor.PreprocessorFunctionBase

Convert units on the dependent variable of var, as well as its (non-time) dimension coordinate axes, from what’s specified in the dataset attributes to what’s given in the VarlistEntry.

process(var, ds)[source]

Convert units on the dependent variable and coordinates of var from what’s specified in the dataset attributes to what’s given in the VarlistEntry. Units attributes are updated on the translated VarlistEntry.

_abc_impl = <_abc_data object>
class src.preprocessor.RenameVariablesFunction(data_mgr, pod)[source]

Bases: src.preprocessor.PreprocessorFunctionBase

process(var, ds)[source]

Apply functionality to the input dataset.

Parameters
  • var: VarlistEntry instance describing POD’s data request (the desired end result of preprocessing work).

  • ds: xarray.Dataset instance.

_abc_impl = <_abc_data object>
class src.preprocessor.ExtractLevelFunction(data_mgr, pod)[source]

Bases: src.preprocessor.PreprocessorFunctionBase

Extract a single pressure level from a Dataset. Unit conversions of pressure are handled by cfunits, but parametric vertical coordinates are not handled (interpolation is not implemented here). If the exact level is not provided by the data, KeyError is raised.

edit_request(v, pod, data_mgr)[source]

Edit the POD’s Varlist prior to query. If given a VarlistEntry v which specifies a scalar Z coordinate, return a copy with that scalar_coordinate removed to be used as an alternate variable for v.

process(var, ds)[source]

Determine if level extraction is needed, and return appropriate slice of Dataset if it is.
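
The exact-match behavior described for this class can be sketched on plain lists. The function name is hypothetical, and the sketch assumes the requested level and the data's levels are already in the same units; it does not use xarray or cfunits.

```python
def extract_level_sketch(levels, data, target):
    """Return the slice of data at the level equal to target.

    levels: pressure levels, one per entry of data (same units as target).
    Raises KeyError if target is not present; no interpolation is attempted.
    """
    for index, level in enumerate(levels):
        if level == target:
            return data[index]
    raise KeyError(f"Level {target} not found in {levels}")


levels = [1000.0, 850.0, 500.0]
data = [[1, 2], [3, 4], [5, 6]]
slice_500 = extract_level_sketch(levels, data, 500.0)
```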

_abc_impl = <_abc_data object>
class src.preprocessor.ApplyScaleAndOffsetFunction(data_mgr, pod)[source]

Bases: src.preprocessor.PreprocessorFunctionBase

If the variable has scale_factor and add_offset attributes set, apply the corresponding constant linear transformation to the variable’s values and unset these attributes. By default this function is not applied.

See http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#attribute-appendix.
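
The linear transformation follows the CF convention for packed data: multiply the stored value by scale_factor, then add add_offset. A minimal sketch on a plain list and attribute dict is given below; the function name is hypothetical, and the sketch treats a missing attribute as the identity (1.0 or 0.0), which is an assumption rather than documented behavior.

```python
def apply_scale_and_offset_sketch(values, attrs):
    """Return (unpacked values, attrs without scale_factor and add_offset)."""
    new_attrs = dict(attrs)  # copy so the caller's dict is left untouched
    scale = new_attrs.pop("scale_factor", 1.0)
    offset = new_attrs.pop("add_offset", 0.0)
    unpacked = [x * scale + offset for x in values]
    return unpacked, new_attrs


unpacked, cleaned = apply_scale_and_offset_sketch(
    [0, 10, 20],
    {"scale_factor": 0.5, "add_offset": 273.0, "units": "K"},
)
```

Removing the two attributes after applying them keeps a later reader from applying the transformation a second time.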

process(var, ds)[source]

Apply functionality to the input dataset.

Parameters
  • var: VarlistEntry instance describing POD’s data request (the desired end result of preprocessing work).

  • ds: xarray.Dataset instance.

_abc_impl = <_abc_data object>
class src.preprocessor.MDTFPreprocessorBase(*args, **kwargs)[source]

Bases: object

Base class for preprocessing data after it’s been fetched, in order to put it into a format expected by PODs. The only functionality implemented here is parsing data axes and CF attributes; all other functionality is provided by PreprocessorFunctionBase functions.

_XarrayParserClass

alias of src.xr_parser.DefaultDatasetParser

property _functions

Determine which preprocessor functions are applicable to the current package run, defaulting to all of them.

Returns

tuple of classes (inheriting from PreprocessorFunctionBase) listing the preprocessing functions to be called, in order.

edit_request(data_mgr, pod)[source]

Edit POD’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.

setup(data_mgr, pod)[source]

Method to do additional configuration immediately before process() is called on each variable for pod.

open_dataset_kwargs = {'decode_cf': False, 'decode_coords': False, 'decode_times': False, 'engine': 'netcdf4', 'use_cftime': False}
save_dataset_kwargs = {'engine': 'netcdf4', 'format': 'NETCDF4_CLASSIC'}
read_one_file(var, path_list)[source]
abstract read_dataset(var)[source]
clean_nc_var_encoding(var, name, ds_obj)[source]

Clean up the attrs and encoding dicts of ds_obj prior to writing to a netCDF file, as a workaround for the following known issues:

  • Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.

  • The _FillValue attribute is deleted for all independent variables (coordinates and their bounds). This follows the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.

  • ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.

  • xarray .to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
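
The four workarounds above can be sketched on plain dicts. This is an illustration under stated assumptions, not the method's code: the function name, the `is_coordinate` flag and the `ATTR_NOT_FOUND` object defined here are hypothetical, and attributes whose values differ from the encoding are simply left in place.

```python
import math

ATTR_NOT_FOUND = object()  # hypothetical stand-in for the parser's sentinel


def clean_encoding_sketch(attrs, encoding, is_coordinate=False):
    """Return cleaned copies of the attrs and encoding dicts."""
    # 1. Drop attributes that were marked as not found.
    attrs = {k: v for k, v in attrs.items() if v is not ATTR_NOT_FOUND}
    encoding = dict(encoding)
    # 2. Coordinates and their bounds carry no _FillValue.
    if is_coordinate:
        attrs.pop("_FillValue", None)
        encoding.pop("_FillValue", None)
    # 3. Unset NaN fill values.
    for d in (attrs, encoding):
        fill = d.get("_FillValue")
        if isinstance(fill, float) and math.isnan(fill):
            del d["_FillValue"]
    # 4. Remove attributes that duplicate an encoding key with an equal value.
    for key in list(attrs):
        if key in encoding and attrs[key] == encoding[key]:
            del attrs[key]
    return attrs, encoding


attrs_out, encoding_out = clean_encoding_sketch(
    {"units": "K", "long_name": ATTR_NOT_FOUND,
     "_FillValue": float("nan"), "dtype": "float32"},
    {"dtype": "float32", "zlib": True},
)
```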

clean_output_attrs(var, ds)[source]

Call clean_nc_var_encoding() on all sets of attributes in the Dataset ds.

log_history_attr(var, ds)[source]

Update the history attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
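
Chronological ordering means new records are appended after any existing history text. A minimal sketch, assuming the history is a newline-separated string and the log records are already formatted as strings (the function name is hypothetical):

```python
def append_history_sketch(attrs, log_records):
    """Return a copy of attrs with log_records appended to 'history', oldest first."""
    new_attrs = dict(attrs)
    existing = new_attrs.get("history", "")
    lines = [existing] if existing else []
    lines.extend(log_records)
    new_attrs["history"] = "\n".join(lines)
    return new_attrs


updated = append_history_sketch(
    {"history": "created 2020"},
    ["renamed pr", "converted units"],
)
```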

write_dataset(var, ds)[source]
load_ds(var)[source]

Top-level method to load dataset and parse metadata; spun out so that child classes can modify it.

process_ds(var, ds)[source]

Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.

write_ds(var, ds)[source]

Top-level method to write out processed dataset; spun out so that child classes can modify it.

process(var)[source]

Top-level wrapper for doing all preprocessing of data files.
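
The division of labor between process(), load_ds(), process_ds() and write_ds() can be sketched as below. Everything other than those four method names is hypothetical: the toy "dataset" is a list of numbers, the "var" is a dict, and the functions are plain callables rather than PreprocessorFunctionBase instances.

```python
class PipelineSketch:
    """Toy preprocessor that loads, transforms and stores a list of numbers."""
    def __init__(self, functions):
        self.functions = tuple(functions)  # applied in the order given
        self.written = {}                  # stands in for files on disk

    def load_ds(self, var):
        return list(var["data"])

    def process_ds(self, var, ds):
        for func in self.functions:
            ds = func(var, ds)
        return ds

    def write_ds(self, var, ds):
        self.written[var["name"]] = ds

    def process(self, var):
        ds = self.load_ds(var)
        ds = self.process_ds(var, ds)
        self.write_ds(var, ds)


def crop(var, ds):
    """Keep only the requested index range."""
    return ds[var["start"]:var["stop"]]


def scale(var, ds):
    """Multiply every value by a constant factor."""
    return [x * var["factor"] for x in ds]


pre = PipelineSketch([crop, scale])
pre.process({"name": "pr", "data": [1, 2, 3, 4],
             "start": 1, "stop": 3, "factor": 10})
```

Splitting the work into three overridable steps lets a child class change how data is loaded or written without touching the order in which functions are applied.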

_abc_impl = <_abc_data object>
class src.preprocessor.SingleFilePreprocessor(*args, **kwargs)[source]

Bases: src.preprocessor.MDTFPreprocessorBase

An MDTFPreprocessorBase for preprocessing model data that is provided as a single netCDF file per variable, for example the sample model data.

read_dataset(var)[source]
_abc_impl = <_abc_data object>
class src.preprocessor.DaskMultiFilePreprocessor(*args, **kwargs)[source]

Bases: src.preprocessor.MDTFPreprocessorBase

An MDTFPreprocessorBase that uses xarray’s dask support to preprocess model data provided as one or several netCDF files per variable.

_file_preproc_functions = <src.util.basic._AbstractAttributePlaceholder object>
edit_request(data_mgr, pod)[source]

Edit POD’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.

read_dataset(var)[source]
_abc_impl = <_abc_data object>
class src.preprocessor.SampleDataPreprocessor(*args, **kwargs)[source]

Bases: src.preprocessor.SingleFilePreprocessor

Implementation class for MDTFPreprocessorBase intended for use on sample model data distributed with the package. Assumes all data is in one netCDF file.

_abc_impl = <_abc_data object>
class src.preprocessor.DefaultPreprocessor(*args, **kwargs)[source]

Bases: src.preprocessor.DaskMultiFilePreprocessor

Implementation class for MDTFPreprocessorBase for the general use case. Includes all implemented functionality and handles multi-file data.

_file_preproc_functions = []
_abc_impl = <_abc_data object>