src.preprocessor module¶
Functionality for transforming model data into the format expected by PODs once it’s been downloaded; see Data layer: Preprocessing.
-
src.preprocessor.copy_as_alternate(old_v, data_mgr, **kwargs)[source]¶ Wrapper for replace() that creates a copy of an existing VarlistEntry old_v and sets appropriate attributes to designate it as an alternate variable.
-
src.preprocessor.edit_request_wrapper(wrapped_edit_request_func)[source]¶ Decorator implementing the most typical (so far) use case for PreprocessorFunctionBase.edit_request(), in which we look at each variable request in the varlist separately and, optionally, add a new alternate VarlistEntry based on that request.
This decorator wraps a function which either constructs and returns the desired new alternate VarlistEntry, or returns None if no alternates are to be added for the given variable request. It adds logic for updating the list of alternates for the pod’s varlist.
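The decorator pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the VarlistEntry and Pod classes here are hypothetical stand-ins, and the choice to put the new alternate ahead of any existing ones is an assumption of the sketch.

```python
import functools


class VarlistEntry:
    """Hypothetical stand-in for the framework's VarlistEntry."""

    def __init__(self, name, alternates=None):
        self.name = name
        # Each element is one set of alternate variables (a list of entries).
        self.alternates = alternates if alternates is not None else []


class Pod:
    """Hypothetical stand-in for a POD object holding a varlist."""

    def __init__(self, varlist):
        self.varlist = varlist


def edit_request_wrapper(wrapped_edit_request_func):
    """Call the wrapped function once per variable in the POD's varlist."""

    @functools.wraps(wrapped_edit_request_func)
    def wrapper(self, data_mgr, pod):
        for v in pod.varlist:
            new_v = wrapped_edit_request_func(self, v, pod, data_mgr)
            if new_v is None:
                continue  # no alternate to add for this request
            # Assumption of this sketch: the new alternate is tried first.
            v.alternates.insert(0, [new_v])

    return wrapper


class AddSuffixFunction:
    """Toy function: a request for 'pr' may fall back to 'pr_alt'."""

    @edit_request_wrapper
    def edit_request(self, v, pod, data_mgr):
        if v.name != "pr":
            return None
        return VarlistEntry(v.name + "_alt")


pod = Pod([VarlistEntry("pr"), VarlistEntry("tas")])
AddSuffixFunction().edit_request(data_mgr=None, pod=pod)
print([alt[0].name for alt in pod.varlist[0].alternates])
```

The wrapped method takes a single variable request v, while the decorated method keeps the (data_mgr, pod) signature of PreprocessorFunctionBase.edit_request().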
-
class
src.preprocessor.PreprocessorFunctionBase(data_mgr, pod)[source]¶ Bases: abc.ABC
Abstract interface for implementing a specific preprocessing functionality. We prefer to put each set of operations in its own child class, rather than dumping everything into a general Preprocessor class, in order to keep the logic easier to follow.
It’s up to individual Preprocessor child classes to select which functions to use, and in what order to perform them.
-
edit_request(data_mgr, pod)[source]¶ Edit the data requested in pod’s Varlist queue, based on the transformations the functionality can perform. If the function can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
abstract
process(var, dataset)[source]¶ Apply functionality to the input dataset.
- Parameters
var (VarlistEntry) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
dataset – xarray.Dataset instance.
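A minimal sketch of this interface, assuming a default edit_request() that changes nothing and an abstract process(). The class names below are hypothetical, and the dataset is a plain dict rather than an xarray.Dataset so that the example runs with the standard library alone.

```python
import abc


class PreprocessorFunctionSketch(abc.ABC):
    """Hypothetical simplified analogue of PreprocessorFunctionBase."""

    def __init__(self, data_mgr, pod):
        pass  # the sketch keeps no state

    def edit_request(self, data_mgr, pod):
        """Default: leave the POD's data request unchanged."""
        return None

    @abc.abstractmethod
    def process(self, var, dataset):
        """Return the transformed dataset."""


class UppercaseNamesFunction(PreprocessorFunctionSketch):
    """Toy child class that renames every variable to upper case."""

    def process(self, var, dataset):
        return {name.upper(): values for name, values in dataset.items()}


func = UppercaseNamesFunction(data_mgr=None, pod=None)
result = func.process(var=None, dataset={"pr": [1.0, 2.0]})
print(result)
```

Because process() is abstract, instantiating the base class directly raises TypeError; only child classes that implement it can be used.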
-
-
class
src.preprocessor.CropDateRangeFunction(data_mgr, pod)[source]¶ Bases: src.preprocessor.PreprocessorFunctionBase
A PreprocessorFunctionBase class which trims the time axis of the dataset to the user-requested analysis period.
-
static
cast_to_cftime(dt, calendar)[source]¶ Workaround to cast the Python datetime dt to a cftime.datetime with the given calendar. The Python standard library datetime has no support for different calendars.
-
process(var, ds)[source]¶ Parse quantities related to the calendar for time-dependent data. In particular, date_range was set from user input before we knew the model’s calendar. As a workaround, those values are cast here into cftime.datetime objects so they can be compared with the model data’s time axis.
-
static
-
class
src.preprocessor.PrecipRateToFluxFunction(data_mgr, pod)[source]¶ Bases: src.preprocessor.PreprocessorFunctionBase
Convert units on the dependent variable of var, as well as its (non-time) dimension coordinate axes, from what’s specified in the dataset attributes to what’s given in the VarlistEntry.
-
edit_request(v, pod, data_mgr)[source]¶ Edit the POD’s Varlist prior to query. If v has a standard_name in the list above, insert an alternate varlist entry whose translation requests the complementary type of variable (i.e., if given rate, add an entry for flux; if given flux, add an entry for rate).
-
process(var, ds)[source]¶ Apply functionality to the input dataset.
- Parameters
var (VarlistEntry) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
dataset – xarray.Dataset instance.
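For orientation, the physical relation between a precipitation rate and the complementary mass flux can be sketched as below. This is not the function's actual implementation: the function names are hypothetical, and the constant water density of 1000 kg m-3 is an assumption of the sketch.

```python
WATER_DENSITY = 1000.0  # kg m-3; assumed constant for this sketch


def rate_to_flux(rate_m_per_s):
    """Convert a precipitation rate (m s-1) to a mass flux (kg m-2 s-1)."""
    return rate_m_per_s * WATER_DENSITY


def flux_to_rate(flux_kg_per_m2_s):
    """Convert a precipitation mass flux (kg m-2 s-1) to a rate (m s-1)."""
    return flux_kg_per_m2_s / WATER_DENSITY


flux = rate_to_flux(2.0e-8)
rate = flux_to_rate(flux)
```

The two conversions are inverses of each other, which is why a request for either quantity can be served by data providing the other.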
-
-
class
src.preprocessor.ConvertUnitsFunction(data_mgr, pod)[source]¶ Bases: src.preprocessor.PreprocessorFunctionBase
Convert units on the dependent variable of var, as well as its (non-time) dimension coordinate axes, from what’s specified in the dataset attributes to what’s given in the VarlistEntry.
-
process(var, ds)[source]¶ Convert units on the dependent variable and coordinates of var from what’s specified in the dataset attributes to what’s given in the VarlistEntry var. Units attributes are updated on the TranslatedVarlistEntry.
-
-
class
src.preprocessor.RenameVariablesFunction(data_mgr, pod)[source]¶ Bases: src.preprocessor.PreprocessorFunctionBase
-
process(var, ds)[source]¶ Apply functionality to the input dataset.
- Parameters
var (VarlistEntry) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
dataset – xarray.Dataset instance.
-
-
class
src.preprocessor.ExtractLevelFunction(data_mgr, pod)[source]¶ Bases: src.preprocessor.PreprocessorFunctionBase
Extract a single pressure level from a Dataset. Unit conversions of pressure are handled by cfunits (see src.units module), but parametric vertical coordinates are not handled: interpolation is not implemented here. If the exact level is not provided by the data, KeyError is raised.
-
edit_request(v, pod, data_mgr)[source]¶ Edit the pod’s Varlist prior to data query. If given a VarlistEntry v which specifies a scalar Z coordinate, return a copy with that scalar_coordinate removed to be used as an alternate variable for v.
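The exact-match behavior described for this class (no interpolation, KeyError when the level is absent) can be illustrated with a small sketch. The helper below is hypothetical, works on plain lists instead of an xarray.Dataset, and assumes the requested level is already expressed in the same units as the data.

```python
def extract_level(levels, fields, requested_level):
    """Return the field at requested_level; raise KeyError if it is absent."""
    if requested_level not in levels:
        # No interpolation between neighboring levels is attempted.
        raise KeyError(f"Level {requested_level} not found in {levels}")
    return fields[levels.index(requested_level)]


levels_hpa = [1000, 850, 500, 200]
fields = ["field@1000", "field@850", "field@500", "field@200"]
selected = extract_level(levels_hpa, fields, 500)
print(selected)
```

Requesting a level such as 700 in this example raises KeyError rather than returning an interpolated field.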
-
-
class
src.preprocessor.ApplyScaleAndOffsetFunction(data_mgr, pod)[source]¶ Bases: src.preprocessor.PreprocessorFunctionBase
If the variable has scale_factor and add_offset attributes set, apply the corresponding constant linear transformation to the variable’s values and unset these attributes. By default this function is not applied.
See CF convention documentation on the scale_factor and add_offset attributes.
-
process(var, ds)[source]¶ Apply functionality to the input dataset.
- Parameters
var (VarlistEntry) – POD varlist entry instance describing POD’s data request, which is the desired end result of preprocessing work.
dataset – xarray.Dataset instance.
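Under the CF conventions, packed values are unpacked as value * scale_factor + add_offset. A minimal sketch of that transformation follows; the helper name is hypothetical, and plain lists and dicts stand in for the variable's data and attributes.

```python
def apply_scale_and_offset(values, attrs):
    """Return unpacked values and a copy of attrs without the packing attributes."""
    new_attrs = dict(attrs)
    # Missing attributes default to the identity transformation.
    scale = new_attrs.pop("scale_factor", 1.0)
    offset = new_attrs.pop("add_offset", 0.0)
    unpacked = [v * scale + offset for v in values]
    return unpacked, new_attrs


packed = [0, 10, 20]
attrs = {"units": "K", "scale_factor": 0.5, "add_offset": 273.0}
unpacked, cleaned = apply_scale_and_offset(packed, attrs)
print(unpacked, cleaned)
```

Removing the two attributes after applying them prevents downstream tools from applying the same transformation a second time.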
-
-
class
src.preprocessor.MDTFPreprocessorBase(*args, **kwargs)[source]¶ Bases: object
Base class for preprocessing data after it’s been fetched, in order to put it into a format expected by PODs. The only functionality implemented here is parsing data axes and CF attributes; all other functionality is provided by PreprocessorFunctionBase functions, which are called in order.
-
edit_request(data_mgr, pod)[source]¶ Edit pod’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
setup(data_mgr, pod)[source]¶ Method to do additional configuration immediately before process() is called on each variable for pod.
-
property
open_dataset_kwargs¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
property
save_dataset_kwargs¶ Arguments passed to xarray to_netcdf().
-
clean_nc_var_encoding(var, name, ds_obj)[source]¶ Clean up the attrs and encoding dicts of ds_obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
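The four workarounds listed for clean_nc_var_encoding() can be sketched on plain dicts as below. This is an illustration under stated assumptions rather than the method's actual code: the function name and the is_coordinate flag are hypothetical, ATTR_NOT_FOUND is a local stand-in for the parser's sentinel, and the sketch leaves attributes in place when their value differs from the encoding's.

```python
import math

ATTR_NOT_FOUND = object()  # hypothetical stand-in for the parser's sentinel


def clean_attrs_and_encoding(attrs, encoding, is_coordinate=False):
    """Return cleaned copies of attrs and encoding (plain dicts in this sketch)."""
    # 1. Drop entries that were never found on the source dataset.
    attrs = {k: v for k, v in attrs.items() if v is not ATTR_NOT_FOUND}
    encoding = {k: v for k, v in encoding.items() if v is not ATTR_NOT_FOUND}

    # 2. and 3. Unset _FillValue on coordinates, and wherever it is NaN.
    fill = encoding.get("_FillValue", attrs.get("_FillValue"))
    fill_is_nan = isinstance(fill, float) and math.isnan(fill)
    if is_coordinate or fill_is_nan:
        attrs.pop("_FillValue", None)
        # An encoding value of None asks xarray not to write a _FillValue.
        encoding["_FillValue"] = None

    # 4. Drop attributes duplicating an encoding key with an equal value.
    for key in list(attrs):
        if key in encoding and attrs[key] == encoding[key]:
            del attrs[key]
    return attrs, encoding


attrs = {"units": ATTR_NOT_FOUND, "long_name": "time",
         "calendar": "noleap", "_FillValue": 1e20}
encoding = {"calendar": "noleap", "dtype": "float64"}
new_attrs, new_encoding = clean_attrs_and_encoding(attrs, encoding, is_coordinate=True)
print(new_attrs, new_encoding)
```

Working on copies keeps the caller's dicts unchanged, which makes the cleanup safe to repeat.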
-
clean_output_attrs(var, ds)[source]¶ Call clean_nc_var_encoding() on all sets of attributes in the Dataset ds.
-
log_history_attr(var, ds)[source]¶ Update the history attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
write_dataset(var, ds)[source]¶ Writes processed Dataset ds to the location specified by the dest_path attribute of var, using xarray to_netcdf().
-
load_ds(var)[source]¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset().
-
process_ds(var, ds)[source]¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
write_ds(var, ds)[source]¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset().
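The division of labor among load_ds(), process_ds() and write_ds() resembles a template-method pattern, sketched below. The classes are hypothetical simplifications: datasets are plain lists, functions are plain callables, and write_dataset() is left to the child class here even though the real base class implements it with to_netcdf().

```python
class PreprocessorSketch:
    """Hypothetical simplified analogue of MDTFPreprocessorBase."""

    def __init__(self, functions):
        self.functions = functions  # applied in list order

    def read_dataset(self, var):
        raise NotImplementedError  # supplied by child classes

    def write_dataset(self, var, ds):
        raise NotImplementedError  # simplified: child class supplies this too

    def load_ds(self, var):
        return self.read_dataset(var)

    def process_ds(self, var, ds):
        for func in self.functions:
            ds = func(var, ds)
        return ds

    def write_ds(self, var, ds):
        self.write_dataset(var, ds)

    def process(self, var):
        """Top-level wrapper: load, transform, then write."""
        ds = self.load_ds(var)
        ds = self.process_ds(var, ds)
        self.write_ds(var, ds)


class InMemoryPreprocessor(PreprocessorSketch):
    """Toy child class that reads from and writes to dicts."""

    def __init__(self, functions, store):
        super().__init__(functions)
        self.store = store
        self.output = {}

    def read_dataset(self, var):
        return list(self.store[var])

    def write_dataset(self, var, ds):
        self.output[var] = ds


def double(var, ds):
    return [2 * x for x in ds]


def add_one(var, ds):
    return [x + 1 for x in ds]


pp = InMemoryPreprocessor([double, add_one], store={"pr": [1, 2, 3]})
pp.process("pr")
print(pp.output)
```

Keeping each stage in its own method lets a child class override only the step it needs to change, such as how files are opened.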
-
-
class
src.preprocessor.SingleFilePreprocessor(*args, **kwargs)[source]¶ Bases: src.preprocessor.MDTFPreprocessorBase
An MDTFPreprocessorBase for preprocessing model data that is provided as a single netCDF file per variable, for example the sample model data.
-
read_dataset(var)[source]¶ Read a single-file Dataset specified by the local_data attribute of var, using read_one_file().
-
clean_nc_var_encoding(var, name, ds_obj)¶ Clean up the attrs and encoding dicts of ds_obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs(var, ds)¶ Call clean_nc_var_encoding() on all sets of attributes in the Dataset ds.
-
edit_request(data_mgr, pod)¶ Edit pod’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
load_ds(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset().
-
log_history_attr(var, ds)¶ Update the history attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_one_file(var, path_list)¶
-
property
save_dataset_kwargs¶ Arguments passed to xarray to_netcdf().
-
setup(data_mgr, pod)¶ Method to do additional configuration immediately before process() is called on each variable for pod.
-
write_dataset(var, ds)¶ Writes processed Dataset ds to the location specified by the dest_path attribute of var, using xarray to_netcdf().
-
write_ds(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset().
-
-
class
src.preprocessor.DaskMultiFilePreprocessor(*args, **kwargs)[source]¶ Bases: src.preprocessor.MDTFPreprocessorBase
An MDTFPreprocessorBase that uses xarray’s dask support to preprocess model data provided as one or several netCDF files per variable.
-
edit_request(data_mgr, pod)[source]¶ Edit POD’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
read_dataset(var)[source]¶ Open the multi-file Dataset specified by the local_data attribute of var, wrapping xarray open_mfdataset().
-
clean_nc_var_encoding(var, name, ds_obj)¶ Clean up the attrs and encoding dicts of ds_obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs(var, ds)¶ Call clean_nc_var_encoding() on all sets of attributes in the Dataset ds.
-
load_ds(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset().
-
log_history_attr(var, ds)¶ Update the history attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_one_file(var, path_list)¶
-
property
save_dataset_kwargs¶ Arguments passed to xarray to_netcdf().
-
setup(data_mgr, pod)¶ Method to do additional configuration immediately before process() is called on each variable for pod.
-
write_dataset(var, ds)¶ Writes processed Dataset ds to the location specified by the dest_path attribute of var, using xarray to_netcdf().
-
write_ds(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset().
-
-
class
src.preprocessor.SampleDataPreprocessor(*args, **kwargs)[source]¶ Bases: src.preprocessor.SingleFilePreprocessor
Implementation class for MDTFPreprocessorBase intended for use on sample model data distributed with the package. Assumes all data is in one netCDF file.
-
clean_nc_var_encoding(var, name, ds_obj)¶ Clean up the attrs and encoding dicts of ds_obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs(var, ds)¶ Call clean_nc_var_encoding() on all sets of attributes in the Dataset ds.
-
edit_request(data_mgr, pod)¶ Edit pod’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
load_ds(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset().
-
log_history_attr(var, ds)¶ Update the history attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_dataset(var)¶ Read a single-file Dataset specified by the local_data attribute of var, using read_one_file().
-
read_one_file(var, path_list)¶
-
property
save_dataset_kwargs¶ Arguments passed to xarray to_netcdf().
-
setup(data_mgr, pod)¶ Method to do additional configuration immediately before process() is called on each variable for pod.
-
write_dataset(var, ds)¶ Writes processed Dataset ds to the location specified by the dest_path attribute of var, using xarray to_netcdf().
-
write_ds(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset().
-
-
class
src.preprocessor.DefaultPreprocessor(*args, **kwargs)[source]¶ Bases: src.preprocessor.DaskMultiFilePreprocessor
Implementation class for MDTFPreprocessorBase for the general use case. Includes all implemented functionality and handles multi-file data.
-
__init__(data_mgr, pod)¶ Initialize self. See help(type(self)) for accurate signature.
-
clean_nc_var_encoding(var, name, ds_obj)¶ Clean up the attrs and encoding dicts of ds_obj prior to writing to a netCDF file, as a workaround for the following known issues:
- Missing attributes may be set to the sentinel value ATTR_NOT_FOUND by xr_parser.DefaultDatasetParser. Depending on context, this may not be an error, but attributes with this value need to be deleted before writing.
- Delete the _FillValue attribute for all independent variables (coordinates and their bounds), which is specified in the CF conventions but isn’t the xarray default; see https://github.com/pydata/xarray/issues/1598.
- ‘NaN’ is not recognized as a valid _FillValue by NCL (see https://www.ncl.ucar.edu/Support/talk_archives/2012/1689.html), so unset the attribute for this case.
- xarray to_netcdf() raises an error if attributes set on a variable have the same name as those used in its encoding, even if their values are the same. We delete these attributes prior to writing, after checking equality of values.
-
clean_output_attrs(var, ds)¶ Call clean_nc_var_encoding() on all sets of attributes in the Dataset ds.
-
edit_request(data_mgr, pod)¶ Edit POD’s data request, based on the child class’s functionality. If the child class has a function that can transform data in format X to format Y and the POD requests X, this method should insert a backup/fallback request for Y.
-
load_ds(var)¶ Top-level method to load dataset and parse metadata; spun out so that child classes can modify it. Calls child class
read_dataset().
-
log_history_attr(var, ds)¶ Update the history attribute on xarray Dataset ds with log records of any metadata modifications logged to var’s _nc_history_log log handler. For simplicity, events are written in chronological rather than reverse chronological order.
-
property
open_dataset_kwargs¶ Arguments passed to xarray open_dataset() and open_mfdataset().
-
process(var)¶ Top-level wrapper for doing all preprocessing of data files.
-
process_ds(var, ds)¶ Top-level method to apply selected functions to dataset; spun out so that child classes can modify it.
-
read_dataset(var)¶ Open the multi-file Dataset specified by the local_data attribute of var, wrapping xarray open_mfdataset().
-
read_one_file(var, path_list)¶
-
property
save_dataset_kwargs¶ Arguments passed to xarray to_netcdf().
-
setup(data_mgr, pod)¶ Method to do additional configuration immediately before process() is called on each variable for pod.
-
write_dataset(var, ds)¶ Writes processed Dataset ds to the location specified by the dest_path attribute of var, using xarray to_netcdf().
-
write_ds(var, ds)¶ Top-level method to write out processed dataset; spun out so that child classes can modify it. Calls child class
write_dataset().
-