src.xr_parser module
Utility functions for working with xarray Datasets.
-
class
src.xr_parser.PlaceholderScalarCoordinate(name: str, axis: str, standard_name: str = sentinel.AttrNotFound, units: str = sentinel.AttrNotFound)
Bases: object
Dummy object used to describe scalar coordinates that are referred to by name only in the ‘coordinates’ attribute of a variable or dataset. We do this so that their attributes match those of coordinates represented by real netCDF Variables.
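Here is a minimal sketch of such a placeholder. It assumes a dataclass is an acceptable stand-in for the real class. `ATTR_NOT_FOUND` is a hypothetical local replacement for the `sentinel.AttrNotFound` default shown in the signature.

```python
from dataclasses import dataclass


class _AttrNotFound:
    """Hypothetical stand-in for the framework's sentinel.AttrNotFound."""

    def __repr__(self):
        return "sentinel.AttrNotFound"


ATTR_NOT_FOUND = _AttrNotFound()


@dataclass
class PlaceholderScalarCoordinate:
    """A scalar coordinate known only by name from a 'coordinates' attribute."""
    name: str
    axis: str
    standard_name: object = ATTR_NOT_FOUND  # unknown until reconciled
    units: object = ATTR_NOT_FOUND          # unknown until reconciled


# A variable whose 'coordinates' attribute mentions "plev", with no matching netCDF variable:
placeholder = PlaceholderScalarCoordinate(name="plev", axis="Z")
```

Because unknown attributes hold a sentinel rather than being absent, downstream code can read `.standard_name` and `.units` uniformly on real and placeholder coordinates.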
-
src.xr_parser.patch_cf_xarray_accessor(mod)
Monkey-patches _get_axis_coord, a module-level function in cf_xarray, to obtain the desired behavior.
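The sketch below shows the general technique of monkey-patching a module-level function. It uses a hypothetical stand-in module, not cf_xarray's actual code, and the fallback logic in `patched` is an illustrative assumption.

```python
import types

# Hypothetical stand-in for a third-party module; this is NOT cf_xarray's code.
fake_mod = types.ModuleType("fake_accessor")
exec(
    "def _get_axis_coord(obj, key):\n"
    "    return []  # original behavior: never finds a match\n"
    "\n"
    "def axes(obj, key):\n"
    "    # Looks up _get_axis_coord in the module's globals at call time\n"
    "    return _get_axis_coord(obj, key)\n",
    fake_mod.__dict__,
)


def patch_accessor(mod):
    """Replace mod._get_axis_coord with a wrapper that adds a fallback lookup."""
    original = mod._get_axis_coord

    def patched(obj, key):
        result = original(obj, key)
        if not result and key in obj:
            # Hypothetical fallback: accept a variable literally named `key`
            result = [key]
        return result

    mod._get_axis_coord = patched


patch_accessor(fake_mod)
print(fake_mod.axes({"T": None}, "T"))  # ['T']
```

Because `axes` resolves `_get_axis_coord` by name each time it runs, reassigning the module attribute changes its behavior without editing the module's source. Code that had already done `from fake_accessor import _get_axis_coord` would keep the original function.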
-
class
src.xr_parser.MDTFCFAccessorMixin
Bases: object
Methods we add for both xarray Dataset and DataArray objects, although the intended use case is to call them once per Dataset.
-
property
is_static
-
property
calendar
Reads the ‘calendar’ attribute on the time axis (intended to have been set by set_calendar()). Returns None if there is no time axis.
-
_old_axes_dict(var_name=None)
Code for the “axes” accessor behavior as defined in cf_xarray, which we override in various ways below.
- Parameters
var_name (optional) – If supplied, return a dict containing the subset of coordinates used by the dependent variable var_name, instead of all coordinates in the dataset.
- Returns
dict mapping axis labels to lists of names of variables in the Dataset that the accessor has mapped to that axis.
-
property
dim_axes_set
-
property
axes_set
-
property
-
class
src.xr_parser.MDTFCFDatasetAccessorMixin
Bases: src.xr_parser.MDTFCFAccessorMixin
Methods we add for xarray Dataset objects.
-
scalar_coords(var_name=None)
Return a list of the Dataset variables corresponding to scalar coordinates. If a coordinate was defined only as an attribute, its name is stored instead.
-
get_scalar(ax_name, var_name=None)
If the axis label ax_name is a scalar coordinate, return the corresponding xarray DataArray (or PlaceholderScalarCoordinate); otherwise return None.
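Scalar coordinates come in two kinds: those backed by a real variable and those known only from the ‘coordinates’ attribute. The sketch below separates the two using plain Python. `var_shapes` is a hypothetical dict of variable name to shape tuple that stands in for the Dataset, and `split_scalar_coords` is a hypothetical helper.

```python
def split_scalar_coords(coordinates_attr, var_shapes):
    """Split names in a 'coordinates' attribute into scalar coords backed by a
    0-dimensional variable and names defined only in the attribute."""
    real, attr_only = [], []
    for name in coordinates_attr.split():
        if name not in var_shapes:
            attr_only.append(name)  # known only by name: would become a placeholder
        elif var_shapes[name] == ():
            real.append(name)       # 0-dimensional variable: a scalar coordinate
        # otherwise it is a dimension or auxiliary coordinate and is skipped
    return real, attr_only


shapes = {"ta": (12, 19, 90, 180), "lat": (90,), "height": ()}
print(split_scalar_coords("lat height plev", shapes))  # (['height'], ['plev'])
```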
-
axes(var_name=None, filter_set=None)
Override of the cf_xarray accessor behavior (from MDTFCFAccessorMixin._old_axes_dict()).
- Parameters
var_name (optional) – If supplied, return a dict containing the subset of coordinates used by the dependent variable var_name, instead of all coordinates in the dataset.
filter_set (optional) – Iterable of coordinate names. If supplied, restrict the returned dict to coordinates in filter_set.
- Returns
dict mapping axis labels to lists of the Dataset variables themselves, instead of their names.
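The effect of the var_name and filter_set options can be sketched on a plain dict of axis label to coordinate names. `filter_axes` is a hypothetical helper. The real method returns the Dataset variables themselves, which would correspond to replacing each kept name `n` with `ds[n]`.

```python
def filter_axes(axes_dict, var_coords=None, filter_set=None):
    """Restrict {axis: [names]} to names used by a variable and/or in filter_set.
    Axes left with no names are dropped."""
    result = {}
    for axis, names in axes_dict.items():
        keep = [
            n for n in names
            if (var_coords is None or n in var_coords)
            and (filter_set is None or n in filter_set)
        ]
        if keep:
            result[axis] = keep
    return result


axes_dict = {"X": ["lon"], "Y": ["lat"], "Z": ["plev", "lev"], "T": ["time"]}
print(filter_axes(axes_dict, var_coords={"lon", "lat", "plev", "time"}))
# {'X': ['lon'], 'Y': ['lat'], 'Z': ['plev'], 'T': ['time']}
print(filter_axes(axes_dict, filter_set={"lat", "lon"}))
# {'X': ['lon'], 'Y': ['lat']}
```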
-
-
class
src.xr_parser.MDTFDataArrayAccessorMixin
Bases: src.xr_parser.MDTFCFAccessorMixin
Methods we add for xarray DataArray objects.
-
dim_axes()
Map axis labels to the (unique) coordinate variable name, instead of a list of names as in cf_xarray. Filter on dimension coordinates only (eliminating any scalar coordinates).
-
axes()
Map axis labels to the (unique) coordinate variable name, instead of a list of names as in cf_xarray.
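Collapsing cf_xarray's lists of names to a single name per axis can be sketched as follows. `unique_axes` is a hypothetical helper. Raising ValueError when an axis maps to more than one name is an assumption about how ambiguity is handled.

```python
def unique_axes(axes_dict, dims=None):
    """Collapse {axis: [name, ...]} to {axis: name}.

    If dims is given, only dimension coordinates are kept, so scalar
    coordinates drop out (as in dim_axes()).
    """
    out = {}
    for axis, names in axes_dict.items():
        if dims is not None:
            names = [n for n in names if n in dims]
        if not names:
            continue
        if len(names) > 1:
            # Assumption: an ambiguous axis is treated as an error
            raise ValueError(f"Axis {axis!r} maps to several coordinates: {names}")
        out[axis] = names[0]
    return out


axes = {"X": ["lon"], "Y": ["lat"], "Z": ["plev"]}
print(unique_axes(axes, dims=("lat", "lon")))  # {'X': 'lon', 'Y': 'lat'}
```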
-
property
formula_terms
Returns a dict of (name in formula, name in dataset) pairs parsed from the formula_terms attribute. If the attribute is not present, returns an empty dict.
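In the CF conventions, formula_terms is a blank-separated list of "term: variable" pairs. The sketch below parses such a string into the dict described above. `parse_formula_terms` is a hypothetical helper, and it tolerates a missing blank after the colon.

```python
import re

# One "term: variable_name" pair; the blank after the colon is optional here
_TERM_RE = re.compile(r"(\w+):\s*(\S+)")


def parse_formula_terms(attr):
    """Return {term name in formula: variable name in dataset}; {} if attr is missing."""
    return dict(_TERM_RE.findall(attr or ""))


print(parse_formula_terms("a: a_coef b: b_coef ps: ps p0: p0"))
# {'a': 'a_coef', 'b': 'b_coef', 'ps': 'ps', 'p0': 'p0'}
```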
-
-
class
src.xr_parser.DefaultDatasetParser(data_mgr, pod)
Bases: object
Class which acts as a container for MDTF-specific dataset parsing logic.
-
setup(data_mgr, pod)
Method to do additional configuration immediately before parse() is called on each variable for pod.
-
guess_attr(attr_desc, attr_name, options, default=None, comparison_func=None)
Select and return the element of options equal to attr_name. If none are equal, try a case-insensitive string match.
- Parameters
attr_desc (str) – Description of the attribute (only used for log messages.)
attr_name (str) – Expected name of the attribute.
options (iterable of str) – Attribute names that are present in the data.
default (str, default None) – If supplied, default value to return if no match.
comparison_func (optional, default None) – String comparison function to use.
- Raises
KeyError – if no element of options can be coerced to match attr_name.
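The matching order described above (exact match first, then case-insensitive) can be sketched as a free function. `guess_attr_sketch` is hypothetical. It omits the method's log messages and assumes a case-insensitive match is accepted only when it is unambiguous.

```python
def guess_attr_sketch(attr_desc, attr_name, options, default=None, comparison_func=None):
    """Hypothetical re-implementation of the matching rules of guess_attr()."""
    if comparison_func is None:
        comparison_func = lambda theirs, ours: theirs == ours
    options = list(options)
    # 1. Exact match (or whatever comparison_func considers equal)
    for opt in options:
        if comparison_func(opt, attr_name):
            return opt
    # 2. Case-insensitive match; accepted only if unambiguous (assumption)
    folded = [opt for opt in options if opt.casefold() == attr_name.casefold()]
    if len(folded) == 1:
        return folded[0]
    # 3. Fall back to the default, if one was supplied
    if default is not None:
        return default
    raise KeyError(f"No {attr_desc} matching {attr_name!r} among {options}")


print(guess_attr_sketch("units", "units", ["Units", "long_name"]))  # Units
```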
-
normalize_attr(new_attr_d, d, key_name, key_startswith=None)
Sets the value in dict d corresponding to the key key_name.
If key_name is in d, no changes are made. If key_name is not in d, we check possible nonstandard representations of the key (a case-insensitive match via guess_attr(), and whether the key starts with the string key_startswith). If no match is found for key_name, its value is set to the sentinel value ATTR_NOT_FOUND.
- Parameters
new_attr_d (dict) – dict to store all found attributes. We don’t change attributes on d here, since that can interfere with xarray.decode_cf(); instead we pass this dict to restore_attrs() so the attributes can be set once that’s done.
d (dict) – dict of Dataset attributes, whose keys are to be searched for key_name.
key_name (str) – Expected name of the key.
key_startswith (optional, str) – If provided and if key_name isn’t found in d, a key starting with this string will be accepted instead.
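Here is a minimal sketch of the lookup order described above. `normalize_attr_sketch` and `ATTR_NOT_FOUND` are hypothetical stand-ins. The case-insensitive step is inlined rather than calling guess_attr(), and the sketch assumes an exactly matching key is also recorded in new_attr_d.

```python
ATTR_NOT_FOUND = object()  # hypothetical stand-in for the framework's sentinel


def normalize_attr_sketch(new_attr_d, d, key_name, key_startswith=None):
    """Record d's value for key_name (or a near-miss spelling) in new_attr_d.

    d itself is never modified, mirroring the note about xarray.decode_cf().
    """
    if key_name in d:
        new_attr_d[key_name] = d[key_name]
        return
    for key, value in d.items():
        case_match = key.casefold() == key_name.casefold()
        prefix_match = key_startswith is not None and key.startswith(key_startswith)
        if case_match or prefix_match:
            new_attr_d[key_name] = value
            return
    new_attr_d[key_name] = ATTR_NOT_FOUND  # no acceptable spelling found


attrs = {"Units": "K", "standard_name_cmip": "air_temperature"}
new = {}
normalize_attr_sketch(new, attrs, "units")
normalize_attr_sketch(new, attrs, "standard_name", key_startswith="standard_name")
normalize_attr_sketch(new, attrs, "calendar")
```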
-
normalize_calendar(attr_d)
Finds the calendar attribute, if present, and normalizes it to one of the values in the CF standard before xarray.decode_cf() decodes the time axis.
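The sketch below normalizes a calendar attribute against CF calendar names before decoding. `normalize_calendar_sketch` and the alias table are hypothetical and are not the framework's own list. Raising ValueError for an unrecognized value is also an assumption.

```python
# Calendar names from the CF conventions ("gregorian" is an older name for "standard").
CF_CALENDARS = {
    "standard", "gregorian", "proleptic_gregorian", "noleap", "365_day",
    "all_leap", "366_day", "360_day", "julian",
}
# Hypothetical table of nonstandard spellings
_CALENDAR_ALIASES = {"no_leap": "noleap", "gregorian_proleptic": "proleptic_gregorian"}


def normalize_calendar_sketch(attr_d):
    """Rename a 'calendar'-like key to 'calendar' and normalize its value in place."""
    key = next((k for k in attr_d if k.strip().casefold() == "calendar"), None)
    if key is None:
        return attr_d  # no calendar attribute: nothing to do
    value = str(attr_d[key]).strip().lower().replace("-", "_").replace(" ", "_")
    value = _CALENDAR_ALIASES.get(value, value)
    if value not in CF_CALENDARS:
        raise ValueError(f"Unrecognized calendar {attr_d[key]!r}")
    del attr_d[key]
    attr_d["calendar"] = value
    return attr_d


print(normalize_calendar_sketch({"Calendar": " NoLeap "}))  # {'calendar': 'noleap'}
```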
-
normalize_pre_decode(ds)
Initial munging of xarray Dataset attribute dicts, before any parsing by xarray.decode_cf() or the cf_xarray accessor.
-
restore_attrs_backup(ds)
xarray.decode_cf() and other functions appear to unset some of the attributes defined in the netCDF file. Restore them from the backups made in munge_ds_attrs(), but only if the attribute was deleted.
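The "only if deleted" rule amounts to filling in missing keys without overwriting existing ones. Here is a minimal sketch on plain dicts. `restore_attrs_sketch` is a hypothetical helper.

```python
def restore_attrs_sketch(attrs, backup):
    """Copy attributes from backup into attrs, but only those attrs no longer has."""
    for key, value in backup.items():
        attrs.setdefault(key, value)  # never overwrites a value that is still present
    return attrs


after_decode = {"standard_name": "time", "calendar": "noleap"}
backup = {"units": "days since 1970-01-01", "calendar": "NOLEAP"}
print(restore_attrs_sketch(after_decode, backup))
# {'standard_name': 'time', 'calendar': 'noleap', 'units': 'days since 1970-01-01'}
```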
-
normalize_standard_name(new_attr_d, attr_d)
Method for munging the standard_name attribute prior to parsing.
-
normalize_unit(new_attr_d, attr_d)
HACK to convert unit strings to values that are correctly parsed by cfunits/UDUnits2. Currently we handle the case where “mb” is interpreted as “millibarn”, a unit of area (see the UDUnits mailing list).
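One way to avoid the "millibarn" reading is to rewrite the standalone token "mb" before the string reaches UDUnits2. Here is a minimal sketch. `normalize_unit_sketch` is hypothetical, and the choice of "hPa" as the replacement (1 mb = 1 hPa) is an assumption, since the text does not say what the framework substitutes.

```python
import re

# Match "mb" only as a whole token, so "mbar" or "lumber" are left alone
_MB_RE = re.compile(r"\bmb\b")


def normalize_unit_sketch(units):
    """Replace the pressure unit 'mb' with the equivalent 'hPa' (assumed spelling)."""
    return _MB_RE.sub("hPa", units)


print(normalize_unit_sketch("mb s-1"))  # hPa s-1
```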
-
normalize_dependent_var(var, ds)
Use heuristics to determine the name of the dependent variable from among all the variables in the Dataset ds, if the name doesn’t match the value we expect in our_var.
-
normalize_metadata(var, ds)
Normalize the name, standard_name and units attributes after the decode_cf and cf_xarray setup steps, and after the metadata dict has been restored, since those methods don’t touch these metadata attributes.
-
compare_attr(our_attr_tuple, ds_attr_tuple, comparison_func=None, fill_ours=True, fill_ds=False, overwrite_ours=None)
Worker function to compare two attributes (on our_var, the framework’s record, and on ds, the “ground truth” of the dataset) and update one of them in the event of disagreement.
This handles the special cases where the attribute isn’t defined on our_var or ds.
- Parameters
our_attr_tuple – tuple specifying the attribute on our_var
ds_attr_tuple – tuple specifying the same attribute on ds
comparison_func – function of two arguments to use to compare the attributes; defaults to __eq__.
fill_ours (bool) – If the attr on our_var is missing, fill it in with the value from ds.
fill_ds (bool) – If the attr on ds is missing, fill it in with the value from our_var.
overwrite_ours (bool) – Action to take if both attrs are defined but have different values:
- None (default): Update our_var if fill_ours is True, but in any case raise a MetadataEvent.
- True: Change our_var to match ds.
- False: Change ds to match our_var.
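The decision table above can be sketched on plain dicts. `compare_attr_sketch` is hypothetical: `ours` and `ds` stand in for our_var and the Dataset, a missing key means "attribute not defined", and appending to `events` stands in for raising a MetadataEvent.

```python
import operator

MISSING = object()  # hypothetical marker for "attribute not defined"


def compare_attr_sketch(ours, ds, key, comparison_func=operator.eq,
                        fill_ours=True, fill_ds=False, overwrite_ours=None,
                        events=None):
    """Reconcile ours[key] with ds[key]; returns the list of recorded events."""
    if events is None:
        events = []
    our_val = ours.get(key, MISSING)
    ds_val = ds.get(key, MISSING)
    if our_val is MISSING and ds_val is MISSING:
        return events
    if our_val is MISSING:
        if fill_ours:
            ours[key] = ds_val
        return events
    if ds_val is MISSING:
        if fill_ds:
            ds[key] = our_val
        return events
    if comparison_func(our_val, ds_val):
        return events  # attributes agree: nothing to do
    if overwrite_ours is None:
        if fill_ours:
            ours[key] = ds_val
        events.append(f"{key}: expected {our_val!r}, found {ds_val!r}")
    elif overwrite_ours:
        ours[key] = ds_val
    else:
        ds[key] = our_val
    return events


ours, ds = {"units": "K"}, {"units": "degC"}
print(compare_attr_sketch(ours, ds, "units"), ours)
# ["units: expected 'K', found 'degC'"] {'units': 'degC'}
```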
-
reconcile_name(our_var, ds_var_name, overwrite_ours=None)
Reconcile the name of the variable between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var).
-
reconcile_attr(our_var, ds_var, our_attr_name, ds_attr_name=None, **kwargs)
Compare an attribute of a DMVariable (our_var) with what’s set in the xarray.Dataset (ds_var).
-
reconcile_names(our_var, ds, ds_var_name, overwrite_ours=None)
Reconcile the name and standard_name attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var).
- Parameters
our_var (TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds – xarray Dataset.
ds_var_name (str) – Name of the variable in ds we expect to correspond to our_var.
overwrite_ours (bool, default False) – If True, always update the name of our_var to what’s found in ds.
-
reconcile_units(our_var, ds_var)
Reconcile the units attribute between the ‘ground truth’ of the dataset we downloaded (ds_var) and our expectations based on the model’s convention (our_var).
- Parameters
our_var (TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds_var – xarray DataArray.
-
reconcile_time_units(our_var, ds_var)
Special case of reconcile_units() for the time variable. In normal operation we don’t know (or need to know) the calendar or reference date (for time units of the form ‘days since 1970-01-01’), so it’s OK to set these from the dataset.
- Parameters
our_var (TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds_var – xarray DataArray.
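Time units of the form ‘days since 1970-01-01’ can be split into a base unit and a reference date. The sketch below then compares only the base units and takes the full string from the dataset. Both helpers are hypothetical, and raising ValueError when base units disagree is an assumption.

```python
import re

# "<unit> since <reference date>", case-insensitive
_TIME_UNITS_RE = re.compile(r"^\s*(\w+)\s+since\s+(.+?)\s*$", re.IGNORECASE)


def split_time_units(units):
    """Return (base unit, reference date or None)."""
    m = _TIME_UNITS_RE.match(units)
    if m is None:
        return units.strip().lower(), None
    return m.group(1).lower(), m.group(2)


def reconcile_time_units_sketch(our_units, ds_units):
    """Check base units agree, then adopt the dataset's reference date."""
    our_base, _ = split_time_units(our_units)
    ds_base, _ = split_time_units(ds_units)
    if our_base != ds_base:
        raise ValueError(f"Time units disagree: {our_units!r} vs {ds_units!r}")
    return ds_units  # reference date comes from the dataset


print(reconcile_time_units_sketch("days", "days since 1970-01-01"))  # days since 1970-01-01
```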
-
reconcile_scalar_value_and_units(our_var, ds_var)
Compare the scalar coordinate value of a DMVariable (our_var) with what’s set in the xarray.Dataset (ds_var). If there’s a discrepancy, log an error but change the entry in our_var.
-
reconcile_coord_bounds(our_coord, ds, ds_coord_name)
Reconcile standard_name and units attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var), for the bounds on the dimension coordinate our_coord.
-
reconcile_dimension_coords(our_var, ds)
Reconcile name, standard_name and units attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var), for all dimension coordinates used by our_var.
- Parameters
our_var (TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds – xarray Dataset.
-
reconcile_scalar_coords(our_var, ds)
Reconcile name, standard_name and units attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var), for all scalar coordinates used by our_var.
- Parameters
our_var (TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds – xarray Dataset.
-
reconcile_variable(var, ds)
Top-level method for the MDTF-specific dataset validation: attempts to reconcile name, standard_name and units attributes for the variable and coordinates in translated_var (our expectation, based on the DataSource’s naming convention) with attributes actually present in the Dataset ds.
-
check_calendar(ds)
Checks that the ‘calendar’ attribute has been set correctly for time-dependent data (assumes CF conventions).
Sets the “calendar” attr on the time coordinate, if it exists, in order to be read by the calendar property defined in the cf_xarray accessor.
-
check_metadata(ds_var, *attr_names)
Wrapper for normalize_attr(), specialized to the case of getting a variable’s standard_name.
-
check_ds_attrs(var, ds)
Final checking of xarray Dataset attribute dicts before starting functions in src.preprocessor.
Only checks attributes on the dependent variable var_name and its coordinates: any other netCDF variables in the file are ignored.
-
parse(var, ds)
Calls the above metadata parsing functions in the intended order; intended to be called immediately after the Dataset is opened.
Note
decode_cf=False should be passed to the xarray open_dataset method, since that parsing is done here instead.
- Strip whitespace from attributes as a precaution to avoid malformed metadata.
- Call xarray’s decode_cf, using cftime to decode CF-compliant date/time axes.
- Assign axis labels to dimension coordinates using cf_xarray.
- Verify that the calendar is set correctly.
- Verify that the name, standard_name and units for the variable and its coordinates are set correctly.
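The first step above, stripping whitespace from attributes, can be sketched on a plain attribute dict. `strip_attrs` is a hypothetical helper. The comment about where it would run is an assumption based on the ordering listed above.

```python
def strip_attrs(attrs):
    """Return a copy of an attribute dict with whitespace stripped from keys and
    from string values; non-string values (numbers, arrays) are kept as-is.

    Assumed placement: applied to each variable's .attrs after
    xr.open_dataset(path, decode_cf=False) and before xr.decode_cf(ds, use_cftime=True).
    """
    return {
        key.strip(): value.strip() if isinstance(value, str) else value
        for key, value in attrs.items()
    }


print(strip_attrs({" units ": " K ", "valid_max": 350.0}))
# {'units': 'K', 'valid_max': 350.0}
```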
-