src.xr_parser module¶
Code for normalizing metadata in xarray Datasets; see Data layer: Preprocessing.
Familiarity with the cf_xarray package, used as a third-party dependency, as well as the src.data_model module is recommended.
-
src.xr_parser.ATTR_NOT_FOUND = sentinel.AttrNotFound¶ Sentinel object serving as a placeholder for netCDF metadata attributes that are expected, but not present in the data.
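A minimal sketch of how a sentinel of this kind is typically used. It assumes the object is created with the standard library's unittest.mock.sentinel (the repr sentinel.AttrNotFound suggests this, but this page does not confirm it). The get_attr helper is hypothetical. An identity check distinguishes a missing attribute from one whose value is legitimately None or an empty string.

```python
from unittest.mock import sentinel

# Assumption: the module-level sentinel is created along these lines.
ATTR_NOT_FOUND = sentinel.AttrNotFound


def get_attr(attrs, name):
    """Hypothetical helper: return attrs[name], or the sentinel if absent."""
    return attrs.get(name, ATTR_NOT_FOUND)


attrs = {"units": "", "long_name": None}
missing = get_attr(attrs, "standard_name")   # attribute not present at all
present_but_empty = get_attr(attrs, "units")  # present, with an empty value

# Compare by identity, not truthiness: "" and None are valid attribute values.
is_missing = missing is ATTR_NOT_FOUND
```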
-
class
src.xr_parser.PlaceholderScalarCoordinate(name: str, axis: str, standard_name: str = sentinel.AttrNotFound, units: str = sentinel.AttrNotFound)[source]¶ Bases:
object
Dummy object used to describe scalar coordinates referred to by name only in the ‘coordinates’ attribute of a variable or dataset. We do this so that the attributes match those of coordinates represented by real netCDF Variables.
-
__init__(name: str, axis: str, standard_name: str = sentinel.AttrNotFound, units: str = sentinel.AttrNotFound) → None¶ Initialize self. See help(type(self)) for accurate signature.
-
__post_init__(*args, **kwargs)¶
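The signature above, together with the presence of __post_init__, suggests a dataclass-style container. The following is an illustrative stand-in under that assumption, not the framework's class. The name PlaceholderScalarCoordinateSketch is hypothetical.

```python
import dataclasses
from unittest.mock import sentinel

ATTR_NOT_FOUND = sentinel.AttrNotFound  # assumption: mirrors the module sentinel


@dataclasses.dataclass
class PlaceholderScalarCoordinateSketch:
    """Illustrative stand-in for a scalar coordinate known only by name."""
    name: str
    axis: str
    standard_name: str = ATTR_NOT_FOUND
    units: str = ATTR_NOT_FOUND


# A coordinate named in a 'coordinates' attribute but absent as a variable:
coord = PlaceholderScalarCoordinateSketch(name="plev", axis="Z")
```

Because the placeholder exposes the same attribute names as a real coordinate variable, downstream code can read coord.units or coord.standard_name without checking which kind of object it has.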
-
-
src.xr_parser.patch_cf_xarray_accessor(mod)[source]¶ Monkey-patches
_get_axis_coord, a module-level function in cf_xarray, to obtain desired axis-to-coordinate lookup behavior. Specifically, if a variable has been recognized as one of the coordinates in the dict above and no variable has been set as the corresponding axis, recognize the variable as that axis as well. See discussion at https://github.com/xarray-contrib/cf-xarray/issues/23.
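The general monkey-patching technique can be sketched as follows. Everything here is a stand-in: fake_mod is not cf_xarray, the function signature and the COORD_TO_AXIS table are hypothetical, and the real patch may differ in detail. The sketch only shows the described fallback: when no variable is set as an axis, a variable recognized as the corresponding coordinate is accepted instead.

```python
import types

# Stand-in for a third-party module; NOT the real cf_xarray internals.
fake_mod = types.SimpleNamespace()


def _original_get_axis_coord(coords, axis):
    """Original lookup: only variables explicitly labelled with the axis."""
    return [name for name, attrs in coords.items() if attrs.get("axis") == axis]


fake_mod._get_axis_coord = _original_get_axis_coord

# Hypothetical mapping from a recognized coordinate type to the axis it implies.
COORD_TO_AXIS = {"longitude": "X", "latitude": "Y", "vertical": "Z", "time": "T"}


def patch_get_axis_coord(mod):
    """Replace mod._get_axis_coord with a wrapper that adds a fallback."""
    original = mod._get_axis_coord

    def patched(coords, axis):
        found = original(coords, axis)
        if found:
            return found
        # No variable is set as this axis: accept variables recognized as
        # the corresponding coordinate.
        return [
            name for name, attrs in coords.items()
            if COORD_TO_AXIS.get(attrs.get("coordinate")) == axis
        ]

    mod._get_axis_coord = patched


patch_get_axis_coord(fake_mod)
coords = {"lat": {"coordinate": "latitude"}, "lon": {"axis": "X"}}
y_names = fake_mod._get_axis_coord(coords, "Y")  # found via the fallback
x_names = fake_mod._get_axis_coord(coords, "X")  # found by the original lookup
```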
-
class
src.xr_parser.MDTFCFAccessorMixin[source]¶ Bases:
object
Properties we add to both xarray Dataset and DataArray objects via the accessor extension mechanism.
-
property
is_static¶ Returns bool according to whether the Dataset/DataArray has/is a time coordinate.
-
property
calendar¶ Reads ‘calendar’ attribute on time axis (intended to have been set by
DefaultDatasetParser.normalize_calendar()). Returns None if no time axis.
-
property
dim_axes_set¶ Returns a frozenset of names of axes which are dimension coordinates.
-
property
axes_set¶ Returns a frozenset of all axes names.
-
__init__()¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
src.xr_parser.MDTFCFDatasetAccessorMixin[source]¶ Bases:
src.xr_parser.MDTFCFAccessorMixin
Methods we add for xarray Dataset objects via the accessor extension mechanism.
-
scalar_coords(var_name=None)[source]¶ Return a list of the Dataset variable objects corresponding to scalar coordinates on the entire Dataset, or on var_name if given. If a coordinate was defined as an attribute only, return its name in a
PlaceholderScalarCoordinate object instead.
-
get_scalar(ax_name, var_name=None)[source]¶ If the axis label ax_name is a scalar coordinate, return the corresponding xarray DataArray (or
PlaceholderScalarCoordinate), otherwise return None. Applies to the entire Dataset, or to var_name if given.
-
axes(var_name=None, filter_set=None)[source]¶ Override cf_xarray accessor behavior (from
_old_axes_dict()).
- Parameters
var_name (optional) – If supplied, return a dict containing the subset of coordinates used by the dependent variable var_name, instead of all coordinates in the dataset.
filter_set (optional) – Optional iterable of coordinate names. If supplied, restrict the returned dict to coordinates in filter_set.
- Returns
Dict mapping axis labels to lists of the Dataset variables themselves, instead of their names.
-
dim_axes(var_name=None)[source]¶ Override cf_xarray accessor behavior by having values of the ‘axes’ dict be the Dataset variables themselves, instead of their names.
-
__init__()¶ Initialize self. See help(type(self)) for accurate signature.
-
property
axes_set¶ Returns a frozenset of all axes names.
-
property
calendar¶ Reads ‘calendar’ attribute on time axis (intended to have been set by
DefaultDatasetParser.normalize_calendar()). Returns None if no time axis.
-
property
dim_axes_set¶ Returns a frozenset of names of axes which are dimension coordinates.
-
property
is_static¶ Returns bool according to whether the Dataset/DataArray has/is a time coordinate.
-
-
class
src.xr_parser.MDTFDataArrayAccessorMixin[source]¶ Bases:
src.xr_parser.MDTFCFAccessorMixin
Methods we add for xarray DataArray objects via the accessor extension mechanism.
-
dim_axes()[source]¶ Map axes labels to the (unique) coordinate variable name, instead of a list of names as in cf_xarray. Filter on dimension coordinates only (eliminating any scalar coordinates.)
-
axes()[source]¶ Map axes labels to the (unique) coordinate variable name, instead of a list of names as in cf_xarray.
-
property
formula_terms¶ Returns dict of (name in formula: name in dataset) pairs parsed from the formula_terms attribute. If the attribute is not present, returns an empty dict.
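In the CF conventions, a formula_terms attribute is a whitespace-separated string that alternates "term:" tokens with variable names. A minimal sketch of parsing such a string into the dict described above follows. The function name parse_formula_terms and the example variable names are hypothetical, and the framework's own parsing may differ.

```python
def parse_formula_terms(attr_value):
    """Parse a CF formula_terms string into {name in formula: name in dataset}."""
    if not attr_value:
        # Attribute missing or empty: return an empty dict.
        return {}
    tokens = attr_value.split()
    terms = {}
    # Tokens alternate between "term:" and the dataset variable name.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        terms[key.rstrip(":")] = value
    return terms


example = parse_formula_terms("a: hyam b: hybm ps: PS p0: P0")
```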
-
__init__()¶ Initialize self. See help(type(self)) for accurate signature.
-
property
axes_set¶ Returns a frozenset of all axes names.
-
property
calendar¶ Reads ‘calendar’ attribute on time axis (intended to have been set by
DefaultDatasetParser.normalize_calendar()). Returns None if no time axis.
-
property
dim_axes_set¶ Returns a frozenset of names of axes which are dimension coordinates.
-
property
is_static¶ Returns bool according to whether the Dataset/DataArray has/is a time coordinate.
-
-
class
src.xr_parser.MDTFCFDatasetAccessor[source]¶ Bases:
src.xr_parser.MDTFCFDatasetAccessorMixin, object
Accessor that’s registered (under the attribute
cf) for xarray Datasets. Combines methods in MDTFCFDatasetAccessorMixin and the cf_xarray Dataset accessor.
-
__init__()¶ Initialize self. See help(type(self)) for accurate signature.
-
axes(var_name=None, filter_set=None)¶ Override cf_xarray accessor behavior (from
_old_axes_dict()).
- Parameters
var_name (optional) – If supplied, return a dict containing the subset of coordinates used by the dependent variable var_name, instead of all coordinates in the dataset.
filter_set (optional) – Optional iterable of coordinate names. If supplied, restrict the returned dict to coordinates in filter_set.
- Returns
Dict mapping axis labels to lists of the Dataset variables themselves, instead of their names.
-
property
axes_set¶ Returns a frozenset of all axes names.
-
property
calendar¶ Reads ‘calendar’ attribute on time axis (intended to have been set by
DefaultDatasetParser.normalize_calendar()). Returns None if no time axis.
-
dim_axes(var_name=None)¶ Override cf_xarray accessor behavior by having values of the ‘axes’ dict be the Dataset variables themselves, instead of their names.
-
property
dim_axes_set¶ Returns a frozenset of names of axes which are dimension coordinates.
-
get_scalar(ax_name, var_name=None)¶ If the axis label ax_name is a scalar coordinate, return the corresponding xarray DataArray (or
PlaceholderScalarCoordinate), otherwise return None. Applies to the entire Dataset, or to var_name if given.
-
property
is_static¶ Returns bool according to whether the Dataset/DataArray has/is a time coordinate.
-
scalar_coords(var_name=None)¶ Return a list of the Dataset variable objects corresponding to scalar coordinates on the entire Dataset, or on var_name if given. If a coordinate was defined as an attribute only, return its name in a
PlaceholderScalarCoordinate object instead.
-
-
class
src.xr_parser.MDTFCFDataArrayAccessor[source]¶ Bases:
src.xr_parser.MDTFDataArrayAccessorMixin, object
Accessor that’s registered (under the attribute
cf) for xarray DataArrays. Combines methods in MDTFDataArrayAccessorMixin and the cf_xarray DataArray accessor.
-
__init__()¶ Initialize self. See help(type(self)) for accurate signature.
-
axes()¶ Map axes labels to the (unique) coordinate variable name, instead of a list of names as in cf_xarray.
-
property
axes_set¶ Returns a frozenset of all axes names.
-
property
calendar¶ Reads ‘calendar’ attribute on time axis (intended to have been set by
DefaultDatasetParser.normalize_calendar()). Returns None if no time axis.
-
dim_axes()¶ Map axes labels to the (unique) coordinate variable name, instead of a list of names as in cf_xarray. Filter on dimension coordinates only (eliminating any scalar coordinates.)
-
property
dim_axes_set¶ Returns a frozenset of names of axes which are dimension coordinates.
-
property
formula_terms¶ Returns dict of (name in formula: name in dataset) pairs parsed from the formula_terms attribute. If the attribute is not present, returns an empty dict.
-
property
is_static¶ Returns bool according to whether the Dataset/DataArray has/is a time coordinate.
-
-
class
src.xr_parser.DefaultDatasetParser(data_mgr, pod)[source]¶ Bases:
object
Class containing MDTF-specific methods for cleaning and normalizing xarray metadata.
Top-level methods are
parse() and get_unmapped_names().
-
__init__(data_mgr, pod)[source]¶ Constructor.
- Parameters
data_mgr – DataSource instance calling the preprocessor.
pod (
Diagnostic) – POD whose variables are being preprocessed.
-
setup(data_mgr, pod)[source]¶ Hook for use by child classes (currently unused) to do additional configuration immediately before
parse() is called on each variable for pod.
- Parameters
data_mgr – DataSource instance calling the preprocessor.
pod (
Diagnostic) – POD whose variables are being preprocessed.
-
guess_attr(attr_desc, attr_name, options, default=None, comparison_func=None)[source]¶ Select and return the element of options equal to attr_name. If none are equal, try a case-insensitive string match.
- Parameters
attr_desc (str) – Description of the attribute (only used for log messages.)
attr_name (str) – Expected name of the attribute.
options (iterable of str) – Attribute names that are present in the data.
default (str, default None) – If supplied, default value to return if no match.
comparison_func (optional, default None) – String comparison function to use.
- Raises
KeyError – if no element of options can be coerced to match attr_name.
- Returns
Element of options matching attr_name.
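The lookup order described above (exact match, then case-insensitive match, then the default, then KeyError) can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the framework's code: it omits attr_desc and logging, and the name guess_attr_sketch is hypothetical.

```python
import operator


def guess_attr_sketch(attr_name, options, default=None, comparison_func=None):
    """Return the element of options that matches attr_name."""
    if comparison_func is None:
        comparison_func = operator.eq
    options = list(options)
    # First pass: match under the supplied comparison function.
    for opt in options:
        if comparison_func(opt, attr_name):
            return opt
    # Second pass: case-insensitive string match.
    for opt in options:
        if str(opt).lower() == str(attr_name).lower():
            return opt
    # No match: fall back to the default if one was supplied.
    if default is not None:
        return default
    raise KeyError(attr_name)


match = guess_attr_sketch("standard_name", ["units", "Standard_Name"])
```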
-
normalize_attr(new_attr_d, d, key_name, key_startswith=None)[source]¶ Sets the value in dict d corresponding to the key key_name.
If key_name is in d, no changes are made. If key_name is not in d, we check possible nonstandard representations of the key (case-insensitive match via
guess_attr() and whether the key starts with the string key_startswith.) If no match is found for key_name, its value is set to the sentinel value ATTR_NOT_FOUND.
- Parameters
new_attr_d (dict) – dict to store all found attributes. We don’t change attributes on d here, since that can interfere with xarray.decode_cf(), but instead modify this dict in place and pass it to
restore_attrs() so they can be set once that’s done.
d (dict) – dict of Dataset attributes, whose keys are to be searched for key_name.
key_name (str) – Expected name of the key.
key_startswith (optional, str) – If provided and if key_name isn’t found in d, a key starting with this string will be accepted instead.
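A minimal sketch of the behavior described above, using plain dicts. It is an illustration rather than the framework's implementation: the name normalize_attr_sketch is hypothetical, the sentinel is assumed to come from unittest.mock, and the prefix match is assumed to be case-sensitive.

```python
from unittest.mock import sentinel

ATTR_NOT_FOUND = sentinel.AttrNotFound  # assumption: mirrors the module sentinel


def normalize_attr_sketch(new_attr_d, d, key_name, key_startswith=None):
    """Record the value for key_name in new_attr_d; d itself is never modified."""
    if key_name in d:
        return  # already under the standard name: no changes are made
    # Nonstandard representation 1: case-insensitive match on the full key.
    for key in d:
        if key.lower() == key_name.lower():
            new_attr_d[key_name] = d[key]
            return
    # Nonstandard representation 2: a key starting with key_startswith.
    if key_startswith is not None:
        for key in d:
            if key.startswith(key_startswith):
                new_attr_d[key_name] = d[key]
                return
    new_attr_d[key_name] = ATTR_NOT_FOUND


d = {"Units": "K", "calendar_type": "noleap"}
new_attr_d = {}
normalize_attr_sketch(new_attr_d, d, "units")
normalize_attr_sketch(new_attr_d, d, "calendar", key_startswith="cal")
normalize_attr_sketch(new_attr_d, d, "standard_name")
```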
-
normalize_calendar(attr_d)[source]¶ Finds the calendar attribute, if present, and normalizes it to one of the values in the CF standard before xarray.decode_cf() decodes the time axis.
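As a rough illustration of normalizing a calendar attribute, the sketch below strips whitespace and case, then checks the result against the calendar names defined by the CF conventions. The function name and the choice to return None for unrecognized values are assumptions. The framework's method may match values differently.

```python
# Calendar names defined in the CF conventions.
CF_CALENDARS = frozenset({
    "standard", "gregorian", "proleptic_gregorian", "noleap", "365_day",
    "all_leap", "366_day", "360_day", "julian", "none",
})


def normalize_calendar_sketch(value):
    """Return the CF spelling of a calendar name, or None if unrecognized."""
    candidate = str(value).strip().lower()
    return candidate if candidate in CF_CALENDARS else None


fixed = normalize_calendar_sketch(" NoLeap ")
unknown = normalize_calendar_sketch("lunar")
```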
-
normalize_pre_decode(ds)[source]¶ Initial munging of xarray Dataset attribute dicts, before any parsing by xarray.decode_cf() or the cf_xarray accessor.
-
restore_attrs_backup(ds)[source]¶ xarray.decode_cf() and other functions appear to un-set some of the attributes defined in the netCDF file. Restore them from the backups made in
munge_ds_attrs(), but only if the attribute was deleted.
-
normalize_standard_name(new_attr_d, attr_d)[source]¶ Method for munging standard_name attribute prior to parsing.
-
normalize_unit(new_attr_d, attr_d)[source]¶ Hook to convert unit strings to values that are correctly parsed by cfunits/UDUnits2. Currently we handle the case where “mb” is interpreted as “millibarn”, a unit of area (see UDUnits mailing list.) New cases of incorrectly parsed unit strings can be added here as they are discovered.
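One way such a hook could be written is as a lookup table applied to whole alphabetic tokens, so that "mb" is replaced but "mbar" is left alone. This is a sketch under assumptions: only the "mb" case comes from the text above, while the replacement spelling "millibar", the table name, and the function name are hypothetical.

```python
import re

# Assumed replacement table; only the "mb" case is taken from the text above.
UNIT_FIXES = {"mb": "millibar"}


def normalize_unit_sketch(unit_str):
    """Replace whole alphabetic tokens that are known to be mis-parsed."""
    def fix(match):
        token = match.group(0)
        return UNIT_FIXES.get(token, token)

    # Each run of letters is one token, so "mbar" is never split into "mb"+"ar".
    return re.sub(r"[A-Za-z]+", fix, unit_str)


fixed_unit = normalize_unit_sketch("mb")
```

New problem cases would then be handled by adding entries to the table rather than new branches.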
-
normalize_dependent_var(var, ds)[source]¶ Use heuristics to determine the name of the dependent variable from among all the variables in the Dataset ds, if the name doesn’t match the value we expect in our_var.
-
normalize_metadata(var, ds)[source]¶ Normalize name, standard_name and units attributes after decode_cf and cf_xarray setup steps and metadata dict has been restored, since those methods don’t touch these metadata attributes.
-
compare_attr(our_attr_tuple, ds_attr_tuple, comparison_func=None, fill_ours=True, fill_ds=False, overwrite_ours=None)[source]¶ Worker function to compare two attributes (on our_var, the framework’s record, and on ds, the “ground truth” of the dataset) and update one in the event of disagreement.
This handles the special cases where the attribute isn’t defined on our_var or ds.
- Parameters
our_attr_tuple – tuple specifying the attribute on our_var
ds_attr_tuple – tuple specifying the same attribute on ds
comparison_func – function of two arguments to use to compare the attributes; defaults to
__eq__.
fill_ours (bool) – If the attr on our_var is missing, fill it in with the value from ds.
fill_ds (bool) – If the attr on ds is missing, fill it in with the value from our_var.
overwrite_ours (bool) –
Action to take if both attrs are defined but have different values:
- None (default): Update our_var if fill_ours is True, but in any case raise a MetadataEvent.
- True: Change our_var to match ds.
- False: Change ds to match our_var.
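The decision logic for fill_ours, fill_ds and overwrite_ours can be sketched with plain dicts standing in for our_var and ds. This is an illustration only: the real method takes attribute tuples, MetadataMismatch is a hypothetical stand-in for the framework's MetadataEvent, and doing nothing when the attribute is missing on both sides is an assumption.

```python
import operator
from unittest.mock import sentinel

ATTR_NOT_FOUND = sentinel.AttrNotFound  # assumption: mirrors the module sentinel


class MetadataMismatch(Exception):
    """Hypothetical stand-in for the framework's MetadataEvent."""


def compare_attr_sketch(ours, ds, key, comparison_func=operator.eq,
                        fill_ours=True, fill_ds=False, overwrite_ours=None):
    """Compare ours[key] with ds[key], updating one dict on disagreement."""
    our_val = ours.get(key, ATTR_NOT_FOUND)
    ds_val = ds.get(key, ATTR_NOT_FOUND)
    if ds_val is ATTR_NOT_FOUND:
        # Missing on ds: optionally fill it from our record.
        if fill_ds and our_val is not ATTR_NOT_FOUND:
            ds[key] = our_val
        return
    if our_val is ATTR_NOT_FOUND:
        # Missing on our record: optionally fill it from ds.
        if fill_ours:
            ours[key] = ds_val
        return
    if comparison_func(our_val, ds_val):
        return
    # Both defined, but they disagree.
    if overwrite_ours is None:
        if fill_ours:
            ours[key] = ds_val
        raise MetadataMismatch(f"{key}: expected {our_val!r}, found {ds_val!r}")
    if overwrite_ours:
        ours[key] = ds_val
    else:
        ds[key] = our_val


ours = {"standard_name": "air_temperature"}
ds = {"units": "K"}
compare_attr_sketch(ours, ds, "units")                        # fills ours
compare_attr_sketch(ours, ds, "standard_name", fill_ds=True)  # fills ds
```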
-
reconcile_name(our_var, ds_var_name, overwrite_ours=None)[source]¶ Reconcile the name of the variable between the ‘ground truth’ of the dataset we downloaded (ds_var) and our expectations based on the model’s convention (our_var).
-
reconcile_attr(our_var, ds_var, our_attr_name, ds_attr_name=None, **kwargs)[source]¶ Compare attribute of a
DMVariable (our_var) with what’s set in the xarray.Dataset (ds_var).
-
reconcile_names(our_var, ds, ds_var_name, overwrite_ours=None)[source]¶ Reconcile the name and standard_name attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var).
- Parameters
our_var (
TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds – xarray Dataset.
ds_var_name (str) – Name of the variable in ds we expect to correspond to our_var.
overwrite_ours (bool, default False) – If True, always update the name of our_var to what’s found in ds.
-
reconcile_units(our_var, ds_var)[source]¶ Reconcile the units attribute between the ‘ground truth’ of the dataset we downloaded (ds_var) and our expectations based on the model’s convention (our_var).
- Parameters
our_var (
TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds_var – xarray DataArray.
-
reconcile_time_units(our_var, ds_var)[source]¶ Special case of
reconcile_units() for the time variable. In normal operation we don’t know (or need to know) the calendar or reference date (for time units of the form ‘days since 1970-01-01’), so it’s OK to set these from the dataset.
- Parameters
our_var (
TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds_var – xarray DataArray.
-
reconcile_scalar_value_and_units(our_var, ds_var)[source]¶ Compare scalar coordinate value of a
DMVariable (our_var) with what’s set in the xarray.Dataset (ds_var). If there’s a discrepancy, log an error but change the entry in our_var.
-
reconcile_coord_bounds(our_coord, ds, ds_coord_name)[source]¶ Reconcile standard_name and units attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var), for the bounds on the dimension coordinate our_coord.
-
reconcile_dimension_coords(our_var, ds)[source]¶ Reconcile name, standard_name and units attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var), for all dimension coordinates used by our_var.
- Parameters
our_var (
TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds – xarray Dataset.
-
reconcile_scalar_coords(our_var, ds)[source]¶ Reconcile name, standard_name and units attributes between the ‘ground truth’ of the dataset we downloaded (ds_var_name) and our expectations based on the model’s convention (our_var), for all scalar coordinates used by our_var.
- Parameters
our_var (
TranslatedVarlistEntry) – Expected attributes of the dataset variable, according to the data request.
ds – xarray Dataset.
-
reconcile_variable(var, ds)[source]¶ Top-level method for the MDTF-specific dataset validation: attempts to reconcile name, standard_name and units attributes for the variable and coordinates in translated_var (our expectation, based on the DataSource’s naming convention) with attributes actually present in the Dataset ds.
-
check_calendar(ds)[source]¶ Checks the ‘calendar’ attribute has been set correctly for time-dependent data (assumes CF conventions).
Sets the “calendar” attr on the time coordinate, if it exists, in order to be read by the calendar property defined in the cf_xarray accessor.
-
check_metadata(ds_var, *attr_names)[source]¶ Wrapper for
normalize_attr(), specialized to the case of getting a variable’s standard_name.
-
check_ds_attrs(var, ds)[source]¶ Final checking of xarray Dataset attribute dicts before starting functions in
src.preprocessor.
Only checks attributes on the dependent variable var and its coordinates: any other netCDF variables in the file are ignored.
-
parse(var, ds)[source]¶ Calls the above metadata parsing functions in the intended order; intended to be called immediately after the Dataset ds is opened.
Note
decode_cf=False should be passed to the xarray open_dataset method, since that parsing is done here instead.
- Call normalize_pre_decode() to do basic cleaning of metadata attributes.
- Call xarray’s decode_cf, using cftime to decode CF-compliant date/time axes.
- Assign axis labels to dimension coordinates using cf_xarray.
- Verify that calendar is set correctly (check_calendar()).
- Reconcile metadata in var and ds (reconcile_* methods).
- Verify that the name, standard_name and units for the variable and its coordinates are set correctly (check_* methods).
- Parameters
var (
VarlistEntry) – VarlistEntry describing metadata we expect to find in ds.
ds (Dataset) – xarray Dataset of locally downloaded model data.
- Returns
ds, with data unchanged but metadata normalized to expected values. Except in specific cases, attributes of var are updated to reflect the ‘ground truth’ of data in ds.
-
static
get_unmapped_names(ds)[source]¶ Get a dict whose keys are variable or attribute names referred to by variables in the Dataset ds, but not present in the dataset itself.
- Returns
Values of the dict are sets of names of variables in the dataset that referred to the missing name (keys).
- Return type
(dict)
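The shape of the returned dict can be illustrated with a plain-dict stand-in for a Dataset (variable name mapped to its attribute dict). Which attributes the real method inspects is not stated on this page. The sketch assumes the CF 'coordinates' and 'bounds' attributes, and the function name is hypothetical.

```python
def unmapped_names_sketch(variables):
    """Map each referenced-but-missing name to the set of variables using it.

    variables: dict of variable name -> attribute dict (stand-in for a Dataset).
    """
    missing = {}
    for var_name, attrs in variables.items():
        # Assumption: names are referenced via 'coordinates' and 'bounds'.
        referenced = set(attrs.get("coordinates", "").split())
        if attrs.get("bounds"):
            referenced.add(attrs["bounds"])
        for name in referenced:
            if name not in variables:
                missing.setdefault(name, set()).add(var_name)
    return missing


variables = {
    "tas": {"coordinates": "height lat lon"},
    "lat": {"bounds": "lat_bnds"},
    "lon": {},
}
unmapped = unmapped_names_sketch(variables)
```

Here "height" and "lat_bnds" are named by variables in the stand-in dataset but are not themselves present, so they become the keys of the result.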
-