src.data_sources module¶
Implementation classes for the model data query/fetch functionality
implemented in src/data_manager.py, selected by the user via --data_manager.
-
class
src.data_sources.SampleDataFile(first_arg=None, *args, **kwargs)[source]¶ Bases:
object
Dataclass describing catalog entries for sample model data files.
-
frequency: src.util.datelabel.DateFrequency = sentinel.Mandatory¶
-
_is_regex_dataclass= True¶
-
_pattern= {}¶
-
classmethod
from_string(str_, *args)¶
-
-
class
src.data_sources.SampleDataAttributes(CASENAME: str = sentinel.Mandatory, FIRSTYR: str = sentinel.Mandatory, LASTYR: str = sentinel.Mandatory, CASE_ROOT_DIR: str = '', convention: str = sentinel.Mandatory, log: dataclasses.InitVar = <Logger src.data_manager (WARNING)>, sample_dataset: str = '')[source]¶ Bases:
src.data_manager.DataSourceAttributesBase
Data-source-specific attributes for the DataSource providing sample model data.
-
class
src.data_sources.SampleLocalFileDataSource(*args, **kwargs)[source]¶ Bases:
src.data_manager.SingleLocalFileDataSource
DataSource for handling POD sample model data stored on a local filesystem.
-
_FileRegexClass¶ alias of
SampleDataFile
-
_AttributesClass¶ alias of
SampleDataAttributes
-
_DiagnosticClass¶ alias of
src.diagnostic.Diagnostic
-
_PreprocessorClass¶
-
col_spec= DataframeQueryColumnSpec(expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, pod_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, var_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, remote_data_col=None, daterange_col=None)¶
-
_query_attrs_synonyms= {'name': 'variable'}¶
-
property
CATALOG_DIR¶ Placeholder class used in the definition of the
abstract_attribute() decorator.
-
_abc_impl= <_abc_data object>¶
-
-
class
src.data_sources.MetadataRewriteParser(data_mgr, pod)[source]¶ Bases:
src.xr_parser.DefaultDatasetParser
After loading and parsing the metadata on dataset ds, but before applying the preprocessing functions, update attrs on ds with the new metadata values that were specified in
ExplicitFileDataSource’s config file.
-
setup(data_mgr, pod)[source]¶ Make a lookup table to map
VarlistEntry IDs to the set of metadata that we need to alter.
If the user has provided the name of the variable used by the data files (via the
var_name attribute), set that as the translated variable name. Otherwise, variables are untranslated, and we use the heuristics in xr_parser.DefaultDatasetParser.guess_dependent_var() to determine the name.
-
_post_normalize_hook(var, ds)[source]¶ After loading the metadata on dataset ds but before reconciling it with the record, update attrs with the new metadata values that were specified in
ExplicitFileDataSource’s config file.
Normal operation is to set the changed attrs on the VarlistEntry translation, and then have these overwrite attrs in ds in the inherited
xr_parser.DefaultDatasetParser.reconcile_variable() method. If the user set the --disable-preprocessor flag, this is skipped, so instead we set the attrs directly on ds.
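The two cases described for _post_normalize_hook can be illustrated with a minimal sketch. It uses plain dicts in place of the dataset and translation attrs; the function name apply_metadata_edits and the preprocessor_disabled argument are hypothetical and are not part of the package's API.

```python
def apply_metadata_edits(ds_attrs, translation_attrs, edits, preprocessor_disabled):
    """Hypothetical helper: route user-supplied metadata edits.

    Returns updated copies of (ds_attrs, translation_attrs); the inputs
    are left unmodified.
    """
    ds_attrs = dict(ds_attrs)
    translation_attrs = dict(translation_attrs)
    if preprocessor_disabled:
        # No later reconciliation step will run, so write directly to the dataset.
        ds_attrs.update(edits)
    else:
        # Normal operation: record the edits on the translation and let the
        # later reconciliation step overwrite the dataset's attrs.
        translation_attrs.update(edits)
    return ds_attrs, translation_attrs
```

In the normal case the dataset attrs are returned unchanged, since the overwrite is deferred to the reconciliation step.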
-
-
class
src.data_sources.MetadataRewritePreprocessor(*args, **kwargs)[source]¶ Bases:
src.preprocessor.DaskMultiFilePreprocessor
Subclass
DaskMultiFilePreprocessor in order to look up and apply edits to metadata that are stored in ExplicitFileDataSourceConfigEntry objects in the config_by_id attribute of ExplicitFileDataSource.
-
_file_preproc_functions= []¶
-
_XarrayParserClass¶ alias of
MetadataRewriteParser
-
property
_functions¶ Determine which preprocessor functions are applicable to the current package run, defaulting to all of them.
- Returns
tuple of classes (inheriting from
PreprocessorFunctionBase) listing the preprocessing functions to be called, in order.
-
_abc_impl= <_abc_data object>¶
-
-
class
src.data_sources.GlobbedDataFile(first_arg=None, *args, **kwargs)[source]¶ Bases:
object
Applies a trivial regex to the paths returned by the glob.
-
_is_regex_dataclass= True¶
-
_pattern= {}¶
-
classmethod
from_string(str_, *args)¶
-
-
class
src.data_sources.ExplicitFileDataSourceConfigEntry(glob_id: src.util.basic.MDTF_ID = None, pod_name: str = sentinel.Mandatory, name: str = sentinel.Mandatory, glob: str = sentinel.Mandatory, var_name: str = '', metadata: dict = <factory>, _has_user_metadata: bool = None)[source]¶ Bases:
object
-
glob_id: src.util.basic.MDTF_ID = None¶
-
property
full_name¶
-
-
class
src.data_sources.ExplicitFileDataAttributes(CASENAME: str = sentinel.Mandatory, FIRSTYR: str = sentinel.Mandatory, LASTYR: str = sentinel.Mandatory, CASE_ROOT_DIR: str = '', convention: str = sentinel.Mandatory, log: dataclasses.InitVar = <Logger src.data_manager (WARNING)>, config_file: str = None)[source]¶
-
class
src.data_sources.ExplicitFileDataSource(*args, **kwargs)[source]¶ Bases:
src.data_manager.OnTheFlyGlobQueryMixin, src.data_manager.LocalFetchMixin, src.data_manager.DataframeQueryDataSourceBase
DataSource for handling data in a regular directory hierarchy on a locally mounted filesystem. Assumes data for each variable may be split into several files according to date, with the dates present in their filenames.
-
_FileRegexClass¶ alias of
GlobbedDataFile
-
_AttributesClass¶ alias of
ExplicitFileDataAttributes
-
_DiagnosticClass¶ alias of
src.diagnostic.Diagnostic
-
_PreprocessorClass¶ alias of
MetadataRewritePreprocessor
-
col_spec= DataframeQueryColumnSpec(expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, pod_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, var_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, remote_data_col=None, daterange_col=None)¶
-
expt_key_cols= ()¶
-
expt_cols= ()¶
-
property
CATALOG_DIR¶ Placeholder class used in the definition of the
abstract_attribute() decorator.
-
parse_config(config_d)[source]¶ Parse contents of JSON config file into a list of ExplicitFileDataSourceConfigEntry objects.
-
iter_globs()[source]¶ Iterator returning
FileGlobTuple instances. The generated catalog contains the union of the files found by each of the globs.
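As an illustration of what "the union of the files found by each of the globs" means, here is a minimal sketch using only the standard library. The function name union_of_globs and its arguments are hypothetical; the package's own implementation works with FileGlobTuple instances and may differ.

```python
import glob
import os


def union_of_globs(root_dir, patterns):
    """Hypothetical helper: collect files matched by any of several globs.

    A file matched by more than one pattern appears only once in the result.
    """
    found = set()
    for pattern in patterns:
        # Each pattern is interpreted relative to root_dir.
        found.update(glob.glob(os.path.join(root_dir, pattern)))
    return sorted(found)
```

Using a set means overlapping globs do not produce duplicate catalog entries.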
-
_abc_impl= <_abc_data object>¶
-
-
class
src.data_sources.CMIP6DataSourceAttributes(CASENAME: str = sentinel.Mandatory, FIRSTYR: str = sentinel.Mandatory, LASTYR: str = sentinel.Mandatory, CASE_ROOT_DIR: str = '', convention: str = 'CMIP', log: dataclasses.InitVar = <Logger src.data_manager (WARNING)>, activity_id: str = '', institution_id: str = '', source_id: str = '', experiment_id: str = '', variant_label: str = '', grid_label: str = '', version_date: str = '', model: dataclasses.InitVar = '', experiment: dataclasses.InitVar = '')[source]¶ Bases:
src.data_manager.DataSourceAttributesBase
-
model: dataclasses.InitVar = ''¶
-
experiment: dataclasses.InitVar = ''¶
-
-
class
src.data_sources.CMIP6ExperimentSelectionMixin[source]¶ Bases:
object
Encapsulate attributes and logic used for CMIP6 experiment disambiguation so that it can be reused in DataSources with different parents (e.g., different FetchMixins for different data fetch protocols).
Assumes inheritance from DataframeQueryDataSourceBase – should enforce this.
-
_query_attrs_synonyms= {'name': 'variable_id'}¶
-
property
CATALOG_DIR¶
-
_query_group_hook(group_df)[source]¶ Eliminate regional (Antarctic/Greenland) and spatially averaged data from consideration for data fetch, since no POD currently makes use of data of this type.
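A rough sketch of this kind of filter, written against a list of dicts rather than a DataFrame. It assumes each catalog row carries region and spatial_avg fields that are None for global, unaveraged data; both field names and the function name are assumptions made for illustration.

```python
def drop_regional_and_averaged(rows):
    """Hypothetical helper: keep only global, non-spatially-averaged rows.

    Assumes each row is a dict whose "region" and "spatial_avg" entries
    are None (or absent) for global data on the full grid.
    """
    return [
        row for row in rows
        if row.get("region") is None and row.get("spatial_avg") is None
    ]
```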
-
resolve_expt(df, obj)[source]¶ Disambiguate experiment attributes that must be the same for all variables in this case:
- If variant_id (realization, forcing, etc.) is not specified by the user, choose the lowest-numbered variant.
- If version_date is not set by the user, choose the most recent revision.
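The two selection rules above can be sketched as follows. This is a simplified illustration on a list of dicts, not the package's DataFrame-based implementation: it assumes variant labels of the form rNiNpNfN, version dates of the form vYYYYMMDD (so that string comparison orders them chronologically), and a hypothetical function name resolve_experiment.

```python
import re

# Assumed variant label format: r<realization>i<init>p<physics>f<forcing>.
_VARIANT_RE = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def variant_sort_key(label):
    """Turn e.g. 'r10i1p1f1' into (10, 1, 1, 1) so labels compare numerically."""
    match = _VARIANT_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"Unrecognized variant label: {label!r}")
    return tuple(int(part) for part in match.groups())


def resolve_experiment(rows, variant_label="", version_date=""):
    """Hypothetical helper: fill in unset variant_label and version_date."""
    if not rows:
        raise ValueError("No catalog rows to choose from.")
    if not variant_label:
        # Rule 1: lowest-numbered variant.
        variant_label = min(
            (row["variant_label"] for row in rows), key=variant_sort_key
        )
    rows = [row for row in rows if row["variant_label"] == variant_label]
    if not rows:
        raise ValueError(f"No rows for variant {variant_label!r}.")
    if not version_date:
        # Rule 2: most recent revision among rows for the chosen variant.
        version_date = max(row["version_date"] for row in rows)
    return variant_label, version_date
```

Parsing the label into integers matters here: a plain string comparison would rank r10i1p1f1 ahead of r2i1p1f1.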
-
-
class
src.data_sources.CMIP6LocalFileDataSource(*args, **kwargs)[source]¶ Bases:
src.data_sources.CMIP6ExperimentSelectionMixin, src.data_manager.LocalFileDataSource
DataSource for handling model data named following the CMIP6 DRS and stored on a local filesystem.
-
_FileRegexClass¶ alias of
src.cmip6.CMIP6_DRSPath
-
_DirectoryRegex= {}¶
-
_AttributesClass¶ alias of
CMIP6DataSourceAttributes
-
_DiagnosticClass¶ alias of
src.diagnostic.Diagnostic
-
_PreprocessorClass¶ alias of
src.preprocessor.DefaultPreprocessor
-
col_spec= DataframeQueryColumnSpec(expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, pod_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, var_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, remote_data_col=None, daterange_col='date_range')¶
-
_convention= 'CMIP'¶
-
_abc_impl= <_abc_data object>¶
-