src.data_sources module¶

Implementation classes for the model data query/fetch functionality implemented in src/data_manager.py, selected by the user via --data_manager.

class src.data_sources.SampleDataFile(first_arg=None, *args, **kwargs)[source]¶

Bases: object

Dataclass describing catalog entries for sample model data files.

sample_dataset: str = sentinel.Mandatory¶

frequency: src.util.datelabel.DateFrequency = sentinel.Mandatory¶

variable: str = sentinel.Mandatory¶

remote_path: str = sentinel.Mandatory¶

_is_regex_dataclass = True¶

_pattern = {}¶

classmethod from_string(str_, *args)¶
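
The "regex dataclass" idea behind from_string can be sketched roughly as follows: a path is matched against a pattern with named groups, and the groups populate the dataclass fields. This is a minimal illustration only. The class name SampleFileSketch, the pattern, and the example path are all hypothetical, and frequency is kept as a plain str rather than a DateFrequency; the actual _pattern used by SampleDataFile is not shown on this page.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern, assumed for illustration only:
# <dataset>/<frequency>/<dataset>.<variable>.<frequency>.nc
_PATTERN = re.compile(
    r"(?P<sample_dataset>\S+?)/(?P<frequency>\w+)/"
    r"(?P=sample_dataset)\.(?P<variable>\w+)\.(?P=frequency)\.nc"
)


@dataclass
class SampleFileSketch:
    """Stand-in for a regex dataclass; not the real SampleDataFile."""
    sample_dataset: str
    frequency: str
    variable: str
    remote_path: str

    @classmethod
    def from_string(cls, str_):
        # Build an instance from the named groups of the regex match.
        match = _PATTERN.fullmatch(str_)
        if match is None:
            raise ValueError(f"path does not match pattern: {str_!r}")
        return cls(remote_path=str_, **match.groupdict())


entry = SampleFileSketch.from_string("CASE1/mon/CASE1.ta.mon.nc")
```

A path that does not match the pattern raises ValueError in this sketch, so unrelated files never become catalog entries.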

class src.data_sources.SampleDataAttributes(CASENAME: str = sentinel.Mandatory, FIRSTYR: str = sentinel.Mandatory, LASTYR: str = sentinel.Mandatory, CASE_ROOT_DIR: str = '', convention: str = sentinel.Mandatory, log: dataclasses.InitVar = <Logger src.data_manager (WARNING)>, sample_dataset: str = '')[source]¶

Bases: src.data_manager.DataSourceAttributesBase

Data-source-specific attributes for the DataSource providing sample model data.

sample_dataset: str = ''¶

_set_case_root_dir(log=<Logger src.data_sources (WARNING)>)[source]¶: Additional logic to set CASE_ROOT_DIR from MODEL_DATA_ROOT.

class src.data_sources.SampleLocalFileDataSource(*args, **kwargs)[source]¶

Bases: src.data_manager.SingleLocalFileDataSource

DataSource for handling POD sample model data stored on a local filesystem.

_FileRegexClass¶: alias of SampleDataFile

_AttributesClass¶: alias of SampleDataAttributes

_DiagnosticClass¶: alias of src.diagnostic.Diagnostic

_PreprocessorClass¶: alias of src.preprocessor.SampleDataPreprocessor

col_spec = DataframeQueryColumnSpec(expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, pod_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, var_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, remote_data_col=None, daterange_col=None)¶

_query_attrs_synonyms = {'name': 'variable'}¶

property CATALOG_DIR¶: Placeholder class used in the definition of the abstract_attribute() decorator.

_abc_impl = <_abc_data object>¶

class src.data_sources.MetadataRewriteParser(data_mgr, pod)[source]¶

Bases: src.xr_parser.DefaultDatasetParser

After loading and parsing the metadata on dataset ds but before applying the preprocessing functions, update attrs on ds with the new metadata values that were specified in ExplicitFileDataSource’s config file.

setup(data_mgr, pod)[source]¶

Make a lookup table to map VarlistEntry IDs to the set of metadata that we need to alter.

If the user has provided the name of the variable used by the data files (via the var_name attribute), set that as the translated variable name. Otherwise, variables are untranslated, and we use the heuristics in xr_parser.DefaultDatasetParser.guess_dependent_var() to determine the name.

_post_normalize_hook(var, ds)[source]¶

After loading the metadata on dataset ds but before reconciling it with the record, update attrs with the new metadata values that were specified in ExplicitFileDataSource’s config file.

Normal operation is to set the changed attrs on the VarlistEntry translation, and then have these overwrite attrs in ds in the inherited xr_parser.DefaultDatasetParser.reconcile_variable() method. If the user set the --disable-preprocessor flag, this is skipped, so instead we set the attrs directly on ds.
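
The lookup-and-overwrite behavior described for setup() and _post_normalize_hook() can be sketched with plain dictionaries standing in for a dataset's attrs. This is an assumption-laden illustration, not the parser's actual code: the function rewrite_attrs, the table overrides_by_id, and the ID strings are hypothetical.

```python
# Hypothetical lookup table: variable ID -> attributes the user wants changed.
overrides_by_id = {
    "pod_a.tas": {"standard_name": "air_temperature", "units": "K"},
}


def rewrite_attrs(var_id, file_attrs, lookup):
    """Return a copy of file_attrs with user-supplied values taking precedence."""
    updated = dict(file_attrs)              # leave the original mapping untouched
    updated.update(lookup.get(var_id, {}))  # no entry means no changes
    return updated


from_file = {"standard_name": "temp", "units": "K", "long_name": "Temperature"}
rewritten = rewrite_attrs("pod_a.tas", from_file, overrides_by_id)
untouched = rewrite_attrs("pod_a.pr", from_file, overrides_by_id)
```

Attributes the user did not mention (long_name here) pass through unchanged, and variables without an entry in the table are returned as read from the file.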

class src.data_sources.MetadataRewritePreprocessor(*args, **kwargs)[source]¶

Bases: src.preprocessor.DaskMultiFilePreprocessor

Subclass DaskMultiFilePreprocessor in order to look up and apply edits to metadata that are stored in ExplicitFileDataSourceConfigEntry objects in the config_by_id attribute of ExplicitFileDataSource.

_file_preproc_functions = []¶

_XarrayParserClass¶: alias of MetadataRewriteParser

property _functions¶

Determine which preprocessor functions are applicable to the current package run, defaulting to all of them.

Returns: tuple of classes (inheriting from PreprocessorFunctionBase) listing the preprocessing functions to be called, in order.

_abc_impl = <_abc_data object>¶

class src.data_sources.GlobbedDataFile(first_arg=None, *args, **kwargs)[source]¶

Bases: object

Applies a trivial regex to the paths returned by the glob.

dummy_group: str = sentinel.Mandatory¶

remote_path: str = sentinel.Mandatory¶

_is_regex_dataclass = True¶

_pattern = {}¶

classmethod from_string(str_, *args)¶

class src.data_sources.ExplicitFileDataSourceConfigEntry(glob_id: src.util.basic.MDTF_ID = None, pod_name: str = sentinel.Mandatory, name: str = sentinel.Mandatory, glob: str = sentinel.Mandatory, var_name: str = '', metadata: dict = <factory>, _has_user_metadata: bool = None)[source]¶

Bases: object

glob_id: src.util.basic.MDTF_ID = None¶

pod_name: str = sentinel.Mandatory¶

name: str = sentinel.Mandatory¶

glob: str = sentinel.Mandatory¶

var_name: str = ''¶

metadata: dict¶

_has_user_metadata: bool = None¶

property full_name¶

classmethod from_struct(pod_name, var_name, v_data)[source]¶

to_file_glob_tuple()[source]¶

class src.data_sources.ExplicitFileDataAttributes(CASENAME: str = sentinel.Mandatory, FIRSTYR: str = sentinel.Mandatory, LASTYR: str = sentinel.Mandatory, CASE_ROOT_DIR: str = '', convention: str = sentinel.Mandatory, log: dataclasses.InitVar = <Logger src.data_manager (WARNING)>, config_file: str = None)[source]¶

Bases: src.data_manager.DataSourceAttributesBase

config_file: str = None¶

class src.data_sources.ExplicitFileDataSource(*args, **kwargs)[source]¶

Bases: src.data_manager.OnTheFlyGlobQueryMixin, src.data_manager.LocalFetchMixin, src.data_manager.DataframeQueryDataSourceBase

DataSource for dealing with data in a regular directory hierarchy on a locally mounted filesystem. Assumes data for each variable may be split into several files according to date, with the dates present in their filenames.

_FileRegexClass¶: alias of GlobbedDataFile

_AttributesClass¶: alias of ExplicitFileDataAttributes

_DiagnosticClass¶: alias of src.diagnostic.Diagnostic

_PreprocessorClass¶: alias of MetadataRewritePreprocessor

col_spec = DataframeQueryColumnSpec(expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, pod_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, var_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, remote_data_col=None, daterange_col=None)¶

expt_key_cols = ()¶

expt_cols = ()¶

property CATALOG_DIR¶: Placeholder class used in the definition of the abstract_attribute() decorator.

parse_config(config_d)[source]¶: Parse contents of the JSON config file into a list of ExplicitFileDataSourceConfigEntry objects.

iter_globs()[source]¶: Iterator returning FileGlobTuple instances. The generated catalog contains the union of the files found by each of the globs.
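
The "union of the files found by each of the globs" can be illustrated with the standard library alone. The sketch below is not the implementation of iter_globs(); the helper union_of_globs, the file names, and the patterns are made up for the example.

```python
import glob
import os
import tempfile


def union_of_globs(root, patterns):
    """Collect every path under root matched by at least one pattern."""
    found = set()  # a set drops files matched by more than one glob
    for pattern in patterns:
        found.update(glob.glob(os.path.join(root, pattern)))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    # Create a few empty placeholder files to search.
    for name in ("pr_2000.nc", "pr_2001.nc", "tas_2000.nc"):
        open(os.path.join(root, name), "w").close()
    paths = union_of_globs(root, ["pr_*.nc", "*_2000.nc"])
    names = [os.path.basename(p) for p in paths]
```

Here pr_2000.nc matches both patterns but appears only once in the result.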

_abc_impl = <_abc_data object>¶

class src.data_sources.CMIP6DataSourceAttributes(CASENAME: str = sentinel.Mandatory, FIRSTYR: str = sentinel.Mandatory, LASTYR: str = sentinel.Mandatory, CASE_ROOT_DIR: str = '', convention: str = 'CMIP', log: dataclasses.InitVar = <Logger src.data_manager (WARNING)>, activity_id: str = '', institution_id: str = '', source_id: str = '', experiment_id: str = '', variant_label: str = '', grid_label: str = '', version_date: str = '', model: dataclasses.InitVar = '', experiment: dataclasses.InitVar = '')[source]¶

Bases: src.data_manager.DataSourceAttributesBase

convention: str = 'CMIP'¶

activity_id: str = ''¶

institution_id: str = ''¶

source_id: str = ''¶

experiment_id: str = ''¶

variant_label: str = ''¶

grid_label: str = ''¶

version_date: str = ''¶

model: dataclasses.InitVar = ''¶

experiment: dataclasses.InitVar = ''¶

CATALOG_DIR: str¶

class src.data_sources.CMIP6ExperimentSelectionMixin[source]¶

Bases: object

Encapsulate attributes and logic used for CMIP6 experiment disambiguation so that it can be reused in DataSources with different parents (e.g., different FetchMixins for different data fetch protocols).

Assumes inheritance from DataframeQueryDataSourceBase (this should be enforced).

_query_attrs_synonyms = {'name': 'variable_id'}¶

property CATALOG_DIR¶

_query_group_hook(group_df)[source]¶: Eliminate regional (Antarctic/Greenland) and spatially averaged data from consideration for data fetch, since no POD currently makes use of data of this type.
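
One way such a filter could work is by inspecting the CMIP6 grid_label, where suffixes a, g and z mark Antarctic, Greenland and zonal-mean grids and gm marks global means. The sketch below assumes that approach and uses plain dictionaries instead of a DataFrame; it is not taken from the actual _query_group_hook code, and drop_regional_and_averaged is a hypothetical name.

```python
import re

# Accept only global, non-averaged grids: 'gn' (native) or 'gr', 'gr1', ...
# (regridded). Labels such as 'gra', 'gng', 'grz' or 'gm' are rejected.
_GLOBAL_GRID = re.compile(r"g(n|r\d*)")


def drop_regional_and_averaged(rows):
    """Keep catalog rows whose grid_label denotes a full global grid."""
    return [r for r in rows if _GLOBAL_GRID.fullmatch(r["grid_label"])]


catalog = [{"grid_label": g} for g in ("gn", "gr", "gr1", "gra", "gng", "grz", "gm")]
kept = [r["grid_label"] for r in drop_regional_and_averaged(catalog)]
```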

static _filter_column(df, col_name, func, obj_name)[source]¶

_filter_column_min(df, obj_name, *col_names)[source]¶

_filter_column_max(df, obj_name, *col_names)[source]¶

resolve_expt(df, obj)[source]¶

Disambiguate experiment attributes that must be the same for all variables in this case:

If variant_id (realization, forcing, etc.) is not specified by the user, choose the lowest-numbered variant.
If version_date is not set by the user, choose the most recent revision.
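
The two rules above can be sketched on a list of dictionaries. This is only an illustration under stated assumptions: the real method operates on a DataFrame, the function names are hypothetical, variants are assumed to look like r1i1p1f1 and are compared by their integer indices, and version dates are assumed to be strings such as v20190308 that sort chronologically.

```python
import re


def variant_key(variant_label):
    """Turn 'r10i1p1f2' into (10, 1, 1, 2) so variants compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", variant_label))


def resolve_experiment(rows, variant_label="", version_date=""):
    """Narrow rows to a single variant and a single version."""
    if not rows:
        return []
    if not variant_label:
        # Not set by the user: take the lowest-numbered variant.
        variant_label = min((r["variant_label"] for r in rows), key=variant_key)
    rows = [r for r in rows if r["variant_label"] == variant_label]
    if not rows:
        return []
    if not version_date:
        # Not set by the user: take the most recent revision.
        version_date = max(r["version_date"] for r in rows)
    return [r for r in rows if r["version_date"] == version_date]


catalog = [
    {"variant_label": "r2i1p1f1", "version_date": "v20200101"},
    {"variant_label": "r1i1p1f1", "version_date": "v20190308"},
    {"variant_label": "r1i1p1f1", "version_date": "v20190610"},
    {"variant_label": "r10i1p1f1", "version_date": "v20210101"},
]
chosen = resolve_experiment(catalog)
```

Comparing the indices as integers matters: as plain strings, r10i1p1f1 would sort before r2i1p1f1.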

resolve_pod_expt(df, obj)[source]¶

Disambiguate experiment attributes that must be the same for all variables for each POD:

Prefer regridded to native-grid data (questionable)
If multiple regriddings are available, pick the lowest-numbered one
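
A possible reading of this preference, expressed as a sort key on CMIP6 grid labels (gn for the native grid, gr, gr1, gr2, ... for regridded data), is sketched below. The helper names are hypothetical, and treating an unnumbered gr as number 0 is an assumption of this sketch rather than documented behavior.

```python
def grid_preference(grid_label):
    """Sort key: regridded labels first, then by regridding number."""
    regridded = grid_label.startswith("gr")
    digits = "".join(ch for ch in grid_label if ch.isdigit())
    number = int(digits) if digits else 0  # assumption: bare 'gr' counts as 0
    return (0 if regridded else 1, number)


def pick_grid(labels):
    """Choose the preferred grid label among those available."""
    return min(labels, key=grid_preference)


best = pick_grid(["gn", "gr2", "gr1"])
```

When only native-grid data exists, the same key simply returns gn.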

resolve_var_expt(df, obj)[source]¶

Disambiguate arbitrary experiment attributes on a per-variable basis:

If the same variable appears in multiple MIP tables, select the first MIP table in alphabetical order.
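
As a minimal sketch of this rule, Python's min() on the candidate table names picks the first in the language's default string ordering. Note the assumption: that ordering is case-sensitive (uppercase sorts before lowercase), and whether resolve_var_expt compares names the same way is not stated here. The function name select_table is hypothetical.

```python
def select_table(table_ids):
    """Pick the first MIP table name in Python's default string order."""
    return min(table_ids)


table = select_table(["day", "Amon", "Emon"])
```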

class src.data_sources.CMIP6LocalFileDataSource(*args, **kwargs)[source]¶

Bases: src.data_sources.CMIP6ExperimentSelectionMixin, src.data_manager.LocalFileDataSource

DataSource for handling model data named following the CMIP6 DRS and stored on a local filesystem.

_FileRegexClass¶: alias of src.cmip6.CMIP6_DRSPath

_DirectoryRegex = {}¶

_AttributesClass¶: alias of CMIP6DataSourceAttributes

_DiagnosticClass¶: alias of src.diagnostic.Diagnostic

_PreprocessorClass¶: alias of src.preprocessor.DefaultPreprocessor

col_spec = DataframeQueryColumnSpec(expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, pod_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, var_expt_cols=<src.data_manager.DataFrameQueryColumnGroup object>, remote_data_col=None, daterange_col='date_range')¶

_convention = 'CMIP'¶

_abc_impl = <_abc_data object>¶
