GFDL-specific information
=========================

This page contains information specific to the site installation at the Geophysical Fluid Dynamics Laboratory (GFDL), Princeton, NJ, USA.

Site installation
-----------------

The DET team maintains a site-wide installation of the framework and all supporting data at /home/mdteam/DET/analysis/mdtf/MDTF-diagnostics. This is kept up to date and is accessible from both workstations and PPAN; in particular, end users do **not** need to set up conda environments or download supporting data as described in the installation instructions. Invoking the package from the site installation's wrapper script automatically prepends ``--site="NOAA_GFDL"`` to the user's command-line flags.

Please contact us if your use case can't be accommodated by this installation.

Additional ways to invoke the package
-------------------------------------

The site installation provides alternative ways to run the diagnostics within GFDL's existing workflow:

1. Called from an interactive shell on PPAN or workstations. This is the standard mode of running the package, described in the rest of the documentation.

2. As a batch job on PPAN, managed via Slurm. This previously required its own wrapper script, but can now be done using the same entry point and CLI options as for interactive execution.

3. Within FRE XMLs. This is done by calling the ``mdtf_gfdl.csh`` wrapper script from an ``<analysis>`` tag in the XML.

   Currently, FRE requires that each analysis script be associated with a single model ``<component>``. This poses difficulties for diagnostics which use data generated by multiple components. We provide two ways to address this issue:

   A. If it's known ahead of time that a given ``<component>`` will dominate the run time and finish last, one can call ``mdtf_gfdl.csh`` from an ``<analysis>`` tag in that component only. In this case, the framework will search all data present in the /pp/ output directory when it's called.
      The ``<component>`` being used doesn't need to generate data analyzed by the diagnostics; in this case it's only used to schedule the diagnostics' execution.

   B. If one doesn't know which ``<component>`` will finish last, a more robust solution is to call ``mdtf_gfdl.csh --component_only`` from *each* ``<component>`` generating data to be analyzed. When the ``--component_only`` flag is set, every time the framework is called it will *only* run the diagnostics for which all the input data is available *and* which haven't run already (i.e., which haven't yet written their output to ``$OUTPUT_DIR``).

Additional data sources
-----------------------

In addition to the framework's :ref:`built-in data sources`, several data sources are defined that are only accessible to GFDL users.

All the data sources in this section use GFDL's in-house General Copy Program (GCP, not to be confused with Google Cloud Platform) for all file transfers. If GCP is not present on ``$PATH`` when the package is started, the package will load the appropriate environment module.

Any data on GFDL's DMF tape-backed filesystem will be requested with ``dmget`` prior to copying. All files requested by all PODs are batched into a single call to ``dmget`` and a single call to GCP. Framework execution blocks after the call to ``dmget`` is issued (the framework has no other tasks to do until the data is transferred locally), which can lead to long or unpredictable run times if the requested data has been migrated to tape.

CMIP6 data on the Unified Data Archive
++++++++++++++++++++++++++++++++++++++

Selected via ``--data-manager="CMIP6_UDA"``.

Data source for analyzing CMIP6 data made available on the Unified Data Archive (UDA)'s high-priority storage at /uda/CMIP6. Command-line options and method of operation are the same as documented in :ref:`ref-data-source-cmip6`.

CMIP6 data on the /archive filesystem
+++++++++++++++++++++++++++++++++++++

Selected via ``--data-manager="CMIP6_archive"``.
The same as above, but for analyzing the wider range of CMIP6 data on the DMF filesystem at /archive/pcmdi/repo/CMIP6. Command-line options and method of operation are the same as documented in :ref:`ref-data-source-cmip6`.

CMIP6 data on /data\_cmip6
++++++++++++++++++++++++++

Selected via ``--data-manager="CMIP6_data_cmip6"``.

The same as above, but for analyzing pre-publication data on /data\_cmip6/CMIP6 (only mounted on PPAN). Command-line options and method of operation are the same as documented in :ref:`ref-data-source-cmip6`.

Results of FREPP-processed runs
+++++++++++++++++++++++++++++++

Selected via ``--data-manager="GFDL_PP"``.

This data source searches for model data produced using GFDL's in-house postprocessing tool, FREPP. Note that this is a completely separate concern from invoking the package from the FRE pipeline (described above): data that has been processed and saved in this convention can be analyzed equally well in any of the package's modes of operation.

**Command-line options**

<*CASE_ROOT_DIR*> should be set to the root of the postprocessing directory hierarchy (i.e., it should end in ``/pp``).

--any-components
   If this flag is set, the data source may return data from different model ``<component>``\s for variables requested by the same POD. This is necessary for, e.g., PODs that compare data from different modeling realms. The default behavior is to require all variables requested by a POD to come from the same model ``<component>``.

**Data selection heuristics**

This data source implements the following logic to guarantee that all data it provides to the PODs are consistent, i.e. that the variables selected have been generated from the same run of the same model. An error will be raised if no set of variables can be found that satisfies the user's input above and the following requirements:

* This data source only searches data saved as time series (``/ts/``), rather than time averages, since no POD is currently designed to use time-averaged data.
* If the same data has been saved in files of varying chronological length, the shortest length is used, in order to minimize the amount of data that is transferred but not used (because it falls outside of the user's analysis period).
* Unless the ``--any-components`` flag is set, the model ``<component>`` must be the same for all variables requested by a POD, but can be different for different PODs. The same value will be chosen for all PODs if possible.
* If the same data is provided by multiple model ``<component>``\s, a single ``<component>`` is selected via the following heuristics:

  - Preference is given to model components starting with "cmip" (case insensitive), in order to support analysis of data produced as part of CMIP6.
  - If multiple ``<component>``\s are still eligible, the one with the fewest words in the identifier (separated by underscores) is selected; in case of a tie, the ``<component>`` name with the shortest overall string length is used.

Quasi-automated source selection
++++++++++++++++++++++++++++++++

Selected via ``--data-manager="GFDL_auto"``.

Provided mostly for backwards compatibility, this dispatches operation to the ``CMIP6_UDA`` or ``GFDL_PP`` data sources based on whether <*CASE_ROOT_DIR*> is a valid postprocessing directory. Command-line options are the union of those for the ``CMIP6_UDA`` and ``GFDL_PP`` data sources.

Additional command-line options
-------------------------------

In addition to the framework's built-in `command-line options <../sphinx/ref_cli.html>`__, the following site-specific options are recognized.

For long command-line flags, words may be separated with hyphens (GNU standard) or with underscores (Python variable name convention). For example, ``--file-transfer-timeout`` and ``--file_transfer_timeout`` are both recognized by the package as synonyms for the same setting.
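
The hyphen/underscore equivalence described above can be illustrated with a short sketch. This is not the package's actual parser: the helper ``normalize_flag`` and the integer type given to ``--file-transfer-timeout`` are assumptions made for illustration only. It shows one way to treat both spellings as the same setting, by rewriting underscores in the flag name before handing the arguments to ``argparse``.

```python
import argparse


def normalize_flag(arg: str) -> str:
    """Rewrite underscores in a long flag's name as hyphens.

    Hypothetical helper, not part of the package. Anything after '='
    (the flag's value) is left untouched, so paths keep their underscores.
    """
    if not arg.startswith("--"):
        return arg  # positional arguments and short flags pass through
    name, sep, value = arg.partition("=")
    return name.replace("_", "-") + sep + value


# Assumption: the timeout is an integer; the real option may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--file-transfer-timeout", type=int, default=0)


def parse(argv):
    """Parse argv after mapping both spellings onto the hyphenated form."""
    return parser.parse_args([normalize_flag(a) for a in argv])


if __name__ == "__main__":
    print(parse(["--file_transfer_timeout", "30"]).file_transfer_timeout)
    print(parse(["--file-transfer-timeout=30"]).file_transfer_timeout)
```

Normalizing only the part before ``=`` means that values such as directory names are passed through unchanged.
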

GFDL-specific flags
+++++++++++++++++++

The following new flags are added:

--GFDL-PPAN-TEMP
   If running on the GFDL PPAN cluster, set the ``$MDTF_TMPDIR`` environment variable to this location and create temp files here. This must be a location accessible via GCP; the package does not currently verify this. Defaults to ``$TMPDIR``.

--GFDL-WS-TEMP
   If running on a GFDL workstation, set the ``$MDTF_TMPDIR`` environment variable to this location and create temp files here. The directory will be created if it doesn't exist. This must be a location accessible via GCP; the package does not currently verify this. Defaults to /net2/``$USER``/tmp.

--frepp
   Normally this is set by the ``mdtf_gfdl.csh`` wrapper script, and not directly by the user. Set this flag to invoke the framework in the FRE-based execution mode (3A. or 3B. above), processing data as part of the FRE pipeline.

--ignore-component
   Normally this is set by the ``mdtf_gfdl.csh`` wrapper script, and not directly by the user. If set, this flag tells the framework to search the entire /pp/ directory for model data (mode 3A. above); the default is to restrict the search to the model component passed by FRE. Ignored if ``--frepp`` is not set.

GFDL-specific default values
++++++++++++++++++++++++++++

The following paths are set to more useful default values:

--OBS-DATA-REMOTE
   Site-specific installation of observational data used by individual PODs at /home/Oar.Gfdl.Mdteam/DET/analysis/mdtf/obs\_data. If running on PPAN, this data will be GCP'ed to the current node. If running on a workstation, it will be symlinked.

--OBS-DATA-ROOT
   Local directory for observational data. Defaults to ``$MDTF_TMPDIR``/inputdata/obs_data, where the environment variable ``$MDTF_TMPDIR`` is defined as described above.

--MODEL-DATA-ROOT
   Local directory used as a destination for downloaded model data. Defaults to ``$MDTF_TMPDIR``/inputdata/model, where the environment variable ``$MDTF_TMPDIR`` is defined as described above.

--WORKING-DIR
   Working directory.
   Defaults to ``$MDTF_TMPDIR``/wkdir, where the environment variable ``$MDTF_TMPDIR`` is defined as described above.

-o, --OUTPUT-DIR
   Destination for output files. Defaults to ``$MDTF_TMPDIR``/mdtf_out, which will be created if it doesn't exist.
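
To make the relationship between these defaults concrete, the following sketch derives the four default directories from a single temporary directory. It is illustrative only and is not the framework's code: the function names ``resolve_tmpdir`` and ``gfdl_default_paths`` and the dictionary keys are hypothetical, and only the path suffixes and fallback locations are taken from the option descriptions above.

```python
import posixpath


def resolve_tmpdir(on_ppan, env, ppan_temp=None, ws_temp=None):
    """Pick the value of $MDTF_TMPDIR (hypothetical helper).

    ppan_temp / ws_temp stand in for --GFDL-PPAN-TEMP / --GFDL-WS-TEMP;
    when unset, fall back to the documented defaults.
    """
    if on_ppan:
        return ppan_temp or env["TMPDIR"]
    return ws_temp or posixpath.join("/net2", env["USER"], "tmp")


def gfdl_default_paths(tmpdir):
    """Build the documented default locations under $MDTF_TMPDIR."""
    return {
        "OBS_DATA_ROOT": posixpath.join(tmpdir, "inputdata", "obs_data"),
        "MODEL_DATA_ROOT": posixpath.join(tmpdir, "inputdata", "model"),
        "WORKING_DIR": posixpath.join(tmpdir, "wkdir"),
        "OUTPUT_DIR": posixpath.join(tmpdir, "mdtf_out"),
    }


if __name__ == "__main__":
    # Example environment; the user name is made up for illustration.
    tmpdir = resolve_tmpdir(on_ppan=False, env={"USER": "jane"})
    for name, path in gfdl_default_paths(tmpdir).items():
        print(name, path)
```

Because every default is derived from ``$MDTF_TMPDIR``, setting ``--GFDL-PPAN-TEMP`` or ``--GFDL-WS-TEMP`` relocates all four directories at once, while any individual path can still be overridden by its own flag.
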