Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add set_attr config file options to override the metadata read from input gridded data files. #1020

Closed
dwfncar opened this issue Jun 28, 2018 · 9 comments
Labels
priority: blocker Blocker type: enhancement Improve something that it is currently doing
Milestone

Comments

@dwfncar
Copy link
Contributor

dwfncar commented Jun 28, 2018

This idea came up during a met_help exchange with Ying Lin at NCEP:


https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=85872


Here's the idea:

There is currently no way to override what Point-Stat writes to the FCST_VAR output column. There actually is logic for doing this in STAT-Analysis. You can use the "-set_hdr" option to explicitly specify the contents of the output header columns. For example, if you run STAT-Analysis to aggregate data where VX_MASK = EAST and VX_MASK = WEST, the output header column will, by default, be written as a concatenation of the unique input strings: VX_MASK = EAST,WEST. But you can manually override that by setting -set_hdr VX_MASK CONUS. And then EAST,WEST will be replaced by CONUS.

Perhaps we could add a similar config file option for Point-Stat, Grid-Stat, Ensemble-Stat, Wavelet-Stat, MODE, and MTD to do what STAT-Analysis is already doing:
   set_hdr_column = [ "FCST_VAR" ];
   set_hdr_value = [ "APCP_24" ];

Data is messy and this would give us an option for cleaning it up. [MET-1020] created by johnhg
@dwfncar dwfncar added this to the MET 9.0 milestone Apr 30, 2019
@JohnHalleyGotway JohnHalleyGotway added priority: medium Medium Priority type: enhancement Improve something that it is currently doing labels May 17, 2019
@JohnHalleyGotway
Copy link
Collaborator

On 5/17/2019, Tracy pointed out that this functionality would be useful for AF T&E projects.

@JohnHalleyGotway JohnHalleyGotway added priority: high and removed priority: medium Medium Priority labels May 17, 2019
@TaraJensen
Copy link
Contributor

Related to

@JohnHalleyGotway JohnHalleyGotway changed the title Consider adding set_hdr config file options to the MET statistics tool. Consider adding set attribute (set_attr) config file options to the MET statistics tool. Nov 14, 2019
@JohnHalleyGotway JohnHalleyGotway changed the title Consider adding set attribute (set_attr) config file options to the MET statistics tool. Consider adding set attribute (set_attrs) config file options to the MET statistics tool. Nov 14, 2019
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Nov 14, 2019

This really is a way of overriding the metadata for gridded input files. And this can be useful for tools which compute .stat output files as well as those that don't, like pcp_combine or regrid_data_plane. Instead of defining how the output columns should be set, just define the metadata, as shown below. This mirrors the "attr" dictionary used for python embedding. And these settings can be parsed into the VarInfoBase class so they apply to all derived gridded data file types:

set_attrs = {
valid = "timestring";
init = "timestring";
lead = "timestring";
accum = "timestring";
name = "string";
long_name = "string";
level = "string";
units = "string";
grid = "path to gridded data file or string defining the grid";

is_u_wind = boolean;
is_v_wind = boolean;
is_grid_relative = boolean;
is_wind_speed = boolean;
is_wind_direction = boolean;
is_precip = boolean;
is_specific_humidity = boolean;
}

Only the settings the user would like to override need to be defined. The boolean flags at the end are needed to enable python embedding to verify vector winds, as requested by NRL via met-help:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=93123

@JohnHalleyGotway
Copy link
Collaborator

Need to specifically test to make sure that these changes enable python-embedding to compute VL1L2 and VCNT output.

Will need to add logic somewhere to handle...
(1) Using U and V to rotate winds from grid to earth relative.
(2) Using U and V and compute vector output stats.

@JohnHalleyGotway
Copy link
Collaborator

On 5/29/2020, NRL wrote met-help with a question about setting the OBS_UNITS column for Point-Stat output: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=95402

Adding the "set_attrs" functionality would enable them to do so since there are no unit strings prescribed for point observations.

@georgemccabe
Copy link
Collaborator

Add METplus issue to test that something like this works with these changes:

FCST_VAR1_OPTIONS = set_attrs={units='K'; valid=...;}

@JohnHalleyGotway JohnHalleyGotway added this to To do in MET-9.1-beta2 (7/14/20) via automation Jun 2, 2020
JohnHalleyGotway added a commit that referenced this issue Jun 2, 2020
…f the set_attrs dictionary config entries. Still need to add code to parse those entries and update the tests and documentation.
JohnHalleyGotway added a commit that referenced this issue Jun 5, 2020
JohnHalleyGotway added a commit that referenced this issue Jun 5, 2020
…units to the OBS_UNITS output column instead of writing a constant NA string.
JohnHalleyGotway added a commit that referenced this issue Jun 23, 2020
…d spec parsed from the file. Also, update the process_data_plane() function to handle updating the metadata and grid defintions.
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…le, parse them as set_attrs.set_name instead of set_name from the set_attrs dictionary. That makes setting up the config file more flexible.
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…evel_attr, and units_attr instead of name(), level_name(), and units(). This change should be made everywhere NetCDF output files are written.
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…nd units() with calls to name_attr(), level_attr(), and units_attr().
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…nits() with calls to name_attr(), level_attr(), and units_attr().
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
… units() with calls to name_attr(), level_attr(), and units_attr().
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…eplace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…el_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
JohnHalleyGotway added a commit that referenced this issue Jun 24, 2020
…d units() with calls to name_attr(), level_attr(), and units_attr().
JohnHalleyGotway added a commit that referenced this issue Jun 25, 2020
…). The existing changes included *'s and don't actually match the name of the NetCDF variable.
JohnHalleyGotway added a commit that referenced this issue Jun 25, 2020
…). The existing changes included *'s and don't actually match the name of the NetCDF variable. (#1389)
@JohnHalleyGotway
Copy link
Collaborator

In the METplus meeting on 6/25/20, @CPKalb mentioned the need to use this with pcp_combine. Add a unit test which runs pcp_combine with set_attrs.set_valid on the -derive option to report the output valid time as the middle of the averaging period rather than the end which is the default behavior.

JohnHalleyGotway added a commit that referenced this issue Jun 25, 2020
…rning about setting a header column to a null string.
JohnHalleyGotway added a commit that referenced this issue Jun 25, 2020
@JohnHalleyGotway
Copy link
Collaborator

The changes in feature_1020_set_attrs were compiling and running fine on my Mac laptop. However, testing them on kiowa and dakota both revealed a runtime error:
ERROR : StringArray::operator const -> range check error!

I ran through the debugger and also added print statements to figure out that this occurs when parsing the "is_..." flags. The parse_set_attrs_flag() function was actually parsing "set_attrs.is_precipitation" and the problem appeared to originate from vx_config when trying to parse those 2 levels from higher and higher levels of context. I was not able to identify/fix the specific problem.

In order to make things easier for both the user and the parsing code, I decided to abandon the "set_attrs" dictionary entirely. Instead, just parse bareword entries, not contained in a dictionary:

set_attr_name = "string"; ->
set_attr_long_name = "string";
set_attr_level = "string";
set_attr_units = "string";

set_attr_valid = "timestring";
set_attr_init = "timestring";
set_attr_lead = "timestring";

set_attr_grid = "path to gridded data file or string defining the grid";

is_precipitation = boolean;
is_specific_humidity = boolean;
is_u_wind = boolean;
is_v_wind = boolean;
is_grid_relative = boolean;
is_wind_speed = boolean;
is_wind_direction = boolean;
is_prob = boolean;

JohnHalleyGotway pushed a commit that referenced this issue Jul 4, 2020
…ng all of these metadata modifiers into a set_attrs dictionary, I'm now parsing them all as individual entries. See #1020 issue comments for more details.
@JohnHalleyGotway JohnHalleyGotway moved this from In progress to Pull request review in MET-9.1-beta2 (7/14/20) Jul 4, 2020
@JohnHalleyGotway JohnHalleyGotway changed the title Consider adding set attribute (set_attrs) config file options to the MET statistics tool. Add set_attr config file options to override the metadata read from input gridded data files. Jul 4, 2020
JohnHalleyGotway added a commit that referenced this issue Jul 8, 2020
…e user-specified long_name string in the NetCDF matched pairs file from Grid-Stat.
JohnHalleyGotway added a commit that referenced this issue Jul 9, 2020
* Per #1020, update the VarInfo class heirarchy to store the contents of the set_attrs dictionary config entries. Still need to add code to parse those entries and update the tests and documentation.

* Per #1020, update the VarInfo class to parse and store the set_attrs dictionary members.

* Per #1020, update point_stat and ensemble_stat to write the obs_info units to the OBS_UNITS output column instead of writing a constant NA string.

* Per #1020, adds attr access functions to the VarInfo class. VarInfo::name_attr() for example returns SetAttrsName, if set, and simply Name otherwise.

* Per #1020, this is work in progress. Swapping out calls to shc set_fcst_var, set_obs_var, set_fcst_level, set_obs_level, set_fcst_units, and set_obs_units with calls to the corresponding attrs access function. Still many more updates to the code required.

* Per #1020, add support for set_attrs.set_accum to override the accumulation interval read from the data.

* Per #1020, add a set_attrs() utility function to update the metadata in a DataPlane object using the contents of a set_attr Dictionary stored in a VarInfo object.

* Per #1020, add grid.nxy() utility function to return Nx*Ny instead of always having to type out grid.nx()*grid.ny().

* Per #1020, switch VarInfo::SetAttrsGrid from a ConcatString to a Grid object. And parse that grid in the set_dict() function. The huge drawback here is that the grid may be specified as a named grid or using a grid specification string... but it cannot be defined as the path to a gridded data file. The library dependency logic is just too complex, and I'd need to move a lot of code around to make this work. For now, just leave that feature out.

* No real change, just adding a blank line that was missing.

* Per #1020, add Met2dDataFile::set_grid() function to override the grid spec parsed from the file. Also, update the process_data_plane() function to handle updating the metadata and grid defintions.

* Per #1020, had to add -lvx_color to 5 Makefile.am files to get them linking again.

* Per #1020, work on log messages to make the set_attrs parsing more clear.

* Per #1020, update logic for how to parse set_attrs entries. For example, parse them as set_attrs.set_name instead of set_name from the set_attrs dictionary. That makes setting up the config file more flexible.

* Per #1020, update Grid-Stat to write NetCDF output using name_attr, level_attr, and units_attr instead of name(), level_name(), and units(). This change should be made everywhere NetCDF output files are written.

* Per #1020, update the parsing logic to check for embedded whitespace in name, level, and units.

* Per #1020, removed unused SetAttrsEnsemble option.

* Per #1020, add new test in unit_grid_stat.xml to exercise the set_attrs functionality.

* Per #1020, update data/config/README with information about the set_attrs dictionary.

* Per #1020, in Ensemble-Stat, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().

* Per #1020, in Grid-Diag, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().

* Deleting stale, commented out, development code.

* Per #1020, in MODE and MTD replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().

* Per #1020, in PCP-Combine, Shift-Data-Plane, and Regrid-Data-Plane, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().

* Per #1020, in TCRMW and Series-Analysis, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().

* Per #1020, in Wavelet-Stat, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().

* src/tools/core/wavelet_stat/wavelet_stat.cc

* src/tools/tc_utils/tc_rmw/tc_rmw.cc

* Adding a couple of spaces to unit_grid_stat.xml to be safe.

* Per #1020, update call to set_obs_units() in point-stat to avoid a warning about setting a header column to a null string.

* Per #1020, changing the order of parsing to see if this fixes the runtime issues I found on dakota.

* Per #1020, so this represents a course correction. Rather than grouping all of these metadata modifiers into a set_attrs dictionary, I'm now parsing them all as individual entries. See #1020 issue comments for more details.

* Per #1020, respond to PR comments by updating Grid-Stat to include the user-specified long_name string in the NetCDF matched pairs file from Grid-Stat.

Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
@JohnHalleyGotway JohnHalleyGotway moved this from Pull request review to Done in MET-9.1-beta2 (7/14/20) Jul 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: blocker Blocker type: enhancement Improve something that it is currently doing
Projects
No open projects
Development

No branches or pull requests

4 participants