Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISIS PVL - Operator Error or Improvement to the OmniParser? #108

Open
jlaura opened this issue Apr 23, 2024 · 13 comments
Open

ISIS PVL - Operator Error or Improvement to the OmniParser? #108

jlaura opened this issue Apr 23, 2024 · 13 comments

Comments

@jlaura
Copy link

jlaura commented Apr 23, 2024

Describe the bug
The following PVL fails to parse, presumably because of the keyword 'Group' inside a canonical Group.

To Reproduce

import pvl
pvl.loads("""
Group = Keyword1
  Auto
  InputKey       = CSMPlatformId
  InputGroup     = "IsisCube,CsmInfo"
  InputPosition  = (IsisCube, CsmInfo)
  OutputName     = Keyword1
  OutputPosition = (Group, SerialNumberKeywords)
  Translation    = (* , *)
End_Group

Group = Keyword2
  Auto
  InputKey       = CSMInstrumentId
  InputGroup     = "IsisCube,CsmInfo"
  InputPosition  = (IsisCube, CsmInfo)
  OutputName     = Keyword2
  OutputPosition = (Group, SerialNumberKeywords)
  Translation    = (*, *)
End_Group

Group = Keyword3
  Auto
  InputKey       = ReferenceTime
  InputGroup     = "IsisCube,CsmInfo"
  InputPosition  = (IsisCube, CsmInfo)
  OutputName     = Keyword3
  OutputPosition = (Group, SerialNumberKeywords)
  Translation    = (*, *)
End_Group
End
""")

Perhaps I should be parameterizing differently?

Expected behavior
Instantiation of a PVL object.

Error logs or terminal captures
If applicable, add examples to help explain your problem.

Your Environment (please complete the following information):

  • OS: Linux
  • Environment information: Python 3.11
  • pvl Version:1.3.2

Additional context
Add any other context about the problem here.

@rbeyer
Copy link
Member

rbeyer commented Apr 24, 2024

Admittedly, the error messages you are getting are not helping you, and that's my fault. The above PVL text contains two problems (but without debugging the code for an hour like I did, the error messages are useless):

The first problem is that you are using a reserved word.

When trying to run the above, you will have gotten an error that looks like this:

pvl.exceptions.LexerError: (LexerError(...), 'While parsing, expected a comma (,)but found: "Group": line 7 column 21 (char 186) near "= (Group,"')

However, what this error message really needs to tell you is that you have used one of PVL's seven reserved words (in this case, "Group") as an unquoted string in your PVL text:

OutputPosition = (Group, SerialNumberKeywords)

If you change this to:

OutputPosition = ("Group", SerialNumberKeywords)

and in all of the other cases where you use this, then this error goes away.

The second error is that you just have a token hanging out that doesn't conform to the PVL spec.

Once you quote all of the instances of "Group" and then try and load this again, you will get this error:

pvl.exceptions.LexerError: (LexerError(...), 'Expecting an Aggregation Block, an Assignment Statement, or an End Statement, but found "End_Group" : line 9 column 1 (char 248) near "   = (* ,"')

Frankly, this error is hard to understand. I had to read and re-read the code to understand exactly what the problem was here and why it was being emitted in this way--and I wrote the error message! What's happening, is that the parser is trying to link parameters with values inside the first group, and it acquired this random "Auto" token that it doesn't know what to do with, and then when the parser encounters the "End_Group" token to close the group, it freaks out about having the "Auto" token which isn't a parameter or a value, and tries to emit what it thinks is a helpful error message, but just ... isn't.

Essentially that bare "Auto" hanging out violates the PVL spec, because it isn't a parameter, or a value, or an aggregate statement. It doesn't fit into the PVL syntax, but the error message there is not helping you understand that.

If you quote all of the "Group" instances and remove the lines with "Auto" or comment them out, you will get this:

PVLModule([
  ('Keyword1',
   {'InputGroup': 'IsisCube,CsmInfo',
    'InputKey': 'CSMPlatformId',
    'InputPosition': ['IsisCube', 'CsmInfo'],
    'OutputName': 'Keyword1',
    'OutputPosition': ['Group', 'SerialNumberKeywords'],
    'Translation': ['*', '*']})
  ('Keyword2',
   {'InputGroup': 'IsisCube,CsmInfo',
    'InputKey': 'CSMInstrumentId',
    'InputPosition': ['IsisCube', 'CsmInfo'],
    'OutputName': 'Keyword2',
    'OutputPosition': ['Group', 'SerialNumberKeywords'],
    'Translation': ['*', '*']})
  ('Keyword3',
   {'InputGroup': 'IsisCube,CsmInfo',
    'InputKey': 'ReferenceTime',
    'InputPosition': ['IsisCube', 'CsmInfo'],
    'OutputName': 'Keyword3',
    'OutputPosition': ['Group', 'SerialNumberKeywords'],
    'Translation': ['*', '*']})
])

FYI, the seven reserved words are: BEGIN_GROUP, BEGIN_OBJECT, END, END_GROUP, END_OBJECT, GROUP, OBJECT (and since the pvl library supports mixed case reading of these tokens, then all capitalization variants must also be treated as reserved words).

In summary, this is bad PVL-text, but the pvl library is not giving the user sufficient information to understand why the PVL-text is bad. I will leave this Issue open while I work to improve the error reporting for these cases.

@michaelaye
Copy link
Member

Excellent reply. It shows that you really care about the library and these tricky issues.

@jlaura
Copy link
Author

jlaura commented Apr 24, 2024

@rbeyer Huge thank you. I very much appreciate the detailed response as it is helping me understand why the library is erroring out. Much appreciate it. I will open an issue on the ISIS repo to understand why this translation table is written the way that it is.

@acpaquette
Copy link

@rbeyer I looked over the PVL spec w.r.t. the reserved words and it seems like they can be used as values just not as parameter names.

"Reserved keywords are not permitted as Parameter Names within an Assignment Statement or
as Block Names in Aggregation Statements"

From this it seems like the use of the reserved words as values is fine, just not as parameter names. Given I didn't dig much I could very well be mistaken.

With regards to the Auto parameter, I didn't find anything about parameter names with no values as valid PVL. So that may be an element in the trn files that makes them invaild PVL. I am not sure if we plan to change this, the Auto keyword is somewhat specific to trn files and acts as a boolean flag.

An easy fix may be to just change it to Auto=0|1, this would make the files complaint going forward and would be versioned appropriately with ISIS since all of the trns have been migrated to be part of the library

@rbeyer
Copy link
Member

rbeyer commented Apr 25, 2024

@acpaquette, I think that is a correct reading for the "reserved keywords" in the PVL blue book document. I think I had just adopted a strict interpretation of "reserved" in the parsing. Of course, if you always quote strings that are used as values in PVL text, then this is irrelevant. However, I agree that upon re-reading this portion of the document that this is a bug, and will work to address it.

Bare tokens, however, I am pretty sure aren't allowed. To understand this, you will have to read all of the syntax diagrams in the blue book, which is what the figure references that follow are. In Figure 2-1, we see that a block of PVL-text can only contain Assignment Statements (which must assign a value to a parameter) and Aggregation Blocks. In turn, Figure 2-15 shows that an Aggregation Block can only contain Assignment Statements or additional Aggregation Blocks. Thus I do not see anywhere in the specification that a bare token that is not a reserved keyword can exist in PVL-text.

@rbeyer
Copy link
Member

rbeyer commented Apr 26, 2024

I did a little tinkering, and while in the post above I thought that simply changing this behavior for reserved words would be straightforward, but I have discovered some complications. There's now a conversation to be had about how to treat reserved words that the PVL spec does not necessarily help us with (have I mentioned how I dislike PVL?).

If you make the change to allow reserved words to be interpreted as unquoted strings, then this construction, which the pvl library has correctly parsed for years, would now have a different result.

This PVL-text:

foo = 1;
Object = embedded_object;
    foo = bar
    life =
End_Object;
End;

Currently results in this Python object:

PVLModule([
  ('foo', 1)
  ('embedded_object',
   {'foo': 'bar',
    'life': EmptyValueAtLine(4 does not have a value. Treat as an empty string)})
])

If we allowed reserved words to act as unquoted strings, the above PVL-text would result in this Python object:

PVLModule([
  ('foo', 1)
])

This may seem strange, but since the "life" parameter gets the value of "End_Object" there is no actual "End_Object" keyword to close the "embedded_object" object, and so the pvl Omniparser assumes there is an error with parsing that object, and ignores it, then finds an "End" which terminates the PVL-text.

So, with this in mind, I don't think that it is such a slam-dunk to make this change.

@acpaquette
Copy link

@rbeyer Appreciate the update!

@acpaquette
Copy link

acpaquette commented Jul 31, 2024

@rbeyer After reading over the PVL spec I think you are actually correct that ISIS should not have the reserved keywords as identifiers like this quoted or otherwise. From the spec:

A few identifiers have special significance in ODL statements and are therefore reserved. They cannot be used for any other purpose (specifically, they may not be used to name objects or attributes):

end, end_group, end_object, group, object, begin_object

I know it says specifically not as names but it's rather ambiguous if they are allowed as ISIS uses them. There is also mention that identifiers/symbols can either be quoted or not and it is recommended to quote symbols/identifiers when possible.

As an aside is this actually valid ODL/PVL?

foo = 1;
Object = embedded_object;
    foo = bar
    life =
End_Object;
End;

From reading over the standard assignments to strings should have a starting and ending ". Assignment to quoted symbols should have a starting and ending '. And assignment to an identifier should have at least one letter defined as such letter[letter | digit | _letter | _digit]*.

Very much appreciate working through this!

@rbeyer
Copy link
Member

rbeyer commented Jul 31, 2024

@acpaquette, So if you quote the reserved keywords like: parameter = "end_group" that's fine, because the quotes protect them, and that's perfectly allowed, its just that I don't think they can be unquoted as values, because that triggers their interpretation as reserved words.

About the PVL text example I gave and you asked about, technically no, the PVL, ODL, and PDS3 specs would "formally" break (and the pvl library's pvl_validate tool shows that), but the Python pvl library has a variety of "easements" in the "Omni-parser" (which is the default) to allow certain kinds of broken PVL to still be interpreted, and this is one of those situations, the Python pvl library basically assigns an empty string to the life parameter there, because if it does, that allows it to properly parse the rest of the PVL-text.

@jlaura
Copy link
Author

jlaura commented Oct 1, 2024

I am piggybacking on this issue (happy to open a new one!) only because this is a very similar issue. Here is the PVL, output from the ISIS camstats CLI:

Group = "User Parameters"
  Filename = ../path_to_original_cube.cub
  Linc     = 1
  Sinc     = 1
End_Group

Group = Latitude
  LatitudeMinimum           = -81.904915172101
  LatitudeMaximum           = 88.012145960339
  LatitudeAverage           = 2.2398977619578
  LatitudeStandardDeviation = 32.537659649463
End_Group

I truncated it intentionally. The issue is that Group = "User Parameters" is not being parsed properly and I am not clear why. The code is hitting the lever and then emitting pvl.exceptions.LexerError: (LexerError(...), 'Expecting a Block-Name after "Group =" but found: ""User Parameters"": line 1 column 9 (char 24) near ""').

I tested with the Omni and ISIS Grammars and the PVL decoder. The PVL decoder looks like it should support quoted strings, but I wonder if the issue has to do with a quoted Block-Name?

@rbeyer
Copy link
Member

rbeyer commented Oct 5, 2024

Yeah, it has everything to do with the quoted Aggregation Block Name (which isn't allowed).

Aggregation Blocks in PVL (Groups and Objects) can optionally have a "Block Name" (in your PVL-text, "User Parameters" and Latitude are examples of a Block Name). However, it is important to note that PVL Begin Aggregation Statements (e.g. Group = ) are NOT PVL Parameter/Value pairs and do not follow their rules, they are ... different.

Ironically, the CCSDS Blue Book tells us that "The form of the Block Name is identical to Parameter Name," which means that the Block Name must conform to the rules of PVL Parameters, not PVL Values. As such, Block Names (and Parameters) can only contain what PVL defines as Unrestricted Characters, and quotation marks are NOT Unrestricted Characters and thus cannot be used in a Block Name, and that's what that error is telling you.

@jlaura
Copy link
Author

jlaura commented Oct 5, 2024

@rbeyer Thanks! Opened #5625 for the ISIS devs. I am very happy to be finding these and I appreciate your insight as the improved interoperability (via compliance with the spec) is valuable to everyone.

@rbeyer
Copy link
Member

rbeyer commented Oct 5, 2024

I'm glad its valuable. I hate feeling like a PVL-apologist, but it is what it is. Really we shouldn't use PVL for anything new and should be trying to move old things to better meta-data representations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants