Improve formula validation#1359
Conversation
e3dbd70 to
e570e9a
Compare
0bc315f to
c24c7a1
Compare
The tokens produced by our formula parser (lexer) contained spans with unusual and inconvenient start/end indices. The represented interval should be closed at the start and open at the end: [start,end). This makes it easy to slice the original formula and to compute the length of the span. Part of frequenz-floss#1356 Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
a563f6a to
5a80e5e
Compare
5a80e5e to
417d725
Compare
| formula_line = self._formula | ||
| span_padding = " " * self._span[0] | ||
| span_highlight = "^" * (self._span[1] - self._span[0]) | ||
| error_line = f"{span_padding}{span_highlight} {self._message}" | ||
|
|
||
| char_limit: int = 80 - 9 # Account for "Formula: " prefix in final output | ||
|
|
||
| if len(error_line) > char_limit: | ||
| # Remove characters from the left to fit error line within char limit | ||
| num_chars_to_truncate = len(error_line) - char_limit | ||
| error_line = error_line[num_chars_to_truncate:] | ||
| # Shift formula line by the same amount and insert leading ellipsis | ||
| formula_line = f"... {formula_line[num_chars_to_truncate+4:]}" | ||
|
|
||
| if len(formula_line) > char_limit: | ||
| # Truncate the formula line, insert ellipsis | ||
| formula_line = f"{formula_line[:char_limit-4]} ..." |
There was a problem hiding this comment.
Why not using bold text?
print("\033[1mThis is bold\033[0m")
There was a problem hiding this comment.
Gotta see how that would look in journald
There was a problem hiding this comment.
I did some testing. journald stores the raw bytes, so it is down to how those are rendered later. E.g. when I run journalctl --output=cat, the highlighting works fine.
| known_functions: dict[str, type[Function[QuantityT]]] = { | ||
| "COALESCE": Coalesce, | ||
| "MAX": Max, | ||
| "MIN": Min, | ||
| } |
There was a problem hiding this comment.
Maybe known_functions could be class variable, now?
There was a problem hiding this comment.
It could be, yes. If I see it correctly, that would entail reordering the classes though, because Coalesce, Max and Min are currently defined after Function, but when used in a class variable they must already exist when Function is parsed. Personally, I don't think it's worth the effort and it's cleaner to have Function on top. What do you think?
| raise ValueError( | ||
| f"Expected factor after operator at span: {token.span}" | ||
| raise FormulaSyntaxError( | ||
| formula=self._formula, | ||
| span=(token.span[1], token.span[1] + 1), | ||
| message="Expected expression", | ||
| ) |
There was a problem hiding this comment.
Expected expression or factor? The same for unary
There was a problem hiding this comment.
Removing mentions of "factor", "unary" and "primary (expression)" is something Sahas and I agreed on last week - The intention is to hide technical terms from possibly-not-so-technical users.
There was a problem hiding this comment.
Yes, factor, unary etc might be very confusing for people who are only interested in writing the formulas.
|
I will also request Sahas for quick look, because he knows formula better |
| formula: str | ||
| span: tuple[int, int] | ||
| message: str |
There was a problem hiding this comment.
These seem to be unused, can be dropped.
| self._formula = formula | ||
| self._span = span | ||
| self._message = message |
There was a problem hiding this comment.
It would be nice to type these, because although this works with mypy, most newer type checkers assume that there can be more variants. For example, if you say self.v = None and later do self.v = 10, newer type checkers allow that, mypy doesn't. So it is useful to hint these instance variables.
| self._formula = formula | |
| self._span = span | |
| self._message = message | |
| self._formula: str = formula | |
| self._span: tuple[int, int] = span | |
| self._message: str = message |
| """Return a string representation of the error. | ||
| The span (if present) will be used to highlight the error location. | ||
| For long formulas, the beginning or end will be truncated as needed. |
There was a problem hiding this comment.
| For long formulas, the beginning or end will be truncated as needed. | |
| For long formulas, the beginning and/or end will be truncated as needed. |
| ( | ||
| "max(1,,2)", | ||
| " ^ Expected argument", | ||
| ), |
There was a problem hiding this comment.
Maybe we can add some more cases:
"1 2""1, 2""max(1, 2,)""max(1, 2))""max(1, 2), "
There was a problem hiding this comment.
Good challenge, 4/5 currently pass validation. I'll extend the parser as needed.
For "1 2" we probably want to see "Expected operator" in the output.
| return ( | ||
| "Formula syntax error:\n" | ||
| f"Formula: {formula_line}\n" | ||
| f"Error: {error_line}" | ||
| ) |
There was a problem hiding this comment.
| return ( | |
| "Formula syntax error:\n" | |
| f"Formula: {formula_line}\n" | |
| f"Error: {error_line}" | |
| ) | |
| return ( | |
| "Formula syntax error:\n" | |
| f" Formula: {formula_line}\n" | |
| f" Error: {error_line}" | |
| ) |
Although I wonder if we need to say "Error" at all. Maybe it can just be
| return ( | |
| "Formula syntax error:\n" | |
| f"Formula: {formula_line}\n" | |
| f"Error: {error_line}" | |
| ) | |
| return ( | |
| "Formula syntax error:\n" | |
| f" Formula: {formula_line}\n" | |
| f" {error_line}" | |
| ) |
| formula_line = self._formula | ||
| span_padding = " " * self._span[0] | ||
| span_highlight = "^" * (self._span[1] - self._span[0]) | ||
| error_line = f"{span_padding}{span_highlight} {self._message}" | ||
|
|
||
| char_limit: int = 80 - 9 # Account for "Formula: " prefix in final output | ||
|
|
||
| if len(error_line) > char_limit: | ||
| # Remove characters from the left to fit error line within char limit | ||
| num_chars_to_truncate = len(error_line) - char_limit | ||
| error_line = error_line[num_chars_to_truncate:] | ||
| # Shift formula line by the same amount and insert leading ellipsis | ||
| formula_line = f"... {formula_line[num_chars_to_truncate+4:]}" | ||
|
|
||
| if len(formula_line) > char_limit: | ||
| # Truncate the formula line, insert ellipsis | ||
| formula_line = f"{formula_line[:char_limit-4]} ..." |
There was a problem hiding this comment.
Gotta see how that would look in journald
- Added FormulaError and FormulaSyntaxError classes, using them in lexer and parser. - Changes to how functions are parsed: Detect unknown function first, then parse parameter list. - Removed mentions of terms, factors, unaries from error messages - Everything's an expression. - Better message for matching paranthesis. - Avoiding use of ASTNode.span in error messages, since it will be removed. Part of frequenz-floss#1356 Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
417d725 to
105c625
Compare
| formula_line = f"{formula_line[:char_limit-4]} ..." | ||
|
|
||
| return ( | ||
| "\033[1mFormula syntax error:\033[0m\n" |
There was a problem hiding this comment.
I don't think this is the part that needs to be bold. May be useful to make the relevant token bold, but we already add a highlight in the next line, so probably not necessary.
Also we're not targeting the terminal, but journald instead, and possible the UI, so it is very unlikely that these terminal control characters will get rendered properly.
There was a problem hiding this comment.
So better revert to plain text - @ela-kotulska-frequenz FYI :)
This PR improves formula validation and adds test coverage for all parser and lexer validation errors.
Changes
FormulaErrorandFormulaSyntaxErrorclasses.FormulaSyntaxError.__str__()produces a human-readable error message, highlighting the offending span in the context of the original formula. This should be fit for non-technical end users.ASTNode.spanproperty as it was unused.Have mercy, it is my first PR 🙏 😄