Skip to content

Comments

Improve formula validation#1359

Open
simonvoelcker wants to merge 5 commits intofrequenz-floss:v1.x.xfrom
simonvoelcker:improve-formula-validation
Open

Improve formula validation#1359
simonvoelcker wants to merge 5 commits intofrequenz-floss:v1.x.xfrom
simonvoelcker:improve-formula-validation

Conversation

@simonvoelcker
Copy link

@simonvoelcker simonvoelcker commented Feb 19, 2026

This PR improves formula validation and adds test coverage for all parser and lexer validation errors.

Changes

  1. Switch to conventional token span start/end semantics: [start, end). Spans are only used internally now, though (see (2)).
  2. Introduced FormulaError and FormulaSyntaxError classes. FormulaSyntaxError.__str__() produces a human-readable error message, highlighting the offending span in the context of the original formula. This should be fit for non-technical end users.
  3. Removed ASTNode.span property as it was unused.
  4. Added tests and adapted existing ones to the changes described above.

Have mercy, it is my first PR 🙏 😄

@simonvoelcker simonvoelcker requested a review from a team as a code owner February 19, 2026 10:01
@simonvoelcker simonvoelcker requested review from ela-kotulska-frequenz and removed request for a team February 19, 2026 10:01
@github-actions github-actions bot added part:tests Affects the unit, integration and performance (benchmarks) tests part:data-pipeline Affects the data pipeline labels Feb 19, 2026
@simonvoelcker simonvoelcker marked this pull request as draft February 19, 2026 10:01
@simonvoelcker simonvoelcker force-pushed the improve-formula-validation branch from e3dbd70 to e570e9a Compare February 19, 2026 10:35
@github-actions github-actions bot added the part:docs Affects the documentation label Feb 19, 2026
@simonvoelcker simonvoelcker force-pushed the improve-formula-validation branch 2 times, most recently from 0bc315f to c24c7a1 Compare February 19, 2026 10:58
@simonvoelcker simonvoelcker marked this pull request as ready for review February 19, 2026 11:06
@simonvoelcker simonvoelcker marked this pull request as draft February 19, 2026 14:30
The tokens produced by our formula parser (lexer) contained
spans with unusual and inconvenient start/end indices. The
represented interval should be closed at the start and open
at the end: [start,end). This makes it easy to slice the
original formula and to compute the length of the span.

Part of frequenz-floss#1356

Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
@simonvoelcker simonvoelcker force-pushed the improve-formula-validation branch 2 times, most recently from a563f6a to 5a80e5e Compare February 20, 2026 14:16
@simonvoelcker simonvoelcker marked this pull request as ready for review February 20, 2026 14:58
@simonvoelcker simonvoelcker force-pushed the improve-formula-validation branch from 5a80e5e to 417d725 Compare February 23, 2026 18:08
Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was on holiday and could not check earlier.
Looks very good only small comments / questions. :)

Comment on lines +35 to +51
formula_line = self._formula
span_padding = " " * self._span[0]
span_highlight = "^" * (self._span[1] - self._span[0])
error_line = f"{span_padding}{span_highlight} {self._message}"

char_limit: int = 80 - 9 # Account for "Formula: " prefix in final output

if len(error_line) > char_limit:
# Remove characters from the left to fit error line within char limit
num_chars_to_truncate = len(error_line) - char_limit
error_line = error_line[num_chars_to_truncate:]
# Shift formula line by the same amount and insert leading ellipsis
formula_line = f"... {formula_line[num_chars_to_truncate+4:]}"

if len(formula_line) > char_limit:
# Truncate the formula line, insert ellipsis
formula_line = f"{formula_line[:char_limit-4]} ..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using bold text?

print("\033[1mThis is bold\033[0m")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta see how that would look in journald

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some testing. journald stores the raw bytes, so it is down to how those are rendered later. E.g. when I run journalctl --output=cat, the highlighting works fine.

Comment on lines +90 to +94
known_functions: dict[str, type[Function[QuantityT]]] = {
"COALESCE": Coalesce,
"MAX": Max,
"MIN": Min,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe known_functions could be class variable, now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, yes. If I see it correctly, that would entail reordering the classes though, because Coalesce, Max and Min are currently defined after Function, but when used in a class variable they must already exist when Function is parsed. Personally, I don't think it's worth the effort and it's cleaner to have Function on top. What do you think?

Comment on lines -83 to 89
raise ValueError(
f"Expected factor after operator at span: {token.span}"
raise FormulaSyntaxError(
formula=self._formula,
span=(token.span[1], token.span[1] + 1),
message="Expected expression",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected expression or factor? The same for unary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing mentions of "factor", "unary" and "primary (expression)" is something Sahas and I agreed on last week - The intention is to hide technical terms from possibly-not-so-technical users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, factor, unary etc might be very confusing for people who are only interested in writing the formulas.

@ela-kotulska-frequenz
Copy link
Contributor

ela-kotulska-frequenz commented Feb 24, 2026

I will also request Sahas for quick look, because he knows formula better

Comment on lines +14 to +16
formula: str
span: tuple[int, int]
message: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem to be unused, can be dropped.

Comment on lines +22 to +24
self._formula = formula
self._span = span
self._message = message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to type these, because although this works with mypy, most newer type checkers assume that there can be more variants. For example, if you say self.v = None and later do self.v = 10, newer type checkers allow that, mypy doesn't. So it is useful to hint these instance variables.

Suggested change
self._formula = formula
self._span = span
self._message = message
self._formula: str = formula
self._span: tuple[int, int] = span
self._message: str = message

"""Return a string representation of the error.
The span (if present) will be used to highlight the error location.
For long formulas, the beginning or end will be truncated as needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For long formulas, the beginning or end will be truncated as needed.
For long formulas, the beginning and/or end will be truncated as needed.

(
"max(1,,2)",
" ^ Expected argument",
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add some more cases:

  • "1 2"
  • "1, 2"
  • "max(1, 2,)"
  • "max(1, 2))"
  • "max(1, 2), "

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good challenge, 4/5 currently pass validation. I'll extend the parser as needed.

For "1 2" we probably want to see "Expected operator" in the output.

Comment on lines 53 to 57
return (
"Formula syntax error:\n"
f"Formula: {formula_line}\n"
f"Error: {error_line}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (
"Formula syntax error:\n"
f"Formula: {formula_line}\n"
f"Error: {error_line}"
)
return (
"Formula syntax error:\n"
f" Formula: {formula_line}\n"
f" Error: {error_line}"
)

Although I wonder if we need to say "Error" at all. Maybe it can just be

Suggested change
return (
"Formula syntax error:\n"
f"Formula: {formula_line}\n"
f"Error: {error_line}"
)
return (
"Formula syntax error:\n"
f" Formula: {formula_line}\n"
f" {error_line}"
)

Comment on lines +35 to +51
formula_line = self._formula
span_padding = " " * self._span[0]
span_highlight = "^" * (self._span[1] - self._span[0])
error_line = f"{span_padding}{span_highlight} {self._message}"

char_limit: int = 80 - 9 # Account for "Formula: " prefix in final output

if len(error_line) > char_limit:
# Remove characters from the left to fit error line within char limit
num_chars_to_truncate = len(error_line) - char_limit
error_line = error_line[num_chars_to_truncate:]
# Shift formula line by the same amount and insert leading ellipsis
formula_line = f"... {formula_line[num_chars_to_truncate+4:]}"

if len(formula_line) > char_limit:
# Truncate the formula line, insert ellipsis
formula_line = f"{formula_line[:char_limit-4]} ..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta see how that would look in journald

- Added FormulaError and FormulaSyntaxError classes, using them in lexer and parser.
- Changes to how functions are parsed: Detect unknown function first, then parse parameter list.
- Removed mentions of terms, factors, unaries from error messages - Everything's an expression.
- Better message for matching paranthesis.
- Avoiding use of ASTNode.span in error messages, since it will be removed.

Part of frequenz-floss#1356

Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>
@simonvoelcker simonvoelcker force-pushed the improve-formula-validation branch from 417d725 to 105c625 Compare February 24, 2026 13:22
formula_line = f"{formula_line[:char_limit-4]} ..."

return (
"\033[1mFormula syntax error:\033[0m\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the part that needs to be bold. May be useful to make the relevant token bold, but we already add a highlight in the next line, so probably not necessary.

Also we're not targeting the terminal, but journald instead, and possible the UI, so it is very unlikely that these terminal control characters will get rendered properly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So better revert to plain text - @ela-kotulska-frequenz FYI :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests

Projects

Status: To do

Development

Successfully merging this pull request may close these issues.

3 participants