Improve formula validation by simonvoelcker · Pull Request #1359 · frequenz-floss/frequenz-sdk-python

simonvoelcker · 2026-02-19T10:01:21Z

This PR improves formula validation and adds test coverage for all parser and lexer validation errors.

Changes

Switch to conventional token span start/end semantics: [start, end). Spans are only used internally now, though (see (2)).
Introduced FormulaError and FormulaSyntaxError classes. FormulaSyntaxError.__str__() produces a human-readable error message, highlighting the offending span in the context of the original formula. This should be fit for non-technical end users.
Removed ASTNode.span property as it was unused.
Added tests and adapted existing ones to the changes described above.

Have mercy, it is my first PR 🙏 😄

The tokens produced by our formula parser (lexer) contained spans with unusual and inconvenient start/end indices. The represented interval should be closed at the start and open at the end: [start,end). This makes it easy to slice the original formula and to compute the length of the span. Part of frequenz-floss#1356 Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>

ela-kotulska-frequenz

Sorry I was on holiday and could not check earlier.
Looks very good only small comments / questions. :)

ela-kotulska-frequenz · 2026-02-24T10:24:24Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+        formula_line = self._formula
+        span_padding = " " * self._span[0]
+        span_highlight = "^" * (self._span[1] - self._span[0])
+        error_line = f"{span_padding}{span_highlight} {self._message}"
+
+        char_limit: int = 80 - 9  # Account for "Formula: " prefix in final output
+
+        if len(error_line) > char_limit:
+            # Remove characters from the left to fit error line within char limit
+            num_chars_to_truncate = len(error_line) - char_limit
+            error_line = error_line[num_chars_to_truncate:]
+            # Shift formula line by the same amount and insert leading ellipsis
+            formula_line = f"... {formula_line[num_chars_to_truncate+4:]}"
+
+        if len(formula_line) > char_limit:
+            # Truncate the formula line, insert ellipsis
+            formula_line = f"{formula_line[:char_limit-4]} ..."


Why not using bold text?

print("\033[1mThis is bold\033[0m")

Good idea 👍

Gotta see how that would look in journald

I did some testing. journald stores the raw bytes, so it is down to how those are rendered later. E.g. when I run journalctl --output=cat, the highlighting works fine.

ela-kotulska-frequenz · 2026-02-24T10:30:53Z

src/frequenz/sdk/timeseries/formulas/_functions.py

+        known_functions: dict[str, type[Function[QuantityT]]] = {
+            "COALESCE": Coalesce,
+            "MAX": Max,
+            "MIN": Min,
+        }


Maybe known_functions could be class variable, now?

It could be, yes. If I see it correctly, that would entail reordering the classes though, because Coalesce, Max and Min are currently defined after Function, but when used in a class variable they must already exist when Function is parsed. Personally, I don't think it's worth the effort and it's cleaner to have Function on top. What do you think?

src/frequenz/sdk/timeseries/formulas/_lexer.py

ela-kotulska-frequenz · 2026-02-24T10:36:32Z

src/frequenz/sdk/timeseries/formulas/_parser.py

-                raise ValueError(
-                    f"Expected factor after operator at span: {token.span}"
+                raise FormulaSyntaxError(
+                    formula=self._formula,
+                    span=(token.span[1], token.span[1] + 1),
+                    message="Expected expression",
                )


Expected expression or factor? The same for unary

Removing mentions of "factor", "unary" and "primary (expression)" is something Sahas and I agreed on last week - The intention is to hide technical terms from possibly-not-so-technical users.

Yes, factor, unary etc might be very confusing for people who are only interested in writing the formulas.

src/frequenz/sdk/timeseries/formulas/_parser.py

ela-kotulska-frequenz · 2026-02-24T10:43:45Z

I will also request Sahas for quick look, because he knows formula better

shsms · 2026-02-24T10:00:04Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+    formula: str
+    span: tuple[int, int]
+    message: str


These seem to be unused, can be dropped.

shsms · 2026-02-24T10:05:01Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+        self._formula = formula
+        self._span = span
+        self._message = message


It would be nice to type these, because although this works with mypy, most newer type checkers assume that there can be more variants. For example, if you say self.v = None and later do self.v = 10, newer type checkers allow that, mypy doesn't. So it is useful to hint these instance variables.

Suggested change

self._formula = formula

self._span = span

self._message = message

self._formula: str = formula

self._span: tuple[int, int] = span

self._message: str = message

shsms · 2026-02-24T10:06:05Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+        """Return a string representation of the error.
+
+        The span (if present) will be used to highlight the error location.
+        For long formulas, the beginning or end will be truncated as needed.


Suggested change

For long formulas, the beginning or end will be truncated as needed.

For long formulas, the beginning and/or end will be truncated as needed.

src/frequenz/sdk/timeseries/formulas/_functions.py

shsms · 2026-02-24T12:53:48Z

tests/timeseries/_formulas/test_formulas.py

+            (
+                "max(1,,2)",
+                "      ^ Expected argument",
+            ),


Maybe we can add some more cases:

"1 2"

"1, 2"

"max(1, 2,)"

"max(1, 2))"

"max(1, 2), "

Good challenge, 4/5 currently pass validation. I'll extend the parser as needed.

For "1 2" we probably want to see "Expected operator" in the output.

shsms · 2026-02-24T13:03:51Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+        return (
+            "Formula syntax error:\n"
+            f"Formula: {formula_line}\n"
+            f"Error:   {error_line}"
+        )


Suggested change

return (

"Formula syntax error:\n"

f"Formula: {formula_line}\n"

f"Error: {error_line}"

)

return (

"Formula syntax error:\n"

f" Formula: {formula_line}\n"

f" Error: {error_line}"

)

Although I wonder if we need to say "Error" at all. Maybe it can just be

Suggested change

return (

"Formula syntax error:\n"

f"Formula: {formula_line}\n"

f"Error: {error_line}"

)

return (

"Formula syntax error:\n"

f" Formula: {formula_line}\n"

f" {error_line}"

)

shsms · 2026-02-24T13:04:26Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+        formula_line = self._formula
+        span_padding = " " * self._span[0]
+        span_highlight = "^" * (self._span[1] - self._span[0])
+        error_line = f"{span_padding}{span_highlight} {self._message}"
+
+        char_limit: int = 80 - 9  # Account for "Formula: " prefix in final output
+
+        if len(error_line) > char_limit:
+            # Remove characters from the left to fit error line within char limit
+            num_chars_to_truncate = len(error_line) - char_limit
+            error_line = error_line[num_chars_to_truncate:]
+            # Shift formula line by the same amount and insert leading ellipsis
+            formula_line = f"... {formula_line[num_chars_to_truncate+4:]}"
+
+        if len(formula_line) > char_limit:
+            # Truncate the formula line, insert ellipsis
+            formula_line = f"{formula_line[:char_limit-4]} ..."


Gotta see how that would look in journald

- Added FormulaError and FormulaSyntaxError classes, using them in lexer and parser. - Changes to how functions are parsed: Detect unknown function first, then parse parameter list. - Removed mentions of terms, factors, unaries from error messages - Everything's an expression. - Better message for matching paranthesis. - Avoiding use of ASTNode.span in error messages, since it will be removed. Part of frequenz-floss#1356 Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>

Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>

shsms · 2026-02-24T13:35:13Z

src/frequenz/sdk/timeseries/formulas/_exceptions.py

+            formula_line = f"{formula_line[:char_limit-4]} ..."
+
+        return (
+            "\033[1mFormula syntax error:\033[0m\n"


I don't think this is the part that needs to be bold. May be useful to make the relevant token bold, but we already add a highlight in the next line, so probably not necessary.

Also we're not targeting the terminal, but journald instead, and possible the UI, so it is very unlikely that these terminal control characters will get rendered properly.

So better revert to plain text - @ela-kotulska-frequenz FYI :)

simonvoelcker requested a review from a team as a code owner February 19, 2026 10:01

github-project-automation bot added this to Python SDK Roadmap Feb 19, 2026

simonvoelcker requested review from ela-kotulska-frequenz and removed request for a team February 19, 2026 10:01

github-project-automation bot moved this to To do in Python SDK Roadmap Feb 19, 2026

github-actions bot added part:tests Affects the unit, integration and performance (benchmarks) tests part:data-pipeline Affects the data pipeline labels Feb 19, 2026

simonvoelcker marked this pull request as draft February 19, 2026 10:01

simonvoelcker force-pushed the improve-formula-validation branch from e3dbd70 to e570e9a Compare February 19, 2026 10:35

github-actions bot added the part:docs Affects the documentation label Feb 19, 2026

simonvoelcker force-pushed the improve-formula-validation branch 2 times, most recently from 0bc315f to c24c7a1 Compare February 19, 2026 10:58

simonvoelcker marked this pull request as ready for review February 19, 2026 11:06

simonvoelcker marked this pull request as draft February 19, 2026 14:30

simonvoelcker force-pushed the improve-formula-validation branch 2 times, most recently from a563f6a to 5a80e5e Compare February 20, 2026 14:16

simonvoelcker marked this pull request as ready for review February 20, 2026 14:58

simonvoelcker force-pushed the improve-formula-validation branch from 5a80e5e to 417d725 Compare February 23, 2026 18:08

ela-kotulska-frequenz reviewed Feb 24, 2026

View reviewed changes

ela-kotulska-frequenz requested a review from shsms February 24, 2026 10:44

shsms reviewed Feb 24, 2026

View reviewed changes

simonvoelcker added 4 commits February 24, 2026 14:22

Add tests for formula validation

6020b20

Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>

Update release notes

a3c116f

Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>

Remove unused ASTNode.span

105c625

Signed-off-by: simon.voelcker <simon.voelcker@gmail.com>

simonvoelcker force-pushed the improve-formula-validation branch from 417d725 to 105c625 Compare February 24, 2026 13:22

shsms reviewed Feb 24, 2026

View reviewed changes

	For long formulas, the beginning or end will be truncated as needed.
	For long formulas, the beginning and/or end will be truncated as needed.

Comments

Conversation

simonvoelcker commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Uh oh!

ela-kotulska-frequenz left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ela-kotulska-frequenz commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

simonvoelcker commented Feb 19, 2026 •

edited

Loading

ela-kotulska-frequenz left a comment •

edited

Loading

ela-kotulska-frequenz commented Feb 24, 2026 •

edited

Loading