
tests/run-tests.py: Add automatic retry for known-flaky tests. #6

Draft
andrewleech wants to merge 3 commits into review/flaky-test-retry from flaky-test-retry

Conversation

andrewleech (Owner) commented Feb 23, 2026

Summary

The unix port CI workflow on master fails ~18% of the time due to a handful of intermittent test failures (mostly thread-related). This has normalised red CI to the point where contributors ignore it, masking real regressions.

I added a flaky-test allowlist and retry-until-pass mechanism to run-tests.py. When a listed test fails, it is re-run up to two more times and counted as a pass on its first successful attempt. The eight tests on the list account for essentially all spurious CI failures on master.

tools/ci.sh is updated to remove --exclude patterns for tests now handled by the retry logic, so they get actual coverage again. thread/stress_aes.py stays excluded on RISC-V QEMU where it runs close to the 200s timeout and retries would burn excessive CI time.

Retries are on by default; --no-retry disables them. The end-of-run summary reports tests that passed via retry separately from clean passes.
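The retry-until-pass behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `run_test` is a stand-in for the real test runner, and only the `FLAKY_TESTS` name and the retry count of 2 come from the PR.

```python
# Hypothetical sketch of retry-until-pass for an allowlist of flaky tests.
FLAKY_TESTS = {"thread/thread_gc1.py"}  # allowlisted flaky tests (illustrative)
FLAKY_TEST_MAX_RETRIES = 2


def run_with_retry(test_file, run_test):
    # Run once; only allowlisted tests that fail are retried.
    result = run_test(test_file)
    if result == "pass" or test_file not in FLAKY_TESTS:
        return result, 0  # clean pass, or a non-flaky failure: no retries
    for attempt in range(1, FLAKY_TEST_MAX_RETRIES + 1):
        if run_test(test_file) == "pass":
            return "pass", attempt  # passed via retry
    return "fail", FLAKY_TEST_MAX_RETRIES  # still failing after all retries
```

Returning the attempt count alongside the status is one way the end-of-run summary could report retried passes separately from clean passes.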

Testing

Not yet tested in CI — this draft PR is to exercise the review workflow and get CI results on the fork.

Trade-offs and Alternatives

Retry-until-pass can mask a test that has become intermittently broken. The allowlist is deliberately small and each entry documents a reason and optional platform restriction. An alternative would be to mark these tests as expected-fail, but that removes coverage entirely which is worse than the current situation.

The test result status strings ("pass", "fail", "skip") are bare string literals throughout run-tests.py — the retry code follows this existing convention. Worth considering a follow-up to consolidate these into constants for the whole file.
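The follow-up consolidation suggested here could look something like the sketch below. The constant names and the helper are hypothetical, not part of this PR.

```python
# Hypothetical module-level constants replacing the bare status literals
# currently scattered through run-tests.py.
RESULT_PASS = "pass"
RESULT_FAIL = "fail"
RESULT_SKIP = "skip"


def count_failures(results):
    # results is a list of (test_name, status) tuples
    return sum(1 for _, status in results if status == RESULT_FAIL)
```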

Generative AI

I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.

Add a FLAKY_TESTS allowlist and retry-until-pass logic to run-tests.py.
When a test on the list fails, it is re-run up to 2 more times and
passes on its first successful attempt. This reduces the ~18% per-run
false-failure rate on the unix port CI workflow to near 0% without
excluding the tests entirely.

The flaky list covers the 8 tests responsible for essentially all CI
failures on master: thread/thread_gc1.py (GC race), thread/stress_aes.py
(QEMU timeout), thread/stress_schedule.py, thread/stress_recurse.py,
thread/stress_heap.py, cmdline/repl_lock.py, cmdline/repl_cont.py,
and extmod/time_time_ns.py.

Retries are active by default; --no-retry disables them. The end-of-run
summary reports tests that passed via retry separately.

tools/ci.sh is updated to remove --exclude patterns for tests now
handled by the retry logic, so they get actual coverage again.
thread/stress_aes.py is kept excluded on RISC-V QEMU where it runs
close to the 200s timeout and retries would burn excessive CI time.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
github-actions bot commented Feb 23, 2026

Code size report:

Reference:  zephyr/mpconfigport: Remove duplicate builtins.open definition. [1ab9b66]
Comparison: tests/run-tests.py: Support multiple platforms in flaky test map. [merge of f10a479]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

On forks where CODECOV_TOKEN is not set, the upload step fails and
marks the coverage job red even though all tests pass. Gate the
upload on the secret being available so it's cleanly skipped instead.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Change the platform field in FLAKY_TESTS from a single string to a
tuple of sys.platform strings, allowing a test to be marked flaky on
more than one platform.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
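A minimal sketch of what a tuple-based platform check might look like. The entry structure and the `is_flaky` helper are illustrative assumptions, not the PR's exact code; only the idea of a tuple of sys.platform strings comes from the commit message.

```python
import sys

# Hypothetical FLAKY_TESTS map: test path -> (reason, tuple of sys.platform
# values the test is flaky on; an empty tuple means all platforms).
FLAKY_TESTS = {
    "thread/stress_aes.py": ("QEMU timeout", ("linux", "darwin")),
}


def is_flaky(test_file, platform=None):
    platform = platform or sys.platform
    entry = FLAKY_TESTS.get(test_file)
    if entry is None:
        return False
    _reason, platforms = entry
    # Empty tuple matches everywhere; otherwise require membership.
    return not platforms or platform in platforms
```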
    saved_testcase_count = testcase_count.value
    flaky_failures = []
    for i, r in enumerate(results):
        if r[1] == "fail":

andrewleech (Owner) commented: Is there a shared / original definition of "fail" that can be used, rather than hard-coding the string here?

    retried_pass = False
    for attempt in range(1, FLAKY_TEST_MAX_RETRIES + 1):
        print(
            "retry {} ({}/{}, {})".format(

andrewleech (Owner) commented: Is .format the standard way of formatting strings in this project?

    run_one_test(test_file)
    retry_entries = results[pre_len:]
    del results[pre_len:]  # clean up all entries appended during retry
    if len(retry_entries) == 1 and retry_entries[0][1] == "pass":

andrewleech (Owner) commented: As per the earlier comment, "pass" probably shouldn't be hard-coded as a string here. Also, is this check correct for all test types (diff/exp/unittest)?

"{} (retried {})".format(FLAKY_REASON_PREFIX, FLAKY_TEST_MAX_RETRIES),
)
print(
"FAIL {} (retried {}, {})".format(
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this FAIL compare to earlier fail strings?
