We eliminated an entire class of flaky tests by requiring an explicit I/O readiness probe in our GitHub Actions workflow. Before the autograder runs, a simple shell script must create, write to, and delete a temporary file in the container's working directory, all within a 50 ms timeout. This decouples container filesystem latency from student code and prevents tests from starting before the environment is actually ready.

To distinguish a flake from a student bug, we use a "three-strike" rule in fresh containers. If a test fails, the Action automatically re-runs that single test in a fresh, clean container; if it fails a second time, it gets a third and final run in another fresh container. A test that fails identically three times in a row is treated as a student bug. A test that passes on the second or third attempt is flagged as a flake for our team to review.
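As a rough illustration of the two mechanisms described above, here is a minimal sketch in Python. It is not our actual workflow: the real probe is a shell script, and the names `io_ready`, `classify`, and `run_in_fresh_container` are hypothetical. The sketch also assumes that "fails identically" means the failure output is byte-for-byte the same on all three attempts, and it introduces an `"unclassified"` label for three failures with differing output, a case the rule as stated does not cover.

```python
import os
import tempfile
import time
from typing import Callable, Tuple

PROBE_TIMEOUT_S = 0.050  # the 50 ms budget mentioned in the text
MAX_ATTEMPTS = 3         # the "three strikes"

# (passed, failure_output) for one run of a single test
RunResult = Tuple[bool, str]


def io_ready(workdir: str, timeout_s: float = PROBE_TIMEOUT_S) -> bool:
    """Create, write, and delete a temp file in workdir.

    Returns True only if all three steps succeed and the elapsed
    time is within timeout_s.
    """
    start = time.monotonic()
    try:
        fd, path = tempfile.mkstemp(dir=workdir)
        try:
            os.write(fd, b"probe")
        finally:
            os.close(fd)
        os.remove(path)
    except OSError:
        return False
    return time.monotonic() - start <= timeout_s


def classify(run_in_fresh_container: Callable[[], RunResult]) -> str:
    """Apply the three-strike rule to a single test.

    run_in_fresh_container is a hypothetical callback that runs the
    test once in a new, clean container and reports the result.
    """
    passed, output = run_in_fresh_container()
    if passed:
        return "pass"
    failures = [output]
    for _ in range(MAX_ATTEMPTS - 1):
        passed, output = run_in_fresh_container()
        if passed:
            return "flake"  # passed on attempt 2 or 3: flag for review
        failures.append(output)
    if len(set(failures)) == 1:
        return "student_bug"  # failed identically three times
    # Assumption: the rule does not say what to do here.
    return "unclassified"


if __name__ == "__main__":
    # Simulated runner: fails once, then passes.
    results = iter([(False, "AssertionError"), (True, "")])
    print(classify(lambda: next(results)))  # prints "flake"
```

Note that `io_ready` measures elapsed time after the fact, so a filesystem that hangs outright would block it rather than trip the timeout; a hard limit would need an external watchdog around the probe.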