Evaluating libffi test results with Red Light Green Light
In preparation for the 3.3 release, I've been spending a lot of time
shoring up the travis-ci build/test infrastructure to include as much
testing as possible. The goal, of course, is to have travis-ci give the
green light when no regressions are introduced. Unfortunately, it's too
much to expect perfect test results on travis-ci. Sometimes tests fail for
environmental reasons. For example, qemu fails certain tests that would
pass on real hardware, wine emits messages that confuse dejagnu, or
execution tests fail wholesale on certain emulated targets where we still
want to test the build process.
Last year I wrote a tool called Red Light Green Light targeted at different
use cases, but I recently realized that it could be adapted to help
evaluate dejagnu test results, to decide if they are 'good enough' based
on some policy. While dejagnu has the ability to mark tests
as XFAIL (expected fail), I'm talking about failures that aren't what I
would normally XFAIL. They are failings in the execution platform -- not
the software being tested.
If you look at any of the travis test logs now, you'll see something like:
This shows how we send libffi.log to rl.gl for evaluation against the given
policy. The result is "GREEN" (good), and the link points at the
analysis, including the original libffi.log. If you dig into the report,
you'll see that all of the qemu execution tests fail for this target, but
the policy accounts for this and still gives us a green light, because
build tests are better than no tests at all.
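
To make the idea of evaluating a log against a policy concrete, here is a
minimal sketch in Python. It is not how rl.gl works internally and it does
not use rl.gl's policy format; the POLICY list, the evaluate function, and
the rule that ignores failed execution tests are all assumptions made up
for illustration. The only thing taken from dejagnu is the general shape of
result lines such as "PASS: ..." and "FAIL: ...", and only a subset of the
result states is handled.

```python
import re

# Hypothetical policy: ordered (regex, verdict) pairs, first match wins.
# This is NOT the rl.gl policy format, only an illustration of the idea.
POLICY = [
    # Assumed rule: the emulator cannot run the binaries, so tolerate these.
    (r"^FAIL: .*execution test$", "IGNORE"),
    (r"^(FAIL|XPASS|UNRESOLVED|ERROR): ", "RED"),
    (r"^(PASS|XFAIL|UNSUPPORTED|UNTESTED): ", "GREEN"),
]

# Lines that look like dejagnu results (a subset of the possible states).
RESULT_LINE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED|ERROR): "
)


def evaluate(log_text, policy=POLICY):
    """Return 'RED' if any result line hits a RED rule, else 'GREEN'."""
    for line in log_text.splitlines():
        line = line.strip()
        if not RESULT_LINE.match(line):
            continue  # compiler output, summaries, and other noise
        for pattern, verdict in policy:
            if re.search(pattern, line):
                if verdict == "RED":
                    return "RED"
                break  # GREEN or IGNORE: move on to the next line
        else:
            # A result line that no rule covers: be conservative.
            return "RED"
    return "GREEN"
```

With a policy like this, a log in which every execution test fails but
every build test passes still evaluates to "GREEN", while a single failed
build test turns the light "RED". Putting the tolerant rule first is the
design choice that matters: the policy, not the test suite, records which
failures belong to the execution platform.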