output-sync test can fail due to race condition

Christian Weisgerber
GNU make 4.3

I have been struggling with test suite failures on OpenBSD/arm64.
Specifically, the output-sync test frequently shows failures.  I
have attached the work directory from such a test run.

After closer examination, I think the failures are due to a race
condition in the test itself.  It has nothing to do with the actual
feature being tested nor with the platform the test is run on.

The issue surfaces in bar/Makefile:

------------------->
all: bar baz

bar: bar-base ; @#HELPER# -q file ../mksync.bar
bar-base:
        @echo bar: start
        @echo bar: end

[...]

baz: baz-base
baz-base:
        @echo baz: start
        @#HELPER# -q wait ../mksync.foo sleep 1
        @echo baz: end
<-------------------

When the first two tests have a parallel make descend into the "bar"
directory, they expect a particular execution order:
1. the bar-base recipe
2. the bar recipe
3. the baz-base recipe

However, frequently the order is this:
1. the bar-base recipe
2. the baz-base recipe
3. the bar recipe

baz-base is waiting to be unlocked by mksync.foo, while over in the
"foo" directory a parallel job is waiting for mksync.bar.  If the
bar recipe is scheduled AFTER baz-base, neither side can make
progress: the parallel jobs deadlock and error out with the observed
wait timeouts.

As far as I can tell, make does not promise an inherent execution
order here.  bar can come before or after baz, or the commands of
their respective recipes can be interleaved.  So the test is wrong
and needs additional sequencing.
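
One possible way to add that sequencing is sketched below.  This is
only an illustration, not a tested patch and not necessarily what
upstream should adopt.  It assumes that the helper's "file" command
in the bar recipe is what releases the job waiting on mksync.bar, and
it leaves out the part of the makefile elided above.  The only change
is the extra prerequisite on baz-base; recipe lines must start with a
tab.

```makefile
all: bar baz

bar: bar-base ; @#HELPER# -q file ../mksync.bar
bar-base:
	@echo bar: start
	@echo bar: end

# Assumption: an explicit prerequisite on "bar" forces the bar recipe
# to finish before baz-base starts waiting on ../mksync.foo.
baz: baz-base
baz-base: bar
	@echo baz: start
	@#HELPER# -q wait ../mksync.foo sleep 1
	@echo baz: end
```

With this dependency the only permitted order is bar-base, bar,
baz-base, which matches what the first two tests expect.  Whether the
expected output of the remaining tests is affected would need to be
checked.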

--
Christian "naddy" Weisgerber                          [hidden email]

Attachment: work.tgz (2K)