[bug #53152] Intermittent timeout running regression test features/output_sync

anonymous
URL:
  <http://savannah.gnu.org/bugs/?53152>

                 Summary: Intermittent timeout running regression test features/output_sync
                 Project: make
            Submitted by: srivasta
            Submitted on: Tue 13 Feb 2018 06:26:02 PM CST
                Severity: 3 - Normal
              Item Group: Bug
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
       Component Version: 4.2.1
        Operating System: None
           Fixed Release: None
           Triage Status: None

    _______________________________________________________

Details:

Hi,

  While trying to build Make 4.2.1 on various build daemons for different
architectures for Debian, we are seeing multiple failures during regression
tests (the build happened uneventfully on my development machine). Excerpts
from the logs are below.

    Manoj
p.s. Please retain the CC for [hidden email] so that your
response is recorded in Debian's BTS as well.

https://buildd.debian.org/status/package.php?p=make-dfsg
https://buildd.debian.org/status/fetch.php?pkg=make-dfsg&arch=i386&ver=4.2.1-1&stamp=1518486030&raw=0

make[4]: Entering directory '/<<PKGBUILDDIR>>/debian/build-make-guile'
The system uptime program believes the load average to be:
uptime
 01:38:58 up 3 days, 15:08,  0 users,  load average: 1.53, 0.49, 0.20
The GNU load average checking code thinks:
./loadavg
1-minute: 1.530000  5-minute: 0.490000  15-minute: 0.200000  
cd tests && perl ./run_make_tests.pl -srcdir
/<<PKGBUILDDIR>>/debian/build-make-guile/../.. -make ../make
------------------------------------------------------------------------------
         Running tests for GNU make on Linux binet 4.9.0-5-amd64 i686
                                GNU Make 4.2.1
------------------------------------------------------------------------------

Finding tests...

features/order_only ..................................... ok     (10 passed)
features/output-sync ....................................
Test timed out after 30 seconds
Error running /<<PKGBUILDDIR>>/debian/build-make-guile/tests/../make (expected
0; got 14): /<<PKGBUILDDIR>>/debian/build-make-guile/tests/../make -f
work/features/output-sync.mk -j -Orecurse

Caught signal 14!
FAILED (14/15 passed)
features/override ....................................... ok     (4 passed)




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?53152>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
anonymous
Follow-up Comment #1, bug #53152 (project make):

I can reliably reproduce this bug on my laptop if I run other CPU-intensive
tasks, but not if it's otherwise mostly idle.  It is always this group of
tests that fails, either testing the -Orecurse or -Otarget option.

I couldn't build from the current master branch, as Debian doesn't have
automake 1.16 yet. But after reverting
63b42fa235835cbeac6c1b9182f32798ea135dfd I could build it and reproduced this
bug.


anonymous
Follow-up Comment #2, bug #53152 (project make):

I can reliably reproduce this on Fedora 26 if I have enough busy-wait
processes running, using this command line:

while true; do date; rm mksync; make -j2 -dr; done

and this Makefile:

all: foo baz
foo: bar
	date > mksync
bar:
	@echo bar
baz:
	while [ ! -f mksync ]; do sleep 1; done
	@echo baz

Based on the output, it looks like the jobs are queued breadth-first, but only
one at a time actually runs, so the all.baz child runs before the all.foo
child... i.e. with -j2 the first two jobs queued are all.foo.bar (not
all.foo!) and all.baz.  all.foo.bar runs, then all.baz, but all.foo remains
queued until the load is reduced.

I suspect a depth-first search would solve this; the jobs would be queued in
the same order as with -j1... all.foo.bar, all.foo, then all.baz.  However,
that would also mean we'd not parallelize as much as possible.
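
To illustrate the -j1 ordering mentioned above, here is a minimal Python
sketch. It is not make's code; the DEPS table and the serial_order() helper
are hypothetical names that only mirror the Makefile in this comment. It walks
the prerequisite graph depth-first and lists each target after its
prerequisites.

```python
from typing import Dict, List, Set

# Prerequisite graph copied from the Makefile in this comment.
DEPS: Dict[str, List[str]] = {
    "all": ["foo", "baz"],
    "foo": ["bar"],
    "bar": [],
    "baz": [],
}


def serial_order(goal: str, deps: Dict[str, List[str]]) -> List[str]:
    """Return targets in depth-first post-order: prerequisites first."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in deps[name]:
            visit(dep)
        order.append(name)  # appended only after all prerequisites

    visit(goal)
    return order


print(serial_order("all", DEPS))  # ['bar', 'foo', 'baz', 'all']
```

Since 'all' has no recipe, the jobs in this order are bar, foo, then baz, so
mksync would exist before baz starts polling for it.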

Alternately, if the job token for the last dependency of a rule could be
reserved for the parent rule - as if the parent got the token, and let its
children "borrow" it, at least the rules *could* be run in the same order as
-j1.

A third option is to test for system load as jobs are queued, so that in this
case all.baz wouldn't even be queued in the first batch.

(yes, I looked at the code; no, I couldn't figure out where to fix anything
;)

GNU Make 4.2.1
Updating goal targets....
Considering target file 'all'.
 File 'all' does not exist.
 Looking for an implicit rule for 'all'.
 No implicit rule found for 'all'.
  Considering target file 'foo'.
   File 'foo' does not exist.
    Considering target file 'bar'.
     File 'bar' does not exist.
     Finished prerequisites of target file 'bar'.
    Must remake target 'bar'.
Need a job token; we don't have children
bar
Putting child 0x55ab69cf95c0 (bar) PID 30261 on the chain.
    Recipe of 'bar' is being run.
   Finished prerequisites of target file 'foo'.
  The prerequisites of 'foo' are being made.
  Considering target file 'baz'.
   File 'baz' does not exist.
   Finished prerequisites of target file 'baz'.
  Must remake target 'baz'.
Live child 0x55ab69cf95c0 (bar) PID 30261
Reaping winning child 0x55ab69cf95c0 PID 30261
Removing child 0x55ab69cf95c0 PID 30261 from chain.
Need a job token; we don't have children
while [ ! -f mksync ]; do sleep 1; done
Putting child 0x55ab69cf95c0 (baz) PID 30262 on the chain.
  Recipe of 'baz' is being run.
 Finished prerequisites of target file 'all'.
The prerequisites of 'all' are being made.
Live child 0x55ab69cf95c0 (baz) PID 30262


anonymous
Follow-up Comment #3, bug #53152 (project make):

I think the problem is the outer loop in update_goal_chain():

- All goals are handled once per loop iteration.
- At the beginning of each iteration at least one child is reaped (if any
exist).

What happens is this:
First iteration:
- job for 'bar' is started
- child for 'bar' is reaped
- job for 'baz' is started
Second iteration:
- reap_children() waits forever for a child to finish

I think reap_children() should only wait if no new action can be taken. I'm
not sure if that information is available. Maybe 'nothing changed' in the last
iteration (accumulating 'g->changed') is a good approximation?
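
The sequence above can be replayed with a small toy model in Python. This is
only a sketch under stated assumptions, not make's implementation: the names
Target, Stuck and simulate() are made up for illustration, recipes finish
instantly unless they wait for a file, two job slots stand in for -j2, and a
finished child is reaped opportunistically before a new job starts (which is
how 'bar' is reaped in the trace). The block_only_when_idle flag models the
'nothing changed' idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Target:
    name: str
    deps: List[str] = field(default_factory=list)
    needs_file: Optional[str] = None  # recipe spins until this file exists
    makes_file: Optional[str] = None  # recipe creates this file
    state: str = "pending"            # pending -> running -> done


class Stuck(Exception):
    """A blocking wait that no running child can ever satisfy."""


def simulate(block_only_when_idle: bool, slots: int = 2) -> List[str]:
    targets: Dict[str, Target] = {
        "all": Target("all", ["foo", "baz"]),
        "foo": Target("foo", ["bar"], makes_file="mksync"),
        "bar": Target("bar"),
        "baz": Target("baz", needs_file="mksync"),
    }
    files: Set[str] = set()
    running: List[Target] = []
    log: List[str] = []

    def reap(block: bool) -> bool:
        """Reap one child that is able to finish; optionally 'block'."""
        for t in running:
            if t.needs_file is None or t.needs_file in files:
                running.remove(t)
                t.state = "done"
                if t.makes_file:
                    files.add(t.makes_file)
                log.append(f"reap {t.name}")
                return True
        if block and running:
            raise Stuck(running[0].name)  # would wait forever
        return False

    def start(t: Target) -> None:
        reap(block=False)                 # opportunistic, non-blocking reap
        while len(running) >= slots:
            reap(block=True)              # wait for a free slot
        t.state = "running"
        running.append(t)
        log.append(f"start {t.name}")

    def visit(name: str) -> bool:
        """Handle one goal; return True if a job was started."""
        t = targets[name]
        changed = False
        for dep in t.deps:
            changed |= visit(dep)
        if t.state == "pending" and all(
            targets[d].state == "done" for d in t.deps
        ):
            start(t)
            changed = True
        return changed

    changed = True
    try:
        for _ in range(20):               # safety bound for the toy model
            if targets["all"].state == "done":
                break
            block = (not changed) if block_only_when_idle else True
            reaped = reap(block)          # reap at the top of each iteration
            changed = visit("all") or reaped
    except Stuck as exc:
        log.append(f"stuck waiting for {exc}")
    return log


print(simulate(block_only_when_idle=False))
print(simulate(block_only_when_idle=True))
```

In this model the always-blocking variant logs start bar, reap bar, start baz
and then gets stuck waiting for baz, matching the two iterations described
above. With block_only_when_idle=True the second iteration starts foo, mksync
gets created, and the run ends with 'reap all'. Whether accumulating
'g->changed' gives the same effect in make itself is not verified here.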

anonymous
Follow-up Comment #4, bug #53152 (project make):

I see the issue; thanks for the explanation, Michael.  In some ways this is
more of a test artifact than something likely to appear in a real makefile,
although I think it really is a bug in make's logic.

In a real makefile it's highly unlikely that two recipes will have this sort
of relationship (one recipe waits for an artifact of another recipe to be
created) without declaring any kind of prerequisite relationship in the
makefile.

So this should be fixed, but I don't think it's critical, and the fix will take
a good bit of thought and perhaps some fiddling to get right.

If there are issues with the test failing in some build environments I think
an appropriate short-term fix is to comment out these regression tests that
may fail.
