GNU Make 4.3 --jobserver-auth compatibility

Robert Morell

In git commit c9e6ab9ac73, the '--jobserver-fds' parameter was changed
to '--jobserver-auth', with the intent to "publish" the interface as
stable for interoperation with other tools.  This commit was included in
GNU Make 4.2 and newer releases.  Presumably, this means that GNU Make
4.2, 4.2.1, and 4.3, which all use the same --jobserver-auth interface,
should be compatible with each other.

Unfortunately, this doesn't seem to be the case: git commit b552b0525
added logic to set the O_NONBLOCK flag on the read side of the jobserver
pipe both when creating the file descriptors in the parent make instance
(jobserver_setup()) and when inheriting them in a child make instance
(jobserver_parse_auth()).  O_NONBLOCK is a file status flag stored in
the open file description, which is shared by every process that
inherits the descriptor, so it takes effect in the parent make instance
and in all processes that inherit the pipe, even when a child process
is the one calling fcntl().  That commit is included in the GNU Make
4.3 release, but not in 4.2 or 4.2.1.
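
To illustrate why a child's fcntl() call is visible to its parent, here
is a minimal C sketch, assuming a POSIX system.  The function name
child_nonblock_affects_parent() is hypothetical and not part of GNU
Make: the child sets O_NONBLOCK on its inherited copy of a pipe's read
end, and the parent, which never calls fcntl(), then reads from the
empty pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent's read() fails with EAGAIN
   after only the child set O_NONBLOCK on the inherited read end,
   0 if read() returned anything else, -1 on setup failure.
   (If the flag were per-process, the read() would block forever instead.) */
static int child_nonblock_affects_parent(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: set O_NONBLOCK on its copy of the read end, then exit. */
        int fl = fcntl(fds[0], F_GETFL);
        _exit((fl >= 0 && fcntl(fds[0], F_SETFL, fl | O_NONBLOCK) == 0) ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The parent never called fcntl() and still holds the write end open,
       so on a blocking descriptor this read of an empty pipe would wait. */
    char c;
    ssize_t n = read(fds[0], &c, 1);
    int err = errno;
    close(fds[0]);
    close(fds[1]);
    return (n < 0 && (err == EAGAIN || err == EWOULDBLOCK)) ? 1 : 0;
}
```

On a conforming POSIX system the function is expected to return 1: the
parent's read() fails with EAGAIN instead of blocking, which is the same
error GNU Make 4.2.1 reports as "read jobs pipe: Resource temporarily
unavailable".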

If a tool interoperating with version 4.3 is not prepared for read(2)
to return EAGAIN, setting the O_NONBLOCK flag will cause it to fail.
Affected tools include GNU Make versions 4.2 and 4.2.1.
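
One way a tool sharing the jobserver pipe could be prepared for this is
to treat EAGAIN as "no token available yet" and wait with poll() before
retrying.  This is a minimal C sketch under that assumption, not GNU
Make's own code; read_token() and demo_delayed_token() are hypothetical
names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read one jobserver token byte from fd, whether or
   not another process has set O_NONBLOCK on the shared pipe.
   Returns the token (0..255), or -1 on EOF or error. */
static int read_token(int fd)
{
    for (;;) {
        unsigned char tok;
        ssize_t n = read(fd, &tok, 1);
        if (n == 1)
            return tok;
        if (n == 0)
            return -1;                      /* all write ends closed */
        if (errno == EINTR)
            continue;                       /* interrupted by a signal */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                      /* genuine read error */

        /* Non-blocking and no token yet: wait for readability instead of
           failing.  Another reader may still win the token, in which case
           read() returns EAGAIN again and we simply wait again. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
}

/* Hypothetical demo: the read end is non-blocking and empty when
   read_token() starts; a forked child writes a token after ~100 ms.
   Returns the token read, or -1 on failure. */
static int demo_delayed_token(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int tok = -1;
    int fl = fcntl(fds[0], F_GETFL);
    if (fl >= 0 && fcntl(fds[0], F_SETFL, fl | O_NONBLOCK) == 0) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: hand back a token after a short delay. */
            struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000L };
            nanosleep(&delay, NULL);
            _exit(write(fds[1], "+", 1) == 1 ? 0 : 1);
        }
        if (pid > 0) {
            tok = read_token(fds[0]);  /* first read() normally sees EAGAIN */
            waitpid(pid, NULL, 0);
        }
    }

    close(fds[0]);
    close(fds[1]);
    return tok;
}
```

This sketch only shows tolerating EAGAIN; it makes no attempt to address
the race condition that commit b552b0525 targeted.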

Here is an example makefile that demonstrates the problem:
$ cat Makefile
# Recipe lines below must begin with a tab character.
print-version:
	@echo "At $(MAKELEVEL): $(shell $(MAKE) --version 2>&1 | head -n 1)"

ifneq ($(SUB_MAKE),)
  run-sub-make:
	+@$(SUB_MAKE) do-work

  ifneq ($(MAKELEVEL),1)
    do-work: run-sub-make
  endif
endif

define make_work
  do-work$(1): print-version
	@sleep 1

  do-work: do-work$(1)
  .PHONY: do-work$(1)
endef

$(foreach i,1 2 3 4 5 6,$(eval $(call make_work,$(i))))

.PHONY: print-version do-work

When GNU Make 4.2.1 or GNU Make 4.3 is invoked with a submake of the
same version, everything works as expected:
$ ./make-4.2.1 --no-print-directory -j2 do-work SUB_MAKE=./make-4.2.1
At 0: GNU Make 4.2.1
At 1: GNU Make 4.2.1
$ ./make-4.3 --no-print-directory -j2 do-work SUB_MAKE=./make-4.3
At 0: GNU Make 4.3
At 1: GNU Make 4.3

But when GNU Make 4.2.1 is the parent and GNU Make 4.3 the child, or
vice versa, it fails pretty reliably:
$ ./make-4.2.1 --no-print-directory -j2 do-work SUB_MAKE=./make-4.3
At 0: GNU Make 4.2.1
At 1: GNU Make 4.3
make-4.2.1: *** read jobs pipe: Resource temporarily unavailable.  Stop.
make-4.2.1: *** Waiting for unfinished jobs....
$ ./make-4.3 --no-print-directory -j2 do-work SUB_MAKE=./make-4.2.1
At 0: GNU Make 4.3
At 1: GNU Make 4.2.1
make-4.2.1[1]: *** read jobs pipe: Resource temporarily unavailable.  Stop.
make-4.2.1[1]: *** Waiting for unfinished jobs....
make-4.3: *** [Makefile:7: run-sub-make] Error 2
make-4.3: *** Waiting for unfinished jobs....

I have a fairly complex production codebase in which GNU Make 4.2.1
ends up invoking GNU Make 4.3, and it runs into this problem.  (These
are logically distinct components in separate source code repositories,
each of which bootstraps itself with a specific version of make checked
into source control, to make builds more reproducible across a wide
range of build environments while still allowing the use of newer Make
features.)  It is not feasible to change both of them in lockstep.

I put together a simple workaround for this specific case of a GNU Make
4.2.1 parent with a GNU Make 4.3 child: it removes the call to
'set_blocking (job_fds[0], 0);' from jobserver_parse_auth() in the GNU
Make 4.3 build and leaves GNU Make 4.2.1 alone.  With this patch, when
GNU Make 4.2.1 invokes GNU Make 4.3, both processes use blocking reads.
This exposes the patched GNU Make 4.3 process to the race condition
(which can lead to a hang) that commit b552b0525 was originally written
to address, but I don't expect the risk of hitting it to be any worse
than running GNU Make 4.2.1 everywhere.  And when the patched GNU Make
4.3 is used for the parent and all child processes, reads remain
non-blocking, because the parent still sets O_NONBLOCK in
jobserver_setup() and the children inherit it, so the patch should have
no effect there.  This doesn't address the inverse case of a 4.3 parent
with a 4.2.1 child, but for my needs that's enough.
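
For reference, a call like 'set_blocking (job_fds[0], 0);' is understood
here to turn O_NONBLOCK on for the read end of the pipe.  The sketch
below assumes set_blocking() does this with fcntl() F_GETFL/F_SETFL;
set_blocking_sketch() and is_blocking() are hypothetical names, not GNU
Make's implementation.  With the workaround, the child simply keeps
whatever mode it inherits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>  /* pipe(), for the usage example */

/* Hypothetical stand-in for a set_blocking()-style helper:
   blocking != 0 clears O_NONBLOCK, blocking == 0 sets it.
   Retries fcntl() on EINTR.  Returns 0 on success, -1 on error. */
static int set_blocking_sketch(int fd, int blocking)
{
    int flags;
    do
        flags = fcntl(fd, F_GETFL);
    while (flags < 0 && errno == EINTR);
    if (flags < 0)
        return -1;

    flags = blocking ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);

    int r;
    do
        r = fcntl(fd, F_SETFL, flags);
    while (r < 0 && errno == EINTR);
    return r < 0 ? -1 : 0;
}

/* Returns 1 if O_NONBLOCK is clear on fd, 0 if it is set, -1 on error. */
static int is_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 0 : 1;
}
```

For example, after pipe(fds), is_blocking(fds[0]) returns 1; after
set_blocking_sketch(fds[0], 0) it returns 0.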

My questions for this list are:
- Is there a better way to handle the compatibility break in this stable
  interface?  It looks like the latest git master version of
  doc/make.texi still documents:

    Note that the read side of the jobserver pipe is set to ``blocking''

  How are other tools expected to deal with this?

- Is there some reason I'm missing that 'set_blocking (job_fds[0], 0);'
  is called from jobserver_parse_auth()?  Putting aside all the
  mixed-version considerations, with a purely version 4.3 configuration
  this seems completely unnecessary since the parent's flags will be
  inherited.  This may be a worthwhile patch to apply just for
  simplicity's sake.

- Is there some reason that using GNU Make 4.3 with a blocking
  jobserver-auth FD (inherited as described) would be more susceptible
  to the race condition that was closed in commit b552b0525 than GNU
  Make 4.2.1?
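
The inheritance mentioned in the second question can be checked with a
short C sketch, assuming a POSIX system.  It uses only fork(); because
O_NONBLOCK belongs to the open file description, it is also preserved
across execve().  child_inherits_nonblock() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the parent marks a pipe's read end O_NONBLOCK (as
   jobserver_setup() does for the jobserver pipe), then a forked child
   checks the flag without ever calling F_SETFL itself.
   Returns 1 if the child sees O_NONBLOCK, 0 if not, -1 on error. */
static int child_inherits_nonblock(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int result = -1;
    int fl = fcntl(fds[0], F_GETFL);
    if (fl >= 0 && fcntl(fds[0], F_SETFL, fl | O_NONBLOCK) == 0) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: only inspect the inherited flag. */
            int cfl = fcntl(fds[0], F_GETFL);
            _exit((cfl >= 0 && (cfl & O_NONBLOCK)) ? 0 : 1);
        }
        int status;
        if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            result = (WEXITSTATUS(status) == 0) ? 1 : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

This is expected to return 1.  It only shows that the flag is inherited;
it does not rule out other reasons for the call in
jobserver_parse_auth(), such as a parent that did not set the flag
itself.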