Parallel builds across makefiles

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Parallel builds across makefiles

sgardell
We are using a standard distro of gmake 4.1 on RH 6.4.

We have a fairly well structured makefile system that leverages the ACE make framework. A linear build (-j1) is about 3 hours. Running a parallel build that builds leaf nodes (libraries) in parallel is about 1 hour. So far so good.

But when I try to build across makefiles in parallel then I see two different things:
   1) If I turn on any sort of output sync it gets dramatically slower. Sometimes slower than our -j1 time!
   2) Even with output sync turned completely off there is no meaningful improvement over a parallel build of just the leaves.

I am reasonably certain that I am actually seeing parallel build behavior since I had to fix a handful of additional dependency issues to get it to build. Also reasonably certain that the build is not doing different things since the count of compiler invocations matches. Looking at 'top' during the "highly parallelized" runs shows the system essentially idle.

Reply | Threaded
Open this post in threaded view
|

Parallel builds across makefiles

thutt
sgardell writes:
 > We are using a standard distro of gmake 4.1 on RH 6.4.
 >
 > We have a fairly well structured makefile system that leverages the ACE make
 > framework. A linear build (-j1) is about 3 hours. Running a parallel build
 > that builds leaf nodes (libraries) in parallel is about 1 hour. So far so
 > good.
 >
 > But when I try to build across makefiles in parallel then I see two
 > different things:
 >    1) If I turn on any sort of output sync it gets dramatically slower.
 > Sometimes slower than our -j1 time!
 >    2) Even with output sync turned completely off there is no meaningful
 > improvement over a parallel build of just the leaves.
 >
 > I am reasonably certain that I am actually seeing parallel build behavior
 > since I had to fix a handful of additional dependency issues to get it to
 > build. Also reasonably certain that the build is not doing different things
 > since the count of compiler invocations matches. Looking at 'top' during the
 > "highly parallelized" runs shows the system essentially idle.
 >

 Is there a question or problem that you're posing?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

sgardell
The gmake documentation states clearly that there is no build-time impact from output synchronization. So it seems that the behavior I am seeing in that regard is a gmake bug. Beyond that, how can I actually achieve effective parallelism with nested makefile invocations? Is the behavior I am seeing know/expected?
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

Paul Smith-20
In reply to this post by sgardell
On Thu, 2016-07-14 at 15:40 -0700, sgardell wrote:
> But when I try to build across makefiles in parallel then I see two
> different things:
>    1) If I turn on any sort of output sync it gets dramatically slower.
> Sometimes slower than our -j1 time!

I recommend trying with the latest release, GNU make 4.2.1 to see if
that helps.

If not then some debugging is needed as I can't reproduce this here.
It's true that, depending on the sync output mode you've selected (which
you didn't specify), you may not see individual updates as you
originally did (there may be long pauses in the output followed by a
massive dump of output) but the overall build time shouldn't be
significantly impacted.

>    2) Even with output sync turned completely off there is no meaningful
> improvement over a parallel build of just the leaves.

I'm not sure I understand this comment, but this isn't too surprising to
me.  The only improvement of complete parallelism across the entire tree
over parallelism of only the leaves would be at the edges, where right
as one leaf makefile is finishing up it could hand off some extra jobs
to another leaf makefile which wasn't done yet.  That may, depending on
the way the makefiles work and the amount of parallelism you have
enabled, not be very noticeable.

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

RE: Parallel builds across makefiles

sgardell
OK. Thanks! I will try the latest version.

FWIW, I am measuring is total wall clock time to complete the build. (date; gmake... ; date) This goes up substantially with all sync modes other than "none" if gmake has parallel invocations of sub-makes going (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing individual makefiles (with ACE, these tend to be simple lists of files to compile), then the behavior is as promised by the documentation.

My expectation for enhanced parallelism is of course a bit more nebulous. At some point we are clearly going to run out of resources. But monitoring the state of the machine (HP G9 with gobs of memory and 56 hyper-threads available) suggests it is not breathing hard through most of the build.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 11:33 AM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Thu, 2016-07-14 at 15:40 -0700, sgardell wrote:
> But when I try to build across makefiles in parallel then I see two
> different things:
>    1) If I turn on any sort of output sync it gets dramatically slower.
> Sometimes slower than our -j1 time!

I recommend trying with the latest release, GNU make 4.2.1 to see if that helps.

If not then some debugging is needed as I can't reproduce this here.
It's true that, depending on the sync output mode you've selected (which you didn't specify), you may not see individual updates as you originally did (there may be long pauses in the output followed by a massive dump of output) but the overall build time shouldn't be significantly impacted.

>    2) Even with output sync turned completely off there is no
> meaningful improvement over a parallel build of just the leaves.

I'm not sure I understand this comment, but this isn't too surprising to me.  The only improvement of complete parallelism across the entire tree over parallelism of only the leaves would be at the edges, where right as one leaf makefile is finishing up it could hand off some extra jobs to another leaf makefile which wasn't done yet.  That may, depending on the way the makefiles work and the amount of parallelism you have enabled, not be very noticeable.
_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

Paul Smith-20
On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version. 
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please
provide more details.  What level of -j are you using?  Which sync mode
are you using?  Do you get the same behavior even with lower levels of
-j, if yours is very large?


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

RE: Parallel builds across makefiles

sgardell
No I have not gotten to 4.2.1 yet.

I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 3:14 PM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version.
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please provide more details.  What level of -j are you using?  Which sync mode are you using?  Do you get the same behavior even with lower levels of -j, if yours is very large?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

David Boyce-3
A couple of suggestions:

1. Check your value of TMPDIR if any. All -O is doing is redirecting output into a temp file and dumping it later. Effectively it turns command "foo" into "foo > $TMPDIR/blah; cat $TMPDIR/blah; unlink $TMPDIR/blah". This is why it seems almost impossible for it to slow things down appreciably, and a slow temp dir device might be one explanation. Along similar lines you could try "TMPDIR=/tmp make ..."

2. Try strace. It has a mode which will timestamp each system call and another which prints a table of system calls used and how much time it took. One of these will probably be instructive, especially when compared with the same without -O.

David

On Sun, Jul 17, 2016 at 12:22 PM, Gardell, Steven <[hidden email]> wrote:
No I have not gotten to 4.2.1 yet.

I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 3:14 PM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version.
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please provide more details.  What level of -j are you using?  Which sync mode are you using?  Do you get the same behavior even with lower levels of -j, if yours is very large?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

RE: Parallel builds across makefiles

sgardell

Thanks for the suggestions. I have captured a bunch more data and hence the delay. First I have started by reproducing the behavior with 4.2.1:

 

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Otarget dirsparallel=all release >& gmake421dpa32.log; date

Tue Jul 19 11:24:47 EDT 2016

Tue Jul 19 20:43:17 EDT 2016

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Onone dirsparallel=all release > & gmake421dpn32.log ; date

Tue Jul 19 21:01:48 EDT 2016

Tue Jul 19 22:05:40 EDT 2016

 

Essentially the same as 4.1. This –Otarget run is not just “a little slower,” it is pathologically slow. The 9 hour build time is slower than if I had run the

build with no parallelization whatsoever. (The –Onone time is a bit faster than what I see doing parallelization at just the leaf level)

 

Our build is a relatively straight-forward “choreographed” build as followed:

 

     Top-Level Makefile

            Makefile.libs – This is a list of directories containing C++ code getting built into different relocatable libraries (one per dir)

                  (sub-directory Makefiles)

            Mikefile.bins – This is a list of directories that link executables

                  (sub-directory Makefiles)

 

There are O(120) libraries to build, each of which ranges from 10-50 objects. There are about 30 executables to build.  What “dirsparallel=all” does is to turn on parallel make for Makefile.libs, Makefile.dirs, etc. The actual build is more complicated,  having codegen, unit test, lint, and packaging  activities, but it follows this simple model.

 

The defining characteristic here is a wide fan-out. There are a large number of relatively simple and brief makefiles. There will therefore be a few “middle management” makefiles that are waiting on children to complete (I imagine each of these occupies a make job??).

 

I have confirmed that my count of “g++” invocations is exactly the same for the two runs (4403)

 

The TMPDIR variable was not set. I set it to /tmp to insure that it is on a physical device. (The build and our home directories are NFS mounted) The total elapsed time for the “-Otarget” run was then 6 hours. This is enough of a reduction to be a little suspicious, but not much more.

 

I ran with strace for both the “none” case and the “target” case.  The summaries are attached. Unfortunately, I don’t think there is a smoking gun here.

 

I have also run the “-Otrace” case at a few different job-levels:

 

j8 – 6:34

j32 – 6:00

j256 – 9:49

 

From: David Boyce [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 5:03 PM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]; [hidden email]
Subject: Re: Parallel builds across makefiles

 

A couple of suggestions:

 

1. Check your value of TMPDIR if any. All -O is doing is redirecting output into a temp file and dumping it later. Effectively it turns command "foo" into "foo > $TMPDIR/blah; cat $TMPDIR/blah; unlink $TMPDIR/blah". This is why it seems almost impossible for it to slow things down appreciably, and a slow temp dir device might be one explanation. Along similar lines you could try "TMPDIR=/tmp make ..."

 

2. Try strace. It has a mode which will timestamp each system call and another which prints a table of system calls used and how much time it took. One of these will probably be instructive, especially when compared with the same without -O.

 

David

 

On Sun, Jul 17, 2016 at 12:22 PM, Gardell, Steven <[hidden email]> wrote:

No I have not gotten to 4.2.1 yet.

I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 3:14 PM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version.
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please provide more details.  What level of -j are you using?  Which sync mode are you using?  Do you get the same behavior even with lower levels of -j, if yours is very large?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make

 


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make

targetSummary.txt (13K) Download Attachment
noneSummary.txt (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

David Boyce-3
Given the grand total of 11 open() system calls, I'm guessing you used strace without the -f and -F options (noting that ISTR 4.2.1 reverts to use of vfork)?

BTW timings on linux are more elegantly handled by prefixing the "time" builtin. Wallclock-only can be gotten by forcing use of /usr/bin/time:

% type -a time
time is a shell keyword
time is /usr/bin/time

% time sleep 3

real 0m3.022s
user 0m0.001s
sys 0m0.001s

% \time -f %E sleep 3
0:03.00

David

On Fri, Jul 22, 2016 at 2:19 AM, Gardell, Steven <[hidden email]> wrote:

Thanks for the suggestions. I have captured a bunch more data and hence the delay. First I have started by reproducing the behavior with 4.2.1:

 

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Otarget dirsparallel=all release >& gmake421dpa32.log; date

Tue Jul 19 11:24:47 EDT 2016

Tue Jul 19 20:43:17 EDT 2016

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Onone dirsparallel=all release > & gmake421dpn32.log ; date

Tue Jul 19 21:01:48 EDT 2016

Tue Jul 19 22:05:40 EDT 2016

 

Essentially the same as 4.1. This –Otarget run is not just “a little slower,” it is pathologically slow. The 9 hour build time is slower than if I had run the

build with no parallelization whatsoever. (The –Onone time is a bit faster than what I see doing parallelization at just the leaf level)

 

Our build is a relatively straight-forward “choreographed” build as followed:

 

     Top-Level Makefile

            Makefile.libs – This is a list of directories containing C++ code getting built into different relocatable libraries (one per dir)

                  (sub-directory Makefiles)

            Mikefile.bins – This is a list of directories that link executables

                  (sub-directory Makefiles)

 

There are O(120) libraries to build, each of which ranges from 10-50 objects. There are about 30 executables to build.  What “dirsparallel=all” does is to turn on parallel make for Makefile.libs, Makefile.dirs, etc. The actual build is more complicated,  having codegen, unit test, lint, and packaging  activities, but it follows this simple model.

 

The defining characteristic here is a wide fan-out. There are a large number of relatively simple and brief makefiles. There will therefore be a few “middle management” makefiles that are waiting on children to complete (I imagine each of these occupies a make job??).

 

I have confirmed that my count of “g++” invocations is exactly the same for the two runs (4403)

 

The TMPDIR variable was not set. I set it to /tmp to insure that it is on a physical device. (The build and our home directories are NFS mounted) The total elapsed time for the “-Otarget” run was then 6 hours. This is enough of a reduction to be a little suspicious, but not much more.

 

I ran with strace for both the “none” case and the “target” case.  The summaries are attached. Unfortunately, I don’t think there is a smoking gun here.

 

I have also run the “-Otrace” case at a few different job-levels:

 

j8 – 6:34

j32 – 6:00

j256 – 9:49

 

From: David Boyce [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 5:03 PM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]; [hidden email]


Subject: Re: Parallel builds across makefiles

 

A couple of suggestions:

 

1. Check your value of TMPDIR if any. All -O is doing is redirecting output into a temp file and dumping it later. Effectively it turns command "foo" into "foo > $TMPDIR/blah; cat $TMPDIR/blah; unlink $TMPDIR/blah". This is why it seems almost impossible for it to slow things down appreciably, and a slow temp dir device might be one explanation. Along similar lines you could try "TMPDIR=/tmp make ..."

 

2. Try strace. It has a mode which will timestamp each system call and another which prints a table of system calls used and how much time it took. One of these will probably be instructive, especially when compared with the same without -O.

 

David

 

On Sun, Jul 17, 2016 at 12:22 PM, Gardell, Steven <[hidden email]> wrote:

No I have not gotten to 4.2.1 yet.

I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 3:14 PM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version.
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please provide more details.  What level of -j are you using?  Which sync mode are you using?  Do you get the same behavior even with lower levels of -j, if yours is very large?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make

 



_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

RE: Parallel builds across makefiles

sgardell

I ran with -cf. By my read there are millions of open() calls here:

 

gmake-4.2.1: *** [Makefile:333: release] Error 1

% time     seconds  usecs/call     calls    errors syscall

------ ----------- ----------- --------- --------- ----------------

93.92 48988.895003      281029    174320     42047 wait4

  3.56 1857.549713        8133    228409     47431 futex

  1.12  582.114172          10  59230554  48886748 stat

  0.81  421.206442          11  37319045  29591972 open

  0.40  209.945060           3  71120290   3458259 lstat

  0.06   33.636522           2  13616492      4796 read

  0.02   11.818771           1   8065605     14388 close

 

And:

 

% time     seconds  usecs/call     calls    errors syscall

------ ----------- ----------- --------- --------- ----------------

93.82 53268.050950      302081    176337     41624 wait4

  3.18 1805.057765        7853    229844     48664 futex

  1.06  599.730842           8  71119946   3458260 lstat

  0.94  532.243350          14  37295656  29590966 open

  0.73  416.897838           7  59083695  48774860 stat

  0.09   50.603194           4  13780850      4946 read

  0.04   21.309529           3   8034030     14388 close

 

 

I am happy to run with any other settings that anyone  might suggest since I have the environment that is demonstrating the behavior.

 

From: David Boyce [mailto:[hidden email]]
Sent: Monday, July 25, 2016 11:54 AM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]; [hidden email]
Subject: Re: Parallel builds across makefiles

 

Given the grand total of 11 open() system calls, I'm guessing you used strace without the -f and -F options (noting that ISTR 4.2.1 reverts to use of vfork)?

 

BTW timings on linux are more elegantly handled by prefixing the "time" builtin. Wallclock-only can be gotten by forcing use of /usr/bin/time:

 

% type -a time

time is a shell keyword

time is /usr/bin/time

 

% time sleep 3

 

real      0m3.022s

user      0m0.001s

sys       0m0.001s

 

% \time -f %E sleep 3

0:03.00

 

David

 

On Fri, Jul 22, 2016 at 2:19 AM, Gardell, Steven <[hidden email]> wrote:

Thanks for the suggestions. I have captured a bunch more data and hence the delay. First I have started by reproducing the behavior with 4.2.1:

 

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Otarget dirsparallel=all release >& gmake421dpa32.log; date

Tue Jul 19 11:24:47 EDT 2016

Tue Jul 19 20:43:17 EDT 2016

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Onone dirsparallel=all release > & gmake421dpn32.log ; date

Tue Jul 19 21:01:48 EDT 2016

Tue Jul 19 22:05:40 EDT 2016

 

Essentially the same as 4.1. This –Otarget run is not just “a little slower,” it is pathologically slow. The 9 hour build time is slower than if I had run the

build with no parallelization whatsoever. (The –Onone time is a bit faster than what I see doing parallelization at just the leaf level)

 

Our build is a relatively straight-forward “choreographed” build as followed:

 

     Top-Level Makefile

            Makefile.libs – This is a list of directories containing C++ code getting built into different relocatable libraries (one per dir)

                  (sub-directory Makefiles)

            Mikefile.bins – This is a list of directories that link executables

                  (sub-directory Makefiles)

 

There are O(120) libraries to build, each of which ranges from 10-50 objects. There are about 30 executables to build.  What “dirsparallel=all” does is to turn on parallel make for Makefile.libs, Makefile.dirs, etc. The actual build is more complicated,  having codegen, unit test, lint, and packaging  activities, but it follows this simple model.

 

The defining characteristic here is a wide fan-out. There are a large number of relatively simple and brief makefiles. There will therefore be a few “middle management” makefiles that are waiting on children to complete (I imagine each of these occupies a make job??).

 

I have confirmed that my count of “g++” invocations is exactly the same for the two runs (4403)

 

The TMPDIR variable was not set. I set it to /tmp to insure that it is on a physical device. (The build and our home directories are NFS mounted) The total elapsed time for the “-Otarget” run was then 6 hours. This is enough of a reduction to be a little suspicious, but not much more.

 

I ran with strace for both the “none” case and the “target” case.  The summaries are attached. Unfortunately, I don’t think there is a smoking gun here.

 

I have also run the “-Otrace” case at a few different job-levels:

 

j8 – 6:34

j32 – 6:00

j256 – 9:49

 

From: David Boyce [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 5:03 PM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]; [hidden email]


Subject: Re: Parallel builds across makefiles

 

A couple of suggestions:

 

1. Check your value of TMPDIR if any. All -O is doing is redirecting output into a temp file and dumping it later. Effectively it turns command "foo" into "foo > $TMPDIR/blah; cat $TMPDIR/blah; unlink $TMPDIR/blah". This is why it seems almost impossible for it to slow things down appreciably, and a slow temp dir device might be one explanation. Along similar lines you could try "TMPDIR=/tmp make ..."

 

2. Try strace. It has a mode which will timestamp each system call and another which prints a table of system calls used and how much time it took. One of these will probably be instructive, especially when compared with the same without -O.

 

David

 

On Sun, Jul 17, 2016 at 12:22 PM, Gardell, Steven <[hidden email]> wrote:

No I have not gotten to 4.2.1 yet.

I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 3:14 PM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version.
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please provide more details.  What level of -j are you using?  Which sync mode are you using?  Do you get the same behavior even with lower levels of -j, if yours is very large?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make

 

 


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

David Boyce-3
Sorry, I misread the chart. But I do think -F would be important as well given that make employs vfork.

David

On Mon, Jul 25, 2016 at 11:22 AM, Gardell, Steven <[hidden email]> wrote:

I ran with -cf. By my read there are millions of open() calls here:

 

gmake-4.2.1: *** [Makefile:333: release] Error 1

% time     seconds  usecs/call     calls    errors syscall

------ ----------- ----------- --------- --------- ----------------

93.92 48988.895003      281029    174320     42047 wait4

  3.56 1857.549713        8133    228409     47431 futex

  1.12  582.114172          10  59230554  48886748 stat

  0.81  421.206442          11  37319045  29591972 open

  0.40  209.945060           3  71120290   3458259 lstat

  0.06   33.636522           2  13616492      4796 read

  0.02   11.818771           1   8065605     14388 close

 

And:

 

% time     seconds  usecs/call     calls    errors syscall

------ ----------- ----------- --------- --------- ----------------

93.82 53268.050950      302081    176337     41624 wait4

  3.18 1805.057765        7853    229844     48664 futex

  1.06  599.730842           8  71119946   3458260 lstat

  0.94  532.243350          14  37295656  29590966 open

  0.73  416.897838           7  59083695  48774860 stat

  0.09   50.603194           4  13780850      4946 read

  0.04   21.309529           3   8034030     14388 close

 

 

I am happy to run with any other settings that anyone  might suggest since I have the environment that is demonstrating the behavior.

 

From: David Boyce [mailto:[hidden email]]
Sent: Monday, July 25, 2016 11:54 AM


To: Gardell, Steven <[hidden email]>
Cc: [hidden email]; [hidden email]
Subject: Re: Parallel builds across makefiles

 

Given the grand total of 11 open() system calls, I'm guessing you used strace without the -f and -F options (noting that ISTR 4.2.1 reverts to use of vfork)?

 

BTW timings on linux are more elegantly handled by prefixing the "time" builtin. Wallclock-only can be gotten by forcing use of /usr/bin/time:

 

% type -a time

time is a shell keyword

time is /usr/bin/time

 

% time sleep 3

 

real      0m3.022s

user      0m0.001s

sys       0m0.001s

 

% \time -f %E sleep 3

0:03.00

 

David

 

On Fri, Jul 22, 2016 at 2:19 AM, Gardell, Steven <[hidden email]> wrote:

Thanks for the suggestions. I have captured a bunch more data and hence the delay. First I have started by reproducing the behavior with 4.2.1:

 

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Otarget dirsparallel=all release >& gmake421dpa32.log; date

Tue Jul 19 11:24:47 EDT 2016

Tue Jul 19 20:43:17 EDT 2016

gmake realclean >& cleanlog.txt

/usr/bin/gmake-4.2.1 -j32 -Onone dirsparallel=all release > & gmake421dpn32.log ; date

Tue Jul 19 21:01:48 EDT 2016

Tue Jul 19 22:05:40 EDT 2016

 

Essentially the same as 4.1. This –Otarget run is not just “a little slower,” it is pathologically slow. The 9 hour build time is slower than if I had run the

build with no parallelization whatsoever. (The –Onone time is a bit faster than what I see doing parallelization at just the leaf level)

 

Our build is a relatively straight-forward “choreographed” build as followed:

 

     Top-Level Makefile

            Makefile.libs – This is a list of directories containing C++ code getting built into different relocatable libraries (one per dir)

                  (sub-directory Makefiles)

            Mikefile.bins – This is a list of directories that link executables

                  (sub-directory Makefiles)

 

There are O(120) libraries to build, each of which ranges from 10-50 objects. There are about 30 executables to build.  What “dirsparallel=all” does is to turn on parallel make for Makefile.libs, Makefile.dirs, etc. The actual build is more complicated,  having codegen, unit test, lint, and packaging  activities, but it follows this simple model.

 

The defining characteristic here is a wide fan-out. There are a large number of relatively simple and brief makefiles. There will therefore be a few “middle management” makefiles that are waiting on children to complete (I imagine each of these occupies a make job??).

 

I have confirmed that my count of “g++” invocations is exactly the same for the two runs (4403)

 

The TMPDIR variable was not set. I set it to /tmp to insure that it is on a physical device. (The build and our home directories are NFS mounted) The total elapsed time for the “-Otarget” run was then 6 hours. This is enough of a reduction to be a little suspicious, but not much more.

 

I ran with strace for both the “none” case and the “target” case.  The summaries are attached. Unfortunately, I don’t think there is a smoking gun here.

 

I have also run the “-Otrace” case at a few different job-levels:

 

j8 – 6:34

j32 – 6:00

j256 – 9:49

 

From: David Boyce [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 5:03 PM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]; [hidden email]


Subject: Re: Parallel builds across makefiles

 

A couple of suggestions:

 

1. Check your value of TMPDIR if any. All -O is doing is redirecting output into a temp file and dumping it later. Effectively it turns command "foo" into "foo > $TMPDIR/blah; cat $TMPDIR/blah; unlink $TMPDIR/blah". This is why it seems almost impossible for it to slow things down appreciably, and a slow temp dir device might be one explanation. Along similar lines you could try "TMPDIR=/tmp make ..."

 

2. Try strace. It has a mode which will timestamp each system call and another which prints a table of system calls used and how much time it took. One of these will probably be instructive, especially when compared with the same without -O.

 

David

 

On Sun, Jul 17, 2016 at 12:22 PM, Gardell, Steven <[hidden email]> wrote:

No I have not gotten to 4.2.1 yet.

I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them.

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Sunday, July 17, 2016 3:14 PM
To: Gardell, Steven <[hidden email]>; [hidden email]
Subject: Re: Parallel builds across makefiles

On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote:
> OK. Thanks! I will try the latest version.
>
> FWIW, I am measuring is total wall clock time to complete the build.
> (date; gmake... ; date) This goes up substantially with all sync modes
> other than "none" if gmake has parallel invocations of sub-makes going
> (e.g. "gmake  -f SomeOtherMakefile").  When I am only parallelizing
> individual makefiles (with ACE, these tend to be simple lists of files
> to compile), then the behavior is as promised by the documentation.

Assuming you determine the behavior exists even in version 4.2.1, please provide more details.  What level of -j are you using?  Which sync mode are you using?  Do you get the same behavior even with lower levels of -j, if yours is very large?

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make

 

 



_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

Brian Vandenberg
In reply to this post by sgardell
I haven't seen similar issues but I have a hypothesis: make is single-threaded and therefore it consumes output from jobs in the same thread it uses to reap/spawn new jobs.  If make is spending a large enough amount of time consuming output then this will impacts the rate at which it can spawn/reap new jobs.

I ran some tests to see when make attempts to consume output from child processes and here's what I learned:

(a) For jobs executing within a single instance of make (not recursive) it doesn't appear to begin consuming text until a recipe completes (I only tested this with --output-sync=target; I expect similar to be true of other sync modes excluding none).
(b) If you use --output-sync=target and a recipe contains the string ${MAKE} at all -- even if it doesn't actually start a recursive make instance -- the way it consumes output is different (more on that below)
(c) make will not stop reading job output until there's nothing left to consume.  During that time it cannot reap/spawn new jobs.  If your jobs produce a large enough amount of text this could create enough small delays that aggregate into the noticeable delays you're seeing.

As an example to demonstrate (b):

.RECIPEPREFIX := <
SHELL := bash
OUTER := $(shell echo -n {0..15})
INNER := $(shell echo -n {16..32})
all: outer
outer: ${OUTER}
inner: ${INNER}

${OUTER}:
<@${MAKE} -f /tmp/makefile inner

${INNER}:
<@perl -e 'my $$tmp = "${MAKE}"; map { printf "x\n"; sleep 1 } (0..100)'


I executed with (make 4.1):

make -j64 --output-sync=target

... and it begins printing output almost immediately in spite of the fact none of the recipes could have finished.  If you take ${MAKE} out of the recipe then it behaves as expected.

Now consider what I said w/regard to (c): it will only move on to spawning/reaping other jobs (or consuming other inputs) if there's nothing to read for the job it's currently consuming text from.

When I use --output-sync=recurse I don't see the same behavior, but if your build recurses more than 1 level then perhaps each sub-make attempts to consume output from its recursive make calls, possibly eating away at time it could spend spawning/reaping jobs.

If I'm right this could be completely avoided for those of us who would rather have log files instead of having everything dumped to the terminal.  If the parent make still needs to consume the output text, though, then the limiting factor is: make is single-threaded and consumes input from recursive jobs in the same thread it spawns/reaps new jobs.

-brian

_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

RE: Parallel builds across makefiles

sgardell

Thanks for the insights and apologies for the delayed response. My measurements seem to bear out the inference about Orecurse. These were done in succession on the same HP G9 server with 28 physical cores (56 hyper-threads) and TMDIR set to /tmp:

 

Build with leaf makefiles at j=32, and parallel suppressed in higher level Makefiles – 1:37 (hours:minutes)

Build with low level makefiles at j=32, parallel enabled for the “list of makefiles” makefiles and -Orecurse – 1:11

Build with low level makefiles at j=32, parallel enabled for the “list of makefiles” makefiles and -Onone– 1:04

Build with low level makefiles at j=32, parallel enabled for the “list of makefiles” makefiles and -Otarget – 8:55

Build with j=1 – 4:45

 

These are wall clock times on a shared machine, so certainly subject to some modest variation; it is likely that the Onone and Orecurse times are effectively the same.

 

Somewhat naively, I guess, the most surprising thing here is the -Otarget time. I would not be awfully surprised to see it asymptotically approach the no-parallel-building time. But to see it repeatably much longer just seems a bit bizarre.

 

From: Brian Vandenberg [mailto:[hidden email]]
Sent: Wednesday, July 27, 2016 4:59 PM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]
Subject: Re: Parallel builds across makefiles

 

I haven't seen similar issues but I have a hypothesis: make is single-threaded and therefore it consumes output from jobs in the same thread it uses to reap/spawn new jobs.  If make is spending a large enough amount of time consuming output then this will impacts the rate at which it can spawn/reap new jobs.

 

I ran some tests to see when make attempts to consume output from child processes and here's what I learned:

 

(a) For jobs executing within a single instance of make (not recursive) it doesn't appear to begin consuming text until a recipe completes (I only tested this with --output-sync=target; I expect similar to be true of other sync modes excluding none).

(b) If you use --output-sync=target and a recipe contains the string ${MAKE} at all -- even if it doesn't actually start a recursive make instance -- the way it consumes output is different (more on that below)

(c) make will not stop reading job output until there's nothing left to consume.  During that time it cannot reap/spawn new jobs.  If your jobs produce a large enough amount of text this could create enough small delays that aggregate into the noticeable delays you're seeing.

 

As an example to demonstrate (b):

 

.RECIPEPREFIX := <

SHELL := bash

OUTER := $(shell echo -n {0..15})

INNER := $(shell echo -n {16..32})

all: outer

outer: ${OUTER}

inner: ${INNER}

 

${OUTER}:

<@${MAKE} -f /tmp/makefile inner

 

${INNER}:

<@perl -e 'my $$tmp = "${MAKE}"; map { printf "x\n"; sleep 1 } (0..100)'

 

 

I executed with (make 4.1):

 

make -j64 --output-sync=target

 

... and it begins printing output almost immediately in spite of the fact none of the recipes could have finished.  If you take ${MAKE} out of the recipe then it behaves as expected.

 

Now consider what I said w/regard to (c): it will only move on to spawning/reaping other jobs (or consuming other inputs) if there's nothing to read for the job it's currently consuming text from.

 

When I use --output-sync=recurse I don't see the same behavior, but if your build recurses more than 1 level then perhaps each sub-make attempts to consume output from its recursive make calls, possibly eating away at time it could spend spawning/reaping jobs.

 

If I'm right this could be completely avoided for those of us who would rather have log files instead of having everything dumped to the terminal.  If the parent make still needs to consume the output text, though, then the limiting factor is: make is single-threaded and consumes input from recursive jobs in the same thread it spawns/reaps new jobs.

 

-brian


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|

Re: Parallel builds across makefiles

David Boyce-3
I still think there must be an answer at the system-call level, i.e. using strace. It seems certain to me that some system call is taking much longer and that finding it would point to the solution. The fact that strace hasn't shown it yet implies that it's losing the thread at some point. Maybe try something like "SHELL := strace ... $(SHELL)", or find some other creative way to force it in at the leaves? The problem has to be visible at the system call layer.

Alternatively, have you tried building make 4.2.1 with custom instrumentation added?

David

On Sat, Jul 30, 2016 at 11:13 AM, Gardell, Steven <[hidden email]> wrote:

Thanks for the insights and apologies for the delayed response. My measurements seem to bear out the inference about Orecurse. These were done in succession on the same HP G9 server with 28 physical cores (56 hyper-threads) and TMDIR set to /tmp:

 

Build with leaf makefiles at j=32, and parallel suppressed in higher level Makefiles – 1:37 (hours:minutes)

Build with low level makefiles at j=32, parallel enabled for the “list of makefiles” makefiles and -Orecurse – 1:11

Build with low level makefiles at j=32, parallel enabled for the “list of makefiles” makefiles and -Onone– 1:04

Build with low level makefiles at j=32, parallel enabled for the “list of makefiles” makefiles and -Otarget – 8:55

Build with j=1 – 4:45

 

These are wall clock times on a shared machine, so certainly subject to some modest variation; it is likely that the Onone and Orecurse times are effectively the same.

 

Somewhat naively, I guess, the most surprising thing here is the -Otarget time. I would not be awfully surprised to see it asymptotically approach the no-parallel-building time. But to see it repeatably much longer just seems a bit bizarre.

 

From: Brian Vandenberg [mailto:[hidden email]]
Sent: Wednesday, July 27, 2016 4:59 PM
To: Gardell, Steven <[hidden email]>
Cc: [hidden email]
Subject: Re: Parallel builds across makefiles

 

I haven't seen similar issues but I have a hypothesis: make is single-threaded and therefore it consumes output from jobs in the same thread it uses to reap/spawn new jobs.  If make is spending a large enough amount of time consuming output then this will impacts the rate at which it can spawn/reap new jobs.

 

I ran some tests to see when make attempts to consume output from child processes and here's what I learned:

 

(a) For jobs executing within a single instance of make (not recursive) it doesn't appear to begin consuming text until a recipe completes (I only tested this with --output-sync=target; I expect similar to be true of other sync modes excluding none).

(b) If you use --output-sync=target and a recipe contains the string ${MAKE} at all -- even if it doesn't actually start a recursive make instance -- the way it consumes output is different (more on that below)

(c) make will not stop reading job output until there's nothing left to consume.  During that time it cannot reap/spawn new jobs.  If your jobs produce a large enough amount of text this could create enough small delays that aggregate into the noticeable delays you're seeing.

 

As an example to demonstrate (b):

 

.RECIPEPREFIX := <

SHELL := bash

OUTER := $(shell echo -n {0..15})

INNER := $(shell echo -n {16..32})

all: outer

outer: ${OUTER}

inner: ${INNER}

 

${OUTER}:

<@${MAKE} -f /tmp/makefile inner

 

${INNER}:

<@perl -e 'my $$tmp = "${MAKE}"; map { printf "x\n"; sleep 1 } (0..100)'

 

 

I executed with (make 4.1):

 

make -j64 --output-sync=target

 

... and it begins printing output almost immediately in spite of the fact none of the recipes could have finished.  If you take ${MAKE} out of the recipe then it behaves as expected.

 

Now consider what I said w/regard to (c): it will only move on to spawning/reaping other jobs (or consuming other inputs) if there's nothing to read for the job it's currently consuming text from.

 

When I use --output-sync=recurse I don't see the same behavior, but if your build recurses more than 1 level then perhaps each sub-make attempts to consume output from its recursive make calls, possibly eating away at time it could spend spawning/reaping jobs.

 

If I'm right this could be completely avoided for those of us who would rather have log files instead of having everything dumped to the terminal.  If the parent make still needs to consume the output text, though, then the limiting factor is: make is single-threaded and consumes input from recursive jobs in the same thread it spawns/reaps new jobs.

 

-brian


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make



_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make