[PATCH] Windows jobserver updates for GNU make 4.1

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Windows jobserver updates for GNU make 4.1

Troy Runkel

This patch expands the maximum number of simultaneous jobs on Windows from 63 (limit dictated by Windows MAXIMUM_WAIT_OBJECTS) to 4095.

 

It also fixes a situation where GNU make on Windows could deadlock and/or suffer memory corruption issues if –j or –j63 was used.  The problem was due to the way that $(shell …) commands are handled.

 

--Troy Runkel

 


_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32

windows_jobserver_update.diff (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Paul Smith-20
On Wed, 2016-02-24 at 17:27 +0000, Troy Runkel wrote:
> This patch expands the maximum number of simultaneous jobs on Windows
> from 63 (limit dictated by Windows MAXIMUM_WAIT_OBJECTS) to 4095.
>  
> It also fixes a situation where GNU make on Windows could deadlock
> and/or suffer memory corruption issues if –j or –j63 was used.  The
> problem was due to the way that $(shell …) commands are handled.

Thanks Troy this looks like good stuff.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
> From: Paul Smith <[hidden email]>
> Date: Wed, 24 Feb 2016 20:25:43 -0500
>
> On Wed, 2016-02-24 at 17:27 +0000, Troy Runkel wrote:
> > This patch expands the maximum number of simultaneous jobs on Windows
> > from 63 (limit dictated by Windows MAXIMUM_WAIT_OBJECTS) to 4095.
> >  
> > It also fixes a situation where GNU make on Windows could deadlock
> > and/or suffer memory corruption issues if –j or –j63 was used.  The
> > problem was due to the way that $(shell …) commands are handled.
>
> Thanks Troy this looks like good stuff.

Thanks, I will install this in a couple of days, after a more careful
reading of the patch.  I saw a busy-wait loop there, which I'm not
sure I like.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
In reply to this post by Troy Runkel
> From: Troy Runkel <[hidden email]>
> Date: Wed, 24 Feb 2016 17:27:57 +0000
> Cc: Troy Runkel <[hidden email]>
>
> This patch expands the maximum number of simultaneous jobs on Windows from 63 (limit dictated by
> Windows MAXIMUM_WAIT_OBJECTS) to 4095.
>
> It also fixes a situation where GNU make on Windows could deadlock and/or suffer memory corruption issues
> if –j or –j63 was used. The problem was due to the way that $(shell …) commands are handled.

Thanks.

I wonder how hard would it be to remove the limitation altogether?
If we are lifting the limitation, why not lift it entirely?

Another comment is about the busy-wait loop (with 10-msec sleep) that
this change uses -- it looks like, when there are 4095 jobs running,
we will be re-checking each process once every 640 msec, is that
right?  That might be too large a delay, I think.  Did you consider a
design that uses a separate thread for watching up to 64 processes?  I
think such a design might scale up better; in particular, the response
time for checking the status of each job will not decrease as the
number of jobs goes up.

> +DWORD GmakeWaitForMultipleObjects(
> +  DWORD nCount,
> +  const HANDLE *lpHandles,
> +  BOOL bWaitAll,
> +  DWORD dwMilliseconds
> +)
> +{
> +  assert(nCount <= GMAKE_MAXIMUM_WAIT_OBJECTS);
> +
> +  if (nCount <= MAXIMUM_WAIT_OBJECTS) {
> +    DWORD retVal =  WaitForMultipleObjects(nCount, lpHandles, bWaitAll, dwMilliseconds);
> +    return (retVal == WAIT_TIMEOUT) ? GMAKE_WAIT_TIMEOUT : retVal;

This GMAKE_WAIT_TIMEOUT and GMAKE_WAIT_ABANDONED_0 stuff looks like a
kludge to me; can we get rid of it?

> +        switch (retVal) {
> +          case WAIT_TIMEOUT:
> +            retVal = GMAKE_WAIT_TIMEOUT;
> +            continue;
> +            break;
> +          case WAIT_FAILED:
> +            fprintf(stderr,"WaitForMultipleOjbects failed waiting with error %d\n", GetLastError());
> +            break;

I guess this fprintf should go away, or be converted to a DB call?

Thank you for working on this.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH] Windows jobserver updates for GNU make 4.1

Troy Runkel
Hi Eli,

The changes for this patch were created by another developer where I work.  I've forwarded your questions to him and included his answers below.  Thanks!

--Troy Runkel

-----Original Message-----
From: Eli Zaretskii [mailto:[hidden email]]
Sent: Saturday, February 27, 2016 4:01 AM
To: Troy Runkel <[hidden email]>
Cc: [hidden email]
Subject: Re: [PATCH] Windows jobserver updates for GNU make 4.1

> From: Troy Runkel <[hidden email]>
> Date: Wed, 24 Feb 2016 17:27:57 +0000
> Cc: Troy Runkel <[hidden email]>
>
> This patch expands the maximum number of simultaneous jobs on Windows
> from 63 (limit dictated by Windows MAXIMUM_WAIT_OBJECTS) to 4095.
>
> It also fixes a situation where GNU make on Windows could deadlock
> and/or suffer memory corruption issues if –j or –j63 was used. The problem was due to the way that $(shell …) commands are handled.

Thanks.

I wonder how hard would it be to remove the limitation altogether?
If we are lifting the limitation, why not lift it entirely?

        ANSWER:  Simplicity and reliability.  To our knowledge, no one has complained about the
        previous limit of 63 until we bumped into it while using IncrediBuild to distribute jobs across
        multiple machines.   A larger number could certainly be used or a dynamic value, but
        neither seemed appropriate in our engineering judgement.

Another comment is about the busy-wait loop (with 10-msec sleep) that this change uses -- it looks like, when there are 4095 jobs running, we will be re-checking each process once every 640 msec, is that right?

        ANSWER: No, the check is constant time.  The 10ms sleep only occurs when no processes
        are ready and gmake would have otherwise been blocked (note:  the sleep is outside the
        inner for-loop.  It looks like the file was formatted with tabs so brace alignment might be
        confusing depending on what diff tool you are using.

That might be too large a delay, I think.  Did you consider a design that uses a separate thread for watching up to 64 processes?
  I think such a design might scale up better; in particular, the response time for checking the status of each job will not decrease as the number of jobs goes up.

        ANSWER: Yes, that was our first approach, but the cost of creating and managing the threads vastly
        exceeded the execution time of the current approach.  More specifically, when waiting on
        threads, one is typically waiting for them to die and when one dies, you have all the others
        to worry about rounding up (i.e., they need to be killed or stopped).  It becomes hairy
        quite fast, and has all sorts of subtleties like race conditions, etc.

        Also, note that we were unable to measure the overhead of the current approach and it has been
        working reliably in a large production environment for 8+ months.

> +DWORD GmakeWaitForMultipleObjects(
> +  DWORD nCount,
> +  const HANDLE *lpHandles,
> +  BOOL bWaitAll,
> +  DWORD dwMilliseconds
> +)
> +{
> +  assert(nCount <= GMAKE_MAXIMUM_WAIT_OBJECTS);
> +
> +  if (nCount <= MAXIMUM_WAIT_OBJECTS) {
> +    DWORD retVal =  WaitForMultipleObjects(nCount, lpHandles, bWaitAll, dwMilliseconds);
> +    return (retVal == WAIT_TIMEOUT) ? GMAKE_WAIT_TIMEOUT : retVal;

This GMAKE_WAIT_TIMEOUT and GMAKE_WAIT_ABANDONED_0 stuff looks like a kludge to me; can we get rid of it?

        ANSWER:  No. These are to compensate for a short-sighted design decision of Microsoft's part to
        use values for error conditions that are now in the middle of our legitimate return range, thus
        we need to move the error conditions outside this now legal region.  Keeping the error conditions
        the same and mapping the legitimate values around them, while probably possible, would
        likely require more code and greater chance of confusion and errors.

> +        switch (retVal) {
> +          case WAIT_TIMEOUT:
> +            retVal = GMAKE_WAIT_TIMEOUT;
> +            continue;
> +            break;
> +          case WAIT_FAILED:
> +            fprintf(stderr,"WaitForMultipleOjbects failed waiting with error %d\n", GetLastError());
> +            break;

I guess this fprintf should go away, or be converted to a DB call?

        ANSWER: We have never seen this error in practice (that we are aware of), but we wanted to
        cover all our bases.   Converting it to an alternate form is probably fine.  Thanks.

Thank you for working on this.
_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
> From: Troy Runkel <[hidden email]>
> CC: "[hidden email]" <[hidden email]>
> Date: Tue, 1 Mar 2016 12:21:38 +0000
>
> The changes for this patch were created by another developer where I work.  I've forwarded your questions to him and included his answers below.  Thanks!

Thanks.  But if changes were not written by you, how can we accept
them, without knowing the name of that person and verifying we have a
copyright assignment from him on file?  The changes are substantial
enough to require legal papers.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Paul Smith-20
On Tue, 2016-03-01 at 18:55 +0200, Eli Zaretskii wrote:

> > From: Troy Runkel <[hidden email]>
> > CC: "[hidden email]" <[hidden email]>
> > Date: Tue, 1 Mar 2016 12:21:38 +0000
> >
> > The changes for this patch were created by another developer where
> I work.  I've forwarded your questions to him and included his
> answers below.  Thanks!
>
> Thanks.  But if changes were not written by you, how can we accept
> them, without knowing the name of that person and verifying we have a
> copyright assignment from him on file?  The changes are substantial
> enough to require legal papers.

I don't know if this is good or bad (although I can guess :)), but I'm
about to push changes (probably tonight) which extract all aspects of
the jobserver feature into platform-specific files.  So, this patch
will no longer apply at all... but hopefully the changes needed for
this feature will be much simpler to implement (in any event, they
should be cleaner to implement).

The main impetus for this is a change I want to make to the POSIX
implementation to allow us to use pselect() where available, which will
get rid of the SO_RESTART/SIGCHLD hack and increase reliability
(looking at https://savannah.gnu.org/bugs/index.php?46261 ).  I just
couldn't face another round of ever-more #ifdef'd code.

The POSIX jobserver code is now in posixos.c and the Windows code is
now in w32/w32os.c.  There is a single "os.h" header file that defines
the interface both support.

I've also added stub functions to vmsfunctions.c.  I need to look at
the AmigaOS support to see what needs to be done there.  OS2 is using
mainly the same interface as POSIX, weirdly enough.  I did a bit of
work to pull those changes into posixos.c.

I have built on Linux and will test on OSX.

I have built on Windows using both Visual Studio 14 (2015) and MinGW
GCC 4.9.3, with the build_w32.bat file (which I have completely
rewritten... hope that's OK).  I edited, but did not try to use, the
vcproj file to add the new file.

I am not able to test on VMS, AmigaOS, OS2, or DOS.

I'm not sure what to do about the plethora of other ways we can build
for Windows.  Do we still have any use for dosbuild.bat for example?
 It seems to invoke gcc... does that mean it's superseded by the
build_w32.bat support for GCC builds?

I assume we still support NMakefile?

It's a little frustrating how many different methods we have to
support.  I wonder if we can somehow condense this... hmm.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
> From: Paul Smith <[hidden email]>
> Cc: [hidden email]
> Date: Mon, 07 Mar 2016 17:11:25 -0500
>
> I'm not sure what to do about the plethora of other ways we can build
> for Windows.  Do we still have any use for dosbuild.bat for example?
>  It seems to invoke gcc... does that mean it's superseded by the
> build_w32.bat support for GCC builds?

dosbuild.bat is for DOS, not for Windows.

> I assume we still support NMakefile?

No, not really.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Paul Smith-20
On Tue, 2016-03-08 at 05:30 +0200, Eli Zaretskii wrote:
> > I'm not sure what to do about the plethora of other ways we can build
> > for Windows.  Do we still have any use for dosbuild.bat for example?
> >  It seems to invoke gcc... does that mean it's superseded by the
> > build_w32.bat support for GCC builds?
>
> dosbuild.bat is for DOS, not for Windows.

I know, but when I look at the content of that batch file I don't see
the difference in the commands invoked between the build_w32.bat GCC and
dosbuild.bat.  They both seem to invoke GCC, with more or less the same
command line and source files.

Maybe I just missed it.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH] Windows jobserver updates for GNU make 4.1

Troy Runkel
In reply to this post by Paul Smith-20
Interesting.  Sounds like a great update.  I look forward to taking a look at it.

So, if I were to personally re-implement our Windows jobserver patch on top of your changes and then submit a new patch, would you be able to accept the patch using my already existing copyright assignment?

Thanks!

--Troy Runkel

-----Original Message-----
From: Paul Smith [mailto:[hidden email]]
Sent: Monday, March 07, 2016 5:11 PM
To: Eli Zaretskii <[hidden email]>; Troy Runkel <[hidden email]>
Cc: [hidden email]
Subject: Re: [PATCH] Windows jobserver updates for GNU make 4.1

On Tue, 2016-03-01 at 18:55 +0200, Eli Zaretskii wrote:

> > From: Troy Runkel <[hidden email]>
> > CC: "[hidden email]" <[hidden email]>
> > Date: Tue, 1 Mar 2016 12:21:38 +0000
> >
> > The changes for this patch were created by another developer where
> I work.  I've forwarded your questions to him and included his answers
> below.  Thanks!
>
> Thanks.  But if changes were not written by you, how can we accept
> them, without knowing the name of that person and verifying we have a
> copyright assignment from him on file?  The changes are substantial
> enough to require legal papers.

I don't know if this is good or bad (although I can guess :)), but I'm about to push changes (probably tonight) which extract all aspects of the jobserver feature into platform-specific files.  So, this patch will no longer apply at all... but hopefully the changes needed for this feature will be much simpler to implement (in any event, they should be cleaner to implement).

The main impetus for this is a change I want to make to the POSIX implementation to allow us to use pselect() where available, which will get rid of the SO_RESTART/SIGCHLD hack and increase reliability (looking at https://savannah.gnu.org/bugs/index.php?46261 ).  I just couldn't face another round of ever-more #ifdef'd code.

The POSIX jobserver code is now in posixos.c and the Windows code is now in w32/w32os.c.  There is a single "os.h" header file that defines the interface both support.

I've also added stub functions to vmsfunctions.c.  I need to look at the AmigaOS support to see what needs to be done there.  OS2 is using mainly the same interface as POSIX, weirdly enough.  I did a bit of work to pull those changes into posixos.c.

I have built on Linux and will test on OSX.

I have built on Windows using both Visual Studio 14 (2015) and MinGW GCC 4.9.3, with the build_w32.bat file (which I have completely rewritten... hope that's OK).  I edited, but did not try to use, the vcproj file to add the new file.

I am not able to test on VMS, AmigaOS, OS2, or DOS.

I'm not sure what to do about the plethora of other ways we can build for Windows.  Do we still have any use for dosbuild.bat for example?
 It seems to invoke gcc... does that mean it's superseded by the build_w32.bat support for GCC builds?

I assume we still support NMakefile?

It's a little frustrating how many different methods we have to support.  I wonder if we can somehow condense this... hmm.
_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
In reply to this post by Paul Smith-20
> From: Paul Smith <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Tue, 08 Mar 2016 00:18:58 -0500
>
> > dosbuild.bat is for DOS, not for Windows.
>
> I know, but when I look at the content of that batch file I don't see
> the difference in the commands invoked between the build_w32.bat GCC and
> dosbuild.bat.  They both seem to invoke GCC, with more or less the same
> command line and source files.

No, there are several important differences:

 . dosbuild.bat doesn't compile files under w32/
 . dosbuild.bat doesn't compile Guile support
 . dosbuild.bat doesn't compile load API
 . dosbuild.bat mentions DJGPP, which is the DOS GNU-based development
   environment


_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
In reply to this post by Troy Runkel
> From: Troy Runkel <[hidden email]>
> CC: "[hidden email]" <[hidden email]>,
>         Troy Runkel
> <[hidden email]>
> Date: Tue, 8 Mar 2016 13:54:26 +0000
>
> So, if I were to personally re-implement our Windows jobserver patch on top of your changes and then submit a new patch, would you be able to accept the patch using my already existing copyright assignment?

I see no problem with your assignment, Troy.

Thanks.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Paul Smith-20
In reply to this post by Eli Zaretskii
On Tue, 2016-03-08 at 18:04 +0200, Eli Zaretskii wrote:

> > I know, but when I look at the content of that batch file I don't see
> > the difference in the commands invoked between the build_w32.bat GCC and
> > dosbuild.bat.  They both seem to invoke GCC, with more or less the same
> > command line and source files.
>
> No, there are several important differences:
>
>  . dosbuild.bat doesn't compile files under w32/
>  . dosbuild.bat doesn't compile Guile support
>  . dosbuild.bat doesn't compile load API
>  . dosbuild.bat mentions DJGPP, which is the DOS GNU-based development
>    environment

OK.  I'll put down my editor and back away slowly, and let you all
decide if the various build alternatives can be consolidated...

Maybe I'll see if I can get a DJGPP install just to verify things
compile and run still; can you do on Windows or should I get a
virtualbox running MS-DOS? :)

Last night I got the new pselect() version of jobserver working and was
able to compile glibc parallel with and without load restrictions. One
regression test fails.  I should be able resolve that tonight.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
> From: Paul Smith <[hidden email]>
> Cc: [hidden email], [hidden email]
> Date: Tue, 08 Mar 2016 16:11:55 -0500
>
> >  . dosbuild.bat doesn't compile files under w32/
> >  . dosbuild.bat doesn't compile Guile support
> >  . dosbuild.bat doesn't compile load API
> >  . dosbuild.bat mentions DJGPP, which is the DOS GNU-based development
> >    environment
>
> OK.  I'll put down my editor and back away slowly, and let you all
> decide if the various build alternatives can be consolidated...
>
> Maybe I'll see if I can get a DJGPP install just to verify things
> compile and run still; can you do on Windows or should I get a
> virtualbox running MS-DOS? :)

Yes, you can do that on Windows, if you install DJGPP.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH] Windows jobserver updates for GNU make 4.1

Troy Runkel
In reply to this post by Eli Zaretskii
I've contacted MathWorks legal and the originating developer regarding getting a copyright assignment in place for these changes.  Whom should I contact on the FSF side regarding the copyright assignment?

Thanks!

Troy Runkel

-----Original Message-----
From: Eli Zaretskii [mailto:[hidden email]]
Sent: Tuesday, March 01, 2016 11:56 AM
To: Troy Runkel <[hidden email]>
Cc: [hidden email]
Subject: Re: [PATCH] Windows jobserver updates for GNU make 4.1

> From: Troy Runkel <[hidden email]>
> CC: "[hidden email]" <[hidden email]>
> Date: Tue, 1 Mar 2016 12:21:38 +0000
>
> The changes for this patch were created by another developer where I work.  I've forwarded your questions to him and included his answers below.  Thanks!

Thanks.  But if changes were not written by you, how can we accept them, without knowing the name of that person and verifying we have a copyright assignment from him on file?  The changes are substantial enough to require legal papers.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Windows jobserver updates for GNU make 4.1

Eli Zaretskii
> From: Troy Runkel <[hidden email]>
> CC: "[hidden email]" <[hidden email]>,
>         Troy Runkel
> <[hidden email]>
> Date: Fri, 8 Apr 2016 12:19:27 +0000
>
> I've contacted MathWorks legal and the originating developer regarding getting a copyright assignment in place for these changes.  Whom should I contact on the FSF side regarding the copyright assignment?

I can send you the assignment request form, if you want.  Then you
should act according to the instructions there.

_______________________________________________
Make-w32 mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/make-w32