[bug #51237] Deadlock in Ctrl-C handler on Windows

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
URL:
  <http://savannah.gnu.org/bugs/?51237>

                 Summary: Deadlock in Ctrl-C handler on Windows
                 Project: make
            Submitted by: mbuilov
            Submitted on: Wed 14 Jun 2017 12:43:28 PM UTC
                Severity: 3 - Normal
              Item Group: Bug
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
       Component Version: 4.2.1
        Operating System: MS Windows
           Fixed Release: None
           Triage Status: None

    _______________________________________________________

Details:

Hello.

There is a bug in processing of Ctrl+C event under WINDOWS.

Gnu make handles Ctrl+C in fatal_error_signal() function, which suspends Main
thread.

But, if Main thread get suspended while it locks resources, where is a
possibility for dead-lock.

It's easy to reproduce dead-lock with this makefile:

# run make -j -O, then press Ctrl+C
all: a b c d
CMD := cmd.exe /c "echo off & for /l %a in (0,1,1000) do echo %a"
a:;$(CMD)
b:;$(CMD)
c:;$(CMD)
d:;$(CMD)
.PHONY: all a b c d

(backtrace1 attached)


Even without job-server and output synchronization, it is possible to
dead-lock make or put it to infinite loop:

# just run make without options, then press Ctrl+C
# (hard to reproduce dead-lock)
GOALS := a b c d e f g h i j k l m n o p q r s t u v w x y z
GOALS += $(GOALS:=1)
GOALS += $(GOALS:=2)
GOALS += $(GOALS:=3)
all: $(GOALS)
$(GOALS):;cmd.exe /c "echo off & echo 1 > NUL"
.PHONY: all $(GOALS)

(backtrace2 attached)


One of possible solutions to this dead-lock problem - instead of suspending
Main thread, Ctrl+C handler thread may notify Main thread to stop and wait
until it releases resources and goes to sleep.

Here is the patch:

https://github.com/mbuilov/gnumake-windows/blob/master/make-4.2.1-win32-ctrl-c.patch




    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Wed 14 Jun 2017 12:43:28 PM UTC  Name: backtrace1  Size: 1kB   By:
mbuilov
backtraces of dead-locked Main and Ctrl+C threads
<http://savannah.gnu.org/bugs/download.php?file_id=40917>
-------------------------------------------------------
Date: Wed 14 Jun 2017 12:43:28 PM UTC  Name: backtrace2  Size: 2kB   By:
mbuilov
backtraces of dead-locked Main and Ctrl+C threads
<http://savannah.gnu.org/bugs/download.php?file_id=40918>

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #1, bug #51237 (project make):

Thanks for the analysis and the patch.
Your patch is too large to accept without legal paperwork.  Would you be
willing to assign copyright for your changes to the FSF, so we could accept
your contribution?  If so, I can send you the legal form and instructions to
fill and submit it.

Also, I wonder if your implementation is the best one (or maybe I'm missing
something).  You've inserted a call to check_need_sleep into the main loop of
reap_children.  But if the 1st arg of reap_children is non-zero, Make will
block inside process_wait_for_any, and AFAIU signaling the event you added
will not interrupt that wait.  Wouldn't it be better to instead add the event
handle to the handles on which process_wait_for_any waits?  Then the reaction
to an interrupt should be much faster and more reliable, I think.  Or am I
missing something?


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #2, bug #51237 (project make):

>> ...Would you be willing to assign copyright for your changes to the FSF...

Yes, I will, no problem.

>> ...signaling the event you added will not interrupt that wait...

New event is not supposed to interrupt Main thread waiting for sub-processes.
Event is waited only by Ctrl+C handler thread and is signaled by Main thread
just before it goes to infinite sleep.

By default, CTRL+C signal is passed to all console processes that are attached
to the console.
So, normally, all console sub-processes spawned by make are stopped by
CTRL+C.

Only processes without console (GUI apps) or that have explicitly detached
console (via FreeConsole) are not stopped by Ctrl+C.
But that kind of processes are unlikely run by make, killing them is
questionable.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #3, bug #51237 (project make):

I have created test application, which detaches console and sleeps for 1000
seconds:

#include <windows.h>
int main(int argc, char *argv[]) {
    FreeConsole();
    Sleep(1000000);
    return 0;
}

And tested it with this makefile:

# run make -j -O, then press Ctrl+C
all: a b c d
CMD := C:\test.exe
a:;$(CMD)
b:;$(CMD)
c:;$(CMD)
d:;$(CMD)
.PHONY: all a b c d

I see next behaviors of original/pached make on Ctrl+C.

1) Original make-4.2.1.

- Main thread is suspended in WaitForMultipleObjects.
- Ctrl+C handler thread waits for one of 4 sub-processes in
WaitForMultipleObjects.

2) Patched make-4.2.1.

- Main thread waits for one of 4 sub-processes in WaitForMultipleObjects.
- Ctrl+C handler thread waits for Main thread in WaitForSingleObject.

Both makes hang for 1000 seconds.

...

For this test, it is possible to fix Ctrl+C handler in original make, so it
will kill sub-processes.

(replace SIGTERM with SIGINT in fatal_error_signal, because "The SIGILL and
SIGTERM signals are not generated under Windows. They are included for ANSI
compatibility").

And this is works, but only top child processes get killed:

If I use

CMD := cmd.exe /c "C:\test.exe"

Then by Ctrl+C, make kills only cmd.exe, and C:\test.exe continues to live.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #4, bug #51237 (project make):

I would prefer this to be done differently.  Sorry for the late conversation.

First, we have this same type of issue in UNIX (see for example bug #50557) so
I would prefer to not have a Windows-specific solution.

Second, I don't see why we need a separate thread to handle this.  Why can't
we just modify the signal handler to set a flag saying that a fatal signal was
received, then test that flag in appropriate places during the main processing
of make.  If the flag is set then we die.  Of course, finding the "appropriate
places" is the trick.

Maybe there's something about Windows that makes this less straightforward
than on POSIX systems?

Finally, this implementation adds a lot more ifdefs to the code.  I'm (more
slowly than I'd like) trying to remove ifdefs by moving system-specific code
into system-specific source files and creating generic interfaces.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #5, bug #51237 (project make):

> First, we have this same type of issue in UNIX (see for example bug #50557)
so I would prefer to not have a Windows-specific solution.

I'm not sure it's the same issue, although it maybe looks the same.  See
below.

> Second, I don't see why we need a separate thread to handle this. Why can't
we just modify the signal handler to set a flag saying that a fatal signal was
received, then test that flag in appropriate places during the main processing
of make. If the flag is set then we die. Of course, finding the "appropriate
places" is the trick.

Maybe I'm missing something.  When the user presses Ctrl-C, Make gets a
SIGINT.  This causes fatal_error_signal to be called, which then proceeds to
do stuff.  That "stuff" is done entirely in the signal handler, so how can
just setting a flag solve the issue?  Who will do all that "stuff" "during the
main processing"?

> Maybe there's something about Windows that makes this less straightforward
than on POSIX systems?

Yes, there is: on Windows, the SIGINT handler runs in a separate thread,
created by the system.  That is why the MS-Windows code of the signal handler
begins by suspending the main thread -- because this emulates more closely
what would happen on Posix hosts.

> Finally, this implementation adds a lot more ifdefs to the code.

That should be the least of our problems.  I actually intended to move all the
additional code to Windows-specific files.

But first we need to agree on the solution, so let's forget about ifdef's for
now.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #6, bug #51237 (project make):

It's not the exact same problem but it's caused by the same flaw in the make
code: doing lots of work in a signal handler.

What I was hoping to do is change the signal handler to do nothing more than
set a flag saying that the fatal signal was received, and move all the "stuff"
into a separate function that is not a signal handler.

There are lots of issues with having significant work done in a signal
handler.  Another bug I just fixed this past weekend is to correct
blocking/unblocking fatal signals, which we need to do only because the signal
handler is mucking about with the child lists.

Then it becomes the responsibility of the mainline code to check the fatal
flag at various times and call the "do fatal stuff" function when it sees the
flag is set.

The question of course is, what are the correct "various times".  We can
probably make some fairly good guesses: right before we do something that
might "take a long time".

I guess the real problem is the same one that led to lots of issues with the
jobserver: even if we check immediately before we  start to wait for child to
finish the signal could be received in the instant between when we check and
we drop into the wait, and we would have to wait for the child to exit before
we'd notice the fatal signal.  I'll have to think about that.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[bug #51237] Deadlock in Ctrl-C handler on Windows

Robert Morell
Follow-up Comment #7, bug #51237 (project make):

If you are going to refactor the code which handles SIGINT, then we should
return to this issue only after the new code is written.

The only Windows-specific aspect of your idea is that the signal handler will
still run in a separate thread on Windows, so we should be careful about
examining that flag, perhaps using atomic operations for that.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [bug #51237] Deadlock in Ctrl-C handler on Windows

David Boyce-3
In reply to this post by Robert Morell
> right before we do something that might "take a long time".

Or "right before we do something that could change disk state".

On Mon, Jul 10, 2017 at 11:29 AM, Paul D. Smith <[hidden email]> wrote:
Follow-up Comment #6, bug #51237 (project make):

It's not the exact same problem but it's caused by the same flaw in the make
code: doing lots of work in a signal handler.

What I was hoping to do is change the signal handler to do nothing more than
set a flag saying that the fatal signal was received, and move all the "stuff"
into a separate function that is not a signal handler.

There are lots of issues with having significant work done in a signal
handler.  Another bug I just fixed this past weekend is to correct
blocking/unblocking fatal signals, which we need to do only because the signal
handler is mucking about with the child lists.

Then it becomes the responsibility of the mainline code to check the fatal
flag at various times and call the "do fatal stuff" function when it sees the
flag is set.

The question of course is, what are the correct "various times".  We can
probably make some fairly good guesses: right before we do something that
might "take a long time".

I guess the real problem is the same one that led to lots of issues with the
jobserver: even if we check immediately before we  start to wait for child to
finish the signal could be received in the instant between when we check and
we drop into the wait, and we would have to wait for the child to exit before
we'd notice the fatal signal.  I'll have to think about that.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51237>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make
Loading...