[bug #46193] Discussion of system crash behaviours

[bug #46193] Discussion of system crash behaviours

Bogdan Barbu
URL:
  <http://savannah.gnu.org/bugs/?46193>

                 Summary: Discussion of system crash behaviours
                 Project: make
            Submitted by: jiangyy
            Submitted on: Tue 13 Oct 2015 02:25:56 AM GMT
                Severity: 3 - Normal
              Item Group: Documentation
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
       Component Version: None
        Operating System: Any
           Fixed Release: None
           Triage Status: None

    _______________________________________________________

Details:

I am currently working on file system reliability. I have a disk driver that
simulates the on-disk states ("crash sites") left behind by injected power
failures. It was inspired by two OSDI'14 papers on crash sites, which found
interesting bugs in many production systems such as databases. The simulated
disk is compatible with Linux block driver semantics (see
https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt),
and can produce many crash sites in which pending blocks are only partially
flushed to disk.

Our tool finds that a typical compiler (e.g., gcc) can suffer from crash
inconsistency. Specifically, there is a chance that, for a binary output
file (e.g., a .o file):

1. its timestamp is updated, so gmake considers the file up to date, but
2. its actual data has not been persisted to disk.

On an ext4 filesystem (default settings) of a typical Linux distribution, we
observed that a crash can leave a 0-byte output file whose timestamp has been
updated. In more relaxed settings (e.g., older filesystems), a system crash
would leave a partially corrupted file with an updated timestamp (e.g., a
correct header but several missing blocks).
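
For the 0-byte case specifically, a quick scan of the build tree before
resuming an incremental build might catch the damaged outputs. The following
is only a minimal sketch: the function name and the default ".o" suffix are
illustrative choices, not part of gmake or of our tool, and the check cannot
detect the partially corrupted files described above.

```python
import os


def find_empty_outputs(root, suffixes=(".o",)):
    """Return sorted paths under root that match a suffix and are 0 bytes long.

    Hypothetical helper: an empty object file is never valid compiler
    output, so it is a cheap signal that a write was lost in a crash.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)
```

Deleting the reported files before running gmake again would force them to be
rebuilt.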

Note that this is NOT a defect in gcc or gmake, as neither is responsible for
crash semantics. However, if the user resumes an incremental build after a
system crash, gmake will consider the generated .o file up to date and
proceed to the next stages, finally producing incorrect outputs.
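
To illustrate why the damaged file goes unnoticed, here is a simplified model
of a timestamp-only freshness check. It is a sketch of the general idea, not
gmake's actual implementation, and the function name is made up for this
example.

```python
import os


def looks_up_to_date(target, prerequisites):
    """Timestamp-only check: the target exists and no prerequisite is newer.

    Simplified model for illustration. Note that the file's size and
    contents are never consulted.
    """
    if not os.path.exists(target):
        return False
    target_mtime = os.stat(target).st_mtime_ns
    return all(os.stat(p).st_mtime_ns <= target_mtime for p in prerequisites)
```

Under this model, a 0-byte foo.o whose timestamp is later than that of foo.c
is treated exactly like a complete one.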

This is not a software defect, and it is expected to be very rare in
practice. Nevertheless, gmake is supposed to be general and to run on any
platform. I am wondering whether we should make users aware of this
phenomenon (e.g., by adding a small section to the documentation that
discusses this issue).




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?46193>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


_______________________________________________
Bug-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/bug-make

[bug #46193] Discussion of system crash behaviours

Bogdan Barbu
Follow-up Comment #1, bug #46193 (project make):

Fail-stops aren't limited to crashes.  Ctrl-Z and network splits sometimes
stop make from removing a half-written target.  At this shop, we aim to write
rules that only write into temporary files, renaming them to the desired name
once writing has completed successfully.  This also saves us from persistent
breakage when multiple builds accidentally contend.  Perhaps someone has a
write-up linked to from http://make.mad-scientist.net/#resources of how to do
this without duplication.
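
For a build step that is itself a script, the write-then-rename pattern could
look roughly like the sketch below. The function name is hypothetical. The
fsync calls are an addition aimed at the power-failure case from the original
report, and they assume a POSIX system where a directory can be opened and
synced; the rename alone covers the interrupted-write case.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to a temporary file, then rename it over path.

    Readers see either the old file or the complete new one, never a
    half-written target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target's directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic replacement on POSIX
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if os.name == "posix":
        # Assumption: syncing the directory makes the rename itself durable.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

One caveat: mkstemp creates the file with owner-only permissions, so a real
rule may want to adjust the mode before the rename.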

[bug #46193] Discussion of system crash behaviours

Bogdan Barbu
Follow-up Comment #2, bug #46193 (project make):

Please see also bug 46242: make can leave behind out of date files even if
nothing crashes.

[bug #46193] Discussion of system crash behaviours

Bogdan Barbu
Update of bug #46193 (project make):

                  Status:                    None => Fixed                  
             Assigned to:                    None => psmith                
             Open/Closed:                    Open => Closed                
           Fixed Release:                    None => SCM                    

    _______________________________________________________

Follow-up Comment #3:

I added some information to the section "Interrupting or Killing make"
discussing ways to write recipes to defend against these types of extreme
errors.
