gmake + unexec = fast!

gmake + unexec = fast!

Kaz Kylheku
Hi All,

Over ten years ago, I was working on a project that had
a large tree built with non-recursive makes: one master
make system including a large number of per-directory
include files.

On the hardware of the day, it took 30 seconds for GNU Make
3.80 to read the files and issue the first build command.
So, for instance, if someone changed a .cpp file somewhere in the
tree, it would take over half a minute after typing "make"
for the g++ command to be dispatched to rebuild it. It was awful!

I thought: why can't we just dump an image of the make process after
the rules are read, then restart that image and evaluate the goals?

So then I took the "unexec" code from GNU Emacs, transplanted it into
GNU Make and hooked it to a "--dump" option.

Now, "make --dump" would produce an executable image called "remake",
which still took over half a minute to produce.

Then running "./remake" would almost instantly kick off the
incremental build.

Would there be any interest in this?



_______________________________________________
Help-make mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-make

Re: gmake + unexec = fast!

Stefan Monnier
> On the hardware of the day, it took 30 seconds for GNU Make
> 3.80 to read the files and issue the first build command.

Do you have some idea of how those 30s were spent?

> So then I took the "unexec" code from GNU Emacs, transplanted it into
> GNU Make and hooked it to a "--dump" option.
> Now, "make --dump" would produce an executable image called "remake",
> which still took over half a minute to produce.
> Then running "./remake" would almost instantly kick off the
> incremental build.

So the 30s were spent just reading the makefiles rather than looking at
the relative state of all the relevant files to figure out what needs to
be rebuilt?

That seems like a really long time just to read makefiles.  I wonder
what took so long.  Were there lots of $(<function> <args>) calls in
there, maybe?  If so, are we sure it's correct to precompute them and
stash the result in the dump?


        Stefan


Re: gmake + unexec = fast!

Kaz Kylheku (gmake)
On 2019-03-22 12:07, Stefan Monnier wrote:
>> On the hardware of the day, it took 30 seconds for GNU Make
>> 3.80 to read the files and issue the first build command.
>
> Do you have some idea of how those 30s were spent?

No detailed breakdown; just reading per-directory make include files
and performing their variable assignments and whatnot.

I've given this proposal some thought since my initial posting.
I think that because GNU Make doesn't have such a huge proliferation
of data structures, it might be feasible to actually perform a
dump by walking and serializing those structures. Perhaps even into
a text representation, but one that is "lightning fast" to parse.
It takes more work, but is totally portable; no voodoo with creating
an executable out of the memory maps.
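The serialization approach described above might look roughly like the following Python sketch. The Variable, Rule and MakeState classes, and the choice of JSON as the text format, are assumptions for illustration only; they do not mirror GNU Make's actual C data structures, which have many more members.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical, heavily simplified stand-ins for make's internal state.
# GNU Make's real C structures carry many more members than this.
@dataclass
class Variable:
    name: str
    value: str
    recursive: bool  # True for "=" (deferred), False for ":=" (immediate)


@dataclass
class Rule:
    target: str
    prereqs: list[str]
    recipe: list[str]


@dataclass
class MakeState:
    variables: list[Variable] = field(default_factory=list)
    rules: list[Rule] = field(default_factory=list)


def dump_state(state: MakeState) -> str:
    """Serialize the parsed state into one JSON text document."""
    return json.dumps(asdict(state))


def load_state(text: str) -> MakeState:
    """Rebuild the state from the dump without re-reading any makefiles."""
    data = json.loads(text)
    return MakeState(
        variables=[Variable(**v) for v in data["variables"]],
        rules=[Rule(**r) for r in data["rules"]],
    )


# Round trip: a "--dump" run would write the text; a restart reads it back.
state = MakeState(
    variables=[Variable("CXX", "g++", recursive=False)],
    rules=[Rule("foo.o", ["foo.cpp"], ["$(CXX) -c foo.cpp -o foo.o"])],
)
restored = load_state(dump_state(state))
```

The portability argument shows up directly here: nothing depends on the executable format or the memory layout of the process, only on the data that was read.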

It creates maintenance risks, though: someone adds a new structure
member here and there, but forgets to update the
serialization/deserialization code to include that member.
If that member matters to build correctness in some way, builds
restarted from the dump break subtly.
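One way to reduce that risk is a test that compares the members a structure declares with the keys its serializer actually writes, so that a forgotten member fails a check instead of silently breaking restarted builds. The sketch below assumes Python dataclasses standing in for make's structures, with a hypothetical "phony" member playing the role of the field someone added later. C has no comparable reflection; one crude, platform-dependent substitute there is a _Static_assert on the structure's sizeof, which at least forces whoever changes the structure to revisit the serializer.

```python
from dataclasses import dataclass, fields


# Hypothetical rule structure; "phony" plays the member added later.
@dataclass
class Rule:
    target: str
    prereqs: list
    recipe: list
    phony: bool = False


def serialize_rule(rule: Rule) -> dict:
    # Hand-written serializer that predates the "phony" member and
    # was never updated: the bug described above.
    return {"target": rule.target, "prereqs": rule.prereqs, "recipe": rule.recipe}


def missing_members(cls, serializer, sample) -> set:
    """Return the names of declared members that the serializer drops."""
    declared = {f.name for f in fields(cls)}
    written = set(serializer(sample))
    return declared - written


dropped = missing_members(Rule, serialize_rule, Rule("all", ["foo.o"], [], phony=True))
print(dropped)  # {'phony'}
```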

>> So then I took the "unexec" code from GNU Emacs, transplanted it into
>> GNU Make and hooked it to a "--dump" option.
>> Now, "make --dump" would produce an executable image called "remake",
>> that requiring over half a minute to produce.
>> Then running "./remake" would almost instantly kick off the
>> incremental build.
>
> So the 30s were spent just reading the makefiles rather than looking at
> the relative state of all the relevant files to figure out what needs to
> be rebuilt?

Yes. Looking at the state of the files was actually very fast,
as running the unexec-ed image showed.
I seem to recall it was no more than a second or two for make to
sweep through the tree to check the modification timestamps and
kick off the rebuild of the modified file.
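The per-target check that the restarted image still has to perform is cheap: roughly one stat() call per file. Here is a simplified Python sketch of make's basic out-of-date rule; it is an illustration, not GNU Make's code, and it ignores missing prerequisites with rules, phony targets, order-only prerequisites and so on. The file names are hypothetical.

```python
import os
import tempfile


def out_of_date(target: str, prereqs: list[str]) -> bool:
    """Basic make rule: rebuild if the target is missing or any
    prerequisite has a newer modification time."""
    try:
        target_mtime = os.stat(target).st_mtime
    except FileNotFoundError:
        return True
    # A missing prerequisite raises FileNotFoundError here; real make
    # would instead look for a rule to create it.
    return any(os.stat(p).st_mtime > target_mtime for p in prereqs)


# Demo in a scratch directory with hypothetical file names.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "foo.cpp")
    obj = os.path.join(d, "foo.o")
    for path in (src, obj):
        open(path, "w").close()
    os.utime(src, (100, 100))  # source last modified at t=100
    os.utime(obj, (200, 200))  # object built later, at t=200
    fresh = out_of_date(obj, [src])  # object is newer than source
    os.utime(src, (300, 300))  # simulate editing foo.cpp
    stale = out_of_date(obj, [src])  # source is now newer than object
    missing = out_of_date(os.path.join(d, "bar.o"), [src])
```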

> That seems like a really long time just to read makefiles.

Well, computer science places no upper bound on that; it's proportional
to how much makefile material you have! Give me 30 seconds and a specific
machine, and I will prepare makefiles that fill those 30 seconds. :)

This tree really had a lot of little makefiles. The software had
a huge number of classes, generated by code-generation tools from
compact specifications.

This stuff was built into shared libraries.  We ran into issues like
the global offset table on MIPS overflowing, and having to break up
a shared library into multiple smaller ones just for that.

Each C++ class introduces numerous global symbols: foo::this, foo::that,
the constructor, the destructor, various boilerplate functions. Before
you know it, your global offset table is full.

> I wonder
> what took so long.  Were there lots of $(<function> <args>) calls in
> there, maybe?

Possibly.

> If so, are we sure it's correct to precompute them and
> stash the result in the dump?

No, it's not always correct to precompute things into the dump.
Sometimes you have to refresh the dump.

Suppose a build timestamp or counter value is captured at makefile-read
time and stored in a variable. Then every build kicked off with the saved
image will have the same timestamp.

Text processing that just identifies objects (building paths, rules
and whatnot) is safe to cache. Environmental information is not always
safe: a cached host name is fine; a timestamp, maybe not.
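GNU Make already draws a version of this line with its variable flavors: a simply expanded variable (BUILD_TIME := $(shell date)) is expanded once, when the makefile is read, while a recursively expanded one (BUILD_TIME = $(shell date)) is expanded each time it is used. In a dumped image, the first would be frozen into the dump; the second would presumably be re-evaluated by each restarted build, unless it is referenced somewhere make expands immediately, such as a rule's target or prerequisite list. The toy Python model below illustrates the difference; the Variable class and the counter standing in for $(shell date) are assumptions, not make's implementation.

```python
import itertools

# A counter stands in for $(shell date) so the demo is deterministic.
_clock = itertools.count(1)


def shell_date() -> int:
    return next(_clock)


class Variable:
    """Toy model of make's two main variable flavors (hypothetical)."""

    def __init__(self, compute, immediate: bool):
        self.compute = compute
        self.immediate = immediate
        # ":=" flavor: evaluated once, at makefile-read (dump) time.
        self.frozen = compute() if immediate else None

    def expand(self):
        # "=" flavor: re-evaluated every time the value is used.
        return self.frozen if self.immediate else self.compute()


# Reading the makefiles happens once, before the image is dumped.
simple = Variable(shell_date, immediate=True)      # BUILD_TIME := $(shell date)
recursive = Variable(shell_date, immediate=False)  # BUILD_TIME = $(shell date)

# Each tuple stands for one build restarted from the same dump.
first_build = (simple.expand(), recursive.expand())
second_build = (simple.expand(), recursive.expand())
print(first_build, second_build)  # (1, 2) (1, 3)
```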

It's exactly like running a Lisp image, versus loading .fasl files,
versus loading .lisp files.  If something is captured at
macro-expansion-time, it changes every time we load the .lisp, but
not the .fasl. A load-time-value expression (or any top-level form)
gets re-executed when we load the .fasl, but not if we restart
a dumped image.

We have to know what the caveats are and choose wisely.
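Part of choosing wisely is knowing when the dump itself has gone out of date. Refreshing it could at least partly be automated: a hypothetical guard could record a fingerprint of every makefile read at dump time and refuse to reuse the dump if any of them changed. A minimal sketch, assuming the list of makefiles read is available (GNU Make exposes it as $(MAKEFILE_LIST)) and using SHA-256 content hashes; comparing modification times would be cheaper, but would also flag files that were merely touched. The function names and the module.mk file are illustrative.

```python
import hashlib
import os
import tempfile


def fingerprint(paths: list[str]) -> dict[str, str]:
    """Map each makefile path to a SHA-256 hash of its contents."""
    result = {}
    for path in paths:
        with open(path, "rb") as f:
            result[path] = hashlib.sha256(f.read()).hexdigest()
    return result


def dump_is_stale(saved: dict[str, str]) -> bool:
    """True if any makefile recorded at dump time changed or vanished."""
    try:
        return fingerprint(list(saved)) != saved
    except FileNotFoundError:
        return True


with tempfile.TemporaryDirectory() as d:
    mk = os.path.join(d, "module.mk")  # hypothetical include file
    with open(mk, "w") as f:
        f.write("SRCS += foo.cpp\n")
    saved = fingerprint([mk])      # recorded when the dump is made
    before = dump_is_stale(saved)  # nothing has changed yet
    with open(mk, "a") as f:
        f.write("SRCS += bar.cpp\n")
    after = dump_is_stale(saved)   # the include file was edited
```

This only notices edits to files that were actually read. It cannot see a newly added include file picked up by a wildcard, nor changes in the environment, which is exactly the class of caveat discussed above.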

With this feature, it would be documented that moving dumped images
between environments, or using them for serious production builds,
is not advised: it's strictly a gadget for improving the
debug-compile-run cycle for a developer working on
non-earth-shattering changes to the codebase.
