nss_db: protect against empty mappings

classic Classic list List threaded Threaded
44 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv7] nss_db: protect against empty mappings

Florian Weimer-5
* DJ Delorie:

> +  rest = "/makedb -o /var/db/passwd.db /var/db/passwd.in";
> +  cmd = (char *) xmalloc (strlen (support_bindir_prefix)
> + + strlen (rest) + 1);
> +  strcpy (cmd, support_bindir_prefix);
> +  strcat (cmd, rest);
> +
> +  system (cmd);
> +
> +  try_it ();
> +  try_it ();

You could use xasprintf and free the pointer.

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv7] nss_db: protect against empty mappings

Florian Weimer-5
* Florian Weimer:

> * DJ Delorie:
>
>> +  rest = "/makedb -o /var/db/passwd.db /var/db/passwd.in";
>> +  cmd = (char *) xmalloc (strlen (support_bindir_prefix)
>> + + strlen (rest) + 1);
>> +  strcpy (cmd, support_bindir_prefix);
>> +  strcat (cmd, rest);
>> +
>> +  system (cmd);
>> +
>> +  try_it ();
>> +  try_it ();
>
> You could use xasprintf and free the pointer.

Please also post the planned commit message.  Thanks.

Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv7] nss_db: protect against empty mappings

DJ Delorie-2

Florian Weimer <[hidden email]> writes:
> Please also post the planned commit message.  Thanks.

I've been doing that in every patch, as $subject and comment text,
although my current $subject in git is:

    nss_db: fix endent wrt NULL mappings [BZ #24695] [BZ #24696]
   
nss_db allows for getpwent et al to be called without a set*ent,
but it only works once.  After the last get*ent a set*ent is
required to restart, because the end*ent did not properly reset
the module.  Resetting it to NULL allows for a proper restart.

If the database doesn't exist, however, end*ent erroniously called
munmap which set errno.

The test case runs "makedb" inside the testroot, so needs selinux
DSOs installed.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv7] nss_db: protect against empty mappings

Florian Weimer-5
* DJ Delorie:

> Florian Weimer <[hidden email]> writes:
>> Please also post the planned commit message.  Thanks.
>
> I've been doing that in every patch, as $subject and comment text,
> although my current $subject in git is:
>
>     nss_db: fix endent wrt NULL mappings [BZ #24695] [BZ #24696]

Thanks, that's what I wanted to see.

> The test case runs "makedb" inside the testroot, so needs selinux
> DSOs installed.

If glibc has been built with SELinux support, I presume.

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv7] nss_db: protect against empty mappings

DJ Delorie-2
Florian Weimer <[hidden email]> writes:
>> The test case runs "makedb" inside the testroot, so needs selinux
>> DSOs installed.
>
> If glibc has been built with SELinux support, I presume.

Yup, I used the same checks and makefile fragments as makedb used, so
they should match regardless.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2
In reply to this post by Florian Weimer-5

Florian Weimer <[hidden email]> writes:
>> You could use xasprintf and free the pointer.
>
> Please also post the planned commit message.  Thanks.

latest, with xasprintf, and full commit blob posted...

From e465ac6a0cf322c72028d54cc6eb534bb035bd0d Mon Sep 17 00:00:00 2001
From: DJ Delorie <[hidden email]>
Date: Fri, 28 Jun 2019 18:30:00 -0500
Subject: nss_db: fix endent wrt NULL mappings [BZ #24695] [BZ #24696]

nss_db allows for getpwent et al to be called without a set*ent,
but it only works once.  After the last get*ent a set*ent is
required to restart, because the end*ent did not properly reset
the module.  Resetting it to NULL allows for a proper restart.

If the database doesn't exist, however, end*ent erroniously called
munmap which set errno.

The test case runs "makedb" inside the testroot, so needs selinux
DSOs installed.

diff --git a/ChangeLog b/ChangeLog
index aece032385..37be9a998a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2019-06-28  DJ Delorie  <[hidden email]>
+    Sergei Trofimovich <[hidden email]>
+
+ [BZ #24696]
+ [BZ #24695]
+ * nss/nss_db/db-open.c (internal_endent): Protect against NULL
+ mappings.
+ * nss/tst-nss-db-endgrent.c: New.
+ * nss/tst-nss-db-endgrent.root: New.
+ * nss/tst-nss-db-endpwent.c: New.
+ * nss/tst-nss-db-endpwent.root: New.
+ * nss/Makefile: Add new tests.
+ * support/links-dso-program-c.c: Add selinux dependency.
+ * support/links-dso-program.cc: Add selinux dependency.
+ * support/Makefile: Build those with -lselinux if enabled.
+
 2019-06-28  Wilco Dijkstra  <[hidden email]>
 
  * benchtests/bench-math-inlines.c: Increase iterations.
diff --git a/nss/Makefile b/nss/Makefile
index 95081bddc5..a15c3b7d90 100644
--- a/nss/Makefile
+++ b/nss/Makefile
@@ -61,7 +61,9 @@ xtests = bug-erange
 
 tests-container = \
   tst-nss-test3 \
-  tst-nss-files-hosts-long
+  tst-nss-files-hosts-long \
+  tst-nss-db-endpwent \
+  tst-nss-db-endgrent
 
 # Tests which need libdl
 ifeq (yes,$(build-shared))
diff --git a/nss/nss_db/db-open.c b/nss/nss_db/db-open.c
index 8a83d6b930..3fa11e9ab0 100644
--- a/nss/nss_db/db-open.c
+++ b/nss/nss_db/db-open.c
@@ -63,5 +63,9 @@ internal_setent (const char *file, struct nss_db_map *mapping)
 void
 internal_endent (struct nss_db_map *mapping)
 {
-  munmap (mapping->header, mapping->len);
+  if (mapping->header != NULL)
+    {
+      munmap (mapping->header, mapping->len);
+      mapping->header = NULL;
+    }
 }
diff --git a/nss/tst-nss-db-endgrent.c b/nss/tst-nss-db-endgrent.c
new file mode 100644
index 0000000000..367cc6c901
--- /dev/null
+++ b/nss/tst-nss-db-endgrent.c
@@ -0,0 +1,54 @@
+/* Test for endgrent changing errno for BZ #24696
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <sys/types.h>
+#include <grp.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <support/check.h>
+#include <support/support.h>
+
+/* The following test verifies that if the db NSS Service is initialized
+   with no database (getgrent), that a subsequent closure (endgrent) does
+   not set errno. In the case of the db service it is not an error to close
+   the service and so it should not set errno.  */
+
+static int
+do_test (void)
+{
+  /* Just make sure it's not there, although usually it won't be.  */
+  unlink ("/var/db/group.db");
+
+  /* This, in conjunction with the testroot's nsswitch.conf, causes
+     the nss_db module to be "connected" and initialized - but the
+     testroot has no group.db, so no mapping will be created.  */
+  getgrent ();
+
+  errno = 0;
+
+  /* Before the fix, this would call munmap (NULL) and set errno.  */
+  endgrent ();
+
+  if (errno != 0)
+    FAIL_EXIT1 ("endgrent set errno to %d\n", errno);
+
+  return 0;
+}
+#include <support/test-driver.c>
diff --git a/nss/tst-nss-db-endgrent.root/etc/nsswitch.conf b/nss/tst-nss-db-endgrent.root/etc/nsswitch.conf
new file mode 100644
index 0000000000..21471df94f
--- /dev/null
+++ b/nss/tst-nss-db-endgrent.root/etc/nsswitch.conf
@@ -0,0 +1 @@
+group : db files
diff --git a/nss/tst-nss-db-endpwent.c b/nss/tst-nss-db-endpwent.c
new file mode 100644
index 0000000000..cb85410b7c
--- /dev/null
+++ b/nss/tst-nss-db-endpwent.c
@@ -0,0 +1,66 @@
+/* Test for endpwent->getpwent crash for BZ #24695
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <pwd.h>
+
+#include <support/support.h>
+#include <support/check.h>
+
+/* It is entirely allowed to start with a getpwent call without
+   resetting the state of the service via a call to setpwent.
+   You can also call getpwent more times than you have entries in
+   the service, and it should not fail.  This test iteratates the
+   database once, gets to the end, and then attempts a second
+   iteration to look for crashes.  */
+
+static void
+try_it (void)
+{
+  struct passwd *pw;
+
+  /* setpwent is intentionally omitted here.  The first call to
+     getpwent detects that it's first and initializes.  The second
+     time try_it is called, this "first call" was not detected before
+     the fix, and getpwent would crash.  */
+
+  while ((pw = getpwent ()) != NULL)
+    ;
+
+  /* We only care if this segfaults or not.  */
+  endpwent ();
+}
+
+static int
+do_test (void)
+{
+  char *cmd;
+
+  cmd = xasprintf ("%s/makedb -o /var/db/passwd.db /var/db/passwd.in",
+   support_bindir_prefix);
+  system (cmd);
+  free (cmd);
+
+  try_it ();
+  try_it ();
+
+  return 0;
+}
+#include <support/test-driver.c>
diff --git a/nss/tst-nss-db-endpwent.root/etc/nsswitch.conf b/nss/tst-nss-db-endpwent.root/etc/nsswitch.conf
new file mode 100644
index 0000000000..593ffc564a
--- /dev/null
+++ b/nss/tst-nss-db-endpwent.root/etc/nsswitch.conf
@@ -0,0 +1 @@
+passwd: db
diff --git a/nss/tst-nss-db-endpwent.root/var/db/passwd.in b/nss/tst-nss-db-endpwent.root/var/db/passwd.in
new file mode 100644
index 0000000000..98f39126ef
--- /dev/null
+++ b/nss/tst-nss-db-endpwent.root/var/db/passwd.in
@@ -0,0 +1,4 @@
+.root root:x:0:0:root:/root:/bin/bash
+=0 root:x:0:0:root:/root:/bin/bash
+.bin bin:x:1:1:bin:/bin:/sbin/nologin
+=1 bin:x:1:1:bin:/bin:/sbin/nologin
diff --git a/support/Makefile b/support/Makefile
index 56c1ed43bb..ab66913a02 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -191,6 +191,11 @@ LINKS_DSO_PROGRAM = links-dso-program
 LDLIBS-links-dso-program = -lstdc++ -lgcc -lgcc_s $(libunwind)
 endif
 
+ifeq (yes,$(have-selinux))
+LDLIBS-$(LINKS_DSO_PROGRAM) += -lselinux
+endif
+
+
 LDLIBS-test-container = $(libsupport)
 
 others += test-container
diff --git a/support/links-dso-program-c.c b/support/links-dso-program-c.c
index d28a28a0d0..5fcbab2c17 100644
--- a/support/links-dso-program-c.c
+++ b/support/links-dso-program-c.c
@@ -1,9 +1,26 @@
 #include <stdio.h>
 
+/* makedb needs selinux dso's.  */
+#ifdef HAVE_SELINUX
+# include <selinux/selinux.h>
+#endif
+
+/* The purpose of this file is to indicate to the build system which
+   shared objects need to be copied into the testroot, such as gcc or
+   selinux support libraries.  This program is never executed, only
+   scanned for dependencies on shared objects, so the code below may
+   seem weird - it's written to survive gcc optimization and force
+   such dependencies.
+*/
+
 int
 main (int argc, char **argv)
 {
   /* Complexity to keep gcc from optimizing this away.  */
   printf ("This is a test %s.\n", argc > 1 ? argv[1] : "null");
+#ifdef HAVE_SELINUX
+  /* This exists to force libselinux.so to be required.  */
+  printf ("selinux %d\n", is_selinux_enabled ());
+#endif
   return 0;
 }
diff --git a/support/links-dso-program.cc b/support/links-dso-program.cc
index dba6976c06..4bc2411086 100644
--- a/support/links-dso-program.cc
+++ b/support/links-dso-program.cc
@@ -1,11 +1,28 @@
 #include <iostream>
 
+/* makedb needs selinux dso's.  */
+#ifdef HAVE_SELINUX
+# include <selinux/selinux.h>
+#endif
+
 using namespace std;
 
+/* The purpose of this file is to indicate to the build system which
+   shared objects need to be copied into the testroot, such as gcc or
+   selinux support libraries.  This program is never executed, only
+   scanned for dependencies on shared objects, so the code below may
+   seem weird - it's written to survive gcc optimization and force
+   such dependencies.
+*/
+
 int
 main (int argc, char **argv)
 {
   /* Complexity to keep gcc from optimizing this away.  */
   cout << (argc > 1 ? argv[1] : "null");
+#ifdef HAVE_SELINUX
+  /* This exists to force libselinux.so to be required.  */
+  cout << "selinux " << is_selinux_enabled ();
+#endif
   return 0;
 }
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2

DJ Delorie <[hidden email]> writes:
> latest, with xasprintf, and full commit blob posted...

Ping?

https://www.sourceware.org/ml/libc-alpha/2019-06/msg00957.html
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

Florian Weimer-5
In reply to this post by DJ Delorie-2
* DJ Delorie:

> diff --git a/ChangeLog b/ChangeLog
> index aece032385..37be9a998a 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,19 @@
> +2019-06-28  DJ Delorie  <[hidden email]>
> +    Sergei Trofimovich <[hidden email]>
> +
> + [BZ #24696]
> + [BZ #24695]
> + * nss/nss_db/db-open.c (internal_endent): Protect against NULL
> + mappings.
> + * nss/tst-nss-db-endgrent.c: New.
> + * nss/tst-nss-db-endgrent.root: New.
> + * nss/tst-nss-db-endpwent.c: New.
> + * nss/tst-nss-db-endpwent.root: New.
> + * nss/Makefile: Add new tests.
> + * support/links-dso-program-c.c: Add selinux dependency.
> + * support/links-dso-program.cc: Add selinux dependency.
> + * support/Makefile: Build those with -lselinux if enabled.
> +

This version looks okay to me.  Thanks.

Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2

Thanks!  Pushed.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

Rafal Luzynski
10.07.2019 20:52 DJ Delorie <[hidden email]> wrote:
>
> Thanks!  Pushed.

Hello DJ,

This commit seems to cause failure.  After "make check"
nss/tst-nss-db-endgrent.out contains:

    error: tst-nss-db-endgrent.c:50: endgrent set errno to 22

    error: 1 test failures

Does it maybe still run the old version of the updated function?
The OS is Fedora 30, x86_64.

Regards,

Rafal
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2

If you applied the patch and just ran "make" to rebuild the parts that
changed, it won't update the testroot - you need to either remove the
testroot.pristine directory, or at least remove the
testroot.pristine/install.stamp file.

And yes, this was an intentional choice - running "make install" every
single time would have been prohibitively expensive; see the
"test-in-container: Install locales into the test container" thread for
more discussion on test times.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

Rafal Luzynski
12.07.2019 06:21 DJ Delorie <[hidden email]> wrote:
>
> If you applied the patch and just ran "make" to rebuild the parts that
> changed, it won't update the testroot

Yes, that's exactly what I did.

> - you need to either remove the
> testroot.pristine directory, or at least remove the
> testroot.pristine/install.stamp file.
> [...]

Yes, after the full rebuild from scratch all tests pass correctly.
Thank you for explanation and sorry for the noise.

Regards,

Rafal
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

Carlos O'Donell-6
On 7/12/19 6:24 AM, Rafal Luzynski wrote:

> 12.07.2019 06:21 DJ Delorie <[hidden email]> wrote:
>>
>> If you applied the patch and just ran "make" to rebuild the parts that
>> changed, it won't update the testroot
>
> Yes, that's exactly what I did.
>
>> - you need to either remove the
>> testroot.pristine directory, or at least remove the
>> testroot.pristine/install.stamp file.
>> [...]
>
> Yes, after the full rebuild from scratch all tests pass correctly.
> Thank you for explanation and sorry for the noise.

Rafal,

If you think this is a bad developer experience, then please feel
free to voice your concerns.

We're not sure how to get the chroot to be rebuilt easily without
lots of additional cost.

DJ,

I wonder if we can't make the workflow different, an this is purely
a psychological thing:

(a) Make a "build done" stamp.
(b) Compare "build done" stamp with "chroot stamp" and if they are
     out of sync, all container tests *run* but have the final exit
     code overriden with fail with a new error code which is "OUT OF SYNC"
     You can still inspect the original failure code in the logs,
     but the container wrapper alters it on return.
(c) Developers see this and can ignore it, but at least the tests
     don't run with an erroneous chroot returning results.
(d) Developers must run "make update" to clear this problem, and
     the test summary script should indicate this when it see any
     "OUT OF SYNC" failures.

Thoughts?

--
Cheers,
Carlos.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2
In reply to this post by Rafal Luzynski
Rafal Luzynski <[hidden email]> writes:
> Thank you for explanation and sorry for the noise.

As Carlos said, this isn't noise, this is feedback.  We knew "back then"
that this type of thing might happen, but that doesn't mean we can't
optimize, document, or otherwise tweak it to be less surprising to glibc
developers.

Input on your experience with the testroot system, and possible ways to
improve it, are always welcome :-)
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2
In reply to this post by Carlos O'Donell-6
"Carlos O'Donell" <[hidden email]> writes:
> I wonder if we can't make the workflow different, an this is purely
> a psychological thing:
>
> (a) Make a "build done" stamp.

What build?  What targets?  There are a lot of installable bits in the
testroot that can be built independently.  Without a fully
dependency-driven install, it's very hard to detect that "something
changed" which would require a re-install.

Of course, we could check for libc.so itself being newer than
install.stamp, and maybe libm.so if it had containerized tests.

> (b) Compare "build done" stamp with "chroot stamp" and if they are

I suspect this could be centralized into the support routines that
handle exit codes; looking at "/install.stamp" and some other file.
Aside from the problem of reliably creating the "some other file", which
would have to be inside the container also.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

Carlos O'Donell-6
On 7/12/19 4:19 PM, DJ Delorie wrote:

> "Carlos O'Donell" <[hidden email]> writes:
>> I wonder if we can't make the workflow different, an this is purely
>> a psychological thing:
>>
>> (a) Make a "build done" stamp.
>
> What build?  What targets?  There are a lot of installable bits in the
> testroot that can be built independently.  Without a fully
> dependency-driven install, it's very hard to detect that "something
> changed" which would require a re-install.

I don't think we need a fully dependency-driven install.

We need:

* One stamp to be created when a top-level invoked 'make' finishes
   building anything. Sub-makes don't update this stamp so running
   them leaves the top-level 'make' stamp at the old time.

* One stamp to be created when the chroot is installed.

> Of course, we could check for libc.so itself being newer than
> install.stamp, and maybe libm.so if it had containerized tests.

Given Florian and Rafal's feedback I think we need to do something
to make this easier to use.

(1) Top-level make / Top-level make check.

* User runs 'make' at the top-level (not a subdir)
   - Top-level stamp is updated.
* User runs 'make check' at top-level (not a subdir)
   - Top-level stamp is mismatched to chroot stamp and so we
     reinstall the root.

(2) Top-level make / subdir make check.

* User runs 'make' at the top-level (not a subdir)
   - Top-level stamp is  updated.
* User runs 'make check' in a subdir
   - Top-level stamp is mismatched to chroot stamp but
     subdir make check doesn't reinstall chroot.
   - test-container throws a warning telling the user how
     to fix this, but ignores it, runs the test, and always
     returns a new error code.
   - Developer is *able* to run the subdir test with failure
     mode "root not in sync", but that might be fine for them.
     They can look at the original logs and original failures.
   - Developer can choose now to do a top-level make check to
     trigger root re-install, or convenience target 'make update-container'

Does that make sense?

>> (b) Compare "build done" stamp with "chroot stamp" and if they are
>
> I suspect this could be centralized into the support routines that
> handle exit codes; looking at "/install.stamp" and some other file.
> Aside from the problem of reliably creating the "some other file", which
> would have to be inside the container also.

I would check the stamps in the build and pass test-container a new
flag e.g. --out-of-date, if the stamps don't match, and let test-container
decide what to do.

This way the user must take a proactive step, the costly one, to bring
the container inline with the recent build. We have argued that doing this
automatically is what's costly, and many times useless if you're just
updating a test case over and over to get it working against a fixed glibc.

--
Cheers,
Carlos.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCHv8] nss_db: protect against empty mappings

DJ Delorie-2
"Carlos O'Donell" <[hidden email]> writes:
> I don't think we need a fully dependency-driven install.

My point was, it's difficult to tell if an arbitrary "make" invokation
warrants a re-install - doing that would imply a full set of
dependencies about what *is* installed, and at that point, we have a
dependency-driven install anyway.

But that's a distraction at this point ;-)

> We need:
>
> * One stamp to be created when a top-level invoked 'make' finishes
>    building anything. Sub-makes don't update this stamp so running
>    them leaves the top-level 'make' stamp at the old time.

It would be easier to do something at the start, using a := rule and
$(system) but GNU make might have more options by now.

> * One stamp to be created when the chroot is installed.

that would be testroot.pristine/install.stamp :-)

> * User runs 'make' at the top-level (not a subdir)
>    - Top-level stamp is updated.

Only "make" - no parameters?  I mean, the default target?  I suppose
that would be an obvious point to flag most cases.

> * User runs 'make check' at top-level (not a subdir)
>    - Top-level stamp is mismatched to chroot stamp and so we
>      reinstall the root.

Or just making the default target could remove
testroot.pristine/install.stamp, which would case "make check" to redo
the install anyway.

> (2) Top-level make / subdir make check.
>
> * User runs 'make' at the top-level (not a subdir)
>    - Top-level stamp is  updated.
> * User runs 'make check' in a subdir

The testroot install dependency is on the subdir checks, not the
toplevel check, anyway, so this isn't a different case as far as the
testroot is concerned.  I.e. the Makefile does this:

$(tests-container) $(addsuffix /tests,$(subdirs)) : \
                $(objpfx)testroot.pristine/install.stamp

>    - Top-level stamp is mismatched to chroot stamp but
>      subdir make check doesn't reinstall chroot.

Faster :-)

>    - test-container throws a warning telling the user how
>      to fix this, but ignores it, runs the test, and always
>      returns a new error code.
>    - Developer is *able* to run the subdir test with failure
>      mode "root not in sync", but that might be fine for them.
>      They can look at the original logs and original failures.
>    - Developer can choose now to do a top-level make check to
>      trigger root re-install, or convenience target 'make update-container'

I think this is dangerous for two reasons...

1. We're hiding testing-specific details inside test-container.c, which
   otherwise is only concerned with the containerization itself.

2. Any text other than the PASS/FAIL/etc lines should be assumed ignored
   by any script that parses the output, or by developers who
   auto-filter-out everything else because "it's never relevent".

I would suggest that the out-of-date flag be something that the support
code can detect, and replace any PASS/FAIL/etc with UNSYNC or something
that sticks out like a sore thumb.  Or perhaps append " - UNSYNC[*]" to
the end of that line, and adds a footnote elsewhere which the developer
can look for once the UNSYNC is noticed.

I don't know how easily it would be to robustly detect "I'm in a
testroot container" but if it looks for /install.stamp and /build.stamp
(i.e. in the root dir) that should suffice.

We could, however, have test-container.c set TESTROOT_CONTAINER=1 in the
child's environment, just for such detection.

The only other alternative I think would make sense is if
test-container.c detected an out-of-sync container, and just
hard-stopped.  I.e. don't run anything.  I don't know how this could be
made to show up in the PASS/FAIL text though, so that it's noticed.  I
worry that it would go unnoticed that a subset of tests just aren't
being run at all.

> I would check the stamps in the build and pass test-container a new
> flag e.g. --out-of-date, if the stamps don't match, and let test-container
> decide what to do.

test-container has nothing to do with the tests; it merely runs a given
binary and echos the exit code.  It's the runtime test harness that is
aware of what the tests mean.  The coordination needs to be between the
toplevel Makefile and support/test-driver.c et al.
Reply | Threaded
Open this post in threaded view
|

CI/CD in glibc (was: nss_db: protect against empty mappings)

Rafal Luzynski
In reply to this post by Carlos O'Donell-6
12.07.2019 13:36 Carlos O'Donell <[hidden email]> wrote:
> [...]
> Rafal,
>
> If you think this is a bad developer experience, then please feel
> free to voice your concerns.

This is little off-topic so I'm changing the subject of the
thread.

To some extent it is indeed a kind of bad experience.  In a perfect
world, every "make" run should rebuild all binaries whose source
code had changed and reuse those existing binaries from the previous
builds whose source code had not changed.  Usually this works fine
but sometimes does not.  In these rare cases "delete everything and
rebuild from scratch" is the correct answer.  I don't want to waste
my time and other developers' time to build a system where every
"make" works incrementally correctly because the aim is to ensure
that the rebuild from scratch works correctly because this is what
the distros are doing.  Incremental build is necessary only for our
small group of developers and, again, it works fine most of the time.

So maybe instead of focusing on incremental builds I should explain
why I perceive building as so important.  I always want to ensure
that my patches do not break glibc, including ensuring that they
don't break today (if yesterday everything was OK it does not mean
it is OK today because something might have changed upstream).
This is a simple and repetitive task which could be done automatically.

So the question is: can we have a CI/CD mechanism in glibc project
which would perform all builds and tests for every commit and raise
an alarm if anything goes wrong?  Can it be extended to verifying
patches when they are posted on this mailing list, before pushing
to the main repository?  Can it be part of sourceware.org, maybe
integrated with patchwork.sourceware.org?  Some big source code
management systems like GitHub or GitLab already include such
features, can we reuse them?  If not, can we have an unofficial
clone at GitHub (well, we already have one [1]) to do that task?

Such a mechanism would be useful to detect use cases like:
* something goes wrong but the problem is only in my machine
  because the online service confirms there is no problem;
* the code works fine in my machine and in all other developers'
  machines but fails in one exotic hardware architecture.

Regards,

Rafal

[1] https://github.com/bminor/glibc/
Reply | Threaded
Open this post in threaded view
|

Re: CI/CD in glibc

Florian Weimer-5
* Rafal Luzynski:

> Such a mechanism would be useful to detect use cases like:
> * something goes wrong but the problem is only in my machine
>   because the online service confirms there is no problem;
> * the code works fine in my machine and in all other developers'
>   machines but fails in one exotic hardware architecture.

We already have build and test bots, but I haven't been able to make any
sense of their output. 8-(

I would love to have a better contributor experience.  CI could be part
of that, but the kind of resources needed to give feedback in a
reasonable time frame (say, less than one hour) are difficult to come by
at this point.  (And we are not even talking about actually running
tests, just making sure that everything still builds.)

Some more notes about the build system:

Most of my development work involves short-circuiting the build system
for testing (make -j8 -O subdirs=… check).  I have to be really careful when I
do that because in some cases, it will corrupt your build tree, and then
I need to throw everything away and build from scratch again.  It's
considerably safer to do this instead:

  make -j8 -O && make -j8 -O subdirs=… check

But for me, it adds 12 seconds to every test tweak.  (For reference:
musl builds from scratch in less than 10 seconds on this hardware.)

What's worse, for me, the test-in-continer framework has made things
quite worse, but no one has been able to reproduce that.  Given things
are so brittle, I hesitate to encourage adoption of this style.  But not
being able to do something like that makes it really hard for new
contributors.

One experiment I'd like to do (and maybe someone wants to help with
that): Instrument CC and CXX invocations with something that captures
the current directory and the command lines during a regular build.
Afterwards, replay all the compiler invocations, in parallel.  (This may
need some manual tweaks for adding barriers, but perhaps not if we run
this test on an already-built tree.)  This should gives a number: How
much time does it take, at minimum, to build glibc, without make
overhead or artificial serialization.  It will tell us how inefficient
the current build system really is.  Is the make overhead for a
from-scratch build just those 12 seconds I mentioned above, or is it
much larger?

This should give us some guidance whether we need to focus on
from-scratch performance, or on making incremental builds accurate.

My feeling is that we need to tackle this first before we can offer a CI
workflow because even with remote CI, local builds will still be an
important part of the developer experience.  And if we can speed up
building significantly, we perhaps do not have to investigate ways to
run build-many-glibcs.py in a distributed fashion, on a cluster of
machines.

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: CI/CD in glibc

Carlos O'Donell-6
On 7/16/19 8:00 AM, Florian Weimer wrote:
> My feeling is that we need to tackle this first before we can offer a CI
> workflow because even with remote CI, local builds will still be an
> important part of the developer experience.  And if we can speed up
> building significantly, we perhaps do not have to investigate ways to
> run build-many-glibcs.py in a distributed fashion, on a cluster of
> machines.

I agree.

However, I think we can discuss gitlab/CI style setups and they will
benefit from any speedup we have in the build cycle.

I have an item for discussion at GNU Cauldron 2019 regarding this
kind of workflow.

--
Cheers,
Carlos.
123