Status of build bots?

Status of build bots?

Carlos O'Donell-6
Community,

The buildbots look red across the board.

Do we know what's up with them?

http://glibc-buildbot.reserved-bit.com/waterfall

Do we have an "ownership" page on the wiki so I can
reach out and offer support in some way to the owners
of that hardware?

--
Cheers,
Carlos.

Re: Status of build bots?

Mark Wielaard
Hi,

On Mon, 2019-08-19 at 16:49 -0400, Carlos O'Donell wrote:
> The buildbots look red across the board.
>
> Do we know what's up with them?
>
> http://glibc-buildbot.reserved-bit.com/waterfall
>
> Do we have an "ownership" page on the wiki so I can
> reach out and offer support in some way to the owners
> of that hardware?

The s390x worker had a crash yesterday and we lost some config.
It shouldn't be hard to put it back. But it isn't clear anybody is
actually looking at or checking the results (they have been red for
months).

The s390x worker is somewhat overloaded, so if the glibc project isn't
actually using the buildbot results we could opt for letting other
projects use the resources instead.

Cheers,

Mark

Re: Status of build bots?

Carlos O'Donell-6
On 8/19/19 4:53 PM, Mark Wielaard wrote:

> Hi,
>
> On Mon, 2019-08-19 at 16:49 -0400, Carlos O'Donell wrote:
>> The buildbots look red across the board.
>>
>> Do we know what's up with them?
>>
>> http://glibc-buildbot.reserved-bit.com/waterfall
>>
>> Do we have an "ownership" page on the wiki so I can
>> reach out and offer support in some way to the owners
>> of that hardware?
>
> The s390x worker had a crash yesterday and we lost some config.
> It shouldn't be hard to put it back. But it isn't clear anybody is
> actually looking at or checking the results (they have been red for
> months).
>
> The s390x worker is somewhat overloaded, so if the glibc project isn't
> actually using the buildbot results we could opt for letting other
> projects use the resources instead.

We really need functioning build bots for all major targets:

* x86_64 / i686
* aarch64 / arm
* s390x / s390
* ppc32 / ppc64 / ppc64le

It would be great if we got s390x back up and running.

In the meantime we need to clean up the results and move
from "purely informative" to "an active part of our process":
assign maintainership of the various build bots in case one
fails, turn on nag mails for the user who broke the bot, and
then use those mails to discuss fixes.

We should take that fundamental next step so that the build
bots are really useful at catching issues across distros and
machines (just like build-many-glibcs catches other kinds of
problems).

I've added a GNU Cauldron glibc BoF topic for this.

--
Cheers,
Carlos.

Re: Status of build bots?

Siddhesh Poyarekar-8
On 20/08/19 2:28 AM, Carlos O'Donell wrote:

> We really need functioning build bots for all major targets:
>
> * x86_64 / i686
> * aarch64 / arm
> * s390x / s390
> * ppc32 / ppc64 / ppc64le
>
> It would be great if we got s390x backup and running.
>
> In the meantime we need to cleanup the results and move
> from "purely informative" to "an active part of our process"
> and assigning maintainership to the various build bots in
> case one fails, along with turning on nag mails for the user
> that broke the bot, and then using that to discuss fixes.
>
> We should take that fundamental next step so that the build
> bots are really useful at catching issues across distros and
> machines (just like build-many-glibcs catches other kinds of
> problems).

Agreed.  The way most projects handle this is to have not only
per-commit builds (and emails/messages whenever a build fails) but also
per-patch-submission builds and tests.  I suppose we can set up mail
notifications to libc-alpha right away, but ideally we want to get to a
point where all of this happens within a gitlab/phabricator-like
framework so that the list doesn't get too noisy.

Maybe I should dust off my changelog automation patchset again, because
the ChangeLog format is an obvious hurdle to all of this.  The last
state there was bewilderment from the gnulib community, because I
wasn't able to clearly communicate to them the need for changelog
automation and how it differs from the changelog skeleton generation
scripts they already use.

Has anything changed since?  We still need changelog automation, right?
Sorry, I've been a bit out of the loop lately.

Siddhesh

Re: Status of build bots?

Florian Weimer-5
In reply to this post by Carlos O'Donell-6
* Carlos O'Donell:

> We really need functioning build bots for all major targets:
>
> * x86_64 / i686
> * aarch64 / arm
> * s390x / s390
> * ppc32 / ppc64 / ppc64le
>
> It would be great if we got s390x backup and running.

You should qualify whether this is a community perspective or a Red Hat
perspective.

I have to admit that I have not been able to make any sense whatsoever
of the buildbot output.  Is this really something from which regular
glibc contributors derive value?  If not, why are we doing it?  Joseph's
build-only tester is much more useful to me, even though it provides so
little diagnostic output.

I also find the choice of architectures peculiar.  With the potential
exception of arm (which variant?), these are exactly those architectures
which are (comparatively) easy to get access to.  The regular
contributors either have them in-house, or can access Debian
porterboxes, the GCC compile farm, IBM's community resources (never used
those, admittedly), or the public Fedora machines.

Thanks,
Florian

Re: Status of build bots?

Joseph Myers
On Tue, 20 Aug 2019, Florian Weimer wrote:

> I have to admit that I have not been able to make any sense whatsoever
> of the buildbot output.  Is this really something from which regular
> glibc contributors derive value?  If not, why are we doing it?  Joseph's
> build-only tester is much more useful to me, even though it provides so
> little diagnostic output.

I think the point of this discussion is to make it something from which we
can derive value.  Which I think means:

* The normal, expected state is clean, so failures indicate regressions.

* We can readily see exactly what regressions there are at any time, and
which are new versus old regressions (see the sketch after this list).

* Results get reported to libc-testresults, and someone with access to
each bot monitors its results and raises issues on the mailing list / in
Bugzilla as necessary.  (That someone should probably also e.g. do routine
libm-test-ulps updates for new tests themselves, when something routine
like that allows them to restore results to clean status on a given
architecture.)
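
For illustration only, one crude way to see new versus old regressions is
to compare the FAIL lines of the tests.sum files from two consecutive runs
(the file names here are hypothetical):

  # Lines only in the second file are new regressions; lines only in the
  # first file are failures that went away.
  diff <(grep '^FAIL:' tests.sum.previous | sort) \
       <(grep '^FAIL:' tests.sum.current | sort)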

Other things, such as automatically informing the author of a patch
causing a regression without needing manual action to do that, or
providing a way for a contributor to have their uncommitted patch tested
on multiple systems, or covering lots more architectures, are nice to
have, but not necessary for the bots to be useful.

--
Joseph S. Myers
[hidden email]

Re: Status of build bots?

Carlos O'Donell-6
In reply to this post by Florian Weimer-5
On 8/20/19 6:05 AM, Florian Weimer wrote:

> * Carlos O'Donell:
>
>> We really need functioning build bots for all major targets:
>>
>> * x86_64 / i686
>> * aarch64 / arm
>> * s390x / s390
>> * ppc32 / ppc64 / ppc64le
>>
>> It would be great if we got s390x backup and running.
>
> You should qualify whether this is a community perspective or a Red Hat
> perspective.

Let me clarify.

Firstly, I don't want this list to imply any bias about which machines
are important or not important; it is simply a reflection of how easy
it is to get hardware and set up systems that can be used publicly for
the given hardware.

From a Red Hat perspective I care about:

* x86_64 / i686
* aarch64
* s390x
* ppc64le

From an upstream perspective I care about:

* x86_64 / i686 / x32
* aarch64 / armv7hf
* s390x / s390
* ppc32 / ppc64 / ppc64le
* mips (several variants)
* riscv (several variants)
* c-sky
* nios II

I have a wishlist which would be:

* alpha
* sparc
* ia64
* hppa
* m68k
* microblaze
* sh

That is *my* list, and it may not reflect the desires and priorities of
others in the community, and you should feel free to put your efforts
where you want them to go.

> I have to admit that I have not been able to make any sense whatsoever
> of the buildbot output.  Is this really something from which regular
> glibc contributors derive value?  If not, why are we doing it?  Joseph's
> build-only tester is much more useful to me, even though it provides so
> little diagnostic output.

We need to do 3 things today to make our situation better:

(a) Commit to ownership of the various build bots.
    - Machine maintainer contact for help with machine issues.
    - Admin contact to help with system issues.

(b) Commit to fixing machine-specific test failures.
    - Machine maintainers get on board to clean up machine failures as xfails.

(c) Commit to fixing generic test failures.
    - Community starts committing xfails for tests that fail frequently.

With that we get a clean set of builds.

We also need to upgrade to a newer buildbot master so we can get
the newer UI with cleaner interfaces.

Just look at Sergio's gdb buildbot and how nice it is to see log snippets etc.
https://gdb-buildbot.osci.io/#/

> I also find the choice of architectures peculiar.  With the potential
> exception of arm (which variant?), these are exactly those architectures
> which are (comparatively) easy to get access to.  The regular
> contributors either have them in-house, or can access Debian
> porterboxes, the GCC compile farm, IBM's community resources (never used
> those, admittedly), or the public Fedora machines.

Some boxes are much harder to set up, let alone to provide "developer access"
to on a public network.

We should walk before we run, and setting up boxes for all the "easy" machines
is something we should succeed at before asking the machine maintainers to
help with the "harder" machines.

To answer your earlier question about value.

We want to see the results of execution failures for a particular commit
and make sure we aren't regressing. We should be doing this for all architectures
which are easy to set up and run. And you should get an immediate notification via
email if you broke something, and you should have two points of contact for fixing
the issue (the admin and the machine maintainer).

Immediately after that you want "developer access" to the environment to
duplicate the failure and fix the issue (something we haven't discussed here).
Getting "developer access" is a key part of the equation here if machine maintainers
want help maintaining their port. It isn't required if you have a responsive machine
maintainer who steps in quickly to fix any issues.

--
Cheers,
Carlos.

Re: Status of build bots?

Joseph Myers
On Tue, 20 Aug 2019, Carlos O'Donell wrote:

> (b) Commit to fixing machine-specific test failures.
>     - Machine maintainers get on board to cleanup machine failures as xfails.

We also need to revisit previous decisions to let some tests fail.

Case in point: the known (and documented as such on the wiki) failures of

FAIL: resolv/tst-resolv-ai_idn
FAIL: resolv/tst-resolv-ai_idn-latin1

with libidn2 before version 2.0.5.

Marking as either UNSUPPORTED or XFAIL when using older libidn2 would be
reasonable (I suspect UNSUPPORTED is more practical, since the tests could
dlopen libidn2 and call idn2_check_version).  Leaving as FAIL is
unhelpful; it means anyone testing on Ubuntu 18.04, for example, needs to
know to disregard those failures.

Likewise anything else that fails because of some environmental issue that
it's reasonable to have present on a buildbot system - we need to find a
way to make such known issues result in something other than FAIL, so that
the expected results can be clean.

(The other, longstanding, failure I see for native x86_64 testing on
Ubuntu 18.04 is

FAIL: nss/tst-nss-files-hosts-long

 .)

--
Joseph S. Myers
[hidden email]

Re: Status of build bots?

Jeff Law
In reply to this post by Carlos O'Donell-6
On 8/20/19 3:08 PM, Carlos O'Donell wrote:

> On 8/20/19 6:05 AM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>> We really need functioning build bots for all major targets:
>>>
>>> * x86_64 / i686
>>> * aarch64 / arm
>>> * s390x / s390
>>> * ppc32 / ppc64 / ppc64le
>>>
>>> It would be great if we got s390x backup and running.
>>
>> You should qualify whether this is a community perspective or a Red Hat
>> perspective.
>
> Let me clarify.
>
> Firstly I don't want this list to imply any bias about which machines
> are important or not important, they are simply a reflection of how easy
> it is to get hardware and setup systems that can be used publicly for the
> given hardware.
>
> From a Red Hat perspective I care about:
>
> * x86_64 / i686
> * aarch64
> * s390x
> * ppc64le
>
> From an upstream perspective I care about:
>
> * x86_64 / i686 / x32
> * aarch64 / armv7hf
> * s390x / s390
> * ppc32 / ppc64 / ppc64le
> * mips (several variants)
> * riscv (several variants)
> * c-sky
> * nios II
>
> I have a wishlist which would be:
>
> * alpha
> * sparc
> * ia64
> * hppa
> * m68k
> * microblaze
> * sh
Note that alpha, hppa, m68k and sh4 have solid enough qemu user mode
support that you can create a root filesystem for the target, chroot
into it and make it look entirely native.  That's how my tester handles
those (and a few other) targets.
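
For illustration, a minimal sketch of such a setup, assuming a Debian-style
host with the qemu-user-static and binfmt-support packages installed and an
existing target root filesystem tarball (the paths and the alpha target are
just examples):

  mkdir -p /srv/alpha-root
  tar -C /srv/alpha-root -xf alpha-rootfs.tar.gz
  # The statically linked qemu binary must be visible inside the chroot so
  # that binfmt_misc can run target executables transparently.
  cp /usr/bin/qemu-alpha-static /srv/alpha-root/usr/bin/
  # From here on, builds inside the chroot look "native" to configure/make.
  chroot /srv/alpha-root /bin/sh -c 'uname -m; gcc --version'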

My alpha & aarch64_be chroots even have dejagnu so I can run the GCC
testsuite.  Probably the hardest part is making sure all the build
dependencies are in the chroot.  glibc adding python as a hard
dependency was painful :(

When run on a beefy box, qemu emulation is probably still faster than real
hardware for those targets.

Jeff

Re: Status of build bots?

Florian Weimer-5
In reply to this post by Joseph Myers
* Joseph Myers:

> Case in point: the known (and documented as such on the wiki) failures of
>
> FAIL: resolv/tst-resolv-ai_idn
> FAIL: resolv/tst-resolv-ai_idn-latin1
>
> with libidn2 before version 2.0.5.
>
> Marking as either UNSUPPORTED or XFAIL when using older libidn2 would be
> reasonable (I suspect UNSUPPORTED is more practical, since the tests could
> dlopen libidn2 and call idn2_check_version).  Leaving as FAIL is
> unhelpful; it means anyone testing on Ubuntu 18.04, for example, needs to
> know to disregard those failures.

I'll gladly review a patch for that.  I'm not sure what the system
configuration is exactly, because I don't think that a supported Ubuntu
18.04 installation contains the prerequisites for building glibc, so I'm
not sure how to replicate this locally.

> (The other, longstanding, failure I see for native x86_64 testing on
> Ubuntu 18.04 is
>
> FAIL: nss/tst-nss-files-hosts-long
>
>  .)

I see this on Fedora 30 as well.  It is unclear why it happens.  There
is no support for debugging container tests today, so I haven't been
able to find the cause.

Thanks,
Florian

Re: Status of build bots?

Carlos O'Donell-6
On 8/21/19 4:56 AM, Florian Weimer wrote:

> * Joseph Myers:
>
>> Case in point: the known (and documented as such on the wiki) failures of
>>
>> FAIL: resolv/tst-resolv-ai_idn
>> FAIL: resolv/tst-resolv-ai_idn-latin1
>>
>> with libidn2 before version 2.0.5.
>>
>> Marking as either UNSUPPORTED or XFAIL when using older libidn2 would be
>> reasonable (I suspect UNSUPPORTED is more practical, since the tests could
>> dlopen libidn2 and call idn2_check_version).  Leaving as FAIL is
>> unhelpful; it means anyone testing on Ubuntu 18.04, for example, needs to
>> know to disregard those failures.
>
> I'll gladly review a patch for that.  I'm not sure what the system
> configuration is exactly because I don't think that a supported Ubunutu
> 18.04 installation contains the prerequisites for building glibc, so I'm
> not sure how to replicate this locally.
>
>> (The other, longstanding, failure I see for native x86_64 testing on
>> Ubuntu 18.04 is
>>
>> FAIL: nss/tst-nss-files-hosts-long
>>
>>  .)
>
> I see this on Fedora 30 as well.  It is unclear why it happens.  There
> is no support for debugging container tests today, so I haven't been
> able to find the cause.

This doesn't reproduce for me. Do you have a box that I can reproduce this on?

uname -a
Linux athas 5.2.7-200.fc30.x86_64 #1 SMP Thu Aug 8 05:35:29 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

cat /home/carlos/build/glibc/nss/subdir-tests.sum | grep long
PASS: nss/tst-nss-files-hosts-long

--
Cheers,
Carlos.

Re: Status of build bots?

Andreas Schwab
On Aug 21 2019, Carlos O'Donell <[hidden email]> wrote:

> PASS: nss/tst-nss-files-hosts-long

I don't see how that test is doing anything useful.  It depends on the
host name test4 being resolvable, which is of course unlikely.

Andreas.

--
Andreas Schwab, SUSE Labs, [hidden email]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: Status of build bots?

Florian Weimer-5
* Andreas Schwab:

> On Aug 21 2019, Carlos O'Donell <[hidden email]> wrote:
>
>> PASS: nss/tst-nss-files-hosts-long
>
> I don't see how that test is doing anything useful.  It depends on the
> host name test4 to be resolvable, which is of course unlikely.

See nss/tst-nss-files-hosts-long.root/etc/hosts.  It contains an entry
for test4.

Thanks,
Florian

Re: Status of build bots?

Joseph Myers
In reply to this post by Florian Weimer-5
On Wed, 21 Aug 2019, Florian Weimer wrote:

> * Joseph Myers:
>
> > Case in point: the known (and documented as such on the wiki) failures of
> >
> > FAIL: resolv/tst-resolv-ai_idn
> > FAIL: resolv/tst-resolv-ai_idn-latin1
> >
> > with libidn2 before version 2.0.5.
> >
> > Marking as either UNSUPPORTED or XFAIL when using older libidn2 would be
> > reasonable (I suspect UNSUPPORTED is more practical, since the tests could
> > dlopen libidn2 and call idn2_check_version).  Leaving as FAIL is
> > unhelpful; it means anyone testing on Ubuntu 18.04, for example, needs to
> > know to disregard those failures.
>
> I'll gladly review a patch for that.  I'm not sure what the system
> configuration is exactly because I don't think that a supported Ubunutu
> 18.04 installation contains the prerequisites for building glibc, so I'm
> not sure how to replicate this locally.

The system compiler is GCC 7.4, with binutils 2.30, which should be fine
for building glibc (though I'm using locally built GCC / binutils).  The
only special thing I'm aware of being needed is unsetting LD_PRELOAD, which
is otherwise set by default (pointing to libgtk3-nocsd.so.0).
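
For example (just a sketch of the invocation, run from the glibc build
directory):

  unset LD_PRELOAD
  make check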

Here is a patch (testing on a system with libidn2 2.0.5 or later advised).


Mark IDN tests unsupported with libidn2 before 2.0.5.

When using a system (e.g. Ubuntu 18.04) with libidn2 2.0.4 or earlier,
test results include:

FAIL: resolv/tst-resolv-ai_idn
FAIL: resolv/tst-resolv-ai_idn-latin1

It was previously stated
<https://sourceware.org/ml/libc-alpha/2018-05/msg00771.html> that "It
should fail to indicate you have bugs in your system libidn.".
However, the glibc testsuite should be indicating whether there are
bugs in glibc, not whether there are bugs in other system pieces - so
unless you consider it a glibc bug that it fails to work around the
libidn issues, these FAILs are not helpful.  And as a general
principle, it's best for the expected glibc test results to be clean,
with Bugzilla used to track known bugs in glibc itself, rather than
people needing to know about the expected FAILs to tell if there are
problems with their glibc build.  So, while there is an argument that
install.texi (not just the old NEWS entries for 2.28) should explain
the use of libidn2 and that 2.0.5 or later is recommended, test FAILs
are not the right way to indicate the presence of an old libidn2
version.

This patch accordingly makes those tests return UNSUPPORTED for older
libidn2 versions, just as they do when libidn2 isn't present at all.
As implied by that past discussion, it's possible this could result in
UNSUPPORTED on systems with older versions that have the required fixes
backported so that the tests previously passed, if there are any such
systems.

Tested for x86_64 on Ubuntu 18.04, including verifying that putting an
earlier version in place of 2.0.5 results in the tests FAILing whereas
using 2.0.5 as in the patch results in UNSUPPORTED.  I have not tested
on a system using 2.0.5 or later.

2019-08-21  Joseph Myers  <[hidden email]>

        * resolv/tst-resolv-ai_idn-latin1.c (do_test): Mark test
        unsupported with libidn2 before 2.0.5.
        * resolv/tst-resolv-ai_idn.c (do_test): Likewise.

diff --git a/resolv/tst-resolv-ai_idn-latin1.c b/resolv/tst-resolv-ai_idn-latin1.c
index 4a6bf5623c..5c515958c2 100644
--- a/resolv/tst-resolv-ai_idn-latin1.c
+++ b/resolv/tst-resolv-ai_idn-latin1.c
@@ -29,6 +29,11 @@ do_test (void)
   void *handle = dlopen (LIBIDN2_SONAME, RTLD_LAZY);
   if (handle == NULL)
     FAIL_UNSUPPORTED ("libidn2 not installed");
+  void *check_ver_sym = xdlsym (handle, "idn2_check_version");
+  const char *check_res
+    = ((const char *(*) (const char *)) check_ver_sym) ("2.0.5");
+  if (check_res == NULL)
+    FAIL_UNSUPPORTED ("libidn2 too old");
 
   if (setlocale (LC_CTYPE, "en_US.ISO-8859-1") == NULL)
     FAIL_EXIT1 ("setlocale: %m");
diff --git a/resolv/tst-resolv-ai_idn.c b/resolv/tst-resolv-ai_idn.c
index 493d1c7741..046842769a 100644
--- a/resolv/tst-resolv-ai_idn.c
+++ b/resolv/tst-resolv-ai_idn.c
@@ -28,6 +28,11 @@ do_test (void)
   void *handle = dlopen (LIBIDN2_SONAME, RTLD_LAZY);
   if (handle == NULL)
     FAIL_UNSUPPORTED ("libidn2 not installed");
+  void *check_ver_sym = xdlsym (handle, "idn2_check_version");
+  const char *check_res
+    = ((const char *(*) (const char *)) check_ver_sym) ("2.0.5");
+  if (check_res == NULL)
+    FAIL_UNSUPPORTED ("libidn2 too old");
 
   if (setlocale (LC_CTYPE, "en_US.UTF-8") == NULL)
     FAIL_EXIT1 ("setlocale: %m");

--
Joseph S. Myers
[hidden email]

Re: Status of build bots?

Florian Weimer-5
* Joseph Myers:

> 2019-08-21  Joseph Myers  <[hidden email]>
>
> * resolv/tst-resolv-ai_idn-latin1.c (do_test): Mark test
> unsupported with libidn2 before 2.0.5.
> * resolv/tst-resolv-ai_idn.c (do_test): Likewise.

This patch looks reasonable to me.  The test still runs on Fedora 30,
with libidn2 2.2.0.

Thanks,
Florian

Re: Status of build bots?

Joseph Myers
On Wed, 21 Aug 2019, Florian Weimer wrote:

> * Joseph Myers:
>
> > 2019-08-21  Joseph Myers  <[hidden email]>
> >
> > * resolv/tst-resolv-ai_idn-latin1.c (do_test): Mark test
> > unsupported with libidn2 before 2.0.5.
> > * resolv/tst-resolv-ai_idn.c (do_test): Likewise.
>
> This patch looks reasonable to me.  The test still runs on Fedora 30,
> with libidn 2.2.0.

Carlos, any comments, since you previously said these tests should fail
with older libidn2?

--
Joseph S. Myers
[hidden email]

Re: Status of build bots?

Stefan Liebler-2
In reply to this post by Carlos O'Donell-6
On 8/20/19 11:08 PM, Carlos O'Donell wrote:
> Just look at Sergio's gdb buildbot and how nice it is to see logs snippets etc.
> https://gdb-buildbot.osci.io/#/
>
I've just had a look at the gdb buildbot.
You can see that the builds were green (all fine) or red (failures).
But compared to the glibc buildbot, it's much easier and faster to see
which tests have failed (see the regressions in the build summary).

We could also somehow distinguish between red (failures) and red* (failures,
but new failures appeared or old ones now pass compared to the last build).

Perhaps we could also dump the .out files of all failing tests.
At least sometimes this would help.
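
A rough illustration of what such a step could look like, run from the top
of the build directory after make check (this is not an existing buildbot
step, just a sketch):

  # For every FAIL line in tests.sum, print the corresponding .out file.
  grep '^FAIL:' tests.sum | awk '{ print $2 }' | while read -r t; do
    echo "==== $t ===="
    cat "$t.out" 2>/dev/null || echo "(no $t.out found)"
  done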


Re: Status of build bots?

Stefan Liebler-2
In reply to this post by Mark Wielaard
On 8/19/19 10:53 PM, Mark Wielaard wrote:

> Hi,
>
> On Mon, 2019-08-19 at 16:49 -0400, Carlos O'Donell wrote:
>> The buildbots look red across the board.
>>
>> Do we know what's up with them?
>>
>> http://glibc-buildbot.reserved-bit.com/waterfall
>>
>> Do we have an "ownership" page on the wiki so I can
>> reach out and offer support in some way to the owners
>> of that hardware?
>
> The s390x worker had a crash yesterday and we lost some config.
> It shouldn't be hard to put it back. But it isn't clear anybody is
> actually looking at or checking the results (they have been red for
> months).
>
> The s390x worker is somewhat overloaded, so if the glibc project isn't
> actually using the buildbot results we could opt for letting other
> projects use the resources instead.
>
> Cheers,
>
> Mark
>
Hi Mark,

I've just had a look into the logs of some of the last builds of the s390x
buildbot (before it went down). There I could always see: "No space left
on device".

Bye
Stefan


Re: Status of build bots?

Mark Wielaard
On Thu, Aug 22, 2019 at 09:35:36AM +0200, Stefan Liebler wrote:
> I've just had a look into the logs of some of the last builds of s390x
> buildbot (before it was down). There I could always see: "No space left on
> device".

It was actually file system corruption that showed up as "no space
left".  The problem is that for other projects using this machine for
their buildbot it was quickly noticed and reported, but for glibc
nobody seemed to be monitoring the results.

The build also didn't work before that file system issue.  And all the
other buildbot workers look red as well (and have for months).  The
problem is that there is no known "good set" of test results, and since
there are always build steps or tests failing you cannot determine whether
something is a bad regression or "just" a (known?) failing testcase.

I think before we resurrect the buildbot workers (for whichever
architecture), we should see if we can define a minimum build and test
setup that should always PASS, and make the buildbot send warnings if
that changes. And make sure that someone monitors the results. And/or
agree that turning a buildbot worker red is a reason for reverting a
commit.
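
A rough sketch of what such a check could look like, assuming a curated
must-pass.txt list of test names (hypothetical) and the tests.sum produced
by make check:

  # Fail the buildbot step if any test on the must-pass list is not PASS.
  status=0
  while read -r t; do
    grep -q "^PASS: $t\$" tests.sum || { echo "not passing: $t"; status=1; }
  done < must-pass.txt
  exit $status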

Cheers,

Mark

Re: Status of build bots?

Szabolcs Nagy-2
In reply to this post by Carlos O'Donell-6
On 19/08/2019 21:49, Carlos O'Donell wrote:

> Community,
>
> The buildbots look red across the board.
>
> Do we know what's up with them?
>
> http://glibc-buildbot.reserved-bit.com/waterfall
>
> Do we have an "ownership" page on the wiki so I can
> reach out and offer support in some way to the owners
> of that hardware?
>

i do look at the aarch64 and armhf build bots, but i
was away on holiday for a week.

(aarch64 buildbot is supposed to be green, except for
occasional FAIL: malloc/tst-malloc-thread-exit, same as
https://sourceware.org/bugzilla/show_bug.cgi?id=24537
i might move that test to xtest too. armhf unfortunately
suffers from an arm64 kernel bug that applies aarch64
signal stack limits to aarch32 processes, it should be
fixed in new kernels but i cannot update that machine)

now i see

FAIL: elf/tst-dlopen-aout
FAIL: elf/tst-dlopen-aout-container

$ elf/ld-linux-aarch64.so.1 --library-path nptl:dlfcn:. elf/tst-dlopen-aout
error: tst-dlopen-aout.c:48: dlopen succeeded unexpectedly: elf/tst-dlopen-aout
error: 1 test failures

since

Change #4160
Category None
Changed by Florian Weimer <[hidden email]>
Changed at Thu 15 Aug 2019 16:53:32
Repository git://sourceware.org/git/glibc.git
Branch master
Revision 23d2e5faf0bca6d9b31bef4aa162b95ee64cbfc6
Comments

elf: Self-dlopen failure with explict loader invocation [BZ #24900]
In case of an explicit loader invocation, ld.so essentially performs
a dlopen call to load the main executable.  Since the pathname of
the executable is known at this point, it gets stored in the link
map.  In regular mode, the pathname is not known and "" is used
instead.

As a result, if a program calls dlopen on the pathname of the main
program, the dlopen call succeeds and returns a handle for the main
map.  This results in an unnecessary difference between glibc
testing (without --enable-hardcoded-path-in-tests) and production
usage.

This commit discards the names when building the link map in
_dl_new_object for the main executable, but it still determines
the origin at this point in case of an explict loader invocation.
The reason is that the specified pathname has to be used; the kernel
has a different notion of the main executable.
