[PATCH] Test status before h_errno in gaih_inet

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Test status before h_errno in gaih_inet

Stan Shebs-5
Per feedback on my previous attempt to fix getaddrinfo failures
after recovering from a failure:

https://sourceware.org/ml/libc-alpha/2016-07/msg00341.html

here is a simpler patch that checks status before checking
h_errno.

The patch basically adds back a test that was previously
present.  I am not 100% convinced that it catches all cases
of NSS status being set in a way that ought to be reported
as an error coming from getaddrinfo, but I can't find
any actual examples.

2016-07-25  Stan Shebs  <[hidden email]>

        * sysdeps/posix/getaddrinfo.c (gaih_inet): Test status before
        looking at h_errno.
        * posix/tst-getaddrinfo6.c: New test.
        * posix/Makefile (tests): Add tst-getaddrinfo6.

gai-patch-2 (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Test status before h_errno in gaih_inet

Florian Weimer-5
On 07/25/2016 06:50 PM, Stan Shebs wrote:

> Per feedback on my previous attempt to fix getaddrinfo failures
> after recovering from a failure:
>
> https://sourceware.org/ml/libc-alpha/2016-07/msg00341.html
>
> here is a simpler patch that checks status before checking
> h_errno.
>
> The patch basically adds back a test that was previously
> present.  I am not 100% convinced that it catches all cases
> of NSS status being set in a way that ought to be reported
> as an error coming from getaddrinfo, but I can't find
> any actual examples.
>
> 2016-07-25  Stan Shebs  <[hidden email]>
>
>         * sysdeps/posix/getaddrinfo.c (gaih_inet): Test status before
>         looking at h_errno.
>         * posix/tst-getaddrinfo6.c: New test.
>         * posix/Makefile (tests): Add tst-getaddrinfo6.

It's unclear whether the test case is intended to run against nss_files,
nss_dns, or both.  You should call __nss_configure_lookup to make your
selection explicit.  This also avoids accidentally running against a
system nscd daemon.

I'm still trying to figure out what the expected error reporting
behavior for functions returning enum nss_status is.  There seems to be
some expectation that enum nss_status != NSS_STATUS_SUCCESS implies that
h_errno is valid, but this code in getaddrinfo.c itself contradicts that:

       status = NSS_STATUS_UNAVAIL;
       /* Could not load any of the lookup functions.  Indicate
          an internal error if the failure was due to a system
         error other than the file not being found.  We use the
         errno from the last failed callback.  */
       if (errno != 0 && errno != ENOENT)
        __set_h_errno (NETDB_INTERNAL);

It would have to set h_errno unconditionally in order to preserve the
invariant.

The _nss_files_gethostbyname3_r implementation in nss_files calls
internal_setent, but does not update *herrnop for status !=
NSS_STATUS_SUCCESS.  This is in contrast to _nss_files_gethostbyname4_r,
which does.

I need to dig further and write up what I find, but I suspect that we
may have to set h_errno to 0 temporarily to obtain maximum compatibility
with existing NSS modules, and base the error check on that.

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Test status before h_errno in gaih_inet

Siddhesh Poyarekar-9
In reply to this post by Stan Shebs-5


On Monday 25 July 2016 10:20 PM, Stan Shebs wrote:
> The patch basically adds back a test that was previously
> present.  I am not 100% convinced that it catches all cases
> of NSS status being set in a way that ought to be reported
> as an error coming from getaddrinfo, but I can't find
> any actual examples.
When I had changed this condition some time ago, my understanding was
that there are only two cases where h_errno is set to NETDB_INTERNAL:
when status is NSS_STATUS_UNAVAILABLE and when NSS_STATUS_TRYAGAIN.  I
had dropped the status check as unnecessary (in multiple places IIRC)
because the NSS_STATUS_TRYAGAIN checks earlier was exhaustive.  However,
I did not take into consideration the fact that h_errno will persist
across calls and hence removing that check was wrong.

Based on that, I agree with your fix is correct but it is incomplete.
There need to be similar checks for every place that test for h_errno ==
NETDB_INTERNAL, like in the gethosts macro and elsewhere in gaih_inet.
IIRC gethosts is exercised for AF_INET, so you might be able to verify
that with your test case.

Siddhesh
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Test status before h_errno in gaih_inet

Siddhesh Poyarekar-8
On Monday 29 August 2016 04:05 PM, Siddhesh Poyarekar wrote:
> Based on that, I agree with your fix is correct but it is incomplete.

Ugh, let me try that one again:

    Based on that, I agree with your fix but it is incomplete.

Siddhesh
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Test status before h_errno in gaih_inet

Florian Weimer-5
In reply to this post by Florian Weimer-5
On 08/19/2016 06:01 PM, Florian Weimer wrote:
> I need to dig further and write up what I find, but I suspect that we
> may have to set h_errno to 0 temporarily to obtain maximum compatibility
> with existing NSS modules, and base the error check on that.

_nss_files_gethostbyname3_r sets *herrnop on success to a non-zero
value, so that does not work.

Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Test status before h_errno in gaih_inet

Florian Weimer-5
In reply to this post by Stan Shebs-5
On 07/25/2016 06:50 PM, Stan Shebs wrote:
>         * posix/tst-getaddrinfo6.c: New test.

The test does not reproduce the issue for me, with Fedora 24 (glibc
2.23-based).

How confident are you about your root cause analysis?

I see a completely different bug in the NSS framework: If a call into
NSS triggers dlopen, and that dlopen fails with a system error, we cache
the failure permanently and never attempt to open the service module again.

This issue could well lead to persistent getaddrinfo failures because
glibc assumes none of the configured NSS modules are available.

Could this match what you experienced?

Thanks,
Florian