On glibc's resolver


Dimitrios Apostolou
Hello list,

I was trying to write a patch for glibc so hopefully this is the
appropriate list, please let me know otherwise.

I have been tracing weird behaviour of my mail client (alpine) and ended
up in getaddrinfo() calls, which are handled by glibc's resolver. In
particular, when I connect my laptop to different networks and the
previous DNS server is unreachable, resolver never re-reads its cache and
all queries timeout after several retries.

Apparently this is a known issue, and a web search reveals discussions
from as early as 2003. I'd appreciate your opinions, I was thinking of
writing a patch but I can't figure out where it should go, alpine or
glibc, code or documentation! Here are the replies I gathered from a web
search:

1) Use a caching daemon (perhaps nscd, though some argue that it does not
solve the problem) which should be restarted/reloaded when changing networks.

2) Call res_init() if getaddrinfo() fails.

3) Patch glibc to stat() /etc/resolv.conf, checking for changes. Debian and
Ubuntu carry such a patch.

4) Use a custom DNS library, glibc is unsuitable for this purpose.


Here is my take. About nscd: I'm having the problem on a major distro
(Fedora), which does not enable it by default, so I can only guess there
are good reasons for not using it.

On (2), res_init() is a non-standard BSD function, and its man page
doesn't mention such a purpose. In fact I can't be sure if it's safe to
call it multiple times and I see no guarantee that it will re-initialise
the resolver more than once. If it's the proposed way shouldn't it be
mentioned in both res_init() and getaddrinfo()'s man pages, or otherwise a
big warning that resolv.conf is never reparsed?

On (3) I don't have a Debian system to check it, but the overhead of
stat'ing on every request is probably unacceptable. I was thinking of
writing a patch that would stat() and reparse after a single request
timeout, so that following retries (unless RES_DFLRETRY is reached) will
automatically connect to the new servers. Would that be acceptable?
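To make the proposal concrete at application level (this is not the in-glibc patch itself), here is a minimal sketch, assuming glibc, where res_init() re-reads /etc/resolv.conf and an unreachable server typically surfaces from getaddrinfo() as EAI_AGAIN. The helpers `stat_snapshot_changed` and `reload_resolver_if_changed` are hypothetical names; stat() runs only on the failure path, so successful lookups pay nothing.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <resolv.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report whether two stat() snapshots of the same
   path differ.  The device/inode check catches updates that rename() a
   new file over the old one; size and mtime catch in-place edits.  */
bool
stat_snapshot_changed (const struct stat *before, const struct stat *after)
{
  return before->st_dev != after->st_dev
         || before->st_ino != after->st_ino
         || before->st_size != after->st_size
         || before->st_mtime != after->st_mtime;
}

/* Hypothetical helper, meant to be called only after getaddrinfo() has
   failed with EAI_AGAIN.  Calls res_init() when PATH differs from the
   previous snapshot (or when there is no snapshot yet) and returns true
   if the resolver was reinitialised, i.e. a retry is worthwhile.
   Not thread-safe: the snapshot is kept in plain statics.  */
bool
reload_resolver_if_changed (const char *path)
{
  static struct stat last_seen;
  static bool have_snapshot = false;
  struct stat now;

  if (stat (path, &now) != 0)
    return false;               /* Cannot stat PATH: nothing to reload.  */

  bool changed = !have_snapshot || stat_snapshot_changed (&last_seen, &now);
  last_seen = now;
  have_snapshot = true;
  return changed && res_init () == 0;
}
```

A caller would retry once: `if (err == EAI_AGAIN && reload_resolver_if_changed ("/etc/resolv.conf")) err = getaddrinfo (...);`. The first call always reinitialises because there is no earlier snapshot, and mtime is compared at one-second granularity, so two same-size edits within one second would go unnoticed.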

Finally using a custom library sounded logical, until I started reading
glibc's resolver. Really, with such size and complexity, and even an
asynchronous interface provided, shouldn't we also provide the simplest
facilities?


And a related question: is there a way to set up resolver behaviour
(timeout, retries) for a process programmatically, instead of changing the
system-wide resolv.conf?


Thank you in advance,
Dimitris

Re: On glibc's resolver

Dimitrios Apostolou
On Wed, 26 Dec 2012, Dimitrios Apostolou wrote:
>
> I have been tracing weird behaviour of my mail client (alpine) and ended up
> in getaddrinfo() calls, which are handled by glibc's resolver. In particular,
> when I connect my laptop to different networks and the previous DNS server is
> unreachable, resolver never re-reads its cache and all queries timeout after

Sorry for being unclear: I meant that it never re-reads /etc/resolv.conf.

> several retries.
> [...]

Re: On glibc's resolver

Carlos O'Donell-2
In reply to this post by Dimitrios Apostolou
On Tue, Dec 25, 2012 at 10:14 PM, Dimitrios Apostolou <[hidden email]> wrote:
> I was trying to write a patch for glibc so hopefully this is the appropriate
> list, please let me know otherwise.

Excellent question, and this is the right list.

> I have been tracing weird behaviour of my mail client (alpine) and ended up
> in getaddrinfo() calls, which are handled by glibc's resolver. In
> particular, when I connect my laptop to different networks and the previous
> DNS server is unreachable, resolver never re-reads its cache and all queries
> timeout after several retries.

What we need is a test case with expected and observed behaviour.
Given a test case we can justify or refute the expected or observed
behaviour against relevant standards or prior art.

> Apparently this is a known issue, and a web search reveals discussions from
> as early as 2003. I'd appreciate your opinions, I was thinking of writing a
> patch but I can't figure out where it should go, alpine or glibc, code or
> documentation! Here are the replies I gathered from a web search:

Could you please provide references to the prior discussions so we can
review them also?

> 1) Use a caching daemon (nscd maybe, some argue that it does not provide a
> solution) which should be restarted/reloaded when changing networks.
>
> 2) Call res_init() if getaddrinfo() fails.

These two solutions are interesting in that the
distribution/application is in control of when and why the resolver
should carry out the costly operation of reloading whatever data is
required to resolve a name.

The distribution can reload nscd when the network is reconfigured. The
application can call res_init() as required (perhaps as documented by
new documentation).
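On the application side, "call res_init() as required" could look like the following minimal sketch, assuming glibc, where res_init() re-reads /etc/resolv.conf into the calling thread's resolver state. `getaddrinfo_with_reinit` is a hypothetical wrapper, and retrying on EAI_AGAIN or EAI_NONAME is this sketch's own policy, not documented glibc guidance.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <resolv.h>

/* Hypothetical wrapper: resolve HOST/SERVICE; if the first attempt fails
   with an error that could stem from stale resolver configuration, call
   res_init() to re-read /etc/resolv.conf and try exactly once more.
   On success the caller owns *RES and must freeaddrinfo() it.  */
int
getaddrinfo_with_reinit (const char *host, const char *service,
                         const struct addrinfo *hints,
                         struct addrinfo **res)
{
  int err = getaddrinfo (host, service, hints, res);

  if (err == EAI_AGAIN || err == EAI_NONAME)
    {
      /* In glibc, res_init() refreshes the calling thread's _res.  */
      if (res_init () == 0)
        err = getaddrinfo (host, service, hints, res);
    }
  return err;
}
```

It retries at most once, so a lookup that fails for reasons unrelated to stale configuration costs one extra query rather than a loop. It also assumes that calling res_init() repeatedly is safe, which is exactly the point still to be verified.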

> 3) Patch glibc to stat() /etc/resolv.conf, checking for changes. Debian,
> Ubuntu are patched.

This sounds like the worst possible solution, imposing a penalty on
all applications for a change that is well defined in a higher level.

> 4) Use a custom DNS library, glibc is unsuitable for this purpose.

Certainly an option. You are allowed to do as you wish with your system.

However, I do not think that glibc is unsuitable for these purposes;
with some effort we can put together a solution.

> Here is my take. About nscd, I'm having the problem on a major distro
> (Fedora) so I can only guess there are good reasons for not using it by
> default.

The complexity of caching name server requests is not something that
should be enabled by default unless there is a specific need.

> On (2), res_init() is a BSD non-standard function, and its man page doesn't
> mention such a purpose. In fact I can't be sure if it's safe to call it
> multiple times and I see no guarantee that it will re-initialise the
> resolver more than once. If it's the proposed way shouldn't it be mentioned
> in both res_init() and getaddrinfo()'s man pages, or otherwise a big warning
> that resolv.conf is never reparsed?

This seems like a sensible solution, i.e. an API call that guarantees
that the resolver can operate correctly after a network configuration
change.

I haven't reviewed the code in question so I don't actually know if
res_init() is safe to be used this way. Part of your work would be to
look into this and propose the documentation patch and provide
sufficient background to justify the changes.

> On (3) I don't have a Debian system to check it, but the overhead of
> stat'ing on every request is probably unacceptable. I was thinking of
> writing a patch that would stat() and reparse after a single request
> timeout, so that following retries (unless RES_DFLRETRY is reached) will
> automatically connect to the new servers. Would that be acceptable?

No.

> Finally using a custom library sounded logical, until I started reading
> glibc's resolver. Really, with such size and complexity and even
> asynchronous interface provided, shouldn't we also provide the simplest
> facilities?

We should.

> And a related question, is there a way to setup resolver behaviour (timeout,
> retries) for a process programmatically, instead of changing the system-wide
> resolv.conf?

There is no interface for this. This is another place where
enhancements would be greatly appreciated.
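Short of a new API, one existing per-process knob is the RES_OPTIONS environment variable, which glibc's resolver reads when it initialises (see resolv.conf(5)) and which accepts the same `timeout:n` and `attempts:n` options as resolv.conf. A minimal sketch, assuming glibc; `format_res_options` and `set_resolver_options` are hypothetical helpers:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <netinet/in.h>
#include <resolv.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a RES_OPTIONS value such as
   "timeout:2 attempts:3".  Returns 0 on success, -1 if BUF is too small.  */
int
format_res_options (char *buf, size_t len, int timeout, int attempts)
{
  int n = snprintf (buf, len, "timeout:%d attempts:%d", timeout, attempts);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical helper: set per-process resolver timeout/attempts through
   RES_OPTIONS, then call res_init() so the calling thread's resolver
   state picks the values up.  Returns res_init()'s result, or -1.  */
int
set_resolver_options (int timeout, int attempts)
{
  char value[64];

  if (format_res_options (value, sizeof value, timeout, attempts) != 0)
    return -1;
  if (setenv ("RES_OPTIONS", value, 1) != 0)
    return -1;
  return res_init ();
}
```

Since glibc keeps resolver state per thread, res_init() here refreshes only the calling thread; setting the variable before the process's first lookup is the simplest way to have every thread see it.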

Please feel free to email [hidden email] if you have any
more general questions about how to X or Y.

Cheers,
Carlos.

Re: On glibc's resolver

Carlos O'Donell-2
On Tue, Dec 25, 2012 at 11:17 PM, Carlos O'Donell
<[hidden email]> wrote:
> I haven't reviewed the code in question so I don't actually know if
> res_init() is safe to be used this way. Part of your work would be to
> look into this and propose the documentation patch and provide
> sufficient background to justify the changes.

... or file a bug with a test case and act as an advocate for the change.

Cheers,
Carlos.

Re: On glibc's resolver

Siddhesh Poyarekar-3
In reply to this post by Carlos O'Donell-2
On Tue, Dec 25, 2012 at 11:17:22PM -0500, Carlos O'Donell wrote:

> On Tue, Dec 25, 2012 at 10:14 PM, Dimitrios Apostolou <[hidden email]> wrote:
> > I have been tracing weird behaviour of my mail client (alpine) and ended up
> > in getaddrinfo() calls, which are handled by glibc's resolver. In
> > particular, when I connect my laptop to different networks and the previous
> > DNS server is unreachable, resolver never re-reads its cache and all queries
> > timeout after several retries.
>
> What we need is a test case with expected and observed behaviour.
> Given a test case we can justify or refute the expected or observed
> behaviour against relevant standards or prior art.

I don't think there are any standards that define this behaviour,
which is why any behaviour is 'correct'.  The reproducer is quite
simple:

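#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int
main (void)
{
  const char *host = "priv.network.foo.com";
  int err;
  struct addrinfo *result = NULL;

  while (1)
    {
      /* getaddrinfo() returns 0 on success and a nonzero EAI_* code
         on failure.  */
      if ((err = getaddrinfo (host, NULL, NULL, &result)) != 0)
        fprintf (stderr, "Lookup of %s failed: %s\n",
                 host, gai_strerror (err));
      else
        {
          printf ("Lookup of %s succeeded\n", host);
          /* Free the result so the loop does not leak memory.  */
          freeaddrinfo (result);
          result = NULL;
        }
      sleep (2);   /* sleep() is declared in <unistd.h>.  */
    }

  return 0;
}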

where priv.network.foo.com is resolvable only by a specific DNS
server A and not by DNS server B.  Set up resolv.conf with

nameserver B

and start the above program, watching it fail the DNS query every 2
seconds.  Now modify resolv.conf to:

nameserver A

and watch it continue to fail.  This is because resolv.conf is not
read in again when it is changed.

A lot of desktop applications depend on NetworkManager to do this for
them.  NetworkManager has an API that notifies applications when an
interface has changed, which allows them to call res_init().
Firefox or Pidgin code are good references for this.

> > 3) Patch glibc to stat() /etc/resolv.conf, checking for changes. Debian,
> > Ubuntu are patched.
>
> This sounds like the worst possible solution, imposing a penalty on
> all applications for a change that is well defined in a higher level.

* Linux-specific: Use the kernel notify interface (or something
  similar) to asynchronously reinitialize the resolver when a change
  is detected.

* Memory map resolv.conf and iterate through the nameservers
  every time, like we do for hosts.  Really bad for performance and
  hence I'd think this would get a 'no'.
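A minimal sketch of the first idea, assuming glibc on Linux. Rather than reinitialising from a separate notification thread, it drains inotify events on the lookup path so that the caller can run res_init() from the thread that is about to resolve, because glibc keeps resolver state per thread. It watches the directory, not the file, so that a resolv.conf replaced via rename() is still noticed. `watch_config_dir` and `config_file_touched` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a non-blocking inotify descriptor watching DIR (e.g. "/etc").
   Watching the directory rather than the file also catches updates
   that rename() a new resolv.conf into place, which a watch on the
   old inode would miss.  Returns the descriptor, or -1 on error.  */
int
watch_config_dir (const char *dir)
{
  int fd = inotify_init1 (IN_NONBLOCK | IN_CLOEXEC);
  if (fd < 0)
    {
      perror ("inotify_init1");
      return -1;
    }
  if (inotify_add_watch (fd, dir, IN_CREATE | IN_CLOSE_WRITE | IN_MOVED_TO) < 0)
    {
      perror ("inotify_add_watch");
      close (fd);
      return -1;
    }
  return fd;
}

/* Drain every queued event without blocking and return 1 if any of
   them named the file NAME (e.g. "resolv.conf"), else 0.  */
int
config_file_touched (int fd, const char *name)
{
  _Alignas (struct inotify_event) char buf[4096];
  int touched = 0;
  ssize_t len;

  /* With IN_NONBLOCK, read() fails with EAGAIN once the queue is empty.  */
  while ((len = read (fd, buf, sizeof buf)) > 0)
    for (char *p = buf; p < buf + len; )
      {
        const struct inotify_event *ev = (const struct inotify_event *) p;
        if (ev->len > 0 && strcmp (ev->name, name) == 0)
          touched = 1;
        p += sizeof (struct inotify_event) + ev->len;
      }
  return touched;
}
```

Before each lookup: `if (config_file_touched (fd, "resolv.conf")) res_init ();`. The cost on the fast path is one non-blocking read(). If /etc/resolv.conf is a symlink into another directory, the watch belongs on the target's directory instead.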

Siddhesh

Re: On glibc's resolver

Dimitrios Apostolou
In reply to this post by Carlos O'Donell-2
Hi Carlos, I appreciate your thorough reply!

On Tue, 25 Dec 2012, Carlos O'Donell wrote:
>
> What we need is a test case with expected and observed behaviour.
> Given a test case we can justify or refute the expected or observed
> behaviour against relevant standards or prior art.
>

I'm having trouble reproducing it without moving between networks; it
could probably be done with some iptables rules (block access to original
servers after launching alpine and change resolv.conf to new servers) but
I'm not an expert on iptables. I'll try to figure this out later.

In practice, an alpine process started while connected to the University's
network can't reach any of the University's 3 DNS servers once I'm at home,
since they are inaccessible from the home network. strace shows many
retries and failures: instead of rereading /etc/resolv.conf, it repeatedly
times out.

>> Apparently this is a known issue, and a web search reveals discussions from
>> as early as 2003. I'd appreciate your opinions, I was thinking of writing a
>> patch but I can't figure out where it should go, alpine or glibc, code or
>> documentation! Here are the replies I gathered from a web search:
>
> Could you please provide references to the prior discussions so we can
> review them also?

Sure, here are some pointers:

Ulrich Drepper proposing the res_init() solution:
http://sourceware.org/bugzilla/show_bug.cgi?id=3675

Pushing the Debian stat() patch to eglibc:
http://www.eglibc.org/archives/patches/msg00778.html

Firefox bug from 2003:
https://bugzilla.mozilla.org/show_bug.cgi?id=214538

Plus various Stack Overflow answers proposing a dedicated resolver library
like c-ares or libunbound.

>
>> 3) Patch glibc to stat() /etc/resolv.conf, checking for changes. Debian,
>> Ubuntu are patched.
>
> This sounds like the worst possible solution, imposing a penalty on
> all applications for a change that is well defined in a higher level.

The penalty should be negligible if stat() happens after the first
timeout, right?

>> On (2), res_init() is a BSD non-standard function, and its man page doesn't
>> mention such a purpose. In fact I can't be sure if it's safe to call it
>> multiple times and I see no guarantee that it will re-initialise the
>> resolver more than once. If it's the proposed way shouldn't it be mentioned
>> in both res_init() and getaddrinfo()'s man pages, or otherwise a big warning
>> that resolv.conf is never reparsed?
>
> This seems like a sensible solution, i.e. an API call that guarantees
> that the resolver can operate correctly after a network configuration
> change.
>
> I haven't reviewed the code in question so I don't actually know if
> res_init() is safe to be used this way. Part of your work would be to
> look into this and propose the documentation patch and provide
> sufficient background to justify the changes.

I'll look into this. I have doubts about whether it's safe to call
res_init() repeatedly on all UNIX systems. Maybe a glibc-specific init
function would be better, one that could also change all the resolv.conf
parameters per process, e.g. timeout and retries?

>> And a related question, is there a way to setup resolver behaviour (timeout,
>> retries) for a process programmatically, instead of changing the system-wide
>> resolv.conf?
>
> There is no interface for this.

Thanks, I was not sure.


Dimitris

Re: On glibc's resolver

Ondřej Bílka
On Wed, Dec 26, 2012 at 07:28:31AM +0200, Dimitrios Apostolou wrote:

> >>Apparently this is a known issue, and a web search reveals discussions from
> >>as early as 2003. I'd appreciate your opinions, I was thinking of writing a
> >>patch but I can't figure out where it should go, alpine or glibc, code or
> >>documentation! Here are the replies I gathered from a web search:
> >
> >Could you please provide references to the prior discussions so we can
> >review them also?
>
> Sure, here are some pointers:
>
> Ulrich Drepper proposing the res_init() solution:
> http://sourceware.org/bugzilla/show_bug.cgi?id=3675
>
> Pushing the Debian stat() patch to eglibc:
> http://www.eglibc.org/archives/patches/msg00778.html
>
> Firefox bug from 2003:
> https://bugzilla.mozilla.org/show_bug.cgi?id=214538
>
> Plus various Stack Overflow answers proposing a dedicated resolver
> library like c-ares or libunbound.
>
> >
> >>3) Patch glibc to stat() /etc/resolv.conf, checking for changes. Debian,
> >>Ubuntu are patched.
> >
> >This sounds like the worst possible solution, imposing a penalty on
> >all applications for a change that is well defined in a higher level.
>
> The penalty should be negligible if stat() happens after the first
> timeout, right?

Even when stat() is called on every request, the penalty is negligible.
A stat() call takes a few microseconds. As getaddrinfo() is typically
followed by socket calls with millisecond latencies, the stat() penalty
is unmeasurable.
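The stat() cost is easy to measure locally. A rough micro-benchmark sketch, assuming POSIX clock_gettime(); `average_stat_ns` is a hypothetical name, and it times a warm-cache loop on one machine, so no particular figure is implied here:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/* Average wall-clock cost of one stat() on PATH, in nanoseconds, over
   ITERATIONS calls.  Returns -1.0 if ITERATIONS is not positive or
   PATH cannot be stat()ed.  This measures a warm-cache loop only.  */
double
average_stat_ns (const char *path, int iterations)
{
  struct stat st;
  struct timespec start, end;

  if (iterations <= 0 || stat (path, &st) != 0)
    return -1.0;

  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < iterations; i++)
    stat (path, &st);
  clock_gettime (CLOCK_MONOTONIC, &end);

  double total_ns = (double) (end.tv_sec - start.tv_sec) * 1e9
                    + (double) (end.tv_nsec - start.tv_nsec);
  return total_ns / iterations;
}
```

Something like `average_stat_ns ("/etc/resolv.conf", 100000)` gives a per-call number to set against the network latency of the lookup itself.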



Re: On glibc's resolver

Dimitrios Apostolou
In reply to this post by Siddhesh Poyarekar-3
Hi Siddhesh,

On Wed, 26 Dec 2012, Siddhesh Poyarekar wrote:
>
> * Linux-specific: Use the kernel notify interface (or something
>  similar) to asynchronously reinitialize the resolver when a change
>  is detected.
>
> * Memory map resolv.conf and iterate through the nameservers
>  every time, like we do for hosts.  Really bad for performance and
>  hence I'd think this would get a 'no'.

Why not share a small memory segment among all processes with the
resolv.conf contents in binary form? Whenever a process parses
resolv.conf, it would also update the SHM file. So it would be updated by
Firefox or NetworkManager, or whoever calls res_init(), but all processes
would profit. Optionally, re-parsing could happen without even calling
res_init(), on the first timeout. Since this would be system-wide, the
overhead would be paid by a single process only.

I'm thinking that ideally all caching of resolved and unresolvable names
should happen in a common place for all processes, and this could be done
with shared memory better than with a separate daemon. But I'm probably
taking this too far.


Thanks,
Dimitris



Re: On glibc's resolver

Andreas Schwab-2
Dimitrios Apostolou <[hidden email]> writes:

> I'm thinking that ideally all caching of resolved and unresolvable names
> should happen in a common place for all processes, and this could happen
> with shared memory better than with a separate daemon.

This is what nscd is already doing.

Andreas.

--
Andreas Schwab, [hidden email]
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Speeding up nscd

Ondřej Bílka
In reply to this post by Dimitrios Apostolou
On Wed, Dec 26, 2012 at 11:11:47AM +0200, Dimitrios Apostolou wrote:

> Hi Siddhesh,
>
> On Wed, 26 Dec 2012, Siddhesh Poyarekar wrote:
> >
> >* Linux-specific: Use the kernel notify interface (or something
> > similar) to asynchronously reinitialize the resolver when a change
> > is detected.
> >
> >* Memory map resolv.conf and iterate through the nameservers
> > every time, like we do for hosts.  Really bad for performance and
> > hence I'd think this would get a 'no'.
>
> Why not share a small memory segment among all processes with the
> resolv.conf contents in binary form? Whenever a process parses
> resolv.conf, it would also update the SHM file. So it would be
> updated by Firefox or NetworkManager, or whoever calls res_init(),
> but all processes would profit. Optionally, re-parsing could happen
> without even calling res_init(), on the first timeout. Since this
> would be system-wide, the overhead would be paid by a single process only.
>
> I'm thinking that ideally all caching of resolved and unresolvable
> names should happen in a common place for all processes, and this
> could be done with shared memory better than with a separate daemon.
> But I'm probably taking this too far.
>
This won't work for security reasons. Any user could modify cache to
redirect everybody's traffic to evil.com.

However, it could be possible for nscd to export its cache as a read-only
mmap'ed file to avoid context switches on cached entries.

Then another question is how to persuade distributions to use nscd. The
Debian description is:
 A daemon which handles passwd, group and host lookups
 for running programs and caches the results for the next
 query. You should install this package only if you use
 slow Services like LDAP, NIS or NIS+

Re: Speeding up nscd

Petr Baudis
On Wed, Dec 26, 2012 at 11:31:54AM +0100, Ondřej Bílka wrote:
> This won't work for security reasons. Any user could modify cache to
> redirect everybody's traffic to evil.com.
>
> However, it could be possible for nscd to export its cache as a read-only
> mmap'ed file to avoid context switches on cached entries.

This is what is already done; file descriptors of the respective map
files are passed on the nscd socket.

In the nscd/ directory, generally the nscd_* files are client code
included in glibc, and the rest is part of the nscd daemon. (Somewhat
confusingly...)
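The general mechanism, passing an open file descriptor over a Unix-domain socket and mapping it read-only in the client, can be sketched with SCM_RIGHTS. This is not nscd's wire protocol or cache layout; the function names are hypothetical, and an unlinked temporary file stands in for the daemon's cache file. In a real deployment the daemon would pass a descriptor opened read-only, so the client could not obtain a writable mapping; the sketch's temporary file is read-write purely for brevity.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Stand-in for the daemon's cache file: an unlinked temporary file
   holding DATA.  Returns its descriptor, or -1.  */
int
make_cache_file (const void *data, size_t len)
{
  char path[] = "/tmp/cacheXXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);
  if (write (fd, data, len) != (ssize_t) len)
    {
      close (fd);
      return -1;
    }
  return fd;
}

/* Send FD over the connected Unix-domain socket SOCK as SCM_RIGHTS
   ancillary data, with a one-byte payload.  Returns 0 on success.  */
int
send_fd (int sock, int fd)
{
  char byte = 'F';
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr hdr; char buf[CMSG_SPACE (sizeof (int))]; } ctrl;
  memset (&ctrl, 0, sizeof ctrl);
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                        .msg_control = ctrl.buf,
                        .msg_controllen = sizeof ctrl.buf };
  struct cmsghdr *cm = CMSG_FIRSTHDR (&msg);
  cm->cmsg_level = SOL_SOCKET;
  cm->cmsg_type = SCM_RIGHTS;
  cm->cmsg_len = CMSG_LEN (sizeof (int));
  memcpy (CMSG_DATA (cm), &fd, sizeof (int));
  return sendmsg (sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd().  Returns it, or -1.  */
int
recv_fd (int sock)
{
  char byte;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr hdr; char buf[CMSG_SPACE (sizeof (int))]; } ctrl;
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                        .msg_control = ctrl.buf,
                        .msg_controllen = sizeof ctrl.buf };
  if (recvmsg (sock, &msg, 0) != 1)
    return -1;
  struct cmsghdr *cm = CMSG_FIRSTHDR (&msg);
  if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
    return -1;
  int fd;
  memcpy (&fd, CMSG_DATA (cm), sizeof (int));
  return fd;
}

/* Map LEN bytes of FD read-only on the client side.  */
const void *
map_readonly (int fd, size_t len)
{
  void *p = mmap (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
  return p == MAP_FAILED ? NULL : p;
}
```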

> Then another question is how to persuade distributions to use nscd. The
> Debian description is:
>  A daemon which handles passwd, group and host lookups
>  for running programs and caches the results for the next
>  query. You should install this package only if you use
>  slow Services like LDAP, NIS or NIS+

nscd has a bad reputation due to a fairly long history of bugs;
this stems from ugly spaghetti code and very aggressive use of
multi-threading coupled with some synchronization issues in sensitive
areas like garbage collection. Add NSS modules running in an environment
that is not commonly tested (long-lived many-threaded applications) to
the list, and as a result you get a list of bugs that not many are willing
to debug to the bone (especially if not paid for it).

I think that by now pretty much all the common nscd bugs might have been
ironed out, but its bad reputation lingers; I would expect it to be
the main reason why most distributions shied away from nscd. There is
also an "unscd" alternative that has its pros and cons.

Anyway, regarding the resolver issues, any solution must also work
without nscd, which is an optional system component. If all applications
(that do more than a single getaddrinfo() call during their runtime)
must implement similar logic of explicitly calling res_init() on some
event, forcing them to do it explicitly instead of just moving that
logic to glibc seems silly to me; I haven't thought about the topic
sufficiently to comment further, though.

--
                                Petr "Pasky" Baudis
        For every complex problem there is an answer that is clear,
        simple, and wrong.  -- H. L. Mencken

Re: Speeding up nscd

Dimitrios Apostolou
On Wed, 26 Dec 2012, Petr Baudis wrote:

>
> nscd has a bad reputation due to a fairly long history of bugs;
> this stems from ugly spaghetti code and very aggressive use of
> multi-threading coupled with some synchronization issues in sensitive
> areas like garbage collection. Add NSS modules running in an environment
> that is not commonly tested (long-lived many-threaded applications) to
> the list, and as a result you get a list of bugs that not many are willing
> to debug to the bone (especially if not paid for it).
>
> I think that by now pretty much all the common nscd bugs might have been
> ironed out, but its bad reputation lingers; I would expect it to be
> the main reason why most distributions shied away from nscd. There is
> also an "unscd" alternative that has its pros and cons.

Hi Petr, interesting insight I didn't know of, so I went ahead and found
"unscd" at [1]. Here is a part of the initial comment:


nscd problems are not exactly unheard of. Over the years, there were
quite a bit of bugs in it. This leads people to invent babysitters
which restart crashed/hung nscd. This is ugly.

After looking at nscd source in glibc I arrived to the conclusion
that its design is contributing to this significantly. Even if nscd's
code is 100.00% perfect and bug-free, it can still suffer from bugs
in libraries it calls.

As designed, it's a multithreaded program which calls NSS libraries.
These libraries are not part of libc, they may be provided
by third-party projects (samba, ldap, you name it).

Thus nscd cannot be sure that libraries it calls do not have memory
or file descriptor leaks and other bugs.

Since nscd is multithreaded program with single shared cache,
any resource leak in any NSS library has cumulative effect.
Even if a NSS library leaks a file descriptor 0.01% of the time,
this will make nscd crash or hang after some time.

Of course bugs in NSS .so modules should be fixed, but meanwhile
I do want nscd which does not crash or lock up.


Dimitris


[1] https://github.com/keymon/unscd/blob/master/nscd-0.47.c

Re: Speeding up nscd

Russ Allbery
In reply to this post by Petr Baudis
Petr Baudis <[hidden email]> writes:

> nscd has a bad reputation due to a fairly long history of bugs;

Indeed, that's an understatement.

> this stems from ugly spaghetti code and very aggressive use of
> multi-threading coupled with some synchronization issues in sensitive
> areas like garbage collection.

Also, the original Solaris nscd and I believe early glibc versions
completely ignored DNS TTLs.  That was an absolute catastrophe.
Generally, it only took one time of trying to track down a name resolution
bug for two or three hours (with host and dig showing nothing wrong at
all) and finally figuring out that nscd was just lying to the rest of the
system.  After that, the poor system administrator would vow to seek out
and destroy every copy of nscd running on any system so that it could never
happen again.  Its benefits are otherwise marginal on systems that don't
use NIS or LDAP nsswitch modules heavily.

Per Ulrich at http://udrepper.livejournal.com/16362.html this bug has been
fixed in glibc since late 2004, but having had that debugging experience,
I have to say that it's... memorable.  I suspect that many people just
haven't gotten the message that this was fixed long ago, and "disable nscd
or your DNS caching will be broken" has now entered the common lore and is
being copied from system to system by people who have only heard stories
about the original problems.

--
Russ Allbery ([hidden email])             <http://www.eyrie.org/~eagle/>

Re: Speeding up nscd

Petr Baudis
In reply to this post by Dimitrios Apostolou
  Hi!

On Wed, Dec 26, 2012 at 09:10:49PM +0200, Dimitrios Apostolou wrote:
> Hi Petr, interesting insight I didn't know of, so I went ahead and
> found "unscd" at [1]. Here is a part of the initial comment:

  Right. To balance that, there are at least two downsides to unscd:

  * It is (or used to be) not as well tested as nscd, and there are some
bugs in it too.  A different set, and *maybe* smaller.

  * Since all resolving is done in separate children, NSS cannot
reuse resources between the children. For example, IIRC nss_ldap
likes to reuse a single connection to the LDAP server for all the
queries; one of the big reasons to use nscd (at least for some) is to
keep LDAP connection count low for servers handling huge deployments
(thousands+ of clients). This is not the case with unscd where queries
will start new connections. It may be similar with regards to other
NSS modules as well.
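The second point follows from the process model. A rough sketch of resolving in a short-lived child, as an assumption about the general approach rather than unscd's actual code: anything an NSS module leaks disappears with the child, but so does any connection it could have reused. `lookup_in_child` and `CHILD_FAILED` are hypothetical, and the child reports only the getaddrinfo() status code, not the addresses.

```c
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sentinel for "the child could not report a result",
   chosen to lie outside the EAI_* values glibc uses.  */
#define CHILD_FAILED (-1000)

/* Resolve HOST in a short-lived child process and return the
   getaddrinfo() status code the child observed.  */
int
lookup_in_child (const char *host, const struct addrinfo *hints)
{
  int pipefd[2];
  if (pipe (pipefd) != 0)
    return CHILD_FAILED;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (pipefd[0]);
      close (pipefd[1]);
      return CHILD_FAILED;
    }

  if (pid == 0)
    {
      /* Child: resolve, report only the status code, and exit without
         running the parent's atexit handlers.  */
      struct addrinfo *res;
      int err = getaddrinfo (host, NULL, hints, &res);
      if (err == 0)
        freeaddrinfo (res);
      close (pipefd[0]);
      ssize_t w = write (pipefd[1], &err, sizeof err);
      _exit (w == (ssize_t) sizeof err ? 0 : 1);
    }

  /* Parent: collect the status code, then reap the child.  */
  close (pipefd[1]);
  int err;
  ssize_t n = read (pipefd[0], &err, sizeof err);
  close (pipefd[0]);
  waitpid (pid, NULL, 0);
  return n == (ssize_t) sizeof err ? err : CHILD_FAILED;
}
```

getaddrinfo() is not async-signal-safe, so calling it in a forked child is only safe when the parent is single-threaded, which a small cache daemon can arrange.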

  It's no panacea, and in hindsight it's not as simple as replacing
nscd with unscd and being happy to the end of our days. But it is an
alternative to seriously consider.

--
                                Petr "Pasky" Baudis
        For every complex problem there is an answer that is clear,
        simple, and wrong.  -- H. L. Mencken

Re: Speeding up nscd

Andreas Jaeger-8
On 12/27/2012 01:07 AM, Petr Baudis wrote:
>    Hi!
>
> On Wed, Dec 26, 2012 at 09:10:49PM +0200, Dimitrios Apostolou wrote:
>> Hi Petr, interesting insight I didn't know of, so I went ahead and
>> found "unscd" at [1]. Here is a part of the initial comment:
>
>    Right. To balance that, there are at least two downsides to unscd:

One more: Sometimes the interface between nscd and glibc changes - and
then unscd has to catch up. For example unscd 0.48 was released to fix
such a change introduced in glibc 2.15.

>    * It is (or used to be) not as well tested as nscd, and there are some
> bugs in it too.  A different set, and *maybe* smaller.
>
>    * Since all resolving is done in separate children, NSS cannot
> reuse resources between the children. For example, IIRC nss_ldap
> likes to reuse a single connection to the LDAP server for all the
> queries; one of the big reasons to use nscd (at least for some) is to
> keep LDAP connection count low for servers handling huge deployments
> (thousands+ of clients). This is not the case with unscd where queries
> will start new connections. It may be similar with regards to other
> NSS modules as well.
>
>    It's no panacea, and in hindsight it's not as simple as replacing
> nscd with unscd and being happy to the end of our days. But it is an
> alternative to seriously consider.

Also, AFAIK unscd does not support all databases that nscd supports.

Andreas
--
  Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
   SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
    GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
     GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126

Re: Speeding up nscd

Florian Weimer-5
In reply to this post by Petr Baudis
On 12/27/2012 01:07 AM, Petr Baudis wrote:

>    * Since all resolving is done in separate children, NSS cannot
> reuse resources between the children. For example, IIRC nss_ldap
> likes to reuse a single connection to the LDAP server for all the
> queries; one of the big reasons to use nscd (at least for some) is to
> keep LDAP connection count low for servers handling huge deployments
> (thousands+ of clients).

In general, it is not safe to perform complex operations in NSS (or PAM)
modules because there's so little known about the surrounding process.
That's why nss_ldap was replaced by nss_ldapd, which does the heavy lifting in
a separate daemon.  Over time, all complex NSS/PAM modules will move to
this model.

--
Florian Weimer / Red Hat Product Security Team

Re: Speeding up nscd

Florian Weimer-5
In reply to this post by Russ Allbery
On 12/26/2012 08:47 PM, Russ Allbery wrote:

> Also, the original Solaris nscd and I believe early glibc versions
> completely ignored DNS TTLs. [...]

> Per Ulrich at http://udrepper.livejournal.com/16362.html this bug has been
> fixed in glibc since late 2004, but having had that debugging experience,
> I have to say that it's... memorable.  I suspect that many people just
> haven't gotten the message that this was fixed long ago,

I dimly remember subsequent bugs about negative caching and some bad
interactions between A and AAAA lookups.

Looking at the code (and experimenting with Fedora 17 and glibc 2.15),
nscd still turns a NODATA response into an NXDOMAIN response.

Here's a getaddrinfo test without nscd:

error: name lookup failure for enyo.de/80: No address associated with hostname

But with nscd, I get:

error: name lookup failure for enyo.de/80: Name or service not known

These are the strings returned from gai_strerror, and the constants are
probably EAI_NODATA and EAI_NONAME.
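An application can only tell the two outcomes apart through the error code. A minimal sketch, assuming glibc, where EAI_NODATA is an extension exposed by _GNU_SOURCE; `classify_gai_failure` is a hypothetical helper. With nscd in the path, as observed above, the NODATA case would arrive as EAI_NONAME.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: classify a getaddrinfo() status code.  Portable
   code only has EAI_NONAME, which then covers both cases; the guard
   also avoids a duplicate case label where the two share a value.  */
const char *
classify_gai_failure (int err)
{
  switch (err)
    {
    case 0:
      return "success";
    case EAI_NONAME:
      return "NXDOMAIN-like: name not known";
#if defined (EAI_NODATA) && EAI_NODATA != EAI_NONAME
    case EAI_NODATA:
      return "NODATA: name exists, but has no address of the requested type";
#endif
    case EAI_AGAIN:
      return "temporary failure: retry later";
    default:
      return gai_strerror (err);
    }
}
```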

--
Florian Weimer / Red Hat Product Security Team