This article is written in English and Portuguese
English version:
You decided...
This article is the first "on-demand" topic I write about. I had a few options for topics I'd like to cover, so I started a poll on a Facebook page. The impact of DNS on Informix got the most votes. So you asked for it, here it is. I will probably keep doing this from now on.
Personally I also like this topic, partly because I have had several problems related to it at more than one customer. This is a relatively common issue in complex environments.
I have a public and generic disclaimer about the material I publish here, but for this case I'd like to stretch it just a bit... Besides the normal disclaimer, I'd like to state that most of the information presented here lies a bit far from my regular competencies. This is the result of a lot of digging and investigation over time (not only by me), and there may be a few (hopefully minor) errors in the information provided. Feel free to comment, email me etc.
Very short introduction about DNS
DNS is the acronym for Domain Name System, a protocol/service that converts a hostname (e.g. onlinedomus.com) into an IP address (e.g. 89.1.2.3) and vice versa. It can also do a lot of other things, like telling which mail server is responsible for a specific domain, but that is outside the scope of this article.
Without DNS there would be no Internet as we know it. It's a critical component of the Internet infrastructure, and its performance and security are crucial for us, the users.
In a small network you can work without DNS for the basic name resolution functions, by using files. But if your network is larger, it can be very hard to maintain those files.
The DNS system uses a hierarchical architecture and the UDP protocol (U does not stand for unreliable but it could...) for performance reasons. Proper configuration of a DNS system can be a complex task, and my personal experience tells me it's not very easy to find someone who knows how to do it properly. Furthermore, people often don't realize the terrible impact a DNS misconfiguration or malfunction may have on their systems.
I will not explain (and I wouldn't know how to) all the DNS configuration aspects, but I'd like to reference a few points:
- /etc/nsswitch.conf
  This file (on Unix/Linux systems, but the name can vary) defines how name resolution (and other services) are used. In particular it can define whether the system uses files, NIS, the DNS servers or another mechanism, and in which order. As an example, a line like:
  hosts: dns files
  indicates that for hostname lookups the system will first ask the DNS servers and then look in the files
- /etc/hosts
  This file can map IP addresses into hostnames (and vice-versa). As an example:
  89.1.2.3 www.onlinedomus.com onlinedomus.com
  This tells the system that the IP address 89.1.2.3 will map to "www.onlinedomus.com" (and vice-versa). As you can imagine, a lookup for "onlinedomus.com" will also map to the same IP address.
- /etc/resolv.conf
  This contains the list of DNS servers that will be used for lookups and possibly a few other options (like request timeouts, names of domains that will be appended to simple hostnames if the lookups for those hostnames fail, etc.). An example:
  nameserver 192.168.112.2
  nameserver 9.64.162.21
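To make the role of these files more concrete, here is a minimal Python sketch that extracts the lookup order from the "hosts:" line of /etc/nsswitch.conf and the server list from /etc/resolv.conf. The function names are mine (hypothetical), and the parsing is deliberately simplified; the real resolver library handles many more options than this.

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources named on the 'hosts:' line, in the order listed."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            # Skip action items such as [NOTFOUND=return]; keep source names only
            return [tok for tok in line[len("hosts:"):].split()
                    if not tok.startswith("[")]
    return []


def nameservers(resolv_text):
    """Return the addresses found on 'nameserver' lines, in the order listed."""
    servers = []
    for raw in resolv_text.splitlines():
        fields = raw.split()
        # Comment lines start with '#' or ';', so their first field never matches
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    # Sample contents taken from the examples in the text
    print(hosts_lookup_order("hosts: dns files\n"))
    print(nameservers("nameserver 192.168.112.2\nnameserver 9.64.162.21\n"))
```

On a real system you would pass in the contents of the actual files, for example open("/etc/nsswitch.conf").read().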
How does Informix use the DNS?
From the DNS perspective, Informix is just another application. The /etc/nsswitch.conf file determines whether Informix uses the files (/etc/hosts) or the DNS servers (specified in /etc/resolv.conf). The first important thing to note is that all interaction between Informix and the DNS system goes through calls to system functions. In particular these two functions, or their equivalents or replacements:
- gethostbyname()
  In short, this receives a hostname and returns a structure containing the IP address
- gethostbyaddr()
  This receives an IP address and returns the hostname that matches it
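As an illustration of what these two functions do, here is a small Python sketch. Python's socket module exposes gethostbyname() and gethostbyaddr() as thin wrappers over the system resolver, so they follow /etc/nsswitch.conf just like any other application. The wrapper names forward_lookup and reverse_lookup are mine, and this is of course not how Informix itself is written.

```python
import socket


def forward_lookup(hostname):
    """Name -> IPv4 address via gethostbyname(); None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return None


def reverse_lookup(ip_address):
    """IP address -> primary hostname via gethostbyaddr(); None on failure."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return name
    except OSError:
        return None


if __name__ == "__main__":
    print(forward_lookup("localhost"))
    print(reverse_lookup("127.0.0.1"))
```

The results depend entirely on the local resolver configuration, which is exactly the point of this article.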
This article is written mainly from the database server perspective. But the DNS has obvious implications for the client too... Let's start there and then I'll jump into the server.
When a client tool tries to connect to an Informix server, it starts by looking up the $INFORMIXSERVER (or equivalent given in the connection string) in the $INFORMIXSQLHOSTS file (for Java it can look up LDAP or HTTP servers for the info, but let's stick to the files for easier understanding). The file contains lines in the following format:
INFORMIXSERVER PROTOCOL HOSTNAME/IP_ADDRESS PORT_NUMBER/SERVICE_NAME OPTIONS
When the client libraries find the line matching the INFORMIXSERVER, they check the hostname (or IP address) and the port number (or service name).
Typically, if a service name is used, the port number is looked up in /etc/services (this can be configured in /etc/nsswitch.conf). Personally I tend to use the port number to avoid that lookup...
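The lookup in the sqlhosts file can be sketched as follows. This is only an illustration of the file format described above, not the actual client library logic; the names SqlhostsEntry, find_server and needs_services_lookup are hypothetical.

```python
from typing import NamedTuple, Optional


class SqlhostsEntry(NamedTuple):
    server: str
    protocol: str
    host: str      # hostname or IP address
    service: str   # port number or service name
    options: str


def find_server(sqlhosts_text: str, informixserver: str) -> Optional[SqlhostsEntry]:
    """Return the first sqlhosts entry whose first field matches informixserver."""
    for raw in sqlhosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 4)
        if len(fields) >= 4 and fields[0] == informixserver:
            options = fields[4] if len(fields) == 5 else ""
            return SqlhostsEntry(fields[0], fields[1], fields[2], fields[3], options)
    return None


def needs_services_lookup(entry: SqlhostsEntry) -> bool:
    """A non-numeric fourth field is a service name that must be mapped to a port."""
    return not entry.service.isdigit()


if __name__ == "__main__":
    entry = find_server("blogtest onsoctcp nowhere.onlinedomus.com 1500\n", "blogtest")
    print(entry, needs_services_lookup(entry))
```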
Then, if a hostname is used, the client must map it to an IP address. For that it calls the gethostbyname() function. This function will behave as specified in /etc/nsswitch.conf, and will try to map the name to an IP address. A failure to do that will raise error -930. This can be reproduced:
fnunes-> echo $INFORMIXSERVER; grep $INFORMIXSERVER $INFORMIXSQLHOSTS; dbaccess sysmaster -
blogtest
blogtest onsoctcp nowhere.onlinedomus.com 1500
930: Cannot connect to database server (nowhere.onlinedomus.com).
fnunes->
And if you need evidence of what's going on behind the scenes, we can use strace (or truss):
fnunes-> strace -o /tmp/strace.out dbaccess sysmaster -
This is an edited extract of /tmp/strace.out generated by the command above. If you have the patience, you can see it doing the following:
- Open /etc/nsswitch.conf
- Open $INFORMIXSQLHOSTS (/home/informix/etc/sqlhosts)
- Open /etc/services (exceptionally I used a name instead of a port number)
- Open /etc/resolv.conf to find out the configured nameservers
- Open a socket to 192.168.112.2 (my configured DNS server)
- Ask for nowhere.onlinedomus.com
- Open /etc/hosts (in /etc/nsswitch.conf I configured it to search the files if the DNS lookup fails)
- Read the error message from the Informix message files
- Write the error message to stderr
- Exit with error code -1
fnunes-> cat /tmp/strace.out
[...]
open("/etc/nsswitch.conf", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=1803, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7862000
read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1803
read(3, "", 4096) = 0
close(3) = 0
[...]
open("/home/informix/etc/sqlhosts", O_RDONLY|O_LARGEFILE) = 4
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "blogtest onsoctcp nowher"..., 4096) = 1389
[...]
open("/etc/services", O_RDONLY|O_CLOEXEC) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=644327, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7862000
read(4, "# /etc/services:\n# $Id: services"..., 4096) = 4096
close(4) = 0
[...]
open("/etc/resolv.conf", O_RDONLY) = 4
[...]
read(4, "", 4096) = 0
close(4) = 0
[...]
open("/lib/libresolv.so.2", O_RDONLY) = 4
read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0Pf\256\0004\0\0\0"..., 512) = 512
[...]
close(4) = 0
[...]
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.112.2")}, 16) = 0
gettimeofday({1325590403, 502576}, NULL) = 0
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
send(4, "tl\1\0\0\1\0\0\0\0\0\0\7nowhere\vonlinedomus"..., 41, MSG_NOSIGNAL) = 41
poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
ioctl(4, FIONREAD, [101]) = 0
recvfrom(4, "tl\201\203\0\1\0\0\0\1\0\0\7nowhere\vonlinedomus"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.112.2")}, [16]) = 101
close(4) = 0
[...]
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 4
[...]
read(4, "127.0.0.1\tpacman1.onlinedomus.ne"..., 4096) = 439
[...]
close(4) = 0
[...]
read(3, "Cannot connect to database serve"..., 40) = 40
write(2, "\n", 1) = 1
write(2, " 930: Cannot connect to databas"..., 68) = 68
exit_group(-1) = ?
fnunes->
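If you want to know what the DNS server actually answered, the first bytes of the recvfrom() buffer in the extract can be decoded by hand. The sketch below assumes the buffer is a standard DNS message with the fixed 12-byte header defined in RFC 1035; the function name and the dictionary keys are mine.

```python
import struct

# Response codes from RFC 1035, section 4.1.1
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    rcode = flags & 0x000F
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),
        "rcode": RCODES.get(rcode, str(rcode)),
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }


if __name__ == "__main__":
    # First 12 bytes of the recvfrom() buffer in the extract.
    # strace prints non-printable bytes in octal: \201 = 0x81, \203 = 0x83.
    reply = b"tl\x81\x83\x00\x01\x00\x00\x00\x01\x00\x00"
    print(decode_dns_header(reply))
```

For these bytes the response code is 3 (NXDOMAIN) with zero answer records: the server replied that the name does not exist, which is why the lookup then falls back to /etc/hosts.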
Now, we should move to the Informix server side. This requires a bit more work and preliminary explanations. To start, we must understand what the engine needs in order to establish the connection. One of those things is to do a reverse name lookup (IP address to hostname). This is not essential, but it's always tried. Informix may need the hostname for trust relation validation and to provide information to the DBA.
As you know, the Informix database engine comprises several operating system processes. From the OS perspective they all look the same (oninit), but every one has a specific role and runs certain engine threads. We can see the threads with:
fnunes-> onstat -g ath
IBM Informix Dynamic Server Version 11.70.UC4 -- On-Line -- Up 00:17:01 -- 411500 Kbytes
Threads:
tid tcb rstcb prty status vp-class name
2 5583fa38 0 1 IO Idle 3lio* lio vp 0
3 558551f8 0 1 IO Idle 4pio* pio vp 0
4 5586b1f8 0 1 IO Idle 5aio* aio vp 0
5 558811f8 8f59dc0 1 IO Idle 6msc* msc vp 0
6 558af1f8 0 1 IO Idle 7fifo* fifo vp 0
7 558c9590 0 1 IO Idle 9aio* aio vp 1
8 558df3b8 54267018 3 sleeping secs: 1 8cpu main_loop()
9 559276f8 0 1 running 10soc* soctcppoll
10 5593ed18 0 2 sleeping forever 1cpu* soctcplst
11 55927d20 542675fc 1 sleeping secs: 1 8cpu flush_sub(0)
12 55988018 54267be0 1 sleeping secs: 1 8cpu flush_sub(1)
13 559881f0 542681c4 1 sleeping secs: 1 8cpu flush_sub(2)
14 559883c8 542687a8 1 sleeping secs: 1 8cpu flush_sub(3)
15 559885a0 54268d8c 1 sleeping secs: 1 8cpu flush_sub(4)
16 55988778 54269370 1 sleeping secs: 1 8cpu flush_sub(5)
17 55988bf0 54269954 1 sleeping secs: 1 8cpu flush_sub(6)
18 559fb468 54269f38 1 sleeping secs: 1 8cpu flush_sub(7)
19 559fb640 0 3 IO Idle 8cpu* kaio
20 55ab6018 5426a51c 2 sleeping secs: 1 8cpu aslogflush
21 55ab6960 5426ab00 1 sleeping secs: 92 1cpu btscanner_0
22 55b6a408 5426b0e4 3 cond wait ReadAhead 1cpu readahead_0
39 55bcd5c8 0 3 IO Idle 1cpu* kaio
40 55bcd7a0 5426bcac 3 sleeping secs: 1 1cpu* onmode_mon
41 55d3e148 5426c874 3 sleeping secs: 1 8cpu periodic
49 55e80a78 5426da20 1 sleeping secs: 177 1cpu dbScheduler
51 55f340f8 5426d43c 1 sleeping forever 1cpu dbWorker1
52 55f34d80 5426ce58 1 sleeping forever 8cpu dbWorker2
59 562ee228 5426e5e8 1 cond wait bp_cond 1cpu bf_priosweep()
And the OS processes with:
fnunes-> onstat -g glo
IBM Informix Dynamic Server Version 11.70.UC4 -- On-Line -- Up 00:18:48 -- 411500 Kbytes
MT global info:
sessions threads vps lngspins
0 29 10 3
sched calls thread switches yield 0 yield n yield forever
total: 9589515 8992470 597961 14485 4457836
per sec: 0 0 0 0 0
Virtual processor summary:
class vps usercpu syscpu total
cpu 2 11.51 94.06 105.57
aio 2 3.57 75.44 79.01
lio 1 0.01 0.01 0.02
pio 1 0.00 0.01 0.01
adm 1 0.01 0.15 0.16
soc 1 0.04 0.15 0.19
msc 1 0.00 0.01 0.01
fifo 1 0.00 0.01 0.01
total 10 15.14 169.84 184.98
Individual virtual processors:
vp pid class usercpu syscpu total Thread Eff
1 29395 cpu 5.63 46.80 52.43 66.41 78%
2 29398 adm 0.01 0.15 0.16 0.00 0%
3 29399 lio 0.01 0.01 0.02 0.02 100%
4 29400 pio 0.00 0.01 0.01 0.01 100%
5 29401 aio 3.29 74.30 77.59 77.59 100%
6 29402 msc 0.00 0.01 0.01 0.03 31%
7 29403 fifo 0.00 0.01 0.01 0.01 100%
8 29404 cpu 5.88 47.26 53.14 64.45 82%
9 29405 aio 0.28 1.14 1.42 1.42 100%
10 29406 soc 0.04 0.15 0.19 NA NA
tot 15.14 169.84 184.98
The threads that are listening on the engine TCP ports are the poll threads (soctcppoll) running on SOC class (this depends on the NETTYPE parameter). When a new request is received by them they call the listener threads (soctcplst) running on the cpu class to initiate the authentication process. Parts of this task are run by the MSC virtual processor. As we can see in the last output this has the PID 29402. So, in order to see what happens I'll trace that OS process. For reasons that I'll explain later, I will turn off the NS_CACHE feature (Informix 11.7) and I will restart the engine. So, for the first connection attempt we get (some parts cut off):
1 0.000000 semop(753664, {{5, -1, 0}}, 1) = 0
2 7.009868 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
3 0.000107 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
4 0.000242 close(3) = 0
5 0.000060 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
6 0.000063 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
7 0.000095 close(3) = 0
8 [...]
9 0.000000 open("/etc/resolv.conf", O_RDONLY) = 3
10 0.000000 fstat64(3, {st_mode=S_IFREG|0644, st_size=55, ...}) = 0
11 0.000000 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xab4000
12 0.000000 read(3, "# Generated by NetworkManager\nna"..., 4096) = 55
13 0.000926 read(3, "", 4096) = 0
14 0.000050 close(3) = 0
15 [...]
16 0.000057 futex(0x29ab44, FUTEX_WAKE_PRIVATE, 2147483647) = 0
17 0.000256 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
18 0.000089 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.112.2")}, 16) = 0
19 0.000107 gettimeofday({1325605320, 167025}, NULL) = 0
20 0.000072 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
21 0.000083 send(3, "\363\337\1\0\0\1\0\0\0\0\0\0\0011\003112\003168\003192\7in-ad"..., 44, MSG_NOSIGNAL) = 44
22 0.000322 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
23 2.061369 ioctl(3, FIONREAD, [121]) = 0
24 0.000111 recvfrom(3, "\363\337\201\203\0\1\0\0\0\1\0\0\0011\003112\003168\003192\7in-ad"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.112.2")}, [16]) = 121
25 0.000155 close(3) = 0
26 0.000090 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
27 0.000377 fstat64(3, {st_mode=S_IFREG|0644, st_size=439, ...}) = 0
28 0.000089 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf22000
29 0.000057 read(3, "127.0.0.1\tpacman1.onlinedomus.ne"..., 4096) = 439
30 0.000130 close(3) = 0
31 [...]
32 0.000072 semop(753664, {{7, 1, 0}}, 1) = 0
33 0.000069 semop(753664, {{7, 1, 0}}, 1) = 0
34 0.007558 semop(753664, {{5, -1, 0}}, 1
Please note that for clarity I added line numbers and time differences between each call (this will be important later). Let's explain this.
- On line 1 we have a semop(), which is how the MSC VP stays idle. That was before the connection attempt.
- Seven seconds later it tries to "talk" to the nscd daemon (lines 2-8). This is a somewhat Linux-specific mechanism that I'm not running. Then it accesses /etc/nsswitch.conf. Just keep in mind that it did this on the first attempt
- Then it accesses /etc/resolv.conf (lines 9-15) and finds the nameserver address
- On lines 16-25 it talks to the DNS server (192.168.112.2) and asks for the reverse name of the connecting IP address
- Since the answer is inconclusive, it goes to /etc/hosts (lines 26-31)
- I have cut the remaining part, which is related to the authentication (opening /etc/passwd, /etc/group, /etc/shadow etc.).
- Finally it returns to the normal idle state
- It all happened pretty quickly (values are in seconds)
- We don't see the gethostbyaddr() call. This is not a "system call" for strace. So we see the lower level calls, but not the gethostbyaddr() function. We can catch it by attaching a debugger to the same process. This is important because it's usually hard to discuss these issues with the network and OS administrators, since they tend to assume all this is done by Informix. It isn't! Informix just calls gethostbyaddr() (or equivalent functions)
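The send() payload in the trace is easier to read once you know how a reverse lookup is encoded: the IP address is reversed, ".in-addr.arpa" is appended, and each label is sent prefixed by its length. The sketch below reproduces that encoding. The client address 192.168.112.1 is my reading of the query bytes in the trace, and the helper names are hypothetical.

```python
import ipaddress


def ptr_query_name(ip: str) -> str:
    """Return the name queried for a reverse (PTR) lookup of an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def labels_to_wire(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels, as sent in a DNS query."""
    wire = b""
    for label in name.rstrip(".").split("."):
        wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"  # the empty root label terminates the name


if __name__ == "__main__":
    name = ptr_query_name("192.168.112.1")
    print(name)
    print(labels_to_wire(name))
```

The encoded name is 28 bytes long. Adding the 12-byte DNS header and 4 bytes for the query type and class gives 44 bytes, which matches the size of the send() call in the trace.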
For a second connection attempt we get:
1 0.000000 semop(753664, {{5, -1, 0}}, 1) = 0
2 6.452154 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3
3 0.000099 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.112.2")}, 16) = 0
4 0.008816 gettimeofday({1325605445, 534040}, NULL) = 0
5 0.000089 poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
6 0.000100 send(3, "\233\t\1\0\0\1\0\0\0\0\0\0\0011\003112\003168\003192\7in-ad"..., 44, MSG_NOSIGNAL) = 44
7 0.000417 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
8 2.089726 ioctl(3, FIONREAD, [121]) = 0
9 0.000118 recvfrom(3, "\233\t\201\203\0\1\0\0\0\1\0\0\0011\003112\003168\003192\7in-ad"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.112.2")}, [16]) = 121
10 0.000132 close(3) = 0
11 0.000069 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
12 0.000102 fstat64(3, {st_mode=S_IFREG|0644, st_size=439, ...}) = 0
13 0.000092 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xe1f000
14 0.000064 read(3, "127.0.0.1\tpacman1.onlinedomus.ne"..., 4096) = 439
15 0.000099 close(3) = 0
16 [...]
17 0.000068 semop(753664, {{0, 1, 0}}, 1) = 0
18 0.000096 semop(753664, {{0, 1, 0}}, 1) = 0
19 0.000076 semop(753664, {{5, -1, 0}}, 1
So, again the analysis:
- The first part accessing the nscd daemon, the /etc/nsswitch.conf and the /etc/resolv.conf is completely gone. As you can see it starts by connecting to the DNS.
- Then it reads the files (/etc/hosts)
- Then I cut the authentication part as before
- Finally it returns to the idle state
The important point to note here is that the first attempt is different from the others. And again, Informix just calls gethostbyaddr()... the very same call each time. Well... to be correct, the function we call may depend on the platform. As I mentioned earlier, only by using a debugger can we find the gethostbyaddr() call. I've done it and here is the result:
(gdb) break connect
Breakpoint 1 at 0x95f640
(gdb) continue
Continuing.
Breakpoint 1, 0x0095f640 in connect () from /lib/libpthread.so.0
(gdb) where
#0 0x0095f640 in connect () from /lib/libpthread.so.0
#1 0x00aec9ab in reopen () from /lib/libresolv.so.2
#2 0x00aee542 in __libc_res_nsend () from /lib/libresolv.so.2
#3 0x00aeb24e in __libc_res_nquery () from /lib/libresolv.so.2
#4 0x002b6dc7 in _nss_dns_gethostbyaddr2_r () from /lib/libnss_dns.so.2
#5 0x002b6f1a in _nss_dns_gethostbyaddr_r () from /lib/libnss_dns.so.2
#6 0x0020890b in gethostbyaddr_r@@GLIBC_2.1.2 () from /lib/libc.so.6
#7 0x00211f77 in getnameinfo () from /lib/libc.so.6
#8 0x08c0e664 in ifx_getipnodebyaddr ()
#9 0x08c0f79c in ifx_gethostbyaddr ()
#10 0x08c0f8a2 in __osgethostbyaddr ()
#11 0x08b0c055 in aio_workon ()
#12 0x08b0c9c3 in aiothread ()
#13 0x08b0dbcb in iothread ()
#14 0x08b00762 in startup ()
#15 0x558749e8 in ?? ()
#16 0x00000000 in ?? ()
(gdb)
As you can see I've set up a breakpoint for connect(). Then I "continue" the execution and try the connection. gdb stops the program at the breakpoint and I get the stack trace (read it bottom up).
So, it shows we call getnameinfo() which in turn calls gethostbyaddr_r() etc. All this belongs to the system libraries, not to Informix code.
There are two additional considerations we need to be aware of. First, the Informix MSC VP processes its requests serially. For each connection it asks what it needs from the DNS servers and/or files and performs the authentication. By default we only have one MSC VP... so if one request gets stuck... yes... the following connections will suffer delays. These delays can be a fraction of a second, or a few seconds, but on some systems I've seen tens (and heard about hundreds) of connections per second, so even a few seconds will have a large impact.
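To get a feeling for how quickly these delays pile up, here is a toy model of a single worker serving requests one at a time. It is a simplification under my own assumptions (a fixed lookup time for every connection, one MSC VP, nothing else competing for it), not a description of the engine internals; the function name is hypothetical.

```python
def connection_delays(arrival_times, lookup_seconds):
    """Seconds each connection spends waiting plus being served by one worker."""
    delays = []
    worker_free_at = 0.0
    for arrival in arrival_times:
        # A request starts when it arrives or when the worker frees up, whichever is later
        start = max(arrival, worker_free_at)
        worker_free_at = start + lookup_seconds
        delays.append(worker_free_at - arrival)
    return delays


if __name__ == "__main__":
    # Ten connections arriving within one second, each lookup taking 2 seconds
    arrivals = [i / 10 for i in range(10)]
    for arrival, delay in zip(arrivals, connection_delays(arrivals, 2.0)):
        print(f"arrived at {arrival:.1f}s, connected after {delay:.1f}s")
```

With a lookup time of about 2 seconds (similar to the DNS answer time seen in the traces), every new connection in a burst waits roughly 2 seconds longer than the one before it.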
The second consideration relates to the differences between the first call and the subsequent ones. As we've seen above, on the first call the process checks the configuration (the /etc/nsswitch.conf and /etc/resolv.conf files). After that first check it does not do that anymore. And this causes a problem. Again this is the behavior of the system functions (but not necessarily the end of the story....)
So, hopefully I was able to explain how Informix interacts with the DNS system. The important technical deep dive should be over. We'll proceed to the implications. It's important you understand all the above before proceeding to the next paragraphs.
What problems can we face?
Above I've tried to show you how things work when everything is OK. But what happens when something is wrong? Let's first see what can go wrong and then the implications. I'll also try to explain who is to blame (and again the disclaimer...). The purpose of course is not to point fingers, but knowing where the problem lies is the first step to solving it.
- Network problems prevent the connection to the DNS servers
  If this happens, the requests sent by the MSC VP will have to time out (typically after a few seconds) before the OS call returns. This delay will cause all the other connection requests to stay on hold (assuming we just have one MSC VP). If the network problems persist, it really doesn't matter how many MSC VPs we have, since they'll all get stuck and all our connection attempts will suffer delays
- The DNS server dies, is stopped, or is extremely slow
  The effect of this is very similar to the previous. Anything that causes delays in the DNS requests will potenti...
Find the whole article DNS impact on Informix / Impacto do DNS no Informix on the website Informix technology.