It was discovered that the GNU C Library incorrectly handled receiving responses while performing DNS resolution. A remote attacker could use this issue to cause the GNU C Library to crash, resulting in a denial of service, or possibly execute arbitrary code.
Internet-scale monoculture vulnerabilities
Pretty much any Linux system uses glibc, and getaddrinfo is typically used to resolve hostnames to IP addresses. This means that Linux servers as well as workstations are vulnerable unless they run an old version of glibc (pre-2.9).
Google was not the first to spot the bug, but it determined its significance in collaboration with Red Hat.
Google was able to create a PoC exploit. While exploitation depends on the countermeasures a system uses against stack-based buffer overflows, it is possible to exploit the bug and achieve command execution.
CVE-2015-7547: glibc getaddrinfo stack-based buffer overflow
Tuesday, February 16, 2016
Posted by Fermin J. Serna, Staff Security Engineer and Kevin Stadmeyer, Technical Program Manager
Have you ever been deep in the mines of debugging and suddenly realized that you were staring at something far more interesting than you were expecting? You are not alone! Recently a Google engineer noticed that their SSH client segfaulted every time they tried to connect to a specific host. That engineer filed a ticket to investigate the behavior and after an intense investigation we discovered the issue lay in glibc and not in SSH as we were expecting.
Thanks to this engineer’s keen observation, we were able to determine that the issue could result in remote code execution. We immediately began an in-depth analysis of the issue to determine whether it could be exploited, and possible fixes. We saw this as a challenge, and after some intense hacking sessions, we were able to craft a full working exploit!
In the course of our investigation, and to our surprise, we learned that the glibc maintainers had previously been alerted to the issue via their bug tracker in July 2015 (bug). We couldn't immediately tell whether the bug fix was underway, so we worked hard to make sure we understood the issue and then reached out to the glibc maintainers. To our delight, Florian Weimer and Carlos O’Donell of Red Hat had also been studying the bug’s impact, albeit completely independently! Due to the sensitive nature of the issue, the investigation, patch creation, and regression tests performed primarily by Florian and Carlos had continued “off-bug.”
This was an amazing coincidence, and thanks to their hard work and cooperation, we were able to translate both teams’ knowledge into a comprehensive patch and regression test to protect glibc users.
Our initial investigations showed that the issue affected all the versions of glibc since 2.9. You should definitely update if you are on an older version though. If the vulnerability is detected, machine owners may wish to take steps to mitigate the risk of an attack.
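As a rough aid for the version statement above, the following Python sketch reports whether a glibc version string falls in the range named in the post (2.9 or later). The helper names are hypothetical. A version number alone does not prove a system is vulnerable, because a distribution may ship the fix without changing the version, so treat the result as a hint only.

```python
import os
import re


def parse_glibc_version(text):
    """Extract (major, minor) from strings such as 'glibc 2.19' or '2.9'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def in_reported_range(text):
    """True if the version is 2.9 or later, the range named in the post.

    Tuples are compared numerically, so '2.10' is correctly treated as
    newer than '2.9' (a plain string comparison would get this wrong).
    """
    return parse_glibc_version(text) >= (2, 9)


def local_glibc_version():
    """Return the local glibc version string, or None if it is unavailable.

    Assumption: os.confstr('CS_GNU_LIBC_VERSION') is only meaningful on
    systems that actually use glibc.
    """
    try:
        return os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        return None
```

Calling `in_reported_range(local_glibc_version())` on a glibc-based host gives a first indication; the vendor's security advisory remains the authority on whether a given package is patched.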
The glibc DNS client side resolver is vulnerable to a stack-based buffer overflow when the getaddrinfo() library function is used. Software using this function may be exploited with attacker-controlled domain names, attacker-controlled DNS servers, or through a man-in-the-middle attack.
Google has found some mitigations that may help prevent exploitation if you are not able to immediately patch your instance of glibc. The vulnerability relies on an oversized (2048+ bytes) UDP or TCP response, which is followed by another response that will overwrite the stack. Our suggested mitigation is to limit the response sizes accepted by the local DNS resolver (e.g., via DNSMasq or similar programs), and to ensure that DNS queries are sent only to DNS servers that limit the response size for UDP responses with the truncation bit set.
glibc reserves 2048 bytes on the stack through alloca() in _nss_dns_gethostbyname4_r() to hold the response to a DNS query.
Later on, at send_dg() and send_vc(), if the response is larger than 2048 bytes, a new buffer is allocated from the heap and all the information (buffer pointer, new buffer size and response size) is updated.
Under certain conditions a mismatch between the stack buffer and the new heap allocation will happen. The final effect is that the stack buffer will be used to store the DNS response, even though the response is larger than the stack buffer and a heap buffer was allocated. This behavior leads to the stack buffer overflow.
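The mismatch described above can be illustrated with a toy model. The Python sketch below is not glibc's code: it represents the 2048-byte stack buffer and its neighbours as one bytearray, the heap buffer as another, and shows what happens when the pointer and the size that travel together get out of step. All names (`simulate`, `GUARD`) are hypothetical.

```python
STACK_BUF = 2048          # size of the alloca'd answer buffer
MAX_PACKET = 65535        # size of the heap buffer allocated for large answers
GUARD = b"\xAA" * 64      # stands in for whatever sits next to the buffer on the stack


def simulate(pointer_is_heap, claimed_size, packet):
    """Copy a packet into the chosen buffer, trusting only claimed_size.

    Returns (bytes_written, guard_intact). The write is additionally capped
    at the length of the model's own storage so the simulation stays safe.
    """
    frame = bytearray(STACK_BUF) + bytearray(GUARD)  # stack buffer + neighbours
    heap = bytearray(MAX_PACKET)
    dest = heap if pointer_is_heap else frame

    n = min(len(packet), claimed_size)  # what the receive call believes is safe
    n = min(n, len(dest))               # keep the toy model itself in bounds
    dest[:n] = packet[:n]

    guard_intact = bytes(frame[STACK_BUF:]) == GUARD
    return n, guard_intact


packet = b"A" * 2100  # a response slightly larger than the stack buffer

# Pointer and size agree (stack, 2048): the write is clipped, neighbours survive.
print(simulate(pointer_is_heap=False, claimed_size=STACK_BUF, packet=packet))
# Pointer and size agree (heap, 65535): the large response lands on the heap.
print(simulate(pointer_is_heap=True, claimed_size=MAX_PACKET, packet=packet))
# Mismatch (stack pointer, heap size): the neighbours are overwritten.
print(simulate(pointer_is_heap=False, claimed_size=MAX_PACKET, packet=packet))
```

Only the third call, where the stack pointer is paired with the heap buffer's size, damages the guard bytes. That pairing is the condition the text describes.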
The vectors to trigger this buffer overflow are very common and can include ssh, sudo, and curl. We are confident that the exploitation vectors are diverse and widespread; we have not attempted to enumerate these vectors further.
Remote code execution is possible, but not straightforward. It requires bypassing the security mitigations present on the system, such as ASLR. We will not release our exploit code, but a non-weaponized Proof of Concept has been made available simultaneously with this blog post. With this Proof of Concept, you can verify if you are affected by this issue, and verify any mitigations you may wish to enact.
As you can see in the below debugging session we are able to reliably control EIP/RIP.
When code crashes unexpectedly, it can be a sign of something much more significant than it appears; ignore crashes at your peril!
Failed exploit indicators, due to ASLR, can range from:
Crash on free(ptr) where ptr is controlled by the attacker.
Crash on free(ptr) where ptr is semi-controlled by the attacker since ptr has to be a valid readable address.
Crash reading from memory pointed by a local overwritten variable.
Crash writing to memory on an attacker-controlled pointer.
We would like to thank Neel Mehta, Thomas Garnier, Gynvael Coldwind, Michael Schaller, Tom Payne, Michael Haro, Damian Menscher, Matt Brown, Yunhong Gu, Florian Weimer, Carlos O’Donell and the rest of the glibc team for their help figuring out all details about this bug, exploitation, and patch development.
CVE-2015-7547: Critical Vulnerability in glibc getaddrinfo
The exploit will likely trigger a DNS lookup from a vulnerable system. DNS lookups can be triggered in many ways: an image embedded in a web page and an email processed by a spam filter (which involves DNS lookups) are just two of many options.
The exploit response will exceed 2048 bytes in size. Not all responses > 2048 are exploits. The response may arrive via TCP or UDP.
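For monitoring, the size criterion above can be combined with two flags from the fixed 12-byte DNS header. The following Python sketch is one possible way to summarise a captured message; `describe_response` and `SUSPECT_SIZE` are hypothetical names, and an "oversized" result only marks a packet for closer inspection, since large responses can be legitimate.

```python
import struct

SUSPECT_SIZE = 2048   # threshold taken from the text above
QR_BIT = 0x8000       # set in the flags field when the message is a response
TC_BIT = 0x0200       # set when the response was truncated


def describe_response(message):
    """Summarise a raw DNS message (UDP payload, or TCP payload without
    its 2-byte length prefix)."""
    if len(message) < 12:
        raise ValueError("a DNS message has at least a 12-byte header")
    ident, flags = struct.unpack("!HH", message[:4])
    return {
        "id": ident,
        "is_response": bool(flags & QR_BIT),
        "truncated": bool(flags & TC_BIT),
        "size": len(message),
        "oversized": len(message) > SUSPECT_SIZE,
    }
```

A capture script could call this for each DNS payload and log the entries where `oversized` is true.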
Many modern systems support a feature called "EDNS0". This feature can be used by a client to signal to a server that it is willing to receive UDP responses that are larger than the traditional 512 bytes in size. Features like DNSSEC require EDNS0 to be enabled. Blocking large DNS responses will likely break EDNS0. DNS resolution may fail or will be significantly delayed.
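To make the EDNS0 signal concrete, the sketch below builds a DNS query by hand and, optionally, appends the OPT pseudo-record in which a client advertises the UDP payload size it will accept. It is a minimal illustration, not a full DNS encoder: `build_query` is a hypothetical helper, and it assumes ASCII labels shorter than 64 bytes.

```python
import struct

TYPE_A = 1
TYPE_OPT = 41   # EDNS0 pseudo-record type
CLASS_IN = 1


def build_query(name, qtype=TYPE_A, udp_payload_size=None, ident=0x1234):
    """Build a DNS query; if udp_payload_size is given, add an EDNS0 OPT record."""
    arcount = 1 if udp_payload_size is not None else 0
    # Header: id, flags (recursion desired), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", ident, 0x0100, 1, 0, 0, arcount)

    # Question name: length-prefixed labels, terminated by a zero byte
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, CLASS_IN)

    if udp_payload_size is None:
        return header + question

    # OPT record: root name, TYPE=41, CLASS carries the payload size,
    # TTL carries extended flags (all zero here), RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_payload_size, 0, 0)
    return header + question + opt


plain = build_query("example.com")
edns = build_query("example.com", udp_payload_size=4096)
print(len(plain), len(edns))
```

The only difference between the two queries is the additional-record count in the header and the 11 trailing OPT bytes, which is why filtering large responses interacts with EDNS0: the client has explicitly asked for them.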
All versions of glibc since 2.9 are vulnerable. The vulnerable code was introduced in May 2008 as part of glibc 2.9.
The buffer overflow occurs in the function send_dg (UDP) and send_vc
(TCP) for the NSS module libnss_dns.so.2 when calling getaddrinfo with
AF_UNSPEC family and in some cases also with AF_INET6 before the fix in
commit 8479f23a (only use gethostbyname4_r if PF_UNSPEC).
The use of AF_UNSPEC triggers the low-level resolver code to send out
two parallel queries for A and AAAA. A mismanagement of the buffers used
for those queries could result in the response writing beyond the alloca
allocated buffer created by __res_nquery.
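The difference between an unspecified and an explicit address family is visible at the API level. The Python sketch below uses `socket.getaddrinfo`, which on Linux normally calls the C library's getaddrinfo; `resolve` is a hypothetical helper. Requesting AF_INET asks only for IPv4 results. It is shown here purely to illustrate the parameter and is not a substitute for patching.

```python
import socket


def resolve(host, family=socket.AF_UNSPEC):
    """Return the sorted set of (family, address) pairs for host.

    With AF_UNSPEC the resolver may look up both A and AAAA records;
    with AF_INET only IPv4 results are requested.
    """
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    return sorted({(info[0], info[4][0]) for info in infos})


# A numeric address needs no DNS traffic, so this line is safe to run anywhere.
print(resolve("127.0.0.1", socket.AF_INET))
```

Replacing the numeric address with a hostname and comparing `resolve(name)` against `resolve(name, socket.AF_INET)` shows the extra address family that the default request brings in.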
- Via getaddrinfo with family AF_UNSPEC or AF_INET6 the overflowed
  buffer is located on the stack via alloca (a 2048 byte fixed size
  buffer for DNS responses).
- At most 65535 bytes (MAX_PACKET) may be written to the alloca buffer
  of 2048 bytes. Overflowing bytes are entirely under the control of the
  attacker and are the result of a crafted DNS response.
- Local testing shows that we have been able to control at least the
  execution of one free() call with the buffer overflow and gained
  control of EIP. Further exploitation was not attempted, only this
  single attempt to show that it is very likely that execution control
  can be gained without much more effort. We know of no attacks that
  use this specific vulnerability.
- Mitigating factors for UDP include:
  - A firewall that drops UDP DNS packets > 512 bytes.
  - A local resolver (that drops non-compliant responses).
  - Avoid dual A and AAAA queries (avoids buffer management error) e.g.
    Do not use AF_UNSPEC.
  - No use of `options edns0` in /etc/resolv.conf since EDNS0 allows
    responses larger than 512 bytes and can lead to valid DNS responses
    that overflow.
  - No use of `RES_USE_EDNS0` or `RES_USE_DNSSEC` since they can both
    lead to valid large EDNS0-based DNS responses that can overflow.
- Mitigating factors for TCP include:
  - Limit all replies to 1024 bytes.
- Mitigations that don't work:
  - Setting `options single-request` does not change buffer management
    and does not prevent the exploit.
  - Setting `options single-request-reopen` does not change buffer
    management and does not prevent the exploit.
  - Disabling IPv6 does not disable AAAA queries. The use of AF_UNSPEC
    unconditionally enables the dual query.
    - The use of `sysctl -w net.ipv6.conf.all.disable_ipv6=1` will not
      protect your system from the exploit.
  - Blocking IPv6 at a local or intermediate resolver does not work to
    prevent the exploit. The exploit payload can be delivered in A or
    AAAA results; it is the parallel query that triggers the buffer
    management flaw.
- The code that causes the vulnerability was introduced in May 2008 as
  part of glibc 2.9.
- The code that causes the vulnerability is only present in glibc's copy
  of libresolv which has enhancements to carry out parallel A and AAAA
  queries. Therefore only programs using glibc's copy of the code have
  this problem.
- A back of the envelope analysis shows that it should be possible to
  write correctly formed DNS responses with attacker controlled payloads
  that will penetrate a DNS cache hierarchy and therefore allow
  attackers to exploit machines behind such caches.
The immediate solutions to the buffer mismanagement issues are as
follows:
- Remove buffer reuse.
- Always malloc the second response buffer if needed.
  - Requires fix for sourceware bug 16574 to avoid memory leak.
- Correctly adjust pointer *and* size for buffer in use.
In order to validate and test the resulting changes, including valgrind
validation, the following was fixed:
- Uninitialized uses of *herrno_p.
  - With all uses initialized we have clean valgrind runs.
- Result of NSS_STATUS_SUCCESS masking the case where the second
  response has failed with an ERANGE failure. In this case the second
  response will contain whatever was on the stack last (alloca).
  - With NSS_STATUS_TRYAGAIN returned if any of the results fail with
    ERANGE we have deterministic results that can be validated.
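The status-masking point in the list above can be sketched as a small decision rule. The Python model below is an illustration under stated assumptions, not the patch itself: the enum members borrow glibc's status names, and both `combine` and the fallback to NOTFOUND are hypothetical simplifications.

```python
import errno
from enum import Enum


class Status(Enum):
    SUCCESS = "NSS_STATUS_SUCCESS"
    NOTFOUND = "NSS_STATUS_NOTFOUND"
    TRYAGAIN = "NSS_STATUS_TRYAGAIN"


def combine(results):
    """Combine per-answer results into one status.

    results is a list of (Status, errno value) pairs, one per DNS answer.
    Any ERANGE failure wins, so a success on one answer can no longer
    hide a buffer-too-small failure on the other.
    """
    if any(err == errno.ERANGE for _, err in results):
        return Status.TRYAGAIN
    if any(status is Status.SUCCESS for status, _ in results):
        return Status.SUCCESS
    return Status.NOTFOUND


# First answer succeeded, second failed with ERANGE: the caller must retry.
print(combine([(Status.SUCCESS, 0), (Status.TRYAGAIN, errno.ERANGE)]))
```

Checking ERANGE before success is the design choice that makes the outcome deterministic in this model: the caller retries with a larger buffer instead of reading a second answer that was never filled in.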
Attached to the email are:
- Patch to fix the vulnerability.
- Tarball of validation tests which will be integrated into glibc.
NEWS update will be included in the final commit.
The defect is located in the glibc sources in the following file:
as part of the send_dg and send_vc functions which are part of the
__libc_res_nsend (res_nsend) interface which is used by many of the
higher level interfaces including getaddrinfo (indirectly via the DNS
NSS module).
One way to trigger the buffer mismanagement is like this:
* Have the target attempt a DNS resolution for a domain you control.
  - Need to get A and AAAA queries.
* First response is 2048 bytes.
  - Fills the alloca buffer entirely with 0 left over.
  - send_dg attempts to reuse the user buffer but can't.
  - New buffer created but due to bug old alloca buffer is used with new
    size of 65535 (size of the malloc'd buffer).
  - Response should be valid.
* Send second response.
  - This response should be flawed in such a way that it forces
    __libc_res_nsend to retry the query. It is sufficient for example to
    pick any of the listed failure modes in the code which return zero.
* Send third response.
  - The third response can contain 2048 bytes of valid response.
  - The remaining 63487 bytes of the response are the attack payload and
    the recvfrom smashes the stack with it.
The flaw happens because when send_dg is retried it restarts the query,
but the second time around the answer buffer points to the alloca'd
buffer but with the wrong size.
Please note that there are other ways to trigger the buffer management
flaw, but they require slightly more control over the timing of the
responses and use poll timeout to carry out the exploit with just two
responses from the attacker (as opposed to three).
A similar exploit is possible with TCP, but requires closing the TCP
connection (either with a TCP reset or a regular 3-way connection
close), or sending an empty response with a zero length header. Any such
action which forces send_vc to exit and retry with the wrong buffer size
will trigger a similar failure as seen in send_dg.
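For readers unfamiliar with the "zero length header" mentioned above: DNS over TCP prefixes every message with a two-byte, big-endian length field. The Python sketch below shows that framing and how an empty message and a closed connection look to a reader. `frame` and `read_frame` are hypothetical helpers operating on an in-memory stream rather than a real socket.

```python
import io
import struct


def frame(message):
    """Prefix a DNS message with its 2-byte big-endian length (TCP transport)."""
    if len(message) > 0xFFFF:
        raise ValueError("a DNS message over TCP is at most 65535 bytes")
    return struct.pack("!H", len(message)) + message


def read_frame(stream):
    """Read one length-prefixed message.

    Returns the message bytes (possibly empty when the length field is zero),
    or None if the stream ended before a complete message arrived.
    """
    prefix = stream.read(2)
    if len(prefix) < 2:
        return None
    (length,) = struct.unpack("!H", prefix)
    body = stream.read(length)
    if len(body) < length:
        return None
    return body


stream = io.BytesIO(frame(b"abc") + frame(b""))
print(read_frame(stream))  # a normal message
print(read_frame(stream))  # an empty message: the length field was zero
print(read_frame(stream))  # end of stream, as after a connection close
```

The 16-bit length field is also why 65535 bytes is the upper bound on a single response.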
Earlier I mentioned iOS and Android as likely vulnerable. They could be, but they do not include glibc by default, so it would be up to an app to introduce it. Android uses the Bionic libc library. OS X and iOS do not use glibc, and neither do other BSD-based systems. Thanks to Ken White for pointing this out.
After the patch and reboot, e-mail servers hosted at Dacentec and OVH-CA are experiencing problems.
After a power cycle, the Dacentec server answers queries only for a few moments before all UDP and TCP communication stops. ICMP (ping) was not affected.
In the case of the OVH server, initial access works, and after a few minutes the server stops responding, including to pings. OVH support, triggered automatically, sent the following diagnosis:
Diagnosis interface boot (rescue)
Date 2016-02-17 03:49:01, frederik V made Diagnosis interface boot (rescue):
Here are the details of the operation performed:
The server gets stuck during the boot phase leading to a kernel panic.
A restart on the standard OVH kernel ('netboot') does not correct the situation.
Rebooting the server to "rescue" mode (Linux).
Boot OK. Rescue mode accessible.
Configuration / error to be corrected by the customer.
I will try to determine the cause on the OVH server before requesting KVM at Dacentec.
Note: There is a third e-mail server, at OVH, with identical hardware, that was not affected by the patch. It runs almost the same services as the affected OVH server, with the following differences: Debian 7 (vs. Debian 8), dovecot without replication, and it does not run LDAP.
The Dacentec server does not run SMTP and related services. It runs dovecot with master-master replication with the affected OVH server. It runs LDAP with multi-master replication with several servers, including the affected OVH one. However, the other servers that run LDAP were not affected by the patch.
The problem on the OVH server is being caused by memory exhaustion (RAM and swap).
This is getting complicated. After the patch, one of the Online.net servers is allocating 32GB of buffer, and one of the "mariolas" 2GB. Neither runs e-mail services.
Being extremely generous, it could even be the increased load on the server (receiving the additional traffic from the OVH and Dacentec servers), BUT that "mariola" is not in production and has zero traffic, and the others, unpatched, show normal memory usage.
For now, I am ruling out that the e-mail services have anything to do with the problems on the OVH and Dacentec servers. LDAP remains to be checked.