Vulnerability in Glibc Allows Remote Execution on Domain Name Resolution
http://dankaminsky.com/2016/02/20/skeleton/
More from: dankaminsky.com
show/hide source |
http://dankaminsky.com/2016/02/20/skeleton/
A recently discovered vulnerability in glibc would potentially allow an attacker remote code execution by making a victim do a DNS lookup. The code has apparently been around since 2008 and there are some PoCs available.
Full remote code execution has been demonstrated by Google, despite the usual battery of post-exploitation mitigations like ASLR, NX, and so on.Worth noting that many other programming languages and frameworks use the underlying libc functions to resolve domain names (Java, Python, Ruby among others) and are affected by this. Patch now. Tags: glibc dns vulnerability
More from: dankaminsky.com
show/hide source |
A Skeleton Key of UnknownStrength February 20, 2016 Dan Kaminsky Leave a comment Go to comments TL;DR: The glibc DNS bug (CVE-2015-7547) is unusually bad. Even Shellshock and Heartbleed tended to affect things we knew were on the network and knew we had to defend. This affects a universally used library (glibc) at a universally used protocol (DNS). Generic tools that we didnt even know had network surface (sudo) are thus exposed, as is software written in programming languages designed explicitly to be safe. Who can exploit this vulnerability? We know unambiguously that an attacker directly on our networks can take over many systems running Linux. What we are unsure of is whether an attacker anywhere on the Internet is similarly empowered, given only the trivial capacity to cause our systems to look up addresses inside their malicious domains. Weve investigated the DNS lookup path, which requires the glibc exploit to survive traversing one of the millions of DNS caches dotted across the Internet. Weve found that it is neither trivial to squeeze the glibc flaw through common name servers, nor is it trivial to prove such a feat is impossible. The vast majority of potentially affected systems require this attack path to function, and we just dont know yet if it can. Our belief is that were likely to end up with attacks that work sometimes, and were probably going to end up hardening DNS caches against them with intent rather than accident. Were likely not going to apply network level DNS length limits because that breaks things in catastrophic and hard to predict ways. This is a very important bug to patch, and it is good we have some opportunity to do so. Its problematic that, a decade after the last DNS flaw that took a decade to fix, we have another one. Its time we discover and deploy architectural mitigations for these sorts of flaws with more assurance than technologies like ASLR can provide. The hard truth is that if this code was written in JavaScript, it wouldnt have been vulnerable. We can do better than that. We need to develop and fund the infrastructure, both technical and organizational, that defends and maintains the foundations of the global economy. Click here if youre a DNS expert and dont need to be told how DNS works. Click here if your interests are around security policy implications and not the specific technical flaw in question. Update: Click here to learn how thisissue compares to last years glibc DNS flaw, Ghost. ===== Here is a galaxy map of the Internet. I helped the Opte project create this particular one. And this galaxy is Linux specifically, Ubuntu Linux, in a map by Thomi Richards, showing how each piece of software inside of it depends on each other piece. There is a black hole at the center of this particular galaxy the GNU C Standard Library, or glibc. And at this center, in this black hole, there is a flaw. More than your average or even extraordinary flaw, its affecting a shocking amount of code. How shocking? Ive seen a lot of vulnerabilities, but not too many that create remote code execution in sudo. When DNS aint happy, aint nobody happy. Just how much trouble are we in? Were not quite sure. Background Most Internet software is built on top of Linux, and most Internet protocols are built on top of DNS. Recently, Redhat Linux and Google discovered some fairly serious flaws in the GNU C Library, used by Linux to (among many other things) connect to DNS to resolve names (like google.com) to IP addresses (like 8.8.8.8). The buggy code has been around for quite some time since May 2008 so its really worked its way across the globe. Full remote code execution has been demonstrated by Google, despitethe usual battery of post-exploitation mitigations like ASLR, NX, and so on. What we know unambiguously is that an attacker who can monitor DNS traffic between most (but not all) Linux clients, and a Domain Name Server, can achieve remote code execution independent of how well those clients are otherwise implemented. (Android is not affected.) That is a solid critical vulnerability by any normal standard. Actionable Intelligence Ranking exploits is silly. Theyre not sports teams. But generally, what you can do is actually less important than who you have to be to do it. Bugs like Heartbleed, Shellshock, and even the recent Java Deserialization flaws ask very little of attackers they have to be somewhere on a network that can reach their victims, maybe just anywhere on the Internet at large. By contrast, the unambiguous victims of glibc generally require their attackers to be close by. Youre just going to have to believe me when I say thats less of a constraint than youd think, for many classes of attacker youd actually worry about. More importantly though, the scale of software exposed to glibc is unusually substantial. For example: Thats JavaScript, Python, Java, and even Haskell blowing right up. Just because theyre memory-safe doesnt mean their runtime libraries are, and glibc is the big one under Linux they all depend on. (Not that other C libraries should be presumed safe. Ahem.) Theres a reason Im saying this bug exposes Linux in general to risk. Even your paranoid solutions leak DNS you can route everything over a VPN, but youve still got to discover where youre routing it to, and thats usually done with DNS. You can push everything over HTTPS, but whats that text after the https://? Its a DNS domain. Importantly, the whole point of entire sets of defenses is that theres an attacker on the network path. That guy just got a whole new set of toys, against a whole new set of devices. Everyone protects apache, who protects sudo? So, independent of whatever else may be found, Florian, Fermin, Kevin, and everyone else at Redhat and Google did some tremendous work finding and repairing something genuinely nasty. Patch this bug with extreme prejudice. Youll have to reboot everything, even if it doesnt get worse. It might get worse. The Hierarchy DNS is how this Internet (there were several previous attempts) achieves cross-organizational interoperability. It is literally the identity layer everything else builds upon; everybody can discover Googles mail server, but only Google can change it. Only they have the delegated ownership rights for gmail.com and google.com. Those rights were delegated by Verisign, who owns .com, who themselves received that exclusive delegation from ICANN, the Internet Corporation for Assigned Names and Numbers. The point is not to debate the particular trust model of DNS. The point is to recognize that its not just Google who can register domains; attackers can literally register badguy.com and host whatever they want there. If a DNS vulnerability could work through the DNS hierarchy, we would be in a whole new class of trouble, because it is just extraordinarily easy to compel code that does not trust you to retrieve arbitrary domains from anywhere in the DNS. You connect to a web server, it wants to put your domain in its logs, its going to look you up. You connect to a mail server, it wants to see if youre a spammer, its going to look you up. You send someone an email, they reply. How does their email find you? Their systems are going to look you up. It would be unfortunate if those lookups led to code execution. Once, I gave a talk to two hundred software developers. I asked them, how many of you depend on DNS? Two hands go up. I then asked, how many of you expect a string of text like google.com to end up causing a connection to Google? 198 more hands. Strings containing domain names happen all over the place in software, in all sorts of otherwise safe programming languages. Far more often than not, those strings not only find their way to a DNS client, but specifically to the code embedded in the operating system (the one thing that knows where the local Domain Name Server is!). If that embedded code, glibc, can end up receiving from the local infrastructure traffic similar enough to what a full-on local attacker would deliver, were in a lot more trouble. Many more attackers can cause lookups to badguy.com, than might find themselves already on the network path to a target. Domain Name Servers Glibc is what is known as a stub resolver. It asks a question, it gets an answer, somebody else actually does most of the work running around the Internet bouncing through ICANN to Verisign to Google. These somebody elses are Domain Name Servers, also known as caching resolvers. DNS is an old protocol it dates back to 1983 and comes from a world where bandwidth was so constrained that every bit mattered, even during protocol design. (DNS got 16 bits in a place so TCP could get 32. We were young, we needed the bits was actually a thing.) These caching resolvers actually enforce a significant amount of rules upon what may or may not flow through the DNS. The proof of concept delivered by Google essentially delivers garbage bytes. Thats fine on the LAN, where theres nothing getting in the way. But name servers can essentially be modeled as scrubbing firewalls in most (never all) environments, traffic that is not protocol compliant is just not going to reach stubs like glibc. Certainly that Google Proof of Concept isnt surviving any real world cache. Does that mean nothing will? As of yet, we dont actually know. According to Redhat: A back of the envelope analysis shows that it should be possible to write correctly formed DNS responses with attacker controlled payloads that will penetrate a DNS cache hierarchy and therefore allow attackers to exploit machines behind such caches. Im just going to state outright: Nobody has gotten this glibc flaw to work through caches yet. So we just dont know if thats possible. Actual exploit chains are subject to what I call the MacGyver effect. For those unfamiliar, MacGyver was a 1980s television show that showed a very creative tinkerer building bombs and other such things with tools like chocolate. The show inspired an entire generation of engineers, but did not lead to a significant number of lost limbs because there was always something non-obvious and missing that ultimately prevented anything from working. Exploit chains at this layer are just a lot more fragile than, say, corrupted memory. But we still go ahead and actually build working memory corruption exploits, because some things are just extraordinarily expensive to fix, and so we better be sure theres unambiguously a problem here. At the extreme end, there are discussions happening about widespread DNS filters across the Internet certainly in front of sensitive networks. Redhat et al did some great work here, but we do need more than the back of the envelope. Ive personally been investigating cache traversal variants of this attack. Heres what I can report after a day. Cache Attacks Somewhat simplified, the attacks depend on:. A buffer being filled with about 2048 bytes of data from a DNS response The stub retrying, for whatever reason Two responses ultimately getting stacked into the same buffer, with over 2048 bytes from the wire The flaw is linkedtothe fact that the stack has two outstanding requests at the same time one for IPv4 addresses, and one for IPv6 addresses. Furthermore DNS can operate over both UDP and TCP, with the ability to upgrade from the former to the latter. There is error handling in DNS, but most errors and retries are handled by the caching resolver, not the stub. That means any weird errors just cause the (safer, more properly written) middlebox to handle the complexity, reducing degrees of freedom for hitting glibc. Given that rough summary of the constraints, heres what I can report. This CVE is easily the most difficult to scope bug Ive ever worked on, despite it being in a domain I am intimately familiar with. The trivial defenses against cache traversal are easily bypassable; the obvious attacks that would generate cache traversal are trivially defeated. What we are left with is a morass of maybes, with the consequences being remarkably dire (even my bug did not yield direct code execution). Heres what I can say at present time, with thanks to those who have been very generous with their advice behind the scenes. The attacks do not need to be garbage that could never survive a DNS cache, as they are in the Google PoC. Its perfectly legal to have large A and AAAA responses that are both cache-compatible and corrupt client memory. I have this working well. The attacks do not require UDP or EDNS0. Traditional DNS has a 512 byte limit, notably less than the 2048 bytes required. Some people (including me) thought that since glibc doesnt issue the EDNS0 request that declares a larger buffer, caching resolvers would not provide sufficient data to create the required failure state. Sure, if the attack was constrained to UDP as in the Google PoC. But not only does TCP exist, but we can set the tc Truncation bit to force an upgrade to the protocol with more bandwidth. This most certainly does traverse caches. There are ways of making the necessary retry occur, even through TCP. Were still investigating them, as its a fundamental requirement for the attack to function. (No retry, no big write to small buf.) Where I think were going to end up, around 24 (straight) hours of research in, is that some networks are going to be vulnerable to some cache traversal attacks sometimes, following the general rule of attacks only get better. That rule usually only applies to crypto vulns, but on this half-design half-implementation vuln, we get it here too. This is in contrast to the on-path attackers, who just need to figure out how to smash a 2016 stack and away they go. Theres a couple comments Id like to make, which summarize down to This may not get nasty in days to weeks, but months to years has me worried. Low reliability attacks become high reliability in DNS, because you can just do a lot of them very quickly. Even without forcing an endpoint to hammer you through some API, name servers have all sorts of crazy corner cases where they blast you with traffic quickly, and stop only when youve gotten data successfully in their cache. Load causes all sorts of weird and wooly behavior in name servers, so proving something doesnt work in the general case says literally nothing about edge case behavior. Low or no Time To Live (TTL) mean the attacker can disable DNS caching,eliminating some (but not nearly all) protections onemight assume caching creates. That being said, not all name servers respect a zero TTL, or even should. If anything is going to stop actual cache traversing exploitability, its that you just have an absurd amount more timing and ordering control directly speaking to clients over TCP and UDP, than you do indirectly communicating with the client through a generally protocol enforcing cache. That doesnt mean there wont be situations where you can cajole the cache to do your bidding, even unreliably, but accidental defenses are where were at here. Those accidental defenses are not strong. Theyre accidents, in the way DNS cache rules kept my own attacks from being discovered. Eventually we figured out we could do other things to get around those defenses and they just melted in seconds. The possibility that a magic nasty payload pushes a major namesever or whatever into some state that quickly and easily knocks stuff over, on the scale of months to years, is non-trivial. Stub resolvers are not just weak, theyre kind of designed to be that way. The whole point is you dont need a lot of domain specific knowledge (no pun intended) to achieve resolution over DNS; instead you just ask a question and get an answer. Specifically, theres a universe of DNS clients that dont randomize ports (or even transaction ids). You really dont want random Internet hosts poking your clients spoofing your name servers. Protecting against spoofed traffic on the global Internet is difficult; preventing traffic spoofing from outside networks using internal addresses is on the edge of practicality. Lets talk about suggested mitigations, and then go into what we can learn policy-wise from this situation. Length Limits Are Silly Mitigations No other way to say it. Redhat might as well have suggested filtering all AAAA (IPv6) records might actually be effective, as it happens, but it turns out security is not the only engineering requirement at play. DNS has had to engineer several mechanisms for sending more than 512 bytes, and not because it was a fun thing to do on a Saturday night. JavaScript is not the only thing thats gotten bigger over the years; we are putting more and more in there and not just DNSSEC signatures either. What is worth noting is that IT, and even IT Security, has actually learned the very very hard way not to apply traditional firewalling approaches to DNS. Basically, as a foundational protocol its very far away from normal debugging interfaces. That means, when something goes wrong like, somebody applied a length limit to DNS traffic who was not themselves a DNS engineer theres this sudden outage that nobody can trace for some absurd amount of time. By the time the problem gets tracedwell, if you ever wondered why DNS doesnt get filtered, that is why. And ultimately, any DNS packet filter is a poor version of what you really want, which is an actual protocol enforcing scrubbing firewall, i.e. a name server that is not a stub, though it might be a forwarder (meaning it enforces all the rules and provides a cache, but doesnt wander around the Internet resolving names). My expectations for mitigations, particularly as we actually start getting some real intelligence around cache traversing glibc attacks, are: We will put more intelligent resolvers on more devices, such that glibc is only talking to the local resolver not over the network, and Caching resolvers will learn how to specially handle the case of simultaneous A and AAAA requests. If were protected from traversing attacks its because the attacker just cant play a lot of games between UDP and TCP and A and AAAA responses. As we learn more about when the attacks can traverse caches, we can intentionally work to make them not. Local resolvers are popular anyway, because they mean theres a DNS cache improving performance. A large number of embedded routers are already safe against the verified on-path attack scenario due to their use of dnsmasq, a common forwarding cache. Note that technologies like DNSSEC are mostly orthogonal to this threat; the attacker can just send us signed responses that he in particular wants to break us. I say mostly because one mode of DNSSEC deployment involves the use of a local validating resolver; such resolvers are also DNS caches that insulate glibc from the outside world. There is the interesting question of how to scan and detect nodes on your network with vulnerable versions of glibc. Ive been worried for a while were only going to end up fixing the sorts of bugs that are aggressively trivial to detect, independent of their actual impact to our risk profiles. Short of actually intercepting traffic and injecting exploits Im not sure what we can do here. Certainly one can look for simultaneous A and AAAA requests with identical source ports and no EDNS0, but thats going to stay that way even post patch. Detecting what on our networks still needs to get patched (especially when ultimately this sort of platform failure infests the smallest of devices) is certain to become a priority even if we end up making it easier for attackers to detect our faults as well. If youre looking for actual exploit attempts, dont just look for large DNS packets. UDP attackswill actually be fragmented (normal IP packets cannot carry 2048 bytes) and you might forgetDNS can be carried over TCP. And again, large DNS replies are not necessarily malicious. And thus, we end up at a good transition point to discuss security policy. What do we learn from this situation? The Fifty Thousand Foot View Patch this bug. Youll have to reboot your servers. It will be somewhat disruptive. Patch this bug now, before the cache traversing attacks are discovered, because even the on-path attacks are concerning enough. Patch. And if patching is not a thing you know how to do, automatic patching needs to be something you demand from the infrastructure you deploy on your network. If it might not be safe in six months, why are you paying for it today? Its important to realize that while this bug was just discovered, its not actually new. CVE-2015-7547has been around for eight years. Literally, six weeks before I unveiled my own grand fix to DNS (July 2008), this catastrophic code was committed. Nobody noticed. The timing is a bit troublesome,but lets be realistic: theres only so many months to go around. The real issue is it took almost a decade to fix this new issue, right after it took a decade to fix my old one (DJB didnt quite identify the bug, but he absolutely called the fix). The Internet is not less important to global commerce than it was in 2008. Hacker latency continues to be a real problem. What maybe has changedover the years is the strangely increasing amountoftalk about how the Internet is perhaps too secure. I dont believe that, and I dont believe anyone in business (or even with a credit card) does either. But the discussion on cybersecurity seems dominated by the necessity of insecurity. Did anyone know about this flaw earlier? Theres absolutely no way to tell. We can only know we need to be finding these bugs faster, understanding these issues better, and fixing them more comprehensively. We need to not be finding bugs like this,eight years from now, again. (There were clear public signs of impending public discovery of this flaw, so do not take my words as any form of criticism for the release schedule of this CVE.) My concerns are not merely organizational. I do think we need to start investing significantly more in mitigation technologies that operate before memory corruption has occurred. ASLR, NX, Control Flow Guard all of these technologies are greatly impressive, at showing us who our greatly impressive hackers are. Theyre not actually stopping code execution from being possible. Theyre just not. Somewhere between base arithmetic and x86 is a sandbox people cant just walk in and out of. To put it bluntly, if this code had been written in JavaScript yes, really it wouldnt have been vulnerable. Even if this network exposed code remained in C, and was justcompiled to JavaScript via Emscripten, it still would not have been vulnerable. Efficiently microsandboxing individual codepathsis a thing we shouldstart exploring. What can we do to the software we deploy, at what cost, to actually make exploitation of software flaws actually impossible, as opposed to merely difficult? It is unlikely this is the only platform threat, or even the only threat in glibc. With the Internet of Things spreading extraordinarily, perhaps its time to be less concerned about being able to spy on every last phone call and more concerned about how we can make sure innovators have better environments to build upon. Im not merely talking about therather frothy software stacks adorning the Internet of Things, withBluetooth and custom TCP/IP and so on. Im talking about maintainability. When we find problems and we will can we fix them? This is a problem that took Android too long to start seriously addressing,but theyre not the only ones. A network where devices eventually become existential threats is a network that eventually ceases to exist. What do we dofor platforms to guarantee that attack windows close? What do we do for consumers and purchasing agents so they can differentiate that whichhas a maintenancewarranty, and that which does not? Are there insurance structures that could pay out, when a glibc level patch needs to be rolled out? Theres a level of maturity that can be brought to the table, and I think should. There are a lot of unanswered questions about the scope of this flaw, and many others,that perhaps neither vendors nor volunteer researchers are in the best position to answer. We can do better building the secure platforms of the future. Lets start here.