Tuesday, January 22, 2013

OSX: Y U No like 3G?

I recently came across an issue where my laptop would consume copious amounts of bandwidth. On average, the idle laptop would slurp up about 8MB of traffic/hour. In a day, that's ~ 100 MB of data gone.

In S. Africa and on 3G, this is prohibitively expensive. How expensive?
I am on Telkom 8ta, which charges
~ 40 Rand/100MB == 6USD/100MB. 
Scaling up this cost to a month adds up to
180USD/Month for ~ 3GB of data

 Obviously, this is a problem!

 So let's find out what's going on....


Hmm, 5 similar queries? Obviously, there's something stuck in a dns lookup loop. This is not cool at all. If we do some math, the dns traffic adds up to the extra traffic that my 3G dongle is reporting. So how do we find out which process generates this data? We can use little snitch or better, use dtrace which comes in bundled with OSX.


Dtrace allows you to hook up a probe that get's called once your kernel function is called. For example, say you know that all udp traffic in your box is generated via a specific syscall, you can 'watch' this syscall and generate some statistics around this.. (Basing this description on systemtap). So let's find out how a typical dns lookup works. The easiest program I can think that does dnslookup is nslookup. So let's truss up nslookup
The system call we are after is sendmsg and recvmsg (UDP send and udp recv). The exact packets sent and receiver are also unraveled in the packet tree below the dtruss output. The ardent reader will notice that the length of the UDP packet - 8 bytes == the syscall return output for sendmsg and recvmsg. For example, sendmsg has a return value of 32 bytes (40 UDP header - 2 bytes src port - 2 bytes dst port - 2 bytes length - 2bytes checksum). See line 32, 36, 101 and 172 of the gist above.

So we now know without a doubt that we ought to watch for a process that's generating a lot of recvmsg and sendmsg (At a rate of 10/sec for either syscall). Let's fire up another dtrace script


So let's try and validate our findings by stopping the buggy process. Phew! And that's it! A sample of 100 packets has 100 DNS requests when opendirectoryd is running. The 100 packets are captured in 3.5 seconds. With opendirectoryd stopped, it takes 23 seconds to capture 100 packets; DNS constitutes 30% of the sample. It's obvious that something is wrong with opendirectoryd. Stopping the process will save me tons of bucks until Apple issues a fix.