Tuesday, January 22, 2013

OSX: Y U No like 3G?

I recently came across an issue where my laptop would consume copious amounts of bandwidth. On average, the idle laptop would slurp up about 8MB of traffic/hour. In a day, that's ~ 100 MB of data gone.

In S. Africa and on 3G, this is prohibitively expensive. How expensive?
I am on Telkom 8ta, which charges
~ 40 Rand/100MB == 6USD/100MB. 
Scaling up this cost to a month adds up to
180USD/Month for ~ 3GB of data

 Obviously, this is a problem!

 So let's find out what's going on....

Tcpdump


21 Machine:~ username$ sudo su
20 Password:
19 sh-3.2# tshark -i en1 -c 100 port 53
18 Capturing on en1
17 0.000000 192.168.1xx.2 -> 192.168.1xx.101 DNS 158 Standard query response 0x6999 No such name
16 0.041892 192.168.1xx.2 -> 192.168.1xx.101 DNS 158 Standard query response 0x6ed6 No such name
15 0.042998 192.168.1xx.101 -> 192.168.1xx.2 DNS 86 Standard query 0x8c6a A abc-123.xyz.companyx.com
14 0.043129 192.168.1xx.101 -> 192.168.1xx.2 DNS 86 Standard query 0xd8ab AAAA abc-123.xyz.companyx.com
13 0.307251 192.168.1xx.2 -> 192.168.1xx.101 DNS 147 Standard query response 0xd8ab No such name
12 0.307628 192.168.1xx.101 -> 192.168.1xx.2 DNS 101 Standard query 0x06b4 AAAA abc-123.xyz.companyx.com.xyz.companyx.com
11 0.307732 192.168.1xx.2 -> 192.168.1xx.101 DNS 147 Standard query response 0x8c6a No such name
10 0.307980 192.168.1xx.101 -> 192.168.1xx.2 DNS 101 Standard query 0x0258 A abc-123.xyz.companyx.com.xyz.companyx.com
9 0.504638 192.168.1xx.2 -> 192.168.1xx.101 DNS 162 Standard query response 0x06b4 No such name
8 0.504871 192.168.1xx.101 -> 192.168.1xx.2 DNS 97 Standard query 0x3579 AAAA abc-123.xyz.companyx.com.companyx.com
7 0.517008 192.168.1xx.2 -> 192.168.1xx.101 DNS 162 Standard query response 0x0258 No such name
6 0.517223 192.168.1xx.101 -> 192.168.1xx.2 DNS 97 Standard query 0xddf6 A abc-123.xyz.companyx.com.companyx.com
5 0.691122 192.168.1xx.2 -> 192.168.1xx.101 DNS 158 Standard query response 0xddf6 No such name
4 0.713513 192.168.1xx.2 -> 192.168.1xx.101 DNS 158 Standard query response 0x3579 No such name
3 0.714719 192.168.1xx.101 -> 192.168.1xx.2 DNS 86 Standard query 0xc20c A abc-124.xyz.companyx.com
2 0.714860 192.168.1xx.101 -> 192.168.1xx.2 DNS 86 Standard query 0x2edf AAAA abc-124.xyz.companyx.com
1 1.012694 192.168.1xx.2 -> 192.168.1xx.101 DNS 147 Standard query response 0xc20c No such name
0 1.012943 192.168.1xx.101 -> 192.168.1xx.2 DNS 101 Standard query 0x98f4 A abc-124.xyz.companyx.com.xyz.companyx.com
view raw gistfile1.txt hosted with ❤ by GitHub
Hmm, 5 similar queries? Obviously, there's something stuck in a dns lookup loop. This is not cool at all. If we do some math, the dns traffic adds up to the extra traffic that my 3G dongle is reporting.
# Capture 100 Packets
sh-3.2# tshark -i en1 -c 100 -s0 -w dns.pcap port 53
Capturing on en1
100
# Pcap Size
sh-3.2# du -sh dns.pcap
16K dns.pcap
# 100 Packets to a text file
sh-3.2# tshark -i en1 -c 100 port 53 | tee dns.txt
Capturing on en1
100 packets captured
....
# Number of queries in the capture. 50 queries, 50 responses.
sh-3.2# grep -v resp dns.txt | wc -l
50
# Example entry
sh-3.2# grep -v resp dns.txt | head -n 1
0.005333 192.168.1xx.101 -> 192.168.1xx.2 DNS 87 Standard query 0x707f A XXXXXXXXXXXXXX
# At a rate of 20 packets/sec (10 queries per sec, 10 responses/sec) =~ (100/5.08)
sh-3.2# head -n 1 dns.txt ; tail -n 1 dns.txt ; wc -l dns.txt
0.000000 192.168.1xx.2 -> 192.168.1xx.101 DNS 159 Standard query response 0xe333 No such name
5.082457 192.168.1xx.101 -> 192.168.1xx.2 DNS 87 Standard query 0x2d3d AAAA XXXXXXXXXXXXXX
100 dns.txt
# Count number of bytes
sh-3.2# awk 'BEGIN{bytes=0}{bytes = bytes + $6} END{print "SUM "bytes" averages to "bytes/100 " bytes/pkt";}' dns.txt
SUM 12604 averages to 126.04 bytes/pkt
# So, we got 12,600 bytes in 5 seconds. Multiply this by 12 to get the byte/minute rate, then by 60 minutes/hour.
sh-3.2# echo "12604 * 12 * 60 / (1024*1024)" | bc -l
8.65447998046875000000
# That's 8.6 Mb/hour of DNS traffic
view raw gistfile1.sh hosted with ❤ by GitHub
So how do we find out which process generates this data? We can use little snitch or better, use dtrace which comes in bundled with OSX.

DTrace


Dtrace allows you to hook up a probe that get's called once your kernel function is called. For example, say you know that all udp traffic in your box is generated via a specific syscall, you can 'watch' this syscall and generate some statistics around this.. (Basing this description on systemtap). So let's find out how a typical dns lookup works. The easiest program I can think that does dnslookup is nslookup. So let's truss up nslookup
# dtruss: an strace equivalent for OSX
# dtruss nslookup www.google.com
Server: 192.168.1xx.2
Address: 192.168.1xx.2#53
Non-authoritative answer:
Name: www.google.com
Address: 197.80.128.24
SYSCALL(args) = return
issetugid(0x10B3CC000, 0x7FFF6AFCBD50, 0x7FFF6AFCBC00) = 0 0
...
...
open_nocancel("/dev/random\0", 0x0, 0x0) = 4 0
read_nocancel(0x4, "V\370\3xxxxxxxxxxx", 0x80) = 128 0
close_nocancel(0x4) = 0 0
open_nocancel("/dev/random\0", 0x0, 0x0) = 4 0
read_nocancel(0x4, "4#\207F\n\336V2xxxxxxx", 0x80) = 128 0
close_nocancel(0x4) = 0 0
socket(0x2, 0x2, 0x11) = 4 0
fcntl(0x4, 0x0, 0x14) = 20 0
close(0x4) = 0 0
fcntl(0x14, 0x3, 0x0) = 2 0
fcntl(0x14, 0x4, 0x6) = 0 0
setsockopt(0x14, 0xFFFF, 0x1022) = 0 0
setsockopt(0x14, 0xFFFF, 0x400) = 0 0
getsockopt(0x14, 0xFFFF, 0x1002) = 0 0
bind(0x14, 0x10B55C410, 0x10) = 0 0
recvmsg(0x14, 0x7FFF6AFCB930, 0x0) = -1 Err#35
kevent(0x3, 0x7FFF6AFCBA60, 0x1) = 0 0
sendmsg(0x14, 0x7FFF6AFCB8E0, 0x0) = 32 0
kevent(0x3, 0x0, 0x0) = 0 0
kevent(0x3, 0x0, 0x0) = 1 0
kevent(0x3, 0x7FFF6AFCBA90, 0x1) = 0 0
recvmsg(0x14, 0x7FFF6AFCB940, 0x0) = 48 0
fstat64(0x1, 0x7FFF6AFCA3E8, 0x7FFF6AFCA4AC) = 0 0
ioctl(0x1, 0x4004667A, 0x7FFF6AFCA484) = 0 0
write_nocancel(0x1, "Server:\t\t192.168.1xx.2\n\0", 0x17) = 23 0
write_nocancel(0x1, "Address:\t192.168.1xx.2#53\n\0", 0x1A) = 26 0
write_nocancel(0x1, "\n\0", 0x1) = 1 0
write_nocancel(0x1, "Non-authoritative answer:\n\0", 0x1A) = 26 0
write_nocancel(0x1, "Name:\twww.google.com\n\0", 0x15) = 21 0
write_nocancel(0x1, "Address: 197.80.128.24\n\0", 0x17) = 23 0
...
....
....
# Started a concurrent tshark capture session for a google.com lookup
$ tshark -r google.pcap -R "dns.qry.name==www.google.com" -V
Frame 61: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0
Interface id: 0
WTAP_ENCAP: 1
Arrival Time: Jan 21, 2013 21:57:30.182718000 SAST
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1358798250.182718000 seconds
[Time delta from previous captured frame: 0.078960000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 2.722339000 seconds]
Frame Number: 61
Frame Length: 74 bytes (592 bits)
Capture Length: 74 bytes (592 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ip:udp:dns]
Ethernet II, Src: Apple_xx:bb:cc (7c:d1:c3:xx:bb:cc), Dst: D-LinkIn_dd:ee:ff (c8:be:19:dd:ee:ff)
Destination: D-LinkIn_dd:ee:ff (c8:be:19:dd:ee:ff)
Address: D-LinkIn_dd:ee:ff (c8:be:19:dd:ee:ff)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Source: Apple_xx:bb:cc (7c:d1:c3:xx:bb:cc)
Address: Apple_xx:bb:cc (7c:d1:c3:xx:bb:cc)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 192.168.1xx.101 (192.168.1xx.101), Dst: 192.168.1xx.2 (192.168.127.2)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 60
Identification: 0x6ddb (28123)
Flags: 0x00
0... .... = Reserved bit: Not set
.0.. .... = Don't fragment: Not set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 64
Protocol: UDP (17)
Header checksum: 0x8d1d [correct]
[Good: True]
[Bad: False]
Source: 192.168.1xx.101 (192.168.1xx.101)
Destination: 192.168.1xx.2 (192.168.127.2)
User Datagram Protocol, Src Port: 60346 (60346), Dst Port: domain (53)
Source port: 60346 (60346)
Destination port: domain (53)
Length: 40
Checksum: 0xad30 [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Domain Name System (query)
Transaction ID: 0x581d
Flags: 0x0100 Standard query
0... .... .... .... = Response: Message is a query
.000 0... .... .... = Opcode: Standard query (0)
.... ..0. .... .... = Truncated: Message is not truncated
.... ...1 .... .... = Recursion desired: Do query recursively
.... .... .0.. .... = Z: reserved (0)
.... .... ...0 .... = Non-authenticated data: Unacceptable
Questions: 1
Answer RRs: 0
Authority RRs: 0
Additional RRs: 0
Queries
www.google.com: type A, class IN
Name: www.google.com
Type: A (Host address)
Class: IN (0x0001)
Frame 62: 90 bytes on wire (720 bits), 90 bytes captured (720 bits) on interface 0
Interface id: 0
WTAP_ENCAP: 1
Arrival Time: Jan 21, 2013 21:57:30.184381000 SAST
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1358798250.184381000 seconds
[Time delta from previous captured frame: 0.001663000 seconds]
[Time delta from previous displayed frame: 0.001663000 seconds]
[Time since reference or first frame: 2.724002000 seconds]
Frame Number: 62
Frame Length: 90 bytes (720 bits)
Capture Length: 90 bytes (720 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ip:udp:dns]
Ethernet II, Src: D-LinkIn_dd:ee:ff (c8:be:19:dd:ee:ff), Dst: Apple_xx:bb:cc (7c:d1:c3:xx:bb:cc)
Destination: Apple_xx:bb:cc (7c:d1:c3:xx:bb:cc)
Address: Apple_xx:bb:cc (7c:d1:c3:xx:bb:cc)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Source: D-LinkIn_dd:ee:ff (c8:be:19:dd:ee:ff)
Address: D-LinkIn_dd:ee:ff (c8:be:19:dd:ee:ff)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 192.168.1xx.2 (192.168.127.2), Dst: 192.168.1xx.101 (192.168.1xx.101)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 76
Identification: 0x0000 (0)
Flags: 0x02 (Don't Fragment)
0... .... = Reserved bit: Not set
.1.. .... = Don't fragment: Set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 64
Protocol: UDP (17)
Header checksum: 0xbae8 [correct]
[Good: True]
[Bad: False]
Source: 192.168.1xx.2 (192.168.127.2)
Destination: 192.168.1xx.101 (192.168.1xx.101)
User Datagram Protocol, Src Port: domain (53), Dst Port: 60346 (60346)
Source port: domain (53)
Destination port: 60346 (60346)
Length: 56
Checksum: 0x27fd [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]
Domain Name System (response)
[Request In: 61]
[Time: 0.001663000 seconds]
Transaction ID: 0x581d
Flags: 0x8000 Standard query response, No error
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...0 .... .... = Recursion desired: Don't do query recursively
.... .... 0... .... = Recursion available: Server can't do recursive queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0000 = Reply code: No error (0)
Questions: 1
Answer RRs: 1
Authority RRs: 0
Additional RRs: 0
Queries
www.google.com: type A, class IN
Name: www.google.com
Type: A (Host address)
Class: IN (0x0001)
Answers
www.google.com: type A, class IN, addr 197.80.128.24
Name: www.google.com
Type: A (Host address)
Class: IN (0x0001)
Time to live: 2 minutes, 30 seconds
Data length: 4
Addr: 197.80.128.24 (197.80.128.24)
view raw gistfile1.txt hosted with ❤ by GitHub
The system call we are after is sendmsg and recvmsg (UDP send and udp recv). The exact packets sent and receiver are also unraveled in the packet tree below the dtruss output. The ardent reader will notice that the length of the UDP packet - 8 bytes == the syscall return output for sendmsg and recvmsg. For example, sendmsg has a return value of 32 bytes (40 UDP header - 2 bytes src port - 2 bytes dst port - 2 bytes length - 2bytes checksum). See line 32, 36, 101 and 172 of the gist above.

So we now know without a doubt that we ought to watch for a process that's generating a lot of recvmsg and sendmsg (At a rate of 10/sec for either syscall). Let's fire up another dtrace script
sh-3.2# date; syscallbypid.d; date
Sun Jan 20 22:42:47 SAST 2013
Tracing... Hit Ctrl-C to end.
^C
PID CMD SYSCALL COUNT
1 launchd fork 1
...
...
61275 dtrace workq_kernreturn 82
87 eTSrv stat64 86
1632 PKIMonitor stat64 86
13 opendirectoryd psynch_mutexdrop 87
13 opendirectoryd psynch_mutexwait 87
1621 Dropbox __sysctl 87
51682 firefox stat64 87
51682 firefox madvise 89
61052 dtrace kevent 90
61275 dtrace kevent 100
1632 PKIMonitor lseek 104
51682 firefox lseek 104
61276 SfntMonD stat64 105
61052 dtrace ioctl 117
61275 dtrace ioctl 121
87 eTSrv __semwait_signal 127
13 opendirectoryd sendto_nocancel 130
51682 firefox psynch_cvwait 131
13 opendirectoryd read_nocancel 132
13 opendirectoryd __sysctl 137
1 launchd getuid 139
52577 GoogleTalkPlugi psynch_cvwait 142
64 mds workq_kernreturn 160
64 mds kevent 165
13 opendirectoryd workq_kernreturn 189
13 opendirectoryd recvfrom_nocancel 198
12 mDNSResponder recvfrom 202
13 opendirectoryd close_nocancel 228
12 mDNSResponder bind 232
12 mDNSResponder socket 232
12 mDNSResponder sendto 248
51682 firefox mmap 258
87 eTSrv close 278
87 eTSrv read 278
12 mDNSResponder recvmsg 284
12 mDNSResponder close 296
1616 Flux geteuid 312
13 opendirectoryd kevent 343
655 Terminal __semwait_signal 355
87 eTSrv open 530
87 eTSrv lseek 556
1624 applet workq_kernreturn 788
12 mDNSResponder fcntl 824
1624 applet sigaltstack 1075
1624 applet sigprocmask 1075
1630 System Events sigaltstack 1075
1630 System Events sigprocmask 1075
1624 applet kevent 1128
12 mDNSResponder kevent 1137
12 mDNSResponder setsockopt 1276
117 WindowServer sigaltstack 1310
117 WindowServer sigprocmask 1310
Sun Jan 20 22:43:00 SAST 2013
# Total of 13 seconds. Looking for ~130 recvmesg or 130 sendmsg.
# This looks suspicious
13 opendirectoryd sendto_nocancel 130
51682 firefox psynch_cvwait 131
13 opendirectoryd read_nocancel 132
# More dtruss
sh-3.2# dtruss -p 13 -e -a -d
PID/THRD RELATIVE ELAPSD CPU SYSCALL(args) = return
13/0x3c0f3: 39029 161 2 kevent(0xE, 0x0, 0x0) = 1 0
13/0x3c0f3: 39034 6 2 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 39039 3 1 recvfrom_nocancel(0xD, 0x7F845304DF80, 0x3E) = 62 0
13/0x3c0f3: 39044 6 3 select_nocancel(0xE, 0x105CBA180, 0x0) = 1 0
13/0x3c0f3: 39047 4 2 kevent(0x7, 0x105CB9C50, 0x1) = 0 0
13/0x3c0f3: 39050 3 1 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 39052 3 1 recvfrom_nocancel(0xD, 0x7F845304DF80, 0x3E) = 62 0
13/0x3c0f3: 39054 3 1 select_nocancel(0xE, 0x105CBA180, 0x0) = 0 0
13/0x3c0f3: 39055 2 0 kevent(0xE, 0x105CB9C50, 0x1) = 0 0
13/0x3c0f3: 39059 5 2 psynch_mutexdrop(0x7FFF72990920, 0x4F1CC03, 0x4F1CB00) = 0 0
13/0x3c0f3: 39063 4 1 kevent(0xE, 0x0, 0x0) = 1 0
13/0x3c088: 89565 43 6 psynch_mutexwait(0x7FFF72990920, 0x4F1D003, 0x4F1CE00) = 82956291 0
13/0x3c088: 89572 8 4 sendto_nocancel(0xD, 0x7F8453045250, 0x1C) = 28 0
13/0x3c088: 89578 6 2 sendto_nocancel(0xD, 0x7F8453045250, 0x1C) = 28 0
13/0x3c088: 89585 10 5 close_nocancel(0x7) = 0 0
13/0x3c088: 89595 11 7 open_nocancel("/etc/hosts\0", 0x0, 0x1B6) = 7 0
13/0x3c088: 89598 5 1 fstat64(0x7, 0x1058E6218, 0x1058E62DC) = 0 0
13/0x3c088: 89602 6 2 read_nocancel(0x7, "##\n# Host Database\n#\n# localhost is used to configure the loopback interface\n# when the system is booting. Do not change this entry.\n##\n127.0.0.1\tlocalhost\n255.255.255.255\tbroadcasthost\n::1 localhost \nfe80::1%lo0\tlocalhost\n\0", 0x1000) = 236 0
13/0x3c088: 89632 17 13 __sysctl(0x1058E72A0, 0x6, 0x0) = 0 0
13/0x3c088: 89641 10 7 __sysctl(0x1058E72A0, 0x6, 0x7F8453803600) = 0 0
13/0x3c088: 89648 5 1 read_nocancel(0x7, "\0", 0x1000) = 0 0
13/0x3c088: 89654 8 4 close_nocancel(0x7) = 0 0
13/0x3c088: 89663 11 7 open_nocancel("/etc/hosts\0", 0x0, 0x1B6) = 14 0
13/0x3c088: 89666 5 1 psynch_mutexwait(0x7FFF727AC1E0, 0x6A4C403, 0x6A4C200) = 111461379 0
13/0x3c088: 89667 4 0 psynch_mutexdrop(0x7FFF727AC1E0, 0x6A4C503, 0x6A4C400) = 0 0
13/0x3c088: 89670 5 2 fstat64(0xE, 0x1058E6218, 0x1058E62DC) = 0 0
13/0x3c0f3: 39370 507641 17 kevent(0x7, 0x0, 0x0) = 1 0
13/0x3c0f3: 39383 18 7 kevent(0x7, 0x0, 0x0) = 1 0
13/0x3c0f3: 39395 11 7 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 39403 7 3 recvfrom_nocancel(0xD, 0x7F845304DF80, 0x3E) = 62 0
13/0x3c0f3: 39413 10 5 select_nocancel(0xE, 0x105CBA180, 0x0) = 1 0
13/0x3c0f3: 39421 9 3 kevent(0x7, 0x105CB9C50, 0x1) = 0 0
13/0x3c0f3: 39434 13 6 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 39440 8 3 recvfrom_nocancel(0xD, 0x7F845304DF80, 0x3E) = 62 0
13/0x3c0f3: 39446 8 4 select_nocancel(0xE, 0x105CBA180, 0x0) = 0 0
13/0x3c0f3: 39451 4 2 kevent(0xE, 0x105CB9C50, 0x1) = 0 0
13/0x3c0f3: 39458 7 4 psynch_mutexdrop(0x7FFF72990920, 0x4F1D403, 0x4F1D300) = 0 0
13/0x3c0f3: 39463 7 3 kevent(0x7, 0x0, 0x0) = 1 0
13/0x3c088: 89680 21 8 read_nocancel(0xE, "##\n# Host Database\n#\n# localhost is used to configure the loopback interface\n# when the system is booting. Do not change this entry.\n##\n127.0.0.1\tlocalhost\n255.255.255.255\tbroadcasthost\n::1 localhost \nfe80::1%lo0\tlocalhost\n\0", 0x1000) = 236 0
# Some analysis yields 15 recvs and 18 sends in ~ 1.5 seconds. Looks like we are getting somewhere.
$ grep 0x3c0f3 dns | grep recv | sort -k 2 |wc -l
15
$ grep 0x3c0f3 dns | grep recv | sort -k 2
13/0x3c0f3: 39528 11 6 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 39535 6 3 recvfrom_nocancel(0xD, 0x7F845304DF80, 0x3E) = 62 0
13/0x3c0f3: 39549 4 1 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 39552 4 1 recvfrom_nocancel(0xD, 0x7F845304DF80, 0x3E) = 62 0
13/0x3c0f3: 39911 10 4 recvfrom_nocancel(0xF, 0x105CBA140, 0x4) = 4 0
13/0x3c0f3: 40088 12 7 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 40095 7 3 recvfrom_nocancel(0xD, 0x7F8451683690, 0x3E) = 62 0
13/0x3c0f3: 40113 5 2 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 40116 5 1 recvfrom_nocancel(0xD, 0x7F8451683690, 0x3E) = 62 0
13/0x3c0f3: 40391 90 10 recvfrom_nocancel(0x7, 0x105CBA140, 0x4) = 4 0
13/0x3c0f3: 40460 210 5 recvfrom_nocancel(0x7, 0x105CBA140, 0x4) = 4 0
13/0x3c0f3: 40875 131 8 recvfrom_nocancel(0xF, 0x105CBA140, 0x4) = 4 0
13/0x3c0f3: 40938 26 4 recvfrom_nocancel(0xF, 0x105CBA140, 0x4) = 4 0
13/0x3c0f3: 40991 15 9 recvfrom_nocancel(0xD, 0x105CBA150, 0x1C) = 28 0
13/0x3c0f3: 41000 8 3 recvfrom_nocancel(0xD, 0x7F8451683690, 0x3E) = 62 0
$ grep 0x3c0f3 dns | grep send | sort -k 2 | wc -l
18
$ grep 0x3c0f3 dns | grep send | sort -k 2
13/0x3c0f3: 39589 17 11 sendto_nocancel(0xD, 0x7F8453045250, 0x1C) = 28 0
13/0x3c0f3: 39597 9 4 sendto_nocancel(0xD, 0x7F8453045250, 0x1C) = 28 0
13/0x3c0f3: 39895 13 8 sendto_nocancel(0xD, 0x7F84515772B0, 0x44) = 68 0
13/0x3c0f3: 39902 10 6 sendmsg_nocancel(0xD, 0x105CBA0F0, 0x0) = 1 0
13/0x3c0f3: 39966 7 4 sendto_nocancel(0xD, 0x7F84515772B0, 0x44) = 68 0
13/0x3c0f3: 39969 6 2 sendmsg_nocancel(0xD, 0x105CBA0F0, 0x0) = 1 0
13/0x3c0f3: 40160 18 12 sendto_nocancel(0xD, 0x7F8451697970, 0x1C) = 28 0
13/0x3c0f3: 40168 7 3 sendto_nocancel(0xD, 0x7F8451697970, 0x1C) = 28 0
13/0x3c0f3: 40373 7 4 sendto_nocancel(0xD, 0x7F8451646A10, 0x44) = 68 0
13/0x3c0f3: 40379 8 4 sendmsg_nocancel(0xD, 0x105CBA0F0, 0x0) = 1 0
13/0x3c0f3: 40448 7 4 sendto_nocancel(0xD, 0x7F8451646A10, 0x44) = 68 0
13/0x3c0f3: 40452 6 3 sendmsg_nocancel(0xD, 0x105CBA0F0, 0x0) = 1 0
13/0x3c0f3: 40623 20 13 sendto_nocancel(0xD, 0x7F8453045250, 0x1C) = 28 0
13/0x3c0f3: 40631 10 4 sendto_nocancel(0xD, 0x7F8453045250, 0x1C) = 28 0
13/0x3c0f3: 40858 10 5 sendto_nocancel(0xD, 0x7F84515777B0, 0x44) = 68 0
13/0x3c0f3: 40865 10 5 sendmsg_nocancel(0xD, 0x105CBA0F0, 0x0) = 1 0
13/0x3c0f3: 40928 7 4 sendto_nocancel(0xD, 0x7F8451646A10, 0x44) = 68 0
13/0x3c0f3: 40931 5 2 sendmsg_nocancel(0xD, 0x105CBA0F0, 0x0) = 1 0
view raw gistfile1.txt hosted with ❤ by GitHub

Fini


So let's try and validate our findings by stopping the buggy process.
# Process
sh-3.2# ps aux | grep " [1]3\W"
root 13 0.0 0.3 2497244 13416 ?? Rs Thu08PM 1:52.96 /usr/libexec/opendirectoryd
# What file thinks
sh-3.2# file /usr/libexec/opendirectoryd
/usr/libexec/opendirectoryd: Mach-O universal binary with 2 architectures
/usr/libexec/opendirectoryd (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/libexec/opendirectoryd (for architecture i386): Mach-O executable i386
# Manuals
sh-3.2# man -k opendirectory
dsexport(1) - export records from OpenDirectory
odproxyd(8) - OpenDirectory proxy daemon
odutil(1) - allows caller to examine or change state of opendirectoryd(8)
opendirectoryd(8) - is a launchd(8) job for client access to local or remote directory systems
pam_opendirectory(8) - OpenDirectory PAM module
# Validation
sh-3.2# launchctl list |grep opend
13 - com.apple.opendirectoryd
sh-3.2# tshark -i en1 -c 100 -s0 -w traffic.pcap
Capturing on en1
100
3 packets dropped
sh-3.2# launchctl stop com.apple.opendirectoryd
sh-3.2# tshark -i en1 -c 100 -s0 -w traffic1.pcap
Capturing on en1
100
# Comparing the pcaps
# With opendirectoryd running
sh-3.2# tshark -r traffic.pcap -R "dns" |wc -l
100
sh-3.2# tshark -r traffic.pcap -R "dns" | awk 'BEGIN{start_time=end_time=0} { if(start_time == 0) start_time=$2; end_time=$2; } END{print "Time in seconds to capture 100 packets: " end_time - start_time}'
Time in seconds to capture 100 packets: 3.58054
# With opendirectoryd stopped.
sh-3.2# tshark -r traffic1.pcap -R "dns" |wc -l
31
sh-3.2# tshark -r traffic1.pcap -R "dns" | awk 'BEGIN{start_time=end_time=0} { if(start_time == 0) start_time=$2; end_time=$2; } END{print "Time in seconds to capture 100 packets: " end_time - start_time}'
Time in seconds to capture 100 packets: 23.101
view raw gistfile1.txt hosted with ❤ by GitHub
Phew! And that's it! A sample of 100 packets has 100 DNS requests when opendirectoryd is running. The 100 packets are captured in 3.5 seconds. With opendirectoryd stopped, it takes 23 seconds to capture 100 packets; DNS constitutes 30% of the sample. It's obvious that something is wrong with opendirectoryd. Stopping the process will save me tons of bucks until Apple issues a fix.