Monday, May 19, 2014

Redistilling PDFs that are not portable by design

I hate it when I am forced to deal with documents that are portable in title only (yes, I am looking at your Adobe). Every so often, I do get pdf documents from a major organisation that can viewed by Adobe Acrobat only. On OSX, this bloated application consumes 369 Megabytes of precious SSD space (preview consumes 29 Megabytes and is nicer).

Anyway, back to the story, these documents cannot be saved in any other format on my machine. In fact, the only way to read these documents w/out hackery is to print them out and rescan them back.


So here goes a recipe for saving these files in a portable way.

Saturday, May 17, 2014

Subnet calculation using pure mysql

You can easily aggregate your records by subnets using mysql thanks to bitwise operators, an inet_aton (ascii to number function) and some thinking...

Here you go:

Thursday, May 15, 2014

tshark: display filters + reporting using csv

You can do pretty nifty things with tshark. The absolute life saver is thsark's ability to dump to a csv/tsv file using a user specified display filter.

As an example, I'd like to point out some packet retransmission issues to my provider in a nice (manager friendly) spreadsheet.  Here we go:

Manager friendly output:

ip.src tcp.srcport ip.dst tcp.dstport tcp.flags.syn tcp.flags.ack tcp.flags.push tcp.flags.reset tcp.analysis.bytes_in_flight tcp.len
a.b.c.d 8645 e.f.g.h7 9999 1 0 0 0
e.f.g.h7 9999 a.b.c.d 8645 1 1 0 0
a.b.c.d 8645 e.f.g.h7 9999 0 1 0 0
a.b.c.d 8645 e.f.g.h7 9999 0 1 1 0 168 168
e.f.g.h7 9999 a.b.c.d 8645 0 1 0 0
e.f.g.h7 9999 a.b.c.d 8645 0 1 1 0 1154 1154
a.b.c.d 8645 e.f.g.h7 9999 0 1 0 0
a.b.c.d 8645 e.f.g.h7 9999 0 1 0 0 1448 1448
a.b.c.d 8645 e.f.g.h7 9999 0 1 1 0 1502 54
e.f.g.h7 9999 a.b.c.d 8645 0 1 0 0

How do we get there?
1. Identify the fields that you want. A wireshark display filter cheat-sheet is a good place to start. You can home in on the fields that you want by firing up Wireshark and using the expression builder (button right next to the filter input box) then selecting the protocol that you want.

2. Choose your TCP stream.

3. Assemble your command. The one used to display the output above is:

Partitions in Postgres: Automatically creating partitions based on an attribute

A long time ago... I worked on importing ~ half a billion log records into Postgres. To achieve a low query response time, I used a partitioner that would shard records monthly. I documented it in the Postgres docs

Here it is: