Infinite diversity in infinite combinations.: May 2014

Monday, May 19, 2014

Redistilling PDFs that are not portable by design

I hate it when I am forced to deal with documents that are portable in title only (yes, Adobe, I am looking at you). Every so often, I get PDF documents from a major organisation that can only be viewed in Adobe Acrobat. On OS X, this bloated application consumes 369 megabytes of precious SSD space (Preview takes up 29 megabytes and is nicer).

Anyway, back to the story: these documents cannot be saved in any other format on my machine. In fact, the only non-hacky way to read them anywhere else is to print them out and scan them back in.

!Stupid!

So here goes a recipe for saving these files in a portable way.
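A minimal sketch of the redistilling step, assuming Acrobat will at least print the document to a PostScript file and that Ghostscript (`gs`) is installed and on your PATH. It feeds the PostScript through Ghostscript's `pdfwrite` device to produce a fresh PDF that any viewer can open. The helper names `redistill_command` and `redistill` are hypothetical, and this is one common route rather than necessarily the exact recipe.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def redistill_command(ps_file: str, pdf_file: str) -> list:
    """Build a Ghostscript command line that converts PostScript to PDF.

    -o implies -dBATCH and -dNOPAUSE (no interactive prompts);
    -dSAFER restricts file access while interpreting the PostScript.
    """
    return ["gs", "-dSAFER", "-sDEVICE=pdfwrite", "-o", pdf_file, ps_file]


def redistill(ps_file: str, pdf_file: Optional[str] = None) -> Path:
    """Run Ghostscript on ps_file and return the path of the new PDF."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    # Default output: same name as the input, with a .pdf extension.
    out = Path(pdf_file) if pdf_file else Path(ps_file).with_suffix(".pdf")
    subprocess.run(redistill_command(ps_file, str(out)), check=True)
    return out
```

For example, `redistill("statement.ps")` would write `statement.pdf` next to the input (file name hypothetical).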

Saturday, May 17, 2014

Subnet calculation using pure MySQL

You can easily aggregate your records by subnet in MySQL thanks to bitwise operators, the INET_ATON() function (ASCII to number) and some thinking...
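The arithmetic behind the trick: turn each dotted-quad address into a 32-bit integer (what MySQL's `INET_ATON()` does), AND it with a netmask that keeps the top `prefix` bits, and group on the result. Here is a minimal Python sketch of that logic using the standard library's `socket.inet_aton` and `struct`; the function names and the default /24 prefix are illustrative assumptions, not taken from the original query.

```python
import socket
import struct
from collections import Counter


def inet_aton(addr: str) -> int:
    """Dotted-quad IPv4 string -> unsigned 32-bit integer (like MySQL INET_ATON)."""
    return struct.unpack("!I", socket.inet_aton(addr))[0]


def inet_ntoa(num: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad string (like MySQL INET_NTOA)."""
    return socket.inet_ntoa(struct.pack("!I", num))


def network_of(addr: str, prefix: int) -> str:
    """Return 'a.b.c.d/prefix' for the subnet that contains addr."""
    # Keep only the top `prefix` bits; the & trims the shifted mask to 32 bits.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return f"{inet_ntoa(inet_aton(addr) & mask)}/{prefix}"


def count_by_subnet(addrs, prefix: int = 24) -> Counter:
    """Aggregate addresses by subnet, like GROUP BY on the masked value."""
    return Counter(network_of(a, prefix) for a in addrs)
```

In SQL the same idea becomes a GROUP BY over the masked value, converted back for display with `INET_NTOA()`.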

Here you go:

Thursday, May 15, 2014

tshark: display filters + reporting using CSV

You can do pretty nifty things with tshark. The absolute lifesaver is tshark's ability to dump selected fields to a CSV/TSV file for the packets matching a user-specified display filter.

As an example, I'd like to point out some packet retransmission issues to my provider in a nice (manager-friendly) spreadsheet. Here we go:

Manager-friendly output:

ip.src	tcp.srcport	ip.dst	tcp.dstport	tcp.flags.syn	tcp.flags.ack	tcp.flags.push	tcp.flags.reset	tcp.analysis.bytes_in_flight	tcp.len
a.b.c.d	8645	e.f.g.h7	9999	1	0	0	0		0
e.f.g.h7	9999	a.b.c.d	8645	1	1	0	0		0
a.b.c.d	8645	e.f.g.h7	9999	0	1	0	0		0
a.b.c.d	8645	e.f.g.h7	9999	0	1	1	0	168	168
e.f.g.h7	9999	a.b.c.d	8645	0	1	0	0		0
e.f.g.h7	9999	a.b.c.d	8645	0	1	1	0	1154	1154
a.b.c.d	8645	e.f.g.h7	9999	0	1	0	0		0
a.b.c.d	8645	e.f.g.h7	9999	0	1	0	0	1448	1448
a.b.c.d	8645	e.f.g.h7	9999	0	1	1	0	1502	54
e.f.g.h7	9999	a.b.c.d	8645	0	1	0	0		0
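If you want numbers to go with the spreadsheet, the export is easy to post-process. A minimal sketch, assuming the tab-separated layout shown above with the tshark field names as the header row: it tallies packets, payload bytes (`tcp.len`) and the peak `tcp.analysis.bytes_in_flight` per direction. The `summarise` function and its output shape are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarise(tsv_text: str) -> dict:
    """Per-direction packet count, payload bytes and peak bytes in flight."""
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0, "max_in_flight": 0})
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = f'{row["ip.src"]}:{row["tcp.srcport"]} -> {row["ip.dst"]}:{row["tcp.dstport"]}'
        s = stats[key]
        s["packets"] += 1
        s["bytes"] += int(row["tcp.len"] or 0)
        # bytes_in_flight is empty on some rows (in the sample, those with no payload).
        in_flight = row["tcp.analysis.bytes_in_flight"]
        if in_flight:
            s["max_in_flight"] = max(s["max_in_flight"], int(in_flight))
    return dict(stats)
```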

How do we get there?
1. Identify the fields that you want. A Wireshark display filter cheat sheet is a good place to start. You can home in on the fields you want by firing up Wireshark, opening the expression builder (the button right next to the filter input box) and selecting the protocol you are interested in.

2. Choose your TCP stream.

3. Assemble your command. The one used to display the output above is:
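The three steps can be scripted. Below is a sketch in Python that builds and runs a tshark invocation, assuming a tshark recent enough to support `-Y` for display filters, a capture file of your choosing, and the stream index picked in step 2 (selected with the `tcp.stream eq N` display filter). `-T fields` with one `-e` per field produces the columns, `-E header=y` adds the header row, and `-E separator=/t` gives tab-separated output (`-E separator=,` would give CSV). The helper names are hypothetical, and the field list simply mirrors the header of the sample output.

```python
import subprocess

# Field list copied from the header row of the sample output.
FIELDS = (
    "ip.src", "tcp.srcport", "ip.dst", "tcp.dstport",
    "tcp.flags.syn", "tcp.flags.ack", "tcp.flags.push", "tcp.flags.reset",
    "tcp.analysis.bytes_in_flight", "tcp.len",
)


def tshark_command(pcap: str, stream: int, fields=FIELDS) -> list:
    """Build a tshark command that exports `fields` for one TCP stream."""
    cmd = [
        "tshark", "-r", pcap,
        "-Y", f"tcp.stream eq {stream}",   # display filter: one TCP conversation
        "-T", "fields",                    # print selected fields, not packet summaries
        "-E", "header=y",                  # first line lists the field names
        "-E", "separator=/t",              # tab-separated columns
    ]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def export_tsv(pcap: str, stream: int, out_path: str) -> None:
    """Run tshark and write its tab-separated output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(tshark_command(pcap, stream), stdout=out, check=True)
```

For example, `export_tsv("capture.pcap", 3, "stream3.tsv")` (file names and stream number hypothetical) writes a file that opens straight into a spreadsheet.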

Partitions in Postgres: Automatically creating partitions based on an attribute

A long time ago... I worked on importing ~half a billion log records into Postgres. To keep query response times low, I used a partitioner that sharded records by month. I documented it in the Postgres docs.
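The core of a monthly partitioner is mapping a row's timestamp to a child table and creating that table when it is missing. A minimal sketch in Python that generates the DDL, assuming the inheritance-plus-CHECK-constraint scheme the Postgres docs described in that era (Postgres 10 and later offer declarative `PARTITION BY RANGE` instead). The parent table `logs`, the column `logged_at` and the `_yYYYYmMM` suffix are hypothetical; routing inserts to the child (typically a trigger on the parent) is not shown.

```python
from datetime import date, datetime


def month_bounds(ts: datetime) -> tuple:
    """Return (first day of ts's month, first day of the following month)."""
    start = date(ts.year, ts.month, 1)
    # December rolls over into January of the next year.
    if ts.month == 12:
        end = date(ts.year + 1, 1, 1)
    else:
        end = date(ts.year, ts.month + 1, 1)
    return start, end


def partition_ddl(parent: str, column: str, ts: datetime) -> tuple:
    """Return (child table name, CREATE TABLE statement) for ts's month.

    Identifiers are interpolated as-is, so only pass trusted names.
    """
    start, end = month_bounds(ts)
    child = f"{parent}_y{start.year}m{start.month:02d}"
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {child} ("
        f"CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')"
        f") INHERITS ({parent});"
    )
    return child, ddl
```

With constraint exclusion enabled, the CHECK constraints let the planner skip months that a query's WHERE clause rules out, which is where the low response times come from.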

Here it is:
