Monday, May 19, 2014

Redistilling PDFs that are not portable by design

I hate it when I am forced to deal with documents that are portable in title only (yes, I am looking at your Adobe). Every so often, I do get pdf documents from a major organisation that can viewed by Adobe Acrobat only. On OSX, this bloated application consumes 369 Megabytes of precious SSD space (preview consumes 29 Megabytes and is nicer).

Anyway, back to the story, these documents cannot be saved in any other format on my machine. In fact, the only way to read these documents w/out hackery is to print them out and rescan them back.

!Stupid!

So here goes a recipe for saving these files in a portable way.
############ Adobe badness #############
# In your operating system, create a postscript printer whose address is 127.0.0.1
# Fake a postscript printer using netcat
$ nc -l 127.0.0.1 9100 > printout.ps
# Print your pdf using Adobe Reader to the postscript printer on 127.0.0.1
# Netcat will diligently dump the printout to printout.ps as a postscript file
# The postscript file is encrypted and can't be converted by Ghostscript utils
$ ps2pdf printout.ps printout.pdf
This PostScript file was created from an encrypted PDF file.
Redistilling encrypted PDF is not permitted.
Error: /undefined in --eexec--
Operand stack:
--nostringval-- --dict:94/200(L)-- quit
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1894 1 3 %oparray_pop 1893 1 3 %oparray_pop 1877 1 3 %oparray_pop 1771 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1762 2 3 %oparray_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1163/1684(ro)(G)-- --dict:1/20(G)-- --dict:94/200(L)-- --dict:1163/1684(ro)(G)--
Current allocation mode is local
Last OS error: No such file or directory
GPL Ghostscript 9.07: Unrecoverable error, exit code 1
############ Portable goodness #############
# Fake a postscript printer using netcat
$ nc -l 127.0.0.1 9100 > printout2.ps
# Print your pdf using Adobe Reader to the postscript printer on 127.0.0.1
# Yank out adobe file protection gunk from the postscript file generated by netcat
$ sed -e "/mark currentfile eexec/,/cleartomark/ d" printout2.ps > printout_clean2.ps
# Convert away!
$ ps2pdf printout_clean2.ps printout_clean2.pdf
# Enjoy your newly found portability!

Saturday, May 17, 2014

Subnet calculation using pure mysql

You can easily aggregate your records by subnets using mysql thanks to bitwise operators, an inet_aton (ascii to number function) and some thinking...

Here you go:

-- SQLfiddle: http://www.sqlfiddle.com/#!2/e88de/2
-- Create a table and add some records
create table somelogs
(
some_ip varchar(25),
log varchar(50)
);
insert into somelogs
values
("10.0.0.1", "class A"),
("10.0.0.2", "Same /24 as the previous record"),
("192.168.122.9", "Class C"),
("127.0.0.1", "Home");
-- Run a query that does subnet calculation
select distinct
some_ip,
-- Subnet at /24 (32 - 8)
-- Basically, we are converting the IP into a number (32 bit)
-- Then the cast/pow business generates the subnet mask
-- Then we `and` the two to generate the network id :)
inet_ntoa(inet_aton(some_ip) & cast((pow(2, 32) - pow(2, 8)) as UNSIGNED)) as subnet,
log
from
somelogs;
-- Profit!
-- SOME_IP SUBNET LOG
-- 10.0.0.1 10.0.0.0 class A
-- 10.0.0.2 10.0.0.0 Same /24 as the previous record
-- 192.168.122.9 192.168.122.0 Class C
-- 127.0.0.1 127.0.0.0 Home

Thursday, May 15, 2014

tshark: display filters + reporting using csv


You can do pretty nifty things with tshark. The absolute life saver is thsark's ability to dump to a csv/tsv file using a user specified display filter.

As an example, I'd like to point out some packet retransmission issues to my provider in a nice (manager friendly) spreadsheet.  Here we go:

Manager friendly output:

ip.src tcp.srcport ip.dst tcp.dstport tcp.flags.syn tcp.flags.ack tcp.flags.push tcp.flags.reset tcp.analysis.bytes_in_flight tcp.len
a.b.c.d 8645 e.f.g.h7 9999 1 0 0 0
0
e.f.g.h7 9999 a.b.c.d 8645 1 1 0 0
0
a.b.c.d 8645 e.f.g.h7 9999 0 1 0 0
0
a.b.c.d 8645 e.f.g.h7 9999 0 1 1 0 168 168
e.f.g.h7 9999 a.b.c.d 8645 0 1 0 0
0
e.f.g.h7 9999 a.b.c.d 8645 0 1 1 0 1154 1154
a.b.c.d 8645 e.f.g.h7 9999 0 1 0 0
0
a.b.c.d 8645 e.f.g.h7 9999 0 1 0 0 1448 1448
a.b.c.d 8645 e.f.g.h7 9999 0 1 1 0 1502 54
e.f.g.h7 9999 a.b.c.d 8645 0 1 0 0
0

How do we get there?
1. Identify the fields that you want. A wireshark display filter cheat-sheet is a good place to start. You can home in on the fields that you want by firing up Wireshark and using the expression builder (button right next to the filter input box) then selecting the protocol that you want.

2. Choose your TCP stream.

# Viewing the tcp conversations in a pcap
tshark -qn -z conv,tcp -r test.pcap
================================================================================
TCP Conversations
Filter:<No Filter>
| <- | | -> | | Total | Relative | Duration |
| Frames Bytes | | Frames Bytes | | Frames Bytes | Start | |
a.b.c.d:31822 <-> e.f.g.h:9999 553 91298 549 36234 1102 127532 0.000000000 5155.6751
a.b.c.d:8645 <-> e.f.g.h:9999 402 66141 402 28210 804 94351 5162.869498000 3715.2102
3. Assemble your command. The one used to display the output above is:
# First 10 packets of the second TCP stream in the pcap
# Comman separated values with a header for the specified fields
$ tshark -ntu -r test.pcap -Y tcp.stream==1 -c 10 \
-E header=y -Tfields -E separator="," \
-e ip.src \
-e tcp.srcport \
-e "ip.dst" \
-e tcp.dstport \
-e tcp.flags.syn \
-e tcp.flags.ack \
-e tcp.flags.push \
-e tcp.flags.reset \
-e tcp.analysis.bytes_in_flight \
-e tcp.len
# Piping the output of the previous command to the csvlook command yields a nice table that can be easily grokked on the shell
|----------+-------------+---------+-------------+---------------+---------------+----------------+-----------------+------------------------------+----------|
| ip.src | tcp.srcport | ip.dst | tcp.dstport | tcp.flags.syn | tcp.flags.ack | tcp.flags.push | tcp.flags.reset | tcp.analysis.bytes_in_flight | tcp.len |
|----------+-------------+---------+-------------+---------------+---------------+----------------+-----------------+------------------------------+----------|
| a.b.c.d | 8645 | e.f.g.h | 9999 | 1 | 0 | 0 | 0 | | 0 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 1 | 1 | 0 | 0 | | 0 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 0 | 0 | | 0 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 1 | 0 | 168 | 168 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 0 | 1 | 0 | 0 | | 0 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 0 | 1 | 1 | 0 | 1154 | 1154 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 0 | 0 | | 0 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 0 | 0 | 1448 | 1448 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 1 | 0 | 1502 | 54 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 0 | 1 | 0 | 0 | | 0 |
|----------+-------------+---------+-------------+---------------+---------------+----------------+-----------------+------------------------------+----------|

Partitions in Postgres: Automatically creating partitions based on an attribute

A long time ago... I worked on importing ~ half a billion log records into Postgres. To achieve a low query response time, I used a partitioner that would shard records monthly. I documented it in the Postgres docs

Here it is:

----------------------------------------
-- function that inserts records.
-- if the partition for the month of
-- the record isn't found, a new partition
-- is created.
----------------------------------------
CREATE OR REPLACE FUNCTION logs_insert_func()
RETURNS TRIGGER AS $$
DECLARE
ourTable varchar;
ourTableExists integer;
ourFirstOfMonth date;
ourFirstOfNextMonth date;
ourInsertSTMT TEXT;
ourCreateSTMT TEXT;
ourMasterTable TEXT;
BEGIN
-- The table we'll inherit from
ourMasterTable := 'logs';
-- Get the partition table names ~ master_year_month
SELECT ourMasterTable || '_' ||
EXTRACT(ISOYEAR FROM NEW.log_time) || '_' ||
EXTRACT(MONTH FROM NEW.log_time)
into ourTable;
-- Create our insert statement
ourInsertSTMT := 'INSERT INTO '|| ourTable || ' (status,log_time,svc_time,ip_addr,query) VALUES (';
ourInsertSTMT := ourInsertSTMT|| NEW.status ||',';
ourInsertSTMT := ourInsertSTMT|| quote_nullable(NEW.log_time) ||',';
ourInsertSTMT := ourInsertSTMT|| NEW.svc_time ||',';
ourInsertSTMT := ourInsertSTMT|| quote_nullable(NEW.ip_addr) ||',';
ourInsertSTMT := ourInsertSTMT|| quote_nullable(NEW.query);
ourInsertSTMT := ourInsertSTMT || ')';
--
--Try execute it
EXECUTE ourInsertSTMT;
-- Phew! We didn't throw an exception. Insert worked which means that the
-- partition for this month exists.
RETURN NULL;
EXCEPTION WHEN OTHERS THEN
-- Insert failed. Let's check whether the table exists
SELECT count(*) into ourTableExists
FROM pg_catalog.pg_class c
WHERE c.relname = ourTable;
-- If it doesn't exist, try create it
IF ourTableExists = 0 THEN
-- First of this month and next month
SELECT date_trunc('month', NEW.log_time) into ourFirstOfMonth;
SELECT (ourFirstOfMonth + interval '1 month')::date into ourFirstOfNextMonth;
-- Create partition with range
ourCreateSTMT := 'CREATE TABLE '|| ourTable || '(';
ourCreateSTMT := ourCreateSTMT ||' CHECK ( log_time >=' || quote_nullable(ourFirstOfMonth) ;
ourCreateSTMT := ourCreateSTMT ||' AND log_time < DATE '|| quote_nullable(ourFirstOfNextMonth) || ')';
ourCreateSTMT := ourCreateSTMT ||') INHERITS (' || ourMasterTable || ')';
RAISE NOTICE 'Attempting to create a new table with STMT %',ourCreateSTMT;
EXECUTE ourCreateSTMT;
-- Retry to insert row
EXECUTE ourInsertSTMT;
IF NOT found THEN
RAISE NOTICE 'Error inserting into created partition % for %',ourTable,ourInsertSTMT;
END IF;
ELSE
RAISE NOTICE 'Error inserting into existing partition % for %',ourTable,ourInsertSTMT;
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
------------------------------
-- Example master table:
------------------------------
CREATE TABLE logs
(
id bigserial NOT NULL,
status smallint,
log_time timestamp without time zone,
svc_time real,
ip_addr inet,
query character varying(2048),
CONSTRAINT logs_id PRIMARY KEY (id)
)
WITH ( OIDS=FALSE);
------------------------------
-- Insert trigger:
------------------------------
CREATE TRIGGER
logs_insert_trigger
BEFORE INSERT ON logs
FOR EACH ROW EXECUTE PROCEDURE logs_insert_func();
------------------------------
-- Testing time baby!
-- Inserts from an existing
-- large table into the new
-- partitioned tables
------------------------------
=# insert into logs (status,log_time,svc_time,ip_addr,query) (select status,log_time,svc_time,ip_addr,query from whois_logs);
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2009_1( CHECK ( log_time >='2010-01-01' AND log_time < DATE '2010-02-01')) INHERITS (logs)
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2009_5( CHECK ( log_time >='2009-05-01' AND log_time < DATE '2009-06-01')) INHERITS (logs)
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2006_9( CHECK ( log_time >='2006-09-01' AND log_time < DATE '2006-10-01')) INHERITS (logs)
----