Thursday, October 28, 2010

Table Partitions with Postgres

I recently had to import about 400 million log entries into a database for reporting. MongoDB? Some might say, I was there and was dissapointed with the performance. It's true that I don't need ACIDity but I couldn't live with a 5 day long import from the query rates I was getting. Tweaking postgres I managed to do the import overnight. It could have been faster if I didn't have to bunzip, awk and sed then dump each log file into a CSV.

After this, I realised that since I tended to do aggregate my reports on a per month basis, per month partitions might help speed things up. Here's a small pgsql script that inserts a row into the proper partition based on the log's timestamp. If the partition doesn't exist, it's auto created. It's not pretty/optimized but it works pretty well.

CREATE OR REPLACE FUNCTION  logs_insert_func()
RETURNS TRIGGER AS $$
DECLARE

ourTable varchar;
ourTableExists integer;
ourFirstOfMonth date;
ourFirstOfNextMonth date;
ourInsertSTMT TEXT;
ourCreateSTMT TEXT;
ourMasterTable TEXT;


BEGIN
-- The table we'll inherit from
ourMasterTable := 'logs';

-- Get the partition table names ~ master_year_month
SELECT ourMasterTable|| '_' || EXTRACT(ISOYEAR FROM NEW.log_time) || '_' || EXTRACT(MONTH FROM NEW.log_time) into ourTable;

-- Create our insert statement
ourInsertSTMT := 'INSERT INTO '|| ourTable || ' (status,log_time,svc_time,ip_addr,query) VALUES (';
ourInsertSTMT := ourInsertSTMT|| NEW.status ||',';
ourInsertSTMT := ourInsertSTMT|| quote_nullable(NEW.log_time) ||',';
ourInsertSTMT := ourInsertSTMT|| NEW.svc_time ||',';
ourInsertSTMT := ourInsertSTMT|| quote_nullable(NEW.ip_addr) ||',';
ourInsertSTMT := ourInsertSTMT|| quote_nullable(NEW.query);
ourInsertSTMT := ourInsertSTMT || ')';
--

--Try execute it
EXECUTE ourInsertSTMT;

-- Phew! We didn't Except
RETURN NULL;
EXCEPTION
WHEN OTHERS THEN
-- Insert failed. Let's check whether the table exists
SELECT count(*) into ourTableExists
FROM pg_catalog.pg_class c
WHERE c.relname = ourTable;

-- If it doesn't exist, try create it
IF ourTableExists = 0 THEN

-- First of this month and next month
SELECT date_trunc('month', NEW.log_time) into ourFirstOfMonth;
SELECT (ourFirstOfMonth + interval '1 month')::date into ourFirstOfNextMonth;

-- Create partition with range
ourCreateSTMT := 'CREATE TABLE '|| ourTable || '(';
ourCreateSTMT := ourCreateSTMT ||' CHECK ( log_time >=' || quote_nullable(ourFirstOfMonth) ;
ourCreateSTMT := ourCreateSTMT ||' AND log_time < DATE '|| quote_nullable(ourFirstOfNextMonth) || ')';
ourCreateSTMT := ourCreateSTMT ||') INHERITS (' || ourMasterTable || ')';
RAISE NOTICE 'Attempting to create a new table with STMT %',ourCreateSTMT;
EXECUTE ourCreateSTMT;

-- Retry to insert row
EXECUTE ourInsertSTMT;

IF NOT found THEN
RAISE NOTICE 'Error inserting into created partition % for %',ourTable,ourInsertSTMT;
END IF;

ELSE
RAISE NOTICE 'Error inserting into existing partition % for %',ourTable,ourInsertSTMT;
END IF;
RETURN NULL;
END;

$$
LANGUAGE plpgsql;


The master table:
CREATE TABLE logs
(
id bigserial NOT NULL,
status smallint,
log_time timestamp without time zone,
svc_time real,
ip_addr inet,
query character varying(2048),
CONSTRAINT logs_id PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);


* Trigger to be called on each insert
CREATE TRIGGER logs_insert_trigger
BEFORE INSERT
ON logs
FOR EACH ROW
EXECUTE PROCEDURE logs_insert_func();


An example insert auto-creates the relevant partitions:
# insert into logs (status,log_time,svc_time,ip_addr,query)  (select status,log_time,svc_time,ip_addr,query from whois_logs);
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2009_1( CHECK ( log_time >='2010-01-01' AND log_time < DATE '2010-02-01')) INHERITS (logs)
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2009_5( CHECK ( log_time >='2009-05-01' AND log_time < DATE '2009-06-01')) INHERITS (logs)
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2006_9( CHECK ( log_time >='2006-09-01' AND log_time < DATE '2006-10-01')) INHERITS (logs)

Works for me. Yay!

Monday, October 25, 2010

Workaround for Mouse stuck in X windows

I had an issue with my macbook pro 5,5 running debian squeeze where the mouse would get stuck after moving to my second screen. It seems to be a common issue with X.
After messing around with xinput,halm xset etc, I finally got a workaround by using the python script from here.

Using the -n arguement switches you & your pointer out of the jailed screen. You may find it useful to bind it to a shortcut key.

$ tail -n 3 ~/.fluxbox/keys
#Woohooo - Workaround for stupid pointer
Control Shift N :ExecCommand /home/lmwangi/bin/screenswap.py -n

Now pressing ctrl-shift-n just switches me back to the other screen. It might make sense to modify the script so that it monitors your proximity to an edge of your jailing screen and auto-switches you to inaccessible screen.