Friday, October 31, 2014

My ZSH/prezto settings...

Prezto is pretty cool. I love it. Here are my settings :)

#
# Executes commands at the start of an interactive session.
#
# Authors:
# Sorin Ionescu <sorin.ionescu@gmail.com>
#
# Source Prezto.
if [[ -s "${ZDOTDIR:-$HOME}/.zprezto/init.zsh" ]]; then
source "${ZDOTDIR:-$HOME}/.zprezto/init.zsh"
fi
# My binaries, then Homebrew's, then OS X's
export PATH="$HOME/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin"
# Recent dirs handling
# Use cdr & friends. See http://zsh.sourceforge.net/Doc/Release/User-Contributions.html#Recent-Directories
autoload -Uz chpwd_recent_dirs cdr add-zsh-hook
add-zsh-hook chpwd chpwd_recent_dirs
# Allow clobbering of file redirection
# With Prezto's defaults, zsh won't let you overwrite existing files with > or
# create new files with >> if they don't exist. This option reverts that.
setopt CLOBBER
# Glob dot files
# ls ~/*pro* should display ~/.profile
setopt GLOBDOTS
# Comments after commands are acceptable
# $ ls # Comments allowed after command
setopt INTERACTIVE_COMMENTS
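With the hooks above in place, cdr gives quick access to recently visited directories (see the zsh contrib docs linked above):

$ cdr -l    # list recent directories, numbered
$ cdr 2     # jump to entry 2 in that list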


#
# Sets Prezto options.
#
# Authors:
# Sorin Ionescu <sorin.ionescu@gmail.com>
#
#
# General
#
# Set case-sensitivity for completion, history lookup, etc.
zstyle ':prezto:*:*' case-sensitive 'yes'
# Color output (auto set to 'no' on dumb terminals).
zstyle ':prezto:*:*' color 'yes'
# Set the Zsh modules to load (man zshmodules).
zstyle ':prezto:load' zmodule 'attr' 'stat' 'datetime' 'deltochar' 'mathfunc'
# Set the Zsh functions to load (man zshcontrib).
zstyle ':prezto:load' zfunction 'zargs' 'zmv'
# Set the Prezto modules to load (browse modules).
# The order matters.
# You want to use gitfast over git. There's a massive speedup!
zstyle ':prezto:load' pmodule \
  'environment' \
  'terminal' \
  'editor' \
  'history' \
  'directory' \
  'spectrum' \
  'utility' \
  'completion' \
  'homebrew' \
  'osx' \
  'ruby' \
  'rails' \
  'git' \
  'syntax-highlighting' \
  'history-substring-search' \
  'prompt'
#
# Editor
#
# Set the key mapping style to 'emacs' or 'vi'.
zstyle ':prezto:module:editor' key-bindings 'emacs'
# Auto convert .... to ../..
# zstyle ':prezto:module:editor' dot-expansion 'yes'
#
# Git
#
# Ignore submodules when they are 'dirty', 'untracked', 'all', or 'none'.
# zstyle ':prezto:module:git:status:ignore' submodules 'all'
#
# GNU Utility
#
# Set the command prefix on non-GNU systems.
# zstyle ':prezto:module:gnu-utility' prefix 'g'
#
# History Substring Search
#
# Set the query found color.
# zstyle ':prezto:module:history-substring-search:color' found ''
# Set the query not found color.
# zstyle ':prezto:module:history-substring-search:color' not-found ''
# Set the search globbing flags.
# zstyle ':prezto:module:history-substring-search' globbing-flags ''
#
# Pacman
#
# Set the Pacman frontend.
# zstyle ':prezto:module:pacman' frontend 'yaourt'
#
# Prompt
#
# Set the prompt theme to load.
# Setting it to 'random' loads a random theme.
# Auto set to 'off' on dumb terminals.
zstyle ':prezto:module:prompt' theme 'paradox'
#
# Ruby
#
# Auto switch the Ruby version on directory change.
# zstyle ':prezto:module:ruby:chruby' auto-switch 'yes'
#
# Screen
#
# Auto start a session when Zsh is launched in a local terminal.
# zstyle ':prezto:module:screen:auto-start' local 'yes'
# Auto start a session when Zsh is launched in a SSH connection.
# zstyle ':prezto:module:screen:auto-start' remote 'yes'
#
# SSH
#
# Set the SSH identities to load into the agent.
# zstyle ':prezto:module:ssh:load' identities 'id_rsa' 'id_rsa2' 'id_github'
#
# Syntax Highlighting
#
# Set syntax highlighters.
# By default, only the main highlighter is enabled.
zstyle ':prezto:module:syntax-highlighting' highlighters \
  'main' \
  'brackets' \
  'pattern' \
  'cursor' \
  'root'
#
# Set syntax highlighting styles.
# zstyle ':prezto:module:syntax-highlighting' styles \
# 'builtin' 'bg=blue' \
# 'command' 'bg=blue' \
# 'function' 'bg=blue'
#
# Terminal
#
# Auto set the tab and window titles.
zstyle ':prezto:module:terminal' auto-title 'yes'
# Set the window title format.
# zstyle ':prezto:module:terminal:window-title' format '%n@%m: %s'
# Set the tab title format.
# zstyle ':prezto:module:terminal:tab-title' format '%m: %s'
#
# Tmux
#
# Auto start a session when Zsh is launched in a local terminal.
# zstyle ':prezto:module:tmux:auto-start' local 'yes'
# Auto start a session when Zsh is launched in a SSH connection.
# zstyle ':prezto:module:tmux:auto-start' remote 'yes'

Tuesday, August 19, 2014

flask (or itsdangerous) secret key size.

I recently needed to figure out the recommended key size for Flask's secret key. Trawling through Flask's source, I discovered that it uses itsdangerous for signing. The signer in turn uses HMAC with a supplied hash algorithm or a default one. The default digest method in itsdangerous is SHA-1.
According to Wikipedia:

The cryptographic strength of the HMAC depends upon the size of the secret key that is used.
 The HMAC RFC (RFC 2104) in turn states:


2. Definition of HMAC
The definition of HMAC requires a cryptographic hash function, which
   we denote by H, and a secret key K. We assume H to be a cryptographic
   hash function where data is hashed by iterating a basic compression
   function on blocks of data.   We denote by B the byte-length of such
   blocks (B=64 for all the above mentioned examples of hash functions),
   and by L the byte-length of hash outputs (L=16 for MD5, L=20 for
   SHA-1).  The authentication key K can be of any length up to B, the
   block length of the hash function.  Applications that use keys longer
   than B bytes will first hash the key using H and then use the
   resultant L byte string as the actual key to HMAC. In any case the
   minimal recommended length for K is L bytes (as the hash output
   length). See section 3 for more information on keys.
....
....
3. Keys

   The key for HMAC can be of any length (keys longer than B bytes are
   first hashed using H).  However, less than L bytes is strongly
   discouraged as it would decrease the security strength of the
   function.  Keys longer than L bytes are acceptable but the extra
   length would not significantly increase the function strength. (A
   longer key may be advisable if the randomness of the key is
   considered weak.)

   Keys need to be chosen at random (or using a cryptographically strong
   pseudo-random generator seeded with a random seed), and periodically
   refreshed.  (Current attacks do not indicate a specific recommended
   frequency for key changes as these attacks are practically
   infeasible.  However, periodic key refreshment is a fundamental
   security practice that helps against potential weaknesses of the
   function and keys, and limits the damage of an exposed key.)
So in effect, our secret key should be at least 16 bytes for MD5, 20 bytes for SHA-1, and larger if you use SHA-2 or SHA-3. Use the output bits column of this table to figure out what your secret key size ought to be. For the Flask secret key, I believe that a 32 byte key should be sufficient (and a 16 byte key risky... :)

My secret key block then becomes:
KEY_SIZE = 32
SECRET_KEY = open("/dev/random", "rb").read(KEY_SIZE)

os.urandom from the stdlib may not cut it since it sources /dev/urandom.
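For intuition, here's a minimal sketch of the HMAC-SHA1 signing that itsdangerous performs under the hood, using only the stdlib (the payload here is made up):

import hmac
import hashlib

KEY_SIZE = 20  # matches SHA-1's 20 byte output length
key = open("/dev/random", "rb").read(KEY_SIZE)
sig = hmac.new(key, "some-cookie-payload", hashlib.sha1).hexdigest()
print sig  # 40 hex chars == 160 bits, i.e. the SHA-1 output length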


Caveat: YMMV, I am not a security expert!

Thursday, July 31, 2014

[Announce] BARCAMP NAIROBI 2014 – Who’s Your Data’s Daddy?


Nairobi’s premier technology event is back for 2014! The 8th Barcamp Nairobi will be held on Saturday, 30th August, 2014. Barcamp is produced by Skunkworks Kenya – a disruptive collective of Kenya’s best-looking and best-skilled techies – and will be jointly hosted for the 2nd year by iHub, Nailab and m:Lab East Africa at Bishop Magua Centre, Ngong Road.

Barcamp is an unconference - participants run it. Anyone and everyone can attend. Please join us by registering here. Attendees set the agenda for what’s discussed, lead the sessions and workshops that fill the schedule, and create an environment of innovation and productive discussion.

Who should attend: the curious, the unconventional, the brilliant, the resilient, thinkers, hackers, crackers, builders, coders, techies, writers, artists, ninjas, everyone.

  • Come prepared to: share ideas, challenge ideas, engage with others
  • Bring: gadgets, code, designs, community attitude, friends, deodorant
  • Don’t bring: wordy powerpoint presentations, hubris, suits and ties

Hashtag #BarcampNBI

The theme for Barcamp Nairobi 2014 is:
Who's Your Data's Daddy?
Is privacy and security online possible in Kenya?

We entrust our most sensitive, private, and important information to private technology companies. At the same time, the increasing use of technology has attracted the attention of authorities eager to place caveats on the openness of the Internet and the range of freedoms we enjoy online.
At Barcamp Nairobi 2014 we are eager to talk about privacy and surveillance; we will explore the strategies and solutions that Kenyan citizens, corporations and governments are using to protect their privacy and security online.

  • Have time? Volunteer for Barcamp Nairobi 2014 here.
  • Have money? Sponsor Barcamp Nairobi 2014. Email info@barcamp.co.ke for more information.
  • Want to attend? Register for Barcamp Nairobi 2014 here.
  • Want to speak? List your topic here.

Barcamp Nairobi 2014 is free to attend. Stay tuned to http://barcamp.co.ke/ for more details.

Tuesday, July 8, 2014

file locking using a context manager (with statement) in python

I needed a quick locking mechanism to prevent my daemons from stepping over each other. To have a sane daemon startup (and prevent multiple daemon spawns), we need to ensure that we have an exclusive lock before starting the program. Googling around didn't turn up any context managers that actually use the flock syscall.

So here goes my attempt that seems to work:

import fcntl

class FileLock:
    def __init__(self, filename, lock_type):
        self.fh = open(filename, "a")
        self.lock_type = lock_type

    def __enter__(self):
        print "Acquiring lock"
        fcntl.flock(self.fh.fileno(), self.lock_type)

    def __exit__(self, type, value, tb):
        print "Releasing lock"
        fcntl.flock(self.fh.fileno(), fcntl.LOCK_UN)
        self.fh.close()

def main():
    import os
    import time
    with FileLock("/tmp/some_lock_file", fcntl.LOCK_EX):
        print "Start: %s" % time.ctime()
        print "PID: %s" % os.getpid()
        time.sleep(5)
        print "Finish: %s" % time.ctime()

if __name__ == '__main__':
    main()
Spinning off some python processes that utilise this context manager shows serialisation taking place:
Starting process 1
Starting process 2
Starting process 3
Starting process 4
Starting process 5
Acquiring lock
Start: Tue Jul 8 11:51:22 2014
Acquiring lock
Acquiring lock
PID: 826
Acquiring lock
Acquiring lock
Finish: Tue Jul 8 11:51:27 2014
Releasing lock
Start: Tue Jul 8 11:51:27 2014
PID: 825
Finish: Tue Jul 8 11:51:32 2014
Releasing lock
Start: Tue Jul 8 11:51:32 2014
PID: 824
Finish: Tue Jul 8 11:51:37 2014
Releasing lock
Start: Tue Jul 8 11:51:37 2014
PID: 823
Finish: Tue Jul 8 11:51:42 2014
Releasing lock
Start: Tue Jul 8 11:51:42 2014
PID: 822
Finish: Tue Jul 8 11:51:47 2014
Releasing lock
And here's the output of lsof showing locking for the processes spun off above:
Tue Jul 8 11:51:18 UTC 2014
Tue Jul 8 11:51:20 UTC 2014
Tue Jul 8 11:51:22 UTC 2014
Tue Jul 8 11:51:24 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 825 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 826 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:26 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 825 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 826 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:28 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 825 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:30 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 825 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:32 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 825 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:34 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:36 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 824 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:38 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:40 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:42 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3w REG 254,0 0 1973047 /tmp/some_lock_file
python 823 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:44 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:46 UTC 2014
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 822 vagrant 3wW REG 254,0 0 1973047 /tmp/some_lock_file
Tue Jul 8 11:51:48 UTC 2014
Tue Jul 8 11:51:50 UTC 2014
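As a side note for the daemon-startup case: if you'd rather bail out than queue behind a running instance, flock can be asked not to block. A small sketch reusing the FileLock class above (run_daemon is a hypothetical entry point):

import fcntl
import sys

try:
    # LOCK_NB makes flock raise IOError (EWOULDBLOCK) instead of waiting
    with FileLock("/tmp/some_lock_file", fcntl.LOCK_EX | fcntl.LOCK_NB):
        run_daemon()  # hypothetical daemon entry point
except IOError:
    sys.exit("Another instance holds the lock; exiting.")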

Monday, May 19, 2014

Redistilling PDFs that are not portable by design

I hate it when I am forced to deal with documents that are portable in title only (yes, I am looking at you, Adobe). Every so often, I get PDF documents from a major organisation that can be viewed with Adobe Acrobat only. On OS X, this bloated application consumes 369 megabytes of precious SSD space (Preview consumes 29 megabytes and is nicer).

Anyway, back to the story: these documents cannot be saved in any other format on my machine. In fact, the only way to read these documents w/out hackery is to print them out and scan them back in.

!Stupid!

So here goes a recipe for saving these files in a portable way.
############ Adobe badness #############
# In your operating system, create a postscript printer whose address is 127.0.0.1
# Fake a postscript printer using netcat
$ nc -l 127.0.0.1 9100 > printout.ps
# Print your pdf using Adobe Reader to the postscript printer on 127.0.0.1
# Netcat will diligently dump the printout to printout.ps as a postscript file
# The postscript file is encrypted and can't be converted by Ghostscript utils
$ ps2pdf printout.ps printout.pdf
This PostScript file was created from an encrypted PDF file.
Redistilling encrypted PDF is not permitted.
Error: /undefined in --eexec--
Operand stack:
--nostringval-- --dict:94/200(L)-- quit
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1894 1 3 %oparray_pop 1893 1 3 %oparray_pop 1877 1 3 %oparray_pop 1771 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- 1762 2 3 %oparray_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1163/1684(ro)(G)-- --dict:1/20(G)-- --dict:94/200(L)-- --dict:1163/1684(ro)(G)--
Current allocation mode is local
Last OS error: No such file or directory
GPL Ghostscript 9.07: Unrecoverable error, exit code 1
############ Portable goodness #############
# Fake a postscript printer using netcat
$ nc -l 127.0.0.1 9100 > printout2.ps
# Print your pdf using Adobe Reader to the postscript printer on 127.0.0.1
# Yank out adobe file protection gunk from the postscript file generated by netcat
$ sed -e "/mark currentfile eexec/,/cleartomark/ d" printout2.ps > printout_clean2.ps
# Convert away!
$ ps2pdf printout_clean2.ps printout_clean2.pdf
# Enjoy your newly found portability!

Saturday, May 17, 2014

Subnet calculation using pure mysql

You can easily aggregate your records by subnet in MySQL thanks to bitwise operators, INET_ATON (an ASCII-to-number function) and some thinking...

Here you go:

-- SQLfiddle: http://www.sqlfiddle.com/#!2/e88de/2
-- Create a table and add some records
create table somelogs
(
  some_ip varchar(25),
  log varchar(50)
);
insert into somelogs
values
  ("10.0.0.1", "class A"),
  ("10.0.0.2", "Same /24 as the previous record"),
  ("192.168.122.9", "Class C"),
  ("127.0.0.1", "Home");
-- Run a query that does subnet calculation
select distinct
  some_ip,
  -- Subnet at /24 (mask bits: 32 - 8)
  -- Basically, we convert the IP into a 32-bit number,
  -- then the cast/pow business generates the subnet mask,
  -- then we `and` the two to get the network id :)
  inet_ntoa(inet_aton(some_ip) & cast((pow(2, 32) - pow(2, 8)) as UNSIGNED)) as subnet,
  log
from
  somelogs;
-- Profit!
-- SOME_IP SUBNET LOG
-- 10.0.0.1 10.0.0.0 class A
-- 10.0.0.2 10.0.0.0 Same /24 as the previous record
-- 192.168.122.9 192.168.122.0 Class C
-- 127.0.0.1 127.0.0.0 Home
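The pow() trick generalises to any prefix length. Spelling the arithmetic out in a few lines of Python (purely illustrative):

# Mask for a /n prefix: n leading one-bits followed by (32 - n) zero-bits
def subnet_mask(prefix):
    return 2**32 - 2**(32 - prefix)

print "%08x" % subnet_mask(24)  # ffffff00 == 255.255.255.0
print "%08x" % subnet_mask(16)  # ffff0000 == 255.255.0.0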

Thursday, May 15, 2014

tshark: display filters + reporting using csv


You can do pretty nifty things with tshark. The absolute lifesaver is tshark's ability to dump to a CSV/TSV file using a user-specified display filter.

As an example, I'd like to point out some packet retransmission issues to my provider in a nice (manager-friendly) spreadsheet. Here we go:

Manager-friendly output:

ip.src   tcp.srcport  ip.dst   tcp.dstport  tcp.flags.syn  tcp.flags.ack  tcp.flags.push  tcp.flags.reset  tcp.analysis.bytes_in_flight  tcp.len
a.b.c.d  8645         e.f.g.h  9999         1              0              0               0                                              0
e.f.g.h  9999         a.b.c.d  8645         1              1              0               0                                              0
a.b.c.d  8645         e.f.g.h  9999         0              1              0               0                                              0
a.b.c.d  8645         e.f.g.h  9999         0              1              1               0                168                           168
e.f.g.h  9999         a.b.c.d  8645         0              1              0               0                                              0
e.f.g.h  9999         a.b.c.d  8645         0              1              1               0                1154                          1154
a.b.c.d  8645         e.f.g.h  9999         0              1              0               0                                              0
a.b.c.d  8645         e.f.g.h  9999         0              1              0               0                1448                          1448
a.b.c.d  8645         e.f.g.h  9999         0              1              1               0                1502                          54
e.f.g.h  9999         a.b.c.d  8645         0              1              0               0                                              0

How do we get there?
1. Identify the fields that you want. A Wireshark display-filter cheat sheet is a good place to start. You can home in on the fields you want by firing up Wireshark and using the expression builder (the button right next to the filter input box), then selecting the protocol you're interested in.

2. Choose your TCP stream.

# Viewing the tcp conversations in a pcap
tshark -qn -z conv,tcp -r test.pcap
================================================================================
TCP Conversations
Filter:<No Filter>
                             |       <-      | |       ->      | |     Total     |    Relative    |   Duration   |
                             | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |      Start     |              |
a.b.c.d:31822 <-> e.f.g.h:9999 553 91298 549 36234 1102 127532 0.000000000 5155.6751
a.b.c.d:8645 <-> e.f.g.h:9999 402 66141 402 28210 804 94351 5162.869498000 3715.2102
3. Assemble your command. The one used to display the output above is:
# First 10 packets of the second TCP stream in the pcap
# Comma-separated values with a header for the specified fields
$ tshark -ntu -r test.pcap -Y tcp.stream==1 -c 10 \
    -E header=y -Tfields -E separator="," \
    -e ip.src \
    -e tcp.srcport \
    -e "ip.dst" \
    -e tcp.dstport \
    -e tcp.flags.syn \
    -e tcp.flags.ack \
    -e tcp.flags.push \
    -e tcp.flags.reset \
    -e tcp.analysis.bytes_in_flight \
    -e tcp.len
# Piping the output of the previous command to the csvlook command yields a nice table that can be easily grokked on the shell
|----------+-------------+---------+-------------+---------------+---------------+----------------+-----------------+------------------------------+----------|
| ip.src | tcp.srcport | ip.dst | tcp.dstport | tcp.flags.syn | tcp.flags.ack | tcp.flags.push | tcp.flags.reset | tcp.analysis.bytes_in_flight | tcp.len |
|----------+-------------+---------+-------------+---------------+---------------+----------------+-----------------+------------------------------+----------|
| a.b.c.d | 8645 | e.f.g.h | 9999 | 1 | 0 | 0 | 0 | | 0 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 1 | 1 | 0 | 0 | | 0 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 0 | 0 | | 0 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 1 | 0 | 168 | 168 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 0 | 1 | 0 | 0 | | 0 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 0 | 1 | 1 | 0 | 1154 | 1154 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 0 | 0 | | 0 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 0 | 0 | 1448 | 1448 |
| a.b.c.d | 8645 | e.f.g.h | 9999 | 0 | 1 | 1 | 0 | 1502 | 54 |
| e.f.g.h | 9999 | a.b.c.d | 8645 | 0 | 1 | 0 | 0 | | 0 |
|----------+-------------+---------+-------------+---------------+---------------+----------------+-----------------+------------------------------+----------|
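Since retransmissions were the point of the exercise, tshark can also do the hunting for you: the same field-dump command accepts a narrower display filter (assuming the same test.pcap):

# Dump only the retransmitted segments of the second stream
$ tshark -ntu -r test.pcap -Y "tcp.stream==1 && tcp.analysis.retransmission" \
    -E header=y -Tfields -E separator="," \
    -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport -e tcp.len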

Partitions in Postgres: Automatically creating partitions based on an attribute

A long time ago... I worked on importing ~half a billion log records into Postgres. To achieve a low query response time, I used a partitioner that would shard records monthly. I documented it in the Postgres docs.

Here it is:

----------------------------------------
-- Function that inserts records.
-- If the partition for the month of
-- the record isn't found, a new partition
-- is created.
----------------------------------------
CREATE OR REPLACE FUNCTION logs_insert_func()
RETURNS TRIGGER AS $$
DECLARE
    ourTable varchar;
    ourTableExists integer;
    ourFirstOfMonth date;
    ourFirstOfNextMonth date;
    ourInsertSTMT TEXT;
    ourCreateSTMT TEXT;
    ourMasterTable TEXT;
BEGIN
    -- The table we'll inherit from
    ourMasterTable := 'logs';
    -- Get the partition table name ~ master_year_month
    SELECT ourMasterTable || '_' ||
           EXTRACT(ISOYEAR FROM NEW.log_time) || '_' ||
           EXTRACT(MONTH FROM NEW.log_time)
    INTO ourTable;
    -- Build our insert statement
    ourInsertSTMT := 'INSERT INTO ' || ourTable || ' (status,log_time,svc_time,ip_addr,query) VALUES (';
    ourInsertSTMT := ourInsertSTMT || NEW.status || ',';
    ourInsertSTMT := ourInsertSTMT || quote_nullable(NEW.log_time) || ',';
    ourInsertSTMT := ourInsertSTMT || NEW.svc_time || ',';
    ourInsertSTMT := ourInsertSTMT || quote_nullable(NEW.ip_addr) || ',';
    ourInsertSTMT := ourInsertSTMT || quote_nullable(NEW.query);
    ourInsertSTMT := ourInsertSTMT || ')';
    -- Try to execute it
    EXECUTE ourInsertSTMT;
    -- Phew! We didn't throw an exception. The insert worked, which means that
    -- the partition for this month exists.
    RETURN NULL;
EXCEPTION WHEN OTHERS THEN
    -- Insert failed. Let's check whether the table exists.
    SELECT count(*) INTO ourTableExists
    FROM pg_catalog.pg_class c
    WHERE c.relname = ourTable;
    -- If it doesn't exist, try to create it
    IF ourTableExists = 0 THEN
        -- First of this month and of next month
        SELECT date_trunc('month', NEW.log_time) INTO ourFirstOfMonth;
        SELECT (ourFirstOfMonth + interval '1 month')::date INTO ourFirstOfNextMonth;
        -- Create the partition with a range check
        ourCreateSTMT := 'CREATE TABLE ' || ourTable || '(';
        ourCreateSTMT := ourCreateSTMT || ' CHECK ( log_time >=' || quote_nullable(ourFirstOfMonth);
        ourCreateSTMT := ourCreateSTMT || ' AND log_time < DATE ' || quote_nullable(ourFirstOfNextMonth) || ')';
        ourCreateSTMT := ourCreateSTMT || ') INHERITS (' || ourMasterTable || ')';
        RAISE NOTICE 'Attempting to create a new table with STMT %', ourCreateSTMT;
        EXECUTE ourCreateSTMT;
        -- Retry the insert
        EXECUTE ourInsertSTMT;
        IF NOT found THEN
            RAISE NOTICE 'Error inserting into created partition % for %', ourTable, ourInsertSTMT;
        END IF;
    ELSE
        RAISE NOTICE 'Error inserting into existing partition % for %', ourTable, ourInsertSTMT;
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
------------------------------
-- Example master table:
------------------------------
CREATE TABLE logs
(
    id bigserial NOT NULL,
    status smallint,
    log_time timestamp without time zone,
    svc_time real,
    ip_addr inet,
    query character varying(2048),
    CONSTRAINT logs_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
------------------------------
-- Insert trigger:
------------------------------
CREATE TRIGGER logs_insert_trigger
    BEFORE INSERT ON logs
    FOR EACH ROW EXECUTE PROCEDURE logs_insert_func();
------------------------------
-- Testing time baby!
-- Inserts from an existing
-- large table into the new
-- partitioned tables
------------------------------
=# insert into logs (status,log_time,svc_time,ip_addr,query) (select status,log_time,svc_time,ip_addr,query from whois_logs);
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2009_1( CHECK ( log_time >='2010-01-01' AND log_time < DATE '2010-02-01')) INHERITS (logs)
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2009_5( CHECK ( log_time >='2009-05-01' AND log_time < DATE '2009-06-01')) INHERITS (logs)
NOTICE: Attempting to create a new table with STMT CREATE TABLE logs_2006_9( CHECK ( log_time >='2006-09-01' AND log_time < DATE '2006-10-01')) INHERITS (logs)
----
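To confirm which partitions the trigger spawned, you can query the same catalog the function itself consults:

-- List the monthly partitions created so far
SELECT c.relname
FROM pg_catalog.pg_class c
WHERE c.relname LIKE 'logs\_%'
ORDER BY c.relname;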

Sunday, March 16, 2014

Making sense of /proc/buddyinfo

/proc/buddyinfo gives you an idea of the free memory fragments on your Linux box. You get to view the free fragments for each available order, for the different zones of each NUMA node. A typical /proc/buddyinfo looks like this:

Node 0, zone DMA 5 4 4 4 4 3 2 0 0 0 2
Node 0, zone DMA32 2 2 3 2 2 1 3 1 2 3 386
Node 0, zone Normal 56332 12573 2212 959 448 179 35 5 0 0 35002

This box has a single NUMA node. Each NUMA node is an entry in the kernel linked list pgdat_list. Each node is further divided into zones. Here are some example zone types:
  • DMA Zone: The lower 16 MiB of RAM, used by legacy devices that cannot address anything beyond the first 16 MiB.
  • DMA32 Zone (only on x86_64): For devices that can't address beyond the first 4 GiB of RAM. On x86, this range would be covered by the Normal zone.
  • Normal Zone: Anything above the DMA zone that doesn't require kernel tricks to be addressable. Typically on x86, this is 16 MiB to 896 MiB. Many kernel operations require that the memory being used come from this zone.
  • Highmem Zone (x86 only): Anything above 896 MiB.
Each zone is further divided by the buddy allocator into chunks of 2^order pages each (the exponent is known as the order). The buddy allocator attempts to satisfy an allocation request from a zone's free pool. Over time, this free pool fragments and higher-order allocations fail. The buddyinfo proc file is generated on demand by walking all the free lists.

Say we have just rebooted the machine and we have a free pool of 16 MiB (the DMA zone). The most sensible thing to do is to split this memory into the largest contiguous blocks available. The maximum order is defined at compile time as 11, which means the largest slice the buddy allocator deals in is a 4 MiB block (2^10 * page_size). So the 16 MiB DMA zone would initially be split into 4 free blocks.

Here's how we'll service an allocation request for 72 KiB:
  1. Round up the allocation request to the next power of 2 (128 KiB)
  2. Split a 4 MiB chunk into two 2 MiB chunks
  3. Split one 2 MiB chunk into two 1 MiB chunks
  4. Continue splitting until we get a 128 KiB chunk that we can allocate.
Over time, allocation requests will split and merge this pool repeatedly, until we reach a point where a request might fail for lack of a contiguous memory block.
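To make the rounding concrete, here's the order arithmetic for that 72 KiB request as a quick Python sketch (4 KiB pages assumed):

import math

page_size = 4096
request = 72 * 1024                                 # 72 KiB
pages = int(math.ceil(request / float(page_size)))  # 18 pages
order = int(math.ceil(math.log(pages, 2)))          # order 5
print "order %d -> %d KiB block" % (order, 2**order * page_size / 1024)
# order 5 -> 128 KiB block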

Here's an example of an allocation failure from a Gentoo bug report.
swapper: page allocation failure. order:4, mode:0x20
Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r7 #1
Call Trace:
<IRQ> [<ffffffff8107f082>] 0xffffffff8107f082
[<ffffffff810a29a4>] 0xffffffff810a29a4
[<ffffffff810a2b9f>] 0xffffffff810a2b9f
[<ffffffff810a2d35>] 0xffffffff810a2d35
[<ffffffff810a2de3>] 0xffffffff810a2de3
[<ffffffff810a2e4e>] 0xffffffff810a2e4e
[<ffffffff813ed961>] 0xffffffff813ed961
[<ffffffff813ee21b>] 0xffffffff813ee21b
[<ffffffffa07d8499>] 0xffffffffa07d8499
[<ffffffff813f4a3d>] 0xffffffff813f4a3d

In such cases, the buddyinfo proc file will allow you to view the current fragmentation state of your memory.
Here's a quick Python script that makes this data more digestible.

#!/usr/bin/env python
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 textwidth=79 autoindent
"""
Python source code
Last modified: 15 Feb 2014 - 13:38
Last author: lmwangi at gmail com

Displays the available memory fragments
by querying /proc/buddyinfo

Example:
# python buddyinfo.py
"""
import optparse
import os
import re
from collections import defaultdict
import logging


class Logger:
    def __init__(self, log_level):
        self.log_level = log_level

    def get_formatter(self):
        return logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    def get_handler(self):
        return logging.StreamHandler()

    def get_logger(self):
        """Returns a Logger instance for the specified module_name"""
        logger = logging.getLogger('main')
        logger.setLevel(self.log_level)
        log_handler = self.get_handler()
        log_handler.setFormatter(self.get_formatter())
        logger.addHandler(log_handler)
        return logger


class BuddyInfo(object):
    """BuddyInfo DAO"""

    def __init__(self, logger):
        super(BuddyInfo, self).__init__()
        self.log = logger
        self.buddyinfo = self.load_buddyinfo()

    def parse_line(self, line):
        line = line.strip()
        self.log.debug("Parsing line: %s" % line)
        parsed_line = re.match("Node\s+(?P<numa_node>\d+).*zone\s+(?P<zone>\w+)\s+(?P<nr_free>.*)", line).groupdict()
        self.log.debug("Parsed line: %s" % parsed_line)
        return parsed_line

    def read_buddyinfo(self):
        buddyhash = defaultdict(list)
        buddyinfo = open("/proc/buddyinfo").readlines()
        for line in map(self.parse_line, buddyinfo):
            numa_node = int(line["numa_node"])
            zone = line["zone"]
            free_fragments = map(int, line["nr_free"].split())
            max_order = len(free_fragments)
            fragment_sizes = self.get_order_sizes(max_order)
            usage_in_bytes = [block[0] * block[1] for block in zip(free_fragments, fragment_sizes)]
            buddyhash[numa_node].append({
                "zone": zone,
                "nr_free": free_fragments,
                "sz_fragment": fragment_sizes,
                "usage": usage_in_bytes})
        return buddyhash

    def load_buddyinfo(self):
        buddyhash = self.read_buddyinfo()
        self.log.info(buddyhash)
        return buddyhash

    def page_size(self):
        return os.sysconf("SC_PAGE_SIZE")

    def get_order_sizes(self, max_order):
        return [self.page_size() * 2**order for order in range(0, max_order)]

    def __str__(self):
        ret_string = ""
        width = 20
        for node in self.buddyinfo:
            ret_string += "Node: %s\n" % node
            for zoneinfo in self.buddyinfo.get(node):
                ret_string += " Zone: %s\n" % zoneinfo.get("zone")
                ret_string += " Free KiB in zone: %.2f\n" % (sum(zoneinfo.get("usage")) / (1024.0))
                ret_string += '\t{0:{align}{width}} {1:{align}{width}} {2:{align}{width}}\n'.format(
                    "Fragment size", "Free fragments", "Total available KiB",
                    width=width,
                    align="<")
                for idx in range(len(zoneinfo.get("sz_fragment"))):
                    ret_string += '\t{order:{align}{width}} {nr:{align}{width}} {usage:{align}{width}}\n'.format(
                        width=width,
                        align="<",
                        order=zoneinfo.get("sz_fragment")[idx],
                        nr=zoneinfo.get("nr_free")[idx],
                        usage=zoneinfo.get("usage")[idx] / 1024.0)
        return ret_string


def main():
    """Main function. Called when this file is a shell script"""
    usage = "usage: %prog [options]"
    parser = optparse.OptionParser(usage)
    parser.add_option("-s", "--size", dest="size", choices=["B", "K", "M"],
                      action="store", type="choice", help="Return results in bytes, kib, mib")
    (options, args) = parser.parse_args()
    logger = Logger(logging.DEBUG).get_logger()
    logger.info("Starting....")
    logger.info("Parsed options: %s" % options)
    print logger
    buddy = BuddyInfo(logger)
    print buddy


if __name__ == '__main__':
    main()
And here's sample output for the buddyinfo data pasted earlier.

2014-03-15 23:34:32,352 - main - INFO - Starting....
2014-03-15 23:34:32,352 - main - INFO - Parsed options: {'size': None}
<logging.Logger object at 0x1102cffd0>
2014-03-15 23:34:32,352 - main - DEBUG - Parsing line: Node 0, zone DMA 5 4 4 4 4 3 2 0 0 0 2
2014-03-15 23:34:32,353 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'DMA', 'nr_free': '5 4 4 4 4 3 2 0 0 0 2'}
2014-03-15 23:34:32,353 - main - DEBUG - Parsing line: Node 0, zone DMA32 2 2 3 2 2 1 3 1 2 3 386
2014-03-15 23:34:32,353 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'DMA32', 'nr_free': '2 2 3 2 2 1 3 1 2 3 386'}
2014-03-15 23:34:32,353 - main - DEBUG - Parsing line: Node 0, zone Normal 56332 12573 2212 959 448 179 35 5 0 0 35002
2014-03-15 23:34:32,353 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'Normal', 'nr_free': '56332 12573 2212 959 448 179 35 5 0 0 35002'}
2014-03-15 23:34:32,375 - main - INFO - defaultdict(<type 'list'>, {0: [{'usage': [20480, 32768, 65536, 131072, 262144, 393216, 524288, 0, 0, 0, 8388608], 'nr_free': [5, 4, 4, 4, 4, 3, 2, 0, 0, 0, 2], 'zone': 'DMA', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}, {'usage': [8192, 16384, 49152, 65536, 131072, 131072, 786432, 524288, 2097152, 6291456, 1619001344], 'nr_free': [2, 2, 3, 2, 2, 1, 3, 1, 2, 3, 386], 'zone': 'DMA32', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}, {'usage': [230735872, 102998016, 36241408, 31424512, 29360128, 23461888, 9175040, 2621440, 0, 0, 146809028608], 'nr_free': [56332, 12573, 2212, 959, 448, 179, 35, 5, 0, 0, 35002], 'zone': 'Normal', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}]})
Node: 0
Zone: DMA
Free KiB in zone: 9588.00
Fragment size Free fragments Total available KiB
4096 5 20.0
8192 4 32.0
16384 4 64.0
32768 4 128.0
65536 4 256.0
131072 3 384.0
262144 2 512.0
524288 0 0.0
1048576 0 0.0
2097152 0 0.0
4194304 2 8192.0
Zone: DMA32
Free KiB in zone: 1590920.00
Fragment size Free fragments Total available KiB
4096 2 8.0
8192 2 16.0
16384 3 48.0
32768 2 64.0
65536 2 128.0
131072 1 128.0
262144 3 768.0
524288 1 512.0
1048576 2 2048.0
2097152 3 6144.0
4194304 386 1581056.0
Zone: Normal
Free KiB in zone: 143823288.00
Fragment size Free fragments Total available KiB
4096 56332 225328.0
8192 12573 100584.0
16384 2212 35392.0
32768 959 30688.0
65536 448 28672.0
131072 179 22912.0
262144 35 8960.0
524288 5 2560.0
1048576 0 0.0
2097152 0 0.0
4194304 35002 143368192.0

Wednesday, January 29, 2014

Fifos and persistent readers

I recently worked on a daemon (call it slurper) that persistently read data from syslog via a FIFO (also known as a named pipe). After startup, slurper would work fine for a couple of hours and then stop processing input from the FIFO. The relevant code in slurper is:

while True:
    self.logger.info("[Re]starting to read %s" % self.input_file)
    with open(self.input_file, "rb") as f:
        for line in f:
            ...do_stuff_with_line...(line)
Digging into this mystery revealed that the syslogd server was getting EAGAIN errors on the FIFO descriptor. According to man 7 pipe:

      O_NONBLOCK enabled, n <= PIPE_BUF
              If there is room to write n bytes to the pipe, then write(2) succeeds immediately, writing all n bytes; otherwise write(2) fails, with errno set to EAGAIN.

The syslogd daemon was opening the pipe in O_NONBLOCK mode and getting EAGAIN errors, which implied that the pipe was full (man 7 pipe states that the pipe buffer is 64K).
Additionally, a `cat` on the FIFO drains the pipe and allows syslogd to write more content.

All these clues imply that the FIFO has no reader. But how can that be? A check with lsof shows that slurper has an open fd for the named pipe. Digging deeper, an attempt to `cat` slurper's open fd didn't return any data:

cat /proc/$(pgrep slurper)/fd/ # Be careful with this. It will steal data from your pipe/file/socket on a production system

So I decided to whip up a reader that emulates slurper's behaviour

# Pre-create the fifo
# mkfifo /tmp/example_fifo
# Save this to a file and strace it
# strace -fttt -o open_close python test.py
import time

fname = "/tmp/example_fifo"
with open(fname, "r") as f:
    for line in f:
        print line
        time.sleep(1)
Strace this script to see which syscalls are being invoked

# This will block until I run the following in ipython:
# f = open("/tmp/example_fifo", "w", 0); f.write("Hello"); f.close()
# $ strace -fttt -o open_close python test.py
Hello
# $ cat open_close
14872 1390947892.561375 open("/tmp/example_fifo", O_RDONLY) = 3 # As expected, the reader blocks until a writer comes along (watch the wall-clock timestamps in the second column)
...
... # In ipython run: f = open("/tmp/example_fifo", "w", 0); f.write("Hello"); f.close()
...
14872 1390947902.566207 fstat(3, {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
...
....
14872 1390947902.566460 read(3, "Hello", 8192) = 5
14872 1390947902.567981 read(3, "", 4096) = 0
14872 1390947902.568104 read(3, "", 8192) = 0 # Aha... We are reading an EOF!
14872 1390947902.568226 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
14872 1390947902.568345 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f99c6de3000
14872 1390947902.568528 write(1, "Hello\n", 6) = 6
14872 1390947902.568693 select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
14872 1390947903.570856 close(3) = 0 # and we exit.
14872 1390947903.571690 munmap(0x7f99c6de4000, 4096) = 0
14872 1390947903.572290 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f99c69bb030}, {0x425842, [], SA_RESTORER, 0x7f99c69bb030}, 8) = 0
14872 1390947903.574587 exit_group(0) = ?
This reveals that a writer closing its fd will cause readers to read an EOF (and probably exit, in the case of the block under the context manager).
So we have two options:
1) Ugly and kludgy: wrap the context-manager read block in an infinite loop that reopens the file.
2) Super cool trick: open another dummy writer to the FIFO. The kernel sends an EOF when the last writer closes its fd. Since our dummy writer never closes its fd, readers will never get an EOF when the real writer closes its fd.

while True:
    self.logger.info("[Re]starting to read %s" % self.input_file)
    with open(self.input_file, "rb") as f:
        f1 = open(self.input_file, "w")  # The dummy writer has to be opened after a reader, otherwise it would block
        for line in f:
            ...do_stuff_with_line...(line)
The actual root cause: the syslog daemon was being restarted, and this would cause it to close and reopen its fds.
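A third variant I haven't battle-tested here: open the FIFO once in read-write mode, so the process acts as its own dummy writer and never sees an EOF (Linux supports O_RDWR on FIFOs, though POSIX leaves the behaviour undefined):

with open(self.input_file, "r+b") as f:  # reader and writer on a single fd
    for line in f:
        ...do_stuff_with_line...(line)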

Wednesday, January 22, 2014

Macbook pro setup for office use



Funky prompt thanks to powerline and powerline-fonts. Powerline can integrate with vim/ipython/bash/zsh…

I seem to prefer zsh over bash these days (git integration, rvm integration…):
In zshrc: ZSH_THEME="agnoster"
Plugin support
Theme screenshots.


Vim has a very cool set of plugins thanks to spf13.

If you have a Mac, iTerm2 rocks.

And finally, I like the Solarized theme for my terminal.