Sunday, July 22, 2012

xargs and race conditions

I tend to use xargs as a quick parallelizing tool. It works great. However, with multiple processes running, output from one process can overlap with output from another, creating an unreadable mess. As an example, say I want to ping 6 addresses. Doing it serially at roughly 1 ping per second takes about 6 seconds in total, i.e. linear growth, or O(n):
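Something like this, assuming a hypothetical file hosts that lists the 6 addresses one per line and that each ping takes about a second:

$ time xargs -n 1 ping -c 1 < hosts     # one address, one ping at a time, ~6s total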

How do you get around the O(n) wait? We can run everything in parallel using xargs:
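For instance, with the same hypothetical hosts file:

$ time xargs -n 1 -P 6 ping -c 1 < hosts    # -P 6: up to six pings run at once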

Note that the time taken is markedly reduced, but the output from each process is interleaved with that of the others.

How do you get around that? Locking. Luckily, there is a nifty command-line locking utility that is just perfect for this. Enter flock:
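A sketch of how that could look with the same hypothetical hosts file:

# each ping's output is drained by "flock hosts cat", which holds an exclusive
# lock on the hosts file while writing, so blocks of output never interleave;
# the pings themselves still run in parallel
$ time xargs -n 1 -P 6 sh -c 'ping -c 1 "$1" | flock hosts cat' _ < hosts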

Still as fast, and much neater. man flock for more awesomeness. The flock utility uses the flock(2) syscall to acquire an advisory lock on an open file. As the example above shows, you can use a file that you are already reading as the lock file; flock doesn't prevent you from performing other file operations on your fd.
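For instance, a small sketch of the fd form, again using the hypothetical hosts file (9 is just an arbitrary descriptor number):

(
  flock 9                   # take an exclusive lock on fd 9
  read -r first <&9         # the locked file is still readable through that same fd
  echo "first host: $first"
) 9< hosts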

Friday, June 22, 2012

Digests using OpenSSL: shell and Perl editions

Here's a small snippet that generates a signed digest (MD5 over a file, signed with an RSA private key).
The short of it: Perl

use File::Slurp qw(read_file);
use Crypt::OpenSSL::RSA;
use MIME::Base64 qw(encode_base64);

my $key_string = read_file("secretkey.pem");
my $key = Crypt::OpenSSL::RSA->new_private_key($key_string);
my $filestring = read_file("somefile");
$key->use_md5_hash();                      # sign the MD5 digest of the data
my $signature = $key->sign($filestring);
print encode_base64($signature)."\n";

The short of it: shell

openssl dgst -md5 -sign secretkey.pem < somefile | base64 | tee somefile.sig_from_shell
cat somefile.sig_from_shell | base64 -d > somefile.sig_from_shell.raw
openssl dgst -md5 -verify public.pem -signature somefile.sig_from_shell.raw < somefile
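Here, public.pem is assumed to be the public half of secretkey.pem; if you only have the private key, something like this should extract it:

openssl rsa -in secretkey.pem -pubout > public.pem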
In its goriness:


Monday, May 28, 2012

Quick and dirty parallel ssh

Occasionally, you may want to scp a file or run a job across many hosts. Well, there's parallel-[scp|ssh] for that.

However, if you are constrained, or don't have the time to install this wonderful tool, xargs might just save your day. So here goes my simple one/two-liner. Say we want to scp /etc/passwd from every host into a local dir; we'll need to:
  • Give a unique name to the file to be copied over.
$ mkdir /tmp/bah/results -p && cd /tmp/bah
$ cat hosts | xargs -n 1 -P 50 -IH  ssh -i ~/.ssh/key.pem user@H 'cp /etc/passwd /tmp/passwd.`hostname` && echo `hostname` ok'
a.b.c.d ok
.....

  • Copy it over:)
 $ cat hosts | xargs -n 1 -P 50 -IH  scp  -i ~/.ssh/key.pem  user@H:/tmp/passwd.* results/
passwd.a.b.c.d                                                                    100% 1155     1.1KB/s   00:01
....
Total time is 9 seconds for 60 hosts... Not bad.
real    0m9.654s
user    0m2.120s
sys    0m0.276s
sh-4.2$ ls results/|wc
     60      60    1486

Monday, April 2, 2012

fgrep if you don't need regexes

I knew that fgrep was faster, but I didn't know by how much!

So, looking up a set of values (say, IP addresses) from SomeFilesource in a bunch of logs (300 files with ~500k lines):

$ wc -l Some*
 3840 SomeFilesource


fgrep:
$ time fgrep  -hf SomeFilesource *log* > Wantedlogs
real 0m0.952s
user 0m0.890s
sys 0m0.030s

grep:

$ time grep -f SomeFilesource *log* > Wantedlogs
real 33m45.601s
user 33m39.150s
sys 0m0.350s


That is quite a speedup; fgrep treats its patterns as fixed strings rather than regexes, so it can use a far cheaper matching strategy. From now on it's fgrep by default... Now to see whether the --mmap optimization speeds up subsequent [f]greps.
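A follow-up test might look something like this (same files as above; note that recent versions of GNU grep accept --mmap but silently ignore it):

$ time fgrep --mmap -hf SomeFilesource *log* > Wantedlogs.mmap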