Friday, April 5, 2013

Vertical line markers in R using geom_vline

Say you have two data sets that share only a common dimension/axis. Here's an example:

# Event csv
timestamp,event
2013-04-03 22:59:05.061Z,A
2013-04-03 22:59:05.061Z,B
2013-04-03 22:59:07.109Z,C
2013-04-03 22:59:07.115Z,D
2013-04-03 22:59:07.209Z,E

# Performance data
hostname;interval;timestamp;CPU;user;nice;system;iowait;steal;idle
box1;1;2013-04-03 22:59:02 UTC;-1;10.53;0.00;2.01;0.50;0.00;86.97
box1;1;2013-04-03 22:59:03 UTC;-1;0.25;0.00;0.00;0.00;0.00;99.75
box1;1;2013-04-03 22:59:04 UTC;-1;0.00;0.00;0.25;0.25;0.00;99.50
box1;1;2013-04-03 22:59:05 UTC;-1;10.72;0.00;1.00;0.25;0.00;88.03
box1;1;2013-04-03 22:59:06 UTC;-1;10.67;0.00;10.67;0.00;0.25;78.41
box1;1;2013-04-03 22:59:07 UTC;-1;5.01;0.00;9.02;3.51;0.00;82.46
box1;1;2013-04-03 22:59:08 UTC;-1;12.28;0.00;11.53;4.26;0.25;71.68
box1;1;2013-04-03 22:59:09 UTC;-1;15.88;0.00;11.66;10.92;0.50;61.04

You'd like to plot both data sets on one graph, one overlaid on the other. Using the example data above, you'd plot the CPU percentages (user and idle in the code below) against the timestamp, then add vertical markers showing when each event occurred.

Here's a gist:
library(ggplot2)
library(scales)  # provides date_format(), used for the x axis labels

events <- read.csv("events.csv")
> head(events)
                 timestamp event
1 2013-04-03 22:59:05.061Z     A
2 2013-04-03 22:59:05.061Z     B
3 2013-04-03 22:59:07.109Z     C
4 2013-04-03 22:59:07.115Z     D
5 2013-04-03 22:59:07.209Z     E
performance <- read.csv("performance.csv", header=TRUE,sep=";")
> head(performance)
  hostname interval               timestamp CPU  user nice system iowait steal  idle
1     box1        1 2013-04-03 22:59:02 UTC  -1 10.53    0   2.01   0.50  0.00 86.97
2     box1        1 2013-04-03 22:59:03 UTC  -1  0.25    0   0.00   0.00  0.00 99.75
3     box1        1 2013-04-03 22:59:04 UTC  -1  0.00    0   0.25   0.25  0.00 99.50
4     box1        1 2013-04-03 22:59:05 UTC  -1 10.72    0   1.00   0.25  0.00 88.03
5     box1        1 2013-04-03 22:59:06 UTC  -1 10.67    0  10.67   0.00  0.25 78.41
6     box1        1 2013-04-03 22:59:07 UTC  -1  5.01    0   9.02   3.51  0.00 82.46
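> str(performance)
'data.frame': 8 obs. of 10 variables:
$ hostname : Factor w/ 1 level "box1": 1 1 1 1 1 1 1 1
$ interval : int 1 1 1 1 1 1 1 1
$ timestamp: Factor w/ 8 levels "2013-04-03 22:59:02 UTC",..: 1 2 3 4 5 6 7 8
$ CPU : int -1 -1 -1 -1 -1 -1 -1 -1
$ user : num 10.53 0.25 0 10.72 10.67 ...
$ nice : num 0 0 0 0 0 0 0 0
$ system : num 2.01 0 0.25 1 10.67 ...
$ iowait : num 0.5 0 0.25 0.25 0 ...
$ steal : num 0 0 0 0 0.25 0 0.25 0.5
$ idle : num 87 99.8 99.5 88 78.4 ...
# read.csv() loaded the timestamps as text (a factor above), so convert both columns to date-times.
# POSIXct is used rather than POSIXlt because that is what ggplot2's datetime scale expects.
# tz = "UTC" is an assumption based on the "UTC" and "Z" suffixes in the data.
performance$timestamp <- as.POSIXct(as.character(performance$timestamp), tz = "UTC")
events$timestamp <- as.POSIXct(as.character(events$timestamp), tz = "UTC")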
# Plot the first data set: two line plots sharing the same x axis.
p <- ggplot(data = performance, aes(x = timestamp)) +
  geom_line(aes(y = idle, colour = "% cpu idle")) +
  geom_line(aes(y = user, colour = "% cpu user")) +
  scale_x_datetime(labels = date_format("%H:%M:%S"))
# Add one vertical line per event, coloured by event name.
p + geom_vline(data = events, linetype = 4,
               aes(colour = factor(event), xintercept = as.numeric(timestamp)))
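
The timestamp conversion in the gist matters because the two files use different precision: events carry milliseconds, while the performance samples are whole seconds. Here is a minimal sketch, in base R only, of parsing both onto the same time axis. The sample strings are copied from the data above, the variable names are just for illustration, and treating both as UTC is an assumption based on the "Z" and "UTC" suffixes.

```r
# Sample values copied from the two data sets in this post.
event_raw <- c("2013-04-03 22:59:05.061Z", "2013-04-03 22:59:07.109Z")
perf_raw  <- c("2013-04-03 22:59:05 UTC", "2013-04-03 22:59:07 UTC")

# %OS reads seconds including any fractional part. The trailing "Z" and
# " UTC" text is not part of the format, so the time zone is passed
# explicitly via tz (assumed here to be UTC).
fmt <- "%Y-%m-%d %H:%M:%OS"
event_ts <- as.POSIXct(event_raw, format = fmt, tz = "UTC")
perf_ts  <- as.POSIXct(perf_raw,  format = fmt, tz = "UTC")

# How far each event falls after the whole-second sample, in seconds.
offsets <- as.numeric(difftime(event_ts, perf_ts, units = "secs"))
print(round(offsets, 3))
```

Because both vectors end up as POSIXct, the events can be placed between the one-second samples on the same x axis rather than being snapped to them.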


Obviously, we need a better legend, a y axis label and a title for this graph. That's left as an exercise.
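
For anyone who wants a starting point for that exercise, here is one possible sketch. It embeds the sample data inline so it runs without the CSV files. The title, axis labels and legend name are placeholders I made up, the UTC time zone is an assumption, and drawing the event lines in a single grey (instead of colouring them by event) is a design choice to keep the legend limited to the two CPU metrics.

```r
library(ggplot2)

# Inline copies of the sample data from this post.
events <- read.csv(stringsAsFactors = FALSE, text = "timestamp,event
2013-04-03 22:59:05.061Z,A
2013-04-03 22:59:05.061Z,B
2013-04-03 22:59:07.109Z,C
2013-04-03 22:59:07.115Z,D
2013-04-03 22:59:07.209Z,E")

performance <- read.csv(sep = ";", stringsAsFactors = FALSE,
  text = "hostname;interval;timestamp;CPU;user;nice;system;iowait;steal;idle
box1;1;2013-04-03 22:59:02 UTC;-1;10.53;0.00;2.01;0.50;0.00;86.97
box1;1;2013-04-03 22:59:03 UTC;-1;0.25;0.00;0.00;0.00;0.00;99.75
box1;1;2013-04-03 22:59:04 UTC;-1;0.00;0.00;0.25;0.25;0.00;99.50
box1;1;2013-04-03 22:59:05 UTC;-1;10.72;0.00;1.00;0.25;0.00;88.03
box1;1;2013-04-03 22:59:06 UTC;-1;10.67;0.00;10.67;0.00;0.25;78.41
box1;1;2013-04-03 22:59:07 UTC;-1;5.01;0.00;9.02;3.51;0.00;82.46
box1;1;2013-04-03 22:59:08 UTC;-1;12.28;0.00;11.53;4.26;0.25;71.68
box1;1;2013-04-03 22:59:09 UTC;-1;15.88;0.00;11.66;10.92;0.50;61.04")

# Parse both timestamp columns as POSIXct (time zone assumed to be UTC).
fmt <- "%Y-%m-%d %H:%M:%OS"
events$timestamp      <- as.POSIXct(events$timestamp, format = fmt, tz = "UTC")
performance$timestamp <- as.POSIXct(performance$timestamp, format = fmt, tz = "UTC")

p <- ggplot(performance, aes(x = timestamp)) +
  geom_line(aes(y = idle, colour = "% cpu idle")) +
  geom_line(aes(y = user, colour = "% cpu user")) +
  # One dash-dot vertical line per event, all in the same grey.
  geom_vline(data = events, aes(xintercept = as.numeric(timestamp)),
             linetype = 4, colour = "grey40") +
  scale_x_datetime(date_labels = "%H:%M:%S") +
  # Placeholder title and labels; adjust to taste.
  labs(title = "CPU usage on box1 with event markers",
       x = "Time (UTC)", y = "CPU (%)", colour = "Metric")

print(p)
```

Here `scale_x_datetime(date_labels = ...)` stands in for the `date_format()` call used earlier, which avoids loading the scales package separately.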
