Digging into this mystery revealed that the syslogd server was getting EAGAIN errors on the FIFO descriptor. According to man 7 pipe:
O_NONBLOCK enabled, n <= PIPE_BUF
If there is room to write n bytes to the pipe, then write(2) succeeds immediately, writing all n bytes; otherwise write(2) fails, with errno set to EAGAIN.
The syslogd daemon was opening the pipe in O_NONBLOCK mode, so the EAGAIN errors implied that the pipe was full (man 7 pipe states that, since Linux 2.6.11, the pipe capacity is 16 pages, i.e. 64K with 4 KiB pages).
Additionally, running `cat` on the FIFO drains the pipe and lets syslogd write more content.
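Both symptoms are easy to reproduce. The sketch below is an illustration under stated assumptions, not syslogd's code: it uses an anonymous pipe from `os.pipe()` instead of a named FIFO (man 7 pipe describes the same buffer behaviour for both) and assumes `PIPE_BUF` is 4096 bytes, as on Linux. It makes the write end non-blocking, writes `PIPE_BUF`-sized chunks until the kernel refuses with EAGAIN, then drains the buffer the way `cat` did and shows that a write succeeds again. `fill_nonblocking_pipe` is a hypothetical helper name.

```python
import errno
import os

PIPE_BUF = 4096  # assumed value, as on Linux; writes of <= PIPE_BUF bytes are atomic


def fill_nonblocking_pipe():
    """Hypothetical helper: fill a pipe via a non-blocking write end, then drain it.

    Uses an anonymous pipe for illustration instead of a named FIFO.
    Returns (bytes_written_before_eagain, got_eagain, wrote_after_drain).
    """
    r, w = os.pipe()
    os.set_blocking(w, False)  # same effect as opening the write end with O_NONBLOCK
    chunk = b"x" * PIPE_BUF    # n <= PIPE_BUF: each write is all-or-nothing
    total = 0
    got_eagain = False
    try:
        while True:
            total += os.write(w, chunk)
    except BlockingIOError as exc:  # Python raises this for EAGAIN/EWOULDBLOCK
        got_eagain = exc.errno == errno.EAGAIN
    # Drain the buffered data, as `cat` on the FIFO did, then try again.
    os.read(r, total)
    wrote_after_drain = os.write(w, chunk) == len(chunk)
    os.close(r)
    os.close(w)
    return total, got_eagain, wrote_after_drain


total, got_eagain, wrote_after_drain = fill_nonblocking_pipe()
print(f"full after {total} bytes; EAGAIN: {got_eagain}; write after drain: {wrote_after_drain}")
```

On a Linux box with 4 KiB pages the total should come out at the 64K capacity mentioned above, but the exact figure depends on the kernel.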
All these clues imply that the FIFO has no reader. But how can that be? A check with lsof shows that slurper has an open fd for the named pipe. Digging deeper, an attempt to `cat` slurper's open fd didn't return any data:
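cat /proc/$(pgrep slurper)/fd/<N>   # <N> is the FIFO's fd number as reported by lsof
Be careful with this: on a pipe, cat consumes whatever it reads, stealing that data from the real reader, so avoid it on a production system.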
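A non-destructive way to check who holds the FIFO open is to follow the `/proc/<pid>/fd` symlinks instead of reading from them, which is similar in spirit to what lsof reports. This is a minimal sketch assuming a Linux `/proc`; `find_fifo_holders` is a hypothetical helper name, and processes you are not allowed to inspect are silently skipped (run it as root to see everything).

```python
import os


def find_fifo_holders(path):
    """Hypothetical lsof-like helper: return (pid, fd) pairs that have `path` open.

    Only readlink() is used on /proc/<pid>/fd entries, so unlike `cat`
    it never consumes data from the pipe. Assumes a Linux /proc.
    """
    target = os.path.realpath(path)
    holders = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process has exited
            continue
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:  # fd closed between listdir() and readlink()
                continue
            if link == target:
                holders.append((int(pid), int(fd)))
    return holders
```

Calling `find_fifo_holders("/path/to/fifo")` (a placeholder path) should list slurper's pid next to the same fd number lsof showed.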
So I decided to whip up a reader that emulates slurper's behaviour.
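The original script isn't reproduced here. A minimal sketch of such a reader, assuming slurper consumes the FIFO line by line inside a `with open(...)` block (the context manager mentioned in the next paragraphs suggests this shape); `slurp` and `handle` are hypothetical names:

```python
def slurp(path, handle):
    """Hypothetical emulation of a naive FIFO consumer: read lines until EOF.

    open() blocks until some writer opens the FIFO. The for-loop stops
    when read(2) returns 0 bytes (EOF), and leaving the with-block then
    closes the reader's fd.
    """
    with open(path) as fifo:
        for line in fifo:
            handle(line)
```

To try it, create the FIFO with `os.mkfifo(path)` (or `mkfifo` from the shell) and call `slurp(path, lambda line: print(line, end=""))`.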
Running this script under strace shows which syscalls are being invoked.
This reveals that when the last writer closes its fd, the reader gets an EOF (read(2) returns 0), which ends the read loop inside the context manager's block and probably makes the reader exit.
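The same EOF can be observed without strace. This sketch reads a FIFO with `os.read`, which issues one read(2) per call, and records how many bytes each call returned; `read_sizes_until_eof` and its `bufsize` parameter are hypothetical, illustrative names, not slurper's code. Once the last writer has closed its fd, the final entry is 0.

```python
import os


def read_sizes_until_eof(path, bufsize=4096):
    """Hypothetical helper: byte count of each read(2) on a FIFO, up to and including EOF."""
    fd = os.open(path, os.O_RDONLY)  # blocks until a writer opens the FIFO
    sizes = []
    try:
        while True:
            data = os.read(fd, bufsize)
            sizes.append(len(data))
            if not data:  # read(2) returned 0: no writers left, i.e. EOF
                break
    finally:
        os.close(fd)
    return sizes
```

Under strace this shows up as a final read(2) call that returns 0.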
So we have two options:
1) Ugly and kludgy: wrap the context manager's read block in an infinite loop that reopens the file.
2) Super cool trick: open another, dummy writer on the FIFO. Readers get an EOF only when the last writer closes its fd. Since our dummy writer never closes its fd, readers never get an EOF, even when the real writer closes its fd.
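Both workarounds can be sketched in Python as follows. This is a minimal illustration, not slurper's actual fix: `slurp_reopening`, `open_with_dummy_writer` and the `max_opens` test knob are hypothetical names. For option 2 the reader is opened with O_NONBLOCK first, because a blocking open of either end of a FIFO waits for the other end; once a reader exists, the dummy O_WRONLY open returns immediately, and the reader is then switched back to blocking mode.

```python
import os


def slurp_reopening(path, handle, max_opens=None):
    """Option 1 (hypothetical sketch): reopen the FIFO after every EOF.

    max_opens is a hypothetical knob so the loop can be tested; None means forever.
    """
    opens = 0
    while max_opens is None or opens < max_opens:
        with open(path) as fifo:  # blocks until a writer opens the FIFO
            opens += 1
            for line in fifo:
                handle(line)
        # EOF: the last writer closed its fd, so go around and reopen.


def open_with_dummy_writer(path):
    """Option 2 (hypothetical sketch): hold a dummy write fd so readers never see EOF.

    Returns (reader_file, dummy_fd); keep dummy_fd open for as long as you read.
    """
    # A non-blocking read open returns at once, even with no writer yet...
    rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    # ...so this blocking write open succeeds immediately: a reader exists.
    dummy_fd = os.open(path, os.O_WRONLY)
    os.set_blocking(rfd, True)  # switch the reader back to blocking reads
    return os.fdopen(rfd, "r"), dummy_fd
```

With option 2 you iterate over the returned reader as usual, and the loop no longer ends when syslogd closes its fd. Option 1 leaves a window between EOF and the reopen during which the FIFO has no reader; per man 7 fifo, a writer opening with O_NONBLOCK during that window fails with ENXIO.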
The actual root cause: the syslog daemon was being restarted, which caused it to close and reopen its fds.