Friday, October 11, 2013

Linux: The pagecache and the loop back fs

Linux has a mechanism that allows you to create a block device that is backed by a file.  Most commonly, this is used to provide an encrpyted fs. All this is fine and dandy. However, you have to factor in that the Linux OS (And practically any other OS) will want to cache contents for block device in memory.
The reasoning here is. Accessing the contents of a file from a disk will cost you ~5ms. Rather than incur this cost on future reads, the OS caches the contents of the recently used file in a page cache. Future reads or writes to this file will hit the page cache which is orders of magnitude faster than your average disk.
This means that your writes will linger in memory until the your backing file's contents get evicted from the page cache. Using o_direct on files that are hosted within the loop backed fs won't help. You have to force pagecache evictions. The easiest way to do this is a call to fadvise and a sync to force pdflush to write your changes.

Here's an experiment:
I have 3 windows open.:
  • One running blktrace  which shows VFS activity.
  • One that has a oneshot dd.
  • One that has a bunch of shell commands that poke around the loop mounted fs.
The experiment is executed in the following order:
  1. An fadvise and a sync at the beginning to make sure the pagecache is clean and all writes are on the FS.
  2. We also print out a couple of stats from /proc/meminfo.
  3. We issue a dd call with direct I/O (oflag=direct).
  4. Print stats from /proc/meminfo and take a look at the backing file using debugfs.
  5. Show how much of the backing file is cached in the pagecache by using fincore.
  6. Evict the pagecache using the fadvise command from linux-ftools.
  7. Force a sync which wakes up pdflush to write out the dirty buffers.
  8. Run debugfs to take a peek at the backing fs again.
  9. Print out some more stats from meminfo.
Here's the script:
And the output generated:
The interesting bits to note here is that writes to a loop back filesystems in Linux are not guaranteed to be on disk until you evict the pagecache and force a sync.

If you are interested in further digging, here's debug output from:

blktrace:
Debugfs and dumpe2fs:

No comments:

Post a Comment