This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Node 0, zone DMA 5 4 4 4 4 3 2 0 0 0 2 | |
Node 0, zone DMA32 2 2 3 2 2 1 3 1 2 3 386 | |
Node 0, zone Normal 56332 12573 2212 959 448 179 35 5 0 0 35002 |
This box has a single numa node. Each numa node is an entry in the kernel linked list pgdat_list. Each node is further divided into zones. Here are some example zone types:
- DMA Zone: Lower 16 MiB of RAM used by legacy devices that cannot address anything beyond the first 16MiB of RAM.
- DMA32 Zone (only on x86_64): Some devices can't address beyond the first 4GiB of RAM. On x86, this zone would probably be covered by Normal zone
- Normal Zone: Anything above zone DMA and doesn't require kernel tricks to be addressable. Typically on x86, this is 16MiB to 896MiB. Many kernel operations require that the memory being used be from this zone
- Highmem Zone (x86 only): Anything above 896MiB.
Say we have just rebooted the machine and we have a free pool of 16MiB (DMA zone). The most sensible thing to do would be to have the this memory split into largest contiguous blocks available. The largest order is defined at compile time to 11 which means that the largest slice the buddy allocator has is 4MiB block (2^10 * page_size). so the 16 MiB DMA zone would initially split into 4 free blocks.
Here's how we'll service an allocation request for 72KiB:
- Round up the allocation request to the next power of 2 (128)
- Split a 4MiB chunk into two 2MiB chunks
- Split one 2 MiB chunk into two MiB chunks
- Continue splitting until we get a 128KiB chunk that we'll allocate.
Here's an example of an allocation failure from a Gentoo bug report.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
swapper: page allocation failure. order:4, mode:0x20 | |
Pid: 0, comm: swapper Not tainted 2.6.32-gentoo-r7 #1 | |
Call Trace: | |
<IRQ> [<ffffffff8107f082>] 0xffffffff8107f082 | |
[<ffffffff810a29a4>] 0xffffffff810a29a4 | |
[<ffffffff810a2b9f>] 0xffffffff810a2b9f | |
[<ffffffff810a2d35>] 0xffffffff810a2d35 | |
[<ffffffff810a2de3>] 0xffffffff810a2de3 | |
[<ffffffff810a2e4e>] 0xffffffff810a2e4e | |
[<ffffffff813ed961>] 0xffffffff813ed961 | |
[<ffffffff813ee21b>] 0xffffffff813ee21b | |
[<ffffffffa07d8499>] 0xffffffffa07d8499 | |
[<ffffffff813f4a3d>] 0xffffffff813f4a3d |
In such cases, the buddyinfo proc file will allow you to view the current fragmentation state of your memory.
Here's a quick python script that will make this data more digestible.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python | |
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 textwidth=79 autoindent | |
""" | |
Python source code | |
Last modified: 15 Feb 2014 - 13:38 | |
Last author: lmwangi at gmail com | |
Displays the available memory fragments | |
by querying /proc/buddyinfo | |
Example: | |
# python buddyinfo.py | |
""" | |
import optparse | |
import os | |
import re | |
from collections import defaultdict | |
import logging | |
class Logger: | |
def __init__(self, log_level): | |
self.log_level = log_level | |
def get_formatter(self): | |
return logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') | |
def get_handler(self): | |
return logging.StreamHandler() | |
def get_logger(self): | |
"""Returns a Logger instance for the specified module_name""" | |
logger = logging.getLogger('main') | |
logger.setLevel(self.log_level) | |
log_handler = self.get_handler() | |
log_handler.setFormatter(self.get_formatter()) | |
logger.addHandler(log_handler) | |
return logger | |
class BuddyInfo(object): | |
"""BuddyInfo DAO""" | |
def __init__(self, logger): | |
super(BuddyInfo, self).__init__() | |
self.log = logger | |
self.buddyinfo = self.load_buddyinfo() | |
def parse_line(self, line): | |
line = line.strip() | |
self.log.debug("Parsing line: %s" % line) | |
parsed_line = re.match("Node\s+(?P<numa_node>\d+).*zone\s+(?P<zone>\w+)\s+(?P<nr_free>.*)", line).groupdict() | |
self.log.debug("Parsed line: %s" % parsed_line) | |
return parsed_line | |
def read_buddyinfo(self): | |
buddyhash = defaultdict(list) | |
buddyinfo = open("/proc/buddyinfo").readlines() | |
for line in map(self.parse_line, buddyinfo): | |
numa_node = int(line["numa_node"]) | |
zone = line["zone"] | |
free_fragments = map(int, line["nr_free"].split()) | |
max_order = len(free_fragments) | |
fragment_sizes = self.get_order_sizes(max_order) | |
usage_in_bytes = [block[0] * block[1] for block in zip(free_fragments, fragment_sizes)] | |
buddyhash[numa_node].append({ | |
"zone": zone, | |
"nr_free": free_fragments, | |
"sz_fragment": fragment_sizes, | |
"usage": usage_in_bytes }) | |
return buddyhash | |
def load_buddyinfo(self): | |
buddyhash = self.read_buddyinfo() | |
self.log.info(buddyhash) | |
return buddyhash | |
def page_size(self): | |
return os.sysconf("SC_PAGE_SIZE") | |
def get_order_sizes(self, max_order): | |
return [self.page_size() * 2**order for order in range(0, max_order)] | |
def __str__(self): | |
ret_string = "" | |
width = 20 | |
for node in self.buddyinfo: | |
ret_string += "Node: %s\n" % node | |
for zoneinfo in self.buddyinfo.get(node): | |
ret_string += " Zone: %s\n" % zoneinfo.get("zone") | |
ret_string += " Free KiB in zone: %.2f\n" % (sum(zoneinfo.get("usage")) / (1024.0)) | |
ret_string += '\t{0:{align}{width}} {1:{align}{width}} {2:{align}{width}}\n'.format( | |
"Fragment size", "Free fragments", "Total available KiB", | |
width=width, | |
align="<") | |
for idx in range(len(zoneinfo.get("sz_fragment"))): | |
ret_string += '\t{order:{align}{width}} {nr:{align}{width}} {usage:{align}{width}}\n'.format( | |
width=width, | |
align="<", | |
order = zoneinfo.get("sz_fragment")[idx], | |
nr = zoneinfo.get("nr_free")[idx], | |
usage = zoneinfo.get("usage")[idx] / 1024.0) | |
return ret_string | |
def main(): | |
"""Main function. Called when this file is a shell script""" | |
usage = "usage: %prog [options]" | |
parser = optparse.OptionParser(usage) | |
parser.add_option("-s", "--size", dest="size", choices=["B","K","M"], | |
action="store", type="choice", help="Return results in bytes, kib, mib") | |
(options, args) = parser.parse_args() | |
logger = Logger(logging.DEBUG).get_logger() | |
logger.info("Starting....") | |
logger.info("Parsed options: %s" % options) | |
print logger | |
buddy = BuddyInfo(logger) | |
print buddy | |
if __name__ == '__main__': | |
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2014-03-15 23:34:32,352 - main - INFO - Starting.... | |
2014-03-15 23:34:32,352 - main - INFO - Parsed options: {'size': None} | |
<logging.Logger object at 0x1102cffd0> | |
2014-03-15 23:34:32,352 - main - DEBUG - Parsing line: Node 0, zone DMA 5 4 4 4 4 3 2 0 0 0 2 | |
2014-03-15 23:34:32,353 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'DMA', 'nr_free': '5 4 4 4 4 3 2 0 0 0 2'} | |
2014-03-15 23:34:32,353 - main - DEBUG - Parsing line: Node 0, zone DMA32 2 2 3 2 2 1 3 1 2 3 386 | |
2014-03-15 23:34:32,353 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'DMA32', 'nr_free': '2 2 3 2 2 1 3 1 2 3 386'} | |
2014-03-15 23:34:32,353 - main - DEBUG - Parsing line: Node 0, zone Normal 56332 12573 2212 959 448 179 35 5 0 0 35002 | |
2014-03-15 23:34:32,353 - main - DEBUG - Parsed line: {'numa_node': '0', 'zone': 'Normal', 'nr_free': '56332 12573 2212 959 448 179 35 5 0 0 35002'} | |
2014-03-15 23:34:32,375 - main - INFO - defaultdict(<type 'list'>, {0: [{'usage': [20480, 32768, 65536, 131072, 262144, 393216, 524288, 0, 0, 0, 8388608], 'nr_free': [5, 4, 4, 4, 4, 3, 2, 0, 0, 0, 2], 'zone': 'DMA', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}, {'usage': [8192, 16384, 49152, 65536, 131072, 131072, 786432, 524288, 2097152, 6291456, 1619001344], 'nr_free': [2, 2, 3, 2, 2, 1, 3, 1, 2, 3, 386], 'zone': 'DMA32', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}, {'usage': [230735872, 102998016, 36241408, 31424512, 29360128, 23461888, 9175040, 2621440, 0, 0, 146809028608], 'nr_free': [56332, 12573, 2212, 959, 448, 179, 35, 5, 0, 0, 35002], 'zone': 'Normal', 'sz_fragment': [4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304]}]}) | |
Node: 0 | |
Zone: DMA | |
Free KiB in zone: 9588.00 | |
Fragment size Free fragments Total available KiB | |
4096 5 20.0 | |
8192 4 32.0 | |
16384 4 64.0 | |
32768 4 128.0 | |
65536 4 256.0 | |
131072 3 384.0 | |
262144 2 512.0 | |
524288 0 0.0 | |
1048576 0 0.0 | |
2097152 0 0.0 | |
4194304 2 8192.0 | |
Zone: DMA32 | |
Free KiB in zone: 1590920.00 | |
Fragment size Free fragments Total available KiB | |
4096 2 8.0 | |
8192 2 16.0 | |
16384 3 48.0 | |
32768 2 64.0 | |
65536 2 128.0 | |
131072 1 128.0 | |
262144 3 768.0 | |
524288 1 512.0 | |
1048576 2 2048.0 | |
2097152 3 6144.0 | |
4194304 386 1581056.0 | |
Zone: Normal | |
Free KiB in zone: 143823288.00 | |
Fragment size Free fragments Total available KiB | |
4096 56332 225328.0 | |
8192 12573 100584.0 | |
16384 2212 35392.0 | |
32768 959 30688.0 | |
65536 448 28672.0 | |
131072 179 22912.0 | |
262144 35 8960.0 | |
524288 5 2560.0 | |
1048576 0 0.0 | |
2097152 0 0.0 | |
4194304 35002 143368192.0 |