For the sake of discussion, if we are given a 64GB of RAM, at max, how many giga byte of physical memory (RAM) can I allocate to my data, keeping in mind that the fewer I allocate, the more boxes I would require to split the data gallery.
As a test, I wrote a small app to do what I described above. As it loaded around 19 GB of data into the RAM, the kernel started using swap. After loading 25 GB, the swap usage exponentially increased and at one point in time, it stopped responding and I had to physically bounce the box (plug off and plug in again). This din't make sense to me at the beginning.
At first, why would the kernel swap if there is enough RAM? I did a man proc and searched for "swap". I happened to read about /proc/sys/vm/swappiness - a parameter which defines the kernel's tendency to swap. The default value of swappiness on RHEL5 is 60. As the "used" RAM size reaches 60% of the total RAM size, the kernel would being to swap.
In my case, 60% of 64 GB is 38.4 GB. But my data size was 19 GB when the kernel started to swap. Where did the remaining 19GB go, eventually leaving my box in a non-responsive state?! Again intriguing. I could not find the relation between the data size and the memory required to store it.
Few more runs and a close memory monitoring showed that the kernel caches all the data (yes, almost all the data that are used very recently). Thus, if an application has loaded 2GB of data into the memory, the kernel would cache 4GB. 2 GB for the actual shared memory data and 2GB of unused cache using which the data was read/copied into the shared memory. On a typical server environment (runlevel 3), you wouldn't expect this to happen, since apart from the main apps no other applications will be running (like yum-updatesd, vlc, rhythmbox, etc). One would expect the kernel to drop the unused cache immediately. proc man page again showed one other important parameter - /proc/sys/vm/drop_caches. This entry point is helpful in instructing the kernel to drop the unused cache.
To free pagecache:
# echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
# echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
# echo 3 > /proc/sys/vm/drop_caches
# echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
# echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
# echo 3 > /proc/sys/vm/drop_caches
Mind the space between before the > symbol.
When an application loads all the data into the memory during its initialization and then never tends to read the disk, drop_caches is a real boon if you can't change the code (in my case, at the data center). In my case above, I was able to load 60 GB of data into the shared memory and share it with the other processes. The technique was to clear the cache frequently as the application initialized.
while :; do
echo 3 > /proc/sys/vm/drop_caches
sleep 30
done
As a thumb rule, swappiness must be set to 0 (echo 0 > /proc/sys/vm/swappiness or via sysctl.conf) before the application starts and the drop_caches must be set to 3 periodically to avoid any kind of swaps and performance degradations.
Once the app has been initialized and all the 60GB has been loaded into the memory, the while loop to drop the unused cache is not needed and it can be terminated safely. But the moment you do a huge file read, don't forget to run the script in the background, of course, as root. The need to drop_caches entirely depends on your application. Setting swappiness to 0 is ideal in my opinion for all the server environments.
The code fix is to use posix_fadvice with the POSIX_FADV_NOREUSE option since we will use the data in memory during the application initialization.
EDIT:
Here is the vmstat usage for the below content, when the swappiness was set to 60 and drop_caches was not triggered (by default its value will be 0): http://codepad.org/6pXz0UOH
18:46:22 ~/MySpace/files-to-load# ls -l
total 4198384
-rw-r--r-- 1 root root 1073731945 Jan 2 18:39 7
-rw-r--r-- 1 root root 1073731945 Jan 2 18:39 8
-rw-r--r-- 1 root root 1073731945 Jan 2 18:40 9
-rw-r--r-- 1 root root 1073731945 Jan 2 18:40 10
18:46:22 ~/MySpace/files-to-load#
total 4198384
-rw-r--r-- 1 root root 1073731945 Jan 2 18:39 7
-rw-r--r-- 1 root root 1073731945 Jan 2 18:39 8
-rw-r--r-- 1 root root 1073731945 Jan 2 18:40 9
-rw-r--r-- 1 root root 1073731945 Jan 2 18:40 10
18:46:22 ~/MySpace/files-to-load#
For the same content, vmstat log when swappiness is set to 0 and frequent echo 3 on drop_caches: http://codepad.org/9bNwcdbW