
Cacti Graph Template and Script for NFS v4 Client

First draft: 14 of 35 data sources graphed

Over the summer we at xtendx moved to a new data center at Green.ch, and I set up a new two-node 9 TB DRBD + ext4 + NFSv4 NAS. I also set up a Cacti monitoring system to graph key metrics and alert us to problems. One thing the Cacti community has not yet gotten around to is the relatively new and shiny NFSv4 client and server.

To address that omission, I started on my first Cacti graph template project. What the Cacti community does have is an NFS v3 client graphing template, so I used this as a base. Here is my preliminary result:

Data acquisition is performed on the target machine by a simple Bash script (/usr/local/bin/cacti-nfs4.sh) that is executed by snmpd. The script:

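#!/bin/bash
# Print one NFSv4 client operation counter per line, in the order listed in $proc.
NFS=/proc/net/rpc/nfs
proc="read write commit open open_confirm open_named_att_dir \
  open_downgrade close set_attr fsinfo renew set_clientid confirm lock \
  lock_test unlock access get_attr lookup lookup_root remove rename \
  link symlink create pathconf statfs readlink readdir server_caps \
  delegreturn getacl setacl"

# Fields in the "proc4" line: 1 = "proc4", 2 = number of counters,
# 3 = the NULL procedure, so the first named operation (read) is field 4.
i=4

for a in $proc; do
#       echo -n "$a.value "     # uncomment to label each value while debugging
        grep '^proc4 ' "$NFS" \
                | cut -f "$i" -d ' '
        i=$((i + 1))
done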
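To see why the loop starts at field 4, it helps to look at the shape of the proc4 line. The sketch below pulls the Nth named counter out of a proc4 line passed as an argument, so the offsets can be checked without a live NFS mount. The helper name nfs4_counter and the sample line are hypothetical; the assumed layout (label, number of counters, NULL, then read, write, commit and so on) is the one implied by the script's i=4 starting point.

```shell
#!/bin/bash
# Hypothetical helper: print the Nth named NFSv4 client counter from a proc4 line.
# Assumed layout: "proc4 <count> <null> <read> <write> <commit> ..."
# so the first named operation (read, N=1) sits in field 4, matching i=4 in the script.
nfs4_counter() {
    local line=$1 n=$2
    # awk splits on runs of whitespace; field (n + 3) skips label, count and NULL.
    echo "$line" | awk -v f=$((n + 3)) '{ print $f }'
}

# Made-up sample line for illustration only (not real counters).
sample="proc4 35 0 10 20 30"
nfs4_counter "$sample" 1   # read
nfs4_counter "$sample" 2   # write
```

On a real client the same function could be fed `grep '^proc4 ' /proc/net/rpc/nfs` to spot-check one value against the script's output.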

To make snmpd call the script and report the result, I added the line below to /etc/snmp/snmpd.conf and then restarted the snmpd daemon. (Now is probably a good time to mention that this is all on Debian Lenny, so if you are on a different platform, YMMV.)

extend .1.3.6.1.4.1.2021.66 nfs_client /bin/sh /usr/local/bin/cacti-nfs4.sh

Once this is done, the configuration can be confirmed with snmpwalk, as shown below. The output should not only be present, but also match the output of cacti-nfs4.sh.

ballito:/home/stu# !snmpwalk
snmpwalk -v2c -c public  ballito.be .1.3.6.1.4.1.2021.66.4
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.1 = STRING: "1209434"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.2 = STRING: "46246622"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.3 = STRING: "46218746"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.4 = STRING: "52265481"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.5 = STRING: "69"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.6 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.7 = STRING: "2"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.8 = STRING: "52197430"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.9 = STRING: "46346973"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.10 = STRING: "2"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.11 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.12 = STRING: "1"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.13 = STRING: "1"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.14 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.15 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.16 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.17 = STRING: "3260450"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.18 = STRING: "42080094"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.19 = STRING: "47829475"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.20 = STRING: "1"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.21 = STRING: "2357"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.22 = STRING: "89"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.23 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.24 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.25 = STRING: "161"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.26 = STRING: "1"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.27 = STRING: "27924"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.28 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.29 = STRING: "46133"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.30 = STRING: "3"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.31 = STRING: "140977"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.32 = STRING: "0"
UCD-SNMP-MIB::ucdavis.66.4.1.2.10.110.102.115.95.99.108.105.101.110.116.33 = STRING: "0"
ballito:/home/stu# 
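The long numeric run in each OID above is the extend name encoded as an SNMP string index: its length, then the decimal ASCII code of each character, followed by the output line number. The sketch below reproduces that encoding for nfs_client; the helper name oid_index is hypothetical and it assumes plain ASCII names.

```shell
#!/bin/bash
# Hypothetical helper: encode a string as an SNMP OID index
# (length first, then the decimal ASCII code of each character).
oid_index() {
    local s=$1 out=${#1} i
    for (( i = 0; i < ${#s}; i++ )); do
        # printf '%d' "'c" prints the character code of c
        out+=".$(printf '%d' "'${s:i:1}")"
    done
    echo "$out"
}

oid_index nfs_client   # 10.110.102.115.95.99.108.105.101.110.116
```

Prefixing the result with .1.3.6.1.4.1.2021.66.4.1.2. and appending a line number (1 for read, 2 for write, and so on) gives the full OID of a single counter, matching the snmpwalk output above, so an individual value could also be fetched with snmpget.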

With the data being properly collected and reported by SNMP, it's now time to configure Cacti to record and graph it all. The kernel's NFS client statistics in /proc/net/rpc/nfs (the file the Bash script parses, and the same source nfsstat reads) report 35 different metrics for NFS v4. For my taste, that is way too many metrics for a single graph. I've defined an abridged Cacti graph template with 14 values for now, and intend to squeeze that down to 10 in the upcoming weeks. That said, the template includes all but one of the data source templates. In the end, I think having two graphs may make sense: an everything graph, and a "values of interest" graph.
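The script reports cumulative counters (operations since boot), so a graph only becomes meaningful once consecutive samples are turned into rates. RRDtool does this for COUNTER and DERIVE data sources; the sketch below shows the same arithmetic in shell, assuming the data sources are of one of those types and that Cacti polls at its default 300-second interval. The function name per_second and the sample values are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: per-second rate between two samples of a cumulative counter,
# roughly what RRDtool stores for a COUNTER/DERIVE data source.
# This sketch treats any decrease as a counter reset instead of handling wraparound.
per_second() {
    local prev=$1 curr=$2 interval=${3:-300}   # 300 s: assumed Cacti polling interval
    if (( curr < prev )); then
        echo "counter reset or wrapped; skipping sample" >&2
        return 1
    fi
    # Integer division here; RRDtool itself keeps fractional rates.
    echo $(( (curr - prev) / interval ))
}

# Illustrative values: 9000 write operations over one 5-minute poll.
per_second 46246622 46255622   # 30
```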

Here is the graph template itself: cacti_graph_template_ucdnet_-_nfs4_client.xml

Please feel free to let me know if you have any issues with it, have made improvements worth posting, or found it useful and want to buy me a beer!

Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part II

Initial DRBD Sync - 9TB in 4.5 days

After much blood, sweat and tears getting xtendx up and running in our new data center with Green.ch this summer, I finally had time to complete a major "high availability" aspect of our new file server platform. Our basic architecture is fairly simple: a couple of application servers backed by the network attached storage.

The network attached storage (NAS) consists of a pair of nearly identical self-built file servers (see Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part I). The next, more complex step was to configure them as a Primary + Secondary DRBD cluster using protocol C. In an effort to mitigate some risk and spread the workload over the course of the summer, I staged the entire installation:

  1. May: Build NAS 0 (thanks to my wife, Robyn, too!)
  2. Early June: Install NAS 0 in the data center, copy over production data
  3. Mid-June: Put NAS 0 into production as a simple ext4 + NFSv4 file server
  4. July: Build NAS 1 (again, thanks to my wife)
  5. July + August: Install NAS 1 in the data center, configure it as an HA NAS: DRBD + ext4 + NFSv4
  6. September: Copy data to NAS 1, put NAS 1 into production
  7. Late September: Configure NAS 0 as the secondary node in the HA NAS: DRBD + ext4 + NFSv4
  8. Early October: Initialize DRBD synchronization

Kicking off the block-level disk synchronization was a big deal. These servers are a running production system, and it is paramount that existing service delivery is not impacted. At first I left the DRBD sync rate unaltered, which I believe effectively means 'as fast as you can'. This quickly resulted in poor read times for production applications, so I turned it down dramatically with the drbdsetup command while the sync was in progress:

 sudo drbdsetup /dev/drbd1 syncer -r 24M

After fooling around with various rates, I settled on 24M. That is much lower than the rate the system could theoretically sustain, but maximum throughput is not the goal here. This production value is now also configured in /etc/drbd.d/global_common.conf:

common {
    #snip
    syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        rate 24M; 
    }
}

Interestingly, the synchronization rate does not match the disk IO measured by Cacti via SNMP; in fact, it is roughly half of it. (I have no idea why.) As you can see below, the synchronization of a single 9TB device took a solid 4.5 days to complete:

Primary DRBD Node Read and Write IO (MBps)
Bytes-Per-Second during initial DRBD synchronization by primary to secondary

Secondary DRBD Node Read and Write IO (MBps)
Bytes-Per-Second during initial DRBD synchronization by secondary from primary
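As a rough cross-check of the 4.5 days, assume the device is 9 TiB and that DRBD's "24M" rate means 24 MiB/s (binary units; both are assumptions, and the effective resync rate can drift from the configured cap). The transfer time is then:

```latex
t \approx \frac{9 \times 2^{40}\ \mathrm{B}}{24 \times 2^{20}\ \mathrm{B/s}}
  = \frac{9 \times 2^{20}}{24}\ \mathrm{s}
  = 393\,216\ \mathrm{s}
  \approx 4.55\ \text{days}
```

With decimal units (9 × 10^12 B at 24 × 10^6 B/s) the same estimate comes out at 375 000 s, about 4.3 days. Either way it lands close to the observed 4.5 days, which suggests the resync ran near the configured 24M cap for most of that time.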

A bonus of doing this work on a production system is that the DRBD sync served as a real-world production load test. I now know with a respectable degree of certainty what our NAS is capable of, and when it is approaching its limits.

The series:
Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part I
Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part II
Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part III

MacBook Pro GPU Overheating

Spooky...

My mid-2008 MBP's nVidia GeForce 8600M GT GPU overheated yesterday while connected to a second monitor. Yes, it has the nVidia chipset that apparently has issues. No, according to Data Quest in Zurich it is not covered by the recall warranty extension, and they charged me SFr80 for the privilege of learning this.

2008 MacBook Pro overheating due to GPU

Spooky!

After letting it cool down for 30 minutes, it started back up with no problems. But today, one day later, I saw some early flickering. Its days may be numbered...

Update: Yup, it died the next day. So I bought a new MBP and swapped in the Intel SSD in 60 seconds flat!