6-Port LACP Cacti Template

Or, ‘How to monitor LACP device participation levels’

With our HP 2910al, we have a few streaming machines wired with multiport LACP ethernet bonds. Early on with our LACP setup we realized that we needed to monitor the traffic distribution of the individual members in the trunk. Why?

  • Configuring LACP, especially with older Linux distributions, can be difficult, and misconfigurations don’t generate errors or other obvious warnings.
  • Some hardware, both NICs and switches, does not support the more advanced LACP modes.
  • To ensure no single member of the trunk was being maxed out. A saturated member could hide performance problems from us, since the aggregate trunk would still appear to have spare capacity.

To that end we created a template in Cacti. (“We” actually means my colleague Roland did the real work, figuring out how to create custom Cacti graphing templates!) It very clearly shows us the distribution of traffic across all members of the trunk.

Kinda groovy, eh?
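
Under the hood there is nothing exotic: Cacti’s interface traffic graphs are built from the standard per-interface octet counters polled from the switch via SNMP. If you want to sanity-check the raw data the templates will graph, something like the following dumps the 64-bit traffic counters (the hostname and community string here are placeholders):

snmpwalk -v2c -c public switch.example.com IF-MIB::ifHCInOctets
snmpwalk -v2c -c public switch.example.com IF-MIB::ifHCOutOctets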

Here are the graph template files for Cacti. They were saved from v0.8.8a:

  • cacti_graph_template_interface_-_traffic_-_inbound_6_trunks.xml
  • cacti_graph_template_interface_-_traffic_-_outbound_6_trunks.xml

Bon ap’!

How to configure Ethernet Bonding with LACP on Debian Lenny Linux

10GbE is still pricey. But you can get a 4 Gbps connection inexpensively!

Quick update: I built a Debian Squeeze box recently and the same steps initially worked for me, but problems developed afterwards. Specifically, the net-snmp agent (snmpd) would take up more and more CPU over time. A “process leak”, if you will. After months of monitoring and diagnostics we found a solution which involves a different NIC configuration.

So you need a fat network connection on your Debian 5 server, but a 10GbE infrastructure is not in your budget? No fear! Link Aggregation Control Protocol is here to save the day!

This was both the problem I had, and the solution I learned to implement thanks to a question posted on serverfault.com (Multiplexed 1 Gbps Ethernet?) and some hours of research & experimentation.

Here’s what I did:

  • Purchase a NIC capable of LACP with solid Linux driver support. I went with an Intel PRO/1000 PT Quad Port Server Adapter.
  • Purchase a switch capable of LACP. It should be more than capable of handling the bandwidth. We opted for an HP ProCurve 2510G-24.
  • Install ifenslave-2.6: apt-get install ifenslave-2.6
  • Purchase some network cables. Cat 6 if you can, Cat 5e if not.
  • Edit /etc/modules and add the line bonding mode=4 miimon=100 max_bonds=2. This will load the module with these options at boot time from now on. Here mode=4 selects 802.3ad (LACP), miimon=100 checks link state every 100 ms, and max_bonds is the number of bonding devices your system will have (the default is 1).
  • Load the module so we can proceed: modprobe bonding. There should be no errors. Confirm it is loaded with lsmod | grep bonding
  • Edit /etc/network/interfaces to look something like this:
    # The loopback network interface
    auto lo
    iface lo inet loopback
    
    auto  bond0
    iface bond0 inet static
        address     192.168.1.131
        gateway     192.168.1.1
        broadcast   192.168.1.255
        netmask     255.255.255.0
        up   /sbin/ifenslave    bond0 eth0 eth1 eth2 eth3
        down /sbin/ifenslave -d bond0 eth0 eth1 eth2 eth3
    
    #similar configuration for bond1
    
  • Add to /etc/modprobe.d/arch/X86_64.conf something like this for a 64-bit installation:
    
    alias bond0 bonding
    alias bond1 bonding
    
  • Plug in the network cables

  • Restart the network: /etc/init.d/networking restart
  • Configure the switch to use the four ports as a single LACP trunk. Here’s how for an HP ProCurve (see the sketch after this list):

  • Confirm that there are no errors
    1. The NIC and switch LEDs should be active
    2. dmesg will show what the kernel thinks of your new configuration
    3. ifconfig should show your bond0 interface up
    4. The interface should be pingable both internally and from other computers (see also the bond status check below)
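
For the switch side, the exact commands depend on your model and firmware, but on ProCurve gear a dynamic LACP trunk is essentially a one-liner from the CLI. Treat the port range and trunk name below as illustrative:

# ProCurve CLI, configuration mode; ports 1-4 are an example
trunk 1-4 trk1 lacp
show lacp
show trunks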

You should be good to go at this point!
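
One more check I find useful, assuming a reasonably recent bonding driver: the kernel exposes the live state of the bond, including which slaves have joined the 802.3ad aggregate, under /proc:

cat /proc/net/bonding/bond0

For a healthy LACP bond the output should report the 802.3ad mode, list all four slave interfaces as up, and show them sharing the same aggregator ID.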

Discreet Windows 7 Workstation for the Living Room

Small, discreet, usable Windows 7 Workstation Project

We have had a need for a small, out-of-the-way computer for some time. My wife’s netbook runs Ubuntu, but she misses some of the Windows-based applications, like the software updater for her Android phone. And the netbook can be painfully slow at times. We also had a list of other requirements, beyond running Windows:

  • Small footprint: our flat is very small, so we didn’t want to waste space in our living area
  • Quiet: The sound of a fan running while we’re not using the machine would be irritating
  • Low power: While our electricity is inclusive, efficiency is important for an always-on machine
  • Low key: My wife and I have an almost pathological desire to hide all of the technology in our apartment
  • Quick: We are both impatient with slow user interfaces

With this in mind, I set out to build a family workstation. After some research and consideration, the parts list below came together:

  • Mobo: Intel D525MW, Atom D525 Dual Core, Mini-ITX (SFr.75)
  • Case: Jou Jye NU-057Wi, Mini-ITX, Vesa Halterung, 250 Watt (SFr.104)
  • System Drive: Intel SSD 320 Series 40GB, SATA-II, 2.5 Zoll (SFr.99)
  • Data Drive: Samsung HN-M101MBB, M8, 5400rpm, 8MB, 2.5 Zoll, 1TB, SATA (SFr.99)
  • Memory: Kingston ValueRAM, 2 GB, DDR3-1066, CL7, SODIMM (SFr.32)
  • OS: Microsoft Windows 7 Home Premium 32bit, OEM (SFr.94)

Total damage? ~SFr.510.

The Intel Atom motherboard was selected for both its low cost and its passively cooled CPU heat sink. While the Jou Jye case is not cheap, it did offer the most interesting option for reducing the technical clutter in our living space. Even with the relatively slow Atom CPU, the 4GB of RAM and SSD system drive make this puppy feel as fast as any workhorse system…as long as you are not running FFmpeg or any other CPU-intensive application.

Matching this new kit with existing keyboard, mouse, and monitor gives us a working system.

  • Keyboard & Mouse: A Logitech keyboard that has been lying around for ages, and a company mouse from xtendx
  • Monitor: A 19″ NEC 1980SX (1280×1024) which I bought off ebay.co.uk in 2003 for £400. It was a professional photography display back then, so it still looks better than average even today.

Add in some time down at the local hardware store, a jigsaw, and a bit of elbow grease…and presto! A nice and tidy workstation. The desk is 40cm x 75cm at its widest, with a footprint of under a third of a square meter. The computer case itself is built into the lower shelf. All of the cords are secured to the underside of the shelf, including the power strip and Ethernet cable.

Sweet.

Bandwidth testing of LACP bonding link in Linux with iperf

Validating our multi-channel ethernet teams on Debian Lenny & Ubuntu Lucid Lynx

Over the past two months my company (xtendx AG) moved our servers to a new data center at green.ch. One of the primary motivations for this move was to gain access to a multi-gigabit-per-second Internet link. Each of our production streaming servers has either a 2- or 4-channel Ethernet bonding configuration with LACP. Once they were configured, I set out to test their capacity and validate the entire design and configuration.

To that end, I installed iperf onto each of our servers. One box was configured as a server, and two others were configured as clients. Because of the method LACP uses to split up traffic, it is difficult (if not impossible) to set up a single server-to-server stream that uses more than 1 Gbps. Generally, LACP achieves aggregate speeds above that of a single link by distributing the individual client-server flows across the member links, keyed on the client-server address pair.

Our setup here is fairly simple: three servers, each with either a 2-channel or 4-channel LACP Ethernet bonding setup, connected to a lone HP ProCurve 2510G-24 switch. The switch was manually configured to place the ports into a dynamic LACP trunk. On the servers, the bonds themselves are configured with the kernel module parameters mode=4 miimon=100 max_bonds=4 xmit_hash_policy=1 (see the sketch below).
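
For reference, this is roughly what those parameters look like if you load them from /etc/modules, as in my earlier bonding how-to; xmit_hash_policy=1 corresponds to the layer3+4 policy, which hashes on IP addresses and ports rather than just MAC addresses. Treat the exact layout as illustrative:

# /etc/modules (one line: module name followed by its options)
bonding mode=4 miimon=100 max_bonds=4 xmit_hash_policy=1

# confirm what the driver actually picked up
grep -iE "mode|hash" /proc/net/bonding/bond0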

Running iperf in server mode. Note that iperf uses port 5001 by default, so adjust your firewalling solution if necessary.

iperf -s -i 2

Running iperf in client mode. This was done on two physically separate machines.

cat /dev/zero |  iperf -c svr.example.com -t 2400 -i 2
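
Since xmit_hash_policy=1 hashes on TCP/UDP ports as well as addresses, a single client may also be able to exercise more than one member link by running several parallel streams. Whether it actually does depends on how the individual flows hash, so treat this as a bonus test rather than a guarantee:

iperf -c svr.example.com -P 4 -t 600 -i 2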

Yup, looks good. There was the possibility that both clients would have come in on the same link, because the decision about which channel to use is based upon the source and destination addresses. That is by design, so don’t fret! Simply using a different machine for one of the clients would resolve the issue.

Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part III

‘Mistakes Were Made’

Six months after starting this project, I received an email from Ryan Ellis asking me if I had any tips regarding this NAS build.

Just read your article on the NAS. Amazing job. I would like to replicate what you did. Any tips?

Well, I most definitely do have a list of things I’d have done differently! Nothing too dramatic, but there are some decisions that were overkill and some where I went a little too low-end.

  • Using Intel and Asus motherboards: The Intel DP55KG with the P55 Express chipset does not play well with Ubuntu 10.04 LTS, or apparently with any Linux distro for that matter. Specifically, the NAS box with the Intel motherboard is unable to do a soft reboot, which means every reboot requires my physical presence in the data center. This has been a known problem for a while, but it did not turn up during my mobo research. Many folks have tried various kernel options (different reboot= settings, for example) to change the rebooting behavior, with mixed success; I’ve not been able to resolve the issue. When building up the NAS box I told myself that the Linux community would eventually fix it. Maybe it has, but now that we are in production I can’t really experiment with the server.

    Lesson Learned: If the mobo is not working perfectly for you, then find another. It’s too painful to revisit once in production.

  • Not using “server grade” motherboards: Linux is unable to monitor things like fan speed and temperature on the Asus and Intel motherboards, which I’d like to be graphing in Cacti. This is apparently possible with the budget “server grade” motherboards from the likes of SuperMicro.


    Lesson Learned: It only saved us SFr.400-800 to use these performance desktop motherboards, but our ability to proactively monitor fans is lost. I wish I’d gone for a SuperMicro motherboard.
  • The network load is much lower than I had realized, so the Intel quad-port NICs are overkill: not even 100 Mbps at peak usage! This is apparently due to the client-side file cache on our client machines. It was difficult to predict on our old system because we were running with direct-attached storage. In hindsight I wish I’d done more research. The two Intel PRO/1000 PT Quad Port Server Adapters could have been single-port NICs, saving us SFr.800 total.


    Lesson Learned: Try to accurately measure and predict how much network traffic you’ll see. Did I really need four-port NIC bonding? Not even close.
  • I didn’t pay enough attention to adapter-to-drive cabling. The LSI 3ware 9650SE-ML16 card came with 1-to-4 Multilane-to-SATA breakout cables, but the SuperMicro SuperChassis 836A-R1200B came with a backplane with four Multilane ports. That meant sourcing four CBL-SFF8087-05M Multilane-to-Multilane cables at extra cost. And when I did get them, two were ~10cm shorter than I would have preferred; the cables are currently a bit tight and cannot be moved without loosening the connection. We probably spent another SFr100 on extra cabling.

    Lesson Learned: At least think about device-to-device cabling beforehand, and don’t leave it until the build.

  • RAID 1+0 may have been overkill; RAID 6 performance would probably have sufficed. Our production metrics seem to indicate that we run, conservatively, at no more than 33-40% of capacity at peak, and the vast bulk of our NAS activity is reads. RAID 6 probably would have been a safe choice, and it would have reduced the number of hard drives by 6 in total (3 on each server), which would also have allowed us to use a smaller chassis. Total savings would have been SFr1700-2000, a non-trivial amount.

    My wife, Robyn, helping me build up one of the NAS servers

    That said, we would have been reducing our margin for error and our room for future growth (there are currently two empty drive bays on each server), and not allowing for changes in application behavior that would result in more writes. (RAID 6 is great for read-heavy applications like ours, but has much weaker write performance characteristics.)
  • I did not appreciate how little I understood DRBD, or block-level replication for that matter. This resulted in taking poorly understood actions on production data. In hindsight, it would have been wise to set up a test environment on the side (Amazon EC2, some old kit, whatever) and experiment; see the sketch after this list. If I had made a mistake, we would have had to invoke our disaster recovery procedures, which are time consuming and would have resulted in non-trivial downtime.

    Lesson Learned: If it works like magic, then you don’t have a clue how it works. For something as fundamental as DRBD is to a redundant NAS system, one should make decisions deliberately, with intent.

  • Setting up the monitoring was significantly more work than I had predicted. While our Cacti + SNMP setup is very powerful, it is not easy to get going for anything but very common metrics. Specifically, configuring important alerts for things like drive failures, or graphs of NFSv4 metrics, has been a considerable amount of work. In fact, I’ve had to come up with my own NFSv4 Cacti template which, to my surprise, did not exist.

  • These boxes are heavy. Like in the 30kg region. Installing them into the rack alone, even with the assistance of a foot-actuated hydraulic lift, was difficult and borderline dangerous. Managing to get the rails aligned correctly was very challenging.

    60kg of Network Attached Storage

    Lesson Learned: Don’t rack anything heavier than a switch by yourself.

  • WD Green versus WD RE4 drives: We could probably have used cheaper WD Green drives instead of the RE4 series “Enterprise Hard Drives”. Ryan Shrout and Patrick Norton talk about the apparent fallacy that WD Green drives are not suitable for a NAS in Episode #95 of This Week in Computer Hardware. The cost savings are huge. Currently at Digitec.ch, where we bought our drives, a WD Caviar RE4 2TB runs SFr255 and a WD Caviar Green SFr109, a saving of SFr146 per drive. With the 22 data drives in our build, that works out to SFr3,212! And we could have saved an additional ~SFr168 on the operating system drives too.
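
    Back to the DRBD point above: a throwaway test bed does not need much. Two small machines (or VMs), a spare partition on each, and a minimal resource file are enough to practice failover and resyncs without touching production data. The hostnames, devices, and addresses below are purely illustrative placeholders, and the syntax follows the DRBD 8.3 series that Ubuntu 10.04 ships:

    # /etc/drbd.d/r0.res on both test nodes (hypothetical names and devices)
    resource r0 {
      protocol C;
      on testnas1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on testnas2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

    # on both nodes:
    drbdadm create-md r0
    drbdadm up r0
    # on whichever node should become primary for the initial sync:
    drbdadm -- --overwrite-data-of-peer primary r0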

All of that said, we are in production and everything works. More dramatically, after a mere six months this project has already produced a positive return on investment when accounting for hardware costs alone. Factor in the time I spent on the project, 60-80 hours, and we will be in the black some time in Q1 2011. Not bad. (This self-built approach was taken in favor of outsourcing our storage to our hosting company’s shared NetApp NAS at a per-TB/month rate.) It has also been a wildly educational experience and forced me to understand my application even better than before.

Building up & testing a new 9TB SATA RAID10 NFSv4 NAS, part I

Plus, a glamour shot of the server itself!

Over the past few weeks I’ve been building up a new data center for my employer, xtendx AG. One of the core tasks has been to design, assemble and install a new storage system. To that end, I’ve put together the below NFSv4 Network Attached Storage (NAS) system.

  • 3U 16-bay Chassis: SuperMicro SuperChassis 836A-R1200B
  • LGA1156 Motherboard: Intel DP55KG w/ P55 Express chipset
  • 2.67 GHz CPU: Intel Core i5 750
  • 8GB DDR3 1333MHz RAM
  • 3x 250 GB Hard Drives for OS: WD RE3 250GB WD2502ABYS
  • 11x 2TB Hard Drives for Data: WD RE4 2TB WD2003FYYS
  • RAID controller: LSI 3ware 9650SE-ML16
  • Quad Port server NIC: Intel PRO/1000 PT Quad Port Server Adapter
  • Operating System: Ubuntu 10.04 LTS

Both the data and system volumes are in a RAID 10 configuration, with a hot spare for the OS already present. I’ll be adding one or two hot spare 2TB drives in a few weeks.

Ubuntu is installed and mostly configured. Over this long Ascension Day weekend, I’ve got IOzone running some benchmarks. Once those are complete I’ll graph and post the data. Below is a sample graph from my first test IOzone run with iozone -Ra -g 64G -n 8B -z -b out.wks.

The main reason behind these benchmarks is not the pretty graphs, which I do love. What we are looking for is a comparison with our existing server infrastructure, to ensure the new system will hold up under the load of our on-demand streaming servers.

Once the NAS box is up in the server room, we’ll perform additional configuration, tests, and tuning. E.g.: perform these benchmarks over the network from the client machines (a sketch of that follows below); perform them again once the NFS clustering with drbd is set up with a 2nd NAS box. And then we go into production!
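
For the over-the-network round, the idea is simply to run the same IOzone workload against a file that lives on the NFSv4 mount rather than on local disk. The export path and mount point here are placeholders:

# on a client machine (hypothetical export and mount point)
mount -t nfs4 nas1:/export /mnt/nas
iozone -Ra -g 8G -z -f /mnt/nas/iozone.tmp -b nfs-client.wks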

There is still much work to be done. Once all the raw data is collected and analyzed, I’ll be posting here again for your reading pleasure.

Get more characters onto your Debian Squeeze text mode console

Improved utilization of terminal with Grub 2 + Debian 6 by increasing the screen resolution at boot

In the name of “Keep it simple, stupid!” I don’t install the X Window System onto my Debian Lenny and Squeeze servers. (And hence no GNOME, no KDE.) What I do change is the console resolution, from the default 640×480 to 1024×768, which any modern monitor can support. This has several advantages:

  • The boot sequence and commands like ls -l will show 57% more lines: 47 vs. 30
  • Commands like ps aux won’t wrap as frequently
  • The huge fonts in text-mode 640×480 are just plain ugly
  • Increases the overall number of characters displayed on a single screen by ~2.5x

Step one: Set the resolution for Grub 2. Edit /etc/default/grub with your favorite editor. Uncomment the GRUB_GFXMODE line and change the value to 1024x768.

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
GRUB_GFXMODE=1024x768

Step two: Have the resolution passed to the kernel when it is loaded. Find the block below, around line 126 in /etc/grub.d/00_header, and add the set gfxpayload=keep line. I’ve also commented the edit, but that is optional.

if loadfont `make_system_path_relative_to_its_root "${GRUB_FONT_PATH}"` ; then
  set gfxmode=${GRUB_GFXMODE}
  #keep the gfx mode set in grub (stu)
  set gfxpayload=keep
  load_video
  insmod gfxterm
fi

Step three: Generate a grub2 config file based on the template edits above.

root@zimbali:~# update-grub
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-2.6.32-5-amd64
Found initrd image: /boot/initrd.img-2.6.32-5-amd64
done

And that’s it! On the next boot the resolution will be greater and the text terminal will display many more characters.
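
A quick way to confirm the change after rebooting is to ask the terminal for its geometry. With a standard 8×16 console font at 1024×768 you should see something in the neighborhood of 48 rows by 128 columns; the exact numbers depend on the font in use:

stty size
# prints rows and columns, e.g.: 48 128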

Maximum number of threads in a JVM

Out of curiosity, I thought it would be interesting to know the maximum number of threads I could squeeze out of a JVM. From an old thread on Stack Overflow, I knew that different JVM implementations had different limits. So, I whipped up this little Java program and ran it on various platforms.

import java.util.concurrent.atomic.AtomicInteger;

public class ThreadCounter extends Thread {
    // Shared counter; the last value printed before the JVM gives up is the thread limit.
    private static final AtomicInteger count = new AtomicInteger();

    public static void main(String[] args) {
        // Keep spawning threads until the JVM fails with an OutOfMemoryError
        // ("unable to create new native thread").
        while (true)
            (new ThreadCounter()).start();
    }

    @Override
    public void run() {
        System.out.println(count.incrementAndGet());

        // Park the thread so it stays alive and keeps counting against the limit.
        while (true)
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                break;
            }
    }
}

Compile and run at the command line: 

  javac ThreadCounter.java
  java  -cp . ThreadCounter

The results:

Hardware               OS                    JVM                       Count
2010 i5 MBP, 4GB       OS X 10.6             Apple 1.6.0_22-b04-307     2540
Xeon 5120, 4GB         Debian Lenny 64-bit   Sun 1.6.0_12-b04          31618
EC2 m1.small, 1.7GB    Ubuntu 10.04 32-bit   OpenJDK IcedTea6 1.8.2     7452
Atom 330, 2GB          Windows XP SP3        Sun 1.6.0_27-b07           5165
EC2 c1.medium, 1.7GB   Ubuntu 10.04 32-bit   OpenJDK IcedTea6 1.8       7029
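
One knob worth mentioning: on most of these platforms the ceiling is driven largely by per-thread stack size and available address space (especially on the 32-bit JVMs), so shrinking the default stack should in theory let you squeeze out more threads. I have not re-run the whole matrix this way, but the flag looks like this:

  java -Xss256k -cp . ThreadCounter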

Interesting…

Stack Overflow: Badge Analysis Over Time

The 87/18 Rule Applied to Stack Overflow Badges as Awarded Over the Past Nine Months

Another day, and another Stack Overflow database dump XML to play with. Some quick statistics from the badges.xml file (a quick way to reproduce these counts is sketched after the list):

  • 62 distinct badges
  • 239,005 user badges awarded
  • 49,261 users have received at least one badge
  • The Top 11 badges (of 62, making 18% of distinct badges) make up 87% of badges awarded
    • Teacher (13.1% of all badges awarded)
    • Student (12.4%)
    • Supporter (10.6%)
    • Scholar (10.1%)
    • Editor (9.8%)
    • Nice Answer (9.6%)
    • Autobiographer (5.3%)
    • Critic (4.8%)
    • Commentator (4.1%)
    • Popular Question (3.6%)
    • Organizer (3.2%)

Sorta interesting, but it is worth noting that most of these are handed out like parking tickets in Venice Beach. What was more interesting were the anomalies that only become visible when graphed over time. The graph below shows each of the top eleven badges awarded over time, as a percentage of all top-eleven badges awarded. This is better at showing relative increases or decreases in badge-awarding events.

(Graph: the top eleven badges awarded over time, as a percentage of the top eleven)

A few things caught my eye:

  • Beta Days: Things are pretty erratic in the beta days, but that is to be expected with a significantly smaller user base who were actively trying features out as they came online.
  • Days Long Outage: There is a days-long gap in the data in mid-April.  No badges were handed out for about four or five days, but they were eventually awarded when the problem was fixed.  I did not see a mention of any failures on  blog.stackoverflow.com, so the cause of this outage is a mystery to me.
  • Drastic drop in new Organizer badges: Once the outage was resolved, the relative number of Organizer badges awarded drops permanently by two-thirds!  Clearly an Illuminati conspiracy to keep us SOpedians down.
    • ~28 Organizer badges are awarded per day for the three weeks prior to the outage
    • ~8 Organizer badges are awarded per day for the following three weeks

    (UPDATE: Geoff Dalgas, a coder at Stack Overflow, posts on The (unofficial) StackOverflow meta-Discussion Forum that the reason for this behavior is a database refactoring that allowed them to distinguish between question edits and tag edits.)

  • Number of Popular Question badges awarded daily grows over time: It starts out near zero and grows over time to be a considerable fraction of the total. I guess this is to be expected, as questions pick up more and more views over time.
  • No Popular Question badges awarded for 27 May: And, unlike the above outage, they do not seem to have been awarded retroactively. The missing badges show in both the absolute graph (not shown, trust me) and the relative graph (close-up below).

    (UPDATE: I asked about this anomaly on the new Meta Stack Overflow site (No Popular Question badges awarded for 27 May?), and apparently (to be confirmed) the system’s view counter was down that day…so no questions incremented over the threshold for a badge.)