Build your own Drobo-Replacement based on ZFS

The Drobo hype

I saw the first Drobo presentation video on YouTube almost two years ago. Ever since, I have been longing to get one, but 490,- € for an empty box without a single hard drive was too much in my opinion.

As a regular podcast listener I heard about the sexiness of the Drobo everywhere: MacBreak Weekly mentions it in every episode, the guys at BitsUndSo are collecting them like crazy, and finally Tim Pritlove got one too. But then he mentioned his strange difficulties with the Drobo on his podcast Mobile Macs, which made me think a little harder about the subject.

The Drobo has in my opinion some major drawbacks:

  • it is not aware of the hosted data or the filesystem on top of it, so when resilvering it has to copy every block of a hard drive, including the ones that contain nothing but noise
  • you have to set an upper limit for the hosted filesystem, because the Drobo presents itself to the host machine as a single physical drive of a fixed size
  • you cannot use all the disk space if you use drives with different sizes
  • it is limited to a maximum of 4 drives
  • if your Drobo fails you cannot access your data

We can do better!

So what would I want from my dream backup device?

  • it should not have an upper storage limit
  • it should be able to heal itself silently
  • it should be network accessible (especially for Macs)
  • the drives should also work when connected to other hardware
  • it should be usable as a Time Machine target
  • it HAS to be cheaper than a Drobo

It doesn't have to be beautiful or silent, since I want to put it in my storage room and forget about it.

Initial thoughts

In my opinion the most modern and future-proof filesystem at the moment is ZFS. Ever since listening to Tim Pritlove's very good (German) Chaos Radio Express episode CRE049 about ZFS, I have wanted to use it. Unfortunately Apple is very lazy with its ZFS plans: Leopard only includes read-only ZFS support, and the plans for Snow Leopard are very vague. So Mac OS X is no option at the moment, and since I want the whole thing to be cheap, Mac hardware is no option either.

FreeBSD seems to ship a recent version of ZFS, but I gave OpenSolaris a try: since ZFS is developed by Sun, I expect Solaris to be the first OS where new ZFS features appear. Bleeding edge is always best 😉 so I looked further into this setup.

Test driving OpenSolaris

Before using real hardware I wanted to investigate some more and test drive the whole thing in a virtual machine. After downloading the current OpenSolaris 2008.11 ISO image I tried to install it on my Mac with VMware Fusion, but at that time I didn't know that OpenSolaris should simply be set up as a Solaris guest, so I had difficulties configuring the VM properly.

So I tried VirtualBox instead and hoped that, since it is now owned by Sun, it would virtualize Solaris like a charm. I set up a new VM with one boot disk and four RAID disks.

[Screenshot: virtualbox-opensolaris]

I switched on all the fancy CPU extensions. There is only one ISO image for both x86 and x64, so I enabled x64 in the VM and it automatically used the 64-bit kernel. The hardware recommendations for ZFS say that it works best with 64 bit and at least 1 GB of RAM.

Installing OpenSolaris from the live system worked very well. I was pleasantly surprised by the polished look thanks to Gnome (although I am a KDE fanboy). ZFS is now the default boot filesystem on OpenSolaris, so I didn't have to do anything to activate it.

When the installation was complete and the system was up and running, I made a little tweak to get SSH access to the VM. Since it uses NAT networking, I set up a port forward from port 2222 on my Mac to port 22 in the guest. I edited the XML file of my virtual machine (~/Library/VirtualBox/Machines/Open Solaris/Open Solaris.xml in my case) and inserted the following lines into the ExtraData section:

      <ExtraDataItem name="VBoxInternal/Devices/e1000/0/LUN#0/Config/ssh/HostPort" value="2222"/>
      <ExtraDataItem name="VBoxInternal/Devices/e1000/0/LUN#0/Config/ssh/GuestPort" value="22"/>
      <ExtraDataItem name="VBoxInternal/Devices/e1000/0/LUN#0/Config/ssh/Protocol" value="TCP"/>

After starting the VM I could connect to it with "ssh -l <user> -p 2222 localhost".
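
The same ExtraData keys can also be set without hand-editing the XML, using VBoxManage on the Mac while the VM is powered off. A sketch, reusing the VM name from above:

$ VBoxManage setextradata "Open Solaris" "VBoxInternal/Devices/e1000/0/LUN#0/Config/ssh/HostPort" 2222
$ VBoxManage setextradata "Open Solaris" "VBoxInternal/Devices/e1000/0/LUN#0/Config/ssh/GuestPort" 22
$ VBoxManage setextradata "Open Solaris" "VBoxInternal/Devices/e1000/0/LUN#0/Config/ssh/Protocol" TCP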

I worked with Linux systems for several years and after that with Mac OS X, and I had no problems adapting to the Solaris world, since Sun has integrated many tools from the open source world, such as bash. So the learning curve for this system is pleasantly gentle.

To get some info about the system I entered the following commands:

Get info about the kernel mode

$ isainfo -kv
64-bit amd64 kernel modules

Listing all system devices

$ prtconf -pv
System Configuration:  Sun Microsystems  i86pc
Memory size: 1061 Megabytes
System Peripherals (PROM Nodes):

Node 0x000001
    bios-boot-device:  '80'
...

Show the system log

$ cat /var/adm/messages

Setting up the storage pool

Since I want one huge storage pool that can grow over time, I used RAID-Z.

The following commands were entered as root.

First, get a list of all connected storage devices:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3d0 <DEFAULT cyl 2044 alt 2 hd 128 sec 32>
          /pci@0,0/pci-ide@1,1/ide@0/cmdk@0,0
       1. c5t0d0 <ATA-VBOX HARDDISK-1.0-8.00GB>
          /pci@0,0/pci8086,2829@d/disk@0,0
       2. c5t1d0 <ATA-VBOX HARDDISK-1.0-8.00GB>
          /pci@0,0/pci8086,2829@d/disk@1,0
       3. c5t2d0 <ATA-VBOX HARDDISK-1.0-8.00GB>
          /pci@0,0/pci8086,2829@d/disk@2,0
       4. c5t3d0 <ATA-VBOX HARDDISK-1.0-8.00GB>
          /pci@0,0/pci8086,2829@d/disk@3,0
Specify disk (enter its number): ^C

The device IDs (c5t0d0 through c5t3d0) are what we need to set up the storage pool. To create a pool named "tank" I entered:

# zpool create -f tank raidz c5t0d0 c5t1d0 c5t2d0 c5t3d0
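
As a side note: if surviving two simultaneous drive failures is worth giving up a second drive's worth of capacity, the same pool could be created with double parity instead. Just a variant for reference, not what I used here:

# zpool create -f tank raidz2 c5t0d0 c5t1d0 c5t2d0 c5t3d0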

To show the available pools type:

# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool  3,97G  2,90G  1,07G    73%  ONLINE  -
tank   31,8G   379K  31,7G     0%  ONLINE  -

rpool is the pool on the boot device. You can see that four 8 GB drives add up to a raw capacity of almost 32 GB, which is what zpool list reports. Anything stored on the pool is written redundantly: roughly one drive out of four, about a third more raw space per file, goes to parity, so the pool survives the failure of a single drive and the usable capacity is closer to 24 GB.

Now I created a filesystem on that pool

# zfs create tank/home

It is automatically mounted at /tank/home. To list all active ZFS filesystems, enter

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
rpool                   3,44G   482M    72K  /rpool
rpool/ROOT              2,71G   482M    18K  legacy
rpool/ROOT/opensolaris  2,71G   482M  2,56G  /
rpool/export             746M   482M    21K  /export
rpool/export/home        746M   482M    19K  /export/home
rpool/export/home/mk     746M   482M  45,4M  /export/home/mk
tank                     682M  22,7G  26,9K  /tank
tank/home                681M  22,7G  29,9K  /tank/home

In this example I copied the OpenSolaris ISO image to my new filesystem. It occupies 681M at the filesystem level, but 911M on the pool, which is roughly 681M × 4/3; the extra third is the RAID-Z parity.

# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool  3,97G  3,44G   545M    86%  ONLINE  -
tank   31,8G   911M  30,9G     2%  ONLINE  -

A very nice feature of ZFS is built-in compression. Filesystem properties are inherited, so if you enable compression on tank/home and then create a new filesystem inside it, the new one is compressed automatically:

# zfs set compression=on tank/home
# zfs get compression tank/home
NAME       PROPERTY     VALUE      SOURCE
tank/home  compression  on         local
# zfs create tank/home/mk
# zfs get compression tank/home/mk
NAME          PROPERTY     VALUE         SOURCE
tank/home/mk  compression  on            inherited from tank/home
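
To see how much space the compression actually saves, ZFS keeps a compressratio property per filesystem. I leave out the output here, since the ratio depends entirely on the stored data:

# zfs get compressratio tank/home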

Health insurance

ZFS ensures data validity with internal checksums, so it can detect on the fly whether data is still intact and reconstruct it from the redundant copies if necessary.

To get the status of a pool enter

# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0

errors: No known data errors

A scrub is a full check of all data in the pool; with consumer-quality drives it should be done on a weekly basis and can be triggered with

# zpool scrub tank

and after some time check the result with

# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed after 0h2m with 0 errors on Tue Jan 13 11:36:38 2009
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0

errors: No known data errors
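
To make the weekly scrub happen automatically, a line like the following can go into root's crontab (via crontab -e). A minimal sketch, assuming Sunday at 3 a.m. is a quiet time for the box:

0 3 * * 0 /usr/sbin/zpool scrub tank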

Gnome Time Machine

When logging into a Gnome session and browsing through the menus I noticed the program "Time Slider Setup".

[Screenshot: time-slider]

When you activate it, it creates ZFS snapshots on a regular schedule. Snapshots are copy-on-write, so they initially take up almost no disk space, and you can travel back in time (not as fancy as with Time Machine, but who cares) using the Gnome file browser Nautilus.

[Screenshot: time-slider-in-action]

That is a killer feature, and if I were still a Linux/Java developer it would be a good reason for me to switch to OpenSolaris. You don't need a RAID setup to take ZFS snapshots, so even a single-disk filesystem gets a built-in time-based versioning system. *cool*
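
Time Slider only automates what ZFS offers on the command line anyway. A quick manual round trip, with a made-up snapshot name:

# zfs snapshot tank/home@before-cleanup
# zfs list -t snapshot
# ls /tank/home/.zfs/snapshot/before-cleanup
# zfs rollback tank/home@before-cleanup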

Network accessibility

My next step was to set up Netatalk on Solaris using this guide. At first I had some difficulties and had to install gmake and gcc. The described patches didn't work correctly because the file /usr/ucbinclude/sys/file.h was missing, so I changed the #if statement from

#if defined( sun ) && defined( __svr4__ )
#include </usr/ucbinclude/sys/file.h>
#else /* sun __svr4__ */

to

#if defined( sun ) && !defined( __svr4__ )
#include </usr/ucbinclude/sys/file.h>
#else /* sun __svr4__ */

in every source file where it occurred (a scripted way to apply the change follows the list):

  • etc/atalkd/main.c
  • etc/cnid_dbd/dbif.c
  • etc/papd/main.c
  • etc/papd/lp.c
  • sys/solaris/tpi.c
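
A hypothetical way to apply that edit to all five files in one go from the Netatalk source directory, assuming the guard appears with exactly the spacing shown above:

$ perl -pi -e 's/defined\( sun \) && defined\( __svr4__ \)/defined( sun ) && !defined( __svr4__ )/' \
      etc/atalkd/main.c etc/cnid_dbd/dbif.c etc/papd/main.c etc/papd/lp.c sys/solaris/tpi.c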

I still need to do some more work to get this running as a Time Machine target.
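
For reference, these are the two pieces I expect to need, both untested so far and therefore only assumptions: an AppleVolumes.default entry that shares a dedicated dataset (tank/timemachine is hypothetical, mk is my user), roughly like this:

/tank/timemachine "TimeMachine" allow:mk cnidscheme:dbd

and, on the Mac, the well-known Leopard tweak that makes Time Machine offer unsupported network volumes:

$ defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1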

Getting real

So everything looks promising at first glance. The next logical step would be to buy the hardware components and try everything out for real. I configured bare-bones systems in midi towers with AMD or Intel processors for approximately 170,- to 190,- €, which is about half the price of a Drobo. For each SATA hard drive I would buy a hot-pluggable enclosure. Another option would be external USB drives, but that might lead to a quite cluttered and fragile construction.

I still need to investigate which components to pick. Motherboards tested with OpenSolaris are listed here: http://www.sun.com/bigadmin/hcl/

ZFS has a hotplug feature, so a failed device can be replaced without rebooting or typing in any commands. And if that doesn't work, the replacement can always be done manually on the command line.
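
Manually, replacing a broken disk would look roughly like this, with c5t2d0 as the failed device and c5t4d0 as a hypothetical replacement; ZFS then resilvers the new disk on its own:

# zpool replace tank c5t2d0 c5t4d0
# zpool status -v tank

If the installed pool version supports it, the autoreplace property should even make the manual replace command unnecessary when the new disk goes into the same physical slot:

# zpool set autoreplace=on tank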

Next steps

I really need to spend more time with the VM, deliberately corrupt some disk images and see how ZFS reacts to that. Expanding existing pools with drives of different sizes will also be interesting.
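
A rough sketch of the experiments I have in mind; the vdi path and the extra device IDs are hypothetical, and overwriting a fixed offset really only makes sense with fixed-size disk images. On the Mac host, with the VM powered off, damage a chunk of one virtual disk:

$ dd if=/dev/urandom of="$HOME/Library/VirtualBox/HardDisks/raid2.vdi" bs=1m count=10 seek=200 conv=notrunc

Then boot the VM, scrub the pool and watch whether ZFS detects and repairs the damage:

# zpool scrub tank
# zpool status -v tank

Growing the pool should work either by replacing the drives one by one with bigger ones, or by adding a whole second RAID-Z set:

# zpool add tank raidz c5t4d0 c5t5d0 c5t6d0 c5t7d0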

Of course I need to get Netatalk working and try to use it as a Time Machine target. I could run the VM on a different host to simulate the final system.

Stay tuned for my future investigations and please don’t start trolling about the superiority of the Drobo. I think that is a matter of taste.