Using Sparsebundle Images for Improved PHD Syncing

Since its introduction in Tiger, portable home syncing has left sysadmins everywhere both elated and aggravated. For a lot of people it is a fantastic solution, and one that Apple has continued to add granularity to over time. It’s not hard, though, to hit upon situations where PHD sync just makes sure your disappointment is evenly distributed. We are going to take a look at how to get around one of those situations today.


So what’s the beef with PHD sync?

On the surface, PHD sync is simply awesome. It fixes all the issues of network homes, improves local performance, centralizes data, and allows users to move from Mac to Mac to Windows to Linux boxes with all their stuff in a network home. There are two issues though that put a wet blanket over the whole thing: lack of file part sync and lack of link speed detection. See if these scenarios sound familiar…

A user is on the road. He VPNs into the network and all of a sudden the houses start blinking in the menubar and everything gets slow. He can’t even log out now, because the homes keep trying to sync until the bitter end and that holds up logout. This happens because there isn’t any link speed detection. If the Mac can see the home server it will fire up File Sync.app and just go to town. (FWIW, this happens with iDisk sync as well, as it’s the same sync engine.) This condition can be mitigated quite a bit by using server side change tracking, but if you aren’t using Mac OS X Server for home storage then you are just out of luck. The only real solution here is to use crankd to disable syncing when you are on the VPN network segment. That is a different article though.

One that gets everyone, even when on the local network, is the lack of file part syncing. If this were rsync then only the changed parts of the files would be copied to the home. The File Sync system only syncs whole files though. This really wrecks things for people who use apps that need really large, contiguous files. Yes, Entourage. I’m looking directly at you. A database like the one Entourage uses can only be synced at login and logout. When it syncs, the whole freaking thing gets copied back and forth if even one email message, one calendar event, or one todo was touched. Lots of people have GBs of Entourage database and this can really delay the process. What happens in those cases is that the users hit the cancel button and they just don’t sync the data. This defeats the whole point.

Why are people using Entourage?

This is a question that comes up sometimes and the answer is really simple. Exchange. “But wait!” you say, “Snow Leopard has Exchange support built in, and it doesn’t use a big database.” That’s true, but what if you aren’t on Snow Leopard? Even worse, what if you need something that Apple’s software doesn’t do, like mail delegation? Back to Entourage for you! Now it is worth saying that EWS Entourage is much nicer to use, but it still doesn’t change the database sync issues.

The root issue here though isn’t that Entourage uses a big database, it’s that File Sync only pushes whole files. Any large file will have the exact same issues; I just chose Entourage as it’s a common case where this issue comes up.

Disk images to the rescue! Again!

I feel like I say this a lot here, but disk images are the solution. In particular, the sparsebundle image type. Sparsebundles are a new type of sparse image that was introduced with 10.5 and they are worth taking a quick look at on their own.

Before 10.5 we could use sparse images on Mac OS X. In a nutshell, a sparse image is a self expanding image. You set the volume in the image to a size, say 20GB, but the actual image file will be a tiny fraction of that size. As you add data to the sparse image, the file will grow to accommodate it. This is very cool but it had a few issues.

Firstly you just end up with a giant file. This is hard to sync and any sort of filesystem corruption can render the entire thing unusable. Remember the FUD and stories that surrounded FileVault for its first few years of life? A lot of that spawned from the relative fragility of sparse images compared to a traditional file system.

With Leopard things got changed up and Apple introduced a new sparse format, the sparsebundle. This looks just like a sparse image but differs in its internal structure. Rather than just being one flat file, a sparse bundle is a filesystem bundle. Inside that bundle the individual bands of the disk image are expressed as individual files. This has a dramatic impact on image file system recovery. If there is some snafu and a band of the image gets corrupted it simply acts like a disk with bad blocks. It will most likely mount, and you can probably recover your data. The inclusion of the bands as files also can dramatically improve the performance of things that need to track changes in an image such as Time Machine or PHD sync.

Take a deeper look in a sparsebundle.

We will get to the syncing stuff in a minute, but right now let’s take a peek inside a sparsebundle.

First off I’ve created a 2.5 GB sparse image on my Desktop. Let’s take a look and see what that looks like.

JoshBookAir:Desktop macshome$ df -H
Filesystem      Size   Used  Avail Capacity  Mounted on
/dev/disk0s2    121G    55G    65G    46%    /
devfs           117k   117k     0B   100%    /dev
map -hosts        0B     0B     0B   100%    /net
map auto_home     0B     0B     0B   100%    /home
/dev/disk2s2     20G   1.3G    19G     7%    /Volumes/macshome
/dev/disk1s9    2.6G    25M   2.6G     1%    /Volumes/SparseImage

You can see that the volume named SparseImage has 2.6G of free space. (You young turks that want to use the Gi notations can see that with the -h flag.) If we look at the actual image file, though, you will see that it is much smaller than that.

JoshBookAir:Desktop macshome$ ls -alh SparseImage.sparseimage
-rw-r--r--  1 macshome  staff    32M Sep 10 13:17 SparseImage.sparseimage

I’ve got nothing in that image, so the actual file size is only 32M, but notice that it is a single file. Now let’s take a look at another volume. In our df output above, the volume named macshome is my iDisk sync partition. Apple uses a sparsebundle for these and since I’ve got 1.3G in there it’s a ready made example. We can see the bundle resting on my filesystem here:

JoshBookAir:002436f16ac7 macshome$ ls -lah
total 176
drwx------  6 macshome  staff   204B Sep  9 10:32 .
drwx------@ 4 macshome  staff   136B Sep 10 13:16 ..
-rw-r--r--@ 1 macshome  staff     3B Sep 10 00:13 .pid
-rw-------  1 macshome  staff    64K Sep 10 10:17 SyncSets
-rw-r--r--  1 macshome  staff    17K Sep 10 10:17 SyncSets-journal
drwxr-xr-x@ 6 macshome  staff   204B Sep 10 13:22 macshome_iDisk.sparsebundle

But if we take a closer look at it, notice that it isn’t a single file.

JoshBookAir:002436f16ac7 macshome$ ls -lah macshome_iDisk.sparsebundle/
total 16
drwxr-xr-x@   6 macshome  staff   204B Sep 10 13:22 .
drwx------    6 macshome  staff   204B Sep  9 10:32 ..
-rw-r--r--    1 macshome  staff   498B Apr 21 11:24 Info.bckup
-rw-r--r--    1 macshome  staff   498B Apr 21 11:24 Info.plist
drwxr-xr-x  164 macshome  staff   5.4K Aug 12 09:16 bands
-rw-r--r--    1 macshome  staff     0B Apr 21 11:24 token

The important part we want to take a look at here is the bands directory. I’ve truncated the output here for the sake of space:

JoshBookAir:002436f16ac7 macshome$ ls -lah macshome_iDisk.sparsebundle/bands/
total 2615408
drwxr-xr-x  164 macshome  staff   5.4K Aug 12 09:16 .
drwxr-xr-x@   6 macshome  staff   204B Sep 10 13:22 ..
-rw-r--r--    1 macshome  staff   8.0M Sep 10 13:23 0
-rw-r--r--    1 macshome  staff   8.0M Sep 10 13:23 1
-rw-r--r--    1 macshome  staff   8.0M Aug 27 13:35 2
-rw-r--r--    1 macshome  staff   3.6M Aug 27 13:35 3
-rw-r--r--    1 macshome  staff   8.0M Sep 10 10:18 30
-rw-r--r--    1 macshome  staff   8.0M Sep 10 13:22 31
-rw-r--r--    1 macshome  staff   8.0M Apr 21 11:45 32
-rw-r--r--    1 macshome  staff   8.0M Apr 21 11:46 33
-rw-r--r--    1 macshome  staff   8.0M Apr 21 11:47 34
-rw-r--r--    1 macshome  staff   8.0M Apr 21 11:48 35
-rw-r--r--    1 macshome  staff   8.0M Apr 21 11:51 36
-rw-r--r--    1 macshome  staff   8.0M Apr 21 11:55 37
-rw-r--r--    1 macshome  staff   8.0M Sep 10 10:21 38
-rw-r--r--    1 macshome  staff   8.0M Aug 27 13:44 39
-rw-r--r--    1 macshome  staff   8.0M Sep 10 10:21 3a
-rw-r--r--    1 macshome  staff   8.0M Sep 10 00:14 3b
-rw-r--r--    1 macshome  staff   8.0M Apr 21 21:43 3c
-rw-r--r--    1 macshome  staff   8.0M Apr 21 21:44 3d
-rw-r--r--    1 macshome  staff   8.0M Apr 21 21:45 3e

Look at all those 8M files! Those are the bands that make up the filesystem of my sparsebundle and there are a lot of them. 162 to be exact…

JoshBookAir:002436f16ac7 macshome$ ls  macshome_iDisk.sparsebundle/bands/ | wc -w
     162
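
If you would rather not count by eye, the same tally can be wrapped in a little helper. This is only a sketch: band_summary is a name I made up for illustration, and it assumes the bundle uses the bands directory layout shown above.

```shell
#!/bin/bash
# band_summary: hypothetical helper, not part of any Apple tool.
# Prints "<number of band files> <total size in KB>" for a sparsebundle.
band_summary() {
    local bundle="$1"
    local bands="$bundle/bands"
    local count kb

    # Bail out if this does not look like a sparsebundle
    [ -d "$bands" ] || { echo "no bands directory in $bundle" >&2; return 1; }

    # Count regular files only, then ask du for the on-disk size in KB
    count=$(find "$bands" -type f | wc -l | tr -d ' ')
    kb=$(du -sk "$bands" | awk '{print $1}')
    echo "$count $kb"
}

# Usage: band_summary macshome_iDisk.sparsebundle
```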

When you make your own sparsebundle images you can actually control the size of these band files, but for now we are just going to take the default.
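
For the curious, the band size is set at creation time with an image key. As I read the hdiutil man page, the key is sparse-band-size and the value is given in 512-byte sectors, so the 8M default works out to 16384. Here is a minimal sketch of the arithmetic; mb_to_sectors is a hypothetical helper name, and the image name and sizes in the usage comment are placeholders.

```shell
#!/bin/bash
# mb_to_sectors: hypothetical helper that converts a band size in MB
# into 512-byte sectors, the unit the sparse-band-size key expects.
mb_to_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Usage on a Mac (placeholder names and sizes, check man hdiutil first):
#   hdiutil create -size 20g -type SPARSEBUNDLE -fs HFS+ \
#       -imagekey sparse-band-size="$(mb_to_sectors 16)" \
#       -volname Test test
```

Bigger bands mean fewer files to track but more data copied per change, so the default is a sensible place to start.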

Now when I write changes into my sparsebundle it is only going to actually modify a few of those bands at a time. Even if I write changes to 10 bands, I only really have 80M of changed data to deal with. That is a far better thing to sync than 1.3G and it makes my image far more resistant to corruption.
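
You can get a rough feel for how much a whole-file sync would have to copy by asking which bands changed since some point in time. The sketch below assumes you keep a marker file whose modification time records that point (touching one at login, say). Both function names and the marker file are my own inventions for illustration.

```shell
#!/bin/bash
# changed_bands: hypothetical helper. Lists the band files in a sparsebundle
# that were modified more recently than the marker file.
changed_bands() {
    local bundle="$1" marker="$2"
    find "$bundle/bands" -type f -newer "$marker"
}

# changed_kb: hypothetical helper. Adds up the on-disk size, in KB, of the
# bands that changed_bands reports.
changed_kb() {
    changed_bands "$1" "$2" |
        while IFS= read -r band; do du -k "$band"; done |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage:
#   touch /tmp/login-marker            # at login
#   changed_kb msud.sparsebundle /tmp/login-marker   # later in the day
```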

Enough with the bands! I want to fix homesync!

In order to use all this goodness we need to do a few tasks.

  1. We need to create a sparsebundle that is big enough.
  2. We need to move our Microsoft User Data folder into that image.
  3. We need a way to mount the image in ~/Documents
  4. We need to mount that image on login and root it in the place that Office can find it in.
  5. We want to make sure the image is detached at logout.
  6. Script as much of this as possible.

This is essentially the same process that FileVault uses to work. Apple throws a few extra bits in like doing a space reclamation on logout if you are plugged into an AC adaptor. If you want you can set that sort of stuff up with something like iHook without a lot of fuss. We are just going for the basics here right now though.

In order to prepare for this whole exercise I actually installed Entourage on a Mac here and synced up my IMAP mail account. 1.43 GB of Entourage DB all ready to go.

1. Create a sparsebundle that is big enough

How big is big enough? Well, big enough to hold the database at least. It’s probably fairly safe to take a look at the total free space on our disk and set the sparsebundle volume to something a bit less than that. This is easy to determine with df.

JoshBookAir:002436f16ac7 macshome$ df -H /
Filesystem     Size   Used  Avail Capacity  Mounted on
/dev/disk0s2   121G    55G    65G    46%    /

So we can see that there is 65G free on my device. If I want a quick and dirty way to just get the free space I can do this:

JoshBookAir:~ macshome$ df / | tail -n1 | awk '{print $4}'
127782112

Nasty pipefitting, but effective! Notice we are just using the blocks rather than the “human readable” format. Computer scripts need no such niceties.
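
One thing to watch with that pipeline is that df’s block size can change if BLOCKSIZE is set in the environment. A slightly more defensive sketch is below. It asks df for 1 KB blocks explicitly and doubles the result to get 512-byte sectors. The name free_sectors is hypothetical, and the parsing assumes the filesystem name in the first column has no spaces in it.

```shell
#!/bin/bash
# free_sectors: hypothetical helper. Prints the free space on the filesystem
# that holds $1, expressed in 512-byte sectors.
free_sectors() {
    local kb
    # -P keeps each filesystem on one line, -k forces 1 KB blocks
    kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    # Two 512-byte sectors per 1 KB block
    echo $((kb * 2))
}

# Usage: free_sectors /
```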

Next we need to create the image. Here I’m just giving the sparsebundle filesystem the same amount of free space as the disk. Since we are going to mount it on a custom path the user won’t notice this at all. Note, though, that as the disk fills up the sparse filesystem won’t reflect this, hence the custom logout stuff I mentioned above.

This command will create a disk image named “msud.sparsebundle”.

hdiutil create -sectors 127782112 -type SPARSEBUNDLE -fs HFS+ -volname "Microsoft User Data" msud -attach

Now we have our sparsebundle to work with, mounted on the Desktop like a regular disk.

2. Move our Microsoft User Data folder

This is the easy one if we are staying in the GUI. First we need to quit all the MS apps. Go into Activity Monitor and make sure they are all shut down. Then just drag the contents of ~/Documents/Microsoft User Data to your disk image of the same name.

If we are using the Terminal this is just a simple mv command.

mv ~/Documents/Microsoft\ User\ Data/* /Volumes/Microsoft\ User\ Data/

Now make sure everything copied correctly and delete the Microsoft User Data folder from Documents. Once that is done, eject the disk image.
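
If you are nervous about deleting a user’s mail database on the strength of a glance in the Finder, a copy-then-compare approach is a bit safer than a straight mv. The sketch below swaps in cp and cmp for the mv above. The function names are hypothetical, and it only compares file contents, not permissions or other metadata.

```shell
#!/bin/bash
# verify_copy: hypothetical helper. Succeeds only if every regular file under
# src exists under dst with identical contents. Extra files in dst are ignored.
verify_copy() {
    local src="$1" dst="$2"
    ( cd "$src" && find . -type f ) |
        while IFS= read -r f; do
            cmp -s "$src/$f" "$dst/$f" || exit 1
        done
}

# copy_and_verify: hypothetical helper. Copies the contents of src into dst
# (dotfiles included), then checks the result.
copy_and_verify() {
    local src="$1" dst="$2"
    cp -Rp "$src"/. "$dst"/ || return 1
    verify_copy "$src" "$dst"
}

# Usage: only remove the original once the comparison passes.
#   copy_and_verify ~/Documents/"Microsoft User Data" /Volumes/"Microsoft User Data" \
#       && rm -rf ~/Documents/"Microsoft User Data"
```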

3. Mount the image in ~/Documents

This is easier than you think it will be. With the hdiutil command we can root a disk image wherever we want in the file system. Something like this will work with the image we just created:

hdiutil attach msud.sparsebundle -mountroot ~/Documents/ -nobrowse

Let’s take a closer look at those options for the attach verb…

-mountroot allows us to root the volume somewhere other than in /Volumes. This makes it appear as an alias in the Finder at that location.

-nobrowse will mount the volume in a way that it isn’t announced as a volume. This prevents it from showing up on the Desktop or as a volume in applications. Since we only want it to appear in ~/Documents this is something we want.
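
Since a -nobrowse volume doesn’t announce itself, it can be handy to have a way to ask whether the image is actually mounted where you expect. One rough way is to compare the directory against the mount point that df reports for it. This is a sketch under assumptions: is_mount_point is a hypothetical name, and the parsing assumes the filesystem name in df’s first column contains no spaces (the mount point itself may).

```shell
#!/bin/bash
# is_mount_point: hypothetical helper. Succeeds when $1 is itself the mount
# point of a filesystem rather than just a directory on one.
is_mount_point() {
    local dir="${1%/}" mounted
    # Stripping the trailing slash turns "/" into "", so put it back
    [ -n "$dir" ] || dir="/"
    # Field 6 onward is the mount point; rejoin it in case it has spaces
    mounted=$(df -P "$dir" 2>/dev/null |
        awk 'NR==2 { out = $6; for (i = 7; i <= NF; i++) out = out " " $i; print out }')
    [ "$mounted" = "$dir" ]
}

# Usage:
#   is_mount_point ~/Documents/"Microsoft User Data" && echo "image is mounted"
```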

4. Mount the image at login

Here is where Mac OS X spoils, and confuses, us with a wealth of options. How are we going to mount this image so that the user can access it automatically? We have three real options here.

  • Use a login hook.
  • Use a LaunchAgent.
  • Use a pathwatch with launchd.

So what should we choose here and what are the pros and cons?

If we go with a login hook we can centrally manage it via Managed Preferences, but the downsides are that it only works for a console login, and it will run as root.

If we were to use a LaunchAgent it will run as the user for console and SSH logins, but then you need to deploy the launchd job plist to all your machines.

Using launchd to monitor the path to ~/Documents/Microsoft User Data and mount the image as needed is cool, but you will probably run into issues with the image mounting quickly enough for Entourage to be happy about it. Just like the Launch Agent, this would require the deployment of a launchd job file to each Mac.
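
For comparison, here is roughly what the LaunchAgent route could look like. I haven’t deployed this, so treat it as a sketch. The label com.example.msud-mount and the function name are made up, and because launchd won’t expand ~ in ProgramArguments the plist is generated per user with the home path filled in. Label, ProgramArguments, and RunAtLoad are standard launchd keys.

```shell
#!/bin/bash
# write_mount_agent: hypothetical helper. Writes a LaunchAgent plist that
# attaches the image when the agent loads.
# Arguments: <home directory> <path of the plist to write>
write_mount_agent() {
    local home="$1" out="$2"
    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.msud-mount</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/hdiutil</string>
        <string>attach</string>
        <string>$home/Library/PHDHelper/msud.sparsebundle</string>
        <string>-mountroot</string>
        <string>$home/Documents</string>
        <string>-nobrowse</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
EOF
}

# Usage:
#   write_mount_agent "$HOME" "$HOME/Library/LaunchAgents/com.example.msud-mount.plist"
```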

Looking down this list I decided to go with the login hook as I can deploy it with policy. Here is the simple script I came up with:

#!/bin/bash
# Login hook: loginwindow runs this as root and passes the short username as $1.

IMAGE="/Users/$1/Library/PHDHelper/msud.sparsebundle"

if [ -d "$IMAGE" ]; then
    # Root the volume in ~/Documents and keep it off the Desktop
    /usr/bin/hdiutil attach "$IMAGE" -mountroot "/Users/$1/Documents" -nobrowse
else
    logger "No msud sparsebundle. Logging in normally"
fi

exit 0

Here I just stuck the msud.sparsebundle in ~/Library/PHDHelper/. If it exists then mount it, if not just quit. We can extend the functionality of the false result in just a moment.

5. Detach the image at logout

Now we can just use a simple logout script to make sure that we eject the sparsebundle on logout. The Finder should do this on its own, but it never hurts to be certain.

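#!/bin/bash
# Logout hook: $1 is the short username of the user logging out.

if [ -d "/Users/$1/Library/PHDHelper/msud.sparsebundle" ]; then
    # The volume name contains spaces, so the mount point must be quoted
    /usr/bin/hdiutil detach "/Users/$1/Documents/Microsoft User Data"
else
    logger "No msud sparsebundle. Logging out normally"
fi

exit 0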
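
If an app is still holding a file open, a plain detach can fail and leave the image mounted. One possible extension, sketched below, is to retry a few times and only then fall back to hdiutil’s -force option. The function name and the HDIUTIL override variable are hypothetical. Forcing a detach while a database is open is risky, so test this before trusting it with real mail.

```shell
#!/bin/bash
# HDIUTIL can be pointed at a stand-in for testing; it defaults to the real tool.
HDIUTIL="${HDIUTIL:-/usr/bin/hdiutil}"

# detach_with_retry: hypothetical helper. Tries a normal detach up to <tries>
# times, one second apart, then falls back to a forced detach.
# Arguments: <mount point> [tries, default 3]
detach_with_retry() {
    local mount_point="$1" tries="${2:-3}" i=1
    while [ "$i" -le "$tries" ]; do
        "$HDIUTIL" detach "$mount_point" && return 0
        sleep 1
        i=$((i + 1))
    done
    # Last resort: force the volume off so logout is not held up
    "$HDIUTIL" detach "$mount_point" -force
}

# Usage in the logout hook:
#   detach_with_retry "/Users/$1/Documents/Microsoft User Data"
```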

6. Script all of this

Luckily the image creation is easy to script. It’s the reason I seemed to do things the hard way above when sizing the disk. All you need to do is copy and paste the commands from steps one and two into a file, with one variable for the number of blocks.
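
Something along these lines would do it. This is a sketch rather than a tested deployment script: create_msud and run are names I made up, the DRY_RUN switch is there so you can see what would happen before anything is touched, and the ~/Library/PHDHelper location is the one the login hook above expects.

```shell
#!/bin/bash
# run: print the command when DRY_RUN=1, otherwise execute it.
# (Dry-run output is for reading only; it does not preserve quoting.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_msud: hypothetical helper combining steps one and two.
# Arguments: <home directory> <image size in 512-byte sectors>
create_msud() {
    local home="$1" sectors="$2"
    local helper="$home/Library/PHDHelper"
    local msud="$home/Documents/Microsoft User Data"

    run mkdir -p "$helper" || return 1
    # Step one: create the sparsebundle and attach it under /Volumes
    run /usr/bin/hdiutil create -sectors "$sectors" -type SPARSEBUNDLE -fs HFS+ \
        -volname "Microsoft User Data" "$helper/msud" -attach || return 1
    # Step two: move the data in (note that * skips dotfiles)
    run mv "$msud"/* "/Volumes/Microsoft User Data/" || return 1
    # rmdir refuses to remove a non-empty folder, which is what we want here
    run rmdir "$msud" || echo "left $msud in place, it is not empty" >&2
    run /usr/bin/hdiutil detach "/Volumes/Microsoft User Data"
}

# Usage, with the free-space pipeline from step one:
#   create_msud "$HOME" "$(df / | tail -n1 | awk '{print $4}')"
```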

And with that we are almost done!

Sync Policy

None of this is going to work though if we just sync stuff like crazy. As always, in-use files require a bit of extra care for PHD syncing. Taking a look at the Mobility settings in Workgroup Manager, though, things can be a bit confusing.

We have two tabs of differing sync types to deal with, Preference Syncing and Home Syncing. Taking a look at the default item lists, it appears as though Apple wants us to use the Preference Sync tab to deal with the Microsoft User Data folder. Why is this?

Well, the difference between the two sync types is this:

With home sync the newest file wins the sync battle. If both copies of a file have changed, the user is asked what to do.

When using preference sync things change a bit. On login, the network file will win. On logout, the local file will win. In the background, only the local file will sync. On background preference sync, if the server file is newer then the sync will be delayed until logout.

In any case, the only real change we need to make is to remove the ~/Documents/Microsoft User Data include rule from the Preference Sync tab. We need to do this because that directory is now created by our mounted sparsebundle image and it’s the actual sparsebundle that we want to sync. Because we placed the image into ~/Library it will already be excluded from home sync and included in preference sync.

Everything is about ready to go except for my

GIANT DANGER TEXT!!!!!

If you want to take a look at this solution then you currently need to be sure to disable Server Side File Tracking if you are using a Mac OS X Server as your home server. I seem to have stumbled onto an issue where when it is used, the local copy of a sparsebundle image eventually gets gutted out to a shell of its former self. The server side copy remains intact, so you don’t actually lose data, but it’s a big pain to deal with. For those of you syncing the homes to other server types you can ignore that warning. I have reported this, but if you can duplicate the issue then feel free to file a report on your setup as well.

END GIANT DANGER TEXT!!!!!

Wrapping Up

So that’s it. By taking advantage of the discrete bands in the sparsebundle image format we can dramatically reduce the time it takes to sync a large contiguous file to a Portable Home Directory. Now this isn’t a magic bullet. If someone has a database open and makes a TON of changes to it during the day, you are still going to have a lot of bands to sync. The idea is that when we are talking about multiple GB, even small savings like 20% add up quickly. In my testing I was able to download my 1.5 GB of AFP548.com IMAP mail and then use the system for regular email during the day. Logging out at 5 resulted in about 800 MB being synced. That’s still big, but it’s a far cry from 1.5 GB. On slow networks like WiFi or 100BaseT this can be the difference between a user finishing a sync and clicking the “Cancel” button.
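
To put rough numbers on that, here is the back-of-the-envelope math as a sketch. It treats 1.5 GB as 1500 MB and assumes the link runs at its full rated speed, which real networks never do, so read the results as best-case floors. Both function names are hypothetical.

```shell
#!/bin/bash
# transfer_seconds: ideal-case seconds to move <megabytes> over a link
# rated at <megabits per second>. Integer math, rounds down.
transfer_seconds() {
    echo $(( $1 * 8 / $2 ))
}

# percent_saved: whole-number percentage saved by syncing <synced MB>
# instead of <full MB>. Integer math, rounds down.
percent_saved() {
    echo $(( ($2 - $1) * 100 / $2 ))
}

transfer_seconds 1500 100   # whole database over 100BaseT: 120
transfer_seconds 800 100    # changed bands only: 64
percent_saved 800 1500      # 46
```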

So try this out and test it with some large files for a while. Let me know in the comments how it works out.

As always, have fun and read the man pages!