Monday, July 6, 2009

rsync and directory renaming

rsync is an extremely useful tool, but it has one shortcoming: file and directory renaming. I use it to mirror file systems with "rsync -xHau --delete /home remotesystem:/somedir".

If a file or directory is renamed on the source host, rsync treats it as a new item: the entire file or directory is copied again, and (with --delete) the old copy is removed from the mirror.

For files, there is the --fuzzy option and a "detect-renamed" patch, but files are less of a problem for me.

Directory renames are not addressed by any patch or option that I can find, and they often result in a much higher transfer volume (especially when a high-level directory is renamed, since everything beneath it is sent again).

I've developed a strategy that works well in my scenario. After every rsync of a file system, a .rsyncname file is created in every directory (down to a configurable maximum depth), simply holding the name of the directory itself. Before each rsync, the contents of every .rsyncname file are compared with the name of the directory it sits in, and if they differ, the directory is renamed on the remote system. This has to be done top-down, because a lower-level directory cannot be renamed on the remote side while its parent still needs renaming, so I opted for a breadth-first search algorithm.
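The marker check itself is easy to try in isolation. The following is a minimal sketch, not one of the three scripts below: it builds a throwaway tree with mktemp, stores each directory's name in .rsyncname, renames one directory, and then prints the mismatch that would trigger a remote rename. The check_renames function and the directory names are made up for illustration, and the check only looks one level deep.

```bash
#!/bin/bash
# Sketch: detect a renamed directory via its .rsyncname marker.
# The scratch tree and check_renames are hypothetical, for illustration only.

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/docs/2009" "$root/photos"

# "After the rsync": store each directory's own name inside it.
for d in "$root"/*/ "$root"/*/*/; do
  d=${d%/}                          # strip the trailing slash from the glob
  basename "$d" > "$d/.rsyncname"
done

# Something renames a directory on the source host.
mv "$root/photos" "$root/pictures"

# "Before the next rsync": compare marker contents with the actual names.
check_renames() {
  local d old
  for d in "$1"/*/; do
    d=${d%/}
    [ -f "$d/.rsyncname" ] || continue
    old=$(cat "$d/.rsyncname")
    if [ "$old" != "$(basename "$d")" ]; then
      echo "rename $old -> $(basename "$d")"
    fi
  done
}

check_renames "$root"
```

Running it should print `rename photos -> pictures`; the unchanged docs directory produces no output.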

There are three scripts involved, which I called rsyncstorename, rsyncbasename and rsyncrename. I put all of them in /usr/local/bin on the source host; they are not required on the mirror host.

rsyncstorename:

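#!/bin/bash

# $1 = directory whose own name should be stored in it
dir=$1
x=$(cat "$dir/.rsyncname" 2>/dev/null)   # empty on the first run
y=$(basename "$dir")
# only (re)write the file when the name changed, so its mtime stays put
if [ "$x" != "$y" ]; then
  printf '%s\n' "$y" > "$dir/.rsyncname"
fi
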
This is used by rsyncbasename and is not called directly. The comparison adds a little overhead, but it saves rsync time overall. Originally the script simply recreated the file on every run, but that gave every .rsyncname file a new timestamp, so each rsync sent all of them to the remote host again. With the comparison, it's a one-time cost.
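To see why the comparison matters, this hypothetical sketch repeats the rsyncstorename logic as a store_name function, backdates the marker with touch -t, runs it again, and uses find -newer against a reference file to check whether the marker was rewritten. It assumes that a changed modification time is what makes rsync resend the file; the file names are invented for the demo.

```bash
#!/bin/bash
# Sketch: compare-before-write leaves an unchanged .rsyncname untouched.
# store_name mirrors rsyncstorename; the scratch files are hypothetical.

store_name() {
  local dir=$1 old
  old=$(cat "$dir/.rsyncname" 2>/dev/null)   # empty if not created yet
  if [ "$old" != "$(basename "$dir")" ]; then
    basename "$dir" > "$dir/.rsyncname"
  fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/home"

store_name "$tmp/home"                          # first run: creates the file
touch -t 200001010000 "$tmp/home/.rsyncname"    # pretend it is old
touch -t 200101010000 "$tmp/ref"                # later reference timestamp

store_name "$tmp/home"                          # second run: name unchanged
if [ -n "$(find "$tmp/home/.rsyncname" -newer "$tmp/ref")" ]; then
  echo "rewritten (would be resent)"
else
  echo "untouched (nothing to resend)"
fi
```

It should print `untouched (nothing to resend)`. Renaming the directory and calling store_name again would rewrite the marker, which is the one-time cost mentioned above.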

rsyncbasename:

#!/bin/bash

#$1 = path
#$2 = max depth

# quote the arguments so paths containing spaces work; -mount stays on one file system
find "$1" -maxdepth "$2" -mount -type d -exec /usr/local/bin/rsyncstorename {} \;

Very simple: pass the file system path and the desired max depth (e.g. "rsyncbasename /home 4"), although normally it is called from the rsyncrename script. Because of -mount, it doesn't cross file-system boundaries.
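The depth argument follows find's convention, where the starting directory is depth 0. This small sketch, using a made-up scratch tree and the rsyncstorename logic inlined via bash -c instead of calling /usr/local/bin/rsyncstorename, shows which directories get a marker for a depth of 2.

```bash
#!/bin/bash
# Sketch: which directories "rsyncbasename <path> 2" would mark.
# The scratch tree is hypothetical; the -exec body inlines rsyncstorename.

top=$(mktemp -d)
trap 'rm -rf "$top"' EXIT
mkdir -p "$top/a/b/c"

find "$top" -maxdepth 2 -mount -type d -exec bash -c '
  old=$(cat "$1/.rsyncname" 2>/dev/null)
  new=$(basename "$1")
  [ "$old" != "$new" ] && printf "%s\n" "$new" > "$1/.rsyncname"
  exit 0
' _ {} \;

# List the markers that were created, relative to the top directory.
( cd "$top" && find . -name .rsyncname | LC_ALL=C sort )
```

It should list markers for the top directory, a and a/b, but not a/b/c. LC_ALL=C keeps the sort order independent of the locale.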

rsyncrename:

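#!/bin/bash

#$1 = path (absolute, e.g. /home)
#$2 = max depth
remotepath=/mnt/disk
remotehost=somehost

# r <local dir> <path on mirror> <current level> <max depth>
# Renames the remote copies of the immediate subdirectories first, then
# descends into each of them, so a parent is always renamed before its children.
r ()
{
  let level=$3+1
  if [ $level -gt $4 ]; then return 0; fi
  cd "$1" || return 1
  for d in *; do
    if [ -d "$d" ]; then
      if [ -L "$d" ]; then
        echo "skipping symbolic link $2/$d"
      elif [ -f "$d/.rsyncname" ]; then
        x=$(cat "$d/.rsyncname")
        if [ "$x" != "$d" ]; then
          src="$remotepath$2/$x"
          dst="$remotepath$2/$d"
          echo "renaming $src to $dst"
          # ssh joins its arguments into one remote command line, so quote
          # the paths with printf %q (assumes the remote shell accepts
          # bash-style quoting). Only rename if src exists and dst doesn't;
          # otherwise mv would move src into an existing dst.
          qsrc=$(printf '%q' "$src")
          qdst=$(printf '%q' "$dst")
          ssh "$remotehost" "[ -d $qsrc ] && [ ! -e $qdst ] && mv -- $qsrc $qdst"
        fi
      fi
    fi
  done
  for d in *; do
    # the subshell keeps the cd local; don't descend into symbolic links
    if [ -d "$d" ] && [ ! -L "$d" ]; then
      (r "$d" "$2/$d" $level $4)
    fi
  done
}

(r "$1" "$1" 0 "$2")
rsync -xHau --delete "$1" "$remotehost:$remotepath$(dirname "$1")"
/usr/local/bin/rsyncbasename "$1" "$2"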

Arguments are again the path and depth (e.g. "rsyncrename /home 4"); the path must be absolute, because it is appended to remotepath. Change remotehost and remotepath to whatever is required. NOTE: the script expects a file system mirror at remotehost:remotepath, i.e. if you wish to rsync a sub-directory such as /home/userA, then remotehost:remotepath/home must exist. This script combines the rsync and the renaming. Note the sequence: renaming, then rsync, then rsyncbasename. Renaming only starts working on the second run: the first time there are no .rsyncname files yet, so nothing is renamed, and the files are created after the rsync.
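The renaming pass can be tried without a second host. The sketch below is a local adaptation of the r function under these assumptions: the "mirror" is a local directory, cp -R stands in for the first rsync, a plain mv replaces the ssh call, and rename_pass, src and mirror are hypothetical names. It renames a parent and one of its children on the source side to show that the parent has to be fixed on the mirror first.

```bash
#!/bin/bash
# Sketch: the rsyncrename rename pass against a local "mirror" (no ssh).
# rename_pass, src and mirror are hypothetical names for this demo only.

rename_pass() {   # rename_pass <local dir> <rel path> <level> <max> <mirror>
  local level=$(( $3 + 1 )) d x
  [ "$level" -gt "$4" ] && return 0
  cd "$1" || return 1
  for d in *; do                  # 1) fix the names at this level first
    [ -d "$d" ] && [ ! -L "$d" ] && [ -f "$d/.rsyncname" ] || continue
    x=$(cat "$d/.rsyncname")
    if [ "$x" != "$d" ] && [ -d "$5$2/$x" ] && [ ! -e "$5$2/$d" ]; then
      echo "renaming $2/$x to $2/$d"
      mv -- "$5$2/$x" "$5$2/$d"
    fi
  done
  for d in *; do                  # 2) then descend into each child
    if [ -d "$d" ] && [ ! -L "$d" ]; then
      ( rename_pass "$d" "$2/$d" "$level" "$4" "$5" )
    fi
  done
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
src=$work/src
mirror=$work/mirror
mkdir -p "$src/home/alice/mail" "$mirror"

# Store the names (as rsyncbasename would), then "mirror" the tree.
find "$src/home" -type d -exec sh -c 'basename "$1" > "$1/.rsyncname"' _ {} \;
cp -R "$src/home" "$mirror/"

# Rename a parent and one of its children on the source side.
mv "$src/home/alice" "$src/home/alice2"
mv "$src/home/alice2/mail" "$src/home/alice2/inbox"

( rename_pass "$src/home" "/home" 0 4 "$mirror" )
```

It should print two lines, first renaming /home/alice to /home/alice2 and then /home/alice2/mail to /home/alice2/inbox; the second rename only finds its source on the mirror because the first one has already happened.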

BTW, the bash-based breadth-first search algorithm was grabbed from here.

NOTE: this method does not handle a directory branch being relocated (e.g. "mv /home/userA/mail /home/userB/mail2"). rsync will still handle this, but everything in the branch will be copied again.

1 comment:

  1. Great idea. Thanks for sharing. That saved me many time.
    Jean-David
