Friday, June 7, 2019

Playing with cloned sparse files on APFS

I've been experimenting a bit with APFS and its ability to create and clone sparse files.

Let's get a baseline of how many 4k blocks are in use on my SSD (the Used column of df):
bash-3.2$ export BLOCKSIZE=4096
bash-3.2$ df . | awk '{print $3}'
Used
592952
bash-3.2$
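The df/awk pipeline above prints the column header along with the number. For before/after comparisons, a small helper that returns only the number is handy. This is a minimal sketch, not something from the original session: used_blocks and block_delta are hypothetical names, the parsing assumes the device name contains no spaces, and the count is volume-wide, so any other activity on the disk will skew it.

```shell
# Print the number of 4 KiB blocks in use on the filesystem holding a path.
# "df -P -k" prints one line per filesystem in 1024-byte units, so column 3
# of line 2 is Used in KiB; dividing by 4 converts to 4 KiB blocks.
# Assumption: the device name in column 1 contains no spaces.
used_blocks() {
    df -P -k "${1:-.}" | awk 'NR == 2 { print int($3 / 4) }'
}

# Run a command and print how many 4 KiB blocks the volume gained
# (negative means blocks were freed). The command's own stdout is sent
# to stderr so that only the delta lands on stdout.
block_delta() {
    before=$(used_blocks .)
    "$@" 1>&2
    after=$(used_blocks .)
    echo $(( after - before ))
}
```

If the numbers in this post hold, something like "block_delta cp -c test foo" should print 0, and a dd that writes one fresh 4k block should print 1.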
Here I create a sparse file of just over 40MiB (a 40MiB hole followed by one 4k block of data).  Note that it only takes one 4k block on disk.
bash-3.2$ dd bs=4096 count=1 if=/dev/random of=test seek=10240 2>/dev/null
bash-3.2$ df . | awk '{print $3}'
Used
592953
bash-3.2$ ls -l test
-rw-r--r--  1 pete.nelson  admin  41947136 Jun  7 12:30 test
bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 4096
bash-3.2$
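The logical size of 41947136 follows directly from the dd arguments: dd skips seek blocks of bs bytes and then writes one more block. A quick arithmetic check in the shell (the variable names are just for illustration):

```shell
bs=4096        # block size passed to dd
seek=10240     # blocks skipped before the single written block

hole=$(( seek * bs ))            # bytes that were never written
logical=$(( (seek + 1) * bs ))   # the hole plus the one written block

echo "hole=$hole logical=$logical"
# prints: hole=41943040 logical=41947136
```

The hole is exactly 40MiB (40 * 1024 * 1024 = 41943040), and the one written block accounts for the 4096 bytes that mdls reports as physical size.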
Now let's add some data within the file at three different offsets.  Note that the file only grows by three 4k blocks, exactly the amount of data that was added.  Everything else has never been written to, so it remains the void portion of the sparse file.
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=1000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=2000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=3000
bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ df . | awk '{print $3}'
Used
592956
bash-3.2$
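mdls is macOS-specific. The same logical-versus-allocated comparison can be sketched with POSIX tools, using wc -c for the logical size and du -k for the allocated size. sparse_report is a hypothetical helper name, and I have not verified that du and kMDItemPhysicalSize always agree on APFS, so treat this as a cross-check rather than a replacement.

```shell
# Print the logical size (bytes) and allocated size (KiB) of one file.
# wc -c counts every byte, including holes, which read back as zeros.
# du -k reports the space the filesystem says is allocated to the file.
sparse_report() {
    logical=$(wc -c < "$1" | tr -d ' ')
    allocated_kib=$(du -k "$1" | awk '{ print $1 }')
    echo "$1: logical=${logical} bytes, allocated=${allocated_kib} KiB"
}
```

Running "sparse_report test" at this point should show a logical size of 41947136 bytes. On a filesystem that supports sparse files, the allocated figure should be far smaller than the logical one.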
Now let's clone the sparse file!  Note that -c is an option of Apple's cp (the BSD-derived cp that ships with macOS) that calls clonefile() instead of copyfile().  This depends on the underlying filesystem being capable of copy-on-write (COW) operations, which APFS is.  Cloning the file takes no additional space on the volume.
bash-3.2$ cp -c test foo
bash-3.2$ df . | awk '{print $3}'
Used
592956
bash-3.2$
However, mdls doesn't differentiate between the two files, and reports that each one takes 16k of physical space on the drive.  While I'm not ready to call this erroneous, it's not entirely accurate.
bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ mdls foo | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$
Now let's modify both files and see when blocks are allocated.  Writing to offset 3000 on the original file triggers COW on that block, and an additional block is allocated for the original file.  However, writing to that same offset on the clone afterwards does not take any more space.  After the first COW, the old block belongs only to the cloned file, so writing to it doesn't trigger another COW.  Slick!

bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=3000
bash-3.2$ df . | awk '{print $3}'
Used
592957
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=foo conv=notrunc seek=3000
bash-3.2$ df . | awk '{print $3}'
Used
592957
bash-3.2$
So far, we've used 5 blocks.  Three are shared, and the remaining two are each owned by a single file.

Now let's clone the clone.  As before, no additional space is required.
bash-3.2$ cp -c foo bar
bash-3.2$ df . | awk '{print $3}'
Used
592957
bash-3.2$
Now let's really mix things up and write to four offsets in bar: one (1000) is still shared among all three files, one (3000) is shared only with foo, and two (4000 and 5000) are still sparse.  Each write allocates an additional 4k block as expected.  Writing 3000 triggers COW since that block is shared with foo.  Writing 1000 triggers COW since that block is shared by all three, and should leave the original block still shared between test and foo.
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=3000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=4000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=5000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=1000
bash-3.2$ df . | awk '{print $3}'
Used
592961
bash-3.2$

mdls is still confused.  Between the three files, it shows a total of 14 blocks used, but in reality there are only 9 allocated (Used climbed from 592952 to 592961).

bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ mdls foo | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ mdls bar | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 24576
bash-3.2$
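The 14-versus-9 gap can be checked with a bit of shell arithmetic, using only the numbers from the transcripts above (the variable names are for illustration):

```shell
bs=4096

# Sum of kMDItemPhysicalSize for test, foo and bar, in 4k blocks.
reported=$(( (16384 + 16384 + 24576) / bs ))

# Growth of df's Used column since the baseline.
actual=$(( 592961 - 592952 ))

double_counted=$(( reported - actual ))
echo "reported=$reported actual=$actual double_counted=$double_counted"
# prints: reported=14 actual=9 double_counted=5
```

The 5 extra blocks are the shared ones being counted once per file: two blocks shared by all three files contribute 2 extra each, and one block shared by two files contributes 1.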

How much does each file really take?
Test offsets: 1000 is shared with foo, 2000 and 10240 are shared with all three, 3000 is owned.
Foo offsets: 1000 is shared with test, 2000 and 10240 are shared with all three, 3000 is owned.
Bar offsets: 1000 is owned, 2000 and 10240 are shared with all three, 3000, 4000 and 5000 are owned.
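The breakdown above can be written down as a small model and tallied. This is only a sketch of my reasoning, not something read from the filesystem: the letters A through I are made-up labels for physical blocks, and tally_blocks is a hypothetical helper. A block referenced by exactly one file counts as owned by that file.

```shell
# One line per (file, offset, physical block label). Labels are invented;
# two files listing the same label are sharing that block.
layout='test 1000 A
test 2000 B
test 3000 C
test 10240 D
foo 1000 A
foo 2000 B
foo 3000 E
foo 10240 D
bar 1000 F
bar 2000 B
bar 3000 G
bar 4000 H
bar 5000 I
bar 10240 D'

# Count distinct blocks, and per file the blocks referenced by that file only.
tally_blocks() {
    awk '
        { refs[$3]++; owner[$3] = $1 }
        END {
            for (b in refs) {
                unique++
                if (refs[b] == 1) exclusive[owner[b]]++
            }
            printf "unique=%d test=%d foo=%d bar=%d\n",
                unique, exclusive["test"], exclusive["foo"], exclusive["bar"]
        }'
}

printf '%s\n' "$layout" | tally_blocks
# prints: unique=9 test=1 foo=1 bar=4
```

The 9 unique blocks match the growth in df's Used column. The per-file numbers are what each file claims purely for itself in this model.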

bash-3.2$ df . | awk '{print $3}'
Used
592961
bash-3.2$ rm test
bash-3.2$ df . | awk '{print $3}'
Used
592960
bash-3.2$ rm foo
bash-3.2$ df . | awk '{print $3}'
Used
592958
bash-3.2$ rm bar
bash-3.2$ df . | awk '{print $3}'
Used
592952
bash-3.2$
So deleting test freed its owned block at offset 3000.  Deleting foo freed its own block at offset 3000, but also the block at offset 1000 which it had previously shared with test.  And deleting bar freed the remaining 6 blocks, since the other clones had been deleted and it now owned all of them.
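The 1, 2, 6 pattern is what simple reference counting would predict. Here is a sketch that replays the deletions against a model of the block layout. It is an assumption about how the blocks were shared, not output from APFS: the block labels A through I are invented and simulate_deletes is a hypothetical helper.

```shell
# One line per (file, offset, physical block label); labels are invented.
layout='test 1000 A
test 2000 B
test 3000 C
test 10240 D
foo 1000 A
foo 2000 B
foo 3000 E
foo 10240 D
bar 1000 F
bar 2000 B
bar 3000 G
bar 4000 H
bar 5000 I
bar 10240 D'

# Usage: simulate_deletes "file1 file2 ..." < layout
# Deletes the files in the given order. A block is freed when the last
# file referencing it is deleted.
simulate_deletes() {
    awk -v order="$1" '
        { refs[$3]++; blocks[$1] = blocks[$1] " " $3 }
        END {
            n = split(order, files, " ")
            for (i = 1; i <= n; i++) {
                freed = 0
                m = split(blocks[files[i]], list, " ")
                for (j = 1; j <= m; j++)
                    if (--refs[list[j]] == 0) freed++
                printf "rm %s frees %d\n", files[i], freed
            }
        }'
}

printf '%s\n' "$layout" | simulate_deletes "test foo bar"
# prints:
# rm test frees 1
# rm foo frees 2
# rm bar frees 6
```

The model also shows that the order matters: how much a delete frees depends on which other clones still exist at that moment.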

My question is, short of deleting a cloned file, how can I determine how much space it claims for itself?  Also, how can I determine which other files a clone shares blocks with, and what percentage of each file they have in common?
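I don't have a real answer, but a rough approximation for the "how much in common" part is to compare the contents of two files block by block. This does not look at physical sharing at all. Blocks that differ in content cannot be shared, so the result is only an upper bound on sharing. Holes read back as zeros in both files and count as identical. The sketch assumes both files have the same length, and differing_blocks is a hypothetical helper name.

```shell
# Usage: differing_blocks FILE1 FILE2 [BLOCKSIZE]
# Prints how many BLOCKSIZE-sized blocks (default 4096) differ in content.
# cmp -l lists every differing byte with its 1-based position; awk maps
# each position to a block index and counts the distinct indexes.
differing_blocks() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v bs="${3:-4096}" '
            { seen[int(($1 - 1) / bs)] = 1 }
            END { n = 0; for (k in seen) n++; print n }'
}
```

Dividing the result by the file's block count (10241 for the files in this post) gives a rough "percent different" figure. Two files that happen to hold identical data in separately allocated blocks would still look fully shared, which is exactly why a proper tool would need to ask the filesystem instead.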

Is this someone's chance to develop a tool?