Friday, June 7, 2019

Playing with cloned sparse files on APFS

I've been experimenting a bit with APFS and its ability to create and clone sparse files.

Let's get a baseline of how many 4k blocks are available on my SSD:
bash-3.2$ export BLOCKSIZE=4096
bash-3.2$ df . | awk '{print $3}'
Used
592952
bash-3.2$
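The same df pipeline is repeated after every step below, so it can be wrapped in a small helper. This is only a sketch: the function name used_blocks is hypothetical, and it assumes df -Pk (POSIX output in 1 KiB units) rather than the BLOCKSIZE variable, so that the header line can be skipped and the result is a bare number that can be subtracted.

```shell
# Hypothetical helper: print the number of 4 KiB blocks in use on the
# filesystem holding the given path (default: current directory).
used_blocks() {
    # -P forces one POSIX-format line per filesystem, -k reports 1 KiB units.
    # NR==2 skips the header; dividing by 4 converts KiB to 4 KiB blocks.
    echo $(( $(df -Pk "${1:-.}" | awk 'NR==2 { print $3 }') / 4 ))
}

before=$(used_blocks)
# ... run a dd or cp -c step here ...
after=$(used_blocks)
echo "blocks allocated: $((after - before))"
```

Other activity on the volume also moves this counter, so the difference is only meaningful on an otherwise quiet disk.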
Here I create a sparse file just over 40MiB long (a single 4k block written at block offset 10240).  Note that it only takes one 4k block on disk.
bash-3.2$ dd bs=4096 count=1 if=/dev/random of=test seek=10240 2>/dev/null
bash-3.2$ df . | awk '{print $3}'
Used
592953
bash-3.2$ ls -l test
-rw-r--r--  1 pete.nelson  admin  41947136 Jun  7 12:30 test
bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 4096
bash-3.2$
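The logical size reported by ls follows directly from the dd arguments: dd skips seek blocks and then writes count blocks. A minimal sketch of that arithmetic, plus two hypothetical helpers (logical_bytes and allocated_bytes are names made up here) that compare the two sizes using only wc and du rather than mdls:

```shell
# Logical size = (seek + count) * bs.
bs=4096; seek=10240; count=1
echo $(( (seek + count) * bs ))    # 41947136, matching ls -l above

# Hypothetical helper: the file's length in bytes.
logical_bytes() { wc -c < "$1" | tr -d ' '; }

# Hypothetical helper: space actually allocated, as reported by du in KiB.
allocated_bytes() { echo $(( $(du -k "$1" | cut -f1) * 1024 )); }
```

For a sparse file, allocated_bytes should come out far smaller than logical_bytes, assuming the filesystem supports holes.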
Now let's write data at three more offsets within the file.  Note that it only grows by three 4k blocks, exactly the amount of data that was added.  Everything else has never been written to, so it remains the hole in the sparse file.
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=1000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=2000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=3000
bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ df . | awk '{print $3}'
Used
592956
bash-3.2$
Now let's clone the sparse file!  Note that -c is an option in Apple's (BSD-derived) cp that makes it call clonefile() instead of copying the data with copyfile().  This depends on the underlying filesystem being capable of copy-on-write (COW) operations, which APFS is.  Cloning the file takes no additional space on the volume.
bash-3.2$ cp -c test foo
bash-3.2$ df . | awk '{print $3}'
Used
592956
bash-3.2$
However, mdls doesn't account for the shared blocks, and reports that each file takes 16k of physical space on the drive.  While I'm not ready to call this erroneous, it's not the whole picture.
bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ mdls foo | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$
Now let's modify both files and see when blocks are allocated.  Writing to offset 3000 on the original file triggers COW on that block, and an additional block is allocated for the original file.  However, writing to that same offset on the clone afterwards does not take any more space.  The old block now belongs to the cloned file alone, so the write doesn't trigger a COW.  Slick!

bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=test conv=notrunc seek=3000
bash-3.2$ df . | awk '{print $3}'
Used
592957
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=foo conv=notrunc seek=3000
bash-3.2$ df . | awk '{print $3}'
Used
592957
bash-3.2$
So far, we've used 5 blocks.  Three are shared, and the remaining two are each owned by one file.

Now let's clone the clone.  As before, no additional space is required.
bash-3.2$ cp -c foo bar
bash-3.2$ df . | awk '{print $3}'
Used
592957
bash-3.2$
Now let's really mix things up and write to four offsets in bar: one (1000) is still shared by all three files, one (3000) is shared only with foo, and two (4000 and 5000) are still sparse.  Each write allocates an additional 4k block as expected.  Writing to 3000 triggers COW since that block is shared with foo.  Writing to 1000 triggers COW since that block is shared by all three, and should leave the original block still shared between test and foo.
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=3000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=4000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=5000
bash-3.2$ 2>/dev/null dd bs=4096 count=1 if=/dev/random of=bar conv=notrunc seek=1000
bash-3.2$ df . | awk '{print $3}'
Used
592961
bash-3.2$

mdls is still confused.  Between the three files, it shows a total of 14 blocks used, but in reality there are only 9 allocated (Used climbed from 592952 to 592961).

bash-3.2$ mdls test | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ mdls foo | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 16384
bash-3.2$ mdls bar | tail -2
kMDItemLogicalSize                 = 41947136
kMDItemPhysicalSize                = 24576
bash-3.2$

How much does each file really take?
Test offsets: 1000 is shared with foo, 2000 and 10240 are shared with all three, 3000 is owned.
Foo offsets: 1000 is shared with test, 2000 and 10240 are shared with all three, 3000 is owned.
Bar offsets: 1000 is owned, 2000 and 10240 are shared with all three, 3000, 4000 and 5000 are owned.

bash-3.2$ df . | awk '{print $3}'
Used
592961
bash-3.2$ rm test
bash-3.2$ df . | awk '{print $3}'
Used
592960
bash-3.2$ rm foo
bash-3.2$ df . | awk '{print $3}'
Used
592958
bash-3.2$ rm bar
bash-3.2$ df . | awk '{print $3}'
Used
592952
bash-3.2$
So deleting test freed its owned block at offset 3000.  Deleting foo freed its own block at offset 3000, plus the block at offset 1000, which it had previously shared with test.  And deleting bar freed all 6 of its blocks, since the other clones had been deleted and it was now the sole owner of every block it referenced.
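The numbers above can be reproduced with a little bookkeeping. This is just a worked tally of the ownership list, not a tool that inspects the filesystem; the variable names are made up for the illustration.

```shell
# Blocks by owner, taken from the offsets listed earlier.
test_owned=1   # 3000
foo_owned=1    # 3000
bar_owned=4    # 1000, 3000, 4000, 5000
shared_two=1   # 1000, shared by test and foo
shared_all=2   # 2000 and 10240, shared by all three

# What df sees: every physical block counted once.
physical=$(( test_owned + foo_owned + bar_owned + shared_two + shared_all ))

# What the mdls figures add up to: a shared block is counted once per file.
test_refs=$(( test_owned + shared_two + shared_all ))
foo_refs=$((  foo_owned  + shared_two + shared_all ))
bar_refs=$((  bar_owned  + shared_all ))
reported=$(( test_refs + foo_refs + bar_refs ))

echo "physical=$physical reported=$reported"    # physical=9 reported=14

# Blocks freed when deleting in the order test, foo, bar: a block is
# released only when its last remaining reference goes away.
echo "rm test frees $test_owned"                     # 1
echo "rm foo frees $(( foo_owned + shared_two ))"    # 2
echo "rm bar frees $(( bar_owned + shared_all ))"    # 6
```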

My question is, short of deleting a cloned file, how can I determine how much space it claims to itself?  Also, how can I determine the other files a clone is associated with and what percentage of each file it has in common?

Is this someone's chance to develop a tool?

Wednesday, March 13, 2019

Serial over network via socat

From http://www.anites.com/2017/11/socat.html:

on remote:
  socat tcp-listen:8000,reuseaddr,fork \
    file:/dev/ttyUSB0,nonblock,waitlock=/var/run/tty0.lock,b115200,raw,echo=0

on local:
  socat pty,link=/dev/ttyUSB0,waitslave tcp:pi.local:8000
  tio -b 115200 /dev/ttyUSB0

Tuesday, January 22, 2019

Making the most of APFS and xhyve

I'm running macOS Mojave, and using it to host virtual machines via xhyve.  There are some neat tricks one can use to conserve disk space while giving plenty of room to your VMs.

First, a side-thread:  I read about an issue someone had when APFS first came out.  I don't know for sure if it's still an issue, and haven't been able to find the article for this post.  The gist was, if the user fills up the root volume, there's no way to delete any files in recovery mode when there are no free extents in APFS.  The workaround involves creating a throw-away APFS volume to reserve some free space.  Then, if root ever fills up, boot into recovery, delete the throw-away volume to free up a few extents, then mount root and clean it up as needed.

When creating an Ubuntu VM from the install ISO, I use the technique explained at https://gist.github.com/mowings/f7e348262d61eebf7b83754d3e028f6c.  One has to extract the installer's initramfs and kernel image to pass to xhyve.  The often-cited way is to copy the ISO to a temporary file and zero out the first couple of sectors before mounting that copy to extract the files.  There is a way to do that using APFS's COW so you're not taking twice the disk space for the ISO.

I combine these two efforts by creating my throw-away APFS volume (with at least 500MB reserved) and storing the ISO on it, as I can always download it again if I accidentally fill up root.  Then, to make a COW copy of the ISO, duplicate it using Finder (or cp -c at a bash prompt, which uses the clonefile() call rather than reading and writing the file contents).  You'll see that duplicating this large file on the small volume does not increase the amount of space used on that volume!  Then just overwrite the first 2KiB of the copy with this command:
dd if=/dev/zero of=/Volumes/DeleteIfRootFull/tmp.iso bs=2048 count=1 conv=notrunc
Now you can keep that temporary ISO around for the next OS install, or even script the mountable ISO creation and boot right from the files on the mounted filesystem.  Note that the conv=notrunc argument is important, as that is what keeps the rest of the file intact while we overwrite the first 2KiB with zeroes.
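To see what conv=notrunc buys you, here is a small self-contained demo on scratch files (the file names and the 8KiB size are arbitrary stand-ins, not the real ISO). It uses a plain cp so it runs on any filesystem; the effect of notrunc is the same on a clone.

```shell
# Scratch directory so the demo touches nothing real.
demo=$(mktemp -d)

# Stand-in for an ISO: 8 KiB of random data, plus an identical copy.
dd if=/dev/urandom of="$demo/keep.img" bs=2048 count=4 2>/dev/null
cp "$demo/keep.img" "$demo/trunc.img"

# With conv=notrunc only the first 2 KiB change; the length is preserved.
dd if=/dev/zero of="$demo/keep.img" bs=2048 count=1 conv=notrunc 2>/dev/null
# Without it, dd truncates the output first, leaving just 2 KiB of zeroes.
dd if=/dev/zero of="$demo/trunc.img" bs=2048 count=1 2>/dev/null

keep_size=$(wc -c < "$demo/keep.img" | tr -d ' ')
trunc_size=$(wc -c < "$demo/trunc.img" | tr -d ' ')
echo "with notrunc: $keep_size bytes, without: $trunc_size bytes"
# prints: with notrunc: 8192 bytes, without: 2048 bytes

rm -rf "$demo"
```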

You can use a similar trick with dd to create your VM disk as a sparse file.  Even if you have less than 16GB free, you can create a 16GB or larger drive to install your OS into by seeking within the output file to just the final block.  For example, to calculate your seek size for a 16GiB disk, run this command:
echo '16 1048576*1-p' | dc
I got 16777215, which I use in the following command:
dd if=/dev/zero of=hdd1.img bs=1024 count=1 seek=16777215
You'll see the resulting file is exactly 2^34 bytes (16GiB), but if you go to the folder in Finder and view the file's info, you'll see its size as "17,179,869,184 bytes (4 KB on disk)".  The difference between the listed size and the size "on disk" indicates that this is a sparse file (only the extents that have been written to are actually allocated "on disk").  It's 4K instead of 1K because APFS allocates space in 4K blocks, I believe.
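If dc isn't your thing, the same seek calculation can be done with shell arithmetic. A minimal sketch, where seek_for_gib is a hypothetical helper name and the dd block size is assumed to be 1024 as in the command above:

```shell
# Hypothetical helper: seek value for an N GiB image when dd uses bs=1024.
# N GiB is N * 1048576 blocks of 1 KiB; seeking past all but the last one
# and writing a single block makes the file exactly N GiB long.
seek_for_gib() {
    echo $(( $1 * 1048576 - 1 ))
}

seek_for_gib 16                                # 16777215
echo $(( ($(seek_for_gib 16) + 1) * 1024 ))    # 17179869184, i.e. 2^34
```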

Be warned that, as you write to this file, you will be increasing the space used on your APFS volume, and your xhyve VM could try to write up to the file's given size, even if you don't have that much free space available.  So, if you don't monitor your free space, you could end up needing to delete that extra APFS volume sooner than you had expected...
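One way to keep an eye on this is a small check you can run by hand or from a periodic job. This is only a sketch: free_kib and has_headroom are hypothetical names, and the 2GiB threshold is an arbitrary example, not a recommendation.

```shell
# Hypothetical helper: free space, in KiB, on the filesystem holding a path.
free_kib() {
    # -P forces one POSIX-format line per filesystem; column 4 is "Available".
    df -Pk "${1:-.}" | awk 'NR==2 { print $4 }'
}

# Hypothetical check: succeed only if at least $2 KiB remain free at path $1.
has_headroom() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Warn when less than 2 GiB (2097152 KiB) is left where the VM images live.
has_headroom . 2097152 || echo "warning: low free space for VM images" >&2
```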