A reverse-delta backup strategy – obvious idea or bad idea?
datastack
a day ago
I recently came up with a backup strategy that seems so simple I assume it must already exist — but I haven’t seen it in any mainstream tools.

The idea is:

The latest backup (timestamped) always contains a full copy of the current source state.

Any previous backups are stored as deltas: files that were deleted or modified compared to the next (newer) version.

There are no version numbers — just timestamps. New versions can be inserted naturally.

Each time you back up (a rough code sketch follows these steps):

1. Compare the current source with the latest backup.

2. For files that changed or were deleted: move them into a new delta folder (timestamped).

3. For new/changed files: copy them into the latest snapshot folder (only as needed).

4. Optionally rotate old deltas to keep history manageable.
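
To make the steps concrete, here is a rough sketch in Python of what a single run could look like. It's illustrative only: it assumes a flat folder (no subdirectories, permissions, or symlinks), makes up the layout (a dest/latest folder plus timestamped delta folders next to it), and compares files byte for byte.

import filecmp
import shutil
from datetime import datetime, timezone
from pathlib import Path

def backup(source: Path, dest: Path) -> None:
    """One reverse-delta run: afterwards dest/latest mirrors the source, and
    anything it would have overwritten or dropped sits in a timestamped delta folder."""
    latest = dest / "latest"
    latest.mkdir(parents=True, exist_ok=True)
    # The post's naming uses colons, which Windows filenames don't allow;
    # swap them for something filename-safe there.
    stamp = datetime.now(timezone.utc).strftime("backup-%Y-%m-%dT%H:%M:%S")
    delta = dest / stamp  # created lazily, only if something actually changed

    src_files = {p.name for p in source.iterdir() if p.is_file()}
    bak_files = {p.name for p in latest.iterdir() if p.is_file()}

    # Step 2: move deleted or modified files out of latest/ into the delta folder.
    for name in bak_files:
        deleted = name not in src_files
        modified = not deleted and not filecmp.cmp(source / name, latest / name, shallow=False)
        if deleted or modified:
            delta.mkdir(exist_ok=True)
            shutil.move(latest / name, delta / name)

    # Step 3: copy new or changed files from the source into latest/.
    for name in src_files:
        if not (latest / name).exists():
            shutil.copy2(source / name, latest / name)

    # Step 4, rotating old delta folders, is left out of this sketch.

Creating the delta folder lazily means a run where nothing changed leaves no empty timestamped folder behind.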

This means:

The latest backup is always a usable full snapshot (fast restore).

Previous versions can be reconstructed by applying reverse deltas.

If the source is intact, the system self-heals: corrupted backups are replaced on the next run.

Only one full copy is needed, like a versioned rsync mirror.

As time goes by, losing old versions is low-impact.

It's user-friendly, since the latest backup can be browsed with a regular file explorer.

Example:

Initial backup:

latest/
  a.txt   # "Hello"
  b.txt   # "World"

Next day, a.txt is changed and b.txt is deleted:

latest/ a.txt # "Hi" backup-2024-06-27T14:00:00/ a.txt # "Hello" b.txt # "World"

The newest version is always in latest/, and previous versions can be reconstructed by applying the deltas in reverse.
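
Restoring an older state is the same walk in reverse: start from latest/ and overlay the delta folders from newest down to the point in time you want. Another rough, flat-folder sketch under the same made-up layout as above:

import shutil
from pathlib import Path

def restore(dest: Path, target: Path, as_of: str) -> None:
    """Rebuild, into `target`, the source as it looked just before the run that
    created the delta folder named `as_of` (e.g. "backup-2024-06-27T14:00:00")."""
    target.mkdir(parents=True, exist_ok=True)

    # Start from the full snapshot in latest/ ...
    for p in (dest / "latest").iterdir():
        if p.is_file():
            shutil.copy2(p, target / p.name)

    # ...then overlay delta folders from newest back to (and including) `as_of`,
    # so for any file present in several deltas the oldest copy wins.
    deltas = sorted((p for p in dest.iterdir()
                     if p.is_dir() and p.name.startswith("backup-")),
                    key=lambda p: p.name, reverse=True)
    for delta in deltas:
        if delta.name < as_of:
            break  # older than the point we want
        for p in delta.iterdir():
            if p.is_file():
                shutil.copy2(p, target / p.name)

    # Caveat of the scheme as described: files added to the source after `as_of`
    # still end up in `target`, since nothing records "this file didn't exist yet".

The sort works because the timestamp format orders lexicographically; the ISO-style folder names are doing the version numbering here.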

I'm curious: has this been done before under another name? Are there edge cases I’m overlooking that make it impractical in real-world tools?

Would love your thoughts.

compressedgas a day ago
It works. Already implemented: https://rdiff-backup.net/ https://github.com/rdiff-backup/rdiff-backup

There are also other tools which have implemented reverse incremental backup or backup with reverse deduplication which store the most recent backup in contiguous form and fragment the older backups.

datastack compressedgas 16 hours ago
Thank you for bringing this to my attention. Knowing that there is a working product using this approach gives me confidence. I'm working on a simple backup app for my personal/family use, so good to know I'm not heading in the wrong direction
wmf a day ago
It seems like ZFS/Btrfs snapshots would do this.
HumanOstrich wmf 20 hours ago
No, they work the opposite way using copy-on-write.
wmf HumanOstrich 20 hours ago
"For files that changed or were deleted: move them into a new delta folder. For new/changed files: copy them into the latest snapshot folder." is just redneck copy-on-write. It's the same result but less efficient under the hood.
datastack wmf 16 hours ago
Nice to realize that this boils down to copy-on-write. Makes it easier to explain.
sandreas datastack 9 hours ago
Is there a reason NOT to use ZFS or BTRFS?

I mean, the idea sounds cool, but what are you missing? ZFS even works on Windows these days, and with tools like zrepl you can configure time-based snapshotting, auto-sync, and auto-cleanup.

codingdave a day ago
The low-likelihood / high-impact edge case this does not handle is: "Oops, our data center blew up." An extreme scenario, but this method turns your most recent backup into a single point of failure, because you cannot restore from the other backups alone.
datastack codingdave 16 hours ago
This sounds more like a downside of single-site backups.
codingdave datastack 11 hours ago
Totally. Which is exactly what your post outlines. You said it yourself: "Only one full copy is needed." You would need to update your logic to have a 2nd copy pushed offsite at some point if you wanted to resolve this edge case.
ahazred8ta a day ago
For reference: a comprehensive backup + security plan for individuals https://nau.github.io/triplesec/
datastack ahazred8ta 16 hours ago
Great resource in general; I'll look into whether it describes how to implement this backup scheme.
dr_kiszonka 21 hours ago
It sounds like this method is I/O intensive as you are writing the complete image at every backup time. Theoretically, it could be problematic when dealing with large backups in terms of speed, hardware longevity, and write errors, and I am not sure how you would recover from such errors without also storing the first image. (Or I might be misunderstanding your idea. It is not my area.)
datastack dr_kiszonka 16 hours ago
You can see in steps 2 and 3 that no full copy is written each time. It's only move operations to create the delta, plus copies of new or changed files, so it's quite light on I/O.
rawgabbit 20 hours ago
What happens if in the process of all this read write rewrite, data is corrupted?
datastack rawgabbit 16 hours ago
In this algo nothing is rewritten. A diff between source and latest is made, the changed or deleted files are archived to a folder, and the latest folder is updated from the source, like rsync. No more I/O than any other backup tool. Versions other than the latest are never touched again.
jiggawatts 19 hours ago
The more common approach now is incrementals forever with occasional synthetic full backups computed at the storage end. This minimises backup time and data movement.
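
As a toy illustration of what a synthetic full computation means (file trees stood in for by dicts mapping path to content, with None marking a deletion; all names made up):

def synthesize_full(old_full: dict, incrementals: list[dict]) -> dict:
    """Merge forward incrementals (oldest to newest) onto an old full backup,
    producing a new full without re-reading the source."""
    full = dict(old_full)
    for inc in incrementals:
        for path, content in inc.items():
            if content is None:
                full.pop(path, None)   # file was deleted in this increment
            else:
                full[path] = content   # file was added or changed
    return full

old_full = {"a.txt": "Hello", "b.txt": "World"}
incrementals = [{"a.txt": "Hi"}, {"b.txt": None}]
print(synthesize_full(old_full, incrementals))   # -> {'a.txt': 'Hi'}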
datastack jiggawatts 15 hours ago
I agree it seems more common. However, backup time and data movement should be equivalent if you follow the algo steps.

According to ChatGPT, the forward-delta approach is common because it can be implemented purely append-only, whereas reverse deltas require the latest snapshot to be mutable. This doesn't work well for backup tapes.

Do you also think that the forward delta approach is a mere historical artifact?

Although perhaps backup tapes are still widely used; I have no idea, I'm not in this field. If so, the reverse-delta approach would not work in industrial settings.

jiggawatts datastack 14 hours ago
Nobody[1] backs up directly to tape any more. It’s typically SSD to cheap disk with a copy to tape hours later.

This is more-or-less how most cloud backups work. You copy your “premium” SSD to something like a shingled spinning rust (SMR) that behaves almost like tape for writes but like a disk for reads. Then monthly this is compacted and/or archived to tape.

[1] For some values of nobody.

vrighter 16 hours ago
I used to work on backup software. Our first version did exactly that. It was a selling point. We later switched approach to a deduplication based one.
datastack vrighter 15 hours ago
Exciting!

Yes, the deduplicated approach is superior, if you can accept needing dedicated software to read the data, or can rely on a file system that supports it (like Unix with hard links).

I'm looking for a cross-platform solution that is simple and can restore files without any app (in case I didn't maintain my app for the next twenty years).

I'm curious whether the software you were working on used a proprietary format, relied on Linux, or used some other method of deduplication.

tacostakohashi 2 hours ago
Sounds a bit like the NetApp .snapshot directory thing (which is no bad thing).