
My Universe Blog » Entries Tagged as backtory

YAR specification feature freeze

Posted by Jesco Freund at Feb. 5, 2011 12:12 a.m.

Today, I froze the design for the YAR container format. Trunk has been branched to 1.0-STABLE. The compiled 1.0-STABLE documentation is available at docs.yarutils.org.

The current state incorporates all features considered relevant for the 1.0 release. At a glance, these are:

  • Recoverable index, for both fast access and robustness (a rough illustration of the idea follows below)
  • Secure hashes (no path to attack symmetric encryption)
  • Electronic signatures

This list is far from complete – just have a look at the specification or the issue tracker. The next two weeks will be dedicated to finalizing the documentation, i.e. tracking down unclear wording, fixing typos, and filling in the index and glossary.
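
To illustrate the first point: the record layout below is invented for this post only (the actual YAR layout is defined in the specification mentioned above). The idea behind a recoverable index is that every entry starts with a magic marker and repeats the metadata the central index holds, so the index can always be rebuilt from the entries themselves:

import struct, zlib

MAGIC = b"ENTR"                     # invented marker, not part of YAR
HEADER = struct.Struct(">4sHQI")    # magic, path length, data length, CRC32 of the path

def pack_entry(path, data):
    # Each entry is self-describing: marker + its own metadata + content.
    encoded = path.encode("utf-8")
    header = HEADER.pack(MAGIC, len(encoded), len(data), zlib.crc32(encoded))
    return header + encoded + data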

No comments | Defined tags for this entry: backtory, development, open source, yar

First alpha of libsynctory released

Posted by Jesco Freund at Jan. 5, 2011 5:02 p.m.

A few minutes ago, I released the first alpha (ever!) of libsynctory. It is a replacement for librsync, but makes use of different algorithms. Processing might therefore be slower, and fingerprint chunks are definitely larger than the ones produced by librsync. However, libsynctory's fingerprints are much more collision-safe, so it is better suited to environments where high reliability counts for more than performance or network load (e.g. backup applications).

Currently, I consider libsynctory to be “feature-complete”, meaning it offers all the functionality required to serve its intended purpose. It can create a fingerprint of a given file. Using this fingerprint only (and without resorting to the original file), it can calculate the difference between the original file and another file. Using this difference and the original file, the other file can be restored.
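
To make that round trip concrete, here is a drastically simplified toy in Python. It has nothing to do with libsynctory's actual algorithms or API (no rolling checksum, only block-aligned matches, made-up function names), but it shows the fingerprint / difference / restore idea end to end:

import hashlib

BLOCK = 4096

def fingerprint(path):
    # Toy fingerprint: a strong hash of each fixed-size block of the original file.
    with open(path, "rb") as fp:
        data = fp.read()
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return {hashlib.sha256(b).digest(): idx for idx, b in enumerate(blocks)}

def delta(fprint, path):
    # Describe the changed file as references into the original plus literal data,
    # using only the fingerprint (the original file is not needed here).
    ops = []
    with open(path, "rb") as fp:
        data = fp.read()
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        idx = fprint.get(hashlib.sha256(chunk).digest())
        ops.append(("copy", idx) if idx is not None else ("literal", chunk))
    return ops

def patch(ops, original_path, out_path):
    # Rebuild the changed file from the original file plus the delta.
    with open(original_path, "rb") as fp:
        orig = fp.read()
    with open(out_path, "wb") as out:
        for kind, arg in ops:
            out.write(orig[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg)

# e.g.: fp = fingerprint("old.bin"); ops = delta(fp, "new.bin"); patch(ops, "old.bin", "rebuilt.bin")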

However, libsynctory neither offers any documentation yet (shame on me), nor provides a decent interface for error handling (or even backtracing). Furthermore, it has not yet been reviewed for thread safety, so use it with care (if you intend to use it at all). If you would like to test libsynctory, just grab the source archive from the project page and do the following:

tar -xjvf libsynctory-0.1.0-a1-Source.tar.bz2
cd libsynctory-0.1.0-a1-Source
cmake .
make
sudo make install

This will install the header files into /usr/local/include and the library itself into /usr/local/lib. Within the build directory, you will find some test binaries (in src/test). You can use them for very simple single file operations, just to see how libsynctory works.

No comments | Defined tags for this entry: backtory, C, development, synctory

Backtory is moving forward

Posted by Jesco Freund at July 27, 2010 7:35 p.m.

The going is tough, but at least there is some going – I think that best describes my progress with Backtory. During the last two weeks, we were on holiday in Denmark. I used some of the rather rainy days to code on Backtory, and here's what I have got done so far:

  • Libsynctory is in a working state now, meaning it does what is expected of it. However, it's not yet nice – no proper documentation, no thread safety, no error tracing.
  • Libyar has come a bit further. The container file layout is finalized and properly documented. An awful lot of macros and some init voodoo for thread safety are ready, but the lib itself is of no use yet.
  • I haven't touched the Backtory application itself for some weeks now – I have concentrated my efforts on the two above-mentioned libs. However, I have made up my mind about the application (again ;-)). There will be a “cheap” CLI application, most probably implemented in Python. The more complex variant (daemon, configuration via network and so on) will become eBacktory, which I'll take care of later.

So my first priority is to get libyar ready to use and to bring libsynctory into a state that could be called “release-ready”. When both libs are done, I'll first finalize a CLI for YAR files before taking care of the Backtory implementation itself. As I see tough times ahead in my professional life, I dare not forecast when I'll be able to spare the hours so urgently needed to finish any of the mentioned tasks. But as I already stated – the going is tough, but at least there is some going…

No comments | Defined tags for this entry: backtory, C, development, synctory, yar

I'm still alive!

Posted by Jesco Freund at March 21, 2010 12:09 a.m.

… even though, given the long time since the last blog entry, one might suspect otherwise :-| Since I currently lack the energy to write separate blog entries on every topic I am busy with, here are a few updates in summary:

  • Backtory is neither dead nor abandoned. However, I have decided to design a usable archive format first – that will certainly take its time. Until then, Duplicity will have to do somehow, even though I personally loathe incremental backups. Life already offers enough opportunities to break your neck as it is…
  • I have put the half-finished relaunch of my website on hold for now – firstly, I have no time, and secondly, I already really dislike what I have cobbled together so far. The goal is a complete rewrite of the site with Django, and then there will finally be a design that is less of an eyesore – promised. Unfortunately, that doesn't have the very highest priority at the moment…
  • cdeploy is just entering the next round. The biggest new features of the 0.2 branch: it now runs on Linux and can detect whether a file needs to be deployed at all.
  • I am still at war with Eclipse CDT – on the one hand, I like developing with Eclipse a lot; on the other hand, setting up a project is quite a nightmare for me every single time.
  • My Mercurial server finally has a project management front end again. Redmine does the job here, and it can at last be installed in a usable manner via the FreeBSD ports. Thanks to the maintainer!

Things will probably stay rather quiet here for the near future as well. The little time my job currently leaves me is needed first and foremost for private matters.

No comments | Defined tags for this entry: backtory, blog, Django, programming, Redmine, snafu

The Optimal Archive Format for Backtory

Posted by Jesco Freund at Feb. 13, 2010 9:11 p.m.

During the last few days, I've been thinking about what archive format to use with Backtory. At first, everything seemed clear: I intended to use tar (or rather USTar) – a wide-spread standard, meaning backup data would have been accessible with any standard-compliant tar implementation, as present on any Unix system I know. However, a closer look at the specification of the tar file format reveals some weaknesses that make it a bad choice for a differential backup tool. The worst of them are:

  • Tar archives have no index or table of contents. This means the whole file has to be scanned to find out what it contains, and extracting a single file requires the same.
  • Tar reserves only 100 bytes for file names. Working with longer path names is possible, yet painful (see the header sketch after this list).
  • Tar header information is encoded in ASCII, making it difficult to work with international character sets.
  • Tar archives do not handle arbitrary meta data, meaning
    • it is not possible to encode files in any way (encryption, compression, …) before adding them to an archive
    • it is impossible to store extended information like ACLs for a file without resorting to ugly workarounds (like creating .meta files for each file)
  • When it comes to compression, tar files can only be post-processed with a compressor as a whole, meaning the entire archive has to be decompressed before it can be scanned.
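
To make the fixed-width, ASCII/octal header format concrete, here is a small sketch that unpacks the first header block of a plain ustar archive (the file name example.tar is a placeholder):

import struct

# ustar header block: fixed-width fields, numbers stored as octal ASCII
USTAR = struct.Struct("100s 8s 8s 8s 12s 12s 8s 1s 100s 6s 2s 32s 32s 8s 8s 155s")

def read_header(block):
    fields = USTAR.unpack(block[:USTAR.size])
    name, size, magic, prefix = fields[0], fields[4], fields[9], fields[15]
    return {
        # the name field is only 100 bytes; longer paths need the separate 155-byte prefix field
        "name": name.rstrip(b"\0").decode("ascii", "replace"),
        "size": int(size.rstrip(b" \0") or b"0", 8),   # file size as octal ASCII
        "magic": magic.rstrip(b" \0").decode("ascii", "replace"),
        "prefix": prefix.rstrip(b"\0").decode("ascii", "replace"),
    }

# e.g.: with open("example.tar", "rb") as fp: print(read_header(fp.read(512)))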

This rant may give you some more reasons why tar really is a bad choice, but to sum it up: I consider it too painful to download a complete backup archive to the local hard disk, decompress it there, and then scan through the whole file just to restore one single file. Wouldn't it be much smarter if Backtory just had to download the archive header and could then check which part of the archive actually has to be downloaded and post-processed (e.g. decompressed)? Even with stupid old FTP this would be possible – just by aborting RETR after x received bytes, and by using REST plus RETR (aborting again once enough bytes have arrived) to fetch a specific part of a file.
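
For illustration, here is a rough sketch of that FTP trick in Python with the standard ftplib module (host, path and credentials are placeholders, and this is of course not Backtory code): ftplib sends REST before RETR when a rest offset is given, and ABOR cuts the transfer short once enough bytes have arrived.

from ftplib import FTP

def fetch_range(host, path, offset, length, user="anonymous", passwd=""):
    # Fetch `length` bytes starting at `offset` of a remote file.
    ftp = FTP(host)
    ftp.login(user, passwd)
    ftp.voidcmd("TYPE I")                                # binary mode
    conn = ftp.transfercmd("RETR " + path, rest=offset)  # issues REST <offset>, then RETR
    chunks, received = [], 0
    try:
        while received < length:
            data = conn.recv(min(8192, length - received))
            if not data:                                 # server reached end of file early
                break
            chunks.append(data)
            received += len(data)
    finally:
        conn.close()
        try:
            ftp.abort()                                  # cancel the remainder of the transfer
        finally:
            ftp.close()
    return b"".join(chunks)

# e.g. grab the first 64 KiB of an archive, then a slice further in:
# header = fetch_range("ftp.example.org", "backups/full.yar", 0, 65536)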

So what would the optimal archive format for Backtory look like? I think this is best described by a list of requirements:

  • The archive must be indexed. The minimum requirement for an index would be a table providing (relative) path names and offsets to the meta data and data sections of each file. The index table must be located at a predictable or easily determinable position within the archive file.
  • For each file, a bunch of standard meta data must be stored (essentially the information provided by lstat()).
  • The archive format must allow arbitrary meta data for each file. Some of it could be standardized (e.g. encoding or compression method), the rest may vary from application to application (e.g. ACL data, MAC labels, encryption method, …).
  • It must be easy and cheap to extend an archive. To be more precise, I would consider it unacceptable if something had to be inserted at the beginning of an archive file, entailing a shift of all successive bytes, meaning all offset data would have to be recalculated and nearly all data in the file reorganized on the file system. “Easy” and “cheap” would be perfectly achieved if an archive could be extended even via FTP (but I consider this very unlikely to become true without breaking the other requirements).
  • Recoverability in case of a damaged index: a single-pass scan over the meta data and content data blocks should be enough to regain all information necessary to rebuild the archive index (a small sketch of such a scan follows this list).
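
As a rough sketch of that last requirement (using an invented record layout, not anything Backtory or YAR actually specifies): if every entry starts with a magic marker and carries its own path and data length, the index can be rebuilt by walking the file once:

import struct

MAGIC = b"ENTR"                        # invented, self-describing entry header
HEADER = struct.Struct(">4sHQ")        # magic, path length, data length

def rebuild_index(archive_path):
    # Single-pass scan: recover (name, data offset, data length) for each entry.
    index = []
    with open(archive_path, "rb") as fp:
        while True:
            header = fp.read(HEADER.size)
            if len(header) < HEADER.size:
                break                                  # end of archive reached
            magic, path_len, data_len = HEADER.unpack(header)
            if magic != MAGIC:
                raise ValueError("corrupt entry header")
            name = fp.read(path_len).decode("utf-8")
            index.append((name, fp.tell(), data_len))
            fp.seek(data_len, 1)                       # skip the content block
    return index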

Well, that is the whole wish list. However, I have not found any existing archive format that covers these requirements (at least none that is patent-free and has at least one open source implementation). And before anyone starts rhapsodizing about xar, I'd like to state why I consider xar unfit for Backtory:

  • The meta data block is in XML, which means you need a fat parser to process it.
  • The heap (all file content data) is useless without the meta data ⇒ recovery of a damaged toc is anything but easy (I doubt it is possible at all).
  • The toc (XML meta data) is located at the beginning of a xar container. When extending a xar archive, all data inside must be relocated since its toc has to be extended.

I guess I have to design my own container format for Backtory. However, if I really do so, I will implement it as a library independent of Backtory and give it its own CLI tool. I already have some idea of what the archive container could look like, but there are still some details I have to work out. Stay tuned, I'll keep you informed about what I'll do and how I will implement it…

No comments | Defined tags for this entry: backtory, code, programming

Backtory Revived

Posted by Jesco Freund at Jan. 23, 2010 9:43 p.m.

Almost one year ago, I started a project named “Backtory”. Its aim was to create a backup solution suitable for my internet servers. However, the whole idea was much too bloated, and my attention to the project wasn't as steady as intended. So for quite a long time, I didn't write a single line of code. Some weeks ago (at Christmas, to be precise) I moved the project to Google Code. After some deeper looks at what I had done so far, I decided to throw away all the old stuff and to wipe the slate clean.

The idea of the reborn Backtory project is a CLI application instead of a heavy client-server infrastructure. The requirements, however, have remained nearly untouched:

  • Create differential (not incremental!) backups
  • Encrypt and sign backup archives
  • Store backup data remotely (at least FTP support is required)
  • Make use of ZFS (and optionally LVM2) snapshot capabilities
  • Generate procedures (i.e. POSIX shell scripts) for a bare-metal recovery
  • Support pre- and post-backup actions
  • Support point-in-time recovery

Maybe this requirement catalogue illustrates why I started a new project instead of using one of the existing solutions. On the one hand, big solutions like Bacula or Amanda are too bloated (and too focused on LAN setups for my taste). On the other hand, smaller solutions (mostly shell or Perl scripts) do not meet my requirements.

Duplicity is what I'd call a close miss, but its fixation on incremental backups and its lacking support for snapshot operations would have meant at least a 50% reimplementation (plus the pain of having to endure the restrictions of GPLv2). Furthermore, I'm not sure about using librsync for my implementation. Its development seems to have ceased some years ago, so no wonder some of its implementation details are no longer state of the art (e.g. using MD4 to verify the equality of file chunks).

So what's the plan going forward? Well, so far I have written some Python extensions providing interfaces to OpenSSL's RIPEMD-160 implementation, the POSIX lstat() function, VMAC (which could become an MD4 replacement in my implementation of the rsync algorithm) and ZFS (works on FreeBSD so far and can create and destroy snapshots). The next steps will be to implement further Python extensions (I still need interfaces to OpenSSL's enc, genrsa and rsautl for signing and encrypting the archives) and to make my ZFS extension work on Solaris and OpenSolaris. After that, I hope to be able to start implementing the full backup functionality; differential backups (including the still-missing rsync algorithm) and the restore functions will follow.
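
Those extensions wrap OpenSSL in C; as a rough stand-in for what the signing and encryption step boils down to, here is a sketch that simply shells out to the openssl command line tools instead (file names and key paths are placeholders, and this is not Backtory's actual implementation):

import subprocess

def encrypt_and_sign(archive, passphrase_file, private_key):
    encrypted = archive + ".enc"
    digest_file = archive + ".rmd160"
    signature = archive + ".sig"
    # symmetric encryption of the archive (openssl enc)
    subprocess.check_call([
        "openssl", "enc", "-aes-256-cbc", "-salt",
        "-in", archive, "-out", encrypted,
        "-pass", "file:" + passphrase_file,
    ])
    # RIPEMD-160 digest of the encrypted archive (openssl dgst)
    digest = subprocess.check_output(
        ["openssl", "dgst", "-ripemd160", "-binary", encrypted])
    with open(digest_file, "wb") as fp:
        fp.write(digest)
    # sign the digest with the RSA private key (openssl rsautl)
    subprocess.check_call([
        "openssl", "rsautl", "-sign", "-inkey", private_key,
        "-in", digest_file, "-out", signature,
    ])
    return encrypted, signature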

As soon as I have implemented the FTP remote storage functionality, I'll start thinking about a first release. LVM2 support and bare-metal recovery are not so urgent for me, so maybe I'll skip them for the first release and implement them later. And of course I'll keep you informed by blogging about Backtory from time to time…

No comments | Defined tags for this entry: backtory
