I'm trying to understand CRDTs, and Roshi provides an implementation of how CRDTs can be put to use in real-world applications. I've assess the documentation on Roshi and would like assent or clarification if I'm wrong.
After glossing over the readme of github.com/soundcloud/roshi I came away with the impression that Roshi's LWW-element-set implementation does not strictly adhere to the qualities of a CRDT, mainly that operations must be commutative.
Roshi documents two uses-cases:
If first we apply an add operation to the set, then apply a remove operation with the same tuple, the resulting set will ignore the remove operation:
A(a,1) R() + remove(a,1) = A(a,1) R()
If we apply the remove operation, then next the add operation, the resulting set will ignore the add operation:
A() R(a,1) + add(a,1) = A() R(a,1)
This means the resulting state of the set depends on the order of operations, which violates the commutative property of CRDTs.
As a consequence, if we assume this CRDT is replicated on each redis node in a Roshi cluster, then there is a case (admittedly rare and short-lived) where a Roshi cluster cannot be eventually consistent:
Given no more future operations on a set with key K, if,
node 1 contains a key K with set A(a,1) R(), and
node 2 contains a key K with set A() R(a,1), then the system cannot be eventually consistent.
Roshi seems to have implemented a weak form of CRDT, which under rare and short-lived situations impedes eventual consistent. But given the operational realities, this is totally acceptable.
Interesting. We have an event stream database implemented in Haskell, and this looks like an excellent way to index it. Especially associative, commutative and idempotent can (probably) all be encoded in the type system!
http://christophermeiklejohn.com/coq/2013/06/11/distributed-data-structures.html
Also, see related papers from PaPEC '14:
http://dl.acm.org/citation.cfm?id=2596631
Basically a lot of this:
http://basho.com/tag/crdt
But using Haskell's type system is an interesting idea.
You can start with one level higher than just looking for associative, commutative and idempotent, because that actually comes the definition of a semilattice
http://en.wikipedia.org/wiki/Semilattice
The crucial part is that you have an appropriate meet or join operator (and that it satisfies the above requirements).
Informally think of a functions like max() over natural numbers or a union() over sets. Those are some examples.
I don't know Haskell but found this interesting gist of someone who has tried this, and frankly a lot of stuff there is above my head, but it might help you:
https://gist.github.com/pozorvlak/6540639
I think some of the theory there is starting to get into operational transformation territory [1]. If that's the case then a really interesting application might be applying the same semantics indexing a stream of events to expressing events as a diff against state and propagating them to clients...
[1] http://en.wikipedia.org/wiki/Operational_transformation
http://www.cs.indiana.edu/~lkuper/
Has tons of papers and
https://hackage.haskell.org/package/lvish
is the lib.
With CRDTs the merge operation is made such that the simplify operation is id.
It's also interesting to note that the merge operation is just a pullback in the appropriate category.
http://hackage.haskell.org/package/lattices-1.2.1/docs/Algebra-Lattice.html
Twitter's Algebird[1] has a wide range of monoid implementations (in Scala). It's a surprisingly useful type class for something so simple.
I have a talk[2] and some code [3] that goes into more detail on CRDTs and the connection to monoids.
[1:] https://github.com/twitter/algebird
[2:] http://noelwelsh.com/programming/2013/12/20/crdts-for-fun-and-eventual-profit/
[3:] https://github.com/noelwelsh/crdt