TIBCO ActiveSpaces – Best Practices – 4. Fault-tolerance and Persistence

Fault-tolerance and Persistence

Replication is what provides you with fault-tolerance

If the process issuing a get operation is a seeder or replicator of the requested key, the get is serviced at very low latency, “in-process” speed.

  • If the dataset is small, you can ensure that reads on ALL keys run at in-process speed by setting the replication count of the space to ALL.
  • Replicate ALL (the numeric value used in configuration for REPLICATE_ALL is -1) is generally very useful for smaller data sets where data is read much more often than it is modified:
    • lookup data
    • joins
    • and so on
  • With asynchronous replication, the write request is sent to the seeder for the key (probably another process). The seeder synchronously acknowledges the write back to the process that issued it and, at the same time, forwards it to the replicators asynchronously.
  • With synchronous replication, the write request is sent to the seeder and all of the replicators before the acknowledgement goes back to the process issuing the write.
    • Synchronous replication does NOT mean that at the network level a single node sends the replication requests to all of the other nodes, because replication happens in a more distributed way in ActiveSpaces.
  • Persistence is what provides you with the ability to shut down all of the seeders or the entire Metaspace and then recover the data that was stored in the space when the system went down:
    • Shared-nothing persistence typically scales better than shared-all persistence.
    • Shared-nothing persistence extends the replication degree of the space:
      • A shared-nothing persisted space with a replication degree of one means you won’t lose any data even if you lose all of the persistence files from one of the seeders.
      • However, there is no “automated recovery” of spaces in shared-nothing persistence. You must ensure that you have started enough of your processes doing seeding on the space before administratively triggering the recovery.
      • Also pay attention to the order in which the persisters were brought down if some write operations occurred after some of the persisters went down.
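
The replication points above can be made concrete with a small toy model. This is only a sketch under stated assumptions: it does not use the ActiveSpaces API, the round-robin placement in `holders` is a stand-in for however ActiveSpaces actually distributes keys, and every function name here is hypothetical. The only value taken from the text is -1 for REPLICATE_ALL. The model shows which processes can read a key in-process, how many acknowledgements a write waits for under each replication mode, and whether a key survives the loss of some nodes.

```python
import hashlib

REPLICATE_ALL = -1  # numeric value the text gives for REPLICATE_ALL


def holders(key, nodes, replication_count):
    """Return [seeder, replicator, ...] for a key.

    Assumption: keys are placed on a sorted ring of nodes, starting at a
    position derived from a stable hash. This is an illustrative scheme,
    not the real ActiveSpaces distribution algorithm.
    """
    ordered = sorted(nodes)
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(ordered)
    if replication_count == REPLICATE_ALL:
        copies = len(ordered)  # every node holds every key
    else:
        copies = min(1 + replication_count, len(ordered))  # seeder + replicas
    return [ordered[(start + i) % len(ordered)] for i in range(copies)]


def is_local_read(node, key, nodes, replication_count):
    """A get runs at in-process speed if the node seeds or replicates the key."""
    return node in holders(key, nodes, replication_count)


def acks_before_write_returns(key, nodes, replication_count, synchronous):
    """Count the copies that must be written before the caller is acknowledged.

    Asynchronous: only the seeder. Synchronous: the seeder and all replicators.
    """
    return len(holders(key, nodes, replication_count)) if synchronous else 1


def survives_loss_of(key, nodes, replication_count, lost):
    """True if at least one holder of the key is outside the lost set."""
    return any(n not in lost for n in holders(key, nodes, replication_count))


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    for count in (0, 1, REPLICATE_ALL):
        local = [n for n in nodes if is_local_read(n, "customer:42", nodes, count)]
        print(f"replication_count={count}: in-process reads on {local}")
```

In this model a replication count of one keeps two copies of each key, so losing the seeder alone does not lose the key, which matches the shared-nothing statement above. With REPLICATE_ALL every node reads every key in-process, at the cost of every write reaching every node.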
