XQuery/eXist Replication
This document is outdated and should be either updated or removed.
Please check the pages on GitHub for the most recent information.
Motivation
You want to configure two or more eXist instances to work together and automatically keep collection-specific data sets synchronized. This allows you to scale your eXist server capacity. For example, with several eXist servers kept in sync as described below, you could add a load balancer to distribute incoming queries across the pool of servers and still maintain high performance.
Method
We will use the new eXist clustering options, which are available only in the eXist-db 2.1dev (bleeding-edge) developer edition. This feature uses collection triggers to send "update" messages from a master server to one or more slave systems on remote hosts.
NOTE: This page is under development.
Terminology
In the long term, the eXist clustering system may be configured in many different ways. In this tutorial we will only cover collection-based replication from a single master to multiple slave systems. The figure below describes the relationship between the nodes in this replication configuration.
We will use the following terms in this document:
- Master - a specially configured eXist instance where changes to documents and collections are made, for example documents being stored, updated (including XML updates), or deleted. The master is considered the "publisher" of change events. It is also the server on which the ActiveMQ server must run.
- Slaves - a set of one or more specially configured eXist instances that automatically receive updates when changes occur on the master. Each slave is considered a "subscriber" to the change events on the master.
- Message Store - a location outside the master server's eXist instance where all update events are stored and then forwarded to remote systems when they are ready to receive them. Although the term "Message Queue" is frequently used, in our case we will use a "Topic", not a "Queue", to distribute messages to remote systems. ActiveMQ provides this function on the master server.
When any document in a configured collection changes, an update message is placed in the message store. The update stays there until all subscribers have received it.
NOTE: To be confirmed with the developers: any collection on any system can be configured as a master or a slave to other collections on other systems.
NOTE: Once "durable" messages are implemented, slave systems will not need to be running when the changes are made. When a slave that has gone down comes back up, it will automatically receive all the changes made since it last communicated with the master.
How Replication Works
The clustering replication configuration uses a standards-compliant messaging system built on the Java Message Service (JMS) standard. The implementation eXist uses is based on Apache ActiveMQ, which is widely used as "middleware" to help applications communicate reliably.
Configuration Steps
Download and Configure Apache ActiveMQ
- Download a recent version of ActiveMQ from http://activemq.apache.org/download.html. Note that the TGZ archive contains additional Unix (Linux, Mac OS X) support; the ZIP archive is for Windows.
- Extract the archive to disk; the resulting directory is referred to below as $ACTIVEMQ_HOME.
- Copy the $ACTIVEMQ_HOME/activemq-all-X.Y.Z.jar file to $EXIST_HOME/lib/user
For testing, I used activemq-all-5.6.0.jar.
Create eXist With Clustering Configuration
Note: The current work on clustering is being done in a branch of the eXist-db Subversion repository. To build this branch, check out the following URL with a Subversion client:
https://exist.svn.sourceforge.net/svnroot/exist/branches/dizzzz/clustering
This code will be moved into the main trunk at a future time.
Note that extensions/local.properties in this branch contains the following setting, which is not yet in the main trunk:
# Clustering extension for reliable document replication
include.feature.clustering = true
You can then build eXist with $EXIST_HOME/build.sh (or build.bat on Windows).
When the build has finished, the following file should exist:
$EXIST_HOME/lib/extensions/exist-clustering.jar
Configure the Master Server
Add a collection.xconf file for the collection whose content must be distributed. For example, for /db/mycollection/ the .xconf file must be stored in /db/system/config/db/mycollection/.
Create the collection '/db/mycollection'.
Fill in the host name and port of the ActiveMQ message broker in the java.naming.provider.url parameter (for example "server.local:61616"; the sample below uses localhost:61616 because the broker runs on the master).
file: /db/system/config/db/mycollection/collection.xconf
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <triggers>
        <trigger class="org.exist.replication.jms.publish.ReplicationTrigger">
            <parameter name="java.naming.factory.initial" value="org.apache.activemq.jndi.ActiveMQInitialContextFactory"/>
            <parameter name="java.naming.provider.url" value="tcp://localhost:61616"/>
            <parameter name="connectionfactory" value="ConnectionFactory"/>
            <parameter name="destination" value="dynamicTopics/eXistdb"/>
            <parameter name="client-id" value="id1"/>
        </trigger>
    </triggers>
</collection>
In the sample above, the collection is named "mycollection".
Configure the Slave Servers
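For each slave, a startup job must be added to its conf.xml (inside the <scheduler> element). The broker URL, connection factory, and destination must match the values in the master's trigger configuration, while the client-id must be different: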
<!--
    Start JMS listener for clustering feature.
    Parameters:
      java.naming.factory.initial  Initial context provider
      java.naming.provider.url     URL of message broker
      connectionfactory            Name of connection factory
      destination                  Name of destination (Topic or Queue)
      client-id                    (optional) Client ID. Leave out or set ""
                                   for default behaviour
-->
<job type="startup" name="clustering" class="org.exist.replication.jms.subscribe.MessageReceiverJob">
    <parameter name="java.naming.factory.initial" value="org.apache.activemq.jndi.ActiveMQInitialContextFactory"/>
    <parameter name="java.naming.provider.url" value="tcp://localhost:61616"/>
    <parameter name="connectionfactory" value="ConnectionFactory"/>
    <parameter name="destination" value="dynamicTopics/eXistdb"/>
    <parameter name="client-id" value="id2"/>
    <parameter name="subscriber-name" value="sub_name"/>
</job>
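The slave sample above connects to a broker on localhost, which only works when the slave runs on the same host as ActiveMQ. A sketch of the same job for a second slave on another machine might look like the following. The host name server.local (the example broker host mentioned for the master) and the client-id value id3 are illustrative assumptions. A JMS client ID can be in use by only one connection at a time, so each slave presumably needs its own value.

```xml
<!-- Hypothetical second slave on a remote host; host name and client-id are illustrative -->
<job type="startup" name="clustering" class="org.exist.replication.jms.subscribe.MessageReceiverJob">
    <parameter name="java.naming.factory.initial" value="org.apache.activemq.jndi.ActiveMQInitialContextFactory"/>
    <!-- Assumed host name of the master, where the ActiveMQ broker runs -->
    <parameter name="java.naming.provider.url" value="tcp://server.local:61616"/>
    <parameter name="connectionfactory" value="ConnectionFactory"/>
    <!-- Same destination the master's ReplicationTrigger publishes to -->
    <parameter name="destination" value="dynamicTopics/eXistdb"/>
    <!-- Must differ from the client-id used by the master and by every other slave -->
    <parameter name="client-id" value="id3"/>
    <parameter name="subscriber-name" value="sub_name"/>
</job>
```

Only the broker URL and the client-id differ from the first slave; the destination stays the same so that every slave subscribes to the one topic the master publishes to.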
Start up the Servers
Start ActiveMQ server
cd $ACTIVEMQ_HOME
./bin/activemq start
(On Mac OS X, use the wrapper script in bin/macosx instead.)
Start eXist-db server on slave(s) and master
Start eXist on each slave server and create the collection that will mirror the master:
cd $EXISTSLAVE_HOME
./bin/startup.sh
Then create the receiving collection '/db/mycollection'.
Start Master
cd $EXISTMASTER_HOME
./bin/startup.sh
There is no need to create the collection here, since it was already created when configuring the master.
Test Document Distribution
ActiveMQ queues
When you use dynamic topics or dynamic queues, you can check whether the master and the slaves have connected to the topic in the ActiveMQ web console at http://localhost:8161/admin/topics.jsp. Refresh the page (F5) to see the latest state.
Testing
On the master, create a document in /db/mycollection/ (e.g. using the Java client or eXide, logged in as admin). The document will be automatically replicated to all of the slaves in the system.
Performance
With eXide, upload an XML document of roughly 50 KB to the master, e.g. as /db/mydoc.xml. When you then run the following query on the master, 2001 documents (mydoc1000.xml to mydoc3000.xml) are stored in /db/mycollection and replicated to the slaves.
let $doc := doc('/db/mydoc.xml')
for $i in (1000 to 3000)
return xmldb:store('/db/mycollection', concat('mydoc', $i, '.xml'), $doc)
Debugging Tips
Configure Log4j to log at the DEBUG level.
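One way to do this, assuming your build configures logging through a Log4j 1.x XML file such as $EXIST_HOME/log4j.xml, is to add a category for the replication classes. The package name org.exist.replication is taken from the trigger and job class names used above; the appender name exist.core is an assumption and should be replaced with an appender actually defined in your file. In a Log4j 1.x XML file, categories go after the appender definitions and before the root element.

```xml
<!-- Sketch for a Log4j 1.x log4j.xml: log the replication classes at DEBUG level.
     "exist.core" is an assumed appender name; use one defined in your own file. -->
<category name="org.exist.replication" additivity="false">
    <priority value="debug"/>
    <appender-ref ref="exist.core"/>
</category>
```

The excerpts below also contain lines from other classes (for example Collection.java) and were taken from an earlier build (note the ClusterTrigger class name), so you may need to raise the level for other packages, or for the root logger, to see all of them.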
On the Master system you should see the following lines:
2012-06-19 13:26:43,406 [eXistThread-90] DEBUG (Collection.java [storeXMLInternal]:1339) - document stored.
2012-06-19 13:26:43,406 [eXistThread-90] DEBUG (ClusterTrigger.java [afterCreateDocument]:63) - /db/mycollection/mydoc1000.xml
2012-06-19 13:26:43,406 [eXistThread-90] DEBUG (NativeSerializer.java [serializeToReceiver]:112) - serializing document 1430 (/db/mycollection/mydoc1000.xml) to SAX took 0 msec
2012-06-19 13:26:43,419 [eXistThread-90] DEBUG (JMSMessageSender.java [sendMessage]:156) - Message sent with id: ID:Dan-PC12-51166-1340109804913-3:1:1:1:1
On the Slave system you should see the following:
2012-06-19 13:48:05,875 [DefaultQuartzScheduler_Worker-2] DEBUG (NotificationService.java [debug]:94) - Registered UpdateListeners:
2012-06-19 13:50:06,218 [ActiveMQ Session Task-1] DEBUG (eXistJMSListener.java [onMessage]:138) - CREATE_UPDATE : DOCUMENT from /db/mycollection/mydoc1000.xml
2012-06-19 13:50:06,234 [ActiveMQ Session Task-1] DEBUG (ConfigurationHelper.java [getExistHome]:55) - Got eXist home from broker: C:\ws\exist-trunk\eXist
Other Configurations
Because messaging is such a general-purpose way for computer systems to communicate, many other business problems can be solved by variations of this first example. Replication can be used not only for increased reliability but also, together with load balancing and auto-scaling, to maintain performance when a system is under heavy load.
Messages can also be used to distribute queries among many nodes, each with its own data collection. The query results are placed on a results queue and returned to the user as though they were using a single, very fast server.
Because the master eXist system only needs to place an update event on a message queue, you are then free to use message stores in many different configurations to distribute both data and programs to remote sites with varying degrees of reliability.
Static or Dynamic Queues
There are two options for creating message queues and topics:
- Static: define in ActiveMQ's configuration file which topics or queues must be created.
- Dynamic: the topics or queues are created the first time they are used.
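For the static option, ActiveMQ lets you declare startup destinations inside the broker element of $ACTIVEMQ_HOME/conf/activemq.xml. A minimal sketch that pre-creates the topic used in this tutorial might look like the following; the surrounding Spring beans wrapper and the broker's other settings are omitted, and the brokerName value is illustrative. With the dynamic option nothing needs to be declared: the dynamicTopics/ prefix in the destination parameter lets ActiveMQ's JNDI context look up the topic eXistdb by name, and the broker creates it on first use.

```xml
<!-- Fragment of $ACTIVEMQ_HOME/conf/activemq.xml (Spring <beans> wrapper omitted) -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost">
    <!-- Destinations created when the broker starts -->
    <destinations>
        <!-- Same physical name as the "dynamicTopics/eXistdb" destination used by master and slaves -->
        <topic physicalName="eXistdb"/>
    </destinations>
    <!-- ... other broker settings (persistence, transport connectors, etc.) ... -->
</broker>
```

Since the physical topic name is the same, the existing destination value dynamicTopics/eXistdb in the master and slave configurations can presumably be left unchanged.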