Wednesday 2 November 2011

MarkLogic: Next steps with the MarkLogic Hadoop Connector (MarkLogic 5.0-1)

Following on from my post yesterday, here's a further usage example for the Hadoop connector.

Listed below is an example showing how you can use the connector to update an existing document in MarkLogic. As before, we're only focusing on the Map phase of Map/Reduce in order to keep things simple.

Prerequisites:

  • You'll need the Hadoop connector (which is here) on your build path
  • In addition to having installed MarkLogic 5, you'll need to create an XDBC server (for this example, I'm using port 9999 and targeting the Documents database).

When you're ready, create one XML doc in the database:

Then try the following:

This is the content for the XML config file src/main/resources/marklogic-mlnode-update.xml (referred to in the code above):
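
As a rough illustration of the shape a file like this takes, here is a minimal Java sketch that prints a Hadoop-style configuration document. It assumes the connector reads its settings from mapreduce.marklogic.input.* and mapreduce.marklogic.output.* properties (names as I recall them from the connector documentation, so check them against your version). The admin credentials, the REPLACE operation type and the class name ConnectorConfigSketch are placeholders for illustration, not values taken from the job in this post; only port 9999 comes from the setup described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConnectorConfigSketch {

    // Escape the characters that cannot appear literally in XML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render name/value pairs in the standard Hadoop <configuration> layout.
    public static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    // Assumed property set: the same XDBC server is used for input and output.
    public static Map<String, String> exampleProperties() {
        Map<String, String> p = new LinkedHashMap<>();
        for (String side : new String[] {"input", "output"}) {
            p.put("mapreduce.marklogic." + side + ".host", "localhost");
            p.put("mapreduce.marklogic." + side + ".port", "9999");     // XDBC port from the prerequisites
            p.put("mapreduce.marklogic." + side + ".username", "admin"); // placeholder credentials
            p.put("mapreduce.marklogic." + side + ".password", "admin"); // placeholder credentials
        }
        // Assumed node operation type; pick whichever your job actually needs.
        p.put("mapreduce.marklogic.output.node.optype", "REPLACE");
        return p;
    }

    public static void main(String[] args) {
        System.out.print(render(exampleProperties()));
    }
}
```

Running main prints the XML to standard output, which you could redirect into a file under src/main/resources and adjust by hand.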

The XCC/J connector for MarkLogic 5.0-1 is now in Maven, so all other dependencies can be resolved by this pom:

Enjoy your elephant...

Tuesday 1 November 2011

MarkLogic: First steps with the MarkLogic hadoop connector (MarkLogic 5.0-1)

If you've just installed MarkLogic 5 and you're interested in spending some time checking out the hadoop connector, this may be a useful accompaniment to all the documentation available from the newly [re]launched MarkLogic Developer Community site.

There's quite a lot to get to grips with regarding the Hadoop connector, and it will probably warrant a number of posts over time. But as a further resource for getting started (alongside all the sample code and resources available here), this may be of some use.

Briefly, it shows a simplistic use of the connector. On its own it isn't especially useful, as all it currently does is completely overwrite a document in the database with a new node. It should, however, give an idea of how the connector could be configured to perform updates to existing data, which I hope to comment on within the next few days.

Prerequisites:

  • You'll need the hadoop connector, which is here
  • You'll also need to grab the latest xcc connector
  • In addition to having installed MarkLogic 5, you'll need to create an XDBC server (for this example, I'm using port 9999 and targeting the Documents database).

Insert one doc into the documents database (this will be overwritten when the hadoop job runs). Then try running this code:

This is the content for the XML config file src/main/resources/marklogic-mlnode-replace.xml (referred to in the code above):

I've used a couple of additional dependencies, so here's the contents of pom.xml for anyone using maven:

More examples to follow!