Tuesday, 1 November 2011

MarkLogic: First steps with the MarkLogic Hadoop connector (MarkLogic 5.0-1)

If you've just installed MarkLogic 5 and you're interested in spending some time checking out the Hadoop connector, this may be a useful accompaniment to the documentation available from the newly [re]launched MarkLogic Developer Community site.

There's quite a lot to get to grips with regarding the Hadoop connector, and it will probably warrant a number of posts over time. But as an additional resource for getting started (alongside all the sample code and resources available here), this may be of some use.

Briefly, it shows a simplistic use of the connector. In itself this is not especially useful, as all it currently does is completely overwrite a document within the database with a new node, but it should give an idea of how the connector could be configured to perform updates to existing data, which I hope to cover within the next few days.

Prerequisites:

  • You'll need the Hadoop connector, which is here
  • You'll also need to grab the latest XCC library
  • In addition to having installed MarkLogic 5, you'll need to create an XDBC server (for this example, I'm using port 9999 and targeting the Documents database).
Insert one document into the Documents database (this will be overwritten when the Hadoop job runs). Then try running this code:

This is the content of the XML config file src/main/resources/marklogic-mlnode-replace.xml (referred to in the code above):

I've used a couple of additional dependencies, so here's the contents of pom.xml for anyone using Maven:

More examples to follow!
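One way to insert the initial document is with a few lines of XCC (already a prerequisite above). This is a sketch only: the host/port match the XDBC server configured earlier, but the credentials, document URI, and content are placeholders you'd replace with your own.

```java
import com.marklogic.xcc.Content;
import com.marklogic.xcc.ContentCreateOptions;
import com.marklogic.xcc.ContentFactory;
import com.marklogic.xcc.ContentSource;
import com.marklogic.xcc.ContentSourceFactory;
import com.marklogic.xcc.Session;

public class InsertSeedDoc {
    public static void main(String[] args) throws Exception {
        // Connect to the XDBC server created above (port 9999, Documents database).
        // "admin"/"admin" are placeholder credentials.
        ContentSource cs = ContentSourceFactory.newContentSource(
                "localhost", 9999, "admin", "admin");
        Session session = cs.newSession();
        try {
            // Placeholder URI and content; the Hadoop job will overwrite this.
            Content content = ContentFactory.newContent(
                    "/example/seed.xml",
                    "<root><child>original</child></root>",
                    ContentCreateOptions.newXmlInstance());
            session.insertContent(content);
        } finally {
            session.close();
        }
    }
}
```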
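The job driver itself can be very small. The sketch below is reconstructed from the connector's documented API (NodeInputFormat/NodeOutputFormat with NodePath keys and MarkLogicNode values); the class name and the pass-through mapper are illustrative, and all connection details come from the marklogic-mlnode-replace.xml config file rather than being hard-coded.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

import com.marklogic.mapreduce.MarkLogicNode;
import com.marklogic.mapreduce.NodeInputFormat;
import com.marklogic.mapreduce.NodeOutputFormat;
import com.marklogic.mapreduce.NodePath;

public class NodeReplace {

    // Pass-through mapper: for each node read from MarkLogic, emit a node
    // back against the same path. With the output optype set to REPLACE in
    // the config file, this overwrites the existing node.
    public static class ReplaceMapper
            extends Mapper<NodePath, MarkLogicNode, NodePath, MarkLogicNode> {
        @Override
        protected void map(NodePath key, MarkLogicNode value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value); // or write a modified node here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pull in the connection details and the REPLACE optype
        // (must be added before the Job copies the configuration).
        conf.addResource("marklogic-mlnode-replace.xml");

        Job job = new Job(conf, "marklogic node replace");
        job.setJarByClass(NodeReplace.class);
        job.setInputFormatClass(NodeInputFormat.class);
        job.setMapperClass(ReplaceMapper.class);
        job.setMapOutputKeyClass(NodePath.class);
        job.setMapOutputValueClass(MarkLogicNode.class);
        job.setOutputFormatClass(NodeOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```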
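For reference, a marklogic-mlnode-replace.xml wiring up the job above might look like the following. This is a sketch: the mapreduce.marklogic.* property names are as documented in the Connector for Hadoop guide, the port (9999) matches the XDBC server created earlier, and the credentials are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- input side: where nodes are read from -->
  <property>
    <name>mapreduce.marklogic.input.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.port</name>
    <value>9999</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.username</name>
    <value>admin</value>
  </property>
  <property>
    <name>mapreduce.marklogic.input.password</name>
    <value>admin</value>
  </property>
  <!-- output side: same server in this example -->
  <property>
    <name>mapreduce.marklogic.output.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>mapreduce.marklogic.output.port</name>
    <value>9999</value>
  </property>
  <property>
    <name>mapreduce.marklogic.output.username</name>
    <value>admin</value>
  </property>
  <property>
    <name>mapreduce.marklogic.output.password</name>
    <value>admin</value>
  </property>
  <!-- REPLACE the node at the emitted path, rather than inserting around it -->
  <property>
    <name>mapreduce.marklogic.output.node.optype</name>
    <value>REPLACE</value>
  </property>
</configuration>
```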
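And a pom.xml along these lines will build it. Note the hedge here: hadoop-core is available from Maven Central, but (at the time of writing) the MarkLogic connector and XCC jars were not, so you'd install the downloaded jars into your local repository (e.g. with mvn install:install-file) under coordinates of your choosing; the com.marklogic coordinates below are placeholders, not published artifacts.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>ml-hadoop-first-steps</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <!-- Hadoop core, from Maven Central -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
    </dependency>
    <!-- Placeholder coordinates: install the downloaded connector and
         XCC jars into your local repo and adjust these to match -->
    <dependency>
      <groupId>com.marklogic</groupId>
      <artifactId>marklogic-mapreduce</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.marklogic</groupId>
      <artifactId>marklogic-xcc</artifactId>
      <version>5.0</version>
    </dependency>
  </dependencies>
</project>
```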
