Tuesday 13 December 2011

Capistrano: standard invocation note

To execute a capistrano task against a given host:

Friday 9 December 2011

MarkLogic: Using XCC/J to pass content to an installed Module

I've covered something similar to this before, but here's an example of how you invoke an external module (and pass several variables to it) using XCC:

Wednesday 2 November 2011

MarkLogic: Next steps with the MarkLogic Hadoop Connector (MarkLogic 5.0-1)

Following on from my post yesterday, here's a further example of using the hadoop connector.

Listed below is an example showing how you can use the connector to update an existing document in MarkLogic. As before, we're only focusing on the Map phase of Map/Reduce in order to keep things simple.


  • You'll need the hadoop connector on your buildpath, which is here
  • In addition to having installed MarkLogic 5, you'll need to create an XDBC server (for this example, I'm using port 9999 and targeting the Documents database).

When you're ready, create one XML doc in the database:

Then try the following:

This is the content for the XML config file src/main/resources/marklogic-mlnode-update.xml (referred to in the code above):

And the xcc/j connector for MarkLogic 5.0-1 is now in maven, so all other dependencies can be resolved by this pom:

Enjoy your elephant...

Tuesday 1 November 2011

MarkLogic: First steps with the MarkLogic hadoop connector (MarkLogic 5.0-1)

If you've just installed MarkLogic 5 and you're interested in spending some time checking out the hadoop connector, this may be a useful accompaniment to all the documentation available from the newly [re]launched MarkLogic Developer Community site.

There's quite a lot to get to grips with regarding the hadoop connector and this will probably be a topic warranting a number of posts over time. But as a further resource for getting started (in addition to all the sample code and resources available here), this may be of some use.

Briefly, it shows a simplistic use of the connector. In itself it is not so useful, as all it currently does is completely overwrite a document within the database with a new node, but it should give an idea as to how the connector could be configured to perform updates to existing data, which I hope to comment on within the next few days.


  • You'll need the hadoop connector, which is here
  • You'll also need to grab the latest xcc connector
  • In addition to having installed MarkLogic 5, you'll need to create an XDBC server (for this example, I'm using port 9999 and targeting the Documents database).

Insert one doc into the documents database (this will be overwritten when the hadoop job runs). Then try running this code:

This is the content for the XML config file src/main/resources/marklogic-mlnode-replace.xml (referred to in the code above):

I've used a couple of additional dependencies, so here's the contents of pom.xml for anyone using maven:

More examples to follow!

Sunday 23 October 2011

MarkLogic/XCC/HttpClient: Connecting to an XDBC Application Server using HttpClient

This was written purely out of curiosity rather than anything else - pasted below in case it's of any interest to others. Written with HttpClient 4.1.2 in mind:

Wednesday 12 October 2011

Scala: Example of hierarchical sorting by distinct values

Here's a brief example of how scala can be used to sort using groupBy and mapValues:

In this context, bear in mind that sorted will return values sorted alphabetically - but you probably get the idea...

Thursday 6 October 2011

MarkLogic/Apache HttpClient: POSTing form data containing characters with diacritics

If you're using HttpClient and you want it to handle characters with diacritics correctly, you'll need to add an extra line of config:

Source: http://hc.apache.org/httpclient-3.x/preference-api.html

Friday 30 September 2011

Bash/shell: Quick hints for gathering information from the MarkLogic logs

One debugging / problem tracing technique I find myself using a lot is grep with the -A argument (-A followed by a number of lines), to find a specific error and print the lines of the stack trace that follow it. In this example, it'll find the word "cast" (for an invalid cast exception, for example) and print the 20 lines that follow each match:
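A minimal sketch of that grep invocation, wrapped in a small function. The function name is my own, and the log path in the usage comment is an assumption (the usual default location on Linux); adjust both to suit:

```shell
#!/usr/bin/env bash
# Print every line matching TERM, plus the LINES lines that follow each match.
# Usage: show_error_context TERM LINES LOGFILE
show_error_context() {
  local term="$1" lines="$2" logfile="$3"
  # -A N tells grep to print N lines of trailing context after each match
  grep -A "$lines" -- "$term" "$logfile"
}

# Example (assumed log location):
# show_error_context "cast" 20 /var/opt/MarkLogic/Logs/ErrorLog.txt
```

Bear in mind the match is case-sensitive as written; add -i to the grep call if you want to catch "CAST" as well.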

To get a list of unique GET requests made to a given http application server:
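One possible way to do this, assuming the access log uses the common log format (where the request appears quoted, e.g. "GET /path HTTP/1.1"). The function name and the example file name are hypothetical:

```shell
#!/usr/bin/env bash
# List the distinct request paths for GET requests found in an access log.
# Assumes each request is logged as a quoted string: "GET /path HTTP/1.1"
# Usage: unique_get_requests LOGFILE
unique_get_requests() {
  local logfile="$1"
  # 1. pull out just the '"GET /path' fragment from each matching line
  # 2. keep the path (second space-separated field)
  # 3. sort and de-duplicate
  grep -o '"GET [^ ]*' "$logfile" | cut -d' ' -f2 | sort -u
}

# Example (assumed file name for an app server on port 8000):
# unique_get_requests /var/opt/MarkLogic/Logs/8000_AccessLog.txt
```

Query strings are kept as part of the path here, so /a.xqy?x=1 and /a.xqy?x=2 count as two separate requests.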

Tuesday 27 September 2011

Java: Getting started with the TrueZip API

Extremely simplistic example of how you can use the TrueZip API to extract a load of files from a given zip file - it'll dump the files in the resources folder for this example:

You'll need the following dependencies in your POM file:

Monday 19 September 2011

MarkLogic / XQuery - get a sample of document sizes across a given database

Here's a simple way to get the sizes of sample documents across a given database:

Friday 16 September 2011

XQuery/MarkLogic: pulling data out of an Excel spreadsheet

Excel spreadsheets can be saved in an xml format - in such cases, it's simple to dump the xml into a CQ buffer (if you're using MarkLogic) and parse the information in adjacent column cells.

In this example, I'm taking a very simple spreadsheet structure to illustrate the procedure:

Val 1a Val 1b
Val 2a Val 2b
... ...

Below is an example of how to parse the XML, pull out the information from the relevant cells and strip white space for good measure:

You should end up with something like this:

Monday 8 August 2011

SBT/ScalaTest: Running individual tests through the use of tagging

In a given suite of tests, one test can be tagged like so:

And to run only the specified test from within SBT:

Tuesday 2 August 2011

MarkLogic: Example Word Query Specification Template

Below is an example configuration for creating a custom word query specification on a given database:

Wednesday 27 July 2011

Linux: ruby/libxcc bindings - configuring and testing

Outlined below are the steps for installing and configuring libxcc:

And some example usage:

MarkLogic: in-mem-update example

A template example showing the use of the in-mem-update library (shipped with MarkLogic):

MarkLogic: Search API - sort order template

A very brief template outlining a possible approach for a date range sort order (this example works on the premise that the range index is present on the element specified)

MarkLogic: Search API - removing facet constraints using XQuery and xdmp:set

This is some partial code outlining an approach for removing facet constraints from a query string. Thanks to David Cassel for suggesting the use of xdmp:set to maintain state throughout the process.

Wednesday 6 July 2011

Git: Specify Master Branch on git pull

In place of git pull --rebase, set this:
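As a rough sketch, one common way to do this is to record the upstream for master in the repository config, so that a plain git pull knows which branch to use. The function name is my own, and the assumption that the remote is called "origin" (and that you want pulls on master to rebase) may not match your setup:

```shell
#!/usr/bin/env bash
# Record which remote branch "git pull" should use for the local master branch.
# Usage: set_master_upstream REPO_DIR [REMOTE]   (REMOTE defaults to "origin")
set_master_upstream() {
  local repo="$1" remote="${2:-origin}"
  # Which remote to fetch from when on master
  git -C "$repo" config branch.master.remote "$remote"
  # Which remote branch to integrate into master
  git -C "$repo" config branch.master.merge refs/heads/master
  # Optional: make a plain "git pull" on master rebase instead of merge
  git -C "$repo" config branch.master.rebase true
}

# Example:
# set_master_upstream . origin
```

These end up as a [branch "master"] section in .git/config, which you can also edit by hand.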

Thursday 30 June 2011

MarkLogic/XCC: Copying a Module with User Content Permissions

To get xcc to copy modules over with (execute) permissions pre-applied, you can do the following:

And you can test by running this against the Modules database in CQ:

And you should see something like this:

Wednesday 22 June 2011

MarkLogic: clearing a forest in a database programmatically

Quick example for clearing everything in the "Modules" forest - I've renamed the string to protect anyone doing a quick copy/paste on their dev environment:

MarkLogic: Enabling debug options when using the Search API

Adding the following to your options element gives you access to a wealth of incredibly useful information when working with the Search API (search:search):

Tuesday 21 June 2011

Linux: SCP syntax and structure note

The syntax looks like this:
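The general shape is scp SOURCE TARGET, where a remote location is written as user@host:path. As a small illustration (the function name and all the example values are hypothetical), this builds the command for copying a remote file to a local directory and prints it rather than running it, so you can check it first:

```shell
#!/usr/bin/env bash
# Print the scp command for copying a remote file to a local directory.
# Usage: scp_download_cmd USER HOST REMOTE_PATH LOCAL_DIR
scp_download_cmd() {
  local user="$1" host="$2" remote_path="$3" local_dir="$4"
  # Remote side is user@host:path, local side is a plain path
  printf 'scp %s@%s:%s %s\n' "$user" "$host" "$remote_path" "$local_dir"
}

# Example (made-up values):
# scp_download_cmd alice example.com /tmp/a.txt .
```

Swap the two arguments around to copy from local to remote, and add -r after scp if you're copying a whole directory.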

Monday 13 June 2011

MarkLogic/XCC: Installing Modules Programmatically

A quick example detailing how XCC can be used to install modules:

Saturday 4 June 2011

Eclipse: Installing XQDT (and DLTK) on Eclipse Indigo [3.7 RC3]

Some brief notes regarding getting set up with XQDT on Eclipse for XQuery app development...

Start by installing the Eclipse IDE for Java Developers from http://www.eclipse.org/downloads/index-developer.php (N.B. this is the link for the Release Candidate version of Eclipse 3.7 - for the current release, the link is http://www.eclipse.org/downloads/).

For everything else, you can use the Eclipse Install manager (Help > Install New Software...)

At the outset, there are a couple of dependencies. First, the WST Server Adaptors from the Web, XML, Java EE and OSGi Enterprise Development section within http://download.eclipse.org/releases/indigo:

Then you need to install the DLTK 2.0 Core framework. To do this, add the update site repository http://download.eclipse.org/technology/dltk/updates/ to the "Work with" field, select Dynamic Languages Toolkit (DLTK) 2.0 and choose the package Dynamic Languages Toolkit - Core Frameworks.

Finally, add the XQDT update site repository (also paste into the "Work with" input field) http://download.eclipse.org/webtools/incubator/repository/xquery/milestones/ and select all the necessary components:

Tuesday 31 May 2011

Bash: Find the size of a Folder from the shell

Something like this:
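A minimal sketch using du, assuming the folder you care about is the MarkLogic Forests directory (the path in the usage comment is the usual Linux default, but treat it as an assumption). The function name is my own:

```shell
#!/usr/bin/env bash
# Print the total size of a folder in human-readable units.
# Usage: folder_size DIRECTORY
folder_size() {
  # -s: one summary line for the whole directory, -h: human-readable sizes
  du -sh "$1"
}

# Example (assumed path):
# folder_size /var/opt/MarkLogic/Forests
```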

Will give you the size of the folder (in this case the Forests folder) in a "human readable" form, like this:

Tuesday 17 May 2011

Bash: recursive search for content in a file

Recursive search example using find, xargs and grep:
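A sketch of how the three commands fit together (the function name is hypothetical): find lists the files, xargs hands them to grep in batches, and grep -l prints only the names of files that contain the term.

```shell
#!/usr/bin/env bash
# List every file under DIR whose contents include TERM.
# Usage: search_recursive DIR TERM
search_recursive() {
  local dir="$1" term="$2"
  # -print0 / -0 keep file names containing spaces or newlines intact
  find "$dir" -type f -print0 | xargs -0 grep -l -- "$term"
}

# Example:
# search_recursive . "search:search"
```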

This will search all directories recursively, looking for all files containing the term "search:search".

Wednesday 27 April 2011

VirtualBox: Minimal Install of Fedora - no Network on restart

If you've successfully installed Fedora (Minimal network install) and you've found you don't have your network connection on first boot, try this from the shell:

Java: Handling Large XML Documents with VTD-XML

Here's a working example of how VTD-XML can be used to parse a large (~330MB) XML file. In this example, I'm using AutoPilotHuge to match all page elements in the file. For each element encountered (there are tens of thousands), the element data will be written to a file using the element's index within the document as a filename:

Tuesday 19 April 2011

XQuery: Replace example

The general format for the replace function in XQuery:

Returns 2007 (the four digits in the node)

Monday 11 April 2011

MarkLogic: Performing an operation across two databases using xdmp:eval

There could be situations where you want to perform operations across two databases. One such example could be to quickly clone a batch of documents into a second database.

This can be achieved using xdmp:eval.

In this example, you have several documents in a database called "DatabaseA" (selected as your Content Source in CQ). An xdmp:eval statement could be used to write those documents into DatabaseB like so:

Thursday 7 April 2011

MarkLogic: Sending and Receiving XML Content over HTTP

A fairly common requirement for an application built on MarkLogic Server could be to:

1. Wrap up content in an XML element and send it to a web service
2. On receipt, extract it from the request-body for further processing (insert, validate, transform etc)

To encode the XML element for transmission, you can use a combination of the right content-type options for xdmp:http-post (or xdmp:http-put) and xdmp:quote the element data:

For the service receiving the data, you use a combination of xdmp:unquote and xdmp:quote to extract the element from the request-body, but bear in mind that xdmp:unquote returns a document node rather than an element. This is easily resolved by selecting the first node() in the document like so:

Thursday 31 March 2011

MarkLogic: Inserting an XQuery Module into the database from within CQ (or XQuery)

This trick is useful for quick local testing. You can insert a module into the Modules database from within CQ by selecting "Modules" as the Content Source and then wrapping the module like so:

If you explore the contents of your modules database after executing this, you should see /simple-example.xqy in the list.

Friday 25 March 2011

Java: Connecting to a MarkLogic WebDAV server using Apache Jackrabbit and Digest HTTP Authentication

By default, MarkLogic application servers use digest http authentication, so if you're writing some application layer code that relies on connecting to MarkLogic WebDAV, here's a simple example to get started.

The code below uses Apache Jackrabbit and HttpClient 3.x to connect to a MarkLogic WebDAV application server and logs out a list of all the files in a specific location:

I'm using maven to manage my dependencies for this. If you are too, here are the required dependencies:

Tuesday 8 March 2011

Ubuntu / Apache: granting write permissions to a user on /var/www

Something like this should work:

Adapted from these posts:

Tuesday 18 January 2011

MarkLogic: Extracting useful information from a server/cluster support dump

Hopefully this will be a useful tip. In a situation where you need to extract information from a support dump, here are some examples involving loading the entire support dump as a sequence of document-node() elements and then pulling information out of them:

Here's an example for extracting information regarding specific forests:

Another example - a report for all forests (with a given name prefix) and their host node:

And another example: To give a total in-memory usage for all range indexes on a per-forest basis for the entire cluster (organised by host-id):
