Monday, 6 December 2010

MarkLogic - Using XQSync to back up a Security Database (Linux)

Here's an example of how you could use XQSync to back up your security database:

Monday, 22 November 2010

MarkLogic: Obtaining a full dump of the status of each forest in all databases

If you're in a situation where you need to get metrics of all your forests hosted on a MarkLogic instance, you can use xdmp:forest-status to get a lot of metrics such as number of stands, last state change etc. You could consider running this as a scheduled task over time if you need to chart how busy the database and forests are at any given time during the week. Here's the XQuery code:
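A minimal sketch of this idea, not necessarily the original listing. The wrapper element names (forest-statuses, database) are invented for the example; the xdmp: calls are standard MarkLogic built-ins:

```xquery
xquery version "1.0-ml";

(: One <database> wrapper per database, holding the raw
   xdmp:forest-status output for each attached forest.
   The wrapper element names are made up for this sketch. :)
<forest-statuses at="{fn:current-dateTime()}">{
  for $db in xdmp:databases()
  return
    <database name="{xdmp:database-name($db)}">{
      for $forest in xdmp:database-forests($db)
      return xdmp:forest-status($forest)
    }</database>
}</forest-statuses>
```

Stamping each run with the current dateTime makes it easier to chart successive runs if the query is scheduled.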

Friday, 15 October 2010

MarkLogic/RHES - Gathering information about a MarkLogic instance for a support request

Useful in the case where you're experiencing problems with a live MarkLogic instance but are able to gain ssh access into the host server:

Tuesday, 12 October 2010

MarkLogic: Using xcc/j to execute a locally stored module on a remote server

Here's a brief example demonstrating how xcc/j can be used to execute a module from the local filesystem on a remote server, binding a couple of external variables in the process:

The example module to be loaded looks like this:

MarkLogic: Running RecordLoader with PowerShell on Windows

Quick reference - here's an example of how you can use PowerShell to launch RecordLoader to ingest some documents into MarkLogic. The script works on the premise that you have the relevant jars copied to c:\jars

1. Run PowerShell as administrator
2. Allow script access by running Set-ExecutionPolicy RemoteSigned
3. Execute .\recordloader.ps1

You can test it with this wikibooks config file using the dump available here

Monday, 4 October 2010

Java: Cutting up a text file containing multiple XML Documents

This was put together in response to a request to handle a .txt file containing multiple XML documents (complete with XML Processing instructions). It runs through a text file (of any size really) and seeks out the start of each PI (Looking for the combination of '<' and '?' together). Each time it finds a PI it writes the content to an XML file. Not sure how useful this will be to anybody but I'm placing it here for reference anyway. It's been designed to handle text documents that look like this:

Or this:

Here's the class:

Sunday, 3 October 2010

MarkLogic: Creating Test XML For a Project...

Here's a quick way to generate a load of individual test docs in a given database:
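As a rough sketch of the approach (the URI pattern, element names and document count are placeholders, not taken from the original post):

```xquery
xquery version "1.0-ml";

(: Insert 1000 small test documents in a single transaction.
   URI pattern and element names are illustrative only. :)
for $i in 1 to 1000
return
  xdmp:document-insert(
    fn:concat("/test/doc-", $i, ".xml"),
    <doc id="{$i}">
      <title>Test document {$i}</title>
      <created>{fn:current-dateTime()}</created>
    </doc>)
```

For much larger batches, it would probably be worth splitting the work across several transactions rather than using one big insert.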

Thursday, 30 September 2010

MarkLogic: Creating Multiple Range Element Indexes

Here's a quick script to create multiple element range indexes on a MarkLogic database; just set the $db-name variable:

Then pass in a sequence of xs:strings or a single xs:string like so:

For each index, it will report back whether the index was successfully created or not, like so:


Friday, 24 September 2010

MarkLogic: Listing all Collections in a Database

Looking to list all currently available collections on a given database?
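One way to do this, assuming the collection lexicon is enabled on the database, is the built-in cts:collections function:

```xquery
xquery version "1.0-ml";

(: Returns every collection URI known to the collection lexicon.
   Raises an error if the lexicon is not enabled on the database. :)
cts:collections()
```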

Thursday, 23 September 2010

MarkLogic: Basic Database Cloning Script

A rough note on the process of copying a database from one place to another - attached for reference (in case I need something similar again). This script is untested and adapted from other code I've seen elsewhere:

Tuesday, 21 September 2010

MarkLogic: Estimating the number of documents contained within collections

Two ways of getting quick estimates on documents within collections and the time taken to search through them (using xdmp:elapsed-time):

And using cts:collection-query:
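Both forms can be sketched together like this; "my-collection" is a placeholder name, and the timing is simply the elapsed time for the whole query rather than for each estimate:

```xquery
xquery version "1.0-ml";

(: "my-collection" is a placeholder collection name. :)
(
  (: estimate resolved via fn:collection :)
  xdmp:estimate(fn:collection("my-collection")),
  (: the same estimate expressed as a cts:search :)
  xdmp:estimate(cts:search(fn:doc(), cts:collection-query("my-collection"))),
  (: time elapsed since the query started, as an xs:dayTimeDuration :)
  xdmp:elapsed-time()
)
```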

MarkLogic: Search: Compound OR Queries

Useful search tip:

You can pass sequences of words (and wildcarded terms if you have trailing wildcard searches enabled on your database) into an element word query, like so:
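A small sketch of that pattern, with a made-up element name and search terms:

```xquery
xquery version "1.0-ml";

(: Matches documents whose <title> contains ANY of the listed terms.
   The element name and the terms are placeholders; "index*" only
   behaves as a wildcard if wildcard searches are enabled. :)
cts:search(fn:doc(),
  cts:element-word-query(
    xs:QName("title"),
    ("merge", "forest", "index*")))
```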

Monday, 13 September 2010

MarkLogic: XQuery: Check for the presence of an element

A useful way to combine cts:element-query and cts:and-query to test for the presence of an element.

The and query gets passed an empty sequence () and this will return either true or false depending on whether the element is found.
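An illustrative sketch of the idea using cts:contains on an in-memory node (the sample XML is invented for this example):

```xquery
xquery version "1.0-ml";

(: An empty cts:and-query matches everything, so the element-query
   succeeds whenever the named element is present at all. :)
let $doc := <article><title>Example</title></article>
return (
  (: <title> is present in the sample :)
  cts:contains($doc,
    cts:element-query(xs:QName("title"), cts:and-query(()))),
  (: <abstract> is absent from the sample :)
  cts:contains($doc,
    cts:element-query(xs:QName("abstract"), cts:and-query(())))
)
```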

Thursday, 2 September 2010

MarkLogic: XQuery: Type Validation using xsi:type

Quick note on XQuery based type validation (using xsi:type):

When executed in cq, this should return true.

Tuesday, 3 August 2010

MarkLogic: XQuery: Safely converting xs:doubles to fully expanded values

A safe way to expand values represented in scientific notation:

Friday, 16 July 2010

MarkLogic: Application Servers: Handling custom http responses

Custom error handling example:

Thursday, 24 June 2010

XQDT: Notes for installing the current Milestone on Eclipse Helios

These are the steps I used in order to get the current XQDT milestone running with Eclipse Helios:

1. Install Helios (I tested this with the J2EE edition)

2. Install DLTK 1.0.2 Core Runtime (Locally):

3. Install the latest Milestone (Locally):

Wednesday, 23 June 2010

MarkLogic: XQuery: performance with Maps

Fastest way to check membership of a specific key-pair within a map:
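One common idiom for this is to test the result of map:get with fn:exists. The sketch below only shows the shape of the check; it has not been benchmarked against alternatives such as searching map:keys:

```xquery
xquery version "1.0-ml";

(: Keys and values are placeholders. :)
let $m := map:map()
let $_ := map:put($m, "alpha", 1)
return (
  fn:exists(map:get($m, "alpha")),  (: key was put above :)
  fn:exists(map:get($m, "beta"))    (: key was never put :)
)
```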

Monday, 21 June 2010

Java: Using Jersey with JAXB to output XML or JSON.

A very brief (and simple) example of the principle behind harnessing Jersey's content negotiation ability whilst taking advantage of JAXB to marshal a bean from a Java object to XML/JSON:

And the resource:

Testing the resource(s) using curl:

Which would yield:


Which would yield:

Friday, 18 June 2010

MarkLogic: XQuery: hacking the position() - a basic example

I'd been wondering whether there was a way in a FLWOR statement to get the index position. I'd tried a few things that I'd hoped might work but I couldn't find a decent way to get back to the context whenever I got to the return portion of the statement.

It became apparent that in some (not all) contexts, I could rely on this example as a quick (and cheap) workaround:

Yields this:

Update - John Snelson advised me on the correct way to deal with position in XQuery - using for and at when starting the FLWOR statement:
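As an illustrative sketch of the positional variable (the sample sequence is invented):

```xquery
xquery version "1.0-ml";

(: $pos is bound to the 1-based position of $item in the sequence. :)
for $item at $pos in ("a", "b", "c")
return fn:concat($pos, ": ", $item)
(: returns "1: a", "2: b", "3: c" :)
```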

Also worth considering:

And another useful positional trick, suggested by Jason Booth:

Monday, 14 June 2010

MarkLogic: XQuery: and / and-not query examples

Structure for a query where a document is selected based on two attribute values (and):

And to match where they're not the same (and-not):
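A hedged sketch of both shapes, assuming a hypothetical <item> element with type and status attributes (none of these names come from the original post):

```xquery
xquery version "1.0-ml";

(: Element, attribute names and values are placeholders. :)
let $type-query :=
  cts:element-attribute-value-query(
    xs:QName("item"), xs:QName("type"), "book")
let $status-query :=
  cts:element-attribute-value-query(
    xs:QName("item"), xs:QName("status"), "draft")
return (
  (: and: both attribute values must match :)
  cts:search(fn:doc(), cts:and-query(($type-query, $status-query))),
  (: and-not: the first must match, the second must not :)
  cts:search(fn:doc(), cts:and-not-query($type-query, $status-query))
)
```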

MarkLogic: XQuery: UNIX Timestamp to xs:dateTime Conversion

A hopefully useful note for anyone in the situation where they need to convert from UNIX time to an xs:dateTime.

First, to get a timestamp:

Which should return something like:

In cq/DQ, declare the following function:
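A minimal sketch of such a function, written for this note rather than copied from the original listing. The name local:from-unix is an invention, and it assumes the timestamp is in whole seconds:

```xquery
xquery version "1.0-ml";

(: Adds $secs seconds to the UNIX epoch (1970-01-01T00:00:00Z). :)
declare function local:from-unix($secs as xs:integer) as xs:dateTime
{
  xs:dateTime("1970-01-01T00:00:00Z")
    + (xs:dayTimeDuration("PT1S") * $secs)
};

local:from-unix(1276000000)
(: 2010-06-08T12:26:40Z :)
```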

Which can be run like so:

And should give you:

The XQuery code was taken (and updated for 1.0-ml) from this module:

Friday, 11 June 2010

MarkLogic: XCC/J: Passing a node() into a Query as an external variable

I'd noticed that the MarkLogic XCC drivers don't appear (at first glance) to allow you to pass an Object as a node() into a query. From a bit of research and some testing, I'd found that it would allow you to pass a String (as an xs:string) as an external variable - and from there - this can be converted into a document-node() using xdmp:unquote()

So if you're using XCC/J with MarkLogic and you want to pass a node() into a Query, here's a brief example of how you could achieve such a thing:

Some example XML:

Some example XQuery:

And putting it all together, one Java class containing everything:

In your MarkLogic ErrorLog.txt, you should see something like this:

Wednesday, 2 June 2010

MarkLogic: XQuery: Typeswitching based on input "type"

In a previous post I'd created on Function Overloading in XQuery, there has been a brief discussion (so far) on whether you can overload functions based on type.

This was the given example:


Throws this:

Which appears to be part of the spec:

Is there a workaround? The only way around this issue that I know of right now is to use a typeswitch:


Will give you the expected results - albeit at the cost of having to write (and manage) more code.
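For illustration, a small typeswitch-based dispatcher might look like this (the function name and the returned labels are invented for the sketch):

```xquery
xquery version "1.0-ml";

(: A single function that branches on the dynamic type of its
   argument, standing in for overloads that differ only by type. :)
declare function local:describe($x as item()) as xs:string
{
  typeswitch ($x)
    case xs:string  return "string"
    case xs:integer return "integer"
    case element()  return "element"
    default         return "other"
};

(local:describe("a"), local:describe(1), local:describe(<a/>))
(: returns "string", "integer", "element" :)
```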

Friday, 28 May 2010

MarkLogic: XQuery - dateTime intervals for date ranges

It's a simple requirement, but if you want to calculate date ranges using XQuery, this is the general format:

The time and date seven days ago:

The time and date seven days ahead:
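Both calculations can be sketched with standard duration arithmetic:

```xquery
xquery version "1.0-ml";

(
  (: seven days ago :)
  fn:current-dateTime() - xs:dayTimeDuration("P7D"),
  (: seven days ahead :)
  fn:current-dateTime() + xs:dayTimeDuration("P7D")
)
```

Other intervals follow the same pattern, for example "PT12H" for twelve hours.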

Thursday, 27 May 2010

GIT: Rough notes on merge issues

I tried this:

And saw this:

So I did this:

And saw this:

Then I did this again:

And it updated (pulled) successfully.
Then I was able to do this:

Which showed something like this:

As I knew my last changes that were "stashed", this showed me a list of all the changed files currently stashed:

Then I was able to retrieve some changes from one of the stashed files by doing this:

Followed by:

Thursday, 20 May 2010

MarkLogic: XQuery - Notes on Return Types

Here are some brief notes and examples of specific return types in XQuery:

local:gen-string returns an xs:string. Not that uncommon, although I include this example here to demonstrate another way to concatenate in XQuery; this is something I often use with xdmp:log statements:

Functions which write elements and attributes can be tasked to return those specific types, like so:

However, as elements and attributes are nodes, it's also possible to make a function which returns a node(). This example will return an element containing one attribute:

In the above example, you could also return the content as an element(), and there are definite places where returning an element of a specific name can be very useful. This example demonstrates a function which returns an element with a given name (which I've rather unimaginatively named 'blah' for this example).

Such an example may be useful for situations where you really want to be specific about the kind of content a function can return:

Example usage:

Finally, here's an example of a function which returns another specific type of node: a document-node() representing a document which is stored in the database:

Example usage:

Friday, 14 May 2010

RedHat: Unpacking an .rpm file

This is the syntax:

Thursday, 6 May 2010

MarkLogic: Gathering information about a forest

Here's a very simple example of how you can get to specific information regarding a given forest in your database. Here we're interested in returning the document count (number of documents) in two forests and returning the results in some simple XML:

Which can also be expressed as:

Thursday, 29 April 2010

MarkLogic: XQuery - Creating XML Elements and Attributes programmatically

In the past, I've written XQuery modules which populate a particular sequence of elements in order (adding in the relevant values and returning the result). I had a brief discussion with Philip Fennell today who informed me of a useful aspect of XQuery: Element and Attribute constructors.

Below is an example of how they can be used with MarkLogic - hopefully useful if you want to generate sequences from the raw components: name and value pairs:
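A short sketch of computed constructors, assuming two parallel sequences of names and values (the data and the wrapper element name are made up):

```xquery
xquery version "1.0-ml";

(: Names and values are placeholder data held in parallel sequences. :)
let $names  := ("title", "author")
let $values := ("Example", "Someone")
return
  element record {
    attribute created { fn:current-date() },
    for $name at $i in $names
    return element { $name } { $values[$i] }
  }
```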

MarkLogic: XQuery Function Overloading

I remember seeing in a few places that XQuery doesn't support function overloading. I also remember seeing examples where function overloading is used.

This simplistic example shows overloading at work - as written for MarkLogic (although the example could be easily modified for eXist):
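A minimal sketch of overloading by arity (the function name is invented for the illustration):

```xquery
xquery version "1.0-ml";

(: Two functions with the same name but different numbers of
   parameters; XQuery distinguishes them by arity. :)
declare function local:greet() as xs:string
{
  local:greet("world")
};

declare function local:greet($name as xs:string) as xs:string
{
  fn:concat("Hello, ", $name)
};

(local:greet(), local:greet("MarkLogic"))
(: returns "Hello, world", "Hello, MarkLogic" :)
```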

Tuesday, 27 April 2010

MarkLogic: Viewing Installed Modules on a Server

The format for this is:

Another useful trick for getting a list of installed modules - suggested by Lee Pollington - is to use cts:uri-match. You'll need to enable the URI lexicon to make this one work:

Thursday, 22 April 2010

MarkLogic: Dump the XML Configuration of a given Database in cq/DQ

Useful if you want to dump the current database settings as an XML doc; open cq/DQ, set your source and run this against a specific database to get the XML configuration settings back from MarkLogic

MarkLogic: Managing "slow" document delete times

I ran into a problem where a process ingests a lot of documents into MarkLogic on a fairly regular basis. One of the modes of this particular process is to run a "full" delete of all directories in a given area of the database and to re-ingest.

The problem we were noticing is that deletion of this folder was timing out, which in turn was causing major server performance issues, which led to a lot of "let's restart the server and see what happens" comments.

Briefly, the setup was a directory with a significant number of child directories, with each directory containing 10,000 XML documents or fewer. Attempting to delete any of the child folders in their entirety (like so) would cause a timeout:

I found that doing this did seem to work (at least, it did for a folder containing 10,000 items):

However for each child directory (and there were *well* over 100), each one took 2-3 minutes to delete - which meant an ingestion process with an existing clearing of folders would have to take over 6 hours just managing those deletes.

I found this gem of a post on MarkMail:

And I discovered that a default setting on creation of a new database meant slow batch deletes.

Changing Directory Creation from the default 'automatic' to 'manual' led to almost instant bulk deletes and allowed me to delete the entire folder structure without timing out. In essence this setting seems to allow the delete to use the indexes, which means the files can be removed very quickly.

There are some caveats, so I'd recommend reading the article first before deciding whether this process could be suitable for your situation.

Hope this helps someone else experiencing the same problem.

Saturday, 10 April 2010

MarkLogic (x64) install on Ubuntu 10.04

The following steps were required:

Friday, 9 April 2010

MarkLogic: Techniques for querying in-memory fragments using cts:contains

This snippet demonstrates the use of cts:contains and cts:element-attribute-word-query on an in-memory fragment (something that has been stored in the Expanded Tree Cache using a let statement):

To really add power to the search, you can use cts:contains with a cts:and-query. For this example, I want to return link(s) with a class of featured and a rel containing the value "video":

This last example demonstrates the use of an additional cts:or-query. This example creates a list of featured videos and documentation. Using the 'or' query will return any links that have the "featured" class attribute and have a rel of "video" or "pdf":

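xquery version '1.0-ml';

(: The wrapper element and the class/rel values on these links are illustrative assumptions :)
let $x :=
  <div>
    <a class="featured" rel="video">Mark Logic Application Builder</a>
    <a class="featured" rel="pdf">Mark Logic Application Builder</a>
    <a rel="home">Mark Logic Corporation</a>
    <a class="featured" rel="pdf">Mark Logic 4.1 Install Guide</a>
  </div>
return
<html>
  <head>
    <title>Using cts:element-attribute-word-query on an in-memory fragment</title>
  </head>
  <body>
    <h1>Featured Videos and Documentation</h1>
    <ul>{
      for $item in $x/a
      where cts:contains($item,
        cts:and-query((
          cts:element-attribute-word-query(
            xs:QName("a"), xs:QName("class"),
            "FEATURED", "case-insensitive"),
          cts:or-query((
            cts:element-attribute-word-query(
              xs:QName("a"), xs:QName("rel"),
              "VIDEO", "case-insensitive"),
            cts:element-attribute-word-query(
              xs:QName("a"), xs:QName("rel"),
              "PDF", "case-insensitive")
          ))
        )))
      return <li>{$item}</li>
    }</ul>
  </body>
</html>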

Thursday, 8 April 2010

MarkLogic: Query Optimisation Notes (Part One: getting started with xdmp:query-meters())

Here are some brief notes regarding query performance tuning in MarkLogic using xdmp:query-meters()

A simple starting point involves using xdmp:estimate to find out how many documents you currently have in MarkLogic:

As this is an estimate - and as such, is returning the result direct from the indexes - it's going to return a result set almost instantaneously. Changing the xdmp:estimate to fn:count will take fractionally longer to compute a result. Also, removing the /qm:elapsed-time/text() will give you the full breakdown of where the indexes and caches are being hit (and where they're not).
You could express that like so:

xdmp:query-meters tends to become really useful when you use it with cts:search. A simple example of such use would be:

On a big result set, however, this could take a while as it will return all of the matched XML documents in that set - what normally happens is that the query will resolve quickly, then the rest of the time can be taken up sending vast quantities of XML back to the browser over the network. This can - and with big result sets often does - cause both your machine and your browser to become unresponsive, so it's always best to estimate the size of the resultset before you attempt this!

So for situations where you want the output from query meters but don't want MarkLogic to stream megabytes of XML back to cq/DQ (or your middle tier layer), you can use fn:count. After all, in most cases when you're getting query stats you're probably more interested in the result timings rather than the result set. So here's the query re-written to return just the number of records (and not the results themselves):
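A hedged sketch of that count-only form; the search term is a placeholder and the qm prefix is bound to the query-meters namespace:

```xquery
xquery version "1.0-ml";

declare namespace qm = "http://marklogic.com/xdmp/query-meters";

(: Count the matches instead of returning them, then read the
   elapsed time from the meters. "example" is a placeholder term. :)
let $count := fn:count(cts:search(fn:doc(), cts:word-query("example")))
return ($count, xdmp:query-meters()/qm:elapsed-time/text())
```

Dropping the /qm:elapsed-time/text() step returns the full meters element, including the cache hit and miss counts.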

Part two will discuss example usage(s) for the xdmp:query-trace(true()) and xdmp:query-trace(false()) functions.

MarkLogic Search Note: cts:search vs. XPath

A quick note about MarkLogic's extensive search APIs with an emphasis on using cts:search

Trying an XQuery snippet like this in cq (or DQ):

Will return [1.0-ml] XDMP-UNSEARCHABLE.

One of the most important aspects of cts:search is that it uses MarkLogic's indexes. As soon as you assign a document to a variable as in the above example (let $doc := fn:doc("/uri/for/doc.xml")), the document is no longer considered within the context of the indexes as it becomes its own entity and - as such - is considered as an in-memory fragment, rather than a "loaded" document. As cts:search relies on the use of indexes - which is what makes it so fast - the error gets thrown.

There's another important distinction we should make at this stage too; the assignation to a variable means the requested doc gets stored in MarkLogic's Expanded Tree Cache. If you fill the cache with a document which is too big, you'll see XDMP-EXPNTREECACHEFULL exceptions and your XQuery will fail.

If you do need to obtain the document as a variable, you can always use XPath to pull values from the fragment - and in some cases this will not have any noticeable effect on performance (MarkLogic's XPath handling is still pretty fast). However, I think it's safe to say that cts:search is almost always the right tool for the job and by using it you're getting your money's worth from the MarkLogic licence!

What's the workaround? The obvious one is to rewrite the original example like so:

However, as I've mentioned in previous discussions, when you assign documents to variables, you can use cts:contains - which will return an xs:boolean based on given search criteria. So this pattern will work:
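As a sketch of that pattern, reusing the placeholder URI from the explanation above and an invented search term:

```xquery
xquery version "1.0-ml";

(: cts:contains works on a node that has already been bound to a
   variable and returns an xs:boolean.
   The URI and the search term are placeholders. :)
let $doc := fn:doc("/uri/for/doc.xml")
return cts:contains($doc, cts:word-query("example"))
```

The index-backed alternative keeps fn:doc inside the search expression instead, as in cts:search(fn:doc("/uri/for/doc.xml"), cts:word-query("example")).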

I hope this is useful to anyone wishing to learn more about how to get the most out of MarkLogic

Thursday, 25 March 2010

MarkLogic: Loading and using XML Schemas (Part Two : Schema Aware XQuery)

Now the schema is loaded into MarkLogic, here are some rough notes on handling the "Schema Aware" facets of the server. Again, this information is pulled together from various MarkMail posts and a bit of trial and error:

Executing this should return the element. Note that in the formatted example, all the tags appear with lower-case element names. These will need to be changed back if you're following along with the example schema and code.

To prove that the content is being validated against the schema, changing the date to something nonsensical (add a couple of random characters in, for example), should yield something like this:

[1.0-ml] XDMP-VALIDATEBADTYPE: (err:XQDY0027) validate strict { $input } -- Invalid node type: emp:DateOfBirth lexical value "2006-05-0ss4" invalid for expected type #xs:date at /emp:Employees/emp:Employee/emp:DateOfBirth using schema "rdl.xsd"

Namespace Prefixed Example

This example may be more useful if you're dealing with elements whose content comes from multiple namespaces:

MarkLogic: Loading and using XML Schemas (Part One)

These are some rough notes on techniques for using XML Schemas with MarkLogic.

Loading an XML Schema using cq:

The quickest way I've found to get a schema into your local MarkLogic instance is to copy your xsd file into your MarkLogic Docs folder (I also made a folder called schemas), then open a cq instance (or DQ if you prefer), select Schemas as your content source and use some XQuery like this:

Loading an XML Schema using XCC/J:

Another method would be to use XCC/J - I hope to write another post on this technique at another time with a more detailed example (note this method is untested at the time of writing):

After the cq process has been executed, running either:


Should show you the schema.

It's important to note that when MarkLogic loads schemas into its database, it will put in any values that have defaults on load. This caused some confusion when I was testing the "Schema Aware" aspects of the server. I added the following attribute to my schema's parent element:

Here's a very basic example of a schema layout:

Most of this information is available elsewhere, so this post will hopefully serve to pull a few pieces together in the interests of saving time.

Recommended reading:

Wednesday, 24 March 2010

MarkLogic: testing for a specific word in local content

This returns a simple boolean response based on whether a specific word can be found in an element bound to a variable

Tuesday, 16 February 2010

MarkLogic: Notes on Collations

This is going to be an evolving post. In the meantime, using cq or DQ, something like this will return the uri of the default collation:

Some other useful MarkLogic encoding tools include url-encode:

And url-decode:

Both techniques could be used in a situation where you're sending characters that would require encoding/decoding using xdmp:get-request-field.

codepoints-to-string will return an xs:string from a series of Unicode codepoints (as an example taken from the API docs):
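The calls mentioned in this note can be tried together in one query; the input strings and codepoints below are arbitrary examples rather than the ones from the API docs:

```xquery
xquery version "1.0-ml";

(
  (: URI of the default collation :)
  fn:default-collation(),
  (: encode, then decode, a string containing reserved characters :)
  xdmp:url-encode("a b&c"),
  xdmp:url-decode("a%20b%26c"),
  (: build a string from Unicode codepoints; these five spell "Hello" :)
  fn:codepoints-to-string((72, 101, 108, 108, 111))
)
```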

Friday, 12 February 2010

MarkLogic: Creating a Merge Blackout Specification

Brief note on the creation of a recurring merge blackout spec for MarkLogic 4.1-4

An important note is that just using xs:time will stick with GMT (Zulu) time. This means the blackout will be an hour out-of-step. As we want to observe BST:

This can be tested using cq (or DQ, if you prefer) and should return the merge specification as an XML node. Once you have that, you can do something like this:

After that, go to your MarkLogic admin interface > Databases > dbname > Merge Policy and you should see the new changes.

On a final note about timezones, you can always extract the timezone offset from the system's current dateTime by doing this:
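One way to sketch this uses the standard timezone accessor:

```xquery
xquery version "1.0-ml";

(: Returns the offset as an xs:dayTimeDuration,
   for example PT1H while BST is in effect. :)
fn:timezone-from-dateTime(fn:current-dateTime())
```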
