Thursday, 8 April 2010

MarkLogic Search Note: cts:search vs. XPath

A quick note about MarkLogic's extensive search APIs, with an emphasis on using cts:search.

Trying an XQuery snippet in cq (or DQ) that assigns a document to a variable and then passes that variable to cts:search will fail with [1.0-ml] XDMP-UNSEARCHABLE.
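A minimal sketch of the kind of snippet that triggers the error. The URI is the one quoted in this post; the para element name and the search term "example" are hypothetical and only there for illustration:

```xquery
xquery version "1.0-ml";

(: The document is bound to a variable first... :)
let $doc := fn:doc("/uri/for/doc.xml")
(: ...and the variable is then used as the first argument of cts:search.
   The element name and the word being searched for are assumptions. :)
return cts:search($doc//para, cts:word-query("example"))
```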

One of the most important aspects of cts:search is that it uses MarkLogic's indexes. As soon as you assign a document to a variable as in the above example (let $doc := fn:doc("/uri/for/doc.xml")), the document is no longer considered within the context of the indexes: it becomes its own entity and, as such, is treated as an in-memory fragment rather than a "loaded" document. Because cts:search relies on the indexes, which is what makes it so fast, the error gets thrown.

There's another important distinction to make at this stage: assigning the requested doc to a variable means it gets stored in MarkLogic's Expanded Tree Cache. If you fill the cache with a document which is too big, you'll see XDMP-EXPNTREECACHEFULL exceptions and your XQuery will fail.

If you do need to obtain the document as a variable, you can always use XPath to pull values from the fragment, and in some cases this will not have any noticeable effect on performance (MarkLogic's XPath handling is still pretty fast). However, I think it's safe to say that cts:search is almost always the right tool for the job, and by using it you're getting your money's worth from the MarkLogic licence!
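As a rough illustration of the XPath route, something like the following should work against the in-memory fragment. The title element name is a hypothetical placeholder, not something taken from a real schema:

```xquery
xquery version "1.0-ml";

let $doc := fn:doc("/uri/for/doc.xml")
(: Plain XPath over the variable: no cts:search involved, so no
   XDMP-UNSEARCHABLE. The title element is an assumption. :)
return $doc//title/fn:string(.)
```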

What's the workaround? The obvious one is to rewrite the original example like so:
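A sketch of what that rewrite could look like, again assuming a hypothetical para element and search term. The point is that the path expression starts from fn:doc directly inside cts:search, with no variable in between, so the expression can be resolved against the indexes:

```xquery
xquery version "1.0-ml";

(: No let binding: the first argument is a path rooted at fn:doc itself.
   Element name and search term are assumptions for illustration. :)
cts:search(
  fn:doc("/uri/for/doc.xml")//para,
  cts:word-query("example")
)
```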

However, as I've mentioned in previous discussions, when you assign documents to variables, you can use cts:contains - which will return an xs:boolean based on given search criteria. So this pattern will work:
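A minimal sketch of that pattern, with the search term "example" assumed for illustration:

```xquery
xquery version "1.0-ml";

let $doc := fn:doc("/uri/for/doc.xml")
(: cts:contains accepts in-memory nodes, so passing the variable is fine.
   It returns an xs:boolean rather than the matching nodes. :)
return cts:contains($doc, cts:word-query("example"))
```

Because the result is only true or false, this suits a check such as "does this document mention the term?" rather than retrieving the matching content.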

I hope this is useful to anyone wishing to learn more about how to get the most out of MarkLogic.
