Thursday, 22 April 2010

MarkLogic: Managing "slow" document delete times

I ran into a problem with a process that ingests a lot of documents into MarkLogic on a fairly regular basis. One mode of this process runs a "full" delete of every directory in a given area of the database and then re-ingests everything.

The problem we noticed was that deleting this folder timed out, which in turn caused major server performance issues and prompted a lot of "let's restart the server and see what happens" comments.

Briefly, the setup was a directory with a significant number of child directories, each containing 10,000 XML documents or fewer. Attempting to delete any of the child folders in its entirety (like so) would cause a timeout:



I found that doing this did seem to work (at least, it did for a folder containing 10,000 items):


However, each child directory (and there were *well* over 100) took 2-3 minutes to delete, which meant that an ingestion run that first cleared the existing folders would spend over 6 hours just on those deletes.
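To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. The function names and the directory count of 150 are made up for illustration (I only know there were well over 100 directories); the 2-3 minutes per directory and the 6-hour total are the figures quoted above.

```python
def total_delete_hours(directories: int, minutes_per_directory: float) -> float:
    """Total time, in hours, to delete the directories one after another."""
    return directories * minutes_per_directory / 60


def directories_for_budget(hours: float, minutes_per_directory: float) -> float:
    """How many sequential deletes fit into a given number of hours."""
    return hours * 60 / minutes_per_directory


if __name__ == "__main__":
    # 150 is a hypothetical count standing in for "well over 100".
    hypothetical_count = 150
    for minutes in (2, 3):
        hours = total_delete_hours(hypothetical_count, minutes)
        print(f"{hypothetical_count} directories at {minutes} min each: {hours:.1f} hours")

    # Working backwards from the 6-hour figure.
    for minutes in (2, 3):
        count = directories_for_budget(6, minutes)
        print(f"6 hours at {minutes} min each covers {count:.0f} directories")
```

Working backwards, a 6-hour total corresponds to somewhere between 120 and 180 directories at 3 and 2 minutes each, which fits "well over 100".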

I found this gem of a post on MarkMail: http://www.mail-archive.com/general@developer.marklogic.com/msg03616.html

From it I discovered that one of the default settings applied when a new database is created makes batch deletes slow.

Changing Directory Creation from the default 'automatic' to 'manual' led to almost instant bulk deletes and allowed me to delete the entire folder structure without timing out. In essence, this setting seems to allow the delete to use the indexes, which means the documents can be removed very quickly.

There are some caveats, so I'd recommend reading the linked post before deciding whether this change is suitable for your situation.

Hope this helps someone else experiencing the same problem.
