Text corpus search API

This document is part of the Royal Danish Library's APIs, and in particular the documentation on how to use our texts. See also Licences & Legalese and Caveats.

Try out the API here. The search form has the following fields:
  • Search for
  • filter query
  • result format
  • start record
  • number of records
  • Sort by
  • Query parser
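The form fields appear to correspond to standard Apache Solr request parameters (the {!join} syntax used later in this document is Solr's). The sketch below assumes that mapping (q, fq, wt, start, rows, sort, defType) and uses a placeholder endpoint URL, since this document does not give one. The helper name build_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder: the real endpoint URL is not given in this document.
BASE_URL = "https://example.org/solr/select"

# Assumed mapping from the form labels to standard Solr request parameters.
FORM_TO_SOLR = {
    "Search for": "q",
    "filter query": "fq",
    "result format": "wt",
    "start record": "start",
    "number of records": "rows",
    "Sort by": "sort",
    "Query parser": "defType",
}


def build_url(form: dict) -> str:
    """Translate filled-in form fields into a Solr-style query URL, skipping empty fields."""
    params = {FORM_TO_SOLR[label]: value for label, value in form.items() if value != ""}
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url({"Search for": "cat_ssi:work", "result format": "json",
                     "start record": 0, "number of records": 10}))
```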


Properties of the search index

All texts that can be searched using the API are encoded in Text Encoding Initiative (TEI) markup.

ID and Relations fields

label description values
id
The ID of the record. It identifies the collection and the TEI file, and is constructed by concatenating the file's basename with the sequence of xml:ids that identify the unique XPath to the indexed content.
string
volume_id_ssi
The ID of the volume that contains the node
part_of_ssim
Array of IDs of the trunk nodes that contain the node at hand. Typically these include
  • One or more works as parents (works may contain works)
  • A volume as an ancestor
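Because part_of_ssim lists the containers of a node, a client can take a returned record and query for its works and volume in one request. The sketch below builds such a query from a record dict; the record layout is assumed to mirror the field names above, and the IDs in the usage example are hypothetical.

```python
def containers_query(record: dict) -> str:
    """Build a query matching all trunk nodes listed in a record's part_of_ssim."""
    ids = record.get("part_of_ssim", [])
    if not ids:
        return ""
    # Quote each ID so that special characters (e.g. '-' or ':') are not parsed as query syntax.
    quoted = " OR ".join(f'"{i}"' for i in ids)
    return f"id:({quoted})"


if __name__ == "__main__":
    # Hypothetical leaf record with one parent work and one ancestor volume.
    leaf = {"id": "leaf-1", "part_of_ssim": ["work-A", "volume-B"]}
    print(containers_query(leaf))
```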

Filter fields

label description values
cat_ssi
Category of a text. Use it when limiting searches to works, or to find volumes or author portraits (biographies); omit it otherwise.
work
volume
author
period
	    
type_ssi
Node type in the document. A trunk node can be a whole work, a chapter, etc., whereas a leaf could be a paragraph of prose, a stanza (or strophe) of poetry, or a speech in a dialogue in a dramatic work. For historical reasons, whole texts describing authors and periods are given type_ssi:work.
work
trunk
leaf
	    
genre_ssi
Genre of a leaf node. Note that this is not the genre of a work; it reflects the structure of the paragraph-level markup.
prose
poetry
play
	    
subcollection_ssi
Filter by section (subcollection)
adl
sks
	    

Sort fields

position_isi
	  
The position of the current node along the sibling axis of the document. Sorting on this field guarantees that results are presented in document order. (Page numbers cannot be used for this, since a page number may be a Roman or an Arabic numeral.)
integer
	    
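The note about page numbers can be illustrated locally. In the sketch below, the records and their page field are hypothetical (page is not one of the documented index fields); it shows that sorting on page labels does not give document order, whereas sorting on the integer position_isi does. sort_param builds the value for the sort parameter.

```python
# Hypothetical records: page labels mix Roman and Arabic numerals.
docs = [
    {"page": "10", "position_isi": 3},
    {"page": "ix", "position_isi": 1},
    {"page": "2", "position_isi": 2},
]


def in_document_order(records: list[dict], reverse: bool = False) -> list[dict]:
    """Order records by position_isi, the integer sort key described above."""
    return sorted(records, key=lambda d: d["position_isi"], reverse=reverse)


def sort_param(descending: bool = False) -> str:
    """Value for the sort parameter: document order or reverse document order."""
    return "position_isi " + ("desc" if descending else "asc")


if __name__ == "__main__":
    print([d["page"] for d in sorted(docs, key=lambda d: d["page"])])  # ['10', '2', 'ix']
    print([d["page"] for d in in_document_order(docs)])  # ['ix', '2', '10']
```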

Search fields

label description values
volume_title_tesim, work_title_tesim, author_name_tesim
Misc. metadata fields. There are more of them, but they should be self-explanatory.
just plain text
text_tesim
The text
just plain text
speaker_tesim
The name of a character uttering something in a dialogue
just plain text
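In Solr's standard query syntax a field prefix applies only to the term immediately after it, so a multi-word phrase meant for one search field is usually wrapped in double quotes. The hypothetical helper below does that, escaping backslashes and quotes inside the phrase, and leaves single words untouched so that they behave as in the examples that follow.

```python
def field_query(field: str, text: str) -> str:
    """Query a search field; multi-word text is sent as a quoted phrase."""
    if " " in text.strip():
        # Escape backslashes first, then double quotes, before wrapping in quotes.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{text}"


if __name__ == "__main__":
    print(field_query("text_tesim", "mester erich"))
    print(field_query("speaker_tesim", "jeppe"))
```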

Examples

Find all works. Try it! (Clicking "try it" fills in the form to the left, so that you can submit the search and then customize it for your purposes. You might need to reset the form before a new search.)
cat_ssi:work
	  
Find all works by Gustaf Munch-Petersen. Try it!
author_name_tesim:munch
AND
cat_ssi:work
	  
Find all texts in dialogues (TEI <sp> elements) in ADL written by an author called Jeppe. Try it!
genre_ssi:play
AND
subcollection_ssi:adl
AND
author_name_tesim:jeppe
	  
Find all texts in dialogues (<sp> elements) in ADL spoken by a character named Jeppe. Try it!
genre_ssi:play
AND
subcollection_ssi:adl
AND
speaker_tesim:jeppe
	  
Find all strophes of poetry by Nikolaj Frederik Severin Grundtvig containing the words hjerte and smerte (heart and pain). Try it!
type_ssi:leaf
AND
genre_ssi:poetry
AND
subcollection_ssi:adl
AND
author_name_tesim:grundtvig
AND
text_tesim:hjerte
AND  
text_tesim:smerte
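Each example is a list of field:value clauses joined by AND. The hypothetical helper below reproduces the Grundtvig query from its clauses, skipping any empty clause, which makes it easy to add or drop conditions when customizing a search.

```python
def all_of(*clauses: str) -> str:
    """Combine query clauses with AND; empty clauses are skipped."""
    return " AND ".join(c for c in clauses if c)


grundtvig = all_of(
    "type_ssi:leaf",
    "genre_ssi:poetry",
    "subcollection_ssi:adl",
    "author_name_tesim:grundtvig",
    "text_tesim:hjerte",
    "text_tesim:smerte",
)

if __name__ == "__main__":
    print(grundtvig)
```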
	  
Find all dialogues in Holberg's plays in which someone talks about Mester Erich. Try it!
genre_ssi:play
AND
subcollection_ssi:adl
AND
text_tesim:mester erich
AND
author_name_tesim:holberg
	  

Filter and sort examples

Find all works by Holberg containing poetry. Try it! Steps in the search:
Search for author
author_name_tesim:holberg
	
Filter by genre_ssi:poetry, but return the record of the containing work rather than the leaf node holding the poetry itself. This requires a join query:
{!join to=id from=part_of_ssim}genre_ssi:poetry
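The join first finds the nodes matching genre_ssi:poetry, collects the IDs in their part_of_ssim fields, and then returns the records whose id is among those IDs. The sketch below mimics that behaviour over a few hypothetical in-memory records; it is an illustration of the semantics, not how Solr executes the join.

```python
def join_local_params(from_field: str, to_field: str, query: str) -> str:
    """Render a join in the {!join ...} syntax used above."""
    return f"{{!join to={to_field} from={from_field}}}{query}"


def simulate_join(docs, matches, from_field="part_of_ssim", to_field="id"):
    """Collect from_field values of matching docs; return docs whose to_field is among them."""
    keys = set()
    for d in docs:
        if matches(d):
            keys.update(d.get(from_field, []))
    return [d for d in docs if d.get(to_field) in keys]


# Hypothetical records: two works in one volume, one with poetry, one with prose.
docs = [
    {"id": "work-1", "cat_ssi": "work"},
    {"id": "work-2", "cat_ssi": "work"},
    {"id": "leaf-1", "genre_ssi": "poetry", "part_of_ssim": ["work-1", "vol-1"]},
    {"id": "leaf-2", "genre_ssi": "prose", "part_of_ssim": ["work-2", "vol-1"]},
    {"id": "vol-1", "cat_ssi": "volume"},
]

if __name__ == "__main__":
    print(join_local_params("part_of_ssim", "id", "genre_ssi:poetry"))
    print([d["id"] for d in simulate_join(docs, lambda d: d.get("genre_ssi") == "poetry")])
```

Note that every container listed in part_of_ssim is returned, here both the work and the volume.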
	
Poetry often consists of strophes containing lines (which may or may not have rhyme and rhythm). In TEI, a strophe is a line group element (<lg>) containing lines. Find all strophes containing "regn" (i.e., rain) in poetry in volume 1 of Gustaf Munch-Petersen's collected works.
Sort the result set in reverse document order. Try it!
The actual search
volume_id_ssi:adl-texts-munp1-root
AND
text_tesim:regn
AND
genre_ssi:poetry
	  
The sort
position_isi desc
	  
Technically, in TEI a poem is a sequence of line groups (see above). Find all poems (i.e., works) containing strophes with "regn" (i.e., rain) in volume 1 of Gustaf Munch-Petersen's collected works.
Sort the result set in document order. Try it!
The actual search
volume_id_ssi:adl-texts-munp1-root
AND
text_tesim:regn
	  
The join
{!join to=id from=part_of_ssim}genre_ssi:poetry
	
The sort
position_isi asc
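Put together, the search, the join and the sort form a single request. The sketch below assumes, as above, standard Solr parameter names with the join sent as a filter query (fq), and uses a placeholder endpoint URL. Swapping "asc" for "desc" gives the reverse-order variant of the previous example.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real URL is not given in this document.
BASE_URL = "https://example.org/solr/select"

params = [
    # The actual search: "regn" in volume 1 of Munch-Petersen's collected works.
    ("q", "volume_id_ssi:adl-texts-munp1-root AND text_tesim:regn"),
    # The join, assumed to go in the filter query: keep containers of poetry nodes.
    ("fq", "{!join to=id from=part_of_ssim}genre_ssi:poetry"),
    # The sort: ascending position_isi, i.e. document order.
    ("sort", "position_isi asc"),
]

url = BASE_URL + "?" + urlencode(params)

if __name__ == "__main__":
    print(url)
```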