The Termcat Markup Language

Termcat is a markup language optimized for scientific and technical writing. It compiles to HTML and MathML (using MathJax). To generate PDFs, Prince can be used.

Termcat takes inspiration from LaTeX and Markdown. From Markdown it copies the user-friendly syntax for titles, lists, links, emphasis, and so on. Termcat aims to be more powerful, however. Like LaTeX, it is a programming language and it also comes with special syntax for typographic features that are difficult to unlock using plain Markdown or HTML.

The Basics

The syntax for titles is the same as in Markdown.

# This is a document title 

## This is a section title 

### This is a subsection title

So is the syntax for emphasis (*emphasis*), underline (_underline_), and links ([links](http://example.com)). Caveat: to emphasize or underline more than one word, the text fragment must be enclosed in a block. Parentheses can be used for this purpose: **(This is strong text)** renders as This is strong text.

Bullet lists can be created by starting a line with a hyphen followed by a space.

Consider the following code:

- Item 1
- Item 2
- Bullet items can consist of multiple paragraphs.

  Simply indent subsequent lines with two extra spaces.

  - Bullet lists can also be nested
    - Once more the trick is to indent the nested lists with two extra spaces.

This renders as a bullet list with three items, the third of which contains a second paragraph and a nested list.

By default, blocks of text that are indented with two spaces are considered blockquotes.

The syntax ![alt text](filename.png) includes an image in the document.

One symbol of anarcho-syndicalism is a black cat, drawn by Ralph Chaplin. It was also the logo of my go-to bagel place when I lived in Brooklyn.

Upright single quotes ' render as right quotes (or apostrophes) and backticks ` render as left quotes. Termcat automatically inserts extra horizontal whitespace after periods, question marks, exclamation marks, and colons, but of course not after acronyms. Long dashes can be included using double (--) or triple (---) hyphens.

There’s special syntax for diacritics:

Users of pīnyīn can also use the notation %1e, %2e, %3e, %4e for ē, é, ě, and è.

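It’s possible to change the space between characters: l%-2pt%ess is m%+0.4ex%ore renders as ‘less is more’, with the space after the ‘l’ narrowed by 2pt and the space after the ‘m’ widened by 0.4ex.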

Similarly, spaces of variable width can be created using the syntax %_1em. Like in LaTeX, tildes can be used for non-breaking spaces: 100~kg renders as 100 kg.

Two subsequent percent symbols %% indicate the start of a comment. The comment continues until a newline character is encountered.

When % is used in other contexts, it inserts a zero-width space. This space will not make it into the output HTML document. This is useful when you want to trigger syntactic sugar that only works after or before whitespace. For example, type a%**b**%c to get abc with the ‘b’ in strong type.

Put a backslash \ in front of a symbol to indicate that the symbol is content, not syntax: \%%.
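
The comment and escape rules above can be illustrated with a small Python sketch. This is not Termcat's actual implementation; the function name strip_comments is hypothetical, and the sketch assumes that a comment ends just before the newline (so the newline itself is kept) and that a backslash protects exactly the one character that follows it.

```python
def strip_comments(source: str) -> str:
    """Remove %% comments from each line, honoring backslash escapes.

    Assumption: the newline that ends a comment is preserved.
    """
    out_lines = []
    for line in source.split("\n"):
        i = 0
        cut = len(line)  # by default keep the whole line
        while i < len(line):
            if line[i] == "\\":
                # The escaped character is content, not syntax: skip both.
                i += 2
                continue
            if line.startswith("%%", i):
                # An unescaped %% starts a comment that runs to end of line.
                cut = i
                break
            i += 1
        out_lines.append(line[:cut])
    return "\n".join(out_lines)
```

With this sketch, a line such as `a %% note` is cut back to `a ` while `100\%% done` is left untouched, because the first percent symbol is escaped.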

HTML

When HTML tags or entities are encountered in Termcat documents, they ‘fall through’ to the HTML output. Hence <b>this &amp; that</b> renders as this & that.

When a <head> block is encountered, the code within that block is moved to the head of the output HTML document. This way the title of the document, stylesheets, and so on can be changed.

The Structure of a Termcat Document

Termcat documents are trees. The nodes of these trees—‘blocks’ in Termcat parlance—are delimited by parentheses (), brackets [], braces {}, indentation, or bullet marks. One important consequence of this is that parentheses, brackets, and braces must at all times be balanced (unless they’re escaped by backslashes).
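
To make the balancing requirement concrete, here is a minimal Python sketch of a checker. It is an illustration only, not Termcat's parser: the name is_balanced is hypothetical, the sketch assumes that each closing delimiter must match the most recent opening delimiter of the same kind, and it ignores indentation, bullet marks, and the math fence notation described in the Math section.

```python
# Maps each closing delimiter to the opening delimiter it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_balanced(source: str) -> bool:
    """Return True if (), [] and {} are balanced, skipping escaped characters."""
    stack = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            # A backslash escapes the next character, so skip both.
            i += 2
            continue
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        i += 1
    # Anything left on the stack was opened but never closed.
    return not stack
```

Under these assumptions, `:numbered-list(item 1)[item 2]` passes, `a (b` fails, and `a \( b` passes because the parenthesis is escaped.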

For many purposes it doesn’t matter how a block is delimited. For instance, a numbered list can be created using the notation

:numbered-list
- item 1
- item 2
or
:numbered-list(item 1)[item 2]
and it will render as
  1. item 1
  2. item 2

Math

Termcat has a novel syntax for mathematics. Rather than having delimiters that indicate the start and end of a mathematical expression, Termcat expects to be told which parts of an expression are (i) identifiers and which are (ii) operators, functions, and predicates.

At least it expects to be told these things for some of the expressions that you type. Termcat can be programmed to recognize common operators and identifiers, and it can also draw inferences.

To mark a character as a mathematical identifier, prefix it with an asterisk *: *a renders as a. Similarly, when Termcat encounters the expression *abc, it infers that ‘b’ and ‘c’ are also identifiers. The result is abc. To turn a word into a single identifier, type *(op). This renders as op; as in LaTeX, multi-character identifiers are not italicized. Mathematical identifiers can also be typeset in a calligraphic font by using + instead of *. Thus, +M becomes M.

To use the infix minus operator, type x ~-~ y. This yields x-y. Termcat automatically infers that the expressions to the left and to the right of the infix operator are mathematical identifiers.

Does typing ~-~ over and over again sound like a chore? No problem. Simply include the following code at the top of your document:

!bind
- -
- ~-~
From that point onwards, Termcat will interpret every minus sign surrounded by whitespace on both sides as a binary infix operator.
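
The effect of such a binding can be approximated in Python as a textual rewrite. This is only a rough sketch of the idea, not how Termcat implements !bind: the function apply_binding is hypothetical, and it assumes that "surrounded by whitespace" means a single space or tab on each side with other content beyond it, which also keeps the sketch from touching bullet marks at the start of a line.

```python
import re


def apply_binding(text: str, token: str, replacement: str) -> str:
    """Replace `token` with `replacement` where it stands alone between words.

    Assumption: the token must have exactly one space or tab on each side,
    with a non-whitespace character beyond that on both sides.
    """
    pattern = r"(?<=\S[ \t])" + re.escape(token) + r"(?=[ \t]\S)"
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally.
    return re.sub(pattern, lambda _match: replacement, text)
```

For instance, `apply_binding("x - y", "-", "~-~")` gives `x ~-~ y`, whereas the hyphen in `well-known` and a bullet mark such as `- item` are left alone.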

Predictably, for prefix and postfix operators you can type sin~ x or x ~!. This renders as sin x or x!.

When one tilde is used on either side of an operator, Termcat will use the default MathML spacing for that operator. Use two tildes to insert a ‘thick math space’ and three tildes to use a ‘very very thick math space’.

It may sometimes be necessary to type (~ and ~) to indicate that parentheses (or brackets or braces) should be interpreted as mathematical ‘fences’: (~ x ~,~ y ~] renders as (x,y].

Type <~ and ~> to insert chevrons.

Using some of the above knowledge we can use the following code

!bind
- =
- ~=~
!bind
- ,
- ~,~
!bind
- SUBSETEQ
- ~⊆~
!bind
- +
- ~+~
!bind
- *
- ~×~
!bind
- |
- ~~|~~
!bind
- <
- ~<~

E = {<~ a , n , n' ~> SUBSETEQ A * N * N | Pa and n < n'}

to generate

E = {⟨a, n, n′⟩ ⊆ A × N × N | Pa and n < n′}

Notice that Termcat also correctly recognizes primes. It’s also possible to use subscripts, superscripts, and fractions using syntax like this:

To coerce something to be text, prefix it with a double upright quotes character ".

Programming

Functions are invoked using the syntax function-name(arg 1)(arg 2)(arg 3). The arguments are blocks and can be delimited by parentheses, brackets, braces, indentation, or bullet list marks as usual. The numbered list in the section The Structure of a Termcat Document is an example of a function call.

There’s also experimental support for user-defined functions. The following Termcat code computes the fifth Fibonacci number, using the convention that fib(0) = fib(1) = 1:

!bind
- fib
- .fn
  - self n
  - .if
    - .eq(n)(0)
    - 1
    - .if
      - .eq(n)(1)
      - 1
      - .add
        - self(self#)(.add(n)(-2))
        - self(self#)(.add(n)(-1))
  #

fib(fib#)(5)

It reduces to 8.
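
For readers who find the Termcat version hard to follow, here is what I take to be the same computation transcribed into Python. It is a reading aid rather than a specification: it assumes that .fn defines an anonymous function with parameters self and n, that self# and fib# pass the function itself as a value, and that .if, .eq, and .add behave like Python's conditional, ==, and +.

```python
def fib(self, n):
    """Mirror of the Termcat definition: the function receives itself as `self`."""
    if n == 0:
        return 1
    if n == 1:
        return 1
    # Recursion goes through the explicitly passed `self`, as in the
    # Termcat code, and subtraction is written as adding a negative number.
    return self(self, n + -2) + self(self, n + -1)


# Corresponds to fib(fib#)(5) in the Termcat document.
result = fib(fib, 5)
```

Passing the function to itself is a common way to get recursion when a function body cannot refer to its own name.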

Usage Instructions

Termcat can be embedded in web pages or it can be used from the terminal. Here’s the syntax for calling it from the terminal:

Usage: java -jar termcat.jar [options] <document[.tc]>

The HTML output is stored in document.html.

  -b, --browse   Open HTML output in browser
  -w, --watch    Watch document for changes and recompile when changed
  -v, --verbose  Use verbose output

Combine the switches -b and -w for a live preview of your Termcat document: Any change you make in the source document will be automatically reflected in your browser. It’s worth noting that when a document is changed, Termcat is smart enough not to recompile the blocks that it already processed previously.
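
If you want to start Termcat from a script, the command line above can be assembled in Python. This is a minimal sketch: the helper names termcat_command and run_termcat are made up for this example, and it assumes that java is on your PATH and that termcat.jar sits in the current directory. Only the three switches listed above are used.

```python
import subprocess


def termcat_command(document, browse=False, watch=False, verbose=False,
                    jar="termcat.jar"):
    """Build the argument list for `java -jar termcat.jar [options] <document>`."""
    cmd = ["java", "-jar", jar]
    if browse:
        cmd.append("-b")  # open the HTML output in a browser
    if watch:
        cmd.append("-w")  # recompile whenever the document changes
    if verbose:
        cmd.append("-v")  # verbose output
    cmd.append(document)
    return cmd


def run_termcat(document, **options):
    """Run Termcat and raise CalledProcessError if it exits with an error."""
    return subprocess.run(termcat_command(document, **options), check=True)
```

For a live preview, `run_termcat("document.tc", browse=True, watch=True)` runs `java -jar termcat.jar -b -w document.tc`.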

To use Termcat from JavaScript, first load termcat.js. Next, call the function termcat.core.compile with a Termcat document as a string; it returns the corresponding HTML code. Optionally, a mutable cache can be passed as a second argument; termcat.rewrite.cache() returns a new cache.

Termcat is written in Clojure(Script) and is an open source project. You can find the project on GitHub. Email, bug reports, and pull requests are welcome!

The current version of Termcat should be considered a prototype. I find it quite usable as-is, but nothing is set in stone and a lot of features are still missing. There’s a lot of room for performance enhancements too.