Web display engines only read and render mathematics markup already contained in an HTML
document.
The HTML document must be configured to use a display engine through a <script>
element, and will most likely require some CSS styling to accommodate the different devices
accessing the content.
This contrasts with the LaTeX philosophy where an author should not have to worry about presentation and
styling.
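As a rough illustration of the configuration described above, the following Python sketch assembles a minimal HTML page that loads MathJax through a <script> element, sets a viewport, and adds one small CSS rule for narrow screens. The function name build_page, the CSS rule, and the choice of the jsDelivr build of MathJax 3 are assumptions made for this example; your display engine and styling may differ.

```python
from html import escape

# Assumed CDN build of MathJax 3; swap in whichever engine and version you use.
MATHJAX_SRC = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"

PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{title}</title>
  <style>
    /* Illustrative rule: let wide display equations scroll on small screens. */
    mjx-container[display="true"] {{ overflow-x: auto; }}
  </style>
  <script id="MathJax-script" async src="{mathjax_src}"></script>
</head>
<body>
{body}
</body>
</html>
"""


def build_page(title: str, body_html: str) -> str:
    """Wrap body markup (which already contains the math) in a MathJax-enabled page."""
    return PAGE_TEMPLATE.format(
        title=escape(title),
        mathjax_src=MATHJAX_SRC,
        body=body_html,
    )


if __name__ == "__main__":
    # \( ... \) is an inline math delimiter that MathJax recognizes by default.
    print(build_page("Example", r"<p>Euler: \(e^{i\pi} + 1 = 0\)</p>"))
```

The display engine only processes the math it finds in the body, so everything else about the page (structure, layout, responsiveness) remains the author's responsibility.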
Beyond this, one may need to convert entire documents containing a host of other content.
An article written in LaTeX, for example, may contain:
- Inline mathematics
- Block display mathematics
- Abstract
- Table of Contents
- Document Sections
- Tables
- Special formatting for content blocks such as theorems, lemmas, and definitions
- Numbered or bulleted lists
- Images
- Figures
- Code Blocks
- Footnotes
- Automatic numbering for any of the above
- Bibliography
As many of these elements are also native to HTML, a direct conversion is theoretically
possible, but quite difficult in practice.
There are several document conversion tools to assist with this part of the process. Each has its own
strengths, and each conversion will require some additional testing for responsiveness across devices and
accessibility support. The tools discussed here are:
- Markdown
- Pandoc
- TeX4ht
- LaTeXML
- LaTeX.js
- LaTeX2HTML
- sTeX
Markdown
Markdown is less a
conversion tool than a markup language in its own right, one that is meant to be converted into
HTML. Markdown is a bridge between plain text and HTML, designed for fast authoring
while remaining readable as plain text. For example, Markdown uses *emphasis
text* to mark emphasized text and **strong** to
mark strong text. It is an ideal option for authors who want to minimize the amount of
additional coding needed to render their work.
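To make the mapping from Markdown to HTML concrete, here is a toy Python sketch that converts only the two inline forms mentioned above. It is not a real Markdown parser (it ignores nesting, escapes, and every other construct), and the function name inline_markdown_to_html is made up for this example.

```python
import re


def inline_markdown_to_html(text: str) -> str:
    """Convert **strong** and *emphasis* spans to HTML (toy example only)."""
    # Handle the two-asterisk form first so it is not mistaken for emphasis.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


if __name__ == "__main__":
    print(inline_markdown_to_html("*emphasis text* and **strong**"))
```

In practice a full Markdown processor handles this step, along with headings, lists, links, and the rest of the syntax.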
Markdown can be written in any text editor, but requires a Markdown reader to render its output. Many
browsers, collaborative workspaces, and online forums can automatically
read and render Markdown. Markdown has also been expanded and customized within different
document editing software. See, for example, the entry on R Markdown below.
LaTeX markup can be included in a Markdown document by enclosing the LaTeX code between math mode delimiters,
just as in an HTML document. Then, the entire Markdown document can be converted to
HTML by a popular tool called Pandoc. This process is described below. Sample code is provided in the Code Examples section.
Pandoc
Pandoc is an extremely powerful "universal
document converter" capable of parsing and converting between dozens of file formats. It is run from a
command line interface (CLI) by specifying the input ("from") and output ("to") formats along with any special
options available to those formats. An example is provided in the Code Examples section.
Pandoc can set up MathJax (or another engine) to render mathematics when it converts a document
to HTML, for example through its --mathjax option. The resulting standalone HTML file will have MathJax automatically set up in a <script> element, and can work as a template for future use.
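The conversion just described can be sketched in Python as follows. The helper name pandoc_command, the sample file names, and the sample Markdown content are assumptions for illustration; the Pandoc options used (--from, --to, --standalone, --mathjax, --output) are standard ones, but check the Pandoc manual for the version you have installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical sample document: Markdown with LaTeX between math delimiters.
SAMPLE_MARKDOWN = r"""# Sample

Inline math such as $a^2 + b^2 = c^2$ sits inside a sentence.

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$
"""


def pandoc_command(source: str, target: str) -> list[str]:
    """Build the argument list for a Markdown-to-HTML conversion with MathJax."""
    return [
        "pandoc",
        "--from=markdown",
        "--to=html",
        "--standalone",  # emit a complete HTML document, including <head>
        "--mathjax",     # leave TeX math for MathJax and link its script
        f"--output={target}",
        source,
    ]


if __name__ == "__main__":
    command = pandoc_command("sample.md", "sample.html")
    print(" ".join(command))
    # Only attempt the conversion when pandoc is actually installed.
    if shutil.which("pandoc"):
        with tempfile.TemporaryDirectory() as workdir:
            Path(workdir, "sample.md").write_text(SAMPLE_MARKDOWN, encoding="utf-8")
            subprocess.run(command, cwd=workdir, check=True)
            html = Path(workdir, "sample.html").read_text(encoding="utf-8")
            print(html[:200])
```

The printed command line can also be typed directly into a terminal; wrapping it in Python is only a convenience for scripting repeated conversions.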
Pandoc can also convert directly from a LaTeX source file, but content
such as figures, tables, section headers, and other structured LaTeX syntax might not translate
as well into HTML. The recommended approach is to write the document in Markdown and
enclose mathematics between math mode delimiters. Pandoc uses its own expanded version of Markdown
(called Pandoc Markdown)
with its own syntax for this extra structure.
We encourage you to read about the many capabilities of Pandoc at the Pandoc home page.
TeX4ht
TeX4ht was developed in 2004 to convert directly from
TeX source files to HTML and several other formats. It is now included as part of the TeX Live and
MiKTeX distributions. It incorporates MathJax (and other methods) for mathematics output, includes the ability
to add some custom CSS to the final HTML output, can convert documents written with
R Markdown and Overleaf, and has other unique features.
The article Making Accessible Documents Using LaTeX by Eric Larson and Isabel Vogt shows an example of how to use TeX4ht.
LaTeXML
LaTeXML was developed to convert LaTeX markup to an
internal XML format specifically for the Digital Library of Mathematical Functions (DLMF) project at the National Institute of Standards and Technology (NIST). In turn, the
internal XML can be used to export to HTML, with various customizable settings.
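The two-stage conversion (LaTeX to LaTeXML's XML, then XML to HTML) can be sketched as a pair of command lines built in Python. The helper name latexml_pipeline and the file name paper.tex are assumptions for this example; latexml and latexmlpost are the command-line programs shipped with LaTeXML, and their manual documents the many other options available.

```python
from pathlib import Path


def latexml_pipeline(tex_file: str) -> list[list[str]]:
    """Return the two commands that take a LaTeX file to HTML via LaTeXML's XML."""
    stem = Path(tex_file).stem
    xml_file = f"{stem}.xml"
    html_file = f"{stem}.html"
    return [
        # Stage 1: LaTeX source -> LaTeXML's internal XML format.
        ["latexml", f"--dest={xml_file}", tex_file],
        # Stage 2: internal XML -> HTML (post-processing).
        ["latexmlpost", f"--dest={html_file}", xml_file],
    ]


if __name__ == "__main__":
    for step in latexml_pipeline("paper.tex"):
        print(" ".join(step))
```

Each returned list can be passed to subprocess.run on a machine where LaTeXML is installed.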
LaTeXML offers a sample web editor that converts user input to rendered HTML, with numerous built-in examples.
LaTeXML was recently used to convert the arXiv research library to HTML. This is an exciting development that could boost accessibility adoption for mathematics and science research everywhere.
LaTeX.js
LaTeX.js is an interesting engine that ports full LaTeX
documents into HTML while also reproducing their original LaTeX style formatting, similar to
LaTeX.css. The web site showcases a side-by-side comparison of a sample LaTeX document with the rendered
HTML. The engine is extensible and the project aims to automate translation of more native LaTeX
environments. LaTeX.js uses the KaTeX engine to render mathematical formulas, and thus does not have
accessibility support at this time.
LaTeX2HTML
LaTeX2HTML was well ahead of the curve, dating back
to 1999. It is a command line tool that converts a LaTeX document into several HTML files to
provide paginated navigation similar to the original LaTeX PDF output. It automatically generates necessary
hyperlinks, recreates lists, and converts mathematics, tables, and figures into images. While the process
does not provide alt-text, an author can insert alt-text for all necessary components.
sTeX
sTeX is a set of LaTeX packages designed to embed semantic information in LaTeX-generated output. sTeX can generate a PDF with some annotations and a (much richer) annotated HTML document using MathML with the OpenMath (or other user-chosen) reference library. sTeX also offers a plug-in for LaTeXML.
sTeX is under active development, and requires some additional manual setup to use. Please read the sTeX documentation for detailed instructions and example content.