bibtex2web: create webpages from BibTeX bibliography files

The bibtex2web package creates webpages from BibTeX bibliography files. It can be downloaded from http://homes.cs.washington.edu/~mernst/software/#bibtex2web.


Overview

It is convenient to produce webpages from BibTeX files, because you only need to keep one set of (BibTeX) sources up to date, thus avoiding skew between your webpages and bibliographies.

Given a collection of BibTeX files, bibtex2web creates

Each of these webpages is built from a template that you can customize to your liking; see below.

Here are two examples of sets of webpages that were created by bibtex2web:

The bibtex2web package consists of three programs:

For examples of how to call these programs, see file examples/README in this distribution.

Many other BibTeX to HTML translators exist.


Installation and use

To install bibtex2web, obtain the distribution (linked from http://homes.cs.washington.edu/~mernst/software/#bibtex2web), unpack it, and then set your BPHOME environment variable to point to its lib/ directory. For example, in csh, add something like

  setenv BPHOME ${HOME}/bibtex2web/lib

to your .cshrc file. You may need to log out and log back in for this to take effect.

Then, you can run the programs in bin/. (You may add that directory to your path if you wish, but it is not required.)

See the files examples/README and examples/Makefile in this distribution for examples of how to run bibtex2web and of the various command-line arguments that bibtex2web's programs accept. The easiest way to use bibtex2web is to follow the instructions in examples/README and modify examples/Makefile to fit your goals. More details will be added to this manual later.


BibTeX fields

bibtex2web works with ordinary BibTeX files, but it can take advantage of several additional BibTeX fields:

abstract
The abstract, in LaTeX format. Do not include blank lines to separate paragraphs; use “\par” instead.
basefilename
The file name for the article itself, without the extension. Automatically recognized extensions are “.pdf”, “.doc”, “.docx”, “.key”, “.ppt”, “.pptx”, “.odp”, “-slides.pdf”. You must put these files in the destination directory before running the programs. The basefilename is also used for the per-article webpage (which includes the abstract). If the basefilename is not specified, then the entry's cite key is used instead.
downloads
A list of other downloads, in addition to those automatically detected by virtue of matching the basefilename. The list is semicolon-separated, and each entry consists of a URL (which may not contain whitespace) and anchor text (which may contain whitespace), separated by whitespace.
downloadsnonlocal
A list of other downloads, used (in addition to the “downloads” field) only if no local files are found via the “basefilename” mechanism. In other words, if you have a “basefilename” attribute, and bibtex2web finds at least one of the files it refers to, then bibtex2web skips processing the “downloadsnonlocal” attribute.
nodownloads
If present, this field suppresses a warning that there are no downloads for a given paper.
supersededby
A comma-separated list of keys of articles that supersede this one. A superseded article does not get its own per-article webpage, but is briefly noted on the webpage of each article that supersedes it.

Optionally, in the comma-separated list, the key may be followed by whitespace and (comma-free) text; when bibtex2web generates webpages, that text is used instead of “A previous version”. For instance, in a subsequent TR version, you could add the field
  supersededby = "ConfVer An extended version",
                
to ensure that the BibTeX entry with key “ConfVer” remains the canonical version.
category
The name of the topic under which this article should be listed in the by-topic webpage.

In general, when adding a new entry, choose an existing category rather than making up a new one; otherwise you end up with few papers per category, which defeats the purpose of this field.
summary
A brief description of the article that appears in the by-topic webpage. To include the full abstracts of all the papers would make that webpage too long. For example, see http://pag.csail.mit.edu/pubs/bytopic.html (where the standard is a 3-line description in the BibTeX file).
alsosee
This field is not currently processed.
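
The value formats of the “downloads” and “supersededby” fields, and the way “downloadsnonlocal” acts as a fallback, can be sketched as follows. This is a minimal Python illustration of the rules described above, not bibtex2web's actual (Perl) code. The function names are hypothetical, the anchor text used for local files is a placeholder, and the sketch assumes that “downloadsnonlocal” uses the same syntax as “downloads”.

```python
import os

# Suffixes the manual lists as automatically recognized.
RECOGNIZED_SUFFIXES = [".pdf", ".doc", ".docx", ".key", ".ppt", ".pptx",
                       ".odp", "-slides.pdf"]


def parse_downloads(field):
    """Split a semicolon-separated list into (url, anchor_text) pairs."""
    pairs = []
    for item in field.split(";"):
        parts = item.split(None, 1)  # the URL itself contains no whitespace
        if parts:
            anchor = parts[1].strip() if len(parts) > 1 else ""
            pairs.append((parts[0], anchor))
    return pairs


def parse_supersededby(field, default_text="A previous version"):
    """Split a comma-separated list into (cite_key, note_text) pairs."""
    pairs = []
    for item in field.split(","):
        parts = item.split(None, 1)  # key, then optional comma-free text
        if parts:
            text = parts[1].strip() if len(parts) > 1 else default_text
            pairs.append((parts[0], text))
    return pairs


def collect_downloads(entry, cite_key, dest_dir):
    """Return (url, anchor_text) pairs for one entry (hypothetical helper)."""
    # The cite key stands in for a missing basefilename.
    base = entry.get("basefilename", cite_key)
    # Local files must already be present in the destination directory.
    # The suffix is used as placeholder anchor text (an assumption).
    local = [(base + suffix, suffix) for suffix in RECOGNIZED_SUFFIXES
             if os.path.exists(os.path.join(dest_dir, base + suffix))]
    links = local + parse_downloads(entry.get("downloads", ""))
    if not local:
        # Consulted only when no local file was found.
        links += parse_downloads(entry.get("downloadsnonlocal", ""))
    return links
```

For instance, parse_supersededby("ConfVer An extended version") yields the key “ConfVer” paired with the text “An extended version”, while a bare key is paired with the default text.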

You can also define your own additional fields. For instance, the example Makefile that is distributed with bibtex2web uses the “-filter” argument to the programs to make them ignore any article containing an “omitfromcv” field. Additionally, it ignores any entry containing an “onlycrossref” field (unless that field was inherited via a crossref); this prevents entries that hold information about just the conferences from appearing. Another use of the “-filter” argument is to create a separate webpage for any article containing an “underreview” field.
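
The decision that such a filter makes can be sketched as a predicate. The real “-filter” argument takes a Perl expression (see examples/Makefile for working ones); the Python below only illustrates the logic. The function name and the “inherited_fields” parameter, which stands for the set of field names an entry received via its crossref, are hypothetical.

```python
def keep_entry(entry, inherited_fields=frozenset()):
    """Return True if the entry should be listed (illustrative only).

    entry            -- dict mapping field names to values
    inherited_fields -- names of fields that came from a crossref
                        (a hypothetical representation)
    """
    if "omitfromcv" in entry:
        return False
    # An entry that itself carries "onlycrossref" exists only to be
    # cross-referenced. An article that merely inherited the field is kept.
    if "onlycrossref" in entry and "onlycrossref" not in inherited_fields:
        return False
    return True
```

With this predicate, a conference entry marked “onlycrossref” is dropped, while a paper that cross-references it is still listed.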


Command-line arguments

The main program of the bibtex2web package is bwconv.pl. You can supply it a variety of command-line arguments.

-format=informat[,outformat]
-outformat=outformat
Required. -outformat need not be supplied if the optional outformat part of the -format argument is supplied. For instance, legal invocations include:
 -format=bibtex,htmlpubs
 -format=bibtex,htmlsummary
 -format=bibtex -outformat=htmlabstract
-outopts
Additional arguments for the output format.
-to file
Places output in file.
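
The relationship between -format and -outformat can be sketched as follows. This Python function is illustrative only and is not part of bibtex2web. Its name is hypothetical, and letting the outformat part of -format take precedence when both are supplied is an assumption, since the manual does not cover that case.

```python
def resolve_formats(format_arg, outformat_arg=None):
    """Return (informat, outformat) from the two argument values.

    format_arg    -- the value of -format, "informat[,outformat]"
    outformat_arg -- the value of -outformat, or None if it was not given
    """
    informat, _, outformat = format_arg.partition(",")
    if not outformat:
        outformat = outformat_arg
    if not informat or not outformat:
        raise ValueError("supply -format=in,out or -format=in -outformat=out")
    return informat, outformat
```

Both resolve_formats("bibtex,htmlpubs") and resolve_formats("bibtex", "htmlpubs") produce the same pair, and resolve_formats("bibtex") alone is rejected because no output format was given.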

Output formats

The output formats supported by bibtex2web are as follows.

htmlabstract

Creates one page per publication, giving the abstract and other details and links to the paper itself.

If the “-linknames link-names-file” command-line option is also given, then each author name (or conference name, etc.) becomes the anchor text for a link to that author's homepage. It is recommended that you use, as the argument to -linknames, the file plume-lib/bin/html-canonical-urls, from the plume-lib project. You can supply the -linknames option multiple times, so you can use the html-canonical-urls file and then also your own augmentations. Feel free to contribute improvements to the plume-lib version of the file.

By contrast, if the linkauthors option is given, then author names on abstract pages are linked to the authors' publications lists. Here is how to customize that behavior:

-outopts=linkauthors
Link author names to authors' (bibtex2web-generated) publications lists.
-outopts=linkauthors:myauthorslist
Override the default “authors” filename.
-outopts=withbibtex
Place a BibTeX entry on the summary page, ready for readers to cut and paste into their own bibliographies.
-outopts=linkauthors\ withbibtex
Separate multiple options with spaces. The backslash (or quoting) keeps the shell from splitting the options into separate arguments.
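
To see why the space must be escaped, the sketch below uses Python's shlex module to split a command line the way a POSIX shell would, then breaks the -outopts value into options. The parse_outopts function is hypothetical; it only mirrors the “name” and “name:value” shapes shown above and is not bibtex2web's own parser.

```python
import shlex


def parse_outopts(value):
    """Turn 'linkauthors:myauthorslist withbibtex' into a dict."""
    options = {}
    for token in value.split():
        name, sep, arg = token.partition(":")
        # Options without a ":value" part are treated as simple flags.
        options[name] = arg if sep else True
    return options


# The escaped space keeps both options inside one shell argument.
argv = shlex.split(r"-outopts=linkauthors\ withbibtex -to out.html")
outopts_value = argv[0].split("=", 1)[1]
parsed = parse_outopts(outopts_value)
```

Without the backslash, the shell would pass “withbibtex” as a separate argument rather than as part of -outopts.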

If the “-validurls urls-file” command-line argument is given, then each URL in the file (one per line) is considered to be valid and is not checked for validity.

htmlsummary

(The htmlsummary format needs to be documented here.)

htmllist

The htmllist output format generates a list of entry titles, each of which links to the abstract page for that entry, separated by <br /> line breaks. This is useful to generate a list of “recent publications” on a home page. For example, see the “Selected Publications” list on http://pmg.csail.mit.edu/.

Example make rule:

index.html:
        ${BWBIN}/bwconv.pl -format=bibtex,htmllist -outopts=limit:5\ abstract_dir:pubs -headfoot index-headfoot.html -copyright ../copyright -to $@ $(FILTER) ${BIBFILES}

The “limit” output option limits the list to the specified number of entries (without this option, the list is unlimited). The “abstract_dir” option specifies the relative directory containing the per-entry abstract files (default “../pubs”).
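
The effect of the two options can be sketched as follows. This Python function is a hypothetical illustration, not the code bwconv.pl runs. It assumes that abstract pages are named “basefilename-abstract.html”, a pattern inferred from the example URLs elsewhere in this manual, and that entries arrive already sorted.

```python
import html


def html_list(entries, limit=None, abstract_dir="../pubs"):
    """Render (basefilename, title) pairs as linked titles.

    limit        -- maximum number of entries, or None for no limit
    abstract_dir -- relative directory holding the abstract pages
    """
    if limit is not None:
        entries = entries[:limit]
    lines = []
    for base, title in entries:
        href = "%s/%s-abstract.html" % (abstract_dir, base)
        # Each <br /> ends its own line.
        lines.append('<a href="%s">%s</a><br />' % (href, html.escape(title)))
    return "\n".join(lines)
```

Calling html_list(entries, limit=5, abstract_dir="pubs") corresponds to the “-outopts=limit:5\ abstract_dir:pubs” setting in the make rule above.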

Webpage templates

You can specify how the generated webpages look by supplying templates. The most common of these is supplied by the -headfoot argument to the bwconv.pl program.

A template turns into the final webpage, but certain special strings are replaced first:

BODY
contains the actual content produced by bibtex2web
BIBTEX2WEB_NOTICE
is replaced by

This page was generated $timestamp by bibtex2web

where $timestamp is the local time in ctime(3) format.
COPYRIGHT_NOTICE
is replaced by the contents of the file specified by the -copyright parameter. Bug: this only works for files generated by bwconv directly, so it doesn't work yet for the author index page (which uses make-author-pages.pl) or the per-paper pages (which use htmlabstract-split.pl).
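
The substitution of the three special strings can be sketched as follows. This Python function is illustrative only. Its name and parameters are hypothetical, and it uses plain text matching in a single pass, which may differ from what bwconv.pl actually does.

```python
import re
import time


def fill_template(template, body, copyright_text="", timestamp=None):
    """Replace the special strings in a template.

    timestamp defaults to the local time in ctime format.
    """
    if timestamp is None:
        timestamp = time.ctime()
    replacements = {
        "BODY": body,
        "BIBTEX2WEB_NOTICE":
            "This page was generated %s by bibtex2web" % timestamp,
        "COPYRIGHT_NOTICE": copyright_text,
    }
    # A single pass ensures that text inserted for one marker is never
    # rescanned for the other markers.
    pattern = re.compile("BIBTEX2WEB_NOTICE|COPYRIGHT_NOTICE|BODY")
    return pattern.sub(lambda m: replacements[m.group(0)], template)
```

A template as small as “BODY” followed by a line containing “BIBTEX2WEB_NOTICE” is enough to try this out.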

Supporting new LaTeX commands

bibtex2web has built-in support for many LaTeX commands, but you may find additional commands that are not supported. A common symptom of an unsupported command in an abstract is the warning

bp warning (main): Unknown TeX characters (backslashes) in ...

To support a new LaTeX command, you need to add information about how to convert it to bibtex2web's internal representation (based on Unicode) and from that representation to HTML and other formats. A good way to find the places you need to change is to grep for “017C”, which is the Unicode code for a z with a dot above it (ż), or for “21D2”, which is a right arrow (⇒), and then mimic one or the other of them.
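
The two-stage conversion (LaTeX command to Unicode code point, then code point to HTML) can be sketched as follows. The table and function names are hypothetical and the real tables live in the bp Perl sources, so treat this Python only as a picture of the idea. The LaTeX spelling “\.{z}” is one common way to write ż and is an assumption about the input.

```python
# Stage 1: LaTeX command -> Unicode code point (hexadecimal string).
LATEX_TO_UNICODE = {
    r"\.{z}": "017C",        # z with a dot above
    r"\Rightarrow": "21D2",  # rightwards double arrow
}

# Stage 2: Unicode code point -> HTML.
UNICODE_TO_HTML = {
    "017C": "&#x017C;",  # numeric character reference
    "21D2": "&rArr;",    # named entity
}


def latex_to_html(text):
    """Replace every known LaTeX command with its HTML equivalent."""
    for command, code in LATEX_TO_UNICODE.items():
        text = text.replace(command, UNICODE_TO_HTML[code])
    return text


def has_unknown_commands(converted):
    """Leftover backslashes suggest an unsupported command."""
    return "\\" in converted
```

Supporting a new command in this picture means adding one row to each table, which mirrors the advice to mimic the existing “017C” or “21D2” entries.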

You can find Unicode character codes (and HTML equivalents) at one of these URLs: http://www.w3.org/TR/REC-html40/sgml/entities.html, http://www.alanwood.net/unicode/arrows.html, http://www.alanwood.net/demos/ansi.html.


Reporting problems

If you have any problems or questions, please contact Michael Ernst (mernst@cs.washington.edu). I will do my best to help, though I cannot make any guarantee.

Credits

bibtex2web was written by Michael Ernst, with contributions by Sameer Ajmani.

bibtex2web builds on the bp library by Dana Jacobsen.

David Andersen contributed patches and suggestions.


Implementation details

bibtex2web is built on the bp Perl library. The bibtex2web distribution is simply the bp distribution, with corrections and enhancements. bp-README is the original README file for the bp Perl library, and other files and directories have been similarly prefixed with bp- to avoid confusing users of bibtex2web. bp documentation appeared at http://www.ecst.csuchico.edu/~jacobsd/bib/bp/index.html (but wasn't packaged with bp itself). bp has not been supported since December 1996, but it works well enough for me, particularly with my enhancements. (Another library is btool, but bp is better.) Other systems exist, but did not have the features I needed.

Under both Netscape and Internet Explorer, <br /> needs to be at the end of a line rather than at the beginning of the next line, because otherwise there can be two line breaks (i.e., a blank line) rather than just a single line break.

To do

Don't require an extra "/dl" close tag after BODY in webpage templates.

Change the behavior of "downloadsnonlocal" as follows:

Permit multiple categories per entry, because some entries span categories. I'm not sure how to do this without overhauling bwconv.pl.

Add additional cross-reference types (beyond supersededby), such as permitting a later technical report that is linked from the page. Examples: http://homes.cs.washington.edu/~mernst/pubs/instantiating-generics-oopsla2004-abstract.html, http://pag.csail.mit.edu/pubs/deadlock-library-ecoop2005-abstract.html. (I think this may already be supported...)

Create a shared utility package for duplicated subroutines like read_link_names. Also, some code for the linkauthors option was copied from make-author-pages.pl; this should be consolidated.

Permit printing superseded articles rather than suppressing them (but do add links to the subsequent version); this gives a list of all publications, including duplicates.

The Perl expressions in -filter arguments can get long; permit simplifying them. For example, add a -omitiffieldexists and/or an -includeiffieldexists argument?

Add a noabstractpage field, to replace the old nobasefilename field.

When the -q flag is supplied, this warning message

Parsing of undecoded UTF-8 will give garbage when decoding entities at checklink.pl line 1075.

appears to be referring to the last page whose URL is printed (the last page for which there was a problem). The warning message is coming from HTML::Parser, but I can't find the exact place in the code (maybe it's in C, not Perl?), and the parse method doesn't return an error status. The workaround is to run without the -q command-line option, determine where the problem is, and fix it.

Alternatively, follow these suggestions from HTML::Parser: The solution is to use the Encode::encode_utf8() on the data before feeding it to the $p->parse(). For $p->parse_file() pass a file that has been opened in ":utf8" mode. The parser can process raw undecoded UTF-8 sanely if the utf8_mode is enabled or if the "attr", "@attr" or "dtext" argspecs are avoided.