\MakeShortVerb{+}
\title{\HT: a working standard}
\author[Arthur P. Smith]{Arthur P. Smith\\ Dept. of Chemistry, BG-10,\\ University of Washington, \\Seattle, WA 98195\\\texttt{asmith@mammoth.chem.washington.edu}}
\begin{Article}

\section{Introduction}
\emph{Note: this paper was prepared for the American Physical Society electronic publishing conference, Los Alamos, N.M., October 14--15, 1994.}

The past year has seen a revolution in the processes of Internet-based information navigation and retrieval with the advent of easy-to-use graphical browsers (in particular Mosaic) based on the World-Wide-Web (WWW). The revolution is a result of two components --- first the browsers allow a near-uniform (point-and-click or other method) access to documents in almost any format and from almost any Internet-based source, accessed as regular files or via ftp, gopher, http or one of many other possible methods; along with this the Universal Resource Locator (URL) mechanism provides a surprisingly easy and uniform way to specify the location of any document on the net. Second, for certain classes of documents (html files, or gopher text files) embedded URL's or other addresses are understood to refer to other, external, documents which can be followed according to the interests of the person viewing the document, producing an interconnected web of documents. The goal of the \HT{} collaboration is to extend this second privileged class of documents to include documents based on \TeX{}, the word-processing language of choice for mathematical and scientific writing, thus fully incorporating \TeX{} documents into the burgeoning \textbf{web} of information on the internet.

\section{Why \HT?}
There already exists one approach for incorporating \TeX{} documents more fully into the \textbf{web} --- conversion to HTML, as in the program \texttt{latex2html} by Nikos Drakos.
This can work very well, and is already used in some of the electronic publications in mathematics, but there are also several serious problems with this, aside from the technical issues associated with the complexity of the conversion process. HTML by design allows very little author control of the visual form of a document. This is touted as an advantage because it preserves only the \emph{essential} elements of a document and not the artificialities of a page --- in fact HTML documents do not have pages at all, although some of the sense of a \emph{page} is implied by separation of a single document into many files. Aside from loss of author control, there is a practical problem of a lack of mathematical tools in the current implementations of HTML --- tables and equations are either difficult to implement or impossible. \texttt{latex2html} gets around this by conversion of such things to bitmapped images, but this is an inefficient and expensive process --- and goes in just the opposite direction of HTML's theme of extracting the \emph{essence} of a document, making the document essentially unreadable without a good network connection and a computer with a high quality display. These problems with HTML are compounded if scientific authors attempt to write documents directly in HTML rather than using \TeX{} first --- the lack of authoring tools, the absence of macro capabilities, and the ill-defined nature of the language make this an unpleasant task; just dealing with ordinary text is easy, but getting Greek letters, mathematical symbols, equations and tables into your document is not. The one nice feature of HTML is the ease with which figures can be incorporated into a document. 
But at least PostScript figures can be incorporated into a \TeX{} document with equal ease using modern \emph{dvi} interpreters, and the \HT{} standard presented here allows arbitrary images and other external documents to be referred to and brought to the screen with a single mouse click. The point of all this is that hypertext capabilities, and the use of URL's to locate new documents --- the main feature of HTML that makes it such a useful network information navigation tool --- can be much more easily incorporated into \TeX{} than the mathematical capabilities of \TeX{} and the years of experience embedded in various \TeX{} macro packages can be incorporated into HTML. Whether \TeX{} in general provides a better model for the viewing of on-line information remains to be seen.

\section{How does it work?}
The underlying element of our implementation of \HT{} is the use of a \TeX{} macro that bypasses the \TeX{} interpretation process and sends a message directly to the \emph{dvi} interpreter that processes \TeX{} output. This is the +\special+ command, previously used to define procedures for drawing or including figures in \TeX{} documents. When the characters +\special{+{\ttfamily\itshape string}+}+ appear in the \TeX{} document, the \emph{string} is passed directly without interpretation to the output \emph{dvi} file (preceded by a marker to identify this as a \emph{special} message to the \emph{dvi} interpreter). The \emph{dvi} previewers or processors then interpret this string according to its first few characters. The original \HT{} specification (due to Paul Ginsparg, Tanmoy Bhattacharya, and me) uses the initial characters \emph{html:} to denote \HT{} elements in an HTML-like style. David Oliver (\texttt{oliver@gang.umass.edu}) has introduced a slightly different specification that uses the initial characters \emph{hyp} to denote his own style of \HT.
I will discuss only the original specification in this paper, since as far as they are currently implemented both specifications are essentially equivalent. Note that \emph{dvi} interpreters that do not understand the \emph{html:} or \emph{hyp} special commands will ignore them, or at worst print out warning messages. Therefore \emph{dvi} files processed to include \HT{} commands are fully compatible with old \emph{dvi} interpreters. After the initial \emph{html:} string, the specification is identical to a restricted form of HTML. The five arguments we have added to the +\special+ command are:
\begin{description}
\item[href:] +html:<a href="href_string">+
\item[name:] +html:<a name="name_string">+
\item[end:] +html:</a>+
\item[image:] +html:<img src="href_string">+
\item[base\_name:] +html:<base href="href_string">+
\end{description}
The \emph{href}, \emph{name} and \emph{end} commands are used to do the basic hypertext operations of establishing links between sections of documents. The \emph{image} command is intended (as with current html viewers) to eventually place an image of arbitrary graphical format on the page in the current location. Currently for \XHDVI, \emph{image} brings up an external viewer with the image, if such a viewer is available. The \emph{base\_name} command should be used to communicate to the \emph{dvi} viewer the full (URL) location of the current document so that files specified by relative URL's may be retrieved correctly. The \emph{href} and \emph{name} commands must be paired with an \emph{end} command later in the \TeX{} file --- the \TeX{} commands between the two ends of a pair form an \emph{anchor} in the document. In the case of an \emph{href} command, the \emph{anchor} is to be highlighted in the \emph{dvi} viewer, and when clicked on will cause the scene to shift to the destination specified by \emph{href\_string}.
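As a rough sketch of such a pair in a source file, assuming the specials follow ordinary HTML anchor syntax after the \emph{html:} prefix (the anchor name +sec-intro+ is a placeholder chosen for illustration):
\begin{verbatim}
% Sketch only: "sec-intro" is a hypothetical anchor name.
% A name anchor: marks a location that other links can point to.
\special{html:<a name="sec-intro">}Introduction\special{html:</a>}

% An href anchor: the enclosed text is highlighted and, when
% clicked, takes the viewer to the given URL.
See the \special{html:<a href="http://xxx.lanl.gov/hypertex/index.html">}%
introductory document\special{html:</a>} for more.
\end{verbatim}
Each opening special is closed by its own end special, so the two anchors in this sketch are independent of each other.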
The \emph{anchor} associated with a name command represents a possible location to which other hypertext links may refer, either as local references (of the form \texttt{href="\#name\_string"} with the \emph{name\_string} identical to the one in the name command) or as part of a URL (of the form \emph{URL\#name\_string}). Here \emph{href\_string} is a valid URL or local identifier, while name\_string could be any string at all: the only caveat is that `+"+' characters should be escaped with a backslash (+\+), and if it looks like a URL name it may cause problems. There may also be problems if \LaTeX\ tries to interpret the \emph{href\_string} or \emph{name\_string} --- in that case preceding the command with +\protect+ should usually work. Any defined \emph{name\_string} can be referred to in any href referring to the document, in the form \texttt{href="URL\#name\_string"}. Note that anchors may be nested. The only restriction in current implementations is that anchors are truncated at page boundaries. Because this html-based naming scheme is somewhat unwieldy, although very general, Tanmoy Bhattacharya (\texttt{tanmoy@qcd.lanl.gov}) has written several collections of \TeX{} macros to simplify things. The basic package is \emph{hyperbasics.tex}\footnote{{\ttfamily http://nqcd.lanl.gov/people/tanmoy/hypertex/hyperbasics.tex}} which defines the following simple low level hypertex macros: \begin{itemize} \item+\href{url}{text}+: text becomes an href anchor referring to \emph{url}. \item+\hname{myname}{text}+: text becomes a name anchor with name \emph{myname}. \end{itemize} plus others that are used to automatically convert \LaTeX\ or other style markup into corresponding names and references. \section{How do I use it?} \subsection{As a reader} There are currently two \emph{dvi} interpreters that understand the \HT{} +\special+s: \XHDVI{} for X windows, and HyperTeXView.app for NextStep. 
We are proceeding with work on a \emph{dvi}-pdf converter that understands \HT{}, and we are encouraging work on \emph{dvi} previewers or \TeX{} authoring tools for Macintosh and PC that incorporate \HT{} elements. For a \TeX{} document that has already been processed to a \emph{dvi} file with \HT{} elements, viewing the internal hypertext is almost trivial --- you just fire up the \emph{dvi} previewer and navigate by button clicks as with Mosaic or other WWW browsers. To have \XHDVI, for example, brought up automatically from Mosaic when a \emph{dvi} document is referenced, you need to have a +.mailcap+ file in your home directory, and create or modify the line:
\begin{verbatim}
application/x-dvi; xhdvi %s
\end{verbatim}
Your machine must already have the \TeX{} essentials on board of course --- in particular the pk font files, and the location of those font files needs to be communicated to the previewer. If xdvi is already working for you, \XHDVI{} should work too. Details for getting \XHDVI{} working on your machine are provided below. For jumping to external documents from within the hypertexted \emph{dvi} file, a couple of additional elements are needed, also described below for the case of \XHDVI.

\subsection{As an author}
Here is where the power of \TeX's macro capabilities appears. A working internal hypertext document can be made from a \LaTeX\ document with a one-line addition to the file, using Tanmoy Bhattacharya's hypertex macros. These macros convert the standard \LaTeX\ markup into hypertext links between the different sections of the document, so that references to equations, tables, footnotes, and section headings are in place, and bibliographic references and figures refer back at least to the bibliography entry or figure caption. These in turn may be set to refer to corresponding external documents but this process is not automatic --- currently the author will have to add these references by hand, although automatic procedures can be envisioned.
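A hand-added reference might look like the following sketch, which assumes that \emph{hyperbasics.tex} is on the \TeX{} input path and can be loaded with a plain +\input+; the anchor name +main-result+ is a placeholder:
\begin{verbatim}
% Sketch only: loading via \input and the name "main-result"
% are assumptions made for illustration.
\input hyperbasics

% A name anchor that other documents can refer to.
\hname{main-result}{Our main result} is derived below.

% An href anchor pointing at an external document.
Examples are collected in the
\href{http://xxx.lanl.gov/hypertex/index.html}{introductory document}.
\end{verbatim}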
With an Internet connection, \XHDVI{} can be used to preview the document and check that the references actually work, before the document is submitted to the archives. The macros developed thus far use standard naming conventions for the underlying structures in \LaTeX\ and other standard macro packages, so that appending \#equation.2.3, \#page.7, \#figure.4, \#table.2, etc. to the URL for any \TeX{} file processed with these packages will go to the right place, allowing easy hypertext reference to the internal structure of other documents. In order to get started, however, you need to place these macro files in one of the standard areas that your \TeX{} looks for input files (you can modify your TEXINPUTS environment variable to get it to look in your own directories). The needed macro files are itemized in the \HT{} introductory document at \URL|http://xxx.lanl.gov/hypertex/index.html\#more| and can be obtained in one lump by anonymous ftp.\footnote{\relax\texttt{ftp://snorri.chem.washington.edu/hypertex/hypermacros.Z}} \subsection{As an e-print manager} Since we currently only have \emph{dvi} previewers, an e-print server would have to serve the documents in pre-processed \emph{.dvi} form. This means converting documents to \HT{} if the author has not already done this, and possibly applying automated insertion of URL's corresponding to references in the bibliographic section. The manager could do this by hand but it might be rather time-consuming. For ease of use, the best way to serve the documents is probably as a combined package of \emph{dvi} and PostScript files that go together. This requires the e-print manager to create a new content-type associated with this package, and to supply an unpackaging program for the reader to place in their +.mailcap+ file, which automatically calls up \XHDVI{} or another \HT{} browser on the resultant main \emph{dvi} file. 
The reason for doing this is that .ps files included by standard macros will not generally be understood as remote documents, at least at the current level of previewer capabilities. Another option in this unpackaging method is to supply the \TeX{} file itself, pipe it through a simple converter to \HT{} and through \TeX{} itself, and then call one of the \HT{} viewers. These approaches are already in use at some locations (e.g., CERN). When the pdf converter is available, the entire document should come as a single pdf file, unless the document refers to non-PostScript images or other inclusions in which case the packaging approach (or use of absolute URL's) remains necessary.

\section{How do I get it?}
Currently the following are available:
\begin{enumerate}
\item A \HT{} viewer\footnote{\texttt{ftp://snorri.chem.washington.edu/hypertex/xhdvi\_0.6.tar.Z}} based on xdvi-18, modified by Arthur Smith. Precompiled versions for various UNIX architectures are available in the same directory.
\item HyperTeXview.app,\footnote{\texttt{dmitri@physics.stanford.edu}} courtesy of Dmitri Linde (also the author of InstantTeX.app) for NextStep, precompiled for Motorola and Intel-based NeXT machines.\footnote{See \texttt{http://xxx.lanl.gov/hypertex/index.html\#dvi} for availability.}
\item The macro and style files listed above, by Tanmoy Bhatta\-charya, available at \URL|ftp://nqcd.lanl.gov/people/tanmoy/hypertex|.
\end{enumerate}

\section{Details on \protect\XHDVI}
\XHDVI{} retains all the features of the latest version of xdvi (version 18) and adopts in addition many of the hypertext features of Mosaic, the most popular WWW browser. Hypertext links are underlined or altered in colour (the underlining can be turned off) and a left-mouse click on a link causes the view to shift to the destination point for the link, as long as the destination is another \emph{dvi} file.
If the link is not to a \emph{dvi} file, an external viewer is employed, following the MIME and mailcap definitions or using standard defaults if those are not locally defined. A middle mouse click on a link brings up a new viewer whether or not the destination is a \emph{dvi} file --- this is intended to be useful to refer back to equations or to bring up footnotes, since the new \emph{dvi} window is small. There are also a large number of keyboard accelerators, all described in detail in the man page. In general, see the installation notes provided with \XHDVI. In outline what is needed is:
\begin{enumerate}
\item The compiled \XHDVI{} program --- precompiled binaries are available for Sun, NeXT, SGI, HP, IBM RS6000, or you can get the source and compile it yourself. Let me know of any compilation troubles --- it's written in C.
\item The \TeX{} fonts, at least in pk format. If xdvi, \emph{dvi}ps or some other \emph{dvi} interpreter is working on your machine then they must be around somewhere.
\item Set up the connections between the Web browser and \XHDVI. If you use Mosaic for example,
\begin{verbatim}
setenv WWWBROWSER /usr/local/bin/mosaic
\end{verbatim}
will let \XHDVI{} know what to send HTML files to. To let Mosaic know to bring up \XHDVI{} for any \emph{dvi} files, you need to amend your +.mailcap+ file as described above.
\item The application defaults file for \XHDVI{} should be installed in the standard application defaults directory on your machine, or you can take lines from it and modify them for your own taste and put them in your +~/.Xdefaults+ file.
For example I use the following resource specifications to get a particular size and position of the window with white on black lettering and with the hyperlinks in cyan, and to remove the buttons:
\begin{itemize}
\item[\null] xhdvi*geometry: 800x600-0-0
\item[\null] xhdvi*foreground: white
\item[\null] xhdvi*background: black
\item[\null] xhdvi*highlight: cyan
\item[\null] xhdvi*expert: true
\end{itemize}
\item You need to have the \textsf{ghostscript} program on your machine and in your default execution path in order to view PostScript from \XHDVI. Similarly, other viewers defined in the +.mailcap+ file should be available on the machine.
\item You need to install the man page xhdvi.man in \texttt{/usr/local/man/man1} and add \texttt{/usr/local/man} to your MANPATH environment variable in order for \emph{help} to work from \XHDVI.
\end{enumerate}

\section{Some examples}
This document is available in raw \HT{} format and in converted \emph{dvi} format via anonymous ftp at the address \URL|ftp://snorri.chem.washington.edu/hypertex|. The \HT{} version of this paper uses the two-column APS journal style of revtex. The table of contents at the beginning is generated automatically with the \LaTeX\ +\tableofcontents+ command. See also the examples provided by Paul Ginsparg in the \HT{} introductory document at \URL|http://xxx.lanl.gov/hypertex/index.html|. Some of these are files randomly selected from the HEP archive, including \LaTeX, Rev\TeX, and other formats.

\section{What still needs to be done?}
Unfortunately, at this point reference to networked files (via URL's) suffers from a couple of problems. \XHDVI{} does not yet include any of the network transport code that ordinary WWW browsers use, and the intention was to avoid having to add this layer of complexity by communications back and forth with a WWW browser. However, such communication is as yet not standardized, and suffers from its own problems.
So currently, when \XHDVI{} comes across a URL reference, it forwards it directly to the WWW browser (defined by environment or Xresource variables) so that a reference to an external \emph{dvi} file would bring up a new instance of the WWW browser which would in turn bring up a new \XHDVI{} viewer. This is a rather inelegant solution, but it is perhaps sufficient at the moment. A better solution will come along, and it may simply be inclusion of network transport code in the \XHDVI{} viewer itself, to make it a competing WWW browser\ldots The other problem is that if brought up by a WWW browser, \XHDVI{} is not provided with the absolute URL information used in obtaining the \emph{dvi} file it is working on, and so cannot pass this information on to further instances. Therefore, relative URL's in a \HT{} document (unless they can be guaranteed to be to local files that would have been transported along with the \emph{dvi} file) will not work. Both of the above are problems intrinsic to current WWW browsers, and we are working on promulgating solutions to these. \section{How do I stay in contact?} The Hypertex discussion group is a mailing list based at \FTP|snorri.chem.washington.edu| which I maintain. Send me e-mail if you want to join the list, or send queries directly to the mailing list: \Email|hypertex@snorri.chem.washington.edu|. \DeleteShortVerb{+} \end{Article}