\title{New perspectives on \TeX\ macros}
\author[Jonathan Fine]{Jonathan Fine\\{\tt J.Fine@pmms.cam.ac.uk}}
\begin{article}

{\sl [Author's note: This article has been prepared from the notes and
transparencies I used for the talk I gave at the October \ukt\ meeting.
As a result, it is somewhat informal and unpolished, and like much
conversation, the topic may abruptly change from time to time!]}

Consider the successes and failures of \TeX, and which triumphs (and
disappointments) were expected, and which were a surprise. Each person
will have their own list. Here are some entries from mine:
{\bf Mathematics\/} and {\bf paragraphs\/} are expected successes.
{\bf Display adverts\/} are an expected failure; setting of
{\bf non-roman\/} fonts (such as Arabic) and the {\bf loyalty of users\/}
are surprise successes. However, {\bf SGML}, {\bf technical documentation}
and {\bf program source\/} are unexpected failures, as are handling of
{\bf floats\/} (insertions) and (I can expect some correspondence on this
item) also {\bf typography}.

Having taken stock of the current state of \TeX, let us consider goals:
what would we like \TeX\ to be able to do? It is worth mentioning here
that, in rough figures, \TeX\ runs 10 times quicker than it did 10 years
ago. This is because more powerful hardware is available. Can we use this
additional capacity to make more of the capabilities of \TeX?

I have a friend who earns a living using Quark Xpress, and when it was
shown to me, I was quite impressed. A certain amount of thought told me
that the typesetting was just one part of the capabilities of the
program. In any case, when it comes to correcting misspelt words, fixing
bad page or equation breaks or bad placement of floats, or even simply
adjusting copy to fit the space available, it is useful to be able to see
and thereby change what is going on. This gives our first goal:
\begin{itemize}
\item A visual typesetting system with \TeX\ as its engine.
\end{itemize}

It is not easy to have \TeX\ process structured documents such as
program source files, SGML, etc. We believe (probably correctly) that
\TeX\ gives better typesetting than Ventura, for example. But suppose we
are given a Ventura document and asked to typeset it. (Ventura stores its
documents as ASCII files). There is great difficulty in even having \TeX\
read it, {\em as a structured document}. The same comments apply to the
RTF (Rich Text Format) of Microsoft. This gives the second goal:
\begin{itemize}
\item Compatibility with SGML and other file formats.
\end{itemize}

The concept of a structured document is not built into \verb"tex" the
program. Most often, when a typist (one who prepares a document for
processing by \TeX, who might also be the author) makes an error in
tagging the document, it is \TeX\ the program which discovers the error,
and \TeX\ and the typist are left to clean up the mess together as best
as they can. Typists who use \TeX\ very often need to learn more than
they would like of the internal workings of \TeX\ (and the format being
used). This is unusual. You don't need to know the {\it C\/} programming
language to use a program written in {\it C}. I summarize all these
topics in a single phrase:
\begin{itemize}
\item Friendly error recovery.
\end{itemize}

All of the above will require powerful \TeX\ formats, that will, for
example, be able to parse a structured document and report on errors. To
create these we will require:
\begin{itemize}
\item Powerful programming and document design tools.
\end{itemize}

%It should be clear that the goals can be achieved only by using \TeX\
%the program in a rather different way than is customary. However,
%there are thousand upon thousand of document and macro files written
%in the customary (let me call it {\em backslash}) style.
%Imagine that some strange ray from outer space had, oh
%horror of horrors, erased all \TeX\ document and macro files, so that
%we could make a fresh beginning, without worry of backward
%compatibility. I illustrated this by showing a blank transparency.}

Having reduced \TeX\ to \verb"tex" the program, I then listed those of
its admirable qualities that I particularly value. These are:
\begin{description}
\item[Reliable,] it almost always behaves as advertised.
\item[Stable,] its behaviour does not change from version to version.
\item[Quality,] it is extremely well-designed and well-written. It
produces excellent paragraphs and mathematics (the rest is up to the
format designer).
\item[Widely available,] it runs on an enormous range of machines and
operating systems.
\item[Quick,] it will set text, and expand macros, at a prodigious rate.
It really does run very quickly.
\item[Flexible,] few assumptions were made about how \TeX\ would be
used, and so by writing macros it can be used in new and unexpected ways.
\end{description}

The next few paragraphs are somewhat technical, and those who do not
know what category codes are, or why they are important, should skip
until further notice.

What I am proposing to do with \TeX\ may appear a little unorthodox, and
so some justification
\begin{quotation}
\noindent \ldots\ it is best not to play with the category codes very
often because \ldots\ when the arguments to a macro are first scanned
\ldots\ their categories are fixed once and for all at that time.
\ldots\ The author \ldots\ discourage[s] people from making extensive
use of \verb"\catcode" changes \ldots
\rightline{{\it The \TeX book}, page 48}
\end{quotation}
from the canonical source is called to support my proposal.

So many problems arise from category codes. The difficulties encountered
by verbatim processing are legion. But think of friendly error recovery.
When the typist produces an undefined control sequence, a \TeX\ error
results.
The same applies to a misplaced \verb"$" or \verb"&" character. Even
the innocuous (and omnipresent) braces \verb"{" and \verb"}" cause
errors. For example, forgetting to turn off emphasised text at the end
of a paragraph can result in the rest of the document being mis-set
(unlimited propagation of an error) together with the
\begin{verbatim}
(\end occurred inside a group at level 1)
\end{verbatim}
error at the end of the run.

Let us solve all category code problems once and for all by insisting
that {\em the document be read throughout with fixed category codes}. Of
course, the format will want `control sequences' and so forth, so we can
let {\tt\char`\\}, for instance, be an {\em active\/} character, whose
meaning will parse the succeeding characters until a non-letter is found,
turn the parsed string into a control word, and then test whether that
control word is undefined. This will not be as quick as reading with the
usual category codes, but \TeX\ is now so much quicker than when it was
first released that the delay will probably not bother us.

(Those who know not what category codes are, should stop skipping.
Something new will start soon).

Two of the four goals (SGML etc.\ and friendly error recovery) are made
possible by fixing document category codes to carefully chosen meanings.
It is hard to see how else they could be realised.

Now, \verb"tex" the program can be thought of as a typesetting engine.
It turns text into paragraphs and pages. Just as a petrol engine could be
used to power a car, or an aeroplane, or a lawnmower, so \verb"tex"
could be used for batch typesetting or as the engine for a system similar
to Quark Xpress.

By {\bf visual typesetting} I mean interacting with a {\em graphic\/}
representation of the document being created or processed. The display
(or formatting) of the document should be adapted to the device being
used to present the document.
For example, on a computer screen, colour could be used to indicate
emphasis and so forth, rather than shape and weight of font, which are
more appropriate to printed representation. And of course the size and
resolution of the computer screen (and the visual acuity of the user)
are most relevant to making the best of what there is. WYSIWYG is a
special case of {\em visual typesetting}.

The basic idea is that the document is a long galley, set paragraph by
paragraph. When a change is made to the underlying text, the affected
paragraphs should be reset, and the display refreshed. Please note that
\TeX\ will reset a paragraph in a fraction of the time required to
update the display, particularly when run as a continuous process. Note
also that this approach will put {\em sensible\/} restrictions on what
the typist can do. For example, it is an error (inadmissible) to make a
global change of font within a paragraph, for that would require
resetting all subsequent paragraphs.

These ideas are further developed in my article
{\em Editing \verb".dvi" files, or visual \TeX}, which will appear in a
future issue of \TUB. Since the meeting my proposal for a Special
Interest Technical Working Group on Visual \TeX\ was approved by the
Technical Council of TUG. If you would like information, or wish to
join, please contact me, for I am the chair of this group.

Also since the meeting I found that the same basic underlying concept
\begin{quotation}
\noindent It is sometimes useful to maintain information about a source
and a result document simultaneously in the same document, as in ``what
you see is what you get'' (WYSIWYG) word processors. There, the user
appears to interact with the formatted output, but the editorial changes
are actually made in the source, which is then reformatted for display.
\end{quotation}
was put forward to motivate the CONCUR feature provided by SGML.
This quotation comes from Annex C.3.1 of ISO 8879 (the SGML standard)
and is also reproduced (as is the whole of ISO 8879) on page 88 of
Charles F.~Goldfarb, {\em The SGML Handbook}, OUP (1990).

Now, the creation of format files to support these new demands presents
new problems for the macro writer, not least of which is the very many
active characters that will be required. (At least there will be no more
than 256 active characters). Notice that macro files contain tokens,
while under the new scheme text files contain characters. When reading
macros we wish to have access to special tokens. The solution is to
enhance the programming language and compile to a special file format,
which can then be loaded.

The basic idea is to provide the power that languages such as {\it C\/}
take for granted. For example, one would like named parameters, like so,
\begin{verbatim}
\def \centerline #\text { \line { \hss \text \hss } }
\end{verbatim}
and escape characters, so that
\begin{verbatim}
\def ! { ... }
\end{verbatim}
will define a meaning for the active (\verb"!") space character.

These ideas are further developed in articles which appear in TUGboat
13(4) {\bf 1992}, and Baskerville 3(1) {\bf 1993}.
\end{article}
\endinput

To close the presentation I returned to page 129 from Hodge's
{\em Harmonic Integrals}. This page contains several long expressions,
which needed to be broken to fit the measure. This is, if one likes, a
horizontal difficulty. The page contains two long (sequences of)
equations, each almost a half page high, and some connecting words.
(Well, if you must know, they are {\it Then}, {\it since}, and {\it Now}.
The rest of the page was math symbols.) It just so happens that the page
break falls so that neither of these half-page blocks of mathematics
needed to be broken. I am reminded of Abraham Lincoln's observation, that
he was fortunate that ``his legs were just long enough to reach the
ground''. Hodge's book was published by Cambridge University Press in
1941.

There were several valuable questions and contributions from the floor.

Robin Fairbairns asked me if Hodge was William Hodge, and on being told
yes, told the audience of his memory of a course that this great man
once gave. For me this was a valuable and surprising connection, for my
interest in harmonic integrals is not purely typographic.

Chris Rowley wondered if the reduction of performance to one quarter of
the speed was a proper measured result. I said that it was a
``scientifically obtained ball-park figure'', and that in a visual
environment the penalty probably didn't even matter. To reset a single
paragraph slowly has to be better than resetting a whole document
quickly.

Adrian Clark thought that \TeX\ had been successful in formatting
program source code, particularly the source for the \TeX\ system itself.
I pointed out that \TeX\ could not (yet) handle regular program source
code files, and that \TeX\ users were surprisingly loyal.

Sebastian Rahtz thought that developing all these new macros and
software might be a lot of work. I agreed, but suggested that the macro
side of the project was probably no larger than the \LaTeX3 project.
Graphic programs to interact with the \TeX\ typesetting engine are
additional work, which might in the first instance be done to create a
commercial product.

Allan Reese noted that the WYSIWYG systems allowed typists to produce
space at erroneous locations, such as an indent on the first paragraph
of a section. He hoped that this `feature' would not be reproduced. I
replied that I envisioned a system where the source document was parsed,
but control of space and so forth continued to reside with the format
file. By way of example, I explained that Scientific Word did not allow
user access to the space around operators such as $+$ in a mathematical
formula (such as $2+2=4$) because this space belonged to the `$+$', not
to the user. (I owe this example to Roger Hunter of TCI, who are the
developers of Scientific Word).

David Longfoot described the difficulties he had, as a professional
printer, with the correct placement of floats. He suggested that the
ideal system would place these items automatically, but allow the
operator to change the placement of selected items in an interactive and
graphical manner. This would allow the best of both worlds. I drew
attention to the article {\em Inside Type \& Set}, Graham Asher, TUGboat
13(1), {\bf 1992}, which deals particularly with the related problem of
global optimisation of page breaks.

Finally, I give the last word to Sebastian. Allan Reese (who admirably
chaired the afternoon) was describing how he used \TeX\ to format a
4-page newsletter for his wife. Sebastian interrupted to ask ``Why don't
you just talk to her?''