\title{New perspectives on \TeX\ macros}
\author[Jonathan Fine]{Jonathan Fine\\{\tt J.Fine@pmms.cam.ac.uk}}
\begin{article}

  {\sl [Author's note: This article has been prepared from the notes
    and transparencies I used for the talk I gave at the October \ukt\ 
    meeting.  As a result, it is somewhat informal and unpolished, and
    like much conversation, the topic may abruptly change from time to
    time!]}

Consider the successes and failures of \TeX, and which triumphs
(and disappointments) were expected, and which a surprise.  Each person
will have their own list.  Here are some entries from mine:  {\bf
Mathematics\/} and {\bf paragraphs\/} are expected successes.  {\bf
Display adverts\/} is an expected failure; setting of {\bf
non-roman\/} fonts (such as Arabic) and the {\bf loyalty of users\/}
are surprise successes.  However, {\bf SGML}, {\bf technical
documentation} and {\bf program source\/} are unexpected failures,
as are the handling of {\bf floats\/} (insertions) and (I expect some
correspondence on this item) {\bf typography}.

Having taken stock of the current state of \TeX, let us consider
goals:  what would we like \TeX\ to be able to do?  It is here worth
mentioning that, in rough figures, \TeX\ runs 10 times quicker than
10 years ago.  This is because more powerful hardware is available.
Can we use this additional capacity to make more of the capabilities
of \TeX?

I have a friend who earns a living using Quark Xpress, and when it
was shown to me, I was quite impressed.  A certain amount of thought
told me that the typesetting was just one part of the capabilities of
the program.  In any case, when it comes to correcting misspelt words,
fixing bad page or equation breaks, or bad placement of floats, or
even simply adjusting copy to fit space available, it is useful to be
able to see and thereby change what is going on.
This gives our first goal:
\begin{itemize}
\item A visual typesetting system with \TeX\ as its engine.
\end{itemize}

It is not easy to have \TeX\ process structured documents such as
program source files, SGML, etc.  We believe (probably correctly) that
\TeX\ gives better typesetting than Ventura, for example.  But suppose
we are given a Ventura document and asked to typeset it.  (Ventura
stores its documents as ASCII files.)  There is great difficulty in
even having \TeX\ read it, {\em as a structured document}.  The same
comments apply to the RTF (Rich Text Format) of Microsoft.  This gives
the second goal:
\begin{itemize}
\item Compatibility with SGML and other file formats.
\end{itemize}

The concept of a structured document is not built into  \verb"tex"
the program.  Most often, when a typist (one who prepares a document
for processing by \TeX, who might also be the author) makes an error
in tagging the document, it is \TeX\ the program which discovers the
error, and \TeX\ and the typist are left to clean up the mess
together as best as they can.  Typists who use \TeX\ very often need
to learn more than they would like of the internal workings of \TeX\
(and the format being used).  This is unusual.  You don't need
to know the {\it C\/} programming language to use a program written
in {\it C}.  I summarize all these topics in a single phrase:
\begin{itemize}
\item Friendly error recovery.
\end{itemize}

All of the above will require powerful \TeX\ formats, that will, for
example, be able to parse a structured document and report on errors.
To create these we will require:
\begin{itemize}
\item Powerful programming and document design tools.
\end{itemize}

%It should be clear that the goals can be achieved only by using \TeX\
%the program in a rather different way than is customary.  However,
%there are thousand upon thousand of document and macro files written
%in the customary (let me call it {\em backslash}) style.  
%Imagine that some strange ray from outer space had, oh
%horror of horrors, erased all \TeX\ document and macro files, so that
%we could make a fresh beginning, without worry of backward
%compatibility.  I illustrated this by showing a blank transparency.}

Having reduced \TeX\ to \verb"tex" the program, I then listed those of
its admirable qualities which I particularly value.  These are:
\begin{description}
\item[Reliable,] it almost always behaves as advertised.
\item[Stable,] its behaviour does not change from version to version.
\item[Quality,] it is extremely well-designed and well-written.  It
  produces excellent paragraphs and mathematics (the rest is up to the
  format designer).
\item[Widely available,] it runs on an enormous range of machines and
  operating systems.
\item[Quick,] it will set text, and expand macros, at a prodigious
  rate.  It really does run very quickly.
\item[Flexible,] few assumptions were made about how \TeX\ would be
  used, and so by writing macros it can be used in new and unexpected
  ways.
\end{description}

The next few paragraphs are somewhat technical, and those who do not
know what category codes are, or why they are important, should skip
until further notice.  What I am proposing to do with \TeX\ may appear
a little unorthodox, and so some justification
\begin{quotation}
\noindent \ldots\ it is best not to play with the category codes very often
because \ldots\ when the arguments to a macro are first scanned
\ldots\ their categories are fixed once and for all at that time.
\ldots\ The author \ldots\ discourage[s] people from making extensive
use of \verb"\catcode" changes \ldots

\rightline{{\it The \TeX book}, page 48}
\end{quotation}
from the canonical source is called to support my proposal.

So many problems arise from category codes.  The difficulties
encountered by verbatim processing are legion.  But think of friendly
error recovery.  When the typist produces an undefined control
sequence, a \TeX\ error results.  The same applies to a misplaced
\verb"$" or \verb"&" character.  Even the innocuous (and omnipresent)
braces \verb"{" and \verb"}" cause errors.  For example, forgetting
to turn off emphasised text at the end of a paragraph can result in
the rest of the document being mis-set (unlimited propagation of an
error) together with the
\begin{verbatim}
(\end occurred inside a group at level 1)
\end{verbatim}
error at the end of the run.
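
A minimal plain \TeX\ input that provokes this message, given here
only as an illustrative sketch (the wording of the text is invented),
is
\begin{verbatim}
This paragraph has {\it emphasised text whose group is never closed.

This paragraph, and all that follow, are therefore set in italic.
\bye
\end{verbatim}
The missing \verb"}" after the emphasised words goes unnoticed until
\verb"\bye" is reached.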

Let us solve all category code problems once and for all by insisting
that {\em the document be read throughout with fixed category codes}.
Of course, the format will want `control sequences' and so forth, so
we can let {\tt\char`\\}, for instance, be an {\em active\/} character,
whose meaning will parse the succeeding characters until a non-letter
is found, and then turn the parsed string into a control word, and
then test the control word for being undefined.  This will not be as
quick as reading using the usual category codes, but \TeX\ is now so
much quicker than when it was first released, that the delay will
probably not bother us.
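
The following plain \TeX\ fragment is a minimal sketch of such an
active escape character, and not a finished implementation.  The
macro names \verb"\cwname", \verb"\cwscan", \verb"\cwadd",
\verb"\cwfinish" and \verb"\cwuse" are invented for this
illustration.  It assumes that letters keep category code 11, and it
gives no special treatment to control symbols, to the space that
follows a control word, or to \verb"\relax" itself (which the test
below would report as unknown).
\begin{verbatim}
% Sketch only: all \cw... names are hypothetical.
% Define the meaning of active `\' while `\' is still the escape
% character, using the \lowercase trick to turn `~' into `\'.
\begingroup
\lccode`\~=`\\
\lowercase{\endgroup
  \def~{\def\cwname{}\futurelet\next\cwscan}}
% Look at the next token: a letter is collected, anything else
% ends the name.
\def\cwscan{%
  \ifcat\noexpand\next a%
    \expandafter\cwadd
  \else
    \expandafter\cwfinish
  \fi}
\def\cwadd#1{\edef\cwname{\cwname#1}\futurelet\next\cwscan}
% \csname turns an undefined name into \relax, which is the test.
\def\cwfinish{%
  \expandafter\ifx\csname\cwname\endcsname\relax
    \message{Unknown command `\cwname'; ignored.}%
  \else
    \expandafter\cwuse
  \fi}
\def\cwuse{\csname\cwname\endcsname}
% From here on the document is read with `\' active.
\catcode`\\=\active
\end{verbatim}
With these definitions a document line such as \verb"\bye" is seen as
the active character followed by three letters; the name is assembled
letter by letter, and the control word is called only if it has a
meaning.  An undefined name produces a message chosen by the format
rather than a \TeX\ error.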

(Those who do not know what category codes are should stop skipping.
Something new will start soon.)  Two of the four goals (SGML etc.\
and friendly error recovery) are made possible by fixing document
category codes to carefully chosen meanings.  It is hard to see how
else they could be realised.

Now, \verb"tex" the program can be thought of as a typesetting engine.
It turns text into paragraphs and pages.  Just as a petrol engine
could be used to power a car, or an aeroplane, or a lawnmower, so
\verb"tex" could be used for batch typesetting or as the engine for a
system similar to Quark Xpress.

By {\bf visual typesetting} I mean interacting with a {\em graphic\/}
representation of the document being created or processed. The
display (or formatting) of the document should be adapted to the
device being used to present the document.  For example, on a
computer screen, colour could be used to indicate emphasis and so
forth, rather than shape and weight of font, which are more
appropriate to printed representation.  And of course the size and
resolution of the computer screen (and the visual acuity of the user)
are most relevant to making the best of what there is.
WYSIWYG is a special case of {\em visual typesetting}.

The basic idea is that the document is a long galley, set paragraph
by paragraph.  When a change is made to the underlying text, the
affected paragraphs should be reset, and the display refreshed.
Please note that \TeX\ will reset a paragraph in a fraction of the
time required to update the display, particularly when run as a
continuous process.  Note also that this approach will put {\em
sensible\/} restrictions on what the typist can do.  For example, it
is an error (inadmissible) to make a global change of font within a
paragraph, for that would require resetting all subsequent
paragraphs.
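
As a rough illustration of setting the galley paragraph by paragraph,
each paragraph can be built in a box of its own, so that a change to
the text of one paragraph means rebuilding only that box.  The names
\verb"\parabox" and \verb"\setpara", and the measure of 25pc, are
assumptions made for this sketch; a real system would keep one box
(or one piece of \verb".dvi" output) for each paragraph.
\begin{verbatim}
% Sketch only: \parabox and \setpara are hypothetical names.
\newbox\parabox
\def\setpara#1{%
  \setbox\parabox=\vbox{\hsize=25pc \noindent #1\par}}
\setpara{The original text of the paragraph.}
\setpara{The corrected text of the paragraph.}% only this box is rebuilt
\message{Height of reset paragraph: \the\ht\parabox}
\end{verbatim}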

These ideas are further developed in my article {\em
Editing \verb".dvi" files, or visual \TeX}, which will appear in a
future issue of \TUB. Since the meeting
my proposal for a Special Interest Technical Working Group on Visual
\TeX\ was approved by the Technical Council of TUG.  If you would
like information, or wish to join, please contact me, for I am the
chair of this group.

Also since the meeting I found that the same basic underlying concept
\begin{quotation}
\noindent
It is sometimes useful to maintain information about a source and a
result document simultaneously in the same document, as in ``what you
see is what you get'' (WYSIWYG) word processors.  There, the user
appears to interact with the formatted output, but the editorial
changes are actually made in the source, which is then reformatted
for display.
\end{quotation}
is put forward to motivate the CONCUR feature provided by SGML.  This
quotation comes from Annex C.3.1 of ISO 8879 (the SGML standard) and
is also reproduced (as is the whole of ISO 8879) on page 88 of
Charles F.~Goldfarb, {\em The SGML Handbook}, OUP (1990).

Now, the creation of format files to support these new
demands presents new problems for the macro writer, not least of
which is the large number of active characters that will be required.
(At least there will be no more than 256 of them.)  Notice that
macro files contain tokens, while under the new scheme text files
contain characters.  When reading macros we wish to have access to
special tokens.  The solution is to enhance the programming language
and compile to a special file format, which can then be loaded.

The basic idea is to provide the power that languages such as {\it
C\/} take for granted.  For example, one would like named parameters,
like so, 
\begin{verbatim}
\def \centerline #\text
{
   \line { \hss \text \hss }
}
\end{verbatim}
and escape characters, so that
\begin{verbatim}
\def !  { ... }
\end{verbatim}
will define a meaning for the active space character, written here
with \verb"!" as the escape character.
These ideas are further developed in articles which appear in TUGboat
13(4) {\bf 1992}, and Baskerville 3(1) {\bf 1993}.
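
For comparison, the conventional forms found in \verb"plain.tex" use
a numbered parameter and an explicit category code change.  Pairing
them with the proposed notation in this way is an interpretation of
the two examples above, not something stated in the talk.
\begin{verbatim}
% Conventional plain TeX: positional parameter #1 instead of #\text.
\def\centerline#1{\line{\hss#1\hss}}

% Conventional plain TeX: the space is first made active inside a
% group, and only then given a meaning.
\def\obeyspaces{\catcode`\ =\active}
{\obeyspaces\global\let =\space}
\end{verbatim}
In the second case the definition can only be read correctly while
the category code of the space is changed, which is the kind of
dependence on category codes that the escape character is meant to
avoid.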

\end{article}

\endinput
To close the presentation I returned to page 129 from Hodge's {\em
Harmonic Integrals}.  This page contains several long expressions,
which needed to be broken to fit the measure.  This is, if one likes,
a horizontal difficulty.  The page contains two long (sequences of)
equations, each almost a half page high, and some connecting
words.  (Well, if you must know, they are {\it Then}, {\it since}, and
{\it Now}.  The rest of the page was math symbols.)  It just so
happens that the page break so occurs that neither of these half-page
blocks of mathematics needed to be broken.  I am reminded of Abraham
Lincoln's observation, that he was fortunate that ``his legs were
just long enough to reach the ground''.  Hodge's book was published
by Cambridge University Press in 1941.

There were several valuable questions and contributions from the
floor. Robin Fairbairns asked me if Hodge was William Hodge, and on
being told yes, told the audience of his memory of a course that this
great man once gave.  For me this was a valuable and surprising
connection, for my interest in harmonic integrals is not purely
typographic.

Chris Rowley wondered if the reduction of performance to one quarter
of the speed was a proper measured result.  I said that it was a
``scientifically obtained ball-park figure'', and that in a visual
environment the penalty probably didn't even matter.  To reset a
single paragraph slowly has to be better than resetting a whole document
quickly.

Adrian Clark thought that \TeX\ had been successful in formatting
program source code, particularly the source for the \TeX\ system
itself.  I pointed out that \TeX\ could not (yet) handle regular
program source code files, and that \TeX\ users were surprisingly
loyal.

Sebastian Rahtz thought that developing all these new macros and
software might be a lot of work.  I agreed, but suggested that the
macro side of the project was probably no larger than the \LaTeX3
project.  Graphic programs to interact with the \TeX\ typesetting
engine are additional work, which might in the first instance be done
to create a commercial product.

Allan Reese noted that the WYSIWYG systems allowed typists to produce
space at erroneous locations, such as an indent on the first
paragraph of a section.  He hoped that this `feature' would not be
reproduced.  I replied that I envisioned a system where the source
document was parsed, but control of space and so forth continued to
reside with the format file.  By way of example, I explained that
Scientific Word did not allow user access to the space around
operators such as $+$ in mathematical formulae (such as $2+2=4$)
because this space belonged to the `$+$', not to the user.  (I owe
this example to Roger Hunter of TCI, who are the developers of
Scientific Word.)

David Longfoot described the difficulties he had, as a professional
printer, with the correct placement of floats.  He suggested that the
ideal system would place these items automatically, but allow the
operator to change the placement of selected items in an interactive
and graphical manner.  This would allow the best of both worlds.  I
drew attention to the article {\em Inside Type \& Set}, Graham Asher,
TUGboat 13(1), {\bf 1992} which deals particularly with the related
problem of global optimisation of page breaks.

Finally, I give the last word to Sebastian.  Allan Reese (who
admirably chaired the afternoon) was describing how he used \TeX\ to
format a 4-page newsletter for his wife.  Sebastian interrupted to
ask ``Why don't you just talk to her?''