\def\dash{---}
\let\Dash\dash
\def \ifundefined#1{\expandafter\ifx\csname#1\endcsname\relax }
%Use to generate a new row in tables with a horizontal line separating
%them:
\newcommand{\newrow}{\\ \hline}
\newcommand{\headrow}{\\ \hline \hline} 
\newcommand{\mdash}{---}
\newcommand{\lisparg}[1]{{\em #1\/}}
\newcommand{\lispname}[1]{{\sf #1\/}}
\newtheorem{theorem}{Theorem}
\newtheorem{algorithm}{Algorithm}
\newtheorem{lemma}{Lemma}
\newtheorem{definition}{Definition}
\newtheorem{corollary}{Corollary}
\newtheorem{conjecture}{Conjecture}
\newcommand{\nonterm}[1]{\mbox{${\scriptstyle <}{\mbox{\em #1\/}}{\scriptstyle >}$}}
\newcommand{\bld}[1]{{\bf #1}}
\newcommand{\type}[1]{{\tt #1}}
\newcommand{\pz}{\phantom{0}}
\newcommand{\inference}[2]{\frac{#1}{#2}}
\newcommand{\induction}[2]{\frac{#1}{#2}} 
\newcommand{\kronecker}{\raisebox{1pt}{$ \:\otimes \:$}}
\newcommand{\subst}[3]{{#1[#2/#3]}}
\newcommand{\id}[1]{\mbox{{\sf #1\/}}}
\newcommand\french[1]{{\it #1\/}}
\newcommand{\afl}{{AFL}}
\newcommand{\term}[1]{\mbox{\sf #1\/}}
\newcommand{\divides}[2]{#1/#2}
\newcommand{\subgroup}{\triangleright }
%slide title.
\newcommand{\itidetitle}[1]{\center \framebox{\large\bf #1}}

%section reference. 
\newcommand{\sref}[1]{Section~$\ref{#1}$}
\newcommand\cref[1]{Chapter~$\ref{#1}$}
\newcommand{\aref}[1]{Appendix~$\ref{#1}$}
%integral d
\newcommand{\varint}[1]{\,d#1} 
%quantifiers:  cs611
\newcommand{\all}[2]{\forall #1\!\!:\!#2.\:}
\newcommand{\exist}[2]{\exists #1\!\!:\!#2.\:}
%integrals
\newcommand{\dx}{\,dx}
\newcommand{\dy}{\,dy}
\newcommand{\dz}{\,dz}
\newcommand{\dt}{\,dt} 

\newcommand{\naive}{na{\"\i}ve{}}
\providecommand\AmSTeX{$\cal A\kern-.1667em\lower.5ex\hbox{$\cal
    M$}\kern-.075emS$-\TeX}
\providecommand{\amstex}{\AmSTeX{}}
\newcount\TestCount
\providecommand{\La}{\TestCount=\the\fam \leavevmode L\raise.42ex
        \hbox{$\fam\TestCount\scriptstyle\kern-.3em A$}}
\providecommand\AllTeX{(\La)\TeX}
\providecommand{\alltex}{\AllTeX{}}

\providecommand{\latex}{\LaTeX{}}

\providecommand{\tex}{\TeX{}}

\providecommand{\macro}[1]{{\cal M}_#1}

\providecommand{\rfb}{{\sc RFB}\footnote{Recordings for the
Blind}}

%
\newcommand{\www}[1]{{\it WWW}: {\small #1}}
\newcommand{\email}[1]{{\it E-mail\/}: $\langle\hbox{\tt#1}\rangle$}
\newcommand{\phone}[1]{{\it Phone\/}: {\tt #1}}

\newcommand{\voicemail}[1]{{\it Voice-mail\/}: {\tt #1}}
\newcommand\homepage{{\sf http://www.research.digital.com/CRL/personal/raman/raman.html}}
\newcommand{\faxno}[1]{{\it Fax\/}: {\tt #1}}

\newcommand{\textalk}{\rm T\kern -.1667em\lower .5ex\hbox {E}\kern%
  -.125emXT\kern -.1667em\lower .5ex\hbox {A}\kern -.125em L\kern -.125em K}


\newcommand{\Dectalk}{{\sc dectalk}}
\newcommand{\Sparc}{{\sc sparc}}


\title{An Audio View of \alltex{} Documents}
\author[T. V. Raman]{T.\ V.\ Raman\\
Digital Equipment Corporation\\
  Cambridge Research Lab\\
  One Kendall Square, Building 650\\
  Cambridge, MA 02139\\
\emph{Email:} \texttt{raman@crl.dec.com}
}
\begin{Article}
\begin{abstract}
  \aster\dash Audio System For Technical Readings\dash is a computing
  system that produces audio renderings from the {\em same\/} \alltex{} source
  used to produce the printed document.  \cite{Raman:TB13-3-372-377}
  described our preliminary
  work on this project.  At the time, correct handling of user-defined
  \alltex{} macros was described as one of the key issues in building a fully
  extensible audio rendering system.
  \aster{} \cite{raman-phd-thesis} has now been fully implemented. 
  This paper reports on the approach used
  in \aster{} to handle user-defined macros.

\aster{} treats macro definitions as introducing new object types
into the document logical structure.  The \alltex{} macro consists of two
parts: a declaration, and a series of \TeX{} commands that the macro expands
into.  The macro expansion is nothing but a visual rendering rule that
specifies how \TeX{} should display instances of the object represented by the
macro. 

\aster{} provides an equivalent mechanism for  extending  the class of
logical structures that are recognized.  Once \aster{} has been told about a
user-defined macro,  audio rendering rules for the new object type  introduced
by this  macro  can be defined  in AFL (Audio Formatting Language).

The approach used not only makes  \aster{}  fully extensible;
it points out a unique advantage of  \alltex\dash the ability of the
author to encode semantic meaning into the markup by extending the document
model in ways appropriate to the specific document instance that is being
encoded. 
  \end{abstract} 
\section{Introduction}\label{s:introduction} 


\begin{center}
  \asterlogo
\end{center}

\aster\dash Audio System For Technical Readings\dash is a computing
system that aurally renders electronic documents marked up in the \alltex{} family  of
markup languages (see~\cite{raman-phd-thesis} for details).  \aster{} uses the
structural markup present in the electronic source to advantage in producing
high-quality, interactive audio renderings.  This paper focuses on a specific aspect of the
problem, namely that of flexibly rendering the extended document logical
structure encapsulated in  a \alltex{} document.


One primary advantage of \alltex{} is the flexibility it provides the author
in defining logical structures that are specific to a particular document
instance. In this sense, the class of logical structures that can be
encapsulated in a \alltex{} document is extensible.  \alltex{} macros allow
an author to abstract away the layout details. At the same time, they provide
a powerful mechanism for defining new constructs that are not already present
in the document style (DTD in SGML parlance) in use.  As a consequence, 
 when introducing a new piece of mathematical notation, an author  can first define
a new \alltex{} macro that produces a desired layout, and then use this newly
defined construct throughout the document.

The flexibility of the \alltex{} macro facility initially proved a major
stumbling block in building a fully extensible audio rendering system.  A
system that attempts to produce aural renderings by {\em mapping\/} the
built-in \alltex{} commands to an equivalent aural representation faces the
severe shortcoming of not being able to render documents that contain
user-defined macros. At the same time, it is impossible to translate such
user-defined \alltex{} macros into a suitable aural representation. This is
because \tex{} in its full glory is a Turing-complete programming language, and
saying ``we can translate a general \tex{} macro to audio'' is equivalent to
saying that ``Given a \tex{} program, we can predict the result''.  Being able
to achieve the above without actually running \tex{} on the program (document
fragment) would amount to being able to solve the Halting Problem!
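
As a small illustration of why the expansion cannot be predicted by
inspecting the macro text alone, consider the following sketch.  The macro
name \verb|\parity| is hypothetical and is not part of \aster{} or of any
standard package:
\begin{verbatim}
% Hypothetical macro: the layout depends on a computation
% performed on the argument when TeX runs.
\newcommand{\parity}[1]{%
  \ifodd#1 \overline{#1}\else \underline{#1}\fi}
\end{verbatim}
\noindent Here \verb|$\parity{7}$| is overlined while \verb|$\parity{8}$| is
underlined.  Real macros may loop or recurse, so in general the only way to
learn what they produce is to run \tex{} on them.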

In the rest of this paper, we describe the solution used in \aster{} to
circumvent this difficulty. The solution we used in fact turns the presence of
user-definable  \alltex{} macros into an advantage.
Such user-defined constructs allow \aster{} to glean even more information
about the document logical structure than would be possible if the document
were encoded using only the built-in \alltex{} operators; as a consequence,
the audio renderings produced are also significantly better. 

\section{Document Models in \protect\aster{}}\label{s:represent}


\aster{} produces audio renderings by first extracting the document
logical structure.  In this model, all forms of rendering, \ie visual,
aural, etc.\ are regarded as a projection of the structure present in
the information being conveyed onto the medium being used to
communicate the information. Thus, typesetting a document requires
visual formatting\Dash projecting the information structure onto a
two-dimensional visual tablet; aural rendering requires presenting the
structure using various features of the auditory display.

The recognizer used in \aster{} extracts logical structure present in
documents encoded in the \alltex{} family of languages. An important
feature of this recognizer is that it works on the entire gamut of
encodings, ranging from plain ASCII documents, \ie no explicit markup,
up to documents containing completely unambiguous encodings of the
logical structure.


The basic document model used in \aster{} is the attributed tree.
Each hierarchical level of the document is modeled as a node in this
tree.  Each node can have content, children and attributes.  Using
object-oriented terminology, each different kind of node of the tree
is called an {\em object\/} and represents a document element. Thus,
``chapter'', ``section'', ``paragraph'', and ``sentence'' are all
objects. If a document contained five sections, its representation in
\aster{} would have five instances of object ``section''.  This
object-oriented terminology is used because \aster{} actually uses
CLOS objects in this fashion.  The use of an object-oriented language
was instrumental in allowing us to develop and implement the ideas in
\aster{} incrementally and effectively.

This attributed tree
structure is augmented to represent mathematical content; we call this
augmented representation the {\em quasi-prefix form\/}
(see figure~\ref{fig:math-object} below).
Expressions that are completely unambiguous, \eg $x+y$, are captured in their
prefix form.  In addition to linearizing the underlying tree structure,
mathematical notation uses {\em visual attributes\/} such as superscripts and
subscripts, whose interpretation is context-dependent.  We extend the prefix
form to capture such visual attributes\Dash hence the name {\em
  quasi\/}-prefix. 
\begin{minipage}{\linewidth}
\makeatletter\def\@captype{figure}\makeatother
  \begin{center} 
\begin{tabular}[h]{|rcl|}\hline
left-superscript & accent & superscript \\
   &$\displaystyle \nwarrow$ \hfill
   $\displaystyle \uparrow$
   \hfill  $\displaystyle \nearrow$   &   \\
&  {\bf math object }   & \\
 &  $\displaystyle \swarrow$ \hfill
 $\displaystyle \downarrow$
 \hfill  $\displaystyle \searrow$  &   \\
left-subscript & underbar & subscript \\ \hline
\end{tabular}
\end{center} 
\caption{A math object with attributes. Each of the attributes
  itself contains math objects.}
  \label{fig:math-object}
\end{minipage}
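
As an illustration only (the concrete internal notation is not spelled out
here), the expression $x_1^2 + y$ could be pictured in quasi-prefix form as
the operator $+$ applied to two children, with the visual attributes of $x$
recorded on its node and left uninterpreted:
\[
  x_1^2 + y
  \;\longmapsto\;
  +\bigl(\,x\,[\mbox{subscript}=1,\ \mbox{superscript}=2],\ y\,\bigr)
\]
Whether the superscript denotes an exponent or an index is decided later, by
context.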

The next section describes how this model is extended to encapsulate the use
of user-defined constructs in \alltex.
\section{Extended Logical Structure}\label{s:macros}

The \alltex{} macro facility can be used to extend the document logical structure by
defining new constructs.  Thus, an author preparing a manuscript on inference
logic might define
\begin{verbatim}
\newcommand{\inference}[2]{{#1\over#2}}
\end{verbatim}
\noindent and write
\begin{verbatim}
\inference{x}{y}
\end{verbatim} 
\noindent and use this construct throughout the document.

Notice that defining \verb|\inference| as shown above and using it to
encode inference statements is distinct from and more powerful than just using
the \tex{} built-in operator \verb|\over| throughout the document. 
A commonly mentioned advantage  in this context is that using the newly
defined construct \verb|\inference| will permit the author to easily change
the notation used to denote  {\it inference}.
Notice that this is in fact the same as saying that
\begin{quote}
  If distinct elements in a document instance are marked up using distinct
  constructs, then it  is  possible to recognize and process these elements
  in a multiplicity of ways. 
\end{quote}
In \aster, the \alltex{} facility of defining a second \verb|\inference| macro
that produces a different layout for {\it inference\/} can be generalized to
the notion of different {\em audio renderings\/} for {\it inference}.
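
In \alltex{} terms, such a second definition might look like the sketch
below.  The turnstile layout is chosen purely for illustration; it is an
assumption, not a notation taken from any particular document:
\begin{verbatim}
% Alternative visual rendering for the same logical object:
% premise and conclusion on one line instead of stacked.
\renewcommand{\inference}[2]{#1 \vdash #2}
\end{verbatim}
\noindent Every use of \verb|\inference{x}{y}| in the document body stays
unchanged; only the rule that renders it differs.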


  As explained above (``Document models''), \aster{} achieves its
  aural renderings by building a rich internal representation of the
  document content.  In this representation, each document
  element\footnote{We use the term {\em element\/} loosely to mean a
    logical unit of the document. } $E$ is represented by an instance
  of object $O_E$.  \aster{} provides a predefined type $O_E$ for each
  of the built-in constructs in \alltex.  Thus, we could represent the
  use of \verb|\inference| defined above in terms of object $O_{\rm
    over}$.  However, notice that this would mean losing valuable
  information.  When building up the internal representation, the
  additional semantic information provided by the author's use of the
  \verb|\inference| construct is very useful.  In addition, expanding
  all \alltex{} macros results in a pure layout representation, which
  is not appropriate for producing aural renderings
  (see~\cite{Raman:TB13-3-372-377}).  If we were to represent
  instances of \verb|\inference| in terms of $O_{\rm over}$, \aster{}
  would be forced to render \verb|\inference| the same as the
  \verb|\over| construct.  Though the author in this particular
  example may have chosen to use the same visual rendering for
  inferences that is normally used for fractions, the same may not
  carry over well to the aural domain.
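
The difference can be seen by comparing two encodings of the same pair of
formulas.  The formulas themselves are made up for this sketch:
\begin{verbatim}
% Layout-only encoding: both formulas look like fractions.
${A \over B}$          % meant as an inference
${p \over q}$          % meant as a fraction

% Encoding with the user-defined construct: the inference
% is now distinguishable from the fraction.
$\inference{A}{B}$
${p \over q}$
\end{verbatim}
\noindent In the first encoding a recognizer sees two instances of
\verb|\over| and nothing more; in the second, the author's intent survives in
the markup.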


\subsection*{Representing Extended Logical Structure}\label{s:extend}


  \aster{} solves the problem of representing and rendering the
  extended logical structure arising from user-definable macros by
  considering each macro definition as introducing a new object type.
  Instances of a macro $M$ are represented by instances of object
  $O_M$.  Thus, in the example shown above, the definition of the
  construct \verb|\inference| introduces a new object type $O_{\rm
    inference}$.  The \alltex{} macro consists of two parts: a
  declaration, and a series of \TeX{} commands that the macro expands
  into.  The macro expansion is nothing but a visual rendering rule
  that specifies how \TeX{} should display instances of the object
  represented by the macro.
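
The two parts can be seen in the definition of \verb|\inference| given
earlier, annotated here as a sketch:
\begin{verbatim}
\newcommand{\inference}[2]  % declaration: name, two arguments
  {{#1\over#2}}             % expansion: visual rendering rule
\end{verbatim}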


\aster{} provides an equivalent mechanism for extending the class of logical
structures that are recognized.  Once \aster{} has been told about a
user-defined macro, audio rendering rules for the new object type introduced
by this macro can be defined in AFL (Audio Formatting Language).  Notice that
such audio rendering rules have to be defined by the user, just as the
\alltex{} macro is defined by hand. It is not possible in general to translate
the \tex{} macro into a set of audio rendering rules.  This is because the
\tex{} macro is capable of performing any arbitrary computation permitted by
the operators present in the \tex{} language \cite{knuth84}\dash a
Turing-complete programming language.
\section{Rendering Information}\label{s:rendering}
\aster{} renders information by applying {\em rendering rules\/} to the
internal representation described above (``Document models'').
 The system of rendering rules used in \aster{}
and the language in which they are written (AFL\dash Audio Formatting
Language) are described in detail in~\cite{raman-phd-thesis}.  In a sense, AFL
is to audio formatting as PostScript is to visual formatting, although AFL is
a much smaller language.

Here, we show a
small example of such a rendering rule for a user-defined macro.  In the
following, we use \term{CLOS} generic function \term{read-aloud}.  For the
present, let us assume that function \term{read-aloud} executes the necessary
actions to render its argument.


  After extending \aster{} to process the \alltex{} macro
  \verb|\inference| shown above (``Logical structure''), we can define

{\small
\begin{verbatim}
 (defmethod read-aloud ((inference inference))
   "Sample rendering for object inference."
   (read-aloud (argument 1 inference))
   (read-aloud "implies")
   (read-aloud (argument 2 inference)))
\end{verbatim}
}
\noindent Given $\inference{A}{B}$, this  produces ``A implies B''. 

If we wished to produce a rendering  that inverts the order in which the
arguments to macro \verb|\inference| are rendered, we would define:

{
\small\begin{verbatim}
 (defmethod read-aloud ((inference inference))
   "Renders inference with arguments reversed."
   (read-aloud "We know")
   (read-aloud (argument 2 inference))
   (read-aloud "because")
   (read-aloud (argument 1 inference)))
\end{verbatim}
}
\noindent which produces ``We know B because A''.

Switching between these two rendering rules has the effect of inverting a
proof-tree!
Notice that writing a new rendering rule for an object $O_E$  has the same
effect as redefining the \alltex{} macro that corresponds to $E$.

\aster{} makes it easy to write several rendering rules for the same object
and  also allows rendering rules to be partitioned into rendering {\em
  styles}.  Such {\em styles\/} can be thought of as being analogous to
\latex{} styles, but with one important difference.  Due to the
non-interactive nature of traditional paper documents, a paper is typically
typeset in a given style. It is not possible for the reader to change the
style in which the document is typeset.
Typically, we do not feel the shortcoming of not being able to change the way
a mathematical expression is rendered when reading a printed paper because the
eye is capable of reading the various parts of an expression in any order that
is convenient. However, when listening to an aural presentation, the listener
does not have this flexibility. In other words, an active reader peruses a
printed paper, a passive display, whereas in the case of audio, these roles
are reversed\dash the aural display scrolls {\em actively\/} past a passive
listener. 

\aster{} overcomes these difficulties by being a fully interactive system.
It is possible for the listener to interrupt the rendering, change the
rendering style in use, and resume listening to the document.  In an interactive session
with \aster{}, switching between rendering styles (a collection of rendering
rules for different objects) and invoking individual rendering rules can be
done with a few keystrokes, making it easy for a listener to obtain many
different views of a document.
This facility enables {\em active\/} listening.

\aster{} derives its power from representing document content as objects and
by allowing multiple user-defined rendering rules for individual object types.
These rules can cause any number of audio events (ranging from speaking a
simple phrase, to playing a digitized sound).  The pitch of the voice, the
physical head-size of the virtual speaker, the volume, and many other
parameters can be changed by rendering rules, making it easy to create sound
cues to help display structure.
In fact, the design of \aster{} does not restrict the system to producing
purely aural renderings; there is nothing to preclude us from defining
renderings that produce truly multimodal output; \ie renderings where the
traditional visual rendering is augmented with aural feedback. We conjecture
that such multimodal renderings may prove very useful for persons with
learning impairments. 

To give an example of  a multimodal rendering,  the logo for \aster{} is
\begin{center}
  \asterlogo{}
  \end{center}
  \noindent and is produced by \alltex{} macro \verb|\asterlogo|.
  After appropriately extending \aster{} to recognize this macro, we
  can define an audio rendering rule for object {\em asterlogo\/} that
  produces a bark when rendering instances of this macro.  Thus, the
  same piece of markup \verb|\asterlogo| produces the picture of
  Aster\footnote{Aster is my guide-dog. } when rendered visually, and
  an appropriate sound\footnote{The bark is that of a generic dog;
    Aster is too well trained to bark, and therefore could not be
    recorded.} when rendered aurally.


  This feature was exploited to advantage when producing the audio formatted
  version of the author's thesis.  The dedication page of the thesis contains
  a large picture of Aster, and the audio formatted version\footnote{An audio
    formatted version of the thesis produced by \aster{} (about 6 hours) is
    being distributed by RFB\dash Recordings For The Blind\dash as the first fully
    computer-generated talking book. } contains a verbal description of the
  picture, accompanied by the sound of Aster panting in the background.  You
  can listen to this example on the WWW\dash visit the \aster{} home
  page by following  the link to the \aster{} demonstration
  from my home
  page\footnote{\URL|http://www.research.digital.com/CRL/personal/raman/raman.html|} 
  and clicking  on the picture of Aster.

Several ideas come together to make all this possible.  First, logical
structure is of paramount importance\dash not its display on any one
particular medium. The more a document makes structure explicit,
the better the document can be displayed on (projected onto) several
different media.

Next, the use of \alltex{} macros to  encode structure makes it
possible to have a system like \aster, in which the internal
structure can be extended to fit a document. This allows the encoding
of the structure in a flexible, uniform, and consistent representation
such as an attributed tree, with the addition of the quasi-prefix form
for dealing with mathematics.

Finally, providing  different rendering rules and styles and  a
flexible way to switch among them makes it possible to obtain  multiple
views of a document in an interactive fashion.

\section{Conclusion}\label{s:conclusion}


  The approach used in \aster{} to exploit the additional semantic
  information present in the electronic encoding in the form of
  user-defined constructs points to an important feature of markup
  systems like \alltex{} that is currently missing to a certain extent
  in systems like SGML.  When \aster{} was at its inception, I firmly
  believed that one should use a semantic-oriented DTD to encode a
  document in order to be able to produce high-quality audio
  renderings. I still believe this; however, the work on \aster{} does
  point out one shortcoming with the fixed document DTD model.  Given
  that mathematical and technical notation is being invented all the
  time, a fixed DTD forces the author to encode new constructs using
  {\em only\/} primitives that are provided by the DTD.  As a
  consequence, authors end up using a presentation-oriented encoding
  even though the DTD in use is one that is semantically oriented.


  To make this concrete, consider the case of the {\it inference\/}
  construct described above (``Logical structure'').  If the document
  were being encoded using a fixed non-extensible DTD that only
  provides a {\it fraction\/} element, the author would be forced to
  encode {\it inference\/} using this element.

Since it is not possible in general to define an all-encompassing DTD that
covers every possible kind of math notation (those currently known and those
yet to be discovered), extensibility of the DTD as provided by \alltex{} is of
vital importance.

Another good example of this facility in \alltex{} being put to good
use is the Hyper\tex{} system \mdash an extension to \tex{} that
allows the user to view his legacy \alltex{} documents as online
hypertext.  Conceptually, we can think of \verb|\ref| and
\verb|\label| as being object types; traditionally, these cause
specific marks to appear on paper when rendered visually by \tex; to a
system like Hyper\tex{} these turn into {\em active\/} links that a
user can follow interactively.

The ability to produce multiple renderings of the same object provided by
\aster{} was introduced in the context of aural presentations. However, such
multiple presentations become equally relevant when interactively perusing
online documents visually.  For instance, when reading a document that
presents a complex proof, a user may wish to have the same proof displayed as
an outline in one window, and as a proof-tree in another
(see~\cite{lamport:proofs93}).  In the case of paper documents, the user has
to use her  imagination to achieve such multiple views \mdash though she is
aided in this by the visual notation. 
 In the interactive scenario presented by electronic
documents, the previewer can provide some additional functionality to aid in
this process.

\begin{thebibliography}{}
\bibitem[Knuth 1984]{knuth84}
Knuth, D.~E.
\newblock {\em The \TeX{}book}, volume~A of {\em Computers and Typesetting}.
\newblock Addison-Wesley, Reading, Massachusetts, 1984.

\bibitem[Lamport 1993]{lamport:proofs93}
Lamport, L.
\newblock ``How to write a proof''.
\newblock Technical Report~94, DEC Systems Research Center, Palo Alto, {CA},
  1993.
\newblock To appear in {\em American Mathematical Monthly}.

\bibitem[Raman 1992]{Raman:TB13-3-372-377}
Raman, T.~V.
\newblock ``An audio view of \TeX\ documents''.
\newblock {\em TUGboat\/} {\bf 13}(3), 372--377, 1992.

\bibitem[Raman 1994]{raman-phd-thesis}
Raman, T.~V.
\newblock {\em Audio System for Technical Readings}.
\newblock Ph.D. thesis, Cornell University, 1994.

\end{thebibliography}

\end{Article}