\def\dash{---}
\let\Dash\dash
\def \ifundefined#1{\expandafter\ifx\csname#1\endcsname\relax }
%Use to generate a new row in tables with a horizontal line separating
%them:
\newcommand{\newrow}{\\ \hline}
\newcommand{\headrow}{\\ \hline \hline}
\newcommand{\mdash}{---}
\newcommand{\lisparg}[1]{{\em #1\/}}
\newcommand{\lispname}[1]{{\sf #1\/}}
\newtheorem{theorem}{Theorem}
\newtheorem{algorithm}{algorithm}
\newtheorem{lemma}{Lemma}
\newtheorem{definition}{Definition}
\newtheorem{corollary}{Corollary}
\newtheorem{conjecture}{Conjecture}
\newcommand{\nonterm}[1]{\mbox{${\scriptstyle <}{\mbox{\em #1\/}}{\scriptstyle >}$}}
\newcommand{\bld}[1]{{\bf #1}}
\newcommand{\type}[1]{{\tt #1}}
\newcommand{\pz}{\phantom{0}}
\newcommand{\inference}[2]{\frac{#1}{#2}}
\newcommand{\induction}[2]{\frac{#1}{#2}}
\newcommand{\kronecker}{\raisebox{1pt}{$ \:\otimes \:$}}
\newcommand{\subst}[3]{{#1[#2/#3]}}
\newcommand{\id}[1]{\mbox{{\sf #1\/}}}
\newcommand\french[1]{{\it #1\/}}
\newcommand{\afl}{{AFL}}
\newcommand{\term}[1]{\mbox{\sf #1\/}}
\newcommand{\divides}[2]{#1/#2}
\newcommand{\subgroup}{\triangleright }
%slide title.
\newcommand{\itidetitle}[1]{\center \framebox{\large\bf #1}}
%section reference.
\newcommand{\sref}[1]{Section~$\ref{#1}$}
\newcommand\cref[1]{Chapter~$\ref{#1}$}
\newcommand{\aref}[1]{Appendix~$\ref{#1}$}
%integral d
\newcommand{\varint}[1]{\,d#1}
%quantifiers: cs611
\newcommand{\all}[2]{\forall #1\!\!:\!#2.\:}
\newcommand{\exist}[2]{\exists #1\!\!:\!#2.\:}
%integrals
\newcommand{\dx}{\,dx}
\newcommand{\dy}{\,dy}
\newcommand{\dz}{\,dz}
\newcommand{\dt}{\,dt}
\newcommand{\naive}{na{\"\i}ve{}}
\providecommand\AmSTeX{$\cal A\kern-.1667em\lower.5ex\hbox{$\cal M$}\kern-.075emS$-\TeX}
\providecommand{\amstex}{\AmSTeX{}}
\newcount\TestCount
\providecommand{\La}{\TestCount=\the\fam \leavevmode L\raise.42ex \hbox{$\fam\TestCount\scriptstyle\kern-.3em A$}}
\providecommand\AllTeX{(\La)\TeX}
\providecommand{\alltex}{\AllTeX{}}
\providecommand{\latex}{\LaTeX{}}
\providecommand{\tex}{\TeX{}}
\providecommand{\macro}[1]{{\cal M}_#1}
\providecommand{\rfb}{{\sc RFB}\footnote{Recordings for the Blind}}
%
\newcommand{\www}[1]{{\it WWW}: {\small #1}}
\newcommand{\email}[1]{{\it E-mail\/}: $\langle\hbox{\tt#1}\rangle$}
\newcommand{\phone}[1]{{\it Phone\/}: {\tt #1}}
\newcommand{\voicemail}[1]{{\it Voice-mail\/}: {\tt #1}}
\newcommand\homepage{\sf http://www.research.digital.com/CRL/personal/raman/raman.html}
\newcommand{\faxno}[1]{{\it Fax\/}: {\tt #1}}
\newcommand{\textalk}{\rm T\kern -.1667em\lower .5ex\hbox {E}\kern%
-.125emXT\kern -.1667em\lower .5ex\hbox {A}\kern -.125em L\kern -.125em K}
\newcommand{\Dectalk}{{\sc dectalk}}
\newcommand{\Sparc}{{\sc sparc}}

\title{An Audio View of \alltex{} Documents}
\author[T. V. Raman]{T.\ V.\ Raman\\
  Digital Equipment Corporation\\
  Cambridge Research Lab\\
  One Kendall Square, Building 650\\
  Cambridge, MA 02139\\
  \emph{Email:} \texttt{raman@crl.dec.com} }

\begin{Article}
\begin{abstract}
\aster{}\dash Audio System For Technical Readings\dash is a computing system that produces audio renderings from the {\em same\/} \alltex{} source used to produce the printed document.
\cite{Raman:TB13-3-372-377} described our preliminary work on this project. At the time, correct handling of user-defined \alltex{} macros was described as one of the key issues in building a fully extensible audio rendering system. \aster{} \cite{raman-phd-thesis} has now been fully implemented. This paper reports on the approach used in \aster{} to handle user-defined macros. \aster{} treats macro definitions as introducing new object types into the document logical structure. A \alltex{} macro consists of two parts: a declaration, and a series of \TeX{} commands that the macro expands into. The macro expansion is nothing but a visual rendering rule that specifies how \TeX{} should display instances of the object represented by the macro. \aster{} provides an equivalent mechanism for extending the class of logical structures that are recognized. Once \aster{} has been told about a user-defined macro, audio rendering rules for the new object type introduced by this macro can be defined in AFL (Audio Formatting Language). The approach used not only makes \aster{} fully extensible; it points out a unique advantage of \alltex\dash the ability of the author to encode semantic meaning into the markup by extending the document model in ways appropriate to the specific document instance that is being encoded.
\end{abstract}

\section{Introduction}\label{s:introduction}
\begin{center} \asterlogo \end{center}
\aster\dash Audio System For Technical Readings\dash is a computing system that aurally renders electronic documents marked up in the \alltex{} family of markup languages (see~\cite{raman-phd-thesis} for details). \aster{} uses the structural markup present in the electronic source to advantage in producing high-quality, interactive audio renderings. This paper focuses on a specific aspect of the problem, namely that of flexibly rendering the extended document logical structure encapsulated in a \alltex{} document.
One primary advantage of \alltex{} is the flexibility it provides the author in defining logical structures that are specific to a particular document instance. In this sense, the class of logical structures that can be encapsulated in a \alltex{} document is extensible. \alltex{} macros allow an author to abstract away the layout details. At the same time, they provide a powerful mechanism for defining new constructs that are not already present in the document style (DTD in SGML parlance) in use. As a consequence, when introducing a new piece of mathematical notation, an author can first define a new \alltex{} macro that produces a desired layout, and then use this newly defined construct throughout the document. The flexibility of the \alltex{} macro facility initially proved a major stumbling block in building a fully extensible audio rendering system. A system that attempts to produce aural renderings by {\em mapping\/} the built-in \alltex{} commands to an equivalent aural representation faces the severe shortcoming of not being able to render documents that contain user-defined macros. At the same time, it is impossible to translate such user-defined \alltex{} macros into a suitable aural representation. This is because \tex{} in its full glory is a Turing-complete programming language, and saying ``we can translate a general \tex{} macro to audio'' is equivalent to saying that ``Given a \tex{} program, we can predict the result''. Being able to achieve the above without actually running \tex{} on the program (document fragment) would amount to being able to solve the Halting Problem! In the rest of this paper, we describe the solution used in \aster{} to circumvent this difficulty. The solution we used in fact turns the presence of user-definable \alltex{} macros into an advantage. 
Such user-defined constructs allow \aster{} to glean even more information about the document logical structure than would be possible if the document were encoded using only the built-in \alltex{} operators; as a consequence, the audio renderings produced are also significantly better. \section{Document Models in \protect\aster{}}\label{s:represent} \aster{} produces audio renderings by first extracting the document logical structure. In this model, all forms of rendering, \ie visual, aural, etc.\ are regarded as a projection of the structure present in the information being conveyed onto the medium being used to communicate the information. Thus, typesetting a document requires visual formatting\Dash projecting the information structure onto a two-dimensional visual tablet; aural rendering requires presenting the structure using various features of the auditory display. The recognizer used in \aster{} extracts logical structure present in documents encoded in the \alltex{} family of languages. An important feature of this recognizer is that it works on the entire gamut of encodings, ranging from plain ASCII documents, \ie no explicit markup, up to documents containing completely unambiguous encodings of the logical structure. The basic document model used in \aster{} is the attributed tree. Each hierarchical level of the document is modeled as a node in this tree. Each node can have content, children and attributes. Using object-oriented terminology, each different kind of node of the tree is called an {\em object\/} and represents a document element. Thus, ``chapter'', ``section'', ``paragraph'', and ``sentence'' are all objects. If a document contained five sections, its representation in \aster{} would have five instances of object ``section''. This object-oriented terminology is used because \aster{} actually uses CLOS objects in this fashion. 
The use of an object-oriented language was instrumental in allowing us to develop and implement the ideas in \aster{} incrementally and effectively.

This attributed tree structure is augmented to represent mathematical content; we call this augmented representation the {\em quasi-prefix form\/} (see figure~\ref{fig:math-object} below). Expressions that are completely unambiguous, \eg $x+y$, are captured in their prefix form. In addition to linearizing the underlying tree structure, mathematical notation uses {\em visual attributes\/} such as superscripts and subscripts, whose interpretation is context-dependent. We extend the prefix form to capture such visual attributes\Dash hence the name {\em quasi\/}-prefix.
\begin{minipage}{\linewidth}
\makeatletter\def\@captype{figure}\makeatother
\begin{center}
\begin{tabular}[h]{|rcl|}\hline
left-superscript & accent & superscript \\
 &$\displaystyle \nwarrow$ \hfill $\displaystyle \uparrow$ \hfill $\displaystyle \nearrow$ & \\
 & {\bf math object } & \\
 & $\displaystyle \swarrow$ \hfill $\displaystyle \downarrow$ \hfill $\displaystyle \searrow$ & \\
left-subscript & underbar & subscript \\ \hline
\end{tabular}
\end{center}
\caption{A math object with attributes. Each attribute itself contains math objects.}
\label{fig:math-object}
\end{minipage}

The next section describes how this model is extended to encapsulate the use of user-defined constructs in \alltex.

\section{Extended Logical Structure}\label{s:macros}
The \alltex{} macro facility can be used to extend the document logical structure by defining new constructs. Thus, an author preparing a manuscript on inference logic might define
\begin{verbatim}
\newcommand{\inference}[2]{{#1\over#2}}
\end{verbatim}
\noindent and write
\begin{verbatim}
\inference{x}{y}
\end{verbatim}
\noindent and use this construct throughout the document.
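
As a minimal sketch of what this indirection buys (the alternative layout below is our own illustration, not part of \aster{} or of any particular document), suppose the author later prefers to print inferences with a turnstile rather than a horizontal rule. Assuming \verb|\inference| has already been defined as above, a single redefinition in the preamble would suffice:
\begin{verbatim}
% Hypothetical alternative layout: premise, turnstile, conclusion.
% Assumes \inference was defined earlier with \newcommand.
\renewcommand{\inference}[2]{#1 \vdash #2}
\end{verbatim}
\noindent Every occurrence of \verb|\inference{x}{y}| in the body is left untouched; only the rule that maps the construct to marks on paper has changed.
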
Notice that defining \verb|\inference| as shown above and using it to encode inference statements is distinct from, and more powerful than, just using the \tex{} built-in operator \verb|\over| throughout the document. A commonly mentioned advantage in this context is that using the newly defined construct \verb|\inference| permits the author to easily change the notation used to denote {\it inference}. Notice that this is in fact the same as saying that
\begin{quote}
If distinct elements in a document instance are marked up using distinct constructs, then it is possible to recognize and process these elements in a multiplicity of ways.
\end{quote}
In \aster, the \alltex{} facility of defining a second \verb|\inference| macro that produces a different layout for {\it inference\/} can be generalized to the notion of different {\em audio renderings\/} for {\it inference}. As explained above (``Document models''), \aster{} achieves its aural renderings by building a rich internal representation of the document content. In this representation, each document element\footnote{We use the term {\em element\/} loosely to mean a logical unit of the document. } $E$ is represented by an instance of object $O_E$. \aster{} provides a predefined type $O_E$ for each of the built-in constructs in \alltex. Thus, we could represent the use of \verb|\inference| defined above in terms of object $O_{\rm over}$. However, notice that this would mean losing valuable information. When building up the internal representation, the additional semantic information provided by the author's use of the \verb|\inference| construct is very useful. In addition, expanding all \alltex{} macros results in a pure layout representation, which is not appropriate for producing aural renderings (see~\cite{Raman:TB13-3-372-377}). If we were to represent instances of \verb|\inference| in terms of $O_{\rm over}$, \aster{} would be forced to render \verb|\inference| the same as the \verb|\over| construct.
Though the author in this particular example may have chosen to use the same visual rendering for inferences that is normally used for fractions, this choice may not carry over well to the aural domain.

\subsection*{Representing Extended Logical Structure}\label{s:extend}
\aster{} solves the problem of representing and rendering the extended logical structure arising from user-definable macros by considering each macro definition as introducing a new object type. Instances of a macro $M$ are represented by instances of object $O_M$. Thus, in the example shown above, the definition of the construct \verb|\inference| introduces a new object type $O_{\rm inference}$. A \alltex{} macro consists of two parts: a declaration, and a series of \TeX{} commands that the macro expands into. The macro expansion is nothing but a visual rendering rule that specifies how \TeX{} should display instances of the object represented by the macro. \aster{} provides an equivalent mechanism for extending the class of logical structures that are recognized. Once \aster{} has been told about a user-defined macro, audio rendering rules for the new object type introduced by this macro can be defined in AFL (Audio Formatting Language). Notice that such audio rendering rules have to be defined by the user, just as the \alltex{} macro is defined by hand. It is not possible in general to translate a \tex{} macro into a set of audio rendering rules. This is because a \tex{} macro is capable of performing any computation permitted by the operators present in the \tex{} language \cite{knuth84}\dash a Turing-complete programming language.

\section{Rendering Information}\label{s:rendering}
\aster{} renders information by applying {\em rendering rules\/} to the internal representation described above (``Document models'').
The system of rendering rules used in \aster{} and the language in which they are written (AFL\dash Audio Formatting Language) are described in detail in~\cite{raman-phd-thesis}. In a sense, AFL is to audio formatting as PostScript is to visual formatting, although AFL is a much smaller language. Here, we show a small example of such a rendering rule for a user-defined macro. In the following, we use the \term{CLOS} generic function \term{read-aloud}. For the present, let us assume that the function \term{read-aloud} executes the necessary actions to render its argument. After extending \aster{} to process the \alltex{} macro \verb|\inference| shown above (``Logical structure''), we can define
{\small
\begin{verbatim}
(defmethod read-aloud ((inference inference))
  "Sample rendering for object inference."
  (read-aloud (argument 1 inference))
  (read-aloud "implies")
  (read-aloud (argument 2 inference)))
\end{verbatim}
}
\noindent Given $\inference{A}{B}$, this produces ``A implies B''. If we wished to produce a rendering that inverts the order in which the arguments to macro \verb|\inference| are rendered, we would define:
{\small
\begin{verbatim}
(defmethod read-aloud ((inference inference))
  "Renders inference with arguments reversed."
  (read-aloud "We know")
  (read-aloud (argument 2 inference))
  (read-aloud "because")
  (read-aloud (argument 1 inference)))
\end{verbatim}
}
\noindent which produces ``We know B because A''. Switching between these two rendering rules has the effect of inverting a proof-tree! Notice that writing a new rendering rule for an object $O_E$ has the same effect as redefining the \alltex{} macro that corresponds to $E$. \aster{} makes it easy to write several rendering rules for the same object and also allows rendering rules to be partitioned into rendering {\em styles}. Such {\em styles\/} can be thought of as being analogous to \latex{} styles, but with one important difference.
Due to the non-interactive nature of traditional paper documents, a paper is typically typeset in a given style. It is not possible for the reader to change the style in which the document is typeset. Typically, we do not feel the shortcoming of not being able to change the way a mathematical expression is rendered when reading a printed paper because the eye is capable of reading the various parts of an expression in any order that is convenient. However, when listening to an aural presentation, the listener does not have this flexibility. In other words, an active reader peruses a printed paper, a passive display, whereas in the case of audio, these roles are reversed\dash the aural display scrolls {\em actively\/} past a passive listener. \aster{} overcomes these difficulties by being a fully interactive system. It is possible for the listener to interrupt the rendering, change the rendering style in use, and listen to the document. In an interactive session with \aster{}, switching between rendering styles (a collection of rendering rules for different objects) and invoking individual rendering rules can be done with a few keystrokes, making it easy for a listener to obtain many different views of a document. This facility enables {\em active\/} listening. \aster{} derives its power from representing document content as objects and by allowing multiple user-defined rendering rules for individual object types. These rules can cause any number of audio events (ranging from speaking a simple phrase, to playing a digitized sound). The pitch of the voice, the physical head-size of the virtual speaker, the volume, and many other parameters can be changed by rendering rules, making it easy to create sound cues to help display structure. 
In fact, the design of \aster{} does not restrict the system to producing purely aural renderings; there is nothing to preclude us from defining renderings that produce truly multimodal output, \ie renderings where the traditional visual rendering is augmented with aural feedback. We conjecture that such multimodal renderings may prove very useful for persons with learning impairments. To give an example of a multimodal rendering, the logo for \aster{} is
\begin{center} \asterlogo{} \end{center}
\noindent and is produced by the \alltex{} macro \verb|\asterlogo|. After appropriately extending \aster{} to recognize this macro, we can define an audio rendering rule for object {\em asterlogo\/} that produces a bark when rendering instances of this macro. Thus, the same piece of markup \verb|\asterlogo| produces the picture of Aster\footnote{Aster is my guide-dog. } when rendered visually, and an appropriate sound\footnote{The bark is that of a generic dog; Aster is too well trained to bark, and therefore could not be recorded.} when rendered aurally. This feature was exploited to advantage when producing the audio formatted version of the author's thesis. The dedication page of the thesis contains a large picture of Aster, and the audio formatted version\footnote{An audio formatted version of the thesis produced by \aster{} (about 6 hours) is being distributed by RFB\dash Recordings For The Blind\dash as the first fully computer-generated talking book. } contains a verbal description of the picture, accompanied by the sound of Aster panting in the background. You can listen to this example on the WWW\dash visit the \aster{} home page by following the link to the \aster{} demonstration from my home page\footnote{\URL|http://www.research.digital.com/CRL/personal/raman/raman.html|} and clicking on the picture of Aster.

Several ideas come together to make all this possible. First, logical structure is of paramount importance\dash not its display on any one particular medium.
The more a document makes structure explicit, the better the document can be displayed on (projected onto) several different media. Next, the use of \alltex{} macros to encode structure makes it possible to have a system like \aster, in which the internal structure can be extended to fit a document. This allows the encoding of the structure in a flexible, uniform, and consistent representation such as an attributed tree, with the addition of the quasi-prefix form for dealing with mathematics. Finally, providing different rendering rules and styles and a flexible way to switch among them makes it possible to obtain multiple views of a document in an interactive fashion. \section{Conclusion}\label{s:conclusion} The approach used in \aster{} to exploit the additional semantic information present in the electronic encoding in the form of user-defined constructs points to an important feature of markup systems like \alltex{} that is currently missing to a certain extent in systems like SGML. When \aster{} was at its inception, I firmly believed that one should use a semantic-oriented DTD to encode a document in order to be able to produce high-quality audio renderings. I still believe this; however the work on \aster{} does point out one shortcoming with the fixed document DTD model. Given that mathematical and technical notation is being invented all the time, a fixed DTD forces the author to encode new constructs using {\em only\/} primitives that are provided by the DTD. As a consequence, authors end up using a presentation-oriented encoding even though the DTD in use is one that is semantically oriented. To make this concrete, consider the case of the {\it inference\/} construct described above (``Logical structure''). If the document were being encoded using a fixed non-extensible DTD that only provides a {\it fraction\/} element, the author would be forced to encode {\it inference\/} using this element. 
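
The contrast can be sketched in \alltex{} itself. The fragment below is only an illustration under our own assumptions: it reuses the \verb|\inference| definition from the earlier example, and $p$, $q$, $A$ and $B$ are arbitrary placeholders.
\begin{verbatim}
\newcommand{\inference}[2]{{#1\over#2}}

% Presentation-oriented encoding: the fraction and the inference
% are both written with \over and are marked up identically.
\[ {p \over q} \qquad {A \over B} \]

% Semantically oriented encoding: the printed output is the same,
% but the source now marks the inference explicitly.
\[ {p \over q} \qquad \inference{A}{B} \]
\end{verbatim}
\noindent Both displays typeset identically; only in the second can a system that works from the source, such as \aster, distinguish the inference from the fraction by its markup.
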
Since in general it is not possible to define an all-encompassing DTD that covers every possible kind of math notation (those currently known and those yet to be discovered) extensibility of the DTD as provided by \alltex{} is of vital importance. Another good example of this facility in \alltex{} being put to good use is the Hyper\tex{} system \mdash an extension to \tex{} that allows the user to view his legacy \alltex{} documents as online hypertext. Conceptually, we can think of \verb|\ref| and \verb|\label| as being object types; traditionally, these cause specific marks to appear on paper when rendered visually by \tex; to a system like Hyper\tex{} these turn into {\em active\/} links that a user can follow interactively. The ability to produce multiple renderings of the same object provided by \aster{} was introduced in the context of aural presentations. However, such multiple presentations become equally relevant when interactively perusing online documents visually. For instance, when reading a document that presents a complex proof, a user may wish to have the same proof displayed as an outline in one window, and as a proof-tree in another (see~\cite{lamport:proofs93}). In the case of paper documents, the user has to use her imagination to achieve such multiple views \mdash though she is aided in this by the visual notation. In the interactive scenario presented by electronic documents, the previewer can provide some additional functionality to aid in this process. \begin{thebibliography}{} \bibitem[Knuth 1984]{knuth84} Knuth, D.~E. \newblock {\em The \TeX{}book}, volume~A of {\em Computers and Typesetting}. \newblock Addison-Wesley, Reading, Massachusetts, 1984. \bibitem[Lamport 1993]{lamport:proofs93} Lamport, L. \newblock ``How to write a proof''. \newblock Technical Report~94, DEC Systems Research Center, Palo Alto, {CA}, 1993. \newblock To appear in {\em American Mathematical Monthly}. \bibitem[Raman 1992]{Raman:TB13-3-372-377} Raman, T.~V. 
\newblock ``An audio view of \TeX\ documents''.
\newblock {\em TUGboat} {\bf 13}(3), 372--377, 1992.
\bibitem[Raman 1994]{raman-phd-thesis}
Raman, T.~V.
\newblock {\em Audio System for Technical Readings}.
\newblock Ph.D.\ thesis, Cornell University, 1994.
\end{thebibliography}
\end{Article}