\documentstyle{article}
\def\TeXorTho{T\kern-.1667em\lower.5ex\hbox{E}\kern-.1667emX\kern-.4667em\lower1.1667ex\hbox{or}\kern-.5334em\raise.5ex\hbox{T}\kern-.1667em\lower.5ex\hbox{ho}}
\def\TeXortho{T\kern-.1667em\lower.5ex\hbox{E}\kern-.1667emX\kern-.1667em\hbox{or}\kern-.3334em\raise.5ex\hbox{T}\kern-.2em\hbox{ho}}


\begin{document}
\section*{Correcting \TeX{}ts with \TeXorTho }
\TeXortho\ is a program for correcting \TeX~files. With the right info file,
however, it can be adapted to almost any text file. It was written
in GNU C by J\"urgen Hannappel and Eduard Werner, Bonn 1991.

{\sc Disclaimer:} You use this program at your own risk. If it crashes your
system (most probably through your own fault) or overwrites something
important on your hard disk, that is your problem.

\TeXortho~checks spelling and the white space around punctuation
characters, and produces output ready to be read by EMACS. We have
tried hard to make it as flexible as possible, so that active characters
and user-defined macros can be checked as well.


\section{Concepts}
To use \TeXortho{} it is necessary to understand the basic concepts underlying
 its operation.
\TeXortho{} parses a \TeX{} file into words,
 which are sequences of letters separated by white space or other characters.
These words are then looked up in a word list,
 and an error message is written to the log file if a word cannot be found.
So \TeXortho{} has two central parts:
 the parser, which splits the \TeX{} file into words,
 and the word list, which contains all knowledge about the spelling of words.
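The following C fragment is a minimal sketch of these two parts, not \TeXortho's actual source.
It assumes that a letter is whatever \verb/isalpha/ accepts (in the real program the info-file
decides this) and uses a small hypothetical array \verb/word_list/ in place of the real word list;
the names \verb/in_word_list/ and \verb/check_text/ are likewise made up for illustration.

\begin{verbatim}
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the word list. */
static const char *word_list[] = {
    "the", "parser", "splits", "text", "into", "words"
};

/* Return 1 if w is in the word list, 0 otherwise. */
int in_word_list(const char *w)
{
    for (size_t i = 0; i < sizeof word_list / sizeof word_list[0]; i++)
        if (strcmp(word_list[i], w) == 0)
            return 1;
    return 0;
}

/* Split text into words (runs of letters; every other character is a
   separator), log each word that is not found and return their number.
   Words longer than 63 characters are truncated in this sketch. */
int check_text(const char *text, FILE *log)
{
    char word[64];
    size_t n = 0;
    int unknown = 0;

    for (const char *p = text; ; p++) {
        if (*p != '\0' && isalpha((unsigned char)*p)) {
            if (n < sizeof word - 1)
                word[n++] = *p;
            continue;
        }
        if (n > 0) {                    /* a word has just ended */
            word[n] = '\0';
            n = 0;
            if (!in_word_list(word)) {
                unknown++;
                if (log != NULL)
                    fprintf(log, "unknown word: %s\n", word);
            }
        }
        if (*p == '\0')
            break;
    }
    return unknown;
}
\end{verbatim}

For instance, \verb/check_text("the parser splits text into wrods", stderr)/ reports
the single unknown word {\tt wrods} and returns~1.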

The \TeXortho{} parser is applicable not only to texts in \TeX{} or \LaTeX{}
 format,
 but to almost any other ASCII text format.
It can be adapted to your particular text format by means of an
 {\em info-file},
 which specifies which characters are allowed in words,
 which sequences of characters are to be treated as commands, etc.
The info-file is discussed in a later section,
 but a full understanding requires some knowledge of \TeX{} internals
 such as character catcodes or active characters.
Luckily you will usually not have to alter the info-file at all,
 or at most you will have to add the definitions of a few rarely used
 commands to it.

Currently the word list is organized as a binary search tree,
 with the frequent words close to the tree root,
 so that they can be found as fast as possible.
To ensure the optimal ordering of words it is necessary to keep track of
 every occurrence of every word in as many texts as possible.
This means
 that every time you have written a text you should check it with
 \TeXortho{}, add the newly found, correctly spelled words to the standard word
 list and update its word frequency information.
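A search tree of this kind might be sketched in C as follows. The node layout and the names
\verb/twl_insert/ and \verb/twl_lookup/ are assumptions for illustration; the sketch only shows
that the shape of a plain binary search tree depends on the insertion order, and how each
successful lookup can update the frequency count.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout. */
struct node {
    char *word;
    unsigned long freq;            /* occurrences counted so far */
    struct node *left, *right;
};

/* Plain BST insertion: words inserted first end up near the root.
   Returns the (possibly new) root; if memory runs out, the word is
   simply not inserted. */
struct node *twl_insert(struct node *root, const char *word,
                        unsigned long freq)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->word = malloc(strlen(word) + 1);
        if (n->word == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->word, word);
        n->freq = freq;
        n->left = n->right = NULL;
        return n;
    }
    int c = strcmp(word, root->word);
    if (c < 0)
        root->left = twl_insert(root->left, word, freq);
    else if (c > 0)
        root->right = twl_insert(root->right, word, freq);
    else
        root->freq += freq;        /* already present: add up */
    return root;
}

/* Look word up and count this occurrence. *depth receives the level
   at which the search stopped (0 = root). Returns NULL if not found. */
struct node *twl_lookup(struct node *root, const char *word, int *depth)
{
    int d = 0;
    while (root != NULL) {
        int c = strcmp(word, root->word);
        if (c == 0) {
            root->freq++;
            break;
        }
        root = c < 0 ? root->left : root->right;
        d++;
    }
    if (depth != NULL)
        *depth = d;
    return root;
}
\end{verbatim}

Inserting {\tt the}, {\tt and} and {\tt zebra} in this order makes {\tt the} the root,
so looking it up needs a single comparison.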

The word lists are kept in files with the default extension {\tt .twl}
 (which stands for {\tt t}ex {\tt w}ord {\tt l}ist).
They are (or should be) sorted in a way which ensures an optimal search tree:
 sorted by frequency of occurrence in descending order,
 while groups of equally frequent words are arranged so as to keep
 the search tree well balanced
 (it is even possible that the ordering by frequency is slightly violated
  for the sake of balance).
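One way to produce such an ordering is sketched below. It assumes that within a group of
equally frequent words the median word (alphabetically) is written first, followed recursively
by the medians of the two halves, so that inserting the group in file order yields a balanced
subtree. This heuristic and the names \verb/struct entry/ and \verb/twl_order/ are illustrative
assumptions; the rule actually used for {\tt .twl} files is not specified here.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

struct entry {
    const char *word;
    unsigned long freq;
};

/* Descending frequency; equal frequencies in alphabetical order. */
static int by_freq_then_word(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    if (x->freq != y->freq)
        return x->freq < y->freq ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Copy src[lo..hi-1] to dst midpoint first, then each half recursively,
   so every word comes after its ancestors in a balanced tree. */
static void midpoint_order(const struct entry *src, size_t lo, size_t hi,
                           struct entry *dst, size_t *pos)
{
    if (lo >= hi)
        return;
    size_t mid = lo + (hi - lo) / 2;
    dst[(*pos)++] = src[mid];
    midpoint_order(src, lo, mid, dst, pos);
    midpoint_order(src, mid + 1, hi, dst, pos);
}

/* Rearrange e[0..n-1] into the assumed file order.
   Returns 0 on success, -1 if memory runs out. */
int twl_order(struct entry *e, size_t n)
{
    if (n == 0)
        return 0;
    struct entry *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    qsort(e, n, sizeof *e, by_freq_then_word);
    size_t pos = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && e[j].freq == e[i].freq)   /* one frequency group */
            j++;
        midpoint_order(e, i, j, tmp, &pos);
        i = j;
    }
    memcpy(e, tmp, n * sizeof *e);
    free(tmp);
    return 0;
}
\end{verbatim}

For example, {\tt the} (9), {\tt a}, {\tt b}, {\tt c} (5 each) and {\tt x} (1) end up in
the order {\tt the}, {\tt b}, {\tt a}, {\tt c}, {\tt x}.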

To keep the standard word lists in a proper state,
 \TeXortho{} should normally be used in the following steps:

\begin{enumerate}
\item Process your file (in the accompanying example it will be called
	{\tt example.tex}) with the standard word list and info file.\\
	\verb/TeXortho example.tex -c english.twl -i english.inf/ \\
	This will produce a word list {\tt example.twl} which contains
	all the words in your file which are not in the standard word list.

\item Edit the word list {\tt example.twl} and delete all words that are
	misspelled or too unusual to be put into the standard library.

\item Process your file with both the standard word list and your new list of 
	correct words:\\
	\verb/TeXortho example.tex -c english.twl example.twl -i english.inf -n/ \\
	This will show you most of the errors in your file, which you should
	correct now.

\item Process your corrected file with both the standard and the new word list
	in order to produce a list of the words which occur in your text
	but should not become part of the standard library:\\
	\verb/TeXortho example.tex -c english.twl example.twl -i english.inf -l spurious.twl/ \\
	This produces a word list {\tt spurious.twl} with all the unwanted words.

\item Process your file with the word list {\tt spurious.twl} to get a word
	list with the common words of your file and correct information
	about their frequency:\\
	\verb/TeXortho example.tex -c spurious.twl -i english.inf/ \\
	This will produce a word list {\tt example.twl} containing all
	 words of your file that are not in {\tt spurious.twl}.

\item Merge the information of your new word list {\tt example.twl} with the 
	standard word list:\\
	\verb/TeXortho english.twl example.twl -m/ \\
	This will merge the two libraries into a new version of the standard
	library,
	containing both the new words and the updated frequency information.
\end{enumerate}
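Step~6 amounts to combining two lists of word/frequency pairs. The C sketch below works on a
hypothetical in-memory representation (\verb/struct twl_entry/ and \verb/merge_lists/ are made-up
names and say nothing about the {\tt .twl} file format): words present in both lists get the sum
of their frequencies. The result would afterwards still have to be reordered by frequency, as
described for the {\tt .twl} files above.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one word list entry. */
struct twl_entry {
    char word[64];
    unsigned long freq;
};

static int by_word(const void *a, const void *b)
{
    return strcmp(((const struct twl_entry *)a)->word,
                  ((const struct twl_entry *)b)->word);
}

/* Merge lists a and b (each without duplicates) into out, which must
   have room for na + nb entries. Both inputs are sorted alphabetically
   in place; a word occurring in both lists gets the sum of the two
   frequencies. Returns the number of entries written to out. */
size_t merge_lists(struct twl_entry *a, size_t na,
                   struct twl_entry *b, size_t nb,
                   struct twl_entry *out)
{
    size_t i = 0, j = 0, k = 0;
    if (na > 0)
        qsort(a, na, sizeof *a, by_word);
    if (nb > 0)
        qsort(b, nb, sizeof *b, by_word);
    while (i < na || j < nb) {
        int c = (i == na) ? 1 : (j == nb) ? -1 : strcmp(a[i].word, b[j].word);
        if (c < 0) {
            out[k++] = a[i++];
        } else if (c > 0) {
            out[k++] = b[j++];
        } else {                        /* same word in both lists */
            out[k] = a[i++];
            out[k++].freq += b[j++].freq;
        }
    }
    return k;
}
\end{verbatim}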


\section{Usage and Options}

\TeXortho{} is invoked with both parameters and options;
 the parameters are the names of the files to be processed or produced.
\TeXortho{} takes any number of file names as parameters;
 their meaning depends on the option \verb/-c/ or \verb/-m/.
With \verb/-m/ you merge word lists, so all file names not preceded by
 one of the options \verb/-i/, \verb/-l/ or \verb/-o/ are interpreted as
 word lists to be merged,
 while \verb/-c/ tells \TeXortho{} to check the \TeX{}t named by the first
 file name against the word lists contained in the other files.

Some files of special importance are named after one of the options
 \verb/-i/, \verb/-l/ and \verb/-o/: each of these options expects a
 file name, separated from it by a blank.
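The following C sketch shows one way such a command line could be parsed. Only the option
letters are taken from this manual; \verb/struct options/, \verb/parse_args/ and the limit
\verb/MAX_FILES/ are hypothetical, and the real program may treat corner cases differently.

\begin{verbatim}
#include <stddef.h>

#define MAX_FILES 32

/* Hypothetical option record; the letters are those of this manual. */
struct options {
    char mode;                      /* 'c' = check, 'm' = merge */
    const char *info;               /* -i */
    const char *list;               /* -l, NULL means jobname.twl */
    const char *log;                /* -o, NULL means stdout */
    const char *files[MAX_FILES];   /* all other file names */
    int nfiles;
    int case_sensitive;             /* -C */
    int no_punct, no_first;         /* -p, -P */
    int no_list, histogram;         /* -n, -h */
};

/* Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, struct options *o)
{
    *o = (struct options){0};
    o->info = "TeXortho.inf";                 /* documented default */
    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (a[0] == '-' && a[1] != '\0' && a[2] == '\0') {
            const char **target = NULL;
            switch (a[1]) {
            case 'c': case 'm':
                if (o->mode != 0 && o->mode != a[1])
                    return -1;                /* -c and -m exclude each other */
                o->mode = a[1];
                break;
            case 'i': target = &o->info; break;
            case 'l': target = &o->list; break;
            case 'o': target = &o->log;  break;
            case 'C': o->case_sensitive = 1; break;
            case 'p': o->no_punct = 1; break;
            case 'P': o->no_first = 1; break;
            case 'n': o->no_list = 1; break;
            case 'h': o->histogram = 1; break;
            default: return -1;
            }
            if (target != NULL) {             /* next argument is a file name */
                if (++i >= argc)
                    return -1;
                *target = argv[i];
            }
        } else if (o->nfiles < MAX_FILES) {
            o->files[o->nfiles++] = a;
        } else {
            return -1;
        }
    }
    return o->mode != 0 ? 0 : -1;             /* -c or -m is required */
}
\end{verbatim}

Called with the arguments of step~3 above, the sketch stores three file names, the first of
which ({\tt example.tex}) is the text to be checked.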


\subsection{{\protect \tt -m} merge word lists}
This option merges word lists. You must use either \verb/-m/ or \verb/-c/.

\subsection{{\protect \tt -c} check text}
The first file name is taken as the input file to be checked; all other
file names are word lists.

\subsection{{\protect \tt -i} info file name}
This option must be followed by an info file name. The default is \verb/TeXortho.inf/ (note the uppercase letters: file names are case sensitive on Unix systems).

\subsection{{\protect \tt -l} output word list name}
This option must be followed by the name of the new word list,
 separated by a blank.
Use this option if you don't like {\tt jobname.twl} as output file name,
 which is the default;
 jobname is the name of the first file specified,
 without extension. 
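The default output name can be derived as in the following C sketch. It assumes that only the
last extension is removed and that any directory part of the name is kept;
\verb/default_list_name/ is a hypothetical name.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Write "<first file name without extension>.twl" to out. Only the
   last extension is removed, and a dot inside a directory name is
   ignored. Returns 0 on success, -1 if out is too small. */
int default_list_name(const char *first, char *out, size_t size)
{
    const char *slash = strrchr(first, '/');
    const char *dot = strrchr(first, '.');
    size_t len = strlen(first);

    if (dot != NULL && (slash == NULL || dot > slash))
        len = (size_t)(dot - first);          /* cut off the extension */
    int n = snprintf(out, size, "%.*s.twl", (int)len, first);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
\end{verbatim}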

\subsection{{\protect \tt -o} log file name}
This option must be followed by the name of a log file,
 which will contain all the error messages produced by \TeXortho.
By default all output goes to stdout.
On Unix systems you can use this option to suppress the error messages by giving \verb?/dev/null? as the log file.

\subsection{{\protect \tt -C} case sensitive search}
By default \TeXortho{} converts all the words found in a \TeX{}t to lowercase
 before looking them up in the word list;
 this option turns that conversion off.
It does {\em not} influence the check for capital letters at the beginning of
 new sentences.

If you want case-sensitive checking,
 make sure your word list is suitable for it, i.e.\ that it contains the words
 with their correct capitalization.
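The effect of \verb/-C/ on the lookup key can be sketched in C as below; the function
\verb/lookup_key/ is hypothetical. In the default C locale \verb/tolower/ changes only the ASCII
letters A to Z, which fits texts where umlauts are written as \TeX{} commands such as \verb/\"a/.

\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

/* Build the key under which word is looked up: lowercased by default,
   unchanged when case_sensitive is set (the -C option). key must have
   room for strlen(word) + 1 characters. The capital-letter check at
   sentence beginnings would still look at the original word. */
void lookup_key(const char *word, char *key, int case_sensitive)
{
    size_t i;
    for (i = 0; word[i] != '\0'; i++)
        key[i] = case_sensitive ? word[i]
                                : (char)tolower((unsigned char)word[i]);
    key[i] = '\0';
}
\end{verbatim}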

\subsection{{\protect \tt -p} suppress punctuation check}
This option suppresses the check for correct white space around punctuation
characters.

\subsection{{\protect \tt -P} suppress capital-letter check}
This option suppresses the check for capital letters at the beginning of sentences.

\subsection{{\protect \tt -n} don't write output word list}
This suppresses the generation of the output word list.

\subsection{{\protect \tt -h} generate word list histogram}
A file called \verb/libhist/ is generated,
 containing some information about the generated word list,
 namely the depth of the deepest branch of the binary search tree,
 and for each level of that tree the number of nodes on it,
 the number of nodes weighted by the frequency of occurrence of their words,
 the number of branches ending there,
 the ratio between the capacity of that level and its actual filling,
 and the mean frequency of the words on that level.
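These statistics can be gathered in one traversal of the tree. In the C sketch below (names and
output layout hypothetical) a branch is taken to end on a level when the node there has no
children, and the fill ratio is computed as the number of nodes divided by the $2^\ell$ positions
available on level~$\ell$ (the root being level~0); both are interpretations, not documented
behavior.

\begin{verbatim}
#include <stdio.h>

#define MAX_LEVELS 64

/* Hypothetical minimal node; only the frequency matters here. */
struct hnode {
    unsigned long freq;
    struct hnode *left, *right;
};

struct level_stats {
    unsigned long nodes;      /* nodes on this level */
    unsigned long weighted;   /* sum of their frequencies */
    unsigned long leaves;     /* branches ending here (no children) */
};

/* Accumulate per-level counts. Called with level 0, it returns the
   depth of the tree (number of levels; 0 for an empty tree). */
int histogram_walk(const struct hnode *t, int level, struct level_stats *s)
{
    if (t == NULL || level >= MAX_LEVELS)
        return level;
    s[level].nodes++;
    s[level].weighted += t->freq;
    if (t->left == NULL && t->right == NULL)
        s[level].leaves++;
    int l = histogram_walk(t->left, level + 1, s);
    int r = histogram_walk(t->right, level + 1, s);
    return l > r ? l : r;
}

/* Print one line per level: level, nodes, weighted nodes, leaves,
   fill ratio (nodes / 2^level) and mean frequency. Returns the depth. */
int write_histogram(const struct hnode *root, FILE *out)
{
    struct level_stats s[MAX_LEVELS] = {{0}};
    int depth = histogram_walk(root, 0, s);
    double capacity = 1.0;                    /* 2^level positions */

    fprintf(out, "depth %d\n", depth);
    for (int i = 0; i < depth; i++, capacity *= 2.0)
        fprintf(out, "%d %lu %lu %lu %.3f %.2f\n", i, s[i].nodes,
                s[i].weighted, s[i].leaves, s[i].nodes / capacity,
                s[i].nodes ? (double)s[i].weighted / s[i].nodes : 0.0);
    return depth;
}
\end{verbatim}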

\section{Info file}
\subsection{Overview}
\subsection{Escape code keywords}
The following keywords affect the handling of \TeX{} catcodes.
Normally you won't need them, because the \TeX{} defaults are already set.
 If you use them,
\TeXortho{} will assume that your \TeX{} version handles them the same way.
If you don't know anything about catcodes, read {\em The \TeX book} before using these
commands. You can assign values as follows:
\begin{verbatim}
<keyword>	<character>
\end{verbatim}
for instance, 
\begin{verbatim}
begingroup	[
\end{verbatim}
will force \TeXortho{} to treat {\tt [} as a begin-group character.
(The defaults are {\tt \{} for {\tt begingroup} and {\tt \}} for {\tt endgroup}.)

The keyword {\tt active} is special, since it has a different syntax.
Settings should be made as follows:
\begin{verbatim}
active		<char><number of possible parameters>
<first parameter>	<expansion>
. . .	
<last parameter>	<expansion>
\end{verbatim}
The usual setting for the German document style serves as an example:
\begin{verbatim}
active	"12
a	\"a
o	\"o
u	\"u
A	\"A
O	\"O
U	\"U
s	\ss@
`	\glqq@
'	\grqq@
-	\-@
<	\flqq@
>	\frqq@
\end{verbatim}
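Assuming that each active character followed by one of the listed characters is simply replaced
by its expansion before words are assembled, the effect can be sketched in C as follows. The
sketch copies the expansions verbatim, including the trailing {\tt @}, because this manual does
not say how that marker is processed further; \verb/struct active_def/, \verb/german/ and
\verb/expand_active/ are hypothetical names.

\begin{verbatim}
#include <string.h>

/* Hypothetical representation of one "<parameter> <expansion>" line. */
struct active_def {
    char param;
    const char *expansion;
};

/* Part of the German table above (backslashes doubled for C). */
static const struct active_def german[] = {
    {'a', "\\\"a"}, {'o', "\\\"o"}, {'u', "\\\"u"}, {'s', "\\ss@"},
};

/* Copy src to dst, replacing `active` followed by a listed parameter
   with its expansion; anything else is copied unchanged.
   Returns 0, or -1 if dst (of the given size) is too small. */
int expand_active(const char *src, char active,
                  const struct active_def *defs, size_t ndefs,
                  char *dst, size_t size)
{
    size_t k = 0;
    while (*src != '\0') {
        const char *rep = NULL;
        if (*src == active && src[1] != '\0') {
            for (size_t i = 0; i < ndefs; i++) {
                if (defs[i].param == src[1]) {
                    rep = defs[i].expansion;
                    break;
                }
            }
        }
        if (rep != NULL) {
            size_t len = strlen(rep);
            if (k + len >= size)
                return -1;
            memcpy(dst + k, rep, len);
            k += len;
            src += 2;                   /* skip active char and parameter */
        } else {
            if (k + 1 >= size)
                return -1;
            dst[k++] = *src++;
        }
    }
    if (k >= size)
        return -1;
    dst[k] = '\0';
    return 0;
}
\end{verbatim}

With this table, \verb/Gr"o"se/ is expanded to \verb/Gr\"o\ss@e/.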

The keyword {\tt escape\_def} has a similar syntax, i.e.
\begin{verbatim}
escape_def	<number of different escaped characters>
<first char>	<expansion>
. . .
<last char>	<expansion> 
\end{verbatim}
Again, the usual setting for the German style serves as an example:
\begin{verbatim}
escape_def	3
3	\ss@
-	\-@
/	\quad@
\end{verbatim}
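Reading such a block is straightforward, since the number in its first line says how many
definition lines follow. The C sketch below parses the block from a string; \verb/struct escape_def/
and \verb/parse_escape_defs/ are hypothetical, and blank lines inside the block are assumed not
to occur.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

struct escape_def {
    char ch;                 /* character following the escape */
    char expansion[32];
};

/* Parse "escape_def <n>" followed by n lines "<char> <expansion>".
   Returns n, or -1 if the block is malformed or n exceeds max. */
int parse_escape_defs(const char *text, struct escape_def *defs, int max)
{
    int n;
    if (sscanf(text, "escape_def %d", &n) != 1 || n < 0 || n > max)
        return -1;
    const char *line = strchr(text, '\n');
    for (int i = 0; i < n; i++) {
        if (line == NULL)
            return -1;               /* fewer lines than announced */
        line++;                      /* skip the newline */
        if (sscanf(line, " %c %31s", &defs[i].ch, defs[i].expansion) != 2)
            return -1;
        line = strchr(line, '\n');
    }
    return n;
}
\end{verbatim}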


Possible keywords to assign catcodes are:
\begin{itemize}
\item {\tt escape}
\item {\tt begingroup}
\item {\tt endgroup}
\item {\tt mathshift}
\item {\tt alignmenttab}
\item {\tt endofline}
\item {\tt space}
\item {\tt letter}
\item {\tt active}
\item {\tt other}
\item {\tt ignored}
\item {\tt comment}
\end{itemize}
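Each of these keywords corresponds to one of \TeX's category codes. The C sketch below maps them
to the numbers used in {\em The \TeX book} and applies a single \verb/<keyword> <character>/ line
to a catcode table. {\tt active} is left out because of its multi-line syntax; the table and the
function \verb/parse_catcode_line/ are assumptions, not \TeXortho's code.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Keywords from the list above with TeX's category codes. */
static const struct { const char *keyword; int catcode; } keywords[] = {
    {"escape", 0}, {"begingroup", 1}, {"endgroup", 2}, {"mathshift", 3},
    {"alignmenttab", 4}, {"endofline", 5}, {"ignored", 9}, {"space", 10},
    {"letter", 11}, {"other", 12}, {"comment", 14},
};

/* Apply one "<keyword> <character>" line to catcode[]. Returns 0 on
   success, -1 for an unknown keyword or a missing character. Because
   " %c" skips white space, a blank cannot be assigned this way here. */
int parse_catcode_line(const char *line, int catcode[256])
{
    char kw[32];
    char ch;
    if (sscanf(line, "%31s %c", kw, &ch) != 2)
        return -1;
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(kw, keywords[i].keyword) == 0) {
            catcode[(unsigned char)ch] = keywords[i].catcode;
            return 0;
        }
    }
    return -1;
}
\end{verbatim}

For example, the line \verb/begingroup [/ sets the catcode of {\tt [} to~1.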

\subsection{Making special commands known to \TeXortho}
Many commands usually do not, or even cannot, occur within words (like \verb/\quad /).
Other commands make no sense on their own, like \verb/\v/. Some environments,
like {\tt equation}, are better skipped, since they do not contain words to be checked.
You can tell \TeXortho~how each command behaves:

\begin{itemize}

\item {\tt skipped\_command} says that the command following will be
  skipped over in the \TeX-file,
 typically commands such as \verb/\protect/ or \verb/\tt/.

\item {\tt punctuation} is followed by a string consisting of the punctuation
  characters; the default is \verb",.;:!?". You should not put a blank before such
  characters in ordinary text.
  \TeXortho{} will also complain if you don't leave white space after them.

\item {\tt sentenceend} is followed by a string consisting of the characters
 after which you should start with a capital letter. The default is \verb".!?".

\item {\tt text\_begingroup} is followed by a string of characters before
 which a white space should be found and after which there should be none.
 Default is \verb"(".

\item {\tt text\_endgroup} is followed by a string of characters after
 which a white space should be found and before which there should be none.
 Default is \verb")".

\item {\tt capital} is followed by a control sequence standing for one capital
 letter, e.g.\ \verb/\L@/.
 If this sequence requires a delimiter, it must be terminated by {\tt @}.
 The default is none.

\item {\tt skipped\_environment} is followed by an environment name.
 The environment will be skipped by \TeXortho. Default: none.

\item {\tt delimiter\_command} is followed by a control sequence.
 The command will then be treated as a word delimiter, like \verb/\quad/.
\item {\tt end\_of\_file} is the terminating line of the info file.
\end{itemize}
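The rules for {\tt punctuation}, {\tt sentenceend}, {\tt text\_begingroup} and
{\tt text\_endgroup} can be illustrated on plain text with the C sketch below, which uses the
default character sets quoted above. It ignores \TeX{} commands, numbers and abbreviations, and
it assumes that the parenthesis checks are switched off together with the punctuation check by
\verb/-p/; \verb/check_spacing/ is a hypothetical name.

\begin{verbatim}
#include <ctype.h>
#include <string.h>

/* Default character sets quoted above. */
static const char *punctuation = ",.;:!?";
static const char *sentenceend = ".!?";
static const char *text_begingroup = "(";
static const char *text_endgroup = ")";

static int is_in(char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}

static int blank(char c)
{
    return isspace((unsigned char)c);         /* false for '\0' */
}

/* Count complaints in plain text. check_punct is cleared by -p,
   check_first by -P. */
int check_spacing(const char *s, int check_punct, int check_first)
{
    int errors = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        char prev = i > 0 ? s[i - 1] : '\0';  /* '\0' = start of text */
        char next = s[i + 1];                 /* '\0' = end of text */

        if (check_punct && is_in(c, punctuation)) {
            if (blank(prev))
                errors++;                     /* blank before punctuation */
            if (next != '\0' && !blank(next) && !is_in(next, text_endgroup))
                errors++;                     /* no white space after it */
        }
        if (check_punct && is_in(c, text_begingroup)) {
            if (prev != '\0' && !blank(prev))
                errors++;                     /* white space needed before */
            if (blank(next))
                errors++;                     /* none allowed after */
        }
        if (check_punct && is_in(c, text_endgroup)) {
            if (blank(prev))
                errors++;                     /* none allowed before */
            if (next != '\0' && !blank(next) && !is_in(next, punctuation))
                errors++;                     /* white space needed after */
        }
        if (check_first && is_in(c, sentenceend)) {
            size_t j = i + 1;
            while (blank(s[j]))
                j++;
            if (islower((unsigned char)s[j]))
                errors++;                     /* sentence starts in lowercase */
        }
    }
    return errors;
}
\end{verbatim}

For example, \verb/Bad ,spacing/ yields two complaints: one for the blank before the comma
and one for the missing blank after it.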

\section{Known bugs and limitations}
\begin{itemize}
\item In the present version it is not possible to process more
 than one \TeX-file at once.
\item A command like \verb/\quad/ placed directly after a full stop, without a
space in front of it, causes the error message {\tt missing space after presumed end of sentence}.
\item An erroneous info file might crash the program or even the operating system,
 so set it up carefully.
\end{itemize}
\end{document}