\documentstyle{article}
\def\TeXorTho{T\kern-.1667em\lower.5ex\hbox{E}\kern-.1667emX\kern-.4667em\lower1.1667ex\hbox{or}\kern-.5334em\raise.5ex\hbox{T}\kern-.1667em\lower.5ex\hbox{ho}}
\def\TeXortho{T\kern-.1667em\lower.5ex\hbox{E}\kern-.1667emX\kern-.1667em\hbox{or}\kern-.3334em\raise.5ex\hbox{T}\kern-.2em\hbox{ho}}
\begin{document}
\section*{Correcting \TeX{}ts with \TeXorTho }

\TeXortho\ is a program designed for correcting \TeX~files, but with a suitable info file it can be adapted to almost any text file. It was written in GNU C by J\"urgen Hannappel and Eduard Werner, Bonn 1991.

{\sc Disclaimer:} You use this program at your own risk. If it crashes your system or overwrites something important on your hard disk, that is your problem (and most probably your fault).

\TeXortho~checks spelling and the correct use of white space around punctuation characters, and it produces output ready to be digested by EMACS. We have tried hard to make it as flexible as possible, so that active characters and user-defined macros can be checked as well.

\section{Concepts}
To use \TeXortho{} it is necessary to understand the basic concepts underlying its operation. \TeXortho{} parses a \TeX-file into words, which are sequences of letters separated by white space or other characters. These words are then looked up in a word list, and an error message is printed to the log file for every word that cannot be found. \TeXortho{} thus has two central parts: the parser, which splits the \TeX-file into words, and the word list, which contains all knowledge about the spelling of words.
The \TeXortho{} parser is applicable not only to texts in \TeX{} or \LaTeX{} format, but to almost any other ASCII text format. It is adapted to a particular text format by means of an {\em info-file}, which specifies which characters are allowed in words, which sequences of characters are to be treated as commands, etc.
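
The two parts described above, parser and word list, can be illustrated by a short C sketch (C being the language \TeXortho{} was written in). It is not taken from the \TeXortho{} sources: the names {\tt struct node}, {\tt tree\_insert}, {\tt tree\_find} and {\tt check\_text} are hypothetical, a word is simply assumed to be a run of ASCII letters, and control sequences are skipped. Everything the real parser learns from the info-file (catcodes, active characters, skipped environments) is ignored, so an accented letter such as \verb/\"a/ splits a word in two. Words are folded to lower case before lookup, mirroring the default behaviour that {\tt -C} switches off.

\begin{verbatim}
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORD 64   /* longer words are truncated in this sketch */

/* One word-list entry: a word and how often it has been seen. */
struct node {
    char word[MAXWORD];
    long freq;
    struct node *left, *right;
};

/* Insert word into the binary search tree (ordered by strcmp);
   if it is already present, only its frequency is increased. */
struct node *tree_insert(struct node *root, const char *word, long freq)
{
    if (root == NULL) {
        struct node *n = calloc(1, sizeof *n);
        if (n == NULL) {
            perror("calloc");
            exit(EXIT_FAILURE);
        }
        strncpy(n->word, word, MAXWORD - 1);   /* calloc zeroed the buffer */
        n->freq = freq;
        return n;
    }
    int c = strcmp(word, root->word);
    if (c < 0)
        root->left = tree_insert(root->left, word, freq);
    else if (c > 0)
        root->right = tree_insert(root->right, word, freq);
    else
        root->freq += freq;
    return root;
}

/* Return the node holding word, or NULL if it is not in the list. */
const struct node *tree_find(const struct node *root, const char *word)
{
    while (root != NULL) {
        int c = strcmp(word, root->word);
        if (c == 0)
            return root;
        root = (c < 0) ? root->left : root->right;
    }
    return NULL;
}

/* Split text into words, fold them to lower case and report every word
   missing from the list to log (if not NULL). Returns the number of
   unknown words. */
int check_text(const char *text, const struct node *list, FILE *log)
{
    char buf[MAXWORD];
    int unknown = 0;
    const char *p = text;

    while (*p != '\0') {
        if (*p == '\\') {                     /* a command, not a word */
            p++;
            if (isalpha((unsigned char)*p)) {
                while (isalpha((unsigned char)*p))
                    p++;                      /* control word, e.g. \emph */
            } else if (*p != '\0') {
                p++;                          /* control symbol, e.g. \" */
            }
            continue;
        }
        if (!isalpha((unsigned char)*p)) {    /* separator character */
            p++;
            continue;
        }
        size_t len = 0;
        while (isalpha((unsigned char)*p)) {  /* collect one word */
            if (len < MAXWORD - 1)
                buf[len++] = (char)tolower((unsigned char)*p);
            p++;
        }
        buf[len] = '\0';
        if (tree_find(list, buf) == NULL) {
            if (log != NULL)
                fprintf(log, "unknown word: %s\n", buf);
            unknown++;
        }
    }
    return unknown;
}
\end{verbatim}

If a word list is inserted with {\tt tree\_insert} in descending order of frequency, the frequent words end up close to the root, which is the ordering the {\tt .twl} files are meant to provide.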
The info-file is discussed in a later section; a full understanding of it requires some knowledge of \TeX{} internals such as category codes (catcodes) and active characters. Fortunately, you will probably not have to alter the info-file at all; at most you may have to add the definitions of a few rarely occurring commands to it.

Currently the word list is organized as a binary search tree, with the most frequent words close to the root, so that they can be found as fast as possible. To ensure the optimal ordering of words it is necessary to keep track of every occurrence of every word in as many texts as possible. This means that every time you have produced a text you should check it with \TeXortho{}, add the new correctly spelled words to the standard word list and update its word frequency information.
The word lists are kept in files with the default extension {\tt .twl} (which stands for {\tt t}ex {\tt w}ord {\tt l}ist). They are (or should be) sorted so as to yield an optimal search tree: by frequency of occurrence in descending order, with groups of equally frequent words arranged so that the tree stays well balanced (for the sake of balance, the ordering by frequency may even be violated slightly).

To keep the standard word lists in a proper state, a typical \TeXortho{} session should consist of the following steps:
\begin{enumerate}
\item Process your file (in the accompanying example it is called {\tt example.tex}) with the standard word list and info file:\\
\verb/TeXortho example.tex -c english.twl -i english.inf/ \\
This produces a word list {\tt example.twl} containing all the words in your file that are not in the standard word list.
\item Edit the word list {\tt example.twl} and delete all words that are misspelled or too unusual to be put into the standard library.
\item Process your file with both the standard word list and your new list of correct words:\\
\verb/TeXortho example.tex -c english.twl example.twl -i english.inf -n/ \\
This will show you most errors in your file, which you should now correct.
\item Process your corrected file with both the standard and the new word list in order to produce a list of the words which occur in your text but should not become part of the standard library:\\
\verb/TeXortho example.tex -c english.twl example.twl -i english.inf -l spurious.twl/ \\
This produces a word list {\tt spurious.twl} containing all the unwanted words.
\item Process your file with the word list {\tt spurious.twl} to get a word list with the remaining (wanted) words of your file and correct information about their frequency:\\
\verb/TeXortho example.tex -c spurious.twl -i english.inf/ \\
This will produce a word list {\tt example.twl} consisting of all words in your file except those listed in {\tt spurious.twl}.
\item Merge your new word list {\tt example.twl} into the standard word list:\\
\verb/TeXortho english.twl example.twl -m/ \\
This merges the two libraries into a new version of the standard library, containing both the new words and the updated frequency information.
\end{enumerate}

\section{Usage and Options}
\TeXortho{} is invoked with parameters and options; the parameters are the names of the files to be processed or produced. \TeXortho{} takes any number of filenames as parameters, and their meaning depends on whether the option \verb/-c/ or \verb/-m/ is given. With \verb/-m/ you merge word lists, so all filenames not preceded by one of the options \verb/-i/, \verb/-l/ or \verb/-o/ are interpreted as word lists to be merged. With \verb/-c/, \TeXortho{} checks the \TeX{}t named by the first filename against the word lists contained in the other files.
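
Merging with \verb/-m/ combines the frequency information of several word lists. The following C sketch shows one way to do this in memory, under assumptions of its own: the {\tt .twl} file format is not described in this manual, so the lists are plain arrays of word/frequency pairs, and {\tt struct entry} and {\tt merge\_lists} are hypothetical names. Frequencies of words present in both lists are added, and the result is sorted by descending frequency with ties broken alphabetically; the balancing of equally frequent words described earlier is not attempted.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one word-list entry. */
struct entry {
    const char *word;
    long freq;
};

/* qsort comparator: descending frequency, equal frequencies alphabetically. */
static int by_freq_desc(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;
    if (x->freq != y->freq)
        return (x->freq < y->freq) ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Merge lists a (na entries) and b (nb entries) into out, which must have
   room for na + nb entries. Each input list is assumed to contain every
   word at most once. Returns the number of entries written. */
size_t merge_lists(const struct entry *a, size_t na,
                   const struct entry *b, size_t nb, struct entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < na; i++)
        out[n++] = a[i];
    for (size_t j = 0; j < nb; j++) {
        size_t k = 0;
        while (k < n && strcmp(out[k].word, b[j].word) != 0)
            k++;                          /* linear search keeps it short */
        if (k < n)
            out[k].freq += b[j].freq;     /* known word: add frequencies */
        else
            out[n++] = b[j];              /* new word: append it */
    }
    qsort(out, n, sizeof *out, by_freq_desc);
    return n;
}
\end{verbatim}

The linear search is only there to keep the sketch short; for word lists of realistic size the words would be looked up in a search tree instead.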
Files with a special role are given after one of the options \verb/-i/, \verb/-l/ and \verb/-o/: each of these options expects a filename, separated from it by a blank.
\subsection{{\protect \tt -m} merge word lists}
This option merges word lists. You must use either \verb/-m/ or \verb/-c/.
\subsection{{\protect \tt -c} check text}
The first file name is taken as the input file to be checked; all other files are word lists.
\subsection{{\protect \tt -i} info file name}
This option must be followed by an info file name. The default is \verb/TeXortho.inf/ (mind the upper-case letters on Unix systems!).
\subsection{{\protect \tt -l} output word list name}
This option must be followed by the name of the new word list, separated by a blank. Use it if you do not want the default output file name {\tt jobname.twl}, where {\tt jobname} is the name of the first file specified, without extension.
\subsection{{\protect \tt -o} log file name}
This option must be followed by the name of a log file, which will contain all the error messages produced by \TeXortho. By default all output goes to stdout. On Unix systems you can suppress the error messages by using \verb?/dev/null? as the log file.
\subsection{{\protect \tt -C} case sensitive search}
By default \TeXortho{} converts all the words found in a \TeX{}t to lower case before looking them up in the word list; this option suppresses the conversion. It does {\em not} influence the check for capital letters at the beginnings of sentences. If you want to check case-sensitively, make sure your word list is suitable for this.
\subsection{{\protect \tt -p} suppress punctuation check}
This option suppresses the check for correct white space around punctuation characters.
\subsection{{\protect \tt -P} suppress firstcheck}
This option suppresses the check for capital letters at the beginnings of sentences.
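
The two checks that {\tt -p} and {\tt -P} switch off can be sketched in C as follows. The sketch assumes the default character sets listed for the info-file keywords {\tt punctuation} and {\tt sentenceend} (\verb/,.;:!?/ and \verb/.!?/) and plain text without commands; the function {\tt check\_layout} and its messages are hypothetical and do not reproduce \TeXortho{}'s actual output. A punctuation character directly followed by another one (as in \verb/?!/) is accepted.

\begin{verbatim}
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Defaults given for the info-file keywords punctuation and sentenceend. */
static const char *punctuation = ",.;:!?";
static const char *sentenceend = ".!?";

/* Nonzero if c occurs in the set s ('\0' never does). */
static int in_set(char c, const char *s)
{
    return c != '\0' && strchr(s, c) != NULL;
}

/* Check white space around punctuation (check_punct) and capital letters
   after sentence ends (check_first); passing 0 mimics -p or -P.
   Complaints go to log if it is not NULL; returns their number. */
int check_layout(const char *t, int check_punct, int check_first, FILE *log)
{
    int errors = 0;

    for (size_t i = 0; t[i] != '\0'; i++) {
        if (!in_set(t[i], punctuation))
            continue;
        if (check_punct && i > 0 && isspace((unsigned char)t[i - 1])) {
            if (log != NULL)
                fprintf(log, "%zu: blank before '%c'\n", i, t[i]);
            errors++;
        }
        char next = t[i + 1];
        if (check_punct && next != '\0' && !isspace((unsigned char)next)
                && !in_set(next, punctuation)) {
            if (log != NULL)
                fprintf(log, "%zu: missing space after '%c'\n", i, t[i]);
            errors++;
        }
        if (check_first && in_set(t[i], sentenceend)) {
            size_t j = i + 1;
            while (isspace((unsigned char)t[j]))
                j++;
            if (islower((unsigned char)t[j])) {
                if (log != NULL)
                    fprintf(log, "%zu: lower-case letter starts a sentence\n", j);
                errors++;
            }
        }
    }
    return errors;
}
\end{verbatim}

For example, \verb|check_layout("End. next", 1, 1, stdout)| reports one complaint (the lower-case {\tt n}), and none if the third argument is 0, as with {\tt -P}.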
\subsection{{\protect \tt -n} don't write output word list}
This suppresses the generation of the output word list.
\subsection{{\protect \tt -h} generate word list histogram}
A file called \verb/libhist/ is generated, containing some information about the generated word list: the depth of the deepest branch of the binary search tree and, for each level of that tree, the number of nodes in it, the number of nodes weighted by the frequency of occurrence of their words, the number of branches ending there, the ratio between capacity and filling of that level, and the mean frequency of the words on that level.
\section{Info file}
\subsection{Overview}
\subsection{Escape code keywords}
The following keywords affect the handling of \TeX{} catcodes. Normally you will not need them, because the \TeX{} defaults are already set. If you use them, \TeXortho{} assumes that your \TeX{} version treats the characters in the same way. If you know nothing about catcodes, read {\em The \TeX book} before using these keywords. You can assign values as follows:
\begin{verbatim}
<keyword> <character>
\end{verbatim}
for instance,
\begin{verbatim}
begingroup [
\end{verbatim}
will force \TeXortho{} to treat a {\tt [} as a begin-group character. (This option defaults to {\tt \{ \} }.)
The keyword {\tt active} has a different syntax; settings are made as follows:
\begin{verbatim}
active . . .
\end{verbatim}
The usual setting for the german document style serves as an example:
\begin{verbatim}
active "12 a \"a o \"o u \"u A \"A O \"O U \"U s \ss@ ` \glqq@ ' \grqq@ - \-@ < \flqq@ > \frqq@
\end{verbatim}
The keyword {\tt escape\_def} has a similar syntax:
\begin{verbatim}
escape_def . . .
\end{verbatim}
Again, the usual setting for the german style serves as an example:
\begin{verbatim}
escape_def 3 3 \ss@ - \-@ / \quad@
\end{verbatim}
Possible keywords to assign catcodes are:
\begin{itemize}
\item {\tt escape}
\item {\tt begingroup}
\item {\tt endgroup}
\item {\tt mathshift}
\item {\tt alignmenttab}
\item {\tt endofline}
\item {\tt space}
\item {\tt letter}
\item {\tt active}
\item {\tt other}
\item {\tt ignored}
\item {\tt comment}
\end{itemize}
\subsection{Making special commands known to \TeXortho}
Many commands usually do not, or even cannot, occur within words (like \verb/\quad /). Other commands, like \verb/\v/, make no sense in isolation. Some environments, like {\tt equation}, are better skipped, since they contain no words to be checked. You can tell \TeXortho{} how each command behaves:
\begin{itemize}
\item {\tt skipped\_command} says that the following command is skipped over in the \TeX-file; typical examples are \verb/\protect/ and \verb/\tt/.
\item {\tt punctuation} is followed by a string consisting of the punctuation characters; the default is \verb",.;:!?". You should not put a blank before such characters in ordinary text, and \TeXortho{} will also complain if you do not leave white space after them.
\item {\tt sentenceend} is followed by a string consisting of the characters after which a sentence should start with a capital letter. The default is \verb".!?".
\item {\tt text\_begingroup} is followed by a string of characters before which there should be white space and after which there should be none. The default is \verb"(".
\item {\tt text\_endgroup} is followed by a string of characters after which there should be white space and before which there should be none. The default is \verb")".
\item {\tt capital} is followed by a control sequence standing for a single capital letter, e.g.\ \verb/\L@/. If this sequence requires a delimiter, it must be terminated by {\tt @}. The default is none.
\item {\tt skipped\_environment} is followed by an environment name. The environment will be skipped by \TeXortho. The default is none.
\item {\tt delimiter\_command} is followed by a control sequence. The command will then be taken as a word delimiter, like \verb/\quad/.
\item {\tt end\_of\_file} is the terminating line of the info file.
\end{itemize}
\section{Known bugs and limitations}
\begin{itemize}
\item In the present version it is not possible to process more than one \TeX-file at once.
\item After a full stop, a command like \verb/\quad/ with no space in front of it causes the error message {\tt missing space after presumed end of sentence}.
\item An erroneous info file might crash the program or even the operating system, so set it up carefully.
\end{itemize}
\end{document}