% Things to take account of present location
% (can all be deleted in favour of the correct bundle of packages, and
% whatever, for journal)
%
\renewcommand\textfraction{0.25}
%
\InputIfFileExists{fontuse.cfg}{\typeout{Using config file fontuse.cfg}}{}
%
% Things this paper definitely needs
%\usepackage{xspace}
\newcommand\ftpcmd{\textsf{ftp}\xspace}
\DeclareRobustCommand\cs[1]{\texttt{\char`\\#1}}
%
\hyphenation{gobble-de-gook}
%
\title{\CTAN{} past and present\Dash what next?}
\author{Robin Fairbairns}
%\netaddress{rf@cl.cam.ac.uk}
\begin{Article}
\section{Introduction}

The Comprehensive \TeX{} Archive Network (\CTAN{}) is a set of
`loosely consistent' archives of \TeX{} material which (together with
a large set of sites that mirror them) provide a single (logical)
point for people to acquire material for their \TeX-related work.

I've been a user of \CTAN{} since its inception at Aston, but since
the move of the Aston node to Cambridge I've been involved in its
management; this article derives from a talk I gave at the \UKTUG{}
meeting on `\TeX{} and the Internet'.
% As a result, when we were
% discussing who should talk on the topic at the \UKTUG{} `\TeX{} and
% the Internet' meeting, for some reason everyone assumed I would
% take the responsibility\dots

As far as \acro{UK} people are concerned, \TeX{} archiving begins
with the archive that Peter Abbott established at Aston back in the
eighties. Aston was available to people who could talk coloured-book
protocols over Janet.\footnote{Those protocols were available to
people such as myself over the public X.25 network; I worked at the
time in a small firm, and had European money to connect to the
network.} A while after the archive was established, the UK\TeX{}
mailing list was started, initially as a means of propagating
information about the content of the archive. Volume `87' number 1 of
UK\TeX{} talked about accessions to the archive in the week ending
4 September 1987.
Peter's initiative was, at heart, a practical one: connection to the
wider Internet was a major undertaking for most people in the
\acro{UK}, so it made sense to collect things available `on the
Internet' into one place for access over Janet. Peter wrote a paper
about his experiences \cite{Abbott:TB10-1-59}, which he gave at
\acro{TUG}'89 at Stanford University.

I was struck, in re-reading Peter's paper, by the manpower he had to
run the archive; the list of eight people reads like one of the great
and the good of \TeX{} in the \acro{UK}.\footnote{Those on that list
remark wryly about numbers of cooks and quality of broth\dots}

Peter found at Stanford that the Americans were wildly jealous of his
achievement; at the time, whoever felt like it would establish a
directory or two on his or her site's \ftpcmd server to promulgate
their own favourite stuff. This was all very much in the spirit of
the ``anarchistic Internet'', but it made for a terrible to-do to find
any given \TeX-related item you might be looking for.
% (Somewhere around that time, Don Hosek established the \texttt{ymir}
% archive; I don't know if it was causally related to Peter's paper,
% but it did start to offer some degree of comprehensiveness, at least
% in macros offered.)

Whatever was its real driving force, it was after the Stanford
meeting that \CTAN{} appeared. George Greenwade chaired a Technical
Working Group on archive structures, and undertook to establish the
first truly comprehensive \acro{USA}n \TeX{} archive at Sam Houston
State University (\acro{SHSU}) in Huntsville, Texas; he reported on
the matter in \cite{Greenwade:TB14-3-342}. A German node was
established at Stuttgart, and with the advent of the \acro{JIPS}
service, \acro{UK} academics started to have access to the Internet,
and the Aston archive became a \CTAN{} node too.

The tale of Aston ended in 1994, when Peter Abbott was about to take
early retirement from Aston University.
Since we could no longer guarantee that the archive would have a
protecting friend, the \UKTUG{} committee\footnote{As private
individuals: we don't consider the archive a service offered by
\UKTUG, though we claim to `support' it.} sought alternative sites
for it.

Some obvious candidates for the archive site were examined and
rejected. The \acro{HENSA} archive would want to split the holdings
between `Unix' and `\acro{PC}' sets, which do not match the structure
of \CTAN's stuff (\TeX{} is after all a portable program, \emph{par
excellence}). The national typesetting archive at Oxford couldn't
offer \acro{CPU} power and disc space in the required time frame. In
either case, management of the archive by remote `experts' could have
been problematic.

Finally, Prof.~Roger Needham kindly agreed to a proposal prepared by
Sebastian Rahtz and presented to him by Martyn Johnson, that
Cambridge should host the archive. We completed the change-over
\emph{just} before Peter retired!

\section{Archive mechanisms}

I said at the start that \CTAN{} is a weakly-consistent set of
archives. What I mean is that they are allowed to differ in the short
term, but (in principle) they will sooner or later get themselves
back into synchronisation. Weak consistency is a common thing to
find: another \TeX-related weakly consistent set is the
\textsf{Refdbms} archives, which maintain a set of bibliographic
references which can be updated anywhere with the knowledge that
sooner or later every instance will become a pukkah copy\Dash see,
for example, \cite{Golding:1994}.

\CTAN{} maintains its consistency by the `archive user' sending email
messages to the archive users at other archives when something new is
installed, or something is replaced, moved or
deleted.\footnote{While this isn't as eccentric a proceeding as it
would have been in the days when I first used email, one must admit
it lacks a certain~\dots\ fundamental sense.
I am reminded of the Dilbert cartoon\Dash which I failed to save from
the Web\Dash where the idiot manager suggests maintaining a database
by email, and Dilbert and Co.~fall about at his stupidity.} The
receiving archive user checks that the message is from a `known'
source, parses it, and translates it into a one-shot mirror
operation. The \textsf{Perl} scripts to do all of this are maintained
by Rainer Sch\"opf.

That procedure deals with the `installations': when someone has
submitted something to the \verb|/incoming| directory of one of the
archives, or when one of us has something to install (for example,
all the updates to \LaTeXe{} itself are installed by Rainer in this
way).

There are still archives of the `old' sort: places where people make
their own bits and pieces available for general access, but which
don't offer a comprehensive collection. An example is Knuth's (and
Tom Rokicki's, among others) stuff, kept on
\url{labrea.stanford.edu}; there are many others, and there are many
things that are kept in an archive related to another matter\Dash for
example, the N\TeX{} implementation for Linux lives on the general
Linux archive at \url{sunsite.unc.edu}.

For all these things, each of the \CTAN{} nodes runs a considerable
mirroring operation every night. Managing that mirror operation, and
dealing with the installations and miscellaneous queries, takes a
considerable portion of a person's time. For the three \CTAN{} nodes
at \url|ftp.tex.ac.uk|, \url|ftp.shsu.edu| and \url|ftp.dante.de|,
there are three people to deal with all the work (the nominal manager
of \url|ftp.shsu.edu|, George Greenwade, has taken no part in the
work for more than a year).
If the eight people running Aston at the start were spoiling a broth,
I think it's reasonable to claim that three people for three archives
are perhaps a little thinly spread (despite excellent systems support
from Martyn Johnson at Cambridge, and a number of back-up workers
helping Rainer at \url|ftp.dante.de|).

\section{The archives from the users' viewpoint}

I persist in viewing the archives primarily as sources of stuff to
get by \ftpcmd, but the evidence suggests that they are as often as
not accessed via \textsf{Web} browsers.

As anonymous \ftpcmd archives, \CTAN{} nodes offer exactly the same
structure. A rather deep tree of directories is accessed via the root
\url|/tex-archive/|, and the structure of that tree is the same at
all nodes. If a location on \CTAN{} is quoted, such as
\url|macros/eplain|, the common root is assumed.

\CTAN{} stores very large numbers of individual files (for example,
macro packages), which is good for those who want to browse, but bad
for retrieving the files. Therefore, \CTAN{} provides means to
compress entire directories (or even directory trees) on-the-fly.
Suppose, for example, I want to acquire the latest version of
\LaTeX{} for my \acro{DOS} machine; I will (within \ftpcmd) change
directory to \url|tex-archive/macros/latex|, and then
\verb|get base.zip|. There's no file called \url|base.zip|, but there
is a directory \url|base|: \CTAN{} will make a \textsf{zip} archive
of the directory `on the fly', and return that. The archive can also
make \url|.tar.gz| and \url|.tar.Z| archives, but nothing
specifically targeted at Mac users.
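
The `on the fly' behaviour just described can be sketched in a few
lines of Python. This is only an illustration of the idea, not the
code the \CTAN{} servers run: the function name \verb|fetch| and its
arguments are invented here, only the \textsf{zip} case is handled,
and a real server would also have to guard against requests that
climb out of the archive root.

```python
import io
import zipfile
from pathlib import Path
from typing import Optional


def fetch(root: Path, requested: str) -> Optional[bytes]:
    """Return the bytes for `requested`, relative to `root`.

    Hypothetical helper: if no such file exists but the name ends in
    ".zip" and a directory of the same stem does exist, the directory
    is zipped in memory and the archive is returned instead.
    """
    target = root / requested
    if target.is_file():
        return target.read_bytes()  # an ordinary file: serve it as it is

    if requested.endswith(".zip"):
        directory = root / requested[: -len(".zip")]
        if directory.is_dir():
            buffer = io.BytesIO()
            with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
                for path in sorted(directory.rglob("*")):
                    if path.is_file():
                        # Keep the directory name in the stored path,
                        # e.g. base/latex.ltx rather than latex.ltx.
                        arcname = path.relative_to(directory.parent).as_posix()
                        archive.write(path, arcname)
            return buffer.getvalue()

    return None  # neither a file nor a directory we can pack
```

Called as \verb|fetch(Path("tex-archive/macros/latex"), "base.zip")|,
the sketch returns a \textsf{zip} archive of the \url|base| directory
even though no file of that name exists. Because the whole archive is
built in memory, it also suggests why asking for a very large tree in
one go is unkind to the server.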
It's worth noting that the most frequent cause of connections to
\CTAN{} failing is a user's optimism about how much he can pull in
one go; a common one is to try and pull all of the \verb|latex| tree,
which gives you all the \verb|packages| and (huge) \verb|contrib|
sub-trees, as well as \verb|doc| and \verb|unpacked| directories
(which in a \verb|.zip| archive simply repeat some or all of the
\verb|base| directory).

As the material on \CTAN{} gradually approaches 2~Gbytes, how is the
poor user to find her way around? Ideally, one would like some kind
of advanced indexing software, but even within the rather restricted
compass of an \ftpcmd connection \CTAN{} can help. Each night, the
archive examines its own navel: it produces sorted lists of all its
files, and stores them in the archive itself; very often, one can
gain an adequate clue to the location of a file by use of the command
\texttt{quote site index} at the \ftpcmd prompt.\footnote{Note that
some \ftpcmd clients do not require the \texttt{quote}; check with
the documentation.} While a tool of more expressive power would be
nice, the clever use of the arcane `regular expressions' employed can
find things with some precision (see question~23 in \BV~5.6).

Other means of accessing the files are:
\begin{itemize}
\item via \acro{NFS} (fine for people with lots of bandwidth to the
  archive machine, such as sites on the SuperJanet backbone, but not
  otherwise terribly practical)
\item via \textsf{gopher}; this is an entirely automatic
  mechanism\Dash we've devoted little effort to it in Cambridge, and
  the evidence is that it's little used
\item via mail: both \texttt{dante} and \texttt{shsu} offer an
  \textsf{ftpmail} interface; mail a message containing just
  `\texttt{help}' to \texttt{ftpmail@dante.de} (or at \texttt{shsu})
  for details.
\end{itemize}
And there's the Web\dots

\begin{figure*}[tp]
  \leavevmode
  \centerline{\includegraphics[scale=0.75]{texpkgs-bit.eps}}
  \caption{The start of Graham Williams' Web page}
  \label{fig:williams-web}
\end{figure*}

\section{Access by \acro{WWW} interfaces}

The really big expansion of the media hype about the Internet has
coincided with the explosion of the Web into people's consciousness.
The Web is indeed a fine mechanism, particularly for those with lots
of bandwidth, a big screen, and strong wrists and fingers, but it's
nothing without information providers. Sadly, providing information
in a useful and attractive form proves actually to be rather tricky;
the manpower required to do it is not easily available to those of us
running \CTAN, so we tend to rely on people outside our numbers.

Norm Walsh \cite{Walsh:TB15-3-339} developed a mechanism that permits
access to the `normal' searching facilities of \acro{CTAN}. People
(particularly in the USA) speak well of it, but I've never had
success with it (there's \emph{never} enough bandwidth across the
Atlantic). The \acro{URL} is \url{http://jasper.ora.com/ctan.html},
and it has a series of menus for accessing the archive, as well as
some searching mechanisms. The information content is derived
automatically from directory listings of \acro{SHSU}, so that (beyond
the considerable effort of setting the thing up) there's little
day-to-day work involved.

% *NOTE* The one instance of `---' instead of \Dash in this
% paragraph comes from visual editing ... if the para gets
% reformatted, it may need changing back to \Dash
Another interesting mechanism is the \AllTeX{} Navigator, which
appears in three languages (French, English\Dash
\url{http://www.loria.fr/tex/english/index.html} --- and German; the
German portion is said not to be up to date). The range of
information stored is enormous (it must represent a massive
investment of effort); it's well worth a visit just to browse.
One of its services is a \acro{CGI}-script that searches the
archives; this is a better interface than is
\texttt{quote site index}, in that one can scroll or search through
the information returned, but it's still not terribly informative.

More promising is Graham Williams'
\url{ftp://cbr.dit.csiro.au/staff/gjw/www/texpkgs.html}; he is
undertaking to index all the macro packages on \CTAN{} to a fairly
impressive level of detail (somewhat like David Jones' pre-\CTAN{}
index which is now, sadly, no longer maintained). I've included a
short extract in figure~\ref{fig:williams-web}, which gives some
indication; that figure (though it contains some exciting flags at
its top) is missing the flag-links to the \CTAN{} sites themselves;
by default, the package location points to \path{ftp.cdrom.com}.

\section{And for those with no network?}

The Internet's a grand place \dots~for those of us who are connected
to it. Even when I'm at home, dialling in to a \CTAN{} site, I find
the archive tricky (and I know the layout better than most). How are
the `unconnected' to survive?

The obvious solution is to dump the archive's contents to
\acro{CD-ROM}. This is plainly do-able (though the whole archive now
takes even more than two \acro{CD}s). The problem is navigation: with
the crazy restrictions of \acro{ISO}~9660 \acro{CD} directory format,
file names lose what one might call `expressive power'.

The problem is addressed by the `targeted' \acro{CD}\Dash one which
you can use as an installation source from the word go. The first of
these was the \acro{NTG}'s excellent 4All\TeX{} \acro{CD} for
\acro{PC}s, which combines a view of how the system ought to be run
with a well-constructed, extensive set of backup archive material. A
new one, whose structure is based on the \acro{TDS} `standard' and
which uses the te\TeX{} implementation of \TeX{}, is being prepared
for release in May 1996.
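
To make the remark about \acro{ISO}~9660 names concrete, here is a
small Python sketch of what the most restrictive (Level~1) form of
the format allows: at most eight characters plus a three-character
extension, drawn from upper-case letters, digits and the underscore.
The truncation rule used here is an assumption made for illustration;
real mastering tools each have their own way of shortening names and
of resolving clashes.

```python
import re


def _clean(part: str) -> str:
    """Upper-case `part` and replace anything outside A-Z, 0-9, _ by _."""
    return re.sub(r"[^A-Z0-9_]", "_", part.upper())


def iso9660_level1(name: str) -> str:
    """Squeeze a file name into an 8.3 ISO 9660 Level 1 style name.

    Hypothetical scheme: simple truncation, no clash resolution.
    """
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    short_stem = _clean(stem)[:8]
    short_ext = _clean(ext)[:3]
    return f"{short_stem}.{short_ext}" if short_ext else short_stem
```

Under this scheme \verb|longtable.sty| becomes \verb|LONGTABL.STY|,
and two invented files \verb|bibliography-a.bst| and
\verb|bibliography-b.bst| both become \verb|BIBLIOGR.BST|: the part of
the name that told them apart is exactly the part that is lost.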
\section{The future}

\CTAN{}, or something like it, will remain necessary for some time to
come, but none of those involved would claim that it's entirely
satisfactory as it stands. In particular, a `world-wide' network of
archives should consist of more than two European sites and one
barely-functioning \acro{USA}n one. Where are we on the Pacific rim?
Why have we only in the last year gained our first mirror in Africa?

A straightforward first step would be to make Norm Walsh's mechanism
available more widely; for wide usage, it should be available at all
\CTAN{} sites and at a representative selection of mirrors.

However, indexing and searching mechanisms have advanced massively
over the last few years, and it would be nice to unleash the power of
(say) Digital's AltaVista engine on the contents of \CTAN{}. To do
this, we need some criterion for indexing; even a well-documented
\TeX{} file can be expected largely to consist of gobbledygook, and
finding the relevant stuff (in \textsf{doc}-package material, in
running comments, or after \cs{endinput}) needs some careful
heuristic work.

There is a comparable set of archives, called the \acro{CPAN}, which
holds \textsf{Perl}-related material (they do say that imitation is
the sincerest form of flattery!). The \acro{CPAN} people have
constructed a script that decides for you where you \emph{should}
have connected your Web-browser, and sends your connection off there.
If you have access to a Web-browser, try connecting to
\url|http://www.perl.com/CPAN/|\Dash if I do it, I end up at a
directory at \url|ftp://unix.hensa.ac.uk/mirrors/perl-CPAN//|, which
is physically on a different continent. This is a neat trick, and
we're working on something similar for \CTAN{}. The script depends on
your domain address, so that if you are one of those tricksy sites
that is (say) in \verb|.com| even though you're physically located in
the \acro{UK} and connected through \acro{UK} service providers,
you'll end up at the wrong place.
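
The kind of decision such a script makes, and the way it goes wrong
for a \verb|.com| site in the \acro{UK}, can be shown with a short
Python sketch. This is not the \acro{CPAN} script, whose workings I
have not examined: the table, the fall-back choice and the function
name \verb|choose_node| are all invented for illustration, and only
the three node names come from the text above.

```python
from typing import Dict

# Hypothetical table: top-level domain of the client -> nearest CTAN node.
NODE_BY_TLD: Dict[str, str] = {
    "uk": "ftp.tex.ac.uk",
    "de": "ftp.dante.de",
    "edu": "ftp.shsu.edu",
}

# Assumption: anything not in the table is sent to the US node.
DEFAULT_NODE = "ftp.shsu.edu"


def choose_node(client_host: str) -> str:
    """Pick a node purely from the last label of the client's host name."""
    tld = client_host.rstrip(".").rsplit(".", 1)[-1].lower()
    return NODE_BY_TLD.get(tld, DEFAULT_NODE)
```

With this table, \verb|choose_node("host.example.ac.uk")| gives
\verb|ftp.tex.ac.uk|, but \verb|choose_node("pc1.example.com")| gives
\verb|ftp.shsu.edu| wherever that machine really is: the domain name
alone says nothing about which side of the Atlantic the wires run.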
I mentioned above that there are better protocols to run what
\CTAN{} does. I (continue to) feel it would be nice to use some
protocol other than \textsf{email} to maintain our consistency, but
providing a complete suite of protocols (and getting it accepted!) is
beyond the resources I have (notably the time resources\dots).

To close, at the meeting at Warwick, I asked for suggestions. I've
presented all the ideas that came from the meeting; do readers have
any?

% \subsubsection*{Where does this fit?}
% The first-level subdirectories of the tree are:
% \begin{tabular}{ll}
%   biblio   & bibliography manipulation \\
%   digests  & \TeX{}-related publications \\
%   dviware  & \acro{DVI} processors, etc. \\
%   fonts    & fonts and related stuff \\
%   graphics & graphics in \TeX{} \\
%   help     & \acro{FAQ}s and the like \\
%   indexing & support for index creation \\
%   info     & `other' information \\
%   language & non-American language support, \\
%            & and hyphenation patterns \\
%   macros   & of all sorts \\
%   support  & programs for making life with \TeX{} easier \\
%   systems  & implementations of \TeX{}, \\
%            & including a subdirectory \texttt{knuth}\dots \\
%   tds      & the output of the \TeX{} directory structure \\
%            & Technical Working Group \\
%   tools    & of use to archive maintainers \\
%   usergrps & a place for \TeX{} groups to `advertise' \\
%   web      & literate programming tools
% \end{tabular}

\begin{thebibliography}{1}

\bibitem{Abbott:TB10-1-59}
Peter Abbott.
\newblock {{{UK\TeX} and the Aston archive}}.
\newblock {\em TUGboat}, 10(1):59--60, April 1989.

\bibitem{Golding:1994}
Richard~A. Golding, Darrell D.~E. Long, and John Wilkes.
\newblock The {\emph{refdbms}} distributed bibliographic database system.
\newblock In {\em Proceedings of the Winter Usenix Conference}, San
  Francisco, CA, January 1994.

\bibitem{Greenwade:TB14-3-342}
George~D. Greenwade.
\newblock {{The Comprehensive {\TeX} Archive Network ({\CTAN})}}.
\newblock {\em TUGboat}, 14(3):342--351, October 1993.
\bibitem{Walsh:TB15-3-339} Norm Walsh. \newblock {{A World Wide Web interface to {\CTAN}}}. \newblock {\em TUGboat}, 15(3):339--343, September 1994. \end{thebibliography} \end{Article}