RFC: Central document registry (long)

Christoph Weber (cweber@oci.unizh.ch)
Fri, 3 Mar 95 10:03:33 +0100

Request for comments:

Central Document Registry

One of the great strengths of the VSNS-PPS course is its slightly anarchic
nature. Everyone is free to follow as much as he/she wants, for the most part
whenever he/she wants, and, what's more, everyone is free to contribute
whatever they choose. Within the general limits set by subject matter,
scientific accuracy and good net citizenship, this ensures the broadest
possible coverage at a very high standard of quality, as we have witnessed.
This is, of course, in the best tradition of the internet, but it is also
recognized as the internet's (and hence the course's) biggest drawback, for
example because it badly hinders coherence.

Within the course, we have a few very active people, most notably Peter MR
and Alan M, who are functioning as glue to hold this whole venture together,
but their (time) resources are limited. We also have the hyperglossary that
acts as a central repository of general wisdom, and it is rightly regarded as
a very essential tool.

A large number of people are currently writing hyperdocuments about some
course subject, one of the assignments, or something else connected to the
course. Most of these contributions are of high scientific quality, and many
are nothing less than exciting. But as the course web grows, it becomes
increasingly hard to stay abreast of what is going on. Moreover, the wealth
of information being accumulated is not used to its fullest, because it is
not readily accessible when it is needed, for lack of knowledge that it
exists.

What is needed, therefore, is a facility a) to help course participants stay
abreast of what's out there and b) to help build an interlinked web of
documents. Furthermore, we probably want some or most of our output to remain
connected once the course is over (think of it as 'live' course
documentation). It is my feeling, though, that we don't just need another
'What's new' facility. What we need should go beyond that: it should allow
automatic retrieval of most or all documents that pertain to a given subject.
This necessitates a more formal and also more restricted approach, which I
will call the 'central document registry'.

In a way, the glossary has already pointed the way to successfully dealing
with such demands: contributions are made by individuals, and the database
can be searched by keyword. Furthermore, documents can be marked up
automatically with pointers to entries in the glossary.

The central document registry would function just as its name implies:
authors would register their document's URL along with its title and at most
three keywords that best describe the content. Severely restricting the
possible entries here is probably the best tool for separating meaningful
database 'hits' from chance ones.
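To make the idea concrete, here is a minimal sketch of what such a registry
could look like in code. All names, and the way the three-keyword limit is
enforced, are my own illustrative assumptions, not part of any existing
system:

```python
# Hypothetical sketch of the central document registry; every name here
# is illustrative, not an existing interface.

class RegistryEntry:
    """A registered document: URL, title, and at most three keywords."""

    MAX_KEYWORDS = 3  # assumed limit, as proposed in the text

    def __init__(self, url, title, keywords):
        if len(keywords) > self.MAX_KEYWORDS:
            raise ValueError("at most %d keywords allowed" % self.MAX_KEYWORDS)
        self.url = url
        self.title = title
        # Normalize keywords so that searches are case-insensitive.
        self.keywords = [k.strip().lower() for k in keywords]


class Registry:
    """In-memory stand-in for the central document registry."""

    def __init__(self):
        self.entries = []

    def register(self, url, title, keywords):
        self.entries.append(RegistryEntry(url, title, keywords))

    def search(self, term):
        """Return all entries whose keyword list contains the term."""
        term = term.strip().lower()
        return [e for e in self.entries if term in e.keywords]
```

A real deployment would of course persist entries in a file or database
behind a CGI script rather than hold them in memory; the sketch only shows
the registration and lookup logic.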

Access to the registry would all go through the same query system, in a
number of ways.
First, analogous to the glossary, via keyword search from a form. The result
would be an HTML document containing titles as links, with keywords further
describing the content to be expected. This follows the successful annotated
list format found throughout the Web.
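The annotated-list result page could be generated along these lines. This is
a sketch under my own assumptions (each hit is a simple record of URL, title
and keywords; the page layout is invented for illustration):

```python
# Hypothetical sketch: format search hits as an annotated HTML list of
# links, with the keywords shown after each title.

def results_page(term, hits):
    """Build an HTML page for the hits of a keyword search.

    Each hit is assumed to be a dict with 'url', 'title' and 'keywords'.
    """
    lines = [
        "<html><head><title>Registry search: %s</title></head><body>" % term,
        "<h1>Documents matching '%s'</h1>" % term,
        "<ul>",
    ]
    for hit in hits:
        lines.append('<li><a href="%s">%s</a> (%s)</li>'
                     % (hit["url"], hit["title"], ", ".join(hit["keywords"])))
    lines.append("</ul></body></html>")
    return "\n".join(lines)
```

In practice this would be the output of the CGI script behind the search
form, with a politely worded page returned when there are no hits.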

Second, authors could mark up documents automatically or manually, much as
with the glossary. However, and here lies the strength of the proposed
approach, links would not point to the registered URLs themselves, but to the
registry query system, to which they supply the search term. This ensures
that information is always as current as can be, at the expense of the
reader having to go through an intervening layer. Alternatively, authors
could provide a link to the automatic markup facility at a prominent spot in
their document to create links throughout the document on the fly, based on
currently existing registry entries.
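The indirection described here amounts to writing each cross-reference as a
link to the query script, carrying the search term as a parameter. A sketch
of what such a link might look like (the script URL and the parameter name
are pure assumptions on my part):

```python
# Hypothetical sketch: build a link to the registry query system rather
# than to a registered URL, so the link stays current as entries change.

import urllib.parse

# Assumed location of the registry's CGI query script.
QUERY_URL = "http://www.example.org/cgi-bin/registry-query"


def registry_link(term, label=None):
    """Return an HTML anchor that sends `term` to the query script."""
    href = "%s?keyword=%s" % (QUERY_URL, urllib.parse.quote_plus(term))
    return '<a href="%s">%s</a>' % (href, label or term)
```

Because the anchor carries only a search term, the same link works whether
the registry currently holds zero, one or many matching documents, which is
exactly what makes linking to not-yet-existing entries safe.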

As a variation of the second approach, authors could anticipate future
growth of the registry and point to entries that do not yet exist via a
search term. This last possibility allows an author to strategically place
links where he/she believes more information would be helpful and is very
likely to appear. In essence, it allows for concurrent development of
hypertext documents without worrying about synchronizing the effort with
others, while still providing cross-references. If the page returned upon a
failed query is polite enough and visually pleasing, this should not
generate too much irritation on the part of the reader.

Since authors want to have an audience, it should not be hard to get them to
register their contributions. And given that all of us have a pretty good
knowledge of what we are trying to convey, and that most of us have previous
publication experience, it should also not be hard to come up with a good
title and keywords that adequately describe a document, so that searches can
provide a maximum of meaningful hits.

Let's see what all of you think of this. Any and all comments, critique and
additions are welcome.



Disclaimer: Although I have considerable UNIX and Web experience, I am no
programmer, particularly not with regard to CGI/web servers and the like. I
*think* most of what I propose here is technically feasible, but I would not
be able to verify this, and surely not to implement it within a reasonable
time.

Dr. Christoph Weber                 email: cweber@oci.unizh.ch
OCI  Uni Zurich                     phone: +41 1 257 4219
Winterthurerstr. 190                FAX:   +41 1 361 9895
CH-8057 Zurich,  Switzerland