Re: Glossary terms: questions

Murray-Rust Dr P (pmr1716@ggr.co.uk)
Tue, 9 May 1995 13:07:46 +0100 (BST)

Hi everyone,
Thanks very much for continuing to contribute material to the
glossaries. Meanwhile I have been developing the next stage of the
technology and would welcome comments.

Namespace
---------
This is the single most important thing, perhaps the only thing that
really matters. Get this right, and the rest will follow. So ...
In principle the hyperglossary (HG) concept can cover any subject you can
think of, and we should plan for this. I've been thrashing around and feel
that the best way, at present, is not to try to emulate Dewey, UDC or the
Library of Congress, since that requires a lot of knowledge and we would
end up with a deep hierarchy, most of which was unpopulated. As an example
of the possible scale, I saw a book in our library on 'Acronyms in
chemical spectroscopy' (Springer-Verlag, I think): over 200 pages at about
1-2 terms per page. Most of these were things like NMR pulse sequences,
but each came with its algorithms and other details, which is exactly what
a hyperglossary addresses supremely well. There were also quite a lot of
internal cross-references. So do we try to classify it as
/sci/chemistry/analytical/spectroscopy
or whatever, or just go for chemistry/spectroscopy, and when another
comes along call it spectroscopy_1, etc.? I favour the latter, i.e. see
what we get.
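A minimal Python sketch of the flat naming rule (the first glossary takes the plain name, later arrivals get _1, _2, ...), assuming the names in use are held in a simple in-memory set. The function `register` is hypothetical and not part of the existing csh/tcl scripts.

```python
def register(name: str, taken: set) -> str:
    """Claim a subject-based glossary name such as 'chemistry/spectroscopy'.

    Hypothetical helper: if the plain name is already taken, append _1, _2, ...
    until a free name is found, then record and return it.
    """
    candidate = name
    n = 0
    while candidate in taken:
        n += 1
        candidate = f"{name}_{n}"
    taken.add(candidate)
    return candidate


taken = set()
print(register("chemistry/spectroscopy", taken))  # chemistry/spectroscopy
print(register("chemistry/spectroscopy", taken))  # chemistry/spectroscopy_1
```

Nothing has to be classified in advance under this rule; names simply accumulate as glossaries arrive.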

I therefore favour a hierarchy starting with the top-level subjects of
the Virtual Library, plus a few extras (e.g. GNA). We would have:
gna/pps
gna/bcd
chemistry/ectoc_1
biology/glycosci
etc.
Names will be assigned by SUBJECT rather than by site, i.e. NOT:
birkbeck/cryst/pps
or
publisher/dictionaries/music
There will, however, be a mapping between subject names and sites.
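One way to hold that mapping, sketched in Python under the assumption that it is a plain lookup table shared by all servers. Pairing birkbeck/cryst/pps with gna/pps is illustrative only, and `canonical_name` is a hypothetical helper.

```python
# Illustrative site-based -> subject-based pairs (assumed, not an agreed list).
SITE_TO_SUBJECT = {
    "birkbeck/cryst/pps": "gna/pps",
}

# Reverse table: which site-based path corresponds to a given subject name.
SUBJECT_TO_SITE = {subject: site for site, subject in SITE_TO_SUBJECT.items()}


def canonical_name(path: str) -> str:
    """Translate a site-based path to its subject name.

    Paths not in the table are assumed to be subject names already and are
    returned unchanged.
    """
    return SITE_TO_SUBJECT.get(path, path)


print(canonical_name("birkbeck/cryst/pps"))  # gna/pps
print(canonical_name("biology/glycosci"))    # biology/glycosci
```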

I have worked out how we can manage glossary servers worldwide and will
talk to Lesley in person before putting ideas forward. In outline, there
will be a prime server for each dictionary, but every server will hold an
index of which glossaries are where, so that it can re-route people. I
have written wildcard search tools so that people can search for:
biology/*prot*
and get the list of glossaries on proteins. On a *single* server I can
also allow for:
(chemistry/biology)/*prot*
but could not allow for this worldwide, as it would be very wasteful. I
have other ideas here...
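A rough Python sketch of the routing index and the wildcard search, assuming every server holds a copy of one table from glossary name to prime server, and reading (chemistry/biology) as "chemistry or biology". The host names, the glossary chemistry/protein_chem and the helpers `route`, `expand` and `search` are all hypothetical; the alternation step is meant for a single server only, as noted above.

```python
import fnmatch
import re

# Hypothetical global index: glossary name -> prime server.
# Every server would hold a copy of this table.
INDEX = {
    "gna/pps": "server-a.example",
    "gna/bcd": "server-a.example",
    "biology/proteins": "server-b.example",
    "biology/glycosci": "server-b.example",
    "chemistry/protein_chem": "server-c.example",
    "chemistry/ectoc_1": "server-c.example",
}


def route(name: str, local_host: str):
    """Return None if `name` is served locally, else the prime server to re-route to."""
    if name not in INDEX:
        raise KeyError(f"unknown glossary: {name}")
    host = INDEX[name]
    return None if host == local_host else host


def expand(pattern: str) -> list:
    """Split a leading '(a/b)' group into one pattern per alternative.

    Assumption: '(chemistry/biology)/*prot*' means
    'chemistry/*prot*' or 'biology/*prot*'.
    """
    m = re.fullmatch(r"\(([^)]*)\)(.*)", pattern)
    if m is None:
        return [pattern]
    return [alt + m.group(2) for alt in m.group(1).split("/")]


def search(pattern: str, names) -> list:
    """Return the sorted names matching a shell-style wildcard pattern."""
    hits = set()
    for p in expand(pattern):
        hits.update(n for n in names if fnmatch.fnmatchcase(n, p))
    return sorted(hits)


print(search("biology/*prot*", INDEX))                # ['biology/proteins']
print(search("(chemistry/biology)/*prot*", INDEX))    # ['biology/proteins', 'chemistry/protein_chem']
print(route("biology/glycosci", "server-a.example"))  # server-b.example
```

The alternation multiplies the number of patterns to test, which is cheap against one local index but would mean a query per alternative if the index had to be consulted remotely.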

Content
-------
Content is difficult for PPS because it draws on many subjects. It may
be valuable to see it as a prototype, and to present it as a learning and
casual-reference tool rather than a complete dictionary. However, where
there is a class of entries it should be fully populated, i.e. every
amino acid (a.a.) should have an entry. Obviously, though, there will
only be a few key proteins.
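Checking that a class is fully populated is a simple set difference. A minimal Python sketch, assuming amino-acid entries are keyed by their standard three-letter codes (the glossary's actual entry IDs may differ); `missing_members` is a hypothetical helper.

```python
# The 20 standard amino acids, by three-letter code.
AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def missing_members(class_members, entry_ids) -> list:
    """Return, sorted, the members of a class that have no glossary entry."""
    return sorted(set(class_members) - set(entry_ids))


# Hypothetical current state: two amino acids not yet written up.
# Entries outside the class (e.g. bipyridyl) do not affect the check.
entries = (AMINO_ACIDS - {"Cys", "Trp"}) | {"bipyridyl"}
print(missing_members(AMINO_ACIDS, entries))  # ['Cys', 'Trp']
```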
Some entries should be junked (e.g. bipyridyl). They are only there
because I wanted something to demonstrate the technology, and bipyridyl
was the first molecule that came to mind.
Do we think the glossary is useful outside the course?

Quality
-------
For the larger hyperglossary project we expect to appoint curators for
the different servers, and they will have *total* autonomy over what is
on their server. The curators, who may also be the server maintainers,
will have to belong to an institution or group that is committed to
maintaining the source (unlike the present individuals, who do their best
but cannot promise what they will be doing next year). If this is
properly presented, it will work.

Software
--------
I have made a lot of progress here and now have the whole process, from
creation through editing to searching, running under csh/tcl scripts. All
glossaries are referenced by their place in the hierarchy (e.g. gna/pps or
chemistry/ectoc). At present all entries use generic fields (I have NOT
bolted chemistry in), so that we have:
REVISION ID NAME DESCRIPTION SYNONYM LINKS INDEXTERMS IMAGES (CROSSREFS)
as the primitives. Please let me know if there are any others you would
like added. These will NOT be domain-specific.
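A minimal Python sketch of an entry built only from these generic primitives. The field types and the "KEYWORD value" output layout are assumptions for illustration, not the format the csh/tcl scripts actually use; CROSSREFS is treated as optional because it appears in parentheses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """A glossary entry made only of generic, non-domain-specific primitives.

    Field types are assumptions; the list above names the primitives only.
    """
    revision: int
    entry_id: str
    name: str
    description: str
    synonyms: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)
    index_terms: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    crossrefs: List[str] = field(default_factory=list)  # optional: (CROSSREFS)

    def to_record(self) -> str:
        """Write one 'KEYWORD value' line per primitive (hypothetical layout)."""
        rows = [
            ("REVISION", str(self.revision)),
            ("ID", self.entry_id),
            ("NAME", self.name),
            ("DESCRIPTION", self.description),
            ("SYNONYM", "; ".join(self.synonyms)),
            ("LINKS", "; ".join(self.links)),
            ("INDEXTERMS", "; ".join(self.index_terms)),
            ("IMAGES", "; ".join(self.images)),
            ("CROSSREFS", "; ".join(self.crossrefs)),
        ]
        # rstrip() drops the trailing space left by empty fields.
        return "\n".join(f"{key} {value}".rstrip() for key, value in rows)


ala = Entry(1, "ala", "Alanine", "A small, non-polar amino acid.", synonyms=["Ala", "A"])
print(ala.to_record())
```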
I intend to make available a list of strong types for DATA
files. At present these are:
FLOAT INTEGER STRING DATE LINK ARRAY MATRIX TABLE
These can be combined to give types such as:
FLOATARRAY
TABLE (FLOAT, DATE, ...)
This will all be managed by the postprocessor. I think this gives us
enough scope for almost anything. Again, please suggest any other
fundamental types.
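A sketch of how a postprocessor might parse and check these combined types, in Python. The naming rule (a scalar name followed by ARRAY or MATRIX, or TABLE with a bracketed list of scalar columns), the requirement that ARRAY and MATRIX always carry an element type, and the Python representation chosen for each type (e.g. DATE as `datetime.date`, LINK as a string containing "://") are all assumptions; `parse_type` and `validate` are hypothetical names.

```python
import datetime
import re

# Scalar types and an (assumed) Python check for each.
SCALARS = {
    "FLOAT": lambda v: isinstance(v, float),
    "INTEGER": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "STRING": lambda v: isinstance(v, str),
    "DATE": lambda v: isinstance(v, datetime.date),
    "LINK": lambda v: isinstance(v, str) and "://" in v,
}
CONTAINERS = ("ARRAY", "MATRIX")


def parse_type(spec: str) -> tuple:
    """Parse a type name into a tuple.

    'FLOAT' -> ('FLOAT',); 'FLOATARRAY' -> ('ARRAY', 'FLOAT');
    'TABLE (FLOAT, DATE)' -> ('TABLE', ('FLOAT', 'DATE')).
    """
    spec = spec.strip()
    m = re.fullmatch(r"TABLE\s*\((.*)\)", spec)
    if m:
        cols = tuple(c.strip() for c in m.group(1).split(","))
        if not all(c in SCALARS for c in cols):
            raise ValueError(f"TABLE columns must be scalar types: {spec}")
        return ("TABLE", cols)
    for container in CONTAINERS:
        base = spec[: -len(container)]
        if spec.endswith(container) and base in SCALARS:
            return (container, base)
    if spec in SCALARS:
        return (spec,)
    raise ValueError(f"unknown type: {spec}")


def validate(spec: str, value) -> bool:
    """Check a Python value against a type spec."""
    kind = parse_type(spec)
    if kind[0] == "TABLE":
        cols = kind[1]
        return isinstance(value, list) and all(
            isinstance(row, (list, tuple))
            and len(row) == len(cols)
            and all(SCALARS[c](v) for c, v in zip(cols, row))
            for row in value
        )
    if kind[0] in CONTAINERS:
        ok = SCALARS[kind[1]]
        if kind[0] == "ARRAY":
            return isinstance(value, list) and all(ok(v) for v in value)
        # MATRIX: a list of equal-length rows.
        return (
            isinstance(value, list)
            and all(isinstance(row, list) for row in value)
            and len({len(row) for row in value}) <= 1
            and all(ok(v) for row in value for v in row)
        )
    return SCALARS[kind[0]](value)


print(parse_type("FLOATARRAY"))  # ('ARRAY', 'FLOAT')
print(validate("TABLE (FLOAT, DATE)", [(1.5, datetime.date(1995, 5, 9))]))  # True
```

Building combined names from the base types keeps the postprocessor's knowledge down to the short list above; FLOATARRAY and TABLE (FLOAT, DATE, ...) both fall out of the same rule.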

P.

Peter Murray-Rust 44-1438-763338 T "Nothing exists except atoms and empty
pmr1716@ggr.co.uk 44-1438-764918 F space; all else is opinion" Democritos.
Biomolecular Structure, Glaxo Wellcome, Stevenage, Herts, SG1 2NY, UK