THE VIRTUAL HYPERGLOSSARY - Feb 1995

Peter Murray-Rust (p.murray-rust@mail.cryst.bbk.ac.uk)
Sat, 3 Feb 1996 13:18:08 +0000 (GMT)

The Virtual Hyperglossary

This message is addressed to:
The old inhabitants of vsns-pps-glossary at BBK
Intercocta95 participants
and a few other recent friends :-)

Summary
-------
This document reports the current state of the Virtual Hyperglossary project and its links
to other groups.

IMPORTANT
---------
You have received this because you belong to one of the categories above. The
'glossary' lists at Birkbeck will now become active and you will receive more messages. The
current glossary list is:
vsns-pps-glossary@mail.cryst.bbk.ac.uk
(if you wish to unsubscribe, mail listproc@mail.cryst.bbk.ac.uk with the message
unsubscribe vsns-pps-glossary)

This list is archived and hypermailed onto WWW pages at BBK. (Recent postings will be
hypermailed in a few days). For the next few days we shall continue to use 'vsns-pps-glossary'
so that anyone no longer interested can unsubscribe. IN ABOUT A WEEK'S TIME, DAVE
HOULDERSHAW WILL CHANGE THE NAME TO 'HYPERGLOSSARY' AND HYPERMAILING
WILL RECOMMENCE (it will include *all* messages sent to the list). The purpose of this will be
to support discussion of the VHG concept and its implementation. We shall also continue to
develop communal technology for the creation of glossaries.

If you have any technical or other problems, direct them to me!

Background
----------

In 1995 the Virtual course on Principles of Protein Structure at Birkbeck created a collaborative
glossary for terminology in protein structure. The technology was developed by PM-R using
HTTP/FORMS (with CGI-scripts written in tcl) and provided a crude mechanism for collaborative
working. Virtual authors (about 20) under the enthusiastic and knowledgeable guidance of our
curator Lesley West compiled about 300 terms with a high degree of structure within the entries.
Entries had several elements of multimedia including molecular coordinates, pictures and
connexion tables.
They were also extensively hyperlinked both within and outside the glossary. Terms were
indexed and we achieved proof-of-concept in searching and also marking up documents by
sending them to the server (we believe this latter to be something that hasn't been done
elsewhere). Initially the terms were created in a local format, but
later I moved to using SGML with a very simple hardcoded DTD and tools to parse it. The
namespace was consistent but somewhat of a prototype.
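As a rough illustration of what "marking up a document" against a glossary means, here is a
minimal sketch in Python. It is not the original tcl/CGI implementation (which is not shown
in this message); the glossary contents, the URLs and the function name mark_up are all
hypothetical, and the approach (a single regular expression built from the term list) is
only one simple way to do it.

```python
import re
from html import escape

# Hypothetical glossary: term -> URL of its entry (illustrative values only).
GLOSSARY = {
    "amino acid": "glossary/amino_acid.html",
    "phenylalanine": "glossary/phenylalanine.html",
    "alpha helix": "glossary/alpha_helix.html",
}


def mark_up(text, glossary):
    """Return text as HTML with each glossary term linked to its entry."""
    if not glossary:
        return escape(text)
    # Try longer terms first so multi-word terms win over shorter overlaps.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): url for term, url in glossary.items()}
    pieces = []
    pos = 0
    for match in pattern.finditer(text):
        # Copy the unlinked text before the match, then the linked term.
        pieces.append(escape(text[pos:match.start()]))
        url = lookup[match.group(1).lower()]
        pieces.append('<a href="%s">%s</a>' % (url, escape(match.group(1))))
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(mark_up("Phenylalanine is an amino acid.", GLOSSARY))
```

A server-side version would receive the document via a form, run something like mark_up
over it and return the linked result; matching here is case-insensitive and purely textual,
so it knows nothing about plurals or synonyms.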

The Hyperglossary idea is (deliberately) infectious and several other groups started their own
glossaries. Two (Henry Rzepa/Chris Leach - Imperial College, and Barry Hardy et al - Oxford)
used them to support electronic conferences they were running. Georg Fuellen (Bielefeld) used
one in conjunction with the virtual Biocomputing course he and colleagues ran.

Lesley and I published accounts of the VHG which attracted the attention of Matti Malkia and the
InterCOCTA group of UNESCO who are using a terminological/conceptual approach to support
social/political science research. Matti organised an excellent workshop in Tampere, Finland
where I learnt a great deal about the science of terminology.

(Briefly - the PPS glossary can be called a terminological database since we didn't formally
consider the relationships between terms, for which the term 'concept' is used. There are,
however, elements of concept analysis - for example the entry 'amino acid' lists all 20 amino
acids with links to them. 'amino acid' would be formally called a generic superordinate concept
and (say) phenylalanine would be a subordinate term.)
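The superordinate/subordinate relationship described above can be sketched as a small data
structure. This is only an illustration in Python: the class name Concept and its methods
are hypothetical, and it reflects neither the PPS glossary's storage format nor anything
defined by the ISO standards.

```python
class Concept:
    """A glossary entry that knows its broader and narrower concepts."""

    def __init__(self, term, definition=""):
        self.term = term
        self.definition = definition
        self.superordinate = None  # the broader (generic) concept, if any
        self.subordinates = []     # the narrower concepts

    def add_subordinate(self, child):
        """Attach child beneath this concept and return it."""
        child.superordinate = self
        self.subordinates.append(child)
        return child

    def ancestors(self):
        """Return the terms of all broader concepts, nearest first."""
        chain = []
        node = self.superordinate
        while node is not None:
            chain.append(node.term)
            node = node.superordinate
        return chain


# 'amino acid' is the generic superordinate concept; individual
# amino acids hang beneath it as subordinates.
amino_acid = Concept("amino acid")
phe = amino_acid.add_subordinate(Concept("phenylalanine"))
gly = amino_acid.add_subordinate(Concept("glycine"))
```

With links held in both directions, an entry can list its subordinates (as the 'amino acid'
entry does) and a subordinate entry can point back to its generic concept.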

A most important contact at Tampere was Gerhard Budin from Vienna who has been involved in
developing ISO standards for terminology. Gerhard has sent me the ISO standards 12620
(standard terms to be used in terminological databases) and 12200 - the MARTIF DTD for
creating SGML-based terminology. (My understanding of these documents is that 'glossary' is
an acceptable term for the individual terminological
databases that we have created and I intend to continue with it).

I have also met a number of other people who are interested in the idea of the VHG either in
general or wish to create their own glossaries.

(The other piece of history was that the VHG at Birkbeck was destroyed in Nov 1995 in the
Great Disk Crash (not my fault!). Perhaps - like the Great Fire of London (1666) - this is a
blessing, in that I shan't go back to the hacks that I used last year - those who haven't used the
VHG have to take on trust that the things actually worked!)

Recent progress
---------------

The continuing interest in VHG, the rise of the use of SGML (and the MARTIF initiative) along with
other WWW developments mean that we have a critical mass of interested people and
technology to accomplish something of real value. At present it's possible to locate information
on the WWW in a few subjects only (biology being the best IMO). Retrospective text-based
indexing robots (like LYCOS) and collaborative collections of discipline-related material (Virtual
Library, YAHOO, etc.) are impressive, but they are very limited where what is required is a
straightforward piece of terminological information. The VHG approach can solve much of this
problem by some simple, scalable technology and philosophy. The simple idea is that there are
enough enthusiasts already and enough existing 'legacy' glossaries that the world community
could create a large terminological database in a distributed fashion. The VHG's role is:
to promote this idea
to provide the ground rules
to produce technology which is simple and robust

We wish to keep the basis of creation by enthusiasts, but are also starting to seek formal funding
for parts of the venture such as the technology.

The VHG has a renewed Home Page (which I hacked last night) - it does not contain all the links
and will have errors. There was some urgency in that Gerhard wants to demonstrate the VHG to
a group involved in European Environmental analysis on Monday! I'll try to clean it up, but I
have to do the work on a machine which isn't connected directly :-(.

Current technology
------------------

There may be a slight hiatus as we change from the old system to the SGML-based approach.
This shouldn't stop anyone *creating* glossaries because they should be trivial to reformat into
MARTIF. (Don't be afraid of this - we will give you complete guidance! For example, the PPS
glossary can be automatically translated into MARTIF - all we have to do is to choose which of
MARTIF's many terms are most useful for the VHG project.)

Collaboration: I believe we should use Hyper-G as the communal tool for the creation of
glossaries and Dave Houldershaw is setting up a Hyper-G server at BBK. Hyper-G (developed
at the Technical University of Graz) is a 'new generation' of WWW servers which allows
communal working on a server and holds documents as a tree structure to which links can be
added dynamically. It also allows searches on keywords, which is ideal for our work. Hyper-G
resources can be *viewed* with standard WWW tools and anyone interested should have a
look at the impressive start that Dave has made:
http://iona.cryst.bbk.ac.uk:8000/0805703D/CThe-Root-to-all-Glossaries
(Note that one minor drawback of Hyper-G is that the URLs are not mnemonic). If you are new
to this, some interesting features are real molecules (e.g. in phe, myoglobin and many others).
Dave had independently realised the importance of concept analysis and if you look under
'proteins' you will see a collection - in fact this is probably a good area to spawn off a new
glossary.

If you want to *edit* this material (e.g. in a curatorial group) you will need a client. Hyper-G
provides two: Harmony (UNIX) and Amadeus (PC). It's probably worth the effort to install one of
these on your machine and it may be that BBK is a good place to develop *your* glossary in the
first instance. Henry Rzepa and Omer Casher at Imperial have also set up a Hyper-G server.

Hyper-G uses SGML as a fundamental tool so that many of the things we want to do will be well
supported. (I have no formal contact with people at Graz, but intend to approach them about
VHG - any suggestions?).

SGML: To create complex SGML documents can require commercial software but I believe that
(Ignore this para unless you are a techie!)
If you want to do complex things with the glossary or the documents parsed by it you'll need a
parser (we recommend SGMLS or SP, which are free and run on all platforms). You will then
need some postprocessing software to render your document and there are few shortcuts here.
If it's mainly text-based, then tools like Panorama may be adequate - if you want to process the
document against the glossary then you will need your own tool. Joe English and I have written a
generic tool for this (costwish) which runs on UNIX and because it uses tcl/tk we hope it will port
to the PC soon. If you want to include molecules then I recommend Chemical Markup
Language (which I am developing) - "I would, wouldn't I". For parsable diagrams, CGM is
recommended and for maths there is no clear standard but TeX is useful if you don't want to
parse it.

Current projects
----------------

Some brief descriptions - I hope that you will all write in much more detail.

InterCOCTA is attracted by the concept of a distributed hyperglossary for sharing,
collating and developing terminology. I hope the group will look at the proposed namespace and
'rules' :-). Fred Riggs and colleagues have championed the use of glossaries for many years
(Fred uses the term 'nomenclator') and foresees the use of personal and group glossaries. VHG
can support this when there is an agreed robust namespace for this and we'd welcome
suggestions here.
Fred demonstrated a nomenclator for his 'Turmoil Among Nations' TAN work which includes
terminology for 'migrant', 'quasi-state', etc. This nomenclator is well structured and - I believe -
could be translated into MARTIF straightforwardly. TAN seems like an excellent prototype for
testing the technology in a complex area of conceptual analysis.

Technology: Keith Jeffery is Head of Systems Engineering at Rutherford Appleton Laboratory (a
leading UK Research Council site). He and I are submitting a grant to the Bioinformatics
Committee of the BBSRC on a 'Biomolecular Hyperglossary' whose main purpose will be to
develop the underlying technology for collaborative working, namespace, etc. Keith was the first
UK representative on W3C so is very well placed to advise us. I will start including molecular
tools in that so that PPS96, ECTOC, EGC-1 will be supported. (BTW MARTIF supports figures,
molecules, etc by the use of pointers so that we can create glossary entries and their
specialised content independently and then link them). I am not waiting for this grant to be
funded (!) which is why I am pushing ahead with the rules and the namespace - this is a fairly
low-cost activity.

Gerhard is presenting the possible use of the VHG in Environmental terminology on Monday -
we look forward to hearing from Gerhard on this list.

Biocomputing: Georg Fuellen (Bielefeld) and colleagues built an impressive dictionary for their
biocomputing course which suffered in the Great Disk Crash. He is now actively planning to
return to this and will (I think) be using Hyper-G at BBK.

ECTOC: Henry Rzepa and colleagues ran an electronic conference in chemistry last year and
collected molecules in a hyperglossary. Molecules pose a special problem because they need a
special means of representing them and there is as yet no standard (that is where we are
intending to use CML). The same applies to the first electronic glycoscience conference
(EGC-1) where Clare Sansom and Barry Hardy are hoping to start again on a hyperglossary. (There is
a wider community of people interested in providing molecular resources and I expect to see
more activity here).

Immunology: Chris Thorpe (who was involved in setting up HISTO - the major histocompatibility
complex - at BBK) is keen to start a hyperglossary on immunology. I, for one, would find this
VERY useful.

Informatics: Kate Dymoke-Bradshaw (a colleague of mine at GW) specialises in informatics and
has a group of students at Imperial College. Kate is very keen for them to be involved in building
a glossary of terms in informatics - this might interest the InterCOCTA group?

PPS: Kurt Giles and Darren Fast have offered to curate the PPS glossary. This is a good place
to start introducing concept analysis so that the glossary is more easily navigable and
maintainable. The original terms were a ragbag and the glossary should probably split into at
least two.

There are other areas where we expect to see activity and Lesley and I will try to hold some of
this together! If anyone is interested in helping to look after central VHG resources (e.g. at BBK)
we would love to hear!

Summary
-------

There are lots of potential, independent projects here and we'd like to see the idea being
promoted! If we can reach consensus on a reasonable number of the rules, especially the
namespace, I'd like to prepare a draft spec. It's very important that we establish a claim to
namespace before it gets polluted by independent efforts. One possible mechanism would be to
prepare an internet-draft (leading possibly to an RFC and more). I'd welcome comments.

Please post!

PeterMR

Feb 3 1996


Peter Murray-Rust, Glaxo Research & Dev. (pmr1716@ggr.co.uk); (BioMOO: PeterMR)
Birkbeck College, ubcg09q@cryst.bbk.ac.uk, CBMT/Daresbury mbglx@seqnet.dl.ac.uk
http://www.cryst.bbk.ac.uk/PPS/index.html, http://www.dl.ac.uk/CBMT/HOME.html