Glossary (markup/lookup) ideas/concerns

Mark Dalton (mwd@alamos.cray.com)
Tue, 17 Jan 1995 08:28:20 -0700 (MST)

Hi! These are just ideas/concerns to be thought about.

The problem with the 'amino acid' search is being addressed. It occurs
when you have 'amino acid' in your text and you do a search.
The Glossary search will find:
===========
<a href=http://www.cryst.bbk.ac.uk/PPS/glossary/proteinstruct/amino<a href=http://www.cryst.bbk.ac.uk/PPS/glossary/proteinstruct/aminoacid.html>acid</a>.html>Amino</a> <a href=http://www.cryst.bbk.ac.uk/PPS/glossary/proteinstruct/aminoacid.html>acid</a>

Which displays as:
acid.html>Amino acid
===========
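The nested tags above appear to come from replacing one term at a time: after 'amino acid' is linked, a later pass matches 'acid' again inside the URL it just wrote. One possible fix links every term in a single pass, trying the longest term first. The Python sketch below shows this under two assumptions. First, the input is plain text with no existing links. Second, 'amino.html' and 'acid.html' are hypothetical pages added only to show the clash. It is not the script currently in use.

```python
import re

# Illustrative index: lower-case indexing term -> glossary page.
# The base URL and 'aminoacid.html' come from the example above;
# 'amino.html' and 'acid.html' are hypothetical entries to show the clash.
BASE = "http://www.cryst.bbk.ac.uk/PPS/glossary/proteinstruct/"
INDEX = {
    "amino acid": "aminoacid.html",
    "amino": "amino.html",
    "acid": "acid.html",
}


def markup(text, index=INDEX, base=BASE):
    """Wrap glossary terms in links, scanning the text only once.

    Assumes `text` is plain text with no existing <a> tags. re.sub never
    rescans its own replacements, so 'acid' cannot be matched again inside
    the URL just written for 'amino acid'.
    """
    # Regex alternation is leftmost-first, so list longer terms first:
    # 'amino acid' then wins over 'amino' at the same position.
    terms = sorted(index, key=len, reverse=True)
    # Words of a multi-word term may be separated by any whitespace (incl. newlines).
    alternatives = [r"\s+".join(re.escape(w) for w in t.split()) for t in terms]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

    def link(match):
        key = " ".join(match.group(0).lower().split())  # normalise case and spacing
        return '<a href="%s%s">%s</a>' % (base, index[key], match.group(0))

    return pattern.sub(link, text)


print(markup("Amino acid residues; a strong acid."))
```

With this approach 'Amino acid' becomes one link to aminoacid.html. The separate 'acid' later in the sentence gets its own link, and no tags end up nested.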

Once this problem is fixed, we could also change the lookup to handle
multi-word terms the same way, so people would no longer need to merge
the words into 'aminoacid' when doing a 'lookup'.

Now for comments/ideas:

For the PPS-Course Glossary FAQ:
I would suggest one change, to the second part of point 5:
----------------------------< Point 5 of FAQ >-----------------------------
5. What index terms are needed?

Index terms reflect those occurring in PPS documents. Don't worry about case
in indexing terms (e.g. tertiary and Tertiary). At present indexing is case
insensitive. Thus, only enter lower case index terms (e.g. tertiary). Where
the indexing term consists of more than one word, add index terms with care.
For example, nuclear magnetic resonance is an indexing term but NOT magnetic
or resonance. Indexing terms should contain whitespace if appropriate. Thus
greekkey (the glossary term) should have indexing term 'greek key', *not*
greekkey.
----------------------< end of point 5 of FAQ >-----------------------------
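The indexing rules in point 5 amount to a small normalisation step: lower-case the term, keep the spaces between words, and map it to the glossary term. The minimal Python sketch below assumes a simple in-memory index. The 'greek key' -> 'greekkey' pair is the FAQ's own example. The 'nmr' glossary name is a hypothetical placeholder.

```python
# Hypothetical index entries (indexing term, glossary term). 'greek key' ->
# 'greekkey' is the FAQ's own example; the 'nmr' glossary name is assumed.
ENTRIES = [
    ("tertiary", "tertiary"),
    ("greek key", "greekkey"),
    ("nuclear magnetic resonance", "nmr"),
]


def normalise(term):
    """Lower-case and collapse runs of whitespace; indexing is case insensitive."""
    return " ".join(term.lower().split())


# Keys are the normalised indexing terms, values the glossary terms they point to.
INDEX = {normalise(term): glossary for term, glossary in ENTRIES}


def lookup(query):
    """Return the glossary term for an exact (case-insensitive) index match, else None."""
    return INDEX.get(normalise(query))


print(lookup("Greek  Key"))  # greekkey
print(lookup("magnetic"))    # None: single words of a multi-word term are not indexed
```

Under these rules a user typing 'greekkey' or just 'magnetic' finds nothing. That is the restriction the suggestion below is about.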

What I would suggest is that, instead of limiting the search by requiring
the user to specify a request in one specific form, the scripts handle
multi-word terms themselves.

For the automatic glossary markup:
In your document you would have 'amino acid'.

Possible conflicts with a single 'isolated' word search:
1. You may run into other documents with similar words,
e.g.:
a. Amino acid
b. Nucleic Acid
c. Ascorbic Acid
d. Amino terminal
e. etc.

2. It may look only at 'amino' and 'acid' separately.

Possible solution (this will also help with possible future expansion):
Have the script look at words in context.
Example: 'nuclear magnetic resonance'
1. Check word against the glossary for all hits.
possible hits:
- nuclear membrane
- nuclear magnetic resonance
- nuclear RNA
etc.
2. Look at 'context'/surrounding words.
For glossary markup:
See if any of the glossary 'hits' match the previous/next word
in the document, e.g. 'nuclear' AND 'magnetic'.
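The two steps above can be sketched in Python. Glossary entries are grouped by their first word, which gives "all hits" for a word. The sketch then keeps the longest hit whose remaining words match the words that follow in the document. Because each entry is anchored on its first word, checking only the following words is enough. The glossary list is a hypothetical sample built from the hits above, and the sketch assumes lower-case entries.

```python
import re

# Hypothetical glossary entries (lower case), taken from the example hits above.
GLOSSARY = ["nuclear membrane", "nuclear magnetic resonance", "nuclear rna", "amino acid"]


def find_terms(text, glossary=GLOSSARY):
    """Return glossary terms found in `text`, matching multi-word terms in context."""
    words = re.findall(r"\w+", text.lower())
    # Step 1: for each first word, list every glossary entry that starts with it.
    hits = {}
    for entry in glossary:
        parts = entry.split()
        hits.setdefault(parts[0], []).append(parts)
    found, i = [], 0
    while i < len(words):
        # Step 2: keep the longest hit whose remaining words match the next words.
        best = None
        for parts in hits.get(words[i], []):
            if words[i:i + len(parts)] == parts and (best is None or len(parts) > len(best)):
                best = parts
        if best:
            found.append(" ".join(best))
            i += len(best)  # skip past the matched term
        else:
            i += 1
    return found


print(find_terms("Solved by Nuclear Magnetic Resonance, not in the nuclear membrane."))
# ['nuclear magnetic resonance', 'nuclear membrane']
```

A lone 'nuclear' with no matching context produces no hit, which avoids the single-word conflicts listed earlier.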

For lookup:
A simple example of the concept would be:
'grep -i nuclear gloss-index | grep -i magnetic | grep -i resonance'

For the general lookup script, perhaps we could have a pull-down menu
or a checkbox that lets the user choose between an exact match and a
list of possible matches.
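Both lookup modes can be sketched in one Python function. An `exact` flag stands in for the proposed checkbox. When it is off, the function behaves like the `grep -i` pipeline: a line matches if it contains every query word, in any order, as a case-insensitive substring. The `exact` parameter and the sample index lines are hypothetical. The sketch also assumes gloss-index holds one indexing term per line, since its real format is not described here.

```python
# Hypothetical contents of gloss-index: one indexing term per line.
GLOSS_INDEX = [
    "nuclear magnetic resonance",
    "nuclear membrane",
    "magnetic field",
    "amino acid",
]


def lookup(query, index=GLOSS_INDEX, exact=True):
    """Exact mode: the whole term must match, ignoring case and spacing.

    Otherwise every query word must appear somewhere in the line, like
    'grep -i nuclear gloss-index | grep -i magnetic | grep -i resonance'.
    """
    q = " ".join(query.lower().split())
    if exact:
        return [line for line in index if " ".join(line.lower().split()) == q]
    words = q.split()
    return [line for line in index if all(w in line.lower() for w in words)]


print(lookup("Nuclear Magnetic Resonance"))     # ['nuclear magnetic resonance']
print(lookup("magnetic nuclear", exact=False))  # ['nuclear magnetic resonance']
print(lookup("nuclear", exact=False))           # ['nuclear magnetic resonance', 'nuclear membrane']
```

Like plain `grep -i`, the "possible matches" mode matches substrings. A short query can therefore return more lines than expected, which is exactly what the list of possible matches is meant to show the user.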

Finally, how broad do we want the index to be? I was thinking of making
it broad enough that a high school student could go into the course and
get a basic knowledge of the topic. (I am doing the Cell Biology; okay, trying.)

Perhaps there can be various levels of indexing:
Science
    Chemistry
        General
        Biochemistry
    Biology
        General
        Cell Biology
        Biochemistry
        Genetics
        Histology
        Medicine

It seems a little odd to me to have organelles in the Protein biochemistry
glossary but not other basic terms (but then my area is genetics
and cell biology) (^8. That is why I was thinking of either:
1. The indexing 'scheme' above.
2. One large general index.
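Both options could share one data structure. The indexing levels can be held as a tree, and the index for any level is then the set of terms under that node. Taking the root gives the one large general index. The Python sketch below assumes a nested dictionary following the levels listed above. The leaf terms ('acid', 'cell', 'mitochondrion', 'allele') are hypothetical samples.

```python
# Hypothetical tree following the levels above; leaves hold sample index terms.
TREE = {
    "Science": {
        "Chemistry": {"General": {"acid"}, "Biochemistry": {"amino acid"}},
        "Biology": {
            "General": {"cell"},
            "Cell Biology": {"mitochondrion"},
            "Genetics": {"allele"},
        },
    }
}


def terms_under(node):
    """Collect every index term in a subtree (a set is a leaf)."""
    if isinstance(node, set):
        return set(node)
    terms = set()
    for child in node.values():
        terms |= terms_under(child)
    return terms


def index_for(path, tree=TREE):
    """Index for one level, e.g. ['Science', 'Biology'];
    ['Science'] gives the one large general index."""
    node = tree
    for name in path:
        node = node[name]
    return terms_under(node)


print(sorted(index_for(["Science", "Biology"])))  # ['allele', 'cell', 'mitochondrion']
print(len(index_for(["Science"])))                # 5
```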

Well, these were just concerns/ideas (^8.

Thanks!

Mark

-- 
Mark Dalton       CH3-S-CH2 H                      H      O       H
Cray Research,Inc.      |   |                      |       \      |
Los Alamos,NM 87544     CH2-C-COO    //\ ---C--CH2-C-COO    C-CH2-C-COO
mwd@cray.com                |       |  ||   ||     |       //     |
                            NH3      \\/ \ / CH    NH3    O       NH3
                                          NH
URL = http://lenti.med.umn.edu/~mwd/mwd.html