Transcript of meeting in BioMOO, PPS Base 21st May '96 15:00 GMT: H. influenzae Sequence Data

A Big Thanks to PPS Consultant Henry Brzeski of the University of Strathclyde, Dept of Bioscience and Biotechnology for holding this seminar.

This transcript can currently be found on the tape 'H_I_tape' in the PPS Base, but will eventually have to be deleted to save disk space.

Participants:

jzt turns the C-recorder on.

Jzt says, "Testing H_I_tape 21st May 1996"

jzt turns the C-recorder off.

jzt turns the C-recorder on.

Jzt [to henryb]: could you repeat the Agenda as I just turned on the recorder!!

Henryb says, "I think the most amazing part of all this work was the fact that it was possible to sequence 1.8 Mbp in 3 months"

Henryb says, "AGENDA"

Henryb says, "1 Technology"

Henryb says, "1a Random DNA sequencing of HI as a prelude to human genome sequencing"

Henryb says, "2 Analysis of data"

Henryb says, "2a identification of genes by homology"

Henryb says, "2b identification of open reading frames (ORFs)"

Henryb says, "2c When is an ORF a gene?"

Henryb says, "2d Philosophy - is there a vis vitalis?"

JosteinA finds his/her way in.

PeterMR asks whether technology is speeding up faster than we expected

JosteinA says, "hi everybody"

Gustavo [to PeterMR]: why do you think so?

Henryb says, "I think that it is still going as planned but I never believed the forecasts and I am surprised to see them keeping to their timetable"

PeterMR got distracted. I felt that the yeast chromosome went faster than people had thought and wondered whether (say) we were moving faster than people had predicted 5 years ago

Jzt [to henryb]: Are you now starting with Item 1, Technology?

Rob finds his/her way in.

CheeMV has disconnected.

Henryb says, "TIGR (the people who sequenced HI) used it as a simple model in preparation for the human genome. They wanted to show how easy it is to sequence DNA when you have a bank of DNA sequencers."

Henryb says, "It took them 3 months to sequence 1.8 Mbp"

PeterMR asks how many sequencers (a) human (b) machine

Henryb says, "this means that TIGR could sequence the human genome on their own in 1000 * 0.25 years (250 years). Isn't life so easy !!"
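
The scaling estimate above can be checked with a few lines of Python. This is only a rough sketch: it assumes a constant sequencing rate, takes the 1.8 Mbp and 3 months quoted in the session, and uses the 3x10^9 base human genome size mentioned later in the discussion. The function name is made up for illustration.

```python
# Figures quoted in the session; the human genome size is the 3x10^9
# figure mentioned later in the discussion.
HI_GENOME_BP = 1.8e6      # H. influenzae genome, base pairs
HI_PROJECT_YEARS = 0.25   # 3 months
HUMAN_GENOME_BP = 3e9


def years_at_same_rate(target_bp, ref_bp=HI_GENOME_BP, ref_years=HI_PROJECT_YEARS):
    """Time to sequence target_bp at the rate implied by the reference project."""
    return target_bp / ref_bp * ref_years


# Using the actual size ratio (about 1667-fold):
print(round(years_at_same_rate(HUMAN_GENOME_BP)))  # 417

# Using the round 1000-fold factor from the discussion:
print(1000 * HI_PROJECT_YEARS)  # 250.0
```

Either way the answer is in the hundreds of years for a single centre working at that rate, which is the point of the remark.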

CheeMV has connected.

Henryb says, "The statistics on the paper say an average of 14 machines. No mention of number of people required for this but it did require 28,643 sequencing reactions"

PeterMR asks what is a sequencing reaction? How many bases does it give?

Paulyta finds his/her way in.

Paulyta says, "hi"

Paulyta waves

Henryb says, "but it is not only the machines to do the sequencing but also the computers to do the data analysis. The sequences were not touched by hand. All sequences were read by computer, trimmed by computer, irrelevant sequences removed by computer and then submitted to the database by computer"

JensJL (Hi everybody, where is the party ?) finds his way in.

Cathy has disconnected.

JosteinA (idle 21m) goes northwest.

Henryb says, "sequencing reactions are based on the Sanger method. DNA is synthesised enzymatically and stopped by random incorporation of labelled terminators. In this way it is possible to read 300 - 600 bases at one time (depending on the machine you have). TIGR's average reaction gave about 250 - 300 bp"

Gustavo says, "Some other methods can give much longer reads as well. Notably, Ansorge's, which reads 2000 bases properly per run."

Henryb says, "I think the problem is reliable technology. ABI have cornered the market in automated sequencers which are easy to run so everybody is using them"

PeterMR is learning a lot, thanks, henry

Hrosa finds his/her way in.

JosteinA finds his/her way in.

Henryb says, "I think the next amazing part of this project was the automation involved. as I mentioned earlier after loading on the machine"

The housekeeper arrives to cart Cathy off to bed.

KarlS says, "I think that the real achievement of TIGR's HI work was not the sequencing on 4 machines, but the shotgun cloning of a whole genome. As far as I know, people usually produce subclones, map them by hybridization and then start shotgun cloning and sequencing many small clones (of 3 to 7 kb length). They produced a high quality shotgun library of the *whole* genome and let the computer do the mapping."

Henryb says, "I think the next amazing part of this project was the automation involved. As I mentioned earlier, after loading the sequencing reactions on the machine, which I presume was done by hand, the rest was done by computer (which takes us onto the second item on the agenda, analysis of data)"

Gustavo says, "But of course such an approach is unfeasible for Homo sapiens."

Gustavo means, full genome shotgun.

Jzt wonders what a shotgun is?

Gustavo says, "Shotgun sequencing: random breakage of the DNA molecule, subcloning, sequencing of the fragments, and computer-assisted assembly."

Gautam finds his/her way in.

Gautam says, "Hey - Anybody here"

Gautam says, "Hi Paul"

PaulSt finds his way in.

Gautam gestures OK to Gustavo

KarlS says, "shotgun cloning means that a piece of DNA is sheared mechanically or by sonication, which should result in random breakpoints (as opposed to restriction enzyme digestion). The DNA is then separated by size and cloned into a vector. Theoretically, if one sequences a certain number of these clones, the whole DNA molecule can be sequenced"

Henryb says, "as far as I understood it TIGR DID NOT map the DNA. 10 genomes worth of DNA were sequenced and then overlapped to give the entire sequence. An amazing amount of computational power (242000 different fragments to overlap)"
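
The "computer-assisted assembly" step of shotgun sequencing can be illustrated with a toy greedy overlap merger. This is a minimal sketch, not the software TIGR used: the function names, the 20-base example sequence, the regularly spaced error-free reads and the minimum overlap of 4 are all assumptions made for illustration.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical repeat-free sequence, cut into 8-base reads that overlap by 4.
genome = "ATGGCGTACCTGAAGTTCGA"
reads = [genome[i:i + 8] for i in range(0, len(genome) - 7, 4)]
contigs = greedy_assemble(reversed(reads))
print(contigs == [genome])  # True
```

Real shotgun reads start at random positions, contain errors and come from both strands, so a production assembler has far more to do than this.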

PeterMR assumes this is why it won't work for the human genome

KarlS says, "They use massively parallel computers"

Gustavo [to PeterMR]: that's just one of the reasons. The massive presence of repetitive sequences in the human genome is another one...

PeterMR thanks G.

Henryb says, "I think the numbers involved make impressive reading. 28000 sequences obtained (average read length 450bp). Equivalent to about 11 million base pairs (obviously many duplicates), all finally aligned into 1.8 million base pairs of consecutive, unique sequence"
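
The redundancy in these figures can be expressed as fold coverage. The sketch below uses the 28,643 reactions and 450 bp average read length quoted in the session. The estimate of how much sequence is left uncovered relies on an idealised Poisson model (reads landing uniformly at random, as in the Lander-Waterman analysis), which is an assumption brought in here and was not discussed in the meeting.

```python
import math

READS = 28_643        # sequencing reactions quoted in the session
READ_LEN_BP = 450     # average read length quoted in the session
GENOME_BP = 1.8e6     # H. influenzae genome size


def fold_coverage(n_reads, read_len, genome_len):
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_len


def expected_uncovered_fraction(coverage):
    """Poisson approximation: chance that a given base is hit by no read."""
    return math.exp(-coverage)


c = fold_coverage(READS, READ_LEN_BP, GENOME_BP)
print(f"total bases read: {READS * READ_LEN_BP:,}")
print(f"fold coverage: {c:.2f}")
print(f"expected uncovered fraction: {expected_uncovered_fraction(c):.5f}")
```

At roughly sevenfold coverage the model leaves well under 0.1% of the genome unsequenced, which is why random sequencing alone gets most of the way and only a few gaps need targeted closure.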

Gustavo says, "Shotgun sequencing a single 40 kb cosmid usually presents a few difficulties at assembly time, from repetitive sequences. Assembling 3x10^9 bases with >10^6 repeats, all at once, is just unthinkable. :)"
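
The repeat problem can be made concrete with a toy example. In the sketch below, two different arrangements of a genome containing three copies of a repeat produce exactly the same collection of error-free reads, as long as no read is long enough to cross a whole repeat copy with a flanking base on each side. The sequences and names are invented for illustration.

```python
from collections import Counter


def read_spectrum(genome, k):
    """Multiset of every length-k substring, i.e. every possible error-free read of length k."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


R = "GATTACA"              # hypothetical repeat unit, present three times
X, Y = "CCCC", "TTTT"      # hypothetical single-copy segments

genome_a = R + X + R + Y + R
genome_b = R + Y + R + X + R   # same pieces, X and Y swapped

short = len(R) + 1   # too short to span a repeat copy plus both flanks
long = len(R) + 2    # just long enough to span a repeat copy plus both flanks

print(read_spectrum(genome_a, short) == read_spectrum(genome_b, short))  # True
print(read_spectrum(genome_a, long) == read_spectrum(genome_b, long))    # False
```

With the shorter reads no assembly program could tell the two arrangements apart, because the input data are identical. Only reads that bridge a repeat, or independent mapping information, resolve the order.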

Paulyta says, "Is there some way to only sequence the non-repetitive DNA?"

Paulyta says, "Maybe by using some technique to fish out the repetitive sequences?"

Henryb says, "that is the reason for following Craig Venter's expressed sequence tag strategy"

Gustavo says, "The 'problem' is that the repetitive DNA is 'interspersed' in the genome. If you suppressed it, you would lose also single-copy sequences, and would end with a highly fragmented resulting sequence."

Henryb says, "most people are first sequencing the mRNAs from cells because that will be most interesting. Then they intend to tie this into the human genome later. Obviously repeat sequences do not occur (to any great extent) in mRNA"

KarlS says, "repetitive sequences occur in the noncoding regions of mRNA, but are usually not a problem for sequencing"

Paulyta says, "If the repeating sequences are dispersed, won't you get problems with correctly linking the rest of the sequences together, depending on the length of the repeated and cloned DNA?"

Gustavo [to paulyta]: right.

Hrosa has disconnected.

JosteinA has disconnected.

Henryb says, "the next problem with DNA sequencing is the following: it's easy to get a sequence but what does it mean? This is where our great networked databases really score. This is why you have to feel comfortable with querying these databases (next item on the agenda - identification of genes by homology)"

PeterMR is listening :-)

The housekeeper arrives to cart hrosa off to bed.

Gustavo . o O ( and of course you need software that works properly ;) )

The housekeeper arrives to cart JosteinA off to bed.

Henryb says, "yes. there is a lot of software available which will do subtly different things. you need to know how to use it and what the results mean"

Gautam says, "Not to mention the ambiguity/controversy surrounding the matrix to use for searching the database to look for homology"

Henryb says, "we are discussing the use of software to analyse DNA sequences and how this will allow you to assign unknown DNA sequences to defined functions"

Paulyta says, "How is homology defined -in terms of sequence additions/deletions substitutions...? In the various search methods?"

Gustavo would rather use the word 'similarity' in this context.

Gautam . o O Time for the weekly meeting.

Gautam has to go

Gautam has disconnected.

CrisCan has to go. bye bye

CrisCan has disconnected.

Henryb says, "I prefer to use a program called tfasta which first translates the DNA sequence into a protein sequence and then compares it against all the translated DNA sequences in the database. This is probably more sensitive if you're looking for coding sequences. If not then DNA homology becomes rather difficult to quantify."

KarlS says, "I have a problem with using FASTA. I never know when to consider similarity to be significant (I think BLAST is more useful in that respect)"

Paulyta says, "this would also reduce your search by a factor of three (bases/AA). But what about codon usage and alternate codons in different species?"

The housekeeper arrives to cart Gautam off to bed.

Henryb says, "firstly it makes your comparison more difficult because you have to compare translation in six reading frames (three forward and three back). As far as alternate genetic codes goes then the answer is I don't know. I only work with standard DNA sequences so I've never had to worry about this !"
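
Translation in six reading frames (three on the forward strand, three on the reverse complement) can be sketched as follows. The code uses the standard genetic code only, so it does not address the alternate-code question raised above. The function names and the frame labels "+1" to "-3" are choices made for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop, 'X' an unknown codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return the six conceptual translations keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

For the short example sequence, frame +1 reads Met-Ala-stop, and the other five frames give unrelated peptides. A translated search has to score each of the six against the protein being compared.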

Hersh finds his way in.

The housekeeper arrives to cart CrisCan off to bed.

Henryb says, "the agenda has gone to pot - as usual. I will have to be off soon. Are there any other burning issues?"

Hersh has disconnected.

Jzt [to henryb]: Undoubtedly there are, but it's a question of how long you and others want to keep going at one time! We can have additional meetings later on...

PeterMR tells henry not to worry about the agenda :-) In my experience you get about 1 topic/15mins - and then only if you regulate it firmly.

JohnW [to henryb]: Well, maybe we could continue later at some other date? I for one have learned a lot.... many thanks for this seminar

Jzt says, "Yes, many thanks, Henry. "

PeterMR has learnt quite a lot - has been compiling Java on the other window, so it's a nice easy pace. It's a real strain when you are *running* it though!

KarlS says, "Thank you, Henry"

Paulyta says, "thanks and bye for now"

PeterMR imagines that henry feels wiped out.

Gustavo says, "I'd like to mention that Eitan and I have started a 'GCG HelpDesk' here in BioMOO, where people can give and get help about sequence analysis."

Paolo says, "thank you"

JohnW says, "I will get the transcript on the WWW pronto"

jzt turns the C-recorder off.