And some programming for those who can't live without it...
 
NOTE: If you are attending the next two Chemoinformatics lectures and tutorials, you can ignore this part. In this case, you have now finished the first tutorial - congratulations!
 
If you want to try out some programming relating to chemoinformatics, the good news is that you don't have to do it from scratch. There are open source Java chemoinformatics libraries that you can download and use in simple programs of your own. Beware that these libraries are under development: many of the classes have not been tested extensively and hence they contain many known and (possibly many more) unknown bugs.
 
I would suggest that if you are going to try out one of these, start with CDK, the Chemistry Development Kit. This is not because it is the best one around or the most popular. I just think it's the easiest one to use for a quick test. The homepage for CDK is:
http://cdk.sourceforge.net/
Do read the basic introduction on the page and quickly navigate the site to learn a little bit more about this project.
 
As with any Java library, you need to import the CDK classes into your program if you want to make use of them. In order to import them, your Java compiler must be able to find them. This means that your CLASSPATH variable needs to include the place where the classes are held. If you want to practice at home you will need to download the jar files from sourceforge.net: http://sourceforge.net/project/showfiles.php?group_id=20024 and then set your CLASSPATH to include the jar file itself (for a jar, listing only the directory that contains it is not enough).
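A quick way to check that the class path is set up correctly is to ask Java to load one of the CDK classes by name. The following is only a minimal sketch: the class name CheckCdkOnClasspath and the helper isAvailable are made up for this illustration, and the CDK class it looks for is the Fingerprinter mentioned further down this page.

```java
public class CheckCdkOnClasspath {

    // Hypothetical helper: returns true if the named class can be found
    // on the class path that this program was started with.
    public static boolean isAvailable(String className) {
        try {
            // 'false' means: locate the class, but do not initialise it.
            Class.forName(className, false, CheckCdkOnClasspath.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked (e.g. a missing dependency).
            return false;
        }
    }

    public static void main(String[] args) {
        String cdkClass = "org.openscience.cdk.fingerprint.Fingerprinter";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        if (isAvailable(cdkClass)) {
            System.out.println(cdkClass + " was found.");
        } else {
            System.out.println(cdkClass + " was NOT found; check your class path.");
        }
    }
}
```

If the class is reported as not found, the printed java.class.path line shows exactly what Java was given, which usually makes the mistake easy to spot.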
 
The current release of CDK is 1.0.4. You will need Java 1.5 to use this release. Alternatively, you may download the source files and recompile them (you need Ant for this). Assuming you have downloaded a jar file, you need to do one of the following:
  a) put it under $JAVA_HOME/jre/lib/ext, where JAVA_HOME is the directory holding your Java SDK (it could be something like /usr/java/j2sdk1.5/ in a Linux installation); you need write permission to that directory, OR
  b) include it in your CLASSPATH variable, OR
  c) include it at compilation and execution as follows:
  javac -classpath path_to_jarfile/cdk-1.0.4.jar MyClass.java
  java -cp .:path_to_jarfile/cdk-1.0.4.jar MyClass
On Windows, use ; instead of : to separate the entries of the class path.
 
You only need to know the basics of Java to write a useful program using an existing library of classes. Here are some suggestions for short programs you may want to try writing with the help of CDK (by all means try something different, if you don't like my suggestions!):
  • A program that will read in an MDL mol formatted file and write out the corresponding SMILES string (check out first some basic classes, such as the AtomContainer, ChemFile, ReaderFactory etc in CDK's online API documentation at: http://cdk.sourceforge.net/api/).
  • A program that will read in molecules in any of the recognised formats, and print out the molecular weight of each.
  • A program that will read in two molecules (in any format), produce their fingerprints (org.openscience.cdk.fingerprint.Fingerprinter class) and calculate a Tanimoto score of similarity based on these fingerprints.
  • A program that will read in two molecules, find the maximum common subgraph (org.openscience.cdk.isomorphism.UniversalIsomorphismTester) and return an MDL formatted file that will contain the matched atoms of the first molecule.
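For the fingerprint suggestion above, the Tanimoto score itself is easy to write down: it is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. The sketch below assumes the fingerprints are available as java.util.BitSet objects (check the CDK API documentation for what Fingerprinter actually returns in your release, and for whether CDK already offers its own similarity class). The class name TanimotoSketch and the hand-made bit patterns are invented for this illustration, and returning 0 for two empty fingerprints is just one possible convention.

```java
import java.util.BitSet;

public class TanimotoSketch {

    // Tanimoto score = |A AND B| / |A OR B| for two bit-string fingerprints.
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone(); // copy, so the inputs are left unchanged
        common.and(b);
        int inBoth = common.cardinality();
        int inEither = a.cardinality() + b.cardinality() - inBoth;
        // Assumption: two empty fingerprints are given a score of 0.
        return inEither == 0 ? 0.0 : (double) inBoth / inEither;
    }

    public static void main(String[] args) {
        // Made-up fingerprints standing in for the output of a fingerprinter.
        BitSet fp1 = new BitSet(1024);
        fp1.set(1);
        fp1.set(5);
        fp1.set(42);

        BitSet fp2 = new BitSet(1024);
        fp2.set(5);
        fp2.set(42);
        fp2.set(99);

        // 2 bits in common, 4 bits set in total, so this prints "Tanimoto = 0.5"
        System.out.println("Tanimoto = " + tanimoto(fp1, fp2));
    }
}
```

In a real program you would replace the hand-made BitSets with the fingerprints of the two molecules you read in.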
 
If you want to cheat and make an easy start, take a look at an example program that does the first of the above tasks (turns MDL files into SMILES strings): MolToSmiles.java (NOTE: this program has not been tested recently and it is unlikely to compile with the current CDK, because many of the classes have been renamed and many method definitions have changed. You can still use this piece of code as a guideline on how to proceed, but it almost certainly requires changes.)
Thanks to Rajarshi Guha you can now ignore the paragraph above, and go straight to a nice selection of CDK code examples that should work! Go to the following URL: http://cheminfo.informatics.indiana.edu/~rguha/code/java/ and explore the examples there.
If you want some molecular connectivity files to try out your programs, you can either download them from the web (you know many sites where you can find them now!) or you can use these MDL-formatted ones:
amphetamine.mol
viagra.mol
ATP.mol
 
Don't forget that you are supposed to enjoy this! Good luck.
 
THE END!