The Bioinformatics Template Library (BTL)

An Introduction to Generic Component Programming using C++ Templates

Contents

1.0 Templates

2.0 The Standard Template Library

3.0 The Bioinformatics Template Library

3.1 The BTL_Graph

3.1.1 Constructing a BTL_Graph

3.1.2 Adding and Removing Vertices

3.1.3 Adding and Removing Edges

3.1.4 Navigating Sequentially Through a BTL_Graph

3.1.5 Navigating in Depth First Search Order

1.0 Templates

Templates are a method within C++ for parameterising code. They reduce code size and code duplication by allowing the same code to perform many jobs. For example:

// A template function definition

template <class T>
bool less(const T& a, const T& b)
{
return a < b;
}

// The use of the template

int main()
{
    // The function used to compare 2 integers
    //
    int a = 3, b = 4;
    if (!less(a,b)) exit(1);

    // The same function used to compare to characters
    //
    char c1 = 'x', c2 = 'z';
    if (!less(c1,c2)) exit(1);

return 0;
}

In this example the compiler will instantiate two versions of the less function i.e. the code is duplicated by the compiler and not by the programmer. This make for smaller code and for less human error.

As well as functions you can have template classes. For example:

// A list container where the type of data stored is determined by the template parameter
//
template<class T>
class list
{
public:

void insert(const T& t);
};

int main()
{
    // Create three instants of the list class, each containing a different type of data
          //
    list<int>    list_of_ints;
    list<float> list_of_floats;
    list<Atom>   list_of_atoms;

    // Insert a data item of the appropriate type into each container
    list_of_ints.insert(34);
    list_of_floats.insert(-7.3);
    Atom atom;
    list_of_atoms.insert(atom);

return 0;
}

2.0 The Standard Template Library

The standard template library (STL) is part of the standard C++ library which an official part of the C++ language. The STL contains containers (data structures) and algorithms. These two types of components are designed to be used in combination. The algorithms are generic in that they are largely independent of the way the data they manipulate is stored. Each of the STL containers has a common interface that algorithms interact with. This interface, or the "glue" between the containers and algorithms takes the for of iterators.

Each STL container provides a type of iterator that allows navigation through the contents of the container. Iterators act like pointers, and they reference individual data items in the container. The ++ operator is used to move the iterator so that it references the next data item in the container. As with pointers, the * operator is used to dereference an iterator and obtain an item of data.

For example, the following code uses a iterator to access each element stored in a STL list in turn. Before each move through the container the iterator is dereferenced and the data item is printed out. This methodology will work with each of the STL containers (not just the list class):

#include <list>

int main()
{
    // Compiler such as Borland C++ which already implement C++ namespaces require that you use the
          // following statement to state that it is a standard C++ list that you want to use. This is standard C++ but
          // not all compilers will recognise it.

using namespace std;

// Create a STL container and put some data into it

    list<int> list;
    list.insert(2);
    list.insert(-8);

...

// select the iterator type provided by the container being used

typedef list<int>::iterator iterator;

// Use the begin() and end() functions to find the bounds of the container
// Move through each item in the container in turn using the ++ operator

for(iterator i=list.begin(); i!=list.end(); i++)

cout << *i; // "dereference" the iterator to access the actual data item

return 0;
}

All STL containers provide a begin() function that returns an iterator that references the first element in a container and an end() function that returns an iterator that references the position in memory after the last element in container. For obvious reasons the iterator returned by end() should not be dereferenced.

Generic functions within the STL generally take iterators as input. The type of iterators used are deteremined by template parameters. This allows the algorithms to be independant of the container. One of the simplest algorithms is find. Its declaration looks like this:

template <class InputIterator, class T>
InputIterator find(InputIterator first, InputIterator last, const T& t);

and is used like this:

#include <vector>
#include <algo.h>

int main()
{

// Create an STL vector which is a dynamically extendable array.

vector<char> vec;

// Put some data into the vector

vec.push_back('e');
....

// Use the STL generic find function to search through the vector for the letter g.

typedef vector<char>::iterator iterator;
iterator i = find(vec.begin(); vec.end(); 'g');

// If the letter was not in the container the algorithm returns the end() iterator, otherwise it returns the
// iterator references the first occurence of the letter in the container

    if (i == vec.end())
        cout << "Letter not found.";
    else
        cout << *i << " found was found.";

return 0;
}

Containers included in the STL are list, vector, set and map. Algorithms include those for sorting and merging. Not all algorithms will work with all containers. For instance, one cannot use a sort algorithm with a list container because the iterator type provided by this type of container does not provide a random access facility.

There are also other component types within the STL. For a fuller description, please see SGI's STL Programmers Guide.

3.0 The Bioinformatics Template Library

The Bioinformatics Template Library (BTL) is designed to have the same 'look and feel' as the STL. The BTL includes containers and generic algorithms, as well the other types of components found in the STL. However, whereas the STL contains components of general usage the BTL, as the name suggests, contains components specifically for use in bioinformatics and molecular modelling. Components include Matrix and Vector "containers" and fast Fourier transform algorithms. Please see the BTL documentation for a complete list of its contents. This documentation assumes a good working knowledge of C++ templates and the STL. Here I will describe in more detail the use of two of the BTL components: the BTL_Graph and the BTL_GraphWithEdges.

3.1 The BTL_Graph

This class models a graph data structure where the vertices are labelled but the edges are not.That is a template parameter controls what type of data or objects are associated with each vertex. The graph can be directed or undirected (see Figure below).


Undirected graph of real numbers	Directed graph of characters

The vertices can be associated with more complicated types. The obvious application in molecular modelling would be to have each vertex represent an atom. These atoms could be stored within the graph container or simple pointers to atoms could be stored. Following the same analogy, the edges could represent bonds or simply interatomic distances.

The BTL_Graph class is actually a container adaptor. This means that not only can the user choose the type of data associated with vertices but they can also choose the type of container is used to store the vertex data.

An Illustration of the How the Data and the Connectivity is Stored in the BTL_Graph

Different containers have different properties. For instance, the vector is the fastest for sequential and random access, but can be slow to insert and erase items from. The STL set is ordered and cannot contain duplicate items, so the graph illustrated above, where 'c' occurs twice, could not be stored in a set.

3.1.1 Constructing a BTL_Graph

A BTL_Graph comes in two forms, one for the STL vector and deque containers and one for the rest. In the latter case vertices reference a vertex data item via iterators. This method does not work for vectors and deques because the iterators provided by these containers are not stable and can become invalid after insertions or deletions. For vectors and deques, data items are accessed via index numbers. The two forms of BTL_Graph are used like this:

// Use the first type of graph for deques and vectors

BTL_Graph1< vector<char> > graph1;
BTL_Graph1< deque<int> > graph2;

// Use the 2nd type of graph for all other STL containers

BTL_Graph2< set<float, less<float> > > graph3;
BTL_Graph2< list<Atom> > graph4;
...

N.B.1 The STL set sometimes requires and method to be used to place the contents in a unique order. For more advanced compilers the less<T> can be left out and is implicitly used as a default.

N.B.2 There is another container included in the ANSI standard C++ library called a valarray. This has not yet been widely implemented but when it is the BTL_Graph1 should be used.

In all the above cases a undirected graph is constructed. To construct a directed graph do the following:

// Construct a directed graph

BTL_Graph1< vector<char> > graph(directed);

3.1.2 Adding and Removing Vertices

When you add a vertex to a BTL_Graph an iterator to the new vertex is returned:

// Define the vertex iterator type

typedef BTL_Graph< vector<char> > Graph;
typedef Graph::VertexIterator VertexIterator;

// Add a vertex to the graph

VertexIterator vi = graph.AddVertex('d');

N.B. if a STL set was used here instead of a vector and the character 'd' was already present in the graph, then this function would return an iterator to an existing vertex.

To remove a vertex you must use a vertex iterator to indicate which vertex you want to remove:

// Remove the vertex just added

bool removed = graph.RemoveVertex(vi);

A boolean 'false' is returned if the vertex to be removed was not found in the graph

3.1.3 Adding and Removing Edges

To add an edge, you must provide 2 vertex iterators:

// Add two vertices:

VertexIterator v1 = graph.AddVertex('f'),
v2 = graph.AddVertex('q');

// Add an edge between them

bool added = graph.AddLink(v1,v2);

'true' is returned if the edge was sucessfully added. These boolean return values can be ignored in most cases.

The name of this function indicates that an edge in the BTL_Graph is simply a link between two vertices and no value can be associated with it.

Removing an edge follows the same approach:

// Remove an edge

bool erased = graph.RemoveLink(v1,v2);

or simply:

graph.RemoveLink(v1,v2);

3.1.4 Navigating Sequentially Through a BTL_Graph

You can access vertex data or the vertices themselves sequentially:

// Accessing the vertex data sequentially (the normal STL container method)

typedef Graph::iterator iterator;

for (iterator i=graph.begin(); i!=graph.end(); i++)
{
char item = *i;
cout << item;
}

// Accessing the vertices sequentially (Graph specific)

typedef Graph::Vertex Vertex;

for (VertexIterator v=graph.begin_vertex(); v!=graph.end_vertex(); v++)
{
Vertex& vertex = *v;
cout << vertex;
}

3.1.5 Navigating in Depth First Search Order

If all you want to do is access data sequentially then a graph data structure is not the best choice. To access the contents of the graph in an order dictated by the connectivity of the graph you can use the BTL_DFSIterator. This is separate class as it is anticipated that it could be used on other Graph classes. Its use is simple:

// Start from first vertex in graph (but you could start anywhere)

VertexIterator vi = graph.begin();

// Construct a DFSIterator from a VertexIterator

BTL_ConstDFSIterator<Graph> di(vi);

// Loop through each connected vertex in DFS order, stop when done them all

while (!di)
{
    char item = *di;
    cout << item;
    di++;
}

Any questions to Will Pitt. This page was last updated on $Date: 1997/06/30 11:13:55 $