Protein structures - the next bottleneck

Published: 1-Dec-2003

Julian F Burke, chief scientific officer at Genetix, discusses how companies are developing methods to speed up the elucidation of protein structures



With the genomes of humans and many other organisms now sequenced, it is possible to stand back and count the number of genes, and hence proteins, in each organism. Somewhat to the surprise of many biologists, the human genome contains only around 25,000 genes. This confirms kinetic experiments performed in the 1970s and is far lower than the figures suggested by early sequencing studies in the 1990s. Some of those estimates put the total as high as 120,000, a belief that the purveyors of genome informatics did little to discourage.

The realisation that there are relatively few genes has had many consequences. In proteomics, the impact falls on the determination of protein structures. Partly in response to the assumed large number of proteins, several massive facilities for determining protein structures have been built or are under construction. One example is the synchrotron source being built south of Oxford, UK. When these sources of immensely powerful X-rays come on line, they will revolutionise the rate at which protein structures can be determined.

During the 1990s, DNA sequences accumulated exponentially over time, whereas protein structures accumulated only linearly. This may soon start to change.
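
The difference between the two growth patterns can be written as two simple models. Here $N_0$, $k$, $a$ and $b$ are illustrative symbols that have not been fitted to any database figures. Sequences grow by a constant proportion each year, so they have a fixed doubling time $T_2$. Structures grow by a constant number $b$ each year.

```latex
N_{\text{seq}}(t) = N_0\, e^{k t}, \qquad T_2 = \frac{\ln 2}{k}
\qquad\qquad
N_{\text{struct}}(t) = a + b\, t
```

Under the first model the stock doubles every $T_2$ years, whatever its current size. Under the second, the yearly gain stays the same however large the collection becomes.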

In the conventional protein crystallisation process, the investigator first purifies the protein, which in almost all cases is a recombinant form. Producing it starts from the cDNA. In most instances the protein is expressed in E. coli from an inducible promoter: the culture is grown up and expression is then switched on.

This technique raises a number of issues. Although it can work well, producing up to 20 mg of protein per litre of bacterial culture, several variables affect how much heterologous protein is made. One simple parameter is codon usage. Although the genetic code is universal, organisms differ in which codons they actually use. If a gene contains codons that are uncommon in the host, less protein is produced. In extreme cases the gene has to be re-engineered to substitute optimal codons.
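
The following minimal Python sketch shows the codon check and re-engineering step described above. It assumes an in-frame DNA coding sequence. It uses a small, commonly cited set of codons that are rare in E. coli, and maps each one to a synonymous codon assumed to be common in that host. The table and the function names (`find_rare_codons`, `recode`) are illustrative assumptions, not a validated codon-optimisation method.

```python
# Illustrative assumption: a commonly cited set of codons that are rare in
# E. coli, each mapped to a synonymous codon assumed to be common there.
RARE_TO_COMMON = {
    "AGA": "CGT",  # Arg
    "AGG": "CGT",  # Arg
    "ATA": "ATT",  # Ile
    "CTA": "CTG",  # Leu
    "CCC": "CCG",  # Pro
    "GGA": "GGT",  # Gly
}


def codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def find_rare_codons(cds: str) -> list[tuple[int, str]]:
    """Return (codon_index, codon) for every codon found in the rare set."""
    return [(i, c) for i, c in enumerate(codons(cds)) if c in RARE_TO_COMMON]


def recode(cds: str) -> str:
    """Replace each rare codon with its synonymous substitute; keep the rest."""
    return "".join(RARE_TO_COMMON.get(c, c) for c in codons(cds))
```

For example, `find_rare_codons("ATGAGAAGGCTGTAA")` returns `[(1, 'AGA'), (2, 'AGG')]`, and `recode` turns the same sequence into `ATGCGTCGTCTGTAA` without changing the encoded amino acids. A real redesign would draw on a full codon-usage table for the host. This sketch only shows the principle.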

Solubility is a major issue. When E. coli produces a protein in large amounts, in some cases up to 20% of total cell protein, the protein often accumulates in an insoluble form. It is therefore denatured and cannot be used immediately. This does have one advantage: it makes purification relatively simple. Other issues concern the bacterial host. Although E. coli is the workhorse of protein production, several strains are available and they differ in how well they produce proteins. The reason for this is not entirely clear, but it may reflect differences in the proteases each strain encodes.

heated debate

The proportion of proteins that can be successfully produced in E. coli is much debated, and it has a profound effect on throughput. In a study of the Thermotoga maritima genome¹, all 1,877 genes were cloned (a 100% success rate) and 24% of the proteins crystallised. Raising this percentage further requires empirical work, and that work is now the bottleneck in protein crystallisation. For example, the salt solutions used to grow the crystals have to be varied systematically. The problem then becomes one of visualisation: how does an investigator know that a crystal has formed? Traditionally, the plates were examined under a microscope and the crystals were removed by hand. This process has now been automated.
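
One way to vary conditions systematically is a grid screen, in which two parameters are stepped across the rows and columns of a plate. The sketch below assumes a 96-well plate of 8 rows by 12 columns. Salt concentration is varied down the rows and pH across the columns. The ranges and the function name `grid_screen` are hypothetical, chosen only for illustration.

```python
import string


def linspace(start: float, stop: float, n: int) -> list[float]:
    """n evenly spaced values from start to stop inclusive (assumes n >= 2)."""
    step = (stop - start) / (n - 1)
    return [round(start + i * step, 3) for i in range(n)]


def grid_screen(salt_molar=(0.2, 1.6), ph=(4.5, 9.0), rows=8, cols=12):
    """Map well names such as 'A1' to (salt concentration in M, pH).

    Hypothetical layout: salt increases down rows A, B, ...; pH increases
    across columns 1, 2, ...
    """
    salts = linspace(*salt_molar, rows)
    phs = linspace(*ph, cols)
    plate = {}
    for row_letter, salt in zip(string.ascii_uppercase, salts):
        for col, p in enumerate(phs, start=1):
            plate[f"{row_letter}{col}"] = (salt, p)
    return plate
```

Each well then holds a unique pair of conditions. With the default ranges, well A1 is 0.2 M at pH 4.5 and H12 is 1.6 M at pH 9.0. A record of which wells produced crystals therefore maps directly back to the conditions that produced them.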

exponential increase

The Automation Partnership, for example, has developed equipment to visualise up to 10,000 protein crystallisation trays, and Genetix is developing a high-throughput system for storing and imaging 600 trays. Systems like these will revolutionise the process of protein crystallisation.

Genetix envisages a day when protein laboratories replace their own X-ray sources with storage and imaging systems, sending the resulting crystals to a synchrotron source for data collection. Because data collection may take only minutes, this will mark the beginning of an exponential increase in the number of protein structures, and an end to the flashing red lights in the X-ray lab.
