Bench philosophy: Chromosome conformation capture methods

Mapping Chromatin Interactions
by Steven Buckingham, Labtimes 06/2017

The three-dimensional structure of the chromosomes has a major impact on which genes get expressed and when. For the last ten years, a growing number of scientists have been working on better ways to look at this 3D structure.

A naive view of the genome is that it is a string of symbols bearing a message that, on its own, is enough to build an organism. Some parts of the string are comparatively easy to understand, namely the protein-coding parts, where each codon, a triplet of bases, stands for an amino acid. Some parts are a little more subtle, such as the special sequences that mark the start of a protein-coding gene or mark where the transcript is to be spliced. No doubt there are yet higher orders of meaning in the message, too, which remain to be discovered. But the basic idea is that the genome is a kind of Turing machine. In other words, it works just like a computer programme, in which one string encodes not only the data but also the instructions for manipulating those data.

Chromosomes fold into complex three-dimensional structures that affect, for example, gene function. Photo: NIH

But perhaps we have been too influenced by this metaphor of the “genome-as-programme”. That is certainly one point made by the many biologists who have looked not only at the sequence of the genome but also at the chromosomes as a whole, with their own three-dimensional structure.

A chromosome's job is not an easy one. Unpack all the DNA in just one of your cells, line it up end to end and it will be about two metres long. All of that has to be packed into a spherical container less than ten μm in diameter. Oh, and by the way, you can’t just pack it any old way: you have to allow access to various sites in the sequence at various times, when you want to transcribe a gene or replicate the DNA, for instance.

In other words, the formation of chromosomes is little short of an engineering miracle.

More than a packing problem

And if you don’t think those design specifications are tough enough, hang on, because it gets worse. It is emerging that the 3D structure of chromosomes is not just the solution to a difficult packing problem but also plays a role in controlling how genes are expressed. Right, so we need to understand the 3D structure of chromosomes. But how are we to determine that structure? One solution is microscopy, but its resolution is far too low to answer many of the questions being asked today.

The biggest advance in mapping the structure of chromosomes comes from a family of related approaches called Chromosome Conformation Capture (3C). The basic idea behind all these techniques is to estimate how close pairs of sites on the DNA sequence are in space, using how often they are found in contact as a proxy for distance. Imagine you want a map of a city, showing the location of all the key buildings. Imagine further that a clever cartographer decides she wants to save bandwidth by sending you not a map but a spreadsheet listing the distances between every pair of buildings. Irritating, to say the least, but being an even cleverer biologist, you realise that it is indeed possible, if perhaps not easy, to work out the relative positions of the buildings from the distance data alone.
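The cartographer's puzzle has a textbook solution, classical multidimensional scaling, which turns a complete table of pairwise distances back into coordinates. The Python sketch below (using NumPy) is a toy illustration under a big assumption: the distances are exact. Real 3C data give contact frequencies, not distances, and converting one into the other is itself a modelling step that is left out here. The building coordinates are invented.

```python
import numpy as np

def classical_mds(dist: np.ndarray, dims: int = 2) -> np.ndarray:
    """Recover coordinates (up to rotation, reflection and translation)
    from a matrix of pairwise Euclidean distances."""
    n = dist.shape[0]
    # Double-centre the squared distances to obtain the Gram matrix B = X X^T
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring
    # The top eigenvectors of B, scaled by sqrt(eigenvalue), give the coordinates
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale

# Hypothetical "buildings" on a city map (coordinates in km)
buildings = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.5, 2.0]])
# The cartographer's spreadsheet: all pairwise distances
dist = np.linalg.norm(buildings[:, None, :] - buildings[None, :, :], axis=-1)

recovered = classical_mds(dist)
recovered_dist = np.linalg.norm(recovered[:, None, :] - recovered[None, :, :], axis=-1)
print(np.allclose(dist, recovered_dist))  # does the reconstruction preserve every distance?
```

The recovered map may come out rotated or mirrored relative to the original: distances alone cannot tell you which way is north.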

How is this done in Chromosome Conformation Capture? There are three key steps: ‘freezing’ the DNA in place by formaldehyde cross-linking, joining the cross-linked DNA fragments together into single DNA strands, and then using DNA chemistry on those strands to figure out who was sitting next to whom.

To understand the latest incarnation of 3C, we need to go over its history. You can sum up that history as ‘one-to-one’, ‘one-to-many’, ‘many-to-many’ and finally ‘all-to-all’. One-to-one 3C was the first type to be described and is the starting point for all the others. For the rest of this article, I will use the term 3C to refer to this particular version.

Latest 3C incarnation

The first step is to cross-link the chromatin with formaldehyde, which links DNA to the proteins bound to it and those proteins to one another. The assumption is that the closer two loci are, and the more often they are close together, the more likely they are to be cross-linked. Cross-linking can occur between loci on the same chromosome or between loci on different chromosomes.

It is important to get the concentration of formaldehyde right but, as with so many techniques, there is no easy way of telling what the ideal concentration is or how long fixation should last. The usual range is one to four percent for five to ten minutes, but no systematic study of these parameters has been done.

The next step is to cut the cross-linked DNA into fragments using a restriction endonuclease. The choice of enzyme depends on the question being asked. For example, if you are particularly interested in short-range interactions, it makes sense to use a four-base cutter rather than a six-base cutter: a four-base recognition site occurs far more often, so it yields smaller fragments and finer resolution. Whichever enzyme you choose, you will be left with a mix of DNA fragments, some pairs of which are held together by formaldehyde cross-links through the proteins bound to them. The next task, then, is to join the ends of these pairs together with a ligase. Now you can do PCR amplification on these chimeric DNA strands, using primers directed to your loci of interest; with standard quantitative PCR, you get an indication of the relative frequency with which your pair of loci interact in the population of cells.
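Why does a four-base cutter give finer resolution? In random sequence with equal base frequencies, a given four-base site turns up on average once every 4^4 = 256 bp, a six-base site only once every 4^6 = 4,096 bp. The sketch below checks this on simulated sequence, using the recognition sites of DpnII (GATC) and HindIII (AAGCTT) as examples. The sequence is random and hypothetical; real genomes, with their uneven base composition, will deviate from these averages.

```python
import random

def cut_positions(seq: str, site: str) -> list[int]:
    """Positions of every occurrence of a recognition site in seq."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(seq: str, site: str) -> list[int]:
    """Fragment lengths after cutting at every site (the exact cut position
    within the site is ignored, which barely changes the lengths)."""
    bounds = [0] + cut_positions(seq, site) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]

random.seed(1)
# Hypothetical 1 Mb of random sequence with equal base frequencies
genome = "".join(random.choices("ACGT", k=1_000_000))

mean_length = {}
for name, site in [("DpnII", "GATC"), ("HindIII", "AAGCTT")]:
    lengths = fragment_lengths(genome, site)
    mean_length[name] = sum(lengths) / len(lengths)
    print(f"{name} ({site}): {len(lengths)} fragments, mean {mean_length[name]:.0f} bp, "
          f"expected about {4 ** len(site)} bp")
```

The same stretch of DNA is cut into roughly sixteen times as many pieces by the four-base cutter, which is what makes short-range contacts resolvable.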

From 3C to 4C

Plain vanilla 3C gives you a relative measure of the interaction of one pair of loci (or maybe two, if you multiplex), so it is very much a hypothesis-driven approach. Unsurprisingly, it did not take long to adapt 3C one more step in the genome-wide direction. The next level, ‘one-to-many’, was realised with 4C. The term ‘4C’ refers generally to ‘one-to-many’ capture rather than to any single, defined protocol. The most common form of 4C uses hybridisation to microarrays. You start with the same steps as 3C, but when you have finished the ligation step, you add a second restriction digestion, followed by a second ligation. This produces small circular molecules, each containing your fragment of interest (the ‘bait’) joined to whatever it was cross-linked to. Inverse PCR, with primers on the bait pointing outwards, then amplifies the unknown partners, and the product is hybridised to a chip.
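Inverse PCR is easiest to picture as reading around a ring. In this toy Python sketch, all sequences are invented and far shorter than real ones: the circle contains a bait fragment plus one captured partner, and primers sitting on the bait and pointing outwards amplify everything else on the ring, which is the unknown partner. Because a circle has no fixed start, the hypothetical function first rotates it so that the bait comes first.

```python
def inverse_pcr_product(circle: str, bait: str) -> str:
    """Return the stretch of a circular molecule lying outside the bait:
    the part that outward-facing primers on the bait would amplify."""
    if len(bait) > len(circle):
        raise ValueError("bait is longer than the circle")
    # Rotate the circle so the bait starts at position 0
    doubled = circle + circle
    start = doubled.find(bait)
    if start == -1:
        raise ValueError("bait not found on this circle")
    rotated = doubled[start:start + len(circle)]
    # Everything after the bait, reading around the circle, is the captured partner
    return rotated[len(bait):]

bait = "GGATCCTTAGC"         # hypothetical bait (viewpoint) fragment
captured = "TTACGGACTAAC"    # hypothetical unknown interacting fragment
# A circle's sequence can be written from any starting point
circle = captured[5:] + bait + captured[:5]
print(inverse_pcr_product(circle, bait))  # -> TTACGGACTAAC
```

In a real experiment the amplified product also carries the primer-binding ends of the bait; the sketch keeps only the captured partner to make the principle clear.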

One-to-many opens up a lot of possibilities, but we are still not at the genomic scale. The next generation, 5C (be patient, there is no 6C!), takes us up to the ‘many-to-many’ level. Once again, start with 3C, but now you change the PCR amplification step somewhat. Instead of a pair of primers designed for the two loci of interest, you add a mix of primers, each directed to a single restriction site in the region of the genome in which you are interested. Wherever two restriction fragments were cross-linked and ligated, a pair of primers will anneal side by side across the ligation junction. You ligate these adjacent primers and then amplify the ligated products using primers that bind universal sequences carried on the 5C primers, ready for sequencing. “Hold on”, you may be saying, “that is a lot of primers.” Indeed, and therein lies the limitation of the method: you can only cover as much of the genome as you can design specific primers for. That is why 5C can only ever realistically be ‘many-to-many’.
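To see why primers set the limit, it helps to count. The sketch below assumes an alternating design, in which neighbouring restriction fragments receive alternately a forward or a reverse primer and only forward-reverse pairs can be ligated; the design, the fragment count and the function name are illustrative assumptions rather than a description of any particular 5C protocol.

```python
def five_c_design(n_fragments: int) -> dict[str, int]:
    """Count primers and detectable pairs for a hypothetical alternating
    5C design: fragments alternately get a forward or a reverse primer."""
    forward = (n_fragments + 1) // 2
    reverse = n_fragments // 2
    return {
        "primers": forward + reverse,
        # Only a forward primer and a reverse primer can be ligated together
        "detectable_pairs": forward * reverse,
        # All pairs of distinct fragments in the region, for comparison
        "all_pairs": n_fragments * (n_fragments - 1) // 2,
    }

# Hypothetical 1 Mb region cut by a six-base cutter roughly every 4 kb
print(five_c_design(250))
```

The number of primers grows in step with the size of the region. Covering a roughly 3 Gb genome cut every 4 kb or so would, on the same assumptions, call for more than 700,000 specific primers.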

But it is Hi-C that brings us to the last and greatest level of capture, that of ‘all-to-all’. The strategy behind Hi-C does not involve any radically different method or approach. Hi-C’s power comes from two additions: enriching the chimeric DNA and deploying deep sequencing. The first is accomplished by slightly changing the ligation step. You take the digestion products of 3C but make sure you use enzymes that leave a 5′ overhang. You then fill in the overhangs with dNTPs, one of which is biotinylated, and ligate the resulting blunt ends. The chimeric DNA strands can then be enriched by pulling them down with magnetic streptavidin beads, and the resulting mix is finally analysed by massively parallel deep sequencing.
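Filling in and blunt-ligating two ends leaves a tell-tale sequence at the junction, in which the overhang appears twice. For HindIII (A^AGCTT), the junction reads AAGCTAGCTT, which creates a new NheI site (GCTAGC) and offers a convenient way to recognise genuine Hi-C ligation products. The sketch below derives the junction from the recognition site and the cut position; it assumes a palindromic site with a 5′ overhang, and the function name is hypothetical.

```python
def filled_in_junction(site: str, cut: int) -> str:
    """Sequence across a Hi-C ligation junction for an enzyme that cuts a
    palindromic site after position `cut` on the top strand (5' overhang)."""
    overhang_end = len(site) - cut  # bottom-strand cut, in top-strand coordinates
    if overhang_end <= cut:
        raise ValueError("this sketch only handles 5' overhangs")
    # Left fragment after fill-in: bases before the cut plus the filled overhang
    left_end = site[:overhang_end]
    # Right fragment after fill-in starts with the (now double-stranded) overhang
    right_start = site[cut:]
    return left_end + right_start

print(filled_in_junction("AAGCTT", 1))  # HindIII A^AGCTT -> AAGCTAGCTT
print(filled_in_junction("GATC", 0))    # DpnII ^GATC -> GATCGATC
```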

Job Dekker's group at the University of Massachusetts Medical School pioneered the mapping of spatial genome organisation with Chromosome Conformation Capture methods, such as 3C, 5C and Hi-C. Photo: CSHL

Hi-C has been around for some eight years now, following Job Dekker’s original Science paper (326, 289-93), and has given a host of new insights into the 3D organisation of the chromosomes in both health and disease. Is it easy to use? No, but then no 3C method is easy. They are complex sequences of steps, each with plenty of potential to go wrong. The concentration of DNA in the ligation step, for example, has to be just right: too low and not enough ligation happens; too high and fragments that were never cross-linked start ligating to each other at random. Dekker has published a comprehensive practical guide to Hi-C (Methods 58, 268-76), which walks you through all the potential pitfalls but at the same time shows just how many of those pitfalls there are. Even when you get to the end of all the wet work, you still have some pretty complex analysis to do. Thankfully, that is getting slightly easier as new programmes come online, with the inevitable amusing acronyms (HiCUP, HIPPIE, GOTHiC, HiFive, for example). Nor is Hi-C the best answer to all 3C questions. Being genome-wide, it would be overkill if you are only interested in a handful of loci: the sequencing reads are spread across the whole genome, so the signal at your loci of interest could well be swamped.
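At its heart, much of that analysis turns mapped read pairs into a contact matrix: the genome is divided into bins, and each read pair adds a count to the cell for the two bins it links. The sketch below assumes that read pairs have already been aligned and filtered (much of what dedicated tools do) and that all pairs come from a single chromosome; the read positions are simulated and the bin size is arbitrary. It then applies simple matrix balancing, the idea behind the widely used iterative correction (ICE) approach, which rescales bins until all have equal total coverage, on the assumption that uneven coverage reflects bias rather than biology.

```python
import numpy as np

def contact_matrix(pairs, chrom_length: int, bin_size: int) -> np.ndarray:
    """Bin (position1, position2) read pairs from one chromosome into a
    symmetric contact-count matrix."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins))
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # keep the matrix symmetric
    return matrix

def balance(matrix: np.ndarray, max_iter: int = 1000, tol: float = 1e-6) -> np.ndarray:
    """Iteratively rescale rows and columns so every bin ends up with the same
    total coverage (simple matrix balancing; assumes no empty bins)."""
    corrected = matrix.astype(float).copy()
    for _ in range(max_iter):
        coverage = corrected.sum(axis=1)
        factor = coverage / coverage.mean()
        corrected /= np.outer(factor, factor)
        if np.abs(factor - 1).max() < tol:
            break
    return corrected

rng = np.random.default_rng(0)
# Hypothetical read pairs on a 1 Mb chromosome, mostly short-range contacts
pos1 = rng.integers(0, 1_000_000, size=10_000)
pos2 = np.clip(pos1 + rng.integers(-200_000, 200_000, size=10_000), 0, 999_999)
raw = contact_matrix(zip(pos1, pos2), chrom_length=1_000_000, bin_size=100_000)
balanced = balance(raw)
print(raw.sum(axis=1).round())        # uneven coverage per bin
print(balanced.sum(axis=1).round(3))  # equal coverage after balancing
```

Real pipelines also handle many chromosomes, discard invalid pairs and mask bins with too few reads before balancing.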

Technical limitations

There are also technical limitations, many of them of the worst kind: they don’t bring your experiment to a grinding halt but instead hide their damage in the form of some undetected bias. For example, restriction enzyme sites are scattered unevenly through the genome, introducing bias at this crucial step. Moreover, although Hi-C scales 3C up to the genome level, the number of possible pairwise interactions grows with the square of the number of restriction fragments, and hence of genome size, so the sequencing depth needed can become a problem with very large genomes.
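A quick count shows how fast the numbers climb. The sketch below computes the number of possible fragment pairs, n(n − 1)/2, for genomes of different sizes, assuming a hypothetical mean fragment length of 4,096 bp (the average for a six-base cutter in random sequence).

```python
def possible_pairs(genome_length: int, mean_fragment: int) -> int:
    """Number of distinct fragment pairs, n(n - 1)/2, for a genome cut into
    fragments of a given (hypothetical) mean length."""
    n = genome_length // mean_fragment
    return n * (n - 1) // 2

for genome_mb in (100, 1_000, 10_000):
    pairs = possible_pairs(genome_mb * 1_000_000, 4_096)
    print(f"{genome_mb:>6} Mb genome: {pairs:.2e} possible fragment pairs")
```

Each tenfold increase in genome size brings roughly a hundredfold increase in possible pairs, and every pair needs enough reads to be measured reliably.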

For each of these technical limitations, there is some Hi-C variant designed to get around the problem. But choosing the strategy that will work for you without introducing new biases is not for the faint-hearted. After all, Chromosome Conformation Capture has always been a big-project, high-commitment technique.

Last Changed: 28.11.2017
