Feng Zhang, a molecular biologist at the Massachusetts Institute of Technology, has always liked taking things apart and putting them back together. Since stumbling across the prokaryotic CRISPR-Cas gene-editing system in 2011, he’s found a seemingly endless source of biological fodder to tinker with.

Already, Zhang and his colleagues and competitors have generated a suite of CRISPR tools that can be used to delete genes, to swap DNA letters with precision, to insert specific sequences into the genome, to modify the epigenome and to modulate RNA, albeit with various limitations. Several of these are already in or approaching the clinic, in both in vivo and ex vivo applications. But this CRISPR toolkit may just be the tip of the iceberg. Reporting in Science this month, Zhang showed that IscB, TnpB and related protein families (with millions of members, across prokaryotes and eukaryotes) offer similar reprogrammable capabilities.

For Zhang, these findings are another reminder of the untapped potential of understudied and as-yet undiscovered reprogrammable systems. By mining this bounty of natural diversity, he hopes to be able to develop even more basic research tools, drug delivery vectors and therapeutic modalities.

Let’s start with IscB proteins. How did you find new gene editors hiding in plain sight?

We started out with Cas9, and have developed a number of different orthologs of this protein for genome editing. And this started us thinking that it would be nice to really look at the bigger diversity picture. So we looked at all of the bacterial sequences and pulled out everything that contained a Cas9 nuclease domain. A number of very small proteins came up: the IscB-like proteins.

One of our collaborators, Eugene Koonin, had found IscB proteins several years back, but we didn't know how they functioned. And we thought that it would be interesting to see if we could figure out how they work.

One of the things that gave us a clue is that we found a number of very small IscB proteins next to CRISPR arrays [the conserved sequences that are transcribed into precursor CRISPR RNA]. That led us to hypothesize that maybe there's a link between the systems. We know how to study CRISPR systems, and so we started to try to figure out whether there are RNAs that are expressed with these IscB proteins. And indeed, there is a long non-coding RNA that is expressed alongside these proteins. We’ve called these ωRNAs. And this helped us realize that these IscB and related genes are RNA-guided systems.

What do these add to the CRISPR toolbox?

One of the powerful things about CRISPR is that it is programmable: you can give it RNA and then it will find the exact DNA sequence you want. Of course the Cas nuclease is very useful, but you can also use this programmable approach to modulate chromatin, to edit DNA bases, to do prime editing.

But Cas9 evolved as a bacterial defence system. And as it was optimized for recognizing and cleaving DNA, it became very large. If you were only trying to take advantage of the reprogrammability, you may not need all the different bulky aspects of Cas9.

IscB proteins provide almost the bare minimum that you need for a programmable domain. [CRISPR-Cas9s are 800–1,200 amino acids long, whereas IscB proteins are ~400.]

So we can take this very minimal system and augment it in whatever way we want. The size definitely makes it much easier to deliver for various applications. But also, I think that these proteins offer a really nice template for engineering. They are more exposed, and you can attach things at different places to generate new functionalities.

Such as?

There are always more capabilities that we want to have. Existing tools don’t really allow us to do larger manipulations very well, to precisely delete large fragments of DNA, to insert large fragments of DNA, to rewrite whole sections of a chromosome, or to rearrange the conformation of a chromosome. Through engineering these systems, it may be possible to create these new capabilities.

How will this play into the future of therapeutic gene editing?

What we're really talking about is genetic medicine, enabling the repair of things that have gone awry in the cell through changes to the DNA or the transcriptome. To do that, we need two things: we need the molecules that can make the changes we want, and we need delivery systems to put these tools into the right cells efficiently and safely.

One of the really exciting directions we are pursuing is continuing to develop that modular toolbox. Right now we have some tools that can knock out genes and that can make small edits in the DNA, or turn genes on and off. But there are many other things we want to do more effectively. How do we multiplex more? A lot of diseases are caused by complex genetic interactions, so how do we modulate multiple genes to tune their activity? How do we edit multiple mutations? Those are challenges that require new tools.

In parallel, delivery capabilities need to be advanced significantly from where they are currently.

Do you think delivery gets overlooked in the excitement of gene editing?

I don't think it is overlooked. But it's a tough problem to crack. There are many different tissues and cell types. And some tissues are much larger than others. We have lots of muscle cells, for example, whereas the eye is much smaller. We need different delivery approaches depending on where and what cell types we're trying to target. There's a lot of work ahead.

On the biology side, we need systems that can target and access tissues that we can’t reach right now. When people do systemic delivery, the cargo typically goes to the liver. That presents a challenge, because if you are trying to target another tissue, you are going to saturate the liver first, causing liver toxicity. We need to understand how to develop systems that go to other tissues or that can be delivered locally. We also need to understand the biological mechanisms of how to avoid bioaccumulation in the liver.

And then there are the more technical aspects. We need to make sure that the vectors manufactured are high quality. There can be huge variations from batch to batch in terms of how many empty vectors there are, for example. If only 10% of your vectors are carrying your therapeutic payload, that's going to be problematic. This sort of thing needs to be addressed in the manufacturing.

Another recent Science paper of yours introduced the PEG10 delivery system. How does this advance the delivery field?

We have been very interested in delivery technologies, and one of our approaches has been to look at proteins that are made by the human body that have the ability to package and transfer nucleic acids to other cells. It turns out that our ancestors acquired genes from virus-like elements that can form virus-like structures to perform these functions.

We still don't fully understand how many of these work. So we studied one of these proteins called PEG10, using it to package mRNA so that we can use this endogenous human protein system to deliver mRNA to cells. I'm very excited about this direction. Using human proteins for delivery should reduce a lot of issues surrounding immunogenicity. This can potentially provide a safer and also repeatedly doseable delivery solution.

What’s the carrying capacity of this system?

The natural RNA that they package is 7–8 kb. I think it is going to be comparable to lentiviruses, maybe around 10 kb. [Adeno-associated viruses, the workhorse of most in vivo gene-therapy candidates, have a capacity of around 5 kb.]

The safety profile of exogenous AAVs remains a closely watched topic. But what about the risks that vectors such as PEG10 might trigger immune responses to endogenous proteins?

That is certainly one of the things that we need to study and characterize more. It is still very early days for this technology.

Do you see this work as another case study for repurposing programmable proteins?

Yes, there is a whole family of proteins here that is exciting. We focused on PEG10, but we also showed that a number of other proteins are secreted by cells and can form these virus-like structures. And these are made by different tissues in the body, so they might offer different advantages for different tissue types and diseases.

Now, we don't know yet where PEG10 goes naturally. Around 10 years ago, scientists showed that it is important for placental development. But for its function in the adult body, we need to study it more.

What's interesting is that there are lots of these really mysterious proteins that people haven't studied much, both from the human body and from the biodiversity on this planet. That's something that really fascinates us, and most of the work in my lab is focused on trying to understand these mysterious proteins and figure out what they do.

How do you approach this?

You start with a question. One of the questions that we frame our searches around is: what are programmable systems that nature has evolved and engineered over time?

There are various aspects to programmability. Something that is RNA-guided is programmable, because you can give it a new RNA and it will go somewhere else. Proteins that are modular, with domains that can be swapped around, are also programmable. Anything that is modular has programmability in it. So we come up with evolutionary hypotheses of what features might signal modularity and then we look for them in bacterial and eukaryotic sequences. Sometimes it is regions of a gene that are hypermutated, or that are co-evolving with pieces of RNA. Those are some examples of how we have started to take a more systematic approach.

That’s on the search side of things. How about on the function side?

The two aspects of this really go hand in hand, because we use the computational analysis to guide our hypotheses and to design experiments to figure out functions. And then the experimental data sometimes feed back into our search.

So for example, with IscB, experimental work showed that there was RNA that was co-expressed with these proteins. That allowed us to then take that RNA sequence, go back to the database and figure out where the RNAs are hypervariable and where they are conserved. And that helped us to figure out the guide sequence. It's an iterative process.

Beyond genome editing and delivery vectors, what other programmable functions do you hope to find?

One way that I think about biology is that biological systems are made of polymers. DNA is a polymer. RNA is a polymer. Proteins are polymers. And complex carbohydrates are polymers. There are certain repetitions in the way that all of these exist. Modular systems that can be easily tweaked to recognize variations in those repetitions are very interesting to us. And we try to find modular systems that can recognize and interact with polymers in critical, predictable ways.

Do you expect these to have the same therapeutic potential as CRISPR?

First and foremost, we are just fascinated to see what diversity may be out there. All of these systems will be interesting in their own right. And some of them, certainly, I think will have therapeutic potential.

Did you expect your work to go in this direction when you started studying CRISPR?

My interest has always been in trying to develop something that is useful. By developing better technologies that people can use to understand biology or treat diseases, we can enable more people and have more impact. But when it comes to the specific work on any technology, I am surprised by how much biology we're doing, to try to understand microbial organisms or enzymatic processes, for example.

That is part of the joy, because you have to let your exploration take you wherever it leads. And you have to understand the biology in order to engineer it. If you don't understand the biology, it's very hard to tweak it and tinker with it.