Protein structural diversity encompasses a finite set of architectural designs. Embedded in these topologies are evolutionary histories that we here uncover using cladistic principles and measurements of protein-fold usage and sharing. The reconstructed phylogenies are inherently rooted and depict histories of protein and proteome diversification. Proteome phylogenies showed two monophyletic sister-groups delimiting Bacteria and Archaea, and a topology rooted in Eucarya. This suggests three dramatic evolutionary events and a common ancestor with a eukaryotic-like, gene-rich, and relatively modern organization. Conversely, a general phylogeny of protein architectures showed that structural classes of globular proteins appeared early in evolution and in defined order, the α/β class being the first. Although most ancestral folds shared a common architecture of barrels or interleaved β-sheets and α-helices, many were clearly derived, such as polyhedral folds in the all-α class and β-sandwiches, β-propellers, and β-prisms in all-β proteins. We also describe transformation pathways of architectures that are prevalently used in nature. For example, β-barrels with increased curl and stagger were favored evolutionary outcomes in the all-β class. Interestingly, we found cases where structural change followed the α-to-β tendency uncovered in the tree of architectures. Lastly, we traced the total number of enzymatic functions associated with folds in the trees and show that there is a general link between structure and enzymatic function.
ASJC Scopus subject areas