System failures are expected to be frequent in the exascale era, as they already are in current petascale systems. The health of such systems is usually determined through challenging analysis of large amounts of unstructured and redundant log data. In this paper, we leverage log data and propose Clairvoyant, a novel self-supervised (i.e., no labels needed) model that predicts node failures in HPC systems using a recent deep learning approach, the transformer-decoder with its self-attention mechanism. Clairvoyant predicts node failures by (i) predicting a sequence of log events and then (ii) identifying whether a failure is part of that sequence. We carefully evaluate Clairvoyant against Desh, another state-of-the-art failure prediction approach, on two real-world system log datasets. Experiments show that Clairvoyant is significantly better: it predicts node failures with average BLEU, ROUGE, and MCC scores of 0.90, 0.78, and 0.65, respectively, while Desh scores only 0.58, 0.58, and 0.25. More importantly, this improvement comes with shorter training and prediction times: Clairvoyant is about 25X faster than Desh in training and about 15X faster in prediction.
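The two-step prediction scheme, (i) predict a sequence of upcoming log events and (ii) check whether a failure appears in it, can be illustrated with a minimal sketch. This is not the Clairvoyant model: a simple bigram frequency table stands in for the transformer-decoder, and the event names, the `FAILURE_EVENTS` set, and all function names are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event identifiers; real logs would first be parsed into event templates.
FAILURE_EVENTS = {"node_down"}


def fit_bigram(sequences):
    """Count which event follows each event (a stand-in for the learned decoder)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_sequence(counts, context, horizon):
    """Step (i): greedily roll the model forward by up to `horizon` events."""
    predicted = []
    current = context[-1]
    for _ in range(horizon):
        if current not in counts:
            break  # no observed successor for this event, so stop early
        current = counts[current].most_common(1)[0][0]
        predicted.append(current)
    return predicted


def predicts_failure(predicted):
    """Step (ii): flag the node if any predicted event is a failure event."""
    return any(event in FAILURE_EVENTS for event in predicted)


# Toy, unlabeled training sequences (assumed data, not from the paper's datasets).
history = [
    ["boot", "mem_err", "mce", "node_down"],
    ["boot", "mem_err", "mce", "node_down"],
    ["boot", "heartbeat", "job_start", "heartbeat", "job_start"],
]
model = fit_bigram(history)

risky = predict_sequence(model, ["boot", "mem_err"], horizon=3)
healthy = predict_sequence(model, ["heartbeat"], horizon=3)
print(risky, predicts_failure(risky))
print(healthy, predicts_failure(healthy))
```

As in the self-supervised setting described above, the sketch trains only on raw event sequences; the failure check is applied to the predicted continuation rather than learned from labels.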