Many complex systems with relational data can be naturally represented as dynamic processes on graphs, in which nodes and edges are added or deleted over time. For such graphs, network embedding provides an important class of tools that leverage node proximity to learn low-dimensional representations, which can then be fed into off-the-shelf machine learning models. However, for dynamic graphs, most, if not all, embedding approaches rely on various hyper-parameters to extract spatial and temporal context information, and suitable values for these hyper-parameters differ from task to task and from dataset to dataset. Moreover, many regulated industries (e.g., finance and health care) require learning models to be interpretable and their outputs to comply with regulations. Therefore, a natural research question is how to jointly model spatial and temporal context information and learn a unique network representation, while providing interpretable inference over the observed data. To address this question, we propose a generic neural graph attention mechanism named STANE, which guides the context sampling process to focus on the crucial parts of the data. Furthermore, to interpret the network embedding results, STANE enables end users to investigate the graph context distributions along three dimensions (i.e., nodes, training window length, and time). We perform extensive experiments, including quantitative evaluations and case studies, which demonstrate the effectiveness and interpretability of STANE.