Measuring node proximity on large scale networks is a fundamental building block in many application domains, ranging from computer vision, e-commerce, social networks, software engineering, disaster management to biology and epidemiology. The state of the art (e.g., random walk based methods) typically assumes the input network is given a priori, with the known network topology and the associated edge weights. A few recent works aim to further infer the optimal edge weights based on the side information. This paper generalizes the challenge in multiple dimensions, aiming to learn optimal networks for node proximity measures. First (optimization scope), our proposed formulation explores a much larger parameter space, so that it is able to simultaneously infer the optimal network topology and the associated edge weights. This is important as a noisy or missing edge could greatly mislead the network node proximity measures. Second (optimization granularity), while all the existing works assume one common optimal network, be it given as the input or learned by the algorithms, exists for all queries, our method performs optimization at a much finer granularity, essentially being able to infer an optimal network that is specific to a given query. Third (optimization efficiency), we carefully design our algorithms with a linear complexity wrt the neighborhood size of the user preference set. We perform extensive empirical evaluations on a diverse set of 10+ real networks, which show that the proposed algorithms (1) consistently outperform the existing methods on all six commonly used metrics; (2) empirically scale sub-linearly to billion-scale networks and (3) respond in a fraction of a second.