Aerobraking uses atmospheric drag, accumulated over multiple passes through a planet's atmosphere, to achieve beneficial orbit changes. With current technology, however, aerobraking is operationally intensive, requiring constant supervision by a ground team for 2 to 11 months. Autonomous aerobraking would reduce operational costs and improve mission performance by removing the dependence on ground personnel and the associated potential for human error. To improve aerobraking autonomy, a parallel, simulation-based deep Q-learning architecture is developed for aerobraking maneuver planning and decision-making. A directional exploration method is proposed that takes advantage of the partially observable environment, and a three-dimensional reward function expressed in terms of apoapsis radius, heat rate, and action provides a stable learning process. This deep reinforcement learning (DRL) approach represents a first step toward a fully autonomous, onboard aerobraking capability. Results in terms of learning capability and reward are presented to demonstrate the technology, using Mars Odyssey mission Endgame data as the baseline. The learned policy is also compared with a state-of-the-art heuristic for autonomous aerobraking. Results show that, with a 6 m/s increase in the ΔV budget relative to Mars Odyssey, the DRL approach shortened the aerobraking time by 68.3%. Moreover, the trained agent makes proactive and robust decisions under extremely aggressive environmental conditions in which a ground-station approach is infeasible: over 40 episodes, the DRL algorithm incurred no thermal violations, compared with an average of 2.8 thermal violations for the state-of-the-art heuristic.
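To make the shape of such a three-dimensional reward concrete, the sketch below shows one plausible form: a term tracking error from a target apoapsis radius, a penalty on heat-rate limit violations, and a small cost on the commanded maneuver. The functional form, target value, heat-rate limit, and weights (`R_APO_TARGET`, `Q_DOT_LIMIT`, `W_APO`, `W_HEAT`, `W_ACT`) are illustrative assumptions, not the formulation used in the paper.

```python
# Hypothetical sketch of a three-term aerobraking reward.
# All constants below are illustrative assumptions, not values from the paper.
R_APO_TARGET = 4906.0  # km, assumed target apoapsis radius
Q_DOT_LIMIT = 0.45     # W/cm^2, assumed heat-rate safety limit
W_APO, W_HEAT, W_ACT = 1.0, 10.0, 0.1  # assumed term weights


def reward(r_apo_km: float, q_dot: float, delta_v: float) -> float:
    """Reward combining apoapsis-radius tracking, a heat-rate violation
    penalty, and a cost on the size of the propulsive action (delta-v)."""
    # Penalize normalized distance from the target apoapsis radius.
    apo_term = -W_APO * abs(r_apo_km - R_APO_TARGET) / R_APO_TARGET
    # Penalize only the portion of heat rate above the safety limit.
    heat_term = -W_HEAT * max(0.0, q_dot - Q_DOT_LIMIT) / Q_DOT_LIMIT
    # Small cost on any propulsive action, discouraging unnecessary burns.
    act_term = -W_ACT * abs(delta_v)
    return apo_term + heat_term + act_term
```

In a Q-learning setup, a reward like this lets the agent trade a small ΔV expenditure against the risk of exceeding the heat-rate limit on the next drag pass, which is the trade-off the abstract describes.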