Subgraph enumeration is an important problem in the field of Graph Analytics with numerous applications. The problem is provably NP-complete and requires sophisticated heuristics and highly efficient implementations to be feasible on problem sizes of realistic scales. Parallel solutions have shown a lot of promise on CPUs and distributed environments. Recently, GPU-based parallel solutions have also been proposed to take advantage of the massive execution resources in modern GPUs. Subgraph enumeration involves traversing a search tree for each vertex of the data graph to find matches of a query in a graph. Most GPU-based solutions traverse the tree in breadth-first manner that exploits parallelism at the cost of high memory requirement and presents a formidable challenge for processing large graphs with high-degree vertices since the memory capacity of GPUs is significantly lower than that of CPUs. In this work, we propose a novel GPU solution based on a hybrid BFS and DFS approach where the top level(s) of the search trees are traversed in a fully parallel, breadth-first manner while each subtree is traversed in a more space-efficient, depth-first manner. The depth-first traversal of subtrees requires less memory but presents more challenges for parallel execution. To overcome the less parallel nature of depth-first traversal, we exploit fine-grained parallelism in each step of the depth-first traversal of sub-trees. We further identify and implement various optimizations to efficiently utilize memory and compute resources of the GPUs. We evaluate our performance in comparison with the state-of-the-art GPU and CPU implementations. We outperform the GPU and CPU implementations with a geometric mean speedup of 9.47× (up to 92.01×) and 2.37× (up to 12.70×), respectively. We also show that the proposed approach can efficiently process the graphs that previously cannot be processed by the state-of-the-art GPU solutions due to their excessive memory requirement.