Many-core accelerators, e.g., GPUs, are widely used to accelerate general-purpose compute kernels. With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many regular applications. To support more applications with irregular memory access patterns, a cache hierarchy has been introduced into the GPU architecture to capture input data sharing and mitigate the effect of irregular accesses. However, GPU caches suffer from poor efficiency due to severe contention, which makes it difficult to adopt heuristic management policies and also limits system performance and energy efficiency.

We propose an adaptive cache management policy designed specifically for many-core accelerators. The tag array of the L2 cache is enhanced with extra bits to track memory access history; the locality information thus captured is provided to the L1 cache as a heuristic to guide its run-time bypass and insertion decisions. By preventing un-reused data from polluting the cache and alleviating contention, cache efficiency is significantly improved. As a result, system performance improves by 31% on average for cache-sensitive benchmarks, compared to the baseline GPU architecture.
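The core idea of history-guided bypassing can be illustrated with a minimal software model. The sketch below is an assumption-laden simplification, not the paper's hardware design: it collapses the L1/L2 split into a single fully associative cache whose tag array keeps a per-line reuse counter, and it uses a hypothetical `reuse_hint` table in place of the extra tag-array bits. A line evicted without ever being reused is marked so that its next access bypasses insertion, which is the behavior the proposed policy exploits to keep un-reused data from polluting the cache.

```python
# Simplified, hypothetical model of locality-guided cache bypassing.
# Not the paper's exact mechanism: one fully associative LRU cache
# stands in for the L1/L2 hierarchy, and a per-tag "reuse hint"
# stands in for the extra history bits added to the L2 tag array.
from collections import OrderedDict

class BypassCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()   # tag -> reuse count, in LRU order
        self.reuse_hint = {}         # tag -> True (insert) / False (bypass)

    def access(self, tag):
        if tag in self.lines:
            # Hit: count the reuse and promote the line to MRU.
            self.lines[tag] += 1
            self.lines.move_to_end(tag)
            self.reuse_hint[tag] = True
            return "hit"
        if not self.reuse_hint.get(tag, True):
            # History says this line was never reused: skip insertion.
            return "bypass"
        if len(self.lines) >= self.num_lines:
            # Evict the LRU line; remember whether it was ever reused.
            victim, reuses = self.lines.popitem(last=False)
            self.reuse_hint[victim] = reuses > 0
        self.lines[tag] = 0
        return "miss"
```

For example, with a two-line cache, a line that is filled and then evicted without a hit will return `"bypass"` on its next access, while a line that saw a hit is re-inserted normally.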