Eventually-consistent key-value storage systems sacrifice the ACID semantics of conventional databases to achieve superior latency and availability. As a result, client applications, and hence end-users, can be exposed to stale data. The degree of staleness observed depends on tuning knobs set by application developers (customers of key-value stores) and system administrators (providers of key-value stores). Both parties must understand how these knobs affect the consistency observed by client applications, both to provide the best end-user experience and to maximize revenue for storage providers. Quantifying consistency in a meaningful way is a critical step toward understanding what clients actually observe and toward supporting consistency-aware service level agreements (SLAs) in next-generation storage systems. This paper proposes a novel consistency metric called Gamma that captures client-observed consistency. The metric provides quantitative answers to questions about observed consistency anomalies, such as how often they occur and how severe they are when they do occur. We argue that Gamma is more useful and accurate than existing metrics. We also apply Gamma to benchmark the popular Cassandra key-value store. Our experiments demonstrate that Gamma is sensitive to both the workload and client-level tuning knobs, and that it is preferable to existing techniques that focus on worst-case behavior.