Sorting items by user rating is a fundamental interaction pattern of the modern Web, used to rank products (Amazon), posts (Reddit), businesses (Yelp), videos (YouTube), and more. To implement this pattern, designers must take in a distribution of ratings for each item and define a sensible total ordering over them. This is a challenging problem, since each distribution is drawn from a distinct sample population, rendering the most straightforward method of sorting, comparing averages, unreliable when the samples are small or of different sizes. Several statistical orderings for binary ratings have been proposed in the literature (e.g., based on the Wilson score, or Laplace smoothing), each attempting to account for the uncertainty introduced by sampling. In this paper, we study this uncertainty through the lens of human perception, and ask “How do people sort by ratings?” In an online study, we collected 48,000 item-ranking pairs from 4,000 crowd workers along with 4,800 rationales, and analyzed the results to understand how users make decisions when comparing rated items. Our results shed light on the cognitive models users employ to choose between rating distributions, which sorts of comparisons are most contentious, and how the presentation of rating information affects users’ preferences.
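To make the sampling problem concrete, the following minimal sketch compares three orderings for binary (up/down) ratings: the raw average, a Laplace-smoothed average, and the lower bound of the Wilson score interval. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the pseudocount of 1 per class, the 95% confidence level (z = 1.96), and the two example items are all hypothetical.

```python
import math

def mean_score(up, down):
    """Raw fraction of positive ratings (0.0 when there are no ratings)."""
    n = up + down
    return up / n if n else 0.0

def laplace_score(up, down, alpha=1.0, beta=1.0):
    """Laplace-smoothed fraction; alpha and beta are assumed pseudocounts."""
    return (up + alpha) / (up + down + alpha + beta)

def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval (z = 1.96 is about 95%)."""
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

# Hypothetical items: A has 3 positive ratings and no negatives,
# B has 90 positive ratings and 10 negatives.
items = {"A": (3, 0), "B": (90, 10)}

for name, scorer in [("mean", mean_score),
                     ("laplace", laplace_score),
                     ("wilson", wilson_lower_bound)]:
    ranking = sorted(items, key=lambda k: scorer(*items[k]), reverse=True)
    print(name, ranking)
```

Under these assumptions the raw average places A (3 of 3) above B (90 of 100), while both the Laplace-smoothed score and the Wilson lower bound place B first, because they discount the small sample behind A.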