I’ve always been fond of numbers. My father, who was a math teacher, fostered this appreciation. He’d give my sisters and me sums to do as we drove along in the car, to see who could answer the quickest and who could tackle the hardest ones. My love of numbers continued through my studies, where I took courses in Maths and Accounting. In the end I pursued my love of design as a career. And now, my affinity for numbers and design has come together in a fascinating project that explores if, and how, users’ perceptions of ratings change based on whether they are depicted in numbers or stars.
The difference between numbers and ratings
I began the project by researching the best rating system for users. I considered single buttons (akin to Facebook’s ‘Like’ button); a two-button thumbs up or thumbs down option; and a star-rating system that uses four, five, seven or nine stars to define numerical ratings out of 10 or 100.
We then ran multiple user testing sessions to understand how users respond to numbers and how this might impact restaurants.
To me, numbers are precise. Three is always equal to three. Ten is always equal to ten. And so it would follow that a rating of three out of five is the same as a rating of six out of 10, or 60%, but our testing revealed that’s not the case. When it comes to ratings, numbers mean different things to different people. Different forms of ratings affect a user’s perception of how good or bad a rating is, and of where the neutral point of a rating lies.
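To make the arithmetic behind that equivalence concrete, here’s a minimal sketch (illustrative only, not OpenTable’s code) that normalizes a rating to a percentage of its scale:

```typescript
// Illustrative only: normalize a rating to a percentage of its scale,
// so 3/5, 6/10 and 60/100 all map to the same underlying value.
function toPercent(rating: number, scaleMax: number): number {
  return (rating / scaleMax) * 100;
}

console.log(toPercent(3, 5));    // 60
console.log(toPercent(6, 10));   // 60
console.log(toPercent(60, 100)); // 60
```

Mathematically the three ratings are identical; the rest of this post is about why users don’t read them that way.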
Understanding how our users rate restaurants
All the ratings given to restaurants on OpenTable are from users who have booked through our site and dined there, so the reviews and ratings are a fair reflection of the restaurant — there are no fake reviews. Each user, post-dining, is asked to rate and review the food, service, ambiance, and overall experience on a scale of one to five stars (one being poor, five being outstanding). We use the cumulative results to display the rating of the restaurant, in stars, on our website and mobile apps.
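As a rough illustration of that aggregation, here’s a simplified sketch. The field names and the plain averaging are assumptions for the sake of the example, not OpenTable’s actual implementation:

```typescript
// Hypothetical review shape: each dimension is scored 1-5 stars by a diner.
interface Review {
  food: number;
  service: number;
  ambiance: number;
  overall: number;
}

// Illustrative aggregation: average the overall scores across all reviews
// to get the star rating shown for the restaurant.
function displayedRating(reviews: Review[]): number {
  const sum = reviews.reduce((total, r) => total + r.overall, 0);
  return sum / reviews.length;
}

// e.g. two reviews with overall scores of 5 and 4 produce a displayed rating of 4.5
```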
About 85% of restaurants have a rating between four and five stars, making it easy to find a good-quality restaurant. The ratings for top-rated restaurants are all very similar. We don’t round the ratings up or down as you might find on other sites, because we want to be true to the rating the restaurant has received.
Defining hypotheses to test against
As with many problems, there are far more things to consider than you might initially think. We listed out a set of hypotheses around the potential impact of changing how ratings are displayed, which in turn generated questions:
By rounding the ratings, we’ll introduce clearer differences and allow users to make better decisions about where to dine. Does seeing only part of a star filled in aid decision making? Should we round stars to the nearest half-star or quarter-star for easier distinction? (A rounding sketch follows this list.)
By making it easier for users to see a rating, we’ll help them make better judgements on where they want to dine. If we double the ratings, will finer granularity lead to better recommendations and happier users?
By using a 10-point scale, we’ll have enough variation to eliminate the need for numbers with decimal places. Will removing decimal numbers help users scan the ratings more quickly?
Using a numerical value will increase the amount of time it takes users to evaluate whether a restaurant is a good fit for them. Will users be able to make finer distinctions between ratings? Will the additional expressiveness just produce clutter on the page, or will it look cleaner and easier to scan?
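Picking up the rounding question from the first hypothesis above, here’s a small, hypothetical helper (not production code) showing how a raw rating could be rounded to the nearest half- or quarter-star:

```typescript
// Hypothetical helper: round a raw rating to the nearest step.
// A step of 0.5 gives half-stars; 0.25 gives quarter-stars.
function roundToStep(rating: number, step: number): number {
  return Math.round(rating / step) * step;
}

console.log(roundToStep(4.37, 0.5));  // 4.5
console.log(roundToStep(4.37, 0.25)); // 4.25
```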
Understanding the results
We are testing a number of variants on OpenTable.com to see how we can improve the experience. In one test we’ve changed stars to a numerical value out of five. In another we’ve changed stars to a numerical value out of 10.
These tests seemed very standard — until we received multiple comments similar to this one:
“I would consider booking a three-star or 3.0/5 restaurant, but not 6.0/10, even though I know they are the same.”
We probed a little further to understand what happens to users’ perceptions when we double ratings. Some users consider three stars or 3.0 as a mid-point, but when asked for the equivalent on a 10-point scale, they often said 5.0, which isn’t a simple doubling of three. Some users said they might consider a restaurant with three stars but, on a 10-point scale, would only consider restaurants rated seven or higher.
Our tests showed that when people make decisions, they base them on the relative ratings they see. If, for instance, you were shown 10 restaurants with ratings varying from one to 10, you might decide that five is the minimum you’d consider booking. If, however, you were shown the same 10 restaurants but each had a rating of five or above, your minimum rating would change.
When it came to testing how quickly users were able to browse a list and make a decision, nearly all users preferred stars over numbers. Seeing the number of stars filled in removes any confusion about the scale.
Additional considerations
Does it matter that people’s judgements might change between a five- and a 10-point scale? It’s true that ratings are important, but they’re not the only thing diners consider when booking a restaurant. They also read reviews and look at the menu and photos before making a decision.
What’s next?
We’ll keep testing to understand the effect of stars, as well as five- and 10-point rating systems, to find an ideal solution for our users.
Edited by Jennifer Bader