Criticism, Video Games, and the Quest for More Data

Being graded is generally not a fun experience. If as a student you’ve ever done an art project or written an English paper, then you’ve probably had the experience of having one of your creative efforts dissected, examined, and then branded with an arbitrary alphanumeric grade. If your experience was like mine, a wholly subjective expression of your thoughts and emotions was relabeled with a “check” or “satisfactory”, or some other, similarly meaningless accolade—and I’ll bet it happened with all the empathy and ceremony of a life insurance actuary placing a dollar value on a human life.

Remarkably, despite most of us having shared this soul-deadening experience in our youth, as adults we don’t hesitate to do it to other people all the time. Every day we find ourselves criticizing someone else’s creative output, be it movies, music, theater, or restaurants. And don’t even get me started if you’re a teacher… all you people do all day is grade other people’s ideas. It’s sick.

Unfortunately, on some level we need all this criticism. Imagine you’re trying to impress a hot girl on a first date… it’s a big world out there, and without restaurant and movie reviews, dinner and a movie could end up being Olive Garden and John Carter. That’s why I try to do my part in the video game world by mixing a few spirited reviews in with the rest of my inane ramblings. However, at all times I try to remain conscious of the dangers of attaching a numerical quantity to an inherently complex and subjective experience.

Originally, I thought the easiest way to mitigate grading errors would be to write my reviews using a symbolic representation of my ideas, a.k.a. “words”. Then, even if I got my review grade “wrong”, someone could read these “words” to see what I really thought about the game. But then I realized: who the hell wants to have to read things just to get information about stuff? No one, that’s who.

DANG scores from my review of Saints Row: The Third

So then I thought, maybe we just need more numbers. That’s how I came up with the DANG rubric I typically use in my reviews, wherein I rate each game’s Design, Artistic, Narrative, and Gameplay elements separately. Unfortunately, instead of making my review scores more reliable, this practice probably just makes them approximately four times more arbitrary. On the other hand, it does give my reviews the authoritative voice and pseudo-scientific air of your typical USA Today infographic.

However, since I’m (1) an applied mathematician, (2) not typically one to give up, and (3) generally incapable of outside-the-box thinking, I decided that the answer must be even more numbers. That’s why during my recent slog through Square Enix’s Final Fantasy XIII-2, I decided to record a separate rating for every hour of gameplay.

Figure 1: My rating versus hours of gameplay for Final Fantasy XIII-2.

Figure 1 is a chart of my ratings for the first (and probably last) 40 hours I played the game. Each bar represents my impression of FFXIII-2 for the corresponding hour that I played it, on a scale of one to five stars.

While to the untrained eye Figure 1 may look like an uninformative queue of green french fries, to me it’s a dramatic recap of my FFXIII-2 gaming experience, told through the sultry interpretive dance of the bar graph. I recall the game getting off to an unimpressive start, picking up a bit here and there over the first 10 hours. Just as I was beginning to feel optimistic, I entered the 2-star valley from 9:00-12:00, a desperate little trench dug out of inane side quests and my first trip to “Academia”, a location in the game with pitifully weak enemies but a painfully high encounter rate. (The irony of how much I hated this “Academia” was not lost on me.)

The game picks up a bit again in the third quarter, as by that point you’ve earned a bit of freedom, and your characters have finally leveled up enough to withstand a stiff breeze without giving up the ghost. After this, we slowly lumber to the game’s finale, which was tolerable enough to earn one last 4-star rating.  (The final 2-star entry corresponds to the extra hour wherein I attempted a few post-ending side quests, which succeeded in quickly cementing my decision to never play this game again.)

However, let us return to my original quandary re grading: does Figure 1 give a more reliable indication of my opinion of Final Fantasy XIII-2’s quality? For example, is it a better representation than a single score? I’ll admit it’s a bit more cumbersome than a few simple stars, but I kind of like the idea of a review having a time series of ratings… a “score vector”, if you will.

If you like, I suppose you can always distill the data into a single number; for instance, the arithmetic mean of the points is around 3.07, and I suppose 3 stars is probably what I would give this game in a review. Incidentally, if you take into account that the standard deviation is about 0.67, it seems pretty unlikely that I’m too confused about my opinion (e.g., that deep down I really think Final Fantasy XIII-2 is a 1-star or a 5-star game).
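For anyone who wants to try the same distillation at home, here is a minimal Python sketch. The ratings in it are made up for illustration (they are not the data behind Figure 1), the function name `summarize` is just a label I picked, and I'm assuming the population standard deviation, since the text above doesn't say which flavor was used.

```python
from statistics import mean, pstdev

# Hypothetical hourly ratings (1 to 5 stars).
# These are illustrative only, NOT the actual data plotted in Figure 1.
hourly_ratings = [2, 3, 3, 4, 3, 2, 2, 3, 4, 4]


def summarize(ratings):
    """Collapse a score vector into (mean, population standard deviation)."""
    return mean(ratings), pstdev(ratings)


avg, spread = summarize(hourly_ratings)
print(f"mean = {avg:.2f} stars, std dev = {spread:.2f}")
```

If you would rather treat the hours played as a sample of some longer hypothetical playthrough, swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead.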

Figure 2: A histogram of the scores, along with a normal (Gaussian) distribution with the same mean and standard deviation. (For you stats folks out there, I’m aware of the issues of using a continuous distribution to represent data over a discrete sample space, but this is a video game blog, so let’s not get too uptight about it.)

On the other hand, if you can resist the temptation to just take an average and be done with it, I believe there are a lot more interesting things you can do with this data.[1] For example, if you collected it across multiple people, you could get an idea of how much their opinions varied at different times in the game. (With its “controversial” endings, I wonder what the distribution for the last few hours of Mass Effect 3 would look like.) Or, you could develop some statistical models (such as a hidden Markov model) for how the quality of video games evolves, and then try to use them to predict how long you could play a given game before you started to hate it. (Maybe we’ll also do some principal component analysis for the hell of it, because I really start to lose my shit if I go more than 24 hours without computing the eigenvalue of something.)
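The multi-player idea is easy enough to sketch. The snippet below is a rough illustration, not a real analysis: the players, their ratings, and the helper name `per_hour_spread` are all invented, and it assumes everyone rated the same number of hours.

```python
from statistics import mean, pstdev

# Hypothetical data: one score vector per player, one rating per hour played.
# Nothing here was actually collected; it only shows the shape of the idea.
ratings_by_player = {
    "player_a": [3, 4, 2, 5],
    "player_b": [3, 3, 2, 1],
    "player_c": [4, 4, 3, 3],
}


def per_hour_spread(score_vectors):
    """For each hour, return (mean, population std dev) across all players."""
    # zip(*...) regroups the ratings so each tuple holds one hour's scores.
    return [(mean(hour), pstdev(hour)) for hour in zip(*score_vectors)]


hourly_stats = per_hour_spread(ratings_by_player.values())
for hour, (avg, spread) in enumerate(hourly_stats, start=1):
    print(f"hour {hour}: mean = {avg:.2f}, std dev = {spread:.2f}")
```

In this toy data the final hour has the widest spread, which is roughly what I'd expect a divisive ending to look like.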

In the meantime, I think I’m going to keep recording these hourly ratings and other kinds of data as I play, and continue thinking about fun and unnecessarily convoluted ways to use them. If you have any interesting ideas, I’d love to hear them.

[1] I’m aware that some people believe one should write “these data” instead of “this data”. I’m also aware that some people are pretentious assholes. (Although I’m careful never to say “your dildo”.)
