feeling cute, might delete later:
an analysis of r/rateme
Introduction
Almost universally, our vanity will occasionally get the best of us, and we will wonder: “How attractive am I, really?”
Having entered the 21st century, we no longer rely on archaic methods like asking a Magic Mirror. Today, instead of asking “who is the fairest of them all”, Queen Grimhilde might ask something more along the lines of:
And instead of a talking mirror, dozens of anonymous internet users offer their collective opinions. This is the premise of the r/rateme subreddit, which, since its inception in 2009, has accumulated close to 150,000 members.
Asking for feedback is difficult, especially on a topic so sensitive to so many people. We wanted to take a deep dive into this particular internet community and see what we could learn about who uses r/rateme, how fair it actually is, and what it can tell us about feedback-seeking interactions on the internet.
What is r/rateme?
We used the pushshift.io API to query for posts and comments made on r/rateme since January 1st, 2014. This resulted in close to 218,000 posts and over 1.7 million comments. Subreddit rules require every post title to include the poster's age and gender, so we used a regular expression to extract both from each title and used them to understand the demographic distribution of users. Posts with an extracted age under 18 or over 50 were excluded from our dataset, as were comments authored by the AutoModerator account. In addition, posts with fewer than five comments were excluded from all exploratory data analysis.
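The title parsing and filtering steps above can be sketched with Python's standard library. This is a minimal sketch, not the exact pipeline behind these numbers: the tag pattern assumes bracketed tags such as "[M20]", "[20F]" or "(f/19)", and the record layout (dicts with `id`/`title` for posts and `post_id`/`author` for comments) as well as the names `parse_tag` and `clean` are hypothetical.

```python
import re
from collections import Counter

# Assumed tag format: a bracketed gender/age pair in either order, e.g. "[M20]", "[20F]", "(f/19)".
TAG = re.compile(
    r"[\[(]\s*(?:(?P<g1>[MF])\s*[/,]?\s*(?P<a1>\d{2})|(?P<a2>\d{2})\s*[/,]?\s*(?P<g2>[MF]))\s*[\])]",
    re.IGNORECASE,
)


def parse_tag(title):
    """Return (age, gender) from the first recognizable tag in a title, or None."""
    m = TAG.search(title)
    if m is None:
        return None
    gender = (m.group("g1") or m.group("g2")).upper()
    age = int(m.group("a1") or m.group("a2"))
    return age, gender


def clean(posts, comments, min_age=18, max_age=50, min_comments=5):
    """Apply the age, AutoModerator and minimum-comment filters to hypothetical dict records."""
    # Drop AutoModerator comments before counting comments per post.
    kept_comments = [c for c in comments if c["author"] != "AutoModerator"]
    per_post = Counter(c["post_id"] for c in kept_comments)

    kept_posts = []
    for post in posts:
        tag = parse_tag(post["title"])
        if tag is None:  # assumption: titles without a recognizable tag are skipped
            continue
        age, gender = tag
        if not min_age <= age <= max_age:
            continue
        if per_post[post["id"]] < min_comments:  # only needed for the exploratory analysis
            continue
        kept_posts.append({**post, "age": age, "gender": gender})
    return kept_posts, kept_comments
```

Real titles use many more tag formats than this pattern covers, so in practice the regular expression would need more alternatives and a check of how many titles it fails to parse.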
The large majority of users are young people, ages 18 to 22. However, there is still a small community of posters, numbering in the hundreds, up to age 50. The age distribution falls off roughly exponentially, with frequency decreasing steadily with age from 20 onward.
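One way to check the "roughly exponential" description is to fit a straight line to the logarithm of the per-age post counts: if counts decay exponentially, log(count) is linear in age, and exp(slope) is the ratio between consecutive ages. A minimal least-squares sketch; the function name and the input shape (a mapping from age to number of posts) are assumptions.

```python
import math


def exponential_decay_rate(counts_by_age):
    """Fit log(count) = a + b * age by least squares and return exp(b), the per-year ratio."""
    ages = [age for age, n in counts_by_age.items() if n > 0]  # log(0) is undefined
    logs = [math.log(counts_by_age[age]) for age in ages]
    mean_x = sum(ages) / len(ages)
    mean_y = sum(logs) / len(logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    sxx = sum((x - mean_x) ** 2 for x in ages)
    return math.exp(sxy / sxx)
```

A result of, say, 0.8 would mean each additional year of age has about 80% as many posters as the year before; a poor fit (large residuals on the log scale) would argue against the exponential description.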
Rating and Age
The average rating was relatively consistent across all ages. Because the number of posters decreases with age, the averages for older ages are based on fewer posts and are therefore more variable. Average ratings do drop slightly past age 25.
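The point about variability can be made concrete with the standard error of each per-age mean (sample standard deviation divided by the square root of the number of posts), which grows as the number of posters shrinks. A minimal sketch, assuming hypothetical (age, rating) pairs as input:

```python
import math
import statistics
from collections import defaultdict


def rating_by_age(posts):
    """posts: iterable of (age, rating). Returns {age: (n, mean rating, standard error)}."""
    groups = defaultdict(list)
    for age, rating in posts:
        groups[age].append(rating)
    out = {}
    for age, ratings in sorted(groups.items()):
        # A single post gives no spread estimate, so its standard error is reported as infinite.
        se = statistics.stdev(ratings) / math.sqrt(len(ratings)) if len(ratings) > 1 else float("inf")
        out[age] = (len(ratings), statistics.mean(ratings), se)
    return out
```

Plotting each mean with an error bar of about two standard errors shows which age-to-age differences, such as the dip past 25, are larger than the noise.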
Rating and Gender
Approximately 73.3% of posters self-identified as male and 26.7% as female (according to the tags in their post titles).
The ratings on male posts are more spread out, with a greater percentage of posts at both the high and low ends.
Ratings on female posts, on the other hand, are centered at the higher end, with fewer posts at the extremes.
Ratings and Scores
A post's score is its net number of upvotes (upvotes minus downvotes); it can be interpreted as community support for, or approval of, the post. Rating is the average "attractiveness" rating we scraped from the comments on each post.
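The exact rule used to scrape ratings from comments is not described, so the following is only a sketch under an assumed convention: a comment's rating is the first "N/10" it contains (for example "7/10" or "8.5 / 10"), values outside 0 to 10 are discarded, and a post's rating is the mean over its rated comments. The pattern and the function names are hypothetical.

```python
import re
import statistics

# Assumed rating format: "7/10", "8.5 / 10" and similar.
RATING = re.compile(r"\b(\d{1,2}(?:\.\d+)?)\s*/\s*10\b")


def comment_rating(body):
    """First N/10 rating in a comment, or None if absent or outside 0-10 (e.g. "11/10")."""
    m = RATING.search(body)
    if m is None:
        return None
    value = float(m.group(1))
    return value if 0 <= value <= 10 else None


def post_rating(comment_bodies):
    """Average of the per-comment ratings on one post, or None if no comment gives a rating."""
    ratings = [r for r in map(comment_rating, comment_bodies) if r is not None]
    return statistics.mean(ratings) if ratings else None
```

Taking only the first match per comment avoids double counting comments like "7/10 now, 9/10 with a haircut", at the cost of ignoring the second opinion.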
In general, the highest-scoring posts tend to have average ratings near the mean. Highly rated posts, on the other hand, don’t have high scores. A likely explanation is that popular posts attract more comments, and an average taken over many comments tends to land closer to the overall mean, so extreme average ratings mostly come from less popular posts with few comments.
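The averaging argument above can be illustrated with a small simulation under an assumed model: every comment rating is an independent draw from the same normal distribution (mean 6, standard deviation 1.5, numbers chosen only for illustration), so any difference between posts comes purely from how many comments are averaged.

```python
import random
import statistics


def spread_of_post_averages(comments_per_post, n_posts=2000, seed=0):
    """Simulate posts whose comment ratings are i.i.d. draws around a common mean and
    return the standard deviation of the per-post average rating."""
    rng = random.Random(seed)
    averages = [
        statistics.mean([rng.gauss(6.0, 1.5) for _ in range(comments_per_post)])
        for _ in range(n_posts)
    ]
    return statistics.stdev(averages)


few = spread_of_post_averages(5)      # close to 1.5 / sqrt(5), about 0.67
many = spread_of_post_averages(100)   # close to 1.5 / sqrt(100), about 0.15
```

The spread shrinks roughly as one over the square root of the number of comments, so under this model extreme averages are far more likely on posts with few comments, which are typically the low-scoring ones.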
Posts by males tend, on average, to score lower (receive fewer upvotes), while posts by females have higher scores and higher average ratings overall.
There is a group of relatively high-scoring posts with low ratings, which we found to be mostly posts of people showing off their progress as they work to get healthier. This supports our assumption that scores reflect support more than ratings do.
We also plotted the number of posts per day over the course of the subreddit’s life. Overall, the amount of content posted to r/rateme each day has remained relatively stable. There are a few sharp spikes, and one huge outlier on November 17th, 2014. We were unable to determine why this date was so significant.
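Counting posts per day only requires bucketing post timestamps by calendar date. A minimal sketch, assuming Unix epoch seconds as in the `created_utc` field of Reddit and pushshift records and bucketing by UTC date; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_day(created_utc):
    """Count posts per UTC calendar day from Unix timestamps."""
    return Counter(datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in created_utc)


def busiest_days(daily_counts, k=1):
    """The k days with the most posts, as (date, count) pairs."""
    return daily_counts.most_common(k)
```

Because days are bucketed in UTC, a spike could shift by a day relative to a calendar in a local US time zone; the choice matters when trying to match an outlier to an external event.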
Language of an r/rateme Post
Common Words and Phrases in Post Titles
Before examining feedback-seeking language, we looked at common words and phrases in post titles. We extracted the most common two-word phrases in titles and found the most popular to be:
Two-word Phrase: Count
“rate me”: 8430
“be honest”: 3735
“just curious”: 2809
“looking for”: 2688
“please rate”: 1390
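Counts like those above come from tallying adjacent word pairs across all titles. A minimal sketch, assuming lowercasing and a simple letters-and-apostrophes tokenizer; the function name is hypothetical, and it does not strip age/gender tags, which would otherwise contribute pairs like "m rate".

```python
import re
from collections import Counter


def title_bigrams(titles):
    """Count adjacent word pairs across titles (lowercased, punctuation stripped)."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


top = title_bigrams(["Rate me, be honest!", "Be honest: rate me"]).most_common(5)
```

Since punctuation is dropped before pairing, pairs can span a comma ("me be" above); splitting titles on punctuation first would avoid that.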
It is unsurprising that “rate me” is the most common two-word phrase, but “be honest” in second place is much more interesting. Scanning the full list, the word “honest”, while less common than “rate”, is still prevalent: phrases such as “honest opinions” and “rate honestly” appear 640 and 396 times, respectively. When we first browsed r/rateme for reference, “honest” stood out as a popular word, and this analysis confirms that impression.
Emojis
We also looked at emojis in post titles. To do this, we used the Python emoji library. Specifically, we extracted the most popular single-character and multi-character emojis used in titles.
Notably, certain emojis, such as 🤷‍♂️ (Man Shrugging), display as a single emoji but are actually a combination of multiple characters, in this case 🤷 (Shrug) + ♂ (Male Sign), joined by an invisible zero-width joiner. Apart from combination emojis, the most popular use of multiple emojis was a repeated 😂 (also the most popular single emoji), rather than a combination of different emojis conveying something else (e.g. 🎁 🎂 🎈, though this specific example isn’t on theme for r/rateme).
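The point about combination emojis can be checked with the standard library alone. The sketch below is not the emoji library's API; it is a rough stand-in, assumed for illustration, that groups a symbol (Unicode category "So") with any following zero-width joiner, the character the joiner attaches, variation selector-16 and skin-tone modifiers. It is not a full Unicode segmentation and will, for example, split flag emojis into two pieces.

```python
import unicodedata
from collections import Counter

ZWJ = "\u200d"    # ZERO WIDTH JOINER: glues characters into one displayed emoji
VS16 = "\ufe0f"   # VARIATION SELECTOR-16: requests emoji-style presentation
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}  # skin-tone modifiers


def emoji_clusters(text):
    """Heuristically split text into emoji clusters."""
    clusters = []
    in_cluster = False  # was the previous code point part of a cluster?
    after_zwj = False   # was the previous code point a ZWJ attached to a cluster?
    for ch in text:
        if in_cluster and (ch == ZWJ or ch == VS16 or ch in SKIN_TONES):
            clusters[-1] += ch
            after_zwj = ch == ZWJ
        elif after_zwj:
            clusters[-1] += ch  # the code point after a ZWJ belongs to the same emoji
            after_zwj = False
        elif unicodedata.category(ch) == "So":
            clusters.append(ch)
            in_cluster = True
        else:
            in_cluster = False
    return clusters


def describe(cluster):
    """Unicode names of the code points that make up one cluster."""
    return [unicodedata.name(ch, "?") for ch in cluster]


man_shrugging = "\U0001F937\u200d\u2642\ufe0f"
parts = describe(man_shrugging)
counts = Counter(emoji_clusters("\U0001F602\U0001F602\U0001F602"))
```

Here `parts` lists SHRUG, ZERO WIDTH JOINER, MALE SIGN and VARIATION SELECTOR-16, i.e. four code points behind one visible emoji, and `counts` tallies three separate 😂 clusters, the "repeated emoji" pattern described above.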
Feedback Seeking Language
Post Titles and Ratings
We first looked at which phrases in post titles were most strongly correlated with high and low ratings. We constructed a linear classifier that takes as features phrases of 3 or 4 words that occur in five or more posts. After normalizing the ratings by subtracting the mean and dividing by the standard deviation, we defined a positive rating as a normalized rating greater than zero and a negative rating as a normalized rating less than or equal to zero (that is, above versus at or below the mean rating).
See the table below for the most positively and negatively correlated phrases and their respective weights. No strong patterns emerged, so we decided to go a step further in granularity and examine how posts interact with individual comments.
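The classifier setup described above can be sketched in pure Python. The source does not say which linear classifier was used, so this sketch swaps in plain logistic regression trained by batch gradient descent on binary phrase-presence features; whitespace tokenization and all function names are assumptions. Phrases with the largest positive weights are those most associated with above-mean ratings, and the most negative weights mark the opposite.

```python
import math
import statistics
from collections import Counter


def label_ratings(ratings):
    """z-score the ratings; 1 if the normalized rating is > 0 (above the mean), else 0."""
    mu, sd = statistics.mean(ratings), statistics.pstdev(ratings)
    return [1 if (r - mu) / sd > 0 else 0 for r in ratings]


def phrases(title, sizes=(3, 4)):
    """Distinct lowercase 3- and 4-word phrases in a title (whitespace tokenization)."""
    words = title.lower().split()
    return {" ".join(words[i:i + n]) for n in sizes for i in range(len(words) - n + 1)}


def build_vocab(titles, min_posts=5):
    """Keep phrases that appear in at least `min_posts` different posts."""
    post_counts = Counter(p for t in titles for p in phrases(t))
    return sorted(p for p, c in post_counts.items() if c >= min_posts)


def train_logistic(titles, labels, vocab, epochs=200, lr=0.1):
    """Logistic regression on phrase-presence features; returns {phrase: weight}."""
    index = {p: j for j, p in enumerate(vocab)}
    rows = [[index[p] for p in phrases(t) if p in index] for t in titles]
    w, b, n = [0.0] * len(vocab), 0.0, len(titles)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(vocab), 0.0
        for active, y in zip(rows, labels):
            z = b + sum(w[j] for j in active)
            err = 1.0 / (1.0 + math.exp(-z)) - y  # derivative of the log loss w.r.t. z
            for j in active:
                grad_w[j] += err
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return dict(zip(vocab, w))


def extreme_phrases(weights, k=5):
    """(most positive, most negative) phrases with their weights."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]
```

On the full dataset, a library implementation with regularization would be the more practical choice; this version only shows how the labels, features and weights fit together.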
Post-Comment Interactions
We used a projection method from Zhang et al. 2017 to map common post title phrases into a space of common comment phrases. Given these post and comment phrase vectors, we can see which comment phrases are most frequently used in response to certain post phrases. We then performed some clustering, which shows how post phrases and comment phrases match up. For example, posts with phrases about hair in the title tend to get comments about “pixie cuts” or hair length.
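The core idea of describing post phrases in terms of the comment phrases they attract can be sketched as follows. This is a bare-bones stand-in for, not a reproduction of, the Zhang et al. 2017 projection: it skips any reweighting or dimensionality reduction and simply represents each post phrase as a vector of counts over comment phrases seen in replies to it. Cosine similarity between those vectors is one possible input to a clustering step. Phrase extraction is assumed to be done already, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def response_profiles(pairs):
    """pairs: (post_phrases, comment_phrases) for each post/comment reply.
    Returns {post_phrase: Counter(comment_phrase -> co-occurrence count)}."""
    profiles = defaultdict(Counter)
    for post_phrases, comment_phrases in pairs:
        for p in post_phrases:
            profiles[p].update(comment_phrases)
    return profiles


def top_responses(profiles, post_phrase, k=3):
    """Comment phrases most often seen in replies to posts containing `post_phrase`."""
    return [c for c, _ in profiles.get(post_phrase, Counter()).most_common(k)]


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(n * v[key] for key, n in u.items())
    norm = math.sqrt(sum(n * n for n in u.values())) * math.sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0
```

Raw counts favor generic comment phrases that appear everywhere; some form of reweighting, such as dividing by each comment phrase's overall frequency, is the usual next step before clustering.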
With this model, we’re able to see how people in the comments respond to certain types of posts. Let’s revisit some of the trends we saw from topic mining and qualitative observations.
Indifference
The first class of posts we examined are the indifferent ones: posts with titles like “just curious” or “bored”. We selected all post phrases containing the keywords bored, wondering, or curious. The comment phrases most closely associated with them are overwhelmingly positive. This suggests there may be a warranted sense of confidence behind posts of indifference: the “feeling cute, might delete later” effect.
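The keyword step can be sketched as below, assuming each post phrase already has a profile of associated comment phrases (here a plain mapping from post phrase to a Counter of comment phrases, a hypothetical layout). Whole-word keyword matching is an assumption, and pooling raw counts is a simplification of "most closely associated" in the projected space.

```python
from collections import Counter


def select_post_phrases(post_phrases, keywords):
    """Post phrases containing any of the keywords as a whole word."""
    keywords = {k.lower() for k in keywords}
    return [p for p in post_phrases if keywords & set(p.lower().split())]


def pooled_responses(profiles, selected, k=5):
    """Sum the comment-phrase counts of the selected post phrases and return the top k."""
    pooled = Counter()
    for p in selected:
        pooled.update(profiles.get(p, Counter()))
    return pooled.most_common(k)
```

The same two functions apply unchanged to the other keyword lists in this section, for example the honest-advice, support-seeking and weight keywords.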
Honest Advice Seeking
These are posts that ask for constructive criticism and honest opinions. First we looked at post phrases containing the keywords criticism and honest. Qualitatively, these also tend to elicit generally positive comments. We also looked at post phrases containing the keyword unsure. Interestingly, many of the comment phrases close to those posts refer to specific pictures, for example “top left and bottom middle” or “top left and bottom right picture”. This suggests that one advice-giving strategy for reducing uncertainty in advice seekers is being specific about the target of the advice or commentary (e.g. “I like your hair” vs. “I like your hair in the second photo from the left”).
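To measure how often comments are specific about which photo they mean, one could flag phrases that name a grid position or an ordinal photo. A minimal sketch; the two patterns and the function names are illustrative assumptions, and they will miss other phrasings such as "the one in the red shirt".

```python
import re

# Grid positions such as "top left" or "bottom middle".
POSITION = re.compile(r"\b(?:top|middle|bottom|center)\s+(?:left|middle|right|center)\b", re.IGNORECASE)
# Ordinal references such as "second photo" or "last pic".
ORDINAL = re.compile(r"\b(?:first|second|third|fourth|fifth|last)\s+(?:photo|pic|picture|image)\b", re.IGNORECASE)


def photo_references(comment):
    """All grid-position and ordinal photo references in a comment, lowercased."""
    return [m.group(0).lower() for m in POSITION.finditer(comment)] + [
        m.group(0).lower() for m in ORDINAL.finditer(comment)
    ]


def specificity_rate(comments):
    """Fraction of comments that point at a specific photo."""
    if not comments:
        return 0.0
    return sum(1 for c in comments if photo_references(c)) / len(comments)
```

Comparing this rate between replies to "unsure" posts and replies to other posts would turn the qualitative observation above into a number.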
Support Seeking
The last group we looked at was the support seekers, whose posts display more vulnerability. We first looked at the set of post phrases containing the following keywords: struggling, esteem, mental, depression, depressed, anxious, anxiety. While no clear themes emerge from the comments, the associated comment phrases show a noticeable shift in tone from the previous two groups.
We then zoomed in on one type of support seeking: posts about weight. These come in two types: progress posts (e.g. “Recently lost a bunch of weight”) and “current status” posts (e.g. “Recently gained some weight”). For this analysis, we considered both types together, examining post/comment relationships for post phrases that included the keywords weight and fat. We found the comments to be not just positive but also encouraging. Posts about weight tend to garner lots of support in the comments, which tracks with our ratings analysis, where progress posts had high scores but relatively low ratings.