Cursing in English on Twitter

Revision as of 06:39, 26 February 2014

Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth

Cursing is not uncommon during conversations in the physical world. On social media, people can chat instantly with friends without face-to-face interaction, usually in a more public fashion, and their messages are broadly disseminated through a highly connected social network. Will these distinctive features of social media lead to a change in people’s cursing behavior? In this paper, we examine the characteristics of cursing activity on a popular social media platform, Twitter, through the analysis of about 51 million tweets and about 14 million users. In particular, we explore a set of questions that prior studies have recognized as crucial for understanding cursing in offline communication, including the ubiquity, utility, and contextual dependencies of cursing.

Introduction

Do you curse? Do you curse on social media? How often do you see people cursing on social media (e.g., Twitter)? Cursing, also called swearing, profanity, or bad language, is the use of certain words and phrases that are considered by some to be rude, impolite, offensive, obscene, or insulting <ref> "Profanity - Wikipedia, the free encyclopedia", Wikipedia, March 2013</ref>. In this paper, we use cursing, profanity and swearing interchangeably. As Jay <ref name="The utility and ubiquity of taboo words">Jay, T. The utility and ubiquity of taboo words. Perspectives on Psychological Science 4, 2 (2009), 153–161.</ref> pointed out, cursing is a “rich emotional, psychological and sociocultural phenomenon”, which has attracted many researchers from related fields such as psychology, sociology, and linguistics <ref>Jay, T. Do offensive words harm people? Psychology, public policy, and law 15, 2 (2009), 81.</ref> <ref name="The pragmatics of swearing">Jay, T., and Janschewitz, K. The pragmatics of swearing. Journal of Politeness Research. Language, Behaviour, Culture 4, 2 (2008), 267–288.</ref>.

Over the last decade, social media has become an integral part of our daily lives. According to the 2012 Pew Internet & American Life Project report <ref> "Pew Internet: Social Networking (full detail)", PewResearch Internet Project, February 2013</ref>, 69% of online adults use social media sites and the number is steadily increasing. Another Pew study in 2011 <ref> "How American teens navigate the new world of “digital citizenship”", PewResearch Internet Project, November 2011.</ref> shows that 95% of all teens aged 12-17 are now online and 80% of those online teens are users of social media sites. People post on these sites to share their daily activities, happenings, thoughts and feelings with their contacts, and to keep up with close social ties, which makes social media both a valuable data source and a great target for various areas of research and practice, including the study of cursing. While the CSCW community has made great efforts to study various aspects of social networking and social media (e.g., credibility <ref>Morris, M. R., Counts, S., Roseway, A., Hoff, A., and Schwarz, J. Tweeting is believing?: understanding microblog credibility perceptions. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, ACM (2012), 441–450.</ref>, privacy <ref>Almuhimedi, H., Wilson, S., Liu, B., Sadeh, N., and Acquisti, A. Tweets are forever: a large-scale quantitative analysis of deleted tweets. In Proceedings of the 2013 conference on Computer supported cooperative work, ACM (2013), 897–908.</ref>), our understanding of cursing on social media remains very limited.

Communication on social media has its own characteristics, which differentiate it from offline interaction in the physical world. Take Twitter as an example. The messages posted on Twitter (i.e., tweets) are usually public and can spread rapidly and widely through the highly connected user network, while offline conversations usually remain private among the people involved. In addition, most of our exchange of words in the physical world happens through face-to-face oral communication, while on Twitter we mostly communicate by writing or typing without seeing each other. Will such differences lead to a change in people’s cursing behavior? Will the existing theories on swearing in offline communication still be supported if tested on social media?

To address these questions, this paper examines the use of English curse words on the micro-blogging platform Twitter. We collected a random sample of all public tweets and the data of the relevant user accounts every day for four weeks. We first identified English cursing tweets in the collection, and extracted numerous attributes that characterize users and users’ tweeting behaviors. We then evaluated the effect of these attributes on cursing behaviors on Twitter. This exploratory study aims to improve our understanding of cursing on social media by exploring a set of questions that have been identified as crucial in previous cursing research on offline communication. The answers to these questions may also have valuable implications for the studies of language acquisition, emotion, mental health, verbal abuse, harassment, and gender difference <ref name="The utility and ubiquity of taboo words"/>.

Specifically, we examine four research questions:

  • Q1 (Ubiquity): How often do people use curse words on Twitter? What are the most frequently used curse words?
  • Q2 (Utility): Why do people use curse words on Twitter? Previous studies <ref name="The utility and ubiquity of taboo words"/> found that the main purpose of cursing is to express emotions. Do people curse to express emotions on Twitter? What are the emotions that people express using curse words?
  • Q3 (Contextual Variables): Does the use of curse words depend on various contextual variables such as time (when to curse), location (where to curse), or communication type (how to curse)?
  • Q4: Who says curse words to whom on Twitter? Previous research <ref>Jay, T. Why we curse: A neuro-psycho-social theory of speech. John Benjamins Publishing, 2000.</ref> <ref>Kamvar, S. D., and Harris, J. We feel fine and searching the emotional web. In Proceedings of the fourth ACM international conference on Web search and data mining, ACM (2011), 117–126.</ref> suggested that gender and social rank of people play important roles in cursing; do they also affect people using or hearing curse words on Twitter?

Method and Analysis

Data Collection

Twitter provides a small random sample of all public tweets via its sample API in real time <ref>https://dev.twitter.com/docs/api/1.1/get/statuses/sample</ref>. Using this API, we continuously collected tweets for four weeks, from March 11, 2013 to April 7, 2013. We kept only users who specified ‘en’ as the language in their profiles. Further, we used the Google Chrome browser’s embedded language detection library to remove non-English tweets <ref>https://pypi.python.org/pypi/chromium_compact_language_detector/0.2</ref>. In total, we gathered about 51M tweets from 14M distinct user accounts.
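The two-stage filter described above (profile language, then per-tweet language detection) can be sketched as follows. This is only an illustration, not the authors' pipeline: the field names `user.lang` and `text` are assumed to follow the Twitter API v1.1 tweet JSON, and `toy_detector` is a hypothetical stand-in for a real language detection library.

```python
def keep_tweet(tweet, detect_language):
    """Return True if the tweet passes both language filters.

    `tweet` is a dict shaped like a tweet JSON object (assumed field names);
    `detect_language` is any callable mapping text to a language code.
    """
    # Stage 1: the user's self-declared profile language must be English.
    if tweet.get("user", {}).get("lang") != "en":
        return False
    # Stage 2: the tweet text itself must be detected as English.
    return detect_language(tweet.get("text", "")) == "en"


def toy_detector(text):
    """Hypothetical detector for illustration only; not a real library call."""
    return "en" if "the" in text.lower().split() else "und"


sample = [
    {"user": {"lang": "en"}, "text": "the weather is nice"},
    {"user": {"lang": "es"}, "text": "the weather is nice"},
    {"user": {"lang": "en"}, "text": "hace buen tiempo"},
]
english_tweets = [t for t in sample if keep_tweet(t, toy_detector)]
```

Passing the detector in as a parameter keeps the sketch independent of any particular library; a real detector would replace `toy_detector`.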

Cursing Lexicon Coding

We asked two college students who are native English speakers to independently annotate potential curse words that were collected from the Internet. In the end, we kept only the 788 words that both annotators considered to be curse words in most cases. Besides correctly spelled words (e.g., fuck, ass), the lexicon also included different variations of curse words, e.g., a55, @$$, $h1t, b!tch, bi+ch, c0ck, f*ck, l3itch, p*ssy, and dik.

We call a tweet a cursing tweet if it contains at least one curse word. Because Twitter users may use different variations of the same word, we match in several steps. We first compare the words in a tweet against all the curse words in the lexicon. If there is no match, we remove repeated letters in the words of the tweet (e.g., fuckk → fuck) and repeat the matching process. We also convert digits or symbols in a word to their original letters, e.g., 0 → o, 9 → g, ! → i. Moreover, based on our observations, the following symbols are frequently used to mask curse words: ' ', '%', '-', '.', '#', '\', '’' (e.g., f ck, f%ck, f.ck, f#ck, f’ck → fuck). We apply an edit distance approach similar to <ref>Sood, S., Antin, J., and Churchill, E. Profanity use in online communities. In Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems, ACM (2012), 1481–1490.</ref> to spot curse words with mask symbols: if the edit distance between a candidate word (f ck) and a curse word (fuck) equals the number of mask symbols in the candidate word (1 in this case), then it is a match. Table 1 provides an overview of the per-user counts of overall tweets and cursing tweets in our data collection.
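The matching steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `LEXICON` is a hypothetical four-word subset of the 788-word lexicon, only the three substitutions named in the text are included, and "removing repeated letters" is interpreted here as shortening runs of a letter first to two and then to one (so that both asss → ass and fuckk → fuck can match).

```python
import re

LEXICON = {"fuck", "shit", "ass", "bitch"}  # hypothetical subset of the lexicon
SUBSTITUTIONS = str.maketrans({"0": "o", "9": "g", "!": "i"})  # from the text
MASK_SYMBOLS = set(" %-.#\\’")  # mask symbols listed in the text


def edit_distance(a, b):
    """Classic Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def collapse_runs(word, keep):
    """Shorten every run of a repeated character to at most `keep` copies."""
    return re.sub(r"(.)\1{%d,}" % keep, lambda m: m.group(1) * keep, word)


def is_curse(candidate):
    word = candidate.lower()
    if word in LEXICON:                                   # 1. direct match
        return True
    if any(collapse_runs(word, k) in LEXICON for k in (2, 1)):  # 2. repeats
        return True
    if word.translate(SUBSTITUTIONS) in LEXICON:          # 3. digits/symbols
        return True
    masks = sum(ch in MASK_SYMBOLS for ch in word)        # 4. mask symbols
    return masks > 0 and any(edit_distance(word, c) == masks for c in LEXICON)
```

For example, `is_curse("f%ck")` succeeds at step 4 because the candidate contains one mask symbol and is at edit distance 1 from fuck. The sketch assumes trailing punctuation has already been stripped from candidates, since a sentence-final period would otherwise count as a mask symbol.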

To evaluate the accuracy of this lexicon-based method for spotting cursing tweets, we drew a random sample of 1,000 tweets, and asked two annotators to independently label them as cursing or non-cursing. In the end, 118 tweets were labeled as cursing tweets with both annotators agreeing on their labels, and the other 882 tweets were labeled as non-cursing. We then tested the lexicon-based spotting approach on this labeled dataset, and it achieved a precision of 98.84%, a recall of 72.03% and an F1 score of 83.33%. As expected, this lexicon-based approach to profanity detection provides high precision but lower recall, which is mainly due to the variations in curse words (e.g., misspellings and abbreviations) and the context sensitivity of cursing. Though we believe that, for this work, high precision is preferred and a recall of 72.03% is reasonable, more sophisticated classification methods that can further improve the recall remain an interesting topic for future work.
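As a worked check of the reported scores, the sketch below computes precision, recall and F1 from a confusion count. The counts used (85 true positives, 1 false positive, 33 false negatives) are not stated in the text; they are an inference, chosen because they are consistent with the 118 cursing tweets in the sample and reproduce the reported percentages.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions: P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts (assumption): 85 + 33 = 118 cursing tweets in the sample.
precision, recall, f1 = precision_recall_f1(tp=85, fp=1, fn=33)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
# prints: precision=98.84% recall=72.03% f1=83.33%
```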

Cursing Frequency and Choice of Curse Words

Prior studies have found that 0.5% to 0.7% of all the words we speak in our daily lives are curse words <ref>Jay, T. Cursing in America: A Psycholinguistic Study of Dirty Language in the Courts, in the Movies, in the Schoolyards, and on the Streets. John Benjamins Publishing Co, 1992.</ref> <ref name="The sounds of social life">Mehl, M. R., and Pennebaker, J. W. The sounds of social life: a psychometric analysis of students’ daily social environments and natural conversations. Journal of personality and social psychology 84, 4 (2003), 857.</ref>. Turning to Internet chatrooms, Subrahmanyam et al. <ref name="Connecting developmental constructions to the internet">Subrahmanyam, K., Smahel, D., and Greenfield, P. Connecting developmental constructions to the internet: identity presentation and sexual exploration in online teen chat rooms. Developmental psychology 42, 3 (2006), 395.</ref> reported that 3% of utterances contain curse words. Our comparison of cursing frequencies from different studies is shown in the following table. Compared with existing studies, our estimate of cursing frequency is based on a significantly larger population: 14 million Twitter users and 51 million tweets. After removing punctuation marks and emoticons, we find that curse words occur at a rate of 0.80% of all words on Twitter, which is higher than the rate (0.5%) in <ref name="The sounds of social life"/>. About 7.73% of all the tweets in our collection contain curse words; that is, about one out of 13 tweets contains curse words. If we consider one tweet as roughly one utterance, this rate is more than twice the rate (3%) in <ref name="Connecting developmental constructions to the internet"/>.

{| class="wikitable"
|+ Cursing frequency over different datasets: cursing on Twitter is more frequent than in the other two datasets (0.80% of all words vs. 0.5% of all words, and 7.73% of all tweets vs. 3% of all utterances).
|-
! !! Mehl et al. 2003 <ref name="The sounds of social life"/> !! Subrahmanyam et al. 2006 <ref name="Connecting developmental constructions to the internet"/> !! Our work
|-
! Subject
| 52 undergraduates || 1,150 chatroom users || 14 million Twitter users
|-
! Sample
| 4 days’ tape recording || 12,258 utterances || 51 million tweets
|-
! Cursing Frequency
| 0.5% of all words || 3% of all utterances || 0.80% of all words, 7.73% of all tweets
|}
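The comparisons drawn from these rates are simple ratios. The short sketch below reproduces them from the figures reported above; the variable names are illustrative only.

```python
# Rates reported in the text and table, expressed as fractions.
tweet_rate = 0.0773           # share of tweets containing curse words
chat_utterance_rate = 0.03    # share of chatroom utterances (Subrahmanyam et al.)
twitter_word_rate = 0.0080    # share of words on Twitter that are curse words
speech_word_rate = 0.005      # share of spoken words (Mehl and Pennebaker)

tweets_per_cursing_tweet = 1 / tweet_rate               # about 12.9, i.e. ~1 in 13
utterance_ratio = tweet_rate / chat_utterance_rate      # about 2.58, "more than twice"
word_ratio = twitter_word_rate / speech_word_rate       # 1.6 times the spoken rate
```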

Besides the cursing frequency, we are also interested in the question: which curse words are most popular? We manually grouped different variations of curse words into their root forms, e.g., @$$, a$$ → ass. If a curse word is a combination of two or more words, and one of its component words is also a curse word, then it is grouped under its cursing component word, e.g., dumbass, dumbasses, @sshole, a$$h0!e, a55hole → ass. All 788 curse words are grouped into 89 distinct groups based on their root curse words, and the frequencies of the top 20 words are shown in the following figure. The most popular curse word is fuck, which covers 33.57% of all the curse word occurrences, followed by shit (15.45%), ass (14.66%), bitch (10.67%), nigga (10.30%), hell (3.91%), whore (1.84%), dick (1.74%), piss (1.55%), and pussy (1.24%).

Counts of curse words: only top 20 curse words are shown due to space limitation.
Cumulative distribution of curse words: The top 7 curse words cover 90.40% of all the curse word occurrences.
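The cumulative coverage quoted in the caption follows directly from the per-word shares listed above. A small sketch, using only the ten shares reported in the text:

```python
from itertools import accumulate

# Share of all curse word occurrences per root word, in percent (from the text).
shares = [
    ("fuck", 33.57), ("shit", 15.45), ("ass", 14.66), ("bitch", 10.67),
    ("nigga", 10.30), ("hell", 3.91), ("whore", 1.84), ("dick", 1.74),
    ("piss", 1.55), ("pussy", 1.24),
]

# Running total of the shares, in rank order.
cumulative = list(accumulate(share for _, share in shares))

top7_coverage = round(cumulative[6], 2)    # 90.4, matching the reported 90.40%
top10_coverage = round(cumulative[9], 2)   # coverage of the ten listed words

for (word, _), total in zip(shares, cumulative):
    print(f"{word:<6} {total:6.2f}%")
```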

Cursing vs. Emotion

Psychology studies <ref name="The pragmatics of swearing"/> suggest that “the main purpose of cursing is to express emotions, especially anger and frustration.” Thus, we aim to explore the emotions expressed in cursing tweets and compare them with those in non-cursing tweets. We apply machine learning classifiers to the 51 million tweets in our collection, and obtain the emotion distributions for both cursing and non-cursing tweets, which are shown in the following figure. Not surprisingly, cursing is associated with negative emotions: 21.83% and 16.79% of the cursing tweets express sadness and anger, respectively. In contrast, 11.31% and 4.50% of the non-cursing tweets express sadness and anger, respectively. This can be explained by the fact that curse words are usually used for venting negative emotions, especially anger and sadness.

Emotion distributions in both cursing and non-cursing tweets.
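To make the contrast concrete, the sketch below computes how much more often each negative emotion appears in cursing tweets than in non-cursing tweets, using only the four percentages reported above. The dictionary layout is illustrative.

```python
# Percentage of tweets expressing each emotion (figures from the text).
cursing = {"sadness": 21.83, "anger": 16.79}
non_cursing = {"sadness": 11.31, "anger": 4.50}

# Ratio of the cursing rate to the non-cursing rate for each emotion.
ratios = {emotion: cursing[emotion] / non_cursing[emotion] for emotion in cursing}

for emotion, ratio in ratios.items():
    print(f"{emotion}: {ratio:.2f}x more frequent in cursing tweets")
```

On these figures, anger is roughly 3.7 times as frequent in cursing tweets and sadness roughly 1.9 times as frequent, so the gap is widest for anger.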

Cursing vs. Time

Cursing vs. Message Type

Cursing vs. Location

Cursing vs. Gender

Limitations

Conclusion

Acknowledgments

References

<references/>