Pregnant Pause: How the way you tweet (or don’t) gives you away

New Yorker Cartoon

Our engagement with social networks is commonly seen as the defining marker of this age, one in which even the most fleeting thoughts are recorded and shared in an ongoing conversation with the world at large. This creates a rich data mine for researchers who want a peek at people’s daily lives and inner musings.

Twitter, in particular, has captured the attention of data analysts for several reasons: chiefly the public availability of the data, its sheer volume (though whether that volume justifies worrying less about sample significance is another question [2]), and the restricted format of tweets, followed by a number of technical reasons. Twitter has thus prompted many attempts to find patterns in its data and link them to real-world information, and the results are fascinating: research ranges from detecting influenza epidemics [3] and city traffic [4] to predicting policy changes and modeling ideologies [5]. This week’s paper, Major Life Changes and Behavioral Markers in Social Media: Case of Childbirth [1], is another attempt at extracting useful information from Twitter.

The paper explores the mood and behavioural changes that mothers go through around childbirth by analysing their daily postings to Twitter. The analysis focuses on tweets made during a period extending from 5 months before to 5 months after childbirth. To identify new mothers on Twitter, the authors first looked for birth announcement tweets by querying a set of keywords and phrases typically used in newspaper birth announcements. This query produced a set of possible candidates, which was narrowed down by filtering for gender based on first names and then by crowdsourcing the identification of false positives on Mechanical Turk. The final set consisted of 85 validated new mothers.
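The first two filtering steps described above (keyword matching, then a first-name gender check) can be sketched roughly as follows. This is only an illustration under stated assumptions: the phrase patterns, the tiny name list, and the function names are all hypothetical and are not taken from the paper, and the Mechanical Turk validation step is not modelled at all.

```python
import re

# Hypothetical announcement phrases; the paper's actual keyword list is not reproduced here.
ANNOUNCEMENT_PATTERNS = [
    r"\bgave birth\b",
    r"\bour (?:baby|son|daughter) (?:was born|has arrived|is here)\b",
    r"\bweighing \d+ ?(?:lbs?|pounds)\b",
]
ANNOUNCEMENT_RE = re.compile("|".join(ANNOUNCEMENT_PATTERNS), re.IGNORECASE)

# Hypothetical, deliberately tiny list; a real filter would need a much larger name resource.
FEMALE_FIRST_NAMES = {"anna", "maria", "sarah"}


def is_birth_announcement(text):
    """Return True if the tweet text matches any announcement phrase."""
    return ANNOUNCEMENT_RE.search(text) is not None


def likely_female(display_name):
    """Guess gender from the first token of the display name (a crude heuristic)."""
    parts = display_name.strip().split()
    return bool(parts) and parts[0].lower() in FEMALE_FIRST_NAMES


def candidate_mothers(tweets):
    """Return user ids, in first-seen order, that pass both filters.

    `tweets` is an iterable of (user_id, display_name, text) tuples.
    """
    seen = set()
    candidates = []
    for user_id, display_name, text in tweets:
        if user_id in seen:
            continue
        if is_birth_announcement(text) and likely_female(display_name):
            seen.add(user_id)
            candidates.append(user_id)
    return candidates
```

A filter like this will still let through false positives (for example, a user announcing a relative’s baby), which is presumably why the authors added a manual validation step.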

Tweets of the selected candidates were compared to those of a background cohort of 50,000 randomly selected users. The comparison was built around three measures: patterns of activity, linguistic style, and emotional expression. The results showed that childbirth was associated with some changes, albeit small, for most new mothers. However, 15% of the new mothers underwent significant changes, including lower Twitter activity, reduced positive and heightened negative affect, increased use of first-person pronouns, and changes in vocabulary.
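To make two of these measures concrete, here is a minimal sketch that compares posting volume and first-person pronoun use before and after an event date. The pronoun list, the tokenizer, and the simple difference scores are assumptions chosen for illustration; they are not the measures or tools used in the paper, and no comparison against a background cohort is attempted.

```python
import re
from datetime import date

# Assumed first-person singular pronoun list, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
TOKEN_RE = re.compile(r"[a-z']+")


def first_person_rate(texts):
    """Fraction of tokens across all texts that are first-person pronouns."""
    tokens = [tok for text in texts for tok in TOKEN_RE.findall(text.lower())]
    if not tokens:
        return 0.0
    return sum(tok in FIRST_PERSON for tok in tokens) / len(tokens)


def split_by_event(tweets, event_day):
    """Split (day, text) pairs into texts posted before and on/after the event day."""
    before = [text for day, text in tweets if day < event_day]
    after = [text for day, text in tweets if day >= event_day]
    return before, after


def change_summary(tweets, event_day):
    """Report the change in tweet count and in first-person pronoun rate."""
    before, after = split_by_event(tweets, event_day)
    return {
        "volume_change": len(after) - len(before),
        "first_person_change": first_person_rate(after) - first_person_rate(before),
    }
```

A negative `volume_change` together with a positive `first_person_change` would point in the same direction as the changes the paper reports for the 15% group, though a real analysis would need equal-length time windows and a statistical test.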

The authors suggest that a possible contribution of this work is a method for identifying new mothers from Twitter data. Other possibilities include extending the work to diagnose postpartum depression in new mothers using social media, and providing self-narration for said mothers to bring their attention to early signs of postpartum depression. Privacy concerns, the authors argue, can be set aside when the analysis of this publicly available data is used in a private manner. Other concerns are the limitations of the analysis tools used to quantify language and emotions (e.g. not taking negation into account) and the small size of the studied sample. These issues are acknowledged by the authors.


How representative is the dataset obtained from Twitter? One of the methodological issues one might face when working with datasets from Twitter is sample selection. One option is the Twitter Streaming API, which is free but offers at most 1% of all tweets, selected through unknown means. Another option is the Firehose API, which includes every single published tweet but is costly in terms of money and resources. The paper Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose [2] compares the two APIs in an attempt to find the answer.


Literature:

[1] De Choudhury, M., Counts, S. and Horvitz, E., 2013, February. Major life changes and behavioral markers in social media: case of childbirth. In Proceedings of the 2013 conference on Computer supported cooperative work (pp. 1431-1442). ACM. http://dl.acm.org/citation.cfm?id=2441937

[2] Morstatter, F., Pfeffer, J., Liu, H. and Carley, K.M., 2013. Is the sample good enough? comparing data from twitter’s streaming api with twitter’s firehose. arXiv preprint arXiv:1306.5204. https://arxiv.org/abs/1306.5204

[3] Aramaki, E., Maskawa, S. and Morita, M., 2011, July. Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the conference on empirical methods in natural language processing (pp. 1568-1576). Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=2145600

[4] Tejaswin, P., Kumar, R. and Gupta, S., 2015, March. Tweeting Traffic: Analyzing Twitter for generating real-time city traffic insights and predictions. In Proceedings of the 2nd IKDD Conference on Data Sciences (p. 9). ACM. http://dl.acm.org/citation.cfm?id=2778874

[5] Zhang, A.X. and Counts, S., 2015, April. Modeling ideology and predicting policy change with social media: Case of same-sex marriage. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 2603-2612). ACM. http://dl.acm.org/citation.cfm?id=2702193
