I am making claims about scientists' use of Twitter, so I need a corpus of non-scientists' tweets for comparison. But there is a big issue with all collections of Twitter data: what constitutes a representative sample? Literary scholars can appeal to a canon for their selection of texts. Samples of blogs often focus on those that get the most links; samples of academic texts often choose the highest-impact journals in a field, or heavily-cited papers; samples of news often go to newspapers with large circulations or those that make some claim to be newspapers of record. But the Twitter feeds with the most followers belong to celebrities, and in some ways these are the least typical; they may be doing something quite different from the writers that interest me.
My science Twitter feeds are to some degree about their scientific field, so I looked for non-science feeds that were focused on specialist topics (wine, food, horses, dogs, parenting, transport). Also, my science Twitter writers each had about 1000 to 5000 followers, showing they were popular and consistent enough to attract readers far beyond their friends and connections, but not so popular that they were mainly science journalists or popularisers. So I started by looking for non-science Twitter feeds that had a similar number of followers. I searched the topic lists at We Follow and Twitter for various non-science topics. I threw out the feeds that had too many or too few followers, those that posted infrequently, and those that were just adjuncts to a blog or other site. I also ruled out the feeds that claimed to be written by dogs. And I came up with this list:
@alicefeiring - American who writes about natural wines.
@anniemole - Blog name of the writer of the great Going Underground blog, about the London transit system; her day job seems to be working for a foodie web site.
@cliffysmom - Nancy J. Bailey, an American who writes about and paints pictures of horses.
@EnglishMum - mostly about food and travel, not parenting.
@dogloversdigest - Kevin Myers, dog breeding and training.
@mochadad - American writing about parenting and dating.
@paulawhite - An American teacher of English.
@SecondAveSagas - one of several Twitter feeds by Benjamin Kabak, this one focuses on the New York transit system.
@shunafish - Shuna Fish Lydon, a New York pastry chef.
@woodswines - Simon Woods, a British wine writer.
Of course many other selections would be possible, and there is perhaps some personal bias here: food, wine, cities, but no politics, music, sports, or celebrity gossip. But it was important that these people have some sort of expertise in their specialist areas, so that they might be doing something roughly comparable to the scientists using Twitter.
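The selection criteria described above (a follower count in roughly the same range as the science feeds, regular posting, and not being a mere adjunct to a blog) could be written down as a simple filter. This is only a sketch to make the criteria explicit, not the procedure I actually followed, which was done by hand. The `Feed` record, its field names, the example handles and figures, and especially the `min_tweets_per_week` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """Hypothetical record for one candidate feed (field names are assumptions)."""
    handle: str
    followers: int
    tweets_per_week: float
    adjunct_to_site: bool  # True if the feed mostly just announces blog posts


def is_comparable(feed, min_followers=1000, max_followers=5000,
                  min_tweets_per_week=5.0):
    """Return True if the feed passes all three selection criteria.

    The follower range comes from the text; the posting-frequency
    threshold is an assumed value, since the text only says "infrequently".
    """
    return (
        min_followers <= feed.followers <= max_followers
        and feed.tweets_per_week >= min_tweets_per_week
        and not feed.adjunct_to_site
    )


def select_feeds(candidates, **thresholds):
    """Keep only the candidates that pass is_comparable, preserving order."""
    return [feed for feed in candidates if is_comparable(feed, **thresholds)]


# Made-up candidates, purely to show the filter at work.
example_candidates = [
    Feed("@example_wine", 2400, 30.0, False),       # passes all criteria
    Feed("@example_celebrity", 900000, 50.0, False),  # too many followers
    Feed("@example_quiet", 1500, 0.5, False),       # posts too infrequently
    Feed("@example_blogfeed", 3000, 20.0, True),    # adjunct to a blog
]
selected_handles = [f.handle for f in select_feeds(example_candidates)]
```

A filter like this only covers the mechanical criteria; judging whether a writer has real expertise in a specialist area still has to be done by reading the feed.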
I wanted to keep each feed in a separate file, so I couldn't just follow them all and copy out the combined feed as it appeared in my box. I took about 10,000 words of each feed, to compare with the sample of ten science bloggers. There are problems with this approach to collecting the sample and preparing a corpus, but I'll come to them in another post.
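For anyone wanting to reproduce the "about 10,000 words per feed, one file per feed" step, here is a minimal sketch. It assumes the tweets for each feed are already available as a list of strings, that a "word" is simply a whitespace-separated token, and that whole tweets are kept, so a sample may run slightly over the limit. The function names and the file naming scheme are my own inventions for this example.

```python
from pathlib import Path


def take_words(tweets, limit=10_000):
    """Collect whole tweets, in order, until the running word count reaches limit.

    Words are counted as whitespace-separated tokens (an assumption).
    The last tweet taken may push the total a little over the limit.
    """
    sample = []
    total = 0
    for tweet in tweets:
        if total >= limit:
            break
        sample.append(tweet)
        total += len(tweet.split())
    return sample


def write_sample(handle, tweets, out_dir, limit=10_000):
    """Write one feed's sample to its own text file, one tweet per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{handle.lstrip('@')}.txt"
    text = "\n".join(take_words(tweets, limit)) + "\n"
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `write_sample` once per feed gives one file per writer, which is what makes feed-by-feed comparison possible later.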