Vol. 12 No. 4, October 2007


 

Which types of news story attract bloggers?


Mike Thelwall
Statistical Cybermetrics Research Group, School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK

Aidan Byrne and Melissa Goody
School of Humanities, Languages and Social Sciences, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK


Abstract
Introduction. Blogs have been hailed as potential transformers of journalism and news values. Nevertheless, despite some major stories gestating in blogs, it is unclear what types of news are discussed in blogs and hence the extent of potential blogspace influence.
Method. We sampled 556 stories from four news Website home pages in June 2006.
Analysis. Each story was classified by topic, event type and geography, and the number of relevant blog postings from the publication day was estimated.
Results. The results showed a surprisingly close average match between blogger interests and BBC, CNN, LA Times and Fox News coverage, probably because news sites tend to publish more stories of popular types. Further analysis suggested that blogs favour participatory events and right-wing perspectives, and hence may pull mainstream news in this direction.
Conclusions. Blog discussions are not restricted to particular kinds of news, but are wide-ranging, even though some biases are evident. We also recommend simple guidelines for assessing whether individual news stories attract above average interest in 'blogspace'.


Introduction

The new century has seen the emergence of technologies that allow the public to bypass the traditional mass media for access to news, to comment in real time on emerging news stories, or to report their own news information. Most significant is probably the Weblog, or 'blog', a reverse chronological order list of the writings of an individual or group. Although blogs can be used by those who already have access to the media, such as politicians blogging to persuade or interact with constituents (Coleman 2005), and journalists blogging to report fast-breaking stories or their own opinions (Matheson 2004), there is an elite group of 'A-list' blogs, defined to be those that attract hundreds of thousands of readers with commentaries on politics, the news or technology (Trammell & Keshelashvili 2005). Moreover, many amateurs blog their lives (Huffaker & Calvert 2005) or areas of expertise (Bar-Ilan 2005), some describing themselves as personal journalists (Allan 2004). Blogs have been shown to change the news significantly (for example in the important case of Senator Lott (blogcyberzone.com 2006; Thompson 2003)), and to exert influence in other situations (reviewed below), which has led to speculation that blogs may induce widespread and fundamental changes in journalism. A limitation of most previous relevant research, however, is that it has focussed on high profile interesting case studies (e.g., Thompson 2003), or has surveyed a few high profile issues (Lloyd et al. 2006). This has shown the potential of blogs but not the likelihood of blog-style debating becoming an embedded component of routine news production and values.

To assess the potential for blogs to fundamentally change news production we first need to know more about how they are used to engage in news-related debates. A key part of this is identifying types of news most likely to attract blogger participation and hence the potential areas of blogger influence. Previous blog research has not systematically assessed the types of news covered in blogs, an important gap. Some idea of blogger interests can be easily gained from Websites that report the popularity of individual topics or blog posts, such as technorati.com and digg.com, but these do not allow systematic comparisons between the news stories that bloggers are exposed to and the topics that they choose to blog about. One previous study has explicitly compared the volume of blogging of the news with the volume of news discussions in U.S. news Websites. However, it focussed only on the most discussed topics (Lloyd et al. 2006). Surprisingly, blog and news volumes for individual topics showed almost no correlation over time, despite similar topics being extensively discussed. For example, film and TV show openings were extensively discussed in the news before the premiere, as part of the film's marketing build-up, but the main blogger discussions took place after they had had a chance to see the film. In contrast, bloggers started to discuss the anniversary of the World Trade Center attacks in advance, unlike most of the online newspapers.

In this research we assess the volume of blogger postings that typical news stories of different types can expect to attract. Note that this is investigating the influence of the news on bloggers, although our interest is in the influence of bloggers on the news. The reason for this choice is that we believe that this will reveal the topics that bloggers take an interest in and that these topics are those for which they may be influential. This is a practical step because directly measuring blogger influence on the news is difficult and perhaps impossible on a large scale. The results may also inform those who need to understand the types of news stories that attract the interest of bloggers for commercial and political reasons. Businesses (Glance et al. 2004; Gruhl et al. 2004) and politicians (Coleman 2005) that take note of bloggers would benefit from more information about the kinds of topics that bloggers are interested in as a way of understanding bias (if any) in blogspace (i.e., the totality of blogging activity on the Web).

News

This section briefly introduces some relevant news research. As this is a vast field, the selection is necessarily far from comprehensive.

News delivery

For most of the twentieth century news in developed nations was primarily delivered by the press and TV/radio broadcasting (Curran & Seaton 2003). Recently, however, there has been a distinct shift of public news consumption away from traditional sources and towards the Internet (Ahlers 2006). There have been decreasing audiences for traditional broadcast news (Pew Research Center... 2000) and print journalism is under threat (Lloyd 2006). Part of the shift has been to online newspapers associated with offline newspapers or broadcasters. In addition, there are more exotic Internet-only tools such as news search services, search engines and news aggregators (Chowdhury & Landoni 2006), which provide access to a range of sources. These do not pose a threat to existing media organizations, however, as long as the organizations adapt to the Internet in order to follow their audience. Traditional broadcasters attempt to compete with each other and blogs in providing fast-breaking news about significant developments, so the day's news is no longer fixed at the time of the final press runs but varies continually. Tracking this variation has been used to show how ideological components can be added to the reporting of news stories separately from the facts (Kutz & Herring 2005), perhaps after consultation with traditional news sources (Livingston & Bennett 2003). Moreover, journalists now often write blogs within online newspapers, negotiating issues of currency and authority (Matheson 2004).

The international dimension is important given the global reach of the Internet, the BBC (British Broadcasting Corporation) and CNN (Cable News Network). News reporting varies considerably on an international scale, even within Europe; for example Greek papers are aimed primarily at the social elite whereas Scandinavia has a genuine mass press (Hallin & Mancini 2005). Elsewhere broadcasting can be used for explicit political goals, such as nation building in South East Asia (Chadha & Kavoori 2005; Kitley 2003). Moreover, journalists can adopt significantly different roles, from advocate to objective gatekeeper (Janowitz 1975) and these choices vary internationally (Donsbach & Patterson 2004).

Technological impact on the mass media is far from unique to the Internet age. Previous researchers have claimed that delivery mechanisms, such as television, can produce radical changes in the content and impact of news delivery (Postman 1985) and that the technology effects can also be significant, with technologies such as hand-held cameras changing news by providing glitzy stories with high visual impact (Livingston & Bennett 2003). These changes may help to explain a growing US scepticism about media fairness (Pew Research Center... 2005). Moreover, based upon blogs and other recent Internet-enabled changes, theorists have claimed that the mass media now live in a constant state of flux in terms of modes of delivery and audience relationships (Deuze 2006; Livingston & Bennett 2003; Zelizer 2005), with this kind of instability being arguably characteristic of late modern social processes (e.g., Bauman's liquid modernity (2000)).

Public interest in the news

In order to understand news blogging it is useful to review first what is known about public news interests. There is a long history of influential research into public news interests. For example, pre-war UK readership surveys are credited with driving a shift to more entertainment content in newspapers (Curran & Seaton 2003: 44). It is likely that all major media news outlets monitor public interest and reactions in a variety of ways (Harrison 2006: 165-171). Online, the most effective method is probably the simplest: counting clicks in news Websites to estimate the interest in individual stories. Such information is now partially available free online: the BBC publishes lists of the top stories each month (BBC 2005), and news services like news.yahoo.com include in their home page daily, real-time lists of the most e-mailed, most viewed and most recommended recent stories.

In addition to this media self-monitoring, there are systematic audience surveys, such as those of the Pew Center for Civic Journalism in the US and academic research into various aspects of news consumption and engagement. Before reviewing a few relevant results, two contrasting theories of news reading must be noted (Dozier & Rice 1984). The play or ludenic theory posits that what people gain from reading the news is the pleasure intrinsic to the act itself (Stephenson 1967). In contrast, the uses and gratifications theory posits that news is read to satisfy a particular need, which may be external to the act of reading (Blumler & Katz 1974). (For example, some people scan news stories for future conversation topics and it seems likely that some bloggers also scan the news for blog topics.) Research over previous decades has thus moved from a perspective of exploring the effects of the media on its readership to studying the ways in which the readership uses the news for its own purposes. An example of this is identifying the role that attitudes to enjoyment play in the way in which people use media (Nabi & Krcmar 2004). This approach can also be useful to help differentiate between the ways in which different groups react to and consume the news, such as those with a particular interest in technological news, those only interested in celebrities (e.g., as a form of para-social interaction (Berelson 1949; Horton & Wohl 1956)), or those who follow a wide range of news stories closely (Harrison 2006: 156-164).

The Pew Research Center public news interest and consumption survey is a good starting point for a general picture of U.S. news consumption. In 2004, weather was the top news type sought online. This is interesting because this is one of the least-blogged topics and shows that there is an element of selection in which aspects of the news are blogged, although the weather is arguably rarely a genuine 'news story'. A number of types of news were significantly searched for online, each by approximately half of the people who went online for news: Science & Health; Politics; International; Technology; Business; Entertainment; Local; Sports. This suggests a surprisingly even spread of interest in news topics (Pew Research Center... 2004). The same survey found a variable percentage of Americans tracking individual specific topics very closely using any news source, varying from Venezuelan instability (5%) to Iraq (63%). This shows that some news stories will have a ready-made receptive audience. In terms of general news interests that are followed 'very closely' by Americans, these vary from culture and arts (10%) to crime (32%): a moderately wide spread of interest levels.

Finally, it should be noted that news consumption is not homogeneous: it differs between ethnic groups (Madianou 2005) and with demographic factors such as sex (e.g., Knobloch-Westerwick & Alter 2006). In some countries, such as Iran, blogs may be used as a substitute for a free press (Bond & Abtahi 2005) and may well already have a significant widespread and independent impact on what citizens there consider to be the news.

Blogs

This section is a review of some relevant research into bloggers, blog genres and topics. There are many other similar types of personal publishing, such as image sharing and bookmarking sites, discussion groups and integrated publishing spaces such as MySpace and much of the research below is applicable to all blog-like publishing.

Blog genres and topics

Blogs are used for many different purposes. The majority blog type appears to be the diary-like personal journal (Papacharissi 2004) accounting for about two thirds of all blogs (Herring et al. in press). Filter blogs are those that filter information from a variety of news sources and contain commentaries or opinions on the news. These accounted for about 16% of blogs and mixed blogs, containing a reasonably balanced mixture of genres, accounted for about 10% in April 2004 (Herring et al. in press). In addition, there is a small number (2%) of k-blogs, which are those that provide information about a specialist topic, such as Linux developments or library technologies (e.g., Bar-Ilan 2005). From this genre breakdown it is likely that the majority of blog content is personal information rather than news or politics. Nevertheless, much news-related material must be discussed in filter blogs and news stories must occasionally find their way into personal journals when they are sufficiently engaging or perhaps as a conversation starter.

There has been much research into political blogging, often focussing on the conflict between conservative and liberal bloggers. One study compared 40 A-list bloggers during an election in the US, showing the conservatives to be more densely interlinked (Adamic & Glance 2005). Blogs and the Internet in general seem to be logical avenues for activism. In particular, for people opposed to the mainstream media coverage of major events, blogs can provide an avenue for collective long-distance organization, as for the war-blogging opponents of the Iraq war (Gorgura 2004; Thompson 2003).

No previous research seems to have systematically characterised the topics that bloggers tend to discuss. Nevertheless, one paper has commented on the most highly discussed topics in blogs during July 2005, finding the London Attacks of 2005 and a major film release to be the top two stories (Thelwall 2006). This suggests that major political and entertainment events may both generate significant blogger interest (see also Thelwall & Hellsten 2006). Film openings are an example of carefully staged events designed to get people talking so that they will consider attending the film (Wasko 2003: 188-220). The value of film openings as talking points may make films particularly bloggable, even if the main blogger discussions take place once the film has been seen, in addition to the fact that films are a mass consumer product designed to engage interest and emotions.

Blogger demographics and geography

The revolutionary potential of blogs as a mass communication technology stems from the underlying software that makes them easy to create, update and connect (e.g., through links or comments) with other people and blogs. Nevertheless, bloggers are not typical citizens: they tend to be younger than average, although with an approximately even balance of the sexes (at least in English) (Herring et al. in press). A slight majority of bloggers seem to be students (Herring et al. in press) and to be more urban-dwelling than average (in the US) (Lin & Halavais 2004). All of these demographic factors are likely to influence the topics discussed in blogs. Of course the spread of bloggers on an international scale will be highly uneven, with countries that have widespread Internet access, such as the U.S. and Japan, likely to have a high proportion of the population engaged in blogging, relative to other nations. It is difficult to get reliable statistics to support this claim, however, because all values can only be estimates and are likely to be controversial (e.g., Riley 2006).

A number of demographic factors have an impact on blog visibility and accessibility. Only blogs that can be found are useful to researchers and so, for a researcher, the number of bloggers may equate to the number of blogs indexed in the blog search engine(s) used. It follows that countries or languages without good representation in international search engines or a local equivalent may be poorly represented in research. Another important issue is the identification of blogger nationalities. If a topic is of purely national interest or a country has a unique national language (e.g., Farsi in Iran) then it may be reasonable to assume that bloggers are from the country concerned. In the case of international topics and international languages, however, such as Iraq war discussions in English, it may be very difficult to identify national contributions through general blog search engines.

Theoretical perspectives

Blogs influencing the news

There has been a discussion of how blogging can introduce changes in the day-to-day practices of journalism. It is possible that the online news environment is so revolutionary that it is necessary to fundamentally rethink existing media theories of news. The following five points (in descending order of hypothesised importance) summarise how blogging may impact upon aspects of news production that existing media theories recognise as important.

Perhaps most importantly, journalists may change. Before blogs there had already been a significant shift away from the dominance of a few professional political journalists working for the main media sources that could collectively exert power over how politics was reported (Blumler & Gurevitch 2005). Now the diversity in media sources, not only the Internet but also the increase in the number of broadcasters, has undermined this authority. The amateur journalism of bloggers can potentially take an important extra step to significantly expand the number and types of people that directly produce news (Zelizer 2005).

Control of the most influential sources of news may change. Most news originates from official sources (Herman & Chomsky 1988: 18-35; Livingston & Bennett 2003; Schudson 2003: 21-22, 134-153). The influence of the social elite and, to a lesser extent, well-organized pressure groups, exerted through para-journalistic corporate and governmental news agencies acting as primary news sources, has been claimed to be important in explaining how the media works (Herman & Chomsky 1988; Schudson 2003: 147). Blogs can bypass traditional news sources by giving journalists and other bloggers free access to first hand newsworthy information, as in the case of the Baghdad blogger reporting from inside Iraq (Thompson 2003).

Media ownership may partly change. This is important because the influential propaganda model of the domination of news values and the news agenda by the social elite (Herman & Chomsky 1988) is partly underpinned by the observation that huge production costs restrict mass media ownership to the wealthy. The influence of production costs was clear in nineteenth century Britain, driving the marginalisation of a previously successful radical working class press (Curran & Seaton 2003), which continued in the twentieth century (House of Commons 1977, cited in Curran 2005). The existence of mass-readership, low production cost A-list blogs (Trammell & Keshelashvili 2005) has allowed some bloggers to gain a mass readership at a fraction of the cost of buying or setting up a newspaper.

The framing (Goffman 1974) of news stories may change. The above factors may combine to influence how individual news stories are constructed by the media, for example to encompass wider interest groups and the key issue of the extent to which dissenting views, especially those outside of the sphere of legitimate controversy, are given fair treatment, when reported (Hallin 1986; Schudson 2005). This may connect to the impact that news reporting may have within society (Carragee & Roefs 2004), although media effects theories are controversial (Branson & Stafford 2006). It is also possible that blogs, with their frequently opinionated nature, may pull journalists towards advocacy, push them towards objectivity as a way of differentiating themselves from the (amateur) bloggers, or cluster them towards uniformity. Differentiation may be important for the long-term future of a newspaper, because it is important that the newspaper be seen as credible (Schudson 2003: 40), i.e., trustworthy.

The range of influences over what is defined as news may change. The power of the social elite in defining what news should be reported is recognised, including for apparently independent and threatening investigative reporting (Protess et al. 1991). Even in democracies the mass media sometimes systematically ignores the truth, bends to the will of powerful groups and unquestioningly falls into line behind what is determined to be the national interest in times of crisis (Curran 2005). Can blogs help to provide an alternative axis of influence to promote alternative concepts of what is newsworthy or what should be investigated?

Despite the potential blog influence suggested by the above list, some believe that blogs do not really make fundamental changes in news values because of 'the degree of symbiosis' between A-list blogs and news media (Thompson 2003). Alternatively, there have been claims that the multiple perspectives available in blogspace enable a kind of post-modern interactive journalism that will be radically different to traditional media (Wall 2005).

Blogging, politics and democracy

Many see an essential role for the press in democracies being filtering debates to make them manageable for citizens (Riddell 2006). This observation was first made, about the early US democratic press, by Alexis de Tocqueville in the middle of the nineteenth century, but market pressure from Internet-based new media also seems to be undermining this long-standing self-appointed role (Riddell 2006). The extent to which the media have successfully informed the public in western democracies is a key point of contention, because of claims of elite dominance of the news agenda (Curran 2005; Herman & Chomsky 1988; Schudson 2003: 46). As discussed above, blogs can potentially undermine this elite dominance (e.g., Balnaves, Mayrhofer, & Shoesmith 2004; Gorgura 2004; Schudson 2005; Wall 2005). Conversely, does the failure (so far) of blogs to radically alter the content of mainstream news (e.g., D'Haenens et al. 2004; LeBel 2005) undermine existing theories of elite dominance?

The debate about elite dominance of the news agenda relates to Habermas's (1989; 1991) public sphere, the idea that citizens should debate important issues in the interests of democracy. A criticism of Habermas has been the low participation rates in debates of relevance to civic society (Schudson 1998), but blogs and other Internet-enabled interactive discussion forums could potentially fix this problem.

An argument in the opposite direction, that is, seeing blogging as a potential threat to democratic participation, is that the opinionated nature of blogs and the fact that users can select which blogs they read now allows people to effectively filter out views that they do not agree with and only read opinions that broadly match their own (Sunstein 2004; Thompson 2003). If widespread, this would clearly decrease pluralism and/or increase fragmentation within society and perhaps reduce the likelihood of minority opinions reaching a wider audience, threatening a core component of democracy.

Research Objectives

The objective of this research was to discover how frequently different common types of news story are blogged. This is exploratory research (Sekaran 2000) rather than testing a specific hypothesis since this is a new type of study. Nevertheless, our preliminary hypothesis, based upon scanning sites like digg.com and checking the top read stories in news Websites, was that human interest or curiosity stories would be most likely to catch bloggers' attention. For example, just before the study was undertaken, one of the most e-mailed and read stories was about a home video of a cat chasing a bear up a tree (which was associated with a news story). The secondary objectives were to use the results to speculate about (a) the impact of blogging on the news, (b) the effectiveness of the methodology adopted and (c) how to benchmark news story blogging so that news stories that attract above average blogger interest can be identified. Hence the overall research design is a mixed methodology (Tashakkori & Teddlie 1998).

In theory, each blogger could write about every topic that they were aware of so that their blog would include a complete personal record of their reactions to the news. The following person-specific factors suggest reasons that bloggers may use to decide which topics to blog about:

In summary, the typical blog should be regarded as filtered for: topic and/or genre relevance; blogger effort or frequency of blogging; and judged interest or use value. Hence blogspace can be expected to over-represent topics of particular interest to active bloggers (e.g., blogs and perhaps digital technologies) and other topics may have to be of a quite high general significance to attract a significant volume of blogging.

Method

The research plan was to obtain a sample of news stories, classify them and count the number of blog postings relating to each one. This would allow the relationship between story type and blogging to be assessed. Below is a description of how this was implemented.

Story identification

Although part of what blogging may achieve is to alter the definition of what counts as news, as a pragmatic step appropriate to preliminary research we decided to accept the definition and selection of news stories provided by existing mass media. We chose four well-known news organizations with a significant online presence: the BBC, CNN, Fox News and the Los Angeles Times. The BBC was included because it is possibly the top news source for bloggers (Thelwall 2006). We selected US news sources for the remainder because of the expected US dominance of English language blogging. CNN was selected because it is likely to be the second most popular online news source for bloggers. Fox News was selected for its popular conservative news coverage (Zelizer 2005) and the Los Angeles Times for its liberal coverage. All four have Websites that list many stories on their home page and they do not require user registration to access the full text of stories: two further selection criteria.

On each weekday at the same time (about 9.30am UK time) from June 15 to June 28 2006, two classifiers saved a copy of the home page of each of these newspapers. The complete set of stories from all four homepages on June 15, 19, 20, 22, 23, 27 and 28 comprised the news story sample (the irregular dates are due to a third classifier withdrawing during the research for personal reasons). The use of dates close together is a problem because news stories tend to go in waves. In news jargon, a story with legs can run for many days. In statistical terms this means that the contents of news stories for adjacent days are not independent. Ideally, we would have selected a random sample of days from a complete year of news coverage, but only a short period of time was possible for practical reasons. The discussion of the results mentions the few long running stories.

Story classification

Following previous research classifying news story types (Livingston & Bennett 2003; Volkmer 1999) we adopted and developed two classification facets: topic and event type. In addition, we added a geographic category. A week was used for pilot testing and refining the classification categories and descriptions. This period also served to train the classifiers and improve inter-classifier consistency (see Neuendorf 2002). For the subsequent main data collection exercise, one researcher compiled a complete list of all stories from all four news sources on a single day. Stories that were identical or very similar (e.g., describing different features of a common event) were treated as the same. The results were recorded in a spreadsheet, including topic, event and geographic categories and a list of the news sources mentioning the story. The classification was blind checked by a second researcher on the following day using URLs recorded by the first researcher. The purpose of the second classification was to ensure inter-classifier consistency: a third classifier arbitrated in cases of disagreement. (See appendix 1 for the classification instructions.)
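The double classification with arbitration described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each story carries one label per facet, that the arbiter's label is taken only for facets on which the first two classifiers disagree, and that the category names shown are placeholders rather than the categories of appendix 1. The names Labels, disagreements and resolve are hypothetical and are not part of the original study.

```python
from dataclasses import dataclass
from typing import Optional

# The three facets recorded for every story (topic, event type, geography).
FACETS = ("topic", "event", "geography")


@dataclass(frozen=True)
class Labels:
    """One classifier's labels for a single story (hypothetical structure)."""
    topic: str
    event: str
    geography: str


def disagreements(first: Labels, second: Labels) -> list:
    """Return the facets on which the two blind classifications differ."""
    return [f for f in FACETS if getattr(first, f) != getattr(second, f)]


def resolve(first: Labels, second: Labels, arbiter: Optional[Labels] = None) -> Labels:
    """Combine two classifications, deferring to an arbiter where they differ.

    Assumption: arbitration is applied facet by facet; the source only says
    that a third classifier arbitrated in cases of disagreement.
    """
    contested = disagreements(first, second)
    if not contested:
        return first
    if arbiter is None:
        raise ValueError("classifiers disagree on %s; an arbiter is needed" % contested)
    merged = {
        f: getattr(arbiter if f in contested else first, f)
        for f in FACETS
    }
    return Labels(**merged)


# Placeholder category names, not those used in the study.
a = Labels(topic="sport", event="scheduled", geography="international")
b = Labels(topic="sport", event="unscheduled", geography="international")
c = Labels(topic="politics", event="scheduled", geography="uk")

final = resolve(a, b, arbiter=c)
```

In this sketch only the contested event facet is taken from the arbiter, so the agreed topic and geography labels are left unchanged.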

Blog counting

In order to count the number of blog postings mentioning each news story, we used Google blogsearch for the day of the publication of the story, but searched two days later (using a date-specific search) to give some time for relevant postings from the day to be found. Google blogsearch was chosen because preliminary testing showed that it had good coverage compared to other blog search engines and because its ownership by a large Internet company made it seem likely to continue in existence for several years, at least. Clearly this method misses all blogs not indexed by Google. We chose to search for postings for the first day of the story only because some news stories are long-running with new angles reported on different days. In practice we are assuming that a significant proportion of blog postings are made on the same day that the story was first reported and that this proportion is reasonably similar across all of our categories. This is a biasing assumption. For example, previous research into high-profile news has shown that there are three distinct profiles (here we ignore results in the reviewed paper (Adar et al. 2004) specific to slashdot.org): stories with sustained interest over several days, stories peaking on day one and then quite rapidly decaying (mainly 'less serious' news content) and stories that peak on day two (tending to be 'serious editorial news comment') (Adar et al. 2004). Hence it is likely that our method underplays the importance of sustained news stories (serious editorial comment is not included in our sample). This is probably offset by the fact that stories with sustained interest are likely to be represented in our sample by stories on more than one day. Nevertheless, there are some events where blogging tends to naturally lead or lag news coverage (Lloyd et al. 2006) and this is a serious problem that our method cannot avoid. Note also that time zone differences across the world affect the results. In particular, some events that were stories in the UK and/or US news sources on a given day will have been stories on the previous or subsequent days elsewhere in the world and this affects the volume of blogging that the single day search is able to capture. In addition, events occurring late on a given day probably attract fewer same-day discussions than those that occur earlier.

Blogs matching a topic were found by composing a 'generous' Google advanced blog search designed to capture the majority of relevant posts. This was followed by checking a sample of up to twenty-five matching posts to identify the estimated proportion of relevant matches. Posts were judged relevant if they mentioned the news story, even if it was not the main topic of the post. If n matching posts were found and a proportion p of the sample was judged relevant then np was the estimated number of topic-relevant postings. Postings in spam blogs (splogs) or other automatically generated blogs were the majority in many cases: these were always classified as irrelevant. Spam blogs are a significant problem for the creators of blog search engines and other automatic blog analysis software (Han et al. 2006; Kolari et al. 2006).
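The np estimate described above can be illustrated with a minimal sketch. The function name and the example figures are hypothetical and are not taken from the study data; the sketch only assumes that each checked post has been judged relevant or not.

```python
def estimate_relevant_posts(total_matches, sample_judgements):
    """Estimate the number of topic-relevant postings as n * p.

    total_matches: n, the number of posts matched by the search.
    sample_judgements: one boolean per checked post
        (True = relevant and human-written, False = irrelevant or spam).
    """
    if total_matches == 0 or not sample_judgements:
        return 0.0
    relevant = sum(sample_judgements)
    # n * p, written as n * relevant / sample size.
    return total_matches * relevant / len(sample_judgements)


# Hypothetical story: 200 matching posts, 25 checked, 5 judged relevant.
judgements = [True] * 5 + [False] * 20
print(estimate_relevant_posts(200, judgements))  # 40.0
```

In this invented case 80% of the checked sample is spam or irrelevant, so the raw count of 200 shrinks to an estimate of 40.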

Results

A total of 556 stories were classified and, for each story, a sample of up to twenty-five blog postings was checked as described above, around 5,000 in all. During the time of the study there was one major event that attracted many (31) news stories: the soccer FIFA World Cup 2006 in Germany. Soccer is the sport with the largest TV audience in the world and is the most popular spectator sport in many countries; unsurprisingly it attracted a lot of blogging. A few other stories had legs. There was tension between the US and Iran over nuclear power and between Israel and Palestine over a captured Israeli soldier, as well as ongoing US-centred discussions of terrorism around the world and ongoing military activities in Iraq and Afghanistan. Several stories discussed North Korean rocket tests. There was a minor event that attracted four postings: the election of Bishop Katharine Jefferts Schori to lead the US Episcopalians, which became connected in the news with a debate over the election of gay bishops in the US.

Category frequencies

Appendix 2, Table 3 gives a description of the topic categories and the topic frequencies are reported in Figure 1. Clearly there is a huge difference in the volume of stories related to the different topics. Note that all of the categories are essentially qualitative and so it would be impossible to claim that they all have the same natural 'size' in any meaningful sense. Nevertheless, the diagram is useful to give an overview of the common topics of front-page news stories.


Figure 1: Topic category story counts.

Appendix 2, Table 4 gives a description of the event categories and the results are reported in Figure 2. Perhaps the most interesting feature is the large amount of reporting of investigations.


Figure 2: Event category story counts.

Figure 3 gives the results of the geography category. Note that most stories in the international category involved only two countries: for example, the US president commenting on Iran. In fact this is a coding issue because coders within the US classifying a story of the US president commenting on, for example, Iran, might view it as an Iranian story, whereas an Iranian might view it as a US story and others might see it as bilateral. We chose to classify it as bilateral, as this is what we believe would be the majority view in such cases. A few stories were genuinely international, perhaps reporting on the activities of the United Nations. The dominance of the US and Canada (almost exclusively the US) is clear and, in fact, this is an underestimate of the importance of the US and Canada given that most of the international stories were bilateral events with US involvement.


Figure 3: Geography category story counts.

Estimated blog counts

Recall that the estimated blog counts are the total number of blogs apparently discussing the story, multiplied by the proportion of the up to twenty-five tested blogs that were genuine, producing an estimate of the total number of blogs discussing the topic. For this section and the remainder of the paper we excluded all of the thirty-one FIFA World Cup stories. This is because it was an extreme and unusual event that would distort the statistics. It is extreme and unusual in the sense that it is probably the biggest world sporting event and occurs every four years. It is also unusual in the sense that the host country (Germany in 2006) is chosen by committee and changes every four years. Hence the inclusion of the FIFA World Cup would make the categories Europe (other) in particular and Sport/Sporting Event seem abnormally popular.

Figure 4 shows the spread of estimated blog counts per story. The highly skewed distribution is expected, but the first bar of the histogram (covering 0-12.5 posts) includes a surprisingly large number (163 or 30%) of the news stories on the home pages of top news sites that did not receive a single blog mention (at least as retrieved by our methods). This probably underestimates the real extent of blogger interest in the news because of a combination of factors: our blog searches not matching all relevant postings; bloggers posting about the topic before or after the day checked; blogs being harvested after we checked (e.g., relatively inactive blogs might be checked monthly by Google); and many blogs not being indexed by Google.

Note also that there was a significant but low correlation between the raw blog counts and the estimated blog counts (taking into account the proportion of spam found), indicating that a spam identification procedure is necessary to get reasonably accurate results (Spearman's rho: 0.502, p<0.001).
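For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of Spearman's rho (the Pearson correlation of the ranks, with tied values given their average rank). The data are invented for illustration and the function names are hypothetical; the sketch does not compute a p-value and assumes neither list is constant.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented raw and spam-adjusted counts for five stories.
raw_counts = [10, 50, 200, 30, 400]
estimated_counts = [8, 2, 20, 25, 40]
print(round(spearman_rho(raw_counts, estimated_counts), 3))
```

A rank-based measure is a natural choice here because the blog counts are highly skewed, so a few heavily blogged stories would otherwise dominate the correlation.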


Figure 4: Estimated blog post count histogram (without the FIFA World Cup).

Figure 5 displays the average estimated blog counts for the type categories. Because the data are skewed, the median is a more reliable measure of central tendency than the mean. Any mean that is significantly higher than the median indicates one or more stories with a particularly high blog count. The categories are arranged from the largest number of stories to the smallest, so that the means and medians towards the left of the graphs in this section are the most reliable.
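The effect of skew on the two averages can be seen in a small sketch that groups stories by category, reports the mean and median for each, and orders the categories from largest to smallest as in the figures. The category labels and counts below are invented, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_category(stories):
    """stories: iterable of (category, estimated_blog_count) pairs.

    Returns rows of (category, number of stories, mean, median),
    ordered from the largest category to the smallest.
    """
    groups = defaultdict(list)
    for category, count in stories:
        groups[category].append(count)
    rows = [(c, len(v), mean(v), median(v)) for c, v in groups.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Invented data: one heavily blogged politics story inflates the mean.
stories = [
    ("politics", 0), ("politics", 1), ("politics", 2),
    ("politics", 3), ("politics", 150),
    ("sport", 4), ("sport", 6), ("sport", 8),
]
for category, n, avg, med in summarise_by_category(stories):
    print(category, n, avg, med)
```

In the invented politics category the single story with 150 postings lifts the mean to 31.2 while the median stays at 2, which is the pattern described above for categories containing one or more very popular stories.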

Although religion seems to be the most blogged topic, this statistic is unreliable because of the four related stories during this period (mentioned above) and because of the low total number of stories (six). Very surprisingly for us, given the apparent popularity of bizarre events, human interest stories were not blogged much at all, although sport, culture, the environment and disasters were all popular and a few politics stories were very popular (hence the high mean).


Figure 5: Topic category results (without the FIFA World Cup, in decreasing order of category size).

Event categories and results are reported in Figure 6. All the types of event have quite similar median blog post counts, although the peaks in the mean for diplomacy and sporting events reflect a few individual highly blogged stories in these categories. In addition, the investigation category, despite its very common usage, seems to be particularly uninteresting to bloggers.


Figure 6: Event category results (without the FIFA World Cup, in decreasing order of category size).

Geographic categories are reported in Figure 7. If the FIFA World Cup data had been included then the most blogged location would have been Europe (other). Again, however, the amount of blogging is remarkably even across the categories, with the exception that there are a few highly popular stories in the US and international categories.


Figure 7: Geography category results (without the FIFA World Cup, in decreasing order of category size).

News sources

The top half of Table 1 compares the four different news sources. Whilst the BBC covers the most stories, its average blogging interest is the lowest. This is perhaps because its extended coverage encompasses less interesting stories, because of its UK rather than US focus, or because of its public service remit, which means that its objective is partly to inform as well as to entertain. The Los Angeles Times, in contrast, has the lowest coverage but also attracts relatively little blogging. This and the high results for Fox News would be consistent with active political bloggers tending to be right wing (Adamic & Glance 2005).


Table 1: Estimated blog posting counts from news sources
Source | Mean | Median | Stories
BBC | 7.7 | 2.0 | 231
CNN | 12.1 | 3.0 | 169
Fox | 14.4 | 3.6 | 141
LA Times | 13.6 | 2.4 | 120
Any 1 | 6.0 | 2.0 | 433
Any 2 | 8.3 | 3.0 | 58
Any 3 | 24.4 | 8.6 | 24
All 4 | 54.8 | 27.6 | 10

The lower half of Table 1 shows the popularity of stories by the number of news sources that reported them. There is a surprisingly wide divergence in the news stories covered by each source on the same day. This is partly due to the time lag between the US and the UK in the case of the BBC. Some of the stories categorised as different reported overlapping or related events, which would affect the results. It is clear from the table, however, that a story being covered by multiple news sources gives a good indication that it is likely to be more heavily blogged.

Top stories

Table 2 lists the top stories and their blog counts. The table suggests very different reasons for story popularity. The North Korean missile tests were portrayed as a significant threat to the US but would probably not have been controversial in the sense of generating vigorous debate; similarly for the 'Chicago plot'. In contrast, the voting rights act delay was controversial as was the call for an early withdrawal from Iraq. The sporting events were probably primarily blogged because they were exciting and important for the sport concerned. The Bonnaroo music festival is an unexpected inclusion although 80,000 participated, the major band Radiohead played, two people died, and the performers crossed a broad spectrum of music styles. In fact most of the bloggers had been at the festival and described aspects of it first-hand, for example discussing the traffic or reviewing the bands. This is an interesting example of a crossover between news and personal lives: presumably most of the bloggers discussing Bonnaroo were personal journal bloggers who do not normally discuss the news (i.e., not personal journalists) but in this circumstance were part of an event deemed newsworthy, even if they only discussed their personal experience of Bonnaroo.


Table 2: Estimated blog posting counts from news sources
Headline | Story description | Blog count estimate
Nations press N. Korea on missile | Japan, Australia and US say that any testing of an intercontinental missile will result in serious and stern consequences (19 June) | 286
US warns N Korea off missile test | US tells North Korea that its long-range missile test is aggressive (20 June) | 121
U.S. Democrats want pullout to begin this year | US Democratic party politicians call for withdrawal from Iraq to begin this year. | 119
Brazil prevails | Brazil beat Australia in the soccer world cup | 115
Miami Heat go 3-2 up after thriller | Dramatic finish to basketball game, one of seven in the NBA (National Basketball Association) finals. | 105
Music festival reaches beyond 'sweaty hippies' | Bonnaroo, a camping and music festival on a Tennessee farm, grows beyond its original "neo-hippies and free spirits" | 99
Australia win through in thriller | Australia draw with Croatia to qualify for the next round of the soccer world cup | 95
GOP halts extension of voting rights act | A vote to renew a 1965 US law protecting minority voters from discrimination is delayed by objections over details that may need changing. | 93
Impressive Spain crush Ukrainians | Spain beat Ukraine in the soccer world cup | 92
Seven arrests over 'Chicago plot' | FBI arrests men allegedly plotting terrorist attacks on the Sears Tower and other high-profile targets | 90

Statistical analysis

Recall that our sample is not ideal because the inclusion of close and consecutive days means that the data for each day are not independent of the data from other days. In particular, the level of interest in individual stories with legs in small categories, such as the appointment of a female bishop (religion), can exert undue influence on the overall results. Hence no kind of probability-based statistical analysis will be valid. Nevertheless, we report Anova results here as indicative descriptive statistics.

Before processing the data, the expected blog count data were tested for normality and found to be highly skewed. A logarithmic transformation ln(1+blogcount) was used to reduce the deviation from normality, although the result was still significantly non-normal. We then conducted a bivariate analysis using Anova for type, event and geography. Type and Geography failed a Levene homogeneity of variance test and Event obtained a marginal pass. In all three cases a significant between-group variance was suggested (indicative p < 0.001) but Bonferroni tests were not powerful enough to identify particular pairs of categories that were significantly different. We also conducted two-way Anova tests to identify source factors (BBC, CNN, LA Times, Fox News) in story popularity differences but the results were not significant.
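As an illustration of the ln(1+blogcount) transformation only, the following sketch applies it to invented counts and compares a simple moment-based skewness before and after. The data and function names are hypothetical, and the skewness formula used here is one common definition rather than necessarily the normality test applied in the study.

```python
import math


def log_transform(counts):
    """Apply ln(1 + x) to each count; zero counts map to zero."""
    return [math.log1p(c) for c in counts]


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented blog counts with one extreme value.
counts = [0, 0, 1, 2, 2, 3, 4, 150]
transformed = log_transform(counts)
print(round(skewness(counts), 2), round(skewness(transformed), 2))
```

With these invented counts the transformation reduces the skewness without removing it, which mirrors the observation above that the transformed data remained non-normal.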

In summary, these weak and statistically unreliable tests suggest that differences exist between categories in each case but do not identify any specific differences between categories.

Discussion

Research methodology

The first important point to draw out of the results is methodological. Recall that we used a five-step procedure to gather data and estimate blog counts:

  1. identify and list all news stories in the four news Websites, matching similar stories from different sites;
  2. for each story, through experimentation construct a generous Google blogsearch for blog posts mentioning the story on the day it appeared (but searching at least two days afterwards);
  3. for each news story, check a sample of twenty-five blog posts (or fewer if there were fewer matches) identified in step 2 for being genuine human-created blogs discussing the news story;
  4. for each story, multiply the total matching blogs from step 2 by the proportion correct in step 3 to get the estimated total blog posting count; and
  5. classify each story using the three facets. This was performed independently by two classifiers with arbitration from a third.

In step 3 it was common to find that few or none of the blog search matches were valid. This was a result of the online availability of news feeds and the ease with which spam, automatically filtered or artificial blogs could be created and updated by software. This problem meant that the Google blogsearch results were normally misleading and sometimes extremely misleading, so our extra step of checking individual postings was necessary, although it may be impractical to implement this extra step on a larger scale. Hence our first finding is the unreliability of the results of Google blogsearch, at least for news-related searches. Together with the problems of identifying accurately the number of blog postings related to any given news story, a corollary of this is that quantitative news blogging research is difficult to operationalise effectively.

Story popularity

The primary aim of the research was to discover how frequently different common types of news story are blogged. The results showed a surprisingly even amount of blogging for the different topics, with the possible exception of religion. Nevertheless, in terms of common topics there was some unevenness: sport, culture, the environment and disasters attracted about double the interest of the economy, law and (surprisingly) health (Figure 5). There was very little difference between types of event (except for the rare festival category, Figure 6). Similarly, geography did not exert a significant average (median) influence on stories. Overall, given the different sizes of coverage of the different categories of story (Figures 1 to 3), the results are broadly consistent with the four news sources tending to select stories for public interest and hence selecting more stories in popular categories, which would bring down the average blogging for popular categories through the inclusion of less important stories. This would also explain some of the particularly high means (e.g., for the US and international categories of Figure 7). The results are also broadly consistent with news sources including some lower interest stories that are part of serious news (e.g., economy, law), either to maintain recognition as a news outlet or to attract advertising through appealing to richer readers (Curran & Seaton 2003). It also follows that the popularity of news topics in blogspace should not be judged by the average amount of blogging per news story but by the number of news stories blogged.

Unsurprisingly, stories covered by more news Websites attracted more blogging. The results include some differences between news sources, however. The difference in median between the Los Angeles Times and Fox News suggests that politics may be a factor in story popularity amongst bloggers. In particular, it seems that a right wing slant in story selection or framing attracts more bloggers' comments. This suggests that blogspace, if it has an influence, may pull the news to the right. This is not a surprising finding, given the known right-wing bias in blogspace. Nevertheless, because of the adversarial nature of many bloggers, stories may sometimes be blogged so that the blogger can disagree with them. In this context our finding is that more stories seem to be blogged to agree with them or to comment or use them in other ways.

It was remarkable that so many news stories apparently attracted so little interest, although further research is needed to confirm this. It may be that people tend to blog about only the major issues of the day and the minor issues are rarely bloggable. The top stories did not follow a single recipe, however. They came from different geographic, topic and event categories. It seems clear that major US or international sporting events attract a lot of blogger comments, however. Interestingly, an event that was presumably considered barely newsworthy, the Bonnaroo festival, attracted a large amount of blogging from people who were there and wrote first-hand accounts of some aspect. Probably this is a type of event that would be more high profile in blogging than in other news interest indices, such as article readership counts. If blogs do commonly influence the news, however, then this suggests that the news may give more importance to events with a high public participation and blogging. Ironically, perhaps, if this is true then the biggest political winner might be the mass political protest march; presumably most blogging attendees would mention their participation in their blog.

In the wider context of the potential for blogs and blog-like media to influence the news and hence politics, the low estimated blogging figures suggest that public discussions of the news are not as widespread as may have been thought (e.g., by those discussing potential blogger influence), with many news stories apparently not rating a mention in any blog. We do not have enough evidence to make this a definite conclusion, however, as our data are restricted to the blogs found and indexed regularly by Google blogsearch and we suspect that our method misses a high proportion of blogs. Nevertheless, it seems likely that blogger influence on the news, if indeed it exists, must be restricted to a small number of stories that attract significant blogger comment. These will not necessarily be just the major stories of the day, but may also include stories with greater public participation, for example music festivals. Hence the results suggest that blogs may not be influential for routine news gathering and reporting, since few bloggers express a wide interest in the news. This is an argument against the existence of an active blogspace public sphere and against the possibility that blogs may routinely challenge elite dominance of the news: even though clearly there are many bloggers who discuss politics and the news, this does not seem to take the form of a detailed and wide ranging news discussion. What we have not addressed in this paper, however, is whether blogs promote an alternative news agenda to the mainstream media, or cannot be influential in individual cases (clearly they already have been); we have only found some evidence that they do not follow closely a wide traditional media agenda.

The discussion has focussed on western-style media and four high profile media sources but blogs may match news stories in different ways on regional and international scales. For example, it seems likely that the regional stories in a local online newspaper would attract less attention than its national and international stories because the latter would presumably be common across many news outlets. Similarly, media with a non-US national focus may also find that their US and international stories are more popular with bloggers (at least in English) than their other stories because they would be more likely to be picked up by US bloggers.

Benchmarking news story popularity in blogspace

A secondary aim of this research was to develop benchmarks for identifying stories that have generated above average interest in blogspace. We present here an example of a method that is reasonable for the data here. Of course, the method is applicable only to the Google blogsearch of mid-2006, but the purpose is to suggest how such a method may work. A reasonable rule of thumb for identifying news stories with an above average interest would be to use three as the benchmark. A news story reported in one of the four news sources and attracting more than three genuine human-created blog postings indexed by Google blogsearch for the day of the story is probably a more than normally interesting story. To refine this rule, if the story concerns sport, culture, the environment or disasters then the threshold should be increased by two and if it concerns religion then the increase should perhaps be as much as twenty (although recall that the religion figures are unreliable). A threshold increase of one would be appropriate for news stories reported by CNN or Fox News, but no modification is needed for event type or geographic location.
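The rule of thumb above could be sketched as follows. This is only an illustration: the function and constant names are hypothetical, the topic labels are assumed to follow Table 3 (with 'disasters' mapped to Natural Disasters), and the topic and source increases are assumed to be additive when both apply, which the rule itself does not specify.

```python
BASE_THRESHOLD = 3

# Threshold increases by topic (labels assumed to follow Table 3).
TOPIC_INCREASE = {
    "Sport": 2,
    "Culture": 2,
    "Environment": 2,
    "Natural Disasters": 2,
    "Religion": 20,  # the religion figures are flagged as unreliable
}

# Sources for which the threshold is raised by one.
HIGHER_INTEREST_SOURCES = {"CNN", "Fox News"}


def benchmark_threshold(topic, sources):
    """Return the benchmark for a story with the given topic and sources."""
    threshold = BASE_THRESHOLD + TOPIC_INCREASE.get(topic, 0)
    # Assumption: the source increase is added on top of any topic increase.
    if HIGHER_INTEREST_SOURCES & set(sources):
        threshold += 1
    return threshold


def is_above_average(estimated_posts, topic, sources):
    """True if the estimated genuine posting count exceeds the benchmark."""
    return estimated_posts > benchmark_threshold(topic, sources)


print(benchmark_threshold("Economy", ["BBC"]))        # 3
print(benchmark_threshold("Sport", ["CNN"]))          # 6
print(is_above_average(5, "Sport", ["Fox News"]))     # False
```

Keeping the increases in a lookup table means the same function could be reused with thresholds derived from a different blog search engine or date range.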

The small differences found show that blogs could be used as a simple barometer of public interest in news stories, although there are already better ones, albeit not normally fully available, such as Website story view statistics. Those wishing to identify stories with above average blogger interest could design a benchmark procedure like the one above, using a classification exercise to generate blog count statistics for any particular blog search engine and any given date range. Nevertheless, given the relatively low range of differences identified between categories, an alternative and much simpler rule of thumb to identify an interesting story would be to see whether its blog posting count was above average for other stories on the same day from a single news source, perhaps adjusting for topic type as described above.

Conclusion

Our main finding is that blog counts match the news stories of the home pages of CNN, BBC, LA Times and Fox News surprisingly well. There were no types of story that were clearly blogged significantly more or less than average (median). A few stories attracted tens or hundreds of postings, but there was no simple pattern to their topic, event type or location. On this basis we have developed benchmarks for identifying individual news stories that attracted more attention than average. Although simple, the benchmarks are sufficient to cast doubt upon a claim by one of the authors in an earlier paper, that an average of six blog posts per day for the Danish cartoons affair before the end of January 2006 implied that it had been virtually ignored (Thelwall & Stuart 2007). In addition, we have speculated that news-related blogging may encourage changes in a minority of news stories, perhaps supporting a right-wing bias and encouraging more coverage of participatory events such as festivals and mass demonstrations. Although the logic of our method means that the findings primarily apply to major US and international media sources, it is likely that they would apply to a lesser extent to all news media because of the agenda-setting role of high profile news organizations.

Finally, this paper, as befitting exploratory research into a new and complex area, has raised more questions than it has answered. The following are important avenues for future research.

  1. Is there a better or quicker method of identifying the popularity of news stories in blogspace, for example through the use of another blog search engine or multiple blog search engines?
  2. Is the low volume of blogger interest in most reported news genuine?
  3. Are there other ways of categorising news stories that would be helpful in identifying popular story types?
  4. Would a similar study of non-US or non-US national or regional news sources show larger differences in the popularity of story types because there was a lower match between the number of stories posted of each type and the interests of the predominantly US English-language bloggers?
  5. Are there potentially newsworthy stories in blogspace that are never covered in the mass media? If so, what types of story are these?
  6. What are the factors that produce individual highly blogged stories and how do these differ from the reasons for the general popularity of major news stories?
  7. Does the way in which the news is blogged vary over time?

In order to move closer to the ultimate goal of understanding the influence of blogs on the media, it will also be necessary to identify or theorise about the specific mechanisms by which blogs produce changes in the news and to theorise about the place that blogs may play in the complex social, economic and political environment that produces modern news.

Acknowledgements

The work was supported by a European Union grant for activity code NEST-2003-Path-1. It is part of the CREEN project (Critical Events in Evolving Networks, contract 012684). Thank you to a referee for very useful comments.



References

Appendix 1: Classification instructions

News Story classification instructions

For each of the main international news Websites identified in the news stories worksheet, visit the page during the morning, save the page to your computer and add each news story on the home page to the news stories spreadsheet. Each news story must be classified by Type, Geography and Event. Also record the URL of the main story page for each Website that contains the story on its home page. Two similar stories in different Websites (or the same Website) should be classified as the same.

News Story checking instructions

For each news story identified on Day n you need to count the number of blog posts from day n that mentioned the story. This must be done on day n+1 to allow time for all blog postings to be made. Use the Google blogsearch http://blogsearch.google.com/ advanced option to make a date specific search to try to capture all relevant blog posts from the previous day and no irrelevant posts. Record the query and the number in the post count checking worksheet. Classify (a) all postings if less than 25; or (b) a systematic sample of 25 if more than 25. If there are more than 25 then let n be the total number: sample every n/25th. For each post, make a decision about whether the story is relevant or not and record it in the post count checking column.
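The sampling instruction above (check all postings if there are 25 or fewer, otherwise take every n/25th) could be implemented along the following lines. The function name is hypothetical, and starting at the first post and rounding fractional positions down are assumptions, since the instructions do not state either.

```python
def systematic_sample(posts, sample_size=25):
    """Return all posts if there are at most sample_size of them,
    otherwise a systematic sample taking every (n / sample_size)th post."""
    n = len(posts)
    if n <= sample_size:
        return list(posts)
    step = n / sample_size
    # Assumption: start at the first post and round each position down.
    return [posts[int(i * step)] for i in range(sample_size)]


# Hypothetical result list of 100 posts, identified here by position.
results = list(range(100))
sample = systematic_sample(results)
print(len(sample), sample[:3])  # 25 [0, 4, 8]
```

Because the step is always greater than one when sampling is needed, the selected positions are distinct and spread across the whole result list.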




Appendix 2: Classification categories


Table 3: Type categories
Category | Description
Agriculture | farming, fishing, forestry, animal husbandry, hunting for food
Crime/Victims | disorder, riots, criminal events, terrorist events, sport-related violence
Culture | music, film, art, architecture, drama, literature, television, radio, media, Internet content, festivals and carnivals
Economy | economics, business, trade, finance
Education | About education, training or schools
Environment | pollution, global warming, environmental protection, weather (non-disastrous)
Health/Medicine | drugs, treatment, medical research, exercise, nutrition
Human Interest | celebrities, sob stories, anecdotes, scandals, world records (not sport)
Human Rights | legal, sexual, political, racial, physical, other infringements of rights
International Aid | development and disaster relief abroad
Law | legal proceedings, legislative change
Military/Defence | wars, civil wars, border patrols, military operations
Natural Disasters | earthquakes, hurricanes, eruptions, fires, tsunamis, typhoons, storms, floods, avalanches/landslides, other damaging natural events
Politics | political scandal, treaties and negotiations between states, parties and international bodies, elections, political systems, international bodies
Religion | organized religion, new age religion, spirituality, religious disagreements, sectarian conflict/discrimination
Science/Technology | scientific research, development, discoveries, exploration, technological innovation
Social Services | medical insurance, prisons, local government services, child protection, state benefits, health services (organization/hospitals)
Sport | sporting events, analysis, gambling, organization, development, governing bodies, spectating, records
Tourism | travel for leisure, places of interest, events



Table 4: Event categories
Category | Description
Accident | injuries (environmental, human, animal) by negligence
Conflict | wars, rebellions, disagreements, organized violence
Curiosity | anecdotes, weirdness, unusual stories, celebrities
Decision/Agreement | business decision, court judgements, political decisions
Diplomacy | relations between countries and international bodies
Discovery | inventions, scientific discoveries, exploration
Festival/Celebration | awards ceremonies, anniversaries, parades, national holidays
Investigation | police enquiries, trials, research (scientific and other)
Movement | immigration/emigration, tourism, NOT pressure groups, political parties, trade unions
Negotiation | political and business (i.e. mergers), hostage
Sport event | all sporting events
Undefinable | Anything not listed or covering several categories.
Violation/Violence | Unorganized violence - riots, breaches of physical and human rights, criminal violence, human rights violence, terrorist violence



How to cite this paper

Thelwall, M., Byrne, A. & Goody, M. (2007). "Which types of news story attract bloggers?" Information Research, 12(4), paper 327. [Available at http://InformationR.net/ir/12-4/paper327.html]
© the authors 2007.
Last updated: 28 August 2007