
Bias Detection: Analyzing ACTA Twitter Data


Team Members

Vera Franz, Paul Girard, Olga Goriunova, Noortje Marres, Bernhard Rieder, Nate Tkacz

Introduction

This group worked on three different aspects of the analysis of the ACTA Twitter data.
  • Social Interactions
  • Exploring and comparing actor languages
  • Refining actor polarization

Social Interactions

While our Twitter dataset did not contain friend / follower relationships, we were able to map social interactions by parsing the @username references in mentions and retweets. This image maps 38K users and 31K interactions:

acta_galaxy.jpg
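
The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the workshop: the (author, text) record layout, the function name `interaction_edges`, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

# Twitter usernames: 1-15 letters, digits, or underscores. The lookbehind
# keeps e-mail addresses such as bob@example.org from being counted.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def interaction_edges(tweets):
    """Count directed (author -> referenced user) pairs.

    `tweets` is an iterable of (author, text) tuples; this layout is an
    assumption, not the format of the original dataset.
    """
    edges = Counter()
    for author, text in tweets:
        for target in MENTION_RE.findall(text):
            source, target = author.lower(), target.lower()
            if source != target:  # ignore self-mentions
                edges[(source, target)] += 1
    return edges


# Hypothetical sample data.
sample = [
    ("alice", "RT @bob: ACTA vote next week"),
    ("carol", "@bob @alice what does the EU say?"),
    ("bob", "mail me at bob@example.org"),
]
edges = interaction_edges(sample)
```

The resulting edge list can be exported to a graph tool for layout. Retweets and plain mentions are treated alike here, as both are @username references.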

We see that most of the conversation about ACTA is only very loosely organized. Only the heavily connected cluster in the center functions as a debate space. Also note that the network diagram makes it possible to "detect" the non-ACTA-related Spanish tweets quite easily. Selecting nodes visually may thus be a way to clean data or to analyze subsets of users.

This chart shows the familiar power-law structure of mentions:

indegree-distribution.png

Most users (70%) are never mentioned and only a very small number (2.2%) are mentioned or retweeted five or more times. While there is a core of active users that interact in a sustained fashion over the course of the two months, most of the tweets about ACTA are not part of an ongoing conversation.
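
Shares like these can be derived from an edge list by summing incoming interactions per user. The sketch below assumes edges stored as a mapping from (source, target) to a count; the function name `mention_shares` and the toy data are hypothetical, and the threshold of five mirrors the figure quoted above.

```python
from collections import Counter


def mention_shares(users, edges, threshold=5):
    """Return (share never mentioned, share mentioned >= threshold times).

    `users` is the set of all accounts in the dataset; `edges` maps
    (source, target) pairs to interaction counts.
    """
    indegree = Counter()
    for (_source, target), count in edges.items():
        indegree[target] += count
    total = len(users)
    never = sum(1 for user in users if indegree[user] == 0)
    frequent = sum(1 for user in users if indegree[user] >= threshold)
    return never / total, frequent / total


# Hypothetical toy data: four users, three weighted interactions.
users = {"a", "b", "c", "d"}
edges = {("a", "b"): 5, ("c", "b"): 1, ("d", "a"): 1}
never_share, frequent_share = mention_shares(users, edges)
```

In this toy example, `c` and `d` are never mentioned and only `b` reaches the threshold.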

When focussing on the strongly connected cluster in the center, we can see further structuring:

acta_galaxy_center.jpg

In this analysis, language communities emerge quite clearly. In terms of tweet production, the Japanese cluster was, in fact, the most active in the observed period. Node color (heat scale from blue to red) denotes betweenness centrality, which can roughly be interpreted as "bridging capital". Each subcommunity is organized around one account that is also well connected to the other two clusters.
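
For readers unfamiliar with the measure, betweenness centrality counts how often a node lies on shortest paths between other nodes. The following is a minimal sketch of Brandes' algorithm for an unweighted graph. Treating the interaction network as undirected is an assumption made for this example, and the toy graph is hypothetical; graph tools typically ship their own implementation.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps every node to the set of its neighbours; each
    neighbour must itself appear as a key.
    """
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        dist = {node: -1 for node in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical toy graph: two triangles joined by the edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
centrality = betweenness(graph)
```

In the toy graph, `c` and `d` carry every path between the two triangles and therefore score highest, which mirrors the role of the bridging accounts described above.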

Exploring and comparing actor languages

The second element of this group's work centered on comparing word frequencies for polarized actor languages. Data cleaning was a major challenge here, as character-set differences remain a serious problem when capturing data online.

One way to explore language and content beyond mere frequency is co-word analysis. If two words appear in the same tweet, they are considered to be connected. Here is a map of the co-word relations for the "against ACTA" camp (without Japanese):

acta_con_coword.jpg

On this level of analysis, the result is to be expected: languages cluster together, although Spanish and French sit close to each other simply because they share a significant number of words. From this view, we can see quite well that English is the dominant language here. Unfortunately, we did not have the time to clean the dataset of "stopwords" (articles, etc.).
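
A co-word count of the kind described above can be sketched in a few lines. The tokenizer, the tiny stopword list, and the sample tweets are assumptions for illustration only. The sketch also shows where the stopword filtering that was skipped for lack of time would plug in.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; a real analysis would need
# proper per-language lists.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "la", "le", "el", "de"}


def coword_counts(tweets, stopwords=STOPWORDS):
    """Count how often each unordered word pair shares a tweet."""
    pairs = Counter()
    for text in tweets:
        # Each word counts once per tweet, however often it is repeated.
        words = set(re.findall(r"\w+", text.lower())) - set(stopwords)
        pairs.update(combinations(sorted(words), 2))
    return pairs


# Hypothetical sample tweets.
tweets = ["Stop ACTA now", "ACTA vote: stop the treaty", "Non à ACTA"]
pairs = coword_counts(tweets)
```

Each pair is stored in alphabetical order, so `pairs[("acta", "stop")]` gives the number of tweets in which both words occur. The counts can serve as edge weights in a co-word map.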

We also attempted to compare actor languages between the three "camps" (con-ACTA, pro-ACTA, neutral). This proved difficult, partly because of problems with character sets, but mostly because the latter two camps are far less prominent in our dataset. One way to make comparisons by means of visualization would be to graph the words on a three-axis parallel coordinate chart:

acta_word_compare.jpg

Such an interactive graph (here visualized with Mondrian) produces a macro view of word frequencies for the three camps. We can instantly see that although certain words are shared (EU, vote, etc.), the "pro" camp uses quite different terms. This analysis is limited by the quality of the data, but the method may allow for rather interesting explorations.
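
The table behind such a chart has one row per word and one column per camp. Because the camps differ greatly in size, the sketch below uses relative rather than raw frequencies. That normalization is an assumption of this example, not necessarily what was done for the chart, and the camp labels and sample tweets are hypothetical.

```python
import re
from collections import Counter


def relative_frequencies(tweets_by_camp):
    """Map each word to its share of all word tokens, per camp.

    `tweets_by_camp` maps a camp label to a list of tweet texts.
    """
    counts = {
        camp: Counter(
            word
            for text in tweets
            for word in re.findall(r"\w+", text.lower())
        )
        for camp, tweets in tweets_by_camp.items()
    }
    # `or 1` avoids division by zero for a camp without any tokens.
    totals = {camp: sum(c.values()) or 1 for camp, c in counts.items()}
    vocabulary = set().union(*counts.values())
    return {
        word: {camp: counts[camp][word] / totals[camp] for camp in counts}
        for word in sorted(vocabulary)
    }


# Hypothetical sample data for the three camps.
camps = {
    "con": ["stop acta", "acta vote"],
    "pro": ["acta vote"],
    "neutral": ["eu vote"],
}
freq = relative_frequencies(camps)
```

Each entry of `freq` corresponds to one line across the three axes of a parallel coordinate chart.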

Another way to visualize such data (cleaned in a slightly different way here) is a Venn map:

bias-words-acta.jpg

Refining actor polarization

This group also worked on a new method for classifying actors, in an attempt to arrive at a more refined way of polarizing tweets according to political leaning. The technique starts out with a small set of URLs that are manually classified as "pro" or "con". It then uses Issuecrawler's snowball method to find a large number of new sites. The resulting network is imported into Gephi and a "community detection" algorithm (modularity) is applied. The resulting groups are analyzed by issue experts and classified along the pro/con spectrum. The result is a far larger set of sites that can be used to automatically classify tweets (and, in a second step, language) into the respective camps.

This is a map of the actor network:

crawl_result_coloured_modularity_v2.jpg
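
The final step, using the labelled sites to classify tweets, could look roughly like the sketch below. The host names, the `classify_tweet` function, and the tie-breaking rule ("mixed") are assumptions for illustration. Shortened links would also have to be resolved to their target sites first, which is not shown here.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")


def classify_tweet(text, site_labels):
    """Label a tweet by the camps of the sites it links to.

    `site_labels` maps a host name to "pro" or "con"; it stands in for
    the expert-labelled crawl results.
    """
    votes = Counter()
    for url in URL_RE.findall(text):
        # Drop punctuation that often trails a URL at the end of a sentence.
        host = urlparse(url.rstrip(".,;:!?)")).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host in site_labels:
            votes[site_labels[host]] += 1
    if not votes:
        return "unclassified"
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "mixed"  # equally many links to both camps
    return ranked[0][0]


# Hypothetical labelled hosts (reserved .example domain).
labels = {"stopacta.example": "con", "ipr-lobby.example": "pro"}
result_con = classify_tweet("Read http://www.stopacta.example/why now", labels)
result_none = classify_tweet("no links here", labels)
result_mixed = classify_tweet(
    "http://stopacta.example/a vs http://ipr-lobby.example/b", labels
)
```

Tweets that link to no labelled site stay unclassified, so coverage depends directly on how many sites the crawl and the expert classification produce.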
Topic revision: r5 - 04 Dec 2012 - 16:10:46 - BernhardRieder