
Before and After Snowden: Detecting the Happenings of Privacy with Twitter

Team

Tommaso Venturini (Sciences Po), Hjalmar Carlsen (Goldsmiths), Helen Kennedy (University of Leeds), Sam Martin (Warwick University), Noortje Marres (Goldsmiths), Niranjan Sivakumar (LSE), Oscar Coromina (Universitat Autònoma de Barcelona), Elinor Carmi (Goldsmiths), Nick Anstead (LSE).

Introduction

The privacy group worked with a Twitter data set to test methods for detecting, interpreting and measuring the ‘happening’ of privacy issues.

We ask: did the leak of NSA documents by Edward Snowden in early June 2013 significantly affect the composition of privacy as an issue on Twitter? How could we observe, measure and describe this with quantitative and qualitative methods of data analysis?

To address this broad question, we worked with a specific digital method in development, ‘associational profiling’, which uses co-occurrence analysis to detect relations between words or related entities, and how these change over time.

Following actor-network theory, this method allows us to trace the associations that 'compose' an issue, by mapping the words, hashtags, URLs, users and so on that the issue term in question is connected to in a given data set. We hypothesise that by tracing changes in such associational profiles over time, we may be able to detect shifts in the composition of the issues at hand, or their ‘happening.’ Importantly for this workshop, this involves a combination of quantitative and qualitative approaches: while the overarching aim is to qualify associations and their changes over time (do these indicate a happening affair?), both proportional and relational analysis of these associations is critical to accomplishing this.
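As a minimal sketch of what tracing such associations can look like, the following Python snippet counts the hashtags that appear in the same tweet as an issue term. It is not the Associational Profiler itself; the function name `associational_profile` and the sample tweets are hypothetical and only serve as illustration.

```python
from collections import Counter
import re

def associational_profile(tweets, term):
    """Count the hashtags that co-occur with `term` in the same tweet."""
    profile = Counter()
    for text in tweets:
        # Distinct hashtags of the tweet, lower-cased and without the '#'.
        tags = {t.lower() for t in re.findall(r"#(\w+)", text)}
        if term in tags:
            profile.update(tags - {term})
    return profile

# Hypothetical tweets, invented for illustration only.
sample = [
    "New rules on #privacy and #data #nsa",
    "#Privacy concerns over #googleglass",
    "#nsa and #privacy again",
    "Nothing to see here #cats",
]
profile = associational_profile(sample, "privacy")
print(profile.most_common())
```

Running the same function on the tweets of successive intervals gives one profile per interval, which can then be compared.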

(For more background on 'associational profiling,' see a previous study on internet governance with Twitter, Mapping WCIT.)

Applying this methodological approach to privacy issues on Twitter, we are especially interested to explore whether a public event - the leak of NSA documents in June 2013 - produced significant shifts in the composition of privacy on Twitter. How to measure, analyse and interpret this effect using methods of associational profiling?

We will then use methods of co-occurrence analysis to trace how the associations between words in our ‘privacy’ data set vary over time, to test whether this provides a useful way to detect an issue’s ‘activity’, that is, the composition and re-composition of issue profiles over time.

Project Design

Data: All tweets containing Privacy

Period: Before, during and after the NSA leak (23 May to 20 June 2013).

Methods: Co-hashtag and co-word analysis.

Tools: Twitter Capture and Analysis Tool (DMI-TCAT) and the Associational Profiler (CSISP-DMI)

Figure 1: tweets containing privacy before and after Snowden. (6 June: tool malfunction)

Research Question:

We ask: did the leak of NSA documents by Edward Snowden in early June 2013 significantly affect the composition of privacy as an issue on Twitter?

Here you can find a Timeline of Snowden revelations

Intervals for analysis

Week 1: 23/05-30/05: before Snowden

Week 2: 31/05-05/06: before Snowden

Week 3: 06/06-12/06: during Snowden

Week 4: 13/06-20/06: after Snowden
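The intervals above can be expressed as a small lookup, sketched here in Python. The dates are those listed above for 2013, treated as inclusive on both ends; the names `INTERVALS` and `interval_for` are hypothetical.

```python
from datetime import date

# Interval boundaries as listed above (year 2013, both ends inclusive).
INTERVALS = [
    ("week 1: before Snowden", date(2013, 5, 23), date(2013, 5, 30)),
    ("week 2: before Snowden", date(2013, 5, 31), date(2013, 6, 5)),
    ("week 3: during Snowden", date(2013, 6, 6), date(2013, 6, 12)),
    ("week 4: after Snowden", date(2013, 6, 13), date(2013, 6, 20)),
]

def interval_for(day):
    """Return the label of the interval containing `day`, or None if outside the period."""
    for label, start, end in INTERVALS:
        if start <= day <= end:
            return label
    return None

print(interval_for(date(2013, 6, 6)))
```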

Three Groups

We divided our work among three groups:

1. INTERPRETATION (Helen, Nick, Elinor)

Interpreting co-hashtag profiles produced for key terms in the Privacy Twitter data set in a previous project at the DMI summer school, [[https://wiki.digitalmethods.net/Dmi/DetectingTheSocials][DetectingTheSocials]]

2. DETECTION (Noortje, Oscar)

Using the co-word function of the Associational Profiler to produce new issue profiles for selected terms in the privacy data set.

3. METRICS (Tommaso, Hjalmar, Niranjan)

Operationalisation of issuefication indicators with Gephi

Observations and Findings

1. INTERPRETATION

privacy_hashtag_profile_2.png

Figure 2: privacy co-hashtag profile before, during and after Snowden.

We asked: how to get to the issues by profiling hashtag associations?

1. We look for interesting things in the profiles of ‘happening’ hashtags;

(happening hashtags being those that are well-connected to other hashtags at a given moment in time): privacy, google, googleglass, facebook, tcot and p2, data, nsa, big data

2. We then consider the Excel sheet with raw tweets per interval (all tweets containing the word privacy), filtering for the above ‘happening’ terms, to consider manually how they are happening.

This allowed us to correct misinterpretations:

e.g. tlot was first read as referring to liberals, but it refers to libertarians;

tcot (topconservatives) and tlot are often used together.

3. We read links embedded in filtered tweets, which brought up:

a debate among libertarian conservatives versus security conversations

4. Finally, we look at small sets of about a dozen tweets, asking: would zooming out allow us to detect these partisan debates that were already happening, and how they are inflected by the PRISM event?
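Steps 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tool used in the workshop: ‘well-connected’ is read here as the number of distinct hashtags a hashtag co-occurs with in one interval, and the function names and sample tweets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import re

def hashtag_degrees(tweets):
    """For each hashtag, count the distinct hashtags it co-occurs with."""
    neighbours = defaultdict(set)
    for text in tweets:
        tags = sorted({t.lower() for t in re.findall(r"#(\w+)", text)})
        for a, b in combinations(tags, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {tag: len(others) for tag, others in neighbours.items()}

def filter_tweets(tweets, term):
    """Keep the tweets that mention `term` (case-insensitive), for manual reading."""
    return [t for t in tweets if term.lower() in t.lower()]

# Hypothetical tweets for a single interval.
interval_tweets = [
    "#privacy #nsa #prism",
    "#privacy #tcot #tlot",
    "#tcot #tlot on #nsa",
    "#privacy #google",
]
degrees = hashtag_degrees(interval_tweets)
# Most connected hashtags first; ties are broken alphabetically.
ranked = sorted(degrees, key=lambda tag: (-degrees[tag], tag))
to_read = filter_tweets(interval_tweets, "tlot")
```

The ranked list points to candidate ‘happening’ hashtags, and the filtered tweets are the ones to read by hand.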

Hypotheses (suggested findings):

- nsa and prism as buzzwords introduced into existing conversations on privacy on Twitter

- privacy on twitter seems to be driven by the blogosphere

- very stable hashtag formations, nothing much surprising happens by adding ‘snowden’

- sophisticated hashtag users, not the everyday privacy talk

(hashtags as algorithmically generated)

- this analysis brings existing issues into focus (perhaps less issuefication)

- focus on co-occurrence of hashtags produces a focus on professional communities,

can a focus on co-word address this limitation?

2. DETECTION

Our objective is to move from co-hashtag to co-word analysis in order to detect the happening of privacy issues with Twitter.

To this end we use the Associational Profiler tool, which is similar to the hashtag profiler tool discussed above. However, to produce the co-word profiles, we focus on a smaller data set, to ensure the computational load remains within bounds (co-word analysis is a heavy method).

Within the data set privacy, we query “privacy AND Google” and exclude the retweets (include screenshot of the overview of the data set). We base our selection of this term on the existing hashtag profiles (see above) which suggested Google offers an especially happening issue term.

Figure 3: Tweets containing “Google AND privacy” before and after Snowden.

To produce our co-word profiles, we select terms from the top 10 words in this data set, relying on the measure of co-occurrence (i.e. which words are most connected in our data set?).
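The selection described above can be approximated in a few lines of Python. This is a simplified sketch: it assumes that retweets start with "RT @", uses a tiny hypothetical stopword list, and ranks words by the number of selected tweets they appear in, which is only a rough stand-in for the connectivity measure of the profiler.

```python
from collections import Counter
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "to", "with"}

def tokens(text):
    """Lower-case a tweet and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_tweets(tweets, required=("privacy", "google")):
    """Keep tweets containing every required word, skipping retweets."""
    kept = []
    for text in tweets:
        # Assumption: a retweet is marked by a leading "RT @".
        if text.lower().startswith("rt @"):
            continue
        if set(required) <= tokens(text):
            kept.append(text)
    return kept

def top_words(tweets, n=10, exclude=("privacy", "google")):
    """Rank words by the number of tweets in which they appear."""
    counts = Counter()
    for text in tweets:
        counts.update(tokens(text) - STOPWORDS - set(exclude))
    return counts.most_common(n)

# Hypothetical tweets, invented for illustration only.
sample = [
    "Google privacy policy under NSA scrutiny",
    "RT @someone: Google privacy policy under NSA scrutiny",
    "Privacy fears as Google shares data with NSA",
    "Facebook privacy settings",
]
selected = select_tweets(sample)
top = top_words(selected)
```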

We ask: How do we detect ‘Snowden’ effects using co-word analysis?

(how do we evaluate the re-composition of word profiles in the privacy AND google set in the wake of the snowden event?)

nsa_coword_profile.png

Figure 4: co-word profile of ‘NSA’ in data set “privacy AND Google” before, during and after Snowden

The NSA profile demonstrates ‘issue discovery’ (from nothing to a word burst). Until the Snowden case, it seems that nobody related the NSA to privacy issues in this data set. In the week when the case broke, words that describe the case become prominent: data, prism, facebook, apple. The profile also shows a lot of granularity. In the last week of the period analysed, it is hard to discern anything.

data_coword_profile.png

Figure 5: co-word profile of ‘data’ in data set “privacy AND Google” before, during and after Snowden

The first thing we can see in the data profile is an increase in volume, which is not necessarily related to the Snowden event. This may be because multiple issue dynamics are at work simultaneously, with different types of issue events in different intervals. This raises questions about the limits of “the Snowden event” as the evaluative frame for detecting issue dynamics. Still, the co-occurrences of the last week may reflect an evolution of the issue, in this case towards a stronger focus on the debate.

security_coword_profile.png

Figure 6: co-word profile of ‘security’ in data set “privacy AND Google” before, during and after Snowden

The security profile is the case where we can most clearly relate the profile to the timeline of the Snowden case, which explains the increase in volume in the last two weeks. We can also clearly see a shift in the type of words composing the profile, from topics focused on the debate around security and privacy (kaspersky, homeland security, national security, facebook, apple) to words strongly linked to the debate raised by the Snowden case (abuse, feudal, stop, supporting: after Snowden).

After obtaining the profiles we took a closer look at the file containing the tweets to bring these other events into focus (zoom in and out). What we found:

- Most tweets come from newsy sources.

- There is a lot of replication of content (even after retweets are excluded, repetition persists).

- Eventfulness of co-word connections and eventfulness of content: different levels? Different types?

For example: Sweden banning Google from locating its data centre, in the last interval.

3. METRICS

Or: formulating indicators for profiling issues through time

In the last few years, network analysis has been extensively used as a technique to study issues and controversies. Networks of citations, hyperlinks, words, hashtags and many others have been mapped and analyzed by scholars to gain insights into the structuring of public debate.

In our approach, we aim to develop a series of metrics that can be used to detect the emergence of issues in the public sphere. The originality of this proposal comes from the fact that all the proposed metrics combine two aspects of networks that are rarely considered together in classic graph analysis:
  1. All metrics refer to ego-centered networks – i.e. they consider the network formed by the neighbors of a given node (‘ego’ from now on);

  2. All metrics refer to changes in dynamic networks – i.e. they compare ego-centered networks’ properties through different time intervals.

These dynamic ego-networks can be called ‘associational profiles’, and the analysis of the variation of their structures can be used to investigate the identity of ego and its transformation.

Though valid for different types of networks, these metrics have been devised with specific reference to the co-occurrence of words and hashtags in tweets (as extracted by the tool TCAT). In this specific case, the associational profile of a word or hashtag can be considered a proxy for the collective definition of that word or hashtag on the Twitter platform.
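A dynamic ego-network of this kind can be sketched in Python as follows. The snippet is illustrative only and does not reproduce the TCAT export: it assumes tweets are plain strings split on whitespace, and the names `coword_edges` and `ego_network`, like the sample data, are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coword_edges(tweets):
    """Weighted co-word network: {(word_a, word_b): number of tweets containing both}."""
    edges = Counter()
    for text in tweets:
        words = sorted(set(text.lower().split()))
        edges.update(combinations(words, 2))
    return edges

def ego_network(edges, ego):
    """Keep ego's neighbours plus every edge among ego and those neighbours."""
    neighbours = {a if b == ego else b for (a, b) in edges if ego in (a, b)}
    nodes = neighbours | {ego}
    return {pair: w for pair, w in edges.items()
            if pair[0] in nodes and pair[1] in nodes}

# Hypothetical tweets per interval, invented for illustration only.
tweets_by_interval = {
    "week 2": ["google privacy policy"],
    "week 3": ["nsa prism data", "nsa prism google", "privacy policy update"],
}
# One ego-network per interval: the associational profile of 'nsa' over time.
profiles = {
    interval: ego_network(coword_edges(tweets), "nsa")
    for interval, tweets in tweets_by_interval.items()
}
```

In this toy data the ego-network of 'nsa' is empty in the first interval and populated in the second, which is the kind of change the metrics are meant to capture.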

| Metric [possible indicator] | Operationalization [how it can be computed] | What a variation in the metric can suggest in the case of Twitter co-occurrence networks |
| Connectivity of ego | Size of the ego-network | An increase in connectivity may suggest that ego becomes more relevant in the Twitter debate. A decrease may suggest a specialization of the debate about ego. |
| Preferential connectivity | Number of edges with a weight above a given threshold / degree of ego | An increase in the preferential connectivity may suggest a specialization of the debate about ego. |
| Skewness of the connectivity force | Skewness of the weights of the edges of ego | An increase in the preferential connectivity may suggest a specialization of the debate about ego. |
| Skewness of the connectivity of neighbors | Skewness of the degree of neighbors | |
| Clustering of neighbors | Average clustering degree of the ego-network (ego removed) | Following (Shwed & Bearman 2010) we propose the modularity measure over time as an indication of plurality/multiplicity |
| Homogeneity of neighbors | Distribution of neighbors on different categories (manually or automatically attributed) | |
| Total/average value of neighbors | Sum or average of any numerical value attributed (manually or automatically) to the neighbors of ego | |
| Stability of neighbors | Sum of the number of neighbors that remain in two subsequent intervals / total number of neighbors across all intervals | |
| Variability of connectivity strength | Sum of the changes in the weight of edges between ego and each of its neighbors for each couple of subsequent intervals / sum of the total weight of edges between ego and each of its neighbors across all intervals | |
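Several of the metrics listed above can be sketched in Python. This is one possible reading of the operationalizations, not a reference implementation: each interval is represented as a hypothetical dictionary mapping a neighbor of ego to the weight of its edge with ego, the degree of ego is taken to be its number of neighbors, the ‘total number of neighbors across all intervals’ is read as the number of distinct neighbors, and skewness is the simple moment-based coefficient.

```python
def connectivity(profile):
    """Connectivity of ego: number of neighbors of ego in one interval."""
    return len(profile)

def preferential_connectivity(profile, threshold):
    """Share of ego's edges whose weight is above `threshold`."""
    if not profile:
        return 0.0
    strong = sum(1 for weight in profile.values() if weight > threshold)
    return strong / len(profile)

def skewness(values):
    """Moment-based skewness (third central moment / variance ** 1.5)."""
    values = list(values)
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def stability(profiles):
    """Neighbors kept between subsequent intervals / distinct neighbors overall."""
    kept = sum(len(set(a) & set(b)) for a, b in zip(profiles, profiles[1:]))
    everyone = set().union(*profiles)
    return kept / len(everyone) if everyone else 0.0

def variability(profiles):
    """Summed absolute weight changes between subsequent intervals / total weight."""
    changes = 0
    for a, b in zip(profiles, profiles[1:]):
        for neighbor in set(a) | set(b):
            changes += abs(b.get(neighbor, 0) - a.get(neighbor, 0))
    total = sum(sum(p.values()) for p in profiles)
    return changes / total if total else 0.0

# Hypothetical edge weights of ego in three subsequent intervals.
profiles = [
    {"google": 1, "data": 1},
    {"google": 3, "data": 1, "prism": 4},
    {"prism": 2, "security": 1},
]
sizes = [connectivity(p) for p in profiles]
```

A missing neighbor is treated as an edge of weight 0, so a word that appears or disappears between intervals contributes its full weight to the variability.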




Topic revision: r5 - 28 May 2014 - 09:26:59 - NoortjeMarres