Big Data Environment. Analysing Public Perceptions of South Africa’s Local Elections by using Geo-located Twitter Data.


1 Analysing Public Perceptions of South Africa’s Local Elections by using Geo-located Twitter Data.

2 Big Data Environment

3 Cloud Infrastructure
3 cloud instances: 2 data nodes and 1 master node (Name Node and Secondary Name Node).
Storage: 1.5 TB. CPUs: 12 cores. RAM: 65 GB.

4 Data Lifecycle
Over Million tweets
750k usable after geo-location, language filtering, etc. Live database connection to BigQuery. 5-10 minute delay from the live Twitter feed into analytics.

5 Language Filter
Removes non-English tweets, i.e. tweets that are less than 50% English. This is due to data dictionary complexities: classifying other languages would require building multiple data dictionaries.
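A minimal sketch of such a filter, assuming "50% English" means the share of a tweet's words found in an English word list. The slides do not say how the percentage was computed, so the tokenisation, the tiny `ENGLISH_WORDS` set and the function names below are all hypothetical.

```python
import re

# Hypothetical, tiny stand-in for a real English word list.
ENGLISH_WORDS = {"we", "are", "going", "to", "vote", "today", "the", "is", "for"}

def english_fraction(tweet, vocabulary=ENGLISH_WORDS):
    """Share of the tweet's words that appear in the English vocabulary."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)

def keep_tweet(tweet, threshold=0.5):
    """Keep tweets that are at least `threshold` English (assumed cut-off)."""
    return english_fraction(tweet) >= threshold

english_tweet = "We are going to vote today"
afrikaans_tweet = "Ek gaan vandag stem"
print(keep_tweet(english_tweet), keep_tweet(afrikaans_tweet))
```

With a real word list, the same function could be applied to each tweet before it reaches the sentiment step.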

6 Data Dictionary
The Data Dictionary used is a sentiment lexicon containing 6800 English words, each with a sentiment (very positive, positive, negative, very negative or neutral). A polarity was given to each word: Very Positive = >+3, Positive = +1 to +3, Negative = -1 to -3, Very Negative = <-3, Neutral = 0.
E.g. "The philosophy of the ANC is get rich or die trying. The ANC is inherently corrupt."
The(0) philosophy(0) of(0) the(0) ANC(0) is(0) get(0) rich(2) or(0) die(-3) trying(0). The(0) ANC(0) is(0) inherently(0) corrupt(-3).
Average of the three scored words: (2 - 3 - 3) / 3 = -1.33 = Negative Sentiment.
An average was used rather than a summation in order to remove outliers, smooth the cancelling effect of positive against negative words, and take tweet length into account.
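The worked example above can be sketched in a few lines. This is an illustration, not the authors' implementation: the three-word `LEXICON` holds only the scores shown on the slide, the average is taken over the non-zero words (which is what reproduces -1.33), and the handling of fractional averages between 0 and 1 is an assumption, since the slide defines the bands only for whole-number polarities.

```python
import re

# Only the scores given in the slide's example; the real lexicon has about 6800 words.
LEXICON = {"rich": 2, "die": -3, "corrupt": -3}

def tweet_score(text, lexicon=LEXICON):
    """Average polarity of the words that carry a non-zero score."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if lexicon.get(w, 0) != 0]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)

def sentiment_label(score):
    """Map an average score to a band (boundary handling is assumed)."""
    if score > 3:
        return "Very Positive"
    if score > 0:
        return "Positive"
    if score == 0:
        return "Neutral"
    if score >= -3:
        return "Negative"
    return "Very Negative"

example = ("The philosophy of the ANC is get rich or die trying. "
           "The ANC is inherently corrupt.")
score = tweet_score(example)
print(round(score, 2), sentiment_label(score))
```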

7 Geo-location Hierarchy
User-specified locations are used to create a geographic hierarchy. User profiles are mapped to major cities -> provinces -> country. Only the largest cities by population are used, because the hierarchy has to be created manually.
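A hand-built hierarchy of this kind could look roughly like the sketch below. The four cities, the substring matching and the `locate` function are assumptions made for illustration; the slides do not list which cities were included or how profile text was matched.

```python
# Hypothetical, manually maintained city -> province table.
CITY_TO_PROVINCE = {
    "johannesburg": "Gauteng",
    "pretoria": "Gauteng",
    "cape town": "Western Cape",
    "durban": "KwaZulu-Natal",
}

def locate(profile_location, country="South Africa"):
    """Return (city, province, country) for a free-text profile location, or None."""
    text = profile_location.lower()
    for city, province in CITY_TO_PROVINCE.items():
        if city in text:
            return (city.title(), province, country)
    return None  # location not in the manual hierarchy

print(locate("Cape Town, SA"))
print(locate("somewhere on earth"))
```

Tweets whose profile location returns `None` would be the ones dropped at the geo-location step.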

8 Geo-location Hierarchy
Shows tweet pings per day, by sentiment, from different cities. Big spikes occur in early August and towards voting time.

9 Sentiment Timeline
Massive volume spike in the final rally weekend and during the lead-up to voting day and the election results.
First time positive sentiment clearly overtakes negative sentiment.
Largest very positive sentiment spike: ElectionResults, ANC, ImVotingDA.
Large negative spike on the 6th: RememberKhwezi, Zuma, ElectionResults.
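A timeline like this boils down to counting tweets per day and per sentiment. The sketch below assumes each tweet has already been reduced to a (day, sentiment) pair; the sample records and the function names are invented for illustration and are not taken from the study's data.

```python
from collections import Counter

# Invented sample records: (day label, sentiment label).
sample = [
    ("day-1", "positive"),
    ("day-1", "negative"),
    ("day-2", "negative"),
    ("day-2", "negative"),
    ("day-2", "positive"),
]

def daily_counts(tweets):
    """Count tweets for every (day, sentiment) combination."""
    return Counter((day, sentiment) for day, sentiment in tweets)

def busiest_day(tweets):
    """Day with the highest total tweet volume."""
    per_day = Counter(day for day, _ in tweets)
    return per_day.most_common(1)[0][0]

print(daily_counts(sample)[("day-2", "negative")], busiest_day(sample))
```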

10 Trending Topic Identification

11 Geo-location Hierarchy
We can break this down by the public's sentiment on each party. The negative sentiment at the end is due to: RememberKhwezi, Election Results and Zuma.

12 Geo-location Hierarchy
Overall sentiment over the collection period for mentions of each political party: EFF, DA, ANC.

13 Flaming Index
This is a view of political parties' Twitter accounts mentioning any political party in their posts. Of the Twitter handles of political parties, only the 3 accounts mentioned below were significantly active and mentioned a political party in their tweets during the data collection time.
Both the EFF and the DA had a number of tweets mentioning other parties, while the ANC kept its tweets focused on itself.
The EFF focuses on uniting people with hope and the promise of freedom.
The DA's approach is to let people know they are ready and want to win, while remaining unshaken by criticism.
The ANC's tweets mention more often that they need to win, and try to appeal to the public's emotions by mentioning the communities and having a special place in the heart for them.
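One way to tabulate such a view is to count, for each party account, how often its posts name another party. The slides do not describe how the index was computed, so the sketch below is only an assumed approach; the posts are invented placeholder text, and accounts are keyed by party name rather than by real Twitter handles.

```python
import re
from collections import Counter

PARTY_PATTERN = re.compile(r"\b(ANC|DA|EFF)\b")  # case-sensitive, whole words only

def cross_mentions(posts):
    """Count (author party, mentioned party) pairs, ignoring self-mentions."""
    counts = Counter()
    for author, text in posts:
        for mentioned in PARTY_PATTERN.findall(text):
            if mentioned != author:
                counts[(author, mentioned)] += 1
    return counts

# Invented placeholder posts, not real tweets.
posts = [
    ("EFF", "The DA and the ANC had their chance"),
    ("DA", "We are ready, whatever the ANC says"),
    ("ANC", "Vote ANC today"),
]
print(cross_mentions(posts))
```

The whole-word pattern keeps "DA" from matching inside words such as "today".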

14 Network Analysis
To identify the most influential user we focused on: number of followers; number of distinct tweets made in a certain time period (frequency); number of friends; number of retweets of a tweet attributed to a particular user; number of mentions and hashtags referring to a user.
We used a ranking method called eigenvector centrality: not all connections are equal, and connections to other influential people give the connected person greater influence.
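To make the ranking idea concrete, here is a small power-iteration sketch of eigenvector centrality on a mention graph. It is an assumption-laden illustration: the slides do not say which tool or graph definition was used, the edge list is invented, and each node's previous score is carried over on every pass (a common trick to keep the iteration stable).

```python
from math import sqrt

def eigenvector_centrality(edges, max_iter=200, tol=1e-10):
    """Score nodes so that being mentioned by high-scoring nodes raises your score."""
    nodes = sorted({node for edge in edges for node in edge})
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        new = dict(score)                 # carry over the previous score
        for src, dst in edges:            # src mentions dst
            new[dst] += score[src]
        norm = sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if sum(abs(new[node] - score[node]) for node in nodes) < tol:
            return new
        score = new
    raise RuntimeError("power iteration did not converge")

# Invented mention edges: (user who mentions, user who is mentioned).
edges = [("a", "c"), ("c", "a"), ("b", "c"), ("c", "b"),
         ("d", "c"), ("c", "d"), ("a", "b"), ("b", "a")]
scores = eigenvector_centrality(edges)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph, user "d" is connected only to "c", so "d" ranks below "a" and "b", who are also connected to each other.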

15 Network Analysis
After removing some of the noise, we are left with the above diagram. Node size now reflects the number of incoming connections. A large node means people may be seeing that user's posts, tweeting to them directly, or mentioning them often in texts or hashtags. We still need to remove the purple nodes with a centrality of 0.

16 Network Analysis
Left: Identifies inward connections. The larger the node, the more the user is mentioned, hash-tagged and retweeted in the network, which suggests these key nodes or users have a high impact on their connections and on the Twittersphere of the South African local elections. Other users may find the tweets, facts or opinions of these users important or relevant, or they may agree or disagree with the conveyed message, and therefore pass on the message of the influential user.
Right: Identifies outward connections (the larger the node, the more this user interacted with other users). This alternate view shows influential users' engagement levels with the community and how interactive and hands-on they are. We are therefore able to identify conversation starters on Twitter and who they start the conversation about.
Both the influential users and the engaging users can be seen as users who control the flow of information within the Twittersphere on the South African local elections.
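The two views correspond to in-degree and out-degree on the interaction graph. A minimal sketch, assuming each interaction is stored as a (source user, target user) pair; the user names and edges below are invented.

```python
from collections import Counter

def degree_counts(edges):
    """Return (in_degree, out_degree) counters for directed interaction edges."""
    in_degree = Counter(target for _, target in edges)    # left view: inward connections
    out_degree = Counter(source for source, _ in edges)   # right view: outward connections
    return in_degree, out_degree

# Invented interactions: (user who mentions or retweets, user who is referred to).
interactions = [("u1", "u3"), ("u2", "u3"), ("u1", "u2")]
in_degree, out_degree = degree_counts(interactions)
print(in_degree.most_common(1), out_degree.most_common(1))
```

Here "u3" would be drawn largest in the inward view and "u1" largest in the outward view.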

17 Questions?

