Published by Virginia Bement. Modified over 10 years ago.
1
BIA 660 Web Analytics - Midterm
Akshta Chougule, Hao Han, Di Huo, Xi Lu, Laura Sills
Bank of America
2
Business Problem
Customer strategy: grow the customer base by forming lifelong banking relationships with young adults
Current Account Demographics Report shows:
● Fewer new student accounts
● An increase in account cancellations by the young adult demographic
Impact: losing market share to other banks
3
Business Questions
● What is Bank of America’s reputation with this age group? Do they like Bank of America or not?
● How does Bank of America compare to other banks?
● Are customers in this demographic group unhappy with the bank’s services?
● Are there banking products that customers in this group want but Bank of America does not offer?
4
Source of Information
Online social media sites are a good source for comments from this age group
5
YouTube Statistics
● More than 1 billion unique users monthly
● Nielsen ratings show that YouTube reaches more US adults ages 18-34 than any cable network
http://www.youtube.com/yt/press/statistics.html
6
Demographics of Reddit
http://www.theatlantic.com/technology/archive/2013/07/reddit-demographics-in-one-chart/277513/
7
What do People Think About Banks?
8
Topic                    Reddit  YouTube  Twitter
mortgage                 5%      6%       30%
loan                     5%      13%      0%
fraud                    6%      7%       0%
insurance                1%      2%       0%
branch                   3%      1%       0%
hours                    2%      1%       0%
account                  19%     16%      20%
overdraft                8%      1%       0%
bailout                  1%      6%       0%
fee                      18%     11%      20%
customer                 13%     8%       0%
representative / teller  7%      18%      20%
[credit] union           10%     7%       10%
computer                 1%               0%
CEO                      2%               0%
9
Data Gathering and Validation
Use Python to obtain comments from the web
● Crawling Reddit
● API for Twitter
● API for YouTube
10
Data Cleansing and Exploration
● Remove incomplete comments, extra whitespace, punctuation, and stopwords
● Explore the data using Python: analyze word frequencies in the comments to identify “key words” related to banking
● Word scan confirmed the key words
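The cleansing and frequency steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the team's actual script: the stopword list is a tiny hypothetical stand-in, "incomplete" is assumed to mean empty or whitespace-only, and the function names are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "my", "i", "for", "in", "on", "it"}

def clean_comment(text):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [w for w in text.split() if w not in STOPWORDS]

def word_frequencies(comments):
    """Count the remaining words across all comments, skipping empty ones."""
    counts = Counter()
    for comment in comments:
        # Assumption: an "incomplete" comment is one that is empty or whitespace-only.
        if comment and comment.strip():
            counts.update(clean_comment(comment))
    return counts

sample_comments = [
    "The overdraft fee on my account is outrageous!!",
    "   ",
    "Closed my account: another fee, another reason.",
]
freq = word_frequencies(sample_comments)
print(freq.most_common(3))
```

Words that rise to the top of such a count (for example "fee" or "account") are candidates for the banking key words.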
11
Gathering data from Twitter
● Technique: Twitter API
● Volume of tweets collected: BOA 125 KB, Citibank 104 KB, Chase 100 KB
● Time span: 1 week
● Type of data: tweet text, tweet created_at, geocode
12
Data Processing
● Two word libraries (lexicons): positive and negative
● Score each tweet
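One plausible way to score a tweet against the two word libraries is sketched below. The slides do not give the formula or name the word lists. The fractional scores reported for the tweets suggest some normalization by length, so this sketch assumes (positive hits - negative hits) / word count. The lexicons and the function name are hypothetical.

```python
import re

# Hypothetical miniature lexicons; the slides do not name the word lists used.
POSITIVE = {"good", "great", "helpful", "love", "thanks"}
NEGATIVE = {"bad", "fraud", "hate", "terrible", "worst"}

def score_tweet(text):
    """Return (positive hits - negative hits) / total words, or 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    # Assumed normalization: divide by tweet length so scores are comparable.
    return (pos - neg) / len(words)

print(score_tweet("Great service thanks"))
print(score_tweet("worst bank ever hate it"))
```

A score above zero marks a tweet as leaning positive, below zero as leaning negative.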
13
Tweets by Location
14
Data Processing
● Summary for BOA tweets:
Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
-0.20000  -0.04348  0.00000   -0.01176  0.02857   0.20000
● Good or bad?
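A summary in this shape can be computed from a list of tweet scores with the Python standard library, as in the sketch below. The slides do not say which tool produced the table, so the quartile method chosen here (`method="inclusive"`) is an assumption, and the function name is hypothetical.

```python
import statistics

def six_number_summary(scores):
    """Return min, first quartile, median, mean, third quartile, and max."""
    # Assumption: inclusive quartiles (linear interpolation between data points).
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(scores),
        "q3": q3,
        "max": max(scores),
    }

toy_scores = [-0.2, -0.1, 0.0, 0.1, 0.2]  # toy values, not the BOA data
summary = six_number_summary(toy_scores)
print(summary)
```

A mean slightly below zero combined with a median of zero, as on the slide, would indicate mostly neutral tweets with a mild negative tilt.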
15
Competitor Analysis
16
Distribution of tweet scores
Mean:
● BOA: -0.01176
● Citibank: -0.0006146
● Chase: -0.00731
17
Two-Sample T-test
Null hypothesis: true difference in means is equal to 0
Alpha = 0.1
● BOA and Citibank: p-value = 0.0009004 < 0.1
● Citibank and Chase: p-value = 0.06971 < 0.1
● BOA and Chase: p-value = 0.2289 > 0.1
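The comparison above can be illustrated with a small standard-library sketch. The slides do not say whether equal variances were assumed, so this example uses Welch's unequal-variance form as an assumption. It also approximates the two-sided p-value with the standard normal distribution, which is only reasonable for large samples such as a week of tweets. The function name and toy data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance

ALPHA = 0.1  # significance level stated on the slide

def welch_t_test(a, b):
    """Return Welch's t statistic, its degrees of freedom, and an approximate p-value."""
    var_a = variance(a) / len(a)  # squared standard error of mean(a)
    var_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    # Large-sample approximation: treat t as standard normal (not an exact t p-value).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]       # toy scores, not the tweet data
group_b = [11.0, 12.0, 13.0, 14.0, 15.0]
t_stat, dof, p_value = welch_t_test(group_a, group_b)
print(t_stat, dof, p_value, p_value < ALPHA)
```

When the p-value falls below alpha, the null hypothesis of equal mean scores is rejected, as for BOA versus Citibank on the slide.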
18
Gathering data from YouTube
● Techniques: BeautifulSoup, g.data
● Amount for general analysis: 3097
19
Topic                    Reddit  YouTube  Twitter
mortgage                 5%      6%       30%
loan                     5%      13%      0%
fraud                    6%      7%       0%
insurance                1%      2%       0%
branch                   3%      1%       0%
hours                    2%      1%       0%
account                  19%     16%      20%
overdraft                8%      1%       0%
bailout                  1%      6%       0%
fee                      18%     11%      20%
customer                 13%     8%       0%
representative / teller  7%      18%      20%
[credit] union           10%     7%       10%
computer                 1%               0%
CEO                      2%               0%
20
YouTube data for each category
● Training data: 600
● Loan: 2430
● Account: 2700
● Service: 520
21
Naive Bayes Classification Algorithm
A naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
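In symbols, the independence assumption lets the posterior factor into per-feature terms. The notation here is assumed for illustration and is not from the slides: $c$ is a class and $x_1, \dots, x_n$ are the features of one comment.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The predicted class $\hat{c}$ is the one with the largest product of prior and per-feature likelihoods.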
22
Naive Bayes Classification Algorithm
Splitting the dataset into training and test data (manual rating of comments)
● Training (400)
● Testing (200)
● Predicting (5700)
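The train, test, and predict workflow can be sketched with a small from-scratch classifier. The slides do not specify the implementation, so this example assumes a multinomial variant over word counts with add-one (Laplace) smoothing. The function names and toy comments are hypothetical stand-ins for the manually rated data.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Fit class counts and per-class word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, labeled_docs):
    """Share of (tokens, label) pairs whose predicted label matches the manual one."""
    hits = sum(predict(model, tokens) == label for tokens, label in labeled_docs)
    return hits / len(labeled_docs)

# Toy data standing in for manually rated comments.
train = [
    (["overdraft", "fee", "account"], "account"),
    (["closed", "account", "fee"], "account"),
    (["mortgage", "loan", "rate"], "mortgage"),
    (["loan", "modification", "mortgage"], "mortgage"),
    (["teller", "rude", "service"], "service"),
    (["representative", "helpful", "service"], "service"),
]
test = [(["mortgage", "rate"], "mortgage"), (["rude", "teller"], "service")]
model = train_naive_bayes(train)
print(accuracy(model, test))
```

Accuracy on the held-out test comments is the figure to check before running `predict` on the remaining unlabeled comments.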
23
Primary Categories of Customer Complaints
24
Accuracy of Classification
● Mortgage: 64.5%
● Accounts: 58.7%
● Service: 68.4%
25
Mortgage
26
Account
27
Service
28
Thank you!