Published by SUNIL LALA. Modified over 5 years ago.
1
R Programming Topic: Analyzing a dataset
2
Dataset: Amazon Alexa Reviews
3
Introduction
This dataset consists of 3,150 Amazon customer reviews (input text) and 5 variables (star rating, date of review, variation, verified reviews, and feedback) for various Amazon Alexa products such as the Echo, Echo Dot, and Fire TV Stick.
4
Explanation of variables:
1. Star rating (num): the rating, generally from 1 to 5 stars, that a customer gives an Amazon product after testing and using it.
2. Date of review (num): the date on which the customer reviewed the product.
3. Variation (char): the variant of the product. For example, the black and the white Echo Show are two different variants of the Echo Show.
4. Verified reviews (char): reviews written by customers or official personnel who have genuinely tested the product and are marked as verified by Amazon.
5. Feedback (num): a binary variable that takes the value 1 for positive feedback from the customer and 0 for negative feedback.
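As a minimal sketch, the five variables above can be held in an R data frame and given the types their descriptions imply. The rows are hypothetical toy values, and the column names (rating, date, variation, verified_reviews, feedback) are our assumption; the actual file may name them differently.

```r
# Hypothetical toy rows shaped like the five variables described above;
# the column names are assumptions, not taken from the original file.
alexa <- data.frame(
  rating = c(5, 4, 1, 5, 2),
  date = c("2018-07-31", "2018-07-30", "2018-07-30", "2018-07-29", "2018-07-28"),
  variation = c("Charcoal Fabric", "Walnut Finish", "Black Dot",
                "Charcoal Fabric", "White Dot"),
  verified_reviews = c("Love my Echo!", "Loved it!", "Stopped working",
                       "Great sound", "Not useful"),
  feedback = c(1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Give each column the type its description implies.
alexa$date <- as.Date(alexa$date)           # a calendar date rather than free text
alexa$variation <- factor(alexa$variation)  # product variant as a categorical variable
alexa$feedback <- factor(alexa$feedback, levels = c(0, 1),
                         labels = c("negative", "positive"))  # 0/1 coded as labels

str(alexa)
```

Storing feedback as a labeled factor keeps the 0/1 coding explicit when the column is later tabulated or plotted.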
5
Amazon Alexa
Amazon Alexa, known simply as Alexa, is a virtual assistant developed by Amazon, first used in the Amazon Echo and Amazon Echo Dot smart speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information such as news. Alexa can also control several smart devices, acting as a home automation system.
6
Objectives
1. Discover insights into consumer reviews that can support machine learning models.
2. Train machine learning models for sentiment analysis.
3. Analyze customer reviews to count how many are positive and how many are negative.
4. Bring out relationships between variables.
5. Summarize the data and find patterns.
6. Analyze those patterns by plotting graphs.
7
Features of Amazon Alexa
8
Products of Amazon: Echo Dot, Echo Plus, Amazon Echo, Echo Spot
9
Fire TV Stick, Echo Show
Based on their experience, customers give reviews, feedback, and star ratings to the various products they have used and tested.
10
Structure & Summary functions
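A minimal sketch of how str() and summary() are typically used on a reviews data frame. The data below are hypothetical toy values standing in for the real dataset, and the column names are assumptions.

```r
# Hypothetical toy data standing in for the Alexa reviews data frame.
alexa <- data.frame(
  rating = c(5, 4, 1, 5, 2),
  variation = c("Charcoal Fabric", "Walnut Finish", "Black Dot",
                "Charcoal Fabric", "White Dot"),
  feedback = c(1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

str(alexa)      # rows, columns, and each column's type with its first values
summary(alexa)  # min, quartiles, median, mean and max for each numeric column

# summary() of a single column returns a named vector that can be indexed.
rating_summary <- summary(alexa$rating)
print(rating_summary)
rating_summary[["Median"]]
```

For these five toy ratings the median is 4 and the mean is 3.4.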
11
which.min & which.max functions
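A short sketch of which.min() and which.max() on hypothetical toy ratings (not the real data). Both return only the position of the first minimum or maximum, so which() is shown for ties; applied to a frequency table, they give the least and most frequent rating.

```r
# Hypothetical toy ratings and product variants.
alexa <- data.frame(
  rating = c(5, 4, 1, 5, 2),
  variation = c("Charcoal Fabric", "Walnut Finish", "Black Dot",
                "Charcoal Fabric", "White Dot"),
  stringsAsFactors = FALSE
)

# which.max()/which.min() return the position of the FIRST maximum/minimum.
best_row <- which.max(alexa$rating)
worst_row <- which.min(alexa$rating)
alexa[worst_row, ]  # the full record of the lowest-rated review

# Later ties are ignored; which() returns every matching position.
all_top_rows <- which(alexa$rating == max(alexa$rating))

# On a frequency table they name the most and least frequent rating.
rating_counts <- table(alexa$rating)
names(which.max(rating_counts))
names(which.min(rating_counts))
```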
12
Subset Function
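A minimal sketch of subset(), which filters rows with a logical condition and can keep only selected columns. The data frame is a hypothetical toy stand-in with assumed column names.

```r
# Hypothetical toy data standing in for the Alexa reviews data frame.
alexa <- data.frame(
  rating = c(5, 4, 1, 5, 2),
  variation = c("Charcoal Fabric", "Walnut Finish", "Black Dot",
                "Charcoal Fabric", "White Dot"),
  feedback = c(1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Rows with negative feedback, keeping only the rating and variant columns.
negative <- subset(alexa, feedback == 0, select = c(rating, variation))
negative

# Conditions can be combined with & (and) or | (or).
top_charcoal <- subset(alexa, rating == 5 & variation == "Charcoal Fabric")
nrow(top_charcoal)
```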
13
Table function
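A sketch of table() for the counting objectives (reviews per star rating, positive versus negative feedback), using hypothetical toy data. Note that table() omits values that never occur unless the variable is a factor with fixed levels.

```r
# Hypothetical toy data standing in for the Alexa reviews data frame.
alexa <- data.frame(
  rating = c(5, 4, 1, 5, 2),
  variation = c("Charcoal Fabric", "Walnut Finish", "Black Dot",
                "Charcoal Fabric", "White Dot"),
  feedback = c(1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Frequency of each star rating that actually occurs in the data.
rating_counts <- table(alexa$rating)
# Fixing the levels to 1:5 also reports ratings with zero reviews (here, 3).
all_star_counts <- table(factor(alexa$rating, levels = 1:5))

# How many reviews are negative (0) and positive (1), as counts and shares.
feedback_counts <- table(alexa$feedback)
feedback_share <- prop.table(feedback_counts)

# Two-way table: feedback broken down by product variant.
variant_by_feedback <- table(alexa$variation, alexa$feedback)

print(all_star_counts)
print(feedback_share)
print(variant_by_feedback)
```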
21
Histogram function
22
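# If the file is read with a UTF-8 byte-order mark, the first column may be named ï..rating; reading it with fileEncoding = "UTF-8-BOM" keeps the name rating
hist(amazon_alexa$rating, xlab = "Alexa Rating", ylab = "Number of Ratings", main = "Histogram of Alexa Ratings")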
23
tapply() function
Syntax: tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
1. X: a vector (more generally, an object that can be split into groups).
2. INDEX: a list of one or more factors, each the same length as X, that define the groups.
3. FUN: the function or operation to apply to each group.
Applied to this dataset, we summarize the ratings by date to look for interesting patterns.
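A minimal sketch of summarizing ratings by date with tapply(). The ratings, dates, and variants are hypothetical toy values; the grouping vector is converted to a factor internally, so ISO dates come out in chronological order.

```r
# Hypothetical toy data standing in for the Alexa reviews data frame.
alexa <- data.frame(
  rating = c(5, 4, 1, 5, 2),
  date = as.Date(c("2018-07-31", "2018-07-31", "2018-07-30",
                   "2018-07-30", "2018-07-29")),
  variation = c("Charcoal Fabric", "Walnut Finish", "Black Dot",
                "Charcoal Fabric", "White Dot"),
  stringsAsFactors = FALSE
)

# Mean rating per review date: X = ratings, INDEX = dates, FUN = mean.
mean_by_date <- tapply(alexa$rating, alexa$date, mean)

# Number of reviews per date, using length as FUN.
reviews_by_date <- tapply(alexa$rating, alexa$date, length)

# The same pattern groups by product variant instead of date.
mean_by_variant <- tapply(alexa$rating, alexa$variation, mean)

print(mean_by_date)
print(reviews_by_date)
print(mean_by_variant)
```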
25
Regression Analysis
Regression analysis is a widely used statistical tool for building a model of the relationship between two variables. One of them, the predictor variable, has values gathered through experiments or observation. The other, the response variable, has values that the model expresses as depending on the predictor variable.
Syntax: lm(formula, data)
1. formula: a symbolic description of the relationship between the response and the predictor(s), for example y ~ x.
2. data: the data frame containing the variables named in the formula.
Adjusted R-squared: the adjusted R-squared compares the explanatory power of regression models that contain different numbers of predictors; unlike R-squared, it increases only when an added predictor improves the model more than would be expected by chance.
Interpreting p-values: a low p-value (< 0.05) indicates that you can reject the null hypothesis that the predictor's coefficient is zero. In other words, a predictor with a low p-value is likely to be a meaningful addition to your model, because changes in its value are associated with changes in the response variable.
26
Linear regression
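The slides do not show which variables were regressed, so this is only a sketch under an assumed model: star rating as the response and review length (character count) as the predictor, fitted on hypothetical toy reviews. It shows where lm() reports the p-values and adjusted R-squared discussed above.

```r
# Hypothetical toy reviews; using review length as the predictor is an
# illustrative assumption, not the model from the original slides.
reviews <- c("Love my Echo!", "Loved it!",
             "Stopped working after a week and support was no help",
             "Great sound", "Not very useful for me, returned it",
             "Works well", "Good", "Sometimes mishears me but mostly fine")
alexa <- data.frame(
  rating = c(5, 5, 1, 5, 2, 4, 4, 3),
  review_length = nchar(reviews)  # number of characters in each review
)

# Fit rating (response) as a linear function of review length (predictor).
fit <- lm(rating ~ review_length, data = alexa)
fit_summary <- summary(fit)

coefs <- coef(fit_summary)        # estimates, std. errors, t values, p-values
p_values <- coefs[, "Pr(>|t|)"]
significant <- p_values < 0.05    # TRUE where a zero coefficient is rejected at 5%

fit_summary$r.squared
fit_summary$adj.r.squared         # R-squared penalized for the number of predictors
```

With a single predictor, the fitted slope equals cov(x, y) / var(x), which is a quick sanity check on the output.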
27
Logistic Regression
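Because feedback is binary (1 = positive, 0 = negative), logistic regression via glm() with a binomial family is the usual counterpart to lm(). This sketch assumes, for illustration only, review length as the predictor; the values are hypothetical toy data, not results from the slides.

```r
# Hypothetical toy data: feedback (1 = positive, 0 = negative) and the
# character count of each review; the predictor choice is an assumption.
alexa <- data.frame(
  feedback = c(1, 1, 0, 1, 0, 1, 1, 1),
  review_length = c(13, 9, 52, 11, 35, 10, 4, 37)
)

# Logistic regression: glm() with a binomial family models the log-odds of
# positive feedback as a linear function of review length.
logit_fit <- glm(feedback ~ review_length, family = binomial, data = alexa)
summary(logit_fit)

# Predicted probability of positive feedback for a short and a long review.
new_reviews <- data.frame(review_length = c(10, 50))
prob_positive <- predict(logit_fit, newdata = new_reviews, type = "response")
prob_positive
```

type = "response" converts the fitted log-odds into probabilities between 0 and 1.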
28
Conclusion
By analyzing the “Amazon Alexa” dataset, we can give potential customers a clear understanding of the products through the reviews of customers who already own Alexa devices. The star ratings are most frequent at 5 stars and least frequent at 2 stars, which suggests that customers generally like the products. The histogram of star ratings shows that almost 2,500 of the 3,150 customers gave a 5-star rating, making it the most frequent rating given to Amazon's products.
29
Relationships between the different variables were analyzed. The data was summarized and interesting patterns were obtained. These patterns were then further analyzed by using various plot functions such as histogram, boxplot, etc.
30
Thank You!!