STATISTICS PROJECT Priya Mariam Simon Aparna Rajeev Sudhit Sethi Jinto Antony Kurian
Objective
The main objective of our project is to gain an in-depth understanding of the collection, organization and interpretation of numerical data for making managerial decisions. The data for the project were collected from the Outlook (India) website, using its list of the top 100 engineering colleges in India.
Variables
The dataset contains nominal variables, an ordinal variable and several score variables. The nominal variables are Name of the Institution, City and G/P (Government/Private). The ordinal variable is Rank. The score variables are IC (Intellectual Capital), I&F (Infrastructure and Facilities), PS (Pedagogic Systems), II (Industry Interface) and P (Placement).
Attributes
IC (Intellectual Capital) represents the quality of the students at each institute.
I&F (Infrastructure and Facilities) includes the land, buildings and other facilities that an institute possesses.
PS (Pedagogic Systems) refers to the instructional methods used for teaching.
II (Industry Interface) means corporate interaction with the college.
P (Placement) refers to campus recruitment.
Frequency Distribution Table
Class interval  Midpoint (x)  Frequency (f)  cf   fx      d = x - mean  d^2       fd^2
50-55           52.5          0              0    0       -18.05        325.8025  0
55-60           57.5          5              5    287.5   -13.05        170.3025  851.5125
60-65           62.5          21             26   1312.5  -8.05         64.8025   1360.8525
65-70           67.5          33             59   2227.5  -3.05         9.3025    306.9825
70-75           72.5          18             77   1305    1.95          3.8025    68.445
75-80           77.5          10             87   775     6.95          48.3025   483.025
80-85           82.5          5              92   412.5   11.95         142.8025  714.0125
85-90           87.5          2              94   175     16.95         287.3025  574.605
90-95           92.5          5              99   462.5   21.95         481.8025  2409.0125
95-100          97.5          1              100  97.5    26.95         726.3025  726.3025
Total                         100                 7055                            7494.75
Statistical details
Mean   Median  Mode  S.D.
70.55  66.65   65.1  8.66
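As a quick check, the mean and standard deviation can be recomputed from the class midpoints and frequencies. The sketch below is only illustrative: the frequencies for the 50-55, 80-85 and 90-95 classes are assumptions inferred from the cf and fx columns, and the variable names are our own.

```python
import math

# (midpoint x, frequency f) for each class interval.
# Frequencies 0 (50-55), 5 (80-85) and 5 (90-95) are inferred from cf and fx.
classes = [
    (52.5, 0), (57.5, 5), (62.5, 21), (67.5, 33), (72.5, 18),
    (77.5, 10), (82.5, 5), (87.5, 2), (92.5, 5), (97.5, 1),
]

n = sum(f for _, f in classes)                 # total number of institutes
sum_fx = sum(x * f for x, f in classes)        # sum of f*x
mean = sum_fx / n                              # grouped mean

# Sum of f*d^2, where d is the deviation of the midpoint from the mean.
sum_fd2 = sum(f * (x - mean) ** 2 for x, f in classes)
sd = math.sqrt(sum_fd2 / n)                    # population standard deviation

print(round(mean, 2), round(sd, 2))            # 70.55 8.66
```

The grouped formulas reproduce the reported mean and S.D.; the median and mode are not recomputed here.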
The histogram above shows each separate class in the distribution. For example, the class from 55 to 60 contains 5 institutes.
Frequency Polygon
The frequency polygon shows the outline of the data pattern more clearly.
Less-than Ogive
The ogive above is a graphical representation of the cumulative frequency distribution.
Pie Chart showing distribution of Engineering Institutes
Table showing Correlation and Regression between IC & P
Correlation coefficient  0.81
Slope                    0.57
Intercept                0.71
R^2                      0.649154
R                        0.805701
Correlation & Regression line
Table showing Correlation and Regression between II & P
Correlation coefficient  0.630374
Slope                    0.853955
Intercept                3.843427
r^2                      0.397371
r                        0.630374
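For reference, the slope, intercept and correlation coefficient of a regression line can be obtained by least squares as sketched below. This is a minimal illustration, not the spreadsheet calculation used in the project: the function name `linear_fit` and the five pairs of scores are hypothetical, and only the final consistency check uses figures reported above.

```python
import math

def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical II and P scores for five institutes (NOT the project data).
ii_scores = [40.0, 55.0, 60.0, 72.0, 85.0]
p_scores = [38.0, 52.0, 50.0, 70.0, 74.0]

slope, intercept, r = linear_fit(ii_scores, p_scores)
r_squared = r ** 2
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} r^2={r_squared:.3f}")

# Consistency check on the reported II & P figures: r^2 is the square of r.
reported_r = 0.630374
reported_r_squared = 0.397371
assert abs(reported_r ** 2 - reported_r_squared) < 1e-5
```

Once a line has been fitted, a placement score can be estimated from an II score as slope * II + intercept.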
Rank correlation
Formula: rank correlation = 1 - 6[∑D^2 + ∑(m^3 - m)/12] / [n(n^2 - 1)]
Sum of squares of the rank differences, ∑D^2   14768
Correction for repeated ranks, ∑(m^3 - m)       2382
Rank correlation                                0.910192019
The rank correlation is calculated from the sum of the squares of the differences between the ranks. Since some ranks are repeated, a modified formula is used, as shown in the table above, where m is the number of times a rank is repeated. With n = 100 institutes, 2382/12 = 198.5 is added to ∑D^2, and the rank correlation is found to be 0.9102.
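The arithmetic behind the rank correlation can be retraced as follows. This sketch assumes that 2382 is the total of m^3 - m over all groups of repeated ranks and that n = 100 (the number of colleges in the list); the function name is our own.

```python
def rank_correlation_with_ties(sum_d_squared, tie_term, n):
    """Spearman rank correlation with the correction for repeated ranks.

    sum_d_squared: sum of squared rank differences
    tie_term:      sum of (m^3 - m) over all groups of repeated ranks
    n:             number of ranked items
    """
    correction = tie_term / 12
    return 1 - 6 * (sum_d_squared + correction) / (n * (n ** 2 - 1))

# Values from the table above; n = 100 is an assumption based on the top-100 list.
rho = rank_correlation_with_ties(14768, 2382, 100)
print(round(rho, 4))  # 0.9102
```

Under these assumptions the result agrees with the tabulated value of 0.910192019.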
Probability
Frequency by range and type of institute
Range   Government  Private  Total
50-60   5           13       18
60-70   28          23       51
70-80   17          3        20
80-90   7           1        8
90-100  3           0        3
Total   60          40       100
Probability (cont.)
What is the probability that a college selected at random falls in the range 70-80?
P(E) = 20/100 = 0.2
What is the probability that a college falls in the range 60-70, given that it is a private college?
P(E|P) = 23/40 = 0.575
What is the probability that a college selected at random is a government college or falls in the range 80-90?
P(G ∪ E) = P(G) + P(E) - P(G ∩ E) = 60/100 + 8/100 - 7/100 = 61/100 = 0.61
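The three probabilities can be recomputed directly from the counts in the frequency table. The sketch below uses exact fractions to avoid rounding; the variable names are our own. Note that 23/40 is a conditional probability (range 60-70 among private colleges only), and that the union evaluates to 61/100.

```python
from fractions import Fraction

total = 100             # all colleges
government_total = 60   # government colleges
private_total = 40      # private colleges

# P(range 70-80): 20 of the 100 colleges fall in this range.
p_range_70_80 = Fraction(20, total)

# P(range 60-70 | private): 23 of the 40 private colleges fall in this range.
p_60_70_given_private = Fraction(23, private_total)

# P(government or range 80-90) by the addition rule:
# P(G) + P(E) - P(G and E), with 8 colleges in 80-90, 7 of them government.
p_gov_or_80_90 = (
    Fraction(government_total, total) + Fraction(8, total) - Fraction(7, total)
)

print(float(p_range_70_80), float(p_60_70_given_private), float(p_gov_or_80_90))
# 0.2 0.575 0.61
```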
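Normal Distribution
The normal distribution is a continuous distribution that represents the distribution of the data as an area under the curve. Based on the normal curve, about 72% of colleges lie between 65 and 90. The data themselves are right skewed (mean 70.55 > median 66.65 > mode 65.1), so the normal curve is only an approximation.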
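The share of colleges between 65 and 90 can be estimated as the area under a normal curve. The sketch below assumes the curve uses the mean (70.55) and S.D. (8.66) reported earlier; `normal_cdf` is a helper of our own built on the standard error function, not part of the original analysis.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mean, sd = 70.55, 8.66   # values from the statistical details

# Area under the curve between 65 and 90.
share_65_to_90 = normal_cdf(90, mean, sd) - normal_cdf(65, mean, sd)
print(f"{share_65_to_90:.1%}")
```

Under these assumptions the area comes out between 72% and 73%, in line with the figure quoted above.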