Understanding the field & setting expectations
Personal: International; UNT Alumni (Mathematics).
Academic: Economics & Mathematics.
Professional: Academic Research, Hilton, Ansira, Sabre.
Analytics: the discovery and communication of meaningful patterns in data.
Data Science: the novel application of algorithms and statistical techniques to solve business problems.
Reality: these terms mean different things at different companies.
The culture of the company determines the nature of the work that you do.
It is a relatively new field; most companies are still in the process of defining their analytics strategy.
Titles common to the field: Data Scientist, Analytics Consultant, Statistical Modeler, Risk Analyst, Statistician.
Forecasting
“Predictive Analytics”:
Classification: Logistic Regression, SVM, Random Forest, Gradient Boosting (e.g., fraud, customer acquisition).
Customer Retention / Churn Modeling: who is likely to leave for a competitor.
Recommendation Engines: Netflix Challenge.
Customer Choice Modeling: what will people buy; Multinomial Logit Model.
Optimization
Market Mix Modeling
Clustering / Market Basket Analysis
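As a rough illustration of the classification and churn-modeling items above, here is a minimal sketch of logistic regression fitted by batch gradient descent in plain Python. The two features (support calls last month, years as a customer), the toy data, and the learning-rate and epoch settings are all assumptions made for illustration; real work would normally use R, SAS, or a Python library rather than hand-written code.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the average log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: [support_calls_last_month, years_as_customer]
X = [[5, 0.5], [4, 1.0], [6, 0.3], [0, 5.0], [1, 4.0], [0, 6.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = customer churned, 0 = customer stayed

weights = fit_logistic(X, y)
p_risky = predict_proba(weights, [5, 0.5])  # many calls, short tenure
p_loyal = predict_proba(weights, [0, 5.0])  # no calls, long tenure
print(f"churn probability (risky): {p_risky:.3f}")
print(f"churn probability (loyal): {p_loyal:.3f}")
```

The model outputs a probability of leaving for each customer, which is what makes it usable for ranking who to target with a retention offer.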
Most companies house their data in relational databases: Oracle, Teradata, IBM DB2, Microsoft SQL Server.
SQL queries are used to retrieve the data.
SQL is a basic entry-level requirement to work in this field.
Most tasks require significant amounts of time and energy combining tables and data.
Hadoop: an open-source distributed framework for storing and processing large amounts of data (petabytes). Java-based; Map-Reduce.
Pig; Hive (SQL syntax; developed at Facebook); Impala (SQL syntax); Spark.
Spark: UTD offers a Spark course.
HTML, JSON.
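Since so much of the work is combining tables, a small sketch of a SQL join may help. It uses Python's built-in sqlite3 module with an in-memory database; the customers and bookings tables and their contents are hypothetical, and a production database such as Oracle or Teradata will differ in dialect details.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bookings (
    booking_id  INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      REAL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chloe');
INSERT INTO bookings VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0);
""")

# LEFT JOIN keeps customers with no bookings; COALESCE turns their
# NULL total into 0 so every customer appears in the summary.
query = """
SELECT c.name,
       COUNT(b.booking_id)        AS n_bookings,
       COALESCE(SUM(b.amount), 0) AS total_spend
FROM customers AS c
LEFT JOIN bookings AS b ON b.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id
"""
rows = conn.execute(query).fetchall()
conn.close()

for name, n_bookings, total_spend in rows:
    print(name, n_bookings, total_spend)
```

The choice between LEFT JOIN and INNER JOIN matters here: an inner join would silently drop the customer with no bookings, which is a common source of wrong counts when preparing modeling data.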
Statistical programming languages:
R: open source, easy to learn, unparalleled number of packages and functionality; memory limitations.
SAS: very common in businesses, but expensive and losing market share to R; handles large data sets well.
Python: versatile, reasonable number of packages; R's biggest competitor.
Matlab: more common in the engineering field.
General programming languages:
Java: not knowing Java has cost me at least 4 jobs.
C/C++: for writing faster R programs.
Scala: used with Spark; more common among people at the forefront of development.
Search for positions you are overqualified for; those employers are more likely to sponsor you.
State your visa status as soon as possible; some companies have policies against hiring international students.
myvisajobs.com: see which companies are sponsoring, and see salaries for negotiation purposes.
Others.
SQL: fundamental requirement.
Experience with large data sets: 10k records is not large.
SAS/R: take courses; there are free courses at UNT.
Be very strong in at least one area (Optimization, Forecasting, Classification): specialize in something.
Linux experience: get exposure.
Java: learn it.
Multiple projects (at least 3): code a research paper, apply a technique to company data, participate in Kaggle, do an internship.
Universities:
UTD: School of Management / Operations Research
OSU (Oklahoma): Analytics and Data Mining programs
UNT: Economics
SMU: Statistics
Economics, Mathematics, Statistics, Operations Research, Computer Science, Engineering.
Companies: AT&T, Sabre, Epsilon, Amazon, AnalyticRecruiting.com (lots of phone interviews), Kforce.com (very promising and takes care of visa issues).
Kaggle.com: “The Home of Data Science”.
Companies recruit there and pay winners.
Many Kaggle winners manage analytics teams.
Compete! Get recognized.
Internships are extremely important: AT&T, Sabre, Epsilon, Amazon, Santander, Capital One in Plano.
Companies prefer to hire mathematicians.
Never accept the first offer.
Jumping around vs. staying at one company.
They always divide by 2.
Dallas R user group: network.
Meetup.com: network.
INFORMS local chapter.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
The Art of R Programming
The Theory and Practice of Revenue Management