An Introduction to Data Science Using Python
Ganesh Lohani
Sr. Data Analyst, Lockheed Martin
ganeshlohani@hotmail.com
What Is Happening?
Massive amounts of data: 90% of the data in the world today has been created in the last two years alone
Big Data (the 3 Vs): Volume, Velocity, Variety
Data is everywhere and in many formats:
  Structured data
  Semi-structured data
  Unstructured data
Data is now treated as an asset
More opportunities to work on data platforms
Turn data into information, decision making, and business value
What Is Data Science?
Data science is a field that extracts insights, trends, and intelligence from data to help business leaders make better decisions
It is also a process for validating assumptions, models, and hypotheses about business activities
It is a relatively new field, deeply rooted in statistics and decision support systems
It is a multidisciplinary field (domain knowledge, tools and technology, mathematics and statistics, programming languages)
Data Science Methodology
Statement of the problem / objective of the study
Data preparation
Feature selection
Exploratory data analysis
Model development
Test the model/hypothesis
Communicate the findings to the stakeholders
Deployment (data as a product)
Feedback / lessons learned and continuous improvement
Python for Data Science / Data Analysis
Python is an open-source programming language widely used as a data science tool
It is user friendly
The code syntax is simple to read and follow
It supports functional, object-oriented, and procedural (structured) programming styles
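
As a small illustration of the programming styles named above, the sketch below computes the same total three ways. The sample data and the names `prices` and `Cart` are invented for this example.

```python
from functools import reduce

prices = [19.99, 5.00, 3.50]  # made-up sample data

# Procedural style: an explicit loop that updates a running total.
total_procedural = 0.0
for price in prices:
    total_procedural += price

# Functional style: fold the values together with a function.
total_functional = reduce(lambda acc, price: acc + price, prices, 0.0)


# Object-oriented style: data and behaviour live together in a class.
class Cart:
    def __init__(self, prices):
        self.prices = list(prices)

    def total(self):
        return sum(self.prices)


total_oo = Cart(prices).total()

print(total_procedural, total_functional, total_oo)
```

All three approaches give the same answer up to floating-point rounding; which one reads best usually depends on the size and shape of the problem.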
Python for Data Science / Data Analysis
Python basics:
  Variables and data types
  DataFrame (holds tabular data, like a table in SQL Server; provided by the pandas library)
  Tuple (created with parentheses, immutable)
  List (collection of values, mutable)
  Dictionary (key-value pairs)
  Operations (comparison, mathematical, and Boolean)
  Functions and methods
  Control flow (if/else, while loop, for loop)
Python libraries:
  NumPy (numerical computation)
  pandas (data analysis)
  Matplotlib (data visualization)
  scikit-learn (machine learning algorithms)
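
The built-in basics listed above can be sketched in a few lines of plain Python. All values and names here (`scores`, `employee`, `grade`) are made up for illustration. The DataFrame is left out because it comes from the pandas library rather than from core Python.

```python
# Variables and basic data types
name = "Ada"          # str
age = 36              # int
height_m = 1.68       # float
is_analyst = True     # bool

# Tuple: created with parentheses, immutable
point = (3, 4)

# List: ordered, mutable collection of values
scores = [72, 85, 90]
scores.append(64)     # lists can be changed in place

# Dictionary: key-value pairs
employee = {"name": name, "age": age}
employee["role"] = "analyst"


# Function containing a conditional statement
def grade(score):
    """Return 'pass' for scores of 70 or more, otherwise 'fail'."""
    if score >= 70:
        return "pass"
    else:
        return "fail"


# For loop: apply the function to every score
results = []
for score in scores:
    results.append(grade(score))

# While loop: repeat until the condition becomes false
countdown = 3
while countdown > 0:
    countdown -= 1

# Mathematical, comparison, and Boolean operations
average = sum(scores) / len(scores)
above_average = (scores[0] > average) and is_analyst

print(results, average, above_average)
```

Trying `point[0] = 10` would raise a `TypeError`, which is the practical meaning of "immutable" for tuples.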
Machine Learning
It is a technique for teaching a computer to perform a task by learning from data instead of from explicitly written rules
It is a branch of Artificial Intelligence (AI), deeply rooted in statistics and mathematics
The output is never 100% accurate; the goal is to optimize the algorithm/model
Example: weather forecast, "50% chance of rain today"
Common Types of Machine Learning Algorithms
Supervised learning
  Classification (email: spam, not spam)
  Regression (forecast a car price, or a share price over time)
  Decision tree (will it rain today? yes, no)
Unsupervised learning
  Clustering (customer segmentation: Gold, Silver, Bronze)
Reinforcement learning
  React to the environment (autonomous car)
Natural language processing
  Text mining (Twitter data analysis, customer survey data)
Machine Learning Model: Simple Regression
Demo
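
The demo itself is not reproduced in these slides. As a stand-in, here is a minimal sketch of a simple (one-variable) linear regression, assuming NumPy is installed. The car age and price figures are invented for illustration. The fit uses `numpy.polyfit`, which is only one way to do it; scikit-learn's `LinearRegression` would be a common alternative.

```python
import numpy as np

# Made-up data for illustration: car age in years, price in thousands.
age_years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
price_k = np.array([20.0, 18.1, 15.9, 14.2, 12.0])

# Fit price = slope * age + intercept by ordinary least squares.
# polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(age_years, price_k, deg=1)

# Use the fitted line to predict the price of a 6-year-old car.
predicted_k = slope * 6.0 + intercept

# R^2: share of the variation in price explained by the fitted line.
fitted = slope * age_years + intercept
ss_res = np.sum((price_k - fitted) ** 2)
ss_tot = np.sum((price_k - price_k.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted price at 6 years: {predicted_k:.2f}k, R^2={r_squared:.4f}")
```

The prediction is an estimate, not a certainty: the fitted line summarizes the trend in the sample, and a real model would be checked against data it was not trained on.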
Question & Answer
What feedback do you have for me?
Useful Links
https://www.python.org/downloads/
https://www.python.org/doc/
https://numpy.org/devdocs/user/quickstart.html
https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html
https://matplotlib.org/tutorials/index.html
https://scikit-learn.org/stable/index.html
https://www.datasciencecentral.com/
https://www.kaggle.com/
https://azure.microsoft.com/en-us/services/machine-learning-studio/
https://machinelearningmastery.com/machine-learning-in-python-step-by-step/