An Introduction to Data Science using Python


1 An Introduction to Data Science using Python
Ganesh Lohani Sr Data Analyst Lockheed Martin

2 Meeting Agenda/Learning Outcome
- What is Data Science?
- Data Science Methodology
- Python Basics
  - Variables and Data Types
  - Reading Data
  - Selecting and Filtering the Data
  - Data manipulation, sorting, grouping
- Python Libraries for Data Science
  - NumPy (Numerical Computation)
  - Pandas (Data Analysis)
  - Matplotlib (Data Visualization)
  - SciKit-Learn (Machine Learning Algorithms)
- Machine Learning
  - Supervised Machine Learning
  - Unsupervised Machine Learning
- Demo on Machine Learning Classification

3 What is Data Science
- The process of finding insights, trends, and intelligence in data
- A relatively new field
- Deeply rooted in Statistics and Decision Support Systems
- A multidisciplinary field (Domain Knowledge, Tools & Technology, Mathematics & Statistics, Problem-Solving Skills)

4 Data Science Methodology
1. Statement of the problem / Objective of the study
2. Data preparation
3. Feature selection
4. Exploratory data analysis
5. Model development
6. Test the model / Hypothesis testing
7. Communicate the findings to the business leaders
8. Deployment (data as a product)
9. Feedback / Lessons learned and continuous improvement

5 Python Basics What is Python
- A high-level, general-purpose programming language
- A very popular Data Science tool for data analysis, data visualization, and Machine Learning tasks
- It is an open-source and free tool

6 Python Basics How to Download Python
Download Python from the following link. You can also download Python and Jupyter Notebook from the following link.

7 Python Basics Common Tools in Python Environment
- The Python interactive console: Also called the Python interpreter or Python shell. It gives programmers a quick way to execute commands and to try out or test code without creating a file.
- Spyder: A powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers, and data analysts. It offers a unique combination of advanced editing, analysis, debugging, and profiling.
- Jupyter Notebook: An open-source web application that you can use to create and share work (code, equations, visualizations, Machine Learning models, and text).

8 Python Basics Most Popular Python Libraries for Data Science

9 Python Basics Variable and Types
- A variable is a name that refers to a value stored in memory; it acts as a placeholder for the data
- Most common Python data types: float, int, str, list, tuple, dict (dictionary)
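As a minimal sketch of the data types listed on this slide, the snippet below binds one variable to each type and prints the type name. All variable names and values are hypothetical and chosen only for illustration.

```python
# Hypothetical example values, one per data type named on the slide.
price = 19.99                      # float
quantity = 3                       # int
product = "notebook"               # str
tags = ["office", "paper"]         # list
dimensions = (21.0, 29.7)          # tuple
stock = {"warehouse": 120}         # dict (dictionary)

# type() reports the data type of the object a variable refers to.
for value in (price, quantity, product, tags, dimensions, stock):
    print(type(value).__name__)
```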

10 Python Basics Basic Operations in Python Arithmetic Operations
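Addition (+), Subtraction (-), Multiplication (*), Division (/), Modulo (%)
Relational Operations: Equal (==), Greater than (>), Greater than or equal to (>=), Less than (<), Less than or equal to (<=)
Logical Operations: the Boolean values True and False, combined with and / or; the in operator tests membership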
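The operations on this slide can be tried directly in the interactive console. The following is an illustrative sketch; the numbers 17 and 5 are arbitrary.

```python
a, b = 17, 5

# Arithmetic operations
total = a + b          # 22
difference = a - b     # 12
product = a * b        # 85
quotient = a / b       # 3.4 (the / operator always returns a float)
remainder = a % b      # 2

# Relational operations return a Boolean value
is_equal = a == b            # False
is_greater = a > b           # True
is_less_or_equal = a <= b    # False

# Logical and membership operations
both = is_greater and is_equal     # False
either = is_greater or is_equal    # True
is_member = b in [1, 5, 9]         # True
```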

11 Python Basics List A common Data type in python
- A collection of comma-separated values (items) between square brackets
- Items can be of the same or different types
- Lists are mutable: we can add, remove, and update/replace values, and slice and dice the members
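A short sketch of the list behavior described above, using made-up values:

```python
# A list may mix types and can be changed in place (it is mutable).
items = [10, "ten", 10.0]

items.append(20)         # add:     [10, "ten", 10.0, 20]
items.remove("ten")      # remove:  [10, 10.0, 20]
items[0] = 5             # replace: [5, 10.0, 20]

first_two = items[:2]    # slice:   [5, 10.0]
```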

12 Python Basics Tuple A common Type in Python
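- A tuple is very similar to a list
- A collection of items inside parentheses ()
- A tuple is immutable (its values cannot be changed in place)
- We can slice a tuple, build a new tuple by concatenation, and delete the entire tuple, but we cannot add, change, or remove individual elements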
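The sketch below illustrates tuple immutability with hypothetical values: slicing and concatenation produce new tuples, while item assignment raises a TypeError.

```python
point = (3, 4, 5)

first_two = point[:2]       # slicing returns a new tuple: (3, 4)
extended = point + (6,)     # concatenation builds a new tuple: (3, 4, 5, 6)

# Item assignment is not allowed on a tuple.
mutation_blocked = False
try:
    point[0] = 99
except TypeError:
    mutation_blocked = True

del point                   # deletes the whole tuple (the name no longer exists)
```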

13 Python Basics Dictionary Another common and popular type in Python
- A dictionary holds data as key-value pairs
- The items are separated by commas, and the whole thing is enclosed in curly braces
- Items are looked up by key rather than by position (since Python 3.7 a dictionary also preserves insertion order)
- Keys must be immutable (hashable) objects; values can be of any type, and we can add, modify, and delete them
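A minimal dictionary sketch; the employee record is invented for illustration.

```python
employee = {"name": "Asha", "dept": "Analytics"}

employee["level"] = 2                # add a new key-value pair
employee["dept"] = "Data Science"    # modify an existing value
del employee["level"]                # delete a key-value pair

# Keys must be hashable: a tuple works as a key, a list does not.
locations = {(40.7, -74.0): "New York"}
list_key_rejected = False
try:
    locations[[51.5, -0.1]] = "London"
except TypeError:
    list_key_rejected = True
```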

14 Python Basics Function A function is a collection of reusable codes
- We write the function once and call it whenever we need to solve that particular task
- Two types of function:
  - Built-in (system) functions: max(), min(), len()
  - User-defined functions: created by the programmer/developer
- Main components of a function: input, computation, output
- Global and local variables (scope)
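To make the input, computation, and output components concrete, here is a small sketch. The function name average and the scores list are hypothetical.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)            # 'total' is local to the function
    return total / len(values)     # output


scores = [70, 85, 100]             # global variable, used as input

# Built-in functions
highest = max(scores)              # 100
lowest = min(scores)               # 70
count = len(scores)                # 3

# User-defined function
mean_score = average(scores)       # 85.0
```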

15 Python Basics Looping - For Loop
The for loop is used to iterate over the elements of a sequence. It is often used when we have a piece of code that we want to repeat "n" times.
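Both uses mentioned above, iterating over a sequence and repeating a block n times, are sketched here with arbitrary values.

```python
# Iterate over the elements of a sequence.
squares = []
for n in [1, 2, 3, 4]:
    squares.append(n * n)

# Repeat a block of code a fixed number of times with range().
runs = []
for i in range(3):
    runs.append(f"run {i}")
```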

16 Python Basics Looping - While Loop
The while loop tells the computer to keep doing something as long as the condition is met. Its construct consists of a condition and a block of code.
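A minimal while loop sketch: the block runs as long as the condition countdown > 0 holds. The variable names are illustrative.

```python
countdown = 3
steps = []

while countdown > 0:       # condition checked before every pass
    steps.append(countdown)
    countdown -= 1         # without this update the loop would never end
```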

17 Python Library NumPy
- It provides multidimensional arrays and matrices, as well as functions to perform computations on them
- It allows us to perform advanced mathematical and statistical operations on these objects
- It provides vectorization of mathematical operations on arrays and matrices
- Many other Python libraries are built on top of NumPy
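As a rough illustration of vectorized operations on a NumPy array (assuming NumPy is installed; the matrix values are arbitrary):

```python
import numpy as np

matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

doubled = matrix * 2                   # vectorized: applies to every element
column_means = matrix.mean(axis=0)     # statistics per column: [2.0, 3.0]
matrix_product = matrix @ matrix       # matrix multiplication: [[7, 10], [15, 22]]
```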

18 Python Library Pandas It is a Data Analysis tool in Python
- It adds data structures (Series and DataFrame) and tools designed to work with table-like data (similar to a table in a SQL Server environment)
- It provides tools for data manipulation: selecting, reshaping, merging, sorting, slicing, aggregation, etc.
- It also handles missing data
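The sketch below walks through reading, filtering, sorting, grouping, and missing-data handling with pandas. The CSV content is made up, and it is read from an in-memory string so the example does not depend on any file.

```python
import io

import pandas as pd

# Hypothetical sales data; one sales value is missing on purpose.
csv_text = """region,sales
East,100
West,250
East,
West,150
"""

df = pd.read_csv(io.StringIO(csv_text))                 # reading data
missing_count = int(df["sales"].isna().sum())           # detect missing values
df["sales"] = df["sales"].fillna(0)                     # handle missing data

large_sales = df[df["sales"] > 120]                     # filtering rows
sorted_df = df.sort_values("sales", ascending=False)    # sorting
totals = df.groupby("region")["sales"].sum()            # grouping and aggregation
```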

19 Python Library Matplotlib
It is a primarily two-dimensional data plotting and data visualization library in Python. We can create line plots, scatter plots, bar charts, histograms, pie charts, etc.
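A small sketch of a line plot and a bar chart side by side, assuming Matplotlib is installed. The monthly figures are invented, and the non-interactive Agg backend is selected so the example also runs without a display; in a notebook you would normally call plt.show() instead of saving to a buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [10, 14, 9, 17]            # hypothetical values

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(months, sales, marker="o")
ax_line.set_title("Line plot")

ax_bar.bar(months, sales)
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Render the figure to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```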

20 Python Library Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics

21 Python Library SciKit-Learn
It provides machine learning algorithms and tools: classification, regression, clustering, model validation, etc. It is built on top of NumPy, SciPy, and matplotlib.

22 What is Machine Learning
- Create and use models that are learned from data
- Teach the computer by using data instead of explicitly writing the code for every rule
- The goal of ML is to use existing data to develop models that can be used to predict outcomes for new data sets
- Example: We see and use ML algorithms in our everyday life.
  - Predicting whether a message is spam or not
  - Predicting whether a credit card transaction is fraudulent or not
  - Predicting the share price
  - Predicting who will buy the product
  - Clustering customers into different groups based on some criteria

23 Supervised Machine Learning
Supervised learning uses labeled data. The primary goal of supervised learning is to learn from labeled sample data and make predictions for new, unseen data. For example: Will the loan be approved? (Yes/No)
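To illustrate the loan question, here is a toy sketch using scikit-learn's DecisionTreeClassifier. The six training rows, the two features (annual income in thousands and credit score), and the labels are all invented; a real model would need far more data and proper validation.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labeled data: [annual income (thousands), credit score]
X = [[25, 580], [40, 620], [60, 700], [85, 720], [30, 540], [95, 760]]
y = ["No", "No", "Yes", "Yes", "No", "Yes"]    # label: was the loan approved?

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)                                # learn from the labeled examples

# Predict the label for a new, unseen applicant.
prediction = model.predict([[70, 710]])[0]
print(prediction)
```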

24 Unsupervised Machine Learning
It uses unlabeled data. The goal of unsupervised learning is to model the underlying structure or pattern in the data. For example: Who are my loyal customers? (Customer segmentation based on certain criteria)
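Customer segmentation can be sketched with k-means clustering from scikit-learn. The customer values below are hypothetical, and the choice of two clusters is an assumption made for this example; no labels are given to the algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [orders per year, average order value]
customers = np.array([
    [2, 20], [3, 25], [1, 15],          # occasional, low-value buyers
    [30, 200], [28, 220], [35, 210],    # frequent, high-value buyers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)   # one cluster id per customer

print(segments)
```

The cluster ids themselves (0 or 1) are arbitrary; what matters is which customers end up in the same group.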

25 Common Types of Machine Learning Algorithms
- Supervised Learning
  - Classification
  - Regression
  - Decision Tree
- Unsupervised Learning
  - Clustering
  - Association
- Semi-Supervised Learning
  - Mix of Supervised and Unsupervised Learning
- Natural Language Processing
  - Text Mining

26 Demo
Machine Learning Model: Regression, Classification
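The transcript does not include the demo itself. As a stand-in, the following is a minimal classification sketch on scikit-learn's bundled Iris dataset; the dataset, the logistic regression model, and the 75/25 split are assumptions for illustration and not necessarily what was shown in the session.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load features (flower measurements) and labels (species).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows to test the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                    # model development

predictions = model.predict(X_test)            # test the model
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```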

27 Question & Answer
Thank you for attending the session, and have a great day! What feedback do you have for me? Questions: (Ganesh Lohani)

28 Useful Links https://www.python.org/downloads/

