Published by Ζέφυρος Στεφανόπουλος Modified over 5 years ago
1
An Introduction to Data Science using Python
Ganesh Lohani, Sr Data Analyst, Lockheed Martin
2
Meeting Agenda/Learning Outcome
What is Data Science?
Data Science Methodology
Python Basics
  Variables and Data Types
  Reading Data
  Selecting and Filtering the Data
  Data manipulation, sorting, grouping
Python Libraries for Data Science
  NumPy (Numerical Computation)
  Pandas (Data Analysis)
  Matplotlib (Data Visualization)
  SciKit-Learn (Machine Learning Algorithms)
Machine Learning
  Supervised Machine Learning
  Unsupervised Machine Learning
Demo on Machine Learning Classification
3
What is Data Science
Data science is the process of finding insights, trends, and intelligence that help business leaders make better decisions.
Data science is a relatively new field, deeply rooted in Statistics and Decision Support Systems.
It is a multidisciplinary field (Domain Knowledge, Tools & Technology, Mathematics & Statistics, Problem-Solving Skills).
4
Data Science Methodology
Statement of the problem/Objective of the study
Data preparation
Feature selection
Exploratory data analysis
Model development
Test the model/Hypothesis testing
Communicate the findings to the business leaders
Deployment (data as a product)
Feedback/Lessons learned and continuous improvement
5
Python Basics What is Python
A high-level, general-purpose programming language.
A very popular Data Science tool for data analysis, data visualization, and Machine Learning tasks.
It is an open-source and free tool.
6
Python Basics How to Download Python
Download Python from the following link. You can also download Python and Jupyter Notebook from the following link
7
Python Basics Common Tools in Python Environment
The Python interactive console: also called the Python interpreter or Python shell, it gives programmers a quick way to execute commands and try out or test code without creating a file.
Spyder: a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers, and data analysts. It offers a unique combination of advanced editing, analysis, debugging, and profiling.
Jupyter Notebook: an open-source web application that you can use to create and share work (code, equations, visualizations, Machine Learning models, and text).
8
Python Basics Most Popular Python Libraries for Data Science
9
Python Basics Variable and Types
A variable is a name that refers to a value held in memory; it acts as a placeholder for data.
Most common Python data types: float, int, str, list, tuple, dict (dictionary)
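A minimal sketch of the data types listed on this slide. The variable names and values are made up for illustration and are not part of the original presentation.

```python
# Illustrative values only; every name here is hypothetical.
price = 19.99                     # float
quantity = 3                      # int
product = "notebook"              # str
tags = ["office", "paper"]        # list
dimensions = (21.0, 29.7)         # tuple
stock = {"warehouse": 120}        # dict (dictionary)

# type() reports the type Python assigned to each value.
for value in (price, quantity, product, tags, dimensions, stock):
    print(type(value).__name__, value)
```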
10
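Python Basics Basic Operations in Python
Arithmetic operations: addition (+), subtraction (-), multiplication (*), division (/), modulo (%)
Relational operations: equal (==), greater than (>), greater than or equal (>=), less than (<), less than or equal (<=)
Logical operations: and, or, not; the result is True or False
Membership test: in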
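The operations on this slide can be tried in the interactive console. A short sketch with two arbitrary numbers chosen for illustration:

```python
a, b = 17, 5

# Arithmetic operations
print(a + b)   # 22
print(a - b)   # 12
print(a * b)   # 85
print(a / b)   # 3.4
print(a % b)   # 2  (remainder of 17 divided by 5)

# Relational operations return True or False
print(a == b)  # False
print(a > b)   # True
print(a <= b)  # False

# Logical operators combine conditions; "in" tests membership
print(a > 10 and b > 10)  # False
print(a > 10 or b > 10)   # True
print(b in [1, 3, 5])     # True
```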
11
Python Basics List A common Data type in python
A collection of comma-separated values (items) between square brackets.
Items can be of the same or different types.
Lists are mutable: values can be added, removed, updated or replaced, and the list can be sliced.
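A brief sketch of the list behaviors described above, using a made-up list of scores:

```python
scores = [88, "absent", 92.5]   # items of different types are allowed

scores.append(75)               # add an item at the end
scores[1] = 0                   # replace the item at index 1
scores.remove(92.5)             # remove the first item equal to 92.5

print(scores)        # [88, 0, 75]
print(scores[0:2])   # [88, 0]  (slice: items at index 0 and 1)
```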
12
Python Basics Tuple A common Type in Python
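A tuple is very similar to a list.
It is a collection of items inside parentheses ().
A tuple is immutable (its values cannot be changed in place).
You can slice a tuple, build a new tuple by concatenation, and delete the entire tuple with del, but you cannot add, change, or remove individual elements.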
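A small sketch of what a tuple does and does not allow. The values are arbitrary examples.

```python
point = (3, 4, 5)

print(point[0:2])       # (3, 4)  slicing works as with lists

# "Adding" an element really builds a new tuple; the original is unchanged.
longer = point + (6,)
print(longer)           # (3, 4, 5, 6)

# Assigning to an element fails because tuples are immutable.
try:
    point[0] = 99
except TypeError as err:
    print("cannot modify a tuple:", err)

del point               # deletes the whole tuple; the name no longer exists
```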
13
Python Basics Dictionary Another common and popular type in Python
A dictionary holds key-value pairs of data.
Since Python 3.7, dictionaries preserve insertion order; in older versions they were unordered.
The items are separated by commas, and the whole thing is enclosed in curly braces.
Keys must be immutable (hashable) and unique, but the values are mutable: entries can be added, modified, and deleted.
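A short sketch of adding, modifying, and deleting dictionary entries. The employee record is invented for illustration.

```python
employee = {"name": "Asha", "dept": "Analytics"}

employee["level"] = 2               # add a new key-value pair
employee["dept"] = "Data Science"   # modify the value of an existing key
del employee["level"]               # delete an entry

print(employee)   # {'name': 'Asha', 'dept': 'Data Science'}

# Iterate over key-value pairs
for key, value in employee.items():
    print(key, "->", value)
```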
14
Python Basics Function
A function is a reusable block of code.
We write the function once and call it whenever we need to solve that particular task.
Two types of function:
  Built-in (system) functions: max(), min(), len()
  User-defined functions: created by the programmer/developer
Main components of a function: input, computation, output
Global and local scope: variables defined inside a function are local to it
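A minimal sketch contrasting built-in functions with a user-defined one. The function name `average` and the sample data are hypothetical.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)          # "total" is local: it exists only inside the function
    return total / len(values)   # output


readings = [4, 8, 15, 16, 23, 42]   # global variable, used as the input

# Built-in functions
print(max(readings), min(readings), len(readings))   # 42 4 6

# User-defined function
print(average(readings))   # 18.0
```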
15
Python Basics Looping - For Loop
The for loop iterates over the elements of a sequence. It is often used when we want to repeat a piece of code "n" times.
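A sketch of both uses mentioned above: iterating over a sequence and repeating a block a fixed number of times. The city names are arbitrary.

```python
cities = ["Denver", "Austin", "Boston"]

# Iterate over the elements of a sequence
for city in cities:
    print(city.upper())

# Repeat a block n times with range(n), which yields 0, 1, ..., n-1
total = 0
for i in range(5):
    total += i
print(total)   # 10
```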
16
Python Basics Looping - While Loop
The while loop tells the computer to keep doing something as long as a condition is met. Its construct consists of a condition and a block of code.
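A small sketch of a while loop; the account balance scenario is invented for illustration.

```python
balance = 100
withdrawals = 0

# The block repeats while the condition stays true.
while balance >= 30:
    balance -= 30
    withdrawals += 1

print(withdrawals, balance)   # 3 10
```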
17
Python Library NumPy
It provides multidimensional arrays and matrices, as well as functions to perform computations on them.
It allows advanced mathematical and statistical operations on these objects.
It provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance.
Many other Python libraries are built on top of NumPy.
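A brief sketch of a two-dimensional array and vectorized operations, assuming NumPy is installed. The numbers are arbitrary.

```python
import numpy as np

# A 2 x 3 array (two rows, three columns)
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(matrix.shape)          # (2, 3)

# Vectorized arithmetic: applied to every element without a Python loop
scaled = matrix * 10
print(scaled)

# Statistical operations along an axis (axis=0 means "down the columns")
print(matrix.mean(axis=0))   # [2.5 3.5 4.5]
print(matrix.sum())          # 21.0
```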
18
Python Library Pandas It is a Data Analysis tool in Python
It adds data structures and tools designed to work with table-like data (similar to a table in a SQL Server environment).
It provides tools for data manipulation: selecting, reshaping, merging, sorting, slicing, aggregation, etc.
It also handles missing data.
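The agenda items on reading, filtering, sorting, and grouping data might look like the sketch below, assuming pandas is installed. The sales table is invented, and it is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical data; the West/Widget row has a missing sales value.
csv_text = """region,product,sales
East,Widget,120
West,Widget,
East,Gadget,80
West,Gadget,200
"""

df = pd.read_csv(io.StringIO(csv_text))             # reading data
df["sales"] = df["sales"].fillna(0)                 # handling missing data

east = df[df["region"] == "East"]                   # filtering rows
ranked = df.sort_values("sales", ascending=False)   # sorting
totals = df.groupby("region")["sales"].sum()        # grouping and aggregation

print(east[["product", "sales"]])                   # selecting columns
print(ranked.head(2))
print(totals)
```

With a real file, `pd.read_csv` would take the file path instead of the `io.StringIO` wrapper.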
19
Python Library Matplotlib
It is a two-dimensional plotting library in Python with a set of functionalities similar to those of MATLAB. We can create line plots, scatter plots, bar charts, histograms, pie charts, etc.
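A minimal sketch of a line plot and a bar chart, assuming Matplotlib is installed. The monthly figures and the output file name `sales_plots.png` are made up. The `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")   # render to a file; omit this line inside a notebook
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 90, 160]   # hypothetical values

# One figure with two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(months, sales, marker="o")
ax_line.set_title("Line plot")

ax_bar.bar(months, sales)
ax_bar.set_title("Bar chart")

fig.tight_layout()
fig.savefig("sales_plots.png")
```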
20
Python Library Seaborn http://seaborn.pydata.org
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics
21
Python Library SciKit-Learn
It provides machine learning algorithms and tools: classification, regression, clustering, model validation, etc. It is built on top of NumPy, SciPy, and matplotlib.
22
What is Machine Learning
ML refers to creating and using models that are learned from data.
It is a technique for teaching the computer with data instead of explicitly coding the rules.
The goal of ML is to use existing data to develop models that can predict outcomes for new data.
It is a branch of Artificial Intelligence (AI), deeply rooted in Statistics and Mathematics.
Examples:
  Predicting whether a message is spam or not
  Predicting whether a credit card transaction is fraudulent or not
  Predicting the share price
  Predicting who will buy the product
23
Supervised Machine Learning
Supervised learning uses labeled data.
The primary goal of supervised learning is to learn from labeled sample data and then make predictions for new, unseen data.
For example: Will the loan be approved? (Yes/No)
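A hedged sketch of a supervised classification workflow with scikit-learn. It is not the demo from the original session: it assumes scikit-learn is installed, uses the bundled iris dataset in place of loan data, and picks a decision tree as one possible classifier.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X holds the features, y holds the known labels (this is the "labeled data").
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)            # learn from labeled examples

predictions = model.predict(X_test)    # predict labels for unseen rows
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```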
24
Unsupervised Machine Learning
It uses unlabeled data.
The goal of unsupervised learning is to model the underlying structure or pattern in the data.
For example: Who are my loyal customers? (Customer segmentation based on certain criteria)
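A sketch of customer segmentation with k-means clustering, assuming NumPy and scikit-learn are installed. The customer data is synthetic, the two feature columns are hypothetical, and the choice of two clusters is an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic customers: columns are [annual spend, visits per month].
# No labels are given to the algorithm.
occasional = rng.normal(loc=[200, 1], scale=[30, 0.3], size=(50, 2))
loyal = rng.normal(loc=[1500, 8], scale=[100, 1.0], size=(50, 2))
customers = np.vstack([occasional, loyal])

# Ask k-means to find two groups based only on the feature values.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

print(np.bincount(segments))             # number of customers per segment
print(kmeans.cluster_centers_.round(1))  # average profile of each segment
```

On real data the features would usually be scaled first, since k-means is sensitive to the units of each column.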
25
Common Types of Machine Learning Algorithms
Supervised Learning
  Classification
  Regression
  Decision Tree
Unsupervised Learning
  Clustering
  Association
Semi-Supervised Learning
  Mix of supervised and unsupervised learning (for example, classification and clustering)
Natural Language Processing
  Text Mining
26
Machine Learning Model Classification
Demo Machine Learning Model Classification
27
Question & Answer Thank you for attending the session & Have a Great Day! What Feedback do you have for me? Questions: (Ganesh Lohani)
28
Useful Links https://www.python.org/downloads/