An Introduction to Data Science using Python


An Introduction to Data Science using Python Ganesh Lohani Sr Data Analyst Lockheed Martin ganeshlohani@hotmail.com

Meeting Agenda/Learning Outcome: What is Data Science? Data Science Methodology. Python Basics: variables and data types; reading data; selecting and filtering data; data manipulation, sorting, and grouping. Python Libraries for Data Science: NumPy (numerical computation), pandas (data analysis), Matplotlib (data visualization), scikit-learn (machine learning algorithms). Machine Learning: supervised machine learning, unsupervised machine learning. Demo on machine learning classification.

What is Data Science: Data science is the process of finding insights, trends, and intelligence in data that help business leaders make better decisions. It is a relatively new field, deeply rooted in statistics and decision support systems. It is a multidisciplinary field (domain knowledge, tools and technology, mathematics and statistics, problem-solving skills).

Data Science Methodology: (1) Statement of the problem/objective of the study; (2) Data preparation; (3) Feature selection; (4) Exploratory data analysis; (5) Model development; (6) Testing the model/hypothesis testing; (7) Communicating the findings to the business leaders; (8) Deployment (data as a product); (9) Feedback/lessons learned and continuous improvement.

Python Basics: What is Python. A high-level, general-purpose programming language. A very popular data science tool for data analysis, data visualization, and machine learning tasks. It is free and open source.

Python Basics: How to Download Python. Download Python from https://www.python.org/downloads. You can also download Python together with Jupyter Notebook from https://www.anaconda.com/why-anaconda/

Python Basics: Common Tools in the Python Environment. The Python interactive console: also called the Python interpreter or Python shell, it gives programmers a quick way to execute commands and try out or test code without creating a file. (https://www.python.org/shell/) Spyder: a scientific environment written in Python, for Python, and designed by and for scientists, engineers, and data analysts. It combines advanced editing, analysis, debugging, and profiling. (https://pypi.org/project/spyder/) Jupyter Notebook: an open-source web application that you can use to create and share work (code, equations, visualizations, machine learning models, and text). (https://jupyter.org)

Python Basics: Most Popular Python Libraries for Data Science

Python Basics: Variables and Types. A variable is a name that refers to a value held in memory. Most common Python data types: float, int, str, list, tuple, dict (dictionary).
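A minimal sketch of the types named on this slide. The variable names and values are illustrative assumptions, not taken from the presentation.

```python
# Each assignment binds a name to a value; type() reports the value's type.
price = 19.99                             # float
quantity = 3                              # int
product = "notebook"                      # str
tags = ["office", "paper"]                # list
dimensions = (21.0, 29.7)                 # tuple
stock = {"warehouse": 120, "store": 15}   # dict

for value in (price, quantity, product, tags, dimensions, stock):
    print(type(value).__name__, value)
```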

Python Basics: Basic Operations in Python. Arithmetic operations: addition, subtraction, multiplication, division, modulo. Relational operations: equal, greater than, greater than or equal, less than, less than or equal. Logical operations: and, or, and not combine True/False values; the in operator tests membership.
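The operations listed on this slide can be tried directly in the interactive console. The sketch below uses two arbitrary example numbers (an assumption for illustration).

```python
a, b = 17, 5

# Arithmetic operations
total = a + b          # 22
difference = a - b     # 12
product = a * b        # 85
quotient = a / b       # 3.4 (the / operator always returns a float)
remainder = a % b      # 2

# Relational operations evaluate to True or False
is_equal = a == b            # False
is_greater = a > b           # True
is_less_or_equal = a <= b    # False

# Logical operations combine True/False values
both = (a > 10) and (b > 10)     # False
either = (a > 10) or (b > 10)    # True
negated = not is_equal           # True

# Membership test
is_member = b in [1, 3, 5]       # True
```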

Python Basics: List. A common data type in Python. A collection of comma-separated values (items) between square brackets. Items can be of the same or different types. Lists are mutable: you can add, remove, and update or replace values, and slice the list to select members.
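A short sketch of the list behavior described above, using made-up values.

```python
# A list can mix types and can be changed in place.
items = [10, "ten", 10.0]

items.append(20)        # add a value      -> [10, "ten", 10.0, 20]
items.remove("ten")     # remove a value   -> [10, 10.0, 20]
items[0] = 5            # replace a value  -> [5, 10.0, 20]
first_two = items[:2]   # slice            -> [5, 10.0]

print(items, first_two)
```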

Python Basics: Tuple. A common data type in Python, very similar to a list. A collection of items inside parentheses (). A tuple is immutable (its values cannot be changed). It supports indexing and slicing; concatenation creates a new tuple rather than modifying the original, and del removes the entire tuple.
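To make the immutability point concrete, here is a small sketch with illustrative values. Note that "adding" elements really builds a new tuple.

```python
point = (3, 4, 5)

first_two = point[:2]        # slicing works as with lists -> (3, 4)

# Assigning to an element raises TypeError because tuples are immutable.
try:
    point[0] = 99
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Concatenation returns a new tuple; the original is left unchanged.
extended = point + (6,)      # (3, 4, 5, 6)

# del removes the name 'point' entirely.
del point

print(first_two, extended, assignment_failed)
```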

Python Basics: Dictionary. Another common and popular data type in Python. A dictionary holds key-value pairs of data (since Python 3.7 it preserves insertion order). The items are separated by commas, and the whole thing is enclosed in curly braces. Keys must be immutable, but values are mutable: you can add, modify, and delete entries.
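A brief sketch of adding, modifying, and deleting dictionary entries. The keys and values are hypothetical.

```python
employee = {"name": "Ana", "dept": "Finance"}

employee["salary"] = 72000      # add a new key-value pair
employee["dept"] = "Analytics"  # modify an existing value
del employee["name"]            # delete an entry

# A mutable object such as a list cannot be used as a key.
try:
    employee[["a", "list"]] = 1
    unhashable_key_rejected = False
except TypeError:
    unhashable_key_rejected = True

print(employee)
```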

Python Basics: Function. A function is a reusable block of code. We write the function once and call it whenever we need to solve that particular task. Two types of function: built-in functions such as max(), min(), len(); and user-defined functions, created by the programmer/developer. Main components of a function: input, computation, output. Scope: global and local variables.
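The sketch below contrasts built-in functions with a user-defined one. The function name `average` and the sample list are assumptions for illustration.

```python
values = [4, 9, 2]   # global variable: visible throughout the module

# Built-in functions
print(max(values), min(values), len(values))

# User-defined function: input -> computation -> output
def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = sum(numbers)          # local variable: exists only inside the function
    return total / len(numbers)

result = average(values)
print(result)
```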

Python Basics: Looping - For Loop. The for loop iterates over the elements of a sequence. It is often used when we have a piece of code that we want to repeat "n" number of times.
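Both uses of the for loop mentioned above, iterating over a sequence and repeating a block n times, can be sketched as follows (sample data is illustrative).

```python
# Iterate over the elements of a sequence
names = ["NumPy", "pandas", "Matplotlib"]
lengths = []
for name in names:
    lengths.append(len(name))

# Repeat a block n times using range()
n = 3
squares = []
for i in range(n):
    squares.append(i * i)

print(lengths, squares)
```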

Python Basics: Looping - While Loop. The while loop tells the computer to keep doing something as long as a condition is met. Its construct consists of a condition and a block of code.
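A minimal while-loop sketch with made-up numbers: the block repeats until the condition becomes false.

```python
balance = 100
withdrawals = 0

# Keep withdrawing 30 while the balance can cover it.
while balance >= 30:
    balance -= 30
    withdrawals += 1

print(withdrawals, balance)
```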

Python Library: NumPy. It provides multidimensional arrays and matrices, along with functions to compute on them. It supports advanced mathematical and statistical operations on these objects. It vectorizes mathematical operations on arrays and matrices, which significantly improves performance. Many other Python libraries are built on top of NumPy. https://numpy.org
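As a rough illustration of vectorization, the sketch below applies operations to a whole array at once instead of looping over elements. It assumes NumPy is installed; the array values are arbitrary.

```python
import numpy as np

# A 2 x 3 array (two rows, three columns)
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

doubled = matrix * 2                  # vectorized: applied to every element
column_means = matrix.mean(axis=0)    # mean of each column
total = matrix.sum()                  # sum of all elements

print(matrix.shape, doubled, column_means, total)
```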

Python Library: pandas. It is a data analysis tool for Python. It adds data structures and tools designed to work with table-like data (similar to a table in a SQL Server environment). It provides tools for data manipulation: selecting, reshaping, merging, sorting, slicing, aggregation, etc. It also handles missing data. https://pandas.pydata.org
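A small sketch of the manipulations listed above (handling missing data, filtering, grouping, sorting). It assumes pandas is installed; the `sales` table and its column names are hypothetical.

```python
import pandas as pd

# Table-like data with one missing value
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units": [10, None, 7, 3],
})

sales["units"] = sales["units"].fillna(0)              # handle missing data
east = sales[sales["region"] == "East"]                # select/filter rows
totals = sales.groupby("region")["units"].sum()        # group and aggregate
ranked = sales.sort_values("units", ascending=False)   # sort

print(totals)
print(ranked)
```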

Python Library: Matplotlib. It is a two-dimensional plotting library for Python with a set of functionalities similar to those of MATLAB. We can create line plots, scatter plots, bar charts, histograms, pie charts, etc. https://matplotlib.org
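A minimal plotting sketch, assuming Matplotlib is installed. The data is invented, and the figure is rendered to an in-memory buffer with the non-interactive "Agg" backend so the example runs without a display; in a notebook you would typically just call `plt.show()`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
revenue = [12, 15, 11, 18]

# Two plots side by side: a line plot and a bar chart
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
left.plot(months, revenue, marker="o")
left.set_title("Line plot")
right.bar(months, revenue)
right.set_title("Bar chart")

# Render the figure as PNG bytes in memory
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```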

Python Library: Seaborn. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. http://seaborn.pydata.org

Python Library: scikit-learn. It provides machine learning algorithms and tools: classification, regression, clustering, model validation, etc. It is built on top of NumPy, SciPy, and Matplotlib. https://scikit-learn.org/stable

What is Machine Learning: ML refers to creating and using models that are learned from data. It is a technique for teaching the computer with data instead of explicitly writing the rules in code. The goal of ML is to use existing data to develop models that can be used to predict outcomes for new data. It is a branch of Artificial Intelligence (AI) and deeply rooted in statistics and mathematics. Examples: predicting whether an email message is spam or not; predicting whether a credit card transaction is fraudulent or not; predicting the share price; predicting who will buy the product.

Supervised Machine Learning: Supervised learning uses labeled data. Its primary goal is to learn from labeled sample data in order to make predictions on new, unseen data. For example: Will the loan be approved? (Yes/No)

Unsupervised Machine Learning: It uses unlabeled data. The goal of unsupervised learning is to model the underlying structure or pattern in the data. For example: Who are my loyal customers? (Customer segmentation based on certain criteria)
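The customer-segmentation example can be sketched with k-means clustering. This is only an illustration, assuming scikit-learn is installed: the six customers, their two features (visits per month, yearly spend), and the choice of two clusters are all invented, and no labels are passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [visits per month, yearly spend]
customers = np.array([
    [2, 100], [3, 120], [2, 90],       # occasional shoppers
    [20, 900], [22, 950], [19, 870],   # frequent shoppers
])

# Ask for two segments; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = model.fit_predict(customers)

print(segments)
```

The cluster numbers themselves (0 or 1) are arbitrary; what matters is which customers end up in the same group.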

Common Types of Machine Learning Algorithms: Supervised learning: classification, regression, decision tree. Unsupervised learning: clustering, association. Semi-supervised learning: a mix of supervised and unsupervised learning (for example, classification and clustering). Natural language processing: text mining.

Demo: Machine Learning Model Classification
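The transcript does not include the demo itself. As a stand-in, here is a minimal classification sketch, not the presenter's actual demo. It assumes scikit-learn is installed and uses its bundled Iris dataset with a decision tree; the split ratio and random seed are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled data: flower measurements (X) and species labels (y)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows to test the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train the classifier on the labeled training rows
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Predict labels for the held-out rows and measure accuracy
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

The same pattern (split, fit, predict, score) applies to the other scikit-learn classifiers.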

Question & Answer Thank you for attending the session & Have a Great Day! What Feedback do you have for me? Questions: ganeshlohani@Hotmail.com (Ganesh Lohani)

Useful Links https://www.python.org/downloads/ https://www.python.org/doc/ https://www.datasciencecentral.com/ https://www.kaggle.com/ https://azure.microsoft.com/en-us/services/machine-learning-studio/ https://machinelearningmastery.com/machine-learning-in-python-step-by-step/