Social Computing and Big Data Analytics 社群運算與大數據分析 1 1042SCBDA06 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Finance Big Data Analytics with.

Slides:



Advertisements
Similar presentations
Suggested Course Outline Cloud Computing Bahga & Madisetti, © 2014Book website:
Advertisements

Technology of Data Analytics. INTRODUCTION OBJECTIVE  Data Analytics mindset – shallow and wide, deep when you need it  Quick overview, useful tidbits,
Python Lab Proteomics Informatics, Spring 2014 Week 1 28 th Jan, 2014 Himanshu Grover
Sarah Reonomy OSCON 2014 ANALYZING DATA WITH PYTHON.
Financial Engineering Club Career Path and Prep. Entry Level Career Paths Type 1: Research based Background: Physics, Electrical Engineering, Applied.
Opportunities in Quantitative Finance in the Department of Mathematics.
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
Chapter 5 Application Software.
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
© Paradigm Publishing, Inc. 5-1 Chapter 5 Application Software Chapter 5 Application Software.
© Paradigm Publishing Inc. 5-1 Chapter 5 Application Software.
ACIS Introduction to Data Analytics & Business Intelligence Big Data Import.
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
A Powerful Python Library for Data Analysis BY BADRI PRUDHVI BADRI PRUDHVI.
Paperless Publishing web publishing. ebooks. digital paper.
Computer Engineering 1502 Advanced Digital Design Professor Donald Chiarulli Computer Science Dept Sennott Square
Scientific Python Numpy Linear Algebra Matrices Scipy Signal Processing Optimization IPython Interactive Console Matplotlib Plotting Sympy Symbolic Math.
Python for: Data Science. Python  Python is an open source scripting language.  Developed by Guido Van Rossum in late 1980s  Named after Monty Python.
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
DATA MINING Pandas. Python Data Analysis Library A library for data analysis of (mostly) tabular data Gives capabilities similar to Excel and SQL but.
Big Data Analytics Platforms. Our Team NameApplication Viborov MichaelApache Spark Bordeynik YanivApache Storm Abu Jabal FerasHPCC Oun JosephGoogle BigQuery.
Practical Kinetics Exercise 2: Integral Rate Laws Objectives: 1.Import Experimental Data (Excel, CSV) 2.Fit to zero-, first-, and second-order rate laws.
Python & NetworkX Youn-Hee Han
Social Media Apps Programming Min-Yuh Day, Ph.D. Assistant Professor Department of Information Management Tamkang University
Social Computing and Big Data Analytics 社群運算與大數據分析
Social Computing and Big Data Analytics 社群運算與大數據分析
Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA02 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (Q201) Data Science and Big Data Analytics:
D ATA S CIENTISTS Who are they and what do they do?
1 Seattle University Master’s of Science in Business Analytics Key skills, learning outcomes, and a sample of jobs to apply for, or aim to qualify for,
Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA10 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Deep Learning with Google TensorFlow.
社群網路行銷管理 Social Media Marketing Management SMMM07 MIS EMBA (M2200) (8615) Thu, 12,13,14 (19:20-22:10) (D309) Min-Yuh Day 戴敏育 Assistant Professor.
Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA05 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Big Data Analytics with Numpy in.
Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA12 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Social Network Analysis ( 社會網絡分析.
Social Computing and Big Data Analytics 社群運算與大數據分析
Social Computing and Big Data Analytics 社群運算與大數據分析
Hadoop Big Data Usability Tools and Methods. On the subject of massive data analytics, usability is simply as crucial as performance. Right here are three.
How to Get Started With Python
Social Computing and Big Data Analytics 社群運算與大數據分析
Big data toolbox.
Python for data analysis Prakhar Amlathe Utah State University
Social Computing and Big Data Analytics 社群運算與大數據分析
Hadoop and Analytics at CERN IT
Big Data A Quick Review on Analytical Tools
Metis Data Science Meetup:
Map Reduce.
IPYTHON AND MATPLOTLIB Python for computational science
DATA MINING Python.
External libraries A very complete list can be found at PyPi the Python Package Index: To install, use pip, which comes with.
Python for Quant Finance
Network Visualization
Python Visualization Tools: Pandas, Seaborn, ggplot
Social Media Marketing Analytics 社群網路行銷分析
Data Science with Python Pandas
Deep Learning Packages
Preparing your Data using Python
Preparing your Data using Python
1.
Course Introduction CSC 576: Data Mining.
商業智慧實務 Practices of Business Intelligence
Discrete-Event Simulation and Performance Evaluation
Matplotlib and Pandas
Collecting, Analyzing, and Visualizing Data with Python Part I
Researching Industry Financial Statistics
Collecting, Analyzing, and Visualizing Data with Python Part II
Igor Stančin, Alan Jović to: {igor.stancin,
Beating the market -- forecasting the S&P 500 Index
Data Analytics Certificate Program
Presentation transcript:

Social Computing and Big Data Analytics 社群運算與大數據分析 SCBDA06 MIS MBA (M2226) (8628) Wed, 8,9, (15:10-17:00) (B309) Finance Big Data Analytics with Pandas in Python (Python Pandas 財務大數據分析 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang University 淡江大學 淡江大學 資訊管理學系 資訊管理學系 tku.edu.tw/myday/ Tamkang University

週次 (Week) 日期 (Date) 內容 (Subject/Topics) /02/17 Course Orientation for Social Computing and Big Data Analytics ( 社群運算與大數據分析課程介紹 ) /02/24 Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data ( 資料科學與大數據分析: 探索、分析、視覺化與呈現資料 ) /03/02 Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem ( 大數據基礎: MapReduce 典範、 Hadoop 與 Spark 生態系統 ) 課程大綱 (Syllabus) 2

週次 (Week) 日期 (Date) 內容 (Subject/Topics) /03/09 Big Data Processing Platforms with SMACK: Spark, Mesos, Akka, Cassandra and Kafka ( 大數據處理平台 SMACK : Spark, Mesos, Akka, Cassandra, Kafka) /03/16 Big Data Analytics with Numpy in Python (Python Numpy 大數據分析 ) /03/23 Finance Big Data Analytics with Pandas in Python (Python Pandas 財務大數據分析 ) /03/30 Text Mining Techniques and Natural Language Processing ( 文字探勘分析技術與自然語言處理 ) /04/06 Off-campus study ( 教學行政觀摩日 ) 課程大綱 (Syllabus) 3

週次 (Week) 日期 (Date) 內容 (Subject/Topics) /04/13 Social Media Marketing Analytics ( 社群媒體行銷分析 ) /04/20 期中報告 (Midterm Project Report) /04/27 Deep Learning with Theano and Keras in Python (Python Theano 和 Keras 深度學習 ) /05/04 Deep Learning with Google TensorFlow (Google TensorFlow 深度學習 ) /05/11 Sentiment Analysis on Social Media with Deep Learning ( 深度學習社群媒體情感分析 ) 課程大綱 (Syllabus) 4

週次 (Week) 日期 (Date) 內容 (Subject/Topics) /05/18 Social Network Analysis ( 社會網絡分析 ) /05/25 Measurements of Social Network ( 社會網絡量測 ) /06/01 Tools of Social Network Analysis ( 社會網絡分析工具 ) /06/08 Final Project Presentation I ( 期末報告 I) /06/15 Final Project Presentation II ( 期末報告 II) 課程大綱 (Syllabus) 5

6 Source:

Yahoo Finance Charts Apple Inc. (AAPL) 7

8 Source: Wes McKinney (2012), Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Media

Yves Hilpisch, Python for Finance: Analyze Big Financial Data, O'Reilly, Source:

10 Source: Yves Hilpisch (2015), Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging, Wiley

11 Michael Heydt, Mastering Pandas for Finance, Packt Publishing, 2015 Source:

12 pandas

pandas Python Data Analysis Library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. 13 Source:

pandas: powerful Python data analysis toolkit 14

Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet Ordered and unordered (not necessarily fixed- frequency) time series data. Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure 15 pandas: powerful Python data analysis toolkit Source:

Series DataFrame The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. 16 Source:

pandas DataFrame DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries. 17

pandas Comparison with SAS pandasSAS DataFramedata set columnvariable rowobservation groupbyBY-group NaN. 18 Source:

Yahoo Finance 19

20 conda update pandas

21 jupyter notebook

Jupyter New Python 2 Notebook 22

23 Jupyter New Python 2 Notebook import numpy as np import pandas as pd import matplotlib.pyplot as plt print('Hello Pandas') Source:

24 echo $SHELL

25 ls –a touch ~/.bash_profile; open ~/.bash_profile.bash_profile

26. bash_profile

27

28 ls –a -l.bash_profile

29 $ sudo vim ~/.bash_profile

30 :w $ sudo vim ~/.bash_profile i

31 :x.bash_profile

32 $ source ~/.bash_profile

33 import numpy as np import pandas as pd import matplotlib.pyplot as plt print('Hello Pandas') s = pd.Series([1,3,5,np.nan,6,8]) s dates = pd.date_range(' ', periods=6) dates Source:

34

35 df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD')) df

36 df = pd.DataFrame(np.random.randn(4,6), index=['student1','student2','student3','student4'], columns=list('ABCDEF')) df

37 df2 = pd.DataFrame({ 'A' : 1., 'B' : pd.Timestamp(' '), 'C' : pd.Series(2.5,index=list(range(4)),dtype='float32'), 'D' : np.array([3] * 4,dtype='int32'), 'E' : pd.Categorical(["test","train","test","train"]), 'F' : 'foo' }) df2

38 df2.dtypes

Apple Inc. (AAPL) -NasdaqGS 39

Yahoo Finance Charts Apple Inc. (AAPL) 40

Apple Inc. (AAPL) 41

42 Yahoo Finance Historical Prices Apple Inc. (AAPL)

Yahoo Finance Historical Prices 43 Date,Open,High,Low,Close,Volume,Adj Close ,105.25, , , , , ,105.93, , , , , , ,106.50, , , , , , , , , , , , , , , , , ,105.18, , , , , , , , , , , , ,101.50, , , , , , , , , , , , , , , , , , , , , , , , , , , , ,103.75, , , , , , , ,101.50, , , , , ,100.75, , , , , , , , , , , , , , , , , , , , , , ,95.25, , , , , ,93.32, , , , ,96.50, , , , , , , , , , ,96.00, , , , , , , , , , , , , , , , , table.csv

44 Yahoo Finance Charts Alphabet Inc. (GOOG) Alphabet Inc. (GOOG)

45 Taiwan Semiconductor Manufacturing Company Limited (2330.TW)

46 Yahoo Finance Charts TSMC (2330.TW)

Dow Jones Industrial Average (^DJI) 47

TSEC weighted index (^TWII) - Taiwan 48

49 import pandas.io.data as web #df = web.DataReader('AAPL', 'yahoo') df = web.DataReader('AAPL', data_source='yahoo', start='1/1/2010', end='3/22/2016') #df = web.DataReader('AAPL', data_source='google', start='1/1/2010', end='3/22/2016') df.tail()

50 df = web.DataReader('GOOG', data_source='yahoo', start='1/1/1980', end='3/23/2016') df.head(10)

51 df.tail(10)

52 df2.count()

53 import pandas.io.data as web import datetime start = datetime.datetime(2010, 1, 1) end = datetime.datetime(2015, 12, 31) f = web.DataReader(”F", 'yahoo', start, end) f.ix[' ']

54 import pandas.io.data as web import datetime start = datetime.datetime(2010, 1, 1) end = datetime.datetime(2016, 3, 23) f = web.DataReader("2330.TW", 'yahoo', start, end) f.ix[' ']

55 f.to_csv('2330.TW.Yahoo.Finance.Data.csv')

56 sSymbol = "AAPL" #sSymbol = "GOOG" #sSymbol = "IBM" #sSymbol = "MSFT" #sSymbol = "^TWII" #sSymbol = " SS" #sSymbol = "2330.TW" #sSymbol = "2317.TW" # sURL = " # sBaseURL = " sURL = " + sSymbol #req = requests.get(" #req = requests.get(" req = requests.get(sURL) sText = req.text #print(sText) #df = web.DataReader(sSymbol, 'yahoo', starttime, endtime) #df = web.DataReader("2330.TW", 'yahoo') sPath = "data/" sPathFilename = sPath + sSymbol + ".csv" print(sPathFilename) f = open(sPathFilename, 'w') f.write(sText) f.close() sIOdata = io.StringIO(sText) df = pd.DataFrame.from_csv(sIOdata) df.head(5)

57

58 sSymbol = "AAPL" #sSymbol = "GOOG" #sSymbol = "IBM" #sSymbol = "MSFT" #sSymbol = "^TWII" #sSymbol = " SS" #sSymbol = "2330.TW" #sSymbol = "2317.TW" # sURL = " # sBaseURL = " sURL = " + sSymbol #req = requests.get(" #req = requests.get(" req = requests.get(sURL) sText = req.text #print(sText) #df = web.DataReader(sSymbol, 'yahoo', starttime, endtime) #df = web.DataReader("2330.TW", 'yahoo') sPath = "data/" sPathFilename = sPath + sSymbol + ".csv" print(sPathFilename) f = open(sPathFilename, 'w') f.write(sText) f.close() sIOdata = io.StringIO(sText) df = pd.DataFrame.from_csv(sIOdata) df.head(5)

59

60 df.tail(5)

61 sSymbol = "AAPL” # sURL = " sURL = " + sSymbol #req = requests.get(" req = requests.get(sURL) sText = req.text #print(sText) sPath = "data/" sPathFilename = sPath + sSymbol + ".csv" print(sPathFilename) f = open(sPathFilename, 'w') f.write(sText) f.close() sIOdata = io.StringIO(sText) df = pd.DataFrame.from_csv(sIOdata) df.head(5)

62

63

64 def getYahooFinanceData(sSymbol, starttime, endtime, sDir): #GetMarketFinanceData_From_YahooFinance #"^TWII" #" SS" #"AAPL" #SHA:000016" #" SS" #"2330.TW" #sSymbol = "^TWII" starttime = datetime.datetime(2000, 1, 1) endtime = datetime.datetime(2015, 12, 31) sPath = sDir #sPath = "data/financedata/" df_YahooFinance = web.DataReader(sSymbol, 'yahoo', starttime, endtime) #df_01 = web.DataReader("2330.TW", 'yahoo') sSymbol = sSymbol.replace(":","_") sSymbol = sSymbol.replace("^","_") sPathFilename = sPath + sSymbol + "_Yahoo_Finance.csv" df_YahooFinance.to_csv(sPathFilename) #df_YahooFinance.head(5) return sPathFilename #End def getYahooFinanceData(sSymbol, starttime, endtime, sDir):

65

66 sSymbol = "AAPL” starttime = datetime.datetime(2000, 1, 1) endtime = datetime.datetime(2015, 12, 31) sDir = "data/financedata/" sPathFilename = getYahooFinanceData(sSymbol, starttime, endtime, sDir) print(sPathFilename)

References Wes McKinney (2012), Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly Media Yves Hilpisch (2014), Python for Finance: Analyze Big Financial Data, O'Reilly Yves Hilpisch (2015), Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging, Wiley Michael Heydt (2015), Mastering Pandas for Finance, Packt Publishing Michael Heydt (2015), Learning Pandas - Python Data Discovery and Analysis Made Easy, Packt Publishing James Ma Weiming (2015), Mastering Python for Finance, Packt Publishing Fabio Nelli (2015), Python Data Analytics: Data Analysis and Science using PANDAs, matplotlib and the Python Programming Language, Apress Wes McKinney (2013), 10-minute tour of pandas, Jason Wirth (2015), A Visual Guide To Pandas, Ti6onewhttps:// Ti6onew Edward Schofield (2013), Modern scientific computing and big data analytics in Python, PyCon Australia, 67