Focus
There are too many distractions, so it is very important to focus.
This post downloads data from Yahoo Finance and does some basic EDA.
!conda env list
%matplotlib inline
# Importing the Data
from pandas_datareader import data
import pandas as pd
# Define the instruments to download. We would like to see a basket of retail REITs.
# tickers = ['W', 'HD', 'AMZN', 'LOW', 'LL']
tickers = ['ROIC', 'SKT', 'TCO', 'SPG', 'MAC']
# Define which online source one should use
data_source = 'yahoo'
# We would like all available data from 12/01/2016 until 12/31/2017.
start_date = '2016-12-01'
end_date = '2017-12-31'
# Use pandas_datareader.data.DataReader to load the desired data. As simple as that.
panel_data = data.DataReader(tickers, data_source, start_date, end_date)
panel_data.head()
df = panel_data['Adj Close']
# Basic Description of the Data
df.describe()
first = df.head()
last = df.tail()
print(first)
print(last)
df.sample(6)
# A Closer Look At Your Data: Queries
df.query('MAC == ROIC')
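To show what query() is doing without depending on the downloaded prices, here is a minimal, self-contained sketch on a made-up two-column table (the numbers are invented purely for illustration):

```python
import pandas as pd

# Toy price table with made-up numbers
prices = pd.DataFrame({'MAC': [20.0, 21.5, 21.5],
                       'ROIC': [19.0, 21.5, 22.0]})

# Rows where the two tickers closed at the same price
equal_days = prices.query('MAC == ROIC')
print(equal_days)

# query() can also reference Python variables with the @ prefix
threshold = 21.0
above = prices.query('MAC > @threshold')
print(above)
```

The expression is evaluated against column names, which keeps the filter readable compared with chained boolean indexing.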
#cleaning
print(df.columns[df.isnull().any()])
# Getting all weekdays between 12/01/2016 and 12/31/2017
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
# How do we align the existing prices in adj_close with our new set of dates?
# All we need to do is reindex adj_close using all_weekdays as the new index
df = df.reindex(all_weekdays)
# Reindexing will insert missing values (NaN) for the dates that were not present
# in the original set. To cope with this, we can fill the missing by replacing them
# with the latest available price for each instrument.
df = df.ffill()
df.isnull().head()
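The reindex-then-forward-fill step is easier to see in isolation. Here is a minimal sketch with made-up prices observed on scattered dates (the values and dates are hypothetical):

```python
import pandas as pd

# Made-up prices observed on three scattered dates
sparse = pd.Series([10.0, 11.0, 12.0],
                   index=pd.to_datetime(['2017-01-02', '2017-01-04', '2017-01-06']))

# All business days in the window (Mon 01/02 through Fri 01/06)
weekdays = pd.date_range('2017-01-02', '2017-01-06', freq='B')

# Reindexing inserts NaN for the days with no observation...
aligned = sparse.reindex(weekdays)
# ...and forward-filling carries the last known price forward
filled = aligned.ffill()
print(filled)
```

After filling, 01/03 repeats Monday's price and 01/05 repeats Wednesday's, which is exactly what happens to the REIT series above on market holidays.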
# Define your own bins
mybins = range(int(df.MAC.min()), int(df.MAC.max()), 2)
# Cut the data with the help of the bins
df['MAC_bucket'] = pd.cut(df.MAC, bins=mybins)
# Count the number of values per bucket
df['MAC_bucket'].value_counts()
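The binning step can be sketched with synthetic data as well. The bin edges below (20 to 26 in steps of 2) are chosen to cover the made-up prices, not taken from the real MAC series:

```python
import pandas as pd

# Made-up closing prices
prices = pd.Series([20.1, 21.3, 22.8, 23.9, 24.5, 21.9])

# Bin edges at 20, 22, 24, 26 give three half-open buckets of width 2
mybins = range(20, 27, 2)
buckets = pd.cut(prices, bins=mybins)

# Count the number of prices that fall in each bucket
print(buckets.value_counts())
```

Note that pd.cut assigns NaN to values outside the edges, which is why the real code above derives the edges from df.MAC.min() and df.MAC.max().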
from pandas_datareader import data
import pandas as pd
%matplotlib inline
# Define the instruments to download. We would like to see a basket of retail REITs.
# tickers = ['W', 'HD', 'AMZN', 'LOW', 'LL']
tickers = ['ROIC', 'SKT', 'TCO', 'SPG', 'MAC']
# Define which online source one should use
data_source = 'yahoo'
# We would like all available data from 12/01/2016 until 12/31/2017.
start_date = '2016-12-01'
end_date = '2017-12-31'
# Use pandas_datareader.data.DataReader to load the desired data. As simple as that.
panel_data = data.DataReader(tickers, data_source, start_date, end_date)
# Getting just the closing prices. This will return a pandas DataFrame,
# indexed by date, with one column per ticker. (.ix has been removed from pandas.)
close = panel_data['Close']
# Getting all weekdays between 12/01/2016 and 12/31/2017
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
# How do we align the existing prices in adj_close with our new set of dates?
# All we need to do is reindex close using all_weekdays as the new index
close = close.reindex(all_weekdays)
close.head()
# Getting just the adjusted closing prices. This will return a pandas DataFrame,
# indexed by date, with one column per ticker. (.ix has been removed from pandas.)
adj_close = panel_data['Adj Close']
# Getting all weekdays between 12/01/2016 and 12/31/2017
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
# How do we align the existing prices in adj_close with our new set of dates?
# All we need to do is reindex adj_close using all_weekdays as the new index
adj_close = adj_close.reindex(all_weekdays)
# Reindexing will insert missing values (NaN) for the dates that were not present
# in the original set. To cope with this, we can fill the missing by replacing them
# with the latest available price for each instrument.
adj_close = adj_close.ffill()
adj_close.describe()
# Define your own bins
mybins = range(int(df.MAC.min()), int(df.MAC.max()), 10)
# Cut the data with the help of the bins
df['MAC_bucket'] = pd.cut(df.MAC, bins=mybins)
# Count the number of values per bucket
df['MAC_bucket'].value_counts()
Reference: http://tetration.xyz/lumpsum_vs_dca/
import pandas as pd
import pandas_datareader.data as web
import datetime
pd.set_option('display.width', 200) # Displaying more columns in one row
# Data date range
start = datetime.datetime(2017, 9, 1)
end = datetime.datetime(2018, 2, 3)
spy = web.DataReader("AAPL", "yahoo", start, end)  # variable name 'spy' kept from the original SPY example
print(spy.head()) # See first few rows
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from matplotlib import style
style.use('fivethirtyeight')
ax = spy['Adj Close'].plot(figsize=(20, 10))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x)))  # Dollar symbols on the y axis
plt.title('AAPL Historical Price on Close')
plt.xlabel('')
plt.ylabel('Stock Price ($)');
value_price = spy['Adj Close'].iloc[-1] # The final value of our stock
initial_investment = 10000 # Our initial investment of $10k
num_stocks_bought = initial_investment / spy['Adj Close']
lumpsum = num_stocks_bought * value_price
lumpsum.name = 'Lump Sum'
ax = lumpsum.plot(figsize=(20, 10))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x)))  # Dollar symbols on the y axis
plt.title('Lump sum - Value today of $10,000 invested on date')
plt.xlabel('')
plt.ylabel('Investment Value ($)');
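The lump-sum series is just a ratio of the final price to each day's price. A tiny sketch with made-up adjusted closes makes the arithmetic concrete (the three prices are invented for illustration):

```python
import pandas as pd

# Made-up adjusted closes on three dates
adj_close = pd.Series([100.0, 125.0, 110.0])

initial_investment = 10000
value_price = adj_close.iloc[-1]  # final price: 110

# Shares bought if the whole $10k went in on each date...
num_stocks_bought = initial_investment / adj_close
# ...and what those shares are worth at the final price
lumpsum = num_stocks_bought * value_price
print(lumpsum)
```

Buying at 100 yields 100 shares worth $11,000 today, buying at the 125 peak yields only $8,800, and buying at the final price is by construction worth exactly the $10,000 invested.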
def doDCA(investment, start_date):
    # Get 12 investment dates in 7-day increments starting from the start date
    investment_dates_all = pd.date_range(start_date, periods=12, freq='7D')
    # Remove those dates beyond our known data range
    investment_dates = investment_dates_all[investment_dates_all < spy.index[-1]]
    # Get the closest business dates with available data (positional indices)
    closest_investment_dates = spy.index.searchsorted(investment_dates)
    # How much to invest on each date
    portion = investment / 12.0  # Python 3 division always returns a float; Python 2.7 would truncate ints
    # Get the total number of shares purchased across those dates (at the adjusted close)
    stocks_invested = sum(portion / spy['Adj Close'].iloc[closest_investment_dates])
    # Add the uninvested amount back
    uninvested_dollars = portion * sum(investment_dates_all >= spy.index[-1])
    # Value of the stocks today
    total_value = value_price * stocks_invested + uninvested_dollars
    return total_value
# Generate DCA series for every possible date
dca = pd.Series(spy.index.map(lambda x: doDCA(initial_investment, x)), index=spy.index, name='Dollar Cost Averaging (DCA)')
ax = dca.plot(figsize=(20, 10))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x)))  # Dollar symbols on the y axis
plt.title('Dollar Cost Averaging - Value today of $10,000 invested on date')
plt.xlabel('')
plt.ylabel('Investment Value ($)');
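The core of doDCA, stripped of the date handling, is just "buy an equal dollar portion at each price and value the shares at the final price". A self-contained sketch with made-up prices (four invented adjusted closes) shows the effect:

```python
import pandas as pd

# Made-up adjusted closes on four consecutive investment dates
prices = pd.Series([100.0, 80.0, 125.0, 110.0])
investment = 10000
final_price = prices.iloc[-1]

# Invest an equal portion at each price (the heart of doDCA, without date logic)
portion = investment / len(prices)
shares = sum(portion / prices)
dca_value = shares * final_price
print(round(dca_value, 2))
```

Because each fixed-dollar purchase buys more shares when the price dips (here, at 80), the averaged result lands between the best and worst lump-sum outcomes.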
Step 1: Download and install Anaconda
Step 2: Launch IPython
Step 3: Copy code from below and paste to your Notebook
Download and install Anaconda (this gives you ipython and a nice environment)
Install Nikola
Follow the tutorial
Helpful Links:
http://www.damian.oquanta.info/posts/ipython-plugin-for-nikola-updated.html
https://shankarmsy.github.io/posts/blogging-with-the-awesome-nikola-ipython-and-github.html
# http://nbviewer.jupyter.org/gist/theandygross/4544012
import sys
print('Python: {}'.format(sys.version))
print ('This is really exciting')
print (12)
print ('hello, notebook blog :)')
Problem 1: conda virtual environments don't work for me.
https://conda.io/docs/using/envs.html#create-an-environment
conda create -n blog python
jakehku blog $ conda create -n blog
Fetching package metadata ...........
Solving package specifications:
Package plan for installation in environment /anaconda/envs/blog:
Proceed ([y]/n)? y
# To activate this environment, use:
# > source activate blog
# To deactivate an active environment, use:
# > source deactivate
Deploying to GitHub
Nikola provides a separate command github_deploy to deploy your site to GitHub Pages. The command builds the site, commits the output to a gh-pages branch and pushes the output to GitHub. Nikola uses the ghp-import command for this.
In order to use this feature, you need to configure a few things first. Make sure you have nikola and git installed on your PATH.
Initialize a Nikola site, if you haven’t already.
Initialize a git repository in your Nikola source directory by running:
git init .
git remote add origin git@github.com:user/repository.git

Set up branches and remotes in conf.py:
GITHUB_DEPLOY_BRANCH is the branch where Nikola-generated HTML files will be deployed. It should be gh-pages for project pages and master for user pages (user.github.io).
GITHUB_SOURCE_BRANCH is the branch where your Nikola site source will be deployed. We recommend and default to src.
GITHUB_REMOTE_NAME is the remote to which changes are pushed.
GITHUB_COMMIT_SOURCE controls whether or not the source branch is automatically committed to and pushed. We recommend setting it to True, unless you are automating builds with Travis CI.

Create a .gitignore file. We recommend adding at least the following entries:

cache
.doit.db
__pycache__
output

If you set GITHUB_COMMIT_SOURCE to False, you must switch to your source branch and commit to it. Otherwise, this is done for you.
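Taken together, the conf.py settings described above look something like this sketch (the branch and remote names are example values; adjust them to your own repository):

```python
# conf.py fragment for github_deploy (example values)
GITHUB_SOURCE_BRANCH = 'src'       # where the Nikola site source lives
GITHUB_DEPLOY_BRANCH = 'gh-pages'  # where the generated HTML is pushed
GITHUB_REMOTE_NAME = 'origin'      # remote that receives both branches
GITHUB_COMMIT_SOURCE = True        # auto-commit and push the source branch
```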
Run nikola github_deploy. This will build the site, commit the output folder to your deploy branch, and push to GitHub. Your website should be up and running within a few minutes.
If you want to use a custom domain, create your CNAME file in files/CNAME on the source branch. Nikola will copy it to the output directory. To add a custom commit message, use the -m option, followed by your message.
git pull origin src --allow-unrelated-histories