Prototyping Trading Strategies with Python


Getting Started with Python for Algorithmic Trading


© Ran Aroussi
@aroussi | aroussi.com | github.com/ranaroussi



February, 2017

About me

History

  • Programming since forever
  • Operating in Online / AdTech space since 1997
  • Trading since 2013
  • Switched to automated trading in 2014

Co-Founder of Native Alpha

Trading mostly Futures and Options using automated and semi-automated strategies, with a 100% technical / quantitative approach

Developer of ezIBpy

A Pythonic wrapper for the IbPy library that eases communication with Interactive Brokers for market data and order execution.

Available via github.com/ranaroussi/ezibpy

Developer of QTPyLib

A simple, event-driven algorithmic trading system written in Python that supports backtesting and live trading, using Interactive Brokers for market data and order execution (QTPyLib stands for Quantitative Trading Python Library).

Available via github.com/ranaroussi/qtpylib

Agenda

  • Why Python for Algorithmic Trading?
  • The Python Toolbox: Platforms and Libraries for Trading
  • Data Wrangling and Working with Market Data
  • Using Technical Indicators with Python
  • Coming Up with Trading Ideas
  • Writing and Testing your first Trading Strategy
  • Strategy Development Workflow

* This is the first webinar of a multi-webinar series

Why Python?

Python's Benefits

  • A very high-level language with clean and readable syntax
  • Gentle learning curve
  • Object-oriented, multi-paradigm
  • A rich standard library of modules
  • One of the most used languages in IoT, data science, finance, and web development
  • A strong open-source community

"Hello World" in Python

In [ ]:
print("hello world")

Compared to C#...

In [ ]:
public class Hello
{
   public static void Main()
   {
      System.Console.WriteLine("Hello, World!");
   }
}

Why Python for Trading?

Python for Algorithmic Trading

  • Python's syntax is attractive for science, and particularly finance
  • Gentle learning curve
  • Solid Machine Learning libraries
  • Quick prototyping, fast enough execution
  • Many Algorithmic Trading Libraries and Tools (Zipline, PyAlgoTrade, PyBacktest, QTPyLib)
  • Pandas!

The Python Toolbox

Essential Libraries

  • Pandas - time series and tabular data wrangler
  • NumPy - fast (vectorized) array operations
  • Matplotlib - plotting library
  • Jupyter Notebook - research platform

Basically - just install Anaconda :-)

  • TA-Lib - technical indicators
  • scikit-learn - machine learning algorithms
  • TensorFlow - deep learning / neural networks
  • SciPy - scientific functions
  • statsmodels - statistical functions
  • SQLAlchemy - relational database abstraction library
  • TsTables - tick data storage and retrieval
  • ezIBpy - Interactive Brokers bridge
  • OandaPy - Oanda bridge

Python 2 or Python 3?

Python 3!

Data Wrangling and Working with Market Data

Pandas - the time series and tabular data wrangler

  • Pandas is one of the best libraries (in any language, IMO) for data exploration and manipulation
  • Main objects are DataFrame and Series
  • Offers many built-in operations
In [3]:
import pandas as pd
import numpy as np
from pandas_datareader import data

spy = data.get_data_yahoo("SPY", start="2000-01-01")
spy.head()
Out[3]:
Open High Low Close Volume Adj Close
Date
2000-01-03 148.250000 148.250000 143.875000 145.4375 8164300 105.825332
2000-01-04 143.531204 144.062500 139.640594 139.7500 8089800 101.686912
2000-01-05 139.937500 141.531204 137.250000 140.0000 12177900 101.868820
2000-01-06 139.625000 141.500000 137.750000 137.7500 6227200 100.231643
2000-01-07 140.312500 145.750000 140.062500 145.7500 8066500 106.052718
In [4]:
ratio = spy["Close"] / spy["Adj Close"]

spy["close"]  = spy["Adj Close"]
spy["open"]   = spy["Open"] / ratio
spy["high"]   = spy["High"] / ratio
spy["low"]    = spy["Low"] / ratio
spy["volume"] = spy["Volume"]

spy = spy[['open','high','low','close','volume']]
spy.head()
Out[4]:
open high low close volume
Date
2000-01-03 107.871804 107.871804 104.688403 105.825332 8164300
2000-01-04 104.438246 104.824835 101.607304 101.686912 8089800
2000-01-05 101.823343 102.982977 99.867825 101.868820 12177900
2000-01-06 101.595958 102.960272 100.231643 100.231643 6227200
2000-01-07 102.096206 106.052718 101.914297 106.052718 8066500
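
The adjustment above scales open, high and low by the same factor that maps Close to Adj Close. A minimal sketch of that idea on two made-up bars (the prices are illustrative, not SPY data):

```python
import pandas as pd

# Two made-up daily bars where the adjusted close is exactly half the raw close
bars = pd.DataFrame({
    "Open": [100.0, 102.0],
    "High": [103.0, 104.0],
    "Low": [99.0, 101.0],
    "Close": [102.0, 103.0],
    "Adj Close": [51.0, 51.5],
})

# Per-row adjustment factor (2.0 on both rows in this toy data)
ratio = bars["Close"] / bars["Adj Close"]

# Divide every price column by the factor, row by row
adjusted = bars[["Open", "High", "Low", "Close"]].div(ratio, axis=0)
print(adjusted)
```

After the division the adjusted close equals Adj Close, and the other prices keep their proportions relative to it.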
In [5]:
spy.to_csv('~/Desktop/sp500_ohlc.csv')
In [6]:
spy.shape
Out[6]:
(4312, 5)
In [7]:
# calculate returns
spy['return'] = spy['close'].pct_change()
spy['return'].describe()
Out[7]:
count    4311.000000
mean        0.000264
std         0.012463
min        -0.098448
25%        -0.005092
50%         0.000652
75%         0.005935
max         0.145198
Name: return, dtype: float64
In [8]:
# slicing
spy[5:10][["close", "return"]]
Out[8]:
close return
Date
2000-01-10 106.416535 0.003431
2000-01-11 105.143175 -0.011966
2000-01-12 104.097201 -0.009948
2000-01-13 105.506992 0.013543
2000-01-14 106.939489 0.013577
In [9]:
spy[ spy['return'] > 0.005 ]['return'].describe()
Out[9]:
count    1244.000000
mean        0.012878
std         0.009962
min         0.005002
25%         0.006912
50%         0.010097
75%         0.014883
max         0.145198
Name: return, dtype: float64
In [10]:
spy['close'].plot()
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x110e1d9b0>
In [11]:
spy['return'].plot.hist(bins=100, edgecolor='white')
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x112b8a780>
In [12]:
# resampling
spy['return'].resample("1A").sum() * 100
Out[12]:
Date
2000-12-31    -6.429918
2001-12-31   -10.111384
2002-12-31   -20.833354
2003-12-31    26.198963
2004-12-31    10.784570
2005-12-31     5.246394
2006-12-31    15.209572
2007-12-31     6.277696
2008-12-31   -37.358504
2009-12-31    26.929435
2010-12-31    15.631094
2011-12-31     4.527766
2012-12-31    15.638902
2013-12-31    28.622693
2014-12-31    13.265248
2015-12-31     2.414655
2016-12-31    12.184697
2017-12-31     5.575270
Freq: A-DEC, Name: return, dtype: float64
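
Summing daily simple returns, as above, is a quick approximation of the yearly return. A minimal sketch of the compounded alternative, assuming a small synthetic series (dates and values are made up for illustration) instead of the SPY data:

```python
import pandas as pd

# Synthetic daily returns spanning a year boundary (illustrative values only)
idx = pd.bdate_range("2016-12-28", periods=6)
returns = pd.Series([0.01, -0.02, 0.015, 0.005, -0.01, 0.02], index=idx)

# Group by calendar year; this sidesteps resample frequency aliases,
# which differ between pandas versions
by_year = returns.groupby(returns.index.year)

summed = by_year.sum()                                    # the approximation used above
compounded = by_year.apply(lambda r: (1 + r).prod() - 1)  # compounds the simple returns

print(pd.DataFrame({"summed": summed, "compounded": compounded}))
```

The two columns stay close for small daily moves and drift apart as returns get larger or more volatile.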

Using Technical Indicators with Python

Technical Indicators with Python

  • For basic indicators use Pandas methods
  • Write your own indicators in Python
  • Use TA-Lib
In [35]:
# pandas' rolling mean
spy['ma1'] = spy['close'].rolling(window=50).mean()
spy['ma2'] = spy['close'].rolling(window=200).mean()
In [13]:
spy[['ma1', 'ma2', 'close']].plot(linewidth=1)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1192ebfd0>
In [14]:
# rolling standard deviation
spy['return'].rolling(window=20).std().plot()
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x119a234a8>
In [15]:
# Calculating log returns and volatility
spy['logret'] = np.log(spy['close'] / spy['close'].shift(1))
spy['volatility'] = spy['logret'].rolling(window=252).std() * np.sqrt(252)

spy[['close', 'volatility']].plot(subplots=True)
Out[15]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11a0c9518>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x11a0ff198>], dtype=object)
In [17]:
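# Bollinger Bands using plain pandas
import matplotlib.pyplot as plt

spy['sma'] = spy['close'].rolling(window=20).mean()
spy['std'] = spy['close'].rolling(window=20).std()

# bands are two rolling standard deviations above and below the 20-day SMA
spy['upperbb'] = spy['sma'] + (spy['std'] * 2)
spy['lowerbb'] = spy['sma'] - (spy['std'] * 2)

# plot_candlestick() is a custom helper whose definition is not shown in these slides
plot_candlestick(spy[-100:])
plt.plot(spy[-100:][['upperbb', 'sma', 'lowerbb']], linewidth=1)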
Out[17]:
[<matplotlib.lines.Line2D at 0x11a0b7048>,
 <matplotlib.lines.Line2D at 0x11b012898>,
 <matplotlib.lines.Line2D at 0x11b012ba8>]
In [36]:
# using TA-Lib..
import talib as ta

spy['rsi'] = ta.RSI(spy['close'].values, timeperiod=2)
spy[-100:][['close', 'rsi']].plot(subplots=True)
Out[36]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x11d499d68>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x11d46c320>], dtype=object)
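
The "write your own indicators" option can be sketched for the same RSI. The helper below is hypothetical (it is not from the slides or from TA-Lib) and uses an exponential average with alpha = 1 / period as Wilder-style smoothing; because the seeding may differ from TA-Lib's, early values may not match exactly.

```python
import pandas as pd

def rsi(close, period=14):
    """RSI with Wilder-style smoothing (hypothetical helper, for illustration)."""
    delta = close.diff()
    gain = delta.clip(lower=0)    # positive moves, 0 elsewhere
    loss = -delta.clip(upper=0)   # size of negative moves, 0 elsewhere
    # Wilder's smoothing expressed as an exponential average with alpha = 1 / period
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Made-up prices, illustrative only
prices = pd.Series([10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5, 10.9])
print(rsi(prices, period=2))
```

With the SPY frame from the earlier cells, this would be called as `rsi(spy['close'], period=2)`.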
In [19]:
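# having some fun :)
import matplotlib.pyplot as plt
from matplotlib import gridspec

# named "recent" so it does not shadow the pandas_datareader "data" module imported earlier
recent = spy[-100:]

fig = plt.figure()
gs = gridspec.GridSpec(2, 1, height_ratios=[4,1])

# top panel: closing price with a shaded area underneath
ax0 = plt.subplot(gs[0])
plt.plot(recent['close'])
ax0.set_ylabel('close')
ax0.fill_between(recent.index, recent['close'].min(), recent['close'], alpha=.25)

# bottom panel: 2-period RSI with overbought (90) and oversold (10) lines
ax1 = plt.subplot(gs[1], sharex=ax0)
ax1.set_ylabel('RSI 2')
ax1.plot(recent['rsi'], color="navy", linewidth=1)
ax1.axhline(90, color='r', linewidth=1.5)
ax1.axhline(10, color='g', linewidth=1.5)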
Out[19]:
<matplotlib.lines.Line2D at 0x11c0903c8>

Coming Up with Trading Ideas

Trading Ideas Sources

  • Books
  • Podcasts
  • Forums
  • Workshops and Meetups
  • SSRN (Social Science Research Network)
  • Observations
  • Bugs and miscalculations 😁

Strategy Development Workflow

My Workflow

  • Quick proof of concept of a trading idea
  • Use a vectorized backtest first for fast testing and dumping ideas
  • Move to an event-based backtest for trading ideas that show potential
  • Scrutinize and Optimize
  • Forward test
  • Live Trading

Vectorized vs. Event Based Backtesting

Vectorized Backtesting

Pros:

  • Very fast compared to the event-based alternative
  • Great for quickly testing ideas with minimal coding

Cons:

  • Danger of look-ahead bias
  • Works off pre-existing, pre-loaded data
  • Code usually cannot be used "as-is" for live trading
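
The look-ahead danger is easy to reproduce. The sketch below uses synthetic random returns (an assumption for illustration, not market data) and compares a signal that peeks at the current bar with one shifted by a bar:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, 1000))  # synthetic daily returns

# Biased: earns today's return based on a signal that already contains it
biased = returns.where(returns > 0, 0.0)

# Unbiased: the signal is shifted, so only yesterday's return decides today's trade
unbiased = returns.where(returns.shift(1) > 0, 0.0)

print("biased total:  ", biased.sum())
print("unbiased total:", unbiased.sum())
```

The biased version can never lose on pure noise, which is the signature of look-ahead; a missing `shift(1)` is often all it takes.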

Event-Based Backtesting

Pros:

  • More reliable backtesting (no look-ahead bias)
  • Can handle live data
  • Code can be used "as-is" for live trading

Cons:

  • Takes longer to write and backtest
  • Not the best choice for prototyping IMO

Basic Strategy Example

  • BUY ON CLOSE when SPY drops 0.5% (or more)
  • SELL ON CLOSE of next day

Vectorized Prototyping

In [20]:
portfolio = pd.DataFrame(data={ 'spy': spy['return'] })
portfolio['strategy'] = portfolio[portfolio['spy'].shift(1) <= -0.005]['spy']
In [21]:
portfolio.fillna(0).cumsum().plot()
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x119ad32b0>
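
For comparison, the same rule can be written as a bar-by-bar loop, which is roughly how an event-based backtester sees the data. This is a simplified sketch on made-up returns; `run_event_loop` is a hypothetical helper and not part of any library mentioned here.

```python
import pandas as pd

def run_event_loop(returns, threshold=-0.005):
    """Buy on close after a drop of 0.5% or more, sell on the next close."""
    in_position = False
    strategy = []
    for r in returns:  # one "event" per daily bar
        # a position opened on the previous close earns today's return
        strategy.append(r if in_position else float("nan"))
        # decide, using only today's bar, whether to hold over the next bar
        in_position = bool(pd.notna(r) and r <= threshold)
    return pd.Series(strategy, index=returns.index)

# Made-up daily returns (illustrative only); the first one is NaN, as with pct_change()
spy_ret = pd.Series([float("nan"), -0.01, 0.02, 0.001, -0.006, -0.007, 0.004])

event_based = run_event_loop(spy_ret)
vectorized = spy_ret[spy_ret.shift(1) <= -0.005].reindex(spy_ret.index)

print(pd.DataFrame({"event_based": event_based, "vectorized": vectorized}))
```

Both columns agree on this toy data. The loop is slower and longer, but it can only ever use information that has already arrived.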

Measure Performance with Sharpe Ratio

Sharpe Ratio

A measure of an investment's risk-adjusted return (as a rule of thumb, >1 is good and >2 is great)

Formula:


Sharpe(X) = (rX - Rf) / StdDev(X)

Where:

  • X is the investment
  • rX is the average rate of return of X
  • Rf is the rate of return of the best available risk-free security (e.g. T-bills)
  • StdDev(X) is the standard deviation of the returns of X

Annualized Sharpe Ratio

In [22]:
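def sharpe(returns, periods=252, riskfree=0):
    # returns: per-period (e.g. daily) returns; riskfree: per-period risk-free rate
    # periods: number of periods per year, used to annualize the ratio
    returns = returns.dropna()
    return np.sqrt(periods) * (np.mean(returns-riskfree)) / np.std(returns)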
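
As a rough check of the annualization, the function can be run on a synthetic series whose mean and standard deviation are known in advance. The function is repeated here so the block runs on its own; the return values are made up.

```python
import numpy as np
import pandas as pd

def sharpe(returns, periods=252, riskfree=0):
    returns = returns.dropna()
    return np.sqrt(periods) * (np.mean(returns - riskfree)) / np.std(returns)

# Synthetic daily returns alternating +1.1% and -0.9% (illustrative only):
# the mean is 0.1% and the population standard deviation is 1%
daily = pd.Series([0.011, -0.009] * 126)

print(daily.mean())        # average daily return
print(daily.std(ddof=0))   # population standard deviation
print(sharpe(daily))       # sqrt(252) * 0.001 / 0.01
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`), while the pandas `.std()` method defaults to the sample version (`ddof=1`), so the two give slightly different ratios on short series.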

Initial (and very basic) metrics

In [23]:
# benchmark sharpe
sharpe(portfolio['spy'])
Out[23]:
0.33619610122573501
In [24]:
# strategy sharpe
sharpe(portfolio['strategy'])
Out[24]:
1.3283765902862263
In [25]:
# time in market
len(portfolio['strategy'].dropna()) / len(portfolio)
Out[25]:
0.25185528756957326

EOY Returns

In [38]:
eoy = portfolio.resample("A").sum()
eoy['diff'] = eoy['strategy']/eoy['spy']

print( np.round(eoy[['spy', 'strategy', 'diff']] * 100, 2) )
              spy  strategy    diff
Date                               
2000-12-31  -6.43      3.69  -57.44
2001-12-31 -10.11     -5.20   51.39
2002-12-31 -20.83      9.04  -43.40
2003-12-31  26.20     13.54   51.69
2004-12-31  10.78      2.37   22.00
2005-12-31   5.25      9.82  187.18
2006-12-31  15.21      7.22   47.48
2007-12-31   6.28     19.61  312.41
2008-12-31 -37.36     23.44  -62.75
2009-12-31  26.93      8.87   32.96
2010-12-31  15.63     17.44  111.55
2011-12-31   4.53     10.44  230.57
2012-12-31  15.64      5.60   35.82
2013-12-31  28.62      9.61   33.58
2014-12-31  13.27      6.53   49.20
2015-12-31   2.41     -2.61 -108.18
2016-12-31  12.18      8.18   67.12
2017-12-31   5.58     -0.01   -0.16

Another Example

  • GO LONG where 50-day SMA > 200-day SMA
  • GO SHORT where 50-day SMA < 200-day SMA
In [41]:
ma_portfolio = spy[['close', 'return']].copy()
ma_portfolio.rename(columns={'return':'spy'}, inplace=True)
In [42]:
ma_portfolio['ma1'] = ma_portfolio['close'].rolling(window=50).mean()
ma_portfolio['ma2'] = ma_portfolio['close'].rolling(window=200).mean()
In [43]:
ma_portfolio[['ma1', 'ma2', 'close']].plot(linewidth=1)
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x11da647f0>
In [28]:
ma_portfolio['position'] = np.where(
    ma_portfolio['ma1'].shift(1) > ma_portfolio['ma2'].shift(1), 1, -1)

ma_portfolio['strategy'] = ma_portfolio['position'] * ma_portfolio['spy']
In [29]:
ma_portfolio[['strategy', 'spy']].cumsum().plot()
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x11c13d2b0>
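
Note that `np.where` maps every row where the comparison is false to -1, including the warm-up rows where the slow average is still NaN, so the prototype above is short until the 200-day average exists. One possible variant, sketched on synthetic prices with shorter windows (all names and values are illustrative), stays flat until both averages are available:

```python
import numpy as np
import pandas as pd

# Synthetic price path (illustrative only)
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))

fast = close.rolling(window=5).mean()
slow = close.rolling(window=20).mean()

# +1 when fast > slow, -1 when fast < slow, 0 when equal, NaN during warm-up
raw = np.sign(fast - slow)

# Shift by one bar to avoid look-ahead; warm-up rows become 0 (flat)
position = raw.shift(1).fillna(0)

strategy = position * close.pct_change()
print(position.value_counts())
```

Whether to stay flat or start short during warm-up is a design choice; the point is to make it explicitly rather than inherit it from how NaN comparisons evaluate.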

Initial metrics

In [30]:
# benchmark sharpe
sharpe(ma_portfolio['spy'])
Out[30]:
0.33619610122573501
In [31]:
# strategy sharpe
sharpe(ma_portfolio['strategy'])
Out[31]:
0.43439861413278519
In [32]:
# time in market
len(ma_portfolio['strategy'].dropna()) / len(ma_portfolio)
Out[32]:
0.9997680890538033
In [40]:
eoy = ma_portfolio.resample("A").sum()
eoy['diff'] = eoy['strategy']/eoy['spy']
print( np.round(eoy[['spy', 'strategy', 'diff']] * 100, 2) )
              spy  strategy    diff
Date                               
2000-12-31  -6.43     13.20 -205.34
2001-12-31 -10.11     10.11 -100.00
2002-12-31 -20.83     17.73  -85.10
2003-12-31  26.20      9.20   35.13
2004-12-31  10.78      6.56   60.84
2005-12-31   5.25      5.25  100.00
2006-12-31  15.21      9.06   59.57
2007-12-31   6.28      7.76  123.58
2008-12-31 -37.36     37.36 -100.00
2009-12-31  26.93     15.67   58.19
2010-12-31  15.63     -7.32  -46.82
2011-12-31   4.53    -10.19 -225.06
2012-12-31  15.64      6.07   38.84
2013-12-31  28.62     28.62  100.00
2014-12-31  13.27     13.27  100.00
2015-12-31   2.41     -9.04 -374.21
2016-12-31  12.18    -11.91  -97.75
2017-12-31   5.58      5.58  100.00

To Be Continued...

This was the first of four webinars we're planning.

Up Next...

  • Prototyping Trading Strategies
  • Backtesting & Optimization
  • Live Trading
  • Using Machine Learning in Trading

Q&A Time

Thank you for attending!

