Exploring Pipeline APIs on Blueshift Notebooks
The pipeline APIs on Blueshift provide a systematic way to analyse a large universe of instruments. In this notebook, we will work through a simple example to understand how to create and use pipelines.
The first thing to do is to import the necessary classes from the blueshift.pipeline module, namely the Pipeline constructor, the CustomFilter and the CustomFactor. The last two are required because we want to define our own filters and factors (instead of using the built-in ones). We also need to import EquityPricing, a column definition that binds the pipeline to its required inputs. Let’s define a custom filter and a custom factor as shown below.
[1]:
from blueshift.research import use_dataset, run_pipeline
from blueshift.pipeline import Pipeline, CustomFilter, CustomFactor
from blueshift.pipeline.data import EquityPricing

class TypicalPriceUp(CustomFilter):
    # True when the typical price (mean of high, low and close) rose on the latest bar.
    inputs = [EquityPricing.high, EquityPricing.low, EquityPricing.close]

    def compute(self, today, assets, out, high_price, low_price, close_price):
        typical = (high_price + low_price + close_price) / 3
        out[:] = typical[-1] > typical[-2]


class PeriodReturns(CustomFactor):
    # Simple return from the first to the last close in the window.
    inputs = [EquityPricing.close]

    def compute(self, today, assets, out, close_price):
        returns = close_price[-1] / close_price[0] - 1
        out[:] = returns
In the above custom factor (PeriodReturns), we define the inputs to the pipeline computation to be only the “close” price column. However, the filter (TypicalPriceUp) requires three pricing columns. These are defined in the respective inputs class variables. When running the computation, the pipeline automatically reads this inputs class variable and passes the required columns to the compute method. As a result, the compute method for the filter expects three extra pricing columns (apart from the always-present today, assets and out parameters), while the one for the factor has only one extra pricing column. With this, we simply write the required logic in the compute method and make sure the results are written to the provided out parameter. Note: the filter must produce True or False values from its computation, and the factor's output should be real values.
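To make the shape of the compute arguments more concrete, the sketch below repeats the logic of the two compute methods on plain NumPy arrays, without the pipeline engine. It assumes each input arrives as a 2-D array with one row per day in the window (oldest first) and one column per asset, which is what the indexing in the code above suggests. The prices and the helper names typical_price_up and period_returns are made up for illustration and are not part of Blueshift.

```python
import numpy as np

# Hypothetical close prices: 3 days (rows, oldest first) x 2 assets (columns).
close = np.array([[100.0, 50.0],
                  [102.0, 49.0],
                  [110.0, 45.0]])
high = close + 1.0  # made-up highs
low = close - 1.0   # made-up lows


def typical_price_up(high, low, close):
    # Same logic as TypicalPriceUp.compute: compare the last two typical prices.
    typical = (high + low + close) / 3
    return typical[-1] > typical[-2]


def period_returns(close):
    # Same logic as PeriodReturns.compute: last close over first close, minus one.
    return close[-1] / close[0] - 1


mask = typical_price_up(high, low, close)  # one boolean per asset
rets = period_returns(close)               # one float per asset

print(mask)  # the first asset rose, the second fell
print(rets)
```

Each function returns one value per asset, which is what gets written into out for the day being computed.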
Now let’s create a pipeline and run it.
[2]:
pipe = Pipeline()
pipe.add(PeriodReturns(window_length=10), 'returns')
pipe.set_screen(TypicalPriceUp(window_length=5))
We have added the PeriodReturns factor using the add method and named it simply “returns”. We also used the set_screen method to apply the TypicalPriceUp filter. Now let’s run the pipeline over some period.
[3]:
use_dataset('nse')
results = run_pipeline(pipe, '2020-10-10', '2020-10-15')
results
[3]:
| | | returns |
|---|---|---|
| 2020-10-12 00:00:00+05:30 | Equity(3MINDIA [3]) | 0.072567 |
| | Equity(AARTIDRUGS [7]) | 0.324145 |
| | Equity(AAVAS [10]) | 0.061269 |
| | Equity(ABB [12]) | 0.028202 |
| | Equity(ABBOTINDIA [13]) | -0.006489 |
| ... | ... | ... |
| 2020-10-15 00:00:00+05:30 | Equity(WINPRO [710]) | 0.001053 |
| | Equity(XCHANGING [1492]) | -0.050919 |
| | Equity(ZODIACLOTH [1503]) | -0.001485 |
| | Equity(ZOTA [1505]) | -0.008174 |
| | Equity(ZYDUSWELL [1508]) | -0.008715 |
1437 rows × 1 columns
The inputs to the pipeline compute method are prepared automatically (based on all surviving assets on the day of computation and the required pricing fields as implied by the inputs class variable). The output is a multi-index dataframe with the compute date as the first index level and the assets (passing the filter on that day) as the second level. The columns (if any) are the factors that were added to the pipe. We can easily subset for further analysis; for example, the output on 12th Oct is as below.
[4]:
import pandas as pd
results.xs(pd.Timestamp('2020-10-12', tz='Asia/Calcutta'))
[4]:
| | returns |
|---|---|
| Equity(3MINDIA [3]) | 0.072567 |
| Equity(AARTIDRUGS [7]) | 0.324145 |
| Equity(AAVAS [10]) | 0.061269 |
| Equity(ABB [12]) | 0.028202 |
| Equity(ABBOTINDIA [13]) | -0.006489 |
| ... | ... |
| Equity(WONDERLA [1490]) | 0.070835 |
| Equity(ZENTEC [1501]) | 0.091623 |
| Equity(ZOTA [1505]) | 0.002055 |
| Equity(ZYDUSLIFE [221]) | 0.130435 |
| Equity(ZYDUSWELL [1508]) | -0.005943 |
377 rows × 1 columns
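Other common ways to slice a result shaped like the one above can be sketched on a small synthetic dataframe. The frame below is a made-up stand-in, not real pipeline output: the asset labels are plain strings rather than Equity objects, the dates are timezone-naive, and the index level names date and asset are assumptions chosen for readability.

```python
import pandas as pd

# Hypothetical stand-in for a pipeline result: (date, asset) index, one factor column.
dates = pd.to_datetime(['2020-10-12', '2020-10-12',
                        '2020-10-13', '2020-10-13', '2020-10-13'])
assets = ['AAA', 'BBB', 'AAA', 'BBB', 'CCC']
index = pd.MultiIndex.from_arrays([dates, assets], names=['date', 'asset'])
sample = pd.DataFrame({'returns': [0.05, -0.02, 0.01, 0.03, -0.04]}, index=index)

# Cross-section for a single day (the same idea as the xs call above).
day = sample.xs(pd.Timestamp('2020-10-13'), level='date')

# How many assets passed the screen on each day.
counts = sample.groupby(level='date').size()

# Best-performing asset on each day: sort by the factor, keep the first row per date.
top = sample.sort_values('returns', ascending=False).groupby(level='date').head(1)

print(day)
print(counts)
print(top)
```

With a real result, the timestamp passed to xs would need to match the timezone of the index, as in the cell above.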
When you are using the pipeline APIs in a strategy, this is exactly what you get as the return value of the pipeline_output API function: a dataframe like the one above, computed for the strategy's current date.
The pipeline APIs provide a powerful way to analyse a large universe of instruments, including for factor and ML strategies. Now that you know how to create and use a pipeline, feel free to explore more!