Exploring Pipeline APIs on Blueshift Notebooks

The pipeline APIs on Blueshift provide a systematic way to analyse a large universe of instruments. In this notebook, we set up a simple example to understand how to create and use pipelines.

The first thing to do is to import the necessary classes from the blueshift.pipeline module, namely the Pipeline constructor, CustomFilter and CustomFactor. The last two are required because we want to define our own filters and factors (instead of using the built-in ones). We also need to import EquityPricing, a column definition that binds the pipeline to its required inputs. The same cell imports use_dataset and run_pipeline from blueshift.research, which we use further down to run the pipeline. Let’s define a custom filter and a custom factor as shown below.

[1]:

from blueshift.research import use_dataset, run_pipeline

from blueshift.pipeline import Pipeline, CustomFilter, CustomFactor
from blueshift.pipeline.data import EquityPricing

class TypicalPriceUp(CustomFilter):
    # Pricing columns passed to compute(), in this order.
    inputs = [EquityPricing.high, EquityPricing.low, EquityPricing.close]

    def compute(self, today, assets, out, high_price, low_price, close_price):
        typical = (high_price + low_price + close_price) / 3
        # True where the latest typical price is above the previous one.
        out[:] = typical[-1] > typical[-2]

class PeriodReturns(CustomFactor):
    inputs = [EquityPricing.close]

    def compute(self, today, assets, out, close_price):
        # Return from the first to the last close in the window.
        returns = close_price[-1] / close_price[0] - 1
        out[:] = returns

In the custom factor above (PeriodReturns), the only input to the pipeline computation is the “close” price column, whereas the filter (TypicalPriceUp) requires three pricing columns. These are declared in the inputs class variable of each class. When the pipeline runs, it reads this inputs class variable and passes the required columns to the compute method. As a result, the compute method of the filter receives three extra pricing arguments (in addition to the always-present today, assets and out parameters), while that of the factor receives only one. With this in place, we simply write the required logic in the compute method and make sure the results are written back to the provided out parameter. Note: a filter must produce True or False values from its computation, and a factor’s output should be real values.
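To make the shapes concrete, the sketch below repeats the two compute bodies with plain NumPy on made-up prices. It assumes, as in zipline-style pipelines, that each input arrives as a 2-D array with one row per day in the window (oldest first) and one column per asset, and that out has one slot per asset. The standalone function names and the prices are illustrative only and are not part of the Blueshift API.

```python
import numpy as np

# Toy data: 3 days (rows, oldest first) x 2 assets (columns).
high = np.array([[10.0, 20.0], [11.0, 19.0], [12.0, 18.0]])
low = np.array([[8.0, 18.0], [9.0, 17.0], [10.0, 16.0]])
close = np.array([[9.0, 19.0], [10.0, 18.0], [11.0, 17.0]])

def typical_price_up(out, high_price, low_price, close_price):
    # Same logic as TypicalPriceUp.compute, minus the pipeline plumbing.
    typical = (high_price + low_price + close_price) / 3
    out[:] = typical[-1] > typical[-2]

def period_returns(out, close_price):
    # Same logic as PeriodReturns.compute.
    out[:] = close_price[-1] / close_price[0] - 1

n_assets = close.shape[1]
filter_out = np.empty(n_assets, dtype=bool)   # one True/False per asset
factor_out = np.empty(n_assets, dtype=float)  # one real value per asset

typical_price_up(filter_out, high, low, close)
period_returns(factor_out, close)

print(filter_out)  # first asset rises, second falls
print(factor_out)
```

Indexing with [-1] and [0] picks whole rows, so each expression is evaluated for every asset at once and no explicit loop over assets is needed.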

Now let’s create a pipeline and run it.

[2]:

pipe = Pipeline()
pipe.add(PeriodReturns(window_length=10), 'returns')
pipe.set_screen(TypicalPriceUp(window_length=5))

We have added the PeriodReturns factor (with a window_length of 10) using the add method and named it simply “returns”. We also used the set_screen method to set the TypicalPriceUp filter (with a window_length of 5) as the screen. Now let’s run the pipeline over some period.

[3]:

use_dataset('nse')

results = run_pipeline(pipe, '2020-10-10', '2020-10-15')
results

[3]:

		returns
2020-10-12 00:00:00+05:30	Equity(3MINDIA [3])	0.072567
	Equity(AARTIDRUGS [7])	0.324145
	Equity(AAVAS [10])	0.061269
	Equity(ABB [12])	0.028202
	Equity(ABBOTINDIA [13])	-0.006489
...	...	...
2020-10-15 00:00:00+05:30	Equity(WINPRO [710])	0.001053
	Equity(XCHANGING [1492])	-0.050919
	Equity(ZODIACLOTH [1503])	-0.001485
	Equity(ZOTA [1505])	-0.008174
	Equity(ZYDUSWELL [1508])	-0.008715

1437 rows × 1 columns
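The effect of the screen set with set_screen can be pictured as a row mask: the factor is evaluated per asset, and only the assets for which the filter is True appear in the output for that day. Below is a minimal pandas sketch of that idea for a single day, using invented asset labels and values. It only mimics the effect and makes no claim about how Blueshift implements screening internally.

```python
import pandas as pd

# Hypothetical per-asset results for one day.
assets = ["AAA", "BBB", "CCC", "DDD"]
returns = pd.Series([0.05, -0.02, 0.10, 0.01], index=assets, name="returns")
screen = pd.Series([True, False, True, False], index=assets)

# Keep only the assets that pass the screen; the factor becomes a column.
day_output = returns[screen].to_frame()
print(day_output)
```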

The inputs to each compute method are prepared automatically (based on all the assets surviving on the day of computation and the pricing fields implied by the inputs class variable). The output is a multi-index dataframe with the computation date as the first index level and the assets (those passing the filter on that day) as the second. The columns (if any) are the factors that were added to the pipe. We can easily subset it for further analysis; for example, the output for 12th Oct is shown below.

[4]:

import pandas as pd
results.xs(pd.Timestamp('2020-10-12', tz='Asia/Calcutta'))

[4]:

	returns
Equity(3MINDIA [3])	0.072567
Equity(AARTIDRUGS [7])	0.324145
Equity(AAVAS [10])	0.061269
Equity(ABB [12])	0.028202
Equity(ABBOTINDIA [13])	-0.006489
...	...
Equity(WONDERLA [1490])	0.070835
Equity(ZENTEC [1501])	0.091623
Equity(ZOTA [1505])	0.002055
Equity(ZYDUSLIFE [221])	0.130435
Equity(ZYDUSWELL [1508])	-0.005943

377 rows × 1 columns
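Other slices work the same way, since the result is an ordinary pandas dataframe with a two-level index. The sketch below builds a small stand-in frame with invented values (plain string labels replace the Equity objects) and shows a cross-section by date, a count of surviving assets per day, and the top asset per day.

```python
import pandas as pd

# Stand-in for a pipeline result: (date, asset) index, one factor column.
dates = pd.to_datetime(["2020-10-12", "2020-10-13"]).tz_localize("Asia/Kolkata")
index = pd.MultiIndex.from_tuples([
    (dates[0], "AAA"), (dates[0], "BBB"),
    (dates[1], "AAA"), (dates[1], "CCC"), (dates[1], "DDD"),
])
toy = pd.DataFrame({"returns": [0.07, -0.01, 0.03, 0.12, -0.05]}, index=index)

# Cross-section for one date: the date level is dropped from the index.
one_day = toy.xs(dates[0])

# Number of assets that passed the screen on each date.
counts = toy.groupby(level=0).size()

# Highest-return asset on each date.
top = toy.sort_values("returns", ascending=False).groupby(level=0).head(1)

print(one_day)
print(counts)
print(top)
```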

When you use the pipeline APIs in a strategy, this is exactly what the pipeline_output API function returns: a dataframe like the one above, computed for the strategy’s current date.
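As one illustration of what a strategy might do with such a per-day frame, the sketch below ranks assets by the returns column and assigns equal weights to the top few. The helper name top_n_equal_weights, the asset labels and the values are all hypothetical; in a real strategy the frame would come from pipeline_output rather than being built by hand.

```python
import pandas as pd

def top_n_equal_weights(output: pd.DataFrame, n: int, column: str = "returns") -> pd.Series:
    """Hypothetical helper: equal weights for the n highest values in `column`."""
    if output.empty or n <= 0:
        return pd.Series(dtype=float)
    picks = output[column].nlargest(n)
    # Fewer than n assets may have passed the screen, so divide by len(picks).
    return pd.Series(1.0 / len(picks), index=picks.index)

# Hand-built stand-in for one day's pipeline output.
day_output = pd.DataFrame(
    {"returns": [0.07, 0.32, 0.06, -0.01]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

weights = top_n_equal_weights(day_output, n=2)
print(weights)
```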

Pipeline APIs provide a powerful way to analyse a large universe of instruments, including for factor and ML strategies. Now that you know how to create and use a pipeline, feel free to explore more!