Jupyter Notebook on Blueshift

Blueshift includes a Jupyter Notebook interface for open-ended research. It enables exploratory analysis to form trade ideas before moving on to testing (backtesting or paper trading) and deployment on the platform.

Blueshift works differently in a Jupyter Notebook than in regular strategy code. The major difference is that there is no running algo and no associated algo event loop. This means none of the event handler functions are available. Also, since there is no running algo, none of the algo-specific APIs - including context, data or anything imported from blueshift.api - are available either. Instead, we rely on a set of functions imported from the blueshift.research module to query the built-in datasets and build our exploratory models using the Python packages available on the platform.

Notebook Workflow

The Blueshift research workflow using the Jupyter notebook usually starts with selecting a dataset on which to run further analysis.

from blueshift.research import list_datasets, use_dataset

# list the available datasets
list_datasets()

# select dataset. You can change it at any point in time
use_dataset('nse')

Once a dataset is selected, we can use the other API functions from the research module.

from blueshift.research import use_dataset, symbol, history

# select dataset. You can change it at any point in time
use_dataset('nse')

# get a reference to the asset object
asset = symbol('ACC')
prices = history(asset, 'close', 20, '1m')
prices.plot()

Research (Notebook) APIs

The following API functions are available in the Blueshift notebook environment.

blueshift.research.list_datasets()

List available dataset names.

blueshift.research.use_dataset(name)

Set the current dataset by name.

blueshift.research.symbol(sym, dt=None, *args, **kwargs)

Get the asset for a given instrument symbol.
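
For instance, with a dataset already selected, we can look up an asset as below. The use of dt for a point-in-time lookup is our reading of the signature and may depend on the dataset:

from blueshift.research import symbol

# get a reference to the asset object
asset = symbol('ACC')

# the optional dt argument presumably resolves the symbol
# as of a given date (our assumption from the signature)
asset_then = symbol('ACC', dt='2022-01-03')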

blueshift.research.sid(sec_id)

Get the asset for a given security ID from the pipeline store.
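
A minimal sketch of looking up an asset by security ID. The ID below is a placeholder; real IDs come from the pipeline store (e.g. from run_pipeline results):

from blueshift.research import sid

# 12345 is a placeholder security ID, not a real one
asset = sid(12345)
print(asset)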

blueshift.research.current(assets, columns='close', dt=None, last_known=True)

Returns the last available price. If either assets or columns is a list, a series is returned, indexed by assets or fields, respectively. If both are lists, a dataframe is returned. Otherwise, a scalar is returned. Only OHLCV column names are supported in general. However, for futures and options, open_interest, implied_vol and greeks are supported as well.

Parameters:
  • assets (asset object or a list of assets.) – An asset or a list for which to fetch data.

  • columns (str or a list.) – A field name or a list of OHLCV columns.

  • dt (pd.Timestamp or string that can be converted to Timestamp.) – The timestamp at which to fetch the data.

  • last_known (bool) – If True, return the last known good value when data is missing (instead of NaN).

Returns:

current price of the asset(s).

Return type:

float (int in case of volume), pandas.Series or pandas.DataFrame.
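
For example, the return type follows the shape of the inputs (tickers below are illustrative and assume a dataset is already selected):

from blueshift.research import symbol, current

asset1 = symbol('ACC')
asset2 = symbol('ABC')  # illustrative second ticker

# single asset and single column: returns a scalar
last_close = current(asset1, 'close')

# a list of assets and a single column: returns a series
# indexed by the assets
closes = current([asset1, asset2], 'close')

# lists for both: returns a dataframe
ohlc = current([asset1, asset2], ['open', 'close'])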

blueshift.research.history(assets, columns, nbars, frequency, dt=None, adjusted=True)

Returns the given number of bars for the assets. If more than one asset or more than one column is supplied, returns a dataframe, with assets or fields as the column names. If both assets and columns are multiple, returns a multi-index dataframe with columns as the column names and assets as the second index level. For a single asset and a single field, returns a series. Only OHLCV column names are supported. However, for futures and options, open_interest, implied_vol and greeks are also supported.

Parameters:
  • assets (asset object or a list of assets.) – An asset or a list for which to fetch data.

  • columns (str or a list.) – A field name or a list of OHLCV columns.

  • nbars (int) – Number of bars to fetch.

  • frequency (str) – Frequency of bars (either ‘1m’ or ‘1d’).

  • dt (pd.Timestamp or string that can be converted to Timestamp.) – The timestamp at which to fetch the data.

  • adjusted (bool) – Whether to apply adjustments.

Returns:

historical bars for the asset(s).

Return type:

pandas.Series or pandas.DataFrame.

# this assumes we have already selected the dataset
from blueshift.research import symbol, current, history

# fetch an asset by symbol "ABC"
asset = symbol('ABC')

# fetch historical data as of a given date
df = history(asset, ['close','high'], 10, '1m', dt="2023-05-05 14:30:00")
df.close.plot()

blueshift.research.run_pipeline(pipeline, start_date, end_date)

Run a pipeline between given dates. This will return a pandas multi-index dataframe with timestamp as the first index, the selected assets the second index and the pipeline factor(s) as the column(s).

Important

This function works only in the research environment; using it in a strategy will throw an error.

# import Pipeline constructor and built-in factors/filters
# this assumes we have already selected the dataset

from blueshift.research import run_pipeline
from blueshift.pipeline import Pipeline
from blueshift.library.pipelines import select_universe, period_returns

# create the pipeline
def create_pipeline():
    pipe = Pipeline()
    liquidity = select_universe(200, 200) # a built-in liquidity screener
    mom = period_returns(20) # return-over-period factor
    mom_screener = mom > 0 # positive momentum

    pipe.add(mom, 'momentum')
    pipe.set_screen(liquidity & mom_screener)

    return pipe

# run the pipeline
pipe = create_pipeline()
results = run_pipeline(pipe, '2022-05-04', '2023-05-05')

# returns multi-index df with dates as the first level and 
# filtered assets as the second level indices, with factors 
# added using pipe.add() as columns
print(results.describe())
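
Because the result is a multi-index dataframe, the screened universe for any single day can be extracted with ordinary pandas indexing. A sketch, assuming the date below is a trading day within the run:

# assets passing the screen on a given day, with the
# momentum factor values added via pipe.add()
day = results.loc['2023-05-04']
print(day.index.tolist()[:10])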

blueshift.research.get_data_portal()

Get the current data portal object for the selected dataset.
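
The portal object's own interface is not documented here; a minimal sketch simply fetches it for inspection:

from blueshift.research import use_dataset, get_data_portal

# select a dataset first, then fetch the portal backing
# the current/history calls above
use_dataset('nse')
portal = get_data_portal()
print(type(portal))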

Example Notebooks