Jupyter Notebook on Blueshift
Blueshift includes a Jupyter Notebook interface for open-ended research.
The notebook interface provides open-ended exploratory analysis capabilities to form trade ideas before moving on to testing (backtesting or paper trading) and deployment on the platform.
Blueshift behaves differently in a Jupyter Notebook than in a regular strategy code. The major difference is that there is no running algo and no associated algo event loop. This means none of the event handler functions are available. Also, since there is no running algo, none of the algo-specific APIs - including context, data, or anything imported from blueshift.api - are available either. Instead, we rely on a set of functions imported from the blueshift.research module to query the built-in datasets and build our exploratory models using the Python packages available on the platform.
Notebook Workflow
The Blueshift research workflow using the Jupyter notebook usually starts with selecting a dataset on which to run further analysis.
from blueshift.research import list_datasets, use_dataset
# list the available datasets
list_datasets()
# select dataset. You can change it at any point in time
use_dataset('nse')
Once a dataset is selected, we can use the other API functions from the research module.
from blueshift.research import use_dataset, symbol, history
# select dataset. You can change it at any point in time
use_dataset('nse')
# get a reference to the asset object
asset = symbol('ACC')
prices = history(asset, 'close', 20, '1m')
prices.plot()
Research (Notebook) APIs
The following API functions are available in the Blueshift notebook environment:
- blueshift.research.list_datasets()
List available dataset names.
- blueshift.research.use_dataset(name)
Set the current dataset by name.
- blueshift.research.symbol(sym, dt=None, *args, **kwargs)
Get the asset for a given instrument symbol.
- blueshift.research.sid(sec_id)
Get the asset for a given security ID from the pipeline store.
- blueshift.research.current(assets, columns='close', dt=None, last_known=True)
Return the last available price. If either assets or columns is a list, a series is returned, indexed by assets or fields, respectively. If both are lists, a dataframe is returned. Otherwise, a scalar is returned. Only OHLCV column names are supported in general. However, for futures and options, open_interest, implied_vol and greeks are supported as well.
- Parameters:
assets (asset object or a list of assets) – An asset or a list for which to fetch data.
columns (str or a list) – A field name or a list of OHLCV columns.
dt (pd.Timestamp or string that can be converted to Timestamp) – The timestamp at which to fetch the data.
last_known (bool) – If data is missing, return the last known good value (instead of NaN).
- Returns:
Current price of the asset(s).
- Return type:
float (int in case of volume), pandas.Series or pandas.DataFrame.
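The scalar/Series/DataFrame return rules above can be sketched with a small pandas mock. Here mock_current and the sample quote table are illustrative stand-ins, not part of the Blueshift API; in the notebook, current() reads from the selected dataset instead:

```python
import pandas as pd

# illustrative last-price table; in Blueshift these values come from
# the selected dataset, not from a hand-built frame
quotes = pd.DataFrame(
    {"close": [100.0, 200.0], "volume": [10.0, 20.0]},
    index=["ACC", "INFY"],
)

def mock_current(assets, columns="close"):
    """Mimic the shape rules of blueshift.research.current()."""
    if not isinstance(assets, list) and not isinstance(columns, list):
        return quotes.at[assets, columns]   # both scalars -> scalar
    # one list -> Series (indexed by asset or by field);
    # both lists -> DataFrame
    return quotes.loc[assets, columns]

print(type(mock_current("ACC")))                         # scalar
print(type(mock_current(["ACC", "INFY"])))               # Series by asset
print(type(mock_current("ACC", ["close", "volume"])))    # Series by field
print(type(mock_current(["ACC"], ["close", "volume"])))  # DataFrame
```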
- blueshift.research.history(assets, columns, nbars, frequency, dt=None, adjusted=True)
Return the given number of bars for the assets. If more than one asset or more than one column is supplied, returns a dataframe, with assets or fields as column names. If both assets and columns are multiple, returns a multi-index dataframe with columns as the column names and asset as the second index level. For a single asset and a single field, returns a series. Only OHLCV column names are supported. However, for futures and options, open_interest, implied_vol and greeks are also supported.
- Parameters:
assets (asset object or a list of assets) – An asset or a list for which to fetch data.
columns (str or a list) – A field name or a list of OHLCV columns.
nbars (int) – Number of bars to fetch.
frequency (str) – Frequency of bars (either ‘1m’ or ‘1d’).
dt (pd.Timestamp or string that can be converted to Timestamp) – The timestamp at which to fetch the data.
adjusted (bool) – Whether to apply adjustments.
- Returns:
Historical bars for the asset(s).
- Return type:
pandas.Series or pandas.DataFrame.
# this assumes we have already selected the dataset
from blueshift.research import symbol, current, history

# fetch an asset by symbol "ABC"
asset = symbol('ABC')

# fetch historical data as of a given date
df = history(asset, ['close','high'], 10, '1m', dt="2023-05-05 14:30:00")
df.close.plot()
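Since the example above fetches two fields for a single asset, history() returns a regular two-column dataframe, which can be manipulated with ordinary pandas operations. A sketch using a synthetic frame as a stand-in for that output:

```python
import pandas as pd

# synthetic stand-in for the two-column frame history() returns for
# a single asset with columns=['close', 'high']
idx = pd.date_range("2023-05-05 14:21", periods=10, freq="min")
df = pd.DataFrame(
    {"close": [100.0 + i for i in range(10)],
     "high":  [101.0 + i for i in range(10)]},
    index=idx,
)

# per-bar distance between the high and the close
spread = df["high"] - df["close"]

# downsample minute bars to 5-minute closing prices
close_5m = df["close"].resample("5min").last()

print(spread.iloc[0], len(close_5m))
```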
- blueshift.research.run_pipeline(pipeline, start_date, end_date)
Run a pipeline between given dates. This will return a pandas multi-index dataframe with timestamp as the first index, the selected assets the second index and the pipeline factor(s) as the column(s).
Important
This function works only in the research environment; using it in a strategy will throw an error.
# import Pipeline constructor and built-in factors/filters
# this assumes we have already selected the dataset
from blueshift.research import run_pipeline
from blueshift.pipeline import Pipeline
from blueshift.library.pipelines import select_universe, period_returns

# create the pipeline
def create_pipeline():
    pipe = Pipeline()
    liquidity = select_universe(200, 200)   # a built-in liquidity screener
    mom = period_returns(20)                # return over period factor
    mom_screener = mom > 0                  # positive momentum
    pipe.add(mom, 'momentum')
    pipe.set_screen(liquidity & mom_screener)
    return pipe

# run the pipeline
pipe = create_pipeline()
results = run_pipeline(pipe, '2022-05-04', '2023-05-05')

# returns a multi-index df with dates as the first level and
# filtered assets as the second level indices, with factors
# added using pipe.add() as columns
print(results.describe())
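The multi-index dataframe returned by run_pipeline() can be sliced per date or aggregated across dates with ordinary pandas operations. A sketch, using a synthetic multi-index frame as a stand-in for the pipeline results:

```python
import pandas as pd

# synthetic stand-in for run_pipeline() output: timestamp as the first
# index level, asset as the second, and factor columns from pipe.add()
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2023-05-04", "2023-05-05"]), ["ACC", "INFY", "SBIN"]],
    names=["date", "asset"],
)
results = pd.DataFrame(
    {"momentum": [0.02, 0.05, 0.01, 0.03, 0.04, 0.06]}, index=idx
)

# cross-section of factor values for a single day
latest = results.xs(pd.Timestamp("2023-05-05"), level="date")

# top entry by momentum on each day (returns (date, asset) index tuples)
top = results.groupby(level="date")["momentum"].idxmax()

print(latest)
print(top)
```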
- blueshift.research.get_data_portal()
Get the current data portal object for the selected dataset.