Datasets On Blueshift®¶
Info
This page details the data feeds available for research and backtests only. For live runs, your broker is the source of all data your algo can fetch and use (except the Pipeline APIs, if available).
Available datasets¶
On Blueshift®, you have to explicitly choose a dataset while running a strategy for a backtest. You must pick a dataset consistent with your strategy. For example, if you developed a strategy to trade Apple, you must choose the US equities dataset to run it without error. At present, we have the following datasets available for research/backtests:
- Equities and ETFs market data for the US market – minute level, with corporate actions. Updated once every day after market close.
- FX data – 10 currency pairs: AUD/USD, EUR/CHF, EUR/JPY, EUR/USD, GBP/JPY, GBP/USD, NZD/USD, USD/CAD, USD/CHF and USD/JPY. Minute-level data, updated once every day.
- Crypto data – 9 coins against Tether (USDT) and Indian Rupees (INR); also includes the USDT/INR pair. The nine coins are BTC, ETH, ADA, BNB, MATIC, XRP, SOL, DOT and LUNA. You must specify them as a pair, either against USDT (e.g. BTC/USDT) or INR (e.g. BTC/INR).
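As a quick illustration of the pair convention above, a small helper can build and validate pair symbols. This is a purely hypothetical sketch (the `crypto_pair` function is not part of the Blueshift API); the coin and quote lists come from the dataset description:

```python
# Hypothetical helper illustrating the pair naming convention above;
# not part of the Blueshift API.
COINS = {"BTC", "ETH", "ADA", "BNB", "MATIC", "XRP", "SOL", "DOT", "LUNA"}
QUOTES = {"USDT", "INR"}

def crypto_pair(coin, quote="USDT"):
    """Return a pair symbol like 'BTC/USDT', validating coin and quote."""
    if coin not in COINS:
        raise ValueError(f"unsupported coin: {coin}")
    if quote not in QUOTES:
        raise ValueError(f"unsupported quote currency: {quote}")
    return f"{coin}/{quote}"

print(crypto_pair("BTC"))         # BTC/USDT
print(crypto_pair("ETH", "INR"))  # ETH/INR
```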
Data feed in live trading

For live trading, the data feed is directly linked to the broker, and only data supported by your broker (e.g. supported tickers, frequency, maximum history etc.) can be fetched. It is delivered to the requesting algo as is, after conversion to the standard columns (open, high, low, close and volume). No adjustments for corporate events, missing data or anything else whatsoever are applied. However, all Pipeline API calls are executed on one (and only one) of these datasets (whichever is applicable for the chosen broker).
At present we do not support uploading your own datasets, but this is on our to-do list. Also, in all datasets we endeavour to track a liquid benchmark[^1] or set, as we believe these are more suitable for the kind of trading style our users will look for. See below for more details.
Data adjustments¶
For backtests, all trades are simulated with as-traded prices (unadjusted data) to accurately capture actual trading conditions. However, an API call for historical data will return adjusted data for ease of strategy development (like computing moving averages). Adjustments are applied on an EOD basis.
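To see what this means in practice, consider a 2-for-1 stock split. A minimal sketch (plain Python, hypothetical prices) of how back-adjustment makes the historical close series continuous, while the simulator still fills at as-traded prices:

```python
# Back-adjusting close prices for a 2-for-1 split (hypothetical numbers).
# As-traded prices are what the simulator fills orders at; adjusted
# prices are what a historical-data API call would return.
as_traded = [100.0, 102.0, 51.0, 52.0]   # split takes effect on day 3
split_ratio = 2.0
split_day = 2                             # index of first post-split bar

# Divide all pre-split prices by the ratio so the series is continuous.
adjusted = [p / split_ratio if i < split_day else p
            for i, p in enumerate(as_traded)]
print(adjusted)   # [50.0, 51.0, 51.0, 52.0]
```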
We handle missing data in our input datasets by the standard LOCF (last observation carried forward) imputation method, but without carrying the volume data forward. This allows unbroken algo computations (like moving averages) on price data, but avoids trade fills on missing data (which may result in warning messages).
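The imputation described above can be sketched in a few lines. This is an illustrative sketch, not the actual pipeline code; the bar representation is an assumption:

```python
# LOCF imputation as described above: carry the last observed close
# forward, but set volume to zero so no fills happen on imputed bars.
def locf_fill(bars):
    """bars: list of dicts with 'close' and 'volume'; None marks missing."""
    filled, last_close = [], None
    for bar in bars:
        if bar is None:
            if last_close is not None:
                filled.append({"close": last_close, "volume": 0})
            continue  # nothing observed yet: skip (illustrative choice)
        last_close = bar["close"]
        filled.append(dict(bar))
    return filled

bars = [{"close": 10.0, "volume": 500}, None, {"close": 10.5, "volume": 300}]
print(locf_fill(bars))
# -> the missing bar becomes {'close': 10.0, 'volume': 0}
```

A moving average over the `close` column stays unbroken, while the zero volume prevents any simulated fill on the imputed bar.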
Dataset details¶
Forex
Description: Updated daily, this dataset provides minute-level price data (open, high, low and close) and a proxy volume for the top 10 currency pairs. The available pairs are: AUD/USD, EUR/CHF, EUR/JPY, EUR/USD, GBP/JPY, GBP/USD, NZD/USD, USD/CAD, USD/CHF and USD/JPY. In addition, minute-level bid-ask data is also available.
Access: Free
Benchmark / Coverage / Frequency: Trade Weighted U.S. Dollar Index (Broad) as published by the US Federal Reserve is the default benchmark. The dataset covers the top 10 global currency pairs. Data frequency is minute-level (updated with a one-day lag). Starts from July 2008.
Simulation Details: All trades are done on margin (margin traded products). The default (and recommended) simulation model uses the dataset's bid-ask spreads to simulate trades. The default margin model requires 5% margin (both maintenance and initial). This dataset is also used as the Pipeline feed for forex brokers. The default accounting currency is USD.
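Under the default 5% margin model, the cash set aside for a position is simply 5% of its notional value. A hypothetical back-of-the-envelope check (the function and numbers are illustrative, not the simulator's internals):

```python
# Margin required under the default forex model (5%, per the text above).
MARGIN_RATE = 0.05

def required_margin(quantity, price, margin_rate=MARGIN_RATE):
    """Initial (and maintenance) margin for a position of given notional."""
    return abs(quantity) * price * margin_rate

# e.g. long 100,000 EUR/USD at 1.10 -> notional 110,000 USD
print(required_margin(100_000, 1.10))   # about 5,500 USD of margin
```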
US Equities
Description: Updated daily, this dataset provides minute-level price/volume data (open, high, low, close and volume) for a liquidity-filtered universe of the top 1000 stocks and ETFs. An asset symbol will be available to trade only for the time period it was a member of the benchmark[^1]. If a member undergoes a change in ticker symbol, the old symbol is discontinued. In low-liquidity periods we carry forward the last traded price with volume set at zero. This ensures continuity in strategy computations (like moving averages), but does not allow any simulated trades to take place in such a period.
Access: Free
Benchmark / Coverage / Frequency: SPDR S&P 500 ETF (SPY) is the default benchmark. Data frequency is minute-level (updated with around a one-day lag). Starts from July 2008.
Simulation Details: The default simulation closely resembles a Reg T account, with margin trading and margin at 50% (both initial and maintenance). This means 2x buying power. The default model also assumes a traded-volume-based slippage model. Traded volumes per order are capped at 2% of the available volume at that bar. The simulation will NOT track or flag any pattern day trading behaviour. This dataset is also used as the Pipeline feed for US equities brokers. The default accounting currency is the local currency.
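The 2% volume cap means a large order is only partially filled at each bar, with the remainder carried to later bars. A minimal sketch of this behaviour (illustrative only, not the actual slippage model):

```python
# Illustrative partial-fill logic under a 2%-per-bar volume cap
# (a sketch of the behaviour described above, not the real model).
CAP_PCT = 2  # percent of a bar's volume available to a single order

def fill_order(order_qty, bar_volumes, cap_pct=CAP_PCT):
    """Fill order_qty across bars, taking at most cap_pct% of each bar's
    volume. Returns the per-bar fills and the unfilled remainder."""
    fills, remaining = [], order_qty
    for vol in bar_volumes:
        take = min(remaining, vol * cap_pct / 100)
        fills.append(take)
        remaining -= take
        if remaining <= 0:
            break
    return fills, remaining

fills, left = fill_order(10_000, [200_000, 150_000, 100_000])
print(fills, left)   # [4000.0, 3000.0, 2000.0] 1000.0
```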
Crypto
Description: Updated daily, this dataset provides minute-level price/volume data (open, high, low, close and volume) for its members. Members include a selection of top coins by market cap. The universe consists of 9 coins each in the USDT and INR markets, in addition to the USDT/INR pair. The universe is static. The prices do not reflect any fork actions.
Access: Free
Vendor: Binance for the USDT market and Bitbns for the INR market.
Benchmark / Coverage / Frequency: BTC/USDT is the benchmark. Data frequency is minute-level (updated with around a one-day lag). Starts from Jan 2018.
Simulation Details: The default simulation is margin trading for the coins, requiring 10% margin (both initial and maintenance). The default model also assumes a 10 bps commission cost. Trades are filled immediately up to a certain volume (up to 10000 times the minimum trade quantity), with a slippage of 25 bps. All trades are fractional (i.e. the trade quantity need not be an integer) and the trade quantity must be a multiple of the minimum trade quantity for the particular coin. The default accounting currency is USDT.
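Putting the numbers above together: the quantity is rounded to a multiple of the minimum trade quantity, the fill price carries 25 bps of slippage, and commission is 10 bps of the fill value. A hypothetical cost sketch (the minimum trade quantity and prices here are made up for illustration):

```python
# Sketch of the crypto fill arithmetic described above. Slippage and
# commission rates are from the text; min_qty and prices are hypothetical.
SLIPPAGE_BPS = 25
COMMISSION_BPS = 10

def crypto_buy_cost(qty, price, min_qty):
    """Round qty down to a multiple of min_qty; return (filled qty,
    total cost) for a buy, including slippage and commission."""
    qty = (qty // min_qty) * min_qty                    # multiple of min_qty
    fill_price = price * (1 + SLIPPAGE_BPS / 10_000)    # buyer pays up
    cost = qty * fill_price
    commission = cost * COMMISSION_BPS / 10_000
    return qty, cost + commission

qty, total = crypto_buy_cost(2.3, 100.0, min_qty=0.5)
print(qty, round(total, 2))   # 2.0 200.7
```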
Important points on the Crypto dataset: The dataset is currently experimental. It consists of a static universe, and hence is not immune to survivorship bias. Also, the dataset does not include price adjustments (fork adjustments), if any. Finally, given their nature (fractional trading and high volatility), we strongly recommend avoiding the use of any target ordering function except order_target. This is to avoid a large number of orders in response to price moves of the underlyings.
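The point about target ordering functions can be seen with a quick sketch: a percent-based target recomputes the desired quantity on every price move, while a quantity target does not. This is a simplified illustration (it ignores changes in portfolio value), not the actual order logic:

```python
# Simplified illustration of why percent-based targets churn on volatile
# coins: the desired quantity changes with every price move.
portfolio_value = 10_000.0
target_pct = 0.5
prices = [100.0, 110.0, 95.0]    # hypothetical volatile coin

# order_target-style: quantity fixed once; no new orders on price moves.
fixed_qty = target_pct * portfolio_value / prices[0]

# order_target_percent-style: a new target quantity at every price,
# hence a (possibly fractional) rebalancing order on every move.
pct_qtys = [target_pct * portfolio_value / p for p in prices]
print(fixed_qty)   # 50.0
print(pct_qtys)    # [50.0, 45.45..., 52.63...] -> one order per move
```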
[^1]: The benchmark is based on a liquidity filter as well as a penny stock filter, and is re-calculated quarterly. All equities trading at that point in time with a price less than a threshold are automatically filtered out. The remaining assets are ranked in decreasing order of historical average daily dollar volume. The top 1000 of this list become the new benchmark, applicable from the next business day onwards till the end of the quarter.