Datasets On Blueshift®

Info

This page details the data feeds available for research and backtests only. For live runs, your broker is the source of all data your algo can fetch and use (except Pipeline APIs, if available).

Available datasets

On Blueshift®, you have to explicitly choose a dataset when running a strategy backtest. You must pick a dataset consistent with your strategy. For example, if you developed a strategy to trade Apple, you must choose the US equities dataset to run it without error. At present, we have the following datasets available for research and backtests:

Equities and ETFs market data for the US market - minute-level data with corporate actions. Updated once every day after market close.
FX data - 10 currency pairs - AUD/USD, EUR/CHF, EUR/JPY, EUR/USD, GBP/JPY, GBP/USD, NZD/USD, USD/CAD, USD/CHF, USD/JPY. Minute-level data, updated once every day.
Crypto data - 9 coins against Tether (USDT) and Indian Rupees (INR), plus the USDT/INR pair. The nine coins are BTC, ETH, ADA, BNB, MATIC, XRP, SOL, DOT and LUNA. You must specify them as a pair, either against USDT (e.g. BTC/USDT) or INR (e.g. BTC/INR).

Data feed in live trading

For live trading, the data feed is directly linked to the broker, and only data supported by your broker (e.g. supported tickers, frequency, maximum history etc.) can be fetched. It is delivered to the requesting algo as is, after conversion to standard columns (open, high, low, close and volume). No adjustments for corporate events, missing data or anything else are applied. However, all Pipeline API calls are executed on one (and only one) of the datasets listed above (whichever is applicable for the chosen broker).

At present, we do not support uploading your own datasets, but this is on our to-do list. In all datasets, we endeavour to track a liquid benchmark¹ or asset set, as we believe these are more suitable for the trading styles our users are likely to pursue. See below for more details.

Data adjustments

For backtests, all trades are simulated with as-traded prices (unadjusted data) to accurately capture actual trading conditions. However, an API call for historical data returns adjusted data for ease of strategy development (for example, computing moving averages). Adjustments are applied on an end-of-day (EOD) basis.
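To illustrate why adjusted history helps with indicators, here is a minimal sketch of a back-adjustment for a stock split using pandas. The function name, the sample prices and the split date are hypothetical, and the adjustment method itself is an assumption for illustration; it is not necessarily how Blueshift® computes its adjustments.

```python
import pandas as pd


def back_adjust_for_split(close: pd.Series, split_date: str, ratio: float) -> pd.Series:
    """Divide every price strictly before split_date by the split ratio.

    Hypothetical helper: a common back-adjustment convention, assumed here
    purely for illustration.
    """
    adjusted = close.astype(float).copy()
    before = adjusted.index < pd.Timestamp(split_date)
    adjusted.loc[before] = adjusted.loc[before] / ratio
    return adjusted


# Made-up as-traded closes around a 2-for-1 split effective 2024-01-04.
as_traded = pd.Series(
    [100.0, 102.0, 51.5, 52.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
adjusted = back_adjust_for_split(as_traded, "2024-01-04", ratio=2.0)

# A 2-bar moving average on as-traded prices mixes pre- and post-split levels,
# while the same average on adjusted prices stays on one consistent scale.
ma_raw = as_traded.rolling(2).mean()
ma_adjusted = adjusted.rolling(2).mean()
```

Comparing `ma_raw` and `ma_adjusted` on the split date shows the distortion that adjusted data avoids, while simulated fills would still use the as-traded series.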

We handle missing data in our input datasets using the standard LOCF (last observation carried forward) imputation method, but without carrying the volume data forward. This allows unbroken algo computation (like moving averages) on price data, but avoids trade fills on missing data (which may result in warning messages).
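The imputation described above can be sketched with pandas as follows. This is an illustration under assumptions, not the platform's implementation: the function name and sample bars are hypothetical, and it assumes a missing bar has all of its fields missing and that the carried-forward price is the last available close.

```python
import numpy as np
import pandas as pd

PRICE_COLS = ["open", "high", "low", "close"]


def impute_locf(bars: pd.DataFrame) -> pd.DataFrame:
    """Carry the last close forward into missing bars and zero their volume."""
    filled = bars.copy()
    missing = filled["close"].isna()
    last_close = filled["close"].ffill()
    for col in PRICE_COLS:
        # Keep observed values; fill missing bars with the last known close.
        filled[col] = filled[col].where(~missing, last_close)
    # Volume is not carried forward: missing bars get zero volume.
    filled["volume"] = filled["volume"].where(~missing, 0.0)
    return filled


# Made-up minute bars with the middle bar missing.
bars = pd.DataFrame(
    {
        "open": [10.0, np.nan, 10.4],
        "high": [10.2, np.nan, 10.5],
        "low": [9.9, np.nan, 10.3],
        "close": [10.1, np.nan, 10.4],
        "volume": [500.0, np.nan, 300.0],
    },
    index=pd.date_range("2024-01-02 09:30", periods=3, freq="min"),
)
filled = impute_locf(bars)
```

With zero volume on the imputed bar, a volume-based fill model has nothing to fill against, which matches the intent of avoiding trades on missing data.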

Dataset details

Forex

Description: Updated daily, this dataset provides minute-level price data (open, high, low and close) and a proxy volume for the top 10 currency pairs. The available pairs are: AUD/USD, EUR/CHF, EUR/JPY, EUR/USD, GBP/JPY, GBP/USD, NZD/USD, USD/CAD, USD/CHF and USD/JPY. In addition, minute-level bid-ask data is also available.

Access: Free

Vendor:

Benchmark / Coverage / Frequency: The Trade Weighted U.S. Dollar Index (Broad), as published by the US Federal Reserve, is the default benchmark. The dataset covers the top 10 global currency pairs. Data frequency is minute-level (updated with a one-day lag). Starts from July 2008.

Simulation Details: All trades are executed on margin (margin-traded products). The default (and recommended) simulation model uses the dataset's bid-ask spreads to simulate trades. The default margin model requires 5% margin (both maintenance and initial). This dataset is also used as the Pipeline feed for forex brokers. Default accounting currency is USD.
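As a rough worked example of these defaults, the sketch below computes a fill price and the margin blocked for a forex order. The function names and quotes are hypothetical, and it assumes that buys fill at the ask and sells at the bid, which is one plausible reading of a bid-ask based simulation rather than a confirmed detail of the model.

```python
def fill_price(quantity: float, bid: float, ask: float) -> float:
    """Assumed convention: buys (quantity > 0) fill at ask, sells at bid."""
    return ask if quantity > 0 else bid


def required_margin(quantity: float, price: float, margin_rate: float = 0.05) -> float:
    """Margin blocked for a position at the default 5% rate."""
    return abs(quantity) * price * margin_rate


# Made-up EUR/USD quote and a buy order for 100,000 units.
qty = 100_000
price = fill_price(qty, bid=1.0998, ask=1.1000)
margin = required_margin(qty, price)  # about 5,500 USD on 110,000 USD notional
```

At a 5% margin rate, the notional exposure can be up to 20 times the margin posted.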

US Equities

Description: Updated daily, this dataset provides minute-level price/volume data (open, high, low, close and volume) for a liquidity-filtered universe of the top 1000 stocks and ETFs. An asset symbol is available to trade only for the time period during which it was a member of the benchmark¹. If a member undergoes a change in ticker symbol, the old symbol is discontinued. In a low-liquidity period, we carry forward the last traded price with volume set to zero. This ensures continuity in strategy computation (like moving averages), but does not allow any simulated trades to take place in such a period.

Access: Free

Vendor:

Benchmark / Coverage / Frequency: SPDR S&P 500 ETF (SPY) is the default benchmark. Data frequency is minute-level (updated with a lag of around one day). Starts from July 2008.

Simulation Details: The default simulation closely resembles a Reg T account, with margin trading and margin at 50% (both initial and maintenance). This means 2x buying power. The default model also uses a traded-volume-based slippage model. Traded volumes per order are capped at 2% of the available volume at that bar. The simulation will NOT track or flag any pattern day trading behaviour. This dataset is also used as the Pipeline feed for US equities brokers. Default accounting currency is local currency.
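The arithmetic behind the 2x buying power and the 2% volume cap can be sketched as below. The function names are hypothetical and the rounding of the cap down to whole shares is an assumption; how the simulator treats any unfilled remainder is not covered here.

```python
def buying_power(equity: float, margin_rate: float = 0.5) -> float:
    """Maximum position value supported by the account equity."""
    return equity / margin_rate


def max_fill(order_qty: int, bar_volume: int, cap: float = 0.02) -> int:
    """Signed quantity fillable in one bar under the per-bar volume cap.

    Assumption: the cap is rounded down to whole shares.
    """
    limit = int(bar_volume * cap)
    sign = 1 if order_qty >= 0 else -1
    return sign * min(abs(order_qty), limit)


power = buying_power(10_000)        # 50% margin gives 2x buying power
capped = max_fill(500, 10_000)      # order exceeds 2% of the bar's volume
small = max_fill(-100, 10_000)      # sell order within the cap
no_volume = max_fill(500, 0)        # zero-volume bar: nothing can fill
```

The last case ties back to the zero-volume bars used for low-liquidity periods: with no volume available, no quantity can be filled on that bar.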

Crypto

Description: Updated daily, this dataset provides minute-level price/volume data (open, high, low, close and volume) for its members. Members include a selection of top coins by market cap. The universe consists of 9 coins each in the USDT and INR markets, in addition to the USDT/INR pair. The universe is static. The prices do not reflect any fork actions.

Access: Free

Vendor: Binance Data for USDT market and Bitbns for INR market.

Benchmark / Coverage / Frequency: BTC/USDT is the benchmark. Data frequency is minute-level (updated with a lag of around one day). Starts from Jan 2018.

Simulation Details: The default simulation uses margin trading for the coins, requiring 10% margin (both initial and maintenance). The default model also assumes a 10 bps commission cost. Trades are filled immediately up to a certain volume (up to 10000 times the minimum trade quantity), with a slippage of 25 bps. All trades are fractional (i.e. trade quantity need not be an integer) and trade quantity must be a multiple of the minimum trade quantity for the particular coin. The default accounting currency is USDT.
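The sketch below works through these defaults for a single buy order, using `Decimal` to keep fractional quantities exact. The function names, the sample price and the minimum trade quantity are hypothetical. It also assumes that slippage moves the price against the buyer, that commission is charged on the filled notional, and that a requested quantity is rounded down to the nearest valid multiple; none of these details are confirmed by the description above.

```python
from decimal import Decimal, ROUND_DOWN

BPS = Decimal(10000)


def round_to_lot(quantity: Decimal, min_qty: Decimal) -> Decimal:
    """Round a quantity down to a multiple of the minimum trade quantity."""
    lots = (quantity / min_qty).to_integral_value(rounding=ROUND_DOWN)
    return lots * min_qty


def simulate_buy(
    quantity: Decimal,
    price: Decimal,
    min_qty: Decimal,
    slippage_bps: Decimal = Decimal(25),
    commission_bps: Decimal = Decimal(10),
    max_lots: int = 10000,
):
    """Return (immediately fillable quantity, fill price, commission)."""
    qty = min(round_to_lot(quantity, min_qty), min_qty * max_lots)
    fill_price = price * (1 + slippage_bps / BPS)      # adverse slippage on a buy
    commission = qty * fill_price * commission_bps / BPS
    return qty, fill_price, commission


# Made-up order: 0.12345 BTC at 30,000 USDT with a 0.001 minimum quantity.
qty, fill_price, commission = simulate_buy(
    Decimal("0.12345"), Decimal("30000"), Decimal("0.001")
)
```

Here the quantity is trimmed to a multiple of 0.001 before the slippage and commission are applied, and an order larger than 10000 minimum lots would be capped at that size for the immediate fill.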

Important points on the Crypto dataset.

The dataset is currently experimental. It consists of a static universe, and hence is not immune to survivorship bias. Also, the dataset does not include price adjustments (fork adjustments), if any. Finally, given the nature of these assets (fractional trading and high volatility), we strongly recommend avoiding any target ordering function except order_target. This is to avoid generating a large number of orders in response to price moves of the underlyings.

¹ The benchmark is based on a liquidity filter as well as a penny stock filter. It is recalculated quarterly. All equities trading at that point in time with a price below a threshold are automatically filtered out. The remaining assets are ranked in decreasing order of historical average daily dollar volume. The top 1000 of this list become the new benchmark, applicable from the next business day until the end of the quarter.
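The benchmark selection described in the footnote above can be sketched with pandas as follows. The function name, column names, price threshold and sample figures are all hypothetical; the actual threshold and averaging window are not stated here.

```python
import pandas as pd


def build_universe(stats: pd.DataFrame, min_price: float, top_n: int = 1000) -> list:
    """Drop low-priced assets, rank by average daily dollar volume, keep top_n.

    Assumes `stats` is indexed by symbol with hypothetical columns
    `price` and `avg_dollar_volume`.
    """
    eligible = stats[stats["price"] >= min_price]
    ranked = eligible.sort_values("avg_dollar_volume", ascending=False)
    return ranked.head(top_n).index.tolist()


# Made-up snapshot taken on a quarterly recalculation date.
stats = pd.DataFrame(
    {
        "price": [50.0, 0.8, 120.0, 10.0],
        "avg_dollar_volume": [5e6, 9e9, 2e6, 7e6],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)
universe = build_universe(stats, min_price=5.0, top_n=2)
```

In this toy snapshot the most heavily traded symbol is excluded by the price filter before ranking, so liquidity alone does not guarantee membership.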