Fetching Data on Blueshift®¶
The data object¶
On Blueshift®, most of the main entry-point functions take two arguments,
context and data. These are special variables maintained by the platform,
and a user program can query them for a wide range of useful information.
In particular, the data object is the portal to all data that the user
program can access. A user program must direct every data query to this
object instead of querying the underlying data source directly. This
ensures the user program has no look-ahead bias: the strategy code cannot,
even inadvertently, get data ahead of the current simulated time in a
backtest. It also gives the strategy code a single, standardized interface
to data, irrespective of the data type or source.
For more on the data object, see the API Reference.
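To make the look-ahead guarantee concrete, here is a minimal sketch of how a platform-managed data object could restrict every query to the simulated clock. The class ToyDataPortal and its methods are hypothetical and written only for illustration; this is not Blueshift's implementation, and the toy handles a single asset and a single field.

```python
from bisect import bisect_right
from datetime import datetime


class ToyDataPortal:
    """Hypothetical stand-in for a platform-managed data object."""

    def __init__(self, timestamps, closes):
        # Bars must be sorted by time, one close per timestamp.
        self._timestamps = list(timestamps)
        self._closes = list(closes)
        # Simulated clock; a backtest loop would advance this bar by bar.
        self.now = self._timestamps[0]

    def _visible(self):
        # Count of bars stamped at or before the simulated clock.
        return bisect_right(self._timestamps, self.now)

    def current(self):
        # Latest close visible at the simulated time.
        end = self._visible()
        if end == 0:
            raise ValueError("no data available yet")
        return self._closes[end - 1]

    def history(self, nbars):
        # Trailing window of visible closes; it never reaches past the clock.
        end = self._visible()
        if nbars > end:
            raise ValueError(f"requested {nbars} bars, only {end} available")
        return self._closes[end - nbars:end]


timestamps = [datetime(2020, 1, 2, 9, 30 + i) for i in range(5)]
closes = [100.0, 101.0, 102.5, 101.5, 103.0]
portal = ToyDataPortal(timestamps, closes)

portal.now = timestamps[2]   # the backtest has reached the third bar
latest = portal.current()    # third bar's close, not any later one
window = portal.history(2)   # the two most recent visible closes
```

Because the strategy only ever talks to the portal, it has no way to read the fourth or fifth bar until the clock reaches them, which is the property the paragraph above describes.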
Data object API functions¶
The data object exposes two functions to the user program to fetch
data: the current method and the history method.
Signatures of these functions are as below.
current¶
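The examples below call it positionally as data.current(assets, fields),
where assets is a single asset or a list of assets, and fields is a single
price field (such as "close") or a list of price fields.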
history¶
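The examples below call it positionally as
data.history(assets, fields, nbars, frequency), where assets is a single
asset or a list of assets, fields is a single price field or a list of
price fields, nbars is the number of bars to fetch, and frequency is the
bar frequency (such as "1m").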
Examples¶
The examples below show the use of the current method:
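from blueshift.api import symbol

def initialize(context):
    # Set up the stock universe once, at the start of the run.
    context.universe = [symbol("AAPL"), symbol("MSFT")]

def handle_data(context, data):
    cpx1 = data.current(context.universe[0], "close")            # one asset, one field
    cpx2 = data.current(context.universe, "close")               # many assets, one field
    cpx3 = data.current(context.universe[0], ["open","close"])   # one asset, many fields
    cpx4 = data.current(context.universe, ["open","close"])      # many assets, many fields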
In the code snippet above, we set up our stock universe in the initialize
function as usual, and query data on prices of the stocks in this universe
in the handle_data function. The types of the return values are as
below:
- cpx1: a number (float)
- cpx2: Pandas Series with assets as index
- cpx3: Pandas Series with price fields (open and close) as index
- cpx4: Pandas DataFrame with assets as index and price fields as columns
For the history method we have similar results:
from blueshift.api import symbol

def initialize(context):
    # Set up the stock universe once, at the start of the run.
    context.universe = [symbol("AAPL"), symbol("MSFT")]

def handle_data(context, data):
    # Arguments: asset(s), field(s), number of bars, bar frequency.
    px1 = data.history(context.universe[0], "close", 10, "1m")
    px2 = data.history(context.universe, "close", 10, "1m")
    px3 = data.history(context.universe[0], ["open","close"], 10, "1m")
    px4 = data.history(context.universe, ["open","close"], 10, "1m")
Here we query for historical data for 10 bars (or candles) at the
given frequency (1m, i.e. minutely). The returned data types are:
- px1: Pandas Series with date-time as index
- px2: Pandas DataFrame with date-time index and assets as columns
- px3: Pandas DataFrame with date-time index and price fields as columns
- px4: Pandas Panel data in the current version; a Pandas multi-indexed
  DataFrame in a future version.
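The two lists of return types follow one rule: the container depends on whether a single value or a list is passed for assets and for fields. The helper below is a hypothetical lookup written for illustration (result_shape is not part of the Blueshift API); it simply restates the lists above as a table that a strategy author could consult or assert against.

```python
def result_shape(assets, fields, method="current"):
    """Describe the container returned for a query, per the lists above."""
    many_assets = isinstance(assets, (list, tuple))
    many_fields = isinstance(fields, (list, tuple))
    shapes = {
        "current": {
            (False, False): "float",
            (True, False): "Series indexed by asset",
            (False, True): "Series indexed by field",
            (True, True): "DataFrame: assets as index, fields as columns",
        },
        "history": {
            (False, False): "Series indexed by date-time",
            (True, False): "DataFrame: date-time index, assets as columns",
            (False, True): "DataFrame: date-time index, fields as columns",
            (True, True): "Panel (current) or multi-indexed DataFrame (future)",
        },
    }
    return shapes[method][(many_assets, many_fields)]


# Plain strings stand in for asset objects in this illustration.
single = result_shape("AAPL", "close")
wide = result_shape(["AAPL", "MSFT"], ["open", "close"], method="history")
```

Knowing the shape in advance tells you which kind of sub-setting the result will need.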
Using the fetched data¶
Depending on the returned type, we may need to apply proper sub-setting
to the returned data. Continuing from the examples above, below are ways
to access particular data points for the current method:
def handle_data(context, data):
    ...  # continuing from the code above
    print(cpx1)
    print(cpx2[context.universe[0]])
    print(cpx3["close"])
    print(cpx4.loc[context.universe[0],"close"])
All of these will print the current close price of the first stock
(AAPL) in our universe. Similarly for the history method:
def handle_data(context, data):
    ...  # continuing from the code above
    print(px1)
    print(px2[context.universe[0]])
    print(px3["close"])
    print(px4.minor_xs(context.universe[0])["close"])
Again, all of the above will print the last 10 one-minute close prices for
the first stock in the universe. Note that in the Pandas Panel format, the
securities are along the minor axis (price fields are along the columns).
If the returned data is a Pandas multi-index DataFrame, we can get the
underlying dataframe for each asset by simply subsetting the multi-index
dataframe by asset, e.g. px4[context.universe[0]] will return the
dataframe (with price fields as columns) for the first asset.
For a complete working example, see our GitHub repo.
Data fetching errors¶
One of the most common errors in data fetching is requesting data that is not available. Suppose the dataset starts on 1st January 2008. If you request 10 daily bars on that first day, Blueshift® will throw an error. In live runs, data comes from your broker. If your broker does not support minute candles beyond, say, 10 days, and you ask for 20 days, a similar error will be thrown. The platform may NOT catch such errors, and your program may exit. The other typical error is to query for an asset that no longer trades - for example, querying for the March futures on 1st of April, or for an equity that has been de-listed.
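Since the platform may not catch these errors, one defensive option is to wrap the query yourself. The sketch below is an assumption-laden illustration: safe_history and _FakeData are hypothetical names, the broad except Exception is used only because the concrete exception class is not stated here, and the "1d" frequency string is assumed. The fake data object stands in for a data source that holds fewer bars than requested.

```python
def safe_history(data, assets, fields, nbars, frequency):
    """Return data.history(...) or None if the request fails."""
    try:
        return data.history(assets, fields, nbars, frequency)
    except Exception as exc:  # assumption: exact exception type is unknown
        print(f"history request failed: {exc}")
        return None


class _FakeData:
    """Test double mimicking a source that holds only `available` bars."""

    def __init__(self, available):
        self.available = available

    def history(self, assets, fields, nbars, frequency):
        if nbars > self.available:
            raise ValueError(
                f"only {self.available} bars available, {nbars} requested"
            )
        return list(range(nbars))


# First day of the dataset: only one bar exists, ten are requested.
first_day = _FakeData(available=1)
result = safe_history(first_day, "AAPL", "close", 10, "1d")

# Later in the run there is enough history, so the call succeeds.
ok = safe_history(_FakeData(available=20), "AAPL", "close", 10, "1d")
```

In handle_data, a None result can then be treated as "skip this bar" instead of letting the strategy exit.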