Fetching Data on Blueshift®

The data object

On Blueshift®, most of the main entry-point functions take two arguments, context and data. These are special variables maintained by the platform, and a user program can query them for a range of useful information. Specifically, the data object is the portal to all data that the user program can access. A user program must direct all data queries to this object instead of querying the underlying data source directly¹. This ensures the user program has no look-ahead bias: the strategy code cannot, even inadvertently, get data ahead of the current simulated time in a backtest. The other upside is that the strategy code has a single, standardized interface to data, irrespective of the data type or source.
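To make the look-ahead guarantee concrete, here is a minimal sketch of how a platform-managed data object could hide future data from a strategy. The class ToyDataPortal and its advance_to method are hypothetical names invented for this illustration; they are not Blueshift's implementation.

```python
import pandas as pd


class ToyDataPortal:
    """Hypothetical stand-in for a platform-managed data object."""

    def __init__(self, prices: pd.Series):
        self._prices = prices.sort_index()  # full history, hidden from the strategy
        self._now = self._prices.index[0]   # current simulated time

    def advance_to(self, timestamp):
        """Called by the simulation loop, never by strategy code."""
        self._now = pd.Timestamp(timestamp)

    def current(self):
        """Latest price at or before the simulated time."""
        return float(self._prices.loc[:self._now].iloc[-1])

    def history(self, bars: int):
        """Last `bars` prices at or before the simulated time.

        This toy version silently returns fewer bars when not enough
        exist; a real platform may raise an error instead.
        """
        return self._prices.loc[:self._now].iloc[-bars:]


index = pd.date_range("2020-01-01", periods=5, freq="D")
portal = ToyDataPortal(pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=index))

portal.advance_to("2020-01-03")
visible = portal.history(10)  # asks for 10 bars, but only 3 are visible so far
latest = portal.current()
print(len(visible), latest)
```

Because the strategy only ever holds a reference to the portal, and only the simulation loop moves the clock, prices dated after the simulated time are never reachable from strategy code.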

For more on the data object, see the API Reference.

Data object API functions

The data object exposes two methods to the user program for fetching data: current and history. Their docstrings are shown below.

current

    """ 
        This method returns the current (latest available) price data 
        for the specified assets.

        Args:
            ``assets(list)``: An asset or a list of assets to fetch
            data for.

            ``fields(list)``: A field or a list of fields to fetch data
            for. Allowed fields are [`open`, `high`, `low`, `close`,
            `volume`, `last`].

        Returns:
            A float in case of a single asset and a single field. A
            Pandas ``Series`` in case of either multiple assets and a
            single field (keyed by assets) or a single asset and
            multiple fields (keyed by fields). A Pandas ``DataFrame``
            in case of both multiple assets and multiple fields.
    """

history

    """ 
        This method returns historical price data for the specified 
        assets for the range asked (or available from the source).

        Args:
            ``assets(list)``: An asset or a list of assets to fetch
            data for.

            ``fields(list)``: A field or a list of fields to fetch data
            for. Allowed fields are [`open`, `high`, `low`, `close`,
            `volume`, `last`].

            ``bars(int)``: Number of bars of data to return.

            ``frequency``: Frequency of data, either ``1m`` (for
            minute bars) or ``1d`` (for daily bars).

        Returns:
            A Pandas ``Series`` in case of a single asset and field, or a
            Pandas ``DataFrame`` for either a single asset and multiple
            fields (date-time as index and fields as columns) or
            multiple assets and a single field (date-time as index and
            assets as columns). In case of both multiple assets and
            fields, a Pandas ``MultiIndex DataFrame`` will be returned,
            with assets as the second level of the index.
    """

Examples

The example below shows the use of the current method:

def initialize(context):
    # the stock universe the strategy works with
    context.universe = [symbol("AAPL"), symbol("MSFT")]

def handle_data(context, data):
    cpx1 = data.current(context.universe[0], "close")            # one asset, one field
    cpx2 = data.current(context.universe, "close")               # many assets, one field
    cpx3 = data.current(context.universe[0], ["open","close"])   # one asset, many fields
    cpx4 = data.current(context.universe, ["open","close"])      # many assets, many fields

In the code snippet above, we set up our stock universe in the initialize function as usual, and query data on prices of the stocks in this universe in the handle_data function. The types of the return values are as below:

cpx1: a number (float)
cpx2: Pandas Series with asset as index
cpx3: Pandas Series with price fields (open and close) as index
cpx4: Pandas DataFrame with assets as index and price fields as columns
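The four shapes listed above can be reproduced offline with plain Pandas, which may help when unit-testing strategy logic away from the platform. This is only a sketch: the snapshot table and its prices are made up, and plain strings stand in for the asset objects that symbol() would return.

```python
import pandas as pd

# Hypothetical snapshot of latest prices; strings stand in for asset objects.
snapshot = pd.DataFrame(
    {"open": [150.0, 300.0], "close": [151.5, 302.0]},
    index=["AAPL", "MSFT"],
)

# single asset, single field: a plain float
cpx1 = float(snapshot.loc["AAPL", "close"])
# multiple assets, single field: Series keyed by asset
cpx2 = snapshot["close"]
# single asset, multiple fields: Series keyed by field
cpx3 = snapshot.loc["AAPL", ["open", "close"]]
# multiple assets, multiple fields: DataFrame (assets as index, fields as columns)
cpx4 = snapshot.loc[["AAPL", "MSFT"], ["open", "close"]]

for value in (cpx1, cpx2, cpx3, cpx4):
    print(type(value).__name__)
```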

The history method gives similar results:

def initialize(context):
    # the stock universe the strategy works with
    context.universe = [symbol("AAPL"), symbol("MSFT")]

def handle_data(context, data):
    px1 = data.history(context.universe[0], "close", 10, "1m")            # one asset, one field
    px2 = data.history(context.universe, "close", 10, "1m")               # many assets, one field
    px3 = data.history(context.universe[0], ["open","close"], 10, "1m")   # one asset, many fields
    px4 = data.history(context.universe, ["open","close"], 10, "1m")      # many assets, many fields

Here we query historical data for 10 bars (or candles) at the given frequency (1m, i.e. one-minute bars). The returned data types are:

px1: Pandas Series with date-time as index
px2: Pandas DataFrame with date-time index and assets as columns
px3: Pandas DataFrame with date-time index and price fields as columns
px4: Pandas Panel data² in the current version; a Pandas multi-index DataFrame in a future version.
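As an offline illustration of these shapes, the sketch below builds equivalent Pandas objects by hand. The bar values are made up, plain strings stand in for asset objects, and px4 is shown only in the multi-index form (date-time as the first index level, asset as the second), since Panel no longer exists in recent Pandas releases. The exact layout returned by the platform may differ.

```python
import pandas as pd

# Hypothetical one-minute bars; strings stand in for asset objects.
times = pd.date_range("2020-01-02 09:31", periods=3, freq="min")
bars = {
    "AAPL": pd.DataFrame(
        {"open": [75.0, 75.2, 75.1], "close": [75.2, 75.1, 75.4]}, index=times
    ),
    "MSFT": pd.DataFrame(
        {"open": [160.0, 160.3, 160.2], "close": [160.3, 160.2, 160.6]}, index=times
    ),
}

# single asset, single field: Series with date-time index
px1 = bars["AAPL"]["close"]
# multiple assets, single field: DataFrame with assets as columns
px2 = pd.DataFrame({asset: frame["close"] for asset, frame in bars.items()})
# single asset, multiple fields: DataFrame with fields as columns
px3 = bars["AAPL"][["open", "close"]]
# multiple assets and fields: multi-index DataFrame, levels (time, asset)
px4 = pd.concat(bars, names=["asset", "time"]).swaplevel().sort_index()

print(px4)
```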

Using the fetched data

Depending on the returned type, we may need to subset the returned data appropriately. Continuing from the examples above, the following are ways to access particular data points for the current method:

def handle_data(context, data):
    ... # continuing from the code above
    print(cpx1)
    print(cpx2[context.universe[0]])
    print(cpx3["close"])
    print(cpx4.loc[context.universe[0],"close"])

All of these will print the current close price of the first stock (AAPL) in our universe. Similarly, for the history method:

def handle_data(context, data):
    ... # continuing from the code above
    print(px1)
    print(px2[context.universe[0]])
    print(px3["close"])
    print(px4.minor_xs(context.universe[0])["close"])

Again, all of the above will print the last 10 one-minute close prices for the first stock in the universe. Note that, for the Pandas Panel format, the securities are along the minor axis (price fields are along the columns). If the returned data is a Pandas multi-index DataFrame, we can get the underlying DataFrame for each asset by subsetting the multi-index DataFrame by asset, e.g. px[context.universe[0]] will return the DataFrame (with price fields as columns) for the first asset.
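For the multi-index case, one way to pull out a single asset with standard Pandas calls is a cross-section on the asset level. The sketch below assumes the layout given in the history docstring (date-time as the first index level, asset as the second); the level names "time" and "asset", the string tickers, and the prices are made up for illustration.

```python
import pandas as pd

times = pd.date_range("2020-01-02 09:31", periods=3, freq="min")
index = pd.MultiIndex.from_product(
    [times, ["AAPL", "MSFT"]], names=["time", "asset"]
)
px4 = pd.DataFrame(
    {
        "open": [75.0, 160.0, 75.2, 160.3, 75.1, 160.2],
        "close": [75.2, 160.3, 75.1, 160.2, 75.4, 160.6],
    },
    index=index,
)

# Cross-section on the asset level: date-time index, price fields as columns.
aapl = px4.xs("AAPL", level="asset")
aapl_close = aapl["close"]

# Alternative view: one field pivoted into a date-time by asset table.
close_by_asset = px4["close"].unstack("asset")

print(aapl_close)
```

Both xs and unstack are regular Pandas operations, so this code does not depend on the deprecated Panel type.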

For a complete working example, see our GitHub repo.

Data fetching errors

One of the most common errors in data fetching is requesting data that is not available. Suppose the dataset starts on 1st January 2008. If you request 10 daily bars on the first day, Blueshift® will throw an error. In live runs, data comes from your broker. If your broker does not support minute candles beyond, say, 10 days, and you ask for 20 days, a similar error will be thrown. The platform may NOT catch such errors, and your program may exit. The other typical error is querying an asset that no longer trades, for example querying the March futures on 1st April, or an equity that has been de-listed.
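Since the platform may not catch these errors, one defensive option is to wrap the query yourself. The sketch below uses a hypothetical helper, safe_history, and a hypothetical FakeData test double. The exception class raised by the platform is not stated here, so catching the broad Exception is an assumption; narrow it once the actual error type is known.

```python
import pandas as pd


def safe_history(data, asset, field, bars, frequency):
    """Hypothetical helper: return the history, or None if the query fails."""
    try:
        return data.history(asset, field, bars, frequency)
    except Exception as exc:  # assumption: concrete exception class unknown
        print(f"history unavailable for {asset}: {exc}")
        return None


class FakeData:
    """Test double that mimics a data source with a limited number of bars."""

    def __init__(self, prices):
        self._prices = prices

    def history(self, asset, field, bars, frequency):
        if bars > len(self._prices):
            raise ValueError(
                f"requested {bars} bars, only {len(self._prices)} available"
            )
        return self._prices.iloc[-bars:]


data = FakeData(pd.Series([1.0, 2.0, 3.0]))
ok = safe_history(data, "AAPL", "close", 2, "1d")        # enough bars
missing = safe_history(data, "AAPL", "close", 10, "1d")  # too many bars
```

Inside handle_data, a None result can then be used to skip the current bar instead of letting the program exit.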

¹ There is actually no way to query the underlying data sources from your strategy code.
² Panel data is deprecated in Pandas, and we will move to multi-index DataFrames in the future.