Intraday data is rapidly becoming an essential resource for market forecasters, as numerous studies now point to the superiority of intraday data in predicting market movements, even over longer time horizons of more than one year. (Introduction to Intraday ETF Data)

Selecting a data vendor is fraught, however, since there are many vendors to choose from and the quality and consistency of the data are not easy to verify prior to purchase. At FirstRate Data we recently selected our two primary data vendors after vetting over twenty, and we found that several criteria were indicative of a high-quality vendor.



The quality of support correlates directly with the quality of the data. We consistently found that vendors with high-quality intraday data also provided excellent pre-sales support. We would typically test this by sending a detailed list of questions to a vendor on a Friday and timing the response. High-quality vendors would typically reply the same day or even on the Saturday, whilst low-quality vendors would take upwards of five days to respond and often gave incomplete or superficial answers.

The questions we typically asked covered how dividends were adjusted in stock data, which exchanges (including dark pools) the data was sourced from, how frequently (and at what times during the day) datasets were updated, and what work was done to validate the completeness and accuracy of a dataset.


Sample Files

Vendors that did not provide sample files were generally of poor quality. In addition, sample files should be large enough for a meaningful test (usually 10 days for intraday data and 2 days for tick data).

Sample files should be accompanied by full details on the dataset (i.e. timezone and timestamp details, policy on zero volume bars, volume units, exchanges covered).


Details on Data Cleaning and Testing 

High-quality vendors normally provide details on how the datasets are cleaned and tested. For intraday data, expect to see details such as how zero volume bars are dealt with, whether outlier data points are removed or only flagged, and whether stock prices are adjusted for splits and dividends (and how the dividend-adjusted price is calculated).

For tick data, expect to see details on how simultaneous ticks are dealt with, whether only trade-tick data is available or whether it is combined with bid-ask tick data, and how errors coming from the exchange datafeed, such as zero volume or zero/negative prices, are dealt with.
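
To make the tick-data questions concrete, the following is an illustrative sketch only: it flags, rather than removes, the feed errors mentioned above (zero volume, zero/negative prices) and marks ticks that share a timestamp. The `Tick` layout, the `flag_ticks` name and the issue labels are assumptions made for this example and do not describe any particular vendor's process.

```python
from collections import Counter
from typing import NamedTuple


class Tick(NamedTuple):
    # Hypothetical record layout; a real feed will have more fields.
    timestamp: str  # raw timestamp string as delivered
    price: float
    volume: int


def flag_ticks(ticks: list[Tick]) -> list[tuple[Tick, list[str]]]:
    """Attach a list of issue labels to every tick; nothing is dropped."""
    per_timestamp = Counter(t.timestamp for t in ticks)
    flagged = []
    for t in ticks:
        issues = []
        if t.price <= 0:
            issues.append("non_positive_price")
        if t.volume <= 0:
            issues.append("non_positive_volume")
        if per_timestamp[t.timestamp] > 1:
            # Two or more ticks carry the same timestamp.
            issues.append("simultaneous")
        flagged.append((t, issues))
    return flagged


sample = [
    Tick("2020-01-02 09:30:00.001", 100.10, 200),
    Tick("2020-01-02 09:30:00.001", 100.11, 100),
    Tick("2020-01-02 09:30:00.250", 0.0, 50),
    Tick("2020-01-02 09:30:00.400", 100.12, 0),
]
flagged = flag_ticks(sample)
```

Flagging instead of deleting keeps the decision about what to do with a suspect tick in the hands of the data user, which is one of the policies worth asking a vendor about.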


Test Purchase

Once we were satisfied with the above checks, we proceeded to a test purchase (10+ years of intraday data or 2+ years of tick data) for detailed testing.

We found that several types of errors were indicative of issues in the broader datasets, namely:

  • Time format issues. This could be as simple as inconsistent text formatting (such as switching from ‘2010-09-05’ to ‘2010.9.5’) or as fundamental as changing from yyyy-mm-dd to yyyy-dd-mm.
  • Price errors – such as zero or negative prices, or an open or close outside the bar's high-low range.
  • Gaps – some vendors omit zero volume bars, which can look like a gap in the data. This is a feature rather than an error, since zero volume bars have little informational value. In such circumstances a simple sequence check is insufficient, so we performed two tests. First, we checked the total number of bars for each day, which should be relatively constant (days with gaps typically show a drop of more than 30% in the number of bars). Second, we tested for abnormal gaps around the open and close, when trading is most intense and should be nearly continuous for all but the smallest cap stocks.
  • Prices versus other datasets – the most powerful test is to compare the dataset with a reference dataset known to be complete and accurate. This highlights any abnormalities, although the test cannot be extended to volumes: intraday volumes vary considerably between most vendors (even high-quality ones) and are very difficult to reconcile.
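
The checks in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive test suite: the timestamp layout `TIME_FORMAT`, the dictionary keys, the function names and the `rel_tol` tolerance are all hypothetical, and only the 30% bar-count threshold comes from the text.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Assumed layout; substitute the format the vendor documents.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_bar(row: dict) -> list[str]:
    """Return issue labels for one OHLC bar."""
    issues = []
    try:
        datetime.strptime(row["timestamp"], TIME_FORMAT)
    except ValueError:
        issues.append("bad_timestamp")
    o, h, l, c = (row[k] for k in ("open", "high", "low", "close"))
    if min(o, h, l, c) <= 0:
        issues.append("non_positive_price")
    if h < l or not (l <= o <= h) or not (l <= c <= h):
        issues.append("ohlc_out_of_range")
    return issues


def thin_days(timestamps: list[str], max_drop: float = 0.30) -> list[str]:
    """Dates whose bar count is more than max_drop below the median day.

    Assumes every timestamp starts with a 10-character yyyy-mm-dd date.
    """
    per_day = Counter(ts[:10] for ts in timestamps)
    typical = median(per_day.values())
    return sorted(d for d, n in per_day.items() if n < (1 - max_drop) * typical)


def price_mismatches(vendor: dict[str, float], reference: dict[str, float],
                     rel_tol: float = 0.001) -> list[str]:
    """Timestamps in both datasets whose closes differ by more than rel_tol."""
    shared = vendor.keys() & reference.keys()
    return sorted(ts for ts in shared
                  if abs(vendor[ts] - reference[ts]) > rel_tol * abs(reference[ts]))
```

Note that a strict parse catches a change of separator such as ‘2010.9.5’, but it cannot by itself detect a swap from yyyy-mm-dd to yyyy-dd-mm when the day is 12 or lower; that kind of error is more likely to surface in the bar-count or reference-price checks. The comparison is deliberately limited to prices, for the reason given in the last bullet.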
