Methodology

  1. Collect census population estimates at various geographic levels

  2. Polygon preparation

    • Scrape brand store locations from official sources (e.g. official website)

    • Manually review all brand store locations, cross-reference them against a third-party source (e.g. Google Maps), and correct them where required

    • Using the (corrected) brand store locations, draw each store's boundary as a polygon

    • Manually vet all drawn polygons in a consistent manner

      • Corporate offices should not be included

      • Only polygons drawn with minimal guesswork should be included (see footnote 1)

      • Brand stores built under any additional structure (e.g. apartment blocks) are excluded, unless that structure is an overhead car park

  3. Selecting a panel

    • The panel is the entire population, with some filters applied to ensure consistent data collection behaviour

  4. Quantifying sampling bias on a geographical level

    • Compare the geographical distribution of devices in our dataset against that in the census

    • Compute the up-sampling (or down-sampling) factor to apply to each geographical region and its devices

  5. Detect continuous visits within the drawn polygons

  6. Extract visit durations and remove outliers

    • Outliers include exceptionally short visits (possibly due to location data inaccuracies)

    • Outliers include exceptionally long visits (possibly because the device belongs to the store or to a store worker)

  7. Apply sampling bias adjustment factor

    • Down-sample the oversampled regions

    • Up-sample the undersampled regions

  8. Apply time series smoothing

    • Use a simple rolling median

  9. Compute visit time per panel capita metric

    • Divide the sampling-bias-adjusted visit time sum by the panel population count for the same month
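
Steps 4, 7 and 9 can be illustrated with a minimal sketch. It assumes the adjustment factor for a region is its census population share divided by its panel device share, which is one common choice and not necessarily the exact formula used here. The region names, counts and function names are hypothetical.

```python
def adjustment_factors(census_pop, panel_devices):
    """Per-region factor = census share / panel share (assumed formula).

    A factor above 1 up-samples an under-represented region;
    a factor below 1 down-samples an over-represented one.
    """
    census_total = sum(census_pop.values())
    panel_total = sum(panel_devices.values())
    factors = {}
    for region, devices in panel_devices.items():
        census_share = census_pop[region] / census_total
        panel_share = devices / panel_total
        factors[region] = census_share / panel_share
    return factors


def visit_time_per_capita(visit_minutes_by_region, factors, panel_population):
    """Bias-adjusted visit-time sum divided by the panel count for the same month."""
    adjusted_sum = sum(
        minutes * factors[region]
        for region, minutes in visit_minutes_by_region.items()
    )
    return adjusted_sum / panel_population


# Hypothetical figures for two regions.
census_pop = {"north": 600_000, "south": 400_000}
panel_devices = {"north": 300, "south": 700}

# "north" holds 60% of the census population but 30% of panel devices,
# so it is up-sampled; "south" is over-represented and is down-sampled.
factors = adjustment_factors(census_pop, panel_devices)

monthly_visit_minutes = {"north": 1_500.0, "south": 3_500.0}
metric = visit_time_per_capita(monthly_visit_minutes, factors, panel_population=1_000)
```

In this sketch a region with no panel devices would cause a division by zero, so such regions would need to be merged with a neighbour or handled separately.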


1

For example, if the store is in a shopping mall, no shop boundary is shown on a map, and there are no external building features we could use to infer the boundary, we will not include that store.
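
Steps 5, 6 and 8 of the methodology can be sketched in the same hedged way. The ray-casting polygon test, the 10-minute gap rule, the duration thresholds and the trailing 3-point window below are all illustrative assumptions; the source specifies only that visits are continuous, that very short and very long visits are removed, and that a simple rolling median is used.

```python
from statistics import median


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count the edges crossed by a horizontal ray cast from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def detect_visits(pings, polygon, max_gap_s=600):
    """Group time-ordered (timestamp_s, x, y) pings into continuous visits.

    A visit ends when a ping falls outside the polygon or when the gap
    between consecutive inside pings exceeds max_gap_s (assumed threshold).
    Returns a list of (start_s, end_s) tuples.
    """
    visits = []
    start = last = None
    for t, x, y in pings:
        inside = point_in_polygon(x, y, polygon)
        if inside and start is not None and t - last <= max_gap_s:
            last = t  # extend the current visit
            continue
        if start is not None:
            visits.append((start, last))  # close the current visit
        start, last = (t, t) if inside else (None, None)
    if start is not None:
        visits.append((start, last))
    return visits


def keep_plausible(durations_s, min_s=120, max_s=4 * 3600):
    """Drop exceptionally short and long visits (thresholds are assumptions)."""
    return [d for d in durations_s if min_s <= d <= max_s]


def rolling_median(values, window=3):
    """Trailing rolling median; early points use the values available so far."""
    return [median(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


# Hypothetical square store polygon and pings for a single device.
store = [(0, 0), (10, 0), (10, 10), (0, 10)]
pings = [(0, 5, 5), (300, 6, 5), (600, 5, 6), (900, 20, 20), (5000, 5, 5), (5030, 5, 5)]

visits = detect_visits(pings, store)
durations = keep_plausible([end - start for start, end in visits])
smoothed = rolling_median([10, 12, 50, 11, 13])
```

Here the device yields one 600-second visit and one 30-second visit; the latter is dropped as implausibly short. The rolling median damps the isolated spike of 50 in the example series without shifting the surrounding values much, which is the usual reason to prefer a median over a mean for this kind of smoothing.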