Methodology
Collect census data for population estimates across the relevant geographies
Polygon preparation
Scrape brand store locations from official sources (e.g. official website)
Manually review every brand store location, cross-reference it with a third-party source (e.g. Google Maps), and correct the location where required
Using (corrected) brand store locations, draw the store boundary
Manually vet all drawn polygons in a consistent manner
Corporate offices should not be included
Only polygons drawn with minimal guesswork should be included [1]
Brand stores built under any additional structure (e.g. apartment blocks) are excluded, unless that structure is an overhead car park
Selecting a panel
The panel is the entire population, with some filters applied to ensure consistent data collection behaviour
Quantifying sampling bias on a geographical level
Compare the geographical distribution of devices in our dataset against that in the census
Compute the up-sampling (or down-sampling) factor to apply to each geographical region and its devices
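The factor computation above can be sketched as follows. This is a minimal illustration, assuming the factor is the ratio of a region's census share to its panel share (a common post-stratification choice); the exact formula is not stated here, and the function name, region labels, and counts are hypothetical.

```python
def sampling_factors(census_pop, panel_devices):
    """Return a per-region factor = census share / panel share.

    Assumption: a simple post-stratification ratio. A factor above 1
    up-samples an under-represented region; below 1 down-samples an
    over-represented one.
    """
    census_total = sum(census_pop.values())
    panel_total = sum(panel_devices.values())
    factors = {}
    for region, pop in census_pop.items():
        devices = panel_devices.get(region, 0)
        if devices == 0:
            continue  # no devices observed, so no factor can be computed
        census_share = pop / census_total
        panel_share = devices / panel_total
        factors[region] = census_share / panel_share
    return factors


# Hypothetical counts: region A is under-sampled, region B over-sampled.
census_pop = {"A": 600, "B": 400}
panel_devices = {"A": 30, "B": 70}
factors = sampling_factors(census_pop, panel_devices)
```

With these made-up numbers, region A holds 60% of the census population but only 30% of the devices, so its visits would be weighted up, while region B would be weighted down.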
Detect continuous visits within the drawn polygons
Extract visit durations and remove outliers
Outliers include exceptionally short visits (possibly due to location data inaccuracies)
Outliers include exceptionally long visits (possibly because the device belongs to the store or to a store worker)
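Visit detection and outlier removal might look like the sketch below. It assumes each device's pings are already sorted by time and flagged as inside or outside a store polygon; the gap tolerance and the duration thresholds are illustrative placeholders, not the values used in the actual pipeline.

```python
def visit_durations(pings, max_gap=600):
    """Group consecutive in-polygon pings into visits.

    pings: time-sorted list of (timestamp_seconds, inside_polygon).
    max_gap: hypothetical tolerance (seconds) between two in-polygon
    pings before they are treated as separate visits.
    Returns one duration in seconds per detected visit.
    """
    durations = []
    start = last = None
    for ts, inside in pings:
        if inside and start is not None and ts - last <= max_gap:
            last = ts  # still the same visit
            continue
        if start is not None:
            durations.append(last - start)  # close the open visit
            start = last = None
        if inside:
            start = last = ts  # open a new visit
    if start is not None:
        durations.append(last - start)
    return durations


def remove_outliers(durations, min_s=120, max_s=4 * 3600):
    """Drop visits shorter than min_s or longer than max_s (assumed bounds)."""
    return [d for d in durations if min_s <= d <= max_s]


pings = [(0, True), (300, True), (600, True), (900, False),
         (5000, True), (5030, True), (10000, True)]
durations = visit_durations(pings)
kept = remove_outliers(durations)
```

Here the first three pings form one 600-second visit, the next two a 30-second visit, and the last ping a zero-length visit; only the 600-second visit passes the assumed thresholds.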
Apply sampling bias adjustment factor
Down-sampling the oversampled regions
Up-sampling the undersampled regions
Apply time series smoothing
Using a simple rolling median methodology
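A rolling median can be sketched as below. The window length and the choice of a trailing (rather than centred) window are assumptions made for illustration; neither is specified here.

```python
from statistics import median


def rolling_median(values, window=3):
    """Trailing rolling median.

    Early points use however many values are available so far,
    so the output has the same length as the input.
    """
    return [
        median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# Hypothetical series with a single spike at index 2.
series = [10, 12, 50, 11, 13]
smoothed = rolling_median(series)
```

The median is less sensitive to isolated spikes than a rolling mean, so the value 50 in this made-up series does not appear in the smoothed output.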
Compute visit time per panel capita metric
By dividing the sampling-bias-adjusted visit time sum by the panel population count in the same month
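Putting the adjustment and the division together, a minimal sketch for a single month follows. It assumes each visit carries the region of its device and that the adjustment is applied by multiplying each visit duration by that region's factor; the function name and all numbers are hypothetical.

```python
def visit_time_per_capita(visits, factors, panel_count):
    """Adjusted visit time sum divided by the panel population count.

    visits: list of (region, duration_seconds) for one month.
    factors: per-region sampling bias adjustment factors.
    panel_count: number of panel members in the same month.
    """
    adjusted_sum = sum(duration * factors[region] for region, duration in visits)
    return adjusted_sum / panel_count


# Hypothetical month: three visits, two regions, a panel of 100.
visits = [("A", 600), ("A", 900), ("B", 1200)]
factors = {"A": 2.0, "B": 0.5}
metric = visit_time_per_capita(visits, factors, panel_count=100)
```

In this example the adjusted visit time is 1200 + 1800 + 600 = 3600 seconds, giving 36 seconds of visit time per panel member for the month.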
[1] For example, if a store is in a shopping mall where the map shows no shop boundary and there are no external building features from which to infer one, we do not include that store.