Introduction to Geoinformatics

BMIN7053

Cole Brokamp

2022-09-26

Geoinformatics

Transforming place-based data into actionable knowledge

  • geospatial data science
  • geographic information services
  • public health informatics
  • place-based determinants, outcomes

Geoinformatics and Health

  • Identifying study population
  • Identifying potential sources and routes of exposures
  • Estimating environmental levels of pollutants
  • Measuring community characteristics
  • Estimating personal socioeconomic characteristics
  • Statistical models with spatial correlations
  • Temporal exposure estimates

Geoinformatics and Health

  • Direct measurement of exposure or personal characteristics is often not feasible in cohort- or EHR-based studies
    • Often use retrospective records, only containing home addresses
    • Geomarkers are highly variable with respect to time and space
  • Links outcome, exposure, and confounding data by location
  • Data usually available publicly

Geomarkers

  • Geomarker: Any geospatial measure that influences or predicts health
  • Geocoding: Converting a string of text into spatial coordinates or boundaries

place (+ time) → estimating past “exposures”

Neighborhood Characteristics

interactive deprivation index map

Geomarkers | Space | Time | Source | Available Software Products
Population, socioeconomic indicators, etc | Census tract or block | Yearly | US Census American Community Survey (ACS) |
Built environment & location efficiency, access to jobs and workers via transit, walkability index | Census tract | 2021 | EPA Smart Location Mapping |
Neighborhood Deprivation Index (poverty, education, housing, and health care coverage) | Census tract or zip code | 2015, 2018 | Material Deprivation Index | dep index R code; dep index DeGAUSS Container
Child Opportunity Index (education, health & environment, social & economic neighborhood characteristics) | Census tract | 2010, 2015 | Child Opportunity Index 2.0 |
Population, racial composition, socioeconomic indicators, racial and socioeconomic Index of Concentration at the Extremes, crowding | County, zip code, or census tract | 2018 | The Public Health Disparities Geocoding Project |
Racial Index of Concentration at the Extremes (ICE) | Census tract or zip code | Yearly | Racial ICE | zctaDB R package
Relative crime risk by crime subtype* | Census block group | Average (2010 - 2017) | AGS Crime Risk |
Location and type of gun violence incident | Exact | Annual (2014 - current) | Gun Violence Archive (GVA) |
Fraction of food retailers that are “healthy” | Census tract | 2011 | Modified Retail Food Environment Index (mRFEI) |
Food access indicators, SNAP benefits, demographics | Census tract | 2015 | USDA Food Access Research Atlas |
Medically Underserved Areas | County / Census tract | Current | Health Resources and Services Administration (HRSA) MUA App |
Social Vulnerability Index | County / Census tract | 2000, 2010, 2014, 2016, 2018 | Agency for Toxic Substances and Disease Registry (ATSDR) |
Community Resilience Estimates | County / Census tract | Current | US Census |
Social Deprivation Index (Area level deprivation based on ACS data) | County, census tract, ZCTA and PCSA | 2015 | Robert Graham Center |
Neighborhood Atlas | Census block group | 2015, 2019 | University of Wisconsin School of Medicine and Public Health |

Air Pollution

Geomarkers | Space | Time | Source | Available DeGAUSS Products
PM2.5 | 0.74 sq km h3 hexagonal grid | daily (2000 - 2020) | Brokamp exposure model | addPmData R package; pm DeGAUSS container
PM2.5, NO2, O3** | 1 sq km grid | daily (2000 - 2016) | Schwartz exposure model | schwartz DeGAUSS container
air toxics cancer risk, respiratory hazard index, diesel PM, PM2.5, ozone, traffic proximity and volume, lead paint indicator, proximity to RMP sites, proximity to hazardous waste facilities, proximity to NPL sites, wastewater discharge indicator | Census block groups | Annual (2015 - 2020) | EPA EJ Screen | zctaDB R package
Location and amount of emissions | Exact location | Yearly (2008, 2011, 2014, 2017) | National Emissions Inventory (NEI) |
PM2.5, PM10, O3, CO, SO2, NO2 | Census block group (Contiguous US) | Annual (1999 - 2015) | CACES LUR |
PM2.5, SO4, NO3, NH4, OM, BC, DUST, SS | 0.01° × 0.01° (North America) | Annual (2000 - 2018) | ACAG MAPLE |
PM2.5, GBD-MAPS, NO2, Inverse Visibility, OM/OC, Surface Area, NOy Deposition, Dry Deposition, AMF Code | 0.01° × 0.01° (North America) | Annual / Monthly | ACAG Washington University in St. Louis |
PM2.5, NO2 | Tract Centers (Contiguous US) | Monthly (1999 – 2008) | Beckerman |

Environmental Exposures

Geomarkers | Space | Time | Source | Available DeGAUSS Products
Air temperature, planetary boundary height, relative humidity, precipitation, wind | North America (0.3°×0.3°) | 8-times a day (2001 – current); daily means (1979 – current) | North American Regional Reanalysis (NARR) | addNarrData R package; narr DeGAUSS container
Greenspace as Enhanced Vegetation Index (EVI) | 250 × 250 m grid (contiguous US) | June 10 and June 25, 2018 composite | Moderate Resolution Imaging Spectroradiometer (MODIS) | greenspace DeGAUSS container
Land cover classifications (21 classes) | 30 x 30 m grid | Annual (2001, 2006, 2011, 2016) | National Land Cover Database (NLCD) | addNlcdData R package; nlcd DeGAUSS container
Fractional imperviousness | 30 x 30 m grid | Annual (2001, 2006, 2011, 2016) | NLCD Imperviousness | addNlcdData R package; nlcd DeGAUSS container
Roadway type, development intensity | 30 x 30 m grid | Annual (2001, 2006, 2011, 2016) | NLCD Imperviousness Classification | addNlcdData R package; nlcd DeGAUSS container
Vegetation Type | 30 x 30 m grid | Annual (2011, 2016) | NLCD Tree Canopy | addNlcdData R package; nlcd DeGAUSS container
Greenspace as fraction of land classified as green | 30 x 30 m grid | Annual (2001, 2006, 2011, 2016) | NLCD Greenness | addNlcdData R package; nlcd DeGAUSS container
Highway traffic intensity | Exact location (nationwide) | Yearly | US Department of Transportation (DOT) | addAadtData R package; aadt DeGAUSS container
Noise level | 270 m sq grids (nationwide) | 2015 | National Park Service Geospatial Sound Modeling (GSM) |

Greater Cincinnati Area Only

Geomarkers | Space | Time | Source | Available DeGAUSS Products
Parcel classification and characteristics | Exact location (Greater Cincinnati Area) | Current | Cincinnati Area Geographic Information Systems (CAGIS) |
Home images, value, age, other physical characteristics, owner name | Exact | Exact Inspection dates | Hamilton County Auditor | hamilton R package for parcel-based geocoding
Traffic Related Air Pollution | Exact location (seven county area served by CCHMC) | Monthly (2001 – current) | Air pollution exposure model | ecat R package
PM2.5 components: Al, Cu, Fe, K, Mn, Pb, S, Si, V, Zn | Exact location | Average daily exposure (2002-2006) | Air pollution exposure model |
Lead Air Pollution | Exact location (seven county area served by CCHMC) | Monthly (2001 – current) | Air pollution exposure model | airPb R package
Estimated travel time by car | Census tract and traffic analysis area | Hourly (2016-2017) | Uber Move |
Combined Sewer Overflow events | Exact location | Daily (2012 - 2016) | Metropolitan Sewer District (MSD) |
Areas without access to medication-assisted treatment and/or behavioral therapy for opioid addiction | Exact location (seven county area served by CCHMC) | 2019 | MAT Desert | MAT Desert shapefiles
Crime | Exact location (Ohio) | Annual (2013 - 2018) | Ohio Incident Based Reporting System |

Harmonizing spatiotemporal resolution and extent

Geomarker Assessment

Geocoding

  • converting location information text into coordinates
  • most often a postal address into latitude and longitude
  • often treated as a “magic black box”, but it is error prone

Street range address
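One reason street range geocoding is error prone: the geocoder typically knows only the address range at each end of a street segment and places a house by interpolating between the endpoints. The sketch below is a minimal illustration of that idea, not the method of any particular geocoder; the function name, the address range, and the coordinates are all made up for the example.

```python
def interpolate_street_range(house_number, range_start, range_end, start_xy, end_xy):
    """Estimate (lon, lat) by linear interpolation along one street segment.

    Hypothetical helper: assumes house numbers are evenly spaced between
    the two ends of the segment, which is rarely exactly true.
    """
    if range_end == range_start:
        fraction = 0.5  # no usable range, fall back to the segment midpoint
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    # Clamp so a number outside the range snaps to the nearest segment end
    fraction = min(max(fraction, 0.0), 1.0)
    lon = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    lat = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return lon, lat


# Made-up segment covering house numbers 100 to 200
estimate = interpolate_street_range(
    150, 100, 200, start_xy=(-84.5000, 39.1000), end_xy=(-84.5000, 39.1020)
)
print(estimate)
```

The estimate lands on the street centerline, so it can differ from the true parcel location whenever houses are unevenly spaced or set back from the road.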

Exact location

Exposure Assessment

  1. Containing Geography
    • Census tract linkage to survey data
    • Census block linked to population density
    • Neighborhood linked to policies or characteristics
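Containing-geography assessment amounts to a point-in-polygon lookup followed by a table join. The sketch below uses a plain ray-casting test on toy square "tracts"; the tract identifiers, the polygons, and the attribute values are hypothetical, and real work would use actual census boundaries and a geospatial library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_geography(x, y, geographies):
    """Return the id of the first geography containing the point, else None."""
    for geo_id, polygon in geographies.items():
        if point_in_polygon(x, y, polygon):
            return geo_id
    return None


# Toy geographies and a made-up tract-level attribute
tracts = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
tract_value = {"tract_A": 0.31, "tract_B": 0.58}

home = (3.0, 1.0)
tract = containing_geography(*home, tracts)
print(tract, tract_value.get(tract))
```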

Exposure Assessment

  2. Radial Measures
    • Buffer designated around location with a radius
    • Length / density of predicted sources
    • Calculate mean, total length, or fraction within buffer
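A radial measure can be sketched as: compute the distance from the location to each source, keep the sources inside the buffer, and summarize them. The example below assumes point sources with a made-up `value` field and uses great-circle (haversine) distance; total road length or fraction of land within a buffer would need line or polygon geometry, which is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_summary(lat, lon, sources, radius_m):
    """Count, total, and mean of source values within radius_m of a location."""
    values = [
        s["value"]
        for s in sources
        if haversine_m(lat, lon, s["lat"], s["lon"]) <= radius_m
    ]
    return {
        "n_sources": len(values),
        "total": sum(values),
        "mean": sum(values) / len(values) if values else None,
    }


# Hypothetical point sources around a made-up home location
sources = [
    {"lat": 39.1010, "lon": -84.5000, "value": 10.0},  # roughly 110 m away
    {"lat": 39.1030, "lon": -84.5000, "value": 30.0},  # roughly 330 m away
    {"lat": 39.2000, "lon": -84.5000, "value": 99.0},  # roughly 11 km away
]
summary = buffer_summary(39.1000, -84.5000, sources, radius_m=400)
print(summary)
```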

Exposure Assessment

  3. Exact Location
    • Proximity to predicted source
    • Nearest neighbor weighting and kriging
    • Prediction models (land use models, etc)
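Two of the simpler exact-location measures can be sketched in a few lines: distance to the nearest source, and an inverse-distance-weighted average of nearby monitor readings. This is only an illustration; it assumes coordinates already projected to meters, the monitor values are made up, and kriging or land use models need a fitted statistical model that is not shown here.

```python
import math


def nearest_distance(point, sources):
    """Distance from a point to the closest source (same planar units)."""
    return min(math.dist(point, s) for s in sources)


def inverse_distance_weighted(point, monitors, power=2):
    """Inverse-distance-weighted average of monitor values at a point.

    monitors is a list of ((x, y), value) pairs; closer monitors get more weight.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xy, value in monitors:
        d = math.dist(point, xy)
        if d == 0:
            return value  # the point sits exactly on a monitor
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical projected coordinates (meters) and readings
home = (0.0, 0.0)
roads = [(3.0, 4.0), (6.0, 8.0)]
monitors = [((100.0, 0.0), 10.0), ((-100.0, 0.0), 20.0)]

print(nearest_distance(home, roads))
print(inverse_distance_weighted(home, monitors))
```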

Implementation Challenges

  • Large data + inefficient manual data curation
  • Technical expertise and software skills
  • Privacy restrictions

Privacy Approaches

Protected Health Information (PHI)

  • Confidentiality of research subjects must be safeguarded
  • HIPAA-defined “Safe Harbor” provision prohibits sharing of identifiers and quasi-identifiers, such as:
    • time finer than a calendar year
    • spatial boundary with < 20,000 residents
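The two quasi-identifiers above can be generalized before sharing. The sketch below is a simplified illustration, not a compliance tool: it truncates a date to its calendar year and withholds an area identifier unless the area has more than 20,000 residents (the conservative reading of the threshold). The function name, the area identifiers, and the population counts are hypothetical.

```python
from datetime import date


def generalize_record(event_date, area_id, area_population, min_population=20_000):
    """Coarsen one record's date and geography before sharing.

    Keeps only the calendar year, and keeps the area identifier only when
    the area has more than min_population residents.
    """
    return {
        "year": event_date.year,
        "area": area_id if area_population > min_population else None,
    }


# Made-up records: one in a large area, one in a small area
print(generalize_record(date(2019, 3, 14), "area_1", 54_000))
print(generalize_record(date(2019, 3, 14), "area_2", 8_500))
```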

Sharing Data with PHI

  • Consent often not obtained for unforeseen future analyses
  • Retrospective consent often not feasible + consent bias
  • IRB and institutional DUA approvals can be lengthy and have different requirements
  • Transmission of PHI to a third party often not possible

Anonymity and Reidentification

  • Anonymization can only reduce the chance of reidentification to a small, but non-zero, level
  • Don’t conflate re-identification of identifiers with re-identification of quasi-identifiers
    • quasi-identifiers recovered by merging with extant datasets
    • institutional restrictions on sharing of quasi-identifiers

Existing Approaches

  • Anonymization: geomasking, date shifting, generalization
  • Independent Geomarker Assessment: differences in methods introduce differential bias
  • Existing Software Approaches: costly, not scalable, not reproducible
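Two of the anonymization techniques named above can be sketched as follows. Date shifting moves all of one subject's dates by the same random offset, so intervals between events are preserved. Geomasking displaces a location by a random distance and direction, so any geomarker assigned afterwards carries that positional error. The function names, the shift window, the masking radius, and the flat-earth meters-per-degree conversion are assumptions made for this illustration.

```python
import math
import random
from datetime import date, timedelta

METERS_PER_DEGREE = 111_320  # rough length of one degree of latitude


def shift_dates(dates, max_shift_days, rng):
    """Shift all of one subject's dates by a single random offset."""
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [d + offset for d in dates]


def geomask(lat, lon, max_distance_m, rng):
    """Displace a point to a random location within max_distance_m."""
    distance = max_distance_m * math.sqrt(rng.random())  # uniform over the disc
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE
    dlon = distance * math.sin(bearing) / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


rng = random.Random(2022)  # seeded only so the example is repeatable
visits = [date(2020, 1, 1), date(2020, 3, 1)]
print(shift_dates(visits, max_shift_days=30, rng=rng))
print(geomask(39.1000, -84.5000, max_distance_m=500, rng=rng))
```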

Decentralized Computation

Multi-Site Study

Bringing Data to Computation

Sending Computation to Data

Getting Computation Results

DeGAUSS

DEcentralized Geomarker Assessment for mUlti Site Studies

https://degauss.org

Curated and standardized library for secure, efficient, automated, and reproducible linkage of geomarkers to protected health and geolocation data

DeGAUSS

  • Container framework that reads and writes CSV files
  • No extensive computational resources
  • No geospatial or computing expertise required
  • PHI is never exposed to a third party or the internet
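The CSV-in, CSV-out pattern can be sketched as below: read rows locally, compute a geomarker for each, and write the same rows back with one added column, so nothing leaves the local machine. This is not DeGAUSS code. The `lat` and `lon` column names, the function names, and the toy geomarker are assumptions for illustration only.

```python
import csv
import io


def add_geomarker(input_csv, output_csv, geomarker_fn, column_name):
    """Copy a CSV stream, appending one geomarker column computed per row."""
    reader = csv.DictReader(input_csv)
    fieldnames = list(reader.fieldnames) + [column_name]
    writer = csv.DictWriter(output_csv, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[column_name] = geomarker_fn(float(row["lat"]), float(row["lon"]))
        writer.writerow(row)


def toy_geomarker(lat, lon):
    """Hypothetical geomarker: a made-up north/south zone label."""
    return "north" if lat >= 39.1 else "south"


# In-memory files stand in for CSV files on a local disk
source = io.StringIO("id,lat,lon\n1,39.15,-84.50\n2,39.05,-84.52\n")
result = io.StringIO()
add_geomarker(source, result, toy_geomarker, "toy_zone")
print(result.getvalue())
```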

DeGAUSS

  • Free and open source
  • Automated and continuous documentation and integration
  • Metadata curation and integration
  • Community support and contributions

High Resolution Spatiotemporal

Examples

Elevated Blood Lead

CFPOPD

Community Table

Thank You

https://colebrokamp.com

https://degauss.org

@cole_brokamp

cole.brokamp@cchmc.org