Melvin's digital garden

Protecting sensitive datasets: cloud tools you can use

[2019-03-16 Sat 14:48:55] speaker: Felipa Hoffa

NYC taxi dataset

  • sensitive data exposed

Cloud DLP (Data Loss Prevention)

  • redact sensitive data
  • format-preserving encryption
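
Redaction can be sketched locally with regular expressions. This is not the Cloud DLP API, only an illustration of the idea of replacing detected values with a placeholder. The patterns, the placeholder labels, and the `redact` helper are assumptions made for this note.

```python
import re

# Hypothetical, deliberately simple detectors; real detectors are far more thorough.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact(text: str) -> str:
    """Replace anything that looks like an email or US phone number with a label."""
    text = EMAIL.sub("[EMAIL_ADDRESS]", text)
    return PHONE.sub("[PHONE_NUMBER]", text)

redacted = redact("call 212-555-0100 or mail a@b.com")
```

Redaction throws the value away. Format-preserving encryption instead keeps the shape of the value (length, character set) so downstream systems still accept it.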

Data bucketing
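
A minimal sketch of bucketing, assuming the goal is to replace exact values with coarser ranges so fewer records are unique. The helper names, the 10-year width, and the 3-digit ZIP prefix are illustrative choices, not from the talk.

```python
def bucket_age(age: int, width: int = 10) -> str:
    """Map an exact age to a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def bucket_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

age_range = bucket_age(34)
zip_prefix = bucket_zip("10027")
```

Wider buckets lower re-identification risk but also make the data less useful for analysis.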

k-anonymity

  • every combination of identifying attributes (quasi-identifiers) is shared by at least k records
  • k = 1 means a record is unique within the sample
  • usually require k = 5
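
The definition above can be sketched as a group-and-count: the k of a dataset is the size of its smallest group of records with identical quasi-identifier values. The function name and the sample records are made up for illustration.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())  # raises ValueError on an empty dataset

# Hypothetical records, already bucketed.
records = [
    {"zip": "100**", "age": "30-39", "fare": 12},
    {"zip": "100**", "age": "30-39", "fare": 30},
    {"zip": "112**", "age": "40-49", "fare": 8},
]
k = k_anonymity(records, ["zip", "age"])
```

Here the third record is alone in its group, so the dataset is only 1-anonymous even though the other two records share a group.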

k-map, reidentification risk

  • can zip code and age together identify someone?
  • depends on the population of the zip code, not only on the sample
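
A rough sketch of the k-map idea, assuming a lookup table with the number of people in the wider population for each quasi-identifier combination. The function name, the table, and its counts are hypothetical and invented for illustration.

```python
def k_map(records, population_counts, quasi_identifiers):
    """Return the smallest population count matched by any record in the sample."""
    return min(
        population_counts[tuple(r[q] for q in quasi_identifiers)]
        for r in records
    )

# Hypothetical population table: (zip, age range) -> number of residents.
population_counts = {
    ("10027", "30-39"): 9500,
    ("99950", "80-89"): 3,
}
sample = [
    {"zip": "10027", "age": "30-39"},
    {"zip": "99950", "age": "80-89"},
]
k = k_map(sample, population_counts, ["zip", "age"])
```

Both records are unique in the sample, but only the second is risky: very few people in the population match it.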

L-diversity

  • does everyone in a group share the same sensitive characteristic? if so, belonging to the group reveals it
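
One way to sketch this check: count the distinct sensitive values inside each quasi-identifier group and take the minimum. A result of 1 means some group is uniform, so knowing the group reveals the value. The function name, field names, and records are assumptions for illustration.

```python
from collections import defaultdict

def l_diversity(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found in any group."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return min(len(v) for v in values.values())

# Hypothetical records: the first group is 2-anonymous but not diverse.
patients = [
    {"zip": "100**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "100**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "112**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "112**", "age": "40-49", "diagnosis": "asthma"},
]
l = l_diversity(patients, ["zip", "age"], "diagnosis")
```

This shows why k-anonymity alone is not enough: every group has two records, yet the first group leaks its diagnosis.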

delta-presence
