Protecting sensitive datasets: cloud tools you can use
[2019-03-16 Sat 14:48:55] speaker: Felipa Hoffa
NYC taxi dataset
- sensitive data exposed
Cloud DLP (Data Loss Prevention)
- redact sensitive data
- format preserving encryption
Data bucketing
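A minimal sketch of bucketing, assuming it means replacing exact values with coarser ranges. The helper names `bucket_age` and `truncate_zip` and the bucket widths are made up for illustration and are not part of Cloud DLP.

```python
def bucket_age(age, width=10):
    """Map an exact age to a range label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def truncate_zip(zip_code, keep=3):
    """Keep the first `keep` characters of a ZIP code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


print(bucket_age(34))         # 30-39
print(truncate_zip("10027"))  # 100**
```

Wider buckets hide more people in each group but make the data less precise.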
k-anonymity
- every combination of identifying attributes (quasi-identifiers) is shared by at least k records
- k = 1 means at least one record is unique in the sample
- k = 5 is a commonly required minimum
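The k value of a sample can be computed by grouping records on their identifying attributes and taking the smallest group size. A rough sketch; the function name `k_anonymity`, the field names, and the records are invented for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifier fields."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical sample: the third record is alone in its group.
records = [
    {"zip": "100**", "age": "30-39", "fare": 12.5},
    {"zip": "100**", "age": "30-39", "fare": 8.0},
    {"zip": "112**", "age": "40-49", "fare": 30.0},
]

print(k_anonymity(records, ["zip", "age"]))  # 1
```

A result below the required k means the identifying attributes need coarser buckets, or the small groups need to be dropped.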
k-map, reidentification risk
- can zip code and age together identify someone?
- depends on the population of the zip code: the fewer people who match, the higher the risk
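A sketch of the k-map idea, assuming it is the same check as k-anonymity but with group sizes taken from the wider population instead of from the sample. The function name `k_map` and the population counts below are hypothetical; real figures would come from census-style data.

```python
def k_map(records, quasi_identifiers, population_counts):
    """Return the smallest number of people in the population who match
    the quasi-identifier values of any record in the sample."""
    keys = {tuple(r[q] for q in quasi_identifiers) for r in records}
    return min(population_counts[key] for key in keys)


# Hypothetical number of people per (zip, age) in the whole population.
population_counts = {
    ("10027", 34): 900,
    ("82190", 34): 2,
}

sample = [
    {"zip": "10027", "age": 34},
    {"zip": "82190", "age": 34},
]

print(k_map(sample, ["zip", "age"], population_counts))  # 2
```

Both records are unique in the sample, but only the one from the sparsely populated zip code is easy to reidentify.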
L-diversity
- does everyone in the group share the same sensitive value? if so, being in the group reveals it
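A sketch of an l-diversity check, assuming l is the smallest number of distinct sensitive values found in any group. The function name `l_diversity`, the field names, and the records are invented for illustration.

```python
from collections import defaultdict


def l_diversity(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values
    found in any group of records with the same quasi-identifiers."""
    values_by_group = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values_by_group[key].add(r[sensitive])
    return min(len(values) for values in values_by_group.values())


# Hypothetical sample: the first group has k = 2 but only one destination.
records = [
    {"zip": "100**", "age": "30-39", "dropoff": "clinic"},
    {"zip": "100**", "age": "30-39", "dropoff": "clinic"},
    {"zip": "112**", "age": "40-49", "dropoff": "airport"},
    {"zip": "112**", "age": "40-49", "dropoff": "office"},
]

print(l_diversity(records, ["zip", "age"], "dropoff"))  # 1
```

A result of 1 means some group passes a k-anonymity check but still leaks the sensitive value of every member.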
delta-presence