Melvin's digital garden

What makes ML for anti-abuse interesting?

[2017-09-29 Fri 10:38:00] speaker: Elie Bursztein

Deep learning at Google started in 2012

Early success with photo tagging

PlaNet - geolocation of photos with a CNN

  • treats geolocation as classification: assigns each photo to a geographic cell

Deep Q-learning (DQN) for Atari games

AlphaGo

Criteria for good AI problems

  • multiple applications
  • clear success metrics
  • safe exploration - errors are not fatal
  • scalable - large use cases
  • infinite data
  • reproducible

Challenges for anti-abuse

Unknown attacks

  • spamming comments is a known attack
  • but when Allo launched, no one knew how users would abuse it
  • leverage domain knowledge with transfer learning
  • use anomaly detection as first and last line of defense
  • implement non-AI defense mechanisms
    • rate limiting
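Rate limiting is the most concrete of these non-AI backstops. The talk does not say which scheme is used; the sketch below assumes a token bucket, one common choice, with hypothetical parameters (`rate`, `capacity`) and an injectable clock so it can be tested deterministically.

```python
import time


class TokenBucket:
    """Token-bucket limiter: allows bursts of up to `capacity` actions,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical usage: one bucket per account, 1 message/second, bursts of 5.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=5, clock=lambda: fake_now[0])
results = [bucket.allow() for _ in range(7)]  # 5 allowed, then 2 rejected
fake_now[0] = 2.0
results.append(bucket.allow())  # 2 seconds refill 2 tokens, so this is allowed
```

In practice one would keep a bucket per account or per IP; the point is that it caps abuse volume even for attacks no model has seen yet.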

Lack of ground truth

  • fake comments on Android apps
  • fake ratings
  • clustering to find cliques of bad actors
  • generate ground truth with honeypots
  • use anomaly detection
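Finding cliques of bad actors can be sketched as graph clustering over who rated what. This is an assumption about one way to do it, not the method from the talk: users are linked when they rated at least `min_shared` of the same apps, and connected components (a looser notion than true cliques) of at least `min_group` users are reported. Both thresholds are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def suspicious_groups(ratings, min_shared=3, min_group=3):
    """ratings: iterable of (user_id, app_id) pairs.
    Returns sets of users linked by rating many of the same apps."""
    apps_by_user = defaultdict(set)
    for user, app in ratings:
        apps_by_user[user].add(app)

    # Link two users if they rated at least `min_shared` apps in common.
    # O(n^2) over users: fine for a sketch, not for production scale.
    graph = defaultdict(set)
    for a, b in combinations(apps_by_user, 2):
        if len(apps_by_user[a] & apps_by_user[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)

    # Connected components via iterative depth-first search.
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        if len(component) >= min_group:
            groups.append(component)
    return groups
```

A group found this way is only a candidate; it can seed ground-truth labels for review rather than trigger enforcement on its own.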

Lack of obvious features

  • YouTube views and likes
  • which ones are real/fake
  • leverage context features
    • user agent

Ambiguous data

  • detecting offensive speech
  • depends on context, culture, setting
  • use personalized model
    • some users mark promotional email they signed up for as spam, so the same message is spam to one user and wanted by another
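One simple way to personalize is to blend a global spam score with a user's own feedback about a sender. This is a hypothetical sketch, not the model from the talk: `prior_strength` controls how many reports it takes before personal feedback outweighs the global score, and with no feedback the global score is used unchanged.

```python
class PersonalizedSpamFilter:
    """Blends a global spam score with per-user feedback about a sender."""

    def __init__(self, prior_strength=2.0, threshold=0.5):
        self.prior_strength = prior_strength
        self.threshold = threshold
        self.feedback = {}  # (user, sender) -> [spam_reports, not_spam_reports]

    def record(self, user, sender, is_spam):
        counts = self.feedback.setdefault((user, sender), [0, 0])
        counts[0 if is_spam else 1] += 1

    def score(self, user, sender, global_score):
        spam, ham = self.feedback.get((user, sender), (0, 0))
        n = spam + ham
        personal = (spam + 1) / (n + 2)    # Laplace-smoothed personal spam rate
        w = n / (n + self.prior_strength)  # 0 with no feedback, grows toward 1
        return (1 - w) * global_score + w * personal

    def is_spam(self, user, sender, global_score):
        return self.score(user, sender, global_score) >= self.threshold


f = PersonalizedSpamFilter()
for _ in range(3):
    f.record("alice", "promo@shop.example", is_spam=True)
# Alice's reports push this sender over the threshold for her only.
alice_spam = f.is_spam("alice", "promo@shop.example", global_score=0.2)
bob_spam = f.is_spam("bob", "promo@shop.example", global_score=0.2)
```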

Predicting is not explaining

  • hard to explain why a particular email is considered spam
  • classifying attacker motive is hard
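For a simple additive (linear) model, one can at least report which features drove a score, as sketched here with made-up feature names and weights. This shows the limit the note points at: it says which signals fired, not why the sender is abusing the system, and deeper models need dedicated attribution methods.

```python
def explain_linear(weights, bias, features, top_k=3):
    """Score a linear model and list the largest per-feature contributions.

    weights:  feature name -> weight (hypothetical)
    features: feature name -> value for one message
    """
    contributions = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    return score, top


weights = {"has_link": 2.0, "sender_new": 1.5, "known_contact": -3.0}
features = {"has_link": 1, "sender_new": 1, "known_contact": 0}
score, top = explain_linear(weights, bias=-1.0, features=features, top_k=2)
```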

Respecting privacy

  • prevent sending spam SMS via Hangouts
  • flag potential spam using contextual features
  • ask for permission to classify the message
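The two-stage flow above can be sketched as: decide from metadata alone whether a message looks suspicious, and read its content only if permission is granted. The fields, thresholds, and the outcome after a refusal are all assumptions for illustration, not details from the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MessageContext:
    """Metadata only, no message content (hypothetical fields)."""
    sender_account_age_days: int
    recipient_count: int
    recipients_in_contacts: int


def looks_suspicious(ctx: MessageContext) -> bool:
    # Illustrative rule: a new account messaging many non-contacts.
    strangers = ctx.recipient_count - ctx.recipients_in_contacts
    return ctx.sender_account_age_days < 7 and strangers >= 10


def triage(
    ctx: MessageContext,
    body: str,
    ask_permission: Callable[[], bool],
    classify: Callable[[str], bool],
) -> str:
    if not looks_suspicious(ctx):
        return "deliver"  # content is never inspected
    if not ask_permission():
        return "flagged-metadata-only"  # refusal respected: body stays unread
    return "spam" if classify(body) else "deliver"
```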

Real-time detection

  • live streams

Cost to manual reviewers

  • mitigate the negative effects of reviewing abusive content
    • use black and white images
    • selective blurring
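Both mitigations are simple image transforms. A dependency-free sketch, assuming images are lists of rows of `(r, g, b)` tuples; the grayscale weights are the ITU-R 601 luma coefficients (the same ones Pillow uses for `convert("L")`), and the blur is a plain box blur restricted to a chosen rectangle, for example a region a classifier flagged (a hypothetical workflow).

```python
def to_grayscale(img):
    """img: list of rows of (r, g, b) tuples with 0-255 ints."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def blur_region(gray, top, left, bottom, right, radius=1):
    """Box-blur only pixels with top <= y < bottom and left <= x < right.
    Returns a new image; the input is not modified."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(top, bottom):
        for x in range(left, right):
            # Average the (2*radius+1)^2 neighborhood, clipped at the borders.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [gray[j][i] for j in ys for i in xs]
            out[y][x] = round(sum(vals) / len(vals))
    return out
```

A reviewer tool could show the grayscale, blurred version by default and reveal the original only on request.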

Adversarial input

  • location spoofing
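One common server-side defense against crude location spoofing is an "impossible travel" check: flag consecutive reports that would require moving faster than a plausible speed. This is not from the talk; the 1000 km/h cap and 1 km jitter tolerance are hypothetical, and a careful spoofer who moves plausibly will not be caught.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp in case rounding pushes the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def impossible_travel(prev, curr, max_speed_kmh=1000.0, tolerance_km=1.0):
    """prev, curr: (lat, lon, unix_seconds) reports for the same account."""
    dist = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if dist <= tolerance_km:
        return False  # treat small jumps as GPS jitter
    hours = (curr[2] - prev[2]) / 3600.0
    if hours <= 0:
        return True  # a real move with no elapsed time
    return dist / hours > max_speed_kmh
```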

Abusers keep changing their tactics

  • insufficient ground truth to retrain the models
