Melvin's digital garden

DeepLearning4j, data parallel deep learning on spark

[2016-09-01 Thu 19:04:05] speaker: Adam Gibson, skymind.io event: Data Science SG

skymind started in 2013, San Francisco

JVM is slow for numerical compute

  • hardware accel required
  • spark - data access layer
  • cuda - compute layer

why JVM/Java?

  • existing systems are in Java, e.g. Hadoop, Kafka
  • target the data engineering, production deployment

current landscape

  • spark assumes columnar data
  • audio/images are hard to work with

the solution

  • javacpp (cython for java)
  • off heap memory
  • numerical compute layer (nd4j)
    • supports cuda, x86
  • training as a spark job
  • keras on python for prototyping

applications

  • image search
  • voice search
  • recommendation
  • ad targetting
  • object recognition
  • translation
  • spam filtering

Links to this note