Espresso
A minimal high performance parallel neural network framework running on iOS
Zhihao Li (zhihaol) and Zhenrui Zhang (zhenruiz)
Background
According to Morgan Stanley Research, as of 2011 half of the computing devices worldwide were mobile devices [1]. Intelligent mobile applications are changing people's lives. However, after a fairly thorough survey, we found no fully functional deep neural network framework for iOS. Therefore, we want to implement our own.
This framework features a well-designed, easy-to-use API and a high-performance parallel neural network implementation based on Metal.
With such a framework, software engineers can easily train and test networks on their iOS devices. This can potentially lead to many interesting applications: for example, an application that recognizes everyday objects in real time without an internet connection, or a photo-collection application that recognizes all of your friends through personalized fine-tuning, without any threat to privacy.
We envision a large future market opening up for such a framework.
The Challenge
Training and running neural networks on an iOS device is itself a challenging task.
- Memory Limitation The latest iPhone (iPhone 6S) has only 2 GB of RAM. This makes running a network on such a device very difficult, not to mention training on it. To mitigate this issue, we may take advantage of recent research on the compression of deep neural networks [2, 3, 4] or use low-precision networks [5, 6].
- High Performance Computing Parallelizing a neural network implementation on iOS devices is an unprecedented task. We will explore the Metal API to implement a GPGPU version of the framework (see the sketch after this list).
- Learning a New Language Neither of us is familiar with Swift 2, the programming language we will use. This will be a challenge in the early stage of implementation.
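To make the GPGPU direction concrete, below is a minimal Swift sketch of how an element-wise ReLU pass could be dispatched as a Metal compute kernel. This is not Espresso's actual code: the kernel name `relu_forward`, the helper `reluForwardOnGPU`, and the naive threadgroup sizing are illustrative assumptions.

```swift
import Metal

// Illustrative sketch only: the kernel name "relu_forward" and this helper are
// assumptions, not Espresso's actual API. The corresponding Metal shading
// language kernel, compiled into the app's default library, could look like:
//
//   kernel void relu_forward(device float *data [[buffer(0)]],
//                            uint id [[thread_position_in_grid]]) {
//       data[id] = max(data[id], 0.0f);
//   }

func reluForwardOnGPU(_ input: [Float]) -> [Float]? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = device.makeDefaultLibrary(),
          let function = library.makeFunction(name: "relu_forward"),
          let pipeline = try? device.makeComputePipelineState(function: function),
          let buffer = device.makeBuffer(bytes: input,
                                         length: input.count * MemoryLayout<Float>.stride,
                                         options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else {
        return nil
    }

    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)

    // One GPU thread per element. For brevity we assume input.count is a
    // multiple of 64; a real kernel would bounds-check against the buffer size.
    let threadsPerGroup = MTLSize(width: 64, height: 1, depth: 1)
    let groups = MTLSize(width: (input.count + 63) / 64, height: 1, depth: 1)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: threadsPerGroup)
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Read the results back from the shared-memory buffer.
    let result = buffer.contents().bindMemory(to: Float.self, capacity: input.count)
    return Array(UnsafeBufferPointer(start: result, count: input.count))
}
```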
Despite these challenges, the project is promising. Training and testing neural networks is highly parallelizable, since the computations within a layer are independent (we currently do not support intra-layer connections). Locality should be good, as weights within the same layer are stored adjacently. There is also typically little divergence in network training and testing: all the weights within a layer are updated at the same time.
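As an illustration of this intra-layer independence, here is a minimal CPU-side sketch of a fully-connected forward pass parallelized over output neurons with Grand Central Dispatch. The function name and signature are illustrative, not Espresso's actual layer code.

```swift
import Foundation

// Illustrative sketch only, not Espresso's actual layer code. Each output
// neuron reads the shared input and its own weight row, and writes only its
// own output element, so the iterations are fully independent.
func fullyConnectedForward(input: [Float],
                           weights: [[Float]],   // one weight row per output neuron
                           bias: [Float]) -> [Float] {
    precondition(weights.count == bias.count)
    var output = [Float](repeating: 0, count: weights.count)
    output.withUnsafeMutableBufferPointer { out in
        DispatchQueue.concurrentPerform(iterations: out.count) { i in
            var sum = bias[i]
            for j in 0..<input.count {
                sum += weights[i][j] * input[j]
            }
            out[i] = sum   // distinct index per iteration: no races, no divergence
        }
    }
    return output
}
```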
Resources
We will start the project from scratch. The framework will mainly target iOS devices, with limited support for OS X. We will use the high-level architecture of Caffe [7] as our reference.
We need an Apple developer account to test the framework on real devices.
Goals and Deliverables
PLAN TO ACHIEVE
In this project, we want to develop a Caffe-like deep neural network framework that runs on iOS/OS X devices, on both the CPU and the GPU, and provides usable primitives to
- Define a neural network
- Train a small neural network
- Run compressed models
To achieve this, we will implement the following (a hypothetical usage sketch follows this list):
- Layers
  - ImageData layer
  - Convolution layer
  - ReLU layer
  - FullyConnected layer
  - Softmax layer: as output layer, no BP needed
  - SoftmaxWithLoss layer
  - Pooling layer: max pooling and average pooling
  - Dropout layer
  - LRN layer
- Optimizer
  - SGDOptimizer
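As a rough illustration of the intended workflow, the sketch below shows how defining and training a small network with these primitives might look. All of the type and parameter names (`Network`, `ImageDataLayer`, `ConvolutionLayer`, `SGDOptimizer`, and so on) are hypothetical placeholders for the API under design, mirroring Caffe's layer-by-layer style; they are not the final interface.

```swift
// Hypothetical usage sketch: every type and parameter name below is a
// placeholder for the API under design, not Espresso's final interface.
let net = Network(name: "mnist-toy")
net.add(ImageDataLayer(name: "data", batchSize: 64, source: "mnist_train"))
net.add(ConvolutionLayer(name: "conv1", numOutput: 20, kernelSize: 5))
net.add(ReLULayer(name: "relu1"))
net.add(PoolingLayer(name: "pool1", method: .max, kernelSize: 2, stride: 2))
net.add(FullyConnectedLayer(name: "fc1", numOutput: 10))
net.add(SoftmaxWithLossLayer(name: "loss"))

let optimizer = SGDOptimizer(learningRate: 0.01, momentum: 0.9)
net.train(optimizer: optimizer, iterations: 1000)
```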
We want our system to be usable on mobile devices. The performance goal is therefore user-acceptable memory, energy, computation cost, and response time when training on a reasonably sized dataset and when running a compressed model.
HOPE TO ACHIEVE
If we are ahead of schedule, we plan to port some other layers and optimizers from Caffe to our framework.
Demo
We will demonstrate an application developed on top of our framework, for example an application that recognizes everyday objects. We will also compare the CPU and GPU implementations in terms of speedup and energy consumption.
Platform Choice
OS X and iOS, based on the Metal framework.
Schedule
| Time | What we plan to do | Status |
| --- | --- | --- |
| April 1 ~ April 7 | Revise proposal, study the design and architecture of Caffe, learn the Swift language and Metal API, implement a simple app for testing, design interfaces for Espresso | DONE |
| April 8 ~ April 14 | Develop and test the CPU version | Finished development, need more thorough testing |
| April 15 ~ April 21 | Develop and test the GPU version | Finished development, need testing |
| April 22 ~ April 28 | Run MNIST network (and test our implementations) | |
| April 29 ~ May 5 | Run a compressed model trained by Caffe or other common frameworks | |
| May 6 ~ Parallel Competition Day | Write final report and prepare for presentation | |
References:
1. Huberty, K., Lipacis, C. M., Holt, A., Gelblum, E., Devitt, S., Swinburne, B., … & Chen, G. (2011). Tablet Demand and Disruption. Tablet.
2. Kim, Yong-Deok, et al. "Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications." arXiv preprint arXiv:1511.06530 (2015).
3. Han, Song, Huizi Mao, and William J. Dally. "A deep neural network compression pipeline: Pruning, quantization, Huffman encoding." arXiv preprint arXiv:1510.00149 (2015).
4. Chen, Wenlin, et al. "Compressing neural networks with the hashing trick." arXiv preprint arXiv:1504.04788 (2015).
5. Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. "Low precision arithmetic for deep learning." arXiv preprint arXiv:1412.7024 (2014).
6. Gupta, Suyog, et al. "Deep learning with limited numerical precision." arXiv preprint arXiv:1502.02551 (2015).
7. Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.