Converting a Caffe model to TensorFlow
Wed, Jun 7, 2017
The Caffe Model Zoo is an extraordinary place where researchers share their models. Caffe is an awesome framework, but you might want to use TensorFlow instead. In this blog post, I’ll show you how to convert the Places 365 model to TensorFlow.
Using Caffe-Tensorflow to convert your model
Your best bet is to use the awesome caffe-tensorflow. This project takes a prototxt file as input and converts it to a Python file so you can use the model with TensorFlow. I had to use this pull request to get a standalone model. I forked the repo with a few other tweaks as well.
1 - Install caffe-tensorflow
git clone https://github.com/linkfluence/caffe-tensorflow
source activate Python27 # You need Python 2.7
2 - (Optional) Switch to TensorFlow CPU
You might run into memory issues during the conversion if your GPU doesn’t have enough memory. In this case, just uninstall tensorflow-gpu and install tensorflow.
3 - Convert your model
python convert.py --caffemodel ./places/vgg16_hybrid1365.caffemodel ./places/deploy_vgg16_hybrid1365.prototxt --data-output-path ./output.mat --code-output-path ./output2.py --standalone-output-path ./standalonehybrid.pb
4 - (Optional) Re-install TensorFlow GPU
Using the pb file
If the previous command succeeded, you’ll end up with a ./standalonehybrid.pb file. This file contains the weights and the architecture of the network. Here’s how to use it:
import tensorflow as tf
import cv2
import numpy as np

def load_graph(frozen_graph_filename):
    # We load the protobuf file from the disk and parse it to retrieve the
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    # Then, we can use again a convenient built-in function to import a graph_def into the
    # current default Graph
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(
            graph_def,
            input_map=None,
            return_elements=None,
            name="prefix",
            op_dict=None,
            producer_op_list=None
        )
    return graph

graph = load_graph('./standalonehybrid.pb')
# Tensor names are prefixed with the name passed to import_graph_def
x = graph.get_tensor_by_name('prefix/data:0')
y = graph.get_tensor_by_name('prefix/prob:0')

im = cv2.imread('./test_image.jpg')
WIDTH, HEIGHT = 224, 224
im = cv2.resize(im, (WIDTH, HEIGHT))
# Places was using batches of 10 images
batch = np.array([im for i in range(10)])

with tf.Session(graph=graph) as sess:
    y_out = sess.run(y, feed_dict={ x: batch })
(the handy function comes from this blog post)
Note: You’ll see that we’re building a batch of 10 images… with only one image. This is because the network’s input is defined with a batch size of 10, and I want to keep this article simple. It should be possible to change the batch size. We’ll see this in an upcoming blog post.
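Since all 10 rows of the batch hold the same image, only the first row of y_out is worth reading. Here is a minimal sketch of how you might pull out the top predictions, assuming y_out has shape (10, number_of_classes). The top_k helper and the demo array are hypothetical and only stand in for the real output. Mapping a class index to a label requires the category list that ships with the model, which is not covered here.

```python
import numpy as np

def top_k(probs, k=5):
    """Return (class_index, probability) pairs for the k highest scores."""
    probs = np.asarray(probs)
    best = np.argsort(probs)[::-1][:k]  # indices sorted by decreasing score
    return [(int(i), float(probs[i])) for i in best]

# Stand-in for y_out (assumption): 10 identical rows, one per copy of the image.
row = np.array([0.1, 0.6, 0.05, 0.25])
y_out_demo = np.tile(row, (10, 1))

# Every row is the same, so we only look at the first one.
predictions = top_k(y_out_demo[0], k=2)
print(predictions)
```

With the real model you would call top_k(y_out[0]) instead of using the demo array.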
Conclusion
We’ve seen how easy it is to use the models from the Caffe model zoo with TensorFlow. You are now able to classify cars, predict places, detect facial landmarks and so many magical things!
Bonus: What if the model is based on a custom version of Caffe?
It can happen that researchers need custom layers: they usually fork Caffe. In this case, this gist describes the steps to extract the weights and this piece of code should give you some hints on how to load them into your TensorFlow graph.
For complex layers, there are some small differences between Caffe and TensorFlow: you will have to look at the source code. For instance, LSTM gates are not concatenated in the same order in TensorFlow and in Caffe.
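To make the gate-ordering issue concrete, here is a small sketch of how concatenated LSTM weights could be reordered with NumPy. The reorder_gates helper is hypothetical, and the two gate orders below are assumptions based on my reading of the Caffe and TensorFlow sources: check them against the exact versions you are using before relying on them.

```python
import numpy as np

def reorder_gates(weights, src_order, dst_order, axis=0):
    """Reorder the per-gate blocks of a concatenated LSTM weight array."""
    blocks = np.split(weights, len(src_order), axis=axis)  # one block per gate
    by_name = dict(zip(src_order, blocks))
    return np.concatenate([by_name[g] for g in dst_order], axis=axis)

# Assumed gate orders (verify in the source code of your versions):
# i = input gate, f = forget gate, o = output gate, g = candidate values.
CAFFE_ORDER = ("i", "f", "o", "g")
TF_ORDER = ("i", "g", "f", "o")

hidden = 2
caffe_bias = np.arange(4 * hidden)  # laid out as [i i f f o o g g]
tf_bias = reorder_gates(caffe_bias, CAFFE_ORDER, TF_ORDER)
print(tf_bias)
```

The same helper works on weight matrices by choosing the axis along which the gates are concatenated. Other differences, such as a transposed weight layout or a default forget bias, may also need handling.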
What worked best for me was to:
- Export Caffe’s weights into a NumPy array
- Run a dummy example through the first N layers in Caffe and store the output
- Load the weights into your TensorFlow graph and run the same example through the same first N layers, this time in TensorFlow
- Compare the outputs. If they don’t match, check what’s wrong
- Increment N and repeat
I was able to get a 10^-3 mean difference in the final output when transferring a convnet and a 10^-2 mean difference after a bi-LSTM. Not too bad!
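The comparison loop described above can be sketched as follows. It assumes you have already collected the intermediate outputs of both frameworks into dictionaries keyed by layer name. The helper names, the tolerance and the demo arrays are all hypothetical. It also assumes that 4-D Caffe outputs are laid out as (N, C, H, W) while the TensorFlow graph uses (N, H, W, C), so check the layout of your own graph.

```python
import numpy as np

def to_nhwc(caffe_blob):
    """Move channels last for 4-D blobs, assuming Caffe's (N, C, H, W) layout."""
    blob = np.asarray(caffe_blob)
    return blob.transpose(0, 2, 3, 1) if blob.ndim == 4 else blob

def mean_abs_diff(caffe_out, tf_out):
    """Mean absolute difference between a Caffe output and a TensorFlow output."""
    return float(np.mean(np.abs(to_nhwc(caffe_out) - np.asarray(tf_out))))

def first_mismatch(caffe_outputs, tf_outputs, layer_names, tol=1e-3):
    """Walk the layers in order and return the first one whose outputs diverge."""
    for name in layer_names:
        diff = mean_abs_diff(caffe_outputs[name], tf_outputs[name])
        if diff > tol:
            return name, diff
    return None  # every layer matched within the tolerance

# Demo with made-up activations: conv1 matches, fc is deliberately off by 0.5.
rng = np.random.RandomState(0)
conv1_caffe = rng.rand(1, 3, 4, 4)
fc_caffe = rng.rand(1, 5)
caffe_outputs = {"conv1": conv1_caffe, "fc": fc_caffe}
tf_outputs = {"conv1": conv1_caffe.transpose(0, 2, 3, 1), "fc": fc_caffe + 0.5}

result = first_mismatch(caffe_outputs, tf_outputs, ["conv1", "fc"])
print(result)
```

Returning the first diverging layer tells you where to start looking, which is the same idea as incrementing N one layer at a time.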