With the speed of light unfortunately fixed, and the corollary that heterogeneous architectures with task-specific data movement offer the path to higher FLOPs, moving existing CPU software to post-single-thread platforms is the price we pay for continuing to see Moore's Law gains after the 2000s.¹
TensorFlow ships a tool, AutoGraph (invoked through tf.function), for automatically creating TF graphs from fairly general Python functions.² This gives us an easy way to measure the overhead of "porting" existing Python code to any TF target: convert it directly into a graph.
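To make the conversion concrete, here is a small illustrative sketch (the Collatz function is a hypothetical example, not code from this post). Wrapping an ordinary Python function in tf.function makes AutoGraph rewrite `while`/`if` statements whose conditions are tensors into graph control flow (tf.while_loop / tf.cond), and tf.autograph.to_code shows the Python source AutoGraph generates before tracing.

```python
import tensorflow as tf

def collatz_steps(n):
    # Hypothetical example: plain Python control flow with a
    # data-dependent while loop and an if/else inside it.
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps

# tf.function applies AutoGraph: because `n` is a tensor, the `while`
# becomes a graph loop and the `if` becomes a graph conditional.
graph_steps = tf.function(collatz_steps)

if __name__ == "__main__":
    print(collatz_steps(27))                        # eager Python on an int
    print(graph_steps(tf.constant(27)).numpy())     # same logic, run as a TF graph
    # Inspect the converted source that AutoGraph traces into a graph.
    print(tf.autograph.to_code(collatz_steps)[:400])
```

The same function runs unchanged as plain Python on an int, which is the property that makes AutoGraph tempting as a zero-effort porting path.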
To test compile and runtime performance, I grabbed a pure Python implementation of SHA1 from https://github.com/ajalt/python-sha1. I chose it even though SHA1 is easy to accelerate in hardware, because it should produce control-flow graphs of reasonable depth.
I benchmarked compilation and runtime performance with the following loop, varying the SHA1 depth from 1 to 1000.
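A minimal sketch of a harness in this spirit, under two assumptions that are not from the post: "depth" means applying SHA1 repeatedly to its own digest, and compile cost is estimated as the gap between the first call (which, for tf.function, includes tracing and graph construction) and a second, steady-state call. The names chained_sha1, benchmark, make_fn and compile_fn are hypothetical. hashlib stands in for the pure-Python SHA1 so the harness runs without TensorFlow; compile_fn defaults to the identity.

```python
import hashlib
import time
from typing import Callable, Iterable, List, Tuple

def chained_sha1(data: bytes, depth: int) -> bytes:
    # Hypothetical workload: hash `depth` times, feeding each digest into the next round.
    digest = data
    for _ in range(depth):
        digest = hashlib.sha1(digest).digest()
    return digest

def benchmark(
    make_fn: Callable[[int], Callable[[bytes], bytes]],
    depths: Iterable[int],
    compile_fn: Callable = lambda f: f,
    arg: bytes = b"abc",
) -> List[Tuple[int, float, float]]:
    """Return (depth, first_call_seconds, second_call_seconds) for each depth."""
    results = []
    for depth in depths:
        fn = compile_fn(make_fn(depth))  # e.g. compile_fn=tf.function to time AutoGraph
        t0 = time.perf_counter()
        fn(arg)                          # first call: pays any tracing/graph-building cost
        t1 = time.perf_counter()
        fn(arg)                          # second call: steady-state execution only
        t2 = time.perf_counter()
        results.append((depth, t1 - t0, t2 - t1))
    return results

if __name__ == "__main__":
    # Depths 1, 112, ..., 1000: a coarse sweep of the 1..1000 range.
    rows = benchmark(lambda d: (lambda data: chained_sha1(data, d)), range(1, 1001, 111))
    for depth, first, second in rows:
        print(f"depth={depth:4d}  first={first:.6f}s  second={second:.6f}s")
```

To time AutoGraph itself you would pass compile_fn=tf.function together with a make_fn whose body AutoGraph can convert (hashlib calls cannot be traced into a graph). Note that a `for` loop over a Python range with a Python-int depth is unrolled during tracing, so graph size, and therefore compile time, would grow with depth.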
The SHA1 code was approximately 70 lines long. If you were to throw arbitrary Python into AutoGraph like this, you could reasonably expect compilation to take about 21 seconds per thousand lines (kloc) of Python.
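As a rough worked example, assuming the 21 s/kloc rate extrapolates linearly with code size $L$ (in kloc), and using hypothetical codebase sizes of 10 and 100 kloc:

$$
t_{\text{compile}} \approx 21\,\tfrac{\text{s}}{\text{kloc}} \times L,
\qquad
L = 10 \Rightarrow t \approx 210\ \text{s} \approx 3.5\ \text{min},
\qquad
L = 100 \Rightarrow t \approx 2100\ \text{s} = 35\ \text{min}.
$$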
Just do it properly: port the code to the target by hand rather than pushing it through AutoGraph.