Optimize the inference performance of various DL models (CNN, NLP, etc.)
• Identify performance bottlenecks and propose scalable solutions
• Benchmark & profile at a low level to analyse hotspots and exploit the available bandwidth & computational power across cores
• Leverage hardware resources & distribute workloads using parallel computing for optimal performance
Qualifications:
• Bachelor of Engineering/Bachelor of Technology or Master of Engineering/Master of Technology
• Strong programming skills in Python
• Sharp analytical & problem-solving skills
• Exposure to multithreading & parallel processing concepts
• Knowledge of parallel computing (CPU/GPU) & parallel programming (OpenMP, CUDA)
• Good knowledge of computer architecture & microarchitectures such as x86
• Nice to have: knowledge of Linux/Windows profiling tools
• Nice to have: knowledge of DL frameworks (TensorFlow, PyTorch, ONNX) & exposure to DL model architectures