Responsibilities:
• Optimize the inference performance of various DL models (CNN, NLP, etc.)
• Identify performance bottlenecks and propose scalable solutions
• Fine-tune parameters and library-linked variables to extract the best output
• Benchmark and perform low-level profiling to analyze hotspots
• Leverage hardware resources and distribute workloads using parallel computing to obtain optimal performance
Qualifications:
• Bachelor of Engineering/Bachelor of Technology or Master of Engineering/Master of Technology
• Very good programming skills in Python
• Sharp analytical and problem-solving skills
• Hands-on experience working with parallel workloads and multithreading
• Knowledge of parallel computing on CPU/GPU and parallel programming with OpenMP and CUDA
• Good knowledge of computer architecture and microarchitectures such as x86
• Good to have: familiarity with profiling tools such as perf and gprof
• Nice to have: DL framework knowledge (TensorFlow, PyTorch, ONNX) and exposure to DL model architectures