• Implement and optimize machine learning training and inference and computer vision
algorithms for the target GPU / DSP / AI accelerator (NPU) hardware
• Vectorize algorithms using the hardware-specific ISA (intrinsics/assembly) and achieve
close to theoretical peak performance
• Design algorithms that leverage the full potential of the target hardware architecture
• Implement test benches for the hand-optimized algorithms to verify quality and performance
Required skills
• BTech/BE/MTech/ME/MS/PhD degree in CSE/IT/ECE
• More than 2 years of experience in algorithm development, porting, optimization, and
testing
• Strong knowledge of computer architecture
• Excellent proficiency in C and C++ programming
• Proficiency in assembly / intrinsics (SIMD/VLIW)
• Good knowledge of DMA, memory banks, and cache concepts
• Good understanding of machine learning / deep learning and exposure to DL model
architectures
• Strong problem-solving, analytical, and debugging skills