Lead Developer – Big Data Engineering

Ideal Candidate

 

  • The ideal candidate will have hands-on experience as a Lead Developer using Big Data technologies within the banking and financial services domain, with a proven track record of delivering successful Big Data solutions to business clients in financial services.

 

Requirements

 

  • 5-7 years of experience as a Big Data Developer
  • In-depth knowledge of Big Data technologies - Spark, HDFS, Hive, Kudu, Impala
  • Solid programming experience in Python, Java, Scala, or a comparable programming language
  • Production experience in core Hadoop technologies, including HDFS, Hive, and YARN
  • Strong working knowledge of SQL and the ability to write, debug, and optimize distributed SQL queries
  • Excellent communication skills; previous experience working with internal or external customers
  • Strong analytical abilities; ability to translate business requirements and use cases into a Hadoop solution, including ingestion of many data sources, ETL processing, data access and consumption, and custom analytics
  • Experience working with workflow managers like Airflow, Prefect, Luigi, Oozie
  • Experience working with data governance and security tools such as Apache Sentry, Apache Atlas, Apache Ranger, and Kerberos
  • Experience working with streaming data using technologies such as Kafka and Spark Streaming
  • Strong understanding of big data performance tuning
  • Experience handling a variety of structured and unstructured data formats (Parquet/Delta Lake/Avro/XML/JSON/YAML/CSV/ZIP/XLSX/Text, etc.)
  • Experience working with distributed NoSQL storage such as Elasticsearch and Apache Solr
  • Experience deploying big data pipelines in the cloud, preferably on GCP or AWS
  • Well versed in Software Development Life Cycle (SDLC) methodologies and practices
  • Spark Certification is a huge plus
  • Cloud experience is a must-have, preferably with GCP
  • Contributions to the open-source community are a big plus; Apache committer status is a strong advantage

 

Responsibilities

 

  • Integrate data from a variety of sources (data warehouses, data marts) using on-prem or cloud-based data structures (GCP/AWS); identify new and existing data sources
  • Develop, implement and optimize streaming, data lake, and analytics big data solutions
  • Create and execute testing strategies including unit, integration, and full end-to-end tests of data pipelines
  • Recommend Kudu, HBase, HDFS, or relational databases based on their respective strengths
  • Utilize ETL processes to build data repositories; integrate data into the Hadoop data lake using Sqoop (batch ingest), Kafka (streaming), and Spark, Hive, or Impala (transformation); a minimal illustrative sketch of such a pipeline appears after this list
  • Adapt and learn new technologies in a quickly changing field
  • Be creative; evaluate and recommend big data technologies to solve problems and create solutions
  • Recommend and implement the best tools to ensure optimized data performance; perform data analysis using Spark, Hive, and Impala
  • Work on a variety of internal and open source projects and tools
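
For illustration only: the sketch below shows one way the kind of pipeline described above might look in PySpark Structured Streaming, reading JSON events from a Kafka topic and landing them as Parquet in a data lake path that a Hive external table could point at. The broker address, topic name, schema, and paths are assumptions made for the example, not part of this posting.

# Minimal illustrative sketch (assumed names throughout); requires the
# spark-sql-kafka package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder
    .appName("transactions-ingest")   # hypothetical job name
    .getOrCreate()
)

# Assumed schema for events on the hypothetical "transactions" topic.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "transactions")                # assumed topic
    .load()
)

# Kafka delivers the payload as bytes; cast to string and parse the JSON body.
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append the parsed events as Parquet under an assumed lake path; a Hive
# external table defined over this path would expose the data to Hive/Impala.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/lake/transactions")
    .option("checkpointLocation", "/data/checkpoints/transactions")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()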

 

Posted Date
2021-03-10 10:37:40
Experience
5-7 years
Primary Skills
Big Data, Spark, HDFS, Hive, Kudu, Impala, Python, Java, Scala, Parquet/Delta Lake/Avro/XML/JSON/YAML/CSV/ZIP/XLSX/Text, Elasticsearch, Apache Solr, SQL
Required Documents
Resume
Contact
bhawya@lorventech.com