Data Engineer

Job Description

Position – Data Engineer

Location – Menlo Park, CA – Onsite, at least 3 days a week.

Experience: 10+ years only

LinkedIn profile required

Responsibilities:

· Design, build, test, and deploy data pipelines that facilitate data acquisition and processing from a variety of source systems.

· Design, build, test, and deploy data persistence layers such as data lakes, data warehouses, and NoSQL databases, based on the needs of the engagement.

· Work across big data systems, either on-premises or in the cloud.

· Build proactive data validation automation to catch data integrity issues.

· Troubleshoot and resolve data issues using critical thinking.

· Work independently and with team members to understand and document database structure and business processes.

· Communicate and present technical information to non-technical team members and stakeholders.

· Comply with standards and organizational requirements relevant to specific assignments.

Requirements:

At least 6 years of overall IT experience, with 3 years building modern data pipelines (Spark) in Scala, Python, etc.
Good understanding of optimizing data pipelines for performance and scale
Good understanding of orchestration tools such as Airflow
At least 3 years working with big data systems built on Hadoop, along with peripheral technologies such as Spark, Kudu, Presto, Hive, HBase, HDFS, YARN, etc.
Hands-on experience with Apache Kafka and similar streaming platforms provided by cloud vendors
Good understanding of data profiling, and of the data quality policies and rules to be built for data cleansing
Good understanding of building different data transformation patterns
At least 2–3 years of experience working with different types of data repositories: Hadoop-based data lakes, data warehouses, NoSQL databases, etc.
Excellent advanced SQL and ETL skills. Must understand relational schemas, ER diagrams, and normal forms: how to design, create, extend/iterate, and manipulate them; seed them with realistic data for data exploration; create and optimize queries; and operate at both low and high scale.
Experience creating data pipelines in Python, using Python for data manipulation and transformation (dictionaries, data frames, data streams, joins of all kinds, outside of SQL IDEs)
Experience with data warehousing architecture
Research skills – the ability to figure things out independently and come back with a working model; pros/cons analysis skills