Job Description
Responsibilities include:
Design, develop, and implement large statistical databases in a Dataiku/Hadoop environment
Implement statistical and econometric models in Python/PySpark/R on the Dataiku platform
Process large datasets in the Hadoop environment using Spark/PySpark/SparkR
Ensure data integrity through data quality checks, validation, governance, and transparency
Deploy models to production and monitor them to ensure stable performance and adherence to standards
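The data-integrity responsibility above can be illustrated with a minimal sketch. The record schema and validation rules here are hypothetical; in this role, comparable checks would typically run against Spark or Dataiku datasets rather than plain Python lists.

```python
# Minimal data-quality sketch (hypothetical schema and rules).
# In practice these checks would run on Spark/Dataiku datasets;
# plain Python is used here only for illustration.

def validate_records(records):
    """Split records into valid rows and rows failing basic integrity checks."""
    valid, rejected = [], []
    for row in records:
        errors = []
        if row.get("loan_id") is None:
            errors.append("missing loan_id")
        balance = row.get("balance")
        if not isinstance(balance, (int, float)) or balance < 0:
            errors.append("balance must be a non-negative number")
        if errors:
            rejected.append({"row": row, "errors": errors})
        else:
            valid.append(row)
    return valid, rejected

records = [
    {"loan_id": 1, "balance": 1200.50},
    {"loan_id": None, "balance": 300.0},
    {"loan_id": 3, "balance": -10},
]
valid, rejected = validate_records(records)
# valid keeps the clean row; rejected carries each bad row with its error reasons
```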
Skills required:
Experienced professional with 10-12 years of experience developing and implementing statistical models in a big-data ecosystem, e.g., Hadoop, Spark, HBase, Hive/Impala, or similar distributed computing technologies
Proficiency with Python/R and core libraries for statistical/econometric modeling, such as scikit-learn and pandas
Experience with Hadoop, Spark, HDFS, Python, R, PySpark, and other leading technologies
Proficiency with Dataiku or similar tools
Proficiency in data analysis using complex, optimized SQL and/or the technologies above
Understanding of data architecture, data structures, data modeling, database design, and performance management
Good written and verbal communication skills
Proficiency or experience with the following is a plus:
In-depth understanding of Statistics
Finance, Mortgages, Bank Deposit Products