Opening for Esteemed Client -
PySpark, Python
Must-have Skills
- Implementing data ingestion pipelines from a variety of data sources, e.g. databases, S3, and files.
- Developing Big Data and non-Big Data cloud-based enterprise solutions in PySpark, Spark SQL, and related frameworks/libraries.
- Experience building ETL / data-warehouse transformation processes.
- Experience working with structured and unstructured data.
- Developing scalable, reusable, self-service frameworks for data ingestion and processing.
- Integrating end-to-end data pipelines that move data from source systems to target repositories while ensuring data quality and consistency.
- Analyzing and optimizing processing performance.
- Bringing best practices in the following areas: Design & Analysis, Automation (pipelining, IaC), Testing, Monitoring, Documentation.
- Understanding of performance improvement and the ability to write effective, scalable code.
- Implementing security and data-protection solutions.
- Expertise in at least one popular Python framework (such as Django, Flask, or Pyramid).
- Knowledge of object-relational mapping (ORM).
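As a flavor of the "reusable, self-service framework" work described above, the sketch below shows one common design: a registry of per-source readers behind a single ingest entry point that applies a shared data-quality check. It is a minimal pure-Python illustration under assumed names (`SOURCE_READERS`, `register_reader`, `ingest` are all hypothetical); in a real pipeline each reader would return a Spark DataFrame via PySpark connectors (JDBC, S3, files) rather than a list of dicts.

```python
# Minimal sketch of a pluggable ingestion framework (pure Python for
# illustration; names and structure are hypothetical, not a real library).
from typing import Callable, Dict, List

Row = Dict[str, object]

# Registry mapping a source type ("csv", "jdbc", "s3", ...) to its reader.
SOURCE_READERS: Dict[str, Callable[[str], List[Row]]] = {}

def register_reader(kind: str):
    """Decorator that registers a reader function for a source type."""
    def wrap(fn: Callable[[str], List[Row]]) -> Callable[[str], List[Row]]:
        SOURCE_READERS[kind] = fn
        return fn
    return wrap

@register_reader("inline")
def read_inline(path: str) -> List[Row]:
    # Stand-in for a real connector; returns fixed sample rows.
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]

def ingest(kind: str, path: str) -> List[Row]:
    """Route to the registered reader, then apply a shared quality rule."""
    rows = SOURCE_READERS[kind](path)
    # Consistency rule shared by every pipeline: drop rows with null fields.
    return [r for r in rows if all(v is not None for v in r.values())]

print(ingest("inline", "ignored-path"))  # → [{'id': 1, 'amount': 10}]
```

New source types plug in by registering another reader, so every pipeline inherits the same quality and consistency checks without duplicating them.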
Good-to-have Knowledge
- Experience with cloud-based solutions.
- Knowledge of data management principles.