This role is with the PFNA Data & Digital Solutions team within PFNA IT.
You will play a key part in advancing PFNA data solutions and capabilities, executing data-related initiatives in one or more functional domains (finance, sales, marketing, supply chain) that transform raw data into actionable insights.
You will be a hands-on contributor who moves data from raw source systems into the PFNA integrated data fabric and then makes it accessible for business users to explore. You will be responsible for building data systems and pipelines that feed prescriptive and predictive modeling initiatives by establishing and enhancing processes around data capture, storage, accessibility, transformation, and reliability.
You will work with PFNA IT and business professionals across multiple functions to drive and support the PFNA transformation to an insights-driven culture. You will identify opportunities for more streamlined and automated approaches to solve business problems with data/digital solutions and advocate for their adoption.
You will help manage and mentor other PFNA IT associates working in the data & digital space in support of the above goals.
Responsibilities:
• Build data systems and pipelines per PFNA business needs to support both business intelligence and data science / ML / analytics initiatives
• Collect, structure, analyze, organize, and maintain raw data from various sources in structured databases so that predictive models can be built faster
• Design and build efficient data structures that periodically ingest raw data from internal and external sources, and manage and house model outputs so they are quickly available to the business
• Work closely with analytics, data science, and business intelligence teams in both IT and the PFNA business user community
• Establish periodic data verification processes to ensure data accuracy
• Build new tooling and algorithms to optimize business processes around the creation and maintenance of databases/data lakes and the running of batch processes for data updates
• Use large data sets to resolve major business and functional issues while improving data reliability, efficiency and quality
Qualifications / Requirements:
- BE/B.Tech in Computer Science or a related technical field
- 5 years of experience with data structures and with building and managing data lakes
- Past experience on data engineering teams in consulting or other industries. CPG industry experience is a plus.
- Experience with relational databases as well as unstructured data streams
- Hands-on experience in SQL database design
- Experience with multiple modern data technologies such as Hive, Spark, SQL, Kafka, Sqoop, Infoworks, and Python, along with traditional relational database technologies such as Teradata, Oracle, and SQL Server
- Experience with data lake ETL & query technologies such as Denodo, Presto, Databricks
- 5+ years of experience with schema design and dimensional data modeling
- Expertise in modern data platforms, understanding of relational as well as unstructured data, and experience in data lake architecture and creation
- Experience designing, building and maintaining data processing systems
- Experience working on agile projects and with agile toolsets (e.g. Azure DevOps, Jira)
- Provide training and mentoring to less experienced team members, assist them in resolving unforeseen requirements, and lead in setting team priorities
- Ability to adapt to change quickly and handle unforeseen requirements effectively with limited guidance on prioritization
- Strong understanding of data, systems, and end-to-end data processes within the functional area
- Proven self-starter who can move projects forward by filling in the gaps on Agile teams, whether leading a design session, doing some test automation, or mentoring a teammate struggling with a new technology
- Problem-solving aptitude
- Ability to prioritize own work within the functional area under limited supervision, applying strong analytical skills to master data requests
- Proactively adjusts work assignments and schedule to meet changing team priorities.
- Experience with specific Azure and/or AWS technologies (such as Glue, S3, Redshift, EMR, and Kinesis)
- Experience in developing and deploying CI/CD pipelines to streamline/automate development and testing processes
- Hands-on experience with machine learning concepts and skills: regularization, gradient-based optimization, hyper-parameter tuning, and models such as Random Forest
- Familiarity with Scala, Java or C++ is an asset
- Knowledge of predictive modeling, machine-learning tools and techniques
- Data engineering certification (e.g. IBM Certified Data Engineer) is a plus
- Functional experience with data in sales, marketing, finance, or supply chain is a significant plus