Job Role: Sr. Flink Platform Support – DataStream API & AKS Expert
Job Location: Dallas/Ft Worth, Texas
Job Tenure: Contract
Overview:
We are seeking a seasoned Flink engineer with 7+ years of experience in stream processing and backend systems, including 3+ years of hands-on experience implementing Apache Flink applications in production using the DataStream API. This role focuses on designing, deploying, and supporting Flink applications on Azure Kubernetes Service (AKS), with end-to-end responsibility for both application logic and infrastructure. This is a Technical Lead role, and excellent communication skills are required.
Mandatory Requirements:
• 3+ years of experience using Apache Flink, specifically the DataStream API.
• Proven delivery of production-grade Flink implementations, supported by documentation or case studies.
• Active engagement with at least one current client using Flink (DataStream API).
• Competence in state management (checkpoints and savepoints) with local storage.
• Configuration of connectors such as Azure Event Hubs, Kafka, and MongoDB.
• Implementation of aggregate functions with the Flink DataStream API.
• Handling watermarks for out-of-order events.
• Management of state using Azure Data Lake Storage (ADLS).
• Setting up a private Flink cluster within a designated AKS environment.
• Configuring both session-mode and application-mode deployments.
• Defining and sizing nodes and TaskManager task slots.
• Managing and configuring JobManagers and TaskManagers.
• Establishing the necessary connectors, e.g., external storage for the Flink cluster.
• Configuring heap memory and the RocksDB state backend for state management.
• Defining and setting up checkpoints and savepoints for state recovery.
• Enabling Auto-Pilot capabilities.
• Integrating network resources, such as Azure Event Hubs, and external databases such as MongoDB.
• Integrating with ArgoCD for job submission.
• Installing LTM agents for logging and Dynatrace agents for monitoring.
• Providing access to the Flink Dashboard.
• Establishing High Availability (HA) and Disaster Recovery (DR) configurations.
Core Responsibilities:
Functional:
• Build Flink applications using the DataStream API and process functions.
• Manage state (checkpoints, savepoints) using RocksDB and ADLS.
• Handle out-of-order events with watermarking and implement aggregators.
• Integrate with Kafka, Azure Event Hubs, and MongoDB.
Infrastructure & Non-Functional:
• Set up and manage Flink clusters in AKS, drawing on 2+ years of experience with Kubernetes-based deployments.
• Configure session-mode and application-mode deployments.
• Optimize memory, RocksDB, and task/job manager parameters.
• Implement HA/DR, state recovery, and auto-pilot features.
• Integrate with ArgoCD, Dynatrace, and logging agents (e.g., LTM).
• Ensure observability via Flink Dashboard and monitoring tools.
Qualifications:
• 7+ years in backend or distributed systems engineering.
• 3+ years with Apache Flink (DataStream API).
• 3–5 years of experience with Kafka, Azure Event Hubs, or similar streaming platforms.
• 2+ years of experience deploying and managing applications in AKS or other Kubernetes environments.
• Proficiency in CI/CD, infrastructure-as-code, and cloud observability tools.
• Strong communication and documentation skills.
Deliverables:
• Deployed Flink applications with complete monitoring and logging.
• Managed Flink infrastructure with high availability and disaster recovery.
• Deployment automation via ArgoCD and full integration with the observability stack.