Journey to Apache NiFi (nifi) on Kubernetes (k8s)

Anshuman Rudra
Dec 17, 2020

NiFi is apparently not designed for k8s:

With over 14 years of maturity (initial release in 2006[1]), NiFi was apparently not designed for k8s (k8s v1.0 was released on July 21, 2015[2]). There is a steep learning curve in understanding the architectures of NiFi and k8s well enough to make them work together. We wouldn’t be wrong in calling NiFi a monolithic, stateful system with long-lived processes working on data flows, which is far from the microservices paradigm[3] that k8s is built on and best supports for containerized apps.

Playing devil’s advocate (don’t judge us, that’s our swag) on how to bring the best of both worlds (NiFi and k8s) together, it was NiFi’s built-in cluster management (the prime factor) that gave us the confidence to run it on k8s.

Architecting production grade nifi eco-system with k8s:

Now that we had convinced ourselves to make NiFi and k8s work together, it was a rough road ahead to get them to production grade. Thanks to our users for their trust and faith in us. Perseverance was the key as we worked through the following challenges.

Reliable:

AWS managed k8s (EKS the savior) — A managed k8s service is a savior on this journey. We faced a few issues, and the AWS support team helped us resolve them.

Unused orphan resources — We were hit with unused orphan resources during development and while rolling out updates. Since we tag all k8s objects/resources, we run a side-car (daemonset) to clean them up, which helped prevent money leaks.
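As a sketch of the cleanup idea (the tag key `platform` and the commands below are illustrative of what our side-car does, not its actual code), orphaned storage can be spotted from both the AWS side and the k8s side:

```shell
# Unattached ("available") EBS volumes still carrying our platform tag — candidates for deletion:
aws ec2 describe-volumes \
  --filters Name=status,Values=available Name=tag:platform,Values=nifi \
  --query 'Volumes[].VolumeId' --output text

# PersistentVolumes whose claims are gone ("Released") — the k8s-side view of the same leak:
kubectl get pv --no-headers | awk '$5 == "Released" {print $1}'
```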

Hitting max limits on AWS resources per account — The unified platform is designed around domain accounts, which lets us extend the PaaS offering of NiFi into each domain’s own account (so no single account hits the limits). Domain users are enabled with self-service NiFi deployment into their accounts.

High resource-consuming flows taxing low/short-running flows — By design, all NiFi flows share all resources (compute, memory, disk) in the cluster. There is no notion of priority/ranking in NiFi (it’s not designed to be biased, you know), so long-running flows with high volume tax short flows with little data. We designed feature-based NiFi namespaces, which isolate NiFi clusters running different flows.
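A minimal sketch of what such an isolated feature namespace looks like, with illustrative names and limits (our real quotas vary per feature):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: nifi-feature-clickstream
---
# A quota per feature namespace keeps a heavy flow from starving everyone else.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: nifi-feature-clickstream-quota
  namespace: nifi-feature-clickstream
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    persistentvolumeclaims: "12"
```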

Resilient:

Data loss during NiFi pod preemption — The k8s scheduler tends to preempt pods to cater to new requests, which led to data loss on preempted pods. To prevent this, we mount EBS volumes for all NiFi repositories (content, flowfile, provenance) on our statefulset pods. Built-in checkpointing and snapshotting of flowfiles in NiFi helps resume processing from the last checkpoint.
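The relevant statefulset fragment looks roughly like this (image tag, mount paths and volume sizes are illustrative); each replica gets its own EBS-backed claim per repository, so a preempted pod re-attaches its data when rescheduled:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nifi
spec:
  serviceName: nifi
  replicas: 3
  selector:
    matchLabels:
      app: nifi
  template:
    metadata:
      labels:
        app: nifi
    spec:
      containers:
        - name: nifi
          image: apache/nifi:1.12.1
          volumeMounts:
            - { name: content-repo,    mountPath: /opt/nifi/content_repository }
            - { name: flowfile-repo,   mountPath: /opt/nifi/flowfile_repository }
            - { name: provenance-repo, mountPath: /opt/nifi/provenance_repository }
  # One PVC per repository per pod; EBS-backed via the cluster's default storage class.
  volumeClaimTemplates:
    - metadata: { name: content-repo }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 100Gi } }
    - metadata: { name: flowfile-repo }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 20Gi } }
    - metadata: { name: provenance-repo }
      spec:
        accessModes: ["ReadWriteOnce"]
        resources: { requests: { storage: 50Gi } }
```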

Volume issues — We encountered a few occurrences of the node taint NodeWithImpairedVolumes=true:NoSchedule, when the scheduler could not attach volumes within a 30m timeout and the node became unusable for future scheduling. This happens (rarely) when AWS tries to attach a volume from a different AZ for a statefulset. We fixed it by running a daemonset that cleans up[4] the taint on the node.
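The manual equivalent of what that cleanup daemonset loops over is a single taint removal (the trailing `-` removes the taint; errors for nodes that don’t carry it are discarded):

```shell
# Remove the impaired-volumes taint from every node so they become schedulable again.
kubectl get nodes -o name \
  | xargs -I{} kubectl taint {} NodeWithImpairedVolumes:NoSchedule- 2>/dev/null
```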

State loss issues — Processors in NiFi are configured for external state management with zookeeper, and state loss happened when zookeeper pods got preempted by k8s. We decided to move zookeeper, together with some other shared services (nifi-registry, prometheus, grafana, redis), to an isolated EKS cluster, thus segregating k8s scheduling requests for NiFi from those for shared services. Zookeeper is also mounted on EBS to help prevent data (state) loss.

Flow loss issues — Our statefulset pods get auto-replenished by k8s after pod preemption. When a replenished pod joins the NiFi cluster and tries to sync flows from the primary pod, it fails if there was a change (made before the preemption) on a node that wasn’t yet synced to all pods in the cluster. This makes the cluster UI unusable with the error ‘Cluster coordinator is still syncing flows’. Our side-car daemonset is scheduled to sync flow.xml.gz (along with authorizers.xml and users.xml) to a remote object store (s3), and these versions are pulled back by the pod on restart.
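The sync side-car looks roughly like this container fragment (bucket name, paths and interval are illustrative, not our actual deployment): it periodically pushes the canonical flow/authz files to s3, so a freshly replenished pod can pull them back before rejoining the cluster.

```yaml
# Side-car container inside the nifi pod spec.
- name: flow-sync
  image: amazon/aws-cli:2.1.6
  command: ["/bin/sh", "-c"]
  args:
    - |
      while true; do
        aws s3 cp /opt/nifi/conf/flow.xml.gz     s3://our-nifi-flows/$(POD_NAME)/flow.xml.gz
        aws s3 cp /opt/nifi/conf/authorizers.xml s3://our-nifi-flows/$(POD_NAME)/authorizers.xml
        aws s3 cp /opt/nifi/conf/users.xml       s3://our-nifi-flows/$(POD_NAME)/users.xml
        sleep 60
      done
  env:
    # k8s expands $(POD_NAME) in args from this downward-API env var.
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
  volumeMounts:
    - name: nifi-conf
      mountPath: /opt/nifi/conf
```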

Cluster coordinator (zookeeper) ‘has already seen the client with an old zxid’ issue[5] — Fixed by an upgrade to zookeeper v3.6.1 with persistence, preferably in replicated mode (so the quorum can withstand a certain number of failures).

Secure:

TLS termination with ELB issue — AWS ELB terminates TLS before hitting the targets (ec2 nodes) and doesn’t pass the client certificates on to the request-handling pod. We employed an internal CA server to generate and provide certs to the pods in the NiFi cluster for pod-to-pod communication over TLS[6].

Challenge achieving a sticky session for the NiFi OIDC provider — sessionAffinity on the k8s statefulset object doesn’t provide stickiness, as traffic routed through the AWS ELB is unaware of such classifications and has no algorithm to handle it. Our side-car demon had two options to implement a sticky session. The first was consistent hashing (all network load balancers implement it), but we soon faced issues when pods were scaled down and the OIDC cookie was lost along with the pod. We fixed it by routing traffic to the zeroth pod for the OIDC sequence.
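A toy illustration (not our proxy code) of why hash-based routing broke on scale-down. It uses plain modulo hashing; a proper consistent-hash ring remaps fewer sessions, but still remaps some, so the pod holding the OIDC state can stop receiving that client’s requests. Pinning the OIDC sequence to pod 0 never remaps.

```python
import hashlib

def pod_for(cookie, num_pods):
    """Hash-based routing: map a session cookie onto the current pod count."""
    digest = int(hashlib.sha256(cookie.encode()).hexdigest(), 16)
    return digest % num_pods

def pod_for_oidc(cookie, num_pods):
    """Our fix: always route the OIDC sequence to the zeroth pod, regardless of scale."""
    return 0

sessions = [f"session-{i}" for i in range(1000)]
before = {s: pod_for(s, 3) for s in sessions}   # 3 nifi pods
after  = {s: pod_for(s, 2) for s in sessions}   # scaled down to 2
moved = sum(before[s] != after[s] for s in sessions)
print(f"{moved} of {len(sessions)} sessions remapped after scale-down")

# Pinned routing is unaffected by the pod count changing.
assert pod_for_oidc("session-42", 3) == pod_for_oidc("session-42", 2)
```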

Overriding the client session timeout for OIDC — Our in-house OIDC provider has limitations on configuring a client-specific timeout for the OIDC token, so clients need to handle the timeouts themselves. Since we chose to run the base NiFi release (easier release management and upgrades), we did not override NiFi’s OIDC implementation. Here comes the savior, our side-car demon: we intercept the token request and update the timeout before responding to the UI.

Handling logout from the OIDC claimset — Again, our in-house OIDC provider sends different claimset attributes for session logout. We intercept the logout request in our side-car demon and route it to the claimset’s logout URI for our OIDC provider.

Integrated RBAC with federated IAM roles for NiFi and pod/node — This is organization specific. Our NiFi pods can only assume restricted federated IAM roles to access data, which inherently maps RBAC onto those roles. To keep it simple, resilient and repeatable, we designed an RBAC definition (JSON structured; it would also support YAML) with NiFi resources as first-class citizens for assigning ACLs. Our side-car demon processes the RBAC config using the nifi-toolkit API on bootstrap and syncs it back from a remote cache on pod restarts.
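Purely as an illustration of the shape (our real schema, role names and account IDs are internal), such a definition pairs an assumable IAM role with ACLs on NiFi resources:

```json
{
  "roles": [
    {
      "iamRole": "arn:aws:iam::111122223333:role/nifi-clickstream-readers",
      "policies": [
        { "resource": "/process-groups/clickstream-ingest", "actions": ["read"] },
        { "resource": "/provenance", "actions": ["read"] }
      ]
    },
    {
      "iamRole": "arn:aws:iam::111122223333:role/nifi-clickstream-operators",
      "policies": [
        { "resource": "/process-groups/clickstream-ingest", "actions": ["read", "write"] }
      ]
    }
  ]
}
```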

Issues with nifi-api on OIDC-backed authN — We have a use case where NiFi flows need to be controlled by other systems (like Airflow). When NiFi is backed by OIDC, there is no way a nifi-api request can be authenticated without a prior login by the user, which defeats the purpose of automation. NiFi supports only one configured authN path, even for service users, so requests get redirected to the OIDC discovery URI for login. We decided to go with a kubectl client, authenticated by the namespace role token of a service account, exec-ing the nifi-toolkit API on the zeroth pod.
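In spirit, the external system sidesteps OIDC like this (namespace, toolkit path and URL are illustrative): authenticate to the k8s API with the service account’s token, then exec the toolkit CLI on the zeroth pod, where it talks to NiFi locally.

```shell
# Drive nifi from e.g. Airflow without an OIDC login, via the k8s API instead.
kubectl --token="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  -n nifi-feature-clickstream exec nifi-0 -- \
  /opt/nifi-toolkit/bin/cli.sh nifi pg-list -u https://localhost:8443
```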

Managing sensitive values in NiFi — Though NiFi encrypts sensitive values on disk, they are not version-controlled in NiFi by design (else it would defeat the purpose of sensitivity, lol). Managing sensitive values manually for flows, per environment, is a pain in the neck. Not to forget, authorization on the sensitive values is needed to keep them out of the wrong hands (channel your inner Sherlock Holmes). AWS Secrets Manager manages the sensitive values for us. Before NiFi is bootstrapped, our sweet side-car demon again loads the secrets into the container’s environment, and NiFi accesses them as variable registry arguments. Secrets are also hot-loaded on demand through our CD pipeline job.
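A sketch of just the mapping step (the fetch itself is done with the AWS CLI/SDK in the real side-car; the function name and `NIFI_SECRET_` prefix are our illustration): a Secrets Manager SecretString holding JSON is flattened into environment variables that NiFi can then read as variable registry arguments.

```python
import json
import os

def export_secret_string(secret_string, prefix="NIFI_SECRET_"):
    """Flatten a JSON SecretString into environment variables; return the names set."""
    exported = []
    for key, value in json.loads(secret_string).items():
        name = prefix + key.upper()
        os.environ[name] = str(value)
        exported.append(name)
    return exported

# Example payload, shaped like the SecretString field Secrets Manager returns:
names = export_secret_string('{"db_password": "s3cret", "api_key": "abc123"}')
print(names)  # ['NIFI_SECRET_DB_PASSWORD', 'NIFI_SECRET_API_KEY']
```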

Scalable:

EKS cluster autoscaler — We’ve configured an ASG per EKS cluster, which isolates each cluster from the others. The limitation, though, is that we have no control over how long it takes to attach an EC2 node on a k8s scale-up request. We’ll work on keeping buffer hot nodes to reduce that attach time; there is a cost trade-off on idle nodes, but we’re planning to offer the capability.

Horizontal Pod Autoscaler (HPA) is unaware of NiFi metrics — We had to watch disk and CPU metrics ourselves to decide when to scale the NiFi statefulset up or down, handling NiFi node offloading with a graceful termination timeout to prevent data loss.
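As a sketch of the graceful scale-down piece (the script path and timeout are illustrative): a preStop hook asks NiFi to offload the node’s flowfiles to the rest of the cluster before k8s kills the pod, and terminationGracePeriodSeconds gives it time to finish.

```yaml
spec:
  template:
    spec:
      # Give the offload enough time to drain flowfiles before SIGKILL.
      terminationGracePeriodSeconds: 600
      containers:
        - name: nifi
          lifecycle:
            preStop:
              exec:
                # Our script wraps the nifi-toolkit node-offload call for this pod.
                command: ["/bin/sh", "-c", "/opt/nifi/scripts/offload-node.sh"]
```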

Node anti-affinity as an anti-pattern — By design, we chose not to restrict pods to dedicated nodes, since that invariably leads to frequent resource taints for new requests.

Observability — We’ve designed our MALT stack for the NiFi eco-system. In the interest of time, we’ll cover it in another post.

[1] https://en.wikipedia.org/wiki/Apache_NiFi

[2] https://en.wikipedia.org/wiki/Kubernetes

[3] https://microservices.io/

[4] https://github.com/BouweCeunen/clear-impaired-volumes-taint

[5] https://stackoverflow.com/questions/45804955/zookeeper-refuses-kafka-connection-from-an-old-client

[6] https://medium.com/swlh/operationalising-nifi-on-kubernetes-1a8e0ae16a6c

Written by Anshuman Rudra

Senior Consultant | Cloud architecture | Big data platform engineering at scale | Cloud engineering