About

This blog is home for my experiences and adventures in data engineering. (more)


Latest posts

Apr. 4, 2023

Journey of Performance Tuning Our Java APIs

We recently ported all our applications from our own cloud solution to the AWS public cloud, switching our services over to AWS-specific offerings such as DynamoDB, Managed Airflow, Secrets Manager for storing secrets and, most important of all, Amazon Elastic compute services to run our containerized applications. Since all of our applications run on the JVM, a very attractive proposition is to switch them to Amazon's own Arm-based Graviton instances.

Oct. 27, 2022

Using Self-Signed Certificates in Java Applications

Authentication and authorization are very common requirements in backend applications. We use Cloud Foundry UAA to handle both. As part of our migration we had to create an internal UAA instance for dev use, and we now need to use that UAA server with our Spring applications alongside other UAA servers. This should be easy! Deploy the application on an EC2 instance. Create an Elastic IP/static IP. Use the IP address in the application.
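
As a rough illustration of the topic, here is a minimal sketch of how a Java application can be told to trust a self-signed certificate using only the JDK. The class name `SelfSignedTrust`, the alias prefix `internal-uaa-`, and the idea of passing the certificate file as a command-line argument are assumptions made for this example and do not come from the post. Note that the resulting context trusts only the supplied certificates; combining them with the JDK's default trust store, which would probably be needed to keep talking to the other UAA servers, is left out.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.util.Collection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: builds an SSLContext that trusts a given set of certificates.
public class SelfSignedTrust {

    // Reads every X.509 certificate found in a PEM or DER encoded file.
    static Collection<? extends Certificate> readCertificates(Path certFile) throws Exception {
        try (InputStream in = Files.newInputStream(certFile)) {
            return CertificateFactory.getInstance("X.509").generateCertificates(in);
        }
    }

    // Creates an in-memory trust store holding only the supplied certificates
    // and wraps it in a TLS context.
    static SSLContext contextTrusting(Collection<? extends Certificate> certs) throws Exception {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null); // start from an empty store, no file, no password

        int index = 0;
        for (Certificate cert : certs) {
            // The alias is arbitrary; it only has to be unique within the store.
            trustStore.setCertificateEntry("internal-uaa-" + index++, cert);
        }

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null); // no client keys, default randomness
        return context;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SelfSignedTrust <certificate-file>");
            return;
        }
        SSLContext context = contextTrusting(readCertificates(Path.of(args[0])));
        System.out.println("Built a " + context.getProtocol() + " context");
    }
}
```

The returned `SSLContext` can then be handed to whichever HTTP client the application uses. Importing the certificate into a copy of the JDK trust store with `keytool` is a common alternative that avoids code changes.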

Oct. 6, 2022

Partitions, Partitions and Partitions.

Recently we came across an issue where one of the components of our ETL pipeline had exploded in AWS cost. When we took a deeper look into it, we learnt an expensive lesson on data partitions. The overall architecture of our data ETL pipeline is as follows:

    flowchart TB
        subgraph On premise
            subgraph region 1
                A1[Onsite Database 1] -->|Edge agent| B1(Regional Edge server)
                A2[Onsite Database 1] -->|Edge agent| B1
            end
            subgraph region 2
                A4[Onsite Database 1] -->|Edge agent| B2(Regional Edge server)
                A5[Onsite Database 1] -->|Edge agent| B2
            end
        end
        B1 --> C{S3 Bucket}
        B2 --> C
        C --> |Lambda Function trigger| D[(Dynamodb Record)]
        C <====> E(AWS Glue job)
        D --> |List of Unprocessed Files| E
        E-->|CSV to parquet| F[S3 TRANSFORMED]
        F --> G[(redshift)]

In this whole workflow, the majority of the cost is incurred on AWS Glue and on the Redshift database on which our API server operates, and rightfully so.

Sep. 26, 2022

Migrate! To AWS we go.

I was given the task of investigating and migrating apps to AWS. The objective is simple: migrate all of the applications running on Cloud Foundry to Amazon Web Services. Let's get started. All of our applications are plain Spring Boot 1.5.x applications, and containerizing a Spring Boot application is easy. Container runtime. The first step of the challenge is to choose where to run our applications. If these were my personal applications, I would rent a bunch of Linux VMs, either from AWS or from other simple cloud services, and run them all individually as Docker containers or as simple systemd services, a solution that will not work for my enterprise for a multitude of reasons.

Mar. 24, 2021

Building Data Warehouse for Report Data

Objective. The overall objective of this effort is to build a reliable and scalable pipeline to enable reporting services and further analytics services on the manufacturing data. This includes meeting the necessary SLAs on data security and compliance with GDPR. The overall steps include, but are not limited to: transporting data from SQL Server to object storage without any data loss; preserving data backups with disaster recovery for up to 2 years from their insert date; presenting the data after applying the necessary transformations, such as contextualization of the data; and serving the data when requested with the necessary parameters.