Project information

  • Development Language: Ruby on Rails, Python
  • Technology: ECS, Firehose, S3, Glue, PySpark, Apache Iceberg, Snowflake
  • Cost Savings (Monthly): $$$$
  • Coolness Factor: 🌟🌟🌟🌟

Building a Data Lake

I was able to solve a couple of big problems by building this project. First, our old pipeline was running a daily RDS -> Snowflake Fivetran sync for a table that was receiving 30+ million updates a day. As you can imagine, this was very expensive and not very practical. By moving away from RDS and storing the data in S3 in Iceberg format, we were able to eliminate the need for both RDS and Fivetran. Since we still wanted the data accessible in Snowflake, we leveraged Snowflake's new externally managed Iceberg tables feature so the data could be read from the Snowflake console. Not only was this a big cost saver, but with some well thought-out table partitioning, our query times were also much better. Big win!
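To give a feel for why partitioning can help query times, here is a minimal sketch in plain Python. It is a toy model, not the pipeline's actual PySpark code and not how Iceberg is implemented internally; the `event_ts` column, the day-level partitioning, and every helper name below are assumptions made up for illustration. The idea it shows is partition pruning: when rows are grouped by day at write time, a query for a day range only needs to touch the matching groups.

```python
from collections import defaultdict
from datetime import date, datetime


def partition_key(event_ts: datetime) -> date:
    """Day-level partition value (hypothetical; loosely like a day transform)."""
    return event_ts.date()


def write_partitioned(rows):
    """Group rows by day, mimicking one set of data files per partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["event_ts"])].append(row)
    return dict(partitions)


def query_day_range(partitions, start: date, end: date):
    """Return (matching_rows, partitions_scanned) for an inclusive day range.

    Only partitions whose key falls inside the range are read; the rest are
    skipped without looking at any of their rows.
    """
    scanned = 0
    matches = []
    for day, rows in partitions.items():
        if start <= day <= end:
            scanned += 1
            matches.extend(rows)
    return matches, scanned


# Illustrative sample data (made up).
rows = [
    {"id": 1, "event_ts": datetime(2024, 1, 1, 9, 30)},
    {"id": 2, "event_ts": datetime(2024, 1, 1, 17, 5)},
    {"id": 3, "event_ts": datetime(2024, 1, 2, 8, 0)},
    {"id": 4, "event_ts": datetime(2024, 1, 3, 23, 59)},
]

partitions = write_partitioned(rows)
matches, scanned = query_day_range(partitions, date(2024, 1, 2), date(2024, 1, 2))
print(f"scanned {scanned} of {len(partitions)} partitions, found {len(matches)} row(s)")
```

In this toy version, a one-day query reads one of the three partitions. The same principle is why choosing a partition column that matches the most common query filters matters: a filter on any other column would still have to scan every partition.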