Data Engineering

!data_engineering@programming.dev

Efficiently Manage Memory Usage in Pandas with Large Datasets

https://geekpython.in/copy-on-write-in-pandas

Pandas supports Copy-on-Write, an optimization technique that helps improve memory use, particularly when working with large datasets.

Shift Left

https://medium.com/@nydas/4-key-benefits-of-shift-left-ff0e4bb74a3f?source=friends_link&sk=0941f5164f5115f6fe88191f0b1b9683

Dremio is offering free PDF copies of "Apache Iceberg: The Definitive Guide: Data Lakehouse Functionality, Performance and Scalability on the Data Lake"

Dremio Unified Analytics Platform for a Self-Service Lakehouse

https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html

The Dremio Unified Lakehouse Platform for self-service analytics and AI, powered by a performant SQL Query Engine and Apache-native Lakehouse Management

Postgres vs. Pinecone | Lantern Blog | Narek Galstyan | July 18, 2024

https://lantern.dev/blog/postgres-vs-pinecone

We respond to Pinecone's recent blog post comparing Postgres and Pinecone. We show that Postgres can outperform Pinecone in the same benchmarks Pinecone covered in their article.

Definite: Comparing Iceberg Query Engines (with DuckDB and Iceberg Full Notebook Example) | Steven Wang | 7/3/2024

Definite: Duckdb and Iceberg

https://www.definite.app/blog/iceberg-query-engine

A guide on how to adapt an existing Spark Scala library for Spark Connect

Spark-Connect: I'm starting to love it!

https://semyonsinchenko.github.io/ssinchenko/post/porting_deequ_to_sparkconnect/

Summary: This blog post is a detailed story about how I ported a popular data quality framework, AWS Deequ, to Spark-Connect. Deequ is a very cool, reliable, and scalable framework that can compute many metrics, checks, and anomaly detection suites on data using an Apache Spark cluster. But the Deequ core is a Scala library that uses many low-level Apache Spark APIs for better performance, so it cannot run directly in a Spark-Connect environment.

Why Use Data Build Tool (dbt)

https://medium.com/@nydas/the-power-of-data-build-tool-dbt-6b26dfab5bac?source=friends_link&sk=4ad30d3dc20fe25a2d5d474372f6c71a

7 best open-source chart libraries for developers

7 Best Chart Libraries For Developers In 2024 🤯

https://dev.to/latitude/7-best-chart-libraries-for-developers-in-2024-25he

Many applications use charts or graphs for data visualization, which can be implemented using...

Building a real-time data pipeline - Technical article and GitHub repo

https://medium.com/@nydas/building-a-real-time-data-pipeline-5eff6c6d8a3c?source=friends_link&sk=8d8792af4527b0d6f91f430ebdc40fc7

Diagrams as Code

https://medium.com/@nydas/diagrams-as-code-streamlining-erd-creation-for-data-engineers-f1e305cf69ef