•

Efficiently Manage Memory Usage in Pandas with Large Datasets

https://geekpython.in/copy-on-write-in-pandas

Pandas supports Copy-on-Write, an optimization technique that helps improve memory use, particularly when working with large datasets.

0 comments

Data Engineering

nydas

•

Shift Left

https://medium.com/@nydas/4-key-benefits-of-shift-left-ff0e4bb74a3f?source=friends_link&sk=0941f5164f5115f6fe88191f0b1b9683

3 comments

Data Engineering

ericjmorey

•

Dremio is offering free PDF copies of "Apache Iceberg: The Definitive Guide: Data Lakehouse Functionality, Performance and Scalability on the Data Lake"

Dremio Unified Analytics Platform for a Self-Service Lakehouse

https://hello.dremio.com/wp-apache-iceberg-the-definitive-guide-reg.html

The Dremio Unified Lakehouse Platform for self-service analytics and AI, powered by a performant SQL Query Engine and Apache-native Lakehouse Management

1 comment

Data Engineering

ericjmorey

•

Postgres vs. Pinecone | Lantern Blog | Narek Galstyan | July 18, 2024

Postgres vs. Pinecone | Lantern Blog

https://lantern.dev/blog/postgres-vs-pinecone

We respond to Pinecone's recent blog post comparing Postgres and Pinecone. We show that Postgres can outperform Pinecone in the same benchmarks Pinecone covered in their article.

0 comments

Data Engineering

ericjmorey

•

Definite: Comparing Iceberg Query Engines (with Duckdb and Iceberg Full Notebook Example) | Steven Wang | 7/3/2024

Definite: Duckdb and Iceberg

https://www.definite.app/blog/iceberg-query-engine

0 comments

Data Engineering

sem

•

A guide on how to adapt an existing Spark Scala library for Spark Connect

Spark-Connect: I'm starting to love it!

https://semyonsinchenko.github.io/ssinchenko/post/porting_deequ_to_sparkconnect/

Summary: This blog post is a detailed story about how I ported a popular data quality framework, AWS Deequ, to Spark-Connect. Deequ is a very cool, reliable and scalable framework that allows computing many metrics, checks and anomaly detection suites on data using an Apache Spark cluster. But the Deequ core is a Scala library that uses a lot of low-level Apache Spark APIs for better performance, so it cannot be run directly in a Spark-Connect environment.

0 comments

Data Engineering

nydas

•

Why Use Data Build Tools (dbt)

https://medium.com/@nydas/the-power-of-data-build-tool-dbt-6b26dfab5bac?source=friends_link&sk=4ad30d3dc20fe25a2d5d474372f6c71a

1 comment

Data Engineering

gecloslatitude

•

7 best open-source chart libraries for developers

7 Best Chart Libraries For Developers In 2024 🤯

https://dev.to/latitude/7-best-chart-libraries-for-developers-in-2024-25he

Many applications use charts or graphs for data visualization, which can be implemented using...

0 comments

Data Engineering

nydas

•

Building a real-time data pipeline - Technical article and GitHub repo

https://medium.com/@nydas/building-a-real-time-data-pipeline-5eff6c6d8a3c?source=friends_link&sk=8d8792af4527b0d6f91f430ebdc40fc7

0 comments

Data Engineering

nydas

•

Diagrams as Code

https://medium.com/@nydas/diagrams-as-code-streamlining-erd-creation-for-data-engineers-f1e305cf69ef

7 comments