
Efficiently Manage Memory Usage in Pandas with Large Datasets

https://geekpython.in/copy-on-write-in-pandas

Pandas supports Copy-on-Write, an optimization technique that helps improve memory use, particularly when working with large datasets.
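A minimal sketch of what Copy-on-Write changes in practice. It assumes pandas 2.x, where the feature is opt-in through the `mode.copy_on_write` option and is slated to become the default in pandas 3.0. The DataFrame, column names, and values are hypothetical. Under Copy-on-Write, an object derived from a DataFrame (a selected column, or the result of a method such as `rename()`) does not copy the data up front. A copy is made only when one side is modified, so the parent DataFrame is never changed by accident.

```python
import pandas as pd

# Assumption: pandas >= 2.0, where Copy-on-Write is opt-in. It is expected to be
# the default from pandas 3.0 onward, so the option is only set on older versions.
if int(pd.__version__.split(".")[0]) < 3:
    pd.set_option("mode.copy_on_write", True)

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Selecting a column does not copy data up front. The write below triggers
# a copy of the column, so df itself is left untouched.
subset = df["a"]
subset.iloc[0] = 100

# rename() also defers copying the underlying data. The copy happens only
# when renamed is written to, and df stays unchanged.
renamed = df.rename(columns={"b": "c"})
renamed.loc[0, "c"] = -1.0

print(df)
print(subset.tolist())
print(renamed["c"].tolist())
```

Without Copy-on-Write, pandas 2.x may return either a view or a copy depending on the operation, which is why modifying `subset` could silently change `df`. With it enabled, every derived object behaves like a copy while still sharing memory until it is written to.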


Privacy

sem


FTC Issues Orders to Eight Companies Seeking Information on Surveillance Pricing

https://www.ftc.gov/news-events/news/press-releases/2024/07/ftc-issues-orders-eight-companies-seeking-information-surveillance-pricing

The Federal Trade Commission issued orders to eight companies offering surveillance pricing products and services that incorporate data about consumers’ characteristics and behavior.


Data Engineering

sem


A guide to adapting an existing Spark Scala library for Spark Connect

Spark-Connect: I'm starting to love it!

https://semyonsinchenko.github.io/ssinchenko/post/porting_deequ_to_sparkconnect/

Summary: This blog post is a detailed account of how I ported a popular data quality framework, AWS Deequ, to Spark-Connect. Deequ is a very cool, reliable, and scalable framework that lets you compute a lot of metrics, checks, and anomaly detection suites on data using an Apache Spark cluster. However, the Deequ core is a Scala library that relies heavily on low-level Apache Spark APIs for better performance, so it cannot run directly in any Spark-Connect environment.


Privacy

sem


What is the most appropriate way of tracking web traffic?

I have a personal blog, built with Hugo and hosted on GitHub Pages. Initially I did not turn on any kind of web tracking or web analytics, because I do not like tracking at all. But I want to make my blog better, and to achieve that I need a feedback loop about traffic: for example, which publications are the most popular, or how many people read my blog on mobile devices.

So, my question is: what is the most appropriate (or the least evil) way to track web traffic?

An answer like "there is no good way to do it without violating users' privacy" is acceptable too; I have not yet decided whether to turn on analytics. I'm mainly interested in the community's opinion.

Thanks in advance!


Green - An environmentalist community

sem
