I'm working on a query engine, essentially a tool to scan/filter/annotate by lookups/group by/aggregate a large dataset, tens-of-terabytes range. The compute part seems to be a bottleneck for me (I'll be doing around 80-300 GB/s of reads, and yes, I will have hardware capable of providing that kind of throughput). My hypothesis is that by encoding query in form of template arguments I can make the compiler generate code optimized for a specific type of query (like, the filtering or aggregation keys). But I do not know what queries will users send, so I need a way to instantiate templates at runtime.
Sounds simple: for a new type of query invoke a compiler at runtime to build a dynamic library with a new instantiation, then dynload it and off we go. Some prior work is here, though I'm pretty sure any JIT compiler also can counts here. But there's enough technical details to worry about, and at the same time this idea isn't novel, so I wonder—are there any packaged solutions for this kind of approach?
http://notes.valdikss.org.ru/jabber.ru-mitm/
https://www.theguardian.com/world/2023/oct/15/poland-exit-polls-pis-party-wins-most-votes-but-opposition-coalition-possible
Law and Justice may not be able to form a government successfully, leaving the door open to grouping led by Donald Tusk
https://arxiv.org/abs/2309.04259v1
https://lore.kernel.org/lkml/20230810223942.GG11336@frogsfrogsfrogs/
We are working on a tool that essentially allows external customers to access various extracts of our datasets, with parameterized filtering, aggregation, the usual stuff, though REST API. Some of these extracts are time consuming to prepare, so we are looking for ways to manage asynchronous report generation or making it possible for customers to schedule reports upfront, as opposed to having a synchronous API. There are tons of libraries for implementing synchronous REST APIs, but are there any standard approaches or tools for this kind of asynchronous cross-organizational communication? Like, maybe something that would allow each customer inspect their schedules and pending queries, configure how they want the results to be delivered? I fear we will need to build something like that from scratch.
https://web.archive.org/web/20190121065908/http://pcguide.com/ref/hdd/index.htm
@liori
@lemm.ee