proposal: should node expose a persistent KV API #49663

Closed
cjihrig opened this issue Sep 15, 2023 · 33 comments
Labels
feature request Issues that request new features to be added to Node.js.

Comments

@cjihrig
Contributor

cjihrig commented Sep 15, 2023

What is the problem this feature will solve?

Embedding sqlite in Node was proposed in 2019. There was not a lot of enthusiasm at the time. It's now been over four years, and since then at least three other runtimes have emerged with some functionality like this:

  • Deno embeds sqlite and uses it to power their KV API. In the cloud they keep the same KV API, but use FoundationDB instead of sqlite.
  • Bun embeds sqlite and exposes it as their bun:sqlite API.
  • Cloudflare Workers exposes their own KV API.

What is the feature you are proposing to solve the problem?

Embedding sqlite in the binary and providing an API similar to Deno KV or Bun's sqlite module.

Using the SQLite Amalgamation makes the embedding pretty simple. On my machine, embedding sqlite with a bare bones binding caused the binary size to increase from 93.45MB to 94.82MB (at commit dac5f29).

Other outstanding questions:

  • Does sqlite compile on all of Node's supported platforms? If not, there are also wasm builds available, though I would prefer native.
  • What would the public API look like? Deno's KV does a better job of abstracting away sqlite because they swap out underlying databases in the cloud. Bun provides a more powerful SQL interface, but that also has a better chance of becoming a leaky abstraction.

What alternatives have you considered?

Status quo. Continue using better-sqlite3, sqlite3, etc. modules from npm.

@cjihrig cjihrig added the feature request Issues that request new features to be added to Node.js. label Sep 15, 2023
@tpoisseau
Contributor

About the second and third points: SQLite is also officially available as a WASM build.

https://www.sqlite.org/download.html
https://sqlite.org/wasm/doc/trunk/index.md

I think if sqlite is integrated into Node.js (this is what I would prefer), the focus must be on a proper binding with a low-level API and a higher-level API to interact with SQL (connecting to the DB, prepared statements, cursors, proper integration with JavaScript features like the iterator pattern, and a choice between sync and async modes).

A KV API (like Deno's) should not be a target for this kind of proposal. That kind of API is a bit too high-level and out of scope for a sqlite integration.

@mscdex
Contributor

mscdex commented Sep 15, 2023

While lots of things would be nice to have in node core so you wouldn't have to install them separately, I think databases are one of those things best left to userland, mostly because I think node should be focused on as few things as possible so we can do them very well and not become a kitchen sink platform.

However, specifically in the case of sqlite (speaking as someone who maintains an sqlite addon) there are also at least these issues I can think of offhand:

  • Sqlite has a lot of compile-time knobs that would affect available features and performance characteristics. Due to this, there would realistically be no way to satisfy all users and may result in GH issues.

  • Sqlite is missing some features that would be desirable to general users (e.g. encryption). Adding those would mean needing to instead rely on forks or having to maintain our own fork/patches.

  • As with many types of databases and their client drivers, there are lots of opinionated ways to present APIs in such client drivers, some of which can have performance implications.

@Xstoudi
Contributor

Xstoudi commented Sep 15, 2023

I disagree with @mscdex about how databases are not handled at all in Node versus how they are on other ecosystems and their kitchen sink platforms. Still, let's admit sqlite holds a particular place in the relational database world.

As one of the general users, I have felt the lack of a built-in SQLite. The features that could be missing are mostly nice-to-haves. The better-sqlite3 API is nice to use (probably why Bun took inspiration from it), and I think it would be a good place to start.

@fabiospampinato

IMO this is an unnecessary burden. The more stuff added to core, the more time spent maintaining it. The bundled version of sqlite might not be the one the user wants, it may be compiled with different flags from the ones the user wants, etc.

I think it'll be better to focus on shipping an FFI, then Node could talk to any native version of sqlite, or to other interesting binaries, efficiently with low overhead.

@fox1t
Contributor

fox1t commented Sep 15, 2023

I like the proposal, and I am definitely on the KV API side. I see no value in just providing SQLite with a “raw” SQL API to interact with it.

@anonrig
Member

anonrig commented Sep 15, 2023

We can also use sqlite as a basis for the window.localStorage API similar to what Deno implements.

@atomicwrite

I say one-up Deno: since sqlite is faster at storing files than a file system, go further by making it the base of a virtual file system that can be abstracted and replaced with cloud ones if desired.

https://www.sqlite.org/fasterthanfs.html

@jasnell
Member

jasnell commented Sep 15, 2023

Let's not worry about competing with other runtimes at all on this and base the decision solely on whether this will provide benefit to users. I think it would so I'm fine with it.

That said, Cloudflare has a KV API (backed by sqlite in workerd, but production uses a completely different backend). Deno has a different kind of thing they also call KV, but it's definitely not the same as Cloudflare's. Bun has something also. And, as @anonrig points out, there's localStorage to think about. It would be a shame to introduce Yet Another KV API that is incompatible with everything else, so while I definitely agree this is something to explore, I think we need to put some careful consideration into what API we implement here.

At the very least, it's worth experimenting with.

So... what are the use cases this would be used for?

@Jarred-Sumner

We’ve considered doing localStorage in Bun, but I’m not convinced sqlite is the best approach for it. Something like LMDB makes more sense, particularly if the DB is not on a network drive. It becomes a challenge with servers, but it’s useful for CLIs or cases where you don’t need multiple servers to use the same data store

@atomicwrite

I am hesitant to express this sentiment in a conversation involving individuals notorious for appropriating ideas. However, caching RDBMS records from Postgres or MySQL in a local sqlite instance is how something like 90% of real web back ends optimize and reduce their cloud bill.

Especially mid 1m-5m contract companies. Every serialization, every network byte is metered. People hire out these people because the cost of their rewrite is cheaper than cloud bills. This is a trend.

If a record can have a ttl of 30 seconds even it can shave numbers off their bandwidth. Then there is the whole circus of deciding which data can be used this way. (Read/Write but low update tables).
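
To make the TTL idea concrete, here is a minimal sketch of a read-through cache with a per-entry time to live. It is an illustration only: `createTtlCache`, `loader`, and the injectable `now` clock are hypothetical names, and an in-memory `Map` stands in for the local sqlite instance described above.

```javascript
// Hypothetical read-through cache with a per-entry time to live.
// `loader` stands in for a query against the primary database.
function createTtlCache(loader, ttlMs = 30_000, now = Date.now) {
  const entries = new Map(); // key -> { value, expiresAt }
  return function get(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value; // still fresh
    const value = loader(key); // miss or expired: go back to the source
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

Within the TTL window, repeated reads of the same key never reach the loader, which is where the bandwidth saving would come from.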

Putting in a KV that is just a lower abstraction of sqlite without the added benefits is a 10ft view. Since it's happening at "mom and pop" shops frequently, it will roll over. Furthermore, if you look at what companies like Litestream are doing, replicating sqlite to a dumb AWS bucket, it's really amazing. Typical costs are only about $1 per month.


@mscdex
Contributor

mscdex commented Sep 16, 2023

@atomicwrite I don't think anyone here is arguing that sqlite is not useful or capable. I believe the discussion is mostly about whether it makes sense to embed sqlite into node core.

@cjihrig
Contributor Author

cjihrig commented Sep 16, 2023

After thinking about this more, I'm still not completely convinced that this is something we should implement, but it's definitely worth discussing.

I think the only way this makes sense is by building an abstraction on top of sqlite. That helps with any issues around users wanting to customize sqlite. It also makes swapping out sqlite for something else a lot easier. But, if we were to implement a KV type API, I think the point about "is sqlite the right choice" is valid.

what are the use cases this would be used for?

Off the top of my head:

  • Anything you would currently install a third party module for would now be a bit simpler.
  • Caching.
  • Local/session storage.
  • Potentially IndexedDB, but I don't think this is an API we would want to support.
  • Cookies.
  • Offline applications.
  • The point about a virtual file system is interesting, but I doubt that is something we would add to core.

In my opinion, the coolest use case is having a persistence API locally that is replaced with something else in the cloud (similar to Deno and Cloudflare). Unfortunately, that doesn't really translate for Node.

If anyone else has some interesting use cases, I would love to hear them.

@fox1t
Contributor

fox1t commented Sep 16, 2023

@cjihrig I think KV could translate to Node.js too. It would be enough to have a concept of a “driver,” and Node.js ships the “local” one itself. I am a big fan of unified APIs that can use different things under the hood.
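
One way to picture the "driver" idea is a thin front end that only forwards calls to a pluggable backend. The sketch below assumes a driver contract of async `get`/`set`/`delete` over string keys; `openKv` and `MemoryDriver` are hypothetical names for illustration, not Deno's, Cloudflare's, or any proposed Node.js API.

```javascript
// Hypothetical driver contract: async get/set/delete over string keys.
// This in-memory driver stands in for the "local" one Node.js would ship.
class MemoryDriver {
  constructor() { this.data = new Map(); }
  async get(key) { return this.data.has(key) ? this.data.get(key) : null; }
  async set(key, value) { this.data.set(key, value); }
  async delete(key) { this.data.delete(key); }
}

// Front end: validates the driver once, then only forwards calls.
function openKv(driver = new MemoryDriver()) {
  for (const method of ['get', 'set', 'delete']) {
    if (typeof driver[method] !== 'function') {
      throw new TypeError(`driver is missing ${method}()`);
    }
  }
  return {
    get: (key) => driver.get(String(key)),
    // Clone on write so later mutation by the caller cannot leak into the store.
    set: (key, value) => driver.set(String(key), structuredClone(value)),
    delete: (key) => driver.delete(String(key)),
  };
}
```

Application code written against the object returned by `openKv()` would not change if a networked driver were passed in instead of the local one.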

@GeoffreyBooth
Member

I think the only way this makes sense is by building an abstraction on top of sqlite.

But if we build an abstraction layer, then we’ve just invented a new API. That flips the question around to “what API do you want” because sqlite becomes just an implementation detail, potentially replaceable with some other database if we desire.

So it’s not really about sqlite, because it feels backwards to say that the solution is to embed sqlite, what are the problems? If the API is along the lines of persistent key-value store, well, has that been proposed over Node’s 13 years’ history? If so, what were the discussions or concerns around it? If it hasn’t been proposed, is it just that no one considered such a feature before Deno and Bun, or is there a reason that such a feature would be a bad fit in Node?

I’m not opposed to embedding sqlite, but I feel like it should solve a significant use case that until now has been unaddressed, where sqlite (with or without an abstraction layer over it) is the best solution.

@hinell

hinell commented Sep 17, 2023

As pointed out above by @mscdex, sqlite has a lot of compile-time flags & features to enable/disable. It would make integration a bit complicated and may risk omitting useful features.

The idea is cool but the hype isn't worth it.

@cjihrig cjihrig changed the title proposal: embed sqlite proposal: should node expose a persistent KV API Sep 17, 2023
@cjihrig
Contributor Author

cjihrig commented Sep 17, 2023

I have updated the title of this issue. Please do not fixate on sqlite compile time flags. Raw sqlite will not be exposed to users. I started the discussion with sqlite because that is the common denominator between all of the other runtimes.

Now, the question is do we want a persistent KV API in Node. Deno and Cloudflare both expose one. If Node were to provide one, I would suggest implementing one of those APIs instead of creating a new one (my personal preference here would be to copy Deno's). In addition to a user facing API, Node could use a persistence layer to implement things like local storage.

EDIT: Another possible scenario is Node embeds something to implement things like local storage, but does not expose it to users at all.

@KhafraDev
Member

I'm a -1 on adding localStorage, as the api is probably more complicated than most people realize. I wrote a wrapper (with no persistence) a few years ago assuming it'd be similar to Map/Set, but it's not.

  • Each key is also a getter/setter on the localStorage/sessionStorage object, so localStorage.item is the same as localStorage.getItem('item'), localStorage.item = 3 is the same as localStorage.setItem('item', 3), and delete localStorage.item is the same as localStorage.removeItem('item').
  • You also have to worry about quotas, where localStorage is expected to throw if the total size of all values in localStorage exceeds 5 MB.
  • Many behaviors are only documented with WPTs. Especially regarding symbol keys, which are completely undocumented, and I'm pretty sure are impossible to handle in JS-only land (mostly because of delete localStorage[Symbol('...')] with a Proxy IIRC).
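
The property-access behaviour in the list above can be sketched with a `Proxy`. This is a rough illustration under several assumptions: the store is an in-memory `Map` with no persistence, `createStorage` is a hypothetical name, the quota is counted in UTF-16 code units rather than bytes, and symbol keys are simply passed through rather than handled the way the WPTs require.

```javascript
// Hypothetical in-memory sketch; not a Node.js API and not spec-complete.
function createStorage(quota = 5 * 1024 * 1024) {
  const items = new Map();
  const usedSize = () => {
    let total = 0;
    for (const [k, v] of items) total += k.length + v.length;
    return total;
  };
  const api = {
    getItem(key) {
      const k = String(key);
      return items.has(k) ? items.get(k) : null;
    },
    setItem(key, value) {
      const k = String(key);
      const v = String(value); // values are always stored as strings
      const previous = items.has(k) ? k.length + items.get(k).length : 0;
      if (usedSize() - previous + k.length + v.length > quota) {
        throw new Error('QuotaExceededError (sketch)');
      }
      items.set(k, v);
    },
    removeItem(key) { items.delete(String(key)); },
    key(index) { return [...items.keys()][index] ?? null; },
    get length() { return items.size; },
  };
  return new Proxy(api, {
    get(target, prop) {
      // Methods and `length` win over stored keys; symbols pass through.
      if (typeof prop === 'symbol' || prop in target) return Reflect.get(target, prop);
      const v = target.getItem(prop);
      return v === null ? undefined : v;
    },
    set(target, prop, value) {
      if (typeof prop === 'symbol') return Reflect.set(target, prop, value);
      target.setItem(prop, value);
      return true;
    },
    deleteProperty(target, prop) {
      if (typeof prop === 'symbol') return Reflect.deleteProperty(target, prop);
      target.removeItem(prop);
      return true;
    },
  });
}
```

Even this small sketch needs three traps and a quota check, which gives a sense of how far the API is from a plain `Map`.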

@cjihrig
Contributor Author

cjihrig commented Sep 17, 2023

@KhafraDev are you saying the API is more complicated for users or implementers? If it is more difficult to implement in JS, then I think that makes it a better candidate for core where it would not be a JS-only implementation. The fact that Deno was able to implement it and Bun is considering it makes me think Node should consider it (though I don't think we would want to implement Deno's --location flag).

The fact that the API is not the same as a Map doesn't sound too bad. Implementing the quotas also seems straightforward. It's unfortunate that some of the specification comes from WPTs, but that doesn't seem like a deal breaker to me.

@KhafraDev
Member

If it's written natively, my concerns are mostly solved. My expectation was that it'd be written in js like most other specs are. The api is good enough (you can essentially do everything that a Map/Set can do, with some work). It is synchronous, which might cause issues, but if we were to choose a kv api, it'd be the easiest to use and documentation already exists on mdn and a million other places.

@anonrig
Member

anonrig commented Sep 17, 2023

If it's written natively, my concerns are mostly solved.

I personally don't see any new features implemented in JS unless there is a significant benefit compared to performance regression.

@hinell

hinell commented Sep 18, 2023

@KhafraDev The main feature behind SQL that justifies its integration is database file that can serve as universal storage.

@bnoordhuis
Member

A ding against localStorage as KV abstraction is that it doesn't support range queries. You can't say "give me all keys or values between bar and foo", you instead have to do a linear scan...

...and actual performance may be closer to O(n^2) because the naive approach to implementing localStorage.key(i) is with a SELECT key FROM storage LIMIT 1 OFFSET ${i}, which is a linear scan in itself.

(The average case is likely O(n) or O(n*log(n)) because I expect sqlite has a fairly good notion of where offset i lives on disk but you can't rely on database heuristics for predictable performance.)
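
The difference can be sketched as follows. `scanRange` is what a caller is limited to with a localStorage-style API (only `length`, `key(i)`, and `getItem`); `sortedRange` shows the same query when keys are available in sorted order. Both function names are hypothetical, and a sorted array stands in for whatever ordered index a real store would keep.

```javascript
// Linear scan: the only option with a localStorage-style API.
// `storage` is anything exposing length, key(i) and getItem(key).
function scanRange(storage, from, to) {
  const out = [];
  for (let i = 0; i < storage.length; i++) {
    const k = storage.key(i); // may itself be O(n) in a naive backend
    if (k >= from && k <= to) out.push([k, storage.getItem(k)]);
  }
  // Keys arrive in storage order, so sort the matches before returning.
  return out.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Range query over keys kept in sorted order: binary search for the
// first key >= from, then walk forward until a key exceeds `to`.
function sortedRange(sortedKeys, values, from, to) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < from) lo = mid + 1;
    else hi = mid;
  }
  const out = [];
  for (let i = lo; i < sortedKeys.length && sortedKeys[i] <= to; i++) {
    out.push([sortedKeys[i], values.get(sortedKeys[i])]);
  }
  return out;
}
```

The second function touches only the matching keys plus a logarithmic number of probes, while the first always visits every key.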

@robtweed

Having a physically or virtually embedded high-performance KV store for Node (and now Bun) is something our company has been working on for many years. We've determined that the best database for this purpose, in terms of both performance and multi-model capability, is actually not one most of you will have ever heard of: Global Storage Databases. I've created this set of resources for people to learn about them and understand their capabilities, to see why we have reached this conclusion. We've also done comparisons with other potential candidates such as BerkeleyDB and LMDB that show Global Storage Databases to have superior performance. These are mature database technologies that have been used for decades in particular market sectors and proven in terms of performance, scalability and functionality, and yet, as I say, are largely unknown in the wider IT mainstream.

Take a look here:
https://github.com/robtweed/global_storage

The candidate Global Storage DB we'd propose as a KV store is the Open Source YottaDB:
https://yottadb.com/

We're currently working on a next generation high-performance NAPI interface for it, with both networked and API/in-process connectivity options (the latter providing the highest levels of performance).

We believe this database should be seriously considered as the KV store for Node (and, of course Bun!)

Happy to provide any further information and keen for further dialogue

@mcollina
Member

mcollina commented Sep 21, 2023

Let me answer a few of those questions:

  1. would Node.js benefit from a stable API to store data in a structured manner on disk, in a way that can support good random query access, without installing binary addons? Yes
  2. should it expose SQLite, allowing users to run SQL? No. Long term, it would create applications that would be hard to port between different versions of Node.js. The API has to be abstracted. I recommend taking inspiration from https://github.com/Level/classic-level.
  3. should that API be powered by SQLite? Maybe, I would personally prefer embedding LevelDB or RocksDB; however, we should focus on the API first.
  4. should the API be modelled after localStorage? No, it does not have sorted range access. (this is critical for secondary indexes).
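
To illustrate why sorted range access matters for secondary indexes, here is a minimal sketch in which index entries are stored as composite keys and a lookup becomes a prefix range scan. Everything here is an assumption for illustration: `SortedKV`, `putUser`, `usersByCity`, and the `idx:city:` key layout are hypothetical, and this is not the classic-level API.

```javascript
// Hypothetical sorted KV stand-in: a Map plus a sorted key array.
class SortedKV {
  constructor() { this.map = new Map(); this.keys = []; }
  put(key, value) {
    if (!this.map.has(key)) {
      this.keys.push(key);
      this.keys.sort(); // fine for a sketch; a real store keeps a tree
    }
    this.map.set(key, value);
  }
  get(key) { return this.map.get(key); }
  // Yields [key, value] pairs with gte <= key < lt, in key order.
  // Walks from the first key for brevity; a real store would seek.
  *range(gte, lt) {
    for (const k of this.keys) {
      if (k >= lt) break;
      if (k >= gte) yield [k, this.map.get(k)];
    }
  }
}

// Primary record plus one secondary index entry per user.
function putUser(db, user) {
  db.put(`user:${user.id}`, user);
  db.put(`idx:city:${user.city}:${user.id}`, user.id);
}

// Secondary-index lookup: scan every key that starts with the city prefix.
function usersByCity(db, city) {
  const prefix = `idx:city:${city}:`;
  const end = prefix + '\uffff'; // upper bound just past the prefix
  return [...db.range(prefix, end)].map(([, id]) => db.get(`user:${id}`));
}
```

Without ordered keys, `usersByCity` would have to inspect every entry in the store. The sketch does not remove stale index entries when a user's city changes.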

@coreybutler
Member

I'd also be much more interested in a KV store than SQLite. When apps need a full RDBMS, there are community drivers available to do this. When you just need lightweight storage, KV is typically faster and much smaller. Last time I looked at Level/RocksDB, they still had a lot of build problems on Windows.

Just food for thought:

The Go community has a standard SQL library. It doesn't do everything for every RDBMS, but it provides a consistent API for working with ANSI standard SQL. It avoids the nuances of individual RDBMS platforms by focusing on standards instead of DB implementations. The Go community has a more unified driver ecosystem as a result, and the Go team manages a single standard instead of many. Most community drivers extend the standard library, so they can still leverage DB-specific features.

@GeoffreyBooth
Member

What use cases would a persistent KV store solve? In a sense, we already have one solution for the generic “read and write data persistently” feature request: files on disk. I can’t count how many scripts I’ve written over the years that read and wrote JSON files and treated them as a de facto database. I don’t mean to argue that therefore the use case is solved, but I think it begs the question what value does a new solution provide that this primitive solution doesn’t?

@MoLow
Member

MoLow commented Sep 21, 2023

In my opinion, the coolest use case is having a persistence API locally that is replaced with something else in the cloud (similar to Deno and Cloudflare). Unfortunately, that doesn't really translate for Node.

I agree this would be the most interesting scenario, and if storage is added as a core module, we should think of an API to either mirror or forward the data to a network implementation.

Another use case I had in mind that would benefit from a storage solution is Electron apps.

@coreybutler
Member

@GeoffreyBooth Performance and standardization. I don't want to read an entire JSON file into memory just to access a couple of keys. This gets particularly annoying when values are large (i.e. using a KV as a cache). You could write individual files to disk for each key, but that's a lot of overhead. By providing a KV store, developers can reliably interact with data via code without having to worry about whether they're loading a whole file, a partial file, or individual records as files.
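
For reference, the JSON-file-as-database pattern being discussed looks roughly like the sketch below, which makes the cost visible: every read parses the whole file and every write rewrites it. The helper names (`readAll`, `getKey`, `setKey`) and the temp-directory path are hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helpers illustrating the whole-file pattern.
function readAll(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return {}; // no file yet: empty store
    throw err;
  }
}

function getKey(file, key) {
  return readAll(file)[key]; // parses every value to return one
}

function setKey(file, key, value) {
  const all = readAll(file);
  all[key] = value;
  // Write to a temp file, then rename, so that on POSIX systems readers
  // do not observe a half-written file.
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(all));
  fs.renameSync(tmp, file);
}

const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'kv-')), 'db.json');
setKey(file, 'a', 1);
setKey(file, 'b', { big: 'x'.repeat(1000) });
console.log(getKey(file, 'a'));
```

Reading `a` here still loads and parses the large value stored under `b`, which is the overhead a KV store with per-key access would avoid.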

@mcollina
Member

What use cases would a persistent KV store solve? In a sense, we already have one solution for the generic “read and write data persistently” feature request: files on disk. I can’t count how many scripts I’ve written over the years that read and wrote JSON files and treated them as a de facto database. I don’t mean to argue that therefore the use case is solved, but I think it begs the question what value does a new solution provide that this primitive solution doesn’t?

Well... no. That approach has decent performance only for small data, and even then it's not great. Fast random access is key.

@robtweed

I would actually question the need for Node to embed any particular database to achieve the aim of a KV store. One reason is that the version of the selected database to be embedded will need to be kept up to date. The second reason is that whatever database is chosen, there will be loads of people that don't like the choice made (usually it's because it's not the one they are familiar with rather than it being the "best" fit for Node or the best/fastest/most functional database).

The NAPI interface provided by both Node and Bun has the potential for very high performance in-process/API connection to a database. However, it's usually limited by one or other (or both) of two things:

  • most "mainstream" databases, even the ones marketed as super-fast, are glacially slow by comparison with the performance of Node or Bun. Those crazily high request processing rates of your favourite super-fast Node/Bun Web Framework evaporate as soon as you do any real work that involves accessing your typical database. What's required is a database with near in-memory performance for both read and write - almost none meet that criterion in our experience.

  • even if you use such an ultra-high performance database (eg the YottaDB database I've mentioned earlier), and connect via its in-process API which has the potential for delivering its full performance, you find that there are significant bottlenecks in the V8 API which seriously compromise performance from JavaScript. We've tried to have this looked into and resolved - we first identified and reported it in 2016 - but to no avail sadly:

https://bugs.chromium.org/p/v8/issues/detail?id=5144

However, my colleague Chris has found ways to mitigate many of these bottleneck issues and his latest mg-dbx-napi interface provides probably the best in-process database interface performance you'll see anywhere right now.

https://github.com/chrisemunt/mg-dbx-napi

Nevertheless it would be good to see some work done on optimising the V8 API to allow the full performance of the very fastest databases to become available from Node. Similar work is needed on the equivalent API in WebKit used by Bun.

I think efforts focused in this area would be more effective and appropriate than a consideration of embedding a database, and would potentially benefit access to a wide range of databases.

@gengjiawen
Member

I am -1 on localStorage. I agree with @bnoordhuis, it's not a good API.
Many web APIs just suck; following them just because they're standard won't do Node.js any good.

I am super +1 on bundling sqlite3. I think Python has it for a reason.
It basically gives the runtime built-in DB support. In the long run, it will bring more value for server-side use and open up new possibilities.

@KhafraDev
Member

I agree that the spec is bad, but localStorage is good enough for most cases. Exposing sqlite would let many better alternatives pop up, but I'm sure that's not possible (#50169 (review)).

The spec is simple and it will literally never change. If someone is willing to work on it, I see no reason to block it.

@bmeck
Member

bmeck commented Oct 16, 2023

I'm open to many APIs but generally want one that provides the equivalent of temporary files, not necessarily long-term persistence. Long-term persistence has a lot more nuance regarding things like update policies and external tooling considerations, so that things like security scanners/machine provisioning can utilize it.

It would be good to add permissions best practices to this plan. It is simple enough but should be documented just to aid in configuration and ensuring that permissions are not bypassed by adding a few tests.

I'd also lean on keeping the data outside of the application directory since that would:

  1. allow applications to have a readonly directory and still use this feature (ok if this is not the default).
  2. allow sharing of this between cooperative concurrent services (this might be considered out of scope, but likely the K/V should be able to sync for workers/cluster/etc.). having races is always something we should document if we could end up with things like partial writes.
  3. allow general compliance with things like XDG directory expectations and how they map to other systems for the purpose of things like system restore/disk cleanup mechanisms (this is open to debate but likely not a needed debate as long as the ways to cleanup / find the data are documented).

I'd note that all of these can just be small nits/documentation and not big discussions from my viewpoint.
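
As a rough sketch of keeping the data outside the application directory, the helper below resolves a per-user location. `defaultDataDir` is a hypothetical name, and the choice of `XDG_DATA_HOME` (rather than the cache or state directories) and of the Windows and macOS fallbacks are assumptions for illustration, not a proposed default.

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: picks a per-user data directory outside the app dir.
function defaultDataDir(appName, env = process.env, platform = process.platform) {
  if (platform === 'win32') {
    const base = env.LOCALAPPDATA || path.join(os.homedir(), 'AppData', 'Local');
    return path.join(base, appName);
  }
  if (platform === 'darwin') {
    return path.join(os.homedir(), 'Library', 'Application Support', appName);
  }
  // XDG Base Directory spec: use $XDG_DATA_HOME when it is set to an
  // absolute path, otherwise fall back to ~/.local/share.
  const xdg = env.XDG_DATA_HOME;
  const base = xdg && path.isAbsolute(xdg)
    ? xdg
    : path.join(os.homedir(), '.local', 'share');
  return path.join(base, appName);
}
```

Because the location is computed rather than hard-coded, it can be documented for cleanup tooling and overridden for read-only application directories.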

@cjihrig cjihrig closed this as completed Nov 19, 2023