Introducing Embedded Replicas: Deploy Turso anywhere

The local speed of SQLite, with the convenience of a remote database.


Ever since launching the libSQL project and the Turso serverless database, we have strived to transform SQLite from an embedded-only database into a database that can work well for production-grade networked environments. But we also want to leverage the best capabilities users already know and love from SQLite, and working as an embedded database is a big part of that story.

Today we are releasing Embedded Replicas, which allow users running on VMs or VPSs to replicate a Turso database inside their application. They let you seamlessly switch between local and remote access to the same database, depending on what's best for the specific situation, and make Turso a database that can work well anywhere.

Oh, and users can create unlimited embedded replicas on any plan, including the free Starter plan.

Imagine being able to serve reads in microseconds while keeping your data in sync between your services. That makes for a great analytics platform that doesn't disrupt your main database, without the need for any ETL job.

Or how about an architecture so simple and fast that you don't even need external caches? Always have your reads with you, whether or not you have constant connectivity to the internet.

With embedded replicas, you can do that and more!

#Turso until now: local, as an embedded database, or fully remote.

The advantages of local execution with an embedded database are that the database is:

  • easy and reliable to use, with no network issues or configuration,
  • simple and fast to integrate into CI pipelines, since no services or containers are required,
  • fast, with microsecond-level queries.

It is important to note that the order-of-magnitude speed improvement delivered by local databases not only makes your application faster, but also allows entirely different ways of interacting with the database. Concerned about N+1 query patterns? Well, that's a concern no more if you can do 1,000 queries and still be faster than you would be if you had to go over the network.

But dealing with a fully local embedded database comes with its own disadvantages:

  • you have to handle your own backups,
  • it is hard to scale out and add more servers to the mix, unless you handle your own replication protocol,
  • it is just impossible to do in some environments, like serverless, where a filesystem is not present.

Turso was designed to tackle the disadvantages of embedded mode, by making the database accessible over the network.

Local access as an embedded database was retained, but that was seen as a development-only feature: you could add files to your CI, and you could develop your application locally, with a file URL:

import { createClient } from '@libsql/client';

const client = createClient({
  url: process.env.DB, // will be 'file:localdev.db'
  authToken: process.env.TOKEN, // stays empty
});
const res = await client.execute('SELECT * from users;');
for (const row of res.rows) {
  console.log(`name: ${row['name']}, age ${row['age']}`);
}

but when it was time to handle live traffic, that was done over HTTP. This could be easily done by just making configuration changes, making sure the remote URL was used to create the client, and that the authentication token was present:

const client = createClient({
  url: process.env.DB, // will be `libsql://your-db.turso.io`
  authToken: process.env.TOKEN, // can be obtained with `turso db tokens create your-db`
});
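
As a minimal sketch of the configuration switch described above, the helper below builds the options passed to createClient from environment-style values. The name clientConfigFromEnv is hypothetical and not part of @libsql/client, and the rule that any URL not starting with `file:` is remote and needs a token is an assumption made for illustration.

```javascript
// Hypothetical helper (not part of @libsql/client): builds createClient()
// options from an environment-like object with DB and TOKEN entries.
function clientConfigFromEnv(env) {
  const url = env.DB;
  if (!url) {
    throw new Error('DB is not set');
  }
  // Local file URLs need no token.
  if (url.startsWith('file:')) {
    return { url };
  }
  // Assumption: anything else is a remote database and requires a token.
  if (!env.TOKEN) {
    throw new Error(`TOKEN is required for remote URL ${url}`);
  }
  return { url, authToken: env.TOKEN };
}
```

It would be used as `createClient(clientConfigFromEnv(process.env))`, so the same application code runs against a local file in development and the remote database in production, and a missing token fails at startup rather than on the first query.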

#Embedded Replicas: local and remote

Embedded replicas tear down the walls between local and remote modes of operation. They work similarly to other technologies that replicate SQLite databases, but with a twist: instead of having a local embedded database that can be copied somewhere else, the source of truth is the remote database. All writes are still done to the remote database. The writes are then synchronized into all of the replicas, allowing network-free local reads.

#Fast APIs

If you are deploying your API anywhere that has a filesystem, such as a VM or VPS, embedded replicas will provide you with the fastest possible reads, while still allowing you to scale out: need more requests? Bring a new server, sync your local instance with your main Turso copy, and start serving your microsecond-level requests. Only two changes are needed to your code.

First, when creating the client, you specify a local file and a synchronization URL:

import { createClient } from '@libsql/client';

const client = createClient({
  url: 'file:local.db',
  syncUrl: process.env.DB,
  authToken: process.env.TOKEN,
});

Second, add calls to client.sync(). You can make them periodically in the background, explicitly before important selects, or whenever your application wants to:

await client.sync(); // alternatively, do it every 1s in the background.
const res = await client.execute('SELECT * from users;');
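
The "every 1s in the background" option could look roughly like the sketch below. startBackgroundSync is a hypothetical helper, not part of @libsql/client; it only assumes that `client` exposes the async sync() method shown above. Logging sync errors and carrying on, rather than crashing, is a design choice made for this example.

```javascript
// Hypothetical helper: calls client.sync() on a fixed interval.
// `client` is assumed to be an embedded-replica client with an async sync().
function startBackgroundSync(client, intervalMs, onError = console.error) {
  let inFlight = false;
  const timer = setInterval(async () => {
    if (inFlight) return; // previous sync still running: skip this tick
    inFlight = true;
    try {
      await client.sync();
    } catch (err) {
      onError(err); // reads keep being served from the local file
    } finally {
      inFlight = false;
    }
  }, intervalMs);
  // In Node.js, do not keep the process alive just for the sync timer.
  if (typeof timer.unref === 'function') timer.unref();
  // The returned function stops the background sync.
  return () => clearInterval(timer);
}
```

Calling `const stop = startBackgroundSync(client, 1000);` at startup keeps the replica about a second behind the remote database, and `stop()` can be called during shutdown.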

#Ad-hoc replication for fast analytics

Oftentimes, one of the concerns raised by organizations is that they want to run analytical workloads on the dataset. Analytical queries tend to be heavy, and the concern is that they may disrupt the main transactional databases. Standard replicas or complex ETL jobs are often used to solve this issue, but long-lived replicas have a cost to run, and ETL jobs are no one's idea of fun.

Embedded replicas allow you to create an ad-hoc replica locally, do whatever you want with it without disrupting the main database, and then let it go. You can keep the file alive, but shut down your compute, meaning the next synchronization point will only download the changes since the last job!

import { createClient } from '@libsql/client';

const client = createClient({
  url: 'file:local.db',
  syncUrl: process.env.DB,
  authToken: process.env.TOKEN,
});

await client.sync();
const res = await client.execute(`
SELECT
    o.customer_id,
    SUM(oi.quantity * oi.price) AS total_revenue
FROM
    orders o
JOIN
    order_items oi ON o.order_id = oi.order_id
GROUP BY
    o.customer_id;`);
console.log(res);

#Ditch your cache

There are two hard problems in computer science: naming things, cache invalidation, and off-by-one errors. Embedded replicas may not solve the first one, but by allowing an in-process replica, you can rearchitect your application in such a way that external caches are not needed. Embedded replicas have the following advantages over external caches:

  • No need for invalidation: by controlling how often you sync, you can control how fresh your data is.
  • SQL: there's no need to change your programming model to reach for a cache and manage both a SQL and a KV model. The replica is also still transactional, with snapshot isolation. While you may see data from the past, all relationships will still be consistent.
  • Optional strong consistency on a per-request basis: need a particular request to be fully consistent? Transparently go straight to the main database. Okay with a 5-second delay? Query your transactionally consistent copy.
  • No network: caches exist to make your application faster. Nothing will ever be as fast as a local copy that lives on the same server as your API.
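
Per-request consistency from the list above can be sketched as a small router. This is an illustration under assumptions, not a library feature: createReader is a hypothetical name, `replica` stands for an embedded-replica client, and `primary` stands for a second client created with only the remote URL. Both are assumed to expose the execute() method used earlier.

```javascript
// Hypothetical router: sends a read to the local replica by default and to
// the remote database when the caller asks for strong consistency.
// `replica` and `primary` are assumed to expose an async execute(sql).
function createReader({ replica, primary }) {
  return async function read(sql, { strong = false } = {}) {
    const target = strong ? primary : replica;
    return target.execute(sql);
  };
}
```

With `const read = createReader({ replica, primary });`, a call such as `read('SELECT * from users;')` is served locally, while `read('SELECT * from users;', { strong: true })` pays the network round trip in exchange for the latest data.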

#Limited network connectivity

Embedded replicas allow Turso to be used at, or closer to, the far edge. Have a device that has limited connectivity? You can always keep reading your database.
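
One way to handle intermittent connectivity is to treat synchronization as best effort, as in the sketch below. readWithBestEffortSync is a hypothetical helper, and it assumes that client.sync() rejects when the remote database cannot be reached while client.execute() keeps reading from the local file.

```javascript
// Hypothetical helper: try to sync first, but never let a network failure
// block a read. Assumes client.sync() rejects when the remote is unreachable.
async function readWithBestEffortSync(client, sql) {
  let synced = true;
  try {
    await client.sync();
  } catch (err) {
    synced = false; // offline: fall back to the last synchronized state
  }
  const result = await client.execute(sql);
  // `synced` tells the caller whether the result may be stale.
  return { result, synced };
}
```

The caller can use the `synced` flag to decide whether to show a "data may be out of date" hint on the device.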

#Clients

At launch, embedded replicas are available for JavaScript/TypeScript. To get started, make sure your @libsql/client package is version 0.3.5 or higher.

Go, Python and Rust packages are in beta. Examples are available in our GitHub repository.

#How much do embedded replicas cost?

We haven't announced pricing for embedded replicas, and they are currently free to use for all users on all plans. We will give at least a few months of advance notice if and when this changes.
