Not long ago, Turso announced pricing and usage plans that include a free Starter plan and a paid Scaler plan. There are allowances for the number of databases and locations in each plan, which are easy to understand. But these three observable metrics are more difficult to predict and optimize:
Number of rows read (monthly)
Number of rows written (monthly)
Total amount of storage
The implementation of these observations is rooted in SQLite internal details, and you can get some details about how Turso accounts for them in the billing documentation. It’s mostly straightforward stuff, but there are a couple things pointed out by the documentation that might be surprising at first:
A SQL query might cause more rows to be read (scanned) than are returned by that query. So, if your query returns N rows, it could cost (much) more than N row reads, depending on what was required to identify the matching rows.
A SQL update statement incurs at least one row read for each row updated. This means that an update statement doesn’t only incur write costs as you might expect.
The documentation also gives some advice for reducing usage on these points, and that’s what I want to unpack in this post. With some care, you can maximize the use of your quotas. And the great news is that these tips also help you boost the overall performance of your queries!
The two pieces of advice I have are:
Avoid full table scans whenever possible.
Maintain pre-computed aggregate values instead of using
If you want to jump straight to a collection of examples, follow along with the SQL code on GitHub.
What is a “full table scan”?
A full table scan describes what a SQL database does when it can’t use an index to find the specific rows needed for a query: it must examine the column data in each and every row of the table to find them. As you can imagine, this is wildly inefficient when a table becomes large! With Turso, a table with N rows that requires a full table scan always incurs a full N row reads, even if you only need a few of them. So obviously, we don’t want to do that.
The SQLite query planner documentation discusses full table scans, if you want to dive into the gory details.
How do I know if my query does a table scan?
SQLite provides a statement to help you analyze a query and determine if it’s going to do a table scan:
EXPLAIN QUERY PLAN (EQP). It’s easy to use. Just prefix your SQL query with those keywords, and it outputs some information about the indexes (or lack of index) that the query would use. For example, to find out if a select statement uses an index, run this:
EXPLAIN QUERY PLAN SELECT columns FROM tablename WHERE filter = "value";
If the output of the “detail” column contains only the text “SCAN tablename”, then it’s going to do a table scan, and incur one row read for each row in the table. But if you see the text “USING INDEX”, it’ll efficiently use an index. For more complicated queries, the output gets more difficult to understand. Note that use of a “COVERING INDEX” can improve the performance of a query, but not necessarily the cost of its reads or writes. You can learn more in the SQLite documentation on EXPLAIN QUERY PLAN.
(One exception to the above: if you explain the query
SELECT 1, you’ll see the text “SCAN CONSTANT ROW”. This is obviously not doing a table scan since there is no table, but rather, it is using a constant number of in-memory rows.)
Note: With EQP, the output from the Turso CLI does not match that of the sqlite3 CLI. sqlite3 massages the underlying tabular data to make it easier to read. You might prefer using that instead of Turso for the purpose of analyzing queries against your schema. The results will be the same. In this post, I’ll show the Turso CLI output.
A detailed example using EXPLAIN QUERY PLAN
Imagine you have the following table and rows representing users, their group membership, and a score:
CREATE TABLE example ( uid TEXT NOT NULL, gid TEXT NOT NULL, score INTEGER NOT NULL ); INSERT INTO example VALUES ('a', 'g1', 10); INSERT INTO example VALUES ('b', 'g1', 15); INSERT INTO example VALUES ('c', 'g2', 5); INSERT INTO example VALUES ('d', 'g2', 10);
With that, say you want to understand if the query
SELECT * FROM example WHERE uid = 'a' performs a table scan to get the one matching row. You can run this:
EXPLAIN QUERY PLAN SELECT * FROM example WHERE uid = 'a';
And you’ll see the following output:
ID PARENT NOTUSED DETAIL 2 0 0 SCAN example
I see “SCAN”! That’s not great. This query will incur 4 row reads just to get the one matching row. We can improve that by adding an index on the
uid column used in the
WHERE clause. Assuming that
uid is actually supposed to be unique:
CREATE UNIQUE INDEX example_uid ON example (uid);
If you run the same EQP again, the output changes:
ID PARENT NOTUSED DETAIL 3 0 0 SEARCH example USING INDEX example_uid (uid=?)
“SEARCH” using the new index is what we want to see! This query now incurs only 1 row read.
But how about this filter on group “g1”?
EXPLAIN QUERY PLAN SELECT * FROM example WHERE gid = 'g1';
I won’t even bother showing the output — we know this is doing a table scan because there is no index that supports filtering on the
gid column. A new index is required to reduce the number of rows read:
CREATE INDEX example_gid ON example (gid);
Run the same EQP after creating the index, and see that it’s all good now.
Should you create indexes for all columns?
Do you think we need another index on the
score column in the example table? An index would help the runtime performance (and memory consumption) when filtering or ordering on that column:
SELECT * FROM example ORDER BY score;
But the index wouldn’t affect the number of rows read. If you never need to filter or sort values in a column, then the index is useless. Not only that, the index incurs additional storage to your overall usage, and slightly reduces the performance of all inserts and deletes on the table.
With Turso, there is a tradeoff between cost and performance when it comes to indexes. Before creating an index, you should consider if that index is going to be useful for your expected queries, and therefore justify its cost.
How to improve the usage of aggregate functions?
Aggregate functions (such as
sum) always incur one read for every row that was considered in the query. This makes query like
SELECT count(*) FROM example increasingly expensive as the table grows. If your app needs to perform aggregate queries often, you should consider taking steps to reduce that cost.
There are two viable options here, and they both involve maintaining a pre-computed tally of the required aggregate values as the table data changes, so that the aggregate costs only one row read instead of many.
Use a transaction in client code to maintain an aggregate
Let’s say you want to maintain a count of rows for some tables in your schema. You can create a new table to store the precomputed count in a single row:
CREATE TABLE table_row_counts ( table_name TEXT UNIQUE NOT NULL, row_count INTEGER NOT NULL ); – initialize each table with a count of 0 INSERT INTO table_row_counts VALUES ('example', 0);
After that, keep it up to date as you perform inserts and deletes on the rows. Typically you would do this in a transaction so there is no chance of an interruption that would cause the count to get out of sync. The pseudocode for the transaction goes like this:
Insert or delete from example
DELETE FROM example WHERE uid = 'b')
Get the number of rows affected for the prior SQL
(e.g. rows = 1)
Update example_count and increment or decrement the c column for the table modified
UPDATE row_count SETrow_count = row_count — $rows WHERE table = 'example')
This places the burden on all of your app code that modifies the row count of a table to keep the count up to date. If you miss insert or delete somewhere, your data will be incorrect. Fortunately, you can automate this record-keeping entirely using a SQLite trigger.
Use a SQLite trigger to maintain an aggregate
SQLite triggers are database operations that are automatically performed when a specified database event occurs. You can set up a trigger that modifies the aggregate value on every insert or delete without having to manage anything at all in your app code.
Taking the example above with table
example to be counted and
table_row_counts maintaining the count, you can set up triggers to automatically keep
table_row_counts up to date when anything inserts or deletes happen in
CREATE TRIGGER insert_row_count_example AFTER INSERT ON example FOR EACH ROW BEGIN UPDATE table_row_counts SET row_count = row_count + 1 WHERE table_name = 'example'; END; CREATE TRIGGER delete_row_count_example AFTER DELETE ON example FOR EACH ROW BEGIN UPDATE table_row_counts set row_count = row_count - 1 WHERE table_name = 'example'; END;
Assuming these triggers were created at the same time as the
example table, you can be sure that
table_row_counts stays up to date with the latest counts.
What about filtered aggregates?
The above example works well when all you want to know is the row count for the entire table. If you want a filtered count, such as the following query that gets you the number of users in a specific group:
SELECT count(*) FROM example WHERE gid = 'g1';
row_count table above will be of no use here. You’ll need another table to store the aggregates for each group, and another pair of triggers that keeps them up to date. You can see that pre-computing aggregates gives you a big speed and cost boost, but you have to know which aggregates your app needs ahead of time in order to get the benefit, then design your schema for those use cases.
You can see a specific example of this, and more examples of triggers that maintain precomputed aggregates, in the companion repo for this post.
Downsides to maintaining aggregate values
So you’ve decided to maintain aggregate values in order to save on your Turso usage: good call! Just be aware that there are some related costs, and you should account for these when it comes time to estimating your total usage in production.
Firstly, you should be aware that each table operation that requires maintaining another precomputed table with aggregates incurs additional writes for that maintenance. So, you are saving on the cost of reads at the expense of additional writes. This makes a lot of sense if you intend to do a lot of reading and not as much writing on that table. But for write-heavy tables with a need for fast, frequent aggregates, you might end up incurring more usage than is beneficial. You’ll have to decide ahead of time if this strategy actually helps your actual use cases.
Secondly, it makes your write operations perform slightly worse. If write performance is important for your app, you might want to run some benchmarks to make sure the additional writes from the triggers don’t end up slowing down your app’s perceived performance too much.
Maintaining aggregates — is it worthwhile?
As with many decisions in life, there’s no one-size-fits-all approach for all situations for all time. You’ll have to decide if it’s beneficial to take time to improve the usage and performance of your aggregates, at the cost of additional writes and slower writes. Turso (and SQLite) excels with read-heavy workloads, and not so much with write-heavy. So you’ll have to take a guess at what your application actually does in production to figure out what the actual savings is going to be.
See some examples
Hop on over to the code on GitHub that contains plenty of examples of the use of indexes and triggers that maintain aggregates (especially filtered counts and averages) that are cheap to query.
If you’re interested in talking about this with other Turso enthusiasts, I invite you to also join the Discord and post your questions in the help channel. We love a good technical discussion!