Using local AI models
LLMs (Large Language Models) can be used to generate code from natural language questions. Popular examples include GitHub Copilot, OpenAI's GPT-4, and Meta's Code Llama.
DuckDB-NSQL is a Text-to-SQL model created by Numbers Station together with MotherDuck. It's hosted on Hugging Face and on Ollama, and can be used with different LLM runtimes. There are also some nice blog posts about it that are worth a read.
The model was trained specifically for the DuckDB SQL syntax on 200k DuckDB Text-to-SQL pairs, and is based on the Llama-2 7B model.
Bring your own AI
If you want to enable SQL Workbench's privacy-first AI integration for SQL generation, you first have to install Ollama on your local machine.
Once you've installed Ollama, you can either download the relevant DuckDB-NSQL model beforehand, or have it automatically downloaded on first use. If you want to pull the model yourself, you can do so by issuing the following command in your terminal:
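Assuming the model is published under the name `duckdb-nsql` in the Ollama model library, the pull command looks like this:

```shell
# Download the DuckDB-NSQL model (default tag) from the Ollama library
ollama pull duckdb-nsql
```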
Please be aware that the default model has a size of 3.8GB, which can take a while to download, depending on your internet connection speed. There are smaller quantized models as well, but be aware that the answer quality might be lower with them.
Once the model is downloaded, Ollama can be started from your terminal:
Setting the `OLLAMA_ORIGINS` environment variable to `https://sql-workbench.com` is necessary to enable CORS, so that the SQL Workbench running in your browser can talk to your locally running Ollama server. You can then enable the AI feature in SQL Workbench:
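Putting both steps together, one way to start the server with the required CORS origin (a sketch; on macOS/Linux the variable can be set inline) is:

```shell
# Allow CORS requests from SQL Workbench, then start the Ollama server
OLLAMA_ORIGINS=https://sql-workbench.com ollama serve
```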

You need to create a local table first, as the AI model needs a schema to work with. For example, you can create a table with AWS Edge Locations by using a remote dataset:
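A minimal sketch of such a table creation, assuming a remote CSV file (the URL and table name below are placeholders, not the actual dataset location):

```sql
-- Hypothetical example: create a local table from a remote CSV file
-- (replace the URL with the actual AWS Edge Locations dataset location)
CREATE TABLE aws_edge_locations AS
SELECT *
FROM read_csv_auto('https://example.com/aws-edge-locations.csv');
```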
Then, you can ask your questions after a specific comment string like below:
To execute the prompt, you have two options:
- Press `ALT + g` to generate the SQL
- Press `ALT + r` to run the generated SQL directly

The generated SQL is automatically inserted below the closest prompt comment string. In case you have multiple comment strings in the current SQL Workbench tab, the one closest to the actual cursor position is used.
IMPORTANT
You can also ask questions about remote files (Parquet, CSV), but don't expect the answer quality to be very good, because the schema will be unknown to the model.
The best practice would be to create a table first, and then ask questions about the data in that table.
Explore an example dataset with AI
AWS publishes its Service Authorization Reference documentation, and there's a GitHub repository that automatically transforms the published data to Parquet, CSV, JSON and DuckDB database formats every night at 4AM UTC.
You first need to run the below statements to create a database structure:
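The exact statements depend on the repository's file layout; a sketch, assuming the nightly Parquet exports are reachable via HTTP (the URLs and table names below are assumptions, check the repository README for the real paths):

```sql
-- Hypothetical example: build local tables from the nightly Parquet exports
-- (the URLs below are placeholders for the repository's actual raw file paths)
CREATE TABLE services AS
SELECT * FROM read_parquet('https://example.com/aws-iam-data/services.parquet');

CREATE TABLE actions AS
SELECT * FROM read_parquet('https://example.com/aws-iam-data/actions.parquet');
```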
You can then explore the data with the DuckDB-NSQL model. For example, you can ask the model to show all services that contain 'S3' in their name:
The model will generate an appropriate SQL statement, and execute it:
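For a question like the one above, the generated statement might look roughly like the following (the column name `service_name` is an assumption about the schema, not the model's verbatim output):

```sql
-- Illustrative only: a query the model might generate for the 'S3' question
SELECT *
FROM services
WHERE service_name LIKE '%S3%';
```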

Other prompt examples
Prompt: Count the action names of services whose name contains 'S3':
Result:

Prompt: Show all 'Write' access level actions for service 'Amazon S3':
Result:

Prompt: Show first 10 actions for service 'Amazon CloudFront':
Result:

Prompt: Show service name with least actions:
Result:

Prompt: Show service with most resource types:
Result:

Prompt: Show the count of actions of service 'Amazon S3':
Result:

Prompt: Show the top 10 resource types by service:
Result:

Summary
Using a locally hosted LLM together with an in-browser SQL workbench enables a cost-effective and privacy-friendly way to use state-of-the-art tools without relying on third-party services.
Demo video
There's a demo video on YouTube that showcases the AI features of SQL Workbench.