Skill v1.0.1
currentAutomated scan100/100+6 new
version: "1.0.1" name: chdb-sql description: >- Use when the user wants to run SQL — especially analytical SQL — on local files (parquet/csv/json), URLs, S3 paths, or remote databases (Postgres, MySQL, MongoDB, ClickHouse Cloud, Iceberg, Delta Lake) without setting up a server. Provides chDB — embedded ClickHouse SQL in Python with 1000+ functions, Session for stateful multi-step pipelines, parametrized queries, and cross-source joins via s3(), mysql(), postgresql(), iceberg(), deltaLake(), remoteSecure() table functions. TRIGGER when: user wants SQL on parquet/csv/files or across remote analytical sources; uses ClickHouse SQL features (window functions, windowFunnel, geoToH3, JSON path ops, Session, parametrized queries); imports chdb or calls chdb.query(). SKIP this skill for pandas-style DataFrame method-chaining (use chdb-datastore instead) or ClickHouse server administration. license: Apache-2.0 compatibility: Requires Python 3.9+, macOS or Linux. pip install chdb. metadata: author: chdb-io version: "4.1" homepage: https://clickhouse.com/docs/chdb
chdb SQL — ClickHouse in Your Python Process
Run ClickHouse SQL directly in Python — no server needed. Query local files, remote databases, and cloud storage with full ClickHouse SQL power.
pip install chdb
Decision Tree: Pick the Right API
1. One-off query on files or databases → chdb.query()2. Multi-step analysis with tables → Session3. DB-API 2.0 connection → chdb.connect()4. Pandas-style DataFrame operations → Use chdb-datastore skill instead
chdb.query() — One Line, Any Data
import chdbchdb.query("SELECT * FROM file('data.parquet', Parquet) WHERE price > 100 LIMIT 10") # local fileschdb.query("SELECT * FROM mysql('db:3306', 'shop', 'orders', 'root', 'pass')") # databaseschdb.query("SELECT * FROM s3('s3://bucket/data.parquet', NOSIGN) LIMIT 10") # cloud storagechdb.query("SELECT * FROM deltaLake('s3://bucket/delta/table', NOSIGN) LIMIT 10") # data lakes# Cross-source joinchdb.query("""SELECT u.name, o.amount FROM mysql('db:3306', 'crm', 'users', 'root', 'pass') AS uJOIN file('orders.parquet', Parquet) AS o ON u.id = o.user_id ORDER BY o.amount DESC""")data = {"name": ["Alice", "Bob"], "score": [95, 87]}chdb.query("SELECT * FROM Python(data) ORDER BY score DESC") # Python datadf = chdb.query("SELECT * FROM numbers(10)", "DataFrame") # output formatschdb.query("SELECT toDate({d:String}) + number FROM numbers({n:UInt64})","DataFrame", params={"d": "2025-01-01", "n": 30}) # parametrized
Table functions → table-functions.md | SQL functions → sql-functions.md | Full API → api-reference.md
Session — Stateful Analysis Pipelines
from chdb import session as chssess = chs.Session("./analytics_db") # persistent; Session() for in-memorysess.query("CREATE TABLE users ENGINE=MergeTree() ORDER BY id AS SELECT * FROM mysql('db:3306','crm','users','root','pass')")sess.query("CREATE TABLE events ENGINE=MergeTree() ORDER BY (ts,user_id) AS SELECT * FROM s3('s3://logs/events/*.parquet',NOSIGN)")sess.query("""SELECT u.country, count() AS cnt, uniqExact(e.user_id) AS usersFROM events e JOIN users u ON e.user_id = u.idWHERE e.ts >= today() - 7 GROUP BY u.country ORDER BY cnt DESC""", "Pretty").show()sess.close()
Connection API (DB-API 2.0)
from chdb import dbapiconn = dbapi.connect()cur = conn.cursor()cur.execute("SELECT * FROM file('data.parquet', Parquet) WHERE value > 100")print(cur.fetchall())cur.close()conn.close()
Troubleshooting
| Problem | Fix | |
|---|---|---|
ImportError: No module named 'chdb' | pip install chdb | |
DB::Exception: FILE_NOT_FOUND | Check file path; use absolute path or verify cwd | |
DB::Exception: Unknown table function | Check function name spelling (e.g., deltaLake not deltalake) | |
| Connection refused to remote DB | Check host:port format; ensure remote DB allows connections | |
| Environment check | Run python scripts/verify_install.py (from skill directory) |
References
- API Reference — query/Session/connect signatures
- Table Functions — All ClickHouse table functions
- SQL Functions — Commonly used SQL functions
- Examples — 9 runnable examples with expected output
- Official Docs
Note: This skill teaches how to use chdb SQL.For pandas-style operations, use thechdb-datastoreskill.For contributing to chdb source code, see CLAUDE.md in the project root.