Skip to main content

Identifying expensive queries by memory usage in ClickHouse

Learn how to use the `system.query_log` table to find the most memory-intensive queries in ClickHouse, with examples for clustered and standalone setups.

Using the system.query_log table

The following useful query shows which of your executed queries used the most memory.

A couple of comments about this query:

  • the results are computed from the past day (now() - toIntervalDay(1))) but you can easily modify the time interval
  • it assumes you have a cluster named default, which is the name of your cluster in ClickHouse Cloud. Change default to the name of your cluster
  • if you do not have a cluster, see the query listed at the end of this article
SELECT
    count() as nb_query,
    user,
    query,
    sum(memory_usage) AS memory,
    normalized_query_hash
FROM
    clusterAllReplicas(default, system.query_log)
WHERE
    (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    and user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY
    memory DESC;

The response looks like:

┌─nb_query─┬─user────┬─query─────────────────────────────────────────────────────────┬───memory─┬─normalized_query_hash─┐
│       11 │ default │ select version()                                              │ 46178924 │   7202516440347714159 │
│        2 │ default │ SELECT * FROM "system"."table_functions" LIMIT 31 OFFSET 0    │  8391544 │  12830067173062987695 │
└──────────┴─────────┴───────────────────────────────────────────────────────────────┴──────────┴───────────────────────┘
Note

If you do not have a system.query_log table, then you likely do not have query logging enabled. View the details of the query_log setting for details on how to enable it.

Note
query_log contains more than the queries you submitted

A single submitted query can appear as several rows: the initial query (is_initial_query = 1) plus derivative or internal steps (is_initial_query = 0), such as secondary queries for distributed execution or the internal queries used to evaluate views. Those internal rows have their own query_id values — assigned by ClickHouse, and sometimes including a label such as queryView... — so don't assume every query_id matches what your client reported. To attribute usage to the query as you ran it, filter on is_initial_query = 1 or match query_id = initial_query_id. See How to identify the most expensive queries and the query_log reference for details.

If you do not have a cluster, use can just query your one system.query_log table directly:

SELECT
    count() as nb_query,
    user,
    query,
    sum(memory_usage) AS memory,
    normalized_query_hash
FROM
    system.query_log
WHERE
    (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    and user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY
    memory DESC;
· 2 min read