
How to optimize SQL queries: 12 techniques to follow

Cloud databases scale effortlessly, but that scale comes at a price. Poorly written SQL can slow things down, inflate costs, and frustrate your team. Whether you're analyzing customer data or powering dashboards, efficient queries matter. 

This guide covers SQL query optimization techniques that reduce runtime, minimize compute usage, and improve overall performance. From smarter indexing to better joins, each tip is practical and grounded in real-world usage. If you care about speed, cost, or clean SQL, these habits are worth building into your workflow.

How do you optimize SQL queries?

Improving SQL query performance isn’t just about shaving off a few milliseconds. It’s about building queries that scale—fast, reliable, and resilient under pressure. Whether you're dealing with millions of rows or just trying to reduce lag in a dashboard, these practices help your queries run faster and your systems run smoother.

Here are SQL query optimization techniques worth building into your workflow.

1. Use indexes effectively in relational databases

Think of indexes like a table of contents. They help you skip straight to what you need without flipping through every page.

In relational databases like MySQL or Postgres, indexes often show up in two ways: primary keys (which uniquely identify rows) and mapping tables (which help you join data back to other tables). Both speed up queries by preventing full table scans.

That said, if you’re using a cloud data warehouse like Snowflake or Redshift, indexes don’t work the same way. These systems automatically distribute and partition data on load. To get better performance, sort your data on commonly queried columns and consider setting distribution keys manually if needed.

The three types of indexes to know:

1. Clustered index
  • What it is: Sorts and stores the actual table data in the order of the indexed column.

  • Good for: Range queries and primary key lookups.

  • Example: In a sales table, clustering by order_date can speed up time-based queries.

  • Visual idea: Imagine a phonebook organized by last name: the data is stored in the same order it's indexed.

Pro tip: Clustered indexes are generally faster and lighter; use them when you can.

2. Non-clustered index
  • What it is: Creates a separate structure that points to the rows in the table.

  • Good for: Quick lookups on columns that aren't part of the clustered index.

  • Example: You might have a non-clustered index on customer_id if the table is clustered by order_date.

  • Visual idea: Think of a book’s index at the back. It lists keywords with page numbers, but the content is elsewhere.

3. Composite index
  • What it is: An index built on multiple columns, used when queries filter or sort by more than one field.

  • Good for: Queries that filter on a combination of fields

  • Example: You might have a composite index on (country, state) if your queries often filter by both fields together, for instance, to find all customers in California, US.

  • Important: Column order matters; a composite index is used left to right, so an index on (country, state) can serve a filter on country alone, but not on state alone.

  • Visual idea: A nested folder structure, where you search country first, then drill down by state.

Most SQL query optimization benefits come from these three common types, but if you're working with large text fields or search-heavy use cases, there's a fourth worth knowing.

4. Full-text index
  • What it is: An index designed for fast text-based searches across large string fields.

  • Good for: Searching within unstructured text, like product descriptions or support tickets.

  • Example: Running a query for “wireless noise-canceling headphones” in a product catalog.

  • Visual idea: Like a search engine. Less about exact matches, more about finding the best fit.
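To make these concrete, here's a minimal sketch of the DDL for each type. The syntax follows SQL Server (the full-text example also assumes an existing full-text catalog and a unique key index), and every table, column, and index name is illustrative:

-- Clustered: physically orders the sales table by order_date
CREATE CLUSTERED INDEX idx_sales_order_date ON sales (order_date);

-- Non-clustered: a separate lookup structure pointing back to rows
CREATE NONCLUSTERED INDEX idx_sales_customer_id ON sales (customer_id);

-- Composite: serves filters on country, or on country and state together
CREATE NONCLUSTERED INDEX idx_customers_geo ON customers (country, state);

-- Full-text: enables keyword search over a large string column
CREATE FULLTEXT INDEX ON products (product_description) KEY INDEX pk_products;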

2. Avoid SELECT * and retrieve only necessary columns

Sure, it’s tempting when you’re exploring data. But SELECT * pulls in everything, including columns you don’t care about, and that slows things down.

Instead, be specific. If your business user only needs name and email, just select those fields. It keeps your queries cleaner, faster, and easier to maintain.

Instead of this:

SELECT * FROM customers.customer_profiles

Do this:

SELECT customer_name, customer_email, customer_phone FROM customers.customer_profiles

Being selective with columns isn't just cleaner; it's a quick win for SQL query performance, especially in warehouses where data scanned = dollars spent.

3. Write smarter JOINs

JOINs are the backbone of SQL, but also one of the easiest places to go wrong.

Bad JOINs can lead to slow queries, duplicated rows, or (worse) incorrect results. So it’s worth knowing which JOIN to use and when.

Use the right type:

  • Inner Join: Only keeps matches. This should be your default.

  • Left Join: Keeps everything from the left table. Use when you need all records from one side.

  • Right Join: Same as left, just flipped. You can almost always rewrite a right join as a left join; the left version is easier to read and reason about.

  • Full Outer Join: Returns everything from both tables, matched or not. Use only when absolutely necessary (read: rarely).

Pro tip: Always join on a column that actually exists and makes sense, ideally a primary key/foreign key pair. It’s a basic SQL query optimization technique that prevents incorrect results and wasted compute.

Example (Left join):

SELECT 
    profile.customer_name, 
    profile.customer_email, 
    address.home_state 
FROM customers.customer_profiles profile
LEFT JOIN customers.customer_addresses address
ON profile.customer_id = address.customer_id

Need to join on more than one field? Go for it; it can cut down on duplicates and speed things up.

Here’s an example with an inner join:

SELECT 
    customer_orders.customer_id, 
    order_details.order_id, 
    order_details.order_date  
FROM customers.customer_orders customer_orders
INNER JOIN orders.order_details order_details
ON customer_orders.customer_id = order_details.customer_id 
AND customer_orders.customer_order_id = order_details.order_id

4. Use CTEs instead of subqueries

CTEs (Common Table Expressions) are like scratchpads for your SQL. They let you break complex queries into readable steps and reuse logic without nesting giant subqueries.

CTEs don’t always perform better, but they make your SQL way easier to debug, explain, and maintain.

Bonus: If you’re working in a team, CTEs are a gift to whoever inherits your queries next, and a solid SQL query optimization technique for improving readability.

Instead of this:

SELECT MAX(customer_signup) AS most_recent_signup 
FROM (
    SELECT customer_name, customer_phone, customer_signup 
    FROM customer_details 
    WHERE YEAR(customer_signup) = 2023
) AS signups_2023

Do this:

WITH signups_2023 AS (
    SELECT 
        customer_name, 
        customer_phone, 
        customer_signup 
    FROM customer_details 
    WHERE YEAR(customer_signup) = 2023
), 
most_recent_signup AS (
    SELECT MAX(customer_signup) AS most_recent_signup 
    FROM signups_2023
) 
SELECT most_recent_signup FROM most_recent_signup

5. Don’t retrieve what you don’t need

Every column and row you retrieve costs time and computation, especially in cloud databases, where you’re billed for data scanned, not just what shows up in the result set.

Use LIMIT aggressively when exploring data, especially on large tables:

SELECT customer_name FROM customer_details ORDER BY customer_signup DESC LIMIT 100;

This returns the 100 most recent customers, even if the table has thousands. To skip rows (e.g., for pagination), add an OFFSET:

SELECT customer_name FROM customer_details ORDER BY customer_signup DESC LIMIT 100 OFFSET 20;

If you're building dashboards or reports, be thoughtful about default date ranges, pagination, and how often the data refreshes. Cloud platforms often cache repeat queries, and you can use temporary tables to reduce load, but don’t forget to clean them up.
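For example, a temporary table can stage a filtered subset once so that repeated dashboard queries don't rescan the full table. A rough sketch in PostgreSQL-style syntax, with illustrative names:

-- Stage only the rows the dashboard needs
CREATE TEMPORARY TABLE recent_customers AS
SELECT customer_id, customer_name, customer_signup
FROM customer_details
WHERE customer_signup >= '2024-01-01';

-- Downstream queries hit the small staged table
SELECT COUNT(*) FROM recent_customers;

-- Clean up explicitly rather than waiting for the session to end
DROP TABLE recent_customers;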

Pro tip: Use CASE WHEN sparingly on large datasets. Pre-filtering with a CTE or subquery is often cleaner and more efficient.

6. Use stored procedures

Stored procedures let you wrap up complex logic into a callable, reusable block. They’re great when you:

  • Run the same query with different parameters

  • Want to centralize logic instead of repeating it across dashboards

  • Need to cut down network latency between your app and the database

Stored procedures are typically precompiled or have their execution plans cached, so they’re often faster to execute, and they help keep your SQL DRY (don’t repeat yourself). Just don’t overdo it: they’re ideal for recurring jobs and business logic, but debugging can get tricky at scale.

Here’s how to create a simple stored procedure:

CREATE PROCEDURE find_most_recent_customer
AS BEGIN 
    SELECT MAX(customer_signup) AS most_recent_signup 
    FROM signups_2023
END

Then, you can run this procedure using the following command:

EXEC find_most_recent_customer;

You can also pass parameters into stored procedures by declaring a parameter name and data type.

CREATE OR ALTER PROCEDURE find_most_recent_customer
    @store_id INT
AS BEGIN 
    SELECT MAX(customer_signup) AS most_recent_signup 
    FROM signups_2023
    WHERE store_id = @store_id 
END

Declare the parameter with an @ sign followed by its data type. Then, to execute the procedure, specify the parameter and its value.

EXEC find_most_recent_customer @store_id = 1188;

This gives you a clean, reusable way to pass in dynamic values, without rewriting the same logic over and over.

7. Partition and shard when appropriate

Partitioning and sharding both help with performance, but they solve slightly different problems.

Partitioning is when you break a large table into smaller, more manageable pieces based on something like a date, region, or customer type. Each query then only scans the relevant partition instead of the full table, which saves time and computation.
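As a rough sketch, here's what declarative range partitioning by date looks like in PostgreSQL (table and column names are illustrative):

-- Parent table declares the partitioning scheme but holds no data itself
CREATE TABLE orders (
    order_id BIGINT,
    order_date DATE NOT NULL,
    order_total NUMERIC
) PARTITION BY RANGE (order_date);

-- One child partition per year; queries filtered on order_date
-- scan only the partitions that match
CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');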

Sharding splits your dataset across multiple machines, often based on a high-cardinality key like customer ID or organization. This is more about scaling horizontally when a single database instance can’t keep up.

Use partitioning when your data volume is growing and queries are slowing down. Use sharding when your infrastructure is the bottleneck and you need to scale reads/writes across nodes.

Just keep in mind: both approaches can complicate joins and indexing, so they’re best used with a clear design and intent.

8. Normalize your tables

Normalization helps reduce redundancy and improve data integrity by structuring your tables into logical layers. The classic approach follows the first three normal forms (or NFs):

  • First normal form (1NF): Make sure each column holds atomic (indivisible) values, and each record is unique. No repeating groups or comma-separated lists in one cell.

Bad: order_items = "hat, socks, shirt"

Good: One row per item in a separate order_items table.

  • Second normal form (2NF): Ensure that every non-key column depends on the whole primary key, not just part of it. This mainly applies to tables with composite keys.

If you're storing customer names in a table keyed on (customer_id, product_id), that violates 2NF because customer name depends only on customer_id.

  • Third normal form (3NF): Eliminate columns that don’t depend directly on the primary key. If orders includes both customer_id and customer_email, move the email to the customers table. That way, customer info only lives in one place, as sketched below.
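Here's a minimal sketch of that 3NF split, with illustrative names and types:

-- Customer attributes live in exactly one place
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_email VARCHAR(255)
);

-- Orders reference customers instead of copying their details
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers (customer_id),
    order_date DATE
);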

These principles help you build clean, consistent data models, but going too deep can make querying painful. At scale, over-normalization leads to heavy joins, slower performance, and frustrated analysts.

Many teams adopt a hybrid approach: normalize your source-of-truth data (e.g. users, products), but denormalize for analytics use cases where speed matters more than purity.

It’s about finding that sweet spot where your schema is both accurate and easy to work with.

9. Monitor query performance

You can’t optimize what you can’t measure.

Every cloud database has tools to help here:

  • Snowflake: Query Profiler, Warehouse Monitor

  • BigQuery: Execution details, slot usage

  • Redshift: Query Plan visualizer, Workload Management (WLM)

Pay attention to:

  • Query duration

  • Rows scanned vs. rows returned

  • Spill to disk or out-of-memory errors

These clues will tell you whether you need to rewrite the SQL or revisit your schema. 
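Whatever the platform, the query plan is the place to start. In PostgreSQL, for instance, EXPLAIN ANALYZE runs the query and reports actual row counts and timings (table name is illustrative):

EXPLAIN ANALYZE
SELECT customer_name
FROM customer_details
WHERE customer_signup >= '2024-01-01';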

10. Prefer UNION ALL over UNION

Both UNION and UNION ALL combine results from multiple queries. But there’s a key difference:

  • UNION removes duplicates

  • UNION ALL keeps everything

That might sound like a small detail, but de-duplicating rows requires a sort operation or a hash aggregation step under the hood, which means more compute and slower performance.

If you know your datasets don’t contain duplicates (or if you don’t care), use UNION ALL. It’s faster and lighter on your system.

Save UNION for when correctness requires deduplication.
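For instance, when combining two feeds that can't contain the same rows, UNION ALL skips the deduplication pass entirely (table names are hypothetical):

-- No sort or hash dedup step, just concatenated results
SELECT order_id, order_total FROM online_orders
UNION ALL
SELECT order_id, order_total FROM store_orders;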

11. Use EXISTS, IN, or OR wisely

These are all ways to filter based on conditions in another table, but they don’t always perform the same.

1. EXISTS

Stops checking as soon as it finds a match.

Best when the subquery is large or when you're just checking for existence, not comparing values.

Faster with correlated subqueries or large datasets.

SELECT customer_id
FROM customers c
WHERE EXISTS (
  SELECT 1 
  FROM orders o 
  WHERE o.customer_id = c.customer_id
);

2. IN

Checks if a value is in a list of values from the subquery.

Efficient with small, static lists, but can slow down with large subqueries.

SELECT customer_id
FROM customers
WHERE customer_id IN (
  SELECT customer_id 
  FROM orders
);

3. OR

Easy to write, but often the worst for performance, especially when conditions span multiple columns or involve separate indexes.

If you're using OR across multiple tables or joins, consider breaking it into separate queries combined with UNION ALL, as shown below.

SELECT *
FROM orders
WHERE region = 'East' OR region = 'West';
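When the conditions sit on different columns, the UNION ALL rewrite lets each branch use its own index; the second branch excludes rows the first already returned, so nothing is double-counted (priority is an illustrative column, and region is assumed non-null):

SELECT * FROM orders WHERE region = 'East'
UNION ALL
SELECT * FROM orders WHERE priority = 'high' AND region <> 'East';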

Pro tip: Test alternatives. What performs best can vary based on data size, indexing, and database engine.

12. Use your cloud platform’s native features

Cloud data warehouses are not just “databases in the cloud.” They come with powerful native capabilities that can save time, cut costs, and improve performance if you use them.

Depending on your platform:

  • BigQuery: Take advantage of partitioned and clustered tables, table decorators, and MERGE statements for efficient updates (see the sketch after this list).

  • Snowflake: Use automatic clustering (if needed), result caching, and tasks for scheduling SQL. Leverage Snowflake Streams and Tasks for ELT workflows.

  • Databricks: Make use of Delta Lake features like time travel, schema evolution, and optimized write/read performance.

  • Redshift: Use sort keys and distribution keys wisely. Consider Redshift Spectrum for querying external data in S3.
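For instance, here's a minimal BigQuery DDL sketch of a table that's both partitioned and clustered (dataset, table, and column names are illustrative):

-- Partition pruning on event_date plus clustering on customer_id
-- means most queries scan only a slice of the table
CREATE TABLE mydataset.events
PARTITION BY event_date
CLUSTER BY customer_id AS
SELECT customer_id, event_date, event_type
FROM mydataset.raw_events;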

Rule of thumb: If you’re treating your cloud warehouse like a traditional RDBMS, you’re probably missing out on performance gains. Learn the platform-specific features; it’s worth the investment.

Optimization is a habit, not a hack

SQL optimization isn’t just about writing cleaner queries; it’s about building faster workflows, more reliable dashboards, and data systems that scale without surprising you on the next cloud bill.

The best part? You don’t have to do it alone.

With ThoughtSpot Analyst Studio, you get a purpose-built workspace designed for the modern analyst. From SQL editors with AI Assist to version-controlled notebooks and performance-aware modeling tools, it’s built to help you move fast without compromising accuracy or blowing up compute.

See how ThoughtSpot can speed up your workflows—schedule a demo.

FAQs

1. What is SQL query optimization?

SQL query optimization is the process of improving the structure and logic of SQL statements so they run more efficiently. It involves techniques like filtering early, avoiding unnecessary columns, using indexes, and choosing the right joins to reduce load time and boost performance.

2. Why is SQL query performance important?

Slow SQL doesn’t just cost time; it can also increase cloud spend, delay reporting, and frustrate users who depend on dashboards and insights. Optimizing for performance helps your queries scale with your data, whether you're analyzing thousands or billions of rows.

3. What’s the difference between query optimization and indexing?

Indexing is a specific optimization technique that speeds up data retrieval by letting the database find rows faster. But optimization is broader: it includes how you structure joins, use filters, write subqueries, limit rows, and even how you organize your logic with things like CTEs.

4. Do SQL best practices vary by database?

Yes, though many fundamentals apply across systems. For example, filtering early is good practice in both BigQuery and Snowflake. But how each engine handles joins, caching, or partitioning can differ, so it’s worth learning the quirks of your specific platform.

5. How often should I revisit or refactor SQL queries?

Ideally, anytime your data model changes, query performance drops, or a dashboard starts lagging. SQL is rarely “write once and forget.” The best-performing queries are often ones that have been trimmed, tested, and tuned over time.