This chapter covers the fundamentals of SQL querying, focusing on SELECT statements, filtering, sorting, and joining tables. For the DP-900 exam, approximately 15-20% of questions touch on SQL querying basics, including reading and writing queries against Azure SQL Database, Azure Synapse Analytics, and other relational data services. Mastery of SELECT, WHERE, JOIN, GROUP BY, and ORDER BY is essential, as these appear in both multiple-choice and case-study scenarios.
Jump to a section
Imagine a library with thousands of books organized on shelves. The library has a card catalog — a set of drawers filled with index cards. Each card represents one book and contains fields: Title, Author, Subject, and Shelf Location. To find books, you ask a librarian a query. For example, 'Find all books by J.K. Rowling published after 2000.' The librarian does not walk the entire library pulling books off shelves. Instead, she goes to the card catalog, opens the Author drawer, finds all cards for Rowling, then filters by year using the Date field on those cards, and finally records the Shelf Locations. She then retrieves only those specific books. This is exactly how SQL queries work on a relational database: the 'card catalog' is the table's structure (schema) and indexes. The query specifies conditions (WHERE clause), which the database engine uses to navigate indexes or scan tables efficiently. The result set is like the stack of books the librarian hands you. If you ask for all books (SELECT *), the librarian must pull every card — a full table scan. Similarly, a query without a WHERE clause on a large table can be slow. The librarian can also sort the cards (ORDER BY) or count them (COUNT). The key point: the librarian uses the catalog to avoid touching every book, just as the database uses indexes to avoid scanning every row.
What is SQL and Why Does It Exist?
SQL (Structured Query Language) is the standard language for managing and querying relational databases. It was developed in the 1970s by IBM and later standardized by ANSI and ISO. SQL is declarative: you specify *what* data you want, not *how* to retrieve it. The database engine's optimizer decides the execution plan — which indexes to use, join algorithms, and data access paths. This abstraction allows you to write queries without knowing the physical storage details.
SELECT Statement Structure
The fundamental SQL statement is SELECT. Its basic syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition
GROUP BY column
HAVING group_condition
ORDER BY column [ASC|DESC];SELECT specifies columns to return. Use * for all columns (avoid in production for performance).
FROM identifies the source table(s).
WHERE filters rows before grouping.
GROUP BY groups rows with same values in specified columns.
HAVING filters groups after aggregation.
ORDER BY sorts the result set.
The logical order of execution is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Understanding this order helps debug unexpected results.
Filtering with WHERE
WHERE clause uses predicates:
- Comparison: =, <>, <, >, <=, >= - Range: BETWEEN (inclusive) - List: IN (value1, value2, ...) - Pattern: LIKE with wildcards: % (any sequence), _ (single char) - NULL: IS NULL, IS NOT NULL (never use = NULL) - Logical: AND, OR, NOT Example: Find products with price between $10 and $50 or category 'Electronics':
SELECT ProductID, Name, Price
FROM Products
WHERE (Price BETWEEN 10 AND 50) OR Category = 'Electronics';Performance tip: Avoid leading wildcards in LIKE (e.g., '%value') because they prevent index usage.
Sorting with ORDER BY
ORDER BY sorts by one or more columns, each with ASC (default) or DESC. You can also sort by column alias or ordinal position (not recommended). Example:
SELECT FirstName, LastName, HireDate
FROM Employees
ORDER BY HireDate DESC, LastName ASC;Sorting is expensive on large result sets; only sort what you need.
Aggregation and GROUP BY
Aggregate functions: COUNT, SUM, AVG, MIN, MAX. They operate on a set of rows and return a single value. When used with GROUP BY, they produce one row per group. Columns in SELECT not in aggregate must be in GROUP BY. Example: Count orders per customer:
SELECT CustomerID, COUNT(OrderID) AS OrderCount
FROM Orders
GROUP BY CustomerID;To filter groups, use HAVING:
SELECT CustomerID, COUNT(OrderID) AS OrderCount
FROM Orders
GROUP BY CustomerID
HAVING COUNT(OrderID) > 5;WHERE filters rows before grouping; HAVING filters after. Both can appear together.
Joining Tables
JOIN combines rows from two or more tables based on a related column. Types: - INNER JOIN: returns rows with matching values in both tables. - LEFT OUTER JOIN: returns all rows from left table, matched rows from right; NULLs where no match. - RIGHT OUTER JOIN: opposite of LEFT. - FULL OUTER JOIN: returns all rows from both; NULLs where no match. - CROSS JOIN: Cartesian product. Syntax:
SELECT e.EmployeeID, e.Name, d.DepartmentName
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID;The ON clause specifies the join condition. You can join multiple tables, and use aliases for readability.
Subqueries
A subquery is a query nested inside another query. It can return a scalar (single value), a single column, or a table. Subqueries appear in WHERE, HAVING, FROM, or SELECT clauses. Example: Find products with price above average:
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);Correlated subqueries reference outer query columns and execute for each row.
Set Operations
UNION, INTERSECT, EXCEPT combine results of two queries. UNION removes duplicates; UNION ALL keeps duplicates. The number and data types of columns must match.
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;NULL Handling
NULL represents unknown or missing data. Comparisons with NULL yield unknown (not true/false). Use IS NULL. In aggregate functions, NULLs are ignored (except COUNT(*)). In GROUP BY, NULLs form a group.
String and Date Functions
Common functions: - String: CONCAT, SUBSTRING, LEN, UPPER, LOWER, TRIM, REPLACE - Date: GETDATE, DATEADD, DATEDIFF, YEAR, MONTH, DAY - Numeric: ROUND, ABS, CEILING, FLOOR Example: Get orders from last 30 days:
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderDate >= DATEADD(day, -30, GETDATE());Performance Considerations
Use indexes on columns in WHERE, JOIN, and ORDER BY.
Avoid SELECT *; specify only needed columns.
Use EXISTS instead of IN for large subqueries when checking existence.
Avoid functions on indexed columns in WHERE (e.g., WHERE YEAR(Date)=2020) — rewrite as range.
Use appropriate data types (e.g., INT vs VARCHAR for IDs).
Azure-Specific SQL
Azure SQL Database and Azure Synapse Analytics support standard SQL with some differences. Synapse uses a distributed query engine; some features like SELECT INTO are used differently. Always check documentation for compatibility.
1. Write SELECT Clause
Identify the columns you need. Use explicit column names (e.g., FirstName, LastName) instead of * to reduce data transfer and improve performance. If using expressions or aggregations, give them an alias with AS. The SELECT clause determines the shape of the result set. In Azure SQL Database, you can also use TOP to limit rows.
2. Write FROM Clause
Specify the source table(s). Use table aliases (e.g., e for Employees) for readability, especially with joins. The FROM clause identifies the rows that will be processed. For multiple tables, this is where you list them before applying joins in the WHERE or JOIN clause.
3. Add WHERE Filter
Apply row-level filters to reduce the number of rows processed. Use predicates with indexes in mind. For example, WHERE OrderDate >= '2020-01-01' can use an index on OrderDate. Avoid wrapping columns in functions (e.g., YEAR(OrderDate) = 2020) as it prevents index seeks. Combine conditions with AND/OR, using parentheses for clarity.
4. Group and Aggregate
If needed, use GROUP BY to group rows with same values in specified columns. Then apply aggregate functions (SUM, COUNT, AVG, etc.) to each group. Remember: any column in SELECT not in an aggregate must be in GROUP BY. Use HAVING to filter groups after aggregation. HAVING is similar to WHERE but for groups.
5. Order Results
Use ORDER BY to sort the final result set. Sorting is applied after all other operations. It can be expensive on large sets; consider limiting rows with TOP or OFFSET-FETCH. You can sort by multiple columns, each with ASC or DESC. Column aliases are allowed.
6. Execute and Verify
Run the query and examine the result set. Check row count, data types, and expected values. Use SET STATISTICS IO ON in SQL Server Management Studio to see logical reads. If performance is poor, examine the execution plan to identify missing indexes or inefficient joins. Tune by adding indexes or rewriting the query.
Enterprise Scenario 1: E-Commerce Order Reporting
A large online retailer uses Azure SQL Database to store orders. The reporting team needs a daily query to list top 10 customers by total spend in the last 30 days. The query: SELECT TOP 10 CustomerID, SUM(TotalAmount) AS TotalSpend FROM Orders WHERE OrderDate >= DATEADD(day, -30, GETDATE()) GROUP BY CustomerID ORDER BY TotalSpend DESC. With millions of orders, performance issues arise because the WHERE clause uses a function on OrderDate. The fix: rewrite as WHERE OrderDate >= '2024-01-01' (using a parameter) and create a nonclustered index on OrderDate including TotalAmount. This reduces scan to seek. The DBA also partitions the table by month to further speed up date-range queries.
Scenario 2: Healthcare Patient Records
A hospital uses Azure Synapse Analytics for big data analytics. Doctors need to find patients with multiple visits in the past year and their average length of stay. The query joins Visits and Patients tables. A common mistake is using SELECT * and then filtering in application code. Instead, the query should only retrieve necessary columns. Additionally, because Synapse is MPP, join distribution keys must align to avoid data movement. The DBA ensures both tables are distributed on PatientID (hash distribution). The query: SELECT p.PatientID, p.Name, COUNT(v.VisitID) AS VisitCount, AVG(DATEDIFF(day, v.AdmitDate, v.DischargeDate)) AS AvgStay FROM Patients p INNER JOIN Visits v ON p.PatientID = v.PatientID WHERE v.AdmitDate >= '2023-01-01' GROUP BY p.PatientID, p.Name HAVING COUNT(v.VisitID) > 1 ORDER BY VisitCount DESC.
Scenario 3: Financial Transactions
A bank uses Azure SQL Database for transaction processing. A compliance query must find all transactions over $10,000 in the last 24 hours. The query: SELECT * FROM Transactions WHERE Amount > 10000 AND TransactionDate >= GETDATE()-1. However, GETDATE() is evaluated per row in some contexts? Actually, it's evaluated once, but the date arithmetic may not be sargable. Better: declare a variable @Cutoff = DATEADD(day, -1, GETDATE()) and use WHERE TransactionDate >= @Cutoff. Also, index on (TransactionDate, Amount) helps. Misconfiguration: forgetting to include INCLUDE columns in index leads to key lookups. The DBA monitors for blocking and deadlocks during peak hours.
DP-900 Objective 2.3: SQL Querying Basics
The exam expects you to read and write basic SELECT queries, filter with WHERE, sort with ORDER BY, group with GROUP BY, and join tables. You may be given a scenario and asked which query returns the correct result. Key areas: - WHERE clause operators: =, <>, <, >, BETWEEN, IN, LIKE, IS NULL. Trap: candidates think = NULL works; it does not. Use IS NULL. - ORDER BY default: ASC. Trap: some think default is DESC. - GROUP BY requirement: any non-aggregated column in SELECT must be in GROUP BY. Trap: forgetting this causes error. - HAVING vs WHERE: WHERE filters rows before grouping; HAVING filters groups after. Trap: using WHERE with aggregate functions (e.g., WHERE COUNT(*) > 1) — not allowed. - JOIN types: INNER, LEFT, RIGHT, FULL. Trap: confusing LEFT and RIGHT. Remember: LEFT keeps all rows from left table. - NULL in aggregates: COUNT(*) counts all rows; COUNT(column) ignores NULLs. Trap: thinking COUNT(column) counts NULLs. - String concatenation: In SQL Server, use + or CONCAT. Trap: Oracle uses ||, but DP-900 is Microsoft-focused. - Date functions: DATEDIFF, DATEADD, GETDATE. Trap: using YEAR(OrderDate) instead of range query. - TOP and OFFSET-FETCH: TOP with ORDER BY; OFFSET-FETCH for paging. Trap: TOP without ORDER BY arbitrary. - Subqueries: scalar vs multi-valued. Trap: using = with subquery returning multiple rows.
Common Wrong Answers
WHERE Price = NULL – Always wrong. Use IS NULL.
HAVING without GROUP BY – Valid only if entire table is one group, but often confused.
ORDER BY column alias in WHERE – ORDER BY is evaluated after SELECT, so alias is not available in WHERE.
Using IN with subquery that returns NULL – IN with NULL behaves unexpectedly; use EXISTS.
Assuming LEFT JOIN returns fewer rows than INNER JOIN – Actually LEFT JOIN can return more (non-matching left rows).
Exam Tips
Know the logical order of operations: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
Practice writing queries for Azure SQL Database (same as SQL Server).
For Synapse, understand that some features like SELECT INTO are used for CTAS.
Read the question carefully: 'Which query returns the correct result?' often tests subtle syntax errors.
SQL is declarative: specify what data you want, not how to retrieve it.
Logical order of execution: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.
Use IS NULL, not = NULL, to check for null values.
GROUP BY requires all non-aggregate columns in SELECT to be listed in GROUP BY.
HAVING filters groups after aggregation; WHERE filters rows before.
INNER JOIN returns only matching rows; LEFT JOIN returns all rows from left table.
COUNT(*) counts all rows; COUNT(column) counts non-NULL values.
Avoid SELECT * in production; specify columns explicitly.
Use indexes on columns in WHERE, JOIN, and ORDER BY for performance.
In Azure SQL Database, use TOP with ORDER BY to limit rows; OFFSET-FETCH for paging.
These come up on the exam all the time. Here's how to tell them apart.
WHERE
Filters rows before grouping
Cannot use aggregate functions directly
Applied to individual rows
Used with GROUP BY or alone
Executed before GROUP BY
HAVING
Filters groups after grouping
Used with aggregate functions (e.g., HAVING COUNT(*) > 5)
Applied to groups
Requires GROUP BY (or entire table as one group)
Executed after GROUP BY
Mistake
NULL = NULL is true
Correct
NULL is not equal to anything, including another NULL. Use IS NULL or IS NOT NULL. Comparisons with NULL yield UNKNOWN, which is treated as false in WHERE.
Mistake
ORDER BY defaults to DESC
Correct
ORDER BY defaults to ASC. You must explicitly specify DESC for descending order.
Mistake
HAVING can be used without GROUP BY
Correct
HAVING can be used without GROUP BY if the SELECT contains only aggregate functions (treats entire table as one group). However, if you have non-aggregate columns, you must use GROUP BY.
Mistake
COUNT(*) and COUNT(column) are identical
Correct
COUNT(*) counts all rows including those with NULLs. COUNT(column) counts only non-NULL values in that column.
Mistake
INNER JOIN returns all rows from both tables
Correct
INNER JOIN returns only rows with matching values in both tables. Rows without a match are excluded.
Reveal each answer, then mark whether you got it right. Score 60%+ to unlock the next chapter.
WHERE filters rows before any grouping occurs, so you cannot use aggregate functions (like COUNT, SUM) in WHERE. HAVING filters groups after grouping, so you can use aggregate functions. For example, to find customers with more than 5 orders, you use HAVING COUNT(OrderID) > 5 after GROUP BY CustomerID. WHERE would be used to filter orders by date before grouping.
Use IS NULL or IS NOT NULL to test for NULL. Never use = NULL or <> NULL because comparisons with NULL yield UNKNOWN, which is treated as false in WHERE. In aggregate functions, COUNT(column) ignores NULLs, but COUNT(*) counts all rows. In GROUP BY, NULLs form a separate group. In ORDER BY, NULLs are sorted first (or last depending on settings).
INNER JOIN returns only rows that have matching values in both tables. LEFT JOIN returns all rows from the left table, and matching rows from the right table; if no match, right table columns are NULL. For example, if you want all customers including those without orders, use LEFT JOIN. If you want only customers with orders, use INNER JOIN.
Use ORDER BY column_name DESC. The default is ASC (ascending). You can sort by multiple columns, each with its own direction, e.g., ORDER BY LastName ASC, FirstName DESC.
No, because WHERE is evaluated before SELECT, so column aliases defined in SELECT are not available in WHERE. You must use the original column name or expression. However, you can use alias in ORDER BY because ORDER BY is evaluated after SELECT.
COUNT(*) counts the number of rows in the result set, including rows with NULL values. COUNT(column) counts the number of non-NULL values in that specific column. For example, if a table has 10 rows but 2 have NULL in a column, COUNT(*) returns 10 and COUNT(column) returns 8.
Use SELECT TOP (number) column_list FROM table ORDER BY ... to return the first 'number' rows. For paging, use OFFSET-FETCH: SELECT ... ORDER BY ... OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY. TOP without ORDER BY returns arbitrary rows.
You've just covered SQL Querying Basics — now see how well it sticks with free DP-900 practice questions. Full explanations included, no account needed.
Done with this chapter?