Capturing Unique Data with DISTINCT
In SQL, the DISTINCT keyword is used in a SELECT statement to return data without duplicates. If a column contains repeated values and we only want to see the different values, DISTINCT makes sure each value appears only once in the query results.
DISTINCT is placed directly after the SELECT keyword, before the name of the column you want to retrieve. It tells MySQL to return only the unique values from that column and to leave duplicate rows out of the result set. The rows stored in the table are not deleted or changed.
The syntax for using DISTINCT is as follows:
SELECT DISTINCT column_name FROM table_name;
- DISTINCT: instructs SQL to return only unique values for the specified columns.
- column_name: the name of the column whose unique values you want to retrieve.
- FROM table_name: indicates the table the data is read from.
For example, suppose we want to get the unique values from the vend_id column in the products table. Without DISTINCT, if there are multiple products from the same vendor, that vendor ID appears once for every product in the query results. With DISTINCT, the results show each vendor ID only once.
The query looks like this:
SELECT DISTINCT vend_id FROM products;
If the products table has data like this:
| vend_id |
|---------|
| 1001    |
| 1001    |
| 1002    |
| 1003    |
| 1003    |
| 1003    |
| 1004    |
The results of the query will show each vendor ID only once, so they will look like this:
| vend_id |
|---------|
| 1001    |
| 1002    |
| 1003    |
| 1004    |
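To try this on your own server, here is a minimal sketch. It assumes a scratch temporary table named products_demo (a hypothetical name, not the real products table) that holds only the vend_id values from the sample data, so nothing in your actual database is touched.

```sql
-- Hypothetical scratch table; it exists only for the current session.
DROP TEMPORARY TABLE IF EXISTS products_demo;
CREATE TEMPORARY TABLE products_demo (
    vend_id INT NOT NULL
);

-- The same seven rows as the sample data.
INSERT INTO products_demo (vend_id) VALUES
    (1001), (1001), (1002), (1003), (1003), (1003), (1004);

-- Without DISTINCT: returns all seven rows, duplicates included.
SELECT vend_id FROM products_demo ORDER BY vend_id;

-- With DISTINCT: returns four rows (1001, 1002, 1003, 1004).
SELECT DISTINCT vend_id FROM products_demo ORDER BY vend_id;

-- Counting the unique vendors instead of listing them: returns 4.
SELECT COUNT(DISTINCT vend_id) AS vendor_count FROM products_demo;
```

ORDER BY is added here only to make the row order predictable. DISTINCT on its own does not guarantee any particular order.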
Important Notes
- DISTINCT applies to the combination of all columns selected in the query, not just the column it precedes. If more than one column is listed after DISTINCT, one row is returned for each unique combination of values in those columns.
- DISTINCT only affects the query results. It does not modify the table, and columns that are not in the SELECT list play no part in deciding which rows count as duplicates.
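The first note is easier to see with a small experiment. The sketch below assumes a hypothetical scratch table vendor_prices_demo with a vend_id column and an illustrative prod_price column; neither the table name nor the prices come from the original example.

```sql
-- Hypothetical scratch table; it exists only for the current session.
DROP TEMPORARY TABLE IF EXISTS vendor_prices_demo;
CREATE TEMPORARY TABLE vendor_prices_demo (
    vend_id    INT          NOT NULL,
    prod_price DECIMAL(8,2) NOT NULL
);

INSERT INTO vendor_prices_demo (vend_id, prod_price) VALUES
    (1001, 5.99),
    (1001, 5.99),   -- exact duplicate of the row above
    (1001, 9.99),   -- same vendor, different price
    (1002, 5.99);   -- same price, different vendor

-- One column: two rows (1001, 1002).
SELECT DISTINCT vend_id
FROM vendor_prices_demo
ORDER BY vend_id;

-- Two columns: three rows, because only the exact duplicate pair collapses:
-- (1001, 5.99), (1001, 9.99), (1002, 5.99)
SELECT DISTINCT vend_id, prod_price
FROM vendor_prices_demo
ORDER BY vend_id, prod_price;
```

Vendor 1001 appears twice in the second result because DISTINCT compares whole rows of the selected columns, not vend_id alone.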
Using Qualified Column Names
Using fully qualified column names (in the table.column format) in SQL has several important benefits, especially when retrieving data from tables that may have columns with the same name.
Here are some of the advantages of using fully qualified column names:
- With the table.column format, you state which table each column comes from. This is especially useful when working with multiple tables in a single query, such as in a JOIN, because it removes any doubt about which column is being referenced.
- If two or more tables have columns with the same name, fully qualified column names prevent ambiguity. For example, if the products and orders tables each have a prod_id column, you can write products.prod_id and orders.prod_id to tell them apart.
- With table names included in column references, SQL code becomes easier to understand and maintain, both for other developers and for yourself in the future. This helps with ongoing documentation and development.
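The points above can be sketched with a small join. The tables demo_products and demo_orders, their columns other than prod_id, and the sample rows are all hypothetical stand-ins for the products and orders tables mentioned in the text.

```sql
-- Hypothetical scratch tables; they exist only for the current session.
DROP TEMPORARY TABLE IF EXISTS demo_products;
DROP TEMPORARY TABLE IF EXISTS demo_orders;

CREATE TEMPORARY TABLE demo_products (
    prod_id   INT         NOT NULL,
    prod_name VARCHAR(50) NOT NULL
);
CREATE TEMPORARY TABLE demo_orders (
    order_num INT NOT NULL,
    prod_id   INT NOT NULL
);

INSERT INTO demo_products (prod_id, prod_name) VALUES
    (1, 'Widget'), (2, 'Gadget');
INSERT INTO demo_orders (order_num, prod_id) VALUES
    (20001, 1), (20002, 2), (20003, 1);

-- Unqualified: MySQL rejects this because prod_id exists in both tables
-- (error 1052, "Column 'prod_id' in field list is ambiguous").
-- SELECT prod_id
-- FROM demo_products
-- INNER JOIN demo_orders ON demo_products.prod_id = demo_orders.prod_id;

-- Fully qualified: every column names the table it comes from.
SELECT demo_orders.order_num,
       demo_products.prod_id,
       demo_products.prod_name
FROM demo_products
INNER JOIN demo_orders
    ON demo_products.prod_id = demo_orders.prod_id
ORDER BY demo_orders.order_num;
```

Only prod_id strictly needs the qualifier here, but qualifying every column keeps the query readable if more tables are joined later.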
Conclusion
Learning the SELECT statement in MySQL is essential for managing data properly and securely. With the ability to retrieve specific data, choose only the columns you need, and keep your queries clear, you can improve the efficiency and quality of your projects. Following best practices such as using fully qualified column names and consistent formatting also helps with documenting and troubleshooting SQL code.