SQL Tutorial: How To Use ROW NUMBER() and Why

in STEMGeeks3 years ago

No matter what data we receive, we will probably be required to order it a certain way at some point. In some cases, we may simply order data by dates and receive it in that format, meaning we have little to do. But there are other times when we have to order those data sets for other reasons - possibly to identify duplicates, to order the dated data with other patterns ordered, etc. In the video, SQL Basics: How To Use ROW NUMBER() and Why, we look at a built in function in most SQL languages that allows us to quickly order data by a column or more columns.

ROW_NUMBER() (now referred to as RN) provides us with the functionality to order data by the column or columns that we specify. Suppose that we have 10 dates: we can order these ascending or descending using this functionality and we can even include additional columns if we wanted to add to the order condition. This is one way that we can get the median of a data set (even easier if there are an odd number of values), and it allows us to use other math operations on a data set as well.

As mentioned in the video, we can sometimes use ordering data to calculate values (ie: median was mentioned as one outcome that we can get with using this functionality). In the case of median, as an example, if we order data ascending and descending by row number with a data set that has an odd number of values, when the values equal each other, we've found the median (note that this is not the case with an even number set of data points). This highlights an example of how we can find the median with an odd set of values using the ROW_NUMBER() functionality. We can also use this functionality (along with other functionality) to identify duplicates because a duplicate is ultimately an "identical" value to something else - thus two rows that are identical are really the same order, even if the ROW_NUMBER() functionality will list them as two (this is why other functionality such as PARTITION BY are key, as we want to then divide by identical values to find them).