Previously, we've look at the join functionality of LEFT JOIN. If we recall from that lesson, we learned that a LEFT JOIN will join records that match and don't match on the condition specified. Since we're using the same data and schema over and over again for easily remembering the result, we'll remember that two of our records in Table1 matched with Table2, but two results didn't match, so we saw nulls as the result. When there isn't a match (because the records don't exist), we get a null as the return. What if we only want to get the records in Table1 that don't match in Table2? In the video, SQL Basics: How To Use A LEFT ANTI JOIN and Why, we look at accomplishing this challenge using the LEFT ANTI JOIN. Since not all SQL languages support LEFT ANTI JOIN with this syntax, I show this syntax two different ways, one of which is generally supported.
We'll note the tables that we're using and what values are identical based on the column names and what values differ based on the column names. We'll be using these tables throughout these videos. As a note on aliasing tables - the first table always gets a "t1" while the second table always gets a "t2" and so on and so forth (the 5th table would get a "t5"). Finally, consider that some SQL languages may not support the JOINs that we discuss. In first our example, we use a LEFT ANTI JOIN without directly calling it (like we could do in some SQL languages), but by using a null filter on the LEFT JOIN condition. We'll notice that by specifying nulls in our filter, we can get the records in Table1 that don't exist in Table2 - in a sense, we're using these nulls as filters. Let's suppose that we wanted to bypass the null filter in Hive SQL: we could accomplish the same thing just adding a LEFT ANTI JOIN on our condition. This saves us development time because we don't need a WHERE clause when we perform a LEFT ANTI JOIN.
If we were to apply the same logic, but use a different column, notice that our result would be all the records in Table1. This is because with other columns, all the records in Table1 don't exist in Table2. As we see, a big part of what we're doing here is filtering out records when we have two data sets with similarities and differences. If you've ever completed an assignment that involved "compare" and "contrast" then you've done something similar in functionality as you had to identify the similarities and the differences, which is what we're doing here with data.
As for a real world example with LEFT ANTI JOINs, consider a health example where we have populations with different traits, but all that have the same disease. We could use a LEFT ANTI JOIN to filter out these differences. This would possibly allow us to further our analysis (may or may not be useful). The key with any data set where we use a LEFT ANTI JOIN is that must have columns that will be used in our join condition where we do the filtering. In other words, we would filter on trait or multiple traits depending on how we broke down our data in the example. The same applies if we were to use LEFT ANTI JOINs with sales data, housing data, etc. We must have columns where we'll apply our filtering condition.
For more SQL lessons, you can continue to watch the SQL Basics series.