Language Integrated Query or LINQ is a C# feature that allows you to query different data sources using a unified language syntax.
In the first two parts we learned what LINQ is and when it is used, then went through everyday operations: projection, filtering, sorting, set operations, aggregation and so on. If you're not familiar with those, I suggest taking a peek.
In this final part we'll go over methods to generate, partition, compare and join collections. Once again, we'll be working with a List imported from JSON. Here, you can acquire the data as well as learn how to import it into C#.
GENERATORS
Empty Collection
Let's kick things off with creating empty collections. This is one way to declare an empty Students collection:
var emptyCollection = Enumerable.Empty<Student>();
Alternatively you can do this too:
var emptyCollection = new List<Student>();
In this example, imagine you have to return an empty collection from a method. Here is one way to do it:
public IEnumerable<Student> GetStudents()
{
return Enumerable.Empty<Student>();
}
Or declare a new empty List when the return type is List<T>:
public List<Student> GetStudents()
{
return new List<Student>();
}
If you're using .NET 8 (C# 12) or above you can reduce the number of steps by using a collection expression []. This works for both IEnumerable<T> and List<T>:
public IEnumerable<Student> GetStudents()
{
return [];
}
public List<Student> GetStudents()
{
return [];
}
This also works when declaring empty collections:
List<Student> emptyCollection = [];
Range
This operator generates an IEnumerable<int> of sequential numbers. The first argument is the starting value and the second is how many numbers to produce:
IEnumerable<int> fiveNumbers = Enumerable.Range(1, 5);
// [1, 2, 3, 4, 5]
We can also use Range to generate a new collection of students:
IEnumerable<Student> newStudents = Enumerable.Range(1, 3).Select(count =>
{ // creates new Student for each count (1 - 3)
return new Student()
{
ID = count,
Name = "Placeholder",
Country = "Placeholder",
Age = count * 10
};
});
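Because both examples above start at 1, it's easy to read the second argument of Range as an end value. Here is a small sketch (the variable names are mine, just for illustration) where the start is not 1, so the difference is visible:

```csharp
using System;
using System.Linq;

// Range(start, count): begin at 10 and produce 3 values (not "from 10 to 3")
int[] fromTen = Enumerable.Range(10, 3).ToArray();
Console.WriteLine(string.Join(", ", fromTen)); // 10, 11, 12

// Chaining Select turns the counter into any other sequence,
// here the first five even numbers
int[] evens = Enumerable.Range(1, 5).Select(n => n * 2).ToArray();
Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6, 8, 10
```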
Repeat
The Repeat operator generates a sequence that contains one repeated value. For example, let's create five identical students:
var repeatTimes = 5;
var csharpStudent = Enumerable.Repeat(new Student()
{
ID = 1,
Name = "C#",
Country = string.Empty
}, repeatTimes);
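One thing to keep in mind: for reference types, Repeat hands back the same instance every time, not five copies. The sketch below uses a StringBuilder as a stand-in for Student (my substitution, only to keep the snippet self-contained) to show the effect, and one possible way around it:

```csharp
using System;
using System.Linq;
using System.Text;

// StringBuilder stands in for Student; any reference type behaves the same way
var shared = new StringBuilder("C#");
var copies = Enumerable.Repeat(shared, 3).ToList();

// Changing one element is visible through all of them
copies[0].Append(" rocks");
Console.WriteLine(copies[2]); // C# rocks
Console.WriteLine(ReferenceEquals(copies[0], copies[2])); // True

// Range + Select creates a fresh object for every element instead
var distinct = Enumerable.Range(0, 3).Select(_ => new StringBuilder("C#")).ToList();
Console.WriteLine(ReferenceEquals(distinct[0], distinct[2])); // False
```

So if the repeated students need to be edited independently later, generating them with Range and Select is probably the safer choice.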
PARTITION
Take, TakeLast, TakeWhile
The Take operator limits the collection to the specified number of items from the start, while TakeLast does the same from the end.
IEnumerable<string> firstThreeNames = students.Select(s => s.Name).Take(3);
// ["Mirza", "Armin", "Alan"]
IEnumerable<string> lastThreeNames = students.Select(s => s.Name).TakeLast(3);
// ["Eddy", "Abdurahman", "Amy"]
If the number specified is greater than the number of elements in the collection, the Take operator simply returns all the elements (without throwing errors).
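A quick way to convince yourself of that, using a small made-up array instead of the students list:

```csharp
using System;
using System.Linq;

string[] names = { "Mirza", "Armin", "Alan" };

// Asking for more than exists returns everything that is there
string[] firstTen = names.Take(10).ToArray();
Console.WriteLine(firstTen.Length); // 3

// The same holds for TakeLast
string[] lastTen = names.TakeLast(10).ToArray();
Console.WriteLine(lastTen.Length); // 3

// A count of zero yields an empty sequence
Console.WriteLine(names.Take(0).Any()); // False
```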
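The TakeWhile operator returns elements from the start of the collection for as long as the specified condition holds. It stops at the first element that fails the condition and ignores everything after it, which is why we sort by age first. In this example, we limit the collection to students under 20 years old.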
IEnumerable<string> teens = students
.OrderBy(s => s.Age) // order from youngest to oldest
.TakeWhile(s => s.Age < 20) // take younger than 20. ignore the rest
.Select(s => s.Name); // take only names
// ["Mirza", "Farook", "Alan", "Eddy", "Abdurahman"]
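To see why the OrderBy matters, here is a minimal sketch on an unsorted array of made-up ages, comparing TakeWhile with Where:

```csharp
using System;
using System.Linq;

int[] ages = { 18, 19, 25, 17, 30 };

// TakeWhile stops at 25, so the 17 that comes later is never reached
int[] taken = ages.TakeWhile(a => a < 20).ToArray();
Console.WriteLine(string.Join(", ", taken)); // 18, 19

// Where checks every element
int[] filtered = ages.Where(a => a < 20).ToArray();
Console.WriteLine(string.Join(", ", filtered)); // 18, 19, 17

// After sorting, TakeWhile finds all matching elements
int[] sortedTaken = ages.OrderBy(a => a).TakeWhile(a => a < 20).ToArray();
Console.WriteLine(string.Join(", ", sortedTaken)); // 17, 18, 19
```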
Skip, SkipLast, SkipWhile
The Skip operator ignores the specified number of elements from the start of the collection and returns the rest. SkipLast does the same from the end.
IEnumerable<string> lastFive = students.Skip(5).Select(s => s.Name);
// ["Raj", "Nihad", "Eddy", "Abdurahman", "Amy"] (remaining five)
IEnumerable<string> firstFive = students.SkipLast(5).Select(s => s.Name);
// ["Mirza", "Armin", "Alan", "Seid", "Farook"]
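SkipWhile skips elements for as long as the condition holds and returns everything from the first element that fails it. Here we're skipping all the students that are under 20 and keeping the rest.
IEnumerable<string> twentyAndOlder = students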
.OrderBy(s => s.Age)
.SkipWhile(s => s.Age < 20)
.Select(s => s.Name);
// ["Armin", "Raj", "Nihad", "Seid", "Amy"]
The Skip and Take operators are commonly used when creating API pagination.
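As a rough idea of how that pagination could look, here is a minimal sketch. GetPage is a hypothetical helper I made up for this example (it's not part of LINQ), and it assumes page numbers start at 1:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: skip the previous pages, then take one page worth of items
static IEnumerable<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize) =>
    source.Skip((pageNumber - 1) * pageSize).Take(pageSize);

// Ten IDs standing in for ten students
IEnumerable<int> ids = Enumerable.Range(1, 10);

int[] page2 = GetPage(ids, 2, 3).ToArray();
Console.WriteLine(string.Join(", ", page2)); // 4, 5, 6

// The last page is shorter because Take never throws when it runs out of items
int[] page4 = GetPage(ids, 4, 3).ToArray();
Console.WriteLine(string.Join(", ", page4)); // 10
```

In a real API the page number and size would usually come from the query string, and the source should be sorted first so the pages stay stable between requests.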
EQUALITY
SequenceEqual
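The SequenceEqual operator is used to compare two collections to determine if they're equal or not. Two collections are equal when they have the same number of elements and the elements at each position are equal. Let's demonstrate that with a simple example: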
string[] countries = { "Bosnia", "UK", "Turkey" };
string[] countries2 = { "Bosnia", "UK", "Turkey" };
var isEqual = countries.SequenceEqual(countries2);
// true
If we changed the order in either collection, the result would be false.
string[] countries = { "Bosnia", "UK", "Turkey" };
string[] countries2 = { "Turkey", "Bosnia", "UK" };
var isEqual = countries.SequenceEqual(countries2);
// false
Back to the students collection, we can filter out student objects from the same country and compare the results:
// I created this to avoid writing the predicate `s => s.Country == "Bosnia"` twice
Func<Student, bool> isFromBosnia = s => s.Country == "Bosnia";
var studentsFromBosnia = students.TakeWhile(isFromBosnia);
var studentsFromBosnia2 = students.Where(isFromBosnia);
var isEqual = studentsFromBosnia.SequenceEqual(studentsFromBosnia2);
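// true, but only because every student from Bosnia sits at the start of the list:
// TakeWhile stops at the first non-match, while Where scans the whole collection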
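The comparison of TakeWhile and Where depends on how the data is ordered. Here is a small sketch with a made-up array where the matching items are not next to each other, so the two results differ:

```csharp
using System;
using System.Linq;

string[] countries = { "Bosnia", "UK", "Bosnia" };
Func<string, bool> isBosnia = c => c == "Bosnia";

// TakeWhile stops at "UK" and returns a single element
var viaTakeWhile = countries.TakeWhile(isBosnia).ToList();
// Where keeps going and finds both
var viaWhere = countries.Where(isBosnia).ToList();

Console.WriteLine(viaTakeWhile.Count); // 1
Console.WriteLine(viaWhere.Count); // 2
Console.WriteLine(viaTakeWhile.SequenceEqual(viaWhere)); // False
```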
JOINS
In this section we'll look at various ways to combine collections.
Zip
The Zip operator in LINQ pairs elements from two collections based on their positions (indexes). Let's create a new collection that will be paired with the countries collection:
int[] countryCodes = { 387, 44, 20, 90, 91, 86, 1 };
var distinctCountries = students.DistinctBy(s => s.Country).Select(s => s.Country);
Now let's merge the two using Zip:
var countriesMerge = countryCodes.Zip(distinctCountries);
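To show what Zip actually produces, here is a minimal sketch with two short arrays of my own (the second one is deliberately longer). Without a selector the result is a sequence of (First, Second) tuples, and pairing stops as soon as the shorter collection runs out:

```csharp
using System;
using System.Linq;

int[] codes = { 387, 44, 90 };
string[] names = { "Bosnia", "UK", "Turkey", "India" };

// No selector: each pair is a tuple with First and Second
var pairs = codes.Zip(names).ToList();
Console.WriteLine(pairs.Count); // 3 ("India" has no partner and is dropped)
Console.WriteLine(pairs[0]); // (387, Bosnia)

// With a selector we decide what each pair turns into
var labels = codes.Zip(names, (code, name) => $"{name}: +{code}").ToList();
Console.WriteLine(string.Join(" | ", labels)); // Bosnia: +387 | UK: +44 | Turkey: +90
```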
Concat
The Concat operator is used to concatenate (join) multiple collections together.
int[] nums = { 1, 2, 3, 4, 5 };
int[] newNums = { 100, 2, 300, 4, 500 };
var totalNums = nums.Concat(newNums);
The Concat operator seems similar to Union. Both operators join collections. However, there are some differences when using Concat:
- Duplicate elements are not removed
- The order is preserved
- The second collection is added to the end of the first
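Here are the differences side by side, as a small sketch reusing the two number arrays from above:

```csharp
using System;
using System.Linq;

int[] nums = { 1, 2, 3, 4, 5 };
int[] newNums = { 100, 2, 300, 4, 500 };

// Concat keeps everything, including the second 2 and 4
int[] concatenated = nums.Concat(newNums).ToArray();
Console.WriteLine(string.Join(", ", concatenated)); // 1, 2, 3, 4, 5, 100, 2, 300, 4, 500

// Union keeps only the first occurrence of each value
int[] united = nums.Union(newNums).ToArray();
Console.WriteLine(string.Join(", ", united)); // 1, 2, 3, 4, 5, 100, 300, 500
```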
Let's take a collection of students' ages and concatenate it with another, randomly generated collection of ages.
var studentsAges = students.Select(s => s.Age);
var eldersAges = Enumerable.Range(1, 10).Select(_ =>
{
// I'm randomly generating an age from 65 to 99 (the upper bound of Next is exclusive)
var random = new Random();
int minAge = 65;
int maxAge = 100;
// Return the random age as the next element of the eldersAges collection
return random.Next(minAge, maxAge);
});
IEnumerable<int> combinedAges = studentsAges.Concat(eldersAges);
int totalAges = combinedAges.Count(); // 20
SelectMany
In the students.json file we're using, each student object has a Classes property, which holds an array of objects. How can we access those?
{
"ID": 1,
"Name": "Mirza",
"Age": 18,
"Country": "Bosnia",
"Classes": [
{
"ID": 1,
"Title": "CAD"
},
{
"ID": 2,
"Title": "IT"
}
]
}
Bad way
The first choice would be to use the Select() projection operator:
var classesList = students.Select(s => s.Classes);
Since s.Classes is a collection as well, the variable classesList is of type IEnumerable<List<Classes>>. To get the list of titles we need to loop through the outer collection and then through each inner collection:
var classesList = new List<string>();
foreach (var stud in students)
{
foreach (var cl in stud.Classes)
{
classesList.Add(cl.Title);
}
}
However, there is a simpler way.
Better way
Using the SelectMany() projection operator we can drill into the inner array with ease.
IEnumerable<string> classTitles = students
.SelectMany(s =>
s.Classes.Select(c => c.Title) // the inner lambda needs its own parameter name
);
SelectMany() acts like a join between the outer and inner array.
If we expand our student object with a new Hobbies property that contains a List:
public class Student {
// ... (existing properties)
public List<string> Hobbies { get; set; } = [];
}
And then add a few to one of our students:
students.First().Hobbies = new List<string> { "Games", "Hiking", "Blogging" };
We can easily extract them once again with SelectMany():
var hobbies = students.SelectMany(s => s.Hobbies);
As opposed to doing:
var hobbiesList = new List<string>();
foreach (var stud in students)
{
foreach (var hob in stud.Hobbies)
{
hobbiesList.Add(hob);
}
}
Alternative SelectMany
SelectMany() also has an overload that accepts two parameters:
- The first is again a function that selects the inner collection we're trying to extract
- The second is a function that receives each element of the original collection together with each element of its inner collection, and returns the result
var data = students.SelectMany(
s => s.Hobbies,
// element of the original collection, element of the inner collection
(student, hobby) => ...
);
Let's use this to create a combination of student names and hobbies:
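var studentHobbies = students.SelectMany(
s => s.Hobbies,
// the parameter can't be called hobbies if the variable on the left has that name
(student, hobby) => new { Name = student.Name, Hobby = hobby }
);
The output is the name of the student followed by their hobby, one entry per hobby.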
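Here is a self-contained sketch of that overload. The students and hobbies are sample data I made up, and I'm using tuples instead of the Student class to keep it short:

```csharp
using System;
using System.Linq;

// Made-up sample data: one student has no hobbies at all
var students = new[]
{
    (Name: "Mirza", Hobbies: new[] { "Games", "Hiking" }),
    (Name: "Armin", Hobbies: Array.Empty<string>()),
    (Name: "Alan", Hobbies: new[] { "Chess" }),
};

// The result selector runs once for every (student, hobby) combination
var lines = students
    .SelectMany(s => s.Hobbies, (student, hobby) => $"{student.Name}: {hobby}")
    .ToList();

Console.WriteLine(string.Join(" | ", lines)); // Mirza: Games | Mirza: Hiking | Alan: Chess
```

Note that Armin doesn't show up at all: a student with an empty inner collection produces no rows, much like an inner join.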
Join
The Join operator is used to create a combination of two collections. For this example I created a new countries collection that we'll join with the students collection.
public class Country
{
public int ID { get; set; }
public string Name { get; set; }
public string CapitalCity { get; set; }
public string Continent { get; set; }
}
var countries = new List<Country>
{
new Country() { ID = 1, Name = "Bosnia", CapitalCity = "Sarajevo", Continent = "Europe" },
new Country() { ID = 2, Name = "UK", CapitalCity = "London", Continent = "Europe" },
new Country() { ID = 3, Name = "Egypt", CapitalCity = "Cairo", Continent = "Africa" },
new Country() { ID = 4, Name = "Turkey", CapitalCity = "Ankara", Continent = "Asia" },
new Country() { ID = 5, Name = "India", CapitalCity = "New Delhi", Continent = "Asia" },
new Country() { ID = 6, Name = "China", CapitalCity = "Beijing", Continent = "Asia" },
new Country() { ID = 7, Name = "USA", CapitalCity = "Washington", Continent = "North America" },
// Countries below have no students:
new Country() { ID = 8, Name = "Croatia", CapitalCity = "Zagreb", Continent = "Europe" },
new Country() { ID = 9, Name = "Serbia", CapitalCity = "Belgrade", Continent = "Europe" },
};
All students have a country property and we'll use that to link the two collections. Here is a basic join:
var studentsCountriesJoin = students.Join(
countries,
student => student.Country,
country => country.Name,
((student, country) => ( student: student.Name, continent: country.Continent ))
);
Let's clarify what happened here.
- students is the outer collection that is joining the inner collection (countries). That's the part students.Join(countries)
- Then we determine on what property we are going to join the two. We join the two on the country name:
student => student.Country,
country => country.Name,
// SQL equivalent
ON Student.Country = Country.Name
- Then we receive each matching pair:
(student, country)
- And then we decide what we're going to return. In this case it's a tuple with two named elements, student and continent:
( student: student.Name, continent: country.Continent )
The outcome of the join is a collection of (student, continent) tuples, one for each student whose country has a match in the countries collection.
LINQ also allows joining by multiple properties as well as applying multiple Joins. More on that in the video.
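As a taste of joining by multiple properties, here is a minimal sketch. The City and Campus fields and the campuses collection are hypothetical, invented only for this example. The idea is to wrap the properties into an anonymous type on both sides, with the same property names and order:

```csharp
using System;
using System.Linq;

// Hypothetical data: City and Campus do not exist in students.json
var students = new[]
{
    (Name: "Mirza", Country: "Bosnia", City: "Sarajevo"),
    (Name: "Amy", Country: "UK", City: "Leeds"),
};
var campuses = new[]
{
    (Country: "Bosnia", City: "Sarajevo", Campus: "Main"),
    (Country: "UK", City: "London", Campus: "North"),
};

// Both key selectors build the same anonymous type, so keys compare by value
var matches = students.Join(
    campuses,
    s => new { Country = s.Country, City = s.City },
    c => new { Country = c.Country, City = c.City },
    (s, c) => $"{s.Name} -> {c.Campus}").ToList();

Console.WriteLine(string.Join(" | ", matches)); // Mirza -> Main
```

Amy is left out because only her country matches, not her city.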
Join & Group
Let's again join students and countries and then group students by continents they're from. The desired structure will look like:
{
"Europe": [List of students where continent is "Europe"],
"Africa": [List of students where continent is "Africa"],
...
}
var studentsByContinents = students
.Join(
countries,
student => student.Country,
country => country.Name,
// Note we do not need to specify { Name = student.Name, Continent = country.Continent }
// C# will do that for us
((student, country) => new { student.Name, country.Continent })
)
// Now comes the grouping part
.GroupBy(g => g.Continent)
.ToDictionary(
// The continent is the key
g => g.Key,
// Value is the list of student names
g => g.Select(sc => sc.Name).ToList());
GroupJoin
The GroupJoin operator is used to group elements from the second sequence (right side) that match each element from the first sequence (left side). It produces a hierarchical result set.
To get started, let's look again at our join of students and countries.
var studentsCountriesJoin = students.Join(
countries,
student => student.Country,
country => country.Name,
(student, country) => new { student.Name, Country = country.Name, country.Continent }
);
We know that the output here is going to be a collection containing a student name plus the country and the continent that belong to that student. Now let's see the GroupJoin:
var studentsGroupedByCountry = countries.GroupJoin(
students,
country => country.Name,
student => student.Country,
(country, studentsGroup) =>
new { country.Continent, Country = country.Name, Students = studentsGroup.Select(s => s.Name) }
);
Let's analyze what happened here:
- First of all, we can see that we grouped students by country while joining
- Second, Students is represented as a collection with a Count property, not the actual student names
- But most importantly, we have countries without students. There are no students from the countries at the bottom of the list and GroupJoin() is indicating that.
In SQL terms,
- The Join() operator represents the INNER JOIN as it produces the result where only the matching elements from both sequences are included (only the students and the countries with students).
- The GroupJoin() operator behaves like the LEFT OUTER JOIN as it produces all records from the outer sequence (countries), together with the matching records from the inner sequence (students). As we can see all countries are in the result, even those without matching students.
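GroupJoin returns one entry per country with its students nested inside. If you want flat rows like a SQL LEFT OUTER JOIN would give you, one common approach is to follow it with SelectMany and DefaultIfEmpty. A minimal sketch, with made-up sample data and a "(no students)" placeholder of my choosing:

```csharp
using System;
using System.Linq;

// Made-up sample data; Croatia has no students
string[] countries = { "Bosnia", "UK", "Croatia" };
var students = new[]
{
    (Name: "Mirza", Country: "Bosnia"),
    (Name: "Armin", Country: "Bosnia"),
    (Name: "Amy", Country: "UK"),
};

var rows = countries
    .GroupJoin(
        students,
        country => country,
        student => student.Country,
        (country, matched) => new { Country = country, Names = matched.Select(s => s.Name) })
    // DefaultIfEmpty supplies a placeholder so countries without students keep a row
    .SelectMany(
        x => x.Names.DefaultIfEmpty("(no students)"),
        (x, name) => $"{x.Country}: {name}")
    .ToList();

foreach (var row in rows) Console.WriteLine(row);
// Bosnia: Mirza
// Bosnia: Armin
// UK: Amy
// Croatia: (no students)
```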
If we applied grouping again, we would get essentially the same continents and students we had above with Join & Group, this time as a list of objects rather than a dictionary:
var studentsGroupedByCountry = countries.GroupJoin(
students,
country => country.Name,
student => student.Country,
(country, studentsGroup) => new { country.Continent, Students = studentsGroup.Select(s => s.Name) }
);
var groupedByContinent = studentsGroupedByCountry
.GroupBy(x => x.Continent)
.Select(g => new
{
Continent = g.Key,
Students = g.SelectMany(x => x.Students).ToList()
})
.ToList();
Wrapping Up
That's all I wanted to share on LINQ. If you learned something new, don't forget to hit the follow button. Also, follow me on Twitter to stay up to date with my upcoming content.
Bye for now 👋