# Interview question: merging intervals ## Let's write a calendar app Recently I stumbled upon a very common interview question (I think it can be found both in AlgoExpert database as explained in this [mock interview](https://www.youtube.com/watch?v=3Q_oYDQ2whs)) and CoderPad examples database. The gist of the problem is that you're asked to write a calendar app, which given two calendars will return a list of free time slots between the two. Each calendar is a list of meetings. As an example, let's have a look on the following: Calendar1: {1, 2}, {4, 8}, {10, 11} Calendar2: {3, 5}, {9, 10} A meeting in each calendar is described as a `{start, end}` tuple. The timestamps are simplified and denote 30 minutes intervals counting from 9:00am. So `{0, 1}` describes a meeting from 9:00am up to 9:30am. `{3, 5}` is a meeting starting at 10:30am and ending at 11:30am. ### First approach First trivial approach (and probably most intuitive initially) can be summarised by the following steps: 1. Merge both calendars (that's easy, list of meetings in Calendar2 can just be appended to Calendar1), 2. Detect overlapping/adjacent ranges and merge them (i.e. `{3,5}` and `{4,8}` are overlapping, both parties are occupied at that time, it can be merged to `{3,8}` meeting), 3. Return the list of free time slots within the resulting calendar. Point 1 is obvious, the two calendars merged are {1, 2} {4, 8} {10, 11} {3, 5} {9, 10} To detect overlapping ranges the list has to be sorted by start time, after which detection of interval overlap becomes trivial as well: {1, 2} {3, 5} {4, 8} {9, 10} {10, 11} ```C++ isOverlapping = (start1 > start2 && start1 <= end2) || (start2 > start1 && start2 <= end1) ``` After applying the above predicate, the list is reduced to: {1, 2} {3, 8} {9, 11} ... which is the answer I'm looking for in point 2. Point 3 is simple as well. Another pass is required to return the list of free time slots: {0, 1} {2, 3} {8, 9} {11, 12} The above list represents time slots within a day when two parties are available and a meeting can be scheduled. The implementation would like something along the lines: ```C++ #include #include #include // 30 minutes divisions counting from 9:00am using TimeQuant = int; using Meeting = std::pair; constexpr bool isOverlapping(const Meeting& a, const Meeting& b) { return ((a.first > b.first && a.first < b.second) || (b.first > a.first && b.first <= a.second)); } Meeting operator+(const Meeting& a, const Meeting& b) { Meeting merged; merged.first = std::min(a.first, b.first); merged.second = std::max(a.second, b.second); return merged; } std::vector mergeOverlaps(std::vector meetings) { std::sort(meetings.begin(), meetings.end(), [](auto a, auto b){ return a.first < b.first; }); std::vector merged{}; for (std::size_t i = 1; i < meetings.size(); ++i) { if (isOverlapping(meetings[i - 1], meetings[i])) { merged.push_back(meetings[i - 1] + meetings[i]); i++; } else { merged.push_back(meetings[i - 1]); } } return merged; } std::vector computeFreeTimeSlots(std::vector meetings) { std::vector results; TimeQuant endpoint = 0; for (const auto& m : meetings) { if (m.first > endpoint) { results.emplace_back(endpoint, m.first); } endpoint = m.second; } if (meetings.back().second < 12) { results.emplace_back(meetings.back().second, 12); } return results; } int main(int argc, const char *argv[]) { std::vector cal1{ Meeting{1, 2}, Meeting{4, 8}, Meeting{10, 11}, }; std::vector cal2{ Meeting{3, 5}, Meeting{9, 10}, }; std::vector meetings(cal1.begin(), cal1.end()); meetings.insert(meetings.end(), cal2.begin(), cal2.end()); auto merged = mergeOverlaps(meetings); auto freeSlots = computeFreeTimeSlots(merged); for (const auto& m : freeSlots) { std::cout << "[" << m.first << "," << m.second << "]" << std::endl; } return 0; } ``` Since this approach requires sorting, which costs `O(nlogn)`, and the remaining steps are `O(n)`, the overall time complexity is `O(nlogn)`. The code above works and as an interview answer is probably good enough however, it's kinda lame and proves that the candidate potentially has no familiarity of set theory. ### Second interation The problem can be reduced to a series of set operations. Each calendar after all is just a set of intervals. Having said so, steps which I previously formalised can be entirely replaced by the following set operations: 1. set merge; to combine both sets, 2. set union; to merge overlapping intervals, 3. set difference; to compute the list of free time slots. First question that comes to mind is if it's possible to use STL algorithms on a `std::set`? To the best of my knowledge, I'm convinced that it's not possible (not entirely, to be precise). The problem is the second step. `std::set_union` performs elements comparison to determine if the element is a member in both sets or not, the problem here is that it's insufficient because to merge overlapping intervals two (or more) intervals have to be combined into one and it's impossible to do that with any comparison predicate. For now, I'm gonna continue with manual approach. First two steps can be combined into one. The answer I'm looking for is: [ {1, 2}, {4, 8}, {10, 11} ] + [ {3, 5}, {9, 10} ] = [ {1, 2}, {3, 8}, {9, 11} ] This is depicted on the below diagram: ![set union](/images/set_union.png) I'll build on top of that diagram when describing an algorithm operating on sets. All of the most common operations (union, difference, intersection) can be computed in linear time (assuming that the input range is sorted). The algorithm is similar (and this is, in fact, how I learned about it) to the classic CS curicculum problem, where given an input string, you're required to check if it contains a balanced set of parentheses. In that algorithm a counter is maintaned. It's incremented on each '(' and decremented on each ')'. At the end of the string, if the value of a counter is zero, the string contains balaned set of parentheses. For, the set operations algorithm a counter is needed as well. First, the interval endpoints have to be flattened and sorted. I'm gonna store these as a tuples. The first element will contain the interval endpoint value, the second is a flag (indicating if this is interval start (if true) or its end (otherwise)). As a result both input calendars will be converted to the following sorted flat list: [ (1, true), (2, false), (3, true), (4, true), (5, false), (8, false), (9, true), (10, false), (10, true), (11, false) ] Now, just like with the parenthesis algorithm, I'm gonna maintain a counter. The counter is incremented on interval `start` endpoint and decremented on interval `end` endpoint. The trick is to track the counter value. When the counter value is "1", the current interval defines interval difference (complement). If it's "2", it'll be interval intersection. It's probably visible better on the diagram below: ![set complement & intersection calculation algorithm](/images/set_ops.png) That's a pretty cool trick, at least in my book! There's one more thing to deal with. What about merging overlapping ranges? Same algorithm can be adapted to solve that problem too. Whenever the counter's value is greater or equal to one, the overlapping interval set has begun. When the counter drops to zero, the interval has ended. Perhaps it's clearer on the diagrams: ![overlapping intervals merge](/images/set_union1.png) ... and a non-overlapping case: ![non-overlapping intervals merge](/images/set_union2.png) Right! It's time to code it down. I'll need only one function (well almost but I'll come back to that later): ```C++ enum class SetOp { Difference, Intersection, Union, }; std::vector setOp(std::vector a, std::vector b, SetOp op) { std::vector results; // join both interval ranges a.insert(a.end(), b.begin(), b.end()); // true indicates start of interval using Endpoint = std::pair; std::vector flatten; // convert a list of Meetings (from board calendars) into a list of endpoints std::for_each(a.begin(), a.end(), [&](const auto& p){ flatten.emplace_back(p.first, true); flatten.emplace_back(p.second, false); }); // and sort the list std::sort(flatten.begin(), flatten.end(), [](auto a, auto b){ return a.first < b.first; }); int d1 = 0, d2 = 0; std::vector endpoints; auto trackState = op == SetOp::Difference ? 1 : 2; for(const auto& e : flatten) { d2 = d1; // increment the counter on starting endpoint, decrement on ending d1 += (e.second * 2) - 1; switch (op) { case SetOp::Difference: case SetOp::Intersection: if (d1 == trackState || d2 == trackState) { endpoints.push_back(e); } break; case SetOp::Union: if ((d1 == 1 && d2 == 0) || (d1 == 0 && d2 == 1)) { endpoints.push_back(e); } break; } // once two endpoints are collected it's possible to recreate an interval out of them if (endpoints.size() == 2) { results.emplace_back(endpoints[0].first, endpoints[1].first); endpoints.clear(); } } return results; } ``` I've tried to comment the code so, it's self explanatory. A couple of extra comments on it. Two sets of intervals are the input for this function along with an enum describing what to do with them. The magic happens in the `switch` case. `d1` is the counter tracking how many starting and ending interval endpoints has been seen so far and `d2` is just its copy from prior iteration - this is needed to determine if transition happened; the important thing to remember is that the counter value is an interesting bit, the transitions are only important to collect the endpoints when the counter value has been attained. As described earlier, if the counter `d1 = 1` the resulting tracked interval resembles a set difference. If it's `d1 = 2`, the resulting interval is an intersection and if `d1 > 1`, it'll merge all overlapping intervals from both sets into one in the result set. Just for reference, the complete code is below: ```C++ #include #include #include // 30 minutes divisions counting from 9:00am using TimeQuant = int; using Meeting = std::pair; enum class SetOp { Difference, Intersection, Union, }; std::vector setOp(std::vector a, std::vector b, SetOp op) { std::vector results; // join both interval ranges a.insert(a.end(), b.begin(), b.end()); // true indicates start of interval using Endpoint = std::pair; std::vector flatten; std::for_each(a.begin(), a.end(), [&](const auto& p){ flatten.emplace_back(p.first, true); flatten.emplace_back(p.second, false); }); std::sort(flatten.begin(), flatten.end(), [](auto a, auto b){ return a.first < b.first; }); int d1 = 0, d2 = 0; std::vector endpoints; auto trackState = op == SetOp::Difference ? 1 : 2; for(const auto& e : flatten) { d2 = d1; // increment the counter on starting endpoint, decrement on ending d1 += (e.second * 2) - 1; switch (op) { case SetOp::Difference: case SetOp::Intersection: if (d1 == trackState || d2 == trackState) { endpoints.push_back(e); } break; case SetOp::Union: if ((d1 == 1 && d2 == 0) || (d1 == 0 && d2 == 1)) { endpoints.push_back(e); } break; } if (endpoints.size() == 2) { results.emplace_back(endpoints[0].first, endpoints[1].first); endpoints.clear(); } } return results; } std::vector mergeAdjacent(std::vector meetings) { std::vector results; for (const auto& m : meetings) { if (!results.empty() && results.back().second == m.first) { results.back().second = m.second; } else { results.push_back(m); } } return results; } int main(int argc, const char *argv[]) { std::vector cal1{ Meeting{1, 2}, Meeting{4, 8}, Meeting{10, 11}, }; std::vector cal2{ Meeting{3, 5}, Meeting{9, 10}, }; auto calsMerged = mergeAdjacent(setOp(cal1, cal2, SetOp::Union)); auto freeSlots = setOp({{0, 12}}, calsMerged, SetOp::Difference); for (const auto& m : freeSlots) { std::cout << "[" << m.first << "," << m.second << "]" << std::endl; } return 0; } ``` You may have noticed that there's an extra function `mergeAdjacent`, why is it needed? It does exactly what it says, if within the given set there are two neighbouring intervals one ending and, at the same time, one starting at the same point, the original algorithm won't be able to merge these and that's fine since no overlap occurs. For the purpose of this exercise such events have to be coalesced into one hence this function which is just doing another run through the results set and merges any intervals which are next to each other. ### Third iteration Can the problem be solved in a more fancier way though? Enter [interval tree](https://en.wikipedia.org/wiki/Interval_tree). In a nutshell, an interval tree stores intervals and allows to answer the following queries in an optimal time complexity manner: - find(I) -> []I: given interval I, return all overlapping intervals, - find(P) -> []I: given point P, return all intervals that contain P. Conveniently it just so happens that boost's [ICL](https://www.boost.org/doc/libs/1_68_0/libs/icl/doc/html/index.html) library contains the implementation of an [interval tree](https://www.boost.org/doc/libs/1_64_0/libs/icl/doc/html/boost/icl/interval_set.html). The examples are a bit more compelling, especially [this one](https://www.boost.org/doc/libs/1_68_0/libs/icl/doc/html/boost_icl/examples/interval_container.html). Using these as a reference it's possible to solve this problem the following way: ```C++ #include #include // 30 minutes divisions counting from 9:00am using TimeQuant = int; using Meeting = boost::icl::interval::type; int main(int argc, const char *argv[]) { boost::icl::interval_set cal1, cal2; boost::icl::interval_set entireDay; cal1.insert(Meeting{1, 2}); cal1.insert(Meeting{4, 8}); cal1.insert(Meeting{10, 11}); cal2.insert(Meeting{3, 5}); cal2.insert(Meeting{9, 10}); entireDay.insert(Meeting{0, 12}); std::cout << entireDay - (cal1 + cal2) << std::endl; return 0; } ``` ... and that's it! Pretty amazing isn't it? Of course, this is cheating a bit since it's just using an off the shelf implementation and for sure that wouldn't be a satisfactory answer during and interview. For any production purpose this is the way to go though. Reliable, optimal and tested. Certainly I enjoyed working with sets, inspired by this small toy problem and hope you had too.