PAssignment 3: A (2+ϵ)-approximation for maximum weight matching in the semi-streaming model

Getting started

In this assignment you will implement a recent advancement in the semi-streaming model for computing a maximum weight matching. The assignment will be autograded on gradescope, and you will be able to see your grades after you submit. As in Programming Assignment 1 and 2, you are allowed up to 44 submissions and there will be a public scoreboard!

The starter template is here and the setup process is the same as the previous assignments (here and here).

After you install these packages, verify that you can run the stream.py python file. You should see the output (some of the numbers may be different):

Memory units (greedy): 5062
Matching value: 5866.0

Background

A maximum matching on a graph G = (V, E) is a set of non-adjacent edges which maximize the sum of their weights (a more precise definition can be found here).

Since the introduction of the semi-streaming model in 2005, the problem of computing a 2-approximation to the maximum weight matching has been actively studied. In the offline (normal) model of computing, one can just sort the weights from largest weight down and add edges greedily to the matching to obtain a 2-approximation. Unfortunately this sorting step is not possible in the streaming model.

Numerous attempts were made at the problem: a 5.828-approximation (McGregor) in ‘05, a 5.585-approximation and then a 4.911-approximation (Zelke, and then Epstein) in ‘12, a 4-approximation in ‘14, and then a 3.5-approximation in ‘16 (Grigorescu). Finally, in 2017 a 2-approximation was shown by Paz and Schwartzman, winning them the Best Student Paper and Best Paper award at SODA.

The algorithm is ~10 lines long, and the approach you will be following is a simplified version by Ghaffari and Wajc here: https://arxiv.org/pdf/1701.03730.pdf.

The basic algorithm uses a technique from the ’80s called “local ratio”. The algorithm itself is simple to state:

Repeatedly select an edge e with positive weight; reduce its weight from itself and its neighboring edges; push e into a stack and continue to the next edge, so long as edges with positive weight remain. At the end; pop edges off stack and add the edges greedily to the matching.

Implementing the assignment

To complete the assignment, you will have to fill out the implementations of wm_streamer, wm_improved_streamer in stream.py. No other files need to be modified. Also, please do not change any of the file names.

You will have to implement both the original algorithm of Paz and Schwartzman, as well as the improved algorithm of Mohsen and Wacj. The original algorithm should be implemented in the wm_streamer function, and can be found in Algorithm 1 of the linked paper (see Background). The improved algorithm should be implemented in wm_improved_streamer, and can be found in Algorithm 2 of the linked paper. The two algorithms only differ by 5 lines, so it should be straight-forward to implement the second algorithm once you get the first one working.

Using `cs368lib`

As in previous assignments, your algorithms must be extremely memory efficient. To track the amount of memory your program is using, you must use the functions provided to you in cs368lib.py. cs368lib.py must not be modified.

For this particular assignment, cs368lib.py has been updated with additional functionality. The tracked_list class now tracks memory used for lists of lists correctly. This will be necessary to implement a per-vertex queue for the improved algorithm. Note that the tracked_list can be used as both a queue and a stack via the pop method.

Food for thought

Here are some questions to think about (which you do not have to submit):

Is the improved algorithm actually better than the original algorithm? What do you see in your local experiments when you compare the two algorithms?
Are there simple distributions of weights for which better approximation guarantees can be shown?
Along the same lines, are there more efficient algorithms for different distributions?

Submissions, Grading, and ⚝Scoreboard⚝

Scoreboard names

When you submit to gradescope, please only upload stream.py. Upon submission, you will get a prompt for filling out your scoreboard name. Feel free to choose any name you wish.

Time limits

The judge will run your estimators on a variety of input distributions under a some-what tight time limit. This means that the time complexity of your solution is important as well. Each test case will get 1-10 seconds of run time. If you timeout on a test case, no credit is given for that test case.

Memory limits

The grader also grades you based on the amount of memory you use. If your submission uses too much memory on a test case, no credit is given. The accepted level of memory is min(len(stream), 8*n*log(1/eps)/eps). You are encouraged to optimize your solution to use even less memory.

Accepted level of error

If your solution does not return something within an additive eps F1 error of the exact solution, no credit is given.

Lastly, please do not submit excessively. Each submission takes a few minutes to judge and we only have 1 machine. Instead, test things locally when possible.

Final words

Start early! Please do not leave the programming assignments to the last minute.
For testing your solution, it may help to write random input generators and test locally. To help you test, the default implementation of the estimators returns the exact solution.
If you have suggestions for new problems or improvements to current problems, let me know!