PAssignment 1: F0 and F2 estimation

Getting started


In this assignment you will implement some of the F0 and F2 estimators we covered in class. The assignment will be autograded on gradescope, and you will be able to see your grades after you submit. You are also allowed up to 44 submissions (that’s 1 per day from the time of release until June 1).

To make things more fun, we are also running a public scoreboard! The scoreboard will rank by the amount of memory you use in your solutions (you can be anonymous in this scoreboard). The top teams on the scoreboard will receive bonus credit at the end of the quarter. At this time we are still deciding on the criteria for “top” teams and the amount of bonus credit.

The starter template is here. To set up the coding environment, you’ll need to install python 2.7 along with packages mmh3, numpy, and sortedcontainers. With python installed, you can install these packages with the command (executed in bash for Linux and cmd for Windows):

python -m pip install mmh3 numpy sortedcontainers

After you install these packages, verify that you can run the stream.py python file. You should see the output (some of the numbers may be different):

F0 estimate: 1234
F2 estimate: 91088
Total memory units used: 30005

Implementing the assignment


To complete the assignment, you will have to fill out the implementations of f0_estimator and f2_estimator in stream.py. No other files need to be modified. Also, please do not change any of the file names.

The estimators you implement can be any of the algorithms we’ve seen so far in class (or other low-memory ones you are aware of). The functions you’ll implement must take in a stream of integers (a normal python list) and an error parameter eps. The output is a floating point value for the estimated quantity. Please do not modify the input stream variable (such as sorting it in place). This is a violation of the streaming model.

Using cs368lib

Since we are working in the streaming model, your algorithms must be extremely memory efficient. To track the amount of memory your program is using, you must use the functions provided to you in cs368lib.py. cs368lib.py must not be modified. The cs368lib file provides to you the following functions/classes that you may find useful:

Any memory you use must be tracked using cs368lib. Failure to do so will result in penalization.

Submissions, Grading, and ⚝Scoreboard⚝


Scoreboard names

When you submit to gradescope, please only upload stream.py. Upon submission, you will get a prompt for filling out your scoreboard name. Feel free to choose any name you wish.

Time limits

The judge will run your estimators on a variety of input distributions under a some-what tight time limit. This means that the time complexity of your solution is important as well. Each test case will get 1-10 seconds of run time. If you timeout on a test case, no credit is given for that test case.

Memory limits

The grader also grades you based on the amount of memory you use. If your submission uses too much memory on a test case, no credit is given. The accepted level of memory is min(3n, 20 \log n/eps^2) where n is the length of the stream. You are encouraged to optimize your solution to use even less memory.

Accepted level of error

If your solution does not return something within a (1 ± eps) factor of the exact solution, no credit is given.

Lastly, please do not submit excessively. Each submission takes a few minutes to judge and we only have 1 machine. Instead, test things locally when possible.

Final words



  1. These hash functions use MurmurHash underneath the hood, which is not technically a k-wise independent hash function. However it is fast, and will work in practice. ↩︎