The DEC Systems Research Center ran the EachMovie recommendation service for 18 months to experiment with a collaborative filtering algorithm. During that time, some 72916 users entered a total of 2811983 numeric ratings for 1628 different movies (films and videos). We are making this preference data set available, with all user identification removed, so that other collaborative filtering researchers can use it to test their algorithms.
If you are interested in the design of our system, you can read the Each to Each Programmer's Reference Manual written by Paul McJones and John DeTreville.
Copyright © Digital Equipment Corporation 1997.
The preference data set was compiled by Digital Equipment Corporation using our collaborative filtering technology. Digital is making the data set available for use under the terms that apply to this Digital web site (see Legal) including the following terms:
1. All information is provided "AS IS". Digital makes no warranties or representations with respect to the completeness or accuracy of the information or otherwise. DIGITAL DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. In no event shall Digital be liable for damages, and in particular Digital shall not be liable for special, indirect, consequential, or incidental damages, or damages for lost profits, loss of revenue, or loss of use, arising out of or related to the information or the use or dissemination thereof, whether such damages arise in contract, negligence, tort, under statute, in equity, at law or otherwise.
3. The user may use the information only for research purposes which are non-commercial and non-revenue bearing. Any published research results or other publications resulting from use of the information shall credit Digital Equipment Corporation as the provider of the data. The user agrees to provide Digital with a copy of any such publication using any of the contact names provided at this web site. The user may make copies of the data set as needed for internal use only for the preceding purposes. All such copies shall duplicate Digital's copyright notice and this notice.
The data set is available as eachmoviedata.tar.gz (zipped tab-separated-value text files, 17632000 bytes compressed). There are three tables, one per file:
IMDb URLs are provided by courtesy of Internet Movie Database.
The theater and video status and release dates were (approximately) correct in the San Francisco bay area as of September 15, 1997, when EachMovie was terminated.
Score is the rating provided by this person for this movie. The zero-to-five star rating used externally on EachMovie is mapped linearly to the interval [0,1]. Here's a histogram of the Score values:
Score   Count
0       347191
0.2     150495
0.4     339718
0.6     701236
0.8     761676
1.0     511667
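As a rough illustration of the linear mapping, the sketch below converts between star ratings and Score values and sums the histogram counts, which should equal the total number of ratings quoted at the top of this page. The names SCORE_COUNTS, stars_to_score and score_to_stars are hypothetical helpers introduced here, not part of the data set or of the EachMovie code.

```python
# Histogram of Score values, copied from the table above.
SCORE_COUNTS = {
    0.0: 347191,
    0.2: 150495,
    0.4: 339718,
    0.6: 701236,
    0.8: 761676,
    1.0: 511667,
}


def stars_to_score(stars: int) -> float:
    """Map a zero-to-five star rating linearly onto [0, 1]."""
    if not 0 <= stars <= 5:
        raise ValueError("stars must be between 0 and 5")
    return stars / 5


def score_to_stars(score: float) -> int:
    """Invert the mapping; rounding absorbs floating-point noise."""
    return round(score * 5)


# The counts should add up to the total number of ratings in the data set.
total_ratings = sum(SCORE_COUNTS.values())

# Count-weighted mean Score over all ratings (includes "sounds awful" zeros).
mean_score = sum(s * c for s, c in SCORE_COUNTS.items()) / total_ratings
```

Note that the zero bucket mixes genuine zero-star ratings with "sounds awful" declarations, so a mean computed this way is only a coarse summary.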
Weight is only relevant in the case of a Score of zero, in which case it distinguishes whether the person rated a movie as zero stars (weight = 1) or "sounds awful" (weight < 1). (Most "sounds awful" weights are 0.2, but for historical reasons about 10% are 0.5.) The idea behind "sounds awful" was to let a user indicate he never planned to see a movie (hence we would omit it from future lists of predictions). Our collaborative filtering algorithm treated such a declaration as less authoritative than a regular rating of zero stars.
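One way to read a (Score, Weight) pair under the rules just described is sketched below. The function name describe_rating and its label strings are assumptions made for illustration; this is not how our collaborative filtering algorithm used the weights, only a way to separate the two kinds of zero.

```python
def describe_rating(score: float, weight: float) -> str:
    """Label one rating; Weight matters only when Score is zero."""
    stars = round(score * 5)  # undo the linear map onto [0, 1]
    if stars == 0:
        # weight = 1 is a real zero-star rating; weight < 1 is "sounds awful".
        return "sounds awful" if weight < 1 else "0/5 stars"
    return f"{stars}/5 stars"
```

For example, describe_rating(0.0, 0.2) and describe_rating(0.0, 0.5) both yield "sounds awful", while describe_rating(0.0, 1.0) yields "0/5 stars". A researcher who wants only explicit ratings could drop the rows labelled "sounds awful".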
Given our site design, there is no way to know whether the person had seen the movie in a theater or on video.
If you have read the terms above, and agree to them, contact
1 650 853-2166
Compaq Systems Research Center
130 Lytton Avenue
Palo Alto, CA 94301
by telephone or email. He will give you a password for downloading the data. You may also send copies of your publications involving this data (see term 3 above) to Steve.