rt-polaritydata.README.1.0.txt
=======
Introduction
This README v1.0 (June, 2005) for the v1.0 sentence polarity dataset comes
from the URL
http://www.cs.cornell.edu/people/pabo/movie-review-data .
=======
Citation Info
This data was first used in Bo Pang and Lillian Lee,
``Seeing stars: Exploiting class relationships for sentiment categorization
with respect to rating scales'', Proceedings of the ACL, 2005.
@InProceedings{Pang+Lee:05a,
author = {Bo Pang and Lillian Lee},
title = {Seeing stars: Exploiting class relationships for sentiment
categorization with respect to rating scales},
booktitle = {Proceedings of the ACL},
year = 2005
}
=======
Data Format Summary
- rt-polaritydata.tar.gz: contains this readme and two data files that
were used in the experiments described in Pang/Lee ACL 2005.
Specifically:
* rt-polarity.pos contains 5331 positive snippets
* rt-polarity.neg contains 5331 negative snippets
Each line in these two files corresponds to a single snippet (usually
roughly one sentence); all snippets are lowercased.
The snippets were labeled automatically, as described below (see
section "Label Decision").
Note: The original source files from which the data in
rt-polaritydata.tar.gz was derived can be found in the subjective
part (Rotten Tomatoes pages) of subjectivity_html.tar.gz (released
with subjectivity dataset v1.0).
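A minimal Python sketch of reading the two data files described above into (snippet, label) pairs, one snippet per line. Several details are assumptions, not statements from this README: the files are decoded as latin-1 because no encoding is specified here, blank lines are skipped, and the label convention (1 for rt-polarity.pos, 0 for rt-polarity.neg) and the function names read_snippets and load_polarity are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: this README does not state the files' encoding. latin-1 maps
# every byte to a character, so decoding never fails on non-UTF-8 bytes.
ENCODING = "latin-1"


def read_snippets(path: Path) -> list[str]:
    """Return one snippet per non-blank line, with surrounding whitespace stripped."""
    with open(path, encoding=ENCODING) as f:
        return [line.strip() for line in f if line.strip()]


def load_polarity(data_dir: str) -> list[tuple[str, int]]:
    """Load (snippet, label) pairs from rt-polarity.pos and rt-polarity.neg.

    Label convention (hypothetical, not from the README): 1 = positive, 0 = negative.
    """
    root = Path(data_dir)
    pos = read_snippets(root / "rt-polarity.pos")
    neg = read_snippets(root / "rt-polarity.neg")
    # Positive snippets first, then negative; shuffle before any train/test split.
    return [(s, 1) for s in pos] + [(s, 0) for s in neg]
```

For example, load_polarity("rt-polaritydata") on the unpacked archive (directory name assumed). Since this README gives 5331 snippets per file, checking that the result has 10662 pairs is a cheap sanity test.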
=======
Label Decision
We assumed that snippets (from Rotten Tomatoes webpages) for reviews marked
``fresh'' are positive, and that those for reviews marked ``rotten'' are
negative.
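The rule above assigns each snippet the label of the review it came from, based on the review's Rotten Tomatoes marker rather than on any annotation of the snippet text itself. A minimal Python sketch of that mapping follows; the names FRESHNESS_TO_POLARITY and polarity_from_freshness are hypothetical, and treating unknown markers as an error is an assumption of this sketch, not something the README specifies.

```python
# Hypothetical names: the README describes the rule, not any code.
FRESHNESS_TO_POLARITY = {"fresh": "positive", "rotten": "negative"}


def polarity_from_freshness(marker: str) -> str:
    """Map a Rotten Tomatoes review marker to the polarity of its snippet."""
    key = marker.strip().lower()
    if key not in FRESHNESS_TO_POLARITY:
        # Assumption: any marker other than fresh/rotten is rejected.
        raise ValueError(f"unknown Rotten Tomatoes marker: {marker!r}")
    return FRESHNESS_TO_POLARITY[key]
```

Under this rule, a "fresh" snippet ends up in rt-polarity.pos and a "rotten" one in rt-polarity.neg, whatever the snippet's own wording.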