[We'd like to thank Paul Jakus for this analysis of recent Phish.net ratings. Coincidentally, we've been analyzing ratings with him for a future blog series digging deeper into how Phish fans rate shows. Stay tuned for more on ratings soon! —Ed.]
At 3:42 p.m. on January 3, 2024, the ratings function of Phish.Net was disabled due to unusual patterns in ratings behavior. Here we’ll explain those patterns, but first let’s establish what a “normal” Holiday Run ratings pattern looks like.
For comparison, let’s look at ratings submitted from 1:00 a.m. January 1, 2023 through 3:42 p.m. January 3, 2023 (a time period matching that of the 2023/24 NYE Run). Some 1,004 ratings were submitted over nearly 63 hours, covering 116 different shows. Of these ratings, 838 (84%) were for the four holiday shows, leaving 166 ratings to be spread across the remaining 112 non-holiday shows. No non-holiday show received more than six new ratings.
So, what happened after the 2023/24 Run? Read on for more.
The first rating for 12/31/23 was at 1:12 a.m. on January 1, 2024. A total of 3,779 ratings were submitted for 442 different shows until ratings were suspended at 3:42 p.m. on January 3. Only 2,103 of these ratings (56%) were for the 2023/2024 NYE Run, leaving 1,676 ratings spread over the remaining 438 shows.
There is no doubt that something odd happened.
Now let’s take a closer look at the ratings for the Gamehendge show (Figure 1). Nearly 1,800 ratings were posted, with a very high proportion of ‘5’ ratings (87%), neither of which would be unexpected for an instant-classic performance. Almost 7% of ratings were a ‘1’, though, which was a bit higher than the norm for all modern era shows (5.8%). Peculiar, yes, but nothing that immediately stood out as highly unusual.
But if nothing was obviously wrong with ratings for the Gamehendge show, why suspend the ratings function?
Recall the same time period after the 2022/23 run: we saw lots of ratings for the Holiday shows and relatively little activity on non-holiday shows. Things were different this year.
In 2024, newly submitted ratings of notably historic shows were pervasive, with more than 30 older shows receiving at least 10 new ratings in the first three days of the New Year. For example, here are all the older shows with 30 or more new ratings (and the ratings distribution) during the January 1-3, 2024 period.
Table 1: Older Shows and Newly Submitted Ratings
| Date | Location | # of Ratings (Before*) | Average (Before*) | # of New Ratings (After*) | Average of New Ratings (After*) |
| --- | --- | --- | --- | --- | --- |
| 12/31/1999 | Big Cypress, FL | 1,325 | 4.767 | 259 | 4.232 |
| 12/30/1997 | MSG | 770 | 4.700 | 69 | 3.783 |
| 11/22/1997 | Hampton, VA | 854 | 4.678 | 55 | 3.873 |
| 4/3/1998 | Nassau, NY | 832 | 4.669 | 49 | 3.932 |
| 8/2/2003 | Limestone IT | 490 | 4.680 | 39 | 3.769 |
| 12/31/1995 | MSG | 942 | 4.634 | 37 | 4.297 |
| 8/17/1997 | Limestone Went | 650 | 4.652 | 33 | 4.030 |
| 8/22/2015 | Watkins Glen Magnaball | 1,648 | 4.649 | 30 | 3.233 |

*Before/After 1:12 a.m. EST, through 3:42 p.m. January 3
It’s clear that Big Cypress was the primary historic show affected, but it was definitely not the only one. Every show in the table above was well-known, highly rated, and listed in The Phish Companion as a “Top 100” performance.
Did all of these performances really get perceived as “worse” starting on January 1?
Let’s take a closer look at ratings for Big Cypress, 12/31/1999. The 259 post-January 1, 2024 ratings happened within about 51 hours. For comparison, the previous 259 ratings for this show were submitted over 1,336 days. The number of new ratings for this show was highly unusual.
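How lopsided is that pace? A back-of-the-envelope comparison, using only the figures quoted above, makes the anomaly concrete:

```python
# Back-of-the-envelope check using only the figures quoted above:
# 259 ratings in ~51 hours vs. the previous 259 ratings over 1,336 days.
new_rate = 259 / 51            # ratings per hour after Jan 1, 2024
old_rate = 259 / (1336 * 24)   # ratings per hour before Jan 1, 2024
print(round(new_rate / old_rate))  # 629: roughly 630x the prior pace
```

In other words, this 24-year-old show was suddenly accumulating ratings about 630 times faster than it had over the previous three and a half years.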
Table 2 depicts the number and percentage of each rating for NYE Big Cypress, before and after January 1.
Table 2: Big Cypress (12/31/1999), Before and After January 1, 2024
| Rating | Before January 1, 2024* | After January 1, 2024* |
| --- | --- | --- |
| 1 | 51 (3.85%) | 42 (16.22%) |
| 2 | 7 (0.53%) | 3 (1.16%) |
| 3 | 14 (1.06%) | 4 (1.54%) |
| 4 | 56 (4.23%) | 14 (5.41%) |
| 5 | 1,197 (90.34%) | 196 (75.68%) |

*Before/After 1:12 a.m. EST
First, the 12/31/99 ratings distribution looks remarkably similar to that of the 2023 NYE Gamehendge show, except with fewer “1s.” Second, the ratings distribution for the post-January 1 period is markedly different. Is it reasonable to expect 259 ratings to pour in for a show that’s 24 years old, in a matter of hours, and with such different ratings?
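That “markedly different” can be quantified. Here’s a minimal sketch (standard-library Python, not the site’s actual analysis) of a chi-square test of homogeneity on the Table 2 counts:

```python
# Hypothetical sketch (not the site's actual analysis): a chi-square
# test of homogeneity on the Table 2 counts for 12/31/1999.
before = {1: 51, 2: 7, 3: 14, 4: 56, 5: 1197}  # ratings before Jan 1, 2024
after  = {1: 42, 2: 3, 3: 4,  4: 14, 5: 196}   # ratings after Jan 1, 2024

def chi_square(a, b):
    """Chi-square statistic for whether two count distributions
    could plausibly come from the same underlying population."""
    n_a, n_b = sum(a.values()), sum(b.values())
    grand = n_a + n_b
    stat = 0.0
    for star in a:
        col = a[star] + b[star]
        exp_a = col * n_a / grand   # expected count if the two periods matched
        exp_b = col * n_b / grand
        stat += (a[star] - exp_a) ** 2 / exp_a
        stat += (b[star] - exp_b) ** 2 / exp_b
    return stat

# With 4 degrees of freedom, anything above ~9.49 rejects "same
# distribution" at the 5% level; these counts land far above that.
print(round(chi_square(before, after), 1))
```

The statistic comes out well past the critical value, so the before/after difference is not the kind of variation you’d expect from the same population of raters.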
A nice feature of the Phish.Net ratings function is that it generates debate among fans, but its primary purpose is to help guide fans, both new and old, through a large body of recorded music. Given this primary goal, and given what appears to be manipulation of numerous highly-regarded shows, it was decided to suspend the ratings function.
The ratings function will return, but Phish.Net administrators are currently mulling possible changes to how ratings are calculated and presented in the future. First, though, it’s necessary to answer the key question: do the ratings submitted this week reflect an honest re-appraisal of historic shows, or were they submissions designed to affect show rankings?
If you liked this blog post, one way you could "like" it is to make a donation to The Mockingbird Foundation, the sponsor of Phish.net. Support music education for children, and you just might change the world.
While you're under the hood, please also consider how to respond to the one-star bot bombs that have affected shows the last year or two, as articulated by @abuani in this thread.
Now we have people gaming a ratings system. To what end I can't even guess.
BTW, I was not involved in the decision to suspend the ratings (I'm simply a user, not an admin). In fact, I came to .Net on Wednesday afternoon just to look at the NYE ratings and saw they'd been locked. I sent a message to @sethadam1 and @Lemuria Thursday morning and asked what was happening. That's how I came to look at the new data.
1. Mitigate reactionary ratings: add a cooling off period before show ratings open. Say, a week. Maybe even a month.
2. Institute "one person one rating": include only *verified* user accounts in the average ratings tabulations to limit the number of users with multiple .net accounts rating the same show multiple times.
3. Increase transparency and accountability: make all user ratings public. Any user should be able to see the ratings for any other user. Kinda like free speech... sure, you can say (almost) anything you want. But just because you can say almost anything doesn't insulate you from the consequences of what you say.
Thanks .net team for hitting the pause button on the ratings and working to enhance their usability to guide fans to good shows!
1. Jam chart or new team determines the list of "canonical" shows. Remove rating capabilities from these shows and create a prominent page with this list. Even create a project and select volunteers to write long form articles explaining the importance and excellence of said show. You get to remove them from controversy and also achieve the primary purpose you mentioned, making sure people are aware of these particular shows. I'd prob choose a word other than canon TBH. Sounds a bit too serious for my liking.
2. New "notable shows" page for 3.0 and beyond managed annually by jam chart or new team. Again you achieve the primary objective of helping fans find shows they should hear. At any point if a show like 12/31/23 comes along you can pretty much immediately add it to canon page.
You can still have the top rated page, but maybe divide into three pages by era? Or keep it the same and add some protections, but with the two changes above less harm can be done and also less incentive to commit the harm.
Someone else suggested potentially reaching out to RateYourMusic and considering implementing something similar to what they have: RateYourMusic FAQ
Hope this helps.
I give my comment a 4.78
A master list of the top 100 or more shows of all time would be helpful. We don’t need to see rating scores for this, just a solid list based on scores from the yearly ratings.
Big Cypress a 1? Shame on you. These folks should seriously be blocked from all future ratings.
Based on the October dataset, and restricting ratings to just the Modern Era (post-2009), 91% of all ratings submitted by Phish.Net users were a 3, 4, or 5. We're just like Netflix.
How about the little guys?? Look at the data for all 4.0 Alabama shows and the criminal rating bombs that occurred to those beauties.
Most of all Keep up the good work
The job You're doing is Grrreattt!
Personally, I'm one of those stat geeks who likes browsing the ratings database, mostly to find new shows outside the top 100 to check out, and I'd like it to continue.
The only online site I use to listen to the Dead has a rating system. I use it as a guide and find it quite useful, given I'm usually looking for something particular.
I think the rating system should require a [blank]-character blurb in which the rater explains why - even if it's just bc of the Jim->LSG - the show is worth exploring. Not only will this mitigate the "how easy it is to click a star" problem, the system will have added meaning.
.02.
Also: People simply don't have to reference them.
Send me some dates for the AL shows and I'll let you know. I've never been sure what 4.0 actually means...
One way to handle this without disrupting the democratic nature of the ratings system is to report two ratings: "raw" - what we see now and "adjusted" - after anomalous/manipulative ratings have been removed through statistical or data analysis.
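As a sketch of what that "adjusted" figure could look like: one simple rule is to drop ratings from days whose submission count is an extreme outlier for that show. The per-day data shape and the z-score cutoff here are assumptions for illustration, not anything Phish.Net actually does:

```python
# Hypothetical sketch of a "raw vs. adjusted" average: drop ratings from
# days whose submission count is an extreme outlier for that show.
# The data shape and the z-score cutoff are assumptions for illustration.
from statistics import mean, stdev

def adjusted_average(daily, z_cut=3.0):
    """daily: list of (ratings_count, avg_rating) tuples, one per day.
    Excludes days whose count sits more than z_cut std devs above the mean."""
    counts = [c for c, _ in daily]
    mu, sd = mean(counts), stdev(counts)
    kept = [(c, a) for c, a in daily if sd == 0 or (c - mu) / sd <= z_cut]
    total = sum(c for c, _ in kept)
    return sum(c * a for c, a in kept) / total
```

For example, thirty quiet days of one 5.0 rating apiece plus a single 100-rating burst day averaging 1.0 would give an adjusted average of 5.0, where the raw average would sit near 1.9.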
Given the inherent limits of rating shows in any meaningful way, and to avoid such truly stupid strife among fans, I ask that you simply ditch the system and don’t bring it back. If it is brought back, however, the non-NYE run shows that were rated in the 1.1.24-1.3.24 timeframe should be restored to the rating assigned before 1.1.24.
@BetweenTheEars said: Th
3.783??? hahahaha
One request for the revamped ratings system would be to add half-stars, or a 10-star system, to the rating choices. Thanks admins!
I feel the best way to keep these true is to motivate folks to vote here. You're probably going to get a lot of votes when you turn it back on. Great, good start. But keep it going and encourage people to cast theirs. Hold them accountable, as one phan mentioned. Post their votes in their profile (I noticed this is active already). And make sure they can only vote once per show, no edits/revisions allowed either. Incentivize, maybe: offer access to special stat analytics once you've voted x times. (We're all geeks for data, aren't we?) Truth will prevail. Good luck.
I think in situations like this it’s not unreasonable to just remove all the “1” ratings and just move all affected shows to a GOATed status. Then people know for next time too.
Part 1: Salvage and permanently pause the current rating system
1. If you have the logs, dedupe ratings on the same show from the same IP address. Or even just remove all of the ratings after 5 from the same IP to account for some people on shared connections.
2. Permanently freeze this ratings system.
3. Continue publishing the legacy ratings.
Part 2: Build the new rating system
1. Require a Phish.net account with validated email and mobile number that has been active at least 90 days with some forum participation to cast any votes at all.
2. Share the rating distribution data and make ratings non-anonymous
3. To start, only open ratings for the most recent tour.
4. Each month, open ratings for one previous tour, with some additional blog and forum content to help encourage people to re-listen to shows. Work your way back to the beginning, plus open ratings on all new shows 24 hours after the show ends.
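The IP-based dedupe in Part 1 could be sketched roughly like this (the record fields `show_id`, `ip`, and `timestamp` are assumptions for illustration, not the actual Phish.net schema):

```python
# Rough sketch of the Part 1 dedupe idea; the record fields (show_id,
# ip, timestamp) are assumptions, not the actual Phish.net schema.
from collections import defaultdict

def dedupe_ratings(ratings, max_per_ip=5):
    """Keep at most max_per_ip ratings per (show, IP) pair, earliest first,
    to allow for shared connections while cutting off bulk submissions."""
    kept = []
    seen = defaultdict(int)  # (show_id, ip) -> ratings kept so far
    for r in sorted(ratings, key=lambda r: r["timestamp"]):
        key = (r["show_id"], r["ip"])
        if seen[key] < max_per_ip:
            kept.append(r)
            seen[key] += 1
    return kept
```

Keeping the earliest submissions (rather than the latest) assumes the organic ratings arrived before any scripted flood, which is consistent with the burst pattern described in the post.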
Rando ideas from someone with little programming experience or availability to help out. I deal with the goddamn customers so the Phish.net engineers don't have to. I have people skills. I am good at dealing with people. Can't you understand that? What the hell is wrong with you people?
Oh yeah, ideas.
-Raters have some readily-viewable analysis of their rating history indicated next to their name; box plots would be useful, and they kinda look like cartoon turtles.
-Raters indicate if they attended live, streamed, or listened afterwards. No other option. Having in-person attendance verified is the golden seal. Liars get the scarlet donut next to their name in perpetuity.
-Raters have their number of shows attended live indicated easily next to their name. Maybe some way of indicating the spread of shows across eras by color and hue?
-If they attended live, they can rate based on musical achievement and/or experience. I've had a great time at shows that was more about vibe than musical achievement. The show I went to while my ex was cleaning her stuff out of our once-shared apartment was a maelstrom of every emotion which I shan't forget soon. Maybe a Mario power-up shroom icon to indicate a psychotropic state of mind?
-I'd love to select my own criteria for ratings. For example, only use select Netters (using a query function) for my rating calculations.
Y'all are great for making this community apparatus exist. Giggity.
The more I think about it;
…..when you first heard this particular show - were you there? Did you couch tour? Did you listen to the recording? —
this should be a tracked metric.
If someone was at NYE MSG ‘23/‘24 as I was, your review will be drastically different from that of someone who was not there, listening the next day without all the visuals (and comparing in their mind to the last time they played Gamehendge).
I also wholeheartedly agree that to compare stellar shows throughout the years is impossible. Nothing will ever top Big Cypress for what it was….Same with Gamehendge NYE….they are incomparable.
One easy fix would be to eliminate all 1 star ratings from all shows. I mean seriously - there has never been a 1 star show. Even the worst shows have at least something redeeming about them. Has anyone here actually ever had a BAD time at a show? Some obviously aren't as good as others, but as someone who went to what were rated two of the worst shows in recent history (Grand Prairie 2016), they were still far better evenings than anything else I could have been doing. The only possible reason anyone would ever rate anything 1 star is to try to lower that show's rating.
Beyond that, I think @betweentheears' 2nd and 3rd suggestions are definitely the way to go. Only real accounts, one rating per user, and everyone's ratings posted on their profile pages.
Any of 5/27/22-5/29/22; the highlights are 5/27 Set 2 and 5/28 Set 1, the overall best sets of the run. But it’s all good. 7/12/23, and then 7/30/21, but that’s not a rating-bomb victim.
The authors of the book and Charlie had enough experience listening to shows that I trusted their opinions and I filled in the gaps on my own.
There’s a review in the Almanac of a Sept ‘99 show where one review says the concert was great and the other review panned it. I listened to that tour and knew the negative guy was right as the band was sluggish and it was a tour low point. How many people rate the shows on Phish.net have only seen one or two shows on the tour or haven’t listened to all the others played previously?
I think there are a couple of people who've set up hundreds or thousands of accounts and written scripts to shift ratings. It is crazy that someone has done that, and it should definitely be put to a stop.
⭐️hit it, quit it
⭐️⭐️not bad
⭐️⭐️⭐️well liked
⭐️⭐️⭐️⭐️very popular
⭐️⭐️⭐️⭐️⭐️exceptionally popular
And most of these shows have 2000 or fewer votes. That's less than .5% of the fanbase.
Ratings are meaningless.