I generally consider myself a competitive player first and a community member second, but in Hex those roles have largely reversed for me. One of my biggest roles within the Hex community is running and maintaining two major tournament series. In both the Shard Cup Tournament Series and the FiveShards Weekly Series, we have to be extremely careful about the rules we adopt, the schedules we pick, the prizes we award, the entry fees we charge, and any other institutional rules we establish, as every decision has an effect on player behavior. This is true of anyone planning to organize a tournament within our community. The most important player behavior from a tournament design perspective, of course, is player turnout. We certainly care about other metrics such as player cheating, player fun, length of the tournament, and player perception of the series, but all of these feed into that primary indicator of whether players are playing in our tournaments.
As many people already know, my occupation involves researching political phenomena and testing hypotheses about different institutions and how those institutions affect the behavior of various actors within the realms of politics and economics (my curriculum vitae can give you an idea of the types of things I normally research and publish on). Thankfully, due to the data Hex Entertainment has made available over the past year, I can combine my professional interests with my hobby to test some ideas about tournament design and outcomes. I have been collecting the data for this project over the last few months and have mentioned them to people and on the forums in a few places, but have not posted any systematic analysis publicly. I figured this was an opportune time to do so, as it is the start of a new year and many people are thinking about how they can affect the Hex community and help it grow.
Of course, the FiveShards Shard Cup Series is far too young to do statistical analysis on (an observation count of 13 is insufficient to get reliable results and the tournament structures change from month to month) and the FiveShards Weekly Series has yet to start. However, the gauntlet tournament data has been available since the qualifying season started on October 6th, 2015 and, more importantly, the institutional rules governing these tournaments have changed twice over the course of the last three months. As such, our research question of whether institutional rule changes affect player behavior is ripe for exploration.
A few caveats before we proceed. First, I have been planning on turning this into an academic paper in the future and so this is more of an initial discussion and examination of the data; naturally, the language and description will change in some ways, but much of the core analysis and setup is going to look the same. One big change is that I will have to spend several paragraphs explaining what Hex is and the different tournament structures and options; these are all things that we as players are mostly familiar with at this point. Second, this is preliminary. I am pretty confident that the results are consistent as I have been evaluating them periodically for the last few months, but I still want the full Invitational Qualifier (IQ) experiment to run before I finish collecting and presenting the data. By January 17th, I will have roughly 100 observations, which is a good set of data to analyze and get some level of meaningful results. Generally, statistical analysis gets better with more data and I may be inclined to include some post-IQ numbers, but the analysis will get trickier if I choose to do so. Third, I will try to explain most of my decisions and techniques used in this article, but some of it might be a little denser than it needs to be. If anything is unclear, I will be happy to answer questions below in the comments section. Fourth, the data are quasi-experimental, observational data, which means that we are mostly looking at correlative behavior. As in most science, we are not proving things here, but are providing evidence for or against a particular hypothesis.
Over the next few sections I will discuss the theory and hypotheses, the data and variables I use to test those hypotheses, the results of those tests, and, finally, any conclusions we can draw from them.
Starting on October 6th, 2015, Hex Entertainment began giving out qualifier tickets for participation in competitive tournament events. Players could earn a variable number of tickets from competing in scheduled events, a single ticket from winning a competitive draft event, or a single ticket by participating in constructed or sealed gauntlet queues and winning five matches before losing three. Each of these events offered different advantages and disadvantages for participation, but the lure of the Invitational Qualifier tournament, and of participating in a final tournament with a prize pool of $100,000, certainly galvanized the player base to begin playing in more tournaments. Constructed gauntlet queues were best-of-1 matches and gave each player a 20-minute clock. Constructed gauntlet offered players a unique opportunity for quick matches that could take less time than a draft or a scheduled event. Drafts can take up to 3.5 hours, scheduled events can take 4-4.5 hours, and constructed gauntlet tournaments can take a player anywhere from 15 minutes (three quick losses) to about 50 minutes (five quick wins) or, in an unlikely worst-case scenario, 4.67 hours for 7 matches if each match went close to time. This does not take into account how long it takes to fill a draft tournament or gauntlet queue, but constructed gauntlet initially offered quick grinds for tickets. Scheduled events offer a higher ticket return (10 for a win), but are likely to be more variable and can often mean a 4-hour tournament that yields one ticket or none.
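The worst-case figure above can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the longest possible run is the one where a player reaches four wins and two losses before the deciding match, and that a match at "full time" uses both players' entire clocks.

```python
# Hypothetical helpers: bounds on the length of one gauntlet run.
WINS_TO_FINISH = 5    # a run ends at five wins...
LOSSES_TO_FINISH = 3  # ...or at three losses, whichever comes first


def max_matches(wins=WINS_TO_FINISH, losses=LOSSES_TO_FINISH):
    """Longest possible run: one short of both limits, then a deciding match."""
    return (wins - 1) + (losses - 1) + 1


def worst_case_hours(clock_minutes_per_player, players=2):
    """Upper bound in hours if every match uses both players' full clocks."""
    minutes_per_match = clock_minutes_per_player * players
    return max_matches() * minutes_per_match / 60


print(max_matches())                   # 7
print(round(worst_case_hours(20), 2))  # 4.67
```

Under the same assumptions, passing 30 instead of 20 gives the corresponding upper bound for 30-minute clocks (7.0 hours).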
After demand from some high-level players for constructed gauntlet queues to be more representative of traditional constructed matches, Hex Entertainment made two institutional changes that altered the structure and the incentives for players to participate. First, on November 10th, 2015, they made the matches best-of-three. The clocks remained at 20 minutes for each player, but the change allowed people to use 15-card reserves, which can add up to 4 minutes to each match, and made it much more likely that the full clock would be used. In fact, vocal players on the forums were opposed to keeping the 20-minute clock and demanded a traditional 30-minute clock to replace it so matches would be more likely to finish with a state-based winner instead of a clock-based winner. Hex Entertainment agreed and increased both players’ clocks to 30 minutes on November 23rd.
With these changes, the design of the incentives for the players changed. The rewards for participation did not change, but the cost of participation in terms of time went up dramatically. This happened in two ways: first, requiring at least two games (if not three) per match meant that the rounds were fundamentally longer than best-of-1 matches; second, quick decks such as aggro mono-ruby would be less likely to win in a best-of-3 match, as such decks are sensitive to reserves giving players more options to remove early troop drops. If quick decks drop out of the meta, then rounds become longer. This leads us to two hypotheses about the structure of the changes.
Hypothesis 1: Players will play fewer constructed gauntlet games in best-of-three matches than best-of-one matches.
Players are sensitive to costs whether they are material or temporal. As such, requiring more time will mean some players will play less constructed gauntlet, players who do play gauntlet will be able to play fewer matches in a day, and others may find better expected value elsewhere.
Hypothesis 2: Players will play fewer constructed gauntlet games in 60-minute matches than 40-minute matches.
Likewise, adding up to 20 minutes to a round may deter players from participating in constructed gauntlet.
We can consider each of the hypotheses above to be a statement whose truth we want to discover. As such, we will dive into the data to see if either of these statements has some bearing on reality.
Since the Invitational Qualifier season started, Hex Entertainment has made available the lists of decks that earn at least 5 wins in a given day. As such, thanks to sites like HexMeta.com, we are able to figure out how many 5-x decks make the list each day. This is not a perfect capture of the number of players that start and finish a constructed gauntlet each day, but it does give us a proxy for turnout, as each 5-x deck requires some other decks to facilitate its victory. As such, we can use this number as an estimate for player turnout. The data presently run from October 6th until December 31st.
Our primary variables of interest are structural: each rule is either in force on a given day or it is not. Consequently, they are simple binary variables (often called dummy variables as they contain very little information) that take a value of 1 when the rule is present and a value of zero when it is absent. Our first variable corresponds to our first hypothesis and measures whether constructed gauntlet is best-of-three. Our second variable is also binary and tracks whether constructed gauntlet matches are 60 minutes long (a value of 1) or 40 minutes long (a value of zero). We expect both variables to negatively correlate with the dependent variable and to suppress overall activity.
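As a rough illustration of how these two binary variables could be coded from a calendar date, here is a minimal sketch. The function name is hypothetical, and it assumes each rule counts as in force from the announced date of the change onward (November 10th for best-of-three, November 23rd for the 30-minute clocks).

```python
from datetime import date

# Dates of the two rule changes as stated in the article.
BEST_OF_THREE_START = date(2015, 11, 10)
SIXTY_MINUTE_START = date(2015, 11, 23)


def structural_dummies(day):
    """Return (best_of_three, sixty_minute) as 0/1 indicators for one day."""
    # Assumption: the change day itself is already played under the new rule.
    best_of_three = int(day >= BEST_OF_THREE_START)
    sixty_minute = int(day >= SIXTY_MINUTE_START)
    return best_of_three, sixty_minute


for d in (date(2015, 10, 6), date(2015, 11, 15), date(2015, 12, 1)):
    print(d.isoformat(), structural_dummies(d))
```

Because the 60-minute rule only ever applied to best-of-three matches, the second indicator is never 1 while the first is 0.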
Beyond our independent variables of interest, there are many other variables that likely affect player turnout and that we should control for so our estimates are not contaminated by confounding factors. First, we expect that the price of set three boosters (Price) determines how much constructed gauntlet people are playing. As booster prices drop, players will have less incentive to play in constructed gauntlet. I gathered this data from hexprice.com, which gives the median value of boosters per day (at the bottom of the page there is a menu you can expand). To guard against endogeneity, I lag this variable by one day to see if yesterday’s booster prices affect today’s turnout. If I used the data from the same day, there is a good chance that constructed gauntlet turnout would affect the price of boosters, and that would undermine our estimation.
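The one-day lag can be sketched in a few lines. The prices below are made-up numbers for illustration, not values from hexprice.com, and the helper name is hypothetical.

```python
def lag_by_one(values):
    """Shift a daily series back one day; the first day has no predecessor."""
    return [None] + list(values[:-1])


# Hypothetical median booster prices (in platinum) for five consecutive days.
price = [210, 208, 205, 207, 203]
price_lagged = lag_by_one(price)

# Each day's turnout is paired with the previous day's price; day 0 is dropped
# because there is no earlier price to pair it with.
rows = [(day, p) for day, p in enumerate(price_lagged) if p is not None]
print(rows)  # [(1, 210), (2, 208), (3, 205), (4, 207)]
```

One consequence of lagging is that the first observation in the sample is lost, since no earlier price exists for it.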
Second, we may expect a few different events to affect constructed gauntlet. If there is a special in-client event (Flashback drafts or VIP tournaments), people are likely to play those instead of constructed gauntlet (Event). Patch Days are also likely to suppress turnout as people cannot play the game. Holidays (Halloween, Thanksgiving, Christmas Eve, Christmas, New Year’s Eve, and New Year’s Day) are similarly likely to either decrease turnout (people are doing other things) or increase it (people have time to play Hex). These are all binary variables as well.
Third, there are two factors related to IQs that will affect turnout. First, Days Until IQ is a simple count of how many days remain until the next qualifier; it resets on the day of the qualifier. This variable ranges from 1 to 15 and we expect its coefficient to be negative (the further out you are from an IQ, the less likely you are to grind for the IQ). Also, if the next IQ is constructed, people may have more incentive to play in constructed gauntlet to practice, so the binary variable Next IQ Constructed may positively affect turnout.
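A countdown variable of this kind could be built roughly as follows. This is a sketch under two assumptions that are mine rather than the article's: the qualifier dates are hypothetical placeholders, and "resets on the day of the qualifier" is read as meaning that on a qualifier day the count already points at the following qualifier.

```python
from datetime import date


def days_until_next_iq(day, iq_dates):
    """Days from `day` to the next qualifier strictly after it.

    On a qualifier day this returns the gap to the following qualifier.
    Returns None when no later qualifier is scheduled.
    """
    upcoming = [iq for iq in iq_dates if iq > day]
    return (min(upcoming) - day).days if upcoming else None


# Hypothetical qualifier dates, two weeks apart, for illustration only.
iq_dates = [date(2015, 10, 17), date(2015, 10, 31), date(2015, 11, 14)]

print(days_until_next_iq(date(2015, 10, 16), iq_dates))  # 1
print(days_until_next_iq(date(2015, 10, 17), iq_dates))  # 14
```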
Finally, I include a cubic polynomial for time. That is, I include Time, Time², and Time³, where Time begins at zero on October 6th and ticks up by one for every day that passes. This set of variables captures unobservable factors that trend with time and can act as a proxy for things like waning player interest in set 3. Generally, as the newness of an event wears off, we should expect player participation and interest to drop over time, so we expect the combined contribution of the three terms to be negative once each coefficient is multiplied by the corresponding power of Time.
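To make the construction of the time terms concrete, here is a minimal sketch. The function names are hypothetical, and the coefficients passed to `trend` are placeholders supplied by the caller, not estimates from this analysis.

```python
from datetime import date

START = date(2015, 10, 6)  # day zero in the article


def time_terms(day):
    """Return (Time, Time^2, Time^3) for a calendar day."""
    t = (day - START).days
    return t, t ** 2, t ** 3


def trend(day, b1, b2, b3):
    """Combined contribution of the three time terms to the prediction."""
    t, t2, t3 = time_terms(day)
    return b1 * t + b2 * t2 + b3 * t3


print(time_terms(date(2015, 10, 6)))   # (0, 0, 0)
print(time_terms(date(2015, 10, 16)))  # (10, 100, 1000)
```

Looking at the three coefficients individually can be misleading, since their signs may differ; it is the value of `trend` for a given day that describes the overall drift.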
Table 1 contains the distributional characteristics of each variable.
To estimate the impact of the structural changes, I run several linear regressions with robust standard errors; this allows me to find the conditional effect of each variable on the outcome variable of interest. On a very technical note, there is an argument to use a Poisson or negative binomial regression instead, as both of those are better suited to count data. However, the Poisson distribution becomes approximately normal (as our linear regression assumes) when the mean is large, which is the case here, and the data are overdispersed, which makes Poisson inefficient for estimation. Additionally, the negative binomial regression tends to be overly demanding with regard to sample size, and smaller sample sizes tend to lead to inefficient standard errors. Using the alternative models does affect the results for price and the time variables, but not in a way that affects how we interpret the evidence for our hypotheses, so I will leave the alternative specifications alone for now.
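A quick way to see what "overdispersed" means is to compare the variance of the counts with their mean, since a Poisson model implies the two are roughly equal. The sketch below uses invented counts, not the real 5-x data, and the ratio is only a rough heuristic rather than a formal test.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Hypothetical daily 5-x deck counts, not the article's data.
counts = [42, 55, 31, 60, 38, 47, 29, 52]
ratio = dispersion_ratio(counts)

print(f"variance/mean = {ratio:.2f}")
print("overdispersed" if ratio > 1 else "not overdispersed")
```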
Table 2 presents the results of the series of regressions using alternative specifications of the model. The first number is the coefficient of the variable and the number in parentheses is the standard error.
Model 1 in Table 2 is a simple regression that includes our 2 independent variables of interest without controls to see if, by themselves, they correlate with the dependent variable. The results in model 1 are less important than those of the other models. Model 2 includes controls for the price of set 3 booster packs, special events in client, and the number of 5-x decks on the previous day to look at any type of day-to-day trending, while model 3 adds in the rest of our controls except for time. Model 4 includes the temporal controls while dropping the lagged dependent variable, and model 5 is a reduced model that keeps only the variables that correlate with the dependent variable, to see how the coefficients change. Of note, in interpreting these numbers, only the results marked with asterisks (the statistically significant ones) are worth thinking about. We can interpret a result without significance as being indistinguishable from zero. Beyond that, for results that have significance, you can interpret them in a linear way: a one-unit increase in the independent variable is associated with a change in the dependent variable equal to the coefficient. For the binary variables, this simply means that if they are on (take a value of one), then they have the full effect of the coefficient on the dependent variable, all else being equal.
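For a model like Model 1, with only the two nested binary variables and no controls, the coefficients have a simple reading: they are differences between the average counts in each rule period. The sketch below illustrates this with invented counts (not the data behind Table 2); all variable names are hypothetical.

```python
from statistics import mean

# Hypothetical daily 5-x counts grouped by rule regime (not the real data).
best_of_one = [58, 62, 61, 59]        # both dummies = 0
best_of_three_40 = [52, 48, 49, 51]   # best-of-three = 1, 60-minute = 0
best_of_three_60 = [50, 53, 51, 50]   # best-of-three = 1, 60-minute = 1

# With only these two nested dummies the model is saturated, so ordinary
# least squares reproduces the three group means exactly.
intercept = mean(best_of_one)
coef_best_of_three = mean(best_of_three_40) - intercept
coef_sixty_minute = mean(best_of_three_60) - mean(best_of_three_40)

print(intercept, coef_best_of_three, coef_sixty_minute)
```

Once controls are added, as in the later models, the coefficients no longer reduce to raw differences in means, which is why the estimates can shift from one specification to the next.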
Across the board, two trends are apparent for our independent variables: best-of-three matches suppressed turnout, while 60-minute matches had no detectable effect on the number of 5-x decks. We likely lost between 12 and 14 5-x decks per day by having best-of-three matches. This number is only lower when we include the lagged dependent variable, which is a bit misleading as we are then looking at the daily change rate instead of the daily rate. The result of 60-minute matches, across the board, is indistinguishable from zero.
In terms of controls, we see some interesting effects. The price variable correlates with 5-x decks at an approximate 3:1 ratio, so for every 3 plat increase in price, we get another 5-x deck. However, this effect goes away when we control for the temporal polynomial; the loss of significance is likely a result of price being largely a function of time. The proximity of an IQ event also encourages turnout. If an event is 15 days away, we get roughly 9-10 fewer 5-x decks than when the IQ is tomorrow. Whether the next IQ is constructed does not appear to affect the turnout for constructed gauntlet.
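Because the model is linear, the per-unit effects implied by these statements follow from simple division. The figures below are back-calculated from the prose in this paragraph as an illustration; they are not read from Table 2, and the function name is hypothetical.

```python
def implied_coefficient(total_change, units):
    """Per-unit effect implied by a total change over `units` in a linear model."""
    return total_change / units


# Price: one extra 5-x deck per 3 platinum.
price_coef = implied_coefficient(1, 3)

# Days Until IQ: 9 to 10 fewer decks going from 1 day out to 15 days out,
# which is a 14-day difference.
iq_coef_low = implied_coefficient(-9, 15 - 1)
iq_coef_high = implied_coefficient(-10, 15 - 1)

print(round(price_coef, 2))                           # 0.33
print(round(iq_coef_low, 2), round(iq_coef_high, 2))  # -0.64 -0.71
```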
The R² metric tells us how much of the variation in the dependent variable is explained by the variables within the model. For the research I normally do, my R² values tend to be very low as I predict rare events. However, here, it is pretty high: up to 93% of the variation of the dependent variable is explained by the model. Consequently, despite having a relatively limited set of independent variables to work with, they are quite sufficient at capturing the daily number of 5-x decks in Hex.
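For readers who want to see what goes into this number, here is a minimal sketch of the standard calculation: one minus the ratio of the residual sum of squares to the total sum of squares. The observed and predicted values are invented for illustration and are not from this analysis.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Share of the variance in `observed` accounted for by `predicted`."""
    centre = mean(observed)
    ss_total = sum((y - centre) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical observed counts and model predictions, for illustration only.
observed = [60, 55, 48, 50, 42]
predicted = [58, 56, 49, 48, 44]

print(round(r_squared(observed, predicted), 3))  # 0.926
```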
What we learn is pretty clear: the introduction of best-of-three matches decreased the number of decks making the 5-x list daily and suppressed the overall number of people entering the queues. Of course, this partially means we have fewer repeat entries of decks, but the overall ticket generation from constructed gauntlet dropped substantially. In effect, by giving players what they demanded on the forums and Reddit, constructed gauntlet generated roughly 13 fewer IQ tickets per day as a result of this change. My expectation is that many players shifted over to sealed gauntlet, accepting a lower expected reward in exchange for a lower time cost of grinding tickets, but I do not have the data to know for sure.
Of course, there are a few things we are unable to model in these estimations, and the experiment is still running. Ideally, I would be able to include a few more variables that could capture intangibles like player interest, meta changes, and similar shifts in the player base, but those things are difficult to capture accurately and our model appears to be performing well despite missing useful information. With regard to the full sample, I started collecting and testing the data back in late October and the results, over time, have been mostly stable. As such, when we are done with the IQs, I expect the results to be similar to what we have here. This, of course, could change if we see another shift in the format of the constructed gauntlets. There is plenty more to say about much of this data and the results, but I will keep this discussion short as we cross three thousand words.
Going forward, for those players that are either designing tournament series or are calling for changes in existing ones, remember that every rule change or adoption affects player turnout and, for TOs, one of our primary goals is to get more people playing this game in a competitive (or fun) environment. Tournament Organizers need to be careful in selecting incentives and costs for participation; whenever an organization seems slow to respond to player feedback, it partly has to do with weighing the costs of changes against the demands of vocal players on public forums.