# Tournament Design and Player Participation

I generally consider myself a competitive player first and a community member second, but in Hex those roles have largely reversed for me. One of my biggest roles within the Hex community, in addition to all the others, is running and maintaining two major tournament series. In both the Shard Cup Tournament Series and the FiveShards Weekly Series, we have to be extremely careful about the rules we adopt, the schedules we pick, the prizes we award, the entry fees we charge, and any other institutional rules we establish, as every decision has an effect on player behavior. This is true of anyone planning to organize a tournament within our community. The most important player behavior from a tournament design perspective, of course, is player turnout. We certainly care about other metrics such as player cheating, player fun, length of the tournament, player perception of the series, and others, but all of these feed into that primary indicator of whether players are playing in our tournaments.

As many people already know, my occupation involves researching political phenomena and testing hypotheses about different institutions and how those institutions affect the behavior of various actors within the realms of politics and economics (my curriculum vitae can give you an idea of the types of things I normally research and publish on). Thankfully, due to the data Hex Entertainment has made available over the past year, I can combine my professional interests with my hobby to test some ideas about tournament design and outcomes. I have been collecting the data for this project over the last few months and have mentioned it to people and on the forums in a few places, but I have not posted any systematic analysis publicly. I figured this was an opportune time to do so, as it is the start of a new year and many people are thinking about how they can affect the Hex community and help it grow.

Of course, the FiveShards Shard Cup Series is far too young to do statistical analysis on (an observation count of 13 is insufficient to get reliable results, and the tournament structures change from month to month) and the FiveShards Weekly Series has yet to start. However, the gauntlet tournament data has been available since the qualifying season started on October 6th, 2015 and, more importantly, the institutional rules governing these tournaments have changed twice over the course of the last three months. As such, our research question of whether institutional rule changes affect player behavior is ripe for exploration.

A few caveats before we proceed. First, I have been planning on turning this into an academic paper in the future and so this is more of an initial discussion and examination of the data; naturally, the language and description will change in some ways, but much of the core analysis and setup is going to look the same. One big change is that I will have to spend several paragraphs explaining what Hex is and different tournament structures and options; these are all things that we as players are all mostly familiar with at this point. Second, this is preliminary. I am pretty confident that the results are consistent as I have been evaluating them periodically for the last few months, but I still want the full IQ experiment to run before I finish collecting and presenting the data. By January 17th, I will have roughly 100 observations, which is a good set of data to analyze and get some level of meaningful results. Generally, statistical analysis gets better with more data and I may be inclined to include some post-IQ numbers, but the analysis will get trickier if I choose to do so. Third, I will try to explain most of my decisions and techniques used in this article, but some of it might be a little bit more dense than it needs to be. If you have any questions about clarity, I will be happy to answer them below in the comments section. Fourth, the data are quasi-experimental, observational data which means that we are mostly looking at correlative behavior. Like most science, we are not proving things here, but are providing evidence for a particular hypothesis or providing evidence against it.

Over the next few sections I will discuss the theory and the hypotheses it leads us to expect, the data and variables I use to test those hypotheses, the results of those tests, and, finally, any conclusions we can draw from them.

## Expectations

Starting on October 6th, 2015, Hex Entertainment began giving out qualifier tickets for participation in competitive tournament events. Players could earn a variable number of tickets from competing in scheduled events, a single ticket from winning a competitive draft event, or a single ticket by participating in constructed or sealed gauntlet queues and winning five matches before losing three. Each of these events offered different advantages and disadvantages for participation, but the lure of the Invitational Qualifier tournament, and of participating in a final tournament with a prize pool of $100,000, certainly galvanized the player base to begin playing in more tournaments. Constructed gauntlet queues were best-of-1 matches and gave each player a 20-minute clock. They offered a unique opportunity for quick matches that could take less time than a draft or a scheduled event. Drafts can take up to 3.5 hours, scheduled events can take 4-4.5 hours, and constructed gauntlet tournaments can take a player anywhere from 15 minutes (three quick losses) to ~50 minutes (five quick wins) or, in an unlikely worst-case scenario, 4.67 hours for 7 matches if each match went close to time. This does not take into account how long it takes to fill a draft tournament or gauntlet queue, but constructed gauntlet initially offered quick grinds for tickets. Scheduled events offer a higher ticket return (10 for a win), but are likely to be more variable and can often result in a 4-hour tournament that yields one or zero tickets.
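To make the worst-case arithmetic concrete, here is a minimal Python sketch. The function names are my own for illustration, and it assumes a match can last at most the sum of both players' clocks (so 40 minutes with 20-minute clocks).

```python
def max_matches(wins_to_finish: int = 5, losses_to_exit: int = 3) -> int:
    """Longest possible gauntlet run: one short of each threshold,
    plus the single match that decides the run."""
    return (wins_to_finish - 1) + (losses_to_exit - 1) + 1


def worst_case_hours(clock_minutes_per_player: int) -> float:
    """Upper bound on run length, assuming every match uses both full clocks."""
    match_minutes = 2 * clock_minutes_per_player
    return max_matches() * match_minutes / 60


print(max_matches())                    # 7
print(round(worst_case_hours(20), 2))   # 4.67
print(round(worst_case_hours(30), 2))   # 7.0
```

Under the same assumption, the later move to 30-minute clocks raises the theoretical ceiling from 4.67 hours to 7 hours for a single run.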

After some high-level player demand for constructed gauntlet queues to be more representative of traditional constructed matches, Hex Entertainment made two institutional changes that altered the structure and the incentives for players to participate. First, on November 10th, 2015, they made the matches best-of-three. The clocks remained at 20 minutes for each player, but the change allowed people to use 15-card reserves, which can add up to 4 minutes to each match and made it much more likely that the full clock would be used. In fact, vocal players on the forums were opposed to keeping the 20-minute clock and demanded a traditional 30-minute clock to replace it so matches would be more likely to finish with a state-based winner instead of a clock-based winner. Hex Entertainment agreed and increased both players’ clocks to 30 minutes on November 23rd.

With these changes, the design of the incentives for the players changed. The rewards for participation did not change, but the cost of participation in terms of time went up dramatically. This happened in two ways: first, requiring at least two games (if not three) meant that the rounds were fundamentally longer than best-of-1 matches; second, quick decks, such as aggro mono-ruby, would be less likely to win in a best-of-3 match, as such decks are sensitive to reserves giving players more options to remove early troop drops. If quick decks drop out of the meta, then rounds become longer. This leads us to two hypotheses about the structure of the changes.

**Hypothesis 1**: Players will play fewer constructed gauntlet games in best-of-three matches than best-of-one matches.

Players are sensitive to costs whether they are material or temporal. As such, requiring more time will mean some players will play less constructed gauntlet, players who do play gauntlet will be able to play fewer matches in a day, and others may find better expected value elsewhere.

**Hypothesis 2**: Players will play fewer constructed gauntlet games in 60-minute matches than 40-minute matches.

Likewise, adding up to 20 minutes to a round may deter players from participating in constructed gauntlet.

We can consider each of the hypotheses above to be a statement whose truth we want to assess. As such, we will dive into the data to see if either of these statements has some bearing on reality.

## Data

Since the Invitational Qualifier season started, Hex Entertainment has made available the lists of decks that earn at least 5 wins in a given day. Thanks to sites like HexMeta.com, we are able to figure out how many 5-x decks make it each day. This is not a perfect capture of the number of players that start and finish a constructed gauntlet each day, but it does give us a proxy for turnout, as each 5-x deck requires some other decks to facilitate its victory. As such, we can use this number as an estimate for player turnout. The data presently run from October 6th until December 31st.

### Independent Variables

Our primary variables of interest are structural variables that are either in effect or not on a given day; consequently, they will be simple binary variables (often called dummy variables as they contain very little information) that take a value of 1 when they are present and a value of zero when they are absent. Our first variable corresponds to our first hypothesis and measures whether constructed gauntlet is best-of-three. Our second variable is also binary and tracks whether constructed gauntlet matches are 60 minutes long (a value of 1) or 40 minutes long (a value of zero). We expect both variables to negatively correlate with the dependent variable and to suppress overall activity.
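As a rough illustration of how these two dummies can be coded from the dates given earlier, here is a short Python sketch (my own analysis code is in Stata, so this is only a stand-in). The function and key names are hypothetical, and I am assuming each rule counts as "on" starting with the day it was introduced.

```python
from datetime import date
from typing import Dict

# Dates of the two rule changes described in the Expectations section.
BEST_OF_THREE_START = date(2015, 11, 10)
THIRTY_MINUTE_CLOCK_START = date(2015, 11, 23)


def structural_dummies(day: date) -> Dict[str, int]:
    """Return 1/0 indicators for the two institutional rules on a given day.

    Assumption: a rule is treated as in effect from its introduction date onward.
    """
    return {
        "best_of_three": int(day >= BEST_OF_THREE_START),
        "sixty_minute": int(day >= THIRTY_MINUTE_CLOCK_START),
    }


print(structural_dummies(date(2015, 10, 6)))    # both rules absent
print(structural_dummies(date(2015, 11, 15)))   # best-of-three only
print(structural_dummies(date(2015, 12, 1)))    # both rules present
```

Note that the 60-minute dummy is never 1 while the best-of-three dummy is 0, so the second coefficient measures the additional effect of longer clocks on top of best-of-three.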

### Control Variables

Beyond our independent variables, there are a ton of other variables that likely affect player turnout, and we should control for them so our estimates are not contaminated by confounding factors. First, we expect that the price of set three boosters (Price) helps determine how much constructed gauntlet people are playing. As booster prices drop, players will have less incentive to play in constructed gauntlet. I gathered this data from hexprice.com, which gives the median value of boosters per day (at the bottom of the page there is a menu you can expand). To guard against endogeneity, I lag this variable by one day to see if yesterday’s booster prices affect today’s turnout. If I used the data from the same day, there is a good chance that constructed gauntlet turnout would affect the price of boosters, and that would undermine our estimation.
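The one-day lag is simple to express in code. The sketch below is illustrative Python rather than the Stata I actually used; the helper name is hypothetical and the prices are made-up placeholder values, not the hexprice.com data.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def lag_one_day(series: Dict[date, float]) -> Dict[date, Optional[float]]:
    """Pair each day with the previous day's value (None when it is missing)."""
    one_day = timedelta(days=1)
    return {day: series.get(day - one_day) for day in series}


# Placeholder prices for illustration only.
price = {
    date(2015, 10, 6): 200.0,
    date(2015, 10, 7): 198.0,
    date(2015, 10, 8): 195.0,
}

lagged_price = lag_one_day(price)
for day in sorted(lagged_price):
    print(day, lagged_price[day])
```

The first day in the sample has no lagged value, so that observation drops out of any model that includes the lagged price.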

Second, we may expect a few different events to affect constructed gauntlet. If there is a special in-client event (Flashback drafts or VIP tournaments), people are likely to play those instead of constructed gauntlet (Event). Patch Days are also likely to suppress turnout, as people cannot play the game. Holidays (Halloween, Thanksgiving, Christmas Eve, Christmas, New Year’s Eve, and New Year’s Day) are similarly likely to either decrease turnout (people are doing other things) or increase it (people have time to play Hex). These are all binary variables as well.

Third, there are two factors related to IQs that will affect turnout. First, Days Until IQ is a simple count of how many days remain until the next qualifier; it resets on the day of the qualifier. This variable ranges from 1 to 15 and we expect its coefficient to be negative (the further out you are from an IQ, the less likely you are to grind for the IQ). Also, if the next IQ is constructed, people may have more incentive to play in constructed gauntlet to practice, so the binary variable Next IQ_{constructed} may positively affect turnout.

Finally, I include a cubic polynomial for time. That is, I include Time, Time^{2}, and Time^{3}, where Time begins at zero on October 6th and ticks up for every day that passes. This set of variables captures unobservable data that trend with time and can act as a proxy for things like player interest in set 3 waning. Generally, as the newness of an event wears off, we should expect player participation and interest to drop over time, so the collective sum of the coefficients should be negative after accounting for the exponentiated variable values.
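For anyone who wants to rebuild the time controls, a minimal Python sketch follows. The function name is hypothetical; the only inputs taken from the article are the October 6th start date and the December 31st end of the current sample.

```python
from datetime import date
from typing import Tuple

SEASON_START = date(2015, 10, 6)  # Time = 0 on the first day of the qualifying season


def time_terms(day: date) -> Tuple[int, int, int]:
    """Return (Time, Time^2, Time^3) for a given calendar day."""
    t = (day - SEASON_START).days
    return t, t ** 2, t ** 3


print(time_terms(date(2015, 10, 6)))    # (0, 0, 0)
print(time_terms(date(2015, 12, 31)))   # (86, 7396, 636056)
```

Because the cubic term grows so quickly, its coefficient will typically be tiny in absolute value even when it matters; the three coefficients are best read together rather than one at a time.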

Table 1 contains the distributional characteristics of each variable.

## Results

To estimate the impact of the structural changes, I run several linear regressions with robust standard errors; this allows me to find the conditional effect of each variable on the outcome variable of interest. On a very technical note, there is an argument for using a Poisson or negative binomial regression instead, as both of those are better suited to count data. However, the Poisson distribution becomes approximately normal (like our linear regression) at large means, which we have here, and the data are overdispersed, which makes Poisson inefficient for estimation. Additionally, the negative binomial regression tends to be overly demanding in regard to sample size, and smaller sample sizes tend to lead to inefficient standard errors. Using the alternative models does affect the results for price and the time variables, but not in a way that affects how we interpret the evidence for our hypotheses, so I will leave the alternative specifications alone for now.
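A quick way to see what "overdispersed" means: a Poisson variable has a variance equal to its mean, so a variance-to-mean ratio well above 1 is a warning sign. The Python sketch below shows the check; the helper name is hypothetical and the daily counts are invented for illustration, not the real 5-x series.

```python
from statistics import mean, variance
from typing import Sequence


def dispersion_ratio(counts: Sequence[int]) -> float:
    """Sample variance divided by the mean; roughly 1 under a Poisson model."""
    return variance(counts) / mean(counts)


# Invented daily counts of 5-x decks, for illustration only.
example_counts = [56, 41, 30, 23, 22, 12, 9, 10, 8, 11]

ratio = dispersion_ratio(example_counts)
print(f"variance / mean = {ratio:.2f}")
print("overdispersed" if ratio > 1 else "not overdispersed")
```

This is only a raw, unconditional check; a proper test would look at dispersion after conditioning on the covariates in the model.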

Table 2 presents the results of the series of regressions using alternative specifications of the model. The first number is the coefficient of the variable and the number in parentheses is the standard error.

Model 1 in Table 2 is a simple regression that includes our 2 independent variables of interest without controls to see if, by themselves, they correlate with the dependent variable. The results in model 1 are less important than those of the fuller models. Model 2 includes controls for the price of set 3 booster packs, special events in client, and the number of 5-x decks in the previous day to look at any type of day-to-day trending, while model 3 adds in the rest of our controls except for time. Model 4 includes the temporal controls while dropping the lagged dependent variable, and model 5 is a reduced model that includes only the variables that correlate, to see the change in the coefficients. Of note, in interpreting these numbers, only the results marked with asterisks are worth thinking about. We can interpret a result without significance as being indistinguishable from zero. Beyond that, for results that have significance, you can interpret them in a linear way: a one-unit increase in the independent variable is associated with a change in the dependent variable equal to the coefficient. For the binary variables, this simply means that if they are on (take a value of one), then they have the full effect of the coefficient on the dependent variable, all else being equal.
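To illustrate how a coefficient on a binary variable reads, here is a small Python sketch of a one-variable least-squares fit. It is not the model from Table 2: the helper is hypothetical, there are no controls, and the eight daily counts are invented. With a single dummy regressor, the slope is simply the difference between the two group means.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data: four best-of-one days followed by four best-of-three days.
best_of_three = [0, 0, 0, 0, 1, 1, 1, 1]
five_x_decks = [24, 22, 25, 21, 10, 9, 11, 10]

intercept, slope = simple_ols(best_of_three, five_x_decks)
print(f"intercept = {intercept:.1f}")  # average count when the dummy is 0
print(f"slope     = {slope:.1f}")      # change in the count when the dummy is 1
```

In this toy data the intercept is the best-of-one average and the slope is how many decks per day "switch off" when best-of-three is on. The real models add controls, so their coefficients are conditional on everything else in the regression.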

Across the board, two trends are apparent for our independent variables: Best-of-three matches suppressed turnout while 60-minute matches have no effect on the number of 5-x decks. It appears that the likely result is that we lost between 12 and 14 5-x decks per day by having best-of-three matches. This number is only lower when we include the lagged dependent variable which is a bit misleading as we are then looking at the daily change rate instead of the daily rate. The result of 60-minute matches, across the board, is indistinguishable from zero.

In terms of controls, we see some interesting effects. The price variable correlates with an approximate 3:1 ratio of price to 5-x decks, so for every 3 plat increase, we get another 5-x deck. However, this effect goes away when we control for the temporal polynomial and this variable losing significance is likely a result of price being largely a function of time. The proximity of an IQ event also encourages turnout. If an event is 15 days away, we get roughly 9-10 fewer players than when the IQ is tomorrow. Whether the next IQ is constructed does not appear to really affect the turnout for daily constructed.

The R^{2} metric tells us how much of the variation in the dependent variable is explained by the variables within the model. For the research I normally do, my R^{2} values tend to be very low as I predict rare events. However, here it is pretty high: up to 93% of the variation of the dependent variable is explained by the model. Consequently, despite having relatively few explanatory variables to work with, they are quite sufficient at capturing the daily number of 5-x decks in Hex.
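For readers who have not worked with R^{2} before, the calculation is short enough to sketch in Python. The function name and the numbers are made up for illustration; nothing here reproduces the values in Table 2.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Made-up daily counts and model predictions.
observed = [24.0, 22.0, 25.0, 21.0, 10.0, 9.0, 11.0, 10.0]
fitted = [23.0, 23.0, 23.0, 23.0, 10.0, 10.0, 10.0, 10.0]

print(f"R^2 = {r_squared(observed, fitted):.3f}")
```

A value of 1 means the predictions match every observation, while a value of 0 means the model does no better than always guessing the overall average.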

## Conclusions

What we learn is pretty clear: The introduction of best-of-three matches decreased the number of decks making the 5-x list daily and suppressed the overall number of people entering the queues. Of course, this partially means we have fewer repeat entries of decks, but the overall ticket generation from constructed gauntlet dropped substantially. In effect, by giving players what they demanded on the forums and Reddit, we earned 13 fewer IQ tickets per day from constructed gauntlet as a result of this change. My expectation is that many players shifted over to sealed gauntlet, accepting a lower expected reward in exchange for a lower time cost of grinding tickets, but I do not have the data to know for sure.

Of course, there are a few things we are unable to model in these estimations and we are still in the experiment presently. Ideally, I would be able to include a few more variables that could capture intangibles like player interest, meta changes, and similar shifts in the player base, but those things are difficult to accurately capture and our model appears to be performing well despite missing useful information. In regards to the full sample, I started initially collecting and testing the data back in late October and the results, over time, have been mostly stable. As such, when we are done with the IQs, I expect the results to be similar to what we have here. This, of course, could change if we see another shift in the format of the constructed gauntlets. There is plenty more to say in regards to much of this data and the results, but I will keep this discussion short as we cross three thousand words.

Going forward, for those players that are either designing tournament series or are calling for changes in existing ones, remember that every rule change or adoption affects player turnout and, for TOs, one of our primary goals is to get more people playing this game in a competitive (or fun) environment. Tournament Organizers need to be careful in selecting incentives and costs for participation; whenever an organization seems slow to respond to player feedback, it partly has to do with weighing the costs of changes against the demands of vocal players on public forums.

If Hex added a coin-flipping queue that paid out tickets, it’d fill up fast and be ridiculously popular. But it wouldn’t mean Hex players suddenly liked coin-flipping or that adding coin-flipping was a good change. Just that gamers are notorious min/maxers that are looking to get the cheese at the end of the maze ASAP. The real test is when the tickets go away and players are playing for fun rather than grinding tickets.

Thx a lot for this post. I’m currently finishing my bachelor’s degree in Business Informatics and still have to revisit lots of fundamental material like regression and game theory in microeconomics. Your work allows me to combine my work with my hobbies just as it does for you, so I’m very grateful for this content. To be honest, I’d like to be a little greedy and ask if you could make any of your calculations available to me. Retracing the background of this article is equivalent to repeating 70% of my statistics module ^^. And depending on how open you are with this data, I would also be interested in presenting parts of it on my stream that started this year on Twitch. I’m spending a lot of time talking about general TCG theory and, depending on the feedback, would love to go into more detail on the empirical analysis that happens in the background. Obviously I would respect your decision if you were willing to share your work with me for personal use, but not allow me to capitalize on it in terms of content creation. Anyways, cheers for your contributions.

Hey Narya,

If you email me (my email address is in my CV) your Gmail address, I can share the data with you (it is on Google Drive) and I can also send you the code. The code is in Stata, so I am not sure how useful it will be for you, but it is mostly just simple regression commands, so it should not be too difficult to translate.

You are welcome to play with it on stream; I am cool with that. I come from a discipline that strongly encourages sharing data and I am happy to do so for stream-based content.

Thx a lot! Will mail you within the following week hopefully. Got distracted and only now remembered to follow up on this.

Very interesting analysis, I just have one remark: Your results suggest that, due to the introduction of bo3, fewer people are entering the queues. While it is true that there are fewer entries, it doesn’t necessarily mean that there are fewer people playing the constructed gauntlet, since the time it takes to complete one also increased. So while your analysis regarding IQ-ticket/plat/pack generation is interesting, it wouldn’t be the only measure I would choose to measure the success of a format; I would rather look at the amount of time it is being played. So if I were to do the regression, I would also be looking at the number of games being played as a response variable (so e.g. use a factor between 2 and 3 for your responses after November 10th). I would be very interested to see if the introduction of bo3 was actually beneficial for the time being spent playing constructed gauntlet.

Just a little example in case my post is confusing: Player A plays constructed gauntlet for two hours a day, regardless if bo3 or bo1. So pre change he contributed one completed gauntlet per day, while after he only completes one every other day. According to your model one might assume that players like A enjoy bo3 less, while in fact they might be indifferent to it, and just be bound by time.

Hey ValueCity, that’s a good point, I agree that is an issue, and I do mention it briefly right after the first hypothesis when I say “players who do play gauntlet will be able to play fewer matches in a day.” Naturally, this is an important issue, though the decline is pretty staggering. This could be causing 12 fewer 5-x decks if we were going from, say, 100 5-x decks to 88 (as the increase in time would be captured by those 88 remaining 5-x decks), but the numbers we are talking about are much smaller. The highest we ever had was within the first few weeks when we had 56 decks; however, in the week before the change we averaged about 23 5-x decks a day and the week after, excluding patch day, we averaged 9.7. So, the time to complete certainly hampers solid grinders, but I don’t think that explains all of the changes.

Thanks!