The way I test mine for accuracy is to compare its results against Vegas closing lines. Vegas is pretty smart (though still beatable) and is what any simulator should be tested against first, imo. Some in-sample/out-of-sample testing can protect against back-fitting to a large degree.
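Here's a rough sketch of what that comparison could look like. It assumes American moneylines and a simple proportional way of stripping the vig, and it uses log loss as the score, which is my choice for illustration and not necessarily what anyone actually uses. The function names and the three games are made up; you'd feed in your own out-of-sample games.

```python
import math

def implied_prob(moneyline):
    """Convert an American moneyline to a raw implied probability (vig included)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100.0)
    return 100.0 / (moneyline + 100.0)

def no_vig_home_prob(home_ml, away_ml):
    """Normalize the two raw probabilities so they sum to 1 (proportional de-vig)."""
    h = implied_prob(home_ml)
    a = implied_prob(away_ml)
    return h / (h + a)

def log_loss(probs, outcomes):
    """Mean negative log-likelihood of binary outcomes; lower is better."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Hypothetical held-out games:
# (home closing line, away closing line, simulator's home win prob, 1 if home won)
games = [(-150, 130, 0.58, 1), (110, -130, 0.49, 0), (-120, 100, 0.55, 1)]

vegas = [no_vig_home_prob(h, a) for h, a, _, _ in games]
sim = [s for _, _, s, _ in games]
won = [w for _, _, _, w in games]

print("vegas log loss:", log_loss(vegas, won))
print("sim log loss:  ", log_loss(sim, won))
```

If the simulator's log loss on games it was never tuned on is close to (or under) the Vegas number, that's a good sign. Three games obviously tells you nothing; you'd want seasons of data.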

Speaking from experience, there is a TON of stuff that goes into a good baseball simulator.

In practice, the approaches are different though. One typical Markov chain approach is to formulate the problem mathematically and then use proofs to determine things about it. Unfortunately, most problems don’t lend themselves to closed-form solutions when you do this. You typically can’t do this with a simulation, since simulations usually don’t represent their transition probabilities in a way that lets you evaluate them without actually running them (this is a variant of the halting problem).

Then you have the more CS approach, where you’d estimate Markov states and transitions from data. This sort of approach IS useful, and in theory one could represent any computer simulation using an equivalent Markov chain. With that said, given practical constraints, this sort of approach won’t excel at the same things a simulation would. Using a train/test approach, you can get a decent idea of states and state transitions, given the parameterization of the states you set up (i.e. what they consider). You can estimate these from data, then use the derived weights to simulate or apply math proofs.
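To make the "estimate from data, check on held-out data" idea concrete, here's a minimal sketch. The state labels and sequences are toy placeholders I made up (just outs in a half-inning), and the add-one smoothing is my own assumption so that unseen transitions don't get probability zero.

```python
import math
from collections import Counter, defaultdict

def fit_transitions(sequences, states, alpha=1.0):
    """Estimate P(next | current) from observed state sequences,
    with additive (Laplace) smoothing of strength alpha."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs

def avg_log_likelihood(sequences, probs):
    """Mean log-probability per observed transition; higher is better."""
    ll, n = 0.0, 0
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            ll += math.log(probs[cur][nxt])
            n += 1
    return ll / n if n else float("nan")

# Toy, made-up data: each sequence tracks outs through a half-inning.
states = ["0out", "1out", "2out", "end"]
train = [
    ["0out", "0out", "1out", "2out", "end"],
    ["0out", "1out", "1out", "2out", "2out", "end"],
    ["0out", "1out", "2out", "end"],
]
test = [["0out", "1out", "1out", "2out", "end"]]

probs = fit_transitions(train, states)
print("train avg log-lik:", avg_log_likelihood(train, probs))
print("test avg log-lik: ", avg_log_likelihood(test, probs))
```

A big gap between the train and test numbers is the usual hint that the state definition is too fine-grained for the amount of data you have.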

A Markov model with the standard form of states/transitions should really be considered one form of representation of a Markov state generator. A typical computer simulation is, in fact, simply another way of representing the same sort of problem. With that said, Markov models tend to require a lot of explicit definition of states, which could quickly get out of hand for a rich model. You could do the same thing a lot more sparsely using a simulation approach.
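As an illustration of the "sparse" point, here's a toy half-inning simulator. Nowhere does it list out the 24 base/out states or a transition matrix; the states only exist implicitly in the variables. The event probabilities are made-up round numbers (not fitted to anything), and the runner advancement rule is deliberately crude: runners move exactly as many bases as the batter, and walks only force runners.

```python
import random

# Hypothetical per-plate-appearance probabilities, for illustration only.
EVENTS = [("out", 0.68), ("walk", 0.08), ("single", 0.15),
          ("double", 0.05), ("triple", 0.005), ("homer", 0.035)]

def advance(bases, event):
    """Return (new_bases, runs_scored) for a non-out event.
    bases is a tuple (first, second, third) of booleans."""
    first, second, third = bases
    if event == "walk":
        if first and second and third:
            return (True, True, True), 1   # bases loaded: a run is forced in
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0  # runner on first is forced to second
        return (True, second, third), 0
    n = {"single": 1, "double": 2, "triple": 3, "homer": 4}[event]
    runners = [i + 1 for i, on in enumerate(bases) if on] + [0]  # batter is at "base 0"
    runs = 0
    new = [False, False, False]
    for r in runners:
        pos = r + n
        if pos >= 4:
            runs += 1
        else:
            new[pos - 1] = True
    return tuple(new), runs

def simulate_half_inning(rng):
    """Play plate appearances until three outs; return runs scored."""
    outs, runs, bases = 0, 0, (False, False, False)
    names = [e for e, _ in EVENTS]
    weights = [w for _, w in EVENTS]
    while outs < 3:
        event = rng.choices(names, weights=weights)[0]
        if event == "out":
            outs += 1
        else:
            bases, scored = advance(bases, event)
            runs += scored
    return runs

rng = random.Random(0)
n = 20_000
mean_runs = sum(simulate_half_inning(rng) for _ in range(n)) / n
print("mean runs per half-inning:", mean_runs)
```

Adding something like base stealing or pitcher fatigue here means a few more lines and variables, whereas in an explicit state/transition table it would multiply the number of states.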

With that said, I think that both simulation and Markov models are useful tools for looking at baseball. If I had all the time in the world, I’d definitely try to spend some time modeling this sort of stuff. Unfortunately, given that it doesn’t provide a lot of money or social utility (i.e. benefit to the world), I just try to comment on it and hope somebody else has more free time than me :)

I personally don’t think that naive algorithms are the way to go with baseball, but one could employ some basic rules and physical constraints and then train on data within them, without introducing many (if any) assumptions. A constrained Hidden Markov Model is a simple example of this sort of thing.
