One of the most misunderstood things about stats is the set of assumptions that go into a model. Like you said, there is hopefully enough data to make the model valid, but that doesn’t mean those percentages apply equally to any one specific situation. They describe the average over all the situations that went into the data used to build the model. In other words, each situation differs in some way from the rest of the data, and its conditional probability (based on those specific circumstances) may be different from the overall number. For example, those success percentages also include situations where your team has been ramming it down the other team’s throat in the red zone repeatedly, etc. All this is to say that game flow does matter, because your situation may be one of those cases that ‘stresses’ a model more than others.
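To put rough numbers on the overall-vs-specific-situation point, here's a quick Python sketch. The counts and situation labels are made up purely for illustration (they are not real game data), and `success_rate` is just a helper name I picked. All it shows is that the overall rate and the rate within one game-flow situation can be quite different.

```python
# Hypothetical counts, invented for illustration only (not real game data).
# Each entry maps a game-flow situation to (successful conversions, attempts).
attempts = {
    "offense dominating": (45, 60),
    "even game": (50, 100),
    "offense struggling": (12, 40),
}


def success_rate(successes, tries):
    """Fraction of attempts that succeeded."""
    return successes / tries


# Overall rate: pool every attempt together, ignoring game flow.
total_successes = sum(s for s, _ in attempts.values())
total_tries = sum(n for _, n in attempts.values())
overall = success_rate(total_successes, total_tries)

# Conditional rate: the success rate within each situation on its own.
conditional = {
    situation: success_rate(s, n) for situation, (s, n) in attempts.items()
}

print(f"overall: {overall:.1%}")
for situation, rate in conditional.items():
    print(f"{situation}: {rate:.1%} ({rate - overall:+.1%} vs overall)")
```

With these invented counts, the pooled number sits in the middle while the individual situations land well above or below it, which is the gap a single overall percentage hides.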
Finally, I get to put my Master’s in Predictive Analytics to work lol.