The true goal is the health and productivity of our projects. In simpler and more measurable terms, the goal is to maximize contributions. The number of edits and the size of edits are poor measures, because work patterns may differ between the two editors. So as a practical matter, the metric we want is user retention and sustained contributions. Sustained contributions are particularly significant, as an edit by a knowledgeable, experienced user is far more reliable and valuable than an edit by a newbie. We want to look at new user retention over as long a time span as practicable. If you want to expand on that, you could count the number of days on which each user was active. That would avoid any small-scale differences in editing style between the two editing environments.
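To make the "days active" idea concrete, here is a minimal sketch of how it could be counted. The input format (pairs of user name and ISO timestamp) and the function name `active_days` are assumptions for illustration only, not anything taken from the actual test plan or logging schema.

```python
from collections import defaultdict
from datetime import datetime


def active_days(edits):
    """Count distinct calendar days with at least one saved edit, per user.

    `edits` is an iterable of (user, iso_timestamp) pairs. This is a
    hypothetical input format chosen for the sketch.
    """
    days = defaultdict(set)
    for user, timestamp in edits:
        # Several edits on the same day collapse into one active day,
        # so many-small-saves vs. one-big-save styles count the same.
        days[user].add(datetime.fromisoformat(timestamp).date())
    return {user: len(dates) for user, dates in days.items()}


# Made-up sample data, purely illustrative.
edits = [
    ("A", "2019-06-01T10:00:00"),
    ("A", "2019-06-01T10:05:00"),
    ("A", "2019-06-03T09:00:00"),
    ("B", "2019-06-02T12:00:00"),
]
print(active_days(edits))  # {'A': 2, 'B': 1}
```

Because same-day edits are merged, a user who saves ten small edits in one sitting and a user who saves one large edit both count as one active day.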
Metrics such as edit completion rate may be more convenient to measure, and may be useful for catching certain glaring issues. However, past studies on VE have demonstrated that there are complexities in defining and interpreting that metric. If edit completion rates were to point in the opposite direction from user retention and contributions, then obviously we should disregard the completion rate as irrelevant. For example, it's a not-uncommon part of the wikitext workflow to open additional throwaway edit sessions just to view or copy wikitext from a page. Closing such a session without saving does not indicate any sort of failure. The original VE research project explicitly excluded any session where the user closed the editor without any content change. Failing to account for that issue will result in invalidly low figures for the wikitext success rate.
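The effect of that exclusion can be sketched roughly as follows. The session fields `changed` and `saved` are hypothetical names, and the numbers are invented to show the arithmetic, not real measurements.

```python
def naive_completion_rate(sessions):
    """Saved sessions divided by all opened sessions."""
    if not sessions:
        return None
    return sum(s["saved"] for s in sessions) / len(sessions)


def completion_rate(sessions):
    """Saved sessions divided by sessions with a content change.

    Sessions closed without any content change (for example, opened
    only to view or copy wikitext) are excluded from the denominator.
    """
    attempted = [s for s in sessions if s["changed"]]
    if not attempted:
        return None
    return sum(s["saved"] for s in attempted) / len(attempted)


# Invented example: 2 saved edits, 1 abandoned edit,
# and 3 throwaway sessions opened just to look at the wikitext.
sessions = (
    [{"changed": True, "saved": True}] * 2
    + [{"changed": True, "saved": False}]
    + [{"changed": False, "saved": False}] * 3
)
print(naive_completion_rate(sessions))  # 2 of 6 sessions
print(completion_rate(sessions))        # 2 of 3 attempted edits
```

In this made-up case the naive figure is half the filtered one, even though only one real edit attempt was abandoned.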
Another metric you want to look at is whether there is any preferential direction in users switching away from one editor and to the other. Assuming retention and contributions are roughly equal between the editors, obviously we should not be forcing new users to switch away from a bad initial default.
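One possible way to measure switching direction is to count, per user, each change of editor between consecutive sessions. This is only a sketch under assumed inputs: the (user, editor) pairs, the editor labels, and the function name `switch_counts` are all hypothetical, and the data is invented.

```python
from collections import Counter


def switch_counts(sessions):
    """Count editor switches by direction.

    `sessions` is an iterable of (user, editor) pairs in chronological
    order. A switch is recorded whenever a user's editor differs from
    the editor used in that same user's previous session.
    """
    last_editor = {}
    switches = Counter()
    for user, editor in sessions:
        previous = last_editor.get(user)
        if previous is not None and previous != editor:
            switches[(previous, editor)] += 1
        last_editor[user] = editor
    return switches


# Invented sample data, purely illustrative.
sessions = [
    ("A", "visual"), ("A", "wikitext"),
    ("B", "wikitext"), ("B", "wikitext"),
    ("C", "visual"), ("C", "wikitext"),
    ("D", "wikitext"), ("D", "visual"),
]
counts = switch_counts(sessions)
print(counts[("visual", "wikitext")])  # 2
print(counts[("wikitext", "visual")])  # 1
```

A real analysis would also want to normalize by how many users started in each editor, since the raw counts depend on the size of each test group.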
And finally, I'd like to note that the test scenarios, for positive test results, lay out "a proposal to make the VisualEditor the default mobile editing interface on all wikis", but if the test results are negative they instead direct analysis to figure out why you didn't get the desired results. Can we please get that changed? If the research finds that a VE default is actively harmful to new users, that obviously warrants an equal-and-opposite proposal to make wikitext the default mobile editing interface on all wikis.