Every tennis AI app puts an accuracy number on its marketing site. Most of them say something like "95% accurate ball tracking" or "90% shot detection." Almost none of them say what they're measuring. This post explains what those numbers actually mean — what's behind them, what to ask, and why the same model can claim 95% or 70% on the same video depending on which metric you choose.
I'm the founder of AceSense, and we publish our accuracy methodology on /accuracy. This post is the layperson version of that page — written for rec players, coaches, and anyone deciding whether to trust a marketing number on a tennis app's homepage.
TL;DR
- "Accuracy" is not one number. It's at least three: per-frame detection rate, event-level precision, event-level recall.
- A model can be 95% on one and 70% on another. Both are technically true.
- The right composite metric for tennis is F1 score, which combines precision and recall. F1 is what you should ask for.
- Tennis ball tracking is genuinely harder than most sports — small ball, fast motion, occlusion, low frame rates on phones.
- AceSense publishes per-shot-type F1, bounce-detection accuracy with a tolerance window, and the test set composition. We are aware of no other tennis vendor that does.
Why the question matters
The Google "people also ask" box for "tennis ball tracking app" surfaces the question "How accurate is tennis ball tracking?" on multiple SERPs (source). Reddit's r/10s carries the same question on a recurring cycle (thread). The honest answer is more nuanced than "X percent" — and the honest version starts with vocabulary.
The three numbers people confuse
Imagine your phone video has 5,400 frames (3 minutes at 30 fps). The ball is visible in roughly 4,200 of them — the rest are between points, the ball is off-camera, or it's hidden by the net. Of the 4,200 visible frames, your tennis app's detector outputs a ball position for some subset.
Per-frame detection rate
The simplest number. "We located the ball in X% of frames where the ball was visible."
This is what most "ball tracking accuracy" claims actually measure. It's also the easiest to game — you can crank up the model's sensitivity, output a ball position even when you're not sure, and your detection rate goes up. The cost: you also detect a lot of non-ball things (line markings, ball-bag specks, the opponent's shoe).
A 90% per-frame detection rate sounds great. It says nothing about whether your detected positions are correct.
Precision
"Of the ball detections we made, X% were actually correct."
If you crank sensitivity up and the detector starts firing on every white pixel, your precision drops. A 60% precision rate means 4 out of every 10 detections are wrong — and downstream of that, your shot count, your heatmap, your stroke-quality scores all inherit the noise.
Recall
"Of the real ball positions, we detected X%."
Recall is the mirror image of precision. A model can have 99% precision (almost everything it outputs is correct) and only 50% recall (it outputs half the real ball positions because it's being conservative).
F1
The harmonic mean of precision and recall. F1 = 2 × (P × R) / (P + R). Penalises being lopsided. A model with 95% precision and 50% recall has F1 = 65.5%. A model with 85% precision and 85% recall has F1 = 85%. The second model is meaningfully better for tennis.
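The definitions above are short enough to verify in a few lines. Here's a minimal sketch of the three metrics computed from raw detection counts, using made-up counts chosen to reproduce the lopsided 95%-precision / 50%-recall model from the text:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from detection counts.

    tp: detections that match a real ball position (true positives)
    fp: detections with no real ball behind them (false positives)
    fn: real ball positions the model missed (false negatives)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts: 1,900 correct detections, 100 false alarms,
# 1,900 real ball positions missed.
p, r, f1 = precision_recall_f1(tp=1900, fp=100, fn=1900)
print(f"P={p:.0%}  R={r:.0%}  F1={f1:.1%}")  # P=95%  R=50%  F1=65.5%
```

Note how F1 sits much closer to the weaker of the two inputs — that's the harmonic mean doing its job of penalising lopsided models.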
When an app says "90% accurate" without qualifying it, it usually means whichever of detection rate, precision, or recall looks best — you can safely assume it's not the least flattering of the three.
Why tennis is harder than other sports
People sometimes ask why football tracking systems hit 99% and tennis vendors hover around 85–95%. Five reasons:
- The ball is small. A tennis ball is 6.5 cm in diameter. At baseline-to-baseline distance on a phone camera, it's often 4–8 pixels across. A football is ~22 cm — roughly an order of magnitude more pixel area to work with.
- The ball moves fast. A 100 mph serve covers ~45 meters per second. At 30 fps phone video, that's roughly 1.5 meters per frame — many ball-diameters of motion blur. The ball isn't a circle anymore; it's a streak.
- Direction changes are abrupt. Bounces, racket impacts, and net clips reverse the ball's velocity in a single frame. Most generic object trackers assume smooth motion; tennis breaks that assumption violently.
- Occlusion is constant. The net, the racket, the player's body, the opponent — all of these block the ball at high-leverage moments (impact, bounce). Your shot-quality model needs to know what happened during the occluded frames.
- Phone video is messy. 30 fps. Variable lighting. Sometimes shaky. Often not the same camera angle from session to session. Real-world tennis video is harder than the curated datasets most vision models are trained on.
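The "fast ball" point is easy to sanity-check with back-of-envelope arithmetic. A small sketch (the speeds and frame rates are just illustrative inputs, not AceSense internals):

```python
MPH_TO_MPS = 0.44704  # miles per hour to meters per second

def meters_per_frame(speed_mph: float, fps: float) -> float:
    """Distance the ball travels between two consecutive video frames."""
    return speed_mph * MPH_TO_MPS / fps

# A 100 mph serve on 30 fps phone video:
print(f"{meters_per_frame(100, 30):.2f} m/frame")  # 1.49 m/frame

# At 60 fps the per-frame streak halves, which is one reason
# higher-frame-rate capture makes tracking easier:
print(f"{meters_per_frame(100, 60):.2f} m/frame")  # 0.75 m/frame
```

At 1.5 meters per frame, a 6.5 cm ball moves more than twenty of its own diameters between frames — a smooth-motion assumption has very little to grab onto.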
This is why purpose-built models like TrackNet (the architecture AceSense and several other vendors use) outperform off-the-shelf object detectors on tennis. TrackNet was designed for small, fast, occluded balls; YOLO and friends were designed for cars and people.
What "90%" can mean across vendors
Imagine three apps all market "90% ball tracking accuracy." Plausible meanings:
- App A: 90% per-frame detection rate. Precision = 75%, recall = 70%. F1 ≈ 72%. Headline number is technically true; the underlying experience is noisy.
- App B: 90% precision on shot events. Recall = 65%. F1 ≈ 75%. Misses a third of your shots, but the ones it reports are mostly real.
- App C: 90% F1 on shot events. Precision and recall both around 90%. The honest version. Roughly what AceSense reports for its top-tier shot types — and the number we publish is the F1, not the easier number.
Three identical marketing claims, three very different products. The only way to tell them apart is to look at the methodology page or run them on your own video.
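The three hypothetical apps above make the point numerically. A quick sketch computing F1 from each app's (made-up) precision and recall:

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Illustrative numbers from the text, not real vendor measurements.
apps = {
    "App A (claims 90% detection rate)": (0.75, 0.70),
    "App B (claims 90% precision)":      (0.90, 0.65),
    "App C (claims 90% F1)":             (0.90, 0.90),
}
for name, (p, r) in apps.items():
    print(f"{name}: F1 = {f1(p, r):.0%}")
# App A (claims 90% detection rate): F1 = 72%
# App B (claims 90% precision): F1 = 75%
# App C (claims 90% F1): F1 = 90%
```

Same "90%" headline; 18 points of spread in the metric that actually predicts how the product feels.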
What "current build" means
A small but important caveat: any accuracy number you read about a tennis AI app is a snapshot of that build at that time. The underlying models get retrained, datasets get expanded, and numbers shift. AceSense's /accuracy page is dated and versioned for exactly this reason — when we ship a model update that changes the F1 by more than a percentage point, we update the page.
That means the direction of an accuracy claim matters more than the precise number. A vendor that publishes a methodology and updates it is committing to a process. A vendor that publishes a static "95% accurate" badge is committing to a marketing line.
How AceSense measures accuracy
The methodology, in plain English:
- Test set. A library of tennis match videos, hand-annotated frame by frame for ball position, shot events (forehand/backhand/serve/volley), and bounce locations. Surface mix: hard, clay, indoor. Level mix: NTRP 3.0–5.0. Resolution mix: 720p, 1080p, 4K.
- Per-shot-type metrics. Precision, recall, and F1 are computed for each shot type separately. We don't roll them up into one number — because the model is generally better on serves than on volleys, and a single number hides that.
- Bounce detection with tolerance. Bounces are evaluated within a 5-frame tolerance window (≈167 ms). A bounce detected within 5 frames of the ground-truth bounce counts as a true positive. This matches how rec players actually use the data — nobody cares if the bounce was at frame 247 vs frame 251.
- Regression testing. Every model update runs against the same test set. If a number drops, we either find the regression or update the published number. The script is `compare_events.py`, and its output is what feeds the `/accuracy` page.
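To make the tolerance-window idea concrete, here is a minimal sketch of how bounce matching could work. This is an illustrative reimplementation, not the actual `compare_events.py` logic — the function name and the greedy matching strategy are assumptions:

```python
def match_bounces(predicted: list[int], ground_truth: list[int],
                  tolerance: int = 5) -> tuple[int, int, int]:
    """Greedy one-to-one matching of predicted bounce frames to ground truth.

    A prediction within `tolerance` frames of a still-unmatched ground-truth
    bounce counts as a true positive; leftovers on either side become
    false positives / false negatives.
    """
    unmatched = sorted(ground_truth)
    tp = 0
    for frame in sorted(predicted):
        # nearest unmatched ground-truth bounce within the tolerance window
        candidates = [g for g in unmatched if abs(g - frame) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda g: abs(g - frame)))
            tp += 1
    fp = len(predicted) - tp
    fn = len(unmatched)
    return tp, fp, fn

# Ground-truth bounce at frame 247, detected at frame 251: 4 frames off,
# inside the ±5-frame window, so it counts as a hit. The detection at
# frame 400 matches nothing, and the bounce at 310 was missed entirely.
print(match_bounces([251, 400], [247, 310]))  # (1, 1, 1)
```

The resulting (tp, fp, fn) counts feed straight into the same precision/recall/F1 formulas as shot events, just with a frame-level tolerance baked in.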
The Tennisnerd review of SwingVision (source) is one of the few independent looks at any tennis-AI vendor's accuracy, and even there the methodology is anecdotal. App Store complaints about clay courts and serve speeds (source) are user-reported failures, not a structured benchmark. We chose to publish a structured benchmark because the alternative is asking you to take the marketing copy on faith.
What to ask a tennis-AI vendor
Five questions that will sort honest from hand-wavy:
- Is your accuracy number F1, precision, recall, or detection rate?
- What test set was the number computed on?
- How does accuracy change between hard, clay, and indoor courts?
- Does the number include occluded frames or just clean ones?
- When was the number last updated?
A vendor that answers all five gets your trust. A vendor that hand-waves on three of them — be skeptical.
What's "good enough" for amateurs
For NTRP 3.0–4.5 players using video AI for self-coaching:
- Per-shot F1 above 85% is broadly usable. Heatmaps look right. Shot counts feel correct.
- Bounce detection above 90% within 5 frames is the threshold where the heatmap stops looking suspicious.
- Sub-80% F1 starts to be visibly wrong — you'll see a forehand counted as a backhand on review, and the trust collapses.
The marginal gain from 90% to 95% F1 matters less than you'd think for self-coaching. The gap that matters is honest 85% vs hand-wavy 95%. App Store reviewers can tell within two matches when the marketing number was generous (clay-court complaint example).
FAQ
What does "90% ball tracking accuracy" actually mean? Almost always one of: per-frame detection rate, shot-event precision, or shot-event recall. F1 is the composite to ask for.
Is tennis ball tracking really hard? Yes — small ball, fast motion, abrupt direction changes, frequent occlusion, low phone frame rates. Purpose-built models (TrackNet) outperform generic detectors.
How accurate is AceSense? Per-shot-type F1 and bounce-detection accuracy with a 5-frame tolerance, computed against a published test set. Numbers and methodology on /accuracy.
Why don't other tennis apps publish their accuracy? Publishing commits a vendor to a number. Most prefer marketing language. We made the opposite call.
Is 85% F1 good enough for me? For self-coaching at NTRP 3.0–4.5, yes. The takeaways are stable above 85%. The bigger risk is being misled by a generous marketing number.
See AceSense's published accuracy methodology at /accuracy. Start free · How AceSense works · Why tennis AI gets confused on clay · AceSense vs SwingVision