AceSense Accuracy: Published Benchmarks and Methodology
How we measure shot-detection F1, ball-speed error, and bounce-classification accuracy, with reproducible regression data. Updated every release.
Most AI tennis apps describe accuracy with adjectives. We describe it with numbers.
This page is the whole methodology: the dataset we measure on, the F1 / precision / recall the current build hits, the speed and bounce-localisation errors, and — critically — the failure modes where the pipeline still gets it wrong. We update this page every release, and the changelog records when each number moved.
If you've read the complaints about AI tennis app accuracy in r/10s threads like "How accurate is Swingvision?" or "Is this swing vision MPH accurate, my hardest serve only 66 mph?", this page is what those threads are missing: a competitor with its numbers in public.
TL;DR — current build
Numbers below are from the 2026-04-25 release, measured against AceSense's internal held-out test set. Details and methodology are in the rest of the page.
| Metric | Current build |
|---|---|
| Shot detection F1 (forehand) | ~ 0.92 |
| Shot detection F1 (backhand) | ~ 0.91 |
| Shot detection F1 (serve) | ~ 0.88 |
| Shot detection F1 (volley) | ~ 0.78 |
| Shot detection F1 (slice) | ~ 0.83 |
| Bounce localisation (median error, hard) | ~ 22 cm |
| Bounce localisation (median error, clay) | ~ 38 cm |
| Ball-speed error vs handheld radar (median) | ~ 6.2 km/h |
| Court keypoint detection (hard, full-frame) | > 99% |
| Court keypoint detection (clay) | ~ 97% |
| Court keypoint detection (indoor) | ~ 96% |
| Doubles per-player attribution | beta — see below |
These are not "lab numbers" stated as ground truth — they're current-build measurements against a representative test set, and they move each release. Read the rest of the page for what the test set is and where the numbers come from.
How we built the test set
A benchmark is only as honest as the data it sits on. Here's what's in ours.
Sample size. ~150 hours of human-labelled tennis video, of which 35 hours are held out as the regression test set. Roughly 50,000 individually labelled frames within those 35 hours.
Court distribution.
- Hard court (acrylic / DecoTurf / Plexicushion-style): 55%
- Red clay (EU): 25%
- Indoor hard: 15%
- Other (grass, Har-Tru green clay, carpet): 5%
Player level distribution. All matches are amateur, NTRP 3.0-4.5 equivalent. We deliberately do not include pro-tour footage in the test set, because the model isn't being marketed to pro-tour players — including pro footage would inflate the numbers in a way that doesn't help amateur users predict their experience.
Camera-position distribution. We collect from beta-player submissions and internal recordings. Roughly:
- Fence-clip mount (5-10 ft height, behind baseline): 70%
- Tripod (5-7 ft height, behind baseline): 20%
- Side-fence mount: 7%
- Below-recommended (sub-5 ft, hand-held): 3%
The 3% of below-recommended footage is deliberate: we want the model to fail gracefully on bad inputs, not catastrophically.
Labelling protocol. Each frame has labels for: ball position (x, y, in-frame), bounce-or-not (binary), shot-or-not (binary, +1 frame for moment of contact), shot type (forehand/backhand/serve/volley/slice), striking player. Labels are produced by trained annotators using acesense-annotate — a desktop tool we built specifically for this — and double-labelled with reconciliation on disagreements.
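For concreteness, a single-frame label might look like the sketch below. The field names are illustrative assumptions, not the actual acesense-annotate export schema.

```python
# Hypothetical single-frame label record mirroring the protocol above.
# Field names are assumptions, not the real acesense-annotate format.
frame_label = {
    "frame_index": 1042,
    "ball": {"x": 512.0, "y": 288.5, "in_frame": True},  # pixel coordinates
    "bounce": False,            # bounce-or-not (binary)
    "shot": True,               # shot-or-not, +1 frame for moment of contact
    "shot_type": "forehand",    # forehand / backhand / serve / volley / slice
    "striking_player": "near",  # which player hit the ball
}
```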
Shot detection accuracy by shot type
The headline number. F1 = 2 × (precision × recall) / (precision + recall). A higher F1 means the model both finds the shots that exist (recall) and doesn't hallucinate shots that don't (precision).
| Shot | F1 | Precision | Recall | Notes |
|---|---|---|---|---|
| Forehand | ~ 0.92 | ~ 0.93 | ~ 0.91 | Most common shot, biggest training set, highest confidence |
| Backhand | ~ 0.91 | ~ 0.92 | ~ 0.90 | Slight one-handed/two-handed asymmetry; two-handed is slightly better detected |
| Serve | ~ 0.88 | ~ 0.91 | ~ 0.85 | Recall hurt by occasional missed first serves at tight camera angles behind the server |
| Volley | ~ 0.78 | ~ 0.81 | ~ 0.76 | Hardest stroke. Net occlusion, fast contact, less pose feature signal |
| Slice | ~ 0.83 | ~ 0.85 | ~ 0.82 | Sometimes confused with backhand (continental grip, similar contact angle) |
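As a sanity check on the table, F1 follows directly from the precision and recall columns; a minimal Python helper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Forehand row from the table above:
print(round(f1_score(0.93, 0.91), 3))  # → 0.92
```

Plugging in the forehand row (precision ~0.93, recall ~0.91) recovers the ~0.92 headline figure.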
What this means in practice. Over a 90-shot match, you can expect roughly:
- 80-85 of 90 shots correctly detected and classified
- 3-5 misclassifications (most often slice ↔ backhand, or volley ↔ groundstroke at the service line)
- 0-2 missed shots entirely (most often a quick swing-volley)
These rates are tight enough that the headline coaching insights (shot mix, stroke quality trends, top-three-things-to-work-on) are robust. They're not tight enough to use AceSense as a chair-umpire-grade scoring tool; we're explicit about that on the comparison page.
Ball-speed error vs handheld radar
We compared AceSense's serve-speed estimates to a Pocket Radar Smart Coach on 200 first serves across hard and clay. Pocket Radar is the same handheld radar coaches use as a reference at the club level; it's not Hawk-Eye-grade but it's the right reference for amateur use.
Median error (current build): ~ 6.2 km/h. 90th-percentile error: ~ 12 km/h. Bias: AceSense slightly under-reads vs radar by ~ 2 km/h on average. We're investigating why; suspected cause is camera-distance estimation in the homography step.
For context, Pocket Radar's published accuracy is ±1 mph (~1.6 km/h) at 100 mph. So the ground truth itself has noise; subtracting that, AceSense's intrinsic error is ~5 km/h median.
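The error statistics above are simple to compute from paired readings. A sketch with made-up numbers (the 200-serve dataset itself is not public, and `speed_error_stats` is a hypothetical helper, not part of the shipped tooling):

```python
import statistics

def speed_error_stats(radar_kmh, acesense_kmh):
    """Median absolute error, 90th-percentile error (nearest-rank
    approximation), and signed bias of AceSense estimates against
    paired radar readings. Negative bias means AceSense under-reads."""
    errors = [a - r for r, a in zip(radar_kmh, acesense_kmh)]
    abs_errors = sorted(abs(e) for e in errors)
    median = statistics.median(abs_errors)
    p90 = abs_errors[int(0.9 * (len(abs_errors) - 1))]
    bias = statistics.mean(errors)
    return median, p90, bias

# Hypothetical paired first-serve readings (km/h):
radar    = [152, 160, 148, 155, 163]
estimate = [149, 154, 146, 151, 158]
print(speed_error_stats(radar, estimate))  # → (4, 5, -4.0)
```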
Honest caveat: the r/10s threads about SwingVision over-reading and under-reading speed are talking about a completely different problem from ours. SwingVision's serve-speed numbers are computed differently (we believe — they don't publish methodology), and we've seen field reports of 130 mph and 66 mph readings on the same player. AceSense's variance on the same server across 20 serves is typically under 8 km/h, and that's measurable on the examples page.
Court detection accuracy
Court keypoint detection is the foundation of every downstream step: if it fails, the whole pipeline degrades. We measure it as the percentage of frames in which all six keypoints are detected within ±15 pixels of the human-labelled ground truth.
| Surface | Current build | Notes |
|---|---|---|
| Hard court (full-frame) | > 99% | Strong baseline; the model has seen ~100 hours of hard court |
| Clay (EU red) | ~ 97% | Slight degradation when court has been heavily kicked up (lines partly obscured) |
| Indoor hard | ~ 96% | Lighting variability + reflective floor coatings cause occasional keypoint drift |
| Grass | not separately reported | Limited test data; works in practice but no published number until we have ≥10 hours of labelled grass |
If court detection fails, the pipeline still produces a stroke-quality report (which is camera-relative, not court-relative) but skips the heatmap and bounce-localisation. This degradation path is intentional — we'd rather ship a partial report than fabricate a court geometry.
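Under the metric definition above, a frame counts as detected only when all six keypoints land close enough to the labels. A sketch, assuming Euclidean distance per keypoint (a per-axis ±15 px check would be equally plausible) and hypothetical function names:

```python
import math

TOLERANCE_PX = 15  # from the metric definition above

def frame_detected(predicted, labelled, tol=TOLERANCE_PX):
    """True iff every (x, y) keypoint is within `tol` pixels of
    its ground-truth counterpart. Euclidean distance is an assumption."""
    return all(math.dist(p, l) <= tol for p, l in zip(predicted, labelled))

def detection_rate(frames):
    """frames: list of (predicted_keypoints, labelled_keypoints) pairs.
    Returns the fraction of frames where all keypoints are within tolerance."""
    hits = sum(frame_detected(p, l) for p, l in frames)
    return hits / len(frames)
```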
Doubles support — current state
Doubles is in beta as of the 2026-02-10 release. We document this on the comparison pages, the FAQ, and during in-app analysis if a doubles match is uploaded.
What works well in beta:
- Court detection
- Ball tracking
- Shot detection (binary: was a shot hit, yes/no)
- Aggregate shot-mix and heatmap
What is unreliable in beta:
- Per-player attribution on net exchanges (which of Player A or Player B is at the net; mis-attribution rate ~ 15% in the current build)
- Stroke-quality scoring on net players when the partner is in the same frame
We do not recommend Pro-tier subscribers buy AceSense primarily for doubles use yet. The changelog tracks doubles improvements.
Where the model still fails
This list is curated and short on purpose. If we kept everything that's "imperfect", you'd stop reading. These are the failure modes that matter for buying decisions.
1. Hand-held filming. The pipeline assumes a stationary camera. Hand-held footage causes the homography to drift between frames and the ball-tracking confidence intervals to widen. We warn at upload.
2. Sub-30fps input. Frame interpolation can't recover the ball trajectory through a fast serve at 24fps or 25fps. We warn at upload and offer to process anyway with reduced confidence.
3. Doubles net exchanges. Per-player attribution is unreliable. See above.
4. Heavy clay dust. When a single rally has 10+ groundstrokes on red clay and the dust hasn't settled, ball detection through bounces is harder. Bounce-localisation median error climbs from ~38 cm to ~60 cm in those rallies specifically.
5. Wide-angle and fish-eye phone lenses. Some phones (the iPhone 13/14/15 ultrawide, certain Samsung wide-angle modes) introduce barrel distortion that the homography step doesn't fully correct. Use the main (1×) lens, not ultrawide.
6. Junior courts (78 ft). Treated as full-size court; bounce coordinates skewed. Junior support on roadmap for 2026 H2.
7. Carpet courts. Out of scope. Insufficient training data, and the category is shrinking globally.
How to reproduce these numbers
We ship the measurement script as part of the GPU backend repo:
```shell
python scripts/compare_events.py \
    games/tennis/data/<your_match.annotations.json> \
    <output_directory> \
    --tolerance 5
```
The script compares AceSense pipeline output against a human-labelled annotations file (generated in acesense-annotate, our desktop labelling tool) and outputs per-event precision, recall, and F1.
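At its core, such a comparison has to match predicted events to labelled events within a frame tolerance before precision and recall can be counted. A simplified sketch of that matching step (the real compare_events.py may differ):

```python
def match_events(predicted, labelled, tolerance=5):
    """Greedy one-to-one matching of event frame indices within a
    +/- `tolerance` frame window; returns (precision, recall, F1).
    Simplified sketch, not the shipped compare_events.py logic."""
    unmatched = sorted(labelled)
    tp = 0
    for p in sorted(predicted):
        for l in unmatched:
            if abs(p - l) <= tolerance:
                unmatched.remove(l)  # each label can match at most once
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(labelled) if labelled else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Three predicted events against four labelled ones; one label missed:
print(match_events([10, 52, 98], [12, 50, 97, 140]))
```

Here the three predictions all match within tolerance (precision 1.0), but the labelled event at frame 140 is missed, so recall is 0.75.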
What is not yet public (working on it):
- The 35-hour held-out test set videos and annotations. Most footage is from beta players who consented to internal use only. We're building a smaller public benchmark subset (~5 hours) for 2026 H2 with explicit redistribution licences from the players.
What is public:
- The script.
- The methodology described on this page.
- Per-release deltas in the changelog.
Update history
- 2026-04-25 — Re-measured on expanded clay test set (+8 hours). Clay bounce-localisation median improved from ~45 cm to ~38 cm (the 2025-12-15 clay improvements shipped to GA).
- 2026-02-10 — First doubles-beta numbers added.
- 2025-12-15 — Court detection on clay improved from ~93% to ~97%.
- 2025-09-04 — Stroke quality v2 launched. Forehand and backhand F1 climbed by ~0.04 each.
- 2025-06-10 — First public accuracy numbers, Android launch.
Why we publish all of this
Three reasons:
- It's the only honest way to compete with SwingVision. They have a five-year head start and a much bigger team. They don't publish accuracy. We do, and that's the wedge.
- It builds real trust with players who got burned. The r/10s threads about over-reading serve speed are a four-year-running complaint. We can't fix the SwingVision experience but we can prove our numbers are different.
- It keeps us honest internally. The accuracy page is the regression test on the marketing site. If a release doesn't move numbers in the right direction, we don't ship it.
See also: How AceSense works · Examples gallery · Compared to SwingVision · Changelog · Pricing
Frequently asked questions
- Why publish your accuracy numbers when no competitor does?
- Because every other AI tennis app describes accuracy with adjectives — 'best-in-class', 'highly accurate', 'professional grade'. None of those mean anything. If a number can be measured we publish it, including the failure modes. The accuracy page is the foundation for every other claim on the site.
- Are these the numbers your customers see?
- Approximately. The numbers on this page come from our internal held-out test set; your courts, your cameras, and your style of play will move them up or down. The test set is built to be representative of EU amateur tennis (NTRP 3.0-4.5, hard/clay/indoor) — if that matches your situation, the numbers are a reasonable forecast.
- Can I reproduce these numbers?
- Partially. The regression script (`python scripts/compare_events.py`) is in the gpu-backend repo. The test-set videos and annotations are not yet public for licensing reasons (most footage is from beta players who consented to internal use only). We're working on a public benchmark subset for 2026 H2.