Copilot Studio Feedback Root-Cause Analysis
Why users are unhappy
Every thumbs-down, sorted into one of nine root causes. Telemetry-backed causes are decided by hard signal; the rest are inferred from the transcript and the user's comment.
Connector health
Failure rate per data connector across every diagnosed conversation. This is where the access and availability failures originate.
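As a minimal sketch of how a per-connector failure rate could be computed, assume each connector call is logged with a connector name and an HTTP-style status code. The `ConnectorCall` type, its field names, and the "status 400 or above counts as a failure" rule are illustrative assumptions, not the dashboard's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorCall:
    # Hypothetical record shape; the real telemetry fields may differ.
    connector: str
    status_code: int


def failure_rate_by_connector(calls):
    """Return {connector: failed_calls / total_calls}."""
    total = Counter()
    failed = Counter()
    for call in calls:
        total[call.connector] += 1
        # Assumption: any 4xx or 5xx status counts as a failure.
        if call.status_code >= 400:
            failed[call.connector] += 1
    return {name: failed[name] / total[name] for name in total}


calls = [
    ConnectorCall("sharepoint", 200),
    ConnectorCall("sharepoint", 403),
    ConnectorCall("dataverse", 200),
    ConnectorCall("dataverse", 200),
]
rates = failure_rate_by_connector(calls)
```

Here `rates` maps each connector name to the fraction of its calls that failed, which is the quantity a per-connector chart would plot.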
Weekly trend
Thumbs-down volume per ISO week, with the leading cause that week.
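The weekly grouping can be sketched with the standard library's `date.isocalendar()`, which returns the ISO year and ISO week number. The input shape (pairs of date and cause) and the cause names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_trend(rows):
    """rows: iterable of (date, cause) pairs.

    Returns {(iso_year, iso_week): (volume, leading_cause)}.
    """
    by_week = defaultdict(Counter)
    for day, cause in rows:
        iso = day.isocalendar()
        # Key on ISO year as well as week, so late-December dates
        # land in the correct ISO year.
        by_week[(iso[0], iso[1])][cause] += 1
    return {
        week: (sum(causes.values()), causes.most_common(1)[0][0])
        for week, causes in sorted(by_week.items())
    }


rows = [
    (date(2024, 1, 1), "connector_unavailable"),
    (date(2024, 1, 3), "connector_unavailable"),
    (date(2024, 1, 4), "permission_denied"),
    (date(2024, 1, 8), "empty_retrieval"),
]
trend = weekly_trend(rows)
```

Keying on the ISO year matters at year boundaries: 31 December 2023 belongs to ISO week 52 of 2023, while 1 January 2024 starts ISO week 1 of 2024.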
By agent
Which deployed agents draw the most negative feedback, and their top cause.
Diagnoses, with evidence
A sample of real diagnoses. Each one names a primary cause, a confidence level, and the concrete evidence the engine used. Filter to explore.
What the evaluation does and does not prove
The honest part. This demo grades itself against synthetic ground truth using a deterministic stand-in for the LLM. That makes some of the score real and some of it circular, and the chart below separates the two.
Genuinely validated
The three telemetry-backed causes are decided by heuristics that read real schema fields (status codes, permission-denial records, empty retrieval sets) and never the ground-truth labels. Disable the LLM entirely and these three still hit perfect recall. This path, along with all the ingest, triangulation and aggregation plumbing, is exercised end to end and is real.
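A telemetry-only decision of this kind might look like the sketch below. The `Telemetry` fields, the cause names, the 5xx threshold, and the order in which the checks run are all assumptions for illustration. The point is only that the function reads hard signals and never sees a label.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Telemetry:
    # Hypothetical fields standing in for the real schema.
    status_codes: Tuple[int, ...] = ()
    permission_denials: int = 0
    retrieval_attempted: bool = False
    retrieved_documents: int = 0


def telemetry_cause(t: Telemetry) -> Optional[str]:
    """Return a telemetry-backed cause, or None to defer to the LLM path."""
    if t.permission_denials > 0:
        return "permission_denied"
    if any(code >= 500 for code in t.status_codes):
        return "connector_unavailable"
    if t.retrieval_attempted and t.retrieved_documents == 0:
        return "empty_retrieval"
    return None


denied = telemetry_cause(Telemetry(permission_denials=1))
outage = telemetry_cause(Telemetry(status_codes=(200, 503)))
empty = telemetry_cause(Telemetry(retrieval_attempted=True))
deferred = telemetry_cause(Telemetry(status_codes=(200,)))
```

Returning `None` when no hard signal fires keeps the boundary between the two paths explicit: only those conversations would be handed to the inferred classifier.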
Not independently validated
The other six classes are decided by the mock LLM, which is pattern-matched to the same structural anchors the generator injects. Their recall therefore measures generator-vs-mock agreement, not classifier accuracy. Stub the LLM out and all six fall to zero, dragging macro recall down with them. Proving real accuracy on those six needs hand-labeled transcripts, not this loop.
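To see why stubbing the LLM hits macro recall so hard, here is a small sketch that assumes macro recall is the unweighted mean of per-class recall over all nine classes. The class names and the one-example-per-class data are placeholders, not the demo's real labels or results.

```python
def macro_recall(y_true, y_pred, labels):
    """Unweighted mean of per-class recall; a class with no examples scores 0."""
    recalls = []
    for label in labels:
        support = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / support if support else 0.0)
    return sum(recalls) / len(labels)


# Placeholder labels: three telemetry-backed classes and six LLM-decided ones.
telemetry = ["t1", "t2", "t3"]
inferred = ["m1", "m2", "m3", "m4", "m5", "m6"]
labels = telemetry + inferred

y_true = list(labels)                         # one example per class
y_pred_stubbed = telemetry + ["unknown"] * 6  # LLM stubbed out: six classes miss

stubbed_score = macro_recall(y_true, y_pred_stubbed, labels)
```

Under that assumption, three classes at perfect recall and six at zero average to 3/9, so `stubbed_score` is about 0.33 regardless of how many examples each class has.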
Confusion matrix
Rows are the true cause; columns are what the engine predicted. A clean diagonal is the goal.
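A confusion matrix with this orientation can be built in a few lines. The sketch below uses plain Python and placeholder labels, and is not the dashboard's own implementation.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of items whose true cause is labels[i]
    and whose predicted cause is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


labels = ["a", "b"]
matrix = confusion_matrix(["a", "a", "b"], ["a", "b", "b"], labels)
```

In this toy case `matrix` is `[[1, 1], [0, 1]]`: the off-diagonal 1 in the first row is a true "a" that was predicted as "b".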
How it works
A four-stage pipeline turns raw feedback into a triaged diagnosis. The analysis core depends only on typed schemas, so the synthetic source swaps for Dataverse, Microsoft Graph and Azure Application Insights with no change to the logic.
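One way to get that kind of swap is structural typing: the analysis code accepts anything that yields typed records. In the sketch below, `FeedbackRecord`, `FeedbackSource`, `SyntheticSource` and their fields are hypothetical names, and a Dataverse- or Graph-backed source would be a different class with the same `fetch` method.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol


@dataclass(frozen=True)
class FeedbackRecord:
    # Hypothetical typed schema shared by every source.
    conversation_id: str
    agent: str
    comment: str


class FeedbackSource(Protocol):
    def fetch(self) -> Iterable[FeedbackRecord]: ...


class SyntheticSource:
    """Stand-in source; a production source would expose the same method."""

    def fetch(self) -> Iterable[FeedbackRecord]:
        return [
            FeedbackRecord("c1", "hr-bot", "wrong answer"),
            FeedbackRecord("c2", "hr-bot", "no access"),
            FeedbackRecord("c3", "it-bot", "timed out"),
        ]


def thumbs_down_by_agent(source: FeedbackSource) -> Dict[str, int]:
    """Analysis logic that depends only on the typed records."""
    counts: Dict[str, int] = {}
    for record in source.fetch():
        counts[record.agent] = counts.get(record.agent, 0) + 1
    return counts


by_agent = thumbs_down_by_agent(SyntheticSource())
```

Because `thumbs_down_by_agent` depends only on the `FeedbackSource` protocol, replacing the synthetic source would not require touching it.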