Grades as a fragile proxy
Grades historically functioned as a compact between institutions and employers: a promise that certain standards were met under comparable conditions. AI does not invalidate that compact, but it does change how easily the promise can be misunderstood or gamed.
Credential as a social contract
A degree classification does not merely summarise performance; it signals that an institution has applied a stable set of expectations and quality controls. This has worked tolerably well when tasks, tools, and conditions were broadly legible to outsiders. In many sectors, recruitment has relied on that legibility because it reduces transaction costs at scale.
Yet even before generative AI, grades were an imperfect proxy: they correlate with opportunity as much as with capability, and they often compress a complex story into a single number. The shift now is that the distance between “submitted work” and “personal capability” can widen materially without obvious traces.
Assumptions no longer holding
Two assumptions are becoming less dependable. First, that a good outcome is evidence of the same underlying process as in prior cohorts. Second, that comparable grades imply comparable effort and decision-making. In regulated domains, employers already triangulate grades with aptitude tests and structured interviews. In creative and analytical roles, portfolios and practical tasks have been gaining ground. AI accelerates both trends, because it amplifies variance in how much the tool is used and how well it is used.
Importantly, this is not only a threat. If grades are a blunt instrument, AI-era redesign could make assessment more faithful to real performance.
When work can be generated
The central complication is not that AI can produce outputs, but that it can obscure authorship, inflate fluency, and standardise style. That changes the economics of trust for employers and the governance burden for institutions.
From plagiarism to provenance
Many institutional responses still frame AI as an integrity problem, adjacent to plagiarism. That lens is understandable, yet incomplete. The deeper issue is provenance: the ability to show how an outcome was produced, with what tools, under what constraints, and with what human judgement. In professional settings, AI-assisted work is rapidly becoming normal, so the question becomes whether a candidate can demonstrate responsible use rather than absence of use.
Business examples of shifting signals
Consider recruitment in software and data roles. Git repositories, code reviews, and timed technical screens have been used for years because they reveal working practices. Generative coding assistants make it easier to produce plausible code, while also increasing the premium on debugging, test design, and explaining trade-offs. Similarly, in marketing and communications, AI can draft competent copy quickly, yet brand stewardship still hinges on audience insight, risk calibration, and ethical judgement in sensitive contexts.
In financial services, risk and compliance functions increasingly ask for evidence of reasoning, not only conclusions. A polished report matters, but so does the audit trail of how conclusions were reached. AI pushes recruitment towards that same logic: less emphasis on surface polish, more emphasis on traceable thinking.
The new asymmetry
AI creates an asymmetry: it is easier to manufacture an impressive artefact than it is to verify the capability behind it. When verification becomes expensive, employers either narrow recruitment to familiar brands or introduce their own assessments. Neither outcome is ideal for social mobility or for institutions seeking to demonstrate distinctive value.
Signals that resist automation
In an AI-rich environment, the most trusted signals tend to be those that are observable in context, difficult to fake repeatedly, and informative about judgement. Several candidate signals are emerging, each with design and equity implications.
Performance under constraints
Work-sample tests, assessment centres, and probationary projects are likely to grow, because they move evaluation closer to real work. The question is what constraints are applied. A “closed book” task may test recall, while a “tool-enabled” task tests the ability to frame problems, check outputs, and act responsibly. In engineering and product roles, scenario-based exercises can reveal whether a candidate spots failure modes, handles incomplete information, and makes decisions that stand up to scrutiny.
Structured evidence of judgement
Employers often struggle to measure judgement directly, yet it is frequently what differentiates dependable performance from brittle performance. Scenario simulations, reflective write-ups, and viva-style defences can make judgement more visible when designed well. They also create a record that can be sampled for quality assurance, which matters when trust is at stake.
Reputation signals with auditability
References and employer brand endorsements remain influential, but they are uneven and can reinforce closed networks. More auditable variants are emerging: verified portfolios, externally reviewed projects, and participation in standards-based competitions or communities of practice. These can be complemented by micro-credentials, although their signalling value depends on assessment integrity and transparency about what was actually tested.
A subtle shift is underway: signals are becoming less about “what was learned” and more about “what can be reliably demonstrated”, including the ability to use AI without surrendering accountability.
Institutions as signal stewards
If grades are a weakening proxy, institutions face a choice about what they are stewarding: content delivery, credential issuance, or trustworthy evidence of capability. Each path carries governance, cost, and reputational consequences.
Assessment redesign as institutional infrastructure
Moving beyond grades does not require abandoning standards. It may require treating assessment as infrastructure rather than a by-product of teaching. That includes designing tasks that elicit reasoning, documenting tool use expectations, and calibrating marking with clear rubrics that reward verification, not just fluent output. Some disciplines may shift towards more observed performance, while others may formalise “AI-permitted” assessments that explicitly test the ability to work with tools responsibly.
Data governance and legitimacy
As institutions collect richer evidence of learning and performance, data governance becomes central. What is stored, for how long, and with what consent? How are AI tutors and analytics audited for bias or drift? How are claims about competence made defensible to regulators and employers? Policy-aware design matters here, particularly in the UK context where quality assurance and public trust are intertwined, and where providers registered with the Office for Students operate under scrutiny regarding outcomes and standards.
A quiet shift in the institutional role
Some providers are experimenting with assessment models that resemble professional practice. At the London School of Innovation, for instance, AI-supported formative assessment and role-play simulations are used to make decision-making visible, with human academic input focused on judgement and feedback rather than repeating content delivery. The interesting question is not whether one model wins, but what mix of human and AI processes produces signals that employers can trust without disadvantaging capable learners who lack social capital.
Evidence worth building next
The sector has many opinions about what employers will trust, but little shared evidence. Progress may depend on targeted empirical work that tests which signals predict performance, and on decision tests that clarify what each institution wants its credentials to mean.
A practical research agenda
One genuinely useful area of empirical study would be predictive validity across signal types. A multi-institution consortium could track graduates into early-career roles and compare how well different measures predict supervisor-rated performance and progression: traditional grades, supervised simulations, externally verified portfolios, and structured viva assessments. Results could be segmented by discipline and by role type, because “good signals” are likely to vary between, for example, healthcare-adjacent contexts, consulting, and product engineering.
A second study could examine robustness under AI assistance: when candidates are permitted tools, which assessment designs still discriminate meaningfully between high and low judgement? This would move the debate from abstract integrity concerns to testable design choices.
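To make the first study concrete, the sketch below shows one way a consortium might compare the predictive validity of different signals against supervisor ratings. It is a minimal illustration, not a proposed methodology: the dataset, file name, column names (degree_grade, simulation_score, portfolio_score, viva_score, supervisor_rating), and the choice of cross-validated R-squared are all hypothetical assumptions made for the example.

```python
# Hedged sketch: comparing how well different assessment signals predict
# supervisor-rated early-career performance. All column and file names are
# hypothetical illustrations.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical graduate-tracking dataset: one row per graduate, each signal
# scored on a comparable scale, plus a supervisor rating collected some
# months into the first role.
df = pd.read_csv("graduate_tracking.csv")  # illustrative file name

signals = {
    "degree_grade": ["degree_grade"],
    "supervised_simulation": ["simulation_score"],
    "verified_portfolio": ["portfolio_score"],
    "structured_viva": ["viva_score"],
}

outcome = df["supervisor_rating"]

# Cross-validated R^2 gives a rough, comparable estimate of how much of the
# variation in supervisor ratings each signal explains on its own.
for name, cols in signals.items():
    scores = cross_val_score(LinearRegression(), df[cols], outcome,
                             cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.2f}")

# Segmenting by discipline or role type, as suggested above, would repeat
# the comparison within each subgroup (e.g., via df.groupby("role_type")).
```

The same scaffolding could serve the second study: run identical analyses on tool-permitted and tool-restricted versions of each assessment and see which designs retain their ability to separate candidates.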
Decision tests for credential redesign
Several futures remain plausible. In one, employers build extensive in-house testing and degrees become a weaker filter. In another, reputable institutions strengthen their signalling power by publishing clearer evidence standards. In a third, sector-wide credential frameworks emerge that make skills and performance more portable. Preparatory choices today tend to be similar across these futures: invest in assessments that reveal reasoning, adopt transparent policies on AI use, and build auditable evidence trails.
Insight: trust is migrating from the artefact to the process that produced it, and from the score to the evidence behind it.
Uncomfortable question: if an institution could no longer rely on brand or classification, what would remain as defensible proof that its graduates can be trusted with consequential decisions while using AI?