LSI Insights - Future of Higher Education

Academic assessments amid the rise of AI

Higher education assessment is being pulled in two directions: protecting trust in credentials while embracing AI as a legitimate part of modern knowledge work. The old bargain, that assessment can reliably separate individual capability from assistance, is weakening. The risk is not simply misconduct, but a quiet loss of confidence in what grades and awards actually signal.

17 min read · September 17, 2025
Executive summary

AI makes it harder to treat assessments as a clean measure of individual authorship, and easier to treat them as a window into judgement, process and real-world performance. The immediate question is not whether AI should be “allowed”, but what institutions are trying to evidence, under what conditions, for which stakeholders. A credible path forward may combine clearer assessment purposes, stronger governance, and new forms of evidence, while accepting genuine uncertainty about what “good” looks like at scale.

Credential as a social contract

Assessment has always been more than measurement. It is part of a social contract between institutions, students, employers, and regulators about what a qualification stands for. AI changes the cost and nature of producing academic work, which puts that contract under pressure.

Why assessment carries institutional risk

In most systems, assessment is the mechanism that turns learning into a credential. That credential then travels: into labour markets, professional bodies, immigration systems, and public confidence in higher education funding. When assessment is questioned, the impact is rarely contained to one module or one cohort.

Example: The take-home essay under new conditions

A take-home essay once assumed a rough alignment between time spent, writing quality, and individual effort. With widely available tools like ChatGPT, Claude, Gemini, or Copilot, the same essay can be produced faster, in multiple drafts, in tones that mimic academic convention. The essay may still show understanding, but it may also conceal it. The task becomes less a test of writing under constraint and more a test of prompting, editing, and topic comprehension, whether or not that was the intent.

Assessment integrity is not only about cheating

Academic misconduct matters, but framing the problem solely as cheating can miss the deeper issue: even fully permitted AI use can blur what an assessment is evidencing. If a grade no longer reliably signals the capability to interpret evidence, construct an argument, or make a decision, the credential weakens even without rule-breaking.

Where trust is currently anchored

Many institutions still anchor trust in familiar proxies: invigilated exams, plagiarism similarity, or statements about independent work. Those anchors were designed for a world where assistance was costly, visible, and relatively scarce. AI reduces that scarcity.

Assumptions that no longer hold

AI does not merely add a new tool; it breaks some assumptions that have sat quietly beneath assessment design for decades. When assumptions change, policies that look sensible can produce unintended outcomes, including inequity and misalignment with workplace reality.

Can authorship still be treated as binary?

Assessment rules often assume a clear line between “the student’s work” and “external help”. In practice, higher education already permits layered support: peer discussion, library guidance, writing centres, reasonable adjustments, and staff feedback. AI introduces assistance that is continuous, personalised, and hard to observe. The question becomes whether authorship is best treated as a binary state, or as a documented spectrum of contribution.

Scarcity of high-quality output has shifted

In many disciplines, producing a competent first draft is no longer a strong indicator of deep understanding. Scarcity has moved elsewhere: selecting a direction, justifying trade-offs, checking claims, and integrating constraints. Those are closer to judgement than production.

Detection as a stabilising mechanism is weakening

Text-based detection is an arms race with uncertain accuracy, uneven performance across language backgrounds, and high reputational risk when false positives occur. Over-reliance on detection can also shift institutional posture from learning to surveillance, with knock-on effects for student trust and staff workload.
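
To make the false-positive risk concrete, a rough back-of-the-envelope calculation helps; the detector accuracy and AI-use figures below are assumptions chosen for illustration, not measurements of any real tool.

    # Hypothetical detection figures, chosen only to illustrate the base-rate problem.
    cohort_size = 10_000
    ai_use_rate = 0.10          # assumed share of submissions with undisclosed AI use
    true_positive_rate = 0.90   # assumed detector sensitivity
    false_positive_rate = 0.01  # assumed rate of flagging genuinely independent work

    flagged_ai = cohort_size * ai_use_rate * true_positive_rate
    flagged_honest = cohort_size * (1 - ai_use_rate) * false_positive_rate
    precision = flagged_ai / (flagged_ai + flagged_honest)

    print(f"Honest students wrongly flagged: {flagged_honest:.0f}")
    print(f"Share of flags that are actually AI use: {precision:.1%}")
    # With these assumptions, 90 honest students are flagged and roughly 1 in 11 flags is wrong;
    # lower the rate of AI use, or raise the false positive rate, and the picture worsens quickly.

Even under generous assumptions about accuracy, each wrongful flag carries the reputational and pastoral cost noted above, and those costs scale with cohort size.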

Assessment time and place are less meaningful controls

Moving more work into controlled environments can help, yet it also changes what is being assessed. A timed exam can measure recall or speed under pressure, but may tell less about capability in professional contexts where resources and tools are used responsibly. The question is not whether control is valuable, but which capabilities deserve controlled conditions.

AI use is becoming a workplace baseline

Many employers now expect graduates to use AI well, not to avoid it. If higher education assessment insists on “AI-free” performance everywhere, it risks graduating students who can pass university tasks but struggle in AI-mediated work.

Evidence of capability, not output

A productive pivot is to treat assessment as evidence generation: what evidence would persuade a sceptical stakeholder that capability is present, even in an AI-rich environment? This does not require abandoning essays or exams, but it does invite a rebalancing towards observable reasoning and performance.

Judgement becomes the scarce capability

If AI can propose options, draft text, or summarise research, then the differentiator is often the ability to set a goal, choose an approach, and defend decisions under scrutiny. Assessment that foregrounds judgement can be more robust to AI assistance because it asks for rationale, not just results.

What does “authentic” evidence look like?

Authenticity is sometimes treated as a slogan. A more useful interpretation is evidential: the assessment creates artefacts that resemble decision-making in the domain, and makes the reasoning auditable.

  • Viva-style conversations that probe why choices were made, including the limitations of sources and models.
  • Scenario simulations where conditions change and the candidate must adapt, documenting decision points.
  • Portfolio trails that show iteration, feedback incorporation, and reflective critique of AI outputs.
  • Team-based deliverables assessed alongside individual accountability narratives, reflecting organisational reality.

AI as permitted instrument, not hidden advantage

Some institutions are experimenting with “open-AI” assessments where tool use is allowed but must be disclosed. This reframes integrity from prohibition to transparency. It can also reduce inequity, since private access to better tools becomes less of a hidden differentiator when usage norms are explicit.

Example: Simulation with disclosed AI support

In an entrepreneurship module, a student might use AI to draft a market analysis. The assessment then focuses on whether assumptions are validated, whether risks are recognised, and how the student responds when a simulated competitor changes pricing. The AI output is not the point; the judgement is.
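
To make disclosure workable at scale, some form of structured usage record can accompany each submission. The sketch below is purely illustrative: the field names and the example content are assumptions, not an LSI template or a sector standard.

    from dataclasses import dataclass, field

    @dataclass
    class AIUseDisclosure:
        """Hypothetical per-submission record of permitted AI assistance."""
        tool: str                     # e.g. "ChatGPT", "Claude", "Copilot"
        purpose: str                  # what the tool was used for
        prompts_summary: str          # what the student asked, in their own words
        output_handling: str          # how the output was checked, edited, or rejected
        sections_affected: list[str] = field(default_factory=list)

    disclosure = AIUseDisclosure(
        tool="ChatGPT",
        purpose="first draft of the market analysis",
        prompts_summary="asked for an overview of competitors and pricing pressures",
        output_handling="verified figures against published sources; rewrote the positioning argument",
        sections_affected=["Market analysis"],
    )

A record like this shifts the marker's attention from detecting assistance to interrogating it: the viva or feedback conversation can start from what was disclosed.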

For example, at the London School of Innovation, we have incorporated AI-supported formative assessment with private virtual tutors and repeatable role-play simulations, partly because they create a richer evidence trail than a single submission event. The interesting question is how such evidence can be standardised without flattening disciplinary nuance.

Governance for AI-mediated assessment

Assessment redesign is rarely a purely academic matter. It touches regulation, reputational risk, procurement, accessibility, staff capability, and data governance. AI makes these connections tighter, because the assessment process increasingly depends on tools, logs, models, and policies beyond the course handbook.

Policy choices that shape behaviour

Institutional AI policies can unintentionally drive behaviour underground. Overly restrictive rules can encourage covert use; overly permissive rules can produce ambiguity about standards. Useful policies tend to specify purpose: what is being assessed, what tools are allowed, and what disclosure is expected.

Quality assurance under new evidence types

When assessment includes vivas, simulations, or portfolios, moderation needs redesign. The challenge is ensuring reliability without turning assessment into a compliance exercise that erodes the value of richer evidence.

  • Rubrics that prioritise reasoning quality rather than surface features of writing.
  • Sampling approaches for moderation that focus on decision points and feedback consistency.
  • Examiner calibration for oral and performance assessments, including bias awareness.

Equity and accessibility in tool-mediated systems

AI can widen or narrow gaps. Students with stronger digital fluency, better devices, or quieter study environments may benefit more. At the same time, AI can support neurodiversity, language development, and flexible pacing. Governance needs to treat equity as an outcomes question, not only an access question.
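
A simple way to see the difference between access and outcomes is to compare awarding gaps under different assessment formats. The figures below are synthetic and exist only to illustrate the comparison; a real analysis would use institutional data, proper cohort definitions, and statistical controls.

    # Synthetic pass rates by assessment format and (anonymised) student group.
    pass_rates = {
        ("take-home essay", "group_x"): 0.78,
        ("take-home essay", "group_y"): 0.64,
        ("viva + portfolio", "group_x"): 0.80,
        ("viva + portfolio", "group_y"): 0.74,
    }

    for fmt in ("take-home essay", "viva + portfolio"):
        gap = pass_rates[(fmt, "group_x")] - pass_rates[(fmt, "group_y")]
        print(f"{fmt}: awarding gap of {gap:.0%}")
    # Both formats are equally "accessible", yet the gap in outcomes differs (14% vs 6% here);
    # governance thresholds that trigger redesign should be defined on figures like these.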

Data, privacy, and vendor dependence

Assessment increasingly generates sensitive data: drafts, interaction logs, audio, and behavioural signals. In the UK, considerations include the UK GDPR, contractual terms, and how Office for Students (OfS) expectations on quality and standards might intersect with AI-enabled processes. Internationally, cross-border data flows and differing regulatory norms complicate consistency for transnational education.

Staff capability and workload realities

Assessment change often fails due to capacity constraints. The key issue is not whether staff can learn new tools, but whether workload models, recognition, and support structures match the new assessment design. AI can reduce some burdens, yet it can also create new ones, such as managing disclosures or interpreting complex evidence trails.

Empirical clarity and decision tests

The sector could benefit from shared empirical work that goes beyond tool debates and asks what assessment designs actually preserve standards while improving relevance. The goal is not to find one best model, but to understand trade-offs, costs, and unintended consequences under real conditions.

Where evidence is currently thin

There is growing local experimentation, but limited comparative evidence across disciplines, student groups, and institutional types. Many decisions are being made under uncertainty, often guided by risk tolerance rather than measured impact.

What study could move the sector forward?

A genuinely useful programme of study would be multi-institutional and longitudinal, comparing assessment formats under controlled policy conditions, with transparency about AI allowances.

  • Design: Compare modules using invigilated exams, open-AI take-home work with disclosure, and performance-based assessments such as vivas or simulations.
  • Measures: External examiner judgements, reliability across markers (a minimal sketch of one such measure follows this list), student learning gains, differential outcomes by background, and post-course performance indicators where available.
  • Operational data: Staff workload, moderation costs, misconduct rates, and student trust in fairness.
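
One of the measures above, reliability across markers, is straightforward to quantify. The sketch below computes unweighted Cohen's kappa for two markers grading the same vivas; the grade data is invented for illustration.

    from collections import Counter

    def cohen_kappa(marks_a, marks_b):
        """Agreement between two markers beyond what chance alone would produce."""
        n = len(marks_a)
        observed = sum(a == b for a, b in zip(marks_a, marks_b)) / n
        freq_a, freq_b = Counter(marks_a), Counter(marks_b)
        expected = sum(freq_a[c] * freq_b[c] for c in set(marks_a) | set(marks_b)) / n ** 2
        return (observed - expected) / (1 - expected)

    # Hypothetical grade bands assigned independently by two markers to the same ten vivas.
    marker_1 = ["A", "B", "B", "C", "A", "B", "C", "A", "B", "B"]
    marker_2 = ["A", "B", "C", "C", "A", "B", "B", "A", "B", "B"]
    print(f"Cohen's kappa: {cohen_kappa(marker_1, marker_2):.2f}")   # ~0.68 with this data

Values near zero would indicate that markers agree little beyond chance, which is exactly the failure mode that moderation and examiner calibration are meant to catch.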

Such a study would not settle every debate, but it could turn abstract arguments into concrete trade-offs that governance bodies can act on.

Decision tests for resilient assessment

Rather than aiming for a single policy stance, one practical approach is to test each assessment against decision questions that remain stable across possible futures, including faster model progress, tighter regulation, or shifts in employer expectations.

Difficult questions worth holding open

  • What capability is the assessment attempting to evidence, and would a sceptical employer believe the evidence?
  • Which forms of assistance are being prohibited in theory but tolerated in practice, and what does that inconsistency do to trust?
  • What is the institution willing to treat as legitimate tool use, provided it is disclosed and critiqued?
  • How will moderation and external examining work when evidence includes conversations, simulations, or AI interaction logs?
  • What equity impacts will be measured, and what thresholds would trigger redesign?
  • Which risks are being managed through surveillance, and which could be managed through better task design?
  • How will assessment data be governed when it becomes more granular, more personal, and more commercially entangled?
  • If AI capability continues to accelerate, which assessment approaches remain credible without constant reinvention?

London School of Innovation

LSI is a UK higher education institution, offering master's degrees, executive and professional courses in AI, business, technology, and entrepreneurship.

Our focus is forging AI-native leaders.