Risk or revolution? Why ethical testing of GenAI is critical

A young woman sitting at a desk, illuminated by her laptop screen, surrounded by lines of computer code and abstract AI graphics, symbolising the intersection of human responsibility and generative AI systems.

by Hendrik Richter (ERNI Switzerland)

Is it ethical to trust AI systems with decisions that can change lives – when we still don’t know if they’re fair, transparent or even safe? This question is at the heart of a fast-growing debate in technology and society: the urgent need for ethical AI testing. As generative AI systems (GenAI) – such as ChatGPT, DALL-E and other advanced content creators – increasingly influence everything from job applications and loan approvals to healthcare and criminal justice, the stakes for getting AI ‘right’ have never been higher.

What is ethical AI testing?

Ethical AI testing goes beyond traditional software quality assurance. It means assessing AI systems not just for technical performance, but for fairness, transparency, accountability and respect for human rights. The goal: ensure these systems do not perpetuate bias, violate privacy or make decisions that harm individuals or society.

Fairness

AI systems must not discriminate against individuals or groups based on race, gender, age or other protected attributes.

How is fairness tested?

  • Data audits to verify balanced representation in training data.
  • Controlled test cases for protected groups.
  • Measuring disparate impacts using statistical fairness metrics such as demographic parity or equal opportunity.
  • Using counterfactual fairness analysis to detect hidden bias in model outputs.

Transparency (Explainability)

Stakeholders must be able to understand how and why an AI system reaches certain decisions or outputs.

How is transparency tested?

  • Implementing model explainers (e.g., LIME, SHAP) to make system logic transparent.
  • Audit trails documenting how data was processed and used for decisions.
  • Requiring traceable documentation for GenAI prompts and generated content.

Privacy

AI must protect personal and sensitive data, both in training and during inference.

How is privacy tested?

  • Privacy impact assessments (PIAs) and adversarial testing to uncover leakages (e.g., targeted prompting of LLMs based on training data).
  • Differential privacy testing to quantify and limit the risk of personal data extraction.
  • Simulative Attacks (‘Red Teaming’), which specifically attempt to disclose protected information.

Accountability

There must remain a clear chain of responsibility for AI-driven decisions and their outcomes.

How is accountability tested?

  • Reviewing processes to ensure that responsibilities for oversight and handling of incidents are defined.
  • Evaluating compliance with documentation and governance requirements (i.e., as required by the EU AI Act).
  • Implementing rules ensuring no AI can make a key decision without human intervention.

Why do we need ethical AI testing?

Because even with the best intentions, GenAI can cause real harm, intentionally or not. Specific recent failures prove what is at stake:

Case example:

Investigations by outlets such as the Washington Post showed that tools like Stable Diffusion and DALL-E amplified bias in gender and race, even though developers sought to detoxify the training data. These platforms tended to default to lighter skin tones for high-status roles and darker tones for lower-status positions, demonstrating entrenched stereotypes. Source: https://www.washingtonpost.com/technology/interactive/2023/ai-generated-images-bias-racism-sexism-stereotypes/

Other risks:

  • Law enforcement agencies have adopted GenAI for case analysis and facial recognition, but undetected biases have led to wrongful accusations.
  • Healthcare GenAI has suggested inappropriate treatments due to under-represented minorities in training data.

Ethical testing is therefore critical:

  • To prevent social harm and legal violations;
  • To build and maintain user trust;
  • To address growing regulatory demands; and
  • To protect organisations from lawsuits and reputational damage.

What’s new in ehical AI testing? (2025)

Regulatory momentum

A major change is the implementation of the EU AI Act:

In 2025, the EU AI Act introduces binding obligations for companies and developers of GenAI systems in Europe. The regulation classifies high-risk AI applications and sets out minimum standards for risk management, transparency, human oversight, and documentation.

Modern ethical AI testing is no longer optional – it is a legal prerequisite for deployment in the EU. For example, GenAI models used in recruitment, health or law must undergo strict risk assessment and bias audits. Non-compliance can lead to heavy sanctions and market exclusion.

How do we support risk assessment and bias audits through testing?

Risk assessment:

  • Conduct technical and organisational risk analyses throughout development.
  • Use scenario-based testing: Simulate real-world misuse, adversarial attacks or unintended output scenarios.
  • Require documentation of all identified risks and applied mitigation strategies, mandatorily reviewed before release.

Bias audits:

  • Routinely analyse system outputs for systematic disadvantages, using fairness metrics as part of the regular test reporting.
  • Engage external reviewers or ethics boards to validate audit results.
  • Document and remediate any discovered biases prior to deployment, as per legal requirements.

Direct impact on software development:

  • Developers must integrate fairness, transparency and privacy checks in each iteration (‘Ethics by Design’).
  • Compliance with the Act necessitates versioned documentation, explainable systems, active monitoring and human involvement for key decisions.
  • Non-compliance can lead to severe penalties, forced recall or product bans in the EU.

Ongoing research

  • Researchers are exploring ways to embed ethical guidelines directly into AI development pipelines and automate the detection of bias and unfair outcomes.
  • There is growing recognition that auditing and monitoring must be continuous, not a one-time event.

Conclusion

Ethical AI testing is not just a technical challenge – it’s a societal imperative. As AI systems become more powerful and pervasive, the demand for robust, transparent and fair testing will only intensify. The question is no longer whether we need ethical AI testing, but how we can do it effectively. 

Are you ready
for the digital tomorrow?
better ask ERNI

We empower people and businesses through innovation in software-based products and services.