https://www.servicenow.com/workflow/it-transformation/generative-ai-changing-the-way-companies-operate.html
workflow.servicenow.com · Sep 09, 2024 · article
Even OpenAI cautions that GPT has limitations, not the least of which is its propensity to write plausible-sounding but incorrect or nonsensical answers. “This ‘hallucination’ of fact and fiction is especially dangerous when it comes to things like medical advice or getting historical facts right,” it warns on its website.
This underscores a key guardrail for companies to deploy: a human in the loop, such as a “prompt engineer,” an emerging IT role dedicated to refining the text prompts people type into generative AI tools so they yield more accurate outputs.
“Prompt engineer” job title aside, every employee deploying generative AI should closely monitor outputs and flag potential bias or factual inaccuracies, notes Fabio Casati, principal machine learning engineer at ServiceNow and lead of ServiceNow Research's AI Trustworthiness and Governance Lab.
“Monitoring, steering, and constraining the AI to align it to behaviors and values that match what a company or society believes in is the most important aspect of human-in-the-loop,” says Casati. “This is the form of human-in-the-loop I'd expect to be in place for the longest time, possibly forever.”
Generative AI could run afoul of these behaviors and values in any number of subtle but dangerous ways, he says.
Consider talent recruitment. “We all know how hard it is to find and select the ‘right person’ for a job,” says Josh Bersin, leading HR analyst and CEO of the Josh Bersin Company. “Suppose you could crawl millions of employee profiles and assess, based on comparative data with people in similar roles at other companies, how ‘good’ this person is at this job? That would be impossible to do manually. Generative AI can do it.”
But those “good” candidate outputs could be tainted by unconscious biases baked into the AI, posing a huge diversity, equity, and inclusion (DEI) issue in hiring. For example, a UC Berkeley researcher asked ChatGPT which race and gender were the “best” scientists, and it replied they were white and Asian males. And a Stanford University study found that the bigger the large language model dataset, the more toxic the bias in its outputs.
That’s why Bersin says a human-in-the-loop fail-safe is essential for good governance. “I’d tell companies to be very careful drawing direct conclusions from GPT’s results without a human double-checking it,” he advises. “It’s not a perfect calculation engine at this point.”