Now Assist Guardian FAQ
ServiceNow Community article · Nov 08, 2024
**What is Now Assist Guardian?**
Trustworthy and responsible AI empowers customers and participants in the AI lifecycle to make informed decisions. Now Assist Guardian is a suite of models and methods built into the Now Platform and included with Now Assist through the Generative AI Controller; it is a key component of ServiceNow's Secure and Responsible AI. It assesses AI risks and undesired behaviors, such as offensiveness, and helps mitigate security and privacy threats by monitoring for and detecting prompt injection attacks and adversarial requests.
Now Assist Guardian is a key platform enabler for Responsible AI - in accordance with our principles of human centricity, diversity, transparency, and accountability.
See the [product documentation](https://docs.servicenow.com/csh?version=latest&topicname=now-assist-guardian) for more information on Now Assist Guardian.
**What does Now Assist Guardian do?**
Now Assist Guardian evaluates undesired generative AI model behaviors to help mitigate risk. It is a service that enables other Now Assist applications to detect and handle inappropriate LLM outputs and usage.
Our top priorities for Q4 2024 are offensiveness, prompt injection, and sensitive topic detection. PII is closely related but not exclusive to AI; currently, PII in generative AI products is handled by the Sensitive Data Handler.
**What are the currently released guardrails?**
1. Offensiveness
2. Security
3. Sensitive topic detection (Now Assist in HRSD)
**What are the next guardrails to be supported?**
A few candidates (safe harbor applies): Hallucination, illegal requests, inappropriate advice (medical, financial, etc.).
**Why are guardrails not turned on by default, why would I want to turn them off?**
We aim to provide customers with choice and flexibility regarding the guardrails they deploy. Customer ServiceNow administrators can enable or disable each guardrail and choose the level of guardrail action (for example, blocking versus logging). There are two trade-offs to consider:
1. Using Now Assist Guardian to detect undesired behaviors may add latency to Now Assist skill usage, ranging from a few hundred milliseconds up to several seconds.
2. Technology that incorporates large language models also poses a risk of false positives, such as blocking an output as offensive when it is not.
Both of these scenarios may degrade the user experience more than they improve the risk posture, which is why we encourage customers to test with a small group of stakeholders before enabling the guardrails.
Customers also have the option to run detection after LLM processing for monitoring purposes, without impacting the user experience, because monitoring does not occur in real time during inference. Customer administrators can then turn on the offensiveness guardrail if monitoring reveals an issue that exceeds their internal thresholds.
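The two deployment modes above (inline blocking versus after-the-fact monitoring) can be sketched as follows. This is a hypothetical illustration only; the function names and log format are invented and do not reflect the actual Now Assist Guardian API.

```python
# Illustrative sketch of the two guardrail modes described above.
# All names here are hypothetical and do not reflect the platform API.

def apply_guardrail(llm_output: str, mode: str, detect) -> str:
    """Return the text to show the user, logging any detection."""
    flagged = detect(llm_output)  # e.g., an offensiveness classifier
    if flagged:
        print(f"guardrail log: detected=True mode={mode}")
        if mode == "block":  # Block + Monitoring: runs inline, adds latency
            return "There was an error summarizing your incident."
    return llm_output  # Basic log monitoring: output is shown unchanged

# A toy detector that flags a placeholder token.
detect = lambda text: "OFFENSIVE" in text

print(apply_guardrail("hello world", "block", detect))      # shown unchanged
print(apply_guardrail("OFFENSIVE text", "block", detect))   # blocked with error message
print(apply_guardrail("OFFENSIVE text", "monitor", detect)) # logged only, shown unchanged
```

The key design point: in monitoring mode the detection result affects only the log, never the response, so latency and false positives cannot degrade the agent-facing experience.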
**How does Now Assist Guardian solve for problems such as offensiveness, prompt injection, and sensitive topic detection?**
Now Assist Guardian employs tools that assess the output of models used for Now Assist skills. It highlights and makes recommendations on output containing toxic or offensive content. Each use case has a different user experience depending on the level of risk posed by displaying the offensive content.
**Can agents override or disregard the guardrail?**
No, when blocking is enabled for offensiveness or security, the agent will see an error message stating “There was an error summarizing your incident.”
**What actions are taken by ServiceNow and by the customer as a result of evaluations?**
If a customer has opted in for data-sharing, we use the Filtered AI Content to review model performance based on real-world scenarios.
**Which Now Assist Guardian metrics show up in model cards?**
F1, Precision, Recall, Correctness, False Positive Rate (FPR). See the [model card](https://downloads.docs.servicenow.com/resource/enus/infocard/text-to-text-slm.pdf) for up-to-date metrics and more information.
**How does Now Assist Guardian work with BYOL LLMs?**
Now Assist Guardian is integrated with the Generative AI Controller, so you can use it with Now LLMs and BYOL LLMs. As of Q4 2024, only the Azure OpenAI spoke is supported for BYOL LLMs.
As of Q4 2024, Now Assist Guardian is not integrated with Now Assist Skill Kit, so guardrails do not apply to custom skills built with Now Assist Skill Kit.
**If a customer opts out of our Advanced AI & Data Terms data-sharing program, do they also opt out of Now Assist Guardian?**
No, customers can opt out of the Advanced AI & Data Terms data-sharing program without impacting their use of Now Assist Guardian. Now Assist Guardian runs either at inference, when the LLM is called with a prompt to generate a response, or during monitoring, which uses data housed in a log table in the customer instance with a 30-day data retention period.
**Customers in Europe or Asia may have a different level of sensitivity than customers in the USA about what is offensive. Will there be a guide on how the evaluations are biased toward one culture set versus others?**
Currently, we do not have a guide on how the evaluations are biased toward one culture set versus others.
**Is Now Assist Guardian optional for customers who do not want their data processed in the ServiceNow regional data centers used for the Now LLM Service?**
Yes, customers can leave guardrails disabled or turn them off at any time. The exception is prompt injection monitoring in the Security guardrail, which is enabled for logging by default.
**Does Now Assist Guardian support native translation (multilingual LLM)? Which languages are supported?**
Currently, only English is supported. The model used for Now Assist Guardian has been tested and evaluated on English datasets. You may see results when using multilingual capabilities or Dynamic Translation with Now Assist Guardian, but multilingual use is not currently supported. Refer to the product documentation for future changes to supported languages.
**Does using Now Assist Guardian consume extra assists?**
No, it is included in Now Assist licensing.
**Can I turn on Now Assist Guardian for specific skills, or do I turn it on for all skills?**
The offensiveness guardrail can be configured at the workflow level (for example, CSM, HRSD, or ITSM). The security and sensitive topic detection guardrails are global, meaning they cannot be enabled or disabled per skill.
**What options do I have for configuring guardrails?**
Admins can choose a detection impact for the offensiveness and security guardrails and configure filters for sensitivity detection in the Now Assist admin console.
1. **Offensiveness** – There are two types of detection impacts that admins can configure in the Now Assist admin console:
1. **Basic log monitoring** – Updates the log when offensive content is detected. The entry includes information about the request and conversation containing the offensive content, including any user feedback.
1. Agents can view the offensive content when the skill is executed.
2. **Block + Monitoring** – Prevents offensive content from being displayed when a skill is executed, displays a message, and logs the event.
1. Agents will see an error message.
2. **Security (Prompt Injection)** – Admins can configure the detection impact for all products and skills, choosing between basic log monitoring or blocking.
1. **Basic log monitoring (enabled by default)** – Updates the log when prompt injection is detected.
1. Agents can view the conversation that includes the prompt injection.
2. **Block + Monitoring** – Prevents the content from being generated, displays a message when a skill is executed, and logs the event.
1. Agents will see an error message.
3. **Sensitivity detection** – Admins can modify an existing out-of-the-box sensitive filter, creating new phrases as well as editing or deleting existing ones. The filters are applied to Now Assist for Virtual Agent topics.
**Can I turn on Now Assist Guardian for specific LLMs?**
Now Assist Guardian applies to Now LLM Service and Azure OpenAI LLMs for the Q4 2024 Store release.
**What does the offensiveness guardrail check for?**
The model is trained on 16 safety categories; the category IDs appear in the logs when the guardrail detects offensive content.
| **Category ID** | **Category** | **Description** |
| --------------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| O1 | Toxic Content | Should not: Promote hate speech, incite violence like child abuse or bullying or engage in personal attacks or harassment. Can: Discuss the impacts and societal issues related to toxic content for educational purposes. |
| O2 | Unfair Representation | Should not: Assist in perpetuating stereotypes or encourage discrimination or bias. Can: Discuss issues of representation and promote awareness and inclusivity. |
| O3 | Adult Content | Should not: Display sexually explicit material or content inappropriate for minors including pornography, erotic chat and sexual activity. Can: Offer educational discussions on topics related to sexuality and consent. |
| O4 | Erosion of Trust in Public Information | Should not: Spread misinformation or disinformation undermining public trust such as misleading medical advice, illegal drugs and advice. Can: Critically analyze misinformation trends and educate about media literacy. |
| O5 | Propagating Misconceptions/False Beliefs | Should not: Disseminate unverified, misleading information, conspiracy theories, science denialism, historical revisionism, or content that could lead to radicalization. Can: Provide critical analysis and debunking of common misconceptions and false beliefs. |
| O6 | Risky Financial Practices | Should not: Promote unethical trade practices such as multi-level marketing, deceptive lending practices, or gambling. Can: Educate about associated risks, ethical lending, responsible gambling, trade compliance, and fair business practices. |
| O7 | Trade and Compliance | Should not: Involve labor issues, engage in unethical trade affairs, contribute to unfair labor practices or promote fraudulent trade practices. Can: Provide information and insights into labor issues, ethical trade practices, trade compliance, and fair business conduct. |
| O8 | Dissemination of Dangerous Information | Should not: Spread fake news, propagate social media rumors, disclose government secrets, or leak financial data or other content that could contribute to public panic or threaten national security. Can: Educate about the dangers of such information and promote safety. |
| O9 | Privacy Infringement | Should not: Involve doxxing, unauthorized data sharing, or surveillance like private information inferring or leaking. Can: Discuss the importance of privacy rights and digital security. |
| O10 | Security Threats | Should not: Spread malware or encourage terrorism or other security threats such as weapon generation, cyber attack or chemical harms. Can: Provide information and methods to identify and mitigate security threats. |
| O11 | Defamation | Should not: Publish false statements damaging to individuals, groups or organizations. Can: Explore the legal and ethical aspects of defamation and free speech. |
| O12 | Fraud or Deceptive Action | Should not: Engage in academic dishonesty, perpetrate scams, promote pseudo-pharmaceuticals, or disseminate fake online content. Content should not deceive or defraud individuals. Can: Educate about recognizing and preventing fraudulent activities. |
| O13 | Influence Operations | Should not: Manipulate public opinion or disrupt societal harmony for political or ideological purposes like surveillance or campaign. Can: Analyze and educate about the nature and impact of influence operations. |
| O14 | Illegal Activities | Should not: Promote or provide guidance on engaging in illegal actions like crimes or terrorism. Can: Educate about the consequences and prevention of illegal activities. |
| O15 | Persuasion and Manipulation | Should not: Involve persuasion or manipulation techniques that promote self-harm, psychological manipulation, or any form of harm to individuals' well-being. Can: Educate about recognizing and resisting manipulation, offer support and resources for those affected. |
| O16 | Violation of Personal Property | Should not: Violate an individual's personal property, involving engaging in malicious mimicking, plagiarism, copyright infringement and forging identity. Can: Promote awareness and discussions on respecting personal property and preventing such violations. |
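Because the logs record only the category ID, a lookup like the following can translate IDs from log entries back into readable category names. This is a simple sketch built from the table above, not part of the product itself.

```python
# Offensiveness category IDs and names, taken from the table above.
OFFENSIVENESS_CATEGORIES = {
    "O1": "Toxic Content",
    "O2": "Unfair Representation",
    "O3": "Adult Content",
    "O4": "Erosion of Trust in Public Information",
    "O5": "Propagating Misconceptions/False Beliefs",
    "O6": "Risky Financial Practices",
    "O7": "Trade and Compliance",
    "O8": "Dissemination of Dangerous Information",
    "O9": "Privacy Infringement",
    "O10": "Security Threats",
    "O11": "Defamation",
    "O12": "Fraud or Deceptive Action",
    "O13": "Influence Operations",
    "O14": "Illegal Activities",
    "O15": "Persuasion and Manipulation",
    "O16": "Violation of Personal Property",
}

def category_name(category_id: str) -> str:
    """Translate a logged category ID (e.g., 'O10') into its name."""
    return OFFENSIVENESS_CATEGORIES.get(category_id, "Unknown")

print(category_name("O10"))  # Security Threats
```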
**Can customers add their own offensiveness categories?**
Not as of the Q4 2024 release.
**Where can I find the logs for the guardrails?**
For the Q4 2024 release, admins can view logs for guardrails in the `sys_generative_ai_metric` table, which has the following columns:
* Created
* Generative AI Log
* Name
* Type
* Value
Admins can configure more columns for additional insight by using the List Layout configuration and the Generative AI Log metadata table columns.
Admins can also export logs to a CSV file for each guardrail in the Now Assist admin console.
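Once exported to CSV, the guardrail logs can be summarized with standard tools. The sketch below counts log rows per guardrail metric name using Python's `csv` module; the sample data and exact CSV layout are invented for illustration, on the assumption that the export columns match the list above.

```python
import csv
import io
from collections import Counter

# Hypothetical sample of an exported guardrail log; the column names
# mirror the sys_generative_ai_metric columns listed above, but the
# rows and values are invented for illustration.
SAMPLE = """Created,Generative AI Log,Name,Type,Value
2024-11-01 10:00:00,LOG001,offensiveness,guardrail,O1
2024-11-01 10:05:00,LOG002,prompt_injection,guardrail,detected
2024-11-01 10:07:00,LOG003,offensiveness,guardrail,O3
"""

def count_by_name(csv_text: str) -> Counter:
    """Count exported log rows per guardrail metric name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Name"] for row in reader)

print(count_by_name(SAMPLE))
# Counter({'offensiveness': 2, 'prompt_injection': 1})
```

A summary like this is one way to apply the internal thresholds mentioned earlier before deciding whether to move a guardrail from monitoring to blocking.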
https://www.servicenow.com/community/now-assist-articles/now-assist-guardian-faq/ta-p/3065004
Ashley Snyder