Large AWS Landscape Tuning and Troubleshooting

Import · Jul 09, 2024 · article

This article covers simplified steps for tuning the Service Graph Connector for AWS (SGC-AWS) when the landscape being discovered contains more than 10,000 EC2 instances.

Please refer to the product documentation for new release notes and enhancements:

https://docs.servicenow.com/bundle/washingtondc-servicenow-platform/page/product/configuration-manag...

Use AWS Central Aggregator - Reduces the frequency of API calls to AWS and allows you to spread load across multiple AWS SGC Import Sets by using multiple SGC-AWS Instances.

An aggregator is an AWS Config resource type that collects AWS Config configuration and compliance data from multiple AWS accounts and Regions into a single account and Region to provide a centralized view of your resource inventory and compliance.

Using the AWS Central Aggregator reduces the number of calls to ListDiscoveredResources for individual Accounts and Regions under an organizational structure.

Once the AWS Aggregator is set up, return to the SGC-AWS Setup and enter the AWS Config Aggregator details:

Navigate to Setup:

Update the Configuration Properties for the instance (Step 3 under Configure the connection).

Enter your AWS Config Aggregator data into the fields:

The Service Graph Connector is now configured to use the AWS Config Aggregator.

Separate the AWS Landscape into smaller Central Aggregator Scopes and configure in AWS Console

Using the AWS Console, create as many Aggregators as you need to split the AWS landscape into smaller scopes.

Assign Accounts to each AWS Aggregator. Make sure that no account is assigned to more than one Aggregator; otherwise the AWS SGC will import the same data more than once and the runs of one AWS Instance will overlap with those of another.
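Before assigning accounts, it can help to check a planned split for overlaps. The following is a minimal sketch in plain JavaScript, not part of SGC-AWS; the function name, the input shape (aggregator name mapped to a list of account IDs), and the sample account IDs are all assumptions made for illustration.

```javascript
// Hypothetical helper: finds AWS account IDs that are assigned to more
// than one aggregator. Input shape is an assumption:
//   { "<aggregator name>": ["<account id>", ...], ... }
function findOverlappingAccounts(aggregatorScopes) {
  const seen = new Map(); // accountId -> Set of aggregator names
  for (const [aggregator, accounts] of Object.entries(aggregatorScopes)) {
    for (const accountId of accounts) {
      if (!seen.has(accountId)) {
        seen.set(accountId, new Set());
      }
      seen.get(accountId).add(aggregator);
    }
  }
  // Keep only the accounts that appear in two or more aggregators.
  const overlaps = {};
  for (const [accountId, aggregators] of seen) {
    if (aggregators.size > 1) {
      overlaps[accountId] = Array.from(aggregators).sort();
    }
  }
  return overlaps;
}

// Made-up aggregator names and account IDs.
const scopes = {
  "aggregator-a": ["111111111111", "222222222222"],
  "aggregator-b": ["333333333333", "222222222222"],
};
console.log(findOverlappingAccounts(scopes));
```

An empty result means the planned split has no account in more than one Aggregator.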

Add new AWS Instances through the AWS SGC Guided Setup

Once you have configured the connection properties for the AWS Central Aggregator, you can return to the Guided Setup and use the section Add Multiple Instances. This will allow you to specify new credentials that look at different configurations in the AWS Central Aggregator and will create isolated sets of Import Schedules for each defined AWS Instance. Follow the Guided Setup steps:

You must create new S3 Storage Buckets so that the information from other AWS Instances does not get intermixed. Do not reuse an S3 Bucket name across multiple AWS Instances.

Increase the Batching Size for ListDiscoveredResources API calls

To reduce the number of API calls made to AWS to collect information, the default batching size of 20 records should also be increased.

There are two ways to do this:

1. Change the following AWS SGC property:

sn_aws_integ.sgaws_config_list_discovered_resource_count

2. Change the default batch size variable in the Script Include:

Edit the Script Include SgAwsSendCommandDataSourceUtils and find the line:

this.SSM_GET_INVENTORY_MAX_RESULTS = 50;

Change the value from 50 to a maximum of 200.
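To see why the batch size matters, the rough arithmetic can be sketched as below. This is a simplification and not SGC-AWS code: it assumes one flat list of resources, whereas real calls are split by account, Region, and resource type. The function name and the batch size of 100 are illustrative assumptions.

```javascript
// Hypothetical estimate: number of paged API calls needed to list
// `resourceCount` resources when each call returns up to `batchSize`.
function estimateApiCalls(resourceCount, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return Math.ceil(resourceCount / batchSize);
}

// 10,000 resources at the default batch size of 20, then at an
// illustrative larger batch size of 100.
console.log(estimateApiCalls(10000, 20));
console.log(estimateApiCalls(10000, 100));
```

Under these assumptions, raising the batch size from 20 to 100 cuts the call count from 500 to 100 for the same inventory.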

Use Optimized Delete

To use this feature you must have installed and configured AWS Config Aggregator.

Enable the sn_aws_integ.enableOptimizedDeletion system property for the optimized deletion of retired configuration items (CIs) in AWS during delta import of data.

Refer to the ServiceNow documentation page "Enable optimized deletion of retired CIs in AWS".

Review Logs for Memory Errors

When an API call returns a large number of resource objects, the payload size of the response can exceed the default system limit of 20 MB.

The response buffer size needs to be increased, but only by a calculated amount, so that it does not hold JVM heap resources that other threads on the instance may require.

Examine the logs for the message:

To calculate a new response buffer size, use the following steps:

1. Navigate to the SYS_OUTBOUND_HTTP_LOG table

2. Add the column RESPONSE LENGTH

3. Sort RESPONSE LENGTH descending

4. CalculatedMAX = the largest RESPONSE LENGTH value + growth amount

Use the CalculatedMAX to set these two properties:

  • com.glide.transform.json.char_buffer_max_size = CalculatedMAX + 32000
  • com.glide.transform.xml.char_buffer_max_size = CalculatedMAX + 32000
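The calculation above can be sketched in plain JavaScript as follows. This is an illustration only: the function name, the sample RESPONSE LENGTH values, and the growth amount are assumptions, and the values are taken in whatever unit the RESPONSE LENGTH column reports.

```javascript
// The extra 32000 comes from the property formulas in this article.
const SAFETY_MARGIN = 32000;

// Hypothetical helper: derives CalculatedMAX and the value to set on
// both char_buffer_max_size properties.
function calculateBufferSize(responseLengths, growthAmount) {
  if (responseLengths.length === 0) {
    throw new Error("at least one RESPONSE LENGTH value is required");
  }
  // Largest observed response, found without sorting the whole list.
  const largest = responseLengths.reduce((max, n) => (n > max ? n : max));
  const calculatedMax = largest + growthAmount;
  return { calculatedMax, propertyValue: calculatedMax + SAFETY_MARGIN };
}

// Made-up sample values and growth amount.
const sizing = calculateBufferSize([1200000, 18500000, 22750000], 2000000);
console.log(sizing.calculatedMax, sizing.propertyValue);
```

The growth amount is a judgment call: large enough to absorb expected inventory growth, small enough not to reserve heap that other threads need.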

Look for HTTP 429 errors specifying RequestLimitExceeded or ThrottlingException

- Adjust the properties below as shown:

  • sn_aws_integ.throttling_min_wait_time_in_ms = 5000
  • sn_aws_integ.throttling_max_wait_time_in_ms = 10000

- You may need to contact AWS Support to have them increase your API rate limit
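As a general illustration of how minimum and maximum wait bounds are commonly used when a client is throttled, the sketch below doubles the wait on each retry and clamps it between the two bounds. This is an assumption about a typical backoff pattern, not a description of how SGC-AWS actually applies these properties; the function name is hypothetical.

```javascript
// Hypothetical backoff: start at minWaitMs, double on every retry,
// and never wait less than minWaitMs or more than maxWaitMs.
function throttleWaitMs(attempt, minWaitMs, maxWaitMs) {
  const exponential = minWaitMs * Math.pow(2, attempt);
  return Math.min(maxWaitMs, Math.max(minWaitMs, exponential));
}

// Using the values suggested above: 5000 ms minimum, 10000 ms maximum.
for (let attempt = 0; attempt < 3; attempt++) {
  console.log(attempt, throttleWaitMs(attempt, 5000, 10000));
}
```

With these bounds the wait starts at 5000 ms and stays at the 10000 ms ceiling from the second retry onward.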

Monitor System Logs for HTTP errors

  • Monitor the system logs for issues with the application sn_aws_integ
  • Filter messages and look for ERROR and WARN messages
  • Some errors will be very specific to certain accounts and/or subsets of Instances!
  • Focus on HTTP Response codes:

– HTTP-400: General bad requests. Triage AWS configuration settings

– HTTP-403: Check credentials

– HTTP-404: Make sure CFT scripts are applied and pushed to all instances. Check S3 permissions

– HTTP-429: AWS API throttling issues. Contact AWS to increase API call rate
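The response-code guidance above can be kept as a small lookup when reviewing exported log entries. This is a sketch in plain JavaScript; the function name and the fallback text are assumptions, while the hints themselves restate the list above.

```javascript
// Triage hints taken from the list of HTTP response codes above.
const TRIAGE_BY_STATUS = {
  400: "Triage AWS configuration settings",
  403: "Check credentials",
  404: "Make sure CFT scripts are applied and pushed to all instances; check S3 permissions",
  429: "AWS API throttling; contact AWS to increase API call rate",
};

// Hypothetical helper: returns the hint for a status code, or a
// generic fallback for codes this article does not cover.
function triageHint(statusCode) {
  return TRIAGE_BY_STATUS[statusCode] || "No specific guidance here; review the full log message";
}

console.log(triageHint(403));
console.log(triageHint(500));
```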

Monitor executions in Flow Designer

  • Open Flow Designer
  • Navigate to Executions Tab
  • Filter for Name – SGC-AWS*
  • Look for State not Completed
  • Look at the Runtime values to determine if there are long running flow issues
  • Start to triage the ones that are not Completed
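
The same filtering can be sketched over an exported list of executions. The record shape (name, state, runtimeSeconds), the function name, and the long-running threshold are all assumptions made for illustration; this is not a Flow Designer API.

```javascript
// Hypothetical triage over exported execution records shaped like
// { name: string, state: string, runtimeSeconds: number }.
function triageExecutions(executions, longRunningSeconds) {
  // Mirrors the "SGC-AWS*" name filter.
  const sgc = executions.filter((e) => e.name.startsWith("SGC-AWS"));
  return {
    // Executions whose State is anything other than Completed.
    notCompleted: sgc.filter((e) => e.state !== "Completed"),
    // Executions above the threshold, slowest first.
    longRunning: sgc
      .filter((e) => e.runtimeSeconds > longRunningSeconds)
      .sort((a, b) => b.runtimeSeconds - a.runtimeSeconds),
  };
}

// Made-up sample records.
const sample = [
  { name: "SGC-AWS Flow A", state: "Completed", runtimeSeconds: 40 },
  { name: "SGC-AWS Flow B", state: "Error", runtimeSeconds: 900 },
  { name: "SGC-AWS Flow C", state: "Completed", runtimeSeconds: 3600 },
  { name: "Other Flow", state: "Error", runtimeSeconds: 5000 },
];
const report = triageExecutions(sample, 600);
console.log(report.notCompleted.map((e) => e.name));
console.log(report.longRunning.map((e) => e.name));
```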
View original source

https://www.servicenow.com/community/cmdb-articles/large-aws-landscape-tuning-and-troubleshooting/ta-p/2962981