NJP

Optimizing Service Reliability with ServiceNow Service Reliability Management (SRM)

ServiceNow Community · Sep 18, 2024 · video

In the Xanadu release, ServiceNow launched Service Reliability Management (SRM), a solution that helps organizations monitor SLOs, manage error budgets, and improve service health. SRM integrates incident response and on-call management, which reduces disruptions and speeds up resolution. One of SRM's key strengths is that it lets distributed teams collaborate seamlessly on the Now Platform. Teams can work together to manage service reliability effectively, no matter where they are.

The Now Platform admin is responsible for the initial setup of the Service Operations Workspace Admin Center. This isn't an SRM role, but the Now Platform admin needs to define the SRM admins. The SRM admin manages the configuration, setup, and overall governance of the SRM application. The SRM manager oversees reliability efforts, including incident resolution and service performance monitoring. The SRM responder engages directly in addressing incidents and resolving issues to maintain service reliability.

Let me show you what this looks like in the SRM environment. I start by logging in as the system administrator. As the Now Platform admin, my role is critical in setting up the foundational components of Service Reliability Management. You'll be responsible for installing the SRM modules, assigning roles, and making sure teams and services are configured properly.

I navigate to the Service Operations Admin Center, choose the Service Reliability Management module, and go to Assign and activate. There I can assign our team leads the SRM admin role, which gives them the ability to manage the SRM application. For the SRM admin user, I select Amelia Caputo, the team lead for a microservice application called Order Service. Once she is added, I click Save.

Next, I import my teams and services into SRM by selecting the Activate teams and services module. I select the teams I want to activate and click Activate at the bottom of the page. Then I import services and activate them. From the Service class drop-down, I select a service CI class, select the services I want to import, and click Activate at the bottom of the page. Before I complete the service activation, I need to define the support group, so I select the Application Development group.

Next, I define governance and autonomy. From the navigation, I select Governance and autonomy, then Service governance, and choose the option that requires approval when associating an existing CMDB service with SRM. Then I click Save in the top-right corner.

Next, let's define team governance. The link at the bottom opens the catalog item request form for a new team. I use the default SRM catalog request settings, review the basic info details, select Review and submit, and then select Submit to request this item for SRM team access. As the Now Platform admin, I have now completed the preliminary work the teams need before they can begin using the SRM application.

Next, I assume the persona of an SRM manager by impersonating Lee, the SRM manager and team lead. In this role, Lee is responsible for overseeing day-to-day reliability operations: setting up teams, services, on-call schedules, and escalation policies, and configuring SLOs, SLIs, and alert automation. Lee also defines incident escalation rules to ensure service reliability is maintained.

I select the Add a team button to begin creating my team. I create a new team called SRE Ops and add my team members: Daniel Zil, Josephine Stutler, and Phil Henry. I provide a brief description of the team, then select Add team. Selecting the SRE Ops team I just created takes me to the SRM team guided setup page. Team details and team members are complete, but a few more tasks remain before I assign services.

First, let's define the on-call schedule for the team on the Schedule tab. I define a monthly shift for my team, and when I click Create shift, I can see all the shifts on the calendar. I select a shift and click the Members tab on the right. Once I define a rotation for the primary responder level, I click Save and then Save and publish at the bottom right.

Next, I define the escalation triggers and policies for the team. Note that this step leverages ITSM functionality. I start by clicking Create trigger and name this escalation trigger P1 incidents. In the Conditions section, I keep the default Incident table and select Priority is 1 - Critical. I leverage the default on-call workflow and click Save changes to complete the escalation trigger.

Then I define the escalation policy by clicking New escalation policy. I name it Escalate P1 incidents, select the monthly shift, and make it my default escalation policy by selecting Use as default. I add an escalation step named Rotate through members and, for the responder level, select Primary from the drop-down. I set up two reminder notifications, sent every 15 minutes. The time to the next step after the last notification is also 15 minutes. I select Done and then Save changes. That completes the trigger, notifications, and policy.

Next, I define the services my team manages by clicking the Services managed tab and selecting Add service. I can create a new service or choose an existing one. I select Associate existing CMDB services to SRM, choose Order Service from the Select services drop-down, click Next, and then click Done.

Next, I select Order Service and then Add integration in the Alert data source tile. This may look familiar to many of you, because it leverages the Service Operations Workspace Integrations Launchpad module, which is part of our ITOM Health offering. I select the ServiceNow Cloud Observability integration, name it Cloud Obs alerts, and give it the description Order service alerts. I click Next and Save, then click Activate and Continue. I copy the connector URL and save it.

Next, I define the service reliability metrics for Order Service, starting by selecting Add SLO and SLI in the guided setup. We begin with the service level objective. I name the SLO Availability monthly, measure it by duration, and set the objective percentage to 99.99%. Next, we set up the service level indicator, which specifies the metrics we want to capture to inform the SLO. I select Add SLI, name the SLI Availability monthly, and add a new condition: the field is Metric name, the operator is contains, and the value is availability. Then I select Save. Next, I select Error budget, select Add threshold, set the threshold value to 50%, and select Save. The service reliability metrics are now added, but I need to select Activate for them to take effect.

Now that Lee has completed her SRM manager responsibilities, let's shift focus to the SRM responder role. We log in as Daniel Zil, an SRM responder, to explore the Service Operations Workspace and see how a responder interacts with the system day to day. I start by navigating to Reliability tasks and selecting the Alerts tab to look at the latest alerts. We have five critical alerts. I click the grouped alert to view its details and select the Details tab to review it. This alert looks like a good candidate for an incident, so we promote it using the Promote to incident UI action. After confirming, an incident is created, which can be verified by looking at the Task field on the alert form. I open the incident and add a comment saying that I am starting my investigation.

That brings us to the end of this demo. As you've seen, SRM offers a comprehensive, streamlined solution for managing service health, from setting up teams and defining escalation policies to proactive incident management and alert automation. Thank you for your time.
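The Rotate through members escalation step in the demo sends two reminder notifications every 15 minutes and waits another 15 minutes after the last notification before moving to the next step. The sketch below turns those settings into a timeline. It assumes the initial notification goes out at minute 0 and the first reminder follows one interval later; SRM's actual timing semantics may differ, and the EscalationStep class and timeline function are hypothetical names used only for illustration.

```python
# Hypothetical sketch of one escalation step's notification timeline.
# Assumption: initial notification at t=0, then `reminders` reminders spaced
# `reminder_interval` minutes apart, then `time_to_next_step` minutes before
# escalating. This is not SRM's implementation.
from dataclasses import dataclass


@dataclass
class EscalationStep:
    name: str
    responder_level: str
    reminders: int           # number of reminder notifications
    reminder_interval: int   # minutes between notifications
    time_to_next_step: int   # minutes after the last notification


def timeline(step: EscalationStep) -> list[tuple[int, str]]:
    """Return (minute, event) pairs for a single escalation step."""
    events = [(0, f"notify {step.responder_level}")]
    for i in range(1, step.reminders + 1):
        events.append((i * step.reminder_interval, f"reminder {i}"))
    last_notification = events[-1][0]
    events.append((last_notification + step.time_to_next_step, "escalate to next step"))
    return events


step = EscalationStep("Rotate through members", "Primary",
                      reminders=2, reminder_interval=15, time_to_next_step=15)
for minute, event in timeline(step):
    print(f"t+{minute:>2} min: {event}")
```

Under these assumptions, an unacknowledged P1 incident would move past this step 45 minutes after the first notification.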

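The Availability monthly SLO in the demo targets 99.99% measured by duration, with an error-budget threshold at 50%. As a rough illustration of what those numbers mean, the sketch below computes the allowed downtime for the month and checks when the threshold is reached. It assumes a 30-day month and treats the threshold as crossed once 50% of the budget has been consumed; this is not SRM's internal calculation, and the function names are hypothetical.

```python
# Hypothetical sketch: error-budget arithmetic for a duration-based SLO.
# Assumptions: a 30-day period, and the threshold is crossed when the
# consumed share of the budget reaches threshold_pct. SRM may differ.


def error_budget_minutes(objective_pct: float, period_days: int = 30) -> float:
    """Minutes of allowed unavailability in the period for the objective."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - objective_pct / 100)


def budget_consumed_pct(downtime_minutes: float, budget_minutes: float) -> float:
    """Share of the error budget already used, as a percentage."""
    return 100 * downtime_minutes / budget_minutes


def threshold_crossed(downtime_minutes: float, budget_minutes: float,
                      threshold_pct: float = 50.0) -> bool:
    """True once consumption reaches the assumed 50% threshold."""
    return budget_consumed_pct(downtime_minutes, budget_minutes) >= threshold_pct


budget = error_budget_minutes(99.99)  # about 4.32 minutes in a 30-day month
print(f"Monthly error budget: {budget:.2f} minutes")
print(threshold_crossed(2.0, budget))  # about 46% consumed
print(threshold_crossed(2.5, budget))  # about 58% consumed
```

A 99.99% monthly objective leaves very little room: under these assumptions, roughly two and a half minutes of downtime is enough to cross a 50% threshold.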
View original source

https://www.youtube.com/watch?v=aMraEpuObVA