Data Collection Best Practices - Oct. 21, 2020 - Performance Analytics Academy

Import · Oct 23, 2020 · video

Good morning, good afternoon, and good evening. Welcome to Performance Analytics and Reporting office hours. I see lots of returning names and a few new ones, so let's get started.

First of all, this session is for you. The idea is that we share some new ideas and give you a better understanding, especially today, of how PA works and how you can get the most out of it. All our sessions are recorded; if you have an issue with that, please disconnect now. The recordings are posted to the community, and they'll be posted to YouTube as well, for your reference and to share with your colleagues.

As we go through this, please use the Q&A window in Zoom to ask questions. If you put them in chat, we may not see them as the chat scrolls by. The Q&A lets us make sure we answer every question that comes in, and it lets us record those questions so we can post the answers later. So if you have questions as we go, and I expect we will, please put them in Q&A so we can address them.

I'm Adam Stout, your host for today, an outbound product manager for Now Platform and Now Intelligence. Our speaker today is Thomas Davis, who we've heard from in the past, also an outbound product manager for Now Platform and Now Intelligence. Thomas has done a lot of work to prepare these best practices for data collection.

As a reminder, we record and post these sessions. The deck Thomas presents will be posted to the community, along with all the past sessions, the recordings, and shortcuts to the links we talk about, and there are a lot of references in this presentation. If you hit that community article, you can view all the past sessions, and when this one comes up you'll get quick access to all the links we're sending out.

As we mention every session, there is lots of training available. A lot of what we cover in these best practices is covered in the training, whether in the K20 labs or in the Performance Analytics training. I've heard the K20 labs are going to be retired at the end of the year, so if you have not taken advantage of the Performance Analytics and Reporting labs from Knowledge 2020, I'd recommend doing that in the next couple of months. I believe we have 10 or 11 labs just in this area. If you haven't taken them, I'd recommend taking them and sending them to your colleagues: they're free, they're self-paced, and you get more hands-on experience.

There's also a lot of other content out in the community; I think most of us found this session from the posting there. We'll do our best to answer the questions you have today, but if you have more, please post a question in the community. And if you haven't read the guide on asking questions, it has some great tips for getting the answer you're looking for in the fastest amount of time. If you just say "I need help," it's hard for us to help you, but if you follow those recommendations, you'll get an answer much more quickly.

Okay, and with that, I'm going to turn it over to Thomas, who's going to talk us through some data collection best practices.

Okay, thanks, Adam. Let me share my screen here and make sure I share the right one. You can see my screen, right? Looks great? Okay, good. Thank you to everyone who has come today. Today we're going to talk about data collection best practices.
Again, my name is Thomas Davis. Although these may not be all the best practices that are out there, these are the ones we felt would have the best impact on your data collection process and its performance, to hopefully help it run more efficiently.

Our agenda: first we'll discuss what data collection is. I'm pretty sure everyone here knows what data collections are, but we think it's a good idea to talk through the process and explain how things work. Then we'll go through some do's and don'ts of data collection and some solutions to make data collection jobs run more efficiently. We'll finish up with some great sources and, of course, open the call to questions about data collection; hopefully we'll have some of those.

So, what is a data collection? When using Performance Analytics, you have to gather data together to then display in the Analytics Hub or on dashboards, whichever you choose. Performance Analytics uses scheduled jobs to collect and clean scores and snapshots of data, and it also enables you to manually set or import scores. If you need to collect data immediately for existing records, you would run a historical data collection job. For ongoing data collection, there are a couple of different methods to choose from: if you need to measure an indicator once a month, quarter, or year, enter scores manually or import scores; if you need to measure indicators more frequently, or if you just want to eliminate any human involvement, you'll want to use a scheduled data collection.

So how does it work? Here we have all the different ways, and some of the additional things you can do inside of data collections, with links for each. I'm sure most of you know these already, but this whole slide deck will be made available to you after the call is over, pushed out in a blog on the community.
You'll have access to all of these links, where you can get more in-depth information about every one of these items. Here we just want to hit them at a very high level so you understand exactly what you can do and how it works.

Collect historical data: this collects scores and snapshots of existing records. Remember, when collecting data for the first time, such as for a new indicator, run a historical data collection once to generate scores and snapshots for existing records. The big thing to remember is that a historical collection is not something you run every day. For data you have already collected, it's not necessary, and you don't want to potentially write over something you're collecting in a different manner somewhere else. The biggest use for historical data collections is that first time for an indicator, or for breakdowns: if you add a breakdown to an indicator that's already out there, that's when you'll use it.

Create or schedule data collection jobs: here we schedule a data collection job to regularly collect scores. This happens a lot more often than a historical job.

Configure a job indicator: here we can increase the efficiency of your data collection by configuring job indicators to collect only what is necessary and sensible. It gives you the ability to be a little more precise about exactly what you want.

Cancel a data collection: hopefully you've never had to use this, but if you do, you know exactly where it is. You can cancel an active data collection job to stop the job from finishing the collection in its current state.

Add or edit indicator scores manually: you can always manually enter score data for indicators if you need to, including for automated indicators.

View data collection job events: when you look at a job event, you can see which jobs have been executed for PA and which actions, if any, have been triggered in your instance, such as notifications or business rules. The data collection job event is a good source to see what has actually happened with a job and what was triggered.

Data collection job logs: job logs are our friend, and they should be your friend. They're a great source for finding things out, not only when a job fails but also for seeing whether there is some latency happening inside a job. They display information about the data collection jobs that have run for PA. You can view the job logs, create events, and view and edit the event registry if you need to. All logs are displayed unless they're filtered.

The data collection process and logging: this is the best place to debug a data collection. It's always helpful to understand the data collection process and how it's reflected in the jobs. This lets you look in depth into the job logs and see, step by step if you need to, what's actually happening in the job, and whether there were any warnings or errors.

And last, you can view data collection usage: if you need or want statistics about data collection jobs, click the Data Collection Overview, a usage tile inside the PA admin console itself.

Here are the best practices we have listed out; we'll go into more detail on each one. What can you do at the indicator source to make data collection jobs better? It can start right there, and it should start right there, with some of the decisions you make around your indicator source, so that as things go forward, the data collection job collects more efficiently.
Are there things you can do at the indicator level? Absolutely. At the data collection level itself, what practices should you always try to follow? We'll go over those, and then we'll look at some advanced practices that, again, will make data collection run more efficiently if you can follow them.

Indicator sources. The big thing, and I'm sure most of you have heard this, and if you haven't, it's huge: dates, not states. I sometimes think this could be a slogan for Adam; I've heard him say it more than anybody. It really needs to happen at the indicator source level itself. In the example, we're creating an indicator source, and when we zoom in you can see we're creating conditions that make sure we're looking at specific dates. Although an "active" filter is important in many cases, using it alone does not allow accurate historic collections. It's okay to put "active" into the conditions, but don't let it be the only thing you put in. If you can put a date in there, make sure you do; it will make everything work better and more efficiently down the line.

With a set of date conditions, indicators based on this definition retrieve the accurate set of records when running a historic scores collection, because the dates do not change. "Today" is simply today when you're collecting on that day; it lets us know what is currently open. The Today operand is a placeholder for real dates and is replaced by real dates during data collection: even though it says Today, the collector knows the date it's running for, and that's what it captures in the data. Again, if you take anything away from indicator sources: dates, not states.

Next, the indicator level itself. The number of incidents that were open yesterday is not going to change, so it doesn't make sense to pull that data again. If an indicator is in multiple daily jobs, collecting multiple times, you're causing unnecessary work to be done. In this example, this particular indicator was part of multiple daily jobs; there's no reason for an indicator to be in more than one daily job. Obviously, during creation you want to make sure you don't put indicators into multiple data collection jobs. However, as instances go on, things get added, and other people have the ability to create indicators and add them to jobs. So it's good practice to set a schedule where, every once in a while, you go in and look at your indicators, especially the heavily used ones, and make sure they're not in multiple daily jobs. There's no efficiency in that, so pull them out. They can stay in a historical job; it doesn't hurt for them to sit there, because historical jobs should only be run on demand. They're not part of a schedule, so they're probably not running in that historical job anyway.

Next, data collection itself: let's talk about exactly what we're trying to pull with a daily collection, and when we should run that job.
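A rough way to picture why "dates, not states" keeps historic collections accurate, sketched in Python (this is an illustration of the idea, not the actual PA collector; the record fields and helper functions are hypothetical): a condition on an immutable date field returns the same set of records no matter when you run it, while a condition on a mutable field like state can only reflect each record's current value.

```python
from datetime import date

# Hypothetical records: opened_at never changes, state does.
records = [
    {"opened_at": date(2020, 10, 1), "state": "closed"},  # was open on Oct 1
    {"opened_at": date(2020, 10, 1), "state": "open"},
    {"opened_at": date(2020, 10, 2), "state": "closed"},  # was open on Oct 2
]

def opened_on(records, day):
    """Date condition: accurate for any historic day, because dates don't change."""
    return [r for r in records if r["opened_at"] == day]

def currently_open(records):
    """State condition: only reflects the records' *current* values."""
    return [r for r in records if r["state"] == "open"]

# A historic collection for Oct 1 by date is still correct when run later:
print(len(opened_on(records, date(2020, 10, 1))))  # 2 — matches what was true on Oct 1

# A state filter run today cannot reconstruct what was open on Oct 1:
print(len(currently_open(records)))                # 1 — two records have since closed
```

The date filter also bounds the query, which is where the efficiency gain in the daily jobs comes from.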
Here you can see we have a relative start of 1 and a relative end of 1, which means we're looking for anything from yesterday, not today. The reason is that we cannot give an accurate measure for a day until the day is complete, so we capture data for yesterday (one day ago), not today (zero days ago). We save processing time by not recalculating things that do not change, which, as we talked about earlier, is why it's important that the indicator sources include a date filter. Every extra day's worth of data increases the processing time linearly: the more you're looking for, the longer the processing takes. Historical jobs covering a wider time frame should not be run on a scheduled basis; they should only be used when collecting data for the first time.

We also want to run this job between midnight and 6 a.m. Again, we cannot give an accurate measure for a day until that day is complete. The more quickly the data changes, the closer to midnight the job should run, but not before, to capture accurate values for fast-changing breakdowns like state. The run time is displayed in your time zone, but the job should run after midnight in the time zone the job is set to run for. What does that mean? I'm in the Eastern time zone, and the scheduled time is when the job kicks off in my time zone. But if I tell it to run at one o'clock in the morning (0100) and I know the data is for the West Coast, the West Coast has not reached midnight yet. I'd want to run it maybe three hours after midnight in my time zone to make sure I capture everything done the day prior on the West Coast. So pay attention to the data you're collecting and the time zone you have the job set for, to make sure you'll still capture everything for the prior day.

Next, the run-as user. The run-as user must be an active account. Honestly, if you can dedicate a system account to it, that's the best option and what's preferred. The data collector itself bypasses ACLs, so there's nothing to worry about regarding its ability to collect data based on table security: it will bypass that and pull all the data with whatever run-as account you use. The account generally doesn't need any roles, so if you create a dedicated system account specifically for this, you don't have to worry about any roles being needed in order to run the data collection. If you're using scripts, though, they execute as the run-as user, and ACLs are honored in scripts. So if there's a table you're accessing inside a script, whatever ACLs are on that table apply: the run-as user needs access in order for the script to pull the data. And obviously, remember that when displaying any of these records, ACLs are enforced. If and when you need to debug a data collection, it's simpler with a dedicated user account: when you look through job logs, being able to search for that specific user account makes life a lot easier when you're looking for particular things at a particular time. Again, the biggest takeaway: if your run-as user can be a system account, we definitely recommend that, and by all means it has to be an active user.

So let's talk about time zones again a little bit.
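The Eastern-versus-West-Coast point above can be sketched in Python (illustrative only, not the PA scheduler; the zone names are just examples): the "yesterday" score window is defined by the job's time zone, so a job fired from Eastern time that collects West Coast data has to start after midnight Pacific, i.e. 3 a.m. Eastern.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

def yesterday_window(run_when: datetime, job_tz: str):
    """Score window for a daily job: all of 'yesterday' in the job's time zone."""
    local = run_when.astimezone(ZoneInfo(job_tz))
    start = datetime.combine(local.date() - timedelta(days=1), time.min,
                             tzinfo=ZoneInfo(job_tz))
    return start, start + timedelta(days=1)

# Job fires at 03:00 Eastern on Oct 22 ...
run = datetime(2020, 10, 22, 3, 0, tzinfo=ZoneInfo("America/New_York"))

# ... which is just past midnight Pacific, so the Pacific "yesterday" (Oct 21) is complete.
start, end = yesterday_window(run, "America/Los_Angeles")
print(start)  # 2020-10-21 00:00:00-07:00
print(end)    # 2020-10-22 00:00:00-07:00

# Firing at 01:00 Eastern would still be Oct 21 on the West Coast — the day isn't over yet.
early = datetime(2020, 10, 22, 1, 0, tzinfo=ZoneInfo("America/New_York"))
print(early.astimezone(ZoneInfo("America/Los_Angeles")).date())  # 2020-10-21
```

The three-hour offset is just the Eastern/Pacific gap; the general rule is to schedule after midnight has passed in the data's time zone.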
There are two time zones involved with the data collector. First, the time zone the jobs run in: remember that "today" is defined by this time zone, which is important for making sure you collect all the relevant scores. This time zone is also used for the score start and score end variables used in your scripts, so if you're writing a script, make sure you're getting the exact timestamps you need so that whatever you're using inside the script collects the right information. If the time zone is not set, it uses the system time by default. Second, when the job runs: this is displayed in your time zone and is the time the job itself will actually start. We talked about this a little earlier, but remember: if you're in a different time zone than the data you're trying to gather, schedule the job late enough that the prior day will be over where the data is, so you get everything.

A couple of quick best practices: daily jobs should normally run between midnight and 6 a.m., as we said. Try not to have more than two jobs overlap, and always stagger the start times. We'll talk a little more about this on the next slide.

Let's talk about jobs scheduled on top of each other. This is very important, and it's worth taking the time to manage it so that, again, you have more efficient data collection. We want to make sure that none of your data collection jobs are scheduled on top of one another. When you're creating a new job, it's worth looking at the jobs that are already there, so you don't choose the same start time as an existing job. The other big thing to look at is how long a job takes, which obviously comes down to how much and what type of data you're collecting. If you can look at how long a particular job takes to run on average, that helps you decide, from a scheduling perspective, when you want your job to start after another one. Say there's a job starting at 4:30 in the morning and you want to schedule one around the same time: if the 4:30 job finishes in about three minutes on average, it wouldn't hurt to start yours at 4:35, just to give it a little buffer and make sure they're not on top of each other.

We can see that here. This will go fast, but you'll understand the premise: we have some jobs that are all starting at 4:30, so let's fix them. The easiest way is to go into each job and change the start time. What I did when I fixed these was shift each one by five minutes, so now they go from 4:30 to 4:35 to 4:40.
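The staggering fix can be sketched like this (illustrative Python; the job names, durations, and buffer are made up): start each job after the previous job's average run time plus a small buffer, instead of stacking them all at the same start time.

```python
from datetime import datetime, timedelta

def stagger(jobs, first_start, buffer=timedelta(minutes=2)):
    """Assign start times so each job begins after the previous job's
    average duration plus a buffer, instead of all starting at once."""
    schedule = []
    start = first_start
    for name, avg_duration in jobs:
        schedule.append((name, start.strftime("%H:%M")))
        start = start + avg_duration + buffer
    return schedule

# Hypothetical daily jobs that were all scheduled at 04:30.
jobs = [
    ("Incident daily", timedelta(minutes=3)),
    ("Change daily",   timedelta(minutes=8)),
    ("Problem daily",  timedelta(minutes=2)),
]

print(stagger(jobs, datetime(2020, 10, 22, 4, 30)))
# [('Incident daily', '04:30'), ('Change daily', '04:35'), ('Problem daily', '04:45')]
```

The point is the same as in the slide: the right gap is driven by the preceding job's average duration, not a fixed five minutes.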
Again, the gap after an existing job may not be the five-minute window I used here; that's just an example. What you need to look at is the average time it takes the job you're scheduling after to finish, and decide the best start time from that. These are small practices that help efficiency and limit how much is running at the same time. If you stay on top of it, it can be controlled quite easily; but if multiple people have the ability to create jobs or add jobs to indicators, it can get out of control. So again, it's a good practice to go in and look every so often to make sure the times are offset and staggered, not on top of each other.

Next, historical collections and their limits. Historic scores should only be collected once, after a new indicator or breakdown is created. Going back to what we talked about earlier, historical collections should be used when something is new: for example, a new indicator, or an existing indicator to which you've added a new breakdown and need to show historical information for that breakdown. Historical jobs are not something that should be run every day; they should be set up as on-demand jobs. Don't schedule them; they don't need to be scheduled. Run them whenever one of those things happens. The other time to run them is when something out of the box has been installed or turned on, such as a plugin, and you want to collect the historical information for it.

There are some limits. It is not possible to retrieve accurate historical scores on open records for indicators based on conditions or fields whose values change during the record's life cycle. For example, state: a historical job only collects whatever state a record is in when the job runs. If your indicator looks at open records, and a record has since closed, the historical collection reflects the current state, not the state the record was in on the historic date, so the data will change. The same applies to fields like Last updated, Reassignment count, and Breached, and to indicator-breakdown combinations with breakdowns on fields whose values change during the life cycle of an open record. All of that information can change, and that's really a limit on what historical jobs can do, because they look for very specific values. So remember, when you run a historical collection, it's an on-demand thing; don't make it a daily thing.

Now some of the more advanced things we want to talk about, which can and will affect data collection jobs and their processing times. Scripts: as we hit on earlier, the score start and score end variables are used in scripts when date processing is needed for indicators like age. This is very common. Just understand when the data is collected, going back to the indicator source principle of dates, not states.
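A sketch of the historical-collection limit above (illustrative Python; the scores table, records, and helper are made up, not ServiceNow structures): the nightly job captured the true score while the data was still accurate, but a later historical run can only evaluate current field values, so re-running it over already-collected data replaces the accurate score with a wrong one.

```python
from datetime import date

# Score table keyed by (indicator, score date), as collected by the nightly
# job while the data was still accurate: 5 incidents were open on Oct 1.
scores = {("Open incidents", date(2020, 10, 1)): 5}

# By the time a historical job is re-run, the records have changed state:
records = [{"state": "closed"} for _ in range(5)]  # all 5 have since closed

def historical_collect(records):
    """A historical run can only evaluate records' *current* field values."""
    return sum(1 for r in records if r["state"] == "open")

# Re-running historical over already-collected data overwrites the good score:
scores[("Open incidents", date(2020, 10, 1))] = historical_collect(records)
print(scores[("Open incidents", date(2020, 10, 1))])  # 0 — the accurate score of 5 is gone
```

Which is exactly why historical jobs are run once, on demand, for new indicators and breakdowns, rather than on a schedule.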
The indicator source defines the dates, and the score start is "today" at collection time; it captures that date. So make sure all of those things are set up correctly before you get to the point of using a script, so that the score start and score end variables are what you need them to be for the script to run and return the data you're looking for.

The next thing is manual breakdowns. They're out there, but they should be replaced with database views for better performance. Whenever you have the ability to change a manual breakdown to a database view, it's highly recommended you do so. We have a link to a prior office hours that talks in great depth about solutions using database views.

The last thing is data transformation. It should be handled with business rules and event-based transformation, to support using the data in reporting, reporting filters, Predictive Intelligence, Flow Designer, and so on. Where data transformation is needed, business rules will really help you. If you have the ability to do that, or if you need to get with your admin to talk through things like that, anytime you can use them it will make data collection better, and it's highly recommended.

With that, here are some great sources. The first several slides talked through each of these topics as they relate to data collection, and here are the links for all of them; they're in those slides, but they're also right here. There are also some great community blogs and past office hours that would be a really good idea to reference back to.
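For the scripted age calculation mentioned above: the actual PA scripts are JavaScript running on the instance, with score start and score end provided as variables. Here the logic is sketched in Python as an illustration (the records and function are hypothetical, not ServiceNow API): age is measured from each record's opened date to the score window boundary, so the window has to be right for the age to be right.

```python
from datetime import datetime

def average_age_days(records, score_start):
    """Average age in days of open records, measured at the score window
    start — the kind of date processing score start/end variables exist for."""
    ages = [(score_start - r["opened_at"]).days
            for r in records if r["state"] == "open"]
    return sum(ages) / len(ages) if ages else 0

records = [
    {"opened_at": datetime(2020, 10, 15), "state": "open"},
    {"opened_at": datetime(2020, 10, 19), "state": "open"},
    {"opened_at": datetime(2020, 10, 1),  "state": "closed"},  # excluded
]

# score_start for a daily job collecting "yesterday" (Oct 21):
score_start = datetime(2020, 10, 21)
print(average_age_days(records, score_start))  # 4.0 — (6 + 2) / 2
```

If the job's time zone is wrong, score start shifts, and every age computed from it shifts with it.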
And inside ServiceNow learning itself, the Fundamentals, Advanced, and Application Specialist courses talk in depth about data collection as well. If you haven't done any of those and you want to get more comfortable and more information, they're highly recommended, and there are some K20 courses still out there, like Adam said, that you can reference too. Although you may not need all of these now, I'd definitely tell you to take the time to look through, or at least bookmark, these links for future reference, because we believe they'll most definitely help you out if and when you need more information about data collection in the future.

And with that, hopefully we didn't go through it too quickly. All of this information will be put into the blog, and the videos that are part of this office hours will be in the blog as well. We're probably at the point most of you want, and I'm actually seeing the questions come in. So, what do we have for questions?

Q: For a global company, when is the best time to schedule a job?

Thomas: I think, and Adam, I don't know if you have a better answer, it really comes down to the data itself and when its day is ending. Where the person, or the time zone, running the job is will play a huge part, just like I was speaking to earlier. Adam, I don't know if there's a best practice around that, because it's really about when the day is over for a particular area, and that's a moving target.

Adam: Where I go with this one is: it depends. Generally speaking, if I'm looking at something like the number of incidents in a day, it doesn't matter what the day is; 24 hours is 24 hours, and the numbers should average out. So partly it doesn't matter, but you do have to pick one. The one I go with is wherever I close my books. If my indicator were about, say, EMEA incidents, or it was focused on an area for my APJ execs, then maybe I'd pick a time zone relevant to them. But ostensibly, for ServiceNow, our company is based in Santa Clara, California, Pacific time zone, so I pick Pacific. For my teammates in EMEA, their day is defined by California time; same thing for the Australian team: a day is defined by when that day is in Pacific time. The logic is the same as closing your books financially: you've got to pick a time, so we pick one where most of the people are, and we go with it. Again, because 24 hours is 24 hours regardless, it doesn't really matter; people can get hung up on that, but it normally doesn't matter. You have to pick a date. I pick where the company is based, or, if I had a support center in Amsterdam, then I'd probably pick the Amsterdam time zone. It's going to be subjective; you just generally want to line up and communicate it. Just as our financial books are all based on one time zone, I base all of my indicators in that same time zone, and that answers all those questions. The challenge as we go through it is that reporting won't match, but when we get into indicators and we're looking at monthly levels, again, we just say it's going to be Pacific time; everybody understands Pacific time, and we go from there.

Thomas: Yeah, that makes perfect sense. That's why I definitely gave it to you.
Yeah. So I think the next one is: when should manual breakdowns be used? Whenever you have a breakdown that you can't find a reference for in a particular table — i.e., a sys_choice or something like that. If there's no reference in the data for the breakdown you need, or you can't get to the table, or there would be too many results inside that table, then you'd create a manual breakdown. The one I see used most often — and Adam, you can definitely elaborate on this — is a manual breakdown for hours, for buckets and things like that. That's generally when manual breakdowns are there to be used, but I think most of the time everybody can get whatever they need out of what's already there. Adam, you may have a better example.

Those are good — the hours one certainly comes up. My general answer, or my preference, is that manual breakdowns should be used with manual indicators. That's fine, because manual is manual and there's no alternative. I get heartburn if I see a manual breakdown on an automated indicator, because a manual breakdown needs a script, that script requires a query to the database, and your job time is going to take a hit. If performance doesn't matter, it doesn't matter — but I run into very few instances where performance does not matter, so I generally don't want to use them. We talked about alternatives to manual breakdowns in the de-scripting PA office hours a while ago, so they're there if you need them for the use cases Thomas mentioned, and if I need something really quick I'll do it and not feel good about myself. But the real use case for a manual breakdown is a manual indicator, where the value doesn't exist in the database at all — it's just something I'm typing in. Avoid them if possible.

Correct. So this next question I actually think is really good, and they probably answered it in their first line, but Adam, I don't know if there's something you're aware of. Being able to have one job finish and the next one start automatically sounds like a great idea — it would make things more efficient and take the manual side out of having to know when the best time to schedule is. I don't know if there's something that can be done with that.

It definitely is an idea — Steven, the check is in the mail. We're going to talk about this in two weeks in our next office hours, which is about integrating PA into your workflows. I actually haven't built this daisy-chaining use case, but I now have slightly less than two weeks to figure it out, so we'll see if we can tackle it then. Today, you can't chain jobs directly. I will say it generally doesn't matter — not in all cases, but generally — because if I have several indicators to run, I'm going to run one big job. There are some exceptions when we get into jobs taking hours; then I might break it up. The key, and Thomas touched on this, is that you can put them in different jobs and stagger them by at least a minute so that load balancing takes over. All of our production instances — most of them — are multi-node instances.
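The staggering advice can be sketched as a tiny scheduling helper. This is an illustrative Python example, not a platform API — on the instance you would simply set each job's scheduled run time a minute apart:

```python
from datetime import datetime, timedelta

def staggered_starts(first_start: datetime, job_names, gap_minutes: int = 1):
    """Offset each job's start by at least a minute so the load balancer
    has a chance to place each job on a different (idle) app node."""
    return {name: first_start + timedelta(minutes=i * gap_minutes)
            for i, name in enumerate(job_names)}

jobs = ["Incident daily", "Problem daily", "Change daily"]
for name, start in staggered_starts(datetime(2020, 10, 21, 3, 0), jobs).items():
    print(name, "->", start.strftime("%H:%M"))
```

The job names here are hypothetical. The point is only that the starts are separated: jobs that fire in the same minute can all land on the same node, while a one-minute gap lets the balancer spread the memory load.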
The pinch point for a PA job is memory on the app node. If I fire all of my jobs at the exact same time, the load balancer looks and says "this node's not busy" — and if I have five jobs that fire together, all five can go to the same node, which is potentially going to cause me problems. By staggering by a minute, one job starts, and when I start the next one it actually goes to a different node; the query to the database is pretty light, and the other node takes the memory and processing load. So by staggering by at least a minute between jobs, I can overlap them. You don't want to overlap by more than the number of nodes you have — and you may not know how many nodes you have, so don't do 100 at a time — but two or three or four at a time is generally going to be okay, and if you're doing more, you generally have a bigger instance anyway. The daisy-chaining use case is still perfectly valid, though, so in two weeks we'll walk through how we'd chain jobs together if we needed to — although in most cases you won't need to; you can just stagger them or put everything in one big job.

I don't think we talked about this specifically, but when we put jobs together, jobs run by indicator source. The more indicators per source you get into a job, the more memory it takes, but the more efficient you are, because we only have to pull the data once: all of the indicators in an indicator source use the same data, so I only have to pull it into the source once. If I put indicators from the same indicator source into different jobs, I'm actually doing more work, because I have to pull the data from the database twice — once for each job. The memory is cleared when the job switches indicator sources, and — this is getting quite detailed, but I think we're in the right session for it — if I put unrelated indicator sources in the same job, they run serially, just as if I'd chained them together; there's no downside to that. Two jobs daisy-chained together and one job with two indicator sources are equivalent, processing-wise and time-wise. There may be use cases for separate jobs, but generally speaking I just put them in one job and let it go.

Okay — and did I just answer the next one too? The question is: is it true that it's more efficient to group indicators that use the same indicator source together in a job? Yes. To hit that specifically: the indicator source is the data set you're getting. If it has 50,000 records, I pull those 50,000 records, and then I'm just counting one field and summing another — the heavy lift is pulling the data into the app node to process it. So it is absolutely more efficient to process more indicators from the same indicator source in the same job. The one caveat is that it will take slightly more memory, so there are cases — because of memory consumption and the number of scores, breakdowns, and elements we're collecting — where we'll be required to break them into separate jobs, because we can't do it all at once. But that should be the exception: generally you're going to put all the indicators for the same indicator source in one job and collect them together; it'll be more efficient. And specifically, if you have scripts, the scripts only evaluate once per indicator source, which can be a lot of the run time — so by breaking them apart I effectively double my time, while adding an additional indicator from the same source should add a negligible amount of time.
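The "pull once, compute many" point behind grouping indicators by source can be sketched conceptually. This is hypothetical Python, with a fake record fetch standing in for the one heavy database query per indicator source:

```python
def fetch_records():
    """Stand-in for the one expensive database pull per indicator source."""
    return [{"state": "closed", "priority": p % 4, "duration": p * 10}
            for p in range(50_000)]

records = fetch_records()  # pulled ONCE for the whole job

# Several "indicators" computed cheaply over the same pulled data:
count_closed = sum(1 for r in records if r["state"] == "closed")
total_duration = sum(r["duration"] for r in records)
distinct_priorities = len({r["priority"] for r in records})  # keeps a set in memory

print(count_closed, total_duration, distinct_priorities)
```

Splitting these three indicators across three jobs would mean three calls to `fetch_records()` — three pulls of the same 50,000 rows — for no gain. Note also that the distinct count is the one aggregation whose working memory grows with the number of distinct values, which matters later when we talk about Count Distinct.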
I don't think you'd even be able to measure it consistently: if a job took a minute, maybe it takes a minute five with the second indicator, or maybe 55 seconds. So absolutely run your indicators for the same indicator source together, until you have an issue.

All right, Thomas, here's one for you to follow up on: what's the best practice for historical jobs? Should we keep a historical job per indicator, per indicator source, or just have one big historical job with everything in it?

Well, I don't know that I would make one big historical job with everything in it. Again, it comes down to how you manage and maintain it. I don't think you need a historical job for each indicator — that's not necessary. I know the text we went through talked about making sure you use a new historical job for a new indicator, or for indicators that were out of the box. I'd say that when you need to run something historical, there's nothing wrong with reusing an existing historical job, as long as you pull out what's in there that isn't needed. The other thing you have to think about is how that historical job is set up from a time perspective: how much data are you trying to pull — a year's worth, a quarter's worth, six months' worth? So maybe you keep a few historical jobs around and you just know which one to put the indicator into when you need to pull a certain amount of data. I'll pass this back to Adam, because he may have a best practice of his own.

I do know that in one demo we did, there was an indicator sitting in two daily jobs and one historical job; just because it's sitting in that historical job doesn't mean it's been run since it was first put there — maybe the person just left it in. So I'd look at when that historical job was last actually run. If you can get away with reusing it and just moving out the indicators you don't need, maybe do that. But if you don't feel comfortable touching the historical jobs that are out there — because whoever put an indicator in may want it there when they come back — then maybe you create a new historical job. I don't know of a strict set of rules around this, unless Adam is aware of something; I think it comes down to how you manage it. The bigger takeaway for historical jobs is: only run them when you need that historical data. Don't depend on historical jobs to update information you can get from a daily job. That's how I'd answer it — Adam, I don't know if you want to elaborate.

I think that's pretty good. I end up creating one historical job per daily job where historical collection is possible, so I'm generally one-to-one — though sometimes there is no historical job. If I have, say, an "Incident daily metrics" job, I'll have an "Incident historical metrics" job that goes along with it. In your historical jobs, make sure you exclude the breakdowns that are not historically safe to collect — like State; we simply can't collect State historically, so exclude it and leave it empty. And — we should probably write this up a bit more; there are some nuances, covered I believe in the advanced training — remember how you tune a historical job: what you can collect and what you can't. If you have everything in one job and you run a historical collection, you're going to overwrite whatever scores you already have. For instance, if you're adding a breakdown, you can collect it historically, but you might create a new historical job where you tune the indicators: don't collect the main number, don't collect any of the other breakdowns, but collect this one new breakdown. I'll generally delete that job when I'm done — after making sure everything went through — and then keep one historical job that's optimized, not just a blank throw-everything-in job, so that if I had to collect historically, I could. It's rare that I have to run a historical job, but it does happen: say we found that some integration was broken and all of my numbers for the last 15 days are invalid — then I have to run a historical collection. I won't be able to get the breakdowns by State, but the numbers I do have will be right. So, back to the best practice: it's ostensibly one-to-one where possible, with the historical job tuned and configured for what's historically safe to collect. It's good to think this through when you're building it, so that when you're in an emergency you have a plan and don't have to spend two or three days creating and configuring a job — and causing problems. We do see people run into trouble where they run a historical job they didn't mean to and erase really good data, and it's quite difficult to recover PA scores you've overwritten with bad historical data. Okay, we have a couple more in the queue.
Here's one I'll follow up on: is there a best practice for the number of indicators in a single indicator source? As many as makes sense. If I can reuse an indicator source, I want to reuse it — again, the economy of scale is that I don't have to pull that data back twice. The balancing guideline is that I want my indicator sources as small as possible: I don't want one indicator source that contains all open incidents, all incidents opened today, and all incidents closed today, because those sets don't really overlap. I want the indicator source to give me my superset, but as small as possible, and I want to make sure my indicators are using it. There's some subjectivity here, but I want to see an indicator with no conditions on every indicator source, because that tells me I'm not pulling back extra data. If my sources are always one-to-one with indicators, I can probably group them together — it's not inherently bad, but generally I'm going to have multiple indicators per indicator source. There's no universal optimum — it's very data-specific — but if I can make an indicator source five percent bigger and group three indicators together, that's better: fewer indicator sources is generally better for performance.

As a follow-up to that: does Count Distinct usually take longer to collect? Yes — so I'm a big fat liar; they don't all behave the same. Count Distinct has to keep the distinct values in memory. That shouldn't matter until you're running low on memory, but a Count Distinct over very large volumes of distinct values can have a real memory impact, and once you hit a certain memory limit, the job slows down and we start paging. So a Count Distinct over a large number of distinct values is one where I actually might pull just that indicator into a separate job — same indicator source, but a separate job — because then it has a different memory pool and can be load-balanced to a different node. I stand by grouping everything together, but Count Distinct is where I might start pulling things out — once I see that it's slow, that's where I look first.

Okay, here's one that comes up regularly: for monthly indicators, should I collect on the first day of the month? The analogy is what Thomas said about daily indicators: they don't have to run at midnight or 12:01 — they can run at 3 a.m. and it's fine. I generally do run a monthly indicator on the first day of the month, but not at 12:01 or 1 a.m. The real sweet spot, if you can handle the monthly indicator this way, is to run it after hours on the first — 7 or 8 p.m. That means it isn't ready on the morning of the first, so sometimes I do need to run it in the early a.m., but just as a daily runs midnight to midnight, the monthly runs from the 1st through the 31st, and that number doesn't change whether I collect at 1 a.m. or 7 p.m. So if I can get away with it, I run it after hours on the first — after hours on the second is fine too. And for some indicators I think of it like closing my financial books: just because my quarter ends doesn't mean I close my books that day — I don't close them for another ten days; I have to let some things settle.
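The reason the run time on the first doesn't matter is that the monthly window itself is fixed. A sketch of computing the prior-month window — illustrative Python, not ServiceNow code:

```python
from datetime import date, timedelta

def previous_month_window(run_date: date):
    """Return (first_day, last_day) of the month before run_date.
    The window is the same whether the job runs at 1 a.m. or 7 p.m."""
    last_of_prev = run_date.replace(day=1) - timedelta(days=1)
    return last_of_prev.replace(day=1), last_of_prev

# A job run on Oct 1 — or pushed out to Oct 3, after the data settles —
# still collects exactly September.
print(previous_month_window(date(2020, 10, 1)))
print(previous_month_window(date(2020, 10, 3)))
```

Both calls return the same September 1–30 window, which is why pushing the collection out a couple of days (to let the data settle) doesn't change what gets measured, only when the scores appear.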
And if that's the case, you might actually want to run it on the third: let the assignment groups settle in case anything moves, let the priorities settle, let things get resolved, let the SLAs finally find the breach or not. Some metrics aren't final at month end — they need a few more days for the data to correct, again just like closing our financial books — and in that case you push the collection out.

Okay, Thomas, here's one for you: if we need to collect historically for a new breakdown on an indicator — first level or second level, I don't think it matters here — do we need to recollect the indicator? Well, if you're looking for anything in the past, then obviously you have to. And not only that: the indicator itself needs to know about the new breakdown, so it has to look for the data that includes it. Recollecting is the only way you're going to get the new data for that breakdown, because the breakdown is attached to the indicator itself.

Right — and when you collect it, I believe (I'm doing this from memory) what you want to do on that job is set the indicator to inactive in the job. Just as you exclude breakdowns, in the new historical job you set the indicator's Active flag to false and collect: that should collect just the new breakdown, because you'll explicitly exclude all the other ones. The active/inactive setting controls whether you're grabbing the global number. Whether you need the global number recollected is a decision for you, but if you don't, and you're purely adding the new breakdown — which may not add up to the total, so be aware of that — then I believe you add the indicator to the job but set it to inactive, with all the other breakdowns excluded and only the one new breakdown not excluded. It's a bit tricky, so if you're doing this, make sure you test it — but I believe that's the configuration you want.

All right, a couple of other questions came in, and I'm just reading them out to make sure I get them correct. A question came in on the community — and I believe it's in the Idea Portal now — about conditional PA data collection: setting a condition on a data collector job. Conditions are inherited from jobs; on all jobs that support conditions, if the condition evaluates to false, the job doesn't run. I had a question about this — why? I run everything daily or monthly, depending on what it needs to be, and it runs; I don't have a false condition. This is a very interesting use case, though: a condition that checks the data so the job doesn't blow up, so it doesn't fail. I'm actually more in favor of letting the job fail, because I like to see failures and handle them — which is somewhat related to what we're talking about in two weeks, about how to handle failures. I'd look at how we handle the failure while making sure we do fail — because if I don't collect, I consider that a fail — though maybe we want it to be a bit of a softer failure. So I appreciate that comment, and we'll follow up offline.
We'll work out how we can better handle that — whether that means improving the product or finding a better design for it. And I think the last question in the queue — if you have anything else, please feel free to submit it; we have just a couple of minutes left — is about interactive filters and date filters, and the differences between Performance Analytics and reporting. There are a few of these. I mentioned time zones: time zones for PA are fixed, but reporting follows the viewer. Thomas is in the Eastern time zone and I'm in Pacific, so when we both look at PA we get the same result, but if he runs a report against the same indicator source he might get a different result, because he's three hours ahead of me. The filters can do the same thing. Are there inconsistencies? Yes, there are, and unifying them is being worked on — there are some exciting developments you might hear about in the next few months where we're trying to reconcile it. We've absolutely heard everybody's comments that the differences between PA and reporting — interactive filters, dashboards, dashboard breakdowns — are painful, so we're doing everything we can to reconcile them so it's harder to tell the difference between reporting and PA. The goal is that PA is effectively just reporting with history. We're not there yet, but we're working toward it. It's one of the issues we run into with best practices: when you put PA and reports on the same dashboard, it can get weird, so generally on a specific tab I'll try to stick to PA or stick to reporting. If you post some screenshots in the community of what you're trying to get to — and I see we have some cases here — we may be able to work out the best setting: whether that means modifying the out-of-the-box date range filters, which can be quite challenging, or modifying the time series available in PA, which is actually a little easier. There are solutions to line these up, and I want to make sure we know where they're off so we can correct them — but again, our goal is to resolve these and tune things so they all behave the same regardless of where the data comes from.

Okay, and I believe that answers the last question — oh, one more came in, and this is a plug for me, so I'm going to go for it: where do I find which reports are used on which dashboards? We have report statistics that show what's being used, and I posted a blog a couple of weeks ago about creating a database view that shows you all the reports that are on dashboards. Also, if you go to the dashboard properties and use the impact/dependency analysis, you can say "show me what's on here," and it will show you everything on that dashboard — one dashboard at a time. Specifically for reports, though, the way I do it is that view: create it and it lets you see all the reports on a dashboard and all the dashboards a report is on, along with those statistics, combined together. That's probably the easiest way, if you build out the view; if I'm looking for Performance Analytics widgets or interactive filters, I'll use the dependency analysis instead. Okay — I'm going to try to steal screen sharing back. Thank you very much, Thomas; very helpful best practices. To wrap up: this session will be posted to the community later today.
Hopefully I'll remember to add it to the playlist on YouTube as well — thank you for reminding me about that earlier. We'll post the deck, and we'll post the links from the questions we answered, so you can come back to that community article and get everything you saw today. In two weeks — we talked about this briefly — we're going to cover an introduction to leveraging Performance Analytics in workflows: how do I get my score into a workflow, how do I make my workflows smarter, and how do I integrate PA into my workflow in terms of listening to events and acting when things happen — whether that's a job failing, or a job completing and firing the next job, or a threshold being met where I don't just want the indicator or an email, I want to take some other action. It's a half-hour intro, and we want your thoughts on how this should work: workflows are central to ServiceNow, analytics are important to all of us — how do we merge them for a much better result for our organizations? And we're asking specifically today: what else can we do for you? There are lots of different perspectives out there, so please find this community article, post your comments, and tell us what you'd like us to cover in the future. We're planning out the rest of the year in the next week or so, so your comments will be very helpful for covering the topics you want to see.

Until next time: if you have a question, post it in the community — I again recommend checking out the article about how to write a great question, so you get a quick and accurate answer; screenshots are incredibly valuable, but there's a lot more to it. Catch up on previous office hours — I think we referenced three or four of them in this session; they're all on the page where you registered for this event, where you can get the recaps, the decks, and the list of what we're planning next. Submit your ideas — we talked about one that came up today. How can we make Performance Analytics and Reporting better for you? Tell us through the Idea Portal, and if you have a few minutes, look through what else is there and vote those ideas up. We are actively planning new releases, so seeing those ideas and hearing your comments will help us get what you're asking for into the product. This is the best way to record your vote and tell us what's important to you, even if it was somebody else's idea. And always take some training: the knowledge labs are going to get pulled at the end of the year, so take the opportunity while you can — Virtual Agent fundamentals, Predictive Intelligence fundamentals, all those knowledge labs. There's lots of great self-paced, hands-on learning out there, and it's free. Send it to your colleagues: if they're trying to learn about Performance Analytics, don't take three hours to explain it — send them a lab and let them explore and learn. Okay — thank you for all your questions, thanks for listening, and we will see you in two weeks. Thanks, Adam.

View original source

https://www.youtube.com/watch?v=xUFBMXF5a9Q