Transcript for Webinar entitled "Research Example using Linked Administrative Data" 
August 5, 2020
[Voiceover] National Data Archive on Child Abuse and Neglect.
[Erin McCauley] Ok - so we're going to kick off this session. Thank you everyone for being here. This is the NDACAN Summer Training series 2020. It is our third annual summer training series and we're really excited about it. If you have questions, you can share the questions throughout during the Q&A box. And that at the end we'll also have the dedicated Q&A. This session is being recorded. We turned the series into a video series that's available on their web site. Make sure that you're on our list serve because we'll be announcing the sessions being posted on line as we go! So as I said, this is the NDACAN - our national data archive on child abuse and neglect summer training series. We've had the privilege of doing this series for the last three years so we've come up with the ideas for what the different sessions will focus on through our end of year surveys. So, if you've registered for any of the events this will go out, we also send it on the listserv just in case anyone wanted to throw ideas out even if they are unable to attend a session. We really are always working to kind of improve the series based on your feedback. So, we hope you complete that when it comes out. We are hosted in the Bronfenbrenner Center for Translational Research at Cornell University, although, starting this summer we are also associated with Duke University, with some of our staff being at Duke and some at Cornell. This summer's kind of over arcing theme is New Horizons for Child Welfare Data. So, we continued our plug for administrative data. Which is data collected kind of in the normal day to day running of a program, but then we kind of use it to answer research questions after and then we also unveiled our historical data acquisitions. Which kind of continue to back that administrative data series, farther back in time, and so those were kind of sessions two and three and then now we're focusing on the kind of regular admin data and we'll be having a research example today. 
The National Data Archive on Child Abuse and Neglect has a contract with Children's Bureau to archive data, and then, support researchers in the pursuit of research using that data. We do like to think of ourselves as not necessarily kind of a just depository for data. We try to really engage our research community and our user community and so part of us doing that is hosting this series. So we're really grateful for your participation. We really wouldn't have a summer series without you all. So here's our overview for this summer. If you were last, with us last week you know that we experienced some work-from-home technical difficulty, but we'll talk about that more at the end. But yeah, so this was originally our last scheduled event and now our second to last scheduled event. So if you've been with us all summer, we just want to say thank you so much! It's been a really great summer and I know we've had a really engaged Q and A session at the end of each session. It's been fun to get to know you guys a little bit. So I'm going to pass it over to Frank Edwards. He is a research associate with us and he is a professor at Rutgers in the Criminal Justice College. Frank...
[Frank Edwards] Yup thanks, thanks Erin and thanks everybody for joining today. Really excited to share this new work with you all. So, I'm going to present a research paper that I'm developing, but I'm going to focus my attention on the methods for how I link the data and conduct the analysis. So, we're first going to cover, and I'm going to be using R, so if we have any R users here, this will provide you some examples of how you can use the software to conduct a link. But the principles for the link apply to other statistical packages as well. So, we'll be using AFCARS and NCANDS today. And the goal here is to evaluate the lifetime risks of various child welfare events. So, we're going to track a series of different outcomes that could occur to a child over the course of their life, by linking up the AFCARS and NCANDS and relying on those unique identifiers across the two systems to track the first time an event happens to a child, by merging that with population data to get appropriate denominators for the size of the child population by age and race/ethnicity. And the goal here is to evaluate inequalities in the exposure of American Indian and Alaskan Native children to various child welfare system outcomes. And here, the purpose of this paper is really to think about what stages the case process do inequalities for these children and families emerge. Is it at the reporting stage? Is it at the substantiation stage? Is it at the removal decision stage? Is it at the termination stage? Or is it across various stages? So, the only way to really address that question well is by linking these administrative data sets. Ok. So, just to give you a brief background into the problem, obviously we could spend an entire day talking about the history of indigenous communities with the child welfare system, but American Indian and Alaska Native children have, for all racial and ethnic groups in the United States, the highest risk of going into foster care of any other group. 
And that risk has been persistent over time. Family separation through fostering and adoption historically, up until the late 1960s and 70s was used to force the assimilation of American Indian and Alaska Native children into white cultural social and economic practices. So, for a time there was a really close relationship between, kind of settler colonial child and family policy, and the child welfare and adoption system. And so, those inequalities have really deep and particular meaning for native communities that is substantially different for other groups. The Indian Child Welfare Act of 1978, recognizes the harm that mass separation poses to tribal nations as such. Right? That the mass separation of children and families threatens the continued cultural existence of tribal nations and so, the Indian Child Welfare Act enshrined certain protections for native children and families through family courts that are distinctive. But despite the protections ICWA we still see large inequalities in child welfare outcomes. So, this analysis is one attempt to start to think about quantifying where those risks differ and what stage of child welfare system contact those risks are emerging. I want to be clear, that today, I'm focusing on non-tribal child welfare system contact. So the cases we're looking at here come from state and local child welfare systems, not from tribal child welfare systems. I also want to emphasis that the AFCARS system does not track eligibility for ICWA. I know that's something that CB is considering implementing in revisions of AFCARS and I think that will be a huge help in thinking about where and when these inequalities are emerging and whether ICWA compliance might be playing some role in this. So, the questions I'm going to be addressing today are, we're going to effectively ask a couple descriptive questions, right? 
We're going to ask how likely American Indian and Alaska Native children are to experience various child welfare system outcomes and how likely they are to experience those relative to the risk of white children. So we'll be estimating both a cumulative risk and an age specific risk, as well as risk ratios for American Indian and Alaska Native children relative to white children. And we're going to link the AFCARS and NCANDS to think about transition probabilities. So, we're going to think about these as conditional probabilities of experiencing one event after experiencing a prior event. So for example, what's the probability that you had a substantiated case if you ever had an investigated case? Or what's the probability that you had a foster care removal if you ever had a substantiated child welfare case in the NCANDS? Right? So those are the questions where the linked administrative data is going to come right to the fore. And again, as Erin mentioned, please feel free to drop questions in the Q&A as I go. A lot of this is technical material and I'm happy to address questions as they come up. Ok. So, now we'll get into the practical details of how we do this. So, again, I'm assuming you have access to the AFCARS foster care file and the NCANDS child file. Here I'm using the 2000-2018 files for this project. So I have 19 years of the data that I'm using here. In NCANDS, the data is not complete for all states until around 2011, I believe. Someone can correct me if I have that year wrong, where we have all 50 states in the data. For prior years you can subset this to a smaller number of states, but participation becomes complete later in the series. But you can request those data from NDACAN along with our very helpful user's guides. And the code that I'm going to be showing you here today is available on a couple of GitHub repositories. 
So if you want, if that's of interest, I'll post those links toward the end, where the code I use for the project and a couple related projects is online as a replication package. Now the original data is not. The data must be requested from NDACAN, but the code is available for those who want to try and replicate this work or to try and piggyback on some of the methods. I see a question. Basically to check for a negative trend in both of the outcomes. Yeah, I mean if we were thinking of this as kind of a follow-up to ICWA, you know, our null hypothesis here might be that the protections available to Native American and Alaska Native children should have reduced rates of contact between 1978 and today. Our administrative data doesn't go back that far. So, I'm working on using some of the historical data that the archive has to think exactly about that question. I'm not really interested in trends today, I'm interested in just describing baseline prevalence using some more sensitive measures than just simple population rates. So here I just want to capture what is the magnitude of inequality, and where is the inequality happening in the system? I'm not really doing any hypothesis tests or kind of thinking theoretically today. This is a primarily descriptive project. Although there's plenty of inferential work to do on this question. Ok. So, here's where we're going to start. Right? So, I want you to check out, this is a subset of the AFCARS, where I'm showing you the fiscal year, that's the reporting year of the data sets, 2009, and everyone who's worked with the AFCARS knows that the reporting year is not equal to a calendar year. Right? It runs for 12 months, but it runs, what, October 1 to September 30 each year. So, we want to distinguish between calendar years and fiscal years when we're using the data. You see this StFCID variable, a RecNumber variable and a State variable. 
Notice that StFCID is just RecNumber, which is the child's unique identifier within a state. Right? So across states these can be duplicated but within states, they can't. And a state pasted on to the front of it. So we have Alabama 0000. You know, you get the idea. 3371. And that uniquely identifies one child. So, for every time that child is in the AFCARS, they should have that number attached to their record. And so, we can use that to follow the outcomes for a child over time. Now we can also use this AFCARS identifier to link the records in the AFCARS to all children in the NCANDS who ever were in foster care. Here's what that looks like in the NCANDS. We have this AFCARS ID. Right? So, the AFCARS ID is the record number pasted with a state fips code. And so, how do I set that up? Right? So, I have, we have this AFCARS ID and what I want to have is consistent, what I want to set up as, you'll notice here that I have StFCID as Alabama and then the all the numbers. What I'd like to get to is a state number. So, here's state number 4. Right? Alaska with a unique ID behind it. So, that's where I want to get to. So the way I do that, is that I take the numeric identifier and I paste a fips code on to the front of it. Right? And the fips code is the two digit identifier for the state and I do that for the RecNumber variable and the AFCARS ID variable in NCANDS. Right? So, effectively I end up with, in this case Alabama instead of being AL000371, it'll be 100, because Alabama is fips 1. Now you can use the StFCID as is to do the linkages as well. Because I was using more years of the data, this method worked a little better for me. But there's a lot of different ways you can join these. But you need to understand that a record number is incomplete without a state identifier attached to that. And that could be either the two letter state abbreviation or a two digit state fips code. 
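The ID construction described above can be sketched as follows. This is a minimal Python illustration rather than the R code used in the project; the function names, the three-state FIPS excerpt, and the 12-character record number are assumptions made up for the example.

```python
# Illustrative sketch only: helper names and the sample record number are
# hypothetical, not taken from the AFCARS or NCANDS codebooks.

# Small excerpt of the standard state FIPS codes (Alabama 1, Alaska 2, Arizona 4).
STATE_FIPS = {"AL": 1, "AK": 2, "AZ": 4}


def harmonized_id(state_abbrev: str, rec_number: str) -> str:
    """Paste the state's FIPS code onto the front of a within-state record number.

    The record number stays a string so its leading zeros are preserved.
    """
    return f"{STATE_FIPS[state_abbrev]}{rec_number}"


def from_stfcid(stfcid: str) -> str:
    """Convert a 'two-letter state + record number' ID into the same format."""
    return harmonized_id(stfcid[:2], stfcid[2:])


# The same child, identified two ways, ends up with one joinable key.
key_from_parts = harmonized_id("AL", "000000003371")
key_from_stfcid = from_stfcid("AL000000003371")
```

Applying the same function to the record number in both files is what makes the key comparable; a record number on its own is not unique across states.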
The FIPS code is what I prefer to use, because then it's very easy to link with things like the census. Ok. So, in R using tidyverse packages, this is how I create those new harmonized ID variables. Which I'm calling stfcid, lower case, right? And I'm just pasting the two digit state code on to the name of that variable in each data set. And again, the code I used for the whole thing, these are obviously snippets of larger projects, the full code I'll paste in the chat at the end as a GitHub repository. Ok. So does this look ok? Right? So we look now at the AFCARS and NCANDS. So the child ID in the NCANDS is not what we're after. Right? That's a unique state identifier for the child. We want that foster care identifier. And when we look at it, we've got a 12 or 13 digit identifier on both sides. And that's exactly what we're after. Right? Because once it's numeric, the FIPS code for those states that have fewer than two digits in their numbers, like the first nine, will have only one digit at the front, so those IDs will have 12 digits and the others will have 13. So this looks great. Right. This is something we can join on. The identifier has the same format, and the stfcid on my NCANDS xwalk, which I'm using as a kind of table that I join on, matches with the stfcid in the AFCARS ID that I'll be matching on. Ok. Now we're going to try a join here. So notice that in the AFCARS ID, that's all of the unique kids in AFCARS that I've got for the years of data that I'm looking at. I have 91,239,000 unique IDs. Right? So that's kids who were ever in the system, in a state or local child welfare system, over the 19 years of data that I'm looking at. For the NCANDS, I've got 47,491,000, I guess actually there might be duplicates in there, I apologize. But there's 47,000,000 unique kids who were in the NCANDS over the period that I'm looking at who were also in the AFCARS. So what happens when I join them? Right? You'll see that when I join them, I get 254,000,000 rows. Right? 
Which is way more than the 91,000,000 that I should have expected. So, what happened here, is that those AFCARS IDs can get duplicated year to year as a unique child's can be present in the child welfare system year to year. So if a kid stays in foster care for two years, that ID is going to have a record in 2009 and 10. Right? And what we're doing here is joining each of their unique NCANDS reports on to each of their yearly data rows in the AFCARS. And that may not be what we're after. So you need to think about the data structure really carefully when you're designing these joins. We don't have one record per child, especially when we're looking at multiple years of the data. In the AFCARS we have one record per child per year and in the NCANDS we have one record per child per report. So within a single year you may have multiple records for the same child. So, typically for for for, it depends on your research question, but you want to think really carefully about what unit of analysis you're working with when you're doing these joins, because it's really easy to create nonsensical data structures when you're not kind of thinking carefully about exactly what you're after. And I I I I tell you this from experience from from from having wrestled with making sure these joins work appropriately. But the easiest way to do that is to just check, does the number of units that I'm expecting to have, and usually that'll be unique children, does the number of units match before and after I conduct my join? Questions so far? Ok. Onward. Right. So, each AFCARS StFCID has one entry for each reporting year. Each NCANDS AFCARS ID might have multiple times, might occur multiple times in the reporting year. So this is the fundamental challenge of of joining here is you've got to think really carefully about your data structure. So, you know, and that's just something that you, that'll be, that'll pull at the part of your design and each join might be different. 
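The row multiplication described above, and the suggested check on the number of units, can be reproduced on a toy example. This is a Python sketch with made-up rows; the field names (`stfcid`, `fy`, `rptdt`) and the `inner_join` helper are assumptions for illustration, not the project's code.

```python
from collections import defaultdict

# Toy AFCARS-like table: one row per child per reporting year (made-up values).
afcars_rows = [
    {"stfcid": "1000000003371", "fy": 2009},
    {"stfcid": "1000000003371", "fy": 2010},
    {"stfcid": "2000000000042", "fy": 2010},
]

# Toy NCANDS-like table: one row per child per report (made-up values).
ncands_rows = [
    {"stfcid": "1000000003371", "rptdt": "2008-11-02"},
    {"stfcid": "1000000003371", "rptdt": "2009-03-15"},
    {"stfcid": "1000000003371", "rptdt": "2010-01-20"},
    {"stfcid": "2000000000042", "rptdt": "2010-02-01"},
]


def inner_join(left, right, key):
    """Naive inner join: every left row is paired with every matching right row."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = inner_join(afcars_rows, ncands_rows, "stfcid")

# The check from the talk: count unique children before and after the join.
children_before = len({row["stfcid"] for row in afcars_rows})
children_after = len({row["stfcid"] for row in joined})
rows_after = len(joined)
```

Here the first child has two yearly rows and three reports, so the join produces six rows for that child alone. The number of unique children is unchanged, which is the signal that the extra rows come from the many-to-many structure rather than from new children.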
You may be interested in annually linkages, you may be interested in report level linkages, but it really depends on your question. There's lots of different ways to join this data. Ok. So, let's talk through how I use these data to identify child welfare event risks. So, I'm going to use a life table method. Which is a common method in demography to evaluate a risk of an event to a cohort of individuals over time. So, what we're going to do is we're going to estimate the risk of an event occurring dich, each, during each year and then we're going to estimate for for a cohort of children going from birth to age 18, what's the cumulative risk that they ever will experience the event? So we're going to estimate the risk of an event in each year of life. Covered more than once in a report year in the NCANDS? Yes! Absolutely. The NCANDS will often have the same child's in multiple reports in a single year. So you do have to be careful about that. The the the level the unit of analysis for NCANDS child file, is the child report pair. Right? So, a child, you could have multiple children on a report, and you can have a child be a subject of multiple reports within a year. So you do need to be really careful thinking about what your unit is when you work with the NCANDS. The AFCARS is a little more straight forward because it's the child year level, regardless of the number of placements. You'll still only have that one entry per a child per a year. But for the NCANDS you will have for children that have multiple screened in reports, you will have multiple entries in the NCANDS. So that's definitely something to think about and my big caution with the, with the linkage. And I suspect that's a lot of what where those duplicates came from. There's not only, you know, multiple entries for a child over time, but multiple entries for a child within a year in the NCANDS. 
So that's going to create a lot of excess data, in that each of those pairs now, year, report, child, gets a row in my resulting join. That's not what I'm after. So you've got to think about that. The way I handle it is, I'll tell you how I handle it later, but what I'm going to do ultimately is take each child identifier and I'm going to sort for the first event relative to each event and after. So I'm going to use the date, and then look for each unique child. I'm going to pull the first date that an event occurred and then drop the remaining observations. Because I'm really interested in first incidents. Oh I'm sorry, yes, I will read the question before I answer. I'm just lazy, questions. That question was whether NCANDS reports can have multiple incidents within a year. So, apologies. So, getting the hang of this new life we're all living. Ok. Life tables. So this is a really common method that folks associated with NDACAN, including our director, Chris Wildeman, myself and a number of others, have used in applications in child welfare. But it's also very common in other kinds of demographic and epidemiological applications. So a traditional life table would follow a cohort over time. So we would follow all children born in 1982 and then track how many of them experienced some outcome over the course of their lives. But typically we don't have an adequate amount of data to really do that. And in the case of the AFCARS/NCANDS we don't have a full link of all child births to ever experience foster care. Right? What we have is who is in foster care in that year. And we may want to compress the period of time that we want to make an estimate over. So, risks may change over the course of a lifetime, but you might want to think about, ok, so for the last 5 years roughly, what did the risk profile look like? So we're going to use a single period of time to approximate a cohort. Right? 
And what we're going to do, is we're going to pretend that a risk of an event that an infant was exposed to could be applied, you know, synchronically over time to all infants within this period, but we're also going to assume that the risks stay stable over time. So we're going to observe the risks for infants, 1 year olds, 2 year olds, 3 year olds, 4 year olds, and then we're going to imagine that we have 100,000 births and we're going to apply those risks to that cohort over time to see how many of them ever experienced that outcome. So we're going to use...I apologize if you can hear my 6 year old melting down in the background, she likes to do that at this time of day. Home camp is fun. So anyway, here's what that looks like. This is the raw data that's feeding all of the tables and figures that follow. Right? So, for 2016 this is the number of unique children who were American Indian or Alaska Native that experienced a maltreatment investigation at each age. So, we can see that 6,660 American Indian/Alaska Native infants in 2016 had a screened in maltreatment report out of a population of 76,508. Right? Then 2,948. The var column is the number of unique children who had a maltreatment investigation. Population is population estimates from the census. So, that's using the SEER small population estimate file, if you're interested in that. I can point you to where we get those. Those are really nice ways to get age specific population estimates. Ok. But here we have just an incidence count and a population count. So from here we can construct a crude incidence rate quite easily. Right? So Q is our age specific risk. Right? Q is just the share of people at each age who experience that event. Right? So in this case, you know, we effectively divide the variable by the population. Now, that doesn't get us too far. 
That's kind of our typical age-specific incidence risk and that can be useful, but it doesn't help us think about this kind of, ok, but what about those kids who had, you know, an investigation at zero. So this is first investigations here, but it doesn't tell us how likely you are to ever get investigated by the time you hit 17, in a way that adjusts for exposure at prior ages and that also adjusts for mortality in a population as children exit the population through death. Right? So we want to think about where that true kind of risk estimate is. So, we want to convert this into a probability of not experiencing the event. Right? So we can get a sense of how many children ever experience it. Right? And it's often easier to work with a 1 minus probability when we're doing that, because once you experienced the event, for our purposes, you have experienced it and you know, you're out. Right? So it's a 1 or 0 condition. It's not a count. So what we're going to do is we're going to take this P, and this is the proportion of people who did not experience the event. The Q is the proportion of people who did, and LX is our starting cohort size. So we're going to imagine that we observed 100,000 American Indian/Alaska Native child births in 2016. Right? And we're going to say that each of those children has a 0.0834 + 0.377 probability of experiencing the event before their second birthday. Right? We're going to apply that risk to those children and then we're going to subtract those children from our running total, from this LX. That is members of the cohort that did not experience the event. Right? So by doing this what we're going to get ultimately, D, is the count of people who did experience the event. Right? So D is counting how many of our hypothetical cohort did experience the event, and then LX is our running tally of what in demographic research would get called survivors. Right? 
What's the running tally of people who did not experience that event over time. I recognize that's unfortunate language to use in this context, but that's how many children never experienced the event out of this initial cohort. Right? And so we can see for maltreatment investigations, at 2016 levels, about 35% of American Indian and Alaska Native children will have experienced an investigation by the end of age 17. Right? So that's quite substantial. So C is our running estimate. Right? So C we could think of as 100,000 minus 6568 divided by 100,000. Right? And so that's just the proportion of the original cohort who ever experienced the event. So C is often what we're after. C is our cumulative incidence rate. These methods are detailed in a bunch of papers and I guess I'll share one with you now. Sorry, I had it up and I closed my tabs, of course I did. But there's a recent paper in the American Journal of Public Health that I published with Youngmin Yi and Chris Wildeman that we used these methods for, and a lot of the code for these projects is in there. So I'm going to post this. I don't know how to actually post it. Can I post it in the Q&A area? I think so.
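The life table columns described above (Q, P, LX, D and C) can be sketched in a few lines. This is a simplified Python illustration, not the published replication code: the function name and the input counts are made up, only two ages are shown, and the adjustment for mortality mentioned in the talk is left out.

```python
def period_life_table(first_events, population, radix=100_000):
    """Build a synthetic-cohort life table from age-specific first-event counts.

    q  = age-specific risk (first events / population at that age)
    p  = 1 - q, the probability of not experiencing the event at that age
    lx = members of the hypothetical cohort who have not yet had the event
    d  = members of the cohort who experience the event at that age
    c  = cumulative share of the original cohort who ever had the event
    """
    rows = []
    lx = float(radix)
    for age, (events, pop) in enumerate(zip(first_events, population)):
        q = events / pop
        p = 1.0 - q
        d = lx * q
        remaining = lx - d
        c = (radix - remaining) / radix
        rows.append({"age": age, "q": q, "p": p, "lx": lx, "d": d, "c": c})
        lx = remaining
    return rows


# Made-up counts for two ages, chosen so the arithmetic is easy to follow.
table = period_life_table(first_events=[100, 50], population=[1000, 1000])
cumulative_risk = table[-1]["c"]
```

With risks of 0.10 and 0.05, the hypothetical cohort of 100,000 loses 10,000 children at age 0 and 4,500 at age 1, so the cumulative risk after two ages is 14.5%, the same as 1 minus 0.90 times 0.95.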
[Erin McCauley] Put it in the chat and then you can just select all panelist and attendees. Because the Q&A box, once we answer it kind of goes away.
[Frank Edwards] Got it! Ok, so there's a link to that paper, and for those that want to follow along with the code, I'll share the code later. We'll keep pushing along here. Sorry, should have had all of those links up. Ok, but there's the paper that this is kind of building off of, and the code for this project in particular I'll give you at the end as well. But this life table method is described in detail in that paper. And the code that I used to generate a lot of the tables comes from that paper. Ok. So, we established our methods. We established how we link our data. We've established this life table approach. The method I'm not talking about today is, any of you who have used these data sets know that race and ethnicity is occasionally missing in these. So we do use multiple imputation to address uncertainty in the race and ethnicity variable, and we also use that to construct uncertainty estimates for some of the life tables as well. Ok. So, here's how we're going to do this with NCANDS. We're going to identify the first investigated maltreatment report for each child. And the way we're going to do that is we're going to search within each NCANDS state child ID. Right? So we have a CH ID for a child that's valid within states. So for each state, we're going to look for the first incident, we're going to take the earliest incident of that report. So effectively we're going to use the date column to pull each first incident. And this is not something I recommend doing by hand. Right? Because we're talking about millions of identifiers here. So, again if you want to see the code for how I did this it will be up on this repository at the end. And then we're going to identify the first substantiated report for each child. Right? So, we're going to use the report victim flag for that. 
So we're going to again look within each state child identifier for the earliest report date where the report victim is equal to one. Right? So we want to get, for each kid, we want to get the first time they had a screened in report and we want to get the first time they had a substantiated report. And the report victim flag in NCANDS is an index of substantiated confirmed, I think it also has alternative response in there. I'd need to check the code book. But it's an aggregation of a few different possible case outcomes that is a little more inclusive than any one in particular. Ok.  So for AFCARS, those are the two variables in NCANDS we're looking at. Right? And so, from that, what we're going to get is a table for each child's ID by state where we get the date of their first report and the date for their first confirmed report. And then within AFCARS we want to get the first foster care entry for each child. So, there we'll use the entered flag and total rem flag variables. So the entered variable is equal to one if the child entered foster care in that reporting year, and zero otherwise. So, a child will be in AFCARS if they remain in foster care year to year. Right? So they could be in the AFCARS system and not have entered foster care that year. And then we also want to get only that first removal. Only that first time they were removed. Because again, our goal here is to estimate did you ever enter foster care? And so, the second time shouldn't count twice. The first time is all we need. Ok. So, we're going to grab that first entry and then I also want to evaluate the placements of American Indian and Alaska Native children with non-kin and non-Native foster care givers. The provisions of the Indian Child Welfare Act specify that there should be preference given to kin first, and then within tribe family second and then non and then other native families last with non-tribal families being a the the last resort. 
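The "earliest event per child within a state" step described above for NCANDS can be sketched like this. It is a Python illustration on made-up rows; the field names (`state`, `chid`, `rptdt`, `rptvictim`) are stand-ins chosen for the example, and the helper is hypothetical rather than the project's R code.

```python
from datetime import date

# Made-up child-report rows: the same child ID can appear on several reports,
# and the same ID string can occur in two different states.
reports = [
    {"state": "AL", "chid": "A1", "rptdt": date(2012, 5, 1), "rptvictim": 0},
    {"state": "AL", "chid": "A1", "rptdt": date(2011, 3, 9), "rptvictim": 0},
    {"state": "AL", "chid": "A1", "rptdt": date(2013, 8, 20), "rptvictim": 1},
    {"state": "AK", "chid": "A1", "rptdt": date(2014, 1, 5), "rptvictim": 1},
]


def first_event_dates(rows, keep=lambda row: True):
    """Return the earliest report date per (state, child ID) among kept rows."""
    first = {}
    for row in rows:
        if not keep(row):
            continue
        key = (row["state"], row["chid"])
        if key not in first or row["rptdt"] < first[key]:
            first[key] = row["rptdt"]
    return first


# First screened-in report, and first report with the victim flag equal to one.
first_report = first_event_dates(reports)
first_substantiated = first_event_dates(reports, keep=lambda r: r["rptvictim"] == 1)
```

Keying on the state together with the child ID matters because the child ID is only valid within a state. The same pattern would apply to the first foster care entry in AFCARS, filtering on the entry flag instead of the victim flag.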
And so, one of the, this is not a strict test of the applications of ICWA by any standard but it does give us some sense of how often American Indian and Alaska Native children are being placed in non-kin, non-native foster homes. So we're going to use the current placement setting is equal to two as a kinship placement or the substitute caretaker is American Indian or Alaska native equals. One, that will flag if either the child was placed in a kinship setting or with a Native foster parent. Otherwise that will be coded to false. Ok. And then we're going to join the data. Right? So we have that table of IDs for NCANDS, that table of IDs for AFCARS, we'll join those. The marginal probabilities I'm going to show you, that is, those just, the the probability that an event occurs from the life tables, is coming from a single data file. That's either the AFCARS or the NCANDS, depending on what the outcome is. But the conditional probabilities, right, so, if you ever had a screened investigation, how likely are you to have ever gone into foster care? Those are results of the join. Right. Of the first NCANDS event, the first AFCARS event, by those unique identifiers. So each child in that table has one record. After the joins, each variable is aggregated to annual national count by age and race, ethnicity and the, I run my tables over those aggregated totals. Ok. So let's get to the results unless anyone has methods questions. Now is the time to stop me and ask methods questions if you have any. I'll give you a second to type while I drink my coffee. Ok. I guess I will continue. Here's the probability of investigation. Here the pink line and dot indicates American Indian and Alaska Native risks. The blue kind of teal, line and dot represent white risks. So, here I'm running tables for each year. Right? So we're effectively asking what's the lifetime risk of an event as risks changes over time. Right? 
So, we're imagining here that we compressed 100,000 children's life course to go only into 2016. So, what if if risks stayed constant at 2016 levels American Indian Alaska Native children by the time turn 18 about a 34.3% chance of ever having a screened in investigation. Beyond that risk was at its lowest in 2010 when it was about 31.5% and at its peak in 2015. Right? And that risk is consistently higher than the risk for white children. Right?  But each of those points represents a life table for that year's data. Right? And we could pool these to smooth out the trends a bit if we think there's substantial change in risk over time though, it might be a good idea to run these as single year estimates. And of course there are a number of events happening between 2008 and 2016 that have affected child welfare system caseloads. So in this case it does seem appropriate to split those out annually. Ok. Someone apparently had a question. This is a national data set versus a specific state. So, we have a question asking if this is the national data or the state data. This is fully national data. Yeah. So this is national data and when I have periods when all states are not included in the data, I adjust the population denominators to make that work. The the version of the paper that's published to this will be looking I believe 2011 forward but here I'll show you the estimates for 08 and earlier. There's a little more noise in those estimates because some states are not in the NCANDS data in that period. Now we look at the age specific risks and this is kind of interesting. Right? We see that the levels of inequality that American Indian and Alaska Native children experience for maltreatments reporting are rather investigation, really are pronounced at the earliest ages. Right? So at birth American Indian/Alaska Native children are much more likely to be investigated than our white children but by about the age of 6 or 7 that risk levels off. Right? 
So the inequalities in these risks that we observed, these cumulative inequalities, are really driven by inequalities that occur early in life. Right? And when we think about who is reporting American Indian children, and why, that makes a good amount of sense. A lot of those reports are being generated by medical professionals. So, in other work I'm trying to sort of think through kind of what might be driving some of those early in life patterns. But we see that it's really a gap in the younger ages. It's not a gap at the older ages. That's investigation. Here is substantiation. We see a much clearer gap here. Right? The cumulative risk of the event is on the left. The age specific risk is on the right. And the age specific risk, I should state, is averaged over the full period. Right? So we're assuming the age profile hasn't changed much over time. We might assume that the aggregate risk level has changed but that the shape of the curve of the age specific risk hasn't changed much. Which, when we look at it, it hasn't changed that much. Ok. So, here we do see that the risk of substantiation for American Indian children has increased pretty substantially over time. Right? We see from 2008 to 2016 risk move from about 13% up to about 16% of all American Indian Alaska Native children who, by the time they turn 18, can expect to have a substantiated report as indicated by NCANDS. Compared to about 10, 11% of white children. Right? So that's a pretty substantial gap in cumulative risk of substantiation, and unlike with screened in reports, with substantiations we see a gap that persists over the childhood. Right? American Indian and Alaska Native children are at higher risk of being substantiated than white children over the course of their childhood, and that gap is greater than the screened in report gap. 
And this suggests, you know, that there could be underlying situational conditions that make American Indian/Alaska Native children, once they're screened in, more likely to be substantiated. Perhaps the alleged maltreatment was more serious or there could be systematic factors that lead to higher rates of substantiation for native kids relative to white kids. We're not really at a point with this project to unpack what's driving that gap, but the gap is there. Kind of the goal here is to identify it. Ok, now here we get into the conditional probabilities, and these are pooled across period. So, here we've joined, no I'm sorry, this is within the NCANDS. So, here we're taking, if you ever had an investigation, so if you're a child who ever had an investigation, how likely are you to ever have a substantiation? Right? So we can think of this as the proportion of children who were investigated who were ever substantiated on that report or on a later report. And here we see that if you were screened in as an infant and were American Indian you had about a 42% chance of substantiation relative to about a 35 point, you know, 36% chance for a white child. Right. And that gap does persist over time. So American Indian children, if they were ever investigated, have a higher likelihood of ever being substantiated than white children at all ages. Here's our probability of foster care placement. So, we show for 2008-2016, at 2016 levels, you know, about an 11.5% lifetime risk of foster care entry for American Indian children by age 18 compared to about a 5% white risk of foster care entry by age 18. And that is higher than all other groups. African American children have the second highest risk, a point or two below American Indian and Alaska Native children generally for most years. And the age specific risk of foster care entry on the right is also higher for American Indian and Alaska Native children at all ages than it is for white children. 
So across childhood they're at much higher risk of going into foster care than are white children, despite the fact that risk for having a screened in report declines over time. At least first screened in report. Ok. So here are our two conditional probabilities. So, first I want to ask what's the likelihood that you'll ever go into foster care if you've had an investigation? And this is where the join comes in. So, here we can look at the age specific conditional probability of going into foster care if you were ever investigated, for American Indian children. Right? And at birth, it's over 25%, you know, 25, 26%, and for white children it's 15.5% or so. Right? So for all American Indian children who were ever screened in, many more of them are going into foster care than are white children who were ever screened in. The same is true for substantiation. The gap is a little bit smaller there but the numbers are much higher. Note the scale on the axis there. So if you were ever substantiated as an infant as an American Indian Alaska Native child you've got about a 55% chance of being placed into foster care compared to a 38% chance for white children. Right? And that gap, you know, a roughly 15 point gap between American Indian and white children, persists over the life course. So we see pretty substantial differences here between groups in terms of those children that have experienced a substantiated case or an investigation relative to placement. And again, a lot of things could be driving that gap. My goal here isn't to necessarily speculate on what is driving that gap. I'm trying to address that in some other research, but here our goal is to just really get a handle on it. How big the gap is and where it's emerging. Yes. Given that the disparities are often greatest when younger, have I looked at the probability of event by ages other than 18? I would think that the gap probabilities are much greater. 
So that's a good question. Right? And we can kind of see that by looking at the age specific risks. Right? So, apologies, let's go back. So, you know, even when we look at foster care. Right? This gap, the difference in estimates in the age specific risk is highest there. And so we might think that if I trimmed the probability of the event to happening by age 18, we could think of that as the area under those curves, the difference in the area under those curves. Right? And we might see even higher levels. I go to 18 because that's kind of canonically when jurisdiction for maltreatment investigations ends. So, you know, obviously children can be in foster care past age 18 but removals aren't going to happen at that point. Why don't we stop earlier? I think that's a legitimate question. I think the age specific risks give us that information. But they don't necessarily visualize it as that difference in areas under the curve. So we would see a steeper gap if we truncated, say, at age 5. If we wanted to know what's the likelihood that they're going to go into foster care before they, for example, start elementary school. That could be an interesting question to examine. And please follow-up if that didn't exactly address what you were interested in asking. Ok, sorry. I don't usually use Excel. Ok. Foster care, so this is where the join really comes in and I think this is really illuminating. Right? That for those kids, once they come into the system, through investigation or substantiation, they're much more likely to go into foster care than are white children. Right? So we can see that the gaps really emerge after investigation. Right? We have a gap at investigation, but it's not substantial. Especially at older ages. Right? It is for younger children. 
But it looks like what happens, once American Indian Alaska Native children come into the system, for a variety of reasons, is that they are more likely to have their cases pushed upstream into substantiation and into foster care placement. This is unrelated exactly to that kind of life time risk model, but here we just want to think about, if you were an American Indian Alaska Native child who was in foster care, were you ever in a non-kin, non-native foster care placement? So, we take all those American Indian Alaska Native children of a given age that were in the foster care system and ask were you in a placement setting that was non-kin, non-native, and we can actually see pretty high numbers of children were documented as being in a non-kin or non-native foster care setting. For infants, for young children it's lower but it's still around 75%. 70% is the lowest at age 8, but for older children, 15 year olds, we get up to 95% of them ever being in a non-kin or non-native foster care placement. So we see really high rates of American Indian Alaska Native children being placed in settings that are not with kin or not with native care givers. And again, that's not a formal test of ICWA compliance but it should be correlated with it. So to kind of summarize, this report verifies what many have already demonstrated. And was demonstrated in the testimony supporting the Indian Child Welfare Act of 1978, that American Indian Alaska Native children have much higher exposure to the child welfare system than white children. And conditional on a screened in case, American Indian Alaska Native children are more likely than white children to enter foster care. And conditional on a substantiated case they're more likely than white children to enter foster care. 
So, there's something about the foster care decision making process that is driving higher levels of entry and that could be about the seriousness of the alleged abuse or neglect or it could be about other institutional features, it could be about systematic discrimination, it could be about a lot of things. And I think future research has a good amount to unpack there to help us to understand the dynamics of these inequalities and obviously there's been a tremendous amount of work in this area already. The majority of American Indian Alaska Native children in foster care have been in a placement in a non-kin or non-native foster home. Let's see...one question. Oops, sorry. At the beginning you said, right. So I'll get to that in just a second. But that's the end of my presentation here. So we can kind of open it up. If you have additional questions I'll get to you. I'll get to this question here in just a second. But thank you so much for attending today. I really appreciate it and look forward to the discussion. So we have a question that says, at the beginning I said that the data do not include American Indian Alaska Native children addressed by tribal authorities. Do I know about what percentage of American Indian Alaska Native children are addressed by tribal versus state child welfare authorities? By focusing only on those addressed by state authorities, do I think the probabilities for substantiation are likely higher or lower than if all children were included? Ok, so the first part, do I know about what percentage of children are addressed by tribal versus state child welfare authorities? I don't. Those tribal systems are increasingly being folded into federal data systems but that is obviously subject to the tribes' willingness to participate and share their data. So, I don't currently know that answer. Someone in the room may know what that portion is. 
It's relatively small, relative to the size of the whole population in state and local child welfare systems, but I don't know the number. And again, there are issues of data sovereignty at play related to accessing information on children in tribal foster care systems. Some are reporting into the AFCARS at this point but they're not included in the analysis I have here. Ok. So by focusing only on those addressed by state authorities, do I think that the probabilities for substantiation and placement are likely higher or lower than if all children were included? So, they're almost certainly higher but I suspect that they're not much higher. Right? Because the denominator is all the children who were in the population according to the census. The numerator is only those children who were recorded in the AFCARS or the NCANDS. Right? So, if you add those children who were not recorded in the AFCARS or the NCANDS, those counts go up and then the incidence rates go up. But again, because I suspect the proportion is relatively low and I suspect the rates of removal in tribal foster care systems are somewhat lower as well, I don't expect the numbers to be off by much. So these are conservative numbers. I don't think that they're very conservative though. So, what software do I recommend for joining the administrative data sets? So here I demonstrated doing it in R and that's my preferred programming language. I do all of my data analysis and modeling in R but I recognize not everyone does and there are certainly very good reasons not to. I think with stats software it's about opportunity cost. Right? But, you know, use what you were trained on. I think the costs of switching to another program are often higher than the cost of learning to do it in what you already use. 
But the Archive staff can assist you in developing a join for these data in SPSS or SAS or Stata and, you know, again I'll provide you in just a moment with a link to the code base that I use for doing the links for both this report, the findings I showed you today, and for a paper that is related. So I'm going...
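[Editor's note] As an illustration of the join described in the talk, here is a minimal sketch in Python (not the R code the speaker uses, and not his implementation). It keeps each child's first NCANDS event and first AFCARS event, joins them on a child identifier, and computes the share of ever-investigated children who ever entered foster care, by age at first investigation. The records, field layout, and function names are all hypothetical; real NCANDS and AFCARS files use different variable names and many more fields.

```python
from collections import defaultdict

# Hypothetical records: (child_id, age_at_event). These are made-up values,
# not real NCANDS or AFCARS data.
ncands_reports = [("a", 0), ("a", 3), ("b", 2), ("c", 5), ("d", 1)]
afcars_entries = [("a", 1), ("a", 4), ("c", 7)]

def first_event(records):
    """Keep one record per child: the youngest age at which the event occurred."""
    first = {}
    for child_id, age in records:
        if child_id not in first or age < first[child_id]:
            first[child_id] = age
    return first

first_investigation = first_event(ncands_reports)
first_placement = first_event(afcars_entries)

# Left join on child_id: every investigated child keeps exactly one row, with
# None where no foster care entry was found for that identifier.
joined = {
    child_id: (inv_age, first_placement.get(child_id))
    for child_id, inv_age in first_investigation.items()
}

def conditional_by_age(joined_rows):
    """Share of children ever placed, grouped by age at first investigation."""
    totals = defaultdict(int)
    placed = defaultdict(int)
    for inv_age, placement_age in joined_rows.values():
        totals[inv_age] += 1
        if placement_age is not None:
            placed[inv_age] += 1
    return {age: placed[age] / totals[age] for age in sorted(totals)}

by_age = conditional_by_age(joined)
overall = sum(p is not None for _, p in joined.values()) / len(joined)
```

A left join is used here so that investigated children with no foster care record stay in the denominator; an inner join would drop them and overstate the conditional probability.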
[Erin McCauley] I'll also say that next week we're having our makeup session about linking this data using SPSS that will also include code, and last year we had a presentation in our summer series, which is available on our website, that had code for linking this in, I believe, Stata. So both resources are available, but as Frank said, we have kind of a diversity on our staff of people using different software, and so we can always help folks get through the link themselves. And another plug for the Summer Research Institute, so if you have a question or you're really struggling through linking the data, we do accept, you know, a small number of people to do kind of a four day, intensive process, that's where you get one on one support in a research project, including help with linking and analysis and what not.
[Frank Edwards] Thank you Erin for that reminder. Those are really valuable workshops and a good opportunity to get one on one help with it. I just posted in the chat links to the GitHub repositories for two projects that use life tables and the NCANDS and AFCARS to do this. That AIAN transitions code has the linking in it and that'll be in a read.r, so if you are an R user and want to walk through that code, feel free to do so. So, one attendee asks that the study compares American Indian Alaska Native children against white children, has this been verified with other races or all children? So, in the HAPH study that I linked to earlier, we do run these numbers for all children, not the conditional probabilities, just the life time risk probabilities for substantiated maltreatment and for foster care. But, yeah, the risk profiles for black children are similar, slightly lower than American Indian Alaska Native children, much higher than white children. And, yeah, we have ongoing research that's looking into this. A couple of colleagues and I are working on developing this analysis with American Indian Alaska Native children at the state level, because obviously, state systems differ pretty substantially and we think that a lot of the variation we see going on nationally might be accounted for by within state, across state variation. So we're going to be looking into that. We're also looking into broader geographic patterns for risk for all groups. So, not just white relative to native, but also looking at Latinx children, Black children, and Asian and Pacific Islander children as well.
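[Editor's note] The synthetic cohort idea behind the life tables in the talk can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the code in the repositories mentioned above: it treats each age specific first-event rate as a probability, ignores mortality and other refinements a full life table would handle, and uses made-up counts. The function name and the `max_age` and `radix` parameters are hypothetical.

```python
def cumulative_risk(first_events_by_age, population_by_age, max_age=17, radix=100_000):
    """Synthetic-cohort cumulative risk of a first event through max_age.

    A hypothetical cohort of `radix` children is exposed, age by age, to the
    age specific first-event rates observed in a single year.
    """
    never_yet = float(radix)  # cohort members who have not yet had the event
    for age in range(max_age + 1):
        rate = first_events_by_age[age] / population_by_age[age]
        never_yet -= never_yet * rate
    return 1 - never_yet / radix

# Made-up counts: a constant 2% first-event rate at every age from 0 to 17.
events = [200] * 18
population = [10_000] * 18

risk_by_18 = cumulative_risk(events, population)            # ages 0 through 17
risk_by_5 = cumulative_risk(events, population, max_age=4)  # ages 0 through 4
```

Truncating with `max_age` corresponds to the attendee's question about computing the probability of an event by an age earlier than 18; running the function on two groups' rates and comparing the results would show how the gap differs at the earlier cutoff.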
[Erin McCauley] Would you mind moving to the next slide Frank?
[Frank Edwards] Ah, yes, there's a next slide.
[Erin McCauley] It's just a hype for next week's makeup session. So, we again apologize, if you were with us last week, for our technical difficulty, but we've added a session. It's just Wednesday of next week. We have our new statistician, we'll be linking, kind of talking about the process of it, which should help users of all different analysis software types, and then we'll have the exact code for SPSS. You can also bring questions. I know that Sarah herself is not traditionally an SPSS user, she's a Stata and an R user, so she can field some of those questions as well. And for folks who won't be able to join us next week, I know it's kind of a last minute addition, we'll be recording it and putting that on the website along with the rest of the sessions from this summer. So, thanks again everyone for being here with us today and big thanks to Frank for taking the time to do this presentation. I know I learned a lot and I'm really excited to see where Frank's work goes.
[Frank Edwards] Absolutely, yeah, thank you everyone for attending. One person did ask whether there are other papers I would suggest and of course this is a really vast literature, but I think starting through the citations in that HAPH paper would be good and if you have any particular questions, feel free to follow up by email with me.
[Erin McCauley] I would also recommend checking out CANDL, which is our online repository, where we kind of collect all the publications that use our data. So that might be a good place to look as well.
[Frank Edwards] Yeah, I'll post a link to CANDL in the chat. CANDL is great. It's a Zotero library that tracks all of the publications using the Archive data. Highly recommend. Thanks everyone and be safe, be well, relatively of course and hope to talk to you all soon.
[Voiceover] The National Data Archive on Child Abuse and Neglect is a project of the Bronfenbrenner Center for Translational Research at Cornell University. Funding for NDACAN is provided by the Children's Bureau.