[Voiceover] National Data Archive on Child Abuse and Neglect. [Erin McCauley] All right folks, good morning or good afternoon, depending on where you are. It is August 10th, the final day of the NDACAN Summer Training Series. Thanks, everyone, for joining us for this summer series. As I said, this is the NDACAN, or National Data Archive on Child Abuse and Neglect, Summer Training Series. If you've been with us through the summer, you'll know that this is the sixth workshop, all related to NDACAN and especially the power of linking. The National Data Archive on Child Abuse and Neglect is co-hosted at Cornell University and Duke University, and the theme of this summer is the power of linking administrative data. We are at Cornell and Duke, as I said, but we are funded through the Children's Bureau. This is the schedule for this summer. Here we are rounding it out, but the previous sessions of the series are being turned into webinars as we speak and then put up on the website. We announce the release of each video on our Twitter, so if you don't follow us on Twitter I recommend doing so, at NDACAN underscore CU. This summer we've talked about NDACAN and then some data management strategies for these large administrative data sets. We then covered the administrative data sets and how to link them together. Then we linked NDACAN data with some external data products, and then we had two specific analysis workshops using linked data, one on structural equation modeling and one on propensity score matching. And now we're rounding it out using a few of the strategies we've learned throughout the summer and focusing specifically on the topic of studying racial disparities using NDACAN data. Alex is going to be doing today's talk, so I'm going to pass it over to him. Thanks, Alex, for being here.
[Alex Roehrkasse] Thanks, Erin, and thanks everyone for being here. I'm really excited for today's session on studying racial disparities in child welfare using NDACAN data. This is a topic that's central to my research interests, and I'm excited to share some strategies and thoughts with you. Today we're going to start by talking about overall racial disparities in child welfare: how should we think about racial inequality in system contact and in the prevalence of different experiences? What are some of the challenges and things we want to keep in the back of our mind when we're doing this kind of research? Then I'll give an overview of the resources NDACAN provides in terms of data that are well suited for studying racial disparities. And then I'll do a demonstration, as we've done in prior sessions, where we'll work with some real data and some code in R and RStudio to analyze racial inequality over long periods of time in the incidence of children's residence in foster care in the United States. Okay, broadly speaking, how should we think about racial disparities in child welfare? Well, we know from important reports published by the Children's Bureau using NDACAN data that the incidence of different maltreatment events and types of system contact is highly racially disparate in the United States. This table comes from the Children's Bureau's annual report on child maltreatment, and just looking at some basic figures we can see that in 2020, 13.2 African-American children per 1,000 children were victims of confirmed maltreatment. This compares to 15.5 children per 1,000 among American Indian and Alaska Native populations and 7.4 children per 1,000 among white children. So right off the bat we can see that at the national level the incidence of confirmed maltreatment is disparate by a factor of about two between American Indian and white children.
We can also see that the degree of racial inequality varies by state. And so whenever we're studying racial inequality we not only want to ask for whom this experience is unequal but also where it might be more or less unequal. The figures I just showed you were pretty basic numbers about the incidence of maltreatment, but we can use a lot of different NDACAN data products to generate other kinds of quantities, like the likelihood that a child experiences maltreatment, foster care placement, or the termination of parental rights before reaching some age, say 18 years old. This figure is from a paper published in PNAS by one of our research associates at NDACAN, Frank Edwards, our director, Chris Wildeman, and several other colleagues. Frank and Chris and their colleagues use county-level data from the AFCARS to generate estimates of the cumulative risk of foster care placement before age 18 within some of the more populous counties in the United States. And we can see that this likelihood is divided pretty starkly by race but also by geography. Okay, so let's say we want to study racial inequality in the child welfare system. It behooves us to ask: inequality in what? We might be interested in examining inequality in rates of reporting, rates of investigation, or rates of the substantiation of maltreatment. We might also examine racial disparities in the types of maltreatment that children experience or inequality in who perpetrates abuse and neglect. We might examine racial disparities in foster care placement or exit from foster care, or we might look at termination of parental rights or the likelihood that children of different ethno-racial groups are adopted.
These are basic descriptive quantities, but we might also develop statistical models to analyze different child maltreatment outcomes, and it may furthermore be the case that our model parameters, the coefficients we estimate, also vary by race. Whenever we're working on questions of racial inequality, I think it's always helpful to think about the data generating process with a healthy degree of skepticism. Let's say we observe, for example, some of the disparities that I just showed you between different ethno-racial groups in their likelihood of experiencing confirmed maltreatment or of entering foster care. It may be the case that these underlying disparities exist but that the racial inequality itself is not best explained in terms of race and ethnicity; very often these inequalities are confounded by other factors. And so it's always important to ask ourselves why these racial disparities might be occurring when we observe them. There may be other unobserved processes going on that we should think carefully about. Racial disparities may also arise through the cognitive bias of caseworkers, through prejudice, or through systematic bias in reporting, in investigation, or in risk assessment. And in the United States, where child welfare is administered quite differently across different jurisdictions, it's always important to consider the possibility that administrative or jurisdictional variation may itself contribute to racial inequality in different child maltreatment outcomes. Okay, so let's say we're interested in racial inequality in child welfare or child maltreatment. What resources does NDACAN provide that might help us to study some of these questions? Almost all NDACAN data have information about children's race and ethnicity, and sometimes they have information about caretakers' or perpetrators' race and ethnicity.
My presentation today is going to focus predominantly on administrative data, which I think offer some special advantages for studying ethno-racial inequality in child maltreatment. In the first place, administrative data sets are quite large, and they're also designed to be linked to one another, so it's quite easy to link large administrative data sets to study causes, trajectories, and trends in child maltreatment and to compare those outcomes across different ethno-racial groups. Because these data by and large represent complete counts of events or of children, administrative data allow us to study small populations like American Indian children, which is a really important resource. American Indian children have a unique experience of maltreatment and of child welfare system contact in the United States, but many surveys aren't large enough to capture this population, which is fairly small compared to other populations that we tend to take an interest in. So administrative data, because they're large and usually represent complete counts, offer us important opportunities to study high-risk groups like American Indian children. Of course, though, again I want to offer some more cautionary questions. We should always ask ourselves: are race and ethnicity measured in the same way across different sources, across different years, or across different jurisdictions? As we'll see a little later, there were important changes in the way race and ethnicity were measured in a variety of data sources, particularly around the year 2000, when a number of different official data sources began to measure multiple race and Hispanic ethnicity a little bit differently.
It's always important to think about who records children's race and ethnicity. A lot of sociological evidence tells us that who is asking for information about race and ethnicity, and the context in which they're asking these questions, matters for how people will respond. Are children offering their own ethno-racial identity? Are parents or caseworkers recording race and ethnicity? How do these decisions about racial categorization actually occur? Even though administrative data offer complete count data, we should nevertheless ask ourselves: do our samples allow us enough power to generate estimates for small groups that we can have some confidence in? And NDACAN data have quite a lot of missing data, so depending on what kinds of outcomes or events we're interested in, we should always be examining our missing data and trying to think carefully about how the missingness in our data arises and how that missingness might affect the inferences we make. So I'm going to talk a little bit about a paper I published last year using AFCARS and also a new data set that the archive published last year called the VCIS, or the Voluntary Cooperative Information System, and a couple of other data sets that NDACAN is in the process of cleaning and publishing as we speak. That paper is publicly available online; it's an open access paper, so I'd encourage you to look at it, and you can even follow along now if you want. The paper basically asks two pretty straightforward questions. How has ethno-racial inequality in rates of foster care residence changed in the United States over the last 50 years? And how have levels and trends in inequality varied by place? The analysis in that paper relies on linking several different administrative data sets. Two of the most important data sets, which I'll talk about right now, are the VCIS and AFCARS, as I just mentioned.
The VCIS are available from NDACAN as counts of children either entering or exiting foster care, tabulated by year and by state of residence. The VCIS also include further cross-tabulations by children's age, or by children's sex, or by children's ethno-racial group, but the VCIS don't include cross-tabulations by, say, year, state, age, and sex together. You can get year by state by age, year by state by sex, or year by state by ethno-racial group. The focus of the presentation today, and the focus of the paper, is going to be analyzing variation in children in foster care by year, state, and ethno-racial group. So again, to be clear, the VCIS data are count data; they're aggregated data, and we don't have individual records of children from the VCIS. By contrast, the AFCARS includes individual records of children. And so in order to link it with the VCIS we're going to have to aggregate these individual records. First we'll keep only children in care at year's end, and then we'll count up children, each individual record, by state, year, and ethno-racial group. So by the end, in both sources we'll have counts by year, state, and ethno-racial group, and by having the same unit of analysis defined by a clear set of observable variables we'll be able to link those data across years. How are we going to measure ethno-racial inequality in children's foster care residence? Well, we're going to do two pretty basic calculations. First we're going to calculate the population rate of foster care residence, and to do this we just divide the number of children in foster care by the total number of children belonging to that group, and we'll do this calculation separately for each state, for each year t, and for each ethno-racial group r. So this is just like calculating some of the rates we looked at earlier: the number of children in foster care per 1,000 children, say among black or African-American children, among white non-Hispanic children, etc.
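The aggregation and rate steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the R script used in the session; the record layout, the group labels, the population figures, and the function names `count_in_care` and `rate_per_1000` are all hypothetical.

```python
from collections import Counter

# Hypothetical individual-level records:
# (state, year, ethno-racial group, in care at year's end?)
records = [
    ("MN", 2018, "American Indian", True),
    ("MN", 2018, "American Indian", True),
    ("MN", 2018, "White", True),
    ("MN", 2018, "White", False),  # not in care at year's end, so dropped
    ("CA", 2018, "Black", True),
]

# Hypothetical child population counts for the same state-year-group cells.
population = {
    ("MN", 2018, "American Indian"): 1000,
    ("MN", 2018, "White"): 4000,
    ("CA", 2018, "Black"): 2000,
}

def count_in_care(rows):
    """Count children in care at year's end by (state, year, group)."""
    return Counter((s, y, g) for s, y, g, in_care in rows if in_care)

def rate_per_1000(counts, pop):
    """Foster care residence rate per 1,000 children for each cell."""
    return {cell: 1000 * n / pop[cell] for cell, n in counts.items()}

stock = count_in_care(records)
rates = rate_per_1000(stock, population)
```

Once both sources are reduced to counts keyed by state, year, and group, they share a unit of analysis and can be combined with population denominators in the same way.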
You'll notice quite quickly, though, that this foster care data is going to come from the VCIS and the AFCARS, but in order to make this calculation we need information about the size of different child populations. So not only are we going to be linking NDACAN data to one another, we're also going to be linking NDACAN data to external data, namely different sources of population data like the CDC WONDER database, the SEER database, and some other census data. Once we've calculated rates of foster care residence in each state, year, and ethno-racial group, we'll calculate a rate ratio, which will be our primary measure of ethno-racial inequality. The way we'll do this is to divide each ethno-racially specific rate by the rate of foster care residence among children of all ethno-racial groups within each state and year. This will generate our estimate of disparity, gamma, which is a rate ratio. If this rate ratio were equal to, say, 2 for American Indian children, what that would mean is that American Indian children in that particular state-year are twice as likely as the average child in that state-year to reside in foster care. We could calculate a rate difference, but what's nice about rate ratios is that they're scale-independent: it doesn't matter if we're dealing with rates that are very low or very high, so we can compare inequality in states with low and high rates. Whenever you're using rate ratios, though, it's important to use log scales in the analysis of the data, particularly the visualization of the data, so that you can accurately compare over- and underrepresentation of children in foster care. Again, some more cautionary questions. We want to think carefully about what the population at risk of experiencing the event is.
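The rate ratio and the reason for log scales can be illustrated with a small worked example. This is a sketch in plain Python under made-up numbers, not the session's R code; the group labels "A", "B", "C" and the function name `rate_ratios` are hypothetical.

```python
import math

# Hypothetical counts for one state-year:
# group -> (children in foster care, child population)
cells = {
    "A": (40, 10_000),
    "B": (10, 40_000),
    "C": (50, 50_000),
}

def rate_ratios(cells):
    """Divide each group's rate by the rate for all groups combined."""
    total_in_care = sum(n for n, _ in cells.values())
    total_pop = sum(p for _, p in cells.values())
    overall_rate = total_in_care / total_pop
    return {g: (n / p) / overall_rate for g, (n, p) in cells.items()}

gamma = rate_ratios(cells)

# On a log scale, a ratio of 4 (overrepresented) and a ratio of 1/4
# (underrepresented) sit the same distance from 1, on opposite sides.
log2_gamma = {g: math.log2(v) for g, v in gamma.items()}
```

Here group A has a ratio of about 4 and group B a ratio of about 0.25. On a linear axis B would look much closer to parity than A; on a log axis the two are equally far from the line at 1.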
So for example, if we were interested in confirmed maltreatment, we could look at the number of confirmed maltreatments per 1,000 children in the population, or we could look at the number of confirmed maltreatments per 1,000 reports of maltreatment, and those rates would tell us different things about the incidence of maltreatment. By extension, since we're going to be talking about foster care today, we could look at the incidence of foster care residence within the population, or we could look at, say, the number of children who entered foster care as a proportion of the number of children who experienced a maltreatment report. Those are equally interesting rates, but they would tell us very different things about the child welfare process in the United States. I just mentioned a little bit about the difference between rate differences and rate ratios. Sometimes when we're studying inequality, the scale or even direction of inequality can look a little bit different depending on whether we're examining a rate ratio or a rate difference. So whenever you're examining disparities in rates, I would encourage you to examine both ratios and differences and examine whether they tell you similar or different stories. This slide is just to preview where we're going. We're going to end up generating, I think, precisely this slide in our demonstration a little bit later, but I think this slide tells us a lot about what we've just discussed and where we're going. So let's take a minute to discuss what we're actually looking at in this slide. First, on the horizontal axis here we have years, so our data here span roughly from 1982 to 2018. We're looking at three different states: California, Minnesota, and Georgia.
On the vertical axis here we have the ratio of the group's substitute care rate to the rate for all children, so this is our measure gamma, our rate ratio, where we've taken the rate of foster care residence for each ethno-racial group and divided it by the rate of foster care residence for children of all ethno-racial groups, the total child population in each state and year. I've gone ahead and drawn a line here at one. The reason for that is that if a dot were to fall on that line, it would mean children of that ethno-racial group are just as likely to end up in foster care as children overall, children from all ethno-racial groups combined. Furthermore, dots that fall above the line indicate that children from that group are overrepresented in the foster care system, and points falling below the line indicate that children are underrepresented in the foster care system. So for example, here in California, if we were to look at these black points, the points for black children, in the 2010s, we would see that black children in California in the 2010s are about four times as likely to live in foster care as the average child in California. Hispanic children, on the other hand, have a fairly average experience of foster care residence in California. American Indian children, particularly in Minnesota, are extremely overrepresented in foster care, in recent years approximately 16 times more likely to reside in foster care than the average child in Minnesota. Conversely, Asian and Pacific Islander children are generally underrepresented in the foster care system. I've also gone ahead and plotted points of different sizes, and the reason I've done this is that it illustrates how our inferences about the prevalence of foster care residence, and inequality in that prevalence, are based on different sized samples.
Most of these dots, particularly from California, are pretty large, indicating that they're based on samples of a larger number of children. But you'll see here for Georgia, for example, that for American Indian and Asian and Pacific Islander children, who we estimate to be fairly underrepresented in the system, these estimates are based on a very small number of children who actually reside in foster care in those states. And so it's not surprising to see that these rates, or these disparities in rates, bounce around quite a lot. This is to say that we should have greater confidence in our estimates of inequality that are based on larger numbers of children than in those based on smaller numbers of children. We can expect those estimates to be more stable over time, and we can have greater confidence that those rates would obtain if we observed larger populations in those states. That slide showed us these estimates of disparities for just three states, but we can zoom out and look at these disparities all across the United States. Okay, now I'd like to transition to working in R and RStudio and actually show you how to calculate some of these estimates of disparities using some of the data sources that I've highlighted, so bear with me just a moment while I transition out of PowerPoint and into RStudio. Before I do that, briefly, I'll say I've included some links in this slide that will help you get started if you're not already an R user. Here is where you can download R, the statistical programming language. [onscreen https://cloud.r-project.org/ ] Here's where you can download RStudio for your desktop, which is a very helpful interface for actually programming in R. [onscreen https://www.rstudio.com/products/rstudio/download/ ] And then I've included two of my favorite resources for learning R and RStudio, which have been helpful for me in learning how to do child maltreatment research in R.
[onscreen https://rstudio-education.github.io/hopr/ https://r4ds.had.co.nz/index.html] Okay, so let me pull up RStudio and actually start doing some analysis. [Voiceover] The program, written in R, is included in the downloadable files for the slides and the transcript. [Alex Roehrkasse] So here we've opened RStudio. We have our script here, which I've titled s6, corresponding to our presentation today, and Alex Gibbons has very helpfully linked the script. Follow along and annotate it. If you haven't already downloaded RStudio, you can still open an R file in a program like TextEdit or Notepad; it's basically just a text file. Again, we're going to be walking through some of the analyses from this Demography paper, which again is publicly available, so you can read more about these types of analysis and the kinds of inferences we make from them in that paper. Okay, the first thing we want to do, as always, is clear our environment. We're going to be using four packages today. I've already installed them, so I won't install them again, but if you've never used any of these packages you would need to install them before using them the first time. data.table is a good package for reading data, and tidyverse is a good package for data management and manipulation. geofacet is a helpful package which I used to make that map of plots that you just saw, and socviz is a helpful package for visualizing different types of social data. Okay, I'll go ahead and load those four packages. I'll set some file paths so that R knows where to read data from. I'll set a seed, which we're not really going to be using today but which I think is helpful practice to have in every piece of code you use, and then I'm going to create an object here that just includes some geographic identifiers for different states.
I use this file in all different kinds of research, and so I'd encourage you to have a file like this that you can use in different historical and geographic projects, where you can move from state FIPS codes to state names to two-letter abbreviations or even regional codings. Okay, we're going to work with real data today, but I'm not able to show you micro data in this context, so what I've done is pre-process some AFCARS data to generate counts of children by state, year, and ethno-racial group, as we talked about earlier. If you're interested in how to generate these counts of children, I'd encourage you to go back and look at our data management series from earlier in the summer. Let's go ahead and read this object in and take a look at it. Our AFCARS object represents counts of children, which we'll call stock, for different states, different years, and different ethno-racial groups. And I've gone ahead and created a variable, source, which is going to remind us that these data come from AFCARS. You'll see that for some state-year-race combinations we don't have any data, but for many others we do in the AFCARS. So already we're dealing with some missing data in our historical analysis. Okay, let's go ahead now and read in the VCIS, and here I'll read in the file just as you would read it if you downloaded it from NDACAN. And we'll walk through how you would manipulate it in order to link it to count data from AFCARS. So let's go ahead and read in the VCIS and take a look at it. There are quite a lot of variables in the VCIS, so maybe it's more helpful to look at the object itself. Okay, here we have state FIPS codes, state names, and the year for which different data were collected; the historical range for the VCIS is 1982 to 1995.
And then we have different counts of children, so this is the number of children in foster care at the beginning of the year, and the number of children entering foster care in the fiscal year, further subdivided by the age of different children. Each of these counts, cross-tabulated by different attributes of children, is going to be a different column. If we were to go further to the side here, we would find children in foster care broken down by, say, sex: children whose sex is female, children whose sex is male, children whose sex is unknown. Okay, one of the things we're going to do here right off the bat is a little bit of missing data management. What you can see here is that if we were to examine the number of children in foster care in Alabama, we have counts for children in foster care who are black, who are Asian or Pacific Islander, who are American Indian or Alaska Native, but you'll see that there are also counts of children in foster care whose race and ethnicity is unknown. In many cases this is a small number of children, but in other cases it's a larger number, and so we need to do something to account for those children in foster care with an unknown ethno-racial identity. This tells us how many children in each state-year have a missing value for their race and ethnicity, and we can see that on average each state has hundreds of children with missing ethno-racial information. What we're going to go ahead and do is impute missing ethno-racial information according to the distribution of observed, non-missing values. So we're going to recode our count of children in foster care who are white non-Hispanic as the number we observe, plus the number of children who have an unknown ethno-racial identity multiplied by the observed proportion of children who have a white non-Hispanic identity.
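The recoding rule just described, observed count plus unknown count times observed share, can be written out as a short function. This is a minimal sketch in plain Python rather than the session's R code; the counts and the function name `redistribute_unknown` are hypothetical.

```python
def redistribute_unknown(known, unknown):
    """Spread the unknown-race count across groups in proportion
    to each group's share of the observed (non-missing) counts."""
    total_known = sum(known.values())
    if total_known == 0:
        # No observed distribution to borrow from; leave counts as they are.
        return dict(known)
    return {g: n + unknown * n / total_known for g, n in known.items()}

# Hypothetical counts for one state-year.
known = {"White non-Hispanic": 600, "Black": 300, "American Indian": 100}
imputed = redistribute_unknown(known, unknown=50)
```

With these numbers the 50 children of unknown identity are split 30, 15, and 5, so the group shares are unchanged and the total rises from 1,000 to 1,050. The approach assumes that children with missing information resemble those with observed information, which may not hold in a given state or year.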
And so if we do this for our different ethno-racial groups, we'll basically redistribute this count of children with unknown ethno-racial identity to the different ethno-racial groups according to the observed distribution of ethno-racial identity. Let me go ahead and do that. Now we're going to pivot our data so that they're organized in a manner that will allow us to link them to AFCARS. As you can see, right now our different counts of children in care are organized horizontally, right? Each column represents a count, whereas our AFCARS data were organized differently, with each ethno-racial group representing a row. So in this chunk of code the primary thing we're going to do is pivot our data longer, and we're going to generate a variable named var, which is going to have the names of our different ethno-racial counts, and we'll turn that into an ethno-racial group variable. If we go ahead and run this chunk of code, we'll see that our VCIS data are now very nicely reorganized, very similar to how our AFCARS data are organized. Okay, we're going to link these two objects, and so we want them to have the same structure, and indeed this object now has five columns, and similarly the VCIS has five columns, all with the same variable names. In this paper we also rely on a couple of other, even earlier sources of historical data that go back to the 1970s and even the 1960s. These data aren't released yet, and cleaning them is considerably more complicated than cleaning the VCIS, so for now I'm just going to read in another pre-processed file, but you'll notice that it also has a very similar structure to the VCIS and AFCARS objects. Okay, here's something that's kind of helpful: the range for the VCIS is from 1982 to 1995. The range for the AFCARS is from 1995 through 2020.
What this means is that we have data from two different sources in 1995, and we can compare our counts from these two sources to see how reliable our estimates of children in care actually are. What this chunk of code does is bind our VCIS data to our AFCARS data and pull out only our 1995 data. We're going to again pivot our data so that it's easier to calculate a ratio, and then we'll calculate a ratio of the counts of children in 1995 observed in VCIS and AFCARS. Let's go ahead and create that object, and if we look at it now we can see that for each state and ethno-racial group we have a count for 1995 from the AFCARS and a count from the VCIS. In many cases we only have a count from one source for any given state and ethno-racial group, but sometimes we have observations from both, and so we can look at the ratio and see how those counts compare. We can furthermore visualize that comparison, here, so that we can see that the counts from the two sources for children of all ethno-racial groups are fairly similar and that the similarity in the counts is not systematically related to the size of the population in foster care. [Alexandra Gibbons] Yeah, sorry, can I interrupt you super quickly? [Alex Roehrkasse] Sure. [Alexandra Gibbons] We just have a quick question about what the imputation you performed is called, and whether it comes under hot deck imputation or multiple imputation. [Alex Roehrkasse] Great question. What I just showed you would be something akin to a hot deck imputation. It's not a multiple imputation because I only imputed one version of the data. Multiple imputation would involve using statistical ... like, how confident are we that we've correctly estimated the missing values? The validity of the missing data strategy I just used is predicated on a number of assumptions that you might criticize or might not believe. So I'd encourage you not to just blindly do what I've done here.
I chose that method for a variety of reasons, including that it's fairly simple, but it's not going to be applicable or defensible in a number of other contexts. A more extended discussion of missing data imputation is beyond the scope of our time here today. But if you ever have any questions about the appropriate method of imputation to use, including with these types of data sets, I'm always happy to answer those questions on a case-by-case basis. I'm seeing another question, from Mary, that with NCANDS data there is variation across states and years in how consistently race and ethnicity are reported. This makes it difficult to track over time. Do I have any suggestions for dealing with this challenge? Yeah, so in the analysis I've just shown you I've imputed some missing observations, but there are whole years for which we have no information about particular ethno-racial groups. I'm about to show you how to estimate some counts of children across all ethno-racial groups. We won't actually work through the imputation of entire missing populations for different ethno-racial groups in this analysis, but I'll just say that the same kinds of missing data strategies that you might use in other contexts are going to be equally appropriate here, except that you may be using more time series models to impute your data, compared to, say, imputing data using a cross-sectional survey. Okay, I said earlier that our most important measure of ethno-racial disparities is going to be rate ratios, but before we calculate a rate ratio we need to calculate a rate, and those rates depend on some underlying population. NDACAN itself does not maintain or publish information on the underlying populations of children at risk of maltreatment, and so we need to pull that data from somewhere else.
Again I'm going to read a pre-processed file here. Collecting data from some of these earlier years in particular is quite complicated, but suffice it to say that I've collected these data from various sources, predominantly from the Census Bureau, using data management techniques that we highlight in our first Summer Training Series session. Looking at the structure of this denominator data, though, we see that again it's very similar in structure to our other maltreatment sources from which we'll pull our numerators. So here in our denominator object we have, for each state, year, and ethno-racial group, a count of all children belonging to that group. I've listed here a number of different sources that I often draw population data from; which one is best will depend on the specific nature of your project. Okay, let's go ahead and link these different data that we've pulled together. To link our numerator data, all we have to do is bind the data. Binding in R is equivalent to appending in Stata, and really all it is is stacking these objects on top of one another. And we can stack these objects precisely because they have the same column names and the same number of columns. So for these three sources from which we're pulling numerator data, we'll just stack them on top of one another, and then we'll arrange them according to the values of state, year, and race. Let's go ahead and do that, and we'll get this object d. Now we'll see that for each state we have information going all the way back to 1960 for different ethno-racial groups. Again, we have lots of missing data, and we would have to have a longer conversation about precisely how we would want to deal with some of these missing values. We've also named the source from which each piece of information comes. So if we were to scroll down we'd see that at some point our source shifts from the Children's Bureau statistical series data to the VCIS.
And then further down still we'll start using the AFCARS data. The way we'll incorporate our denominator data will be a little bit different, though. Instead of binding the data we'll join the data, and this is because instead of stacking our population counts below all of these counts of children in foster care, what we'd really like to do is have a separate column here that counts the number of children at risk of living in foster care. So to merge a file or an object from the side we'll go ahead and use a command called join. Now let's look at our data object again and we'll see that we now have a population count of children corresponding to our counts of children in foster care in any given year. I'm going to go ahead and create an analysis file that basically combines and averages our counts of children in the VCIS and AFCARS in 1995 in particular. But I'll keep a raw file that will allow us to examine how much coverage we have across our different groups over time. Let's go ahead and take a snapshot of how much data we actually have. So here I'm just organizing our data into different counts, and let's go ahead and create a visualization. This plot is actually in the appendix of the paper I've discussed, and I think it's a nice summary of what kind of data is available through NDACAN for historical analysis of ethno-racial inequality. You can see that for children of all ethno-racial groups we have information going all the way back to the 1960s. Some of these counts of children across all ethno-racial groups, though, get a little bit spottier in the mid to late 20th century, and it's only in the 21st century that we have complete counts again of all children. And to be clear, what we're counting here is the number of states in any given year that have non-missing values for any given ethno-racial group.
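The stacking, joining, and coverage-counting steps described above can be sketched on toy data. This is only an illustration under stated assumptions: the data frames `num_a`, `num_b`, and `pops` and all of their values are hypothetical, and only the column names (`state`, `year`, `race`, `stock`, `pop`, `source`) mirror the objects used in the session. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical numerator data from two sources with identical columns
num_a <- data.frame(state = "Alabama", year = c(1982, 1983), race = "Total",
                    stock = c(4000, NA), source = "VCIS")
num_b <- data.frame(state = "Alabama", year = 2000, race = "Total",
                    stock = 5600, source = "AFCARS")

# Hypothetical denominator data: child population by state, year, and race
pops <- data.frame(state = "Alabama", year = c(1982, 1983, 2000), race = "Total",
                   pop = c(1150000, 1140000, 1120000))

# Bind: stack the numerator sources on top of one another, then sort
d <- bind_rows(num_a, num_b) %>%
  arrange(state, year, race)

# Join: add the population count as a new column, matching on the shared keys
d <- d %>%
  left_join(pops, by = c("state", "year", "race"))

# Coverage: number of states with a non-missing count in each year and group
coverage <- d %>%
  filter(!is.na(stock)) %>%
  group_by(year, race, source) %>%
  summarize(statecount = n(), .groups = "drop")
```

Two choices here differ from the session's program and are made only for clarity: the sketch uses `left_join()` rather than `full_join()`, and it names the join keys explicitly in `by` instead of letting the join infer them from the shared column names.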
So what you can see is that our information about children's race and ethnicity is really quite good in the AFCARS, but only after the year 2000. AFCARS data in the 1990s were highly missing, and our data for the VCIS include a highly variable number of observations across different ethno-racial groups and different states. Someone earlier asked a question about how we might make estimates of counts of children in years where we just have no data at all. So what I'd like to show you is how to estimate some of these missing values, not for children of different ethno-racial identities but just for all children together. You could use this method, though, to make imputations about children of different races and ethnicities as well. Let's go ahead and create a prediction function, which is just a function that makes it nice and easy to predict different values based on different models. We're going to create an object that just includes blank values of different years, and then we're going to apply our prediction function to our data and generate a set of predictions about what some of these missing values might be. Go ahead and run this chunk of code. And now let's visualize the results of our predictions. So here what you can see is again a map where the red dots represent actually observed rates of children in foster care in any given state-year. Again this is for children of all races and ethnicities, but you'll also see a blue line and a blue ribbon representing a 95 percent confidence interval. These blue lines and blue ribbons represent model-based estimates of what the rates of foster care might be in the state-years where we don't actually observe children in care. I can't get too much into the nature of this model; it's a locally estimated regression, but this is just to illustrate how you might think about dealing with missing data over time.
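A minimal sketch of the idea just described, for a single hypothetical state: fit a locally estimated regression (loess) to the observed rates, predict a rate and standard error for every year, and use the prediction only where no rate was observed. The series in `obs` is invented for illustration. The sketch uses loess's default `span = 0.75` because the toy series is short, whereas the session's program uses `span = .2` on the real data.

```r
# Hypothetical observed foster care rates per 1,000 children, with gaps
obs <- data.frame(
  year = c(1962:1968, 1982:1990, 2000:2010),
  stock_rate = c(4.1, 4.0, 4.2, 4.3, 4.1, 4.4, 4.5,
                 3.4, 3.5, 3.7, 3.9, 4.2, 4.6, 5.0, 5.3, 5.6,
                 7.0, 6.9, 6.8, 6.6, 6.5, 6.3, 6.1, 6.0, 5.8, 5.6, 5.4)
)

# Fit the local regression to the observed years only
fit <- loess(stock_rate ~ year, data = obs, span = 0.75)

# Predict for every year in the observed range, including the gap years
newdat <- data.frame(year = 1962:2010)
p <- predict(fit, newdata = newdat, se = TRUE)

est <- data.frame(year = newdat$year,
                  fit = as.numeric(p$fit),
                  se = as.numeric(p$se.fit))

# Approximate 95 percent interval around each fitted value
est$lower <- est$fit - 1.96 * est$se
est$upper <- est$fit + 1.96 * est$se

# Keep observed rates where they exist; fall back to the model elsewhere
est <- merge(est, obs, by = "year", all.x = TRUE)
est$point <- ifelse(is.na(est$stock_rate), est$fit, est$stock_rate)
est$est <- ifelse(is.na(est$stock_rate), "Estimated", "Observed")
```

By default loess does not extrapolate, so the prediction years are kept inside the range of observed years.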
You can use the information from particular states in the years that we observe to make guesses about trends in between. Okay, let's turn back, though, to this question of ethno-racial inequality. I'm going to read in just a little more pre-processed data here, with a few more minor modifications of our earlier objects specific to our purposes. Let's go ahead and look at some of these objects. Again what we have here is different states, different years, and different ethno-racial groups. We have counts of children in care, counts of children in that state-year belonging to that ethno-racial group, and then we can calculate a rate of children in foster care by dividing one by the other. Now what we want to do, though, is calculate the ratio of rates. What I mean by this is, say, in Alabama in 1995, what is the ratio of... oh, I'm looking at the wrong object here. If we look at children in Alabama in 1982, we see that the rate of foster care for all children is about 3.4 children residing in foster care per 1,000 children. We might be interested in how that rate compares to, say, the rate of Black children in foster care in that same state in that same year. What we want is to generate a ratio of the rate among Black children to the rate among children of all ethno-racial identities in that very same state and year. This long chunk of code basically does just those calculations and plots them, and here we see the very same plot that we were looking at earlier, where we've taken our ethno-racially specific rates of foster care, divided them by the rate for children of all racial and ethnic identities, and plotted this ratio here for different states.
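The rate and rate ratio calculation described above can be sketched for one state-year. All counts below are hypothetical and chosen only so that the total rate comes out to the 3.4 per 1,000 mentioned in the talk; the group counts and populations are invented. The column names follow the session's objects, and the sketch assumes the dplyr package.

```r
library(dplyr)

# Hypothetical counts for one state-year (not real Alabama data)
dat <- data.frame(
  state = "Alabama", year = 1982,
  race = c("Total", "Black", "White"),
  stock = c(3400, 1700, 1600),          # children in foster care
  childpop = c(1000000, 300000, 680000) # children in the population
)

ratios <- dat %>%
  # Rate: children in care per 1,000 children in the group
  mutate(stock_rate = stock / childpop * 1000) %>%
  # Rate ratio: group rate divided by the all-children rate
  # in the same state and year
  group_by(state, year) %>%
  mutate(stock_rate_ratio = stock_rate / stock_rate[race == "Total"]) %>%
  ungroup()
```

A ratio above 1 indicates a group is overrepresented relative to all children in that state and year, and a ratio below 1 indicates underrepresentation.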
So what we can learn from this is that, for example, Black children in California are overrepresented in foster care, American Indian and Alaska Native children are slightly less overrepresented in foster care in California, white children are broadly underrepresented in foster care in California, and at least in recent years Hispanic children have a roughly average incidence of foster care residence in California. We can then do the same sorts of calculations by summing up our counts from different states to generate national estimates of rates of children in foster care, and rate ratios as well, again by grouping and summarizing our data. Here's a chunk of code that I think helps illustrate precisely how we're doing these calculations. When we pivot our data wider we're able to see that we can calculate a rate ratio for different groups by dividing the ethno-racially specific group rate by the total rate. And when we do this we're able to similarly generate rates and rate ratios for... oops, apologies, we need one more package here. I won't talk through this code, but I hope that you'll explore it yourself because it includes some helpful tactics for visualizing data: making plots of rates on the one hand and plots of rate ratios on the other, combining those plots into a grid, extracting a legend for the plot, and plotting it. So here I visualize national rates of foster care per 1,000 children for different ethno-racial groups, but you'll notice I've begun my analysis in the year 2000, and that's because from a national perspective it's only in the year 2000 that we have complete count information for children of all different ethno-racial groups in every single state. If you wanted to extend these national time series backward in time you could, but you would need to use some defensible missing data imputation strategy.
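As a rough sketch of the national calculation described above: sum state counts and populations by year and group, compute rates, pivot wider so each group's rate is its own column, and divide by the total rate. The two states and all values in `st` are hypothetical. The sketch assumes the dplyr and tidyr packages.

```r
library(dplyr)
library(tidyr)

# Hypothetical state-level counts for two states in one year
st <- data.frame(
  state = rep(c("State A", "State B"), each = 3),
  year = 2000,
  race = rep(c("Total", "Black", "White"), times = 2),
  stock = c(1000, 400, 500, 3000, 900, 1800),
  childpop = c(200000, 40000, 150000, 500000, 100000, 350000)
)

# Sum counts across states, then compute national rates per 1,000.
# sum() without na.rm returns NA if any state is missing, which flags
# years where the national count would be incomplete.
nat <- st %>%
  group_by(year, race) %>%
  summarize(stock = sum(stock), childpop = sum(childpop), .groups = "drop") %>%
  mutate(stock_rate = stock / childpop * 1000)

# Pivot wider so each group's rate is a column, then divide by the total rate
wide <- nat %>%
  select(year, race, stock_rate) %>%
  pivot_wider(names_from = race, values_from = stock_rate,
              names_prefix = "rate_") %>%
  mutate(ratio_Black = rate_Black / rate_Total,
         ratio_White = rate_White / rate_Total)
```

Taking the new column names directly from the `race` values is a choice made for this sketch; the session's program instead assigns short group codes by position before pivoting.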
So here, to conclude, what you can see is that by summing up these ethno-racially and geographically specific counts to the national level, we can not only look at trends and disparities in rates of foster care but also calculate ratios that help us understand the scale of inequality. So here again American Indian children are in recent years somewhere around three to four times more likely to reside in foster care than the average U.S. child. White children continue to be underrepresented in the foster care system, although their underrepresentation is decreasing over time. We could similarly plot these disparities in terms of rate differences instead of rate ratios, and that would be an important robustness check for our findings about ethno-racial inequality in children's foster care today. So I want to grant that we moved through some of this code quite quickly. In many cases, though, this code is actually quite simple if you look at it; it's a lot of labeling and reorganization, so I don't think it's worth talking through line by line in this setting, but my hope is that you'll use this code as a template to do some of your own analyses. My hope is that you'll reach out if you have any questions about the specifics of the analyses I've talked about today, but my preference would be to leave the remainder of our time for questions, whether about the sources that NDACAN offers for studying racial inequality, the methods we've used today, or the kinds of questions or strategies we might use to study ethno-racial inequality in child maltreatment in the United States today. Thanks for your attention and I look forward to your questions. So I see a question from Holly about the difference between the term disparity and the term disproportionality. Which approach am I using here?
Well, first I want to concede that there may be specific uses of those terms of which I'm not aware, but broadly speaking I think of disparity as any measure of inequality, right? So anytime we're talking about disparity we're talking about groups that have a disparate prevalence or disparate incidence of some experience. When I see the word disproportionality I think a little bit more in terms of ratios. And so here again we're looking at rates of foster care residence here, and then here we're looking at rate ratios. I think this graph is more or less exactly measuring what we would usually call disproportionality, right? So when a group's estimates here fall above this line, what we're talking about is a disproportionately large incidence of residence in foster care among that group, right? If we were to look at differences between these rates, that would be, I think, a measure of disparity but not necessarily disproportionality, and so at least in terms of my own use of these terms I think of disproportionality as being a more specific way of measuring a disparity. So I hope that answers Holly's question. I'll move on to Richard's. Disparities in the child welfare system are pronounced. Now that we've established that, are there any states or other entities doing something about that, or anything any of you are doing to address this issue? Well, Richard, that's a big question. There are many people out there trying to do lots about these disparities; I think it's beyond the scope of our talk here today. I myself find that trying to understand these disparities better is on its own actually quite important. We can sometimes take it for granted that these disparities exist, just as we can take it for granted that they might be smaller than they actually are.
I think it's very valuable to measure these things precisely, find new ways to measure them, and leverage data to measure them with greater confidence than we had before. My hope is that by talking about these data and these methods I can encourage other people to do that kind of research. I hope you can find people out there who are working to improve these issues of disproportionality. Heather asks: are there any recent publications that have used approaches similar to what you demonstrated today with NCANDS data that you can share? It always helps to see published examples. Yeah, so Heather, NDACAN maintains a library of all, well, at least we hope all, publications using NDACAN data, so I'd encourage you to go to our website, where you can find a lot of bibliographic resources to find papers that use our data for studying these kinds of questions. Erin's put the link right there in the chat. It's called canDL. I'd point you in that direction to find good studies; that resource will be able to connect you with more research than I'm able to on my own.
[Erin McCauley] Also a plug and reminder that if you publish a study with our data, let us know so we can add you to canDL.
[Alex Roehrkasse] Definitely. It's a great way for people to find your research.
[Erin McCauley] All right, well, we have another few minutes so I'll leave the Q&A open. But also just a plug that this is the final session of the NDACAN summer training series 2022. Because you all have obviously attended at least one event, you'll be receiving an email from me in the next week with a survey asking for feedback on this series. We use the results of that survey to justify the continuation of the program and also to source ideas for next summer's training series theme.
Linking administrative data and then pairing it with analysis workshops was y'all's idea, so if you have any fun new ones or something that you really want to see next year, make sure you fill out the survey so that we make sure to include it. Thanks everyone for being here, and thank you Alex for coming to do this presentation after many flight delays and cancellations and very little sleep.
[Alex Roehrkasse] Thanks everyone for being here. I really appreciate your great questions. I hope you'll be in touch if you have any questions about this kind of research, whether my research or your research. I'm always really excited to talk with people who are working on racial disparities in child welfare. There are a number of other staff members at NDACAN who would be equally excited to talk about this work, so please don't hesitate to reach out; we'd love to talk to you about your work.
[Voiceover] The National Data Archive on Child Abuse and Neglect is a collaboration between Cornell University and Duke University. Funding for NDACAN is provided by the Children's Bureau, an office of the Administration for Children and Families.
[musical cue]
[Full text of R code]
# NOTES
# THIS PROGRAM FILE DEMONSTRATES SOME OF THE METHODS
# FOR STUDYING RACIAL INEQUALITY DETAILED IN
# SESSION 6 OF THE 2022 NDACAN SUMMER TRAINING SERIES
# COMPARE WITH ROEHRKASSE (2021) "LONG-TERM TRENDS AND ETHNORACIAL
# INEQUALITY IN U.S. FOSTER CARE"
# https://doi.org/10.1215/00703370-9411316
# FOR QUESTIONS, CONTACT ALEX ROEHRKASSE (AFR23@DUKE.EDU; ALEXR.INFO)

# 0.
# SETUP

# Clear environment
rm(list=ls())

# Install packages (only necessary once)
install.packages(c('data.table', 'tidyverse', 'geofacet', 'socviz', 'cowplot'))

# Loads packages
library(data.table)
library(tidyverse)
library(geofacet)
library(socviz)
library(cowplot)

# Set filepaths
afrmac <- '/Users/alexanderroehrkasse/Library/CloudStorage/Box-Box/Presentations/-NDACAN/2022_summer_series/'
afrpc <- 'C:/Users/aroehrkasse/Box/Presentations/-NDACAN/2022_summer_series/'

# Set working directory
wd <- afrpc # Toggle filepath here to change working directory
setwd(wd)

# Set seed
set.seed(1013)

# Inputs state codes
stcodes <- fread("data/stcodes.csv") %>%
  mutate(fips = stfips)

# 1. READS, CLEANS NUMERATOR DATA

# AFCARS
afcars <- fread("data/afcars_counts.csv")
view(afcars)

# VCIS
vcis <- fread("data/vcis_clean.csv")
head(vcis)

# Impute missing race according to distribution of non-missing values
vcis %>%
  group_by(Year) %>%
  summarize(missrace = mean(InRacEthUnk, na.rm = T))

vcis <- vcis %>%
  mutate(InRacEthW = InRacEthW + InRacEthUnk * (InRacEthW / (InRacEthTot - InRacEthUnk)),
         InRacEthH = InRacEthH + InRacEthUnk * (InRacEthH / (InRacEthTot - InRacEthUnk)),
         InRacEthB = InRacEthB + InRacEthUnk * (InRacEthB / (InRacEthTot - InRacEthUnk)),
         InRacEthAPI = InRacEthAPI + InRacEthUnk * (InRacEthAPI / (InRacEthTot - InRacEthUnk)),
         InRacEthAIAN = InRacEthAIAN + InRacEthUnk * (InRacEthAIAN / (InRacEthTot - InRacEthUnk)))

# Pivot, relabel for join
vcis <- vcis %>%
  select(StateName, Year, starts_with('InRacEth')) %>%
  pivot_longer(-c(StateName, Year), names_to = "var", values_to = "stock") %>%
  mutate(race = case_when(var == "InRacEthTot" ~ "Total",
                          var == "InRacEthW" ~ "White",
                          var == "InRacEthH" ~ "Hispanic",
                          var == "InRacEthB" ~ "Black",
                          var == "InRacEthAIAN" ~ "American Indian/Alaska Native",
                          var == "InRacEthAPI" ~ "Asian/Pacific Islander"),
         source = "VCIS") %>%
  filter(StateName != "Puerto Rico" & StateName != "Virgin Islands") %>%
  filter(!is.na(race)) %>%
  rename(year = Year, state = StateName) %>%
  select(-var)

# CHILDREN'S BUREAU STATISTICAL SERIES / NATIONAL CENTER FOR SOCIAL STATISTICS
cbss_ncss <- fread('data/cbss_ncss.csv')
view(cbss_ncss)

# ROBUSTNESS CHECK: COMPARE VCIS AND AFCARS IN 1995
# Creates dataframe
compare <- vcis %>%
  bind_rows(afcars) %>%
  select(state, year, race, source, stock) %>%
  filter(year == 1995) %>%
  arrange(state, race, year, source) %>%
  pivot_wider(id_cols = c(state, year, race), names_from = source, values_from = stock) %>%
  mutate(ratio = VCIS / AFCARS) %>%
  mutate(race = factor(race,
                       levels = c("Total", "American Indian/Alaska Native", "Asian/Pacific Islander", "Black", "Hispanic", "White"),
                       labels = c("Total", "American Indian/Alaska Native", "Asian American/Pacific Islander", "Black/African American", "Hispanic/Latino", "White")))

# Plots stocks
compare %>%
  ggplot(aes(x = AFCARS, y = ratio, weight = AFCARS)) +
  geom_hline(yintercept = 1, linetype = "dashed") +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  facet_wrap(~ race) +
  scale_x_continuous(trans = "log10",
                     breaks = c(10, 100, 1000, 10000, 100000),
                     labels = c("1e1", "1e2", "1e3", "1e4", "1e5")) +
  scale_y_continuous(trans = "log2") +
  labs(x = "AFCARS Count of Children in Foster Care",
       y = "Ratio of VCIS Count to AFCARS Count",
       color = NULL) +
  theme_bw() +
  theme(strip.background = element_blank(),
        legend.position = "bottom")

# 2. READS IN, STANDARDIZED DENOMINATOR DATA

# PRE-PROCESSED DATA FROM SEER/CB
denom <- fread('data/denom.csv')
view(denom)

# GOOD SOURCES OF POPULATION DATA
# https://wonder.cdc.gov/
# https://www.nhgis.org/
# https://usa.ipums.org/usa/
# https://seer.cancer.gov/popdata/

# 3.
# MERGES DATA

# Binds numerators data
d <- cbss_ncss %>%
  bind_rows(vcis) %>%
  bind_rows(afcars) %>%
  arrange(state, year, race)

# Joins denominator data
d <- d %>%
  full_join(denom) %>%
  arrange(state, year, race)

# Creates analysis file that averages 1995 VCIS/AFCARS values
da <- d %>%
  mutate(source = replace(source, year == 1995, "VCIS/AFCARS")) %>%
  group_by(state, year, race, source) %>%
  summarize(stock = mean(stock, na.rm = TRUE),
            pop = mean(pop, na.rm = TRUE)) %>%
  mutate(stock = replace(stock, is.nan(stock), NA),
         pop = replace(pop, is.nan(pop), NA),
         stock_rate = stock / pop * 1000) %>%
  select(source, state, year, race, stock, pop, stock_rate) %>%
  left_join(stcodes) %>%
  filter(state != "Puerto Rico" & state != "Virgin Islands") %>%
  ungroup() %>%
  mutate(race = factor(race, levels = c("Total", "White", "Black", "Hispanic", "Asian/Pacific Islander", "American Indian/Alaska Native")),
         source = factor(source, levels = c("CBSS", "VCIS", "VCIS/AFCARS", "AFCARS"))) %>%
  rename(childpop = pop)

# Creates raw file for analysis of missing values
dr <- d %>%
  mutate(stock = replace(stock, is.nan(stock), NA),
         stock_rate = stock / pop * 1000) %>%
  select(source, state, year, race, stock, pop, stock_rate) %>%
  left_join(stcodes) %>%
  filter(state != "Puerto Rico" & state != "Virgin Islands") %>%
  ungroup() %>%
  mutate(race = factor(race, levels = c("Total", "White", "Black", "Hispanic", "Asian/Pacific Islander", "American Indian/Alaska Native")),
         source = factor(source, levels = c("CBSS", "VCIS", "AFCARS"))) %>%
  rename(childpop = pop)

# 4.
# ANALYZES MISSING DATA
stock_miss <- dr %>%
  filter(!is.na(stock)) %>%
  mutate(source = source %>% as.character()) %>%
  group_by(year, source, race) %>%
  summarize(statecount = n()) %>%
  ungroup() %>%
  full_join(dr %>%
              filter(fips == 1) %>% # selects one set of race-year rows
              select(year, race, source)) %>%
  arrange(year, race, source) %>%
  mutate(statecount = ifelse(is.na(statecount), 0, statecount),
         source = ifelse(year %in% c(1969, 1971, 1972, 1976:1981) | (year < 1982 & race != "Total"), "No source", source),
         source = ifelse(year %in% c(1970, 1973:1975) & race == "Total", "NCSS", source),
         source = factor(source, levels = c("CBSS", "NCSS", "VCIS", "AFCARS", "No source")),
         race = factor(race,
                       levels = c("Total", "American Indian/Alaska Native", "Asian/Pacific Islander", "Black", "Hispanic", "White"),
                       labels = c("Total", "American Indian/Alaska Native", "Asian American/Pacific Islander", "Black/African American", "Hispanic/Latino", "White")))

stock_miss %>%
  ggplot(aes(x = year, y = statecount, color = source, fill = source, shape = source)) +
  geom_point(size = 1) +
  facet_wrap(~ race) +
  scale_shape_manual(values = c(21,3,22,4,24)) +
  scale_fill_manual(values = c("blue", "brown", "red", "forestgreen", "black")) +
  scale_color_manual(values = c("blue", "brown", "red", "forestgreen", "black")) +
  scale_x_continuous(breaks = seq(1970, 2010, 20)) +
  labs(x = "Year",
       y = "Number of States with Nonmissing Data",
       color = NULL, shape = NULL, fill = NULL) +
  theme_bw() +
  theme(strip.background = element_blank())

# 5. GENERATES STATE TOTAL ESTIMATES

# What proportion of children 2000 onward are multi-racial (and therefore excluded from group analysis)?
multiracial <- da %>%
  filter(year >= 2000) %>%
  mutate(istotal = race == "Total") %>%
  group_by(istotal) %>%
  summarize(stock = sum(stock))
1-(multiracial[1,2]/multiracial[2,2])

# PREDICTION SETUP
# https://stackoverflow.com/questions/52331501/using-predict-function-for-new-data-along-with-tidyverse

# Prediction function
pred <- function(x, ...) {
  z <- predict(x, se = TRUE, ...)
  as.data.frame(z[1:2])
}

# New data frame for predictions
newdat <- data.frame(year = 1961:2018)

# PREDICTS TOTAL STOCK RATES
state_stock_rate_est <- da %>%
  filter(race == "Total") %>%
  left_join(stcodes) %>%
  group_by(fips) %>%
  nest() %>%
  mutate(models = purrr::map(data, ~ loess(stock_rate ~ year, span = .2, data = .)),
         predictions = purrr::map(models, ~ pred(., newdata = newdat))) %>%
  select(fips, predictions) %>%
  unnest(cols = c(predictions)) %>%
  mutate(year = 1961:2018,
         moe = se.fit * 1.96,
         lower = fit - moe,
         upper = fit + moe) %>%
  left_join(stcodes) %>%
  left_join(da %>%
              filter(race == "Total") %>%
              select(state, fips, year, stock_rate)) %>%
  mutate(point = ifelse(is.na(stock_rate), fit, stock_rate),
         l = ifelse(is.na(stock_rate), lower, stock_rate),
         u = ifelse(is.na(stock_rate), upper, stock_rate),
         est = ifelse(is.na(stock_rate), "Estimated", "Observed"))

# PLOTS WITH LOESS SMOOTHING
state_stock_rate_est %>%
  rename(code = ab) %>%
  ggplot(aes(x = year)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "blue", alpha = .2) +
  geom_line(aes(y = fit), color = "blue", size = .35) +
  geom_point(aes(y = stock_rate), size = .25, color = "red") +
  facet_geo(~ code, grid = "us_state_grid1", label = "code") +
  labs(x = "Year", y = "Children in Foster Care per 1,000", color = NULL, fill = NULL, size = 12) +
  theme_bw() +
  scale_x_continuous(breaks = c(1970, 1990, 2010)) +
  geom_text(aes(x = -Inf, y = Inf, label = code, group = code), size = 4, color = "black", hjust = -0.1, vjust = 1.1) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1.1, hjust = 1),
        strip.background = element_blank(),
        strip.text.x =
          element_blank(),
        panel.border = element_rect(colour = "black"),
        axis.title = element_text(face = "bold"))

# Note that the above, manual process is identical to using geom_smooth(method = "loess")
# state_stock_rate_est %>%
#   rename(code = ab) %>%
#   ggplot(aes(x = year, y = stock_rate)) +
#   geom_smooth(span = .2, linewidth = .1) +
#   geom_point(size = .25, color = "red") +
#   facet_geo(~ code, grid = "us_state_grid1", label = "code") +
#   labs(x = "Year", y = "Children in Substitute Care per 1,000", color = NULL, fill = NULL, size = 12) +
#   theme_bw() +
#   scale_x_continuous(breaks = c(1970, 1990, 2010)) +
#   geom_text(aes(x = -Inf, y = Inf, label = code, group = code), size = 4, color = "black", hjust = -0.1, vjust = 1.2) +
#   theme(axis.text.x = element_text(angle = 45, vjust = 1.1, hjust = 1),
#         strip.background = element_blank(),
#         strip.text.x = element_blank(),
#         panel.border = element_rect(colour = "black"))

# SUBSET OF STATES
state_stock_rate_est %>%
  filter(state %in% c("California", "Minnesota", "Georgia")) %>%
  mutate(state = factor(state, levels = c("California", "Minnesota", "Georgia"))) %>%
  ggplot(aes(x = year, y = stock_rate)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "blue", alpha = .2) +
  geom_line(aes(y = fit), color = "blue") +
  geom_point(aes(y = stock_rate), color = "red") +
  facet_wrap(~ state) +
  labs(x = "Year", y = "Children in Foster Care per 1,000", color = NULL, fill = NULL, size = 12) +
  theme_bw() +
  scale_x_continuous(breaks = c(1970, 1990, 2010)) +
  theme(strip.background = element_blank(),
        panel.border = element_rect(colour = "black"))

# 6.
# CALCULATE STATE RACIAL RATE RATIOS

# READS MORE PRE-PROCESSED DATA
dr <- fread("data/dr.csv")
dtr <- fread("data/dtr.csv")

# PLOT A SUBSET OF STATES
dr %>%
  filter(state %in% c("California", "Minnesota", "Georgia")) %>%
  mutate(state = factor(state, levels = c("California", "Minnesota", "Georgia"))) %>%
  mutate(smalln = case_when(stock %in% 1:9 ~ "1-9",
                            stock %in% 10:49 ~ "10-49",
                            TRUE ~ "50+") %>%
           factor(levels = c("1-9", "10-49", "50+"))) %>%
  mutate(race = factor(race, levels = c("White", "Black", "Hispanic", "American Indian/Alaska Native", "Asian/Pacific Islander"))) %>%
  ggplot(aes(x = year, y = stock_rate_ratio, color = race, shape = race, size = smalln)) +
  geom_hline(yintercept = 1, size = .5, color = "darkgray") +
  geom_point() +
  facet_wrap(~ state) +
  labs(x = "Year",
       y = "Ratio of group substitute care rate\nto rate for all children",
       size = "Number of children in substitute care",
       color = "Ethnoracial group",
       shape = "Ethnoracial group") +
  theme_bw() +
  scale_color_manual(values = c("red", "blue", "green4", "brown", "purple")) +
  scale_shape_manual(values = c(4,3,19,20,22)) +
  scale_size_manual(values = c(1,2,3)) +
  scale_y_continuous(trans = "log2",
                     breaks = c(.0625, .25, 1, 4, 16),
                     labels = c("1/16", "1/4", "1", "4", "16")) +
  scale_x_continuous(limits = c(1980, 2020),
                     breaks = c(1980, 1990, 2000, 2010, 2020),
                     labels = c("'80", "'90", "'00", "'10", "'20")) +
  theme(strip.background = element_blank(),
        legend.position = "bottom",
        legend.box = "vertical",
        legend.spacing.y = unit(-6, "pt"))

# 7.
# CALCULATE NATIONAL RACIAL RATE RATIOS

# Tests first year when each ethnoracial group has an observation in each state
dnatrace_test <- dtr %>%
  group_by(race, year) %>%
  summarize(stock = sum(stock, na.rm = FALSE)) %>%
  filter(!is.na(stock)) %>%
  group_by(race) %>%
  summarize(minyear = min(year))

# CREATES DATAFRAME OF RATES
dnatrace_rate <- dtr %>%
  filter(year >= 2000) %>%
  group_by(year, race) %>%
  summarize(stock = sum(stock), childpop = sum(childpop)) %>%
  mutate(stock_rate = stock / childpop * 1000) %>%
  ungroup() %>%
  rename(stat = stock_rate) %>%
  mutate(var = "Children in Substitute Care per 1,000") %>%
  mutate(race = factor(race,
                       levels = c("American Indian/Alaska Native", "Asian/Pacific Islander", "Black", "Hispanic", "White", "Total"),
                       labels = c("American Indian/Alaska Native", "Asian American/Pacific Islander", "Black/African American", "Hispanic/Latino", "White", "Total"))) %>%
  select(var, year, race, stat)

# CREATES DATAFRAME OF RATE RATIOS
# Note: the racecode labels below are assigned by position, so this step
# assumes 19 years of data with the six groups sorted alphabetically within each year
dnatrace_ratio <- dtr %>%
  filter(year >= 2000) %>%
  group_by(year, race) %>%
  summarize(stock = sum(stock), childpop = sum(childpop)) %>%
  mutate(stock_rate = stock / childpop * 1000) %>%
  ungroup() %>%
  mutate(racecode = paste0("stockrate_", rep(c("aian", "aapi","b","h","t","w"), 19))) %>%
  pivot_wider(id_cols = c(year), names_from = racecode, values_from = stock_rate) %>%
  mutate(stockrateratio_aian = stockrate_aian / stockrate_t,
         stockrateratio_aapi = stockrate_aapi / stockrate_t,
         stockrateratio_b = stockrate_b / stockrate_t,
         stockrateratio_h = stockrate_h / stockrate_t,
         stockrateratio_w = stockrate_w / stockrate_t) %>%
  select(year, contains("ratio")) %>%
  pivot_longer(-year, names_to = "variable", values_to = "stat") %>%
  mutate(var = "Ratio of Group Substitute Care Rate\nto Rate for All Children") %>%
  mutate(race = case_when(variable == "stockrateratio_aian" ~ "American Indian/Alaska Native",
                          variable == "stockrateratio_aapi" ~ "Asian/Pacific Islander",
                          variable == "stockrateratio_b" ~ "Black",
                          variable == "stockrateratio_h" ~ "Hispanic",
                          variable == "stockrateratio_w" ~ "White")) %>%
  mutate(race = factor(race,
                       levels = c("American Indian/Alaska Native", "Asian/Pacific Islander", "Black", "Hispanic", "White", "Total"),
                       labels = c("American Indian/Alaska Native", "Asian American/Pacific Islander", "Black/African American", "Hispanic/Latino", "White", "Total"))) %>%
  select(var, year, race, stat)

# BIND RATES AND RATE RATIOS
dnatrace <- dnatrace_rate %>%
  bind_rows(dnatrace_ratio)

# PLOTS
# Rates
p_dnatrace_rate <- dnatrace_rate %>%
  ggplot(aes(x = year, y = stat, group = race)) +
  geom_line(color = "gray") +
  geom_point(aes(color = race, fill = race, shape = race)) +
  scale_shape_manual(values = c(21,3,22,4,24,8)) +
  scale_fill_manual(values = c("blue", "brown", "red", "forestgreen", "black", "purple")) +
  scale_color_manual(values = c("blue", "brown", "red", "forestgreen", "black", "purple")) +
  labs(x = NULL, y = "Children in Foster Care\nper 1,000", color = NULL, fill = NULL, shape = NULL) +
  guides(color = guide_legend(ncol = 1)) +
  theme_bw() +
  theme(strip.background = element_blank(),
        axis.title = element_text(face = "bold"))
p_dnatrace_rate

# Ratios
p_dnatrace_ratio <- dnatrace_ratio %>%
  ggplot(aes(x = year, y = stat, group = race)) +
  geom_line(color = "gray") +
  geom_point(aes(color = race, fill = race, shape = race)) +
  geom_hline(yintercept = 1, size = .5, color = "darkgray") +
  scale_shape_manual(values = c(21,3,22,4,24,8)) +
  scale_fill_manual(values = c("blue", "brown", "red", "forestgreen", "black", "purple")) +
  scale_color_manual(values = c("blue", "brown", "red", "forestgreen", "black", "purple")) +
  scale_y_continuous(trans = "log2",
                     breaks = c(.125, .25, .5, 1, 2),
                     labels = c("1/8", "1/4", "1/2", "1", "2")) +
  labs(x = "Year", y = "Ratio of Group Foster Care Rate\nto Total Foster Care Rate", color = NULL, fill = NULL, shape = NULL) +
  theme_bw() +
  theme(strip.background = element_blank(),
        axis.title = element_text(face = "bold"))
p_dnatrace_ratio

# Combines into grid of plots
grid <-
  plot_grid(p_dnatrace_rate + theme(legend.position = "none"),
            p_dnatrace_ratio + theme(legend.position = "none"),
            ncol = 1, axis = "tblr", align = "hv")
grid

# Extracts a legend from the rates plot
legend <- get_legend(p_dnatrace_rate)

# Assigns the legend to the grid
plot <- plot_grid(grid, legend, ncol = 2, rel_widths = c(1, .75))
plot