[Erin McCauley] All right folks, it is noon, so I figured that we'd hop right into the NDACAN Summer Training Series. It is hosted by the National Data Archive on Child Abuse and Neglect, which is also where most of our presenters work. We are housed in the Bronfenbrenner Center for Translational Research at Cornell University, although [as of] this month, we are also associated with Duke University. One of our co-Directors, Dr. Christopher Wildeman, is moving to Duke University, and some of our staff are as well. But we maintain our affiliation with Cornell University, and about half of our staff still work there. So, now we are both Cornell and Duke. This series is focused on new horizons in child welfare data. If you've been with us this summer, you know that we have been focusing on our historical data acquisitions, our new data, which is pretty exciting. Those data can be linked with our administrative data, so we thought we'd spend the second half of the summer looking at the administrative data. We'll be going through our three primary datasets today, linking next week (I'm looking at the schedule now), with a research example after that. The Children's Bureau contracts with us to archive the data, and we are NDACAN, the National Data Archive on Child Abuse and Neglect, or sometimes we just say the Archive. Here is our summer overview: we started July first, we're wrapping up the first week of August, and we've really been chugging along. We have Tammy White with us, who is from the Administration for Children and Families. She is really an expert on these data, so we want to give her a moment to introduce herself and talk a little bit about our [administrative] data. [Tammy White] Hi everyone, welcome! I want to echo Erin's welcome to the fourth Archive Summer Series. This is a great one. I am Tammy White. I'm a data analyst at the Children's Bureau on our data analytics research team. I am excited about this one because it is about our NCANDS, AFCARS, and NYTD datasets, acronyms you will learn later on if you don't already know them. The great thing about our datasets from the Children's Bureau is that they can all be linked using common identifiers. You'll hear from the Archive folks how that is done: an overview of each dataset and, at the end, how you can link them. So, I don't want to take up any time, but I will be around for the session, listening in. I will be here at the end for questions and answers if you have any. But I'm excited to introduce the Archive. [Erin McCauley] Thank you for that introduction. If you've been with us in previous summers, you know Tammy has been one of our lead presenters, so hopefully we will have her back in that role soon. We're just really grateful that she's here to lend her expertise, especially during the question and answer. Thank you, Tammy. So, now I'm going to pass it over to Clayton Covington, who is going to be our primary presenter for this session. I'm going to come back at the end to preview what is coming next, but for now, I'm going to pass it over to Clayton. [Clayton Covington] Hello everyone! My name is Clayton Covington, as Erin said. Thank you, Erin and Tammy, for those introductions. I currently serve as a Research Associate in the Department of Sociology at Duke University, and I'm also a Research Aide for the National Data Archive on Child Abuse and Neglect.
I've been working at NDACAN for a little over a year, and the primary thing I will be doing for you all today, both for those of you who are already familiar with our data and for those looking to see how the administrative cluster can be used in your own research, is to give a somewhat detailed overview of the cluster. But I want to emphasize that this is by no means exhaustive, and we have a lot of resources you can view later. So, the administrative data cluster consists of both mandated and voluntary data that are not collected exclusively for research purposes; they are often used by policy makers and government affiliates. This cluster covers the progression of individuals through the child welfare system. Specifically, the National Child Abuse and Neglect Data System, or NCANDS, provides insight into child protective histories. The Adoption and Foster Care Analysis and Reporting System (AFCARS) highlights the experiences of children while in foster care. And, finally, the National Youth in Transition Database, also known as NYTD, addresses youths' transitions out of care. So, to start with NCANDS. Created in 1988 in response to an amendment to the Child Abuse Prevention and Treatment Act, also known as CAPTA, NCANDS is a voluntary system in which case-level data are collected for all children who received a response from a child protective services (CPS) agency in the form of an investigation response or an alternative response. Data are collected annually during the federal fiscal year for the purpose of tracking the volume and nature of child maltreatment in the United States. The first major grouping of the data is the NCANDS Child File. This file contains child-specific records for each report of alleged child abuse and neglect that received a CPS response. Each individual record is organized in what is known as a "Report-Child Pair," which is a combination of the Report ID and Child ID. A child identifier may appear in more than one record because the child could be included in more than one report. Similarly, a report identifier may repeat, because there will be a separate record for each child on the report. However, no two records will have the same Report-Child pair ID within the same year. The Child File contains many reports, extracting data from administrative records in all fifty states, the District of Columbia, and Puerto Rico. Most CPS agencies use a two-step process to respond to allegations of child maltreatment: the first being screening and the second being the investigation or alternative response. A CPS agency receives an initial notification, called a referral, alleging child maltreatment. A referral may involve more than one child. Referrals that meet CPS agency criteria are screened in (and called reports), and they receive an investigation or alternative response from the agency. To be more specific, a single Child File record contains data related to only one child in a given report; data representing a victim or non-victim, including up to four allegations of maltreatment and the decision regarding each allegation; data in all fields for victim records, including any new fields that were added subsequent to the file's creation in 2001; and finally, data concerning up to three perpetrators for victim records. So, for example, a fully saturated individual Child File record could contain data on one child and three perpetrators with four child maltreatment allegations per perpetrator, totaling twelve allegations of maltreatment.
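To make the Report-Child Pair structure concrete, here is a minimal sketch in Python with pandas. It is purely illustrative and not code from the presenters or the Archive; the column names loosely follow NCANDS codebook conventions (SubYr, RptID, ChID), but treat them as assumptions and check the codebook for the exact names.

```python
import pandas as pd

# Toy rows shaped like the NCANDS Child File (made-up values).
# Each row is one report-child pair.
child_file = pd.DataFrame({
    "SubYr": [2018, 2018, 2018, 2018],   # submission year
    "RptID": ["R1", "R1", "R2", "R3"],   # a report can include several children
    "ChID":  ["C1", "C2", "C1", "C1"],   # a child can appear on several reports
})

# Child IDs and report IDs each repeat...
print(child_file["ChID"].value_counts())
print(child_file["RptID"].value_counts())

# ...but no two records share the same report-child pair within a submission year.
assert not child_file.duplicated(subset=["SubYr", "RptID", "ChID"]).any()
```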
[That fully saturated example is] just to give you all an idea of what the full extent of the file could look like. Report Data contain the two identifying fields (submission year and state ID) and general information about the report. If a report involves multiple children, the report data fields are identical on each record containing the same report ID. For example, if there were three children in the report, the data in the entire report data section would be identical for all three child records, except for the three different child IDs. Child Data contain general information about the specific child in the record. All fields in this section are attributes related to the child ID. Maltreatment Data, on the other hand, include information about maltreatment types and maltreatment disposition levels. Up to four allegations of maltreatment are coded with the decision regarding each allegation. The maltreatment death field is also included in this section, as it contributes to determining the child's victim status. For federal fiscal year 2018, in response to the Justice for Victims of Trafficking Act, a new maltreatment type of sex trafficking was also added to the Child File. Child Risk Factors contain data about the child's characteristics or environment that may place the child at risk of maltreatment. This includes diagnosed disabilities, exposure to domestic violence, and other behaviors or problems such as drug abuse. Now, on to the NCANDS child variables. Caregiver Risk Factors contain data about characteristics of the child's caregiver or environment that may place the child at risk of maltreatment. This includes domestic violence, substance abuse, financial problems, and more. Services Provided contains information about services that are provided for the child or family. Post-response services are reported to NCANDS if they were delivered between the report date and up to 90 days after the disposition date. For services that began prior to the report date, if they continued past the report disposition date, this implies that the investigation or alternative response reaffirmed the need for and continuation of the services, and they should be reported to NCANDS as post-response services. Services that do not meet the definition of post-response services are those that (1) began prior to the report date but did not continue past the disposition date, or (2) began more than 90 days after the disposition date. Perpetrator Data contain information about perpetrators of maltreatment. Up to three perpetrators per child may be reported, as stated earlier. If the child was not found to be a victim of maltreatment, the perpetrator data section is left blank. The four perpetrator maltreatment fields for each perpetrator should be linked by the state to the four sets of maltreatment type and maltreatment disposition level fields reported for the child victim. Relatedly, there are new data collection efforts around sex trafficking. Additional Fields contain fields that were added to the Child File subsequent to its creation in 2001. Currently, these fields include AFCARS ID, incident date and time, report time, investigation start time, date of death, and foster care discharge date. For FFY 2018, the two newest fields added were plan of safe care and referral to CARA-related services, in response to the CARA legislation amending CAPTA.
For victims under age 1 reported in the Child File with a medical personnel report source and a child risk factor of drug or alcohol abuse, states report how many have a plan of safe care and how many were referred to CARA-related services. We will now look at a few entity relationships in the Child File. We will see that a report can pertain to one or more children, and there is no limit to the number of children in a given report. So, for a child in a report, each perpetrator can be associated with each substantiated or indicated maltreatment. A perpetrator included in the record must be associated with at least one maltreatment, and a perpetrator can be associated with more than one report and more than one child in a report, as demonstrated here with this visual. The Agency File is the second major grouping of the NCANDS data, and it includes aggregate data, typically not obtained from a single state data system or automatically extracted. The sources of Agency data vary by state. The Agency File is composed of 27 summary data elements, such as the number of children and families supported by the state's Prevention Services funding sources; information on child protective services personnel, average response time, and referrals of substance-exposed infants by health care providers that were screened out; information on child victims and fatalities reported in the Child File, including family reunification and preservation services provided within the last five years; and fatalities that could not be reported in the Child File. Some child maltreatment deaths may not come to the attention of CPS agencies. To improve the counts of child fatalities, states consult data sources outside of CPS for deaths attributed to child maltreatment. So, that is an example where child fatalities will show up in the Agency File but not necessarily the Child File. This slide is here to show a visual of working in the data and how entities within the Child File can be isolated for individual analyses. You can see the various connections between the maltreatments, the child data, and the perpetrator data, and how they can all be disaggregated to get very precise points of analysis. To give you an overview of NDACAN and our interaction with these administrative data: the National Data Archive on Child Abuse and Neglect (NDACAN) was created in 1988 to promote scholarly exchange among researchers in the field of child maltreatment. Additionally, NDACAN acquires data from leading researchers and national data collection efforts and makes these datasets available to the research community for secondary analyses. Lastly, our team likes to think of ourselves less as a passive repository of information and more as an active resource for child welfare researchers and practitioners seeking to advance the field of child welfare. We provide such support through the Child Maltreatment Research Listserv, the Updata e-newsletter, and data analysis opportunities for researchers, including the annual Summer Research Institute, which just concluded last month. And we're really excited about the work we got to do with our participants. So, another thing to know about the data that we work with, whether this dataset or others, is that there are several confidentiality protections.
To protect the confidentiality and safety of individuals within this highly vulnerable population, there are a few routine alterations made to the data prior to dissemination. Date of birth, county of residence, worker and supervisor IDs, and incident dates are not included. For cases of maltreatment death, state, county, and all IDs are masked. To clarify, masking involves recoding to ensure that specific individuals are unidentifiable in order to minimize disclosure risk, and this [procedure] is particularly done for members of vulnerable populations. For example, if there is only one Hawaiian or Pacific Islander child in Renkford County, South Carolina, all the race and ethnicity variables for that child will be recoded to Unknown. The report date is recoded to the 8th or 23rd of the month, and all other dates are recoded not necessarily to match the exact dates [of the incident(s)] but to preserve the time span since the report date. In this NCANDS research example from Frank Edwards (2019), the study found that police file more reports of child abuse and neglect in counties with the highest rates of arrests and violence. Moreover, Edwards found that policing helps explain high rates of maltreatment investigations of American Indian and Alaska Native children and families. Here is the citation for this research example. If you are interested in learning more from Frank Edwards, he is also a Research Associate here at the National Data Archive on Child Abuse and Neglect. [He] has done a lot of work with the Summer Research Institute, and he will present on August 5th in the Summer Training Webinar Series, where he will speak to another research example using linked administrative data. So, [we're] really excited to see his presentation, and [we're] looking forward to hearing more about this work. The next dataset in the administrative cluster that we'll address is AFCARS. The Adoption and Foster Care Analysis and Reporting System is a federally mandated data collection system intended to provide case-level information on all children who are under the placement, care, and/or responsibility of a title IV-E child welfare agency and children whose adoptions were finalized during the federal fiscal year. States document information pertaining to these groups of children in their electronic case record systems, compile the data, and send it to the Children's Bureau. The Children's Bureau then works with states to correct errors, which is why it is possible that you may receive a notification from NDACAN about an updated dataset if you have previously used an AFCARS dataset that was corrected. These data are organized by child. AFCARS includes several variables such as demographic, removal, placement, and other case-related information, totaling 37 adoption data elements and 66 foster care data elements. A few variables I will highlight include caretakers of the child, self-identified race information for both the child and foster parents, the number of placements a child has had, and dates of termination of parental rights, just to name a few. Dating back to the 1990s, AFCARS files can be "stacked" and used for panel data analyses. When working with more than one year of the Foster Care File, there will be duplicated AFCARS IDs (the variable named "StFCID"). Across time, a child has a record for each year that they are in the foster care system, and if, in your research, you want to resolve the data to one row per child, one piece of guidance is to keep the most recent year's record for each AFCARS ID.
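As a quick illustration of that guidance, here is a minimal sketch in Python with pandas, assuming several years of the Foster Care File have been stacked into one table. It is not the Archive's code; StFCID is the AFCARS ID variable named above, while the year and placement columns here are hypothetical stand-ins.

```python
import pandas as pd

# Toy stacked Foster Care File: one row per child (StFCID) per year.
foster_care = pd.DataFrame({
    "StFCID":   ["AZ001", "AZ001", "AZ002", "NY003", "NY003"],
    "FY":       [2016, 2017, 2017, 2015, 2016],   # hypothetical fiscal-year column
    "CurPlSet": [4, 2, 1, 3, 3],                  # hypothetical placement setting code
})

# Resolve the panel to one row per child by keeping each child's most recent year.
one_row_per_child = (
    foster_care
    .sort_values(["StFCID", "FY"])
    .drop_duplicates(subset="StFCID", keep="last")
)
print(one_row_per_child)
```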
In this research example from Wildeman et al. (2019), the authors used synthetic cohort life tables and data from AFCARS to generate cumulative prevalence estimates of the termination of parental rights. Their results showed that 1 in 100 U.S. children will experience the termination of parental rights by age 18, according to estimates from 2016. This risk was highest in the earliest years of life and among Native American and African American children compared to white and Latinx children. Finally, the authors found dramatic variation in loss of parental rights across states, and they ultimately assert that their research suggests the termination of parental rights is far more common than often thought. And here is the citation again just for reference. Dr. Christopher Wildeman is currently the Director and Co-PI of the National Data Archive on Child Abuse and Neglect, and he is currently based at Duke University. We also see the work of Frank Edwards in this citation, as well as the work of Sarah Wakefield, another really esteemed criminologist working with this administrative dataset. The final dataset in the administrative cluster is NYTD. The John H. Chafee Foster Care Independence Program, also known as CFCIP, was initiated in an effort to improve outcomes for youth in foster care who are likely to reach their 18th birthday without having found a permanent home. The program provides funding to states to develop and administer programs designed to help ease the transition from foster care to independence. The law that created this program, as you can see on this slide, also requires states to develop 1) a system for tracking the services provided through CFCIP, and 2) a method for collecting outcome measures so that the effectiveness of the program can be assessed. These two components together form the National Youth in Transition Database (NYTD). The files contain case-level data on services and outcomes for youth likely to age out of foster care from all 50 states, as well as from the District of Columbia and Puerto Rico. Beginning at age 17, youth are surveyed on a voluntary basis every other year until age 21. Hence, the major difference in data collection between NYTD and the other administrative datasets is that it is survey-based, with the earliest cohort beginning in fiscal year 2011. It is also the newest of the administrative datasets. In addition to demographic information such as date of birth, race, sex, delinquency, and education level, NYTD datasets include several variables under the groupings of services and outcomes. Examples of services variables include academic supports, career preparation, budgeting, and financial assistance. Outcomes variables include financial self-sufficiency, educational attainment, high-risk behaviors, and access to health insurance, as well as incarceration. As with all of the datasets covered today, I want to emphasize that these lists of variables are not exhaustive. If you would like to further explore them, I would highly encourage visiting the NDACAN website. I believe Andres just shared a link to some of those resources from our website, where you can access the User's Guides and codebooks for more detailed and nuanced information. In this slide, you can see the beginning of data collection efforts for the first three cohorts in fiscal years 2011, 2014, and 2017, respectively. After the initial survey is administered, the next waves are administered at two- and four-year intervals.
Not shown in this particular slide is the end of collection for Cohort 3, which is scheduled for fiscal year 2021. Additionally, Cohort 4 begins their initial survey this year in fiscal year 2020, and Cohort 5 is scheduled for fiscal year 2023. So, when it comes to outcomes in the NYTD dataset, the baseline population includes all youth in foster care who reach their 17th birthday in fiscal year 2011 or in every third fiscal year after 2011. The cohort of youth eligible for follow-up at ages 19 and 21 is a subset of these baseline youth. I will go into detail about what qualifies someone as eligible versus ineligible. To be eligible at age 19, youth must have participated in the survey within 45 days of turning age 17, been in foster care at the time of taking the survey, and answered at least one survey question that was not 'declined' or 'not applicable.' Again, this is [referring to] taking the survey at age 17. To be eligible for a follow-up at age 21, participants must be in the follow-up population at age 19 and not reported to be deceased at age 19. An additional caveat to note is that for states that opt to sample, only youth randomly chosen to be in the sample are included in the follow-up population. Between fiscal years 2011 and 2018, roughly 100,000 young adults received independent living services each year. From that population, between 60 and 70% of those eligible to be surveyed participated in the first three cohorts between 2011 and 2017. The sampling frame...did someone say something? Okay, the sampling frame for the NYTD datasets is all youth who participated in the survey at age 17, and the sample consists of a simple random sample sized at a 90% confidence level plus 30% for attrition. If you have more questions about the sampling of the NYTD dataset, please visit the appendix referenced on the slide. Eligible sample states include any state where the calculated sample size plus 30% would not be larger than the number of baseline youth in the state. The first cohort included 12 sampling states, and the second and third cohorts included 15 states, some overlapping with the first cohort and some not carrying over from the first cohort to the second and third cohorts. When working with multiple years of NYTD services data, each year has two reporting periods, the A and B halves of the federal fiscal year, so a youth may appear up to two times in each year. For outcomes data, a helpful thing to know is that NDACAN provides these data in a "long" format, meaning the data have many rows, often due to variables that capture sequences, like time-series data. However, some analyses will require researchers to reshape the data into what is known as a "wide" format to make the analysis a lot easier. So, when it comes to weighting for non-response while using the NYTD dataset, there are subgroups that are over-represented among respondents, and they therefore need less weight. Those that are under-represented need more weight. Subgroups can have any characteristic that is known for both respondents and non-respondents, and usually with surveys, only a few characteristics of non-respondents are known. One thing to note is that most people [have found] that results are quite similar between weighted and unweighted estimations. NYTD data have weights that correct for any potential non-response bias. Variables included in the weighting are sex, race, and Hispanic ethnicity.
Thirty-two variables from AFCARS were used in the weighting at Wave 1 of NYTD, and 42 variables were used at Wave 2 and Wave 3. Again, there are pretty similar results between the [weighted and unweighted] estimates. In this research example from Watt & Kim (2019), the authors use NYTD to examine educational attainment, employment, homelessness, and incarceration for white, African American, Hispanic, and American Indian/Alaska Native emancipated youth. Their results showed that African American youth were less likely to be employed and more likely to report incarceration compared to white youth. However, African American youth were also more likely to enroll in higher education institutions than white youth. They found no significant differences between Hispanic and white youth. Perhaps the most shocking of their findings was that American Indian/Alaska Native youth were significantly disadvantaged on all outcomes relative to all other racial/ethnic groups. So, again, this is just another research example and citation for reference if you would like to read further into this study. But that has been an overview of the administrative data cluster here at the National Data Archive on Child Abuse and Neglect. Thank you for listening, and I'll pass it off to Erin. [Erin McCauley] Thank you, Clayton. That was a wonderful overview of our three administrative datasets. I know for newer users of our administrative data that was a lot of information, and we'll move soon to our dedicated Q&A time, where Clayton, along with our panel of experts, will be available for questions. I also want to point out that there were four questions that came up in the Q&A box. Thank you for putting the questions in there. I went ahead and answered one of them, which was about the video series. I will leave the other three [unanswered] for now because I want to bring them to the whole panel. Before we wrap up today's presentation, I wanted to take a moment to start the discussion on linking, which is going to be our presentation for next week, and preview some of the basic data management practices needed to link. With these three datasets fresh in our minds and the potential power of linking them, I thought it would be a good moment to preview what's coming next week, especially if people want to play around a little bit and come with questions next week. Michael Dineen, who is on this call with us today, is going to be one of our Q&A panelists. He will be able to answer questions today, and he will be leading the entire session next week. He is the data analyst for all three of these datasets, so he brings a lot of expertise. First, there are a few big benefits of linking I want to emphasize before getting into the nitty gritty and potential headache of linking today and later next week. Administrative data, as Clayton said, are data which were collected for non-research purposes, typically as part of administering the programs from which the data come. While they are not intended for research when they are collected, at the Archive we prepare them for use by researchers. We then publish these data products, along with guides and supports to help child welfare researchers really take advantage of these data that are already being collected for other purposes.
These data provide us with the opportunity to explore population-level questions about child welfare, evaluate the effect of various policies, socio-cultural changes, and programs over time, and, when linked, even track a particular youth through the entire child welfare system, which allows us to ask and answer interesting questions. Linking these administrative data builds on the existing strengths of administrative data. There is really a great breadth of data. These population- and system-wide data allow researchers to ask unique questions, such as the research presented last week by Dr. Alex Roehrkasse, where he explored the macro-level factors which can help to explain children's risk of substitute care and how the importance of various factors changed over time. I know personally I was very impressed with Alex's presentation last week and the potential of the new historical data acquisitions. For those of you who could not be with us, we will have a video series [on our website] at the end of the summer; I highly recommend checking it out. These data provide details and context to individuals' data in each system. For example, if you've been with us from the start three years ago at the first Summer Training Series, I presented a research project examining disability and social exclusion in the transition to adulthood for youth who age out of foster care. I have continued to build on that project since then, and I am now looking at how knowledge about the experiences of youth with disabilities during foster care and with child protective services affects our estimation of the association between disability status and type and indicators of social exclusion. It turns out that the background information about those prior experiences plays a significant role in estimating the relationships between disability status and type and social exclusion in the transition, suggesting the importance of being able to see a youth in the entirety of their experiences. Thank you to the person who said I was talking too fast. I just got excited, and I promise I'll slow down. Last, these data provide all-encompassing data for specific populations. In the final session of this series, Dr. Frank Edwards is going to show the utility of this feature, which allows us to explore inequalities in child welfare system contact using linked data. So, that is something to look forward to. While these data have unique benefits, which are emphasized when they are linked, the process of linking data can be clunky and take a bit of time. So, I also wanted to talk about the barriers to linking as long as we're talking about the highlights. The difficulty of doing the linking is well worth it. We have resources and supports to assist you in linking the data, such as next week's presentation by Michael [and] previous summer series presentations as well. Michael has been with us every summer, leading sessions on how to link these data in different software [packages] for analysis. We also have PDFs and video guides on the website, which Michael had a big part in creating, that can help guide you through this. Moreover, we can provide one-on-one linking support through the Summer Research Institute, which is something that Clayton mentioned earlier. So, if you're interested in applying to participate next summer, check it out! The Summer Research Institute, or SRI, is an annual online distance learning experience in the secondary data analysis of child abuse and neglect data.
Participants are selected on a competitive basis, and we provide organized sessions and one-on-one support with statisticians, data use experts, and data analysts to help people move through their analyses. I also see that we have some folks from this summer's SRI on the call with us today. So, that's really exciting for us. And a handful of our staff, including our director Dr. Chris Wildeman, have been past SRI participants. So, if you're interested in using the administrative data and you think you might need support, I really suggest checking that out. We announce the call for applications on our website, the listserv, and on Twitter. Next week we're going to be discussing how to address differing data structures between the datasets when Michael does his presentation on linking. As Clayton mentioned, some datasets are organized by child [and] others by report. We need some basic data management practices, such as collapsing and restructuring data, to help us get all the data into the same structure. Because these data were collected for non-research purposes, there can be entry or recording errors. Because of this, we try to add in a few checks to make sure that the data are being correctly linked. We often do a more conservative match, meaning that we are more certain that these youth are the same youth, but we may be [excluding] more youth. So, Michael will be talking about decisions like that next week. And Frank will be talking about how he used this himself the week after. I'm also going to talk about [the] basic data management skills needed for linking, which, again, we will also have the opportunity to discuss more later. There are two really important commands you need to know in order to link: collapse and reshape. So, collapse allows you to combine records. For example, if you're interested in the number of child protective services reports by state per month for a year, you'll need to collapse those records so that you have one entry with the number of reports for each state [in] each month. This can be helpful in examining population-level trends, and for the purposes of linking, you might need to collapse data by child so that you can see their experiences. A second required data management skill is reshape, which allows you to re-organize the data structure. Some data are in long format, where the rows are not uniquely identified by the ID, and others are in wide format. For example, in the NYTD data, as Clayton mentioned, you could view it as long format, where there is a row with a youth's ID and that youth's responses for wave 1, and then another row for the next wave with the same youth ID and how they responded. So, instead of seeing the youth three times for the three waves, you can reshape that data so [that] each line is one youth. Then you have columns that are wave 1 incarceration (yes/no), wave 2 incarceration (yes/no), and so on. If you're interested in linking and want to bring specific questions, I'd suggest looking into these commands before next week; a brief sketch of what they look like in code follows below. And last, once you've mastered these data management strategies, you can rearrange and reshape the data as you wish. There are some other strategies to help you in analyzing the data. One thing about working with administrative data is that the data files are extremely large. This is even more true when you start linking multiple databases together. Additional data management strategies can help support the analysis of such large datasets. This includes grouping or summarizing (such as by group or across time).
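To make those two operations concrete, here is a minimal sketch in Python with pandas. This is purely illustrative: the series itself demonstrates Stata and SPSS, and the column names below are made up for the example rather than taken from the actual files. The collapse is a group-and-aggregate; the reshape pivots stacked waves so each youth becomes one row.

```python
import pandas as pd

# "Collapse": count CPS reports for each state in each month (made-up data).
reports = pd.DataFrame({
    "state":     ["NY", "NY", "NY", "PA"],
    "month":     [1, 1, 2, 1],
    "report_id": ["R1", "R2", "R3", "R4"],
})
by_state_month = reports.groupby(["state", "month"], as_index=False).agg(
    n_reports=("report_id", "nunique")
)

# "Reshape": NYTD-style outcomes from long (one row per youth per wave)
# to wide (one row per youth, one incarceration column per wave).
outcomes_long = pd.DataFrame({
    "youth_id":      ["Y1", "Y1", "Y1", "Y2", "Y2"],
    "wave":          [1, 2, 3, 1, 2],
    "incarceration": [0, 0, 1, 1, 0],   # 1 = reported incarceration at that wave
})
outcomes_wide = (
    outcomes_long
    .pivot(index="youth_id", columns="wave", values="incarceration")
    .add_prefix("incarceration_w")
    .reset_index()
)

print(by_state_month)
print(outcomes_wide)
```

The same logic maps onto the collapse and reshape commands in Stata, or the CASESTOVARS and VARSTOCASES restructure commands in SPSS.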
You can also add population-level data (such as state characteristics for things like income or social welfare policies). This is similar to what Alex did in his presentation last week and what Dr. Frank Edwards did in his presentation about policing and child welfare last year. Last, as this data file becomes larger and larger, it can take an enormous amount of data processing power and time to run even basic analysis code. To help conserve data processing power and time, one thing I typically do is draw random samples that allow me to practice my code and make sure everything is correct. I then refine my analysis strategy using that smaller sample, and you can then apply it to the rest of the data once you're happy with your model specification. We're going to continue talking about these things next week, but I wanted to give time to start answering those questions. Clayton and I led the session this week, and, because we covered so much information this week, our Q&A panel will also have Tammy White and Michael Dineen. Again, Michael will be leading next week's presentation. So, I think at this moment, I'll just hop over to the Q&A box and start going down the questions and have our panelists respond. So, our first question is, "I understand that worker and supervisor IDs are not included; is there other information provided to track differences between worker and supervisor cases and outcomes?" Michael or Tammy, do you want to...Michael, yes, go ahead. [Michael Dineen] This is Michael. The answer is no. There is no other way to track a worker or supervisor. That's in the NCANDS data, which is what [the participant] is asking about. That's the only place where that data would be, but it is not included in the public file. And there is no other way to do that. [Erin McCauley] There is no way to do that with our administrative data, but if [the participant] goes onto our website, I know that our survey-based data have information about worker IDs. [Michael Dineen] The NCANDS Agency File has a count of workers. So, you could get caseload that way, but that's the only piece of information you could glean: the average caseload for the whole state. That's my best answer for that question. [Erin McCauley] That's great! Thank you, Michael. Next, "Is there information in NCANDS on whether the primary caregiver is a perpetrator?" [Michael Dineen] Yes, there is. There are two variables. The perpetrator's relationship to the child includes parent/primary caregiver. There is a second follow-up variable that breaks down the caregiver as a parent, adoptive parent, live-in partner of the parent, [or] in-laws. These are types of parents. So, the main breakdown is the relationship of the perpetrator to [the] child, and the second variable breaks down the type of caretaker [the perpetrator] is. So, those are two variables in [NCANDS]. [Erin McCauley] Great, thank you, Michael. As always, go to our website and look at the codebook. It will have everything Michael just told you up there. So, another question: "Can you say more about when race/ethnicity is coded as unknown? For example, in my state the AFCARS data has a lot of unknown [cases]." [Michael Dineen] If they don't answer a value for the race variable, then the value would be unknown. That's the answer. [Erin McCauley] So, there are a lot of different reasons [for a variable to be unknown], but if it's unknown, we don't know. [Tammy White] This is Tammy.
That is true. There is an actual way to code unknown in the AFCARS dataset that states can choose, and there is guidance in our technical bulletin on the Children's Bureau website. Technical bulletin 1 gives the definition of how you code for that race code. It's usually from the caseworker, who doesn't know the race of the child because the child was either abandoned or too young to self-identify. There is a definition in our technical bulletin 1 as well. [Michael Dineen] Well, the parents identify the race of the infant. If a parent is not available, that could be [an explanation]. There are also two kinds of [unknown race categorizations]. They could be explicitly unknown, with a value such as 99, but there is also just not answering the question, which is a different kind of missing [value]. [Tammy White] Oh, and then you can't recode missing to unknown? Because we don't do that. I didn't realize that you all (NDACAN) did that. [Michael Dineen] It would be called null or empty, meaning no value. That's a different kind of unknown. It's more ambiguous, so it's not explicit. That can be problematic with certain variables, but with race it just means that they didn't answer the question. So, you presume it to be missing. [Erin McCauley] Thank you for those [answers]. There's one question that is a follow-up to a previous question, so I'm going to move to that one. "As a follow-up to the last question, can variables related to caregiver characteristics in NCANDS, such as disability statuses, be related to whether the perpetrator is the caregiver described in those variables?" [This comes from] Wendy, [who] is about halfway down in the open questions. [Michael Dineen] Well, no. You don't know for sure whether that particular perpetrator, even though they're coded as caretaker, is the caretaker that has the disabilities. It could be a different caretaker. There is no hard connection between those two. They're not under the perpetrator category. They're under the child variables. So, you can't guarantee that it's the same person. If the person who is asking the question [needs further clarification], feel free to ask a follow-up if I wasn't clear in my answer. [Erin McCauley] Yes, thank you, Michael. So, another question says, "I plan to link AFCARS and NYTD outcome files, and overall I think most variables are available in AFCARS each year, such as the age at first placement. But I would like to control for whether the youth were ever placed in group/residential care, and I'm not sure about the best way to go about doing this since the question asked in AFCARS only refers to the current placement." [Michael Dineen] There are techniques for that, but it depends on how you're using your data, [such as] what software program you're using. So, I use functions like "max" or "min." It's usually a max or a min if the codes are "1/0" or "1/2" to indicate "did or did not do something." You might have to do a little [data] preparation. If 99 is a value for missing, and you want to take a max of the valid values, then you might have to recode the 99s to -1 or something like that. So, it can be done easily with those kinds of workarounds. [Erin McCauley] Great, thank you. The next question says, "Will you be sharing more details on merging long and wide [data] for the next session?" So, I will just say first that, in my experience, you have to restructure the datasets so that they're all in the same structure. But Michael may have more to say on that.
[Michael Dineen] [The data] need to all be on the same level, almost always the child level. So, you have to restructure the data so that you have one row per child, like Erin was describing for NYTD. There are three waves, and the waves are stacked. And then you have to make them wide. You don't want to restructure the demographic variables. Let's say there is a highest level of education variable; there would be highest level of education 1, 2, and 3 as columns in a table where the rows are one per child. [Erin McCauley] Precisely, thank you, Michael. Our next question is, "Which specific datasets and/or items in the administrative data can be used to assess the Child and Family Services Review measures?" [Michael Dineen] Well, I think Tammy may know more about that because I'm not an expert on the Child and Family Services Review. [Tammy White] I believe, and I would have to double-check, but we have on the Children's Bureau website a public portal to get the syntax for how we do those measures. If not, there should be a link on that website for where to ask for help and ask questions, and someone will refer that [question] to our data folks who run those measures. You should be able to get the syntax for how we calculate those [measures]. It looks like Devin Malloy has some information as well. That may be a participant. [Erin McCauley] That's just me marking that we're answering the question. [Tammy White] Oh, I thought that was somebody from the state who had more information. I guess the best I can do is [say that] we do have the syntax and you can duplicate those measures. Many states do. Many states run their own measures rather than wait for the profiles from us. So, you should be able to make use of those resources. [Michael Dineen] So, why does it say that Devin Malloy is trying to answer the question? [Erin McCauley] So, I'm on two computers right now, and I'm just marking them as live [answers]. So, we have some more questions coming through the chat. If you're asking questions in the chat, please put them in the Q&A [box] so that we can go through them one by one. Our next question is, "Can you please show the NYTD slide with states available in each cohort? I'm interested in how many states are available across cohorts." So, what I think you're referring to is the slide that highlighted which states are using sampling after the first wave. [Michael Dineen] Do you want to answer that one, Tammy, or should I? [Tammy White] Sure, either way...Michael, you can start, and I can certainly add on. As Erin noted, every state reports information to NYTD on the youth who complete the survey at ages 17, 19, and 21. For states that have large baseline populations, we have given states the option to sample a follow-up population. For some states, if they've got thousands in their baseline, they don't want to follow up with thousands, so we allow them to sample. Michael can tell you where you can get the names of those states on the website. There are 15 states that chose to sample for the follow-ups in waves 2 and 3. So, that's probably where you'll find that information, but every state has all cohort information. But Michael, you can follow up on that. [Michael Dineen] My understanding is that the criteria that were shown were met by all states, so that all states had sufficiently large samples to be eligible to sample, but not all states chose to sample. [Tammy White] For some of the smaller states, that wasn't true. There were some states that had very small baseline population numbers.
Actually, under the National Youth in Transition Database reporting system link in the menu at the top of the Children's Bureau website, we have some NYTD outcomes and services reports. There are tables, and in those tables you can see the number of youths who participated and the participation rates. Michael, I think you also report a table like that, but you can see the numbers. There were just a handful of states that were not eligible to sample, but in theory, you're right. Almost all of them were [eligible to sample]. [Michael Dineen] Yes, almost all of them [were eligible to sample]. So, that list of states wasn't a list of states that met the criteria. It was a list of states that chose to sample. There is a variable in the dataset called "sample_state," where it is 1 or 0. So, you can see whether or not a state sampled. But to reiterate and underscore, all states are in the dataset; there are no states that are not in the dataset. [Erin McCauley] Great, thank you. Next question, "During the presentation, they said race/ethnicity is coded as unknown to make sure they're not identified. Can you explain that further?" I'll take a first stab and say that's referring to a county where only a [small number], such as two youths, have that [particular] race or ethnicity. It's not a general rule; it's just based on the size. [Michael Dineen] That's a thing we do only in the NCANDS data. It doesn't apply to AFCARS or NYTD. When I say NCANDS and Child File, those are synonymous for the purposes of this statement. So, the Child File has a higher level of security than the others because of perpetrator information and child maltreatment, which is a hotter topic than just being in foster care. So, in the Child File, if there's a county where a child or a perpetrator is the only one in that county of a particular race, then we would hide their race information. We would turn it to "unknown." That's what we do and why we do it. I hope that answers your question. [Erin McCauley] Thank you, Michael. And there's always more information [on our website]. Michael writes our user's guides on our website, so I really recommend checking it out there too. Our next question is, "I will need to restructure the dataset as you discussed, and I plan to use all three waves of NYTD, but I did see that it is stacked by wave. I'll be using SPSS. Can I get support on how to structure this by individual so that I only have one record per respondent?" [Michael Dineen] Well, that is exactly next week's presentation, so I'll be presenting SPSS code that restructures the data. So, if you come back next week, you'll see that, and it eventually will be posted. So, you'll be able to reproduce the code itself. [Erin McCauley] Great, thank you for that. Going off of that, here is our preview for next week. Michael is going to be our presenter. We're going to be talking about linking these data. I know some people may have to go as we get close to the end, but we'll just keep answering questions for the next two minutes. "Has your department been able to collect any data on SIDS after COVID-19? Unfortunately, the increase in child abuse is not reflected by the number of reports CPS is getting. I'm hoping to look at it from the angle of potential increases in infant deaths and SIDS; however, I'm having trouble finding recent data." So, we have a pretty big delay between when the data are collected and when they are released by us for researchers. I know Michael can speak more to that.
[Michael Dineen] We just received the AFCARS 2019 data this week, and we're on fiscal years. The last day of that year was September 30, 2019. So, that's about 8 or 9 months until we got the data. We have the 2019 Child File, but we're not allowed to make it available until January because we have a rule that we can't make it available until the Child Maltreatment Report is published. So, the 2019 data would be available January 2021. So, that's about a year and three months from the end of the fiscal year. So, that's the delay we're getting. Since COVID started in January 2020, [data covering that period] probably won't be available until mid-2021 for AFCARS and January 2022 for the Child File. I'm just thinking out loud, trying to figure it out. So, I hope that was clear enough. [Erin McCauley] So, I have a quick question for Michael. Someone posted a question asking if we could post any code in SAS. I can't remember. Last year, do you remember what software you used in the linking presentation? Was it SAS? [Michael Dineen] No, last year was Stata. This year is SPSS. I'm hoping to do one in SAS or R next time. [Erin McCauley] So, we rotate every year, but unfortunately, we will not be able to do SAS next week. We'll be doing SPSS. We have one last question. If people have to go, feel free to leave. Another question related to NYTD: "Has any attrition analysis been conducted across waves of cohort 2 to determine if there is any selection bias across waves?" [Tammy White] Michael, you can speak to this as well. The weighting that is done with the cohorts takes into account whether there was a follow-up response from an [eligible youth]. So, the weights do incorporate that non-response piece. Specifically, I haven't done any individual analysis, and as Michael and NDACAN alluded to, there really isn't much difference between weighted and unweighted results. Part of that, I think, is because the variables we capture don't necessarily capture why the response rates would be so different. I think it is probably more at the state level and the methodology level. That's information we don't capture. Michael, do you have anything to add? [Michael Dineen] Some generalities around that: one is that the weights that go along with the data are weights from 34 variables. It's possible to make your own weights if you have a specific hypothesis about overweighting or underweighting. You can create weights yourself because you have access to demographic information on all the participants. If you link to AFCARS, you get information for comparing the respondents to the non-respondents. You can get the cohort of people who were 17 in the foster care file. So, you can make your own weights for specific questions about something being weighted. A second item I wanted to say around that is... [Erin McCauley] While you think of it, I'll take a moment to say that in previous Summer Research Institutes (SRIs), creating your own weights has been a topic that Frank had a lot of questions on, and I believe Michael was also consulted. So, if that's something you're interested in doing, I would definitely encourage doing it. But if it's something you would need support in doing, it would perhaps be an excellent SRI application for next year. [Michael Dineen] Okay, now I remember. I think, and Tammy correct me if I'm wrong, in Clayton's presentation, he said that you had to have completed wave 2 (the age 19 follow-up) in order to be eligible for the age 21 follow-up. My understanding is that is not the case.
A child could skip the age 19 follow-up and still be eligible for the age 21 follow-up. [Tammy White] That is correct. As long as you're eligible to be followed up (you completed the survey within 45 days of your 17th birthday), you can take [the follow-up survey] at age 19, at age 21, or both. [Erin McCauley] Great, thank you for noting that. I think that wraps up our presentation for today. Sorry that we went a little bit over [time], but it was an excellent conversation. I hope that the attendees who are still with us will be joining us next week. Michael has an excellent presentation lined up. Such a big thank you to Tammy, Michael, and Clayton for doing these presentations and moderating this expert Q&A panel today. This has been really helpful for everyone, and I look forward to seeing everyone next week. [Michael Dineen] Thank you, everyone. [Tammy White] Thank you. [Clayton Covington] Thanks.