National Data Archive on Child Abuse and Neglect. [Erin McCauley] So welcome to Summer of NYTD! For those of you who did not join us last week, welcome to your first session; for those of you who did join, thanks for coming back. This is a summer training seminar series by the National Data Archive on Child Abuse and Neglect. We are housed in the Bronfenbrenner Center for Translational Research at Cornell University. And here is the outline of our summer training series. I'm actually going to be the presenter for today's session. My name is Erin McCauley and I'm a graduate researcher at the Bronfenbrenner Center with the Data Archive. I'm kind of the host of the series, so you'll hear from me every week, but this week I'll also be the presenter. Last week we had an absolutely wonderful introduction to the NYTD data set from Telisa Burt and Tammy White, the NYTD project manager and the NYTD data analyst from the Children's Bureau. We were really lucky to hear from them; they definitely bring a different perspective, and it was really nice to hear their introduction. Today we are going to be talking more about the data structure. And just to preview moving forward, Michael Dineen, who is kind of our resident NYTD expert, will be leading the next three sessions, co-presenting with Frank Edwards next week. So today we're going to be talking about how to download the data and then the overall layout of the data, and then Michael is going to be getting more into the nitty-gritty for the next three weeks, working through the common issues that we hear from data users. So I'm expecting those to be particularly good presentations, and I'm really looking forward to that. This time I'm going to be the presenter, so here's my contact information, and you can ask me questions via email about this session or frankly anything about the series. And then here's our agenda for today.
So I'm going over exactly how we download the data, because for anyone who wants to use it that's fairly important. Then I'm also going to talk about the structure of our data, so we have service and outcome files, and then I'm also going to talk about the extra resources available through our data archive, because we really strive to go beyond just being a place where people can download data and we really want to engage with our users. I think we've got some really good resources that you guys will like to hear about. So, just hopping right into how to download the data. First you're going to go to the NDACAN webpage at NDACAN.Cornell.edu, and this is your first step for accessing any of our data sets, our user's guides, codebooks, or any of the additional resources we're going to talk about today. I know we also got a couple of questions last session about how to sign up for the listserv that I've been talking about. Our listserv is where we send out informational blasts about things like this training seminar. It's also where we announce new data releases, and we also talk about other resources available to data users. We have the Summer Research Institute, and when the application is released for that we send it on the listserv, so it's really good to be signed up for it. And you can do that through this website. We're going to be going back and forth between this page and screenshots of how to download the data. When you go to the NDACAN website this is what you see. To download our data you're going to be looking at the data sets tab, which I have a little gray-green box around. You can see here it's the third tab from the left, and once you've hit that you still have to select the data set that we're talking about today, which is the NYTD data set, the National Youth in Transition Database. And once you click that tab there are two areas where you can click it.
When you're on the data sets page, across the top we have our core data sets in quick links, so you can click NYTD there. You can also see that in the list area across the bottom we have all of our data sets, and the NYTD data set is right there as well. Either option will take you to the same place, and then there are three options: we have a service file, an outcomes file for cohort two, and an outcomes file for cohort one, and I have the data set numbers there as well. So when you click NYTD you end up here, and this is kind of our home base. You can see that there are data set numbers on the left side and file names with cohort and wave specifications. A little bit later in this presentation we'll talk about what these different files mean, what's included, and the different data structures. But for now, if you select any of these data sets you'll be brought to a data set details page, which looks like this. In the upper right-hand side there's a list of links, and one of them is for ordering the data; it's the top one. You'll do this for any data set that you would like to download from the archive. There are also links for sending a question about the data set and for returning to the data set list. I didn't picture it here, but across the bottom of this page there's also the codebook and user guide as well as a link to publications about this data set cluster. So to download a copy of the data you click that link in the upper right-hand side, and this will bring you to an instruction page for how to download the data. The first step will be to sign up for the mailing list, and then we also have a digital terms of agreement PDF for each data set. For each data set that you'd like to receive you fill that out, and there's a whole set of instructions, but you rename it and email it back to us.
Then within five days of submission you'll receive an email invitation from box.com to log in and download the data. So we'll place the data in this secure box for you, and then you have 10 calendar days from when you receive the invitation to download the data. After 10 days we remove the data from box.com. We also have a separate structure for classroom instructors who wish to order the data set for use in a classroom. There's just a different terms of use agreement that the entire class can sign. And if you do want to use it in a classroom, we ask that the professor, not the students, be in charge of managing the download process and contacting us for technical support and questions as needed, but it is a fun data set to play with in a classroom setting. So that's a summary of how to download the data. Now we'll be discussing the data structure and the various data files that NYTD contains. As Telisa and Tammy told us last week, the primary investigators are the Children's Bureau in the Department of Health and Human Services. So data was first reported and collected through the John H. Chafee Foster Care Independence Program; in 2010 states began collecting NYTD data, and they began submitting data in 2011. The general idea was to have data to evaluate the effectiveness of independent living programs and to generate more information about those who age out of foster care, which could be used to develop new services and to help people go through that transition. We could also use this to create more effective services. There are basically two types of files in the NYTD data: we have a services file and an outcomes file. The services file is a cross-sectional file where information about services provided and the youth who received them is collected.
And then the outcomes files follow cohorts who age out of the foster care system to collect information about their well-being, financial, and educational outcomes. This file structure reflects the goal of the Foster Care Independence Program, which is to improve outcomes for youth in foster care who are likely to reach their 18th birthday without a permanent home. This way the data can be used to evaluate the effectiveness of the services provided through the Independence Program for outcomes that are related to health, well-being, and general success for former foster kids. So first we're going to discuss the services component. This is one file that contains cumulative data since the first reporting in 2011. It's updated every six months on a continuing basis, and this data set is number 214. When we're looking at the NYTD data sets it is the first on the list. The service variables concentrate on youth information and then also the services that were actually received by youth, which were paid for through this system. The youth information includes demographic information and also the year and date of the report, and it also has information about delinquency and education. The service variables reflect, as I said, the actual services received that were provided with funding, and they're all dichotomous. We'll look at the codebook in a second, but it's basically just a yes or no for whether the youth received services funded through this Independence Program in that time period. So here is a screenshot of the codebook for the service variables. The highlighted box at the top reflects the youth information, and the box at the bottom has the variables that were created by the archive, such as a recoding of the race and ethnicity variables into one variable.
As you'll notice as we go through, a lot of the different data sets have the demographic information; the other files, like the outcomes file, also have demographic information such as race and ethnicity and date of birth, but this one also has sex, location, tribal affiliation, educational level, and delinquency information. And then in the middle here, this middle box has the services. There are more in-depth descriptions of the services included in the codebook, but generally the variable tells us which area the service is related to, so there are financial ones, housing ones, health-related ones, and then more academic and employment-related ones. We're going to look more specifically at two codebook entries that are both related to educational support. So this is academic support services, and this specific variable addresses whether the youth has had service provision funded by this Foster Care Independence Program related to completing a high school diploma or equivalent. As you can see there's a specific list of things that are included, and this one also has what's specifically not included. So things aimed at the youth's general attendance in high school aren't included, but things like literacy training, help with homework, tutoring, and GED application are, so it's finishing high school or the equivalent, the GED. You can also see our data type, the element number, and how it's coded, and as I said the services are all dichotomous, so there's a yes and a no. And then we also have 77 for missing. And then you can see here that we also have another one; this is also education related, however this is looking at a post-secondary degree instead of completing high school.
Again there's a list of services that are included, we have the data type and the element number, and the coding is the same across the services, so for all of them we have a no, a yes, and a blank. So that is a summary of the services. Next we'll be talking about the outcomes component, and there are two outcomes files we have currently, for two different cohorts. These data are collected from youth to examine well-being, financial, and educational outcomes during this transition from foster care to independence. In each data set there are three waves of data. The baseline is conducted during the year of the youth's 17th birthday, and then the survey is conducted again when the youth is age 19 and 21. We currently have one full cohort of data for all three waves and another ongoing cohort that's pretty close to finishing. Last week we heard from Telisa and Tammy about the cohort and wave structure, but here's a general summary: we establish a new cohort every three years, and each cohort has three waves of data. So cohort one was age 17 in 2011 and has three completed waves of data, and cohort two was age 17 in 2014 and is almost complete. We've published waves one and two and we are currently working on wave three. So there are two outcomes files, one for cohort one and one for cohort two. They're data sets 214 and 202, and we access them under that initial data sets page for NYTD. You'll order them the same way: once you click the data set, at the upper right-hand corner there's a link to download. And here's a summary of the outcome variable areas: again we have youth information, and then we have outcomes related to well-being, financial information, and educational information. Youth information includes demographics and reporting, so whether the youth responded to that wave of data collection.
And then we also have well-being, which looks at topics such as experiencing homelessness, substance abuse, incarceration, and information about childbearing or fertility. There's information about financial outcomes, so things like employment, public assistance, and whether they are receiving housing assistance, and then last we have educational information: highest certification completed, educational enrollment, things related to that. So here is a screenshot of that codebook. The layout is similar to that of the service file, and obviously it's longer, but the created variables are at the bottom, so we have things like the race and the race/ethnicity recode, we have whether the youth is in the sample, and we have whether the youth responded to at least one survey question, so you'll see a lot of these are related to what Telisa and Tammy were talking about last week. The rest of it is fairly self-explanatory: you'll have the variable name and then the variable label, and the more in-depth information will be in that specific entry in the codebook, just like last time. One important thing to note is that some of the outcomes vary in meaning between wave one and the later waves. We'll look specifically at the variable for homelessness to see these differences. You can see here that we have a definition, but then we have two bullet points: the first is talking about wave one and the second is talking about later waves. For the baseline interview, wave one, which again is when the youth are in their 17th year, it asks if the youth have experienced this phenomenon, in this case homelessness, ever in their lifetime; however, the follow-up questions ask if the youth has experienced the phenomenon, in this case homelessness, in the last two years.
So it's important to note that we should treat these differently. You can look at whether youth have experienced homelessness at any point in their life, but you can also control for pre-existing exposure to or experience with homelessness and then look at predicting homelessness after aging out. And again at the bottom you can see the coding: we have no, yes, declined, and blank. How you treat that missingness is up to you, and we'll also be having a presentation next week about missingness in the data. Now I am going to go through some examples using the data. I calculated these using list-wise deletion, which isn't how I would deal with missingness if I were doing a study, so I just want to throw that out there, but I wanted to walk you guys through what the data looks like and how we can use those differences, especially in the outcomes file. I am looking at the outcomes file first, since that's the one we just talked about, and I used cohort one because we have all three waves. If we look at these demographic characteristics we can see that the sample is about 48% female and 52% male. Next, if we look at the racial and ethnic breakdown of our sample, we can see that the sample is majority white, Black, or Hispanic or Latinx, with smaller proportions of the others, so it's 44% white, 30% Black, and 18% Hispanic or Latinx. We also have some folks who are Alaska Native, Asian, Pacific Islander, or multiracial, though those groups are smaller. Then looking beyond the demographics at the outcome variables, I chose some related to well-being, since we talked about homelessness when looking at the differences between the first and later waves. Using these differences I examined the prevalence of incarceration, substance abuse, fertility (having a child), and homelessness.
So I used the differences in meaning between the waves: again, the first wave asks if they had ever experienced the phenomenon, and then the later waves ask if they experienced it in the last two years. I created variables to reflect whether by wave three the youth has ever experienced the phenomenon, whether they experienced it before aging out, or before wave one, which is approximately aging out at age 17, and whether they've experienced it since aging out, during that transition period. I did this so we can see that, depending on how we treat the variable, there are large differences in the prevalence of experiencing these phenomena. If we look at the top one we have incarceration: 44% of this sample was incarcerated at some point in their life, 31% experienced incarceration before aging out of the foster care system, or before wave one, and 29% were incarcerated after wave one, or after aging out of the foster care system. Looking at substance abuse we see a fairly similar pattern: about 34% of the sample had substance abuse issues, with 25% before and 20% after. But for fertility we can see that it's quite different. We have a higher percentage who have ever experienced it but a very low percentage who experienced it before aging out, so only 6% of the sample had a child before aging out of the foster care system and then 25% had one after aging out. So looking at that, there are quite large differences, especially in comparison to substance abuse and incarceration. And then last, if we look at homelessness, about 40% of the sample experienced homelessness at some point, with 17% experiencing homelessness before wave one and 31% experiencing it after wave one, meaning after they aged out of the foster care system. I didn't make these mutually exclusive, so folks who experienced it both before and after show up both times.
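The recoding described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the presentation: the function names, the numeric codes (1 for yes, 0 for no), and the toy records are all assumptions made for the example, so check the codebook for the real variable names and values. It treats the wave one answer as lifetime exposure, treats a yes at wave two or wave three as exposure after aging out, and drops any youth with a declined or blank answer (list-wise deletion).

```python
# Assumed codes for illustration only; the real codebook values may differ.
YES, NO = 1, 0


def exposure_flags(wave1, wave2, wave3):
    """Return (ever, before, after) for one youth, or None if any answer is unusable.

    wave1 is the lifetime ("ever") question asked at baseline; wave2 and wave3
    are the "in the last two years" follow-up questions.
    """
    answers = (wave1, wave2, wave3)
    if any(a not in (YES, NO) for a in answers):
        return None  # list-wise deletion: declined or blank on any wave
    before = wave1 == YES
    after = wave2 == YES or wave3 == YES
    ever = before or after
    return ever, before, after


def prevalence(records):
    """Share of complete cases with each kind of exposure (not mutually exclusive)."""
    flags = [f for f in (exposure_flags(*r) for r in records) if f is not None]
    if not flags:
        raise ValueError("no complete cases to summarize")
    n = len(flags)
    names = ("ever", "before", "after")
    return {name: sum(f[i] for f in flags) / n for i, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical answers (wave 1, wave 2, wave 3) for five youth; None means blank.
    toy_records = [(1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 0, 0), (0, None, 1)]
    print(prevalence(toy_records))
```

Because the before and after flags are computed independently, a youth who reports the experience at baseline and again at a follow-up counts toward both percentages, which is why the before and after figures can add up to more than the ever figure.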
I thought that would be a nice, easy rundown to see how much the results vary depending on how we treat those outcome variables, whether we're looking at if the youth have experienced this at any point or if the youth have experienced it after aging out. I'm also going to look at some examples using the service file. I looked specifically at 2016, and I looked at four different services to see which youth used these services in 2016. We can see that a lot of youth had academic supports, which was that variable that we looked at, which included things like help studying for the GED, literacy, homework help, things like that, moving people towards graduating from high school or getting the equivalent in a GED. And we can see that a little bit fewer, about 40% of the youth, received services about health education and risk prevention. And then we can see that considerably fewer youth received supervised independent living and room and board financial assistance. I just chose these four for 2016, but you can break it down further by linking the service file with the outcomes file. And then I wanted to spend some time talking about additional resources available through the archive at the Bronfenbrenner Center for Translational Research. I'm going to go through them quickly, and then I'll go back to each one and talk about how to access it and what we offer, and I may have Michael chime in on a few of them. Really, one of the unique offerings of the National Data Archive on Child Abuse and Neglect is the additional support we offer users beyond downloading the data. We have user's guides for each data set and a user support tab with information for SPSS, SAS, Stata, and R users. We also have a frequently asked questions tab, which is a really great place to start if you've never used our data before; it has general questions like 'Do you have to pay for the data?'. The answer is no.
And then last we have the Summer Research Institute and canDL, which is the child abuse and neglect digital library. I think these are two of our most valuable resources, personally. The Summer Research Institute is a distance learning institute where staff work closely with data users on projects using our data sets, including NYTD. Those who are accepted through our competitive application process work pretty closely in small groups, getting advice from our NDACAN staff as well as the computing and statistical consultants on Cornell's campus. We refine analysis plans and move a project that people have started toward publication. It's also a great networking opportunity, so you'll be in small groups with other people who are using the same data set as you. And then last we have canDL, which as I said is the child abuse and neglect digital library, and it's a publicly accessible online database of research using our data, including NYTD. It's a really great way to get familiar with how other people are using the data, what choices other people made in their analysis, and how people coded things, and then also just to make sure that nobody has already answered your question. So first we're going to talk about the user's guide. This was written by our staff, and they have details available for every single data set that we have. This is just a screenshot of the table of contents for one of our user's guides, for the first cohort of the outcomes file. As you can see it covers a variety of topics, from publication to data structure and sampling to analytic considerations. The guide's really assembled with users in mind, so it's a good first step for a research project to get re-familiarized with the data, especially if you've forgotten things that we've told you here, but I also think it's a super valuable resource when you're writing up your methods section, for how to talk about our data and how it's collected and all of that.
So these are PDF guides, and they are just a really great resource. They're here at the bottom of each of the data set pages, so when you go to download the data it'll have a little summary, the codebook, and then also this user's guide. We also have a user support tab which gives overall guidelines. We have PDFs, which can be really helpful for different types of data analysis users on how to work with our data, but we also have videos, which are I think even better. We're going to have a session in a couple of weeks about merging the NYTD outcomes file with the AFCARS foster care file, but we also have a video that can walk people through it if you need help afterwards. So I really strongly recommend checking out this area. We also have a frequently asked questions tab, and this is where we answer common questions we get from users across all of our data sets, so it's not NYTD specific. This is the tab on the NDACAN home page, so when you go to NDACAN.Cornell.edu this is the page that you'll see, and we have clicked on the third one over, data sets, but I also recommend checking out the frequently asked questions just in case. And then as I mentioned earlier we have the Summer Research Institute, and I really think it's one of our most valuable resources. As I said, it's an online distance learning experience where we select applicants to attend our institute, and then we work in small groups to assist data users in the production of a publishable project. The goal of the Summer Research Institute is really to increase utilization of our data and to facilitate analysis for publication. We want to be more than a location where people just download data; we really want to partner with our users to assist them in using our data when we're needed.
The application process is competitive and multidisciplinary, so we've had participants in the past from a variety of disciplines including sociology, policy, public health, psychology, and human development. Applicants who have participated in the past really talk about it being a meaningful experience where they're able not only to network but also to get a lot of great feedback; people can meet via Zoom with the directors of the data archive and things like that. Applications are evaluated on the quality of the proposal, the research background of the applicant, and the overall possibility of publishing whatever project people are working on. So if anyone's interested in using the NYTD data set next summer and the timing lines up, I would strongly recommend checking it out. Both the staff here and participants from previous years report it as a meaningful experience. For us it's nice because we create this data and put it out there with the hope that it's used in meaningful research to improve the lives of children, so getting to get in there and work with people is exciting for us. And I think we also have a lot of support we can offer people who are trying to get really cool projects off the ground. At the bottom here I have a link; it's an overview of the Summer Research Institute and a guide for how to apply. But if you are interested in this, definitely make sure that you're signed up for our listserv, because we'll send an email blast out when applications are open for next summer. And then last, as I said, canDL is a great resource, especially for finding research related to NYTD or any of our data sets.
You can see full-text articles, depending on your institution's subscription, you can find information related to research in our area, and you can also explore or view citations in a variety of styles. So if you're in that first step and you want to get familiar with the research, it's nice to be able to just download all those citations and work through them. Here's a screenshot of what it looks like. You can see on the left there's a bunch of tags, and how I got to this was I just entered NYTD there, and all of these publications that come up use NYTD. We ask people who use our data to let us know when they publish so we can add them to this resource, but it's a really great starting place for sure. So as I said, my name is Erin McCauley and I'm the host of this series, and Michael Dineen is also on the line and can help answer questions. He is our research support specialist and the manager of NYTD, so he is a great resource. If you have any questions for him please just pop them into our chat box. Before we move to answer the questions, however, I do want to preview next week: on the 22nd from 12 to 1 PM we will have both Michael Dineen and Frank Edwards presenting. They both work at the Center for Translational Research and the Data Archive at Cornell University, and this is going to be our first expert presentation. They're going to be talking about converting from long to wide format, dealing with missingness in the data, and other common questions we get from data users, so next week we'll start getting into the nitty-gritty, especially with how to deal with the missingness in our data. I strongly recommend people check that out next week. But for now we can open up to questions.
So I see a question: is there a description of each data set so users can explore what data sets might be applicable for answering potential research questions? Excellent question. Yeah, so on the NDACAN website under the NYTD tab, where you can see links to our three data sets, I really recommend checking out the codebook, because that will have everything that's covered. But the general areas are the ones I talked about: the service file is the services that people receive, and that's the cumulative one, and then the outcomes files are really focused around well-being, financial, and educational outcomes. And we have another question: where did you access the research using the data at the end of the presentation? So was that canDL? I believe that was canDL. This is the child abuse and neglect digital library, and you can access it through our webpage. I'm actually going to cancel out of sharing my screen and then I will show you guys how to do that live through the website, under the publications tab across the top here. Somebody please send me a chat if you can't see this; hopefully you can. Under publications is how you access canDL, the child abuse and neglect digital library. Here you just click on this canDL link to get to the page. Then you'll get here, and it's not just for NYTD, so we have all of our data here. It's a really great resource. Under tags you can just put in NYTD and it will search the existing tags, and then you click that. And then we'll be looking specifically at publications that use NYTD. So that is really wonderful, and I suggest people spend some time figuring out how to use it. I know I have learned a lot, because I'm starting to use the NYTD data that's linked.
I'll be talking about that in a later presentation. I have an ongoing project where I've linked, with the assistance of Michael, the NYTD data to the other data sets, AFCARS and NCANDS, and the first thing I did was really dig into the Zotero canDL and figure out what questions other people had answered and where there were gaps in the literature. I was also really curious about the coding decisions people made and how they framed their analyses, so that was helpful. When do we anticipate NYTD cohort two wave three will be available? Michael, do you know? I'm not entirely sure. [Michael Dineen] Well, that would be the 2014 cohort, so they would be doing that survey this year, fiscal 2018. So the survey is in the process of being done now, and then it would probably be maybe a year after that, because the next data we should receive is the 2017 cohort wave one. That's actually our next set that we expect to receive, and that was done in fiscal 2017 and we haven't received it yet. So if you estimate the third wave of the 2014 cohort from that, it would be at least a year from when the survey is over, which would be 45 days after the end of fiscal 2018, which ends September 30, so they'd be done surveying around the end of this year. [Erin McCauley] Perfect, thank you Michael, and I think you might be helpful with this next question as well: do the current NYTD data sets have weights already included to deal with missingness? [Michael Dineen] Well, the 2011 cohort, the first cohort, has weights that are included in the data. The 2014 cohort does not have the weights included in the data. When I do the presentation on weights I'm going to be showing you how to do your own weights.
So one big advantage of NYTD over pretty much any other data is that you have information on the non-respondents. You have all the demographic information, and then you can get all the information from linking to AFCARS about anything you want to know about people who did not respond to the survey, so that gives you a huge advantage in weighting. But yeah, I won't go more into it right now; we'll have a whole session on weighting. So I did answer the question, I believe. [Erin McCauley] Yeah, that's great, thank you. I definitely encourage people to check out especially the next three presentations. I think they are going to be particularly informative, because Michael has so much knowledge about this data and it's going to be getting nitty-gritty with this, so I know I for one will be taking notes. And so we have another question: does being a PhD candidate looking to use NYTD for your dissertation disqualify you from being eligible to apply for the Summer Research Institute? Michael, do you know if that's true? [Michael Dineen] I don't think that's true, but I'm not... no, no, that's not true. Nothing disqualifies you. [Erin McCauley] Wonderful, so definitely look at applying. Here, I'll put up the link to the video, just in case... there we go, and it's across the bottom there. I definitely recommend checking out the link. It's really a great resource, and I know personally at the data archive we get really excited about the Summer Research Institute, so I strongly recommend it. And do you know whether the Children's Bureau is publishing additional data briefs on cohort two? I'm not sure; Michael, do you know? [Michael Dineen] Well, I assume so, because cohort two wave three isn't in yet. They'll get it before we do, so they'll probably do some data briefs on it once they get the third wave, the age 21 survey. [Erin McCauley] Fantastic, so we will be getting additional data briefs as wave three comes out. And do we have any other questions?
Do you have any recommendations for enhancing one's application, I'm assuming for the Summer Research Institute, or perhaps common mistakes? So I know, Michael, you are involved in looking at the applications as they come in. Have you noticed any common mistakes people make, or do you have any tips? [Michael Dineen] Well, I would say having a really tight project proposal is the best thing. One of the common errors is proposing a research project that you can't do using our data, or that needs some outside data that you don't have. Those kinds of things. But if you have a good, tight research project proposal specified in your application, and you have good variables that you are using for your covariates and so forth, you'll be fine. So the main thing is to have a good, doable research project. [Erin McCauley] Yes, definitely. If you specify the dependent variables and independent variables and anything like that, all that extra specificity is good. And there are also tips in the video, which I have the link to across the bottom here. The video's really great; it has tips and an overview. We really encourage early career scholars to apply. We think that our data is really great, obviously, and so we want people who are early in their career who can really work with the data multiple times as well. And we really enjoy working with early career scholars who are excited to get out there and take on some projects with us. [Michael Dineen] Well, I might add, too, that it helps if you have the data. When people make a proposal but they don't have the data, that's not a good sign. If you have the data and it's clear from your application that you've been working with the data and you're on the ball, that'll help too.
[Erin McCauley] Yes, having the data before applying is always better. What we're really looking to do is help people refine their analysis, not start from square one. So when we're reviewing the applications, and throughout the process, we expect that people have put the legwork in to ground their study and have played around in the data, and that they're coming to us in the later stages so that we can really push the analysis to the next level. Okay, well, if we don't have any more questions then we will wrap up. We really look forward to next week's session. As I said, it'll be Michael and Frank leading us off. So if we don't hear from you in between, hopefully we will see you all next Wednesday at noon. Well, thank you everyone very much for coming out. Thank you everyone. The National Data Archive on Child Abuse and Neglect is a project of the Bronfenbrenner Center for Translational Research at Cornell University. Funding for NDACAN is provided by the Children's Bureau.