National Data Archive on Child Abuse and Neglect
Archiving Quantitative Child Maltreatment Data
This video was produced by the National Data Archive on Child Abuse and Neglect (NDACAN)
Focus
This video will provide an overview of the dataset archiving process by covering the following topics:
Introduction to NDACAN
The benefits of data sharing
Archiving is a collaborative process
Data sharing or data management plan creation
The process and steps for preparing and depositing data and documentation
Release of the dataset for use by the child maltreatment research community
Activities undertaken after a dataset is released
Terms 
Before we get started, there are a few terms which require introduction in order to assist your understanding of the video. 
The term “data archive” is defined by the National Institutes of Health as “A place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.”
A “Data contributor” is a person or organization who engages in preparing and submitting data for archiving
A “Dataset package” is a collection of study documentation and data files which describe, or are the result of, a data collection effort. 
Abbreviations
The following abbreviations are used in slides during the course of the video:
ACF refers to the Administration for Children and Families, a division of the Department of Health & Human Services 
ACYF refers to the Administration on Children, Youth and Families, a division under ACF
CB refers to the Children’s Bureau, under ACYF
CMRL is the child-maltreatment-research-list-serve
HHS is the United States Department of Health & Human Services
NDACAN is the National Data Archive on Child Abuse and Neglect at Cornell University
Introduction to NDACAN
As our name suggests, we are a trustworthy repository for data from quantitative research studies and administrative data systems on the topic of child maltreatment. The topic of child maltreatment includes data on all forms of child abuse, neglect, foster care, and adoption. 
NDACAN is located at the Bronfenbrenner Center for Translational Research in the College of Human Ecology at Cornell University in Ithaca, NY and was founded in 1988 by John Eckenrode, a professor in Human Development in the College of Human Ecology. 
The Archive is made possible through a federal contract from the Children’s Bureau in the Administration for Children and Families, which resides within the U.S. Department of Health and Human Services. 
The Archive operates under the direction of Chris Wildeman, a Professor in the College of Human Ecology’s Department of Policy Analysis & Management at Cornell University.
Our mission is to facilitate secondary analysis of research data relevant to the study of child abuse and neglect and provide an accessible and scientifically productive means for researchers to explore important issues in the child maltreatment field.
Benefits of data sharing
Data sharing enables research transparency, through replication and validation of the original research findings, as well as opportunities for collaboration between researchers of different disciplines, and the extension of original research.
Many research funders have recognized the importance of data sharing and have incorporated data sharing requirements in their funding opportunities. The National Institutes of Health have promoted data sharing since 2003. Since 2011, the National Science Foundation has required that all grant proposals include a data management plan detailing compliance with the Agency’s data sharing policy. 
Data sharing can benefit the investigator’s professional career, as well. In 2007, Piwowar, Day, & Fridsma reported finding an increased citation rate for investigators who share their data. Citation rate is often used to determine a researcher’s impact on the field, which is an important factor in faculty tenure decisions. 
Archiving is a collaborative process
NDACAN views data archiving as a collaborative process between us and the data contributor. The collaboration begins at the earliest stages of a research project and will extend for years beyond the release of the dataset to the child maltreatment research community. Although this video focuses on the duties of a data contributor, we would also like to point out that we are your partner and can provide assistance at almost any stage of your research. 
The following are examples of the ways in which NDACAN assists researchers:
We provide resources to help researchers with creating data sharing or data management plans for submission with their funding proposal.
Researchers have solicited input from NDACAN regarding commonly used, construct-specific measures and instruments found in our other datasets. This helps to inform their decisions on which measures to use in their proposed primary data collection research project.
Data from the Archive has been used by researchers to calculate sample size targets and to construct weights that were applied to complex survey data which was designed to be nationally representative. 
Examples continued…
Data from the Archive can be analyzed by researchers to provide supporting evidence for hypotheses appearing in their proposed primary data collection project.  
Once the project has been funded and data collection begins, the Archive is available to respond to questions on the topic of data management. 
NDACAN staff are available to respond to questions that arise while data contributors are preparing the dataset package. In the past, staff have provided guidance on topics such as which variables to include in the archived data, how to recode problematic variables, and how best to structure the data files.
Data contributors are welcome to send inquiries to NDACANsupport@cornell.edu
Data sharing or data management plan creation
Some funding agencies require applicants to submit a data sharing or data management plan with their funding proposal. 
For prospective data contributors who are at the stage of funding proposal creation, NDACAN provides guidelines and a template for creating a data sharing or data management plan to include with your proposal. The Contributor Data Management Guidelines are viewable at the following web address: 
https://www.ndacan.acf.hhs.gov/contribute-data/contribute-contributor-data-management-plan-guidelines.cfm
From the Contributor Data Management Guidelines web page, data contributors can also access the Template for NDACAN Contributor Data Management Plan. 
In addition to the guidelines and template, NDACAN has a document entitled “Data Sharing Resources” which is a collection of resources that might be helpful when creating a data sharing or data management plan. Please visit the Contribute Data page of the NDACAN website at https://www.ndacan.acf.hhs.gov/contribute-data/contribute-data-general.cfm.
Data contributors interested in designating NDACAN as the recipient of their research data should contact NDACAN to discuss their proposed research. NDACAN will also provide a letter of acknowledgement stating that we are aware of the research project and agree to archive the data resulting from the project. 
Prepare and Submit a Dataset Package
This section of the video will provide an overview of the steps that must be undertaken by a data contributor to prepare the documentation and data files for submission to the Archive.
Contributor’s Handbook
In order to keep this video concise, only summary information about the archiving process is provided. For more detailed information about the archiving process, please consult the document entitled “A Contributor’s Guide to Preparing and Archiving Quantitative Data,” also referred to as the “Contributor’s Handbook,” which can be found on the Contribute Data page of the NDACAN website at the following web address:
https://www.ndacan.acf.hhs.gov/contribute-data/contribute-data-general.cfm
Web links to NDACAN resource documents discussed during this presentation can also be found in the summary description for the video
Overview of the Steps for Archiving a Dataset
This slide provides a quick outline of the steps we will cover later in the video.
Step One: Complete and submit Part I of the Study Submission Form and the Investigator Contact Cover Sheet. 
Step Two: NDACAN will set up a call to discuss the dataset. The data contributor will have the opportunity to ask questions. NDACAN will decide whether the data are suitable for archiving. 
Step Three: If NDACAN determines the data are suitable for archiving, prepare the remaining elements of the dataset package in accordance with the Contributor’s Handbook and as summarized in this video. 
Step Four: Once the dataset package is assembled, create a compressed .zip folder which contains the entirety of the dataset package. Notify NDACANsupport@cornell.edu that the dataset is ready for submission.
Overview of the steps for archiving a dataset continued
Step Five: When NDACAN receives the request to submit the dataset package, staff will set up a means for the files to be electronically transmitted. 
Step Six: Once NDACAN retrieves the files from the file transfer system, staff will conduct a quick review to be sure that the files received match what is required or was discussed in prior conversations. Processing of the dataset package may not begin right away if other datasets are in the queue ahead of it. 
Step Seven: NDACAN will process the dataset in the order in which it was received in the queue of datasets waiting to be processed. This requires a study contact person to be available to respond to questions and review the final dataset package once it has been prepared.
What cannot be archived?
NDACAN has established archiving exclusion criteria. If datasets meet any of the established criteria, they cannot be archived. The criteria can be found in a document entitled, “NDACAN Archiving Exclusion Criteria” located at the following link:
https://www.ndacan.acf.hhs.gov/contribute-data/contribute-application-process.cfm 
How to begin the archiving process?
The first step in starting the archiving process is to submit the Study Submission Form: Part I and the Investigator Contact Sheet found at the following NDACAN webpage:
https://www.ndacan.acf.hhs.gov/contribute-data/contribute-application-process.cfm 
Once NDACAN receives the completed Study Submission Form: Part I and the Investigator Contact Sheet, NDACAN staff will contact study staff to set up a call or online meeting to discuss the data collection, unique attributes of the data, data disclosure issues, and the next steps in the archiving process.
This initial contact should be made as soon as possible after funding has been awarded in order to ensure the smoothest archiving experience for the data contributor.
Study Submission Form: Part I
The Study Submission Form: Part I collects the following information about the study to be archived:
Study title
Abstract
List of investigators as they would appear in a publication
Keywords to describe the study
Sponsor or agency name
Award number
Award start and end dates
Once the form is complete, submit it to NDACANsupport@cornell.edu.
Study Submission Form: Investigator Contact Sheet
Submit a completed Study Submission Form: Investigator Contact Sheet for each investigator involved in the study.
The Investigator Contact Sheet collects the following information:
Study title
Salutation or name prefix
First, middle, and last name
Investigator degree or name suffix 
Position title
Institution or organizational affiliation and address
Phone, fax, and email address
Investigator’s role: Indicate whether the person is a Principal Investigator and/or a contact person for questions about the study
Prepare the study materials
The next steps in the process are undertaken by study staff once the data collection effort has concluded. 
Study staff will prepare the following materials for archiving:
A codebook or data dictionary containing unambiguous variable names, descriptive labels, missing data codes, values and value labels for categorical variables, derivation logic for derived variables, and variable data type. 
Electronic copies of data collection instruments or measures
Interim and final reports related to the data collection effort. Copies of the reports should be provided with the submission whenever possible.
A list of bibliographic citations for published articles and reports based on the data collection effort
Institutional Review Board review and approval letter and the approved informed consent template
Complete Study Submission Forms Parts II, III, and the Instrument Information Form which are located at the following web address: https://www.ndacan.acf.hhs.gov/contribute-data/contribute-application-process.cfm
Study Submission Form: Part II – Dataset Details
The following information is collected in Part II of the Study Submission Form: 
Contact person
Study title
Types of data collected
Date data collection started
Date data collection ended
Geographic area to which the data are relevant
Unit of analysis
Sample description
Response rates
Study design description
Data collection procedures description
APA formatted list of published works based on the study
Data contributors will also provide responses to the following questions in the form:
Were the data being submitted collected by you or were the data obtained from another source?
Are there any secondary identifiers which present challenges to preserving respondent confidentiality?
Is this data submission part of a longitudinal study?
Will the data in this contribution need to be updated?
Is this a new edition or a special version of data already archived?
Blank space to provide any additional information not covered by the form.
Study Submission Form: Instrument Information
Complete an Instrument Information form for each measure or instrument used in the study and for which there are corresponding data in the data file.
The following measure or instrument information is collected in the Instrument Information Form:
Study nickname
Full measure name
Abbreviated or nickname for the measure
Version of the measure
Bibliographic citation for the measure
If the measure is project derived, provide a general description
Description of how the measure was modified for use in the study, if applicable
Prepare the data file(s)
Study staff must remove all direct identifiers from the data files prior to submission of the dataset package to the Archive.
Direct identifiers include:
Names
Social security numbers
Phone numbers
Medical record number
Insurance card number
Highly specific geographic variables, such as street addresses, geo-coordinates, or census block information
Prospective data contributors can consult with NDACAN staff to determine how to identify and appropriately reduce the disclosure risk of problematic variables.
NDACAN will also conduct a disclosure risk review upon receipt of the data.
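As an illustration only, the removal of direct identifier columns from a delimited data file might be sketched in Python as follows. The column names and the sample record are hypothetical and are not taken from any NDACAN document; a real study would need its own list, and dropping columns does not by itself address indirect identifiers or other disclosure risks.

```python
import csv
import io

# Hypothetical column names; the real list depends on the study's data.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "medical_record_no", "street_address"}

def drop_direct_identifiers(src, dst, identifiers=DIRECT_IDENTIFIERS):
    """Copy a comma-delimited file from src to dst without identifier columns."""
    reader = csv.DictReader(src)
    # Keep only the columns whose names are not on the identifier list.
    kept = [c for c in reader.fieldnames if c.lower() not in identifiers]
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `kept` are silently skipped
    return kept

# Made-up example record, used only to show the behavior.
src = io.StringIO("case_id,name,ssn,age\n1,Jane Doe,000-00-0000,7\n")
dst = io.StringIO()
kept = drop_direct_identifiers(src, dst)
print(kept)
print(dst.getvalue())
```

A column-name list like this is only a starting point; consulting NDACAN staff, as described above, remains the way to handle variables that are problematic in less obvious ways.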
Prepare the data files continued
Each data file should contain the following required data file elements:
An unambiguous variable name that matches the name appearing in the codebook.
A descriptive variable label which is a textual description of the item, or a clear reference to its associated question in the data collection instrument.
A list of valid values and corresponding labels for categorical variables.
Missing or inapplicable data codes and their meanings.
Variable data type, such as numeric, character, or date.
Column specifications for each variable.
Decimal settings should reflect the data contained in each variable.
Note: When there are multiple data files and Codebooks, include a document that maps the data file to its respective Codebook document. 
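One way a contributor might check that the variable names in a data file match the names in the codebook is sketched below in Python. This is a minimal sketch, assuming a tab-delimited data file with a header row and a codebook whose variable names have already been extracted into a list; the variable names shown are hypothetical.

```python
import csv
import io

def compare_variables(data_file, codebook_names, delimiter="\t"):
    """Return (names only in the data file, names only in the codebook)."""
    reader = csv.reader(data_file, delimiter=delimiter)
    data_names = set(next(reader))  # the first row holds the variable names
    codebook = set(codebook_names)
    return sorted(data_names - codebook), sorted(codebook - data_names)

# Hypothetical tab-delimited data file and codebook variable list.
sample = io.StringIO("CASEID\tCHAGE\tCHSEX\n1\t7\t2\n")
undocumented, absent = compare_variables(sample, ["CASEID", "CHAGE", "CHRACE"])
print(undocumented)  # variables in the data file but not in the codebook
print(absent)        # variables in the codebook but not in the data file
```

Both lists should be empty before submission; any name that appears in either list points to a mismatch between the data file and its codebook.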
Prepare the data files continued
NDACAN accepts data in a variety of file formats. NDACAN currently can receive data files formatted for SPSS, Stata, and SAS and delimited text data files. 
If the data files are not in one of these file formats, please contact NDACAN to discuss what you have and to confirm that we will be able to directly open or convert the file to another format upon receipt.
Study Submission Form: Part III – Data File Characteristics
Part III of the Study Submission Form should be completed after the data file has been processed and prepared for archiving. 
Part III is a Microsoft Excel spreadsheet
Enter the following information into a numbered row in the spreadsheet for each data file that will be a part of the submission:
File name
Number of records
Number of variables
Number of records per case
Format, such as tab delimited, SPSS, SAS, or Stata
Indicate whether the following were performed:
consistency checks 
checks for undocumented codes
Indicate whether the data contain variable and value labels
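For a delimited text file, some of the values entered in Part III can be computed rather than counted by hand. The following Python sketch is an assumption about how that might be done, not an NDACAN tool; the file name, the dictionary keys, and the sample data are hypothetical, and it assumes a tab-delimited file with one header row.

```python
import csv
import io

def summarize_delimited(name, handle, delimiter="\t"):
    """Count records and variables in a delimited file with a header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    n_records = sum(1 for row in reader if row)  # blank lines are not counted
    return {
        "file_name": name,
        "records": n_records,
        "variables": len(header),
        "format": "tab delimited",
    }

# Hypothetical file contents: three variables and two records.
sample = io.StringIO("CASEID\tCHAGE\tCHSEX\n1\t7\t2\n2\t9\t1\n")
summary = summarize_delimited("study_child.tab", sample)
print(summary)
```

The remaining Part III items, such as the number of records per case and whether consistency checks were performed, depend on the study design and still need to be answered by study staff.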
Collate the dataset package for submission
Once prepared, the dataset package, including all documentation and data files, should be contained within a .zip folder prior to finalizing the upload arrangements with NDACAN.
Data contributors should check their dataset package against the NDACAN publication entitled, “NDACAN Archiving Checklist for Dataset Packages” found at the following web address: https://www.ndacan.acf.hhs.gov/contribute-data/contribute-application-process.cfm
Send an email to NDACANsupport@cornell.edu stating that the dataset package is ready for upload.
Submission of the assembled dataset package should happen no later than 8 months prior to the expiration of study funding. Dataset processing may not begin upon receipt of the dataset package, as other datasets may be in the queue ahead of the one being submitted. Depositing the dataset package well before funding expires allows study staff to be available to respond to questions during processing.
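Any standard compression tool can create the .zip folder. As one possible approach, the Python sketch below compresses every file under a package folder and then lists the archive contents so they can be compared against the checklist. The folder layout and file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_package(package_dir, zip_path):
    """Compress every file under package_dir and return the archived names."""
    package_dir = Path(package_dir)
    # zip_path should sit outside package_dir so the archive does not include itself.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=path.relative_to(package_dir).as_posix())
    # Reopen the archive to confirm what it actually contains.
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()

# Hypothetical package layout built in a temporary folder for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "package"
    (package / "data").mkdir(parents=True)
    (package / "codebook.txt").write_text("CASEID: case identifier\n")
    (package / "data" / "study.tab").write_text("CASEID\n1\n")
    names = zip_package(package, Path(tmp) / "package.zip")
    print(names)
```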
Transmit the dataset package
When NDACAN receives the announcement that the dataset package is ready for submission, staff will evaluate and choose one of the following file sharing methods for the data contributor to send the dataset package, based on what is known about the data at that time: 
Cornell Enterprise Box or Cornell Dropbox, which is also known as Cornell Secure File Transfer
More information about the security specifications for these file transfer systems can be found on page 6 of the web document entitled, “NDACAN Archiving Process and Steps” which is available for download at the following web address:
https://www.ndacan.acf.hhs.gov/contribute-data/contribute-application-process.cfm 
These are the same file sharing methods used to send the dataset package to secondary analysts.
Processing the deposited data
This next section of the video provides an overview of the steps that NDACAN takes, from the point when we receive the dataset package to processing and preparing it for release.
Receipt of the dataset package
Once the data and documentation files are received by NDACAN, they will be saved to a secure file server, the dataset will be logged into the internal tracking system, and given a unique numeric identifier known as a dataset number. 
NDACAN will process the dataset in the order in which it was received in the queue of datasets waiting to be processed.
It is important that study staff be available to respond to questions during processing and also to review the final NDACAN prepared dataset package prior to it being released to the child maltreatment research community.
Summary of dataset processing
The information presented in this slide is not intended to be all inclusive of the activities undertaken by Archive staff to process a dataset. The steps involved in processing a dataset vary by dataset and depend on how well a data contributor has addressed and incorporated the required elements of the documentation and data files prior to submission. What is presented here are the steps that most datasets will undergo.
NDACAN will conduct a disclosure review of each data file and work to reduce or eliminate disclosure risk by recoding or removing problematic variables in consultation with the data contributor. 
Archive staff will create a Section 508 compliant User’s Guide containing study level metadata about the original data collection effort, which will accompany the dataset upon release. The information supplied by the data contributor in the Study Submission Forms is used to populate the contents of the User’s Guide.
NDACAN will produce the final versions of the data files in the following formats:
SPSS native, which is represented as file extension .sav
SAS native which is represented as file extension .sas7bdat
Stata native which is represented as file extension .dta
Import program files for SPSS, SAS, and Stata which are represented as .sps, .sas, and .do & .dct respectively
Text data file which is represented as file extension .dat
Tab delimited data file which is represented as file extension .tab
Data contributor review
Once NDACAN has prepared the dataset package, it is sent to the data contributor for review.
Data contributors must have a signed Data Contributor’s Agreement on file. NDACAN prepares the document and sends it to the data contributor for signing. The document can be signed at any time during the archiving process but must be on file prior to the dataset’s release. 
Data contributor supplied edits will be vetted by Archive staff and incorporated into the final version of the dataset package.
Dataset Access Procedures and Release
This section discusses the two pathways that secondary analysts follow to access the data, and the process of announcing the dataset’s availability to the child maltreatment research community, also known as the “release.”
Dataset access procedures
Archive staff will assess the final version of the dataset to determine which data ordering access procedures are appropriate. The order procedures outlined below will determine the steps, eligibility criteria, and requirements that must be met before a prospective data user can receive the data.
Terms of Use Agreement – This data ordering access pathway is the least restrictive and provides access to the widest audience of researchers. The Terms of Use Agreement document can be found at the following web address:
https://www.ndacan.acf.hhs.gov/datasets/order_forms/TermsofUseAgreement.pdf
Restricted Access Data Licensing – This ordering procedure requires additional documentation before an analyst can gain access. Datasets requiring these additional measures are those with highly sensitive data, such as data on vulnerable populations, small-scale geographical data, or data with contractually required access restrictions. More information about the restricted access data license process and forms can be found at the following web address: 
https://www.ndacan.acf.hhs.gov/datasets/request-restricted-data.cfm
Release of the dataset
After the final version of the dataset package is reviewed by all stakeholders, the dataset will be released to the child maltreatment community. 
The dataset title, abstract, and User’s Guide are posted to the Datasets page of the NDACAN website along with instructions for how to procure the data.
The dataset’s availability is announced on the child-maltreatment-research-list-serve (CMRL) and on the NDACAN Twitter account.
Post Dataset Release: Ongoing Activities
After a dataset is made available for use by the child maltreatment research community, NDACAN’s work is not done. We promote the use of datasets, provide technical support, and actively track publications produced using datasets in our holdings.
Promote use of the dataset
The dataset will appear in the quarterly electronic newsletter entitled, “The NDACAN Updata”.
The newsletter is sent to over 6,000 NDACAN mailing list subscribers
NDACAN hosts an annual Summer Research Institute. The Institute is an online distance learning experience where successful applicants work on their proposed research projects using NDACAN datasets and receive technical assistance from Archive staff and consultants. The primary goals of the Institute are to increase utilization of NDACAN's holdings and to facilitate a secondary analysis project from which child abuse and neglect researchers can publish. 
NDACAN hosts periodic dataset specific webinars and trainings which are recorded and added to the website and NDACAN YouTube channel. The videos are designed to provide an introduction to a dataset or statistical procedure featuring a specific dataset in an effort to promote use of NDACAN data.
Technical support
NDACAN provides robust technical support for datasets contained within its holdings. 
There is a dedicated support email which is connected to a technical support ticket tracking system. 
The support email is NDACANsupport@cornell.edu 
Archive technical support serves as the first point of contact for dataset technical assistance requests
The User’s Guide instructs secondary analysts to send support requests to this email address.
If there is a question that Archive staff cannot answer using the supplied data documentation, then Archive staff will reach out to the study contact to request assistance in responding to the question.  
For the most popular datasets at the Archive, the number of times Archive staff need assistance from study staff is rather low, around 0-5 times a year. The highest volume of assistance is noted during the first year the dataset is released and then it trails off each year after.
Another way NDACAN provides technical support is with staff developed user support documents to assist data users working with the datasets. These documents are posted to the User Support page of our website.
Track publications
NDACAN maintains an online searchable digital citations management database called the child abuse and neglect Digital Library, or “canDL” (pronounced “candle”).
This database is where we store bibliographic citations for publications relating to our archived datasets.
Data users are required to submit their bibliographic citations for published work using our data. 
In addition to the submission requirement, NDACAN periodically reviews the published literature to identify citations for published works not previously captured in the canDL. A call for data users to send NDACAN their citations appears in each quarterly Updata newsletter.
The canDL is accessible to data users and contributors on the Publications page of our website at the following web address: https://www.ndacan.acf.hhs.gov/publications/publications.cfm
Conclusion
This concludes the substantive content portion of the video presentation entitled, “Archiving Quantitative Child Maltreatment Data.” 
Please submit questions about archiving data at NDACAN to NDACANsupport@cornell.edu 
Bibliography
This slide contains the resources cited during the video.
National Institutes of Health (2003, March 5). NIH Data Sharing Policy and Implementation Guidance. Retrieved May 10, 2019, from https://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#archive 
Piwowar H.A., Day R.S., & Fridsma D.B. (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. http://doi.org/10.1371/journal.pone.0000308  
The National Data Archive on Child Abuse and Neglect is a project of the Bronfenbrenner Center for Translational Research at Cornell University. Funding for NDACAN is provided by the Children’s Bureau