Research Data Management

Research Data Management (RDM) means taking care of your research data in an organized and responsible way throughout the entire data life cycle. Well-managed research data follows the FAIR Data Principles, meaning others should be able to understand your data and reproduce your results. Effective RDM helps your project meet legal, ethical, and funding requirements while ensuring that research outputs remain discoverable, reusable, trustworthy, citable, and protected for the long term.

Email the Data and Visualization Librarian, Siti Lei (siti.lei@dukekunshan.edu.cn), for support with RDM, data management plan (DMP) creation, and data deposit.

Visit the Office of Research Support (RSO) at DKU for support with research applications and funding opportunities.

What is ‘Research Data’?

Research data is the information collected, observed, generated, or created for the purpose of analysis and the production of original research results, together with any associated documentation, code, or scripts. Research data can be digital or analog, and includes both primary data created by the researcher(s) and secondary data obtained from other sources, such as census datasets or materials produced by other researchers.

Research data is diverse:

Raw/processed data (e.g., survey results, sensor readings, experimental measurements)
Text and documents (e.g., interview transcripts, fieldnotes, sketches, manuscripts)
Images, audio, video (e.g., medical images, photos of fieldwork, audio recordings)
Code and software (e.g., code scripts, models, algorithms for analysis)
Digital projects (e.g., website, StoryMap, 3D models, games)
Derived datasets (e.g., cleaned/aggregated data produced from raw data)

Why manage research data?

For yourself and your project:

Efficiency: RDM saves time and resources in the long term
Data Integrity: RDM ensures accuracy and reliability of your data
Transparency & Replication: RDM makes your research process clear and enables others to replicate your results
Preservation, Sharing, & Reuse: RDM supports long-term access and future use of your data

There are also practical reasons:

Journal Policies: many journals require authors to share supporting data and include a data sharing plan or data availability statement as part of their publication requirements
Funding Requirements: research funding agencies require clear data management plans in grant proposals to ensure that research data is managed in accordance with open science principles
Ethics Compliance: researchers are responsible for managing data ethically and responsibly

FAIR Data Principles

FAIR stands for Findable, Accessible, Interoperable, Reusable.

Findable: For data to be findable there must be sufficient metadata; there must be a unique and persistent identifier; and the data must be registered or indexed in a searchable resource.
Accessible: To be accessible, metadata and data should be readable by humans and by machines, and they must reside in a trusted repository.
Interoperable: Data must share a common structure, and metadata must use recognized, formal terminologies for description.
Reusable: Data and collections must have clear usage licenses and clear provenance, and meet relevant community standards for the domain.

* Refer to the National Library of Medicine, https://www.nlm.nih.gov/oet/ed/cde/tutorial/02-200.html

Data Life Cycle

Best practices for RDM involve the entire data lifecycle, from the start to the end of a project. The main stages include Create, Store, Use, Share, Archive, and Destroy, each governed by applicable policies, rules, laws, and regulations that ensure ethical and responsible data handling.

The lifecycle of research data does not end when a project concludes. Instead, researchers are responsible for guiding data through stages of long-term preservation and potential reuse, ensuring that data remain accessible, secure, and valuable beyond the original study.

* Image courtesy of the University of Virginia Library Research Data Services + Sciences, http://data.library.virginia.edu/data-management/lifecycle

Data Classification Levels

Classification Level	Sensitivity	Explanation	Storage Requirements	Examples
1	Non-confidential	Research data that can be accessed by the general public	Must be properly configured by DKU requirements	Research data that has been de-identified in accordance with applicable rules; published research; published information about Duke Kunshan University; public-facing websites
2	Benign information to be held confidentially	Research data that Duke Kunshan University has chosen to keep confidential but the disclosure of which would not harm the institution	Must be properly configured by DKU requirements	Unpublished research data; drafts of research papers; patent applications; work-in-progress papers
3	Sensitive, or confidential information	Research data that if disclosed could cause risk of material harm or legal liability to individuals or Duke Kunshan University	Must not be stored on personal devices unless such devices are encrypted according to DKU requirements	Research data containing personally identifiable information and not classified in Level 4; Duke Kunshan IDs when associated with information that could identify individuals; any personal data protected under Chinese laws and regulations and not classified in Level 4 or 5
4	Very sensitive information	Research data that would likely cause serious harm to individuals or Duke Kunshan University if disclosed	Data must be stored on the DKU Protected Network. Should not be transferred locally unless thoroughly anonymized and verified	Individually identifiable financial or medical information; information commonly used to establish identity that is protected by Chinese laws and regulations, and not classified in Level 5; individually identifiable genetic information that is not in Level 5; national security information; passwords and PINs that can be used to access confidential information
5	Extremely sensitive information	Research data that would cause severe harm to individuals or the University if disclosed	Data must be stored entirely on the DKU Protected Network at all times	Research data covered by a regulation or agreement that requires that it be stored or processed in a high security environment on the Duke Kunshan University Protected Network (DKUPN); certain individually identifiable medical records and genetic information, categorized as extremely sensitive

* Refer to Data Security and Storage by DKU Research Support Office.

Data Management Plan

A Data Management Plan (DMP) is a document that sets out how research data will be generated, managed, shared, and stored throughout a research project, from its start through to after its completion. Funding agencies, research institutions, and journals often require a DMP to ensure that data are well-organized, secure, and reusable. Researchers should manage research data in accordance with the DMP to ensure responsible stewardship and future reuse.

The Office of Research Support at DKU provides Data Management Plans guidance to assist researchers in creating an effective DMP.

Support

Recommended tools for creating a DMP:

DMP Tool developed by the University of California
DMP Online developed by the Digital Curation Centre in the UK
DMP Assistant developed by the Digital Research Alliance of Canada

For assistance with developing a DMP, contact the Data and Visualization Librarian, Siti Lei (siti.lei@dukekunshan.edu.cn).

Submit your completed DMP to the Office of Research Support (research-support@dukekunshan.edu.cn).

Research Group Procedures

Research group procedures (aka ‘lab procedures’, ‘standard operating procedures’) set expectations for working in collaborative research environments. They vary by group but typically cover policies (e.g., data ownership, confidentiality), workflows (e.g., file naming, version control), roles and responsibilities, use of space and equipment, approved tools and software, and general research and data management practices.

They differ from a DMP as they define how collaboration and data stewardship are organized across multiple projects, while a DMP is project-specific, detailing how data will be collected, stored, shared, and preserved in alignment with those procedures.

Onboarding & Offboarding Procedures

Establishing and documenting onboarding and offboarding procedures is essential for all research groups and collaborative projects. These procedures should include clear actions related to research data to standardize knowledge transfer and ensure that all team members have appropriate access to information, systems, and files. Effective procedures help reduce the risk of data loss or mishandling and ensure compliance with institutional and data security standards.

Onboarding procedures for research data may include:

Reviewing relevant policies, procedures, and documentation
Reviewing data management expectations and best practices
Reviewing available tools and resources for data storage and collaboration
Reviewing or creating data workflows for the project
Clarifying roles and responsibilities for data management

Offboarding procedures for research data may include:

Transferring ownership of files and shared drives
Updating documentation and metadata for data files
Selecting files for retention, archiving, or secure deletion
Removing permissions and access to systems, drives, and repositories

Consent & Ethics

Ethical responsibility is essential to research. Consent and ethics safeguard participants, ensure that data is collected accurately and lawfully, and foster trust in research outcomes. At DKU, researchers should plan for consent and ethical approval before beginning data collection to ensure compliance with institutional policies, Chinese regulations, and responsible data management practices.

Things to consider before data collection:

Informed Consent: Participants should be clearly informed about what data will be collected, how it will be used, and who will have access. Their consent must be voluntary, documented (signed or digital), and allow withdrawal at any time.
Ethical Approval: Ethics protocols cover security, access and retention for human data. Research projects at DKU involving human participants require review and approval from the Research Support Office (RSO).

Contact the Institutional Review Board (IRB) to have your research’s ethical protocols reviewed and approved.

Folder & File Organization

Folder

Instead of storing data and files in default computer locations (e.g., Desktop or Downloads), you should create separate folders to organize them by category.

Your folder directory structure should prioritize clarity and easy discoverability. Keep it simple – limit the structure to no more than 4 levels and 10 or fewer subfolders within each level.
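
The depth and breadth limits above can be checked automatically. The following is a minimal sketch, not an official DKU tool: the function name check_structure and its default limits (4 levels, 10 subfolders) are illustrative choices taken from the guideline above, and the project root itself is counted as level 0.

```python
import os
from pathlib import Path


def check_structure(root, max_depth=4, max_subfolders=10):
    """Return a list of messages for folders that break either limit."""
    root = Path(root)
    problems = []
    for current, subdirs, _files in os.walk(root):
        relative = Path(current).relative_to(root)
        depth = len(relative.parts)  # 0 for the project root itself
        if depth > max_depth:
            problems.append(f"too deep ({depth} levels): {relative.as_posix()}")
        if len(subdirs) > max_subfolders:
            problems.append(
                f"too many subfolders ({len(subdirs)}): {relative.as_posix()}"
            )
    return problems
```

Calling check_structure on your project folder returns an empty list when the structure stays within both limits; otherwise each message names the folder that needs attention.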

File

Organizing your research files in a clear and consistent way makes your data easier to understand, share, and keep safe for the long term. It also saves you time when you need to find or reuse your files later. A good system should be descriptive, well-structured, and used consistently, with clear documentation explaining how the data was created, collected, and processed, as well as any information needed to help others interpret and reuse it accurately.

Examples of data documentation include:

README files – describe the file organization and naming system
Codebooks – explain attributes/codes and their meanings
Data dictionaries – define variables and fields
Scripts – record data processing and analysis steps
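
As an illustration of how a data dictionary might be started, the sketch below drafts one from a CSV file's header row. It is only a starting point under stated assumptions: the function names, the three type labels, and the sample column names in the usage note are hypothetical, and the description field is left empty for you to complete by hand.

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'number', or 'text' from the non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text):
    """Return one entry per column: name, guessed type, and missing count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for field in reader.fieldnames or []:
        values = [(row.get(field) or "") for row in rows]
        entries.append({
            "variable": field,
            "type": infer_type(values),
            "missing": sum(1 for v in values if not v.strip()),
            "description": "",  # to be written by the researcher
        })
    return entries
```

For example, passing the text "participant_id,age,response\nP001,34,agree\nP002,,disagree\n" yields three entries, with age guessed as an integer that has one missing value. The guessed types are a convenience only and should be reviewed before the dictionary is shared.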

Recommended tools for creating data documentation:

README Template by Duke University Library
README Template by the Library of the École de Technologie Supérieure (ETS)
README Generator by McMaster University
Codebook example from the National Household Survey, 2011 (Canada)
Codebook Creation Tools:
- Nesstar Publisher – Download
- Colectica – Download
- DDIEditor – Download

Folder & File Naming

Tips for naming folders and files:

Keep names short (under 32 characters)
Name files differently from folders
Use alphanumeric characters (avoid special characters such as & , * % # ( ) ! @ $ ^ ~ ‘ { } [ ] ? < > –)
Use CamelCase or underscores instead of periods or spaces
Use date format ISO 8601: YYYYMMDD
Use meaningful and unique names
Use leading zeros (for a sequence of numbers from 1 to 100: 001-100)
Use version labels if needed (e.g., v1, v001, or v1_1 rather than “final2” or “revised”), or use a version control system such as Git
Be consistent!

An example of a filename convention:

YYYYMMDD_ContentDescription_Version.ext
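
The convention above can be applied and checked with a short script. This is a minimal sketch under assumptions that are not part of the guide: the helper names build_filename and is_valid_filename are hypothetical, versions are written as a three-digit label such as v002, and extensions are lowercase.

```python
import re
from datetime import date

# Assumed shape: YYYYMMDD_ContentDescription_vNNN.ext
FILENAME_PATTERN = re.compile(r"\d{8}_[A-Za-z0-9]+_v\d{3}\.[a-z0-9]+")


def build_filename(day, description, version, ext):
    """Build a name from a date, a free-text description, and a version."""
    # Keep only alphanumeric words and join them in CamelCase.
    words = re.findall(r"[A-Za-z0-9]+", description)
    camel = "".join(word[:1].upper() + word[1:] for word in words)
    return f"{day:%Y%m%d}_{camel}_v{version:03d}.{ext.lower().lstrip('.')}"


def is_valid_filename(name):
    """Check the assumed pattern and the 'under 32 characters' tip."""
    return len(name) < 32 and FILENAME_PATTERN.fullmatch(name) is not None
```

For example, build_filename(date(2025, 10, 13), "interview data", 2, "csv") produces 20251013_InterviewData_v002.csv, which is_valid_filename accepts, while a name such as final2.csv is rejected.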

File Format

File formats are important for long-term data preservation and accessibility. Whenever possible, use open, non-proprietary, and widely supported formats (e.g., CSV, TXT, TIFF, or XML) rather than proprietary ones (e.g., Excel .xlsx, SPSS .sav, or Photoshop .psd) that may require specific software to open.

Open formats increase the likelihood that your data can be accessed, shared, and reused in the future. It is also helpful to document the file formats used in your project and explain any software dependencies.

When proprietary formats are unavoidable, consider saving an additional copy in an open or standardized format for preservation.
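
One way to document the formats used in a project is to build a simple inventory of file extensions. The sketch below is illustrative only: the two extension sets contain just the examples named in this section (plus .tif, added as an assumption for TIFF files), and the function name format_inventory is hypothetical.

```python
from collections import Counter
from pathlib import PurePath

# Only the examples named in this guide; extend these sets for your project.
OPEN_FORMATS = {".csv", ".txt", ".tiff", ".tif", ".xml"}
PROPRIETARY_FORMATS = {".xlsx", ".sav", ".psd"}


def classify(extension):
    """Label an extension as open, proprietary, or unclassified."""
    if extension in OPEN_FORMATS:
        return "open"
    if extension in PROPRIETARY_FORMATS:
        return "proprietary"
    return "unclassified"


def format_inventory(filenames):
    """Count files per extension and label each extension."""
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    return {
        extension: {"count": count, "status": classify(extension)}
        for extension, count in counts.items()
    }
```

Extensions reported as proprietary are candidates for an additional open-format copy, and those reported as unclassified need a manual decision and a note about any software required to open them.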

Version Control

Version control helps you track changes to your data, documents, and code over time, ensuring that earlier versions can be recovered if needed. Clear version control practices help facilitate accuracy, reproducibility, and accountability throughout the research process.

When designing a file naming system, consider including version numbers or dates in filenames (e.g., 20251013_InterviewData_v002.csv), and maintain a change log or brief note describing what was modified in each version.
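
A version label and a change log line can be produced together so that the two stay in step. The following is a rough sketch, assuming filenames end in a label such as _v002 directly before the extension; the function names and the " | " separator used in the log line are illustrative choices, not a required format.

```python
import re
from datetime import date

# Matches a version label such as _v002 directly before the file extension.
VERSION_LABEL = re.compile(r"_v(\d+)(?=\.[^.]+$)")


def bump_version(filename):
    """Return the filename with its version number increased by one."""
    match = VERSION_LABEL.search(filename)
    if match is None:
        raise ValueError(f"no version label found in {filename!r}")
    digits = match.group(1)
    # Preserve the zero-padding width of the existing label.
    new_label = f"_v{int(digits) + 1:0{len(digits)}d}"
    return filename[:match.start()] + new_label + filename[match.end():]


def changelog_entry(filename, note, day):
    """Format one line for a plain-text change log."""
    return f"{day:%Y%m%d} | {filename} | {note}"
```

For example, bump_version("20251013_InterviewData_v002.csv") returns 20251013_InterviewData_v003.csv, and changelog_entry can then record what changed in that version.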

Backup & Storage

Tips for data backup:

Back up after a major edit or alteration (not after every save)
Create backup copies for:
- Things you cannot replicate
- Things that would be difficult or take a lot of effort or resources to recreate
Have multiple copies, saved in different places
- 3-2-1 rule (3 copies, 2 types of storage, 1 of which is offsite)
Automate your backup process when possible
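
The 3-2-1 rule above can be expressed as a small check on a list of backup copies. This is a sketch for illustration, assuming you describe each copy yourself; the BackupCopy fields and the storage type labels in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str      # where the copy lives, in your own words
    storage_type: str  # e.g. "internal drive", "external drive", "cloud"
    offsite: bool      # True if the copy is kept away from the main site


def check_3_2_1(copies):
    """Report which parts of the 3-2-1 rule the listed copies satisfy."""
    return {
        "three_copies": len(copies) >= 3,
        "two_storage_types": len({c.storage_type for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }
```

A plan with a laptop copy, an external drive copy, and an offsite cloud copy passes all three checks, while removing the cloud copy fails both the copy count and the offsite check.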

Consider using the following storage options:

Personal computer hard drive
External hard drives (with provisions)
Departmental servers (if available)
Cloud storage (if appropriate)

Support

For assistance with additional storage space for research data, contact the Office of Information Technology (IT) at DKU.

Security

Understanding the risks associated with your data can help you adequately protect it. This is important for:

Supporting integrity of your research data (e.g., no unauthorized modifications)
Protecting against data loss or intellectual property theft
Protecting confidential/sensitive data from unauthorized access
Ensuring compliance with sponsor or partnership agreements

Check out DKU Data Security and Storage for guidance on identifying the classification level and storage requirements of your data based on its sensitivity, confidentiality, and relevance to human subjects.

Sensitive, Confidential, and Human Data

Sensitive Data

Sensitive data refers to information that, if disclosed, could cause harm to individuals, organizations, national security, or society.

Tips for managing sensitive data:

Encrypt data that is stored outside a secure server environment
Encryption is optional for data stored inside a secure server environment
Maintain a clear access log or audit trail of who opens or edits the data
Document storage locations and sensitivity levels in a metadata or README file
Review permissions and security settings periodically
Permanently destroy all copies after the official retention period using approved deletion tools

Confidential Data

Confidential data refers to any information that is subject to legal or contractual obligations to be kept private, or that is restricted to authorized individuals or parties entrusted with safeguarding it from unauthorized access, misuse, disclosure, modification, loss, or theft.

Tips for managing confidential data:

Store identifiable information (like names or IDs) separately from research files in an encrypted folder
Keep an inventory showing where personal data are stored and who has access
Use consistent file naming to indicate anonymized or restricted content
Limit access to authorized team members only and update permissions regularly
Follow the institution’s retention schedule and securely delete files when no longer needed
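
The first tip above, keeping identifiable information apart from research files, can be sketched as a split into two tables linked by a participant code. This is a simplified illustration under assumptions: the function name, the P001-style codes, and the field names in the usage note are hypothetical, and the key table would still need to be stored in an encrypted folder. Removing direct identifiers does not by itself guarantee that the remaining data are anonymous.

```python
def separate_identifiers(rows, identifying_fields, code_field="participant_code"):
    """Split records into a research table and a separate key table."""
    research_table = []
    key_table = []
    for index, row in enumerate(rows, start=1):
        code = f"P{index:03d}"  # sequential code that replaces the identifiers
        key_table.append(
            {code_field: code, **{f: row[f] for f in identifying_fields}}
        )
        research_table.append(
            {code_field: code,
             **{k: v for k, v in row.items() if k not in identifying_fields}}
        )
    return research_table, key_table
```

Given records with name, student_id, and score fields, calling separate_identifiers(rows, ["name", "student_id"]) returns a research table containing only the code and score, and a key table that maps each code back to the name and ID.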

Human Data

Human data refers to information obtained from or about individuals, communities, and groups. Human data may be considered sensitive and/or confidential and may be subject to specific ethical, legal, and contractual obligations.

Tips for managing human data:

Store consent forms, ethics approvals, and related documentation in a labeled folder
Organize transcripts, notes, and recordings using a structured and consistent folder hierarchy
Keep de-identification notes describing which personal details were removed or replaced
Use version control to record updates or cleaning steps for qualitative materials
Archive only anonymized or aggregated human data in repositories after the project ends

Finding Data Sources

In addition to the data you create yourself, you can explore the following sources to find secondary or third-party datasets:

Work with Secondary Data – guidelines for reviewing terms of use, copyright, and citation requirements when accessing databases and using secondary data or datasets.
Data Resource Search Tool – this tool provides access to both licensed (proxy-based) and open databases, helping the DKU community discover a wide range of data resources.
Data Availability Statements – many academic journals include data availability statements that describe how to access the datasets associated with a published article.

Processing & Analyzing Data

DKU Library’s data and visualization services provide workshops, software tutorials, and resource guides to support data processing and analysis.

Accessing Tools & Software

Check the Office of Information Technology (OIT) and DKUL’s Tools and Software for available resources.

If you plan to use a campus computer, check out Public Devices for information.

Use of AI

When using Artificial Intelligence (AI) to process or analyze research data, researchers must apply strict ethical, legal, and security safeguards. Appropriate consent, privacy protection, and institutional approval are mandatory before working with AI.

Be cautious about uploading research data to open AI tools. Doing so risks exposing unpublished or confidential information, and the data may be retained and used to train the AI model without the researcher’s permission. Sensitive and personal data should never be uploaded to or exposed through such AI tools.

Check out DKU AI Literacy: Policies & Guidelines for more information.

After the Research Project

You’ve completed your project! Now what should you do with all the data?

Retention: Intentionally keeping data after a project is completed. Reasons for data retention may include:
- Meeting funder or institutional requirements
- Supporting your research if it is ever questioned
- Allowing for further or follow-up analysis in the future
- Preserving unique or irreplaceable data
Preservation: A set of managed activities that ensure your data remains stable, usable, and accessible for as long as needed.
Sharing: Making your data available to others for validation, reuse, or future research.

Note: Before sharing your research data with others or depositing it in a repository, check with the Office of Research Support (RSO) to ensure that your submission is safe and complies with national laws and policies. In some cases, you may be required to de-identify or destroy your data, especially when it involves sensitive or confidential information.

Law & Policy Compliance (China & U.S.)

As researchers at DKU, individuals are responsible for understanding and complying with the national laws and institutional policies of both China and the U.S. that regulate research data management and cross-border data exchange.

China Laws

The Cybersecurity Law of the People’s Republic of China (CSL) is a foundational regulation that ensures network security, protects national sovereignty in cyberspace, and safeguards the rights of citizens, organizations, and the public interest. It promotes the secure development of China’s digital economy by establishing systems for network security classification, user information protection, and critical information infrastructure management.

The Personal Information Protection Law of the People’s Republic of China (PIPL) is a special law that aims to protect the rights and interests of personal information, standardize personal information processing activities, and promote the rational use of personal information. It designs a system for the entire process of personal information processing, puts forward strict requirements for the protection of sensitive information and cross-border provision of personal information, and clarifies the rights of individuals and the obligations of processors.

The Data Security Law of the People’s Republic of China (DSL) is a fundamental law governing data processing and protection in China. It aims to ensure data security, promote lawful data use, and safeguard national sovereignty and public interests. The law introduces systems for data classification, risk assessment, security review, and incident handling to protect the rights of individuals and organizations while supporting secure data development and utilization.

The Provisions on Facilitating and Promoting Cross-Border Data Flow aim to balance data security with international data exchange. They establish guidelines for data classification, requiring critical data to be stored domestically while allowing non-sensitive data to flow across borders. Companies must conduct risk assessments, implement security measures, and obtain user consent for data transfers. The provisions encourage international cooperation, streamline compliance procedures, and promote data-driven innovation. They also emphasize protecting personal information and ensuring transparency in data processing. Overall, the framework seeks to foster global digital trade while safeguarding national security and individual privacy.

U.S. Laws

The Common Rule (45 CFR 46) is the main U.S. regulation governing research involving human subjects. It protects participants’ rights, welfare, and privacy through ethical standards for data collection, storage, and use. The rule requires Institutional Review Board (IRB) approval and informed consent, ensuring that identifiable and sensitive data are handled responsibly and securely in federally funded research.

The Data Management and Sharing Policy by the National Institutes of Health (NIH) sets national standards for managing, preserving, and sharing research data. It requires all NIH-funded researchers to submit a Data Management and Sharing Plan (DMSP) outlining how data will be documented, protected, and made accessible. The policy promotes transparency, reproducibility, and alignment with open science and FAIR data principles.

The Federal Information Security Modernization Act (FISMA) mandates strict security standards for information systems managed by federal agencies and contractors. It establishes a framework for protecting data confidentiality, integrity, and availability through risk assessments, access controls, and regular monitoring. FISMA ensures that research projects involving federal data or funding comply with federal cybersecurity and data protection requirements.

Data Repository

Data repositories are online platforms that store, organize, and preserve datasets, often making them available for sharing and reuse. They are widely used by research communities to share and discover data.

There are three main types of data repositories:

Disciplinary repositories focus on a particular area of research or type of data. They often have requirements for data formats, documentation, and metadata. You can find disciplinary data repositories by checking in with your peers, reviewing relevant journals for recommendations, or reviewing re3data, a registry of research data repositories.
Multidisciplinary/generalist repositories are not focused on a particular field and typically accept all types of data. Some examples of multidisciplinary repositories include FRDR, Dryad, Zenodo, and figshare.
Institutional data repositories are generalist repositories provided by a specific institution. Duke Research Data Repository is Duke University’s institutional data repository. It accepts research data from research conducted at or under the auspices of Duke University.

Understanding copyright and licensing helps define how your data can be shared and reused. Always include a license statement in your metadata or README to clarify permissions and restrictions for your users.

Copyright Ownership: In most cases, the creator or principal investigator (PI) holds copyright to the data they produce, unless otherwise stated in a grant, institutional, or collaborative agreement.
Secondary Data: If your project includes data collected or created by others, review and respect the original license or terms of use before redistributing or modifying it.
Licensing: When sharing your own data, attach a clear license that specifies how others may use it.
- Open licenses such as Creative Commons (e.g., CC BY, CC BY-NC) or Open Data Commons (e.g., ODC-BY, ODbL) allow others to reuse your data under defined conditions.
Institutional & Funder Policies: Some funders or institutions may have requirements about data ownership, rights retention, or preferred licensing models. Check these before publishing or depositing your dataset.
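
A license statement for a README can be generated from a few fields so that it is worded the same way across datasets. This is a minimal sketch: the function name and the four-line layout are illustrative rather than a required template, and the title and creator in the usage note are placeholders.

```python
def license_statement(title, creator, year, license_name, license_url):
    """Build a short plain-text license section for a README file."""
    return "\n".join([
        f"Dataset: {title}",
        f"Creator: {creator} ({year})",
        f"License: {license_name}",
        f"License terms: {license_url}",
    ])
```

For example, license_statement("Example Survey Dataset", "Example Researcher", 2025, "CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/") returns four lines that can be pasted into a README. Confirm the license choice against any funder or institutional requirements before publishing.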


© 2026 Duke Kunshan University 苏ICP备16021093号