What is Data?

Data is factual information created by people and recorded in symbolic or visual forms, such as text, numbers, images, measurements, statistics, or observations. Data is often described as “pieces of information”. Data provides the foundational material for reasoning, discussion, decision-making, and scientific or computational analysis.

Data Types

Qualitative

Qualitative (also known as categorical) data describes qualities or characteristics and is not measured numerically. It is often represented using words, images, or symbols, and can be divided into nominal and ordinal types:

Nominal data is categorical information that consists of unordered categories. Examples include colors, gender, types of cuisine, and country names.
Ordinal data is categorical information with a meaningful order or ranking. Examples include risk levels (low, medium, high) and satisfaction ratings.
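
The practical difference between nominal and ordinal data shows up when sorting or comparing categories. The sketch below is only an illustration with invented values: the names `cuisines`, `risk_levels`, and `RISK_RANK` are hypothetical, and the rank mapping is one simple way to encode an order, not the only one.

```python
from collections import Counter

# Hypothetical example values; none of these come from a real dataset.
cuisines = ["Thai", "Italian", "Thai", "Mexican"]   # nominal: no inherent order
risk_levels = ["high", "low", "medium", "low"]      # ordinal: low < medium < high

# For nominal data, counting categories is meaningful; ranking them is not.
cuisine_counts = Counter(cuisines)

# Ordinal categories need an explicit rank, because alphabetical order
# ("high" < "low" < "medium") does not match the meaningful order.
RISK_RANK = {"low": 0, "medium": 1, "high": 2}

sorted_by_rank = sorted(risk_levels, key=RISK_RANK.get)
highest_risk = max(risk_levels, key=RISK_RANK.get)
```

Sorting `risk_levels` without the `key` argument would order the labels alphabetically, which is why ordinal variables are usually stored together with their level order.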

Quantitative

Quantitative (also known as numerical) data represents measurable quantities and is typically expressed in numbers. It lends itself to statistical analysis and is often further divided into discrete and continuous types:

Discrete data consists of countable values that can take only specific, separate values. Examples include the number of buildings and population counts.
Continuous data consists of measurable values that can take any value within a given range. Examples include elevation, temperature, and distance.
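
In code, discrete values are commonly stored as integers and continuous values as floating-point numbers. The following is a minimal sketch with invented numbers; `building_counts` and `elevations_m` are hypothetical names chosen for this illustration.

```python
from statistics import mean

# Hypothetical values, invented for illustration.
building_counts = [12, 7, 30]       # discrete: whole-number counts
elevations_m = [3.2, 4.75, 11.0]    # continuous: any value within a range

# Counts can be checked for being whole numbers.
all_counts_are_integers = all(isinstance(n, int) for n in building_counts)

total_buildings = sum(building_counts)

# A summary statistic of continuous measurements is itself continuous.
mean_elevation_m = mean(elevations_m)
```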

Primary Data

Primary (also known as raw) data is firsthand data collected by the researcher(s). It is obtained directly through original research methods such as interviews, surveys, and experiments.

Visit this page for guidance on working with primary data.

Secondary Data

Secondary data refers to existing data collected by others, often produced by government institutions, healthcare facilities, or other organizations as part of their routine record-keeping. Researchers extract and repurpose this information from various data files to conduct new analyses or studies.

Visit this page for guidance on working with secondary data.

Key Concepts

Dataset

A dataset is a structured collection of related data, often organized in rows and columns (like a spreadsheet or table), typically used for analysis or research.

Database

A database is a structured system for storing, organizing, and managing large volumes of data, typically allowing efficient retrieval, updating, and querying. It can contain multiple datasets and is usually managed with a database management system (DBMS).
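
Storing, updating, and querying can be sketched with Python's built-in `sqlite3` module and a temporary in-memory database. The `buildings` table and its rows are hypothetical and exist only for this illustration; a real project would use whichever DBMS it has available.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Storing: create a table and insert a few invented records.
conn.execute("CREATE TABLE buildings (name TEXT, floors INTEGER)")
conn.executemany(
    "INSERT INTO buildings (name, floors) VALUES (?, ?)",
    [("Library", 4), ("Lab", 2), ("Dorm", 6)],
)

# Updating: correct one record in place.
conn.execute("UPDATE buildings SET floors = 3 WHERE name = ?", ("Lab",))

# Querying: retrieve only the rows that match a condition.
tall_buildings = conn.execute(
    "SELECT name FROM buildings WHERE floors >= 4 ORDER BY name"
).fetchall()
lab_floors = conn.execute(
    "SELECT floors FROM buildings WHERE name = ?", ("Lab",)
).fetchone()[0]

conn.close()
```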

Variable

A variable is a measurable attribute or characteristic that can take on different values across individuals, observations, or records in a dataset. Variables are typically represented as columns in a data table and can be qualitative or quantitative.
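
To make the row-and-column picture concrete, here is a small sketch that reads a tiny table with Python's `csv` module. The table contents and the column names `respondent_id`, `age`, and `satisfaction` are invented for illustration.

```python
import csv
import io

# A hypothetical three-row dataset in CSV form (values are invented).
raw_table = """respondent_id,age,satisfaction
1,34,high
2,27,medium
3,45,low
"""

# Each row is one observation; each column is one variable.
rows = list(csv.DictReader(io.StringIO(raw_table)))
variables = list(rows[0].keys())

# "age" is a quantitative variable, "satisfaction" a qualitative (ordinal) one.
ages = [int(row["age"]) for row in rows]
satisfaction_values = [row["satisfaction"] for row in rows]
```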

Microdata

Microdata is data that provides information at the individual level, such as person- or household-level responses in surveys or censuses. Each row typically represents one unit, and each column represents a variable.

Macrodata

Macrodata is data that represents aggregated or summarized information at a group, institutional, or national level rather than at the individual level. It is commonly used in economics, policy analysis, and official statistics.
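
Macrodata is often produced by aggregating microdata. The sketch below assumes a hypothetical household-level table (all values invented) and summarizes it by region. The field names and the choice of summary statistics are illustrative only.

```python
from collections import defaultdict

# Hypothetical microdata: one row per household (values are invented).
households = [
    {"household_id": 1, "region": "North", "members": 3},
    {"household_id": 2, "region": "North", "members": 5},
    {"household_id": 3, "region": "South", "members": 2},
    {"household_id": 4, "region": "South", "members": 4},
    {"household_id": 5, "region": "South", "members": 3},
]

# Group household sizes by region.
members_by_region = defaultdict(list)
for row in households:
    members_by_region[row["region"]].append(row["members"])

# Macrodata: one summary row per region instead of one row per household.
regional_summary = {
    region: {
        "households": len(sizes),
        "total_members": sum(sizes),
        "mean_household_size": sum(sizes) / len(sizes),
    }
    for region, sizes in members_by_region.items()
}
```

The summary no longer identifies any single household, which is one reason aggregated tables are often easier to share than the microdata behind them.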

Metadata

Metadata is information about data that describes the content, structure, origin, and context of a dataset. It explains how the data was collected, what each variable represents, and how the data can be interpreted, reused, or cited appropriately.
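
One common form of metadata is a data dictionary (codebook) that documents each variable. The sketch below is a hypothetical example, not a standard schema: the keys, the survey title, and the column names are all invented. It shows how such a record can be used to spot columns that lack documentation.

```python
# Hypothetical data dictionary for an invented survey dataset.
metadata = {
    "title": "Example campus satisfaction survey",
    "collection_method": "online questionnaire",
    "variables": {
        "respondent_id": {"type": "identifier",
                          "description": "Anonymous respondent number"},
        "age": {"type": "quantitative", "unit": "years"},
        "satisfaction": {"type": "qualitative (ordinal)",
                         "levels": ["low", "medium", "high"]},
    },
}

# Column names as they appear in the (hypothetical) data file.
dataset_columns = ["respondent_id", "age", "satisfaction", "campus"]

# Columns with no metadata entry cannot be interpreted reliably.
undocumented = [col for col in dataset_columns
                if col not in metadata["variables"]]
```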

DKU Support

For data-related support, contact the Data and Visualization Librarian, Siti Lei (siti.lei@dukekunshan.edu.cn).