A README file (the name is often written in all capital letters to draw attention) makes data more understandable and reusable, whether by your future self or by others. In data sharing, a README is typically a plain-text or Markdown file that briefly describes the organization of everything else included, much like the table of contents in a book. A good README's strength lies in describing the relationships among files, and it can be the best place to record the details needed to reproduce a scientific analysis.
Here is an example of a README from a Figshare submission:
Setting up a README early in a project can help capture key information about datasets as they are being created, especially if many collaborators are involved in the collection and analysis processes.
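One way to start a README early is to generate a skeleton from the files that already exist and fill in the descriptions as the project grows. The following is a minimal sketch, not a prescribed workflow: the function name `build_readme_skeleton`, the plain-text layout, and the `TODO` placeholder wording are all assumptions made for illustration.

```python
from pathlib import Path


def build_readme_skeleton(data_dir: Path, title: str) -> str:
    """Return plain-text README content listing every file under data_dir."""
    # Underline the title so the result reads well as plain text or Markdown.
    lines = [title, "=" * len(title), "", "Files", "-----"]
    # Walk the directory tree and list files in a stable, sorted order.
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        # The description is left as a placeholder for a person to fill in.
        lines.append(f"{rel}: TODO describe contents and how it relates to other files")
    return "\n".join(lines) + "\n"
```

Calling `build_readme_skeleton(Path("my_dataset"), "My dataset")` returns text that can be saved as `README.txt` and edited by hand. The script only records which files exist; the relationships among them still have to be written by someone who knows the project.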
A data dictionary, also called a codebook, differs from a README file in that it focuses on defining the variables within tabular data. These variable definitions may include the data type of each variable as it is understood by the programming language used in data processing, a context in which "data dictionary" can have a more sharply defined technical meaning. Depending on the scope and complexity of the associated scientific data, a data dictionary may overlap with a README or replace it. In either case, the underlying purpose is to facilitate reproducibility in research, saving your future self and others from unnecessary deciphering effort.
Variable name | Data type | Data format | Description |
---|---|---|---|
Name | text | Last Name, First Initial. Middle Initial. | Survey responder's name |
DOB | date/time | YYYY/MM/DD | Survey responder's date of birth |
SSN | text | XXX-XX-XXXX | Survey responder's social security number |
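A data dictionary like the one above can also be written in machine-readable form, so that incoming records are checked against the documented formats. The sketch below is one possible approach, not a standard: the names `DATA_DICTIONARY` and `validate_record` are hypothetical, and the regular expressions are one interpretation of the "Data format" column. Values are checked as strings, as they would arrive from a CSV file.

```python
import re
from datetime import datetime

# Hypothetical machine-readable form of the data dictionary; the regular
# expressions are one interpretation of the "Data format" column.
DATA_DICTIONARY = {
    "Name": {
        "pattern": r"[^,]+, [A-Z]\. [A-Z]\.",
        "description": "Survey responder's name",
    },
    "DOB": {
        "pattern": r"\d{4}/\d{2}/\d{2}",
        "date_format": "%Y/%m/%d",
        "description": "Survey responder's date of birth",
    },
    "SSN": {
        "pattern": r"\d{3}-\d{2}-\d{4}",
        "description": "Survey responder's social security number",
    },
}


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
            continue
        # The whole value must match the documented format.
        if not re.fullmatch(spec["pattern"], value):
            problems.append(f"{name}: {value!r} does not match the documented format")
            continue
        # A well-formed date string can still name a day that does not exist.
        if "date_format" in spec:
            try:
                datetime.strptime(value, spec["date_format"])
            except ValueError:
                problems.append(f"{name}: {value!r} is not a valid calendar date")
    return problems
```

For a record such as `{"Name": "Doe, J. Q.", "DOB": "1990/02/28", "SSN": "123-45-6789"}` the function returns an empty list, while a date like `1990/02/30` is reported as invalid. Keeping the descriptions next to the validation rules means the documentation and the checks are less likely to drift apart.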