LOVE YOUR DATA Day 2 – Organizing your data

Post by Tiffany Grant PhD, Research Informationist based at the Donald C. Harrison Health Sciences Library

Organizing Data

When you’re generating data at a rapid pace, it can be easy to label files with names that seem good at the time but will mean very little to you later. This practice may save time in the present, but it will ultimately lead to great frustration when finding those exact files seems nearly impossible.

A good practice for data organization is to give your files meaningful, descriptive names while avoiding overly long ones. A file name should allow you to identify a precise experiment from the name alone.

How meaningful are the following file names?

  1. Test_data_2013
  2. Project_Data
  3. Design for project.doc
  4. Lab_work_Eric
  5. Second_test
  6. Meeting Notes Oct 23

Now, note the difference in the following file names:

  1. 20130503_DOEProject_DesignDocument_Smith_v2-01.docx
  2. 20130709_DOEProject_MasterData_Jones_v1-00.xlsx
  3. 20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
  4. 20130825_DOEProject_Ex1Test1_Documentation_Gonzalez_v3-03.xlsx
  5. 20131002_DOEProject_Ex1Test2_Data_Gonzalez_v1-01.xlsx
  6. 20141023_DOEProject_ProjectMeetingNotes_Kramer_v1-00.docx
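
The names in the second list all follow the same pattern: date, project, description, author, version, then the extension. As a minimal sketch, that pattern could be assembled in Python as shown here. The function name build_file_name and its parameters are hypothetical, chosen only for illustration, and the two-digit minor version is inferred from the examples above.

```python
from datetime import date


def build_file_name(day, project, description, author, major, minor, extension):
    """Assemble a name such as 20130503_DOEProject_DesignDocument_Smith_v2-01.docx."""
    stamp = day.strftime("%Y%m%d")      # YYYYMMDD, so names sort chronologically
    version = f"v{major}-{minor:02d}"   # leading zero on the minor revision
    stem = "_".join([stamp, project, description, author, version])
    return f"{stem}.{extension}"        # one period, right before the extension


print(build_file_name(date(2013, 5, 3), "DOEProject", "DesignDocument", "Smith", 2, 1, "docx"))
# prints 20130503_DOEProject_DesignDocument_Smith_v2-01.docx
```

Building names with one small helper like this is one way to keep everyone on a project using the same order of elements.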

 

Another good idea is to include in the directory a readme.txt file that explains your naming format along with any abbreviations or codes you have used.

 

If you don’t already have a folder structure and/or file naming plan, come up with one today and start using it going forward. Some good practices are described below.

  • Be Clear, Concise, Consistent, and Correct
  • Make it meaningful (to you and anyone else who is working on the project)
  • Provide enough context that the file name stays unique and people can still recognize what the file is if it is moved to another location.
  • For sequential numbering, use leading zeros.
    • For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (001, 010, 100).
  • Do not use special characters: & , * % # ; ( ) ! @ $ ^ ~ ‘ { } [ ] ? < >
    • Some people like to use a dash ( – ) to separate words
    • Others like to separate words by capitalizing the first letter of each (e.g., DST_FileNamingScheme_20151216)
  • Dates should be formatted like this: YYYYMMDD (e.g., 20150209)
    • Put dates at the beginning or the end of your file names, not in the middle, to make it easy to sort files by name
      • OK: DST_FileNamingScheme_20151216
      • OK: 20151216_DST_FileNamingScheme
      • AVOID: DST_20151216_FileNamingScheme
  • Use only one period, placed directly before the file extension (e.g., name_paper.doc, NOT name.paper.doc OR name_paper..doc)
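
If you want to check existing file names against a few of these practices, a short script can help. The following is only a rough sketch in Python: the names check_file_name and numbered are hypothetical, the date pattern assumes years starting with 19 or 20, underscores are assumed as the separator around dates, and both straight and curly apostrophes are treated as special characters.

```python
import re

# Characters from the "do not use" list above (apostrophe variants are an assumption).
FORBIDDEN = set("&,*%#;()!@$^~'‘’{}[]?<>")
# YYYYMMDD, assuming a year that starts with 19 or 20.
DATE = r"(?:19|20)\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])"


def check_file_name(name):
    """Return a list of problems found in `name`; an empty list means none were found."""
    problems = []
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if name.count(".") > 1:
        problems.append("use only one period, before the extension")
    stem = name.rsplit(".", 1)[0]  # drop the extension, if there is one
    date_at_edge = (re.match(DATE + r"(?:_|$)", stem)
                    or re.search(r"(?:^|_)" + DATE + r"$", stem))
    date_inside = re.search(r"_" + DATE + r"_", stem)
    if date_inside and not date_at_edge:
        problems.append("move the date to the start or end of the name")
    return problems


def numbered(index, total):
    """Zero-pad `index` to the width of `total`, e.g. 7 of 100 becomes '007'."""
    return str(index).zfill(len(str(total)))


print(check_file_name("20151216_DST_FileNamingScheme.docx"))  # prints []
print(check_file_name("DST_20151216_FileNamingScheme"))
print(numbered(7, 100))  # prints 007
```

The checks are deliberately simple. They will not catch every problem (for example, names that use dashes around a date), so treat the result as a prompt to look more closely rather than a final verdict.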

Follow Love Your DATA week on Twitter at #LYD16.

References:

  1. Data Management for Undergraduate Researchers: File Naming Conventions: http://guides.lib.purdue.edu/c.php?g=353013&p=2378293
  2. It’s the 21st Century – Do you know where your data is? https://loveyourdata.wordpress.com/tuesday/