Data entry tools for academia

In academia, technology choices often follow the path of least resistance. When a project gets greenlit, the principal investigator typically reaches for whatever they know best: Google Forms for surveys, Sheets for curation, and Drive for storage. If data needs to be collected offline, Microsoft Access becomes the default choice. While these tools work initially, they rarely scale gracefully.

The challenge of picking the right tool in the long term is a mix of technical factor and resource constraints. Academic environments often lack dedicated IT support, budgets are tight, and the primary users are researchers, not developers. Conversely, compliance and audit requirements overlap with enterprise requirements without the enterprise budget. This creates a unique set of constraints that enterprise solutions don't address well especially in terms of privacy.

Criteria for ideal data entry tools

Through our experience across multiple projects, we've identified the essential criteria for academic data entry tools. These criteria are based on our experience and the needs of our research projects.

Evaluating the landscape

Here is a list of tools that I've used in the past and how they compare to our criteria.

  1. Google Forms: Has the lowest friction if your organization uses Google Workspace. If PII is not a factor, it's a good option for simple surveys and basic data collection.
  2. Microsoft Access: Although discontinued and unsupported by Microsoft, it is still possibly the best candidate for data collection in offline or air-gapped environments. Researchers are typically familiar with setting it up however, it doesn't scale well for multi-user environments, especially when forms need to be updated or backed up frequently.
  3. KoboToolbox: A set of data collection tools that came out of Harvard. It is the most polished of all existing options. While they offer a generous free tier, it can become expensive quickly if you need to use attachments. Everything works in a browser, and data can be recorded offline and synced in the background later. They also provide a robust API for integration with other tools, as well as the source code for self-hosting.
  4. Directus: An open-source data platform that provides a database GUI with fine-grained permissions, audit trails, and form building capabilities. Discussed in more detail below.
  5. Epicollect: A mobile and web-based data collection platform developed by the University of Oxford, designed for field research with strong offline capabilities. They also have a robust API for integration with other tools and authentication with Google.
  6. REDCap: Developed by Vanderbilt University, REDCap is a secure web application for building and managing online surveys and databases, widely used in academic research. It is the gold standard for academic research, especially when it comes to HIPAA compliance. Their license terms require all IT support to come from within your own organization, which can be a concern for some research projects.
Tool Scale Setup Priv Auth Ext Integ Comp Avail Attach Offline Mobile
Google Forms 3 3 1 3 3 1 1 2 3 1 2
Microsoft Access 1 2 3 3 3 2 1 1 1 3 1
KoboToolbox 3 3 2 1 3 3 2 2 3 3 2
Directus 3 2 3 3 3 3 3 3 3 1 3
Epicollect 3 3 2 3 3 3 2 2 3 3 3
REDCap 3 2 3 3 2 3 3 3 3 3 3

Rating Scale: 1 = Not suitable, 2 = Suitable in some circumstances, 3 = Suitable

The Verdict: REDCap emerges as the gold standard, but it's often unavailable outside established academic institutions, particularly outside the US. This availability gap led us to explore alternatives. If offline data collection is not a requirement then the next best option is Directus.

The Search for a Database GUI

My ideal solution would function like an Object Relational Mapping (ORM) with a graphical interface—something that would let non-technical users create forms and manage data without writing code. Something like Django admin but with the ability to extend it without code.

Info

ORM is a design pattern that provides a way to map database tables to objects in a programming language. More crucially, it abstracts away the database management component and allows for programmers with limited database experience to create, manage and query databases without having to write SQL. A secondary benefit being that the database schema can be versioned and managed through code, making it easier to track changes through git. It is my preferred way of managing databases.

This led me down the "no-code" builder path, where I discovered tools like Retool and similar platforms. These builders promised everything but delivered frustration. They required substantial JavaScript knowledge for anything beyond basic functionality and manipulated databases in ways that made external data entry problematic. These tools were built for developers who wanted to prototype quickly, not for researchers who needed sustainable data management.

Shifting my search terms from "no-code builders" to "database management GUIs" led me to database CI/CD tools like Bytebase. These were very promising and allowed for robust data governance and compliance with the administrator having the ability to enforce audit trails and version control. However, the tool was designed for database administrators and some of the security features were behind a paywall.

Ultimately, after tweaking the search terms some more, I found Directus, a tool that positioned itself as a headless Content Management System (CMS) but proved to be exactly what we needed for research data management.

Directus as a Database Management Tool

Directus introduces itself as a tool to "Create, manage, and scale headless content," but beneath that marketing speak lies a powerful data management platform. Out of the box, it provides:

These features address every pain point we'd experienced with previous tools. The data model GUI allows non-technical users to create complex relationships between tables, while the built-in audit system tracks every change with timestamp and user attribution. Each data model also generates forms for data entry that can be customized to the needs of the project.

Options for adding new fields to a model. Adding a new field creates a form for data entry as well as adding a corresponding column to the database.

Real-World Implementation: THLHP

The Tsimane Health and Life History Project provided the perfect test case for our new approach. This longitudinal biomedical study, running since 2001, focuses on understanding healthy aging among the Tsimane indigenous group in Bolivia's lowlands. The project has collected extensive demographic, genetic, biochemical, and medical imaging data across over two decades.

Video

BBC Documentary: The Tsimane Tribe

Learn about the Tsimane indigenous group and THLHP studying their health and aging patterns in Bolivia's lowlands.

Multi-Language Challenge

Our data collection happens in Tsimane and Spanish in the field, while analysis occurs in English. Directus's translation capabilities work at both the data model level and content level, allowing us to maintain consistency across languages without data duplication.

File Management Revolution

Previously, we organized binary files using directory structures. This system worked with small teams but became unwieldy as the project grew. Different users wanted different organizational schemes, and finding specific files became a time-consuming treasure hunt.

Directus transformed our approach by storing file metadata in the database while maintaining the files themselves in organized storage. Each uploaded file gets a unique identifier, preserving the original filename in the database for easy searching and organization. This also enabled us to write database queries to find specific files based on information in other related database tables.

file repository
Distribution of file categories within our repository, showing the scale and variety of data we now manage systematically.

Directus as a LIMS

Managing biosamples requires implementing FAIR principles (Findable, Accessible, Interoperable, Reusable),something nearly impossible with traditional file systems. Purpose-built Laboratory Information Management Systems (LIMS) exist but are expensive or require dedicated IT support.

Directus's comprehensive audit trail, version tracking, and soft delete capabilities make it an effective LIMS alternative. Every sample's journey through our system is documented, traceable, and recoverable.

Built-in audit trail functionality tracks all database changes with the ability to restore to any previous version, similar to point-in-time recovery at the row level. The audit trail tracks current values, previous values, user attribution and timestamps which is critical for maintaining data integrity in research.
crud permissions
Create, Read, Update, and Delete (CRUD) permissions allow for fine-grained control over who can access and modify data in the database. Can be programmed at an individual user level or at group level.

The Path Forward

For most academic projects, I'd recommend this decision tree:

The key insight from our experience is that data entry tools aren't just about collecting information but about creating sustainable systems that support research over time. The tool you choose today will either enable or constrain your research for years to come especially in downstream analysis.