Responsible Surveys With a Vulnerable Userbase

Tom White
Jan 5, 2021

In my junior year of college, I completed what is known as the MQP, the Major Qualifying Project. This project is typically done throughout a WPI student’s senior year, as it is required to show mastery of the subject matter in the field of the pursued degree (to a bachelor’s level, that is). You spend a good portion of the year doing the project, and then some more time presenting it. This comes in the form of a very long report (mine was 100+ pages, written with 4 other team members) and a poster presentation where professors judge the project. Suffice it to say, the MQP is kind of a big deal.

As a Computer Science student, I, along with 4 others, was tasked with making an Android app that did two things.

  1. Provide a support network and tool for a specific group of vulnerable people.
  2. Provide the project sponsor with a way to collect data to see if their program was helping these people.

Though we wanted to make sure the product fulfilled both of these goals, we were more concerned with the first one. We wanted to make features that helped these people. As we neared the end of the project, we saw that we had developed a number of features that we felt succeeded in this, but we hadn’t spent significant time on the second point. Though we had our own bent, the project was sponsored, and we wanted to ensure that value was returned to the organization. Enter, stage right, the main focus of this article.

Two Sides of a Survey

Imagine you are a user of an app that likes to survey its users on their weight loss. You get some sort of reward for completing these surveys like coupons for protein powder, gym memberships, etc., something that would interest the app’s userbase. Though you don’t mind giving this information, it does seem a bit personal and you’re not sure that you like a company tracking your fitness journey.

Now imagine you are the app developers. You need people’s data to show that your exercise methodology actually helps its users lose weight. There are a number of things you want to do to accomplish this. You want to randomize the order of questions and of multiple choice responses to remove any ordering bias, but you also want to be able to compare results across the userbase. You especially want to avoid giving the same user the same survey twice. So here’s the issue: How do you give users anonymity to satisfy their concerns, while also avoiding annoying them by asking the same questions every time they log in?

Believe it or not, the answer is your database schema.

The ERD is Key

ERD for survey feature design

As the systems and security engineer, I was responsible for both the database and deciding which threats we were willing to accept. We did not want the collected data to be usable in a way that could harm an individual, and after a number of iterations I came up with this design.

Surveys can be created from prebuilt questions, each of which can have a number of question choices. Because these are all stored separately, their order can be randomized. A user can be presented with and complete a survey, which is marked down in the Taken table. That user’s survey response is then stored in the SurveyResult table.
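To make the layout a little more concrete, here is a rough sketch of how such a schema could look in SQLite, driven from Python. Only the Taken and SurveyResult table names come from the design described here; every other table name, every column name, and the `column_names` helper are my own assumptions for illustration, not the project’s actual schema. The detail to notice is that Taken links a user to a survey, while SurveyResult has no user column at all.

```python
import sqlite3

# Hypothetical schema: only "Taken" and "SurveyResult" are named in the
# article; all other table and column names are assumptions.
SCHEMA = """
CREATE TABLE User (id INTEGER PRIMARY KEY);
CREATE TABLE Survey (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Question (id INTEGER PRIMARY KEY, prompt TEXT NOT NULL);
CREATE TABLE QuestionChoice (
    id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES Question(id),
    label TEXT NOT NULL
);
-- Which prebuilt questions make up which survey.
CREATE TABLE SurveyQuestion (
    survey_id INTEGER NOT NULL REFERENCES Survey(id),
    question_id INTEGER NOT NULL REFERENCES Question(id),
    PRIMARY KEY (survey_id, question_id)
);
-- Records THAT a user completed a survey, not what they answered.
CREATE TABLE Taken (
    user_id INTEGER NOT NULL REFERENCES User(id),
    survey_id INTEGER NOT NULL REFERENCES Survey(id),
    PRIMARY KEY (user_id, survey_id)
);
-- Records WHAT was answered, with no reference to the user.
CREATE TABLE SurveyResult (
    survey_id INTEGER NOT NULL REFERENCES Survey(id),
    question_id INTEGER NOT NULL REFERENCES Question(id),
    choice_id INTEGER NOT NULL REFERENCES QuestionChoice(id)
);
"""


def column_names(conn, table):
    """Return the column names of a table (illustrative helper)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(column_names(conn, "Taken"))
print(column_names(conn, "SurveyResult"))
```

In this sketch SurveyResult also carries no timestamp, since a timestamp could be matched against activity recorded elsewhere and used to link an answer back to a person.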

Why is this so cool, you ask? Well, here are some highlights:

  1. After a second user responds to a given survey, the database can no longer tell which response belongs to which user.
  2. Because the survey is a specific combination of questions, we can track which users have taken which surveys. This eliminates the chance of presenting a survey to a user twice.
  3. The separation of survey questions and choices allows for randomness to remove a source of bias in responses.
  4. There is effectively no loss in data granularity, even though we never track how each individual user responds.
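The highlights above can be sketched in a few lines of Python with SQLite. This is a minimal illustration under my own assumptions, not the code we shipped: the column names and the `submit`, `tally`, and `shuffled` helpers are hypothetical, and the two tables are cut down to the bare minimum. Completing a survey writes one row to Taken (who took what) and separate rows to SurveyResult (what was answered), and nothing ties the two together.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical versions of the two tables named in the article.
conn.executescript("""
CREATE TABLE Taken (
    user_id INTEGER NOT NULL,
    survey_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, survey_id)
);
CREATE TABLE SurveyResult (
    survey_id INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    choice_id INTEGER NOT NULL
);
""")


def submit(conn, user_id, survey_id, answers):
    """Store answers ({question_id: choice_id}); refuse a repeat survey."""
    already = conn.execute(
        "SELECT 1 FROM Taken WHERE user_id = ? AND survey_id = ?",
        (user_id, survey_id),
    ).fetchone()
    if already is not None:
        return False
    with conn:  # both writes commit together or not at all
        conn.execute("INSERT INTO Taken VALUES (?, ?)", (user_id, survey_id))
        # The answers are stored without any user identifier.
        conn.executemany(
            "INSERT INTO SurveyResult VALUES (?, ?, ?)",
            [(survey_id, q, c) for q, c in answers.items()],
        )
    return True


def tally(conn, survey_id, question_id):
    """Aggregate view: how many responses each choice received."""
    rows = conn.execute(
        "SELECT choice_id, COUNT(*) FROM SurveyResult "
        "WHERE survey_id = ? AND question_id = ? GROUP BY choice_id",
        (survey_id, question_id),
    )
    return dict(rows)


def shuffled(items):
    """Return a randomly ordered copy, e.g. of a question's choices."""
    return random.sample(list(items), k=len(items))


first = submit(conn, 1, 10, {100: 1})   # user 1 answers survey 10
second = submit(conn, 2, 10, {100: 2})  # user 2 answers survey 10
repeat = submit(conn, 1, 10, {100: 2})  # user 1 tries the same survey again
print(first, second, repeat)
print(tally(conn, 10, 100))
print(shuffled([1, 2, 3]))
```

Once two users have taken survey 10, Taken says that both of them responded and SurveyResult says what the answers were, but no query can join a specific answer to a specific user. While only one user has responded, that single response is still attributable, which is why the first highlight starts at the second respondent.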

I really want to emphasize that last point. An app that uses this design can truly say that its users’ data is only used in aggregate. The developers can’t tell who responded with which answers.

Conclusions

For our team, this was an amazing discovery. As I said before, we were mostly concerned with being able to provide these people with resources. The fact that we could also give the sponsor all the data they could possibly want, minus the link to individual users, was a huge personal highlight and a contribution to the project that I was happy to have made.
