UX RESEARCH FOR A.I. SYSTEMS (MULTIPLE PROJECTS)

During my PhD in Human-Computer Interaction, I developed and evaluated several AI-powered systems, focusing on optimizing their effectiveness and usability. These projects explored innovative applications of AI in diverse contexts, from enhancing user interactions to improving sensing capabilities, building novel hardware, and interpreting data. The following projects showcase my work at the intersection of AI and UX.


Eyes on the Road: Detecting Phone Usage by
Drivers Using On-Device Cameras

PROBLEM:

Distracted driving caused by smartphone use is a critical public safety issue, increasing the likelihood of accidents by 400%. Despite widespread awareness of the dangers, drivers continue to interact with their phones for activities like texting, navigation, and media control, leading to thousands of preventable injuries and fatalities each year. Existing solutions, such as blocking app functionality while driving, are ineffective because they rely on user input to determine if the phone is being used by the driver or a passenger. These methods are easily bypassed and often disrupt essential functions, such as navigation, further complicating the issue. There is an urgent need for an intuitive, scalable solution that can accurately distinguish between driver and passenger phone use, enabling smartphones to intelligently manage distractions and enhance road safety.


OUTPUT:

SOLUTION:
Our solution is a software-only approach that utilizes the smartphone’s built-in camera to differentiate between driver and passenger phone use in real-time. By analyzing the unique perspectives and geometric patterns of the car’s interior, the system can accurately identify the user’s role without requiring any additional hardware or user input. This enables the phone to automatically adapt its functionality based on the user’s context, such as limiting notifications and access to distracting apps when the driver is detected. With an accuracy rate exceeding 90%, this approach offers a practical and scalable way to mitigate distracted driving, ensuring safer roads while preserving essential functionalities like navigation and hands-free communication.

Figure 1. Lines detected in the photo captured by the phone when docked on the windshield at (a) passenger's right; (b) driver's left; and (c) driver's right side. The lines capture the perspective of the geometry of objects inside a car from different viewpoints.
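
To make the perspective-based idea concrete, below is a minimal sketch of how line-orientation features could be extracted from a single camera frame with OpenCV. It is an illustrative approximation rather than the production pipeline: the detector thresholds, the histogram feature, and the downstream classifier are assumptions added for this example.

```python
# Minimal sketch (not the published system): extract line-angle features from a
# dashboard-camera frame with OpenCV. Thresholds and the histogram feature are
# illustrative assumptions.
import cv2
import numpy as np

def line_angle_histogram(frame_bgr, n_bins=18):
    """Detect straight lines in the frame and summarize their orientations.

    Intuition from the project: the car interior's geometry projects into
    systematically different line orientations depending on where the phone is
    (driver vs. passenger side), so an orientation histogram is a cheap feature.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=60, minLineLength=40, maxLineGap=10)
    hist = np.zeros(n_bins)
    if lines is None:
        return hist
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
        hist[int(angle // (180.0 / n_bins)) % n_bins] += 1
    return hist / max(hist.sum(), 1.0)  # normalize so frame size does not matter

# A lightweight classifier trained on labeled driver/passenger frames could
# consume these histograms; that training step is not shown here.
```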

EVALUATION METHODS

Evaluating the unique value proposition of this AI-powered software required a meticulously designed study to ensure both safety and ecological validity. Given the potential risks associated with simulating driving scenarios, it was essential to create a controlled environment that accurately reflected real-world conditions without compromising participant safety. The study was carefully structured to capture the system’s capabilities while minimizing any potential harm. The procedure is outlined below.

STUDY DESIGN:

In our data collection procedure, we varied two factors: where the phone was placed (docked on the windshield, docked on the air vent, or held in the hand) and who was using it (driver or passenger).

In all conditions, video was recorded at 30 frames per second at 720p resolution. The camera's field of view was approximately 75 degrees.

DOCKED PHONE

For the two docked conditions, we collected data in 10 different cars. We placed the phone in 6 different positions in each car, 3 each on the windshield [Phone 1-3] and the air vent [Phone 4-6], as shown in Figure 2.

When the phones were docked, users did not need to interact with them, so we did not recruit external participants for this part of the study. Members of the research team drove the cars in an urban area to collect the data; we chose this approach primarily because of the safety concerns around recording videos in a moving car. We recorded videos (avg. length = 3.5 min) from both the front and the back camera.

PHONE IN HAND

When the phone is held in the hand, apart from measuring performance in different cars, we also wanted to cover different user behaviors, postures, and ways of holding the phone while driving. We therefore recruited 33 participants (16 male, 17 female, mean age = 26.04) and recorded data in 16 different cars. To ensure the safety of our participants, we conducted the study in a stationary car and simulated the in-hand conditions as shown in Figure 2 [Phone 7-8]. We chose a stationary car over a driving simulator so that we could capture sensor signals and visuals from real cars in a real setting.

When in the driver's seat, participants were asked to pretend they were driving and using the phone at the same time, and were encouraged to behave as they usually would while driving (eyes on the road, hands on the wheel, etc.). Similarly, when participants performed the task as a passenger, they were encouraged to behave and type as they would as passengers in a moving car. We did not control their phone usage behavior: participants were allowed to move the phone or place it anywhere they desired, and some did place it in their lap or on the center console. This freedom allowed us to capture more realistic data on in-car phone usage, rather than relying on positions predetermined by us. Phone orientation was also not controlled, though all participants used the device in portrait mode while driving.

TASKS:

The participants completed two everyday tasks on their phone: (1) responding to text messages; and (2) changing music. These are the two most common in-car tasks that require continuous interaction, so we used them as our study tasks to capture realistic scenarios. Both tasks were performed once as the driver and once as the passenger by the same person in their car. For the duration of the study, we recorded videos (avg. length = 2.5 min) from both the front and the back camera, using an off-the-shelf app that captures video while running in the background. This allowed participants to focus on their task and not be distracted by the recording.

We evaluated the efficacy of the solution across users as well as cars. This ensured that the product was usable across a wide array of user behaviors and generalizable across different car makes and models.
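
One way to operationalize this kind of generalization test is leave-one-group-out cross-validation, where each fold holds out an entire car (or an entire participant). The sketch below is a hedged illustration using scikit-learn with placeholder features and a generic classifier; it is not the exact evaluation code used in the project.

```python
# Hedged sketch of evaluating generalization "across users as well as cars":
# leave-one-group-out cross-validation, grouping samples by car or by participant.
# Features, labels, and the classifier choice are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def generalization_score(X, y, groups):
    """Mean/std accuracy when every fold holds out one whole group (a car or a user)."""
    logo = LeaveOneGroupOut()
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, X, y, groups=groups, cv=logo)
    return scores.mean(), scores.std()

# Example usage with synthetic placeholders:
# X: per-frame feature vectors, y: 0 = passenger, 1 = driver,
# car_ids: group labels recorded during the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 18))
y = rng.integers(0, 2, size=200)
car_ids = rng.integers(0, 10, size=200)
print(generalization_score(X, y, car_ids))
```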


DATA ANALYSIS SUMMARY:
Our solution demonstrated high accuracy in differentiating between driver and passenger phone use across various testing conditions:

Overall Accuracy:

Testing Scenarios:

Robustness:

Low Computational Overhead:

These results validate the effectiveness of our approach in detecting driver phone use, providing a reliable and scalable method for reducing distracted driving incidents and enhancing road safety.


A video summary of the project and the final product can be seen below: 

GymCam: Detecting, Recognizing and Tracking
Simultaneous Exercises in Unconstrained Scenes


PROBLEM:

Despite the increasing popularity of fitness tracking devices, current systems face significant limitations in accurately monitoring a wide range of exercises, especially in dynamic and unconstrained environments like gyms. Wearable sensors are typically attached to a single part of the body, which restricts their ability to capture complex movements involving multiple limbs. This often leads to incomplete or inaccurate data, particularly for exercises that engage different muscle groups simultaneously. Camera-based systems, while offering a broader view of user movements, struggle with issues such as noise, occlusion, and distinguishing between similar motions performed by multiple people in close proximity.

These limitations create a gap in providing users with reliable, real-time feedback on their workouts, hindering their ability to track progress and maintain motivation. There is a pressing need for an exercise tracking system that can seamlessly and accurately monitor a diverse range of activities, account for the complexities of multi-user environments, and provide high-quality feedback without intrusive equipment or manual input. Addressing these challenges is crucial for advancing exercise tracking technologies and improving the overall fitness experience.

OUTPUT: 

SOLUTION:
We developed GymCam, an AI-powered vision system that revolutionizes exercise tracking by using off-the-shelf cameras to automatically detect, recognize, and track multiple people and exercises simultaneously in real-world gym environments. Unlike traditional wearable-based systems, GymCam captures full-body motion from a single vantage point, overcoming challenges like occlusion and noise. With the ability to accurately segment exercises from other activities, recognize exercise types with over 93% accuracy, and count repetitions to within ±1.7, GymCam provides unparalleled insight into users’ workouts. This innovative approach eliminates the need for cumbersome sensors and manual tracking, offering a seamless, user-friendly solution that transforms how we monitor and optimize fitness routines in complex, multi-user settings.

Figure 1. GymCam uses a camera to track exercises. (Top) Optical flow tracking motion trajectories of various points in the gym; green marks points classified as exercise and red marks non-exercise points. (Bottom Left) Individual exercise points are clustered based on similarity to combine points belonging to the same exercise. (Bottom Right) For each exercise cluster, GymCam infers the type of exercise and calculates the repetition count.
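
As a concrete illustration of the last stage of the pipeline in Figure 1, the sketch below counts repetitions from a single tracked point's motion signal using simple peak detection. The real system operates on clusters of optical-flow trajectories and a learned exercise classifier; the thresholds and the synthetic signal here are assumptions for demonstration only.

```python
# Illustrative sketch of repetition counting on a 1-D motion trajectory.
# Not GymCam's actual implementation; parameters are assumptions.
import numpy as np
from scipy.signal import find_peaks

def count_reps(trajectory, fps=30, min_period_s=1.0):
    """Count repetitions as prominent peaks in a detrended motion signal."""
    signal = np.asarray(trajectory, dtype=float)
    signal = signal - signal.mean()            # remove position offset
    min_distance = int(min_period_s * fps)     # assume reps are at least 1 s apart
    peaks, _ = find_peaks(signal, distance=min_distance,
                          prominence=0.5 * signal.std())
    return len(peaks)

# Example: a noisy sinusoid standing in for the vertical displacement of a
# tracked point on a barbell; 10 oscillations yield roughly 10 counted reps.
t = np.linspace(0, 20, 20 * 30)
fake_trajectory = np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.randn(t.size)
print(count_reps(fake_trajectory))
```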

EVALUATION METHODS / STUDY DESIGN

To rigorously evaluate GymCam’s effectiveness in detecting, recognizing, and tracking exercises, we designed a comprehensive study conducted in an authentic gym environment. Our primary goal was to ensure ecological validity while collecting high-quality data to train and test our AI algorithms. The study was structured to capture diverse user behaviors and exercise types without disrupting participants’ natural routines.

Setting:

Participants:

Data Collection:

Evaluation Metrics:

RESULTS

GymCam demonstrated strong performance across all key metrics, validating its capability to accurately track exercises in a real-world gym setting:

Exercise Detection:

Exercise Recognition:

Repetition Counting:

Scalability and Robustness:

These results underscore GymCam’s potential to revolutionize exercise tracking by providing a seamless, accurate, and user-friendly solution that enhances both user experience and fitness outcomes. A summary of the work and a demo of the final product can be seen below: 

FitByte: Automatic Diet Monitoring in Unconstrained Situations
Using Multimodal Sensing on Eyeglasses

PROBLEM:

Accurately monitoring dietary habits is essential for understanding the relationship between diet and health, yet current methods are limited and cumbersome. Most diet tracking systems require manual logging, which is time-consuming and prone to inaccuracies as users often forget to record their meals or misjudge portion sizes. Wearable devices have been developed to automate diet monitoring, but they typically focus on a single aspect of food intake, such as chewing or swallowing. This narrow approach struggles to generalize across diverse food types and daily activities, especially in unconstrained environments like social gatherings or outdoor settings. As a result, these systems often fail to provide reliable data, making it challenging for individuals and healthcare professionals to track and manage dietary behavior effectively.


OUTPUT:

SOLUTION:

FitByte addresses these limitations with a novel multimodal sensing system integrated into a pair of eyeglasses, designed to monitor all phases of food intake in real-world settings. Utilizing a combination of inertial sensors, proximity sensors, and a camera, FitByte can detect chewing, swallowing, and hand-to-mouth gestures with high accuracy, even in noisy and dynamic environments. The system intelligently triggers the camera to capture visuals of the food being consumed, providing users with a detailed record of their dietary habits without manual input. By combining multiple sensing modalities, FitByte offers a comprehensive and unobtrusive solution for automatic diet monitoring, enabling more accurate and actionable insights into users’ eating behaviors.
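
The camera-triggering logic can be illustrated with a small, hedged sketch: fire a capture only when a hand-to-mouth gesture is followed shortly by detected chewing. The event types, window length, and de-duplication rule below are illustrative placeholders rather than FitByte's actual parameters.

```python
# Hedged sketch of multimodal triggering: capture a food photo only when a
# hand-to-mouth gesture is followed by chewing within a short window.
# All values below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SensorEvent:
    kind: str         # "hand_to_mouth", "chewing", or "swallowing"
    timestamp: float  # seconds since the start of the session

def camera_trigger_times(events, max_gap_s=10.0):
    """Return timestamps at which the camera should capture a food photo."""
    triggers = []
    last_gesture = None
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "hand_to_mouth":
            last_gesture = ev.timestamp
        elif ev.kind == "chewing" and last_gesture is not None:
            if ev.timestamp - last_gesture <= max_gap_s:
                triggers.append(ev.timestamp)
                last_gesture = None  # avoid multiple captures for one bite
    return triggers

# Example: a gesture followed by chewing within 10 s yields one camera trigger.
events = [SensorEvent("hand_to_mouth", 12.0), SensorEvent("chewing", 15.5),
          SensorEvent("chewing", 90.0)]
print(camera_trigger_times(events))  # -> [15.5]
```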

Study Design

To evaluate the effectiveness and usability of FitByte, we designed a comprehensive study that captures dietary behaviors in both controlled and real-world, unconstrained environments. Our study was structured to ensure ecological validity while gathering robust data to train and validate the AI models powering FitByte.

STUDY STRUCTURE:


Evaluation Metrics:

Qualitative UX Research:

RESULTS:

The results from both study phases demonstrate the efficacy of FitByte in accurately detecting and monitoring dietary behaviors across a variety of settings.

Semi-Constrained Environment Study:

Unconstrained Free-Living Study:

IMPACT AND INSIGHTS:

The study’s design and results showcase the successful integration of UX research with AI system development, highlighting how user-centered design can inform and enhance the functionality of complex sensing systems. By combining qualitative insights with quantitative performance metrics, we demonstrated FitByte’s potential to transform dietary monitoring in real-world settings, offering a practical and socially acceptable solution for users seeking to track their diet effortlessly and accurately.

A summary of the project and the product demo can be seen below: