Why Saying 'Um' Too Often Secretly Reveals Your Dementia Risk Today

Listen closely to a casual conversation, and you will hear a hidden symphony of hesitation. The rhythm of everyday human communication is shaped by the fractional seconds where we stop, stretch a syllable, or deploy a filler word. For decades, linguists and speech coaches have treated utterances like "um" and "uh" as mere verbal debris—bad habits to be trained away in public speaking seminars. But clinical neurologists and artificial intelligence researchers are now mining these exact micro-hesitations for medical data, and their findings are rewriting the protocols for early cognitive screening.

This year, a cascade of clinical data has transformed the humble "um" from a conversational filler into a potent digital biomarker. A sweeping joint study released by Baycrest’s Rotman Research Institute, the University of Toronto, and York University has provided compelling evidence that the timing of everyday speech—specifically the frequency and duration of pauses and filler words—can predict a person's risk of cognitive decline with startling accuracy.

The research establishes that how fast you talk, and specifically how you navigate the empty spaces between your thoughts, directly reflects the integrity of your brain’s executive function. By running hundreds of recorded clinical conversations through advanced natural language processing algorithms, scientists have discovered that the link between speech patterns and dementia is measurable years, perhaps even a full decade, before memory loss becomes apparent to family members.

"The message is clear: speech timing is more than just a matter of style, it's a sensitive indicator of brain health," says Dr. Jed Meltzer, Senior Scientist at Baycrest and senior author of the study. By weaponizing machine learning to track these previously imperceptible vocal shifts, the medical community is moving beyond traditional pen-and-paper assessments. They are tracking the disease exactly where it hides: in plain text, spoken aloud.

The Pathology of a Pause

To understand why a simple "um" acts as a window into deteriorating brain architecture, we have to look at the immense computational effort required to speak a single sentence. Spoken language is arguably the most complex motor and cognitive function a human being performs. It requires the brain to retrieve semantic meaning from memory, apply grammatical syntax, map the phonetic structure, and coordinate the lungs, vocal cords, tongue, and lips—all within milliseconds.

When a healthy brain encounters a slight delay in word retrieval—perhaps trying to recall a specific name or complex noun—it deploys a filler word like "um" or "uh" to hold the floor in a conversation. Linguists refer to this as a filled pause. It signals to the listener: I am still thinking; do not interrupt.

However, as the earliest stages of Mild Cognitive Impairment (MCI) or Alzheimer's disease set in, the brain's executive function begins to fray. Executive function is the mental command center responsible for planning, organizing, flexible thinking, and working memory. When neurodegeneration attacks the semantic networks or the prefrontal cortex, the cognitive load required to construct a sentence spikes dramatically. The brain has to work significantly harder to locate words and stitch them together.

In this state of elevated cognitive friction, the "um" ceases to be a mere conversational placeholder. It becomes a symptom of a neural misfire. Dementia patients, even in the prodromal (pre-symptomatic) stages, rely on these verbal fillers as a desperate coping mechanism. They deploy them to buy extra time as their brains sift through a deteriorating neural database. The frequency of the fillers increases, the duration of the pauses stretches by microscopic fractions of a second, and the subsequent vocabulary becomes noticeably simpler.

Following the Evidence Trail: Tau Tangles and Timing

The link between vocal hesitation and cognitive decline did not emerge overnight. It is the result of a steady drip of neuroimaging data colliding with the explosive growth of artificial intelligence.

In 2024, researchers at Stanford University initiated a critical phase of this investigation by matching voice recordings with brain scans. They evaluated 237 cognitively unimpaired adults, analyzing their speech against neuroimaging records. The results were unambiguous: individuals who exhibited longer pauses between sentences and a slower overall speech rate had significantly higher levels of tangled tau proteins in their brains. Tau tangles, alongside amyloid plaques, are the classic biological hallmarks of Alzheimer's disease.

This was a critical revelation. The Stanford study proved that the mechanical breakdown of speech was not just a side effect of late-stage confusion; it was directly correlated to the physical accumulation of toxic proteins in the brain, occurring long before the patients exhibited overt clinical symptoms of memory loss.

Sterling Johnson, a researcher at the University of Wisconsin-Madison, led a similarly massive longitudinal study observing patients over two-year intervals. Participants were asked to describe a specific picture. When researchers analyzed the audio recordings of those who later developed early-stage cognitive impairment, they found the patients had declined sharply in verbal fluency. They relied heavily on pronouns rather than specific nouns (saying "it" or "they" instead of "the car" or "the children"), spoke in shorter sentences, and required far more time to formulate their ideas.

"In normal aging, it's something that may come back to you later, and it's not going to disrupt the whole conversation," explains Kimberly Mueller, another study leader involved in the Wisconsin research. "The difference here is, it is more frequent in a short period."

The historical precedent for this phenomenon is well-documented, if only in hindsight. Linguists famously analyzed the unscripted press conferences of President Ronald Reagan during his time in office. Decades before his formal Alzheimer's diagnosis, Reagan’s speech patterns shifted. His use of filler words increased, his unique vocabulary shrank, and he began to substitute specific nouns with generalized terms. What was once dismissed by political pundits as aging or fatigue was actually the early, audible footprint of cortical decay. Today, modern researchers have the algorithmic tools to detect that footprint in real-time.

The Algorithms Listening In

Human ears are notoriously bad at detecting the onset of cognitive decline. We naturally accommodate the conversational hesitations of our aging parents or spouses, subconsciously filling in the blanks when they struggle to find a word. We write off their increased use of "um" or their slower talking speed as normal fatigue.

Artificial intelligence does not accommodate; it measures.

Over the past two years, the deployment of machine learning in vocal analysis has turned everyday speech into a highly precise medical diagnostic tool. Researchers utilize specific AI architectures, such as Artificial Neural Networks (ANN) and Bidirectional Gated Recurrent Units, to process audio files. These models currently predict a dementia diagnosis with accuracy rates hovering between 78 and 81 percent based purely on vocal input.
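The published models and their training data are not reproduced here, but the general shape of a bidirectional recurrent classifier is straightforward to sketch. The PyTorch snippet below is purely illustrative: the feature dimensions, layer sizes, and random input are assumptions for demonstration, not the architecture used in the studies above.

```python
import torch
import torch.nn as nn

class SpeechRiskClassifier(nn.Module):
    """Toy bidirectional GRU over a sequence of per-frame acoustic features.
    Input shape (batch, frames, n_features); output is a probability per clip.
    Illustrative only; not the architecture from the cited studies."""

    def __init__(self, n_features: int = 40, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(
            input_size=n_features,
            hidden_size=hidden,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
        )
        self.head = nn.Linear(2 * hidden, 1)  # doubled for the two directions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(x)                        # h: (layers * 2, batch, hidden)
        last = torch.cat([h[-2], h[-1]], dim=-1)  # final forward + backward states
        return torch.sigmoid(self.head(last)).squeeze(-1)

# Toy usage: 8 clips, 500 frames each, 40 acoustic features per frame.
model = SpeechRiskClassifier()
print(model(torch.randn(8, 500, 40)).shape)  # torch.Size([8])
```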

When an AI system analyzes an audio file for cognitive decline, it dissects the recording into two distinct categories: acoustic features and linguistic features.

Acoustic features involve the physical properties of the sound. The AI calculates the mel-frequency cepstral coefficients, pitch mean, and spectral centroid. It measures articulation clarity and exactly how many milliseconds elapse during a silent pause. It charts the exact speaking rate—the number of phonemes produced per second—and tracks the pairwise variability index, which monitors the durational variability in successive syllables.
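As a rough illustration of how a handful of these acoustic measures can be extracted, the sketch below uses the open-source librosa library; the file name, sample rate, and silence threshold are placeholder assumptions, not values drawn from any of the studies discussed here.

```python
import librosa
import numpy as np

# Load a mono recording (the path is a placeholder).
y, sr = librosa.load("interview.wav", sr=16000)

# Mel-frequency cepstral coefficients: one 13-dimensional vector per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Spectral centroid: the "centre of mass" of the spectrum, frame by frame.
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)

# Frame-level pitch estimate and its mean.
f0 = librosa.yin(y, fmin=65, fmax=300, sr=sr)
pitch_mean = float(np.mean(f0))

# Silent pauses: gaps between non-silent intervals (threshold is a guess).
speech = librosa.effects.split(y, top_db=30)     # [[start, end], ...] in samples
pauses = (speech[1:, 0] - speech[:-1, 1]) / sr   # silence durations in seconds
longest = pauses.max() if len(pauses) else 0.0

print(f"frames: {mfcc.shape[1]}, mean pitch: {pitch_mean:.1f} Hz, "
      f"pauses: {len(pauses)}, longest pause: {longest:.2f} s")
```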

Linguistic features, by contrast, focus on the content and structure of what is actually being said. The algorithm evaluates semantic density (how much actual information is conveyed in a sentence), syntactic complexity, and lexical repetition (how often the speaker loops back to the exact same word).
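A minimal, purely illustrative sketch of the linguistic side might compute a few of these measures directly from a transcript; the filler and function word lists below are tiny stand-ins, not clinically validated lexicons.

```python
from collections import Counter

FILLERS = {"um", "uh", "er", "erm"}                       # illustrative list
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but",
                  "of", "in", "on", "to", "it", "they"}   # tiny illustrative set

def linguistic_features(transcript: str) -> dict:
    words = transcript.lower().split()
    counts = Counter(words)
    total = len(words)
    return {
        # How often the speaker loops back to the same words.
        "lexical_repetition": 1 - len(counts) / total,
        # Share of filler words in the utterance.
        "filler_rate": sum(counts[f] for f in FILLERS) / total,
        # Crude proxy for semantic density: content words per word spoken.
        "semantic_density": sum(n for w, n in counts.items()
                                if w not in FUNCTION_WORDS | FILLERS) / total,
    }

print(linguistic_features("i left my keys on the um the um thing"))
```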

One of the most prominent models to emerge in recent months is WhisperD, a finely tuned iteration of existing speech recognition technology designed specifically to identify the markers of dementia. WhisperD excels precisely because it has been trained to isolate and categorize filler words. While commercial dictation software is designed to clean up transcripts by actively deleting "ums" and "uhs," clinical AI models hunt for them. By mapping the exact coordinates of every hesitation, WhisperD can detect the micro-symptoms of disorientation that occur when a patient is experiencing invisible cognitive strain during a simple conversation.

The "Post-Pause" Revelation

As algorithms grew more sophisticated, researchers began looking beyond the pause itself and started examining the words that immediately followed it. This has opened an entirely new front in the study of speech patterns and dementia.

Dr. Michael Kleiman, alongside Dr. James Galvin at the University of Miami Miller School of Medicine, recently published groundbreaking findings on what they term "post-pause speech patterns." Their research revealed that the true indicator of Mild Cognitive Impairment is not simply that a person stops to say "um," but rather the cognitive compromise that happens right after the hesitation.

"People with mild cognitive impairment show subtle changes in their speech, such as using simpler words after pauses and taking longer to resume speaking, especially during demanding tasks like storytelling," Dr. Kleiman observed.

In a cognitively healthy adult, a filler word is often used as a springboard to retrieve a complex, highly specific word. You might say, "I left my keys on the, um, credenza." The brain pauses to search the filing cabinet, finds the specific label, and delivers it.

In a patient developing Alzheimer's or MCI, the neural filing cabinet is jammed. The brain searches for "credenza," cannot locate the semantic pathway, and eventually settles for the easiest, most frequent alternative. The patient will instead say, "I left my keys on the, um, table," or worse, "the, um, thing."

Kleiman's models prove that algorithms can measure this specific degradation. By tracking the latency (the exact time it takes to resume speaking) and analyzing the complexity of the vocabulary immediately following the pause, the AI can successfully separate healthy aging individuals from those developing early MCI. This level of precision is virtually impossible to achieve in a standard primary care visit without technological assistance.
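A toy sketch of the post-pause idea, assuming word-level timestamps from a speech recognizer, is shown below. The transcript fragment, the 0.5-second pause threshold, and the list of "easy" fallback words are all invented for illustration and do not come from Kleiman's actual pipeline.

```python
# Each word carries start/end times, as a word-timestamped recognizer might emit.
words = [
    {"w": "keys",  "start": 1.2, "end": 1.6},
    {"w": "on",    "start": 1.6, "end": 1.7},
    {"w": "the",   "start": 1.7, "end": 1.8},
    {"w": "um",    "start": 2.9, "end": 3.1},
    {"w": "thing", "start": 4.0, "end": 4.3},
]

COMMON_FALLBACKS = {"thing", "stuff", "it", "place", "table"}  # 'easy' words

def post_pause_markers(words, pause_threshold=0.5):
    """For every pause longer than the threshold, record how long the speaker
    took to resume and whether the next word is a high-frequency fallback."""
    markers = []
    for prev, nxt in zip(words, words[1:]):
        gap = nxt["start"] - prev["end"]
        if gap >= pause_threshold:
            markers.append({
                "resume_latency_s": round(gap, 2),
                "next_word": nxt["w"],
                "simple_fallback": nxt["w"] in COMMON_FALLBACKS,
            })
    return markers

print(post_pause_markers(words))
```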

The Death of the Pen-and-Paper Test

The medical community’s aggressive push toward speech biomarkers is driven by a deep dissatisfaction with the current standard of care. For decades, the frontline defense against Alzheimer's has relied on subjective reports from family members and traditional cognitive assessments like the Montreal Cognitive Assessment (MoCA) or the Mini-Mental State Examination (MMSE).

These tests require a patient to sit in a sterile clinical environment, draw a clock face, repeat a sequence of words, and identify line drawings of animals. While they are validated clinical tools, they harbor massive blind spots. They are deeply susceptible to "practice effects." If a patient takes the MoCA multiple times over a few years, they naturally memorize the mechanics of the test, artificially inflating their score and masking their actual cognitive decline.

Furthermore, formal testing creates an environment of high anxiety, which can artificially suppress the performance of a cognitively healthy senior. They are also incredibly time-consuming for neurologists to administer, meaning they are deployed sparingly, usually only after severe symptoms have already prompted a doctor's visit.

"Executive functions decline with age and are often compromised early in dementia, but they are hard to track with traditional testing, which is time-consuming and vulnerable to practice effects," notes the team at Baycrest.

Everyday speech, by contrast, is known as an "ecologically valid" measure. It happens naturally. It requires no imposed time limits, induces no test anxiety, and cannot be cheated through practice. You cannot "study" for a casual conversation. Because speech can be captured unobtrusively and at scale, it provides a real-time, continuous monitor of processing speed and brain integrity.

When researchers combine natural speech analysis with standard clinical scores, the predictive power of the models skyrockets. A recent study demonstrated that machine learning systems integrating speech biomarkers alongside traditional clinical assessments improved overall classification performance dramatically. The speech data covers the blind spots of the paper tests, catching the nuances of function word reduction (a hallmark of Alzheimer's-related MCI) and lexical repetition (more common in Lewy Body dementia).
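The mechanics of such a combination can be sketched with a standard machine learning toolkit. Everything in the snippet below is synthetic, so the number it prints is meaningless; it only shows how speech-derived biomarkers and a traditional clinical score end up in the same feature matrix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic rows: speech biomarkers plus a traditional clinical score.
# Columns: filler rate, mean pause (s), speaking rate (phonemes/s), MoCA score.
X = np.column_stack([
    rng.normal(0.05, 0.02, n),
    rng.normal(0.60, 0.20, n),
    rng.normal(4.50, 0.80, n),
    rng.normal(26.0, 3.00, n),
])
y = rng.integers(0, 2, n)        # synthetic labels: 1 = impairment

model = make_pipeline(StandardScaler(), LogisticRegression())
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```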

Language-Agnostic AI: Solving the Dialect Dilemma

One of the most persistent criticisms of early medical AI models was their lack of diversity. If a speech algorithm is trained exclusively on the audio profiles of white, upper-middle-class native English speakers from North America, it risks severely misdiagnosing marginalized communities. Different cultures use pauses, filler words, and pacing entirely differently. A southern drawl, a heavy Spanish accent, or the unique cadence of African American Vernacular English (AAVE) could trigger false positives in a poorly trained algorithm.

Researchers are actively dismantling this barrier to ensure the link between speech patterns and dementia can be measured accurately across the global population.

Dr. Kleiman’s ongoing work at the Miller School of Medicine specifically focuses on recruiting racially and ethnically diverse individuals. Supported by targeted grants, his team is feeding speech data from Spanish speakers and individuals utilizing AAVE into the machine learning models to teach the AI the difference between a cultural dialect and a cognitive deficit.

Simultaneously, computer scientists are engineering "language-agnostic" screening pipelines. In a major presentation at the 2025 IEEE 21st International Conference on Body Sensor Networks, engineers unveiled a ResNet-based binary classifier trained to detect early dementia through acoustic and linguistic features. Crucially, they tested this model on a held-out dataset of speakers communicating in languages the AI had never heard before.

The system used speaker diarization to isolate the target patient from background noise and analyzed their spectral centroid and articulation clarity. Even without understanding the actual vocabulary of the unseen language, the AI correctly identified cognitive impairment based purely on the acoustic degradation of the speech, achieving a 70 percent accuracy rate. This cross-lingual transfer capability is a massive leap forward, proving that the acoustic decay caused by neurodegeneration transcends language barriers.

The Diagnostic Tipping Point: Differentiating Dementia Types

The utility of vocal biomarkers extends far beyond simply flashing a warning light for general cognitive trouble. Advanced linguistic analysis is now being utilized to achieve "etiological stratification"—the ability to look at early symptoms and determine exactly which type of neurodegenerative disease is attacking the brain.

Not all dementias sound the same. The damage occurs in different regions of the cortex, producing distinctly different vocal signatures.

Recent literature published in the Journal of Prevention of Alzheimer's Disease mapped the digital biomarkers used to separate Alzheimer’s disease from frontotemporal degeneration (FTD). Because FTD typically attacks the frontal and temporal lobes early on—areas heavily involved in language production and behavior—the resulting speech deficits are markedly different from the memory-driven hesitations of Alzheimer's.

Furthermore, data analyzing spontaneous speech has highlighted distinct contrasts between Mild Cognitive Impairment resulting from Alzheimer's (MCI-AD), Parkinson's disease (PD-MCI), and Lewy Body dementia (MCI-LB).

Patients with MCI-AD tend to exhibit a drastic reduction in their use of "function words"—the grammatical glue like prepositions and conjunctions—resulting in sentences that are dense with content words but structurally fragmented. Conversely, patients with Parkinson's-related cognitive impairment tend to speak in much shorter overall sentences, using fewer coordinating conjunctions, accompanied by significantly longer silent pauses. Meanwhile, those facing early Lewy Body dementia exhibit far higher rates of lexical repetition, getting stuck in verbal loops much earlier in their disease progression.
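For a sense of how such contrasts translate into measurable quantities, the sketch below computes a crude profile of three of the markers named above from a transcript. The function word list, the toy sentence, and any interpretation of the resulting numbers are illustrative assumptions rather than published cut-offs.

```python
import re

FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but", "of",
                  "in", "on", "to", "for", "with", "that"}   # small illustrative set

def type_profile(transcript: str) -> dict:
    """Raw measurements echoing the contrasts described above; deciding which
    profile looks 'AD-like' or 'PD-like' would require clinical validation."""
    sentences = [s.split() for s in re.split(r"[.?!]+", transcript.lower()) if s.strip()]
    words = [w for sentence in sentences for w in sentence]
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return {
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        "immediate_repetitions": repeats,   # crude marker of verbal loops
    }

print(type_profile("The keys are on the the table. I put them them there."))
```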

By mapping these specific linguistic typologies, doctors are gaining the ability to triage patients accurately years before a highly expensive and invasive PET scan or lumbar puncture confirms the specific amyloid or tau pathology.

The Ethical Horizon: Passive Monitoring in the Home

As the science solidifies, the delivery mechanisms for this technology are rapidly advancing toward commercial and clinical availability. The question is no longer if AI can detect dementia through our voices, but how it will be implemented in our daily lives.

Tech developers are currently integrating these diagnostic algorithms into digital platforms. Systems like digitalhumanOS are powering applications like GIA (an FDA-registered screening tool), which can evaluate a patient's voice in under two minutes, analyzing over 2,500 biomarkers to screen for subtle changes linked to Alzheimer's.

The long-term vision of the medical community is passive monitoring. Instead of visiting a neurologist once a year for a stressful memory test, elderly adults could consent to have their speech analyzed continuously in the background of their daily lives.

A new cross-sectional study protocol is already exploring multimodal screening in home healthcare environments. Researchers are utilizing automated speech analysis to capture patient-nurse conversations directly in the home. By piping the audio through Large Language Models (LLMs) and OpenSMILE feature extractors, the system tracks frequency, instability, and speech rate continuously.
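OpenSMILE itself exposes a small Python interface. A minimal sketch of pulling one of its standard feature sets from a single recording might look like this, where the file path and the choice of the eGeMAPS feature set are assumptions made for illustration rather than details of the study protocol.

```python
import opensmile

# Extract a standard summary feature set (eGeMAPS) from a single recording.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file("patient_nurse_visit.wav")   # placeholder path

# One row of summary statistics per file: pitch, jitter, shimmer, loudness, etc.
print(features.shape)
print([c for c in features.columns if "F0" in c][:5])      # a few pitch columns
```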

The logical extension of this research leads directly to the smart devices already sitting on our kitchen counters. The microphones in our smartphones, tablets, and smart speakers are perfectly positioned to act as early-warning medical sensors. If an elderly individual opts into a health-tracking protocol, their virtual assistant could passively monitor their morning requests for the weather or the news. Over a period of five years, the AI would quietly track the widening gaps between their words, the increased reliance on "um," and the shrinking range of their vocabulary. The moment the algorithm detects a statistically significant deviation consistent with the speech patterns of dementia, it could trigger a notification to the user's primary care physician recommending a clinical evaluation.
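The trigger itself could be as simple as comparing each new measurement against the user's own rolling baseline. The toy sketch below flags a weekly filler-word rate that drifts well outside that personal baseline; the threshold, the minimum amount of history, and the numbers are invented for illustration and carry no clinical meaning.

```python
import statistics

def deviation_alert(history, new_value, z_threshold=3.0, min_weeks=26):
    """Flag a new weekly filler-rate measurement that sits far above the
    user's own baseline. All thresholds here are illustrative, not clinical."""
    if len(history) < min_weeks:
        return False                      # not enough personal baseline yet
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return False
    return (new_value - mean) / spread > z_threshold

# Five years of stable weekly filler rates, then two candidate new readings.
baseline = [0.04 + 0.005 * (week % 3) for week in range(260)]
print(deviation_alert(baseline, 0.05))   # False: within normal variation
print(deviation_alert(baseline, 0.12))   # True: worth a clinical follow-up
```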

This rapidly approaching reality brings profound ethical complexities. The privacy implications of allowing tech giants to monitor the acoustic degradation of our brain function are immense. There is also the psychological danger of over-medicalizing normal human aging. Every 70-year-old occasionally forgets a word; the fear is that an overly sensitive AI might trigger widespread panic over a harmless "um," flooding neurology clinics with terrified, cognitively healthy seniors.

Clinical thresholds must be carefully calibrated to ensure algorithms flag legitimate neurodegenerative progression without pathologizing normal age-related slowing. Researchers emphasize that these tools are screeners, not definitive diagnosticians. They are the smoke detectors, not the fire marshals.

What to Watch for Next

The velocity of this research suggests a massive shift in neurological care protocols within the next 24 to 36 months.

We are currently watching the transition from clinical trials to clinical deployment. Over the coming year, expect to see the integration of speech-based cognitive assessments into standard Medicare wellness visits, administered simply via a tablet in the waiting room while the patient describes a picture of a crowded kitchen or a dog stealing a cookie.

Keep a close eye on the ongoing longitudinal studies at Baycrest and the University of Wisconsin-Madison as they follow their current cohorts into 2027. These multi-year studies are essential for drawing the definitive baseline between healthy, normal brain aging and the early onset of disease.

Furthermore, watch for major regulatory actions from the FDA regarding software as a medical device (SaMD). As companies push to get their proprietary voice-analysis algorithms cleared for diagnostic use rather than just screening, the regulatory scrutiny over data privacy, demographic fairness, and dialect inclusion will intensify.

The most profound realization hidden inside this data is the sheer resilience of the human brain. The increased deployment of filler words, the elongated pauses, the simplification of syntax—these are not just signs of a system failing. They are signs of a system fighting back. They represent the brain's desperate, real-time attempt to maintain human connection in the face of structural collapse. By finally teaching our machines to listen closely to the silence between our words, we are granting ourselves the most valuable resource in the fight against neurodegeneration: time.
