Well, it just makes no sense to prefilter when, with speech-to-text, you could see where the actual words are in time and filter the noise out accordingly. That's what you can see in the YouTube web app, but unfortunately it doesn't let you edit the audio. If any multitrack editor could show the words alongside the audio like the YT web app does, that would be excellent.
just wanted to add these few links we discussed for reference:
Link: How to Use Truncate Silence and Sound Smarter with Audacity
Link: Howto Truncate Silence in Audacity
Link: Deep Learning 'ahem' detector (github project)
Not sure I understand this, but exporting the YouTube captions as WebVTT gave me the individual timings of each word shown in the auto-captioning. The multitrack editing (cutting across all tracks) phase would take place BEFORE any captioning takes place, and you would have already removed any of those undesirable "artifacts".
Interestingly, though, you might be able to do it the other way around too: get the combined transcript and convert it into Audacity labels. See http://wiki.audacityteam.org/wiki/Movie_subtitles_(*.SRT)
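For anyone who wants to try the captions-to-labels idea by hand, here is a rough sketch in Python. It is not how srt2vtt or Audacity does it, just my own illustration: it assumes plain SRT or WebVTT cue blocks and writes the tab-separated "start, end, text" lines (times in seconds) that Audacity's label import reads. The function name `captions_to_labels` is made up for this example.

```python
import re

# Matches a cue timing line such as "00:00:01,500 --> 00:00:03,000" (SRT)
# or "00:01.500 --> 00:03.000" (WebVTT, hours optional).
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*"
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp parts to seconds as a float."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def captions_to_labels(text):
    """Turn SRT/WebVTT caption text into Audacity label-track text.

    Hypothetical helper: each cue becomes "start<TAB>end<TAB>caption".
    Blocks without a timing line (e.g. the "WEBVTT" header) are skipped.
    """
    labels = []
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                start = _seconds(*parts[:4])
                end = _seconds(*parts[4:])
                # Everything after the timing line is the caption text.
                caption = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                labels.append(f"{start:.6f}\t{end:.6f}\t{caption}")
                break
    return "".join(f"{label}\n" for label in labels)
```

Save the returned text to a .txt file and it should load through Audacity's label import. Inline word-timing tags that YouTube puts inside auto-caption cues are not stripped here, so those would still need cleaning up.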
Even if all the artifacts aren't there, maybe it makes it easier to find some of them to cut out.
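One way to use the caption timings for finding things to cut, as a sketch only: list the stretches where no caption is active and treat them as candidates to listen to. The function name `find_gaps` and the `min_gap` threshold are my own assumptions, and auto-caption timings are approximate, so a gap is only a hint and not proof of noise.

```python
def find_gaps(intervals, min_gap=0.5):
    """Return (start, end) stretches not covered by any caption interval.

    intervals: iterable of (start_seconds, end_seconds) pairs, in any order.
    min_gap: hypothetical threshold; shorter gaps are ignored.
    """
    gaps = []
    previous_end = 0.0
    for start, end in sorted(intervals):
        if start - previous_end >= min_gap:
            gaps.append((previous_end, start))
        # Overlapping captions must not move the cursor backwards.
        previous_end = max(previous_end, end)
    return gaps


if __name__ == "__main__":
    spoken = [(1.5, 3.0), (4.0, 5.25)]
    for gap_start, gap_end in find_gaps(spoken):
        # Same tab-separated layout as an Audacity label line.
        print(f"{gap_start:.6f}\t{gap_end:.6f}\tgap")
```

The printed lines use the same label layout, so the gaps could be loaded as their own label track next to the captions.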
btw, also added a feature to srt2vtt that converts caption files to the Audacity label format: outputs Audacity-compatible text labels from a captions file