


The Usage of Machine Learning to Restore Speech and Language from Wax Cylinder and Early Disc Formats for Language Revitalization




'Every 40 days a language dies and around 90% of all languages could disappear in the next 100 years.'




The Ambientscape Machine Learning Speech-to-Text Model in 2025



In the 4th Principle of our 'Usage of Artificial Intelligence Principles and Framework' we highlighted the importance of transparency when using AI technology.


The Ambientscape Project is currently training a structured Machine Learning prediction model that transcribes spoken audio from wax cylinders and early disc recordings into text, using an open-source speech model based on the Wav2vec 2.0 encoder. This encoder was first released by Facebook, where it was pre-trained with a self-supervised objective on more than 60,000 hours of read audiobooks from the LibriVox Project before being fine-tuned for speech-to-text transcription.



Aim


Through our usage of Wav2vec 2.0 we aim to expand and fine-tune the model so that it can assist in restoring historical spoken audio: interpreting difficult-to-decipher speech from record formats such as wax cylinders, transcription discs and 78s, particularly cylinders containing endangered languages, and predicting more accurate transcriptions printed to text from the digitised audio data.
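
As a rough illustration of what this fine-tuning could involve, the sketch below retrains the CTC head of the checkpoint named in the next section on a single transcribed clip. It is a minimal sketch only: the audio path, transcript handling and learning rate are placeholders, and a real pipeline would add batching, padding and evaluation.

```python
# Minimal fine-tuning sketch (assumptions: Hugging Face transformers and torchaudio installed,
# 16 kHz mono training clips, and this checkpoint's upper-case character vocabulary).
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

MODEL_ID = "facebook/wav2vec2-large-robust-ft-libri-960h"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.freeze_feature_encoder()                      # keep the convolutional feature extractor fixed
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(wav_path: str, transcript: str) -> float:
    """One illustrative gradient step on a single (audio, transcript) pair."""
    waveform, sr = torchaudio.load(wav_path)
    waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)  # mono, 16 kHz
    inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
    labels = processor.tokenizer(transcript.upper(), return_tensors="pt").input_ids
    loss = model(inputs.input_values, labels=labels).loss   # CTC loss against the transcript
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```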



The Encoder


The Ambientscape Project uses an open-source ASR (Automatic Speech Recognition) variation of the Wav2vec 2.0 encoder, adapted from 'Wav2vec2-large-robust-ft-libri-960h' (the paper is here). This allows us to tailor the code accordingly and to develop the training further with more appropriate, domain-specific datasets and training data for better output. The core of Wav2vec 2.0 is the transformer encoder, which takes the latent feature vectors produced from the raw waveform and processes them into contextualised representations.
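
A minimal sketch of running that encoder end-to-end, using the ready-made Wav2vec 2.0 ASR bundle that ships with torchaudio as a stand-in for the checkpoint above (an assumption for illustration; the audio file name is hypothetical):

```python
import torch
import torchaudio

# Load a pre-packaged Wav2vec 2.0 ASR model (large, LV-60k pre-training, 960 h fine-tuning).
bundle = torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H
model = bundle.get_model()

# Hypothetical digitised transfer, resampled to the rate the model expects (16 kHz).
waveform, sr = torchaudio.load("digitised_cylinder.wav")
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # The CNN feature encoder turns the raw waveform into latent vectors, and the
    # transformer turns those into contextual representations; 'emission' holds one
    # row of label scores per audio frame.
    emission, _ = model(waveform)

print(emission.shape)   # (batch, frames, number of labels)
```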



Wav2Vec (Technical Definition)


Wav2Vec is a framework for the self-supervised learning of representations from raw audio data. The Wav2vec model is an encoder that converts audio features into a sequence of probability distributions (expressed as negative log-likelihoods) over labels.
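
Those per-frame distributions can be collapsed into text with a simple greedy CTC decoder, sketched below as a self-contained function that takes the emission tensor and label set from the previous sketch (a beam-search decoder with a language model would normally give better results):

```python
import torch

def greedy_decode(emission: torch.Tensor, labels: tuple, blank: str = "-") -> str:
    """Collapse per-frame label scores into text using the greedy CTC rule."""
    log_probs = torch.log_softmax(emission, dim=-1)   # negative log-likelihood over labels
    indices = log_probs[0].argmax(dim=-1)             # best label for every frame
    indices = torch.unique_consecutive(indices)       # merge repeated frames (CTC collapse rule)
    return "".join(labels[i] for i in indices if labels[i] != blank).replace("|", " ")

# Usage with the bundle and emission from the sketch above:
# print(greedy_decode(emission, bundle.get_labels()))
```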



Choosing the Encoder


The reason for using Wav2vec 2.0 is simple: we felt it had the right balance of accuracy and speed to work best with our equipment.



Challenges


Encoding audio from old records can present challenges involving audio characteristics that the current model is not accustomed to, which is why the training is so important: limited dynamic range, noise, fragmented audio, speed variations and erratic pitch changes. Some of these differences can be corrected in the editing process; however, this is time consuming and will always be limited by the recording itself. It makes more sense to train the model to understand the wax cylinder and disc formats by training it with appropriate, suitable audio data.
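
Where suitable historical training data is scarce, one way to expose the model to these characteristics is to degrade clean speech so that it resembles cylinder playback. A minimal sketch, assuming torchaudio and purely illustrative parameter values:

```python
import torch
import torchaudio.functional as F

def cylinder_like(waveform: torch.Tensor, sample_rate: int = 16_000) -> torch.Tensor:
    """Roughly simulate cylinder/disc conditions on clean 16 kHz speech (illustrative only)."""
    # 1. Limited bandwidth and dynamic range: keep roughly the acoustic-era frequency band.
    narrow = F.lowpass_biquad(waveform, sample_rate, cutoff_freq=3_500.0)
    narrow = F.highpass_biquad(narrow, sample_rate, cutoff_freq=250.0)

    # 2. Surface noise: add broadband noise at a modest level.
    noisy = narrow + 0.02 * torch.randn_like(narrow)

    # 3. Speed variation: resample by a small random factor while keeping the nominal
    #    playback rate, which shifts tempo and pitch together, as off-speed playback does.
    factor = 1.0 + 0.03 * (torch.rand(1).item() - 0.5)
    return F.resample(noisy, sample_rate, int(sample_rate * factor))
```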



What about Deep Learning?


It could be argued that Deep Learning techniques would be preferable to Machine Learning for our training methods, because early audio speech samples may be more aligned with unstructured data, where neural networks (interconnecting nodes in a structure resembling the human brain) could potentially produce more precise results. This may be something we incorporate at a later date.



Future Prospect


The methods employed in this Machine Learning training should enable a type of restoration process similar to 'morphology' (how words relate to each other within a language and the principles by which words are formed), so that specific information about the speech in a recording can be anticipated through ML inference, with the outcome easily classified as text. This would enhance the preservation of language by finding patterns that may not be possible using linguistic documentation techniques, and help prevent the decline of at-risk and endangered languages.


The model will be tested with recordings from the Ambientscape Archive.



Equipment used for Machine Learning


Computer: recycled 24-core Xeon workstation with up to 512 GB of system RAM and an Nvidia RTX 3090 GPU (24 GB of video RAM, 328 Tensor Cores).


Software: Windows 10 running PyTorch through the Anaconda Python distribution.
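
A quick check, from inside the Anaconda environment, that PyTorch can see the RTX 3090 before a training run is launched (a minimal sketch; device index 0 is assumed):

```python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    props = torch.cuda.get_device_properties(device)
    print(torch.cuda.get_device_name(device))             # e.g. "NVIDIA GeForce RTX 3090"
    print(f"{props.total_memory / 1e9:.1f} GB of GPU memory available")
else:
    device = torch.device("cpu")
    print("CUDA not available; training would fall back to the Xeon CPU cores")
```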



Environmental Responsibility


'Large data centers produce 1% of the World's greenhouse emissions'

- International Energy Agency



M.L. and deep learning are known to consume high levels of energy, so we do calculations to determine the lowest configuration needed, reducing core count, memory and GPU usage depending on the training task at hand, and we use only one computer, which is all that is required; this limits power usage and energy dissipation. We also utilise renewable energy sources through solar panels and reconditioned recycled batteries to offset energy requirements.
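
As one example of the kind of calculation that could be used to compare configurations, a back-of-the-envelope energy estimate can be made from nominal power limits and the expected run time (the wattages and hours below are assumptions, not measurements):

```python
# Rough per-run energy estimate for comparing training configurations (illustrative values).
GPU_WATTS = 350      # RTX 3090 board power limit
CPU_WATTS = 150      # Xeon estimate with the core count reduced for the task
HOURS = 12           # expected length of the training run

kwh = (GPU_WATTS + CPU_WATTS) * HOURS / 1000
print(f"Estimated energy for this run: {kwh:.1f} kWh")
```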




Related Links:



The Readings


Embracing Artificial Intelligence for Preserving Dying Languages


Toward a Realistic Model of Speech Processing in the Brain (PDF)


Wav2vec 2.0 - Learning the Structure of Speech from Raw Audio


Wav2vec 2.0 Framework for Self-Supervised Learning Neurosys


PyTorch: Speech Recognition with Wav2Vec2, by Moto Hira


Revitalizing Endangered Languages: AI-Powered Languages


Robust Wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Paper)


Wav2vec 2.0 on Speaker Verification and Language Identification


Machine Learning Approaches to Historic Music Preservation (PDF)



Source Codes


CoEDL/Elpis - Software for Creating Speech Recognition Models


Wav2vec 2.0 GitHub Source Code and Model



ML Framework


PyTorch: An Open Source Machine Learning Framework



A.I. Laboratory


Alan Turing Institute: Data Science Institute at the British Library


Google AI - Artificial Intelligence Company and Research Facility


Meta AI - An Artificial Intelligence Academic Research Laboratory



A.I. Translators


An Automatic Te Reo Māori Transcription Tool for Audio & Video


OB Translate: Nigerian MT/AI Assistance Platform for Languages


Google Woolaroo: Preserving Languages with Machine Learning



Ethical Guidelines


Ambientscape - Usage of Artificial Intelligence: The Ten Principles


Collection of Four Ethical Guidelines on Artificial Intelligence (PDF)


Understanding Artificial Intelligence Ethics and Safety - gov.uk


Accelerating Revitalisation of Te Reo Māori Webinar: AI for Good


