VOICE – TEXT SOFTWARE DESIGN
The goal is to design and implement a modular software system that converts spoken speech into written text and vice versa. This work should provide a holistic view of the whole system. Speech sounds combine two natural processes: the production of sound at a given frequency by the voice box (the larynx), and the shaping of that sound by the tongue, lips, teeth, velum, pharynx, etc. Using techniques based on the fundamental frequency (F0) and the voice formants (F1, F2, F3), the program should consider all the dimensions that help differentiate phonemes and produce different meaningful words.
The system is designed to switch from one language to another. It should run on all major operating systems, specifically MS Windows, Android, iOS ... The main target is portable devices such as cell phones, tablets, and other PDAs, but also PCs and servers. A two- or three-tier design, with server components and a user interface, may be a good fit. The ultimate use involves transforming speech on an online server and displaying the result on the end device, including the ability to create a text file and print it out.
The design leaves room for different modules, each dealing with a specific functionality.
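As a rough illustration of how formants can separate phonemes, the sketch below picks the vowel whose reference point is nearest to a measured (F1, F2) pair. The reference values are commonly cited approximate averages for adult male English speakers and are assumptions for illustration only, as are the ARPAbet-style labels and the names VOWEL_FORMANTS and classify_vowel. A real system would extract formants from recorded audio and would need to handle differences between speakers.

```python
import math

# Approximate (F1, F2) averages in Hz for a few English vowels.
# Illustrative assumptions only; not values measured by this system.
VOWEL_FORMANTS = {
    "IY": (270.0, 2290.0),  # as in "beet"
    "EH": (530.0, 1840.0),  # as in "bet"
    "AE": (660.0, 1720.0),  # as in "bat"
    "AA": (730.0, 1090.0),  # as in "father"
    "UW": (300.0, 870.0),   # as in "boot"
}


def classify_vowel(f1, f2, references=VOWEL_FORMANTS):
    """Return the label of the reference vowel closest to (f1, f2)."""

    def distance(ref):
        # Compare frequency ratios (log scale) so that F1 and F2
        # contribute on a comparable footing despite different ranges.
        return math.hypot(math.log(f1 / ref[0]), math.log(f2 / ref[1]))

    return min(references, key=lambda label: distance(references[label]))


if __name__ == "__main__":
    print(classify_vowel(290, 2200))
    print(classify_vowel(700, 1150))
```

Nearest-neighbour matching on two formants is only a starting point: consonants, tones, and closely spaced vowels would need additional features such as F0, F3, and duration.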
Functionalities of the system include:
• A module that helps record an individual phoneme or word and associate it with its written form in a well-structured database. This module is mainly used to populate the database in different languages. Multiple ways of pronouncing a phoneme or word should be recorded, so that different voices and speech styles are covered.
• A module that, when a word or a sentence is pronounced in a specific language, parses it into its phonemes, searches the database for matching phoneme sequences (words), and appends the written form to a text file in the chosen keyboard layout/character font.
• A module to select the language of communication and connect to the corresponding database. Whenever a new word is encountered, the system proposes it and helps append it to the database.
• A security module that includes the production of an access token used for accessing the functionalities of the system.
• An attractive user interface, with the corresponding keyboard and alphabet, from which the above functionalities are accessed and which links to other online tools such as chat rooms, social networks, etc.
• Other aspects can be discussed as the design progresses.
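The parsing module described above (phoneme sequence in, written words out, unknown words proposed for addition) could be sketched as follows. This is a minimal sketch under stated assumptions: the lexicon is a small in-memory dictionary with made-up entries, the phoneme labels are ARPAbet-style placeholders, and the segmentation is a simple greedy longest match rather than the probabilistic decoding a production recognizer would likely need. The names LEXICON and phonemes_to_text are hypothetical.

```python
# Hypothetical in-memory lexicon: phoneme sequence -> written form.
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("HH", "AH"): "huh",
}


def phonemes_to_text(phonemes, lexicon=LEXICON):
    """Segment a phoneme sequence by greedy longest match.

    Returns (words, unknown). `unknown` collects runs of phonemes that
    matched no entry, so they can be proposed to the user as new words.
    """
    max_len = max((len(key) for key in lexicon), default=0)
    words, unknown, pending = [], [], []
    i = 0
    while i < len(phonemes):
        match = None
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in lexicon:
                match = candidate
                break
        if match is not None:
            if pending:  # close the run of unmatched phonemes
                unknown.append(tuple(pending))
                pending = []
            words.append(lexicon[match])
            i += len(match)
        else:
            pending.append(phonemes[i])
            i += 1
    if pending:
        unknown.append(tuple(pending))
    return words, unknown


if __name__ == "__main__":
    print(phonemes_to_text(["HH", "AH", "L", "OW", "W", "ER", "L", "D"]))
    print(phonemes_to_text(["HH", "AH", "L", "OW", "Z", "IY"]))
```

Greedy matching can segment ambiguous input incorrectly, so it is shown here only to make the data flow between the modules concrete.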
The above technical orientations are only suggestions meant to convey the intended use of the system. We will rely fully on the expertise of a good designer, including for any modifications or differences. The whole design should be delivered in less than 3 months.
• User requirement definition
1. The system displays a main interface window with all options, including language selection and the services to use (recording, translation, etc.)
2. Secured access to the system
- The user selects to use the Kuita system
- The system presents the methods and options the user must comply with regarding permission levels and payment method
- The user selects a method and provides the information required for access
- The system grants the user access at the corresponding access level
3. The user selects the language in which he wants to work from among the languages available in the system
4. Register a letter or a sound in the voice sound database
- The user selects to record a new phoneme or a new word
- A virtual keyboard is displayed with the keys of the user's selected language
- A user interface is displayed, waiting for the user to pronounce a phoneme or a word
- When the user pronounces a word or a phoneme, the system shows the corresponding written text if it is already recorded, or displays a window for the user to type the corresponding new text in his language
- The system records the text along with the voiced form for further use
- The system shows a new record window and lets the user continue registering words or phonemes
- The system presents options for the user to continue registering or to close the recording
- If the user selects to end recording, the system returns to the main screen with the initial options
5. Speech recognition
- The user selects an option to write a text from his voiced speech
- The system displays a text window with the microphone enabled
- The user pronounces the speech; the system listens and writes the spoken sentences in text form
- The user can check and correct the written text, then validate it during or at the end of the transaction
- The user can send the written speech as a text file
- The user closes the speech window
6. Access token provision
- A computer-based user registers to use the system with a specific access level
- The system provides different options for registration, including online, pay-as-you-go token, etc.
- The user provides the required information
- The system provides a token (software or hard-copy token) with an access code
- The user enters the access code and validates it
- The system opens the functionalities and lets the user proceed
- The user can register for extended access to the system
- Cell phone or other PDA-based users can proceed with cell phone credit
7. Translating a text to vocal speech
- A user receiving a text in a language script clicks on a button to hear the related speech
- The system plays the speech from the online database
- The user can stop or replay the speech
- The user exits
8. Interface with online tools (emails, WhatsApp, online chats, etc.)
- The user launches an online tool and selects to write in his selected language
- The system provides an option for typing or dictating a speech
- The user dictates or types, and the system renders the input as text written in his language, with the corresponding alphabet
- The user sends his text in his language
- The system appends to the message an option for the receiver to install the corresponding language writing script on his own computer or cell phone
- The receiving user installs the alphabet and reads the text
9. System installation
- The system provides installation files online
- The system delivers installation information online or as hard copy
- The user receives installation information, including the user interface, the selected fonts corresponding to the languages, and the interface with the online language database
- The user starts using the system
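The access token flow in item 6 (the system issues an access code, the user enters it, the system opens the functionalities for the matching access level) might look like the sketch below. It is a sketch under assumptions, not a prescribed design: the TokenStore class, the access level names, and the in-memory storage are all hypothetical, and payment handling is left out entirely. Only the hash of each code is kept, so a leaked store does not reveal usable codes.

```python
import hashlib
import secrets
import time


class TokenStore:
    """In-memory store of issued access codes (hypothetical sketch;
    a real server would persist these and tie them to payments)."""

    def __init__(self):
        # sha256(code) -> (access_level, expiry timestamp in seconds)
        self._tokens = {}

    @staticmethod
    def _digest(code):
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def issue(self, access_level, valid_seconds, now=None):
        """Create a random access code valid for `valid_seconds`."""
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(16)
        self._tokens[self._digest(code)] = (access_level, now + valid_seconds)
        return code

    def validate(self, code, now=None):
        """Return the access level for a valid code, otherwise None."""
        now = time.time() if now is None else now
        entry = self._tokens.get(self._digest(code))
        if entry is None:
            return None
        access_level, expires_at = entry
        if now > expires_at:
            return None
        return access_level


if __name__ == "__main__":
    store = TokenStore()
    code = store.issue("basic", valid_seconds=3600)
    print(store.validate(code))          # access level of the fresh code
    print(store.validate("wrong-code"))  # unknown codes are rejected
```

A pay-as-you-go token could reuse the same structure with a usage counter in place of, or in addition to, the expiry time.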
• Functional requirements
1. The user is able to find all the important options on the interface, each linking to the corresponding functionality, or can navigate from there to find what he wants.
2. A new language can be introduced into the system by entering the individual phonemes used in that language and the corresponding script/alphabet into a new database
3. The end-user system should create a virtual keyboard with the keys of the user's selected language
4. The system should provide an online, server-based database mapping speech to its related text, which the end-user system can query and display
5. The end-user system receives speech from the microphone and sends it to the database for matching
6. The server, on receiving speech as above, parses it into phonemes and words, then reconstitutes the corresponding text
7. The system is able to auto-feed the database with new words detected in the input speech and lets the user confirm them
8. The server sends the built text back to the end-user system
9. The end-user system creates a text file, or appends the new text to an existing text file, with all the settings necessary for it to be read in the selected language
10. The system should be able to keep a separate database for each language
11. The system can detect a new word or phoneme in speech and add it to the existing database of the language
12. The system should be sensitive enough to distinguish even the closest phonemes, both on the server and on the end-user component
13. The system should be able to create a text file from input speech and let the user send it online via social networks (Facebook, WhatsApp, etc.) with an option (a link or a button) for the other users to play the spoken version of the text
14. The system should be able to read a text and convert it into voiced speech by reversing the speech recognition process
15. The system should allow the user to email, text, and use WhatsApp, Facebook, etc. in his language's script
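Requirements 7, 10, and 11 (one database per language, new words auto-fed but confirmed by the user) can be made concrete with a small SQLite sketch. Everything here is an assumption for illustration: the table layout, the function names, the language codes, and the idea of storing a phoneme sequence as a space-separated string. Several pronunciations of the same word are stored as separate rows that share one written form.

```python
import sqlite3


def open_language_db(path=":memory:"):
    """Open (or create) the database of one language."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " phonemes TEXT NOT NULL,"      # space-separated phoneme labels
        " written TEXT NOT NULL,"       # written form in the language's script
        " confirmed INTEGER NOT NULL DEFAULT 0,"
        " UNIQUE (phonemes, written))"
    )
    return db


def add_entry(db, phonemes, written, confirmed=False):
    """Store a pronunciation; auto-detected words start unconfirmed."""
    db.execute(
        "INSERT OR IGNORE INTO entries (phonemes, written, confirmed)"
        " VALUES (?, ?, ?)",
        (" ".join(phonemes), written, int(confirmed)),
    )
    db.commit()


def confirm_entry(db, phonemes, written):
    """Mark an auto-detected entry as confirmed by the user."""
    db.execute(
        "UPDATE entries SET confirmed = 1 WHERE phonemes = ? AND written = ?",
        (" ".join(phonemes), written),
    )
    db.commit()


def lookup(db, phonemes):
    """Return the confirmed written forms for a phoneme sequence."""
    rows = db.execute(
        "SELECT written FROM entries"
        " WHERE phonemes = ? AND confirmed = 1 ORDER BY written",
        (" ".join(phonemes),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # One independent database per language (language codes are examples).
    databases = {"en": open_language_db(), "fr": open_language_db()}
    add_entry(databases["en"], ["HH", "AH", "L", "OW"], "hello", confirmed=True)
    print(lookup(databases["en"], ["HH", "AH", "L", "OW"]))
    print(lookup(databases["fr"], ["HH", "AH", "L", "OW"]))
```

Keeping each language in its own database file keeps scripts and phoneme inventories from mixing, at the cost of managing several files on the server.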
• Non-functional requirements
1. The system should be able to run from a website with all required interfaces
2. The system should implement online payment and adequate security to grant access to its services
3. The system should be able to deliver its services over cell phone and internet networks
4. The system should enable the user to install all the required components of the software (language, alphabet, microphone/speaker settings, etc.) from a single point, with minimum configuration
• Other requirements
1. The system designer should present a well-detailed technique for detecting and measuring voiced phonemes and words, taking into account all the complexity and subtleties of the subject. This may include working with a specialist in this matter
2. Development should use the most versatile and efficient tools for all or most platforms
3. All the above requirements are to be refined and well structured to fit into adequate modules, ready to be implemented
4. Let us set the milestones together as the discussion progresses
Milestones of this work:
1. Comprehensive description of the system requirements and user requirements of the software to be designed.
2. An overall view of the system, with all the components, comprehensive modules covering all the functionalities, their interrelationships, the tier structure, etc.
3. Provision of different implementation options, including development tools and rules, along with the strengths and weaknesses of each. Description of the one chosen option and presentation of its strengths and weaknesses.
4. Prototype of a module to record phonemes and a main module able to invoke the other functions, including the phoneme recording module
5. Detailed technical design of each function to be covered, including implementation method and tools, development techniques, time and effort to invest in development, a description of the efforts required to overcome technical issues, tool strengths and shortcomings, and any other aspect needed to ease implementation. From this work, a technician should be able to go straight to implementing the code with no further burden.