Technology

 

"It's time to liberate speech recognition from the current limitations of SW, because we can always do it better in custom silicon." – A professor of Electrical and Computer Engineering, Carnegie Mellon University

"Dedicated designs in custom silicon can give up to 100X better energy efficiency (MOPS/mW) compared to DSP-based designs and up to 10,000X better energy efficiency compared to SW." – A Berkeley University study

Introduction

Voice is the natural and primary mode of human communication. Communicating with everyday consumer devices through a natural interface like voice is an order of magnitude faster and easier than using a keyboard, mouse, or even touch and gestures. Devices such as cellphones, tablets, digital cameras, wearables, smart TVs, remote controls, and home appliances can enrich the user experience by offering hands-free, voice-activated commands through embedded speech recognition. Numerous announcements by device manufacturers offering voice activation on high-end devices validate it as a key driver in the near future. According to research firm BCC Research, the global consumer voice recognition end market reached $29.5 billion in 2012 and is expected to grow to $65.1 billion in 2017, a CAGR of 17.2%.
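
The growth rate quoted above can be checked from the two market figures. The short sketch below assumes the standard compound annual growth rate formula over the five years from 2012 to 2017; the function name `cagr` is illustrative only.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market figures quoted in the text, in billions of US dollars.
rate = cagr(29.5, 65.1, 2017 - 2012)
print(f"{rate:.1%}")  # prints "17.2%"
```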

Speech recognition is a computationally intensive technology. Solutions available on the market today are software-based and run only on high-end devices that have adequate resources. Software solutions require a large memory footprint and significant host processing power (most often an additional DSP or assistance from the cloud). These requirements lead to higher power consumption, higher latency, and higher cost, making embedded speech recognition prohibitively expensive for resource-constrained low- to mid-range devices.

SimSim – 3iLogic-Designs’ Speech Recognition Engine

3iLogic-Designs’ SimSim™ IP core eliminates these constraints by delivering high performance and ultra-low power in an extremely small footprint, without sacrificing features or accuracy. Localized processing removes the dependency on cloud connectivity and provides a very fast, deterministic response: always, everywhere. SimSim can easily be integrated into ASSPs, MCUs, ASICs, and FPGAs as appropriate.

The core takes spoken audio input as 16-bit samples (at 16 kHz or 8 kHz), passes it through silence filtering, feature vector generation, senone scoring, and Viterbi decoding, and generates a result database. The host SW then analyzes this database to interpret the spoken utterance.
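
To illustrate the last stage of that pipeline, here is a minimal software sketch of textbook Viterbi decoding over per-frame state scores. It is not the SimSim implementation, which is in hardware and not described here; the function name, argument layout, and the use of plain Python lists are all assumptions made for illustration.

```python
def viterbi(log_emissions, log_trans, log_init):
    """Return (best_state_path, best_log_score) for a sequence of frames.

    log_emissions[t][s]: log-score of state s for frame t
                         (standing in for senone scores; hypothetical layout)
    log_trans[i][j]:     log-probability of moving from state i to state j
    log_init[s]:         log-probability of starting in state s
    """
    n_states = len(log_init)
    # best[s] = best log-score of any path that ends in state s at this frame
    best = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backptrs = []  # backptrs[t][j] = best predecessor of state j at frame t+1

    for frame in log_emissions[1:]:
        new_best, ptrs = [], []
        for j in range(n_states):
            # Choose the predecessor that maximizes the score of reaching j.
            i_best = max(range(n_states), key=lambda i: best[i] + log_trans[i][j])
            new_best.append(best[i_best] + log_trans[i_best][j] + frame[j])
            ptrs.append(i_best)
        best = new_best
        backptrs.append(ptrs)

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: best[s])
    score = best[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path, score
```

Working in log scores turns products of probabilities into sums, which avoids numerical underflow on long utterances and maps naturally to adder-based hardware.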

Some of the key features of SimSim IP are as follows:

  • High-performance, ultra-low-power solution for a localized vocal UI
  • Synthesizable solution that easily integrates into microcontrollers, application processors, ASICs and FPGAs
  • Small silicon footprint (135K gates)
  • “Always On” capability that includes Voice Activity Detection (VAD) and keyword spotting (with or without speaker verification)
  • Highly accurate Speaker Independent solution that requires no user training
  • Language independent architecture
  • Programmable keyword, dictionary, grammar
  • Simple to complex grammar design with Java Speech grammar support
  • Scalable vocabulary
  • 16/8 kHz audio support
  • Host processor agnostic and works with embedded or external system memories
  • Development tools and APIs provided
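
As an illustration of the Voice Activity Detection feature listed above, the following is a minimal energy-threshold sketch operating on 16-bit samples. It is a generic technique, not SimSim's actual VAD, whose algorithm is not described here; the 10 ms frame length, the -40 dB threshold, and both function names are assumptions chosen for the example.

```python
import math

FULL_SCALE = 32768.0  # magnitude limit of signed 16-bit audio samples

def frame_energy_db(frame):
    """Mean-square energy of one frame, in dB relative to 16-bit full scale."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    # The small constant keeps log10 defined for all-zero (silent) frames.
    return 10.0 * math.log10(mean_sq / (FULL_SCALE ** 2) + 1e-12)

def detect_voice(samples, sample_rate=16000, frame_ms=10, threshold_db=-40.0):
    """Return one flag per full frame: True where energy exceeds the threshold."""
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags
```

A fixed threshold like this is only a starting point; practical detectors usually adapt the threshold to the background noise level and smooth the per-frame decisions over time.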