Is your model meant to go from spoken English to Spanish and vice versa? Speech is a highly specialized area and, from my limited work in it a few years ago, it gets very complex very quickly. To my mind the best way to proceed would be to use separate speech-to-text engines to get a transcript, then use something like Google Translate to do the text-to-text translation.
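
To make the two-stage idea concrete, here is a rough Python sketch of how the pieces could be wired together. Everything in it is hypothetical: `SpeechTranslationPipeline`, `fake_transcribe` and `fake_translate` are names made up for illustration, the stubs stand in for whatever real speech-to-text and translation services you pick, and keeping one engine per language is just my reading of "separate engines".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical shapes; a real engine would be wrapped to match them.
Transcriber = Callable[[bytes, str], str]    # (audio, language code) -> transcript
Translator = Callable[[str, str, str], str]  # (text, source, target) -> translated text


@dataclass
class SpeechTranslationPipeline:
    """Stage 1: speech to text. Stage 2: text to text translation."""

    transcribers: dict[str, Transcriber]  # assumption: one engine per language
    translator: Translator

    def run(self, audio: bytes, source: str, target: str) -> str:
        if source not in self.transcribers:
            raise ValueError(f"no speech-to-text engine for {source!r}")
        text = self.transcribers[source](audio, source)
        return self.translator(text, source, target)


# Stand-in stubs so the sketch runs without any external service.
def fake_transcribe(audio: bytes, language: str) -> str:
    # Pretend the audio bytes already contain the transcript.
    return audio.decode("utf-8")


_PHRASES = {
    ("en", "es"): {"hello": "hola"},
    ("es", "en"): {"hola": "hello"},
}


def fake_translate(text: str, source: str, target: str) -> str:
    # Tiny lookup table; unknown phrases are passed through unchanged.
    table = _PHRASES.get((source, target), {})
    return table.get(text.lower(), text)


pipeline = SpeechTranslationPipeline(
    transcribers={"en": fake_transcribe, "es": fake_transcribe},
    translator=fake_translate,
)
```

With the stubs in place, `pipeline.run(b"hello", "en", "es")` goes through both stages in order. The point of the split is that either stage can be swapped for a real service later without touching the other.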
Josh