Formats and standards

Speex can encode speech in both narrowband and wideband, at several bit-rates. However, a given implementation or device does not need to support every feature. To be called ``Speex compatible'' (whatever that means), an implementation must support at least a basic set of features.

At the minimum, all narrowband modes of operation MUST be supported at the decoder. This includes the decoding of a wideband bit-stream by the narrowband decoder. If present, a wideband decoder MUST be able to decode a narrowband stream, and MAY either be able to decode all wideband modes or be able to decode the embedded narrowband part of all modes (which includes ignoring the high-band bits).

For encoders, at least one narrowband or wideband mode MUST be supported. The main reason why all encoding modes do not have to be supported is that some platforms may not be able to handle the complexity of encoding in some modes.

RTP Payload Format

The RTP payload draft is included in appendix C, and the latest version is available online. This draft was sent to the Internet Engineering Task Force (IETF) on 2003/02/26 and will be discussed at the March 18th meeting in San Francisco.


For now, you should use the MIME type audio/x-speex for Speex-in-Ogg. We will apply for type audio/speex in the near future.

Ogg file format

Speex bit-streams can be stored in Ogg files. In this case, the first packet of the Ogg file contains the Speex header described in table 2. All integer fields in the header are stored as little-endian. The speex_string field must contain the string ``Speex   '' (with 3 trailing spaces), which identifies the bit-stream. The next field, speex_version, contains the version of Speex that encoded the file. For now, refer to speex_header.[ch] for more information. The beginning of stream (b_o_s) flag is set to 1 for the header. The header packet has packetno=0 and granulepos=0.

The second packet contains the Speex comment header. The format used is the Vorbis comment format. This packet has packetno=1 and granulepos=0.

The third and subsequent packets each contain one or more Speex frames (the number is given by the frames_per_packet field of the header). These packets are numbered with packetno starting from 2, and the granulepos of each is the number of the last sample encoded in that packet. The last of these packets has the end of stream (e_o_s) flag set to 1.
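The packet sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the function name packet_positions is hypothetical, every data packet is assumed to be full, and granulepos is interpreted as the running count of samples encoded so far. A real encoder may adjust these values, so treat speex_header.[ch] and the encoder sources as the reference.

```python
def packet_positions(num_data_packets, frame_size, frames_per_packet):
    """Return (packetno, granulepos, b_o_s, e_o_s) for each packet in a stream.

    Hypothetical helper: assumes every data packet carries exactly
    frames_per_packet frames of frame_size samples each.
    """
    positions = [
        (0, 0, True, False),   # Speex header packet, b_o_s set
        (1, 0, False, False),  # comment header packet
    ]
    samples = 0
    for i in range(num_data_packets):
        # Assumption: granulepos is the cumulative number of samples encoded.
        samples += frame_size * frames_per_packet
        is_last = (i == num_data_packets - 1)
        positions.append((2 + i, samples, False, is_last))
    return positions


if __name__ == "__main__":
    for entry in packet_positions(3, 160, 2):
        print(entry)
```

With 160-sample frames and two frames per packet, each data packet advances granulepos by 320 samples in this sketch.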

Table 2: Ogg/Speex header packet

Field                   Type    Size (bytes)
speex_string            char[]  8
speex_version           char[]  20
speex_version_id        int     4
header_size             int     4
rate                    int     4
mode                    int     4
mode_bitstream_version  int     4
nb_channels             int     4
bitrate                 int     4
frame_size              int     4
vbr                     int     4
frames_per_packet       int     4
extra_headers           int     4
reserved1               int     4
reserved2               int     4
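
As a rough illustration of the layout in table 2, the header packet could be unpacked along the following lines. This is a minimal sketch, not the reference implementation in speex_header.[ch]: the names HEADER_STRUCT, parse_speex_header and build_example_header are hypothetical, the integers are assumed to be signed 32-bit values, and the field values in the example header are made up for demonstration.

```python
import struct

# Layout from table 2: an 8-byte identifier, a 20-byte version string,
# then thirteen 4-byte little-endian integers (80 bytes in total).
# Assumption: the integers are signed.
HEADER_STRUCT = struct.Struct("<8s20s13i")

INT_FIELDS = (
    "speex_version_id", "header_size", "rate", "mode",
    "mode_bitstream_version", "nb_channels", "bitrate", "frame_size",
    "vbr", "frames_per_packet", "extra_headers", "reserved1", "reserved2",
)


def parse_speex_header(packet: bytes) -> dict:
    """Decode the first Ogg packet of a Speex stream into a dict of fields."""
    if len(packet) < HEADER_STRUCT.size:
        raise ValueError("packet too short for a Speex header")
    magic, version, *ints = HEADER_STRUCT.unpack_from(packet)
    if magic != b"Speex   ":  # identifier with 3 trailing spaces
        raise ValueError("not a Speex header packet")
    header = {
        "speex_string": magic.decode("ascii"),
        # Keep only the text before the first NUL padding byte.
        "speex_version": version.split(b"\x00", 1)[0].decode("ascii", "replace"),
    }
    header.update(zip(INT_FIELDS, ints))
    return header


def build_example_header() -> bytes:
    """Pack a header with illustrative (made-up) values for testing."""
    return HEADER_STRUCT.pack(
        b"Speex   ", b"1.2",
        1, 80, 8000, 0, 4, 1, -1, 160, 0, 1, 0, 0, 0,
    )


if __name__ == "__main__":
    print(parse_speex_header(build_example_header()))
```

The "<" prefix in the format string selects little-endian byte order with no padding, which matches the statement that all integer fields are stored as little-endian.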

Jean-Marc Valin 2007-05-23