/*
FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
dr_flac - v0.13.0 - TBD

David Reid - mackron@gmail.com

GitHub: https://github.com/mackron/dr_libs
*/

/*
Introduction
============
dr_flac is a single file library. To use it, do something like the following in one .c file.

    ```c
    #define DR_FLAC_IMPLEMENTATION
    #include "dr_flac.h"
    ```

You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:

    ```c
    drflac* pFlac = drflac_open_file("MySong.flac", NULL);
    if (pFlac == NULL) {
        // Failed to open FLAC file
    }

    drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
    drflac_uint64 numberOfPCMFramesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
    ```

The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
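
Because the output is interleaved, a buffer for N PCM frames holds N * channels samples. The following is a minimal sketch of an overflow-checked size
calculation for such a buffer. `interleaved_s32_bytes` is a hypothetical helper, not part of dr_flac, and it uses <stdint.h> types in place of the drflac_*
typedefs so that it compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not a dr_flac API). Computes the byte size of an interleaved
// signed 32-bit buffer. Returns 1 and writes *pBytes on success, 0 on failure.
static int interleaved_s32_bytes(uint64_t frameCount, unsigned int channels, size_t* pBytes)
{
    uint64_t maxSamples = (uint64_t)SIZE_MAX / sizeof(int32_t);

    if (channels == 0 || frameCount > maxSamples / channels) {
        return 0;   // Zero channels, or the byte count would not fit in size_t.
    }

    *pBytes = (size_t)(frameCount * channels) * sizeof(int32_t);
    return 1;
}
```

With the decoder from the example above, this could be called as `interleaved_s32_bytes(pFlac->totalPCMFrameCount, pFlac->channels, &bytes)` before the
call to malloc().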

You do not need to decode the entire stream in one go - you just specify how many PCM frames you'd like at any given time and the decoder will give you as
many PCM frames as it can, up to the amount requested. Later on when you need the next batch of PCM frames, just call it again. Example:

    ```c
    while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
        do_something();
    }
    ```

You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.

If you just want to quickly decode an entire FLAC file in one go you can do something like this:

    ```c
    unsigned int channels;
    unsigned int sampleRate;
    drflac_uint64 totalPCMFrameCount;
    drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
    if (pSampleData == NULL) {
        // Failed to open and decode FLAC file.
    }

    ...

    drflac_free(pSampleData, NULL);
    ```

You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
should be considered lossy.
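
Why these formats can lose information can be seen with plain integer and float arithmetic. The sketch below is an illustration only and is not dr_flac's
conversion code. Both function names are hypothetical. It assumes that `float` is IEEE 754 single precision (24-bit significand) and, for the 16-bit case,
that only the upper 16 bits of a 32-bit sample are kept.

```c
#include <stdint.h>

// Returns 1 if a 32-bit sample value survives a round trip through float unchanged.
static int survives_float_round_trip(int32_t sample)
{
    volatile float asFloat = (float)sample;    // volatile forces rounding to single precision.
    return (double)asFloat == (double)sample;  // Compare as double to avoid an overflowing cast back.
}

// Returns 1 if a 32-bit sample loses nothing when only its upper 16 bits are stored.
static int survives_16_bit_truncation(int32_t sample)
{
    return (sample & 0xFFFF) == 0;
}
```

For example, 16777217 (2^24 + 1) does not survive the float round trip, while 16777216 does.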


If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metadata()` or `drflac_open_memory_with_metadata()`.
The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metadata()` returns.

The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. This presents a problem in certain scenarios such as broadcast style
streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:

    `drflac_open_relaxed()`
    `drflac_open_with_metadata_relaxed()`

It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.



Build Options
=============
#define these options before including this file.

#define DR_FLAC_NO_STDIO
  Disables `drflac_open_file()` and family.

#define DR_FLAC_NO_OGG
  Disables support for Ogg/FLAC streams.

#define DR_FLAC_BUFFER_SIZE <number>
  Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
  Larger values mean more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
  you have a very efficient implementation of onRead(), or increasing it if it's very inefficient. Must be a multiple of 8.

#define DR_FLAC_NO_CRC
  Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will also disable binary search seeking. When seeking, the seek table
  will be used if available. Otherwise the seek will be performed using brute force.

#define DR_FLAC_NO_SIMD
  Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.

#define DR_FLAC_NO_WCHAR
  Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.



Notes
=====
- dr_flac does not support changing the sample rate or channel count mid-stream.
- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  to differences in corrupted stream recovery logic between the two APIs.
*/

#ifndef dr_flac_h
#define dr_flac_h

#ifdef __cplusplus
extern "C" {
#endif

#define DRFLAC_STRINGIFY(x) #x
#define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)

#define DRFLAC_VERSION_MAJOR 0
#define DRFLAC_VERSION_MINOR 13
#define DRFLAC_VERSION_REVISION 0
#define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
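
/*
The version string is built with a two-level stringify because `#x` on its own stringifies the macro's name rather than its value. The extra level lets the
argument expand first. A minimal standalone sketch of the technique, using hypothetical MY_* names so that it does not clash with the macros above:

```c
#include <string.h>

#define MY_STRINGIFY(x)  #x
#define MY_XSTRINGIFY(x) MY_STRINGIFY(x)   // Expands x first, then stringifies the result.

#define MY_VERSION_MAJOR    0
#define MY_VERSION_MINOR    13
#define MY_VERSION_REVISION 0
#define MY_VERSION_STRING   MY_XSTRINGIFY(MY_VERSION_MAJOR) "." MY_XSTRINGIFY(MY_VERSION_MINOR) "." MY_XSTRINGIFY(MY_VERSION_REVISION)

// Returns 1 if the single-level macro produced the macro's name instead of its value.
static int single_level_yields_name(void)
{
    return strcmp(MY_STRINGIFY(MY_VERSION_MINOR), "MY_VERSION_MINOR") == 0;
}

// Returns the assembled version string, built from adjacent string literals.
static const char* my_version_string(void)
{
    return MY_VERSION_STRING;
}
```
*/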

#include <stddef.h> /* For size_t. */

/* Sized Types */
typedef signed char drflac_int8;
typedef unsigned char drflac_uint8;
typedef signed short drflac_int16;
typedef unsigned short drflac_uint16;
typedef signed int drflac_int32;
typedef unsigned int drflac_uint32;
#if defined(_MSC_VER) && !defined(__clang__)
    typedef signed __int64 drflac_int64;
    typedef unsigned __int64 drflac_uint64;
#else
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wlong-long"
        #if defined(__clang__)
            #pragma GCC diagnostic ignored "-Wc++11-long-long"
        #endif
    #endif
    typedef signed long long drflac_int64;
    typedef unsigned long long drflac_uint64;
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic pop
    #endif
#endif
#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
    typedef drflac_uint64 drflac_uintptr;
#else
    typedef drflac_uint32 drflac_uintptr;
#endif
typedef drflac_uint8 drflac_bool8;
typedef drflac_uint32 drflac_bool32;
#define DRFLAC_TRUE 1
#define DRFLAC_FALSE 0
/* End Sized Types */

/* Decorations */
#if !defined(DRFLAC_API)
    #if defined(DRFLAC_DLL)
        #if defined(_WIN32)
            #define DRFLAC_DLL_IMPORT __declspec(dllimport)
            #define DRFLAC_DLL_EXPORT __declspec(dllexport)
            #define DRFLAC_DLL_PRIVATE static
        #else
            #if defined(__GNUC__) && __GNUC__ >= 4
                #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
                #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
                #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
            #else
                #define DRFLAC_DLL_IMPORT
                #define DRFLAC_DLL_EXPORT
                #define DRFLAC_DLL_PRIVATE static
            #endif
        #endif

        #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
            #define DRFLAC_API DRFLAC_DLL_EXPORT
        #else
            #define DRFLAC_API DRFLAC_DLL_IMPORT
        #endif
        #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
    #else
        #define DRFLAC_API extern
        #define DRFLAC_PRIVATE static
    #endif
#endif
/* End Decorations */

#if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
    #define DRFLAC_DEPRECATED __declspec(deprecated)
#elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
    #define DRFLAC_DEPRECATED __attribute__((deprecated))
#elif defined(__has_feature) /* Clang */
    #if __has_feature(attribute_deprecated)
        #define DRFLAC_DEPRECATED __attribute__((deprecated))
    #else
        #define DRFLAC_DEPRECATED
    #endif
#else
    #define DRFLAC_DEPRECATED
#endif

DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
DRFLAC_API const char* drflac_version_string(void);

/* Allocation Callbacks */
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void (* onFree)(void* p, void* pUserData);
} drflac_allocation_callbacks;
/* End Allocation Callbacks */
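
/*
The three callbacks take only standard C types, so they can be written without any dr_flac declarations. Below is a minimal sketch of a set of callbacks
that counts live allocations through pUserData. The my_* names and the counter struct are hypothetical and not part of dr_flac.

```c
#include <stdlib.h>

typedef struct
{
    size_t liveAllocations;   // Incremented on a successful malloc, decremented on free.
} my_alloc_stats;

static void* my_malloc(size_t sz, void* pUserData)
{
    void* p = malloc(sz);
    if (p != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return p;
}

static void* my_realloc(void* p, size_t sz, void* pUserData)
{
    if (p == NULL) {
        return my_malloc(sz, pUserData);   // realloc(NULL, sz) behaves like malloc(sz).
    }
    return realloc(p, sz);                 // Resizing keeps the live count unchanged.
}

static void my_free(void* p, void* pUserData)
{
    if (p != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations -= 1;
        free(p);
    }
}
```

To use them, fill in a `drflac_allocation_callbacks` with `pUserData` pointing at a `my_alloc_stats`, set `onMalloc`, `onRealloc` and `onFree` to the
functions above, and pass its address wherever a `pAllocationCallbacks` parameter is accepted.
*/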

/*
As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values mean more speed,
but also more memory. In my testing there are diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
*/
#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE 4096
#endif


/* Architecture Detection */
#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
#define DRFLAC_64BIT
#endif

#if defined(__x86_64__) || (defined(_M_X64) && !defined(_M_ARM64EC))
    #define DRFLAC_X64
#elif defined(__i386) || defined(_M_IX86)
    #define DRFLAC_X86
#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
    #define DRFLAC_ARM
#endif
/* End Architecture Detection */


#ifdef DRFLAC_64BIT
typedef drflac_uint64 drflac_cache_t;
#else
typedef drflac_uint32 drflac_cache_t;
#endif

/* The various metadata block types. */
#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
#define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
#define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127

/* The various picture types specified in the PICTURE block. */
#define DRFLAC_PICTURE_TYPE_OTHER 0
#define DRFLAC_PICTURE_TYPE_FILE_ICON 1
#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
#define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
#define DRFLAC_PICTURE_TYPE_COVER_BACK 4
#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
#define DRFLAC_PICTURE_TYPE_MEDIA 6
#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
#define DRFLAC_PICTURE_TYPE_ARTIST 8
#define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
#define DRFLAC_PICTURE_TYPE_BAND 10
#define DRFLAC_PICTURE_TYPE_COMPOSER 11
#define DRFLAC_PICTURE_TYPE_LYRICIST 12
#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
#define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
#define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20

typedef enum
{
    drflac_container_native,
    drflac_container_ogg,
    drflac_container_unknown
} drflac_container;

typedef enum
{
    DRFLAC_SEEK_SET,
    DRFLAC_SEEK_CUR,
    DRFLAC_SEEK_END
} drflac_seek_origin;

/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
typedef struct
{
    drflac_uint64 firstPCMFrame;
    drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
    drflac_uint16 pcmFrameCount;
} drflac_seekpoint;

typedef struct
{
    drflac_uint16 minBlockSizeInPCMFrames;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint32 minFrameSizeInPCMFrames;
    drflac_uint32 maxFrameSizeInPCMFrames;
    drflac_uint32 sampleRate;
    drflac_uint8 channels;
    drflac_uint8 bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint8 md5[16];
} drflac_streaminfo;

typedef struct
{
    /*
    The metadata type. Use this to know how to interpret the data below. Will be set to one of the
    DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    */
    drflac_uint32 type;

    /*
    A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
    not modify the contents of this buffer. Use the structures below for more meaningful and structured
    information about the metadata. It's possible for this to be null.
    */
    const void* pRawData;

    /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
    drflac_uint32 rawDataSize;

    union
    {
        drflac_streaminfo streaminfo;

        struct
        {
            int unused;
        } padding;

        struct
        {
            drflac_uint32 id;
            const void* pData;
            drflac_uint32 dataSize;
        } application;

        struct
        {
            drflac_uint32 seekpointCount;
            const drflac_seekpoint* pSeekpoints;
        } seektable;

        struct
        {
            drflac_uint32 vendorLength;
            const char* vendor;
            drflac_uint32 commentCount;
            const void* pComments;
        } vorbis_comment;

        struct
        {
            char catalog[128];
            drflac_uint64 leadInSampleCount;
            drflac_bool32 isCD;
            drflac_uint8 trackCount;
            const void* pTrackData;
        } cuesheet;

        struct
        {
            drflac_uint32 type;
            drflac_uint32 mimeLength;
            const char* mime;
            drflac_uint32 descriptionLength;
            const char* description;
            drflac_uint32 width;
            drflac_uint32 height;
            drflac_uint32 colorDepth;
            drflac_uint32 indexColorCount;
            drflac_uint32 pictureDataSize;
            const drflac_uint8* pPictureData;
        } picture;
    } data;
} drflac_metadata;


/*
Callback for when data needs to be read from the client.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pBufferOut (out)
    The output buffer.

bytesToRead (in)
    The number of bytes to read.


Return Value
------------
The number of bytes actually read.


Remarks
-------
A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
you have reached the end of the stream.
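
A minimal sketch of a callback that follows these rules when reading from a block of memory. `my_memory_source` and `my_on_read` are hypothetical names.
The function uses the same parameter and return types as this callback, so it needs no dr_flac declarations to compile.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    const unsigned char* pData;   // Start of the encoded stream.
    size_t dataSize;              // Total size of the stream in bytes.
    size_t readPos;               // Current read position in bytes.
} my_memory_source;

static size_t my_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    my_memory_source* pSource = (my_memory_source*)pUserData;
    size_t bytesRemaining = pSource->dataSize - pSource->readPos;

    // Clamp to what is left. Returning fewer bytes than requested signals the end of the stream.
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        memcpy(pBufferOut, pSource->pData + pSource->readPos, bytesToRead);
        pSource->readPos += bytesToRead;
    }

    return bytesToRead;
}
```

A source that can deliver partial reads, such as a socket, would instead need to loop until the request is filled or the stream ends.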
*/
typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

/*
Callback for when data needs to be seeked.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

offset (in)
    The number of bytes to move, relative to the origin. Will never be negative.

origin (in)
    The origin of the seek - the current position, the start of the stream, or the end of the stream.


Return Value
------------
Whether or not the seek was successful.


Remarks
-------
Seeking relative to the start and the current position must always be supported. If seeking from the end of the stream is not supported, return DRFLAC_FALSE.

When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
and handled by returning DRFLAC_FALSE.
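
A minimal sketch of a seek callback over an in-memory stream that rejects out-of-range targets. All my_* and MY_* names are hypothetical. The local
typedefs are stand-ins for drflac_bool32 and drflac_seek_origin so the sketch compiles on its own. In real use the dr_flac types and the
DRFLAC_SEEK_* values would be used instead.

```c
#include <stddef.h>

// Stand-ins for drflac_bool32 and drflac_seek_origin (same order of enumerators).
typedef unsigned int my_bool32;
typedef enum { MY_SEEK_SET, MY_SEEK_CUR, MY_SEEK_END } my_seek_origin;

typedef struct
{
    size_t dataSize;   // Total size of the stream in bytes.
    size_t readPos;    // Current read position in bytes.
} my_memory_cursor;

static my_bool32 my_on_seek(void* pUserData, int offset, my_seek_origin origin)
{
    my_memory_cursor* pCursor = (my_memory_cursor*)pUserData;
    size_t base;

    if (offset < 0) {
        return 0;   // Documented as never negative, so treat it as an error.
    }

    switch (origin) {
        case MY_SEEK_SET: base = 0;                break;
        case MY_SEEK_CUR: base = pCursor->readPos; break;
        default:          return 0;   // Seeking from the end is not supported in this sketch.
    }

    // Reject any target beyond the end of the stream and leave the position untouched.
    if (base > pCursor->dataSize || (size_t)offset > pCursor->dataSize - base) {
        return 0;
    }

    pCursor->readPos = base + (size_t)offset;
    return 1;
}
```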
*/
typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);

/*
Callback for when the current position in the stream needs to be retrieved.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pCursor (out)
    A pointer to a variable to receive the current position in the stream.


Return Value
------------
Whether or not the operation was successful.
*/
typedef drflac_bool32 (* drflac_tell_proc)(void* pUserData, drflac_int64* pCursor);

/*
Callback for when a metadata block is read.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pMetadata (in)
    A pointer to a structure containing the data of the metadata block.


Remarks
-------
Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
*/
typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);


/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
typedef struct
{
    const drflac_uint8* data;
    size_t dataSize;
    size_t currentReadPos;
} drflac__memory_stream;

/* Structure for internal use. Used for bit streaming. */
typedef struct
{
    /* The function to call when more data needs to be read. */
    drflac_read_proc onRead;

    /* The function to call when the current read position needs to be moved. */
    drflac_seek_proc onSeek;

    /* The function to call when the current read position needs to be retrieved. */
    drflac_tell_proc onTell;

    /* The user data to pass around to onRead, onSeek and onTell. */
    void* pUserData;


    /*
    The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
    stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
    or not the bitstreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
    */
    size_t unalignedByteCount;

    /* The content of the unaligned bytes. */
    drflac_cache_t unalignedCache;

    /* The index of the next valid cache line in the "L2" cache. */
    drflac_uint32 nextL2Line;

    /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
    drflac_uint32 consumedBits;

    /*
    The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
    Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
    */
    drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
    drflac_cache_t cache;

    /*
    CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
    is reset to 0 at the beginning of each frame.
    */
    drflac_uint16 crc16;
    drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when the L1 cache is reloaded. */
    drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
} drflac_bs;

typedef struct
{
    /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
    drflac_uint8 subframeType;

    /* The number of wasted bits per sample as specified by the sub-frame header. */
    drflac_uint8 wastedBitsPerSample;

    /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
    drflac_uint8 lpcOrder;

    /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
    drflac_int32* pSamplesS32;
} drflac_subframe;

typedef struct
{
    /*
    If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
    always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
    */
    drflac_uint64 pcmFrameNumber;

    /*
    If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
    is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
    */
    drflac_uint32 flacFrameNumber;

    /* The sample rate of this frame. */
    drflac_uint32 sampleRate;

    /* The number of PCM frames in each sub-frame within this frame. */
    drflac_uint16 blockSizeInPCMFrames;

    /*
    The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
    will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
    */
    drflac_uint8 channelAssignment;

    /* The number of bits per sample within this frame. */
    drflac_uint8 bitsPerSample;

    /* The frame's CRC. */
    drflac_uint8 crc8;
} drflac_frame_header;

typedef struct
{
    /* The header. */
    drflac_frame_header header;

    /*
    The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
    this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
    */
    drflac_uint32 pcmFramesRemaining;

    /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
    drflac_subframe subframes[8];
} drflac_frame;

typedef struct
{
    /* The function to call when a metadata block is read. */
    drflac_meta_proc onMeta;

    /* The user data posted to the metadata callback function. */
    void* pUserDataMD;

    /* Memory allocation callbacks. */
    drflac_allocation_callbacks allocationCallbacks;


    /* The sample rate. Will be set to something like 44100. */
    drflac_uint32 sampleRate;

    /*
    The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
    value specified in the STREAMINFO block.
    */
    drflac_uint8 channels;

    /* The bits per sample. Will be set to something like 16, 24, etc. */
    drflac_uint8 bitsPerSample;

    /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
    drflac_uint16 maxBlockSizeInPCMFrames;

    /*
    The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
    the total PCM frame count is unknown. Likely the case with streams like internet radio.
    */
    drflac_uint64 totalPCMFrameCount;


    /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
    drflac_container container;

    /* The number of seekpoints in the seektable. */
    drflac_uint32 seekpointCount;


    /* Information about the frame the decoder is currently sitting on. */
    drflac_frame currentFLACFrame;


    /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
    drflac_uint64 currentPCMFrame;

    /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
    drflac_uint64 firstFLACFramePosInBytes;


    /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
    drflac__memory_stream memoryStream;


    /* A pointer to the decoded sample data. This is an offset of pExtraData. */
    drflac_int32* pDecodedSamples;

    /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
    drflac_seekpoint* pSeekpoints;

    /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
    void* _oggbs;

    /* Internal use only. Used for profiling and testing different seeking modes. */
    drflac_bool32 _noSeekTableSeek : 1;
    drflac_bool32 _noBinarySearchSeek : 1;
    drflac_bool32 _noBruteForceSeek : 1;

    /* The bit streamer. The raw FLAC data is fed through this object. */
    drflac_bs bs;

    /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
    drflac_uint8 pExtraData[1];
} drflac;


/*
Opens a FLAC decoder.


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onTell (in)
    The function to call when the current read position needs to be retrieved.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead, onSeek and onTell.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
Returns a pointer to an object representing the decoder.


Remarks
-------
Close the decoder with `drflac_close()`.

`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.

This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.

This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
from a block of memory respectively.

The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.

Use `drflac_open_with_metadata()` if you need access to metadata.


See Also
--------
drflac_open_file()
drflac_open_memory()
drflac_open_with_metadata()
drflac_close()
*/
DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC stream with relaxed validation of the header block.


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onTell (in)
    The function to call when the current read position needs to be retrieved.

container (in)
    Whether the FLAC stream uses native FLAC encapsulation or Ogg encapsulation.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead, onSeek and onTell.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
The same as drflac_open(), except attempts to open the stream even when a header block is not present.

Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
as that is for internal use only.

Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.

Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
*/
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onTell (in)
    The function to call when the current read position needs to be retrieved.

onMeta (in)
    The function to call for every metadata block.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead, onSeek, onTell and onMeta.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
Close the decoder with `drflac_close()`.

`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.

This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
metadata block except for STREAMINFO and PADDING blocks.

The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
pointer to a `drflac_metadata` object which contains a union holding the data of all relevant metadata blocks. Use the `type` member to distinguish between
the different metadata types.

The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.

Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
returned depending on whether or not the stream is being opened with metadata.


See Also
--------
drflac_open_file_with_metadata()
drflac_open_memory_with_metadata()
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.

See Also
--------
drflac_open_with_metadata()
drflac_open_relaxed()
*/
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Closes the given FLAC decoder.


Parameters
----------
pFlac (in)
    The decoder to close.


Remarks
-------
This will destroy the decoder object.


See Also
--------
drflac_open()
drflac_open_with_metadata()
drflac_open_file()
drflac_open_file_w()
drflac_open_file_with_metadata()
drflac_open_file_with_metadata_w()
drflac_open_memory()
drflac_open_memory_with_metadata()
*/
DRFLAC_API void drflac_close(drflac* pFlac);


/*
Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.


Parameters
----------
pFlac (in)
    The decoder.

framesToRead (in)
    The number of PCM frames to read.

pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples.


Return Value
------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.


Remarks
-------
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
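
Interleaved output stores one sample per channel for each PCM frame, so the sample for channel `c` of frame `f` is at index `f * channels + c`. The following
is a minimal sketch of splitting such a buffer into one array per channel. `my_deinterleave_s32` is a hypothetical helper, not a dr_flac API, and it uses
`int32_t` in place of `drflac_int32` so that it compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

// Copies interleaved samples into ppChannelsOut[c][f]. Each output array must hold frameCount samples.
static void my_deinterleave_s32(const int32_t* pInterleaved, size_t frameCount, unsigned int channels, int32_t** ppChannelsOut)
{
    size_t f;
    unsigned int c;

    for (f = 0; f < frameCount; f += 1) {
        for (c = 0; c < channels; c += 1) {
            ppChannelsOut[c][f] = pInterleaved[f * channels + c];
        }
    }
}
```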
906*/
907DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
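/*
Example
-------
The return value contract described above lends itself to a simple chunked read loop. The following is only a minimal sketch: `example_source`,
`example_read_frames()` and `example_read_all()` are hypothetical stand-ins that mimic the documented behaviour so the loop can be compiled on
its own. They are not part of dr_flac. With a real decoder, the call inside the loop would be drflac_read_pcm_frames_s32().

```c
#include <stddef.h>

// Hypothetical stand-in for a decoder: it only counts how many frames are left.
typedef struct
{
    unsigned long long framesRemaining;
    unsigned int channels;
} example_source;

// Mimics the documented contract: returns the number of frames actually read, which is
// less than framesToRead only once the end has been reached. A NULL buffer just skips frames.
static unsigned long long example_read_frames(example_source* pSrc, unsigned long long framesToRead, int* pBufferOut)
{
    unsigned long long n = (framesToRead < pSrc->framesRemaining) ? framesToRead : pSrc->framesRemaining;
    if (pBufferOut != NULL) {
        unsigned long long i;
        for (i = 0; i < n * pSrc->channels; i += 1) {
            pBufferOut[i] = 0;  // A real decoder would write interleaved samples here.
        }
    }

    pSrc->framesRemaining -= n;
    return n;
}

// Reads the whole source in chunks of 256 frames and returns the total number of frames seen.
// The local buffer assumes at most 2 interleaved channels.
static unsigned long long example_read_all(example_source* pSrc)
{
    int chunk[256 * 2];
    unsigned long long total = 0;

    for (;;) {
        unsigned long long got = example_read_frames(pSrc, 256, chunk);
        total += got;
        if (got < 256) {
            break;  // A short read means the end was reached.
        }
    }

    return total;
}
```

Note that the buffer is sized in samples (frames multiplied by channels), whereas the read request and the return value are counted in frames.
*/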
910/*
911Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
914Parameters
915----------
916pFlac (in)
917 The decoder.
919framesToRead (in)
920 The number of PCM frames to read.
922pBufferOut (out, optional)
923 A pointer to the buffer that will receive the decoded samples.
926Return Value
927------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the decoder has reached the end of the stream.
931Remarks
932-------
933pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
935Note that this is lossy for streams where the bits per sample is larger than 16.
936*/
937DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
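/*
Example
-------
To see why 16-bit output is lossy for deeper streams, consider the sketch below. It assumes the 32-bit samples occupy the full 32-bit range, so
that a 16-bit version can be obtained by keeping only the upper 16 bits. `example_s32_to_s16()` is a hypothetical helper written for
illustration and is not necessarily how dr_flac performs the conversion internally.

```c
#include <stdint.h>

// Keeps the upper 16 bits of a full-range 32-bit sample. The lower 16 bits are discarded.
// Right-shifting a negative value is implementation-defined in C, but it is an arithmetic
// shift on the commonly used compilers.
static int16_t example_s32_to_s16(int32_t sample)
{
    return (int16_t)(sample >> 16);
}
```

Two different inputs such as 0x12345600 and 0x12340000 both come out as 0x1234, which is the information loss the note above refers to.
*/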
939/*
940Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
943Parameters
944----------
945pFlac (in)
946 The decoder.
948framesToRead (in)
949 The number of PCM frames to read.
951pBufferOut (out, optional)
952 A pointer to the buffer that will receive the decoded samples.
955Return Value
956------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead`, the decoder has reached the end of the stream.
960Remarks
961-------
962pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
964Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
965*/
966DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
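/*
Example
-------
The floating point caveat can be demonstrated with a round trip. This sketch assumes `float` is an IEEE 754 single precision type with a 24-bit
significand, which holds on the common platforms. `example_survives_f32()` is a hypothetical helper, not part of dr_flac.

```c
#include <stdint.h>

// Returns 1 if the value survives a round trip through a 32-bit float, 0 otherwise.
static int example_survives_f32(int32_t value)
{
    volatile float f = (float)value;  // volatile forces the value to be stored as a real 32-bit float.
    return (int32_t)f == value;
}
```

Under that assumption 16777216 (2^24) survives the round trip, but 16777217 does not because it needs 25 significant bits.
*/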
968/*
969Seeks to the PCM frame at the given index.
972Parameters
973----------
974pFlac (in)
975 The decoder.
977pcmFrameIndex (in)
    The index of the PCM frame to seek to.
Return Value
------------
983`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
984*/
985DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
989#ifndef DR_FLAC_NO_STDIO
990/*
991Opens a FLAC decoder from the file at the given path.
994Parameters
995----------
996pFileName (in)
997 The path of the file to open, either absolute or relative to the current directory.
999pAllocationCallbacks (in, optional)
1000 A pointer to application defined callbacks for managing memory allocations.
1003Return Value
1004------------
1005A pointer to an object representing the decoder.
Remarks
-------
Close the decoder with drflac_close().

This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
at any given time, so keep this in mind if you have many decoders open at the same time.
1019See Also
1020--------
1021drflac_open_file_with_metadata()
1022drflac_open()
1023drflac_close()
1024*/
1025DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1026DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1028/*
Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.).
1032Parameters
1033----------
1034pFileName (in)
1035 The path of the file to open, either absolute or relative to the current directory.
onMeta (in)
    The callback to fire for each metadata block.
pUserData (in)
    A pointer to the user data to pass to the metadata callback.
pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.
1050Remarks
1051-------
1052Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1055See Also
1056--------
1057drflac_open_with_metadata()
1058drflac_open()
1059drflac_close()
1060*/
1061DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1062DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1063#endif
1065/*
Opens a FLAC decoder from a pre-allocated block of memory.
1069Parameters
1070----------
1071pData (in)
1072 A pointer to the raw encoded FLAC data.
1074dataSize (in)
    The size in bytes of `pData`.
1077pAllocationCallbacks (in)
1078 A pointer to application defined callbacks for managing memory allocations.
1081Return Value
1082------------
1083A pointer to an object representing the decoder.
1086Remarks
1087-------
1088This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1091See Also
1092--------
1093drflac_open()
1094drflac_close()
1095*/
1096DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1098/*
Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.).
1102Parameters
1103----------
1104pData (in)
1105 A pointer to the raw encoded FLAC data.
1107dataSize (in)
    The size in bytes of `pData`.
1110onMeta (in)
1111 The callback to fire for each metadata block.
1113pUserData (in)
1114 A pointer to the user data to pass to the metadata callback.
1116pAllocationCallbacks (in)
1117 A pointer to application defined callbacks for managing memory allocations.
1120Remarks
1121-------
1122Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1125See Also
--------
1127drflac_open_with_metadata()
1128drflac_open()
1129drflac_close()
1130*/
1131DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1135/* High Level APIs */
1137/*
1138Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1139pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1141You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1142case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1144Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1145read samples into a dynamically sized buffer on the heap until no samples are left.
1147Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
1148*/
1149DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1151/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1152DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1154/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1155DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1157#ifndef DR_FLAC_NO_STDIO
1158/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1159DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1161/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1162DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1164/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1165DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1166#endif
1168/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1169DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1171/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1172DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1174/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1175DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1177/*
1178Frees memory that was allocated internally by dr_flac.
1180Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1181*/
1182DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1185/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1186typedef struct
1187{
1188 drflac_uint32 countRemaining;
1189 const char* pRunningData;
1190} drflac_vorbis_comment_iterator;
1192/*
1193Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1194metadata block.
1195*/
1196DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1198/*
1199Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1200returned string is NOT null terminated.
1201*/
1202DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
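/*
Example
-------
Because the returned comment is not null terminated, it needs to be copied (or printed with an explicit length) before it can be used as a C
string. The helper below is a minimal sketch of such a copy. `example_copy_comment()` is a hypothetical name and is not part of dr_flac. The
pointer and length would come from drflac_next_vorbis_comment().

```c
#include <stddef.h>
#include <string.h>

// Copies a length-delimited comment into dst and appends the null terminator.
// Returns the number of characters copied. The comment is truncated if dst is too small.
static size_t example_copy_comment(const char* pComment, size_t commentLength, char* dst, size_t dstCapacity)
{
    size_t n = 0;

    if (dst == NULL || dstCapacity == 0) {
        return 0;
    }

    if (pComment != NULL) {
        n = (commentLength < dstCapacity - 1) ? commentLength : dstCapacity - 1;
        memcpy(dst, pComment, n);
    }

    dst[n] = '\0';
    return n;
}
```
*/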
1205/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1206typedef struct
1207{
1208 drflac_uint32 countRemaining;
1209 const char* pRunningData;
1210} drflac_cuesheet_track_iterator;
1212/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
1213typedef struct
1214{
1215 drflac_uint64 offset;
1216 drflac_uint8 index;
1217 drflac_uint8 reserved[3];
1218} drflac_cuesheet_track_index;
1220typedef struct
1221{
1222 drflac_uint64 offset;
1223 drflac_uint8 trackNumber;
1224 char ISRC[12];
1225 drflac_bool8 isAudio;
1226 drflac_bool8 preEmphasis;
1227 drflac_uint8 indexCount;
1228 const drflac_cuesheet_track_index* pIndexPoints;
1229} drflac_cuesheet_track;
1231/*
1232Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1233block.
1234*/
1235DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more tracks. */
1238DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1241#ifdef __cplusplus
1242}
1243#endif
1244#endif /* dr_flac_h */
1247/************************************************************************************************************************************************************
1248 ************************************************************************************************************************************************************
1250 IMPLEMENTATION
1252 ************************************************************************************************************************************************************
1253 ************************************************************************************************************************************************************/
1254#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1255#ifndef dr_flac_c
1256#define dr_flac_c
1258/* Disable some annoying warnings. */
1259#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1260 #pragma GCC diagnostic push
1261 #if __GNUC__ >= 7
1262 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1263 #endif
1264#endif
1266#ifdef __linux__
1267 #ifndef _BSD_SOURCE
1268 #define _BSD_SOURCE
1269 #endif
1270 #ifndef _DEFAULT_SOURCE
1271 #define _DEFAULT_SOURCE
1272 #endif
1273 #ifndef __USE_BSD
1274 #define __USE_BSD
1275 #endif
1276 #include <endian.h>
1277#endif
1279#include <stdlib.h>
1280#include <string.h>
1282/* Inline */
1283#ifdef _MSC_VER
1284 #define DRFLAC_INLINE __forceinline
1285#elif defined(__GNUC__)
1286 /*
1287 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1288 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1289 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1290 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1291 I am using "__inline__" only when we're compiling in strict ANSI mode.
1292 */
1293 #if defined(__STRICT_ANSI__)
1294 #define DRFLAC_GNUC_INLINE_HINT __inline__
1295 #else
1296 #define DRFLAC_GNUC_INLINE_HINT inline
1297 #endif
1299 #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
1300 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
1301 #else
1302 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
1303 #endif
1304#elif defined(__WATCOMC__)
1305 #define DRFLAC_INLINE __inline
1306#else
1307 #define DRFLAC_INLINE
1308#endif
1309/* End Inline */
1311/*
1312Intrinsics Support
1314There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1316 "error: shift must be an immediate"
Unfortunately dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1319*/
1320#if !defined(DR_FLAC_NO_SIMD)
1321 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1322 #if defined(_MSC_VER) && !defined(__clang__)
1323 /* MSVC. */
1324 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1325 #define DRFLAC_SUPPORT_SSE2
1326 #endif
1327 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1328 #define DRFLAC_SUPPORT_SSE41
1329 #endif
1330 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1331 /* Assume GNUC-style. */
1332 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1333 #define DRFLAC_SUPPORT_SSE2
1334 #endif
1335 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1336 #define DRFLAC_SUPPORT_SSE41
1337 #endif
1338 #endif
1340 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1341 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1342 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1343 #define DRFLAC_SUPPORT_SSE2
1344 #endif
1345 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1346 #define DRFLAC_SUPPORT_SSE41
1347 #endif
1348 #endif
1350 #if defined(DRFLAC_SUPPORT_SSE41)
1351 #include <smmintrin.h>
1352 #elif defined(DRFLAC_SUPPORT_SSE2)
1353 #include <emmintrin.h>
1354 #endif
1355 #endif
1357 #if defined(DRFLAC_ARM)
1358 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1359 #define DRFLAC_SUPPORT_NEON
1360 #include <arm_neon.h>
1361 #endif
1362 #endif
1363#endif
1365/* Compile-time CPU feature support. */
1366#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1367 #if defined(_MSC_VER) && !defined(__clang__)
1368 #if _MSC_VER >= 1400
1369 #include <intrin.h>
1370 static void drflac__cpuid(int info[4], int fid)
1371 {
1372 __cpuid(info, fid);
1373 }
1374 #else
1375 #define DRFLAC_NO_CPUID
1376 #endif
1377 #else
1378 #if defined(__GNUC__) || defined(__clang__)
1379 static void drflac__cpuid(int info[4], int fid)
1380 {
1381 /*
1382 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1383 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1384 supporting different assembly dialects.
1386 What's basically happening is that we're saving and restoring the ebx register manually.
1387 */
1388 #if defined(DRFLAC_X86) && defined(__PIC__)
1389 __asm__ __volatile__ (
1390 "xchg{l} {%%}ebx, %k1;"
1391 "cpuid;"
1392 "xchg{l} {%%}ebx, %k1;"
1393 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1394 );
1395 #else
1396 __asm__ __volatile__ (
1397 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1398 );
1399 #endif
1400 }
1401 #else
1402 #define DRFLAC_NO_CPUID
1403 #endif
1404 #endif
1405#else
1406 #define DRFLAC_NO_CPUID
1407#endif
1409static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1410{
1411#if defined(DRFLAC_SUPPORT_SSE2)
1412 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1413 #if defined(DRFLAC_X64)
1414 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1415 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1416 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1417 #else
1418 #if defined(DRFLAC_NO_CPUID)
1419 return DRFLAC_FALSE;
1420 #else
1421 int info[4];
1422 drflac__cpuid(info, 1);
1423 return (info[3] & (1 << 26)) != 0;
1424 #endif
1425 #endif
1426 #else
1427 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1428 #endif
1429#else
1430 return DRFLAC_FALSE; /* No compiler support. */
1431#endif
1432}
1434static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1435{
1436#if defined(DRFLAC_SUPPORT_SSE41)
1437 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
1438 #if defined(__SSE4_1__) || defined(__AVX__)
1439 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
1440 #else
1441 #if defined(DRFLAC_NO_CPUID)
1442 return DRFLAC_FALSE;
1443 #else
1444 int info[4];
1445 drflac__cpuid(info, 1);
1446 return (info[2] & (1 << 19)) != 0;
1447 #endif
1448 #endif
1449 #else
1450 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1451 #endif
1452#else
1453 return DRFLAC_FALSE; /* No compiler support. */
1454#endif
1455}
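/*
Example
-------
The two runtime checks above test single bits of the CPUID leaf 1 result: bit 26 of EDX for SSE2 and bit 19 of ECX for SSE4.1. The sketch below
isolates just that bit test so it can be tried without executing CPUID. The `example_` names are hypothetical and not part of dr_flac.

```c
// Bit positions tested by the checks above for CPUID leaf 1.
#define EXAMPLE_CPUID_EDX_SSE2   (1 << 26)
#define EXAMPLE_CPUID_ECX_SSE41  (1 << 19)

// info[] holds EAX, EBX, ECX and EDX in that order, as filled in by a CPUID query for leaf 1.
static int example_info_has_sse2(const int info[4])
{
    return (info[3] & EXAMPLE_CPUID_EDX_SSE2) != 0;
}

static int example_info_has_sse41(const int info[4])
{
    return (info[2] & EXAMPLE_CPUID_ECX_SSE41) != 0;
}
```
*/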
1458#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1459 #define DRFLAC_HAS_LZCNT_INTRINSIC
1460#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1461 #define DRFLAC_HAS_LZCNT_INTRINSIC
1462#elif defined(__clang__)
1463 #if defined(__has_builtin)
1464 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1465 #define DRFLAC_HAS_LZCNT_INTRINSIC
1466 #endif
1467 #endif
1468#endif
1470#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1471 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1472 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1473 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1474#elif defined(__clang__)
1475 #if defined(__has_builtin)
1476 #if __has_builtin(__builtin_bswap16)
1477 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1478 #endif
1479 #if __has_builtin(__builtin_bswap32)
1480 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1481 #endif
1482 #if __has_builtin(__builtin_bswap64)
1483 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1484 #endif
1485 #endif
1486#elif defined(__GNUC__)
1487 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1488 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1489 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1490 #endif
1491 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1492 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1493 #endif
1494#elif defined(__WATCOMC__) && defined(__386__)
1495 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1496 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1497 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1498 extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1499 extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1500 extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1501#pragma aux _watcom_bswap16 = \
1502 "xchg al, ah" \
1503 parm [ax] \
1504 value [ax] \
1505 modify nomemory;
1506#pragma aux _watcom_bswap32 = \
1507 "bswap eax" \
1508 parm [eax] \
1509 value [eax] \
1510 modify nomemory;
1511#pragma aux _watcom_bswap64 = \
1512 "bswap eax" \
1513 "bswap edx" \
1514 "xchg eax,edx" \
1515 parm [eax edx] \
1516 value [eax edx] \
1517 modify nomemory;
1518#endif
1521/* Standard library stuff. */
1522#ifndef DRFLAC_ASSERT
1523#include <assert.h>
1524#define DRFLAC_ASSERT(expression) assert(expression)
1525#endif
1526#ifndef DRFLAC_MALLOC
1527#define DRFLAC_MALLOC(sz) malloc((sz))
1528#endif
1529#ifndef DRFLAC_REALLOC
1530#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1531#endif
1532#ifndef DRFLAC_FREE
1533#define DRFLAC_FREE(p) free((p))
1534#endif
1535#ifndef DRFLAC_COPY_MEMORY
1536#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1537#endif
1538#ifndef DRFLAC_ZERO_MEMORY
1539#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1540#endif
1541#ifndef DRFLAC_ZERO_OBJECT
1542#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1543#endif
1545#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1547/* Result Codes */
1548typedef drflac_int32 drflac_result;
1549#define DRFLAC_SUCCESS 0
1550#define DRFLAC_ERROR -1 /* A generic error. */
1551#define DRFLAC_INVALID_ARGS -2
1552#define DRFLAC_INVALID_OPERATION -3
1553#define DRFLAC_OUT_OF_MEMORY -4
1554#define DRFLAC_OUT_OF_RANGE -5
1555#define DRFLAC_ACCESS_DENIED -6
1556#define DRFLAC_DOES_NOT_EXIST -7
1557#define DRFLAC_ALREADY_EXISTS -8
1558#define DRFLAC_TOO_MANY_OPEN_FILES -9
1559#define DRFLAC_INVALID_FILE -10
1560#define DRFLAC_TOO_BIG -11
1561#define DRFLAC_PATH_TOO_LONG -12
1562#define DRFLAC_NAME_TOO_LONG -13
1563#define DRFLAC_NOT_DIRECTORY -14
1564#define DRFLAC_IS_DIRECTORY -15
1565#define DRFLAC_DIRECTORY_NOT_EMPTY -16
1566#define DRFLAC_END_OF_FILE -17
1567#define DRFLAC_NO_SPACE -18
1568#define DRFLAC_BUSY -19
1569#define DRFLAC_IO_ERROR -20
1570#define DRFLAC_INTERRUPT -21
1571#define DRFLAC_UNAVAILABLE -22
1572#define DRFLAC_ALREADY_IN_USE -23
1573#define DRFLAC_BAD_ADDRESS -24
1574#define DRFLAC_BAD_SEEK -25
1575#define DRFLAC_BAD_PIPE -26
1576#define DRFLAC_DEADLOCK -27
1577#define DRFLAC_TOO_MANY_LINKS -28
1578#define DRFLAC_NOT_IMPLEMENTED -29
1579#define DRFLAC_NO_MESSAGE -30
1580#define DRFLAC_BAD_MESSAGE -31
1581#define DRFLAC_NO_DATA_AVAILABLE -32
1582#define DRFLAC_INVALID_DATA -33
1583#define DRFLAC_TIMEOUT -34
1584#define DRFLAC_NO_NETWORK -35
1585#define DRFLAC_NOT_UNIQUE -36
1586#define DRFLAC_NOT_SOCKET -37
1587#define DRFLAC_NO_ADDRESS -38
1588#define DRFLAC_BAD_PROTOCOL -39
1589#define DRFLAC_PROTOCOL_UNAVAILABLE -40
1590#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1591#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1592#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1593#define DRFLAC_SOCKET_NOT_SUPPORTED -44
1594#define DRFLAC_CONNECTION_RESET -45
1595#define DRFLAC_ALREADY_CONNECTED -46
1596#define DRFLAC_NOT_CONNECTED -47
1597#define DRFLAC_CONNECTION_REFUSED -48
1598#define DRFLAC_NO_HOST -49
1599#define DRFLAC_IN_PROGRESS -50
1600#define DRFLAC_CANCELLED -51
1601#define DRFLAC_MEMORY_ALREADY_MAPPED -52
1602#define DRFLAC_AT_END -53
1604#define DRFLAC_CRC_MISMATCH -100
1605/* End Result Codes */
1608#define DRFLAC_SUBFRAME_CONSTANT 0
1609#define DRFLAC_SUBFRAME_VERBATIM 1
1610#define DRFLAC_SUBFRAME_FIXED 8
1611#define DRFLAC_SUBFRAME_LPC 32
1612#define DRFLAC_SUBFRAME_RESERVED 255
1614#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1615#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1617#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1618#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1619#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1620#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1622#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES 18
1623#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES 36
1624#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES 12
1626#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
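/*
Example
-------
drflac_align() rounds `x` up to the next multiple of `a` using integer division. As a worked example, the macro is copied below under the
hypothetical name `example_align` so the snippet stands on its own. The sketch assumes non-negative integer operands and a non-zero alignment.

```c
// Same expression as drflac_align(), copied under a hypothetical name.
#define example_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
```

With an alignment of 8, a value of 13 rounds up to 16, while 16 and 0 are already aligned and are returned unchanged.
*/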
1629DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1630{
1631 if (pMajor) {
1632 *pMajor = DRFLAC_VERSION_MAJOR;
1633 }
1635 if (pMinor) {
1636 *pMinor = DRFLAC_VERSION_MINOR;
1637 }
1639 if (pRevision) {
1640 *pRevision = DRFLAC_VERSION_REVISION;
1641 }
1642}
1644DRFLAC_API const char* drflac_version_string(void)
1645{
1646 return DRFLAC_VERSION_STRING;
1647}
1650/* CPU caps. */
1651#if defined(__has_feature)
1652 #if __has_feature(thread_sanitizer)
1653 #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1654 #else
1655 #define DRFLAC_NO_THREAD_SANITIZE
1656 #endif
1657#else
1658 #define DRFLAC_NO_THREAD_SANITIZE
1659#endif
1661#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1662static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1663#endif
1665#ifndef DRFLAC_NO_CPUID
1666static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1667static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1669/*
1670I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1671actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1672complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1673just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1674*/
1675DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1676{
1677 static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1679 if (!isCPUCapsInitialized) {
1680 /* LZCNT */
1681#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1682 int info[4] = {0};
1683 drflac__cpuid(info, 0x80000001);
1684 drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
1685#endif
1687 /* SSE2 */
1688 drflac__gIsSSE2Supported = drflac_has_sse2();
1690 /* SSE4.1 */
1691 drflac__gIsSSE41Supported = drflac_has_sse41();
1693 /* Initialized. */
1694 isCPUCapsInitialized = DRFLAC_TRUE;
1695 }
1696}
1697#else
1698static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1700static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1701{
1702#if defined(DRFLAC_SUPPORT_NEON)
1703 #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1704 #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1705 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
1706 #else
1707 /* TODO: Runtime check. */
1708 return DRFLAC_FALSE;
1709 #endif
1710 #else
1711 return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
1712 #endif
1713#else
1714 return DRFLAC_FALSE; /* No compiler support. */
1715#endif
1716}
1718DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1719{
1720 drflac__gIsNEONSupported = drflac__has_neon();
1722#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1723 drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1724#endif
1725}
1726#endif
1729/* Endian Management */
1730static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1731{
1732#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
1733 return DRFLAC_TRUE;
1734#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1735 return DRFLAC_TRUE;
1736#else
1737 int n = 1;
1738 return (*(char*)&n) == 1;
1739#endif
1740}
1742static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1743{
1744#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1745 #if defined(_MSC_VER) && !defined(__clang__)
1746 return _byteswap_ushort(n);
1747 #elif defined(__GNUC__) || defined(__clang__)
1748 return __builtin_bswap16(n);
1749 #elif defined(__WATCOMC__) && defined(__386__)
1750 return _watcom_bswap16(n);
1751 #else
1752 #error "This compiler does not support the byte swap intrinsic."
1753 #endif
1754#else
1755 return ((n & 0xFF00) >> 8) |
1756 ((n & 0x00FF) << 8);
1757#endif
1758}
1760static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1761{
1762#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1763 #if defined(_MSC_VER) && !defined(__clang__)
1764 return _byteswap_ulong(n);
1765 #elif defined(__GNUC__) || defined(__clang__)
1766 #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
1767 /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
1768 drflac_uint32 r;
1769 __asm__ __volatile__ (
1770 #if defined(DRFLAC_64BIT)
1771 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
1772 #else
1773 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1774 #endif
1775 );
1776 return r;
1777 #else
1778 return __builtin_bswap32(n);
1779 #endif
1780 #elif defined(__WATCOMC__) && defined(__386__)
1781 return _watcom_bswap32(n);
1782 #else
1783 #error "This compiler does not support the byte swap intrinsic."
1784 #endif
1785#else
1786 return ((n & 0xFF000000) >> 24) |
1787 ((n & 0x00FF0000) >> 8) |
1788 ((n & 0x0000FF00) << 8) |
1789 ((n & 0x000000FF) << 24);
1790#endif
1791}
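/*
Example
-------
The portable fallback above moves each byte to its mirrored position. The sketch below repeats the same masks and shifts in a standalone
function so the effect can be checked in isolation. `example_swap32()` is a hypothetical name and is not part of dr_flac.

```c
#include <stdint.h>

// Reverses the byte order of a 32-bit value using masks and shifts.
static uint32_t example_swap32(uint32_t n)
{
    return ((n & 0xFF000000u) >> 24) |
           ((n & 0x00FF0000u) >>  8) |
           ((n & 0x0000FF00u) <<  8) |
           ((n & 0x000000FFu) << 24);
}
```

Swapping 0x11223344 yields 0x44332211, and applying the swap twice returns the original value.
*/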
1793static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1794{
1795#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1796 #if defined(_MSC_VER) && !defined(__clang__)
1797 return _byteswap_uint64(n);
1798 #elif defined(__GNUC__) || defined(__clang__)
1799 return __builtin_bswap64(n);
1800 #elif defined(__WATCOMC__) && defined(__386__)
1801 return _watcom_bswap64(n);
1802 #else
1803 #error "This compiler does not support the byte swap intrinsic."
1804 #endif
1805#else
1806 /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
1807 return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
1808 ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
1809 ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
1810 ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
1811 ((n & ((drflac_uint64)0xFF000000 )) << 8) |
1812 ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
1813 ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
1814 ((n & ((drflac_uint64)0x000000FF )) << 56);
1815#endif
1816}
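/*
The shift-and-mask fallbacks above can be tried in isolation. The following is a minimal sketch, not part of dr_flac: the
`example_*` names are hypothetical and `<stdint.h>` types are assumed in place of the `drflac_*` typedefs. It repeats the same
mask layout, including the "<< 32" trick that avoids writing a 64-bit literal.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: shift-and-mask 32-bit byte swap.
uint32_t example_swap32(uint32_t n)
{
    return ((n & 0xFF000000u) >> 24) |
           ((n & 0x00FF0000u) >>  8) |
           ((n & 0x0000FF00u) <<  8) |
           ((n & 0x000000FFu) << 24);
}

// Hypothetical helper: 64-bit byte swap. Every mask starts as a 32-bit constant
// that is widened and then shifted up by 32, so no 64-bit literal is needed.
uint64_t example_swap64(uint64_t n)
{
    return ((n & ((uint64_t)0xFF000000 << 32)) >> 56) |
           ((n & ((uint64_t)0x00FF0000 << 32)) >> 40) |
           ((n & ((uint64_t)0x0000FF00 << 32)) >> 24) |
           ((n & ((uint64_t)0x000000FF << 32)) >>  8) |
           ((n & ((uint64_t)0xFF000000      )) <<  8) |
           ((n & ((uint64_t)0x00FF0000      )) << 24) |
           ((n & ((uint64_t)0x0000FF00      )) << 40) |
           ((n & ((uint64_t)0x000000FF      )) << 56);
}

// Small self-check: swapping twice must give back the original value.
void example_swap_self_check(void)
{
    assert(example_swap32(example_swap32(0x11223344u)) == 0x11223344u);
    assert(example_swap64(example_swap64(UINT64_C(0x1122334455667788))) == UINT64_C(0x1122334455667788));
}
```

Calling `example_swap_self_check()` once at startup is one way to catch a mistyped mask early.
*/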
1819static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1820{
1821 if (drflac__is_little_endian()) {
1822 return drflac__swap_endian_uint16(n);
1823 }
1825 return n;
1826}
1828static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1829{
1830 if (drflac__is_little_endian()) {
1831 return drflac__swap_endian_uint32(n);
1832 }
1834 return n;
1835}
1837static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
1838{
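    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    /* Widen each byte before shifting so that a promoted (signed) int is never shifted into its sign bit. */
    return ((drflac_uint32)pNum[0] << 24) | ((drflac_uint32)pNum[1] << 16) | ((drflac_uint32)pNum[2] << 8) | (drflac_uint32)pNum[3];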
1841}
1843static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1844{
1845 if (drflac__is_little_endian()) {
1846 return drflac__swap_endian_uint64(n);
1847 }
1849 return n;
1850}
1853static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1854{
1855 if (!drflac__is_little_endian()) {
1856 return drflac__swap_endian_uint32(n);
1857 }
1859 return n;
1860}
1862static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
1863{
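    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    /* Widen each byte before shifting so that a promoted (signed) int is never shifted into its sign bit. */
    return (drflac_uint32)pNum[0] | ((drflac_uint32)pNum[1] << 8) | ((drflac_uint32)pNum[2] << 16) | ((drflac_uint32)pNum[3] << 24);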
1866}
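/*
The two `_ptr_unaligned` helpers assemble a value byte by byte, so they work at any address and need no endianness check.
A minimal sketch of the same idea, assuming `<stdint.h>` types; the `example_*` names are hypothetical and not part of dr_flac:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: reads a big-endian 32-bit value from any address.
uint32_t example_read_be32(const void* pData)
{
    const uint8_t* p = (const uint8_t*)pData;
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Hypothetical helper: reads a little-endian 32-bit value from any address.
uint32_t example_read_le32(const void* pData)
{
    const uint8_t* p = (const uint8_t*)pData;
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Because only single bytes are loaded, the pointer may sit at an odd offset inside a larger buffer, for example one byte past
the start of an array.
*/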
1869static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
1870{
1871 drflac_uint32 result = 0;
1872 result |= (n & 0x7F000000) >> 3;
1873 result |= (n & 0x007F0000) >> 2;
1874 result |= (n & 0x00007F00) >> 1;
1875 result |= (n & 0x0000007F) >> 0;
1877 return result;
1878}
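/*
`drflac__unsynchsafe_32()` decodes a "synchsafe" integer, the format used for sizes in ID3v2 tags: only the low 7 bits of each
byte carry data, giving 28 useful bits. The sketch below shows the decode next to its inverse. The `example_*` names are
hypothetical and not part of dr_flac, and `<stdint.h>` types are assumed.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: packs four 7-bit groups into one 28-bit value.
uint32_t example_unsynchsafe32(uint32_t n)
{
    uint32_t result = 0;
    result |= (n & 0x7F000000u) >> 3;
    result |= (n & 0x007F0000u) >> 2;
    result |= (n & 0x00007F00u) >> 1;
    result |= (n & 0x0000007Fu) >> 0;
    return result;
}

// Hypothetical helper: the inverse, spreading a 28-bit value over four bytes.
uint32_t example_synchsafe32(uint32_t value)
{
    assert(value <= 0x0FFFFFFFu);   // Only 28 bits can be represented.
    return ((value & 0x0FE00000u) << 3) |
           ((value & 0x001FC000u) << 2) |
           ((value & 0x00003F80u) << 1) |
            (value & 0x0000007Fu);
}
```

For instance, the synchsafe bytes `00 00 02 01` decode to 257, because the `02` byte contributes `2 << 7`.
*/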
1882/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
1883static drflac_uint8 drflac__crc8_table[] = {
1884 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
1885 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
1886 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
1887 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
1888 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
1889 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
1890 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
1891 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
1892 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
1893 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
1894 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
1895 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
1896 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
1897 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
1898 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
1899 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
1900};
1902static drflac_uint16 drflac__crc16_table[] = {
1903 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
1904 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
1905 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
1906 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
1907 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
1908 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
1909 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
1910 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
1911 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
1912 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
1913 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
1914 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
1915 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
1916 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
1917 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
1918 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
1919 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
1920 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
1921 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
1922 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
1923 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
1924 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
1925 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
1926 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
1927 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
1928 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
1929 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
1930 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
1931 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
1932 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
1933 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
1934 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
1935};
1937static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
1938{
1939 return drflac__crc8_table[crc ^ data];
1940}
1942static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
1943{
1944#ifdef DR_FLAC_NO_CRC
1945 (void)crc;
1946 (void)data;
1947 (void)count;
1948 return 0;
1949#else
1950#if 0
1951 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
1952 drflac_uint8 p = 0x07;
1953 for (int i = count-1; i >= 0; --i) {
1954 drflac_uint8 bit = (data & (1 << i)) >> i;
1955 if (crc & 0x80) {
1956 crc = ((crc << 1) | bit) ^ p;
1957 } else {
1958 crc = ((crc << 1) | bit);
1959 }
1960 }
1961 return crc;
1962#else
1963 drflac_uint32 wholeBytes;
1964 drflac_uint32 leftoverBits;
1965 drflac_uint64 leftoverDataMask;
1967 static drflac_uint64 leftoverDataMaskTable[8] = {
1968 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
1969 };
1971 DRFLAC_ASSERT(count <= 32);
1973 wholeBytes = count >> 3;
1974 leftoverBits = count - (wholeBytes*8);
1975 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
1977 switch (wholeBytes) {
1978 case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
1979 case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
1980 case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
1981 case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
1982 case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
1983 }
1984 return crc;
1985#endif
1986#endif
1987}
1989static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
1990{
1991 return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
1992}
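/*
The two lookup tables above can be regenerated bit by bit from their polynomials (0x07 for CRC-8 and 0x8005 for CRC-16, as
used by the reference loops in this file). The following is a minimal sketch for cross-checking table entries, not the
decoder's own code: the `example_*` names are hypothetical and each table entry is recomputed on the fly rather than stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: one CRC-8 table entry, computed bit by bit (polynomial 0x07).
uint8_t example_crc8_table_entry(uint8_t index)
{
    uint8_t crc = index;
    int bit;
    for (bit = 0; bit < 8; ++bit) {
        if (crc & 0x80) {
            crc = (uint8_t)((crc << 1) ^ 0x07);
        } else {
            crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}

// Hypothetical helper: one CRC-16 table entry, computed bit by bit (polynomial 0x8005).
uint16_t example_crc16_table_entry(uint8_t index)
{
    uint16_t crc = (uint16_t)(index << 8);
    int bit;
    for (bit = 0; bit < 8; ++bit) {
        if (crc & 0x8000) {
            crc = (uint16_t)((crc << 1) ^ 0x8005);
        } else {
            crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

// Byte-wise CRC-8 with the same update rule as a table lookup of (crc ^ data).
uint8_t example_crc8(const uint8_t* pData, size_t byteCount)
{
    uint8_t crc = 0;
    size_t i;
    assert(pData != NULL || byteCount == 0);
    for (i = 0; i < byteCount; ++i) {
        crc = example_crc8_table_entry((uint8_t)(crc ^ pData[i]));
    }
    return crc;
}

// Byte-wise CRC-16: shift the old CRC up by one byte and fold in the table entry.
uint16_t example_crc16(const uint8_t* pData, size_t byteCount)
{
    uint16_t crc = 0;
    size_t i;
    assert(pData != NULL || byteCount == 0);
    for (i = 0; i < byteCount; ++i) {
        crc = (uint16_t)((crc << 8) ^ example_crc16_table_entry((uint8_t)((crc >> 8) ^ pData[i])));
    }
    return crc;
}
```

Recomputing entries like this is far slower than a table lookup, so it is only useful as a check on the table contents.
*/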
1994static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
1995{
1996#ifdef DRFLAC_64BIT
1997 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
1998 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
1999 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2000 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2001#endif
2002 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2003 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2004 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2005 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2007 return crc;
2008}
2010static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2011{
2012 switch (byteCount)
2013 {
2014#ifdef DRFLAC_64BIT
2015 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2016 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2017 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2018 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2019#endif
2020 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2021 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2022 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2023 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2024 }
2026 return crc;
2027}
2029#if 0
2030static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2031{
2032#ifdef DR_FLAC_NO_CRC
2033 (void)crc;
2034 (void)data;
2035 (void)count;
2036 return 0;
2037#else
2038#if 0
2039 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
2040 drflac_uint16 p = 0x8005;
2041 for (int i = count-1; i >= 0; --i) {
2042 drflac_uint16 bit = (data & (1ULL << i)) >> i;
        if (crc & 0x8000) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
2047 }
2048 }
2050 return crc;
2051#else
2052 drflac_uint32 wholeBytes;
2053 drflac_uint32 leftoverBits;
2054 drflac_uint64 leftoverDataMask;
2056 static drflac_uint64 leftoverDataMaskTable[8] = {
2057 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2058 };
    DRFLAC_ASSERT(count <= 32);
2062 wholeBytes = count >> 3;
2063 leftoverBits = count & 7;
2064 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2066 switch (wholeBytes) {
2067 default:
2068 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2069 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2070 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2071 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2072 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2073 }
2074 return crc;
2075#endif
2076#endif
2077}
2079static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2080{
2081#ifdef DR_FLAC_NO_CRC
2082 (void)crc;
2083 (void)data;
2084 (void)count;
2085 return 0;
2086#else
2087 drflac_uint32 wholeBytes;
2088 drflac_uint32 leftoverBits;
2089 drflac_uint64 leftoverDataMask;
2091 static drflac_uint64 leftoverDataMaskTable[8] = {
2092 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2093 };
2095 DRFLAC_ASSERT(count <= 64);
2097 wholeBytes = count >> 3;
2098 leftoverBits = count & 7;
2099 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2101 switch (wholeBytes) {
2102 default:
2103 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
2104 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
2105 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
2106 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
2107 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
2108 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
2109 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
2110 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
2111 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2112 }
2113 return crc;
2114#endif
2115}
2118static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2119{
2120#ifdef DRFLAC_64BIT
2121 return drflac_crc16__64bit(crc, data, count);
2122#else
2123 return drflac_crc16__32bit(crc, data, count);
2124#endif
2125}
2126#endif
2129#ifdef DRFLAC_64BIT
2130#define drflac__be2host__cache_line drflac__be2host_64
2131#else
2132#define drflac__be2host__cache_line drflac__be2host_32
2133#endif
2135/*
2136BIT READING ATTEMPT #2
This uses a 32- or 64-bit bit-shifted cache: as bits are read, the cache is shifted so that the first valid bit sits on the
most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache is a
32- or 64-bit unsigned integer (depending on whether the build is 32- or 64-bit) and the L2 is an array of "cache lines",
each the same size as the L1. The L2 is a buffer of about 4KB and is where data from onRead() is read into.
2143*/
2144#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2145#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2146#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2147#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2148#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2149#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2150#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2151#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2152#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2153#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2154#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
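/*
The L1 macros above all follow one pattern: mask off the top `bitCount` bits of the cache, shift them down to the bottom,
then shift the cache left so the next unread bit is again the most significant one. A minimal sketch with a 64-bit cache
follows. The `example_bs` struct and `example_*` functions are hypothetical stand-ins, not the real `drflac_bs`, and reloading
from an L2 buffer is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, cut-down bit streamer: just the L1 cache and its read position.
typedef struct
{
    uint64_t cache;         // Unread bits, left-aligned.
    uint32_t consumedBits;  // How many of the 64 bits have been read so far.
} example_bs;

// Mask with the top bitCount bits set.
uint64_t example_selection_mask(uint32_t bitCount)
{
    assert(bitCount > 0 && bitCount < 64);
    return ~((~(uint64_t)0) >> bitCount);
}

// Reads bitCount bits from the top of the cache and advances past them.
uint64_t example_read_bits(example_bs* bs, uint32_t bitCount)
{
    uint64_t result;
    assert(bs != NULL);
    assert(bitCount > 0 && bitCount < 64);
    assert(bitCount <= 64 - bs->consumedBits);  // No reload in this sketch.

    result = (bs->cache & example_selection_mask(bitCount)) >> (64 - bitCount);
    bs->cache <<= bitCount;
    bs->consumedBits += bitCount;
    return result;
}
```

Keeping the unread bits left-aligned means every read uses the same mask and shift, regardless of how many bits were consumed
before it.
*/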
2157#ifndef DR_FLAC_NO_CRC
2158static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2159{
2160 bs->crc16 = 0;
2161 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2162}
2164static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2165{
2166 if (bs->crc16CacheIgnoredBytes == 0) {
2167 bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2168 } else {
2169 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2170 bs->crc16CacheIgnoredBytes = 0;
2171 }
2172}
2174static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2175{
2176 /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
2177 DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
2179 /*
2180 The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2181 by the number of bits that have been consumed.
2182 */
2183 if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
2184 drflac__update_crc16(bs);
2185 } else {
2186 /* We only accumulate the consumed bits. */
2187 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
2189 /*
2190 The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2191 so we can handle that later.
2192 */
2193 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2194 }
2196 return bs->crc16;
2197}
2198#endif
2200static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2201{
2202 size_t bytesRead;
2203 size_t alignedL1LineCount;
2205 /* Fast path. Try loading straight from L2. */
2206 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2207 bs->cache = bs->cacheL2[bs->nextL2Line++];
2208 return DRFLAC_TRUE;
2209 }
2211 /*
2212 If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2213 any left.
2214 */
2215 if (bs->unalignedByteCount > 0) {
        return DRFLAC_FALSE;    /* If we have any unaligned bytes it means there are no more aligned bytes left in the client. */
2217 }
2219 bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2221 bs->nextL2Line = 0;
2222 if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2223 bs->cache = bs->cacheL2[bs->nextL2Line++];
2224 return DRFLAC_TRUE;
2225 }
2228 /*
2229 If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2230 means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2231 and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2232 the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2233 */
2234 alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2236 /* We need to keep track of any unaligned bytes for later use. */
2237 bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2238 if (bs->unalignedByteCount > 0) {
2239 bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2240 }
2242 if (alignedL1LineCount > 0) {
2243 size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2244 size_t i;
2245 for (i = alignedL1LineCount; i > 0; --i) {
2246 bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
2247 }
2249 bs->nextL2Line = (drflac_uint32)offset;
2250 bs->cache = bs->cacheL2[bs->nextL2Line++];
2251 return DRFLAC_TRUE;
2252 } else {
2253 /* If we get into this branch it means we weren't able to load any L1-aligned data. */
2254 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2255 return DRFLAC_FALSE;
2256 }
2257}
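/*
When a read only partly fills the L2 buffer, the valid lines are moved to the end of the array so that "lines remaining" can
still be derived from the next-line index alone. A minimal sketch of that move, assuming 32-bit lines; `example_shift_lines_to_end`
is a hypothetical name, not part of dr_flac:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Moves the first validCount lines to the end of the array and returns the
// index of the first valid line afterwards. Copying runs from the last line
// backwards so that overlapping source and destination ranges stay intact.
size_t example_shift_lines_to_end(uint32_t* pLines, size_t lineCount, size_t validCount)
{
    size_t offset;
    size_t i;

    assert(pLines != NULL);
    assert(validCount <= lineCount);

    offset = lineCount - validCount;
    for (i = validCount; i > 0; --i) {
        pLines[i-1 + offset] = pLines[i-1];
    }

    return offset;
}
```

With 8 lines of which 3 are valid, the valid lines end up at indices 5, 6 and 7 and the returned index is 5.
*/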
2259static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2260{
2261 size_t bytesRead;
2263#ifndef DR_FLAC_NO_CRC
2264 drflac__update_crc16(bs);
2265#endif
2267 /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
2268 if (drflac__reload_l1_cache_from_l2(bs)) {
2269 bs->cache = drflac__be2host__cache_line(bs->cache);
2270 bs->consumedBits = 0;
2271#ifndef DR_FLAC_NO_CRC
2272 bs->crc16Cache = bs->cache;
2273#endif
2274 return DRFLAC_TRUE;
2275 }
2277 /* Slow path. */
2279 /*
2280 If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2281 few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2282 data from the unaligned cache.
2283 */
2284 bytesRead = bs->unalignedByteCount;
2285 if (bytesRead == 0) {
        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so mark the bits as consumed. */
2287 return DRFLAC_FALSE;
2288 }
2290 DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2291 bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
2293 bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2294 bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
2295 bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
2297#ifndef DR_FLAC_NO_CRC
2298 bs->crc16Cache = bs->cache >> bs->consumedBits;
2299 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2300#endif
2301 return DRFLAC_TRUE;
2302}
2304static void drflac__reset_cache(drflac_bs* bs)
2305{
2306 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
2307 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
2308 bs->cache = 0;
2309 bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
2310 bs->unalignedCache = 0;
2312#ifndef DR_FLAC_NO_CRC
2313 bs->crc16Cache = 0;
2314 bs->crc16CacheIgnoredBytes = 0;
2315#endif
2316}
2319static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2320{
2321 DRFLAC_ASSERT(bs != NULL);
2322 DRFLAC_ASSERT(pResultOut != NULL);
2323 DRFLAC_ASSERT(bitCount > 0);
2324 DRFLAC_ASSERT(bitCount <= 32);
2326 if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2327 if (!drflac__reload_cache(bs)) {
2328 return DRFLAC_FALSE;
2329 }
2330 }
2332 if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2333 /*
        If we want to load all 32 bits from a 32-bit cache we need to do it slightly differently because we can't do
        a 32-bit shift on a 32-bit integer. This can never happen with a 64-bit cache, so that build can use the
        simpler, unconditional path.
2337 */
2338#ifdef DRFLAC_64BIT
2339 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2340 bs->consumedBits += bitCount;
2341 bs->cache <<= bitCount;
2342#else
2343 if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2344 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2345 bs->consumedBits += bitCount;
2346 bs->cache <<= bitCount;
2347 } else {
            /* Cannot shift by 32 bits, so need to do it differently. */
2349 *pResultOut = (drflac_uint32)bs->cache;
2350 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2351 bs->cache = 0;
2352 }
2353#endif
2355 return DRFLAC_TRUE;
2356 } else {
2357 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
2358 drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2359 drflac_uint32 bitCountLo = bitCount - bitCountHi;
2360 drflac_uint32 resultHi;
2362 DRFLAC_ASSERT(bitCountHi > 0);
2363 DRFLAC_ASSERT(bitCountHi < 32);
2364 resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2366 if (!drflac__reload_cache(bs)) {
2367 return DRFLAC_FALSE;
2368 }
2369 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
            /* This happens when we reach the end of the stream. */
2371 return DRFLAC_FALSE;
2372 }
2374 *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2375 bs->consumedBits += bitCountLo;
2376 bs->cache <<= bitCountLo;
2377 return DRFLAC_TRUE;
2378 }
2379}
2381static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
2382{
2383 drflac_uint32 result;
2385 DRFLAC_ASSERT(bs != NULL);
2386 DRFLAC_ASSERT(pResult != NULL);
2387 DRFLAC_ASSERT(bitCount > 0);
2388 DRFLAC_ASSERT(bitCount <= 32);
2390 if (!drflac__read_uint32(bs, bitCount, &result)) {
2391 return DRFLAC_FALSE;
2392 }
2394 /* Do not attempt to shift by 32 as it's undefined. */
2395 if (bitCount < 32) {
2396 drflac_uint32 signbit;
2397 signbit = ((result >> (bitCount-1)) & 0x01);
2398 result |= (~signbit + 1) << bitCount;
2399 }
2401 *pResult = (drflac_int32)result;
2402 return DRFLAC_TRUE;
2403}
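/*
The sign extension in `drflac__read_int32()` can be looked at on its own: the top bit of the `bitCount`-bit field is copied
into every bit above it. A minimal sketch, assuming `<stdint.h>` types and a two's complement target (the final conversion to
a signed type is implementation-defined in C); `example_sign_extend` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: treats the low bitCount bits of value as a signed field.
int32_t example_sign_extend(uint32_t value, unsigned int bitCount)
{
    assert(bitCount > 0 && bitCount <= 32);

    // A shift by 32 is undefined, and a full 32-bit field needs no extension anyway.
    if (bitCount < 32) {
        uint32_t signbit = (value >> (bitCount - 1)) & 0x01;
        value |= (~signbit + 1) << bitCount;    // All ones if signbit is 1, zero otherwise.
    }

    return (int32_t)value;
}
```

The expression `~signbit + 1` is the unsigned negation of `signbit`, so it yields either 0 or 0xFFFFFFFF without a branch.
*/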
2405#ifdef DRFLAC_64BIT
2406static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2407{
2408 drflac_uint32 resultHi;
2409 drflac_uint32 resultLo;
2411 DRFLAC_ASSERT(bitCount <= 64);
2412 DRFLAC_ASSERT(bitCount > 32);
2414 if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
2415 return DRFLAC_FALSE;
2416 }
2418 if (!drflac__read_uint32(bs, 32, &resultLo)) {
2419 return DRFLAC_FALSE;
2420 }
2422 *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
2423 return DRFLAC_TRUE;
2424}
2425#endif
2427/* Function below is unused, but leaving it here in case I need to quickly add it again. */
2428#if 0
2429static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2430{
2431 drflac_uint64 result;
2432 drflac_uint64 signbit;
2434 DRFLAC_ASSERT(bitCount <= 64);
2436 if (!drflac__read_uint64(bs, bitCount, &result)) {
2437 return DRFLAC_FALSE;
2438 }
2440 signbit = ((result >> (bitCount-1)) & 0x01);
2441 result |= (~signbit + 1) << bitCount;
2443 *pResultOut = (drflac_int64)result;
2444 return DRFLAC_TRUE;
2445}
2446#endif
2448static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2449{
2450 drflac_uint32 result;
2452 DRFLAC_ASSERT(bs != NULL);
2453 DRFLAC_ASSERT(pResult != NULL);
2454 DRFLAC_ASSERT(bitCount > 0);
2455 DRFLAC_ASSERT(bitCount <= 16);
2457 if (!drflac__read_uint32(bs, bitCount, &result)) {
2458 return DRFLAC_FALSE;
2459 }
2461 *pResult = (drflac_uint16)result;
2462 return DRFLAC_TRUE;
2463}
2465#if 0
2466static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2467{
2468 drflac_int32 result;
2470 DRFLAC_ASSERT(bs != NULL);
2471 DRFLAC_ASSERT(pResult != NULL);
2472 DRFLAC_ASSERT(bitCount > 0);
2473 DRFLAC_ASSERT(bitCount <= 16);
2475 if (!drflac__read_int32(bs, bitCount, &result)) {
2476 return DRFLAC_FALSE;
2477 }
2479 *pResult = (drflac_int16)result;
2480 return DRFLAC_TRUE;
2481}
2482#endif
2484static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2485{
2486 drflac_uint32 result;
2488 DRFLAC_ASSERT(bs != NULL);
2489 DRFLAC_ASSERT(pResult != NULL);
2490 DRFLAC_ASSERT(bitCount > 0);
2491 DRFLAC_ASSERT(bitCount <= 8);
2493 if (!drflac__read_uint32(bs, bitCount, &result)) {
2494 return DRFLAC_FALSE;
2495 }
2497 *pResult = (drflac_uint8)result;
2498 return DRFLAC_TRUE;
2499}
2501static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2502{
2503 drflac_int32 result;
2505 DRFLAC_ASSERT(bs != NULL);
2506 DRFLAC_ASSERT(pResult != NULL);
2507 DRFLAC_ASSERT(bitCount > 0);
2508 DRFLAC_ASSERT(bitCount <= 8);
2510 if (!drflac__read_int32(bs, bitCount, &result)) {
2511 return DRFLAC_FALSE;
2512 }
2514 *pResult = (drflac_int8)result;
2515 return DRFLAC_TRUE;
2516}
2519static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2520{
2521 if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2522 bs->consumedBits += (drflac_uint32)bitsToSeek;
2523 bs->cache <<= bitsToSeek;
2524 return DRFLAC_TRUE;
2525 } else {
2526 /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2527 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2528 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2529 bs->cache = 0;
        /* Simple case. Seek in groups of as many bits as fit within a cache line. */
2532#ifdef DRFLAC_64BIT
2533 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2534 drflac_uint64 bin;
2535 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2536 return DRFLAC_FALSE;
2537 }
2538 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2539 }
2540#else
2541 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2542 drflac_uint32 bin;
2543 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2544 return DRFLAC_FALSE;
2545 }
2546 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2547 }
2548#endif
2550 /* Whole leftover bytes. */
2551 while (bitsToSeek >= 8) {
2552 drflac_uint8 bin;
2553 if (!drflac__read_uint8(bs, 8, &bin)) {
2554 return DRFLAC_FALSE;
2555 }
2556 bitsToSeek -= 8;
2557 }
2559 /* Leftover bits. */
2560 if (bitsToSeek > 0) {
2561 drflac_uint8 bin;
2562 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2563 return DRFLAC_FALSE;
2564 }
2565 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2566 }
2568 DRFLAC_ASSERT(bitsToSeek == 0);
2569 return DRFLAC_TRUE;
2570 }
2571}
/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). It will also update the CRC-16. */
2575static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2576{
2577 DRFLAC_ASSERT(bs != NULL);
2579 /*
2580 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2581 thing to do is align to the next byte.
2582 */
2583 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2584 return DRFLAC_FALSE;
2585 }
2587 for (;;) {
2588 drflac_uint8 hi;
2590#ifndef DR_FLAC_NO_CRC
2591 drflac__reset_crc16(bs);
2592#endif
2594 if (!drflac__read_uint8(bs, 8, &hi)) {
2595 return DRFLAC_FALSE;
2596 }
2598 if (hi == 0xFF) {
2599 drflac_uint8 lo;
2600 if (!drflac__read_uint8(bs, 6, &lo)) {
2601 return DRFLAC_FALSE;
2602 }
2604 if (lo == 0x3E) {
2605 return DRFLAC_TRUE;
2606 } else {
2607 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2608 return DRFLAC_FALSE;
2609 }
2610 }
2611 }
2612 }
2614 /* Should never get here. */
2615 /*return DRFLAC_FALSE;*/
2616}
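/*
The test above (a 0xFF byte followed by the 6 bits 0x3E) matches the 14-bit pattern 11111111111110. The sketch below applies
the same test to a plain byte buffer. It is a simplification under stated assumptions, not the decoder's method: it scans
memory rather than a bit stream, does no CRC bookkeeping, and `example_find_sync_code` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the byte index of the first sync code candidate, or -1 if there is none.
long example_find_sync_code(const uint8_t* pData, size_t byteCount)
{
    size_t i;
    assert(pData != NULL || byteCount == 0);

    for (i = 0; i + 1 < byteCount; ++i) {
        // 0xFF supplies the first 8 one-bits; the next 6 bits must be 111110 (0x3E).
        if (pData[i] == 0xFF && (pData[i + 1] >> 2) == 0x3E) {
            return (long)i;
        }
    }

    return -1;
}
```

A match is only a candidate: the same bit pattern can occur inside audio data, which is why a decoder would normally go on to
validate the frame header that follows.
*/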
2619#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2620#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2621#endif
2622#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2623#define DRFLAC_IMPLEMENT_CLZ_MSVC
2624#endif
2625#if defined(__WATCOMC__) && defined(__386__)
2626#define DRFLAC_IMPLEMENT_CLZ_WATCOM
2627#endif
2628#ifdef __MRC__
2629#include <intrinsics.h>
2630#define DRFLAC_IMPLEMENT_CLZ_MRC
2631#endif
2633static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2634{
2635 drflac_uint32 n;
2636 static drflac_uint32 clz_table_4[] = {
2637 0,
2638 4,
2639 3, 3,
2640 2, 2, 2, 2,
2641 1, 1, 1, 1, 1, 1, 1, 1
2642 };
2644 if (x == 0) {
2645 return sizeof(x)*8;
2646 }
2648 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2649 if (n == 0) {
2650#ifdef DRFLAC_64BIT
2651 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2652 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2653 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2654 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2655#else
2656 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2657 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2658 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2659#endif
2660 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
2661 }
2663 return n - 1;
2664}
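/*
To make the table trick in drflac__clz_software() easier to follow, here is a standalone 32-bit sketch of the same idea. The name
`clz32_table()` is made up for this example. Each table entry stores the leading zero count of a 4-bit value plus one, so that entry 0 can
act as a sentinel meaning "the top nibble is empty, keep narrowing".

```c
#include <stdint.h>

// Illustrative only: a fixed 32-bit version of the nibble-table technique.
static uint32_t clz32_table(uint32_t x)
{
    // Entry i is (leading zeros of the 4-bit value i) + 1. Entry 0 is a sentinel.
    static const uint32_t table[16] = {0, 4, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1};
    uint32_t n;

    if (x == 0) {
        return 32;
    }

    n = table[x >> 28];
    if (n == 0) {
        // The top nibble is zero. Narrow down by halves, shifting the first set
        // bit up into the top nibble and counting the bits skipped along the way.
        if ((x & 0xFFFF0000u) == 0) { n  = 16; x <<= 16; }
        if ((x & 0xFF000000u) == 0) { n +=  8; x <<=  8; }
        if ((x & 0xF0000000u) == 0) { n +=  4; x <<=  4; }
        n += table[x >> 28];
    }

    return n - 1;   // Undo the +1 bias stored in the table.
}
```
*/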
2666#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2667static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2668{
2669 /* Fast compile time check for ARM. */
2670#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2671 return DRFLAC_TRUE;
2672#elif defined(__MRC__)
2673 return DRFLAC_TRUE;
2674#else
2675 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2676 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2677 return drflac__gIsLZCNTSupported;
2678 #else
2679 return DRFLAC_FALSE;
2680 #endif
2681#endif
2682}
2684static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2685{
    /*
    It's critical for competitive decoding performance that this function be as fast as possible. With MSVC we can use the __lzcnt64() and __lzcnt()
    intrinsics to achieve good performance, however on GCC and Clang it's a little more awkward. The __builtin_clzl() and __builtin_clzll() intrinsics
    leave the return value undefined when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether it's a 32- or 64-bit
    build. Working around this with the intrinsics would mean adding a conditional to check for the x = 0 case, which creates unnecessary inefficiency.
    Instead, I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly, which removes the need for the conditional.
    This has worked well in the past, but for some reason Clang's MSVC-compatible driver, clang-cl, does not seem to handle this in the same way as the
    normal Clang driver. It seems that `clang-cl` sometimes outputs the wrong results, maybe due to some register getting clobbered?

    I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl`, or just a misunderstanding on my part of the
    inline assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.

    Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
    compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
    to know how to fix the inlined assembly for correctness' sake, however.
    */
2704#if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2705 #ifdef DRFLAC_64BIT
2706 return (drflac_uint32)__lzcnt64(x);
2707 #else
2708 return (drflac_uint32)__lzcnt(x);
2709 #endif
2710#else
2711 #if defined(__GNUC__) || defined(__clang__)
2712 #if defined(DRFLAC_X64)
2713 {
2714 drflac_uint64 r;
2715 __asm__ __volatile__ (
2716 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2717 );
2719 return (drflac_uint32)r;
2720 }
2721 #elif defined(DRFLAC_X86)
2722 {
2723 drflac_uint32 r;
2724 __asm__ __volatile__ (
2725 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2726 );
2728 return r;
2729 }
2730 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2731 {
2732 unsigned int r;
2733 __asm__ __volatile__ (
2734 #if defined(DRFLAC_64BIT)
2735 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2736 #else
2737 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2738 #endif
2739 );
2741 return r;
2742 }
2743 #else
2744 if (x == 0) {
2745 return sizeof(x)*8;
2746 }
2747 #ifdef DRFLAC_64BIT
2748 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2749 #else
2750 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2751 #endif
2752 #endif
2753 #else
2754 /* Unsupported compiler. */
2755 #error "This compiler does not support the lzcnt intrinsic."
2756 #endif
2757#endif
2758}
2759#endif
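/*
The comment inside drflac__clz_lzcnt() explains that the GCC/Clang builtins are undefined for an input of 0, which is why the fallback path
checks for zero first. A minimal sketch of that guard is shown below. `clz32_guarded()` is a name invented for this example, and the
non-GCC branch is just a plain loop added so the sketch compiles anywhere.

```c
#include <stdint.h>

// Illustrative only: a count-leading-zeros that is well defined for x == 0.
static uint32_t clz32_guarded(uint32_t x)
{
    if (x == 0) {
        return 32;  // __builtin_clz(0) is undefined, so answer directly.
    }
#if defined(__GNUC__) || defined(__clang__)
    return (uint32_t)__builtin_clz(x);
#else
    {
        // Portable fallback: shift left until the top bit is set.
        uint32_t n = 0;
        while ((x & 0x80000000u) == 0) {
            x <<= 1;
            n += 1;
        }
        return n;
    }
#endif
}
```

The extra branch is the cost the inline assembly paths above are trying to avoid.
*/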
2761#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2762#include <intrin.h> /* For BitScanReverse(). */
2764static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2765{
2766 drflac_uint32 n;
2768 if (x == 0) {
2769 return sizeof(x)*8;
2770 }
2772#ifdef DRFLAC_64BIT
2773 _BitScanReverse64((unsigned long*)&n, x);
2774#else
2775 _BitScanReverse((unsigned long*)&n, x);
2776#endif
2777 return sizeof(x)*8 - n - 1;
2778}
2779#endif
2781#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2782static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2783#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
2784/* Use the LZCNT instruction (only available on some processors since the 2010s). */
2785#pragma aux drflac__clz_watcom_lzcnt = \
2786 "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
2787 parm [eax] \
2788 value [eax] \
2789 modify nomemory;
2790#else
2791/* Use the 386+-compatible implementation. */
2792#pragma aux drflac__clz_watcom = \
2793 "bsr eax, eax" \
2794 "xor eax, 31" \
2795 parm [eax] nomemory \
2796 value [eax] \
2797 modify exact [eax] nomemory;
2798#endif
2799#endif
2801static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2802{
2803#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2804 if (drflac__is_lzcnt_supported()) {
2805 return drflac__clz_lzcnt(x);
2806 } else
2807#endif
2808 {
2809#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2810 return drflac__clz_msvc(x);
2811#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
2812 return drflac__clz_watcom_lzcnt(x);
2813#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2814 return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
2815#elif defined(__MRC__)
2816 return __cntlzw(x);
2817#else
2818 return drflac__clz_software(x);
2819#endif
2820 }
2821}
2824static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2825{
2826 drflac_uint32 zeroCounter = 0;
2827 drflac_uint32 setBitOffsetPlus1;
2829 while (bs->cache == 0) {
2830 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2831 if (!drflac__reload_cache(bs)) {
2832 return DRFLAC_FALSE;
2833 }
2834 }
2836 if (bs->cache == 1) {
        /* Not catching this would lead to undefined behaviour: the shift below would be by the full width of the cache, and shifting a value by its bit width or more is undefined. */
2838 *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
2839 if (!drflac__reload_cache(bs)) {
2840 return DRFLAC_FALSE;
2841 }
2843 return DRFLAC_TRUE;
2844 }
2846 setBitOffsetPlus1 = drflac__clz(bs->cache);
2847 setBitOffsetPlus1 += 1;
2849 if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /* This happens when we get to the end of the stream. */
2851 return DRFLAC_FALSE;
2852 }
2854 bs->consumedBits += setBitOffsetPlus1;
2855 bs->cache <<= setBitOffsetPlus1;
2857 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
2858 return DRFLAC_TRUE;
2859}
2863static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2864{
2865 DRFLAC_ASSERT(bs != NULL);
2866 DRFLAC_ASSERT(offsetFromStart > 0);
    /*
    Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
    is intentional because it simplifies the implementation of the onSeek callbacks), whereas offsetFromStart is unsigned 64-bit.
    To resolve this we do an initial seek from the start, followed by a series of relative seeks from the current position to make
    up the remainder. No single seek moves by more than 0x7FFFFFFF bytes.
    */
2873 if (offsetFromStart > 0x7FFFFFFF) {
2874 drflac_uint64 bytesRemaining = offsetFromStart;
2875 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_SET)) {
2876 return DRFLAC_FALSE;
2877 }
2878 bytesRemaining -= 0x7FFFFFFF;
2880 while (bytesRemaining > 0x7FFFFFFF) {
2881 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_CUR)) {
2882 return DRFLAC_FALSE;
2883 }
2884 bytesRemaining -= 0x7FFFFFFF;
2885 }
2887 if (bytesRemaining > 0) {
2888 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, DRFLAC_SEEK_CUR)) {
2889 return DRFLAC_FALSE;
2890 }
2891 }
2892 } else {
2893 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, DRFLAC_SEEK_SET)) {
2894 return DRFLAC_FALSE;
2895 }
2896 }
2898 /* The cache should be reset to force a reload of fresh data from the client. */
2899 drflac__reset_cache(bs);
2900 return DRFLAC_TRUE;
2901}
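/*
The chunked seek in drflac__seek_to_byte() can be sketched in isolation as follows. Everything here is hypothetical scaffolding for the
example: `mock_stream`, `mock_seek()` and the MOCK_SEEK_* constants stand in for the client's stream and onSeek callback, and the loop is a
compact restatement of the idea rather than the code above.

```c
#include <stdint.h>

// Hypothetical stand-in for a client stream. It only tracks the position.
typedef struct
{
    uint64_t position;
    int callCount;
} mock_stream;

enum { MOCK_SEEK_SET, MOCK_SEEK_CUR };

// Mimics a seek callback that accepts a signed 32-bit offset.
static int mock_seek(mock_stream* s, int32_t offset, int origin)
{
    if (origin == MOCK_SEEK_SET) {
        s->position = (uint64_t)offset;
    } else {
        s->position += (uint64_t)offset;
    }
    s->callCount += 1;
    return 1;
}

// Reaches a 64-bit offset using steps that each fit in a signed 32-bit integer.
static int seek_to_byte_chunked(mock_stream* s, uint64_t offsetFromStart)
{
    const uint64_t maxStep = 0x7FFFFFFF;
    uint64_t remaining = offsetFromStart;
    int origin = MOCK_SEEK_SET;     // Only the first seek is absolute.

    do {
        uint64_t step = (remaining > maxStep) ? maxStep : remaining;
        if (!mock_seek(s, (int32_t)step, origin)) {
            return 0;
        }
        remaining -= step;
        origin = MOCK_SEEK_CUR;     // Every later seek is relative.
    } while (remaining > 0);

    return 1;
}
```

For an offset of 5,000,000,000 bytes this issues three seeks: two of 0x7FFFFFFF bytes and one for the remaining 705,032,706 bytes.
*/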
2904static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2905{
2906 drflac_uint8 crc;
2907 drflac_uint64 result;
2908 drflac_uint8 utf8[7] = {0};
2909 int byteCount;
2910 int i;
2912 DRFLAC_ASSERT(bs != NULL);
2913 DRFLAC_ASSERT(pNumberOut != NULL);
2914 DRFLAC_ASSERT(pCRCOut != NULL);
2916 crc = *pCRCOut;
2918 if (!drflac__read_uint8(bs, 8, utf8)) {
2919 *pNumberOut = 0;
2920 return DRFLAC_AT_END;
2921 }
2922 crc = drflac_crc8(crc, utf8[0], 8);
2924 if ((utf8[0] & 0x80) == 0) {
2925 *pNumberOut = utf8[0];
2926 *pCRCOut = crc;
2927 return DRFLAC_SUCCESS;
2928 }
2930 /*byteCount = 1;*/
2931 if ((utf8[0] & 0xE0) == 0xC0) {
2932 byteCount = 2;
2933 } else if ((utf8[0] & 0xF0) == 0xE0) {
2934 byteCount = 3;
2935 } else if ((utf8[0] & 0xF8) == 0xF0) {
2936 byteCount = 4;
2937 } else if ((utf8[0] & 0xFC) == 0xF8) {
2938 byteCount = 5;
2939 } else if ((utf8[0] & 0xFE) == 0xFC) {
2940 byteCount = 6;
2941 } else if ((utf8[0] & 0xFF) == 0xFE) {
2942 byteCount = 7;
2943 } else {
2944 *pNumberOut = 0;
2945 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
2946 }
2948 /* Read extra bytes. */
2949 DRFLAC_ASSERT(byteCount > 1);
2951 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
2952 for (i = 1; i < byteCount; ++i) {
2953 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
2954 *pNumberOut = 0;
2955 return DRFLAC_AT_END;
2956 }
2957 crc = drflac_crc8(crc, utf8[i], 8);
2959 result = (result << 6) | (utf8[i] & 0x3F);
2960 }
2962 *pNumberOut = result;
2963 *pCRCOut = crc;
2964 return DRFLAC_SUCCESS;
2965}
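/*
For reference, the same "UTF-8 style" number decoding can be sketched against a plain byte buffer, leaving out the bitstream and CRC handling.
`decode_utf8_coded_number()` is a hypothetical helper written for this example. Like the function above, it does not validate that the
continuation bytes start with the bits 10.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical buffer-based decoder. Returns the number of bytes consumed, or 0 on error.
static size_t decode_utf8_coded_number(const uint8_t* data, size_t size, uint64_t* pNumberOut)
{
    size_t byteCount;
    size_t i;
    uint64_t result;

    if (size == 0) {
        return 0;
    }

    // A single byte with the top bit clear encodes itself.
    if ((data[0] & 0x80) == 0) {
        *pNumberOut = data[0];
        return 1;
    }

    // The number of leading 1 bits in the first byte gives the total length.
    if      ((data[0] & 0xE0) == 0xC0) byteCount = 2;
    else if ((data[0] & 0xF0) == 0xE0) byteCount = 3;
    else if ((data[0] & 0xF8) == 0xF0) byteCount = 4;
    else if ((data[0] & 0xFC) == 0xF8) byteCount = 5;
    else if ((data[0] & 0xFE) == 0xFC) byteCount = 6;
    else if (data[0] == 0xFE)          byteCount = 7;
    else return 0;  // 10xxxxxx and 0xFF cannot start a sequence.

    if (size < byteCount) {
        return 0;
    }

    // Keep only the payload bits of the first byte, then append 6 bits per extra byte.
    result = (uint64_t)(data[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        result = (result << 6) | (uint64_t)(data[i] & 0x3F);
    }

    *pNumberOut = result;
    return byteCount;
}
```

For example, the bytes E2 82 AC decode to 0x20AC: the first byte contributes 0x2, and the two continuation bytes contribute 0x02 and 0x2C.
*/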
2968static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
2969{
2970#if 1 /* Needs optimizing. */
2971 drflac_uint32 result = 0;
2972 while (x > 0) {
2973 result += 1;
2974 x >>= 1;
2975 }
2977 return result;
2978#endif
2979}
2981static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
2982{
2983 /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
2984 return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
2985}
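/*
A small worked example of the check in drflac__use_64_bit_prediction(), restated with standalone names (`bit_length_u32()` and
`needs_64_bit_prediction()` are invented for this sketch). Note that the loop in drflac__ilog2_u32() returns the number of bits needed to
represent `x`, so an order of 8 contributes 4 rather than 3.

```c
#include <stdint.h>

// Same loop as drflac__ilog2_u32(): the number of bits needed to represent x.
static uint32_t bit_length_u32(uint32_t x)
{
    uint32_t result = 0;
    while (x > 0) {
        result += 1;
        x >>= 1;
    }
    return result;
}

// Mirrors the comparison above: true when a 32-bit accumulator may not be wide enough.
static int needs_64_bit_prediction(uint32_t bitsPerSample, uint32_t order, uint32_t precision)
{
    return bitsPerSample + precision + bit_length_u32(order) > 32;
}
```

With 12-bit coefficient precision and an order of 8, 16-bit audio gives 16 + 12 + 4 = 32, which stays on the 32-bit path, while 24-bit audio
gives 24 + 12 + 4 = 40 and takes the 64-bit path.
*/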
/*
The next two functions are responsible for calculating the prediction.

The 32-bit version is only safe when the intermediate sums fit in 32 bits; otherwise we need 64-bit integer arithmetic or we'll run out of
precision. drflac__use_64_bit_prediction() makes that decision from the bits per sample, the coefficient precision and the predictor order.
It's safe to assume 64-bit arithmetic will be slower on 32-bit platforms, so the 32-bit version is used whenever that check allows it.
*/
2994#if defined(__clang__)
2995__attribute__((no_sanitize("signed-integer-overflow")))
2996#endif
2997static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
2998{
2999 drflac_int32 prediction = 0;
3001 DRFLAC_ASSERT(order <= 32);
3003 /* 32-bit version. */
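    /*
    Every case intentionally falls through to the one below it, so entering at `order` accumulates exactly `order` terms.
    VC++ optimizes this to a single jmp. I've not yet verified this for other compilers.
    */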
3006 switch (order)
3007 {
3008 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
3009 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
3010 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
3011 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
3012 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
3013 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
3014 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
3015 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
3016 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
3017 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
3018 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
3019 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
3020 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
3021 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
3022 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
3023 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
3024 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
3025 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
3026 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
3027 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
3028 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
3029 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
3030 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
3031 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
3032 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
3033 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
3034 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
3035 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
3036 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
3037 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
3038 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
3039 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3040 }
3042 return (drflac_int32)(prediction >> shift);
3043}
3045static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3046{
3047 drflac_int64 prediction;
3049 DRFLAC_ASSERT(order <= 32);
3051 /* 64-bit version. */
3053 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3054#ifndef DRFLAC_64BIT
3055 if (order == 8)
3056 {
3057 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3058 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3059 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3060 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3061 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3062 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3063 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3064 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3065 }
3066 else if (order == 7)
3067 {
3068 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3069 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3070 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3071 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3072 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3073 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3074 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3075 }
3076 else if (order == 3)
3077 {
3078 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3079 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3080 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3081 }
3082 else if (order == 6)
3083 {
3084 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3085 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3086 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3087 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3088 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3089 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3090 }
3091 else if (order == 5)
3092 {
3093 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3094 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3095 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3096 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3097 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3098 }
3099 else if (order == 4)
3100 {
3101 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3102 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3103 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3104 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3105 }
3106 else if (order == 12)
3107 {
3108 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3109 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3110 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3111 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3112 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3113 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3114 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3115 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3116 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3117 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3118 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3119 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3120 }
3121 else if (order == 2)
3122 {
3123 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3124 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3125 }
3126 else if (order == 1)
3127 {
3128 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3129 }
3130 else if (order == 10)
3131 {
3132 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3133 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3134 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3135 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3136 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3137 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3138 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3139 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3140 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3141 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3142 }
3143 else if (order == 9)
3144 {
3145 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3146 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3147 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3148 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3149 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3150 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3151 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3152 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3153 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3154 }
3155 else if (order == 11)
3156 {
3157 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3158 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3159 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3160 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3161 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3162 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3163 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3164 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3165 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3166 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3167 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3168 }
3169 else
3170 {
3171 int j;
3173 prediction = 0;
3174 for (j = 0; j < (int)order; ++j) {
3175 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
3176 }
3177 }
3178#endif
3180 /*
3181 VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
3182 reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
3183 */
3184#ifdef DRFLAC_64BIT
3185 prediction = 0;
3186 switch (order)
3187 {
3188 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3189 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3190 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3191 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3192 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3193 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3194 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3195 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3196 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3197 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3198 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3199 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3200 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3201 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3202 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3203 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3204 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3205 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3206 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3207 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3208 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3209 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3210 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3211 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3212 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3213 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3214 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3215 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3216 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3217 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3218 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3219 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3220 }
3221#endif
3223 return (drflac_int32)(prediction >> shift);
3224}
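/*
Both prediction functions compute the same thing as the generic loop in the 32-bit build's final `else` branch. A minimal standalone sketch
of that loop is shown below, using a 64-bit accumulator. `lpc_predict()` is a name made up for this example. The point to notice is the
indexing: coefficients[0] pairs with the most recently decoded sample.

```c
#include <stdint.h>

// Illustrative only: the loop form of the unrolled predictors above.
// pDecodedSamples points at the sample being predicted; history sits at negative indices.
static int32_t lpc_predict(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pDecodedSamples)
{
    int64_t prediction = 0;
    uint32_t j;

    for (j = 0; j < order; ++j) {
        // coefficients[0] pairs with pDecodedSamples[-1], coefficients[1] with [-2], and so on.
        prediction += coefficients[j] * (int64_t)pDecodedSamples[-(int32_t)j - 1];
    }

    return (int32_t)(prediction >> shift);
}
```

With a history of {10, 20} and coefficients {2, -1}, the sum is 2*20 + (-1)*10 = 30, which a shift of 1 reduces to 15.
*/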
3227#if 0
3228/*
3229Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3230sake of readability and should only be used as a reference.
3231*/
3232static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3233{
3234 drflac_uint32 i;
3236 DRFLAC_ASSERT(bs != NULL);
3237 DRFLAC_ASSERT(pSamplesOut != NULL);
3239 for (i = 0; i < count; ++i) {
3240 drflac_uint32 zeroCounter = 0;
3241 for (;;) {
3242 drflac_uint8 bit;
3243 if (!drflac__read_uint8(bs, 1, &bit)) {
3244 return DRFLAC_FALSE;
3245 }
3247 if (bit == 0) {
3248 zeroCounter += 1;
3249 } else {
3250 break;
3251 }
3252 }
3254 drflac_uint32 decodedRice;
3255 if (riceParam > 0) {
3256 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3257 return DRFLAC_FALSE;
3258 }
3259 } else {
3260 decodedRice = 0;
3261 }
3263 decodedRice |= (zeroCounter << riceParam);
3264 if ((decodedRice & 0x01)) {
3265 decodedRice = ~(decodedRice >> 1);
3266 } else {
3267 decodedRice = (decodedRice >> 1);
3268 }
3271 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3272 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3273 } else {
3274 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3275 }
3276 }
3278 return DRFLAC_TRUE;
3279}
3280#endif
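/*
The reference decoder above rebuilds each residual in two steps: it joins the unary zero count with the low `riceParam` bits, then unfolds
the result so that even values become non-negative and odd values become negative. A minimal sketch of just that arithmetic follows.
`rice_to_signed()` is a hypothetical helper, and it uses `-(v >> 1) - 1` in place of `~(v >> 1)`, which gives the same value on two's
complement targets without relying on an unsigned-to-signed conversion.

```c
#include <stdint.h>

// Illustrative only: combines the two Rice parts and unfolds them into a signed residual.
static int32_t rice_to_signed(uint32_t zeroCounter, uint32_t riceParam, uint32_t lowBits)
{
    // The unary part supplies the high bits, the fixed-width part the low bits.
    uint32_t folded = (zeroCounter << riceParam) | lowBits;

    if (folded & 1) {
        return -(int32_t)(folded >> 1) - 1;   // 1 -> -1, 3 -> -2, 5 -> -3, ...
    }
    return (int32_t)(folded >> 1);            // 0 ->  0, 2 ->  1, 4 ->  2, ...
}
```

For example, a zero count of 1 with riceParam = 2 and low bits 3 gives a folded value of 7, which unfolds to -4.
*/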
3282#if 0
3283static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3284{
3285 drflac_uint32 zeroCounter = 0;
3286 drflac_uint32 decodedRice;
3288 for (;;) {
3289 drflac_uint8 bit;
3290 if (!drflac__read_uint8(bs, 1, &bit)) {
3291 return DRFLAC_FALSE;
3292 }
3294 if (bit == 0) {
3295 zeroCounter += 1;
3296 } else {
3297 break;
3298 }
3299 }
3301 if (riceParam > 0) {
3302 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3303 return DRFLAC_FALSE;
3304 }
3305 } else {
3306 decodedRice = 0;
3307 }
3309 *pZeroCounterOut = zeroCounter;
3310 *pRiceParamPartOut = decodedRice;
3311 return DRFLAC_TRUE;
3312}
3313#endif
3315#if 0
3316static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3317{
3318 drflac_cache_t riceParamMask;
3319 drflac_uint32 zeroCounter;
3320 drflac_uint32 setBitOffsetPlus1;
3321 drflac_uint32 riceParamPart;
3322 drflac_uint32 riceLength;
3324 DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
3326 riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3328 zeroCounter = 0;
3329 while (bs->cache == 0) {
3330 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3331 if (!drflac__reload_cache(bs)) {
3332 return DRFLAC_FALSE;
3333 }
3334 }
3336 setBitOffsetPlus1 = drflac__clz(bs->cache);
3337 zeroCounter += setBitOffsetPlus1;
3338 setBitOffsetPlus1 += 1;
3340 riceLength = setBitOffsetPlus1 + riceParam;
3341 if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3342 riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3344 bs->consumedBits += riceLength;
3345 bs->cache <<= riceLength;
3346 } else {
3347 drflac_uint32 bitCountLo;
3348 drflac_cache_t resultHi;
3350 bs->consumedBits += riceLength;
3351 bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
3353 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
3354 bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3355 resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
3357 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3358#ifndef DR_FLAC_NO_CRC
3359 drflac__update_crc16(bs);
3360#endif
3361 bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3362 bs->consumedBits = 0;
3363#ifndef DR_FLAC_NO_CRC
3364 bs->crc16Cache = bs->cache;
3365#endif
3366 } else {
3367 /* Slow path. We need to fetch more data from the client. */
3368 if (!drflac__reload_cache(bs)) {
3369 return DRFLAC_FALSE;
3370 }
3371 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3372 /* This happens when we get to end of stream */
3373 return DRFLAC_FALSE;
3374 }
3375 }
3377 riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3379 bs->consumedBits += bitCountLo;
3380 bs->cache <<= bitCountLo;
3381 }
3383 pZeroCounterOut[0] = zeroCounter;
3384 pRiceParamPartOut[0] = riceParamPart;
3386 return DRFLAC_TRUE;
3387}
3388#endif
3390static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3391{
3392 drflac_uint32 riceParamPlus1 = riceParam + 1;
3393 /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
3394 drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3395 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3397 /*
3398 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3399 no idea how this will work in practice...
3400 */
3401 drflac_cache_t bs_cache = bs->cache;
3402 drflac_uint32 bs_consumedBits = bs->consumedBits;
    /* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
3405 drflac_uint32 lzcount = drflac__clz(bs_cache);
3406 if (lzcount < sizeof(bs_cache)*8) {
3407 pZeroCounterOut[0] = lzcount;
3409 /*
3410 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3411 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3412 outside of this function at a higher level.
3413 */
3414 extract_rice_param_part:
3415 bs_cache <<= lzcount;
3416 bs_consumedBits += lzcount;
3418 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3419 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3420 pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3421 bs_cache <<= riceParamPlus1;
3422 bs_consumedBits += riceParamPlus1;
3423 } else {
3424 drflac_uint32 riceParamPartHi;
3425 drflac_uint32 riceParamPartLo;
3426 drflac_uint32 riceParamPartLoBitCount;
3428 /*
3429 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3430 line, reload the cache, and then combine it with the head of the next cache line.
3431 */
3433 /* Grab the high part of the rice parameter part. */
3434 riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3436 /* Before reloading the cache we need to grab the size in bits of the low part. */
3437 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3438 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3440 /* Now reload the cache. */
3441 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3442 #ifndef DR_FLAC_NO_CRC
3443 drflac__update_crc16(bs);
3444 #endif
3445 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3446 bs_consumedBits = riceParamPartLoBitCount;
3447 #ifndef DR_FLAC_NO_CRC
3448 bs->crc16Cache = bs_cache;
3449 #endif
3450 } else {
3451 /* Slow path. We need to fetch more data from the client. */
3452 if (!drflac__reload_cache(bs)) {
3453 return DRFLAC_FALSE;
3454 }
3455 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                    /* This happens when we get to the end of the stream. */
3457 return DRFLAC_FALSE;
3458 }
3460 bs_cache = bs->cache;
3461 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3462 }
3464 /* We should now have enough information to construct the rice parameter part. */
3465 riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3466 pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
3468 bs_cache <<= riceParamPartLoBitCount;
3469 }
3470 } else {
3471 /*
3472 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3473 to drflac__clz() and we need to reload the cache.
3474 */
3475 drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3476 for (;;) {
3477 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3478 #ifndef DR_FLAC_NO_CRC
3479 drflac__update_crc16(bs);
3480 #endif
3481 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3482 bs_consumedBits = 0;
3483 #ifndef DR_FLAC_NO_CRC
3484 bs->crc16Cache = bs_cache;
3485 #endif
3486 } else {
3487 /* Slow path. We need to fetch more data from the client. */
3488 if (!drflac__reload_cache(bs)) {
3489 return DRFLAC_FALSE;
3490 }
3492 bs_cache = bs->cache;
3493 bs_consumedBits = bs->consumedBits;
3494 }
3496 lzcount = drflac__clz(bs_cache);
3497 zeroCounter += lzcount;
3499 if (lzcount < sizeof(bs_cache)*8) {
3500 break;
3501 }
3502 }
3504 pZeroCounterOut[0] = zeroCounter;
3505 goto extract_rice_param_part;
3506 }
3508 /* Make sure the cache is restored at the end of it all. */
3509 bs->cache = bs_cache;
3510 bs->consumedBits = bs_consumedBits;
3512 return DRFLAC_TRUE;
3513}
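
/*
Illustration only, not part of the library. A minimal sketch of what the Rice part extraction above computes, assuming a plain
MSB-first byte buffer instead of the L1/L2 cache. The names sketch_bits, sketch_read_bit and sketch_read_rice_parts are
hypothetical. Unlike drflac__read_rice_parts_x1(), which returns riceParam+1 bits (the terminating set bit is included and later
masked off by the caller), this simplified version discards the terminating bit itself.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified bit reader. This is not dr_flac's drflac_bs.
typedef struct {
    const uint8_t* data;
    size_t sizeInBits;
    size_t bitPos;      // Index of the next unread bit, MSB first.
} sketch_bits;

// Reads one bit. Returns -1 when the stream is exhausted.
static int sketch_read_bit(sketch_bits* b)
{
    int bit;
    if (b->bitPos >= b->sizeInBits) {
        return -1;
    }
    bit = (b->data[b->bitPos >> 3] >> (7 - (b->bitPos & 7))) & 1;
    b->bitPos += 1;
    return bit;
}

// Reads a unary zero count (terminated by a set bit) followed by riceParam raw bits.
// Returns 1 on success, 0 if the stream ends early.
static int sketch_read_rice_parts(sketch_bits* b, unsigned riceParam, uint32_t* pZeroCount, uint32_t* pLowBits)
{
    uint32_t zeroCount = 0;
    uint32_t lowBits = 0;
    unsigned i;
    int bit;

    // Unary part: count zero bits up to the first set bit.
    while ((bit = sketch_read_bit(b)) == 0) {
        zeroCount += 1;
    }
    if (bit < 0) {
        return 0;
    }

    // Binary part: the next riceParam bits, most significant first.
    for (i = 0; i < riceParam; i += 1) {
        bit = sketch_read_bit(b);
        if (bit < 0) {
            return 0;
        }
        lowBits = (lowBits << 1) | (uint32_t)bit;
    }

    *pZeroCount = zeroCount;
    *pLowBits = lowBits;
    return 1;
}
```

The bit-at-a-time loop is only meant to show the layout of a Rice code. The real function avoids it by using drflac__clz() on a
whole cache line.
*/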
3515static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3516{
3517 drflac_uint32 riceParamPlus1 = riceParam + 1;
3518 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
    /*
    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. Whether
    this actually helps depends on the compiler and has not been verified.
    */
3524 drflac_cache_t bs_cache = bs->cache;
3525 drflac_uint32 bs_consumedBits = bs->consumedBits;
    /* The first thing to do is find the first set bit, which terminates the unary coded part. Most likely a bit will be set in the current cache line. */
3528 drflac_uint32 lzcount = drflac__clz(bs_cache);
3529 if (lzcount < sizeof(bs_cache)*8) {
3530 /*
3531 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3532 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3533 outside of this function at a higher level.
3534 */
3535 extract_rice_param_part:
3536 bs_cache <<= lzcount;
3537 bs_consumedBits += lzcount;
3539 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3540 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3541 bs_cache <<= riceParamPlus1;
3542 bs_consumedBits += riceParamPlus1;
3543 } else {
            /*
            Getting here means the rice parameter part straddles the cache line. Since we are only seeking, nothing needs to be extracted. We
            skip the tail of the current cache line, reload the cache, and then skip the remaining bits at the head of the next cache line.
            */
3549 /* Before reloading the cache we need to grab the size in bits of the low part. */
3550 drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3551 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3553 /* Now reload the cache. */
3554 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3555 #ifndef DR_FLAC_NO_CRC
3556 drflac__update_crc16(bs);
3557 #endif
3558 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3559 bs_consumedBits = riceParamPartLoBitCount;
3560 #ifndef DR_FLAC_NO_CRC
3561 bs->crc16Cache = bs_cache;
3562 #endif
3563 } else {
3564 /* Slow path. We need to fetch more data from the client. */
3565 if (!drflac__reload_cache(bs)) {
3566 return DRFLAC_FALSE;
3567 }
3569 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
                    /* This happens when we reach the end of the stream. */
3571 return DRFLAC_FALSE;
3572 }
3574 bs_cache = bs->cache;
3575 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3576 }
3578 bs_cache <<= riceParamPartLoBitCount;
3579 }
3580 } else {
3581 /*
3582 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3583 to drflac__clz() and we need to reload the cache.
3584 */
3585 for (;;) {
3586 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3587 #ifndef DR_FLAC_NO_CRC
3588 drflac__update_crc16(bs);
3589 #endif
3590 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3591 bs_consumedBits = 0;
3592 #ifndef DR_FLAC_NO_CRC
3593 bs->crc16Cache = bs_cache;
3594 #endif
3595 } else {
3596 /* Slow path. We need to fetch more data from the client. */
3597 if (!drflac__reload_cache(bs)) {
3598 return DRFLAC_FALSE;
3599 }
3601 bs_cache = bs->cache;
3602 bs_consumedBits = bs->consumedBits;
3603 }
3605 lzcount = drflac__clz(bs_cache);
3606 if (lzcount < sizeof(bs_cache)*8) {
3607 break;
3608 }
3609 }
3611 goto extract_rice_param_part;
3612 }
3614 /* Make sure the cache is restored at the end of it all. */
3615 bs->cache = bs_cache;
3616 bs->consumedBits = bs_consumedBits;
3618 return DRFLAC_TRUE;
3619}
3622static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3623{
3624 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3625 drflac_uint32 zeroCountPart0;
3626 drflac_uint32 riceParamPart0;
3627 drflac_uint32 riceParamMask;
3628 drflac_uint32 i;
3630 DRFLAC_ASSERT(bs != NULL);
3631 DRFLAC_ASSERT(pSamplesOut != NULL);
3633 (void)bitsPerSample;
3634 (void)order;
3635 (void)shift;
3636 (void)coefficients;
3638 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3640 i = 0;
3641 while (i < count) {
3642 /* Rice extraction. */
3643 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3644 return DRFLAC_FALSE;
3645 }
3647 /* Rice reconstruction. */
3648 riceParamPart0 &= riceParamMask;
3649 riceParamPart0 |= (zeroCountPart0 << riceParam);
3650 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3652 pSamplesOut[i] = riceParamPart0;
3654 i += 1;
3655 }
3657 return DRFLAC_TRUE;
3658}
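
/*
Illustration only, not part of the library. The "Rice reconstruction" step above joins the two parts into a folded unsigned value
and then unfolds it so that even values map to non-negative residuals and odd values map to negative ones. A minimal sketch, where
sketch_rice_reconstruct is a hypothetical name and lowBits is assumed to be already masked to riceParam bits:

```c
#include <stdint.h>

// Combines the two Rice parts into the folded (unsigned) residual, then unfolds it to a signed value.
static int32_t sketch_rice_reconstruct(uint32_t zeroCount, uint32_t lowBits, unsigned riceParam)
{
    static const uint32_t t[2] = {0x00000000, 0xFFFFFFFF};
    uint32_t folded   = (zeroCount << riceParam) | lowBits;
    uint32_t unfolded = (folded >> 1) ^ t[folded & 0x01];   // XOR with all ones when the low bit is set.

    // Assumes the usual two's complement conversion from uint32_t to int32_t.
    return (int32_t)unfolded;
}
```

Folded values 0, 1, 2, 3, 4 therefore decode to 0, -1, 1, -2, 2. The lookup table replaces a branch on the low bit.
*/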
3660static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3661{
3662 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3663 drflac_uint32 zeroCountPart0 = 0;
3664 drflac_uint32 zeroCountPart1 = 0;
3665 drflac_uint32 zeroCountPart2 = 0;
3666 drflac_uint32 zeroCountPart3 = 0;
3667 drflac_uint32 riceParamPart0 = 0;
3668 drflac_uint32 riceParamPart1 = 0;
3669 drflac_uint32 riceParamPart2 = 0;
3670 drflac_uint32 riceParamPart3 = 0;
3671 drflac_uint32 riceParamMask;
3672 const drflac_int32* pSamplesOutEnd;
3673 drflac_uint32 i;
3675 DRFLAC_ASSERT(bs != NULL);
3676 DRFLAC_ASSERT(pSamplesOut != NULL);
3678 if (lpcOrder == 0) {
3679 return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
3680 }
3682 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3683 pSamplesOutEnd = pSamplesOut + (count & ~3);
3685 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3686 while (pSamplesOut < pSamplesOutEnd) {
3687 /*
3688 Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
3689 against an array. Not sure why, but perhaps it's making more efficient use of registers?
3690 */
3691 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3692 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3693 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3694 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3695 return DRFLAC_FALSE;
3696 }
3698 riceParamPart0 &= riceParamMask;
3699 riceParamPart1 &= riceParamMask;
3700 riceParamPart2 &= riceParamMask;
3701 riceParamPart3 &= riceParamMask;
3703 riceParamPart0 |= (zeroCountPart0 << riceParam);
3704 riceParamPart1 |= (zeroCountPart1 << riceParam);
3705 riceParamPart2 |= (zeroCountPart2 << riceParam);
3706 riceParamPart3 |= (zeroCountPart3 << riceParam);
3708 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3709 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3710 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3711 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3713 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3714 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3715 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3716 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
3718 pSamplesOut += 4;
3719 }
3720 } else {
3721 while (pSamplesOut < pSamplesOutEnd) {
3722 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3723 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3724 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3725 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3726 return DRFLAC_FALSE;
3727 }
3729 riceParamPart0 &= riceParamMask;
3730 riceParamPart1 &= riceParamMask;
3731 riceParamPart2 &= riceParamMask;
3732 riceParamPart3 &= riceParamMask;
3734 riceParamPart0 |= (zeroCountPart0 << riceParam);
3735 riceParamPart1 |= (zeroCountPart1 << riceParam);
3736 riceParamPart2 |= (zeroCountPart2 << riceParam);
3737 riceParamPart3 |= (zeroCountPart3 << riceParam);
3739 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3740 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3741 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3742 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3744 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3745 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3746 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3747 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
3749 pSamplesOut += 4;
3750 }
3751 }
3753 i = (count & ~3);
3754 while (i < count) {
3755 /* Rice extraction. */
3756 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3757 return DRFLAC_FALSE;
3758 }
3760 /* Rice reconstruction. */
3761 riceParamPart0 &= riceParamMask;
3762 riceParamPart0 |= (zeroCountPart0 << riceParam);
3763 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3764 /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
3766 /* Sample reconstruction. */
3767 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3768 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3769 } else {
3770 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3771 }
3773 i += 1;
3774 pSamplesOut += 1;
3775 }
3777 return DRFLAC_TRUE;
3778}
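
/*
Illustration only, not part of the library. The scalar decoder above adds each residual to a linear prediction computed from
previously reconstructed samples. The sketch below shows that relationship under the assumption that the prediction is the
coefficient-weighted sum of the preceding samples, shifted right by the LPC shift. sketch_predict_64 and sketch_reconstruct are
hypothetical names and are not the library's drflac__calculate_prediction_32/64.

```c
#include <stdint.h>

// pDecoded points at the slot being reconstructed. pDecoded[-1] .. pDecoded[-order] hold the
// previously reconstructed samples, so the caller must provide that much history.
static int32_t sketch_predict_64(unsigned order, int shift, const int32_t* coefficients, const int32_t* pDecoded)
{
    int64_t sum = 0;
    unsigned j;
    for (j = 0; j < order; j += 1) {
        sum += (int64_t)coefficients[j] * (int64_t)pDecoded[-1 - (int)j];
    }

    // Assumes the compiler performs an arithmetic right shift for negative sums.
    return (int32_t)(sum >> shift);
}

// Reconstructs count samples in place. Each new sample immediately becomes history for the next one.
static void sketch_reconstruct(unsigned order, int shift, const int32_t* coefficients, const int32_t* residuals, unsigned count, int32_t* pSamplesOut)
{
    unsigned i;
    for (i = 0; i < count; i += 1) {
        pSamplesOut[i] = residuals[i] + sketch_predict_64(order, shift, coefficients, pSamplesOut + i);
    }
}
```

Because every sample depends on the one before it, the prediction step is inherently serial. This is why the unrolled loop above
still computes pSamplesOut[0] through pSamplesOut[3] in order.
*/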
3780#if defined(DRFLAC_SUPPORT_SSE2)
3781static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3782{
3783 __m128i r;
3785 /* Pack. */
3786 r = _mm_packs_epi32(a, b);
3788 /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
3789 r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
3791 /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
3792 r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3793 r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3795 return r;
3796}
3797#endif
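
/*
Illustration only, not part of the library. A scalar model of what drflac__mm_packs_interleaved_epi32() produces, assuming lane 0 is
the lowest element of each vector: both inputs are saturated to signed 16 bits and the results alternate between a and b. The
names sketch_saturate_s16 and sketch_pack_interleaved are hypothetical.

```c
#include <stdint.h>

// Clamps a 32-bit value to the signed 16-bit range, as a saturating pack does.
static int16_t sketch_saturate_s16(int32_t x)
{
    if (x >  32767) return  32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

// out receives a0 b0 a1 b1 a2 b2 a3 b3, each saturated to 16 bits.
static void sketch_pack_interleaved(const int32_t a[4], const int32_t b[4], int16_t out[8])
{
    int i;
    for (i = 0; i < 4; i += 1) {
        out[2*i + 0] = sketch_saturate_s16(a[i]);
        out[2*i + 1] = sketch_saturate_s16(b[i]);
    }
}
```

The SSE2 version reaches the same layout with one pack followed by three shuffles instead of a loop.
*/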
3799#if defined(DRFLAC_SUPPORT_SSE41)
3800static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3801{
3802 return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3803}
3805static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3806{
3807 __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3808 __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
3809 return _mm_add_epi32(x64, x32);
3810}
3812static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3813{
3814 return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3815}
3817static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3818{
    /*
    To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
    is shifted with zero bits, whereas the high side is shifted with sign bits.
    */
3823 __m128i lo = _mm_srli_epi64(x, count);
3824 __m128i hi = _mm_srai_epi32(x, count);
3826 hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
3828 return _mm_or_si128(lo, hi);
3829}
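
/*
Illustration only, not part of the library. A scalar model of the lo/hi trick used by drflac__mm_srai_epi64() for a single 64-bit
value, under the same count < 32 assumption. sketch_srai_64 is a hypothetical name.

```c
#include <stdint.h>

// Emulates a 64-bit arithmetic right shift using a 64-bit logical shift and a 32-bit arithmetic shift.
// Only valid for 0 <= count < 32.
static int64_t sketch_srai_64(int64_t x, int count)
{
    uint64_t ux   = (uint64_t)x;
    uint64_t lo   = ux >> count;            // Logical shift: zero bits enter at the top.
    int32_t  hi32 = (int32_t)(ux >> 32);    // Upper 32 bits, viewed as signed.
    uint64_t hi;

    // Assumes the compiler performs an arithmetic right shift on negative 32-bit values.
    hi = (uint64_t)(uint32_t)(hi32 >> count) << 32;

    // The upper halves of lo and hi agree everywhere except the top count bits, where hi carries the sign.
    return (int64_t)(lo | hi);
}
```

With count >= 32 the sign bits would also have to reach into the lower half, which neither this sketch nor the masked OR above
accounts for.
*/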
3831static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3832{
3833 int i;
3834 drflac_uint32 riceParamMask;
3835 drflac_int32* pDecodedSamples = pSamplesOut;
3836 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3837 drflac_uint32 zeroCountParts0 = 0;
3838 drflac_uint32 zeroCountParts1 = 0;
3839 drflac_uint32 zeroCountParts2 = 0;
3840 drflac_uint32 zeroCountParts3 = 0;
3841 drflac_uint32 riceParamParts0 = 0;
3842 drflac_uint32 riceParamParts1 = 0;
3843 drflac_uint32 riceParamParts2 = 0;
3844 drflac_uint32 riceParamParts3 = 0;
3845 __m128i coefficients128_0;
3846 __m128i coefficients128_4;
3847 __m128i coefficients128_8;
3848 __m128i samples128_0;
3849 __m128i samples128_4;
3850 __m128i samples128_8;
3851 __m128i riceParamMask128;
3853 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3855 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3856 riceParamMask128 = _mm_set1_epi32(riceParamMask);
3858 /* Pre-load. */
3859 coefficients128_0 = _mm_setzero_si128();
3860 coefficients128_4 = _mm_setzero_si128();
3861 coefficients128_8 = _mm_setzero_si128();
3863 samples128_0 = _mm_setzero_si128();
3864 samples128_4 = _mm_setzero_si128();
3865 samples128_8 = _mm_setzero_si128();
3867 /*
3868 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3869 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3870 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3871 so I think there's opportunity for this to be simplified.
3872 */
3873#if 1
3874 {
3875 int runningOrder = order;
3877 /* 0 - 3. */
3878 if (runningOrder >= 4) {
3879 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
3880 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
3881 runningOrder -= 4;
3882 } else {
3883 switch (runningOrder) {
3884 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
3885 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
3886 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
3887 }
3888 runningOrder = 0;
3889 }
3891 /* 4 - 7 */
3892 if (runningOrder >= 4) {
3893 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
3894 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
3895 runningOrder -= 4;
3896 } else {
3897 switch (runningOrder) {
3898 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
3899 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
3900 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
3901 }
3902 runningOrder = 0;
3903 }
3905 /* 8 - 11 */
3906 if (runningOrder == 4) {
3907 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
3908 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
3909 runningOrder -= 4;
3910 } else {
3911 switch (runningOrder) {
3912 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
3913 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
3914 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
3915 }
3916 runningOrder = 0;
3917 }
3919 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
3920 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
3921 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
3922 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
3923 }
3924#else
3925 /* This causes strict-aliasing warnings with GCC. */
3926 switch (order)
3927 {
3928 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
3929 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
3930 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
3931 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
3932 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
3933 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
3934 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
3935 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
3936 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
3937 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
3938 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
3939 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
3940 }
3941#endif
    /* For this version we read four residuals per iteration, then predict and reconstruct one sample at a time. */
3944 while (pDecodedSamples < pDecodedSamplesEnd) {
3945 __m128i prediction128;
3946 __m128i zeroCountPart128;
3947 __m128i riceParamPart128;
3949 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
3950 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
3951 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
3952 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
3953 return DRFLAC_FALSE;
3954 }
3956 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
3957 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
3959 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
3960 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
3961 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
3962 /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
3964 if (order <= 4) {
3965 for (i = 0; i < 4; i += 1) {
3966 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
3968 /* Horizontal add and shift. */
3969 prediction128 = drflac__mm_hadd_epi32(prediction128);
3970 prediction128 = _mm_srai_epi32(prediction128, shift);
3971 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3973 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3974 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3975 }
3976 } else if (order <= 8) {
3977 for (i = 0; i < 4; i += 1) {
3978 prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
3979 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3981 /* Horizontal add and shift. */
3982 prediction128 = drflac__mm_hadd_epi32(prediction128);
3983 prediction128 = _mm_srai_epi32(prediction128, shift);
3984 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3986 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
3987 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3988 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3989 }
3990 } else {
3991 for (i = 0; i < 4; i += 1) {
3992 prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
3993 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
3994 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3996 /* Horizontal add and shift. */
3997 prediction128 = drflac__mm_hadd_epi32(prediction128);
3998 prediction128 = _mm_srai_epi32(prediction128, shift);
3999 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4001 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4002 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4003 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4004 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4005 }
4006 }
4008 /* We store samples in groups of 4. */
4009 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4010 pDecodedSamples += 4;
4011 }
4013 /* Make sure we process the last few samples. */
4014 i = (count & ~3);
4015 while (i < (int)count) {
4016 /* Rice extraction. */
4017 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4018 return DRFLAC_FALSE;
4019 }
4021 /* Rice reconstruction. */
4022 riceParamParts0 &= riceParamMask;
4023 riceParamParts0 |= (zeroCountParts0 << riceParam);
4024 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4026 /* Sample reconstruction. */
4027 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4029 i += 1;
4030 pDecodedSamples += 1;
4031 }
4033 return DRFLAC_TRUE;
4034}
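
/*
Illustration only, not part of the library. A scalar model of the order <= 4 streaming loop above. The four-lane window holds the
most recent samples with the newest in lane 3, and the coefficients are stored reversed so that coefficient 0 lines up with the
newest sample. sketch_stream4 is a hypothetical name, and small input values are assumed so that the 32-bit products do not
overflow.

```c
#include <stdint.h>

// window[3] is the newest sample and reversedCoefficients[3] is coefficient 0.
// On return, window holds the four newly reconstructed samples in output order.
static void sketch_stream4(const int32_t reversedCoefficients[4], int shift, int32_t window[4], int32_t residuals[4])
{
    int step;
    for (step = 0; step < 4; step += 1) {
        int32_t sum = 0;
        int lane;

        // Lane-wise multiply followed by a horizontal add.
        for (lane = 0; lane < 4; lane += 1) {
            sum += reversedCoefficients[lane] * window[lane];
        }

        // Shift the prediction, then add the residual sitting in lane 0.
        sum = (sum >> shift) + residuals[0];

        // Slide the window: drop the oldest sample and append the new one.
        window[0] = window[1];
        window[1] = window[2];
        window[2] = window[3];
        window[3] = sum;

        // Slide the residuals down so lane 0 holds the next one to process.
        residuals[0] = residuals[1];
        residuals[1] = residuals[2];
        residuals[2] = residuals[3];
        residuals[3] = 0;
    }
}
```

After four steps the window contains exactly the four new samples, which is why the SSE loop can store samples128_0 directly into
the output buffer.
*/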
4036static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4037{
4038 int i;
4039 drflac_uint32 riceParamMask;
4040 drflac_int32* pDecodedSamples = pSamplesOut;
4041 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4042 drflac_uint32 zeroCountParts0 = 0;
4043 drflac_uint32 zeroCountParts1 = 0;
4044 drflac_uint32 zeroCountParts2 = 0;
4045 drflac_uint32 zeroCountParts3 = 0;
4046 drflac_uint32 riceParamParts0 = 0;
4047 drflac_uint32 riceParamParts1 = 0;
4048 drflac_uint32 riceParamParts2 = 0;
4049 drflac_uint32 riceParamParts3 = 0;
4050 __m128i coefficients128_0;
4051 __m128i coefficients128_4;
4052 __m128i coefficients128_8;
4053 __m128i samples128_0;
4054 __m128i samples128_4;
4055 __m128i samples128_8;
4056 __m128i prediction128;
4057 __m128i riceParamMask128;
4059 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4061 DRFLAC_ASSERT(order <= 12);
4063 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4064 riceParamMask128 = _mm_set1_epi32(riceParamMask);
4066 prediction128 = _mm_setzero_si128();
4068 /* Pre-load. */
4069 coefficients128_0 = _mm_setzero_si128();
4070 coefficients128_4 = _mm_setzero_si128();
4071 coefficients128_8 = _mm_setzero_si128();
4073 samples128_0 = _mm_setzero_si128();
4074 samples128_4 = _mm_setzero_si128();
4075 samples128_8 = _mm_setzero_si128();
4077#if 1
4078 {
4079 int runningOrder = order;
4081 /* 0 - 3. */
4082 if (runningOrder >= 4) {
4083 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
4084 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
4085 runningOrder -= 4;
4086 } else {
4087 switch (runningOrder) {
4088 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
4089 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
4090 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
4091 }
4092 runningOrder = 0;
4093 }
4095 /* 4 - 7 */
4096 if (runningOrder >= 4) {
4097 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
4098 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
4099 runningOrder -= 4;
4100 } else {
4101 switch (runningOrder) {
4102 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
4103 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
4104 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
4105 }
4106 runningOrder = 0;
4107 }
4109 /* 8 - 11 */
4110 if (runningOrder == 4) {
4111 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
4112 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
4113 runningOrder -= 4;
4114 } else {
4115 switch (runningOrder) {
4116 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4117 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4118 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4119 }
4120 runningOrder = 0;
4121 }
4123 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4124 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4125 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4126 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4127 }
4128#else
4129 switch (order)
4130 {
4131 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4132 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4133 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4134 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4135 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4136 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4137 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4138 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4139 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4140 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4141 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4142 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4143 }
4144#endif
    /* For this version we read four residuals per iteration, then predict and reconstruct one sample at a time. */
4147 while (pDecodedSamples < pDecodedSamplesEnd) {
4148 __m128i zeroCountPart128;
4149 __m128i riceParamPart128;
4151 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4152 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4153 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4154 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4155 return DRFLAC_FALSE;
4156 }
4158 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4159 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4161 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4162 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4163 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
4165 for (i = 0; i < 4; i += 1) {
4166 prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
4168 switch (order)
4169 {
4170 case 12:
4171 case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
4172 case 10:
4173 case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
4174 case 8:
4175 case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
4176 case 6:
4177 case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
4178 case 4:
4179 case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
4180 case 2:
4181 case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
4182 }
4184 /* Horizontal add and shift. */
4185 prediction128 = drflac__mm_hadd_epi64(prediction128);
4186 prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4187 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4189 /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
4190 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4191 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4192 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4194 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4195 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4196 }
4198 /* We store samples in groups of 4. */
4199 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4200 pDecodedSamples += 4;
4201 }
4203 /* Make sure we process the last few samples. */
4204 i = (count & ~3);
4205 while (i < (int)count) {
4206 /* Rice extraction. */
4207 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4208 return DRFLAC_FALSE;
4209 }
4211 /* Rice reconstruction. */
4212 riceParamParts0 &= riceParamMask;
4213 riceParamParts0 |= (zeroCountParts0 << riceParam);
4214 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4216 /* Sample reconstruction. */
4217 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4219 i += 1;
4220 pDecodedSamples += 1;
4221 }
4223 return DRFLAC_TRUE;
4224}
4226static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4227{
4228 DRFLAC_ASSERT(bs != NULL);
4229 DRFLAC_ASSERT(pSamplesOut != NULL);
4231 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
4232 if (lpcOrder > 0 && lpcOrder <= 12) {
4233 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4234 return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4235 } else {
4236 return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4237 }
4238 } else {
4239 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4240 }
4241}
4242#endif
4244#if defined(DRFLAC_SUPPORT_NEON)
4245static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4246{
4247 vst1q_s32(p+0, x.val[0]);
4248 vst1q_s32(p+4, x.val[1]);
4249}
4251static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4252{
4253 vst1q_u32(p+0, x.val[0]);
4254 vst1q_u32(p+4, x.val[1]);
4255}
4257static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4258{
4259 vst1q_f32(p+0, x.val[0]);
4260 vst1q_f32(p+4, x.val[1]);
4261}
4263static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4264{
4265 vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
4266}
4268static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4269{
4270 vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
4271}
4273static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4274{
4275 drflac_int32 x[4];
4276 x[3] = x3;
4277 x[2] = x2;
4278 x[1] = x1;
4279 x[0] = x0;
4280 return vld1q_s32(x);
4281}
4283static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4284{
4285 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4287 /* Reference */
4288 /*return drflac__vdupq_n_s32x4(
4289 vgetq_lane_s32(a, 0),
4290 vgetq_lane_s32(b, 3),
4291 vgetq_lane_s32(b, 2),
4292 vgetq_lane_s32(b, 1)
4293 );*/
4295 return vextq_s32(b, a, 1);
4296}
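/*
The lane movement performed by drflac__valignrq_s32_1() can be modelled in plain C. This is only an illustrative sketch:
vec4_sketch and alignr_1_sketch are hypothetical names that are not part of dr_flac, and lane 0 is assumed to be the
lowest element of the vector.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical scalar stand-in for a 4-lane vector. Lane 0 is the lowest element.
typedef struct { int32_t lane[4]; } vec4_sketch;

// Drops the lowest lane of b, shifts the rest down by one, and places the
// lowest lane of a in the top position.
static vec4_sketch alignr_1_sketch(vec4_sketch a, vec4_sketch b)
{
    vec4_sketch r;
    r.lane[0] = b.lane[1];
    r.lane[1] = b.lane[2];
    r.lane[2] = b.lane[3];
    r.lane[3] = a.lane[0];
    return r;
}
```

In the decode loops this is the operation that slides the newest sample into the top lane of samples128_0 while the
oldest one drops out of the bottom.
*/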
4298static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4299{
4300 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4302 /* Reference */
4303 /*return drflac__vdupq_n_s32x4(
4304 vgetq_lane_s32(a, 0),
4305 vgetq_lane_s32(b, 3),
4306 vgetq_lane_s32(b, 2),
4307 vgetq_lane_s32(b, 1)
4308 );*/
4310 return vextq_u32(b, a, 1);
4311}
4313static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4314{
4315 /* The sum must end up in position 0. */
4317 /* Reference */
4318 /*return vdupq_n_s32(
4319 vgetq_lane_s32(x, 3) +
4320 vgetq_lane_s32(x, 2) +
4321 vgetq_lane_s32(x, 1) +
4322 vgetq_lane_s32(x, 0)
4323 );*/
4325 int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4326 return vpadd_s32(r, r);
4327}
4329static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4330{
4331 return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4332}
4334static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4335{
4336 /* Reference */
4337 /*return drflac__vdupq_n_s32x4(
4338 vgetq_lane_s32(x, 0),
4339 vgetq_lane_s32(x, 1),
4340 vgetq_lane_s32(x, 2),
4341 vgetq_lane_s32(x, 3)
4342 );*/
4344 return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4345}
4347static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4348{
4349 return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
4350}
4352static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4353{
4354 return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
4355}
4357static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4358{
4359 int i;
4360 drflac_uint32 riceParamMask;
4361 drflac_int32* pDecodedSamples = pSamplesOut;
4362 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4363 drflac_uint32 zeroCountParts[4];
4364 drflac_uint32 riceParamParts[4];
4365 int32x4_t coefficients128_0;
4366 int32x4_t coefficients128_4;
4367 int32x4_t coefficients128_8;
4368 int32x4_t samples128_0;
4369 int32x4_t samples128_4;
4370 int32x4_t samples128_8;
4371 uint32x4_t riceParamMask128;
4372 int32x4_t riceParam128;
4373 int32x2_t shift64;
4374 uint32x4_t one128;
4376 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4378 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4379 riceParamMask128 = vdupq_n_u32(riceParamMask);
4381 riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s32(). */
4383 one128 = vdupq_n_u32(1);
    /*
    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
    so I think there's opportunity for this to be simplified.
    */
4391 {
4392 int runningOrder = order;
4393 drflac_int32 tempC[4] = {0, 0, 0, 0};
4394 drflac_int32 tempS[4] = {0, 0, 0, 0};
4396 /* 0 - 3. */
4397 if (runningOrder >= 4) {
4398 coefficients128_0 = vld1q_s32(coefficients + 0);
4399 samples128_0 = vld1q_s32(pSamplesOut - 4);
4400 runningOrder -= 4;
4401 } else {
4402 switch (runningOrder) {
4403 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4404 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4405 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4406 }
4408 coefficients128_0 = vld1q_s32(tempC);
4409 samples128_0 = vld1q_s32(tempS);
4410 runningOrder = 0;
4411 }
4413 /* 4 - 7 */
4414 if (runningOrder >= 4) {
4415 coefficients128_4 = vld1q_s32(coefficients + 4);
4416 samples128_4 = vld1q_s32(pSamplesOut - 8);
4417 runningOrder -= 4;
4418 } else {
4419 switch (runningOrder) {
4420 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4421 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4422 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4423 }
4425 coefficients128_4 = vld1q_s32(tempC);
4426 samples128_4 = vld1q_s32(tempS);
4427 runningOrder = 0;
4428 }
4430 /* 8 - 11 */
4431 if (runningOrder == 4) {
4432 coefficients128_8 = vld1q_s32(coefficients + 8);
4433 samples128_8 = vld1q_s32(pSamplesOut - 12);
4434 runningOrder -= 4;
4435 } else {
4436 switch (runningOrder) {
4437 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4438 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4439 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4440 }
4442 coefficients128_8 = vld1q_s32(tempC);
4443 samples128_8 = vld1q_s32(tempS);
4444 runningOrder = 0;
4445 }
4447 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4448 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4449 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4450 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4451 }
4453 /* For this version we are doing one sample at a time. */
4454 while (pDecodedSamples < pDecodedSamplesEnd) {
4455 int32x4_t prediction128;
4456 int32x2_t prediction64;
4457 uint32x4_t zeroCountPart128;
4458 uint32x4_t riceParamPart128;
4460 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4461 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4462 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4463 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4464 return DRFLAC_FALSE;
4465 }
4467 zeroCountPart128 = vld1q_u32(zeroCountParts);
4468 riceParamPart128 = vld1q_u32(riceParamParts);
4470 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4471 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4472 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4474 if (order <= 4) {
4475 for (i = 0; i < 4; i += 1) {
4476 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4478 /* Horizontal add and shift. */
4479 prediction64 = drflac__vhaddq_s32(prediction128);
4480 prediction64 = vshl_s32(prediction64, shift64);
4481 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4483 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4484 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4485 }
4486 } else if (order <= 8) {
4487 for (i = 0; i < 4; i += 1) {
4488 prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4489 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4491 /* Horizontal add and shift. */
4492 prediction64 = drflac__vhaddq_s32(prediction128);
4493 prediction64 = vshl_s32(prediction64, shift64);
4494 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4496 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4497 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4498 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4499 }
4500 } else {
4501 for (i = 0; i < 4; i += 1) {
4502 prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4503 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4504 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4506 /* Horizontal add and shift. */
4507 prediction64 = drflac__vhaddq_s32(prediction128);
4508 prediction64 = vshl_s32(prediction64, shift64);
4509 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4511 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4512 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4513 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4514 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4515 }
4516 }
4518 /* We store samples in groups of 4. */
4519 vst1q_s32(pDecodedSamples, samples128_0);
4520 pDecodedSamples += 4;
4521 }
4523 /* Make sure we process the last few samples. */
4524 i = (count & ~3);
4525 while (i < (int)count) {
4526 /* Rice extraction. */
4527 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4528 return DRFLAC_FALSE;
4529 }
4531 /* Rice reconstruction. */
4532 riceParamParts[0] &= riceParamMask;
4533 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4534 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4536 /* Sample reconstruction. */
4537 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4539 i += 1;
4540 pDecodedSamples += 1;
4541 }
4543 return DRFLAC_TRUE;
4544}
4546static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4547{
4548 int i;
4549 drflac_uint32 riceParamMask;
4550 drflac_int32* pDecodedSamples = pSamplesOut;
4551 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4552 drflac_uint32 zeroCountParts[4];
4553 drflac_uint32 riceParamParts[4];
4554 int32x4_t coefficients128_0;
4555 int32x4_t coefficients128_4;
4556 int32x4_t coefficients128_8;
4557 int32x4_t samples128_0;
4558 int32x4_t samples128_4;
4559 int32x4_t samples128_8;
4560 uint32x4_t riceParamMask128;
4561 int32x4_t riceParam128;
4562 int64x1_t shift64;
4563 uint32x4_t one128;
4564 int64x2_t prediction128 = { 0 };
4565 uint32x4_t zeroCountPart128;
4566 uint32x4_t riceParamPart128;
4568 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4570 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4571 riceParamMask128 = vdupq_n_u32(riceParamMask);
4573 riceParam128 = vdupq_n_s32(riceParam);
    shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(). */
4575 one128 = vdupq_n_u32(1);
4577 /*
4578 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4579 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
4580 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4581 so I think there's opportunity for this to be simplified.
4582 */
4583 {
4584 int runningOrder = order;
4585 drflac_int32 tempC[4] = {0, 0, 0, 0};
4586 drflac_int32 tempS[4] = {0, 0, 0, 0};
4588 /* 0 - 3. */
4589 if (runningOrder >= 4) {
4590 coefficients128_0 = vld1q_s32(coefficients + 0);
4591 samples128_0 = vld1q_s32(pSamplesOut - 4);
4592 runningOrder -= 4;
4593 } else {
4594 switch (runningOrder) {
4595 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4596 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4597 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4598 }
4600 coefficients128_0 = vld1q_s32(tempC);
4601 samples128_0 = vld1q_s32(tempS);
4602 runningOrder = 0;
4603 }
4605 /* 4 - 7 */
4606 if (runningOrder >= 4) {
4607 coefficients128_4 = vld1q_s32(coefficients + 4);
4608 samples128_4 = vld1q_s32(pSamplesOut - 8);
4609 runningOrder -= 4;
4610 } else {
4611 switch (runningOrder) {
4612 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4613 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4614 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4615 }
4617 coefficients128_4 = vld1q_s32(tempC);
4618 samples128_4 = vld1q_s32(tempS);
4619 runningOrder = 0;
4620 }
4622 /* 8 - 11 */
4623 if (runningOrder == 4) {
4624 coefficients128_8 = vld1q_s32(coefficients + 8);
4625 samples128_8 = vld1q_s32(pSamplesOut - 12);
4626 runningOrder -= 4;
4627 } else {
4628 switch (runningOrder) {
4629 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4630 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4631 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4632 }
4634 coefficients128_8 = vld1q_s32(tempC);
4635 samples128_8 = vld1q_s32(tempS);
4636 runningOrder = 0;
4637 }
4639 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4640 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4641 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4642 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4643 }
4645 /* For this version we are doing one sample at a time. */
4646 while (pDecodedSamples < pDecodedSamplesEnd) {
4647 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4648 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4649 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4650 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4651 return DRFLAC_FALSE;
4652 }
4654 zeroCountPart128 = vld1q_u32(zeroCountParts);
4655 riceParamPart128 = vld1q_u32(riceParamParts);
4657 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4658 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4659 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4661 for (i = 0; i < 4; i += 1) {
4662 int64x1_t prediction64;
4664 prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
4665 switch (order)
4666 {
4667 case 12:
4668 case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4669 case 10:
4670 case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4671 case 8:
4672 case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4673 case 6:
4674 case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4675 case 4:
4676 case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4677 case 2:
4678 case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4679 }
4681 /* Horizontal add and shift. */
4682 prediction64 = drflac__vhaddq_s64(prediction128);
4683 prediction64 = vshl_s64(prediction64, shift64);
4684 prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
            /* Our value should be sitting in prediction64[0]. We need to combine this with our NEON samples. */
4687 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4688 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4689 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
4691 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4692 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4693 }
4695 /* We store samples in groups of 4. */
4696 vst1q_s32(pDecodedSamples, samples128_0);
4697 pDecodedSamples += 4;
4698 }
4700 /* Make sure we process the last few samples. */
4701 i = (count & ~3);
4702 while (i < (int)count) {
4703 /* Rice extraction. */
4704 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4705 return DRFLAC_FALSE;
4706 }
4708 /* Rice reconstruction. */
4709 riceParamParts[0] &= riceParamMask;
4710 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4711 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4713 /* Sample reconstruction. */
4714 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4716 i += 1;
4717 pDecodedSamples += 1;
4718 }
4720 return DRFLAC_TRUE;
4721}
4723static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4724{
4725 DRFLAC_ASSERT(bs != NULL);
4726 DRFLAC_ASSERT(pSamplesOut != NULL);
4728 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
4729 if (lpcOrder > 0 && lpcOrder <= 12) {
4730 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4731 return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4732 } else {
4733 return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4734 }
4735 } else {
4736 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4737 }
4738}
4739#endif
4741static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4742{
4743#if defined(DRFLAC_SUPPORT_SSE41)
4744 if (drflac__gIsSSE41Supported) {
4745 return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4746 } else
4747#elif defined(DRFLAC_SUPPORT_NEON)
4748 if (drflac__gIsNEONSupported) {
4749 return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4750 } else
4751#endif
4752 {
4753 /* Scalar fallback. */
4754 #if 0
4755 return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4756 #else
4757 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4758 #endif
4759 }
4760}
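/*
Each of the Rice decoders above performs the same per-residual reconstruction: mask the low bits, merge in the unary
zero count, then unfold the sign. A minimal scalar sketch of that step is shown below. The name rice_reconstruct_sketch
is hypothetical and not part of dr_flac, and the two inputs are assumed to have already been extracted from the
bitstream.

```c
#include <assert.h>
#include <stdint.h>

// Rebuilds a signed residual from the two parts of a Rice code.
// zeroCount is the unary part, lowBits holds the riceParam low-order bits that follow it.
static int32_t rice_reconstruct_sketch(uint32_t zeroCount, uint32_t lowBits, uint8_t riceParam)
{
    uint32_t mask;
    uint32_t folded;

    assert(riceParam < 32); // keeps the shifts below well defined

    mask   = ~(~(uint32_t)0 << riceParam);
    folded = (lowBits & mask) | (zeroCount << riceParam);

    // Sign unfold: even values map to 0, 1, 2, ... and odd values map to -1, -2, -3, ...
    return (int32_t)((folded >> 1) ^ (0u - (folded & 1u)));
}
```

The expression (0u - (folded & 1u)) yields either 0x00000000 or 0xFFFFFFFF, which plays the same role as the
two-entry lookup table t[] used in the tail loops above.
*/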
4762/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
4763static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4764{
4765 drflac_uint32 i;
4767 DRFLAC_ASSERT(bs != NULL);
4769 for (i = 0; i < count; ++i) {
4770 if (!drflac__seek_rice_parts(bs, riceParam)) {
4771 return DRFLAC_FALSE;
4772 }
4773 }
4775 return DRFLAC_TRUE;
4776}
4778#if defined(__clang__)
4779__attribute__((no_sanitize("signed-integer-overflow")))
4780#endif
4781static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4782{
4783 drflac_uint32 i;
4785 DRFLAC_ASSERT(bs != NULL);
4786 DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
4787 DRFLAC_ASSERT(pSamplesOut != NULL);
4789 for (i = 0; i < count; ++i) {
4790 if (unencodedBitsPerSample > 0) {
4791 if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4792 return DRFLAC_FALSE;
4793 }
4794 } else {
4795 pSamplesOut[i] = 0;
4796 }
4798 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4799 pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
4800 } else {
4801 pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
4802 }
4803 }
4805 return DRFLAC_TRUE;
4806}
/*
Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. The first <lpcOrder> entries of <pDecodedSamples>
are the warm-up samples and are skipped over rather than decoded. The <blockSize> and <lpcOrder> parameters are used to
determine how many residual values need to be decoded.
*/
4814static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4815{
4816 drflac_uint8 residualMethod;
4817 drflac_uint8 partitionOrder;
4818 drflac_uint32 samplesInPartition;
4819 drflac_uint32 partitionsRemaining;
4821 DRFLAC_ASSERT(bs != NULL);
4822 DRFLAC_ASSERT(blockSize != 0);
4823 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4825 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4826 return DRFLAC_FALSE;
4827 }
4829 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4830 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4831 }
    /* Skip past the first <lpcOrder> values. These are the warm-up samples. */
4834 pDecodedSamples += lpcOrder;
4836 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4837 return DRFLAC_FALSE;
4838 }
4840 /*
4841 From the FLAC spec:
4842 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4843 */
4844 if (partitionOrder > 8) {
4845 return DRFLAC_FALSE;
4846 }
4848 /* Validation check. */
4849 if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
4850 return DRFLAC_FALSE;
4851 }
4853 samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
4854 partitionsRemaining = (1 << partitionOrder);
4855 for (;;) {
4856 drflac_uint8 riceParam = 0;
4857 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4858 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4859 return DRFLAC_FALSE;
4860 }
4861 if (riceParam == 15) {
4862 riceParam = 0xFF;
4863 }
4864 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4865 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4866 return DRFLAC_FALSE;
4867 }
4868 if (riceParam == 31) {
4869 riceParam = 0xFF;
4870 }
4871 }
4873 if (riceParam != 0xFF) {
4874 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
4875 return DRFLAC_FALSE;
4876 }
4877 } else {
4878 drflac_uint8 unencodedBitsPerSample = 0;
4879 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4880 return DRFLAC_FALSE;
4881 }
4883 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
4884 return DRFLAC_FALSE;
4885 }
4886 }
4888 pDecodedSamples += samplesInPartition;
4890 if (partitionsRemaining == 1) {
4891 break;
4892 }
4894 partitionsRemaining -= 1;
4896 if (partitionOrder != 0) {
4897 samplesInPartition = blockSize / (1 << partitionOrder);
4898 }
4899 }
4901 return DRFLAC_TRUE;
4902}
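/*
The partition bookkeeping in drflac__decode_samples_with_residual() comes down to one rule: every partition covers
blockSize / 2^partitionOrder samples, except the first, which is shortened by the warm-up samples. The sketch below
restates that rule on its own. residuals_in_partition_sketch is a hypothetical helper and not part of dr_flac.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Writes the number of residuals stored in partition partitionIndex to *pCount.
// Returns 1 on success, or 0 if the layout would be rejected.
static int residuals_in_partition_sketch(uint32_t blockSize, uint32_t partitionOrder, uint32_t lpcOrder, uint32_t partitionIndex, uint32_t* pCount)
{
    uint32_t partitionCount;
    uint32_t samplesPerPartition;

    assert(pCount != NULL);

    if (partitionOrder > 8) {
        return 0; // same limit as the check on partitionOrder above
    }

    partitionCount      = 1u << partitionOrder;
    samplesPerPartition = blockSize / partitionCount;
    if (partitionIndex >= partitionCount || samplesPerPartition < lpcOrder) {
        return 0;
    }

    // Only the first partition is shortened by the warm-up samples.
    *pCount = (partitionIndex == 0) ? (samplesPerPartition - lpcOrder) : samplesPerPartition;
    return 1;
}
```

For example, with a block size of 4096, a partition order of 3 and an LPC order of 8, the first partition holds 504
residuals and each of the remaining seven holds 512.
*/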
/*
Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. No samples are output. The first partition holds
<order> fewer residuals than the others, so the <blockSize> and <order> parameters are used to determine how many
residual values need to be skipped.
*/
4909static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4910{
4911 drflac_uint8 residualMethod;
4912 drflac_uint8 partitionOrder;
4913 drflac_uint32 samplesInPartition;
4914 drflac_uint32 partitionsRemaining;
4916 DRFLAC_ASSERT(bs != NULL);
4917 DRFLAC_ASSERT(blockSize != 0);
4919 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4920 return DRFLAC_FALSE;
4921 }
4923 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4924 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4925 }
4927 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4928 return DRFLAC_FALSE;
4929 }
4931 /*
4932 From the FLAC spec:
4933 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4934 */
4935 if (partitionOrder > 8) {
4936 return DRFLAC_FALSE;
4937 }
4939 /* Validation check. */
4940 if ((blockSize / (1 << partitionOrder)) <= order) {
4941 return DRFLAC_FALSE;
4942 }
4944 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
4945 partitionsRemaining = (1 << partitionOrder);
4946 for (;;)
4947 {
4948 drflac_uint8 riceParam = 0;
4949 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4950 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4951 return DRFLAC_FALSE;
4952 }
4953 if (riceParam == 15) {
4954 riceParam = 0xFF;
4955 }
4956 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4957 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4958 return DRFLAC_FALSE;
4959 }
4960 if (riceParam == 31) {
4961 riceParam = 0xFF;
4962 }
4963 }
4965 if (riceParam != 0xFF) {
4966 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
4967 return DRFLAC_FALSE;
4968 }
4969 } else {
4970 drflac_uint8 unencodedBitsPerSample = 0;
4971 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4972 return DRFLAC_FALSE;
4973 }
4975 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
4976 return DRFLAC_FALSE;
4977 }
4978 }
4981 if (partitionsRemaining == 1) {
4982 break;
4983 }
4985 partitionsRemaining -= 1;
4986 samplesInPartition = blockSize / (1 << partitionOrder);
4987 }
4989 return DRFLAC_TRUE;
4990}
4993static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
4994{
4995 drflac_uint32 i;
4997 /* Only a single sample needs to be decoded here. */
4998 drflac_int32 sample;
4999 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5000 return DRFLAC_FALSE;
5001 }
5003 /*
5004 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
5005 we'll want to look at a more efficient way.
5006 */
5007 for (i = 0; i < blockSize; ++i) {
5008 pDecodedSamples[i] = sample;
5009 }
5011 return DRFLAC_TRUE;
5012}
5014static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5015{
5016 drflac_uint32 i;
5018 for (i = 0; i < blockSize; ++i) {
5019 drflac_int32 sample;
5020 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5021 return DRFLAC_FALSE;
5022 }
5024 pDecodedSamples[i] = sample;
5025 }
5027 return DRFLAC_TRUE;
5028}
5030static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5031{
5032 drflac_uint32 i;
5034 static drflac_int32 lpcCoefficientsTable[5][4] = {
5035 {0, 0, 0, 0},
5036 {1, 0, 0, 0},
5037 {2, -1, 0, 0},
5038 {3, -3, 1, 0},
5039 {4, -6, 4, -1}
5040 };
5042 /* Warm up samples and coefficients. */
5043 for (i = 0; i < lpcOrder; ++i) {
5044 drflac_int32 sample;
5045 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5046 return DRFLAC_FALSE;
5047 }
5049 pDecodedSamples[i] = sample;
5050 }
5052 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
5053 return DRFLAC_FALSE;
5054 }
5056 return DRFLAC_TRUE;
5057}
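/*
To illustrate how the fixed predictor coefficients are applied, here is a minimal sketch that restores samples from residuals that have
already been decoded. fixed_restore() is a hypothetical helper and not how dr_flac structures its decoder; it assumes coefficient j
multiplies the sample j + 1 positions back and that the shift is 0, as in the call above.

```c
#include <stdint.h>

// Same values as lpcCoefficientsTable, indexed by predictor order.
static const int32_t fixedCoefficients[5][4] = {
    {0,  0, 0,  0},
    {1,  0, 0,  0},
    {2, -1, 0,  0},
    {3, -3, 1,  0},
    {4, -6, 4, -1}
};

// Hypothetical helper: samples[0..order-1] must already hold the warm-up samples.
// residuals[k] is the residual for samples[order + k].
static void fixed_restore(int order, int32_t* samples, const int32_t* residuals, int count)
{
    int i;
    int j;

    for (i = order; i < count; ++i) {
        int64_t prediction = 0;
        for (j = 0; j < order; ++j) {
            prediction += (int64_t)fixedCoefficients[order][j] * samples[i - 1 - j];
        }
        samples[i] = (int32_t)(prediction + residuals[i - order]);
    }
}
```

With order 2, warm-up samples {1, 3} and residuals {0, 1}, the predictions are 2*3 - 1 = 5 and 2*5 - 3 = 7, so the restored samples are
{1, 3, 5, 8}.
*/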
5059static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5060{
5061 drflac_uint8 i;
5062 drflac_uint8 lpcPrecision;
5063 drflac_int8 lpcShift;
5064 drflac_int32 coefficients[32];
5066 /* Warm up samples. */
5067 for (i = 0; i < lpcOrder; ++i) {
5068 drflac_int32 sample;
5069 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5070 return DRFLAC_FALSE;
5071 }
5073 pDecodedSamples[i] = sample;
5074 }
5076 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5077 return DRFLAC_FALSE;
5078 }
5079 if (lpcPrecision == 15) {
5080 return DRFLAC_FALSE; /* Invalid. */
5081 }
5082 lpcPrecision += 1;
5084 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5085 return DRFLAC_FALSE;
5086 }
5088 /*
5089 From the FLAC specification:
5091 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
    Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders that support negative shifts. For now dr_flac is
    not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
5095 */
5096 if (lpcShift < 0) {
5097 return DRFLAC_FALSE;
5098 }
5100 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5101 for (i = 0; i < lpcOrder; ++i) {
5102 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5103 return DRFLAC_FALSE;
5104 }
5105 }
5107 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
5108 return DRFLAC_FALSE;
5109 }
5111 return DRFLAC_TRUE;
5112}
5115static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5116{
5117 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5118 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5120 DRFLAC_ASSERT(bs != NULL);
5121 DRFLAC_ASSERT(header != NULL);
5123 /* Keep looping until we find a valid sync code. */
5124 for (;;) {
5125 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5126 drflac_uint8 reserved = 0;
5127 drflac_uint8 blockingStrategy = 0;
5128 drflac_uint8 blockSize = 0;
5129 drflac_uint8 sampleRate = 0;
5130 drflac_uint8 channelAssignment = 0;
5131 drflac_uint8 bitsPerSample = 0;
5132 drflac_bool32 isVariableBlockSize;
5134 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5135 return DRFLAC_FALSE;
5136 }
5138 if (!drflac__read_uint8(bs, 1, &reserved)) {
5139 return DRFLAC_FALSE;
5140 }
5141 if (reserved == 1) {
5142 continue;
5143 }
5144 crc8 = drflac_crc8(crc8, reserved, 1);
5146 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5147 return DRFLAC_FALSE;
5148 }
5149 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5151 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5152 return DRFLAC_FALSE;
5153 }
5154 if (blockSize == 0) {
5155 continue;
5156 }
5157 crc8 = drflac_crc8(crc8, blockSize, 4);
5159 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5160 return DRFLAC_FALSE;
5161 }
5162 crc8 = drflac_crc8(crc8, sampleRate, 4);
5164 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5165 return DRFLAC_FALSE;
5166 }
5167 if (channelAssignment > 10) {
5168 continue;
5169 }
5170 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5172 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5173 return DRFLAC_FALSE;
5174 }
5175 if (bitsPerSample == 3 || bitsPerSample == 7) {
5176 continue;
5177 }
5178 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5181 if (!drflac__read_uint8(bs, 1, &reserved)) {
5182 return DRFLAC_FALSE;
5183 }
5184 if (reserved == 1) {
5185 continue;
5186 }
5187 crc8 = drflac_crc8(crc8, reserved, 1);
5190 isVariableBlockSize = blockingStrategy == 1;
5191 if (isVariableBlockSize) {
5192 drflac_uint64 pcmFrameNumber;
5193 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5194 if (result != DRFLAC_SUCCESS) {
5195 if (result == DRFLAC_AT_END) {
5196 return DRFLAC_FALSE;
5197 } else {
5198 continue;
5199 }
5200 }
5201 header->flacFrameNumber = 0;
5202 header->pcmFrameNumber = pcmFrameNumber;
5203 } else {
5204 drflac_uint64 flacFrameNumber = 0;
5205 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5206 if (result != DRFLAC_SUCCESS) {
5207 if (result == DRFLAC_AT_END) {
5208 return DRFLAC_FALSE;
5209 } else {
5210 continue;
5211 }
5212 }
5213 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5214 header->pcmFrameNumber = 0;
5215 }
5218 DRFLAC_ASSERT(blockSize > 0);
5219 if (blockSize == 1) {
5220 header->blockSizeInPCMFrames = 192;
5221 } else if (blockSize <= 5) {
5222 DRFLAC_ASSERT(blockSize >= 2);
5223 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5224 } else if (blockSize == 6) {
5225 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5226 return DRFLAC_FALSE;
5227 }
5228 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5229 header->blockSizeInPCMFrames += 1;
5230 } else if (blockSize == 7) {
5231 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5232 return DRFLAC_FALSE;
5233 }
5234 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
5235 if (header->blockSizeInPCMFrames == 0xFFFF) {
5236 return DRFLAC_FALSE; /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
5237 }
5238 header->blockSizeInPCMFrames += 1;
5239 } else {
5240 DRFLAC_ASSERT(blockSize >= 8);
5241 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5242 }
5245 if (sampleRate <= 11) {
5246 header->sampleRate = sampleRateTable[sampleRate];
5247 } else if (sampleRate == 12) {
5248 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5249 return DRFLAC_FALSE;
5250 }
5251 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5252 header->sampleRate *= 1000;
5253 } else if (sampleRate == 13) {
5254 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5255 return DRFLAC_FALSE;
5256 }
5257 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5258 } else if (sampleRate == 14) {
5259 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5260 return DRFLAC_FALSE;
5261 }
5262 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5263 header->sampleRate *= 10;
5264 } else {
5265 continue; /* Invalid. Assume an invalid block. */
5266 }
5269 header->channelAssignment = channelAssignment;
5271 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5272 if (header->bitsPerSample == 0) {
5273 header->bitsPerSample = streaminfoBitsPerSample;
5274 }
5276 if (header->bitsPerSample != streaminfoBitsPerSample) {
            /* If this frame has a different bitsPerSample than streaminfo or the first frame, reject it. */
5278 return DRFLAC_FALSE;
5279 }
5281 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5282 return DRFLAC_FALSE;
5283 }
5285#ifndef DR_FLAC_NO_CRC
5286 if (header->crc8 != crc8) {
5287 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
5288 }
5289#endif
5290 return DRFLAC_TRUE;
5291 }
5292}
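/*
The mapping from the 4-bit block size code to a block size in PCM frames, as implemented in drflac__read_next_flac_frame_header(), can
be sketched as a standalone function. block_size_from_code() is a hypothetical name used only for illustration; `extra` stands for the
8-bit or 16-bit value that is read from the stream for codes 6 and 7.

```c
#include <stdint.h>

// Hypothetical helper. Returns 0 for the reserved code 0, for codes above 15,
// and for a 16-bit extra value that would overflow once 1 is added.
static uint32_t block_size_from_code(uint8_t code, uint32_t extra)
{
    if (code == 0 || code > 15) {
        return 0;
    }
    if (code == 1) {
        return 192;
    }
    if (code <= 5) {
        return 576u << (code - 2);      // 576, 1152, 2304, 4608
    }
    if (code == 6) {
        return (extra & 0xFF) + 1;      // 8-bit value, stored minus 1
    }
    if (code == 7) {
        return (extra >= 0xFFFF) ? 0 : (extra + 1);   // 16-bit value, stored minus 1
    }
    return 256u << (code - 8);          // 256, 512, ... 32768
}
```

For example, code 5 maps to 4608 frames and code 12 maps to 4096 frames.
*/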
5294static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5295{
5296 drflac_uint8 header;
5297 int type;
5299 if (!drflac__read_uint8(bs, 8, &header)) {
5300 return DRFLAC_FALSE;
5301 }
5303 /* First bit should always be 0. */
5304 if ((header & 0x80) != 0) {
5305 return DRFLAC_FALSE;
5306 }
5308 /*
    Default to 0 for the LPC order. It's important that we always set this to 0 for subframes that are neither LPC
    nor FIXED because we'll be using it in a generic validation check later.
5311 */
5312 pSubframe->lpcOrder = 0;
5314 type = (header & 0x7E) >> 1;
5315 if (type == 0) {
5316 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5317 } else if (type == 1) {
5318 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5319 } else {
5320 if ((type & 0x20) != 0) {
5321 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5322 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
        } else if ((type & 0x38) == 0x08) {   /* FIXED is 0b001xxx. Patterns of the form 0b01xxxx are reserved. */
5324 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5325 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5326 if (pSubframe->lpcOrder > 4) {
5327 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5328 pSubframe->lpcOrder = 0;
5329 }
5330 } else {
5331 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5332 }
5333 }
5335 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5336 return DRFLAC_FALSE;
5337 }
5339 /* Wasted bits per sample. */
5340 pSubframe->wastedBitsPerSample = 0;
5341 if ((header & 0x01) == 1) {
5342 unsigned int wastedBitsPerSample;
5343 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5344 return DRFLAC_FALSE;
5345 }
5346 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
5347 }
5349 return DRFLAC_TRUE;
5350}
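/*
The bit layout of the subframe header byte can be sketched as a small classifier. This is an illustration only: classify_subframe() and
the SUBFRAME_* names are hypothetical, and the sketch assumes the FIXED pattern is 0b001xxx with orders 0 to 4, with everything that is
not CONSTANT, VERBATIM, FIXED or LPC treated as reserved.

```c
#include <stdint.h>

enum {
    SUBFRAME_CONSTANT,
    SUBFRAME_VERBATIM,
    SUBFRAME_FIXED,
    SUBFRAME_LPC,
    SUBFRAME_RESERVED
};

// Hypothetical helper: classifies the header byte and writes the predictor order to *pOrder.
static int classify_subframe(uint8_t header, int* pOrder)
{
    int type = (header & 0x7E) >> 1;    // 6-bit subframe type
    *pOrder = 0;

    if ((header & 0x80) != 0) {
        return SUBFRAME_RESERVED;       // The first bit must be 0.
    }
    if (type == 0) {
        return SUBFRAME_CONSTANT;
    }
    if (type == 1) {
        return SUBFRAME_VERBATIM;
    }
    if ((type & 0x20) != 0) {
        *pOrder = (type & 0x1F) + 1;    // LPC orders 1..32
        return SUBFRAME_LPC;
    }
    if ((type & 0x38) == 0x08 && (type & 0x07) <= 4) {
        *pOrder = type & 0x07;          // FIXED orders 0..4
        return SUBFRAME_FIXED;
    }
    return SUBFRAME_RESERVED;
}
```

A header byte of 0x18 has type 0b001100 and is a FIXED subframe of order 4, while 0x40 has type 0b100000 and is an LPC subframe of order 1.
*/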
5352static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5353{
5354 drflac_subframe* pSubframe;
5355 drflac_uint32 subframeBitsPerSample;
5357 DRFLAC_ASSERT(bs != NULL);
5358 DRFLAC_ASSERT(frame != NULL);
5360 pSubframe = frame->subframes + subframeIndex;
5361 if (!drflac__read_subframe_header(bs, pSubframe)) {
5362 return DRFLAC_FALSE;
5363 }
5365 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5366 subframeBitsPerSample = frame->header.bitsPerSample;
5367 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5368 subframeBitsPerSample += 1;
5369 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5370 subframeBitsPerSample += 1;
5371 }
5373 if (subframeBitsPerSample > 32) {
5374 /* libFLAC and ffmpeg reject 33-bit subframes as well */
5375 return DRFLAC_FALSE;
5376 }
5378 /* Need to handle wasted bits per sample. */
5379 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5380 return DRFLAC_FALSE;
5381 }
5382 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5384 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5386 /*
5387 pDecodedSamplesOut will be pointing to a buffer that was allocated with enough memory to store
5388 maxBlockSizeInPCMFrames samples (as specified in the FLAC header). We need to guard against an
5389 overflow here. At a higher level we are checking maxBlockSizeInPCMFrames from the header, but
5390 here we need to do an additional check to ensure this frame's block size fully encompasses any
    warmup samples which is determined by the LPC order. For subframes that are neither LPC nor FIXED, the LPC
    order will have been set to 0 in drflac__read_subframe_header().
5393 */
5394 if (frame->header.blockSizeInPCMFrames < pSubframe->lpcOrder) {
5395 return DRFLAC_FALSE;
5396 }
5398 switch (pSubframe->subframeType)
5399 {
5400 case DRFLAC_SUBFRAME_CONSTANT:
5401 {
5402 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5403 } break;
5405 case DRFLAC_SUBFRAME_VERBATIM:
5406 {
5407 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5408 } break;
5410 case DRFLAC_SUBFRAME_FIXED:
5411 {
5412 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5413 } break;
5415 case DRFLAC_SUBFRAME_LPC:
5416 {
5417 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5418 } break;
5420 default: return DRFLAC_FALSE;
5421 }
5423 return DRFLAC_TRUE;
5424}
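/*
The bits-per-sample adjustment in drflac__decode_subframe() can be sketched in isolation. coded_bits_per_sample() is a hypothetical
helper, and the channel assignment values 8, 9 and 10 are assumed here to correspond to left/side, right/side and mid/side.

```c
#include <stdint.h>

// Assumed channel assignment codes for the stereo decorrelation modes.
enum {
    ASSIGNMENT_LEFT_SIDE  = 8,
    ASSIGNMENT_RIGHT_SIDE = 9,
    ASSIGNMENT_MID_SIDE   = 10
};

// Hypothetical helper: returns the number of bits each sample is coded with in the
// given subframe, or 0 if the subframe would be rejected.
static uint32_t coded_bits_per_sample(uint32_t frameBitsPerSample, int channelAssignment, int subframeIndex, uint32_t wastedBits)
{
    uint32_t bps = frameBitsPerSample;

    // The side channel carries one extra bit per sample.
    if ((channelAssignment == ASSIGNMENT_LEFT_SIDE || channelAssignment == ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
        bps += 1;
    } else if (channelAssignment == ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
        bps += 1;
    }

    if (bps > 32) {
        return 0;   // 33-bit subframes are rejected.
    }
    if (wastedBits >= bps) {
        return 0;   // Wasted bits must leave at least one coded bit.
    }

    return bps - wastedBits;
}
```

For a 16-bit mid/side frame, subframe 0 is coded with 16 bits per sample and subframe 1 (the side channel) with 17.
*/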
5426static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5427{
5428 drflac_subframe* pSubframe;
5429 drflac_uint32 subframeBitsPerSample;
5431 DRFLAC_ASSERT(bs != NULL);
5432 DRFLAC_ASSERT(frame != NULL);
5434 pSubframe = frame->subframes + subframeIndex;
5435 if (!drflac__read_subframe_header(bs, pSubframe)) {
5436 return DRFLAC_FALSE;
5437 }
5439 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5440 subframeBitsPerSample = frame->header.bitsPerSample;
5441 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5442 subframeBitsPerSample += 1;
5443 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5444 subframeBitsPerSample += 1;
5445 }
5447 /* Need to handle wasted bits per sample. */
5448 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5449 return DRFLAC_FALSE;
5450 }
5451 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5453 pSubframe->pSamplesS32 = NULL;
5455 switch (pSubframe->subframeType)
5456 {
5457 case DRFLAC_SUBFRAME_CONSTANT:
5458 {
5459 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5460 return DRFLAC_FALSE;
5461 }
5462 } break;
5464 case DRFLAC_SUBFRAME_VERBATIM:
5465 {
5466 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5467 if (!drflac__seek_bits(bs, bitsToSeek)) {
5468 return DRFLAC_FALSE;
5469 }
5470 } break;
5472 case DRFLAC_SUBFRAME_FIXED:
5473 {
5474 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5475 if (!drflac__seek_bits(bs, bitsToSeek)) {
5476 return DRFLAC_FALSE;
5477 }
5479 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5480 return DRFLAC_FALSE;
5481 }
5482 } break;
5484 case DRFLAC_SUBFRAME_LPC:
5485 {
5486 drflac_uint8 lpcPrecision;
5488 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5489 if (!drflac__seek_bits(bs, bitsToSeek)) {
5490 return DRFLAC_FALSE;
5491 }
5493 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5494 return DRFLAC_FALSE;
5495 }
5496 if (lpcPrecision == 15) {
5497 return DRFLAC_FALSE; /* Invalid. */
5498 }
5499 lpcPrecision += 1;
5502 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5503 if (!drflac__seek_bits(bs, bitsToSeek)) {
5504 return DRFLAC_FALSE;
5505 }
5507 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5508 return DRFLAC_FALSE;
5509 }
5510 } break;
5512 default: return DRFLAC_FALSE;
5513 }
5515 return DRFLAC_TRUE;
5516}
5519static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5520{
5521 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5523 DRFLAC_ASSERT(channelAssignment <= 10);
5524 return lookup[channelAssignment];
5525}
5527static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5528{
5529 int channelCount;
5530 int i;
5531 drflac_uint8 paddingSizeInBits;
5532 drflac_uint16 desiredCRC16;
5533#ifndef DR_FLAC_NO_CRC
5534 drflac_uint16 actualCRC16;
5535#endif
5537 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5538 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5540 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5541 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5542 return DRFLAC_ERROR;
5543 }
5545 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5546 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5547 if (channelCount != (int)pFlac->channels) {
5548 return DRFLAC_ERROR;
5549 }
5551 for (i = 0; i < channelCount; ++i) {
5552 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5553 return DRFLAC_ERROR;
5554 }
5555 }
5557 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5558 if (paddingSizeInBits > 0) {
5559 drflac_uint8 padding = 0;
5560 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5561 return DRFLAC_AT_END;
5562 }
5563 }
5565#ifndef DR_FLAC_NO_CRC
5566 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5567#endif
5568 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5569 return DRFLAC_AT_END;
5570 }
5572#ifndef DR_FLAC_NO_CRC
5573 if (actualCRC16 != desiredCRC16) {
5574 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5575 }
5576#endif
5578 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5580 return DRFLAC_SUCCESS;
5581}
5583static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5584{
5585 int channelCount;
5586 int i;
5587 drflac_uint16 desiredCRC16;
5588#ifndef DR_FLAC_NO_CRC
5589 drflac_uint16 actualCRC16;
5590#endif
5592 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5593 for (i = 0; i < channelCount; ++i) {
5594 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5595 return DRFLAC_ERROR;
5596 }
5597 }
5599 /* Padding. */
5600 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5601 return DRFLAC_ERROR;
5602 }
5604 /* CRC. */
5605#ifndef DR_FLAC_NO_CRC
5606 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5607#endif
5608 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5609 return DRFLAC_AT_END;
5610 }
5612#ifndef DR_FLAC_NO_CRC
5613 if (actualCRC16 != desiredCRC16) {
5614 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5615 }
5616#endif
5618 return DRFLAC_SUCCESS;
5619}
5621static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5622{
5623 DRFLAC_ASSERT(pFlac != NULL);
5625 for (;;) {
5626 drflac_result result;
5628 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5629 return DRFLAC_FALSE;
5630 }
5632 result = drflac__decode_flac_frame(pFlac);
5633 if (result != DRFLAC_SUCCESS) {
5634 if (result == DRFLAC_CRC_MISMATCH) {
5635 continue; /* CRC mismatch. Skip to the next frame. */
5636 } else {
5637 return DRFLAC_FALSE;
5638 }
5639 }
5641 return DRFLAC_TRUE;
5642 }
5643}
5645static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5646{
5647 drflac_uint64 firstPCMFrame;
5648 drflac_uint64 lastPCMFrame;
5650 DRFLAC_ASSERT(pFlac != NULL);
5652 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5653 if (firstPCMFrame == 0) {
5654 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5655 }
5657 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5658 if (lastPCMFrame > 0) {
5659 lastPCMFrame -= 1; /* Needs to be zero based. */
5660 }
5662 if (pFirstPCMFrame) {
5663 *pFirstPCMFrame = firstPCMFrame;
5664 }
5665 if (pLastPCMFrame) {
5666 *pLastPCMFrame = lastPCMFrame;
5667 }
5668}
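/*
The range calculation in drflac__get_pcm_frame_range_of_current_flac_frame() can be illustrated with plain integers. pcm_frame_range()
is a hypothetical stand-in that takes the header fields as arguments; like the function above, it treats a PCM frame number of 0 as
"derive the position from the FLAC frame number".

```c
#include <stdint.h>

// Hypothetical helper: computes the zero-based first and last PCM frame covered by a FLAC frame.
static void pcm_frame_range(uint64_t pcmFrameNumber, uint32_t flacFrameNumber, uint32_t maxBlockSize, uint32_t blockSize, uint64_t* pFirst, uint64_t* pLast)
{
    uint64_t first = pcmFrameNumber;
    uint64_t last;

    if (first == 0) {
        // Fixed block size streams number their frames, so scale by the block size.
        first = (uint64_t)flacFrameNumber * maxBlockSize;
    }

    last = first + blockSize;
    if (last > 0) {
        last -= 1;  // Zero based, inclusive.
    }

    *pFirst = first;
    *pLast  = last;
}
```

In a fixed block size stream with 4096 frames per block, FLAC frame 3 covers PCM frames 12288 to 16383.
*/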
5670static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5671{
5672 drflac_bool32 result;
5674 DRFLAC_ASSERT(pFlac != NULL);
5676 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5678 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5679 pFlac->currentPCMFrame = 0;
5681 return result;
5682}
5684static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5685{
5686 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5687 DRFLAC_ASSERT(pFlac != NULL);
5688 return drflac__seek_flac_frame(pFlac);
5689}
5692static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5693{
5694 drflac_uint64 pcmFramesRead = 0;
5695 while (pcmFramesToSeek > 0) {
5696 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5697 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5698 break; /* Couldn't read the next frame, so just break from the loop and return. */
5699 }
5700 } else {
5701 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5702 pcmFramesRead += pcmFramesToSeek;
5703 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
5704 pcmFramesToSeek = 0;
5705 } else {
5706 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5707 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5708 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5709 }
5710 }
5711 }
5713 pFlac->currentPCMFrame += pcmFramesRead;
5714 return pcmFramesRead;
5715}
5718static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5719{
5720 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5721 drflac_uint64 runningPCMFrameCount;
5723 DRFLAC_ASSERT(pFlac != NULL);
5725 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5726 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5727 /* Seeking forward. Need to seek from the current position. */
5728 runningPCMFrameCount = pFlac->currentPCMFrame;
5730 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5731 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5732 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5733 return DRFLAC_FALSE;
5734 }
5735 } else {
5736 isMidFrame = DRFLAC_TRUE;
5737 }
5738 } else {
5739 /* Seeking backwards. Need to seek from the start of the file. */
5740 runningPCMFrameCount = 0;
5742 /* Move back to the start. */
5743 if (!drflac__seek_to_first_frame(pFlac)) {
5744 return DRFLAC_FALSE;
5745 }
        /* Read the header of the first frame in preparation for sample-exact seeking below. */
5748 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5749 return DRFLAC_FALSE;
5750 }
5751 }
5753 /*
    We need to find the frame that contains the target sample as quickly as possible. To do this, we iterate over each frame and inspect its
    header. If the header shows that the frame contains the sample, we do a full decode of that frame.
5756 */
5757 for (;;) {
5758 drflac_uint64 pcmFrameCountInThisFLACFrame;
5759 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5760 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5762 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5764 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5765 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5766 /*
5767 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5768 it never existed and keep iterating.
5769 */
5770 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5772 if (!isMidFrame) {
5773 drflac_result result = drflac__decode_flac_frame(pFlac);
5774 if (result == DRFLAC_SUCCESS) {
5775 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5776 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5777 } else {
5778 if (result == DRFLAC_CRC_MISMATCH) {
5779 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5780 } else {
5781 return DRFLAC_FALSE;
5782 }
5783 }
5784 } else {
5785 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5786 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5787 }
5788 } else {
5789 /*
5790 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5791 frame never existed and leave the running sample count untouched.
5792 */
5793 if (!isMidFrame) {
5794 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5795 if (result == DRFLAC_SUCCESS) {
5796 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5797 } else {
5798 if (result == DRFLAC_CRC_MISMATCH) {
5799 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5800 } else {
5801 return DRFLAC_FALSE;
5802 }
5803 }
5804 } else {
5805 /*
5806 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5807 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5808 */
5809 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5810 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5811 isMidFrame = DRFLAC_FALSE;
5812 }
5814 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5815 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5816 return DRFLAC_TRUE;
5817 }
5818 }
5820 next_iteration:
5821 /* Grab the next frame in preparation for the next iteration. */
5822 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5823 return DRFLAC_FALSE;
5824 }
5825 }
5826}
5829#if !defined(DR_FLAC_NO_CRC)
5830/*
We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
uncompressed counterparts so we'll use this as a basis. I'm going to split the difference and use a factor of 0.6 to determine the starting
location.
5834*/
5835#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
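/*
As a rough illustration of how the ratio is used to pick a starting byte, here is a minimal sketch. estimate_target_byte() is a
hypothetical helper, not dr_flac's implementation, and it swaps the floating point factor for integer arithmetic (multiply by 6, divide
by 10) so the result is exact.

```c
#include <stdint.h>

// Hypothetical helper: estimates the byte offset of a frame that lies `pcmFramesAhead`
// PCM frames past the frame at byteRangeLo, clamped to byteRangeHi.
static uint64_t estimate_target_byte(uint64_t byteRangeLo, uint64_t byteRangeHi, uint64_t pcmFramesAhead, uint32_t channels, uint32_t bitsPerSample)
{
    // Size of the skipped audio if it were stored uncompressed.
    uint64_t uncompressedBytes = (pcmFramesAhead * channels * bitsPerSample) / 8;

    // Assume the compressed data is about 60% of the uncompressed size.
    uint64_t target = byteRangeLo + (uncompressedBytes * 6) / 10;

    if (target > byteRangeHi) {
        target = byteRangeHi;
    }
    return target;
}
```

One second of 16-bit stereo audio at 44100 Hz is 176400 bytes uncompressed, so seeking one second ahead from byte 1000 gives an initial
guess of byte 1000 + 105840 = 106840.
*/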
5837static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5838{
5839 DRFLAC_ASSERT(pFlac != NULL);
5840 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5841 DRFLAC_ASSERT(targetByte >= rangeLo);
5842 DRFLAC_ASSERT(targetByte <= rangeHi);
5844 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5846 for (;;) {
5847 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5848 drflac_uint64 lastTargetByte = targetByte;
        /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we move the target to the midpoint of the remaining range on each attempt. */
5851 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5852 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5853 if (targetByte == 0) {
5854 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5855 return DRFLAC_FALSE;
5856 }
5858 /* Halve the byte location and continue. */
5859 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5860 rangeHi = targetByte;
5861 } else {
5862 /* Getting here should mean that we have seeked to an appropriate byte. */
5864 /* Clear the details of the FLAC frame so we don't misreport data. */
5865 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5867 /*
5868 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5869 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5870 so it needs to stay this way for now.
5871 */
5872#if 1
5873 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5874 /* Halve the byte location and continue. */
5875 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5876 rangeHi = targetByte;
5877 } else {
5878 break;
5879 }
5880#else
5881 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5882 /* Halve the byte location and continue. */
5883 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5884 rangeHi = targetByte;
5885 } else {
5886 break;
5887 }
5888#endif
5889 }
5891 /* We already tried this byte and there are no more to try, break out. */
        if (targetByte == lastTargetByte) {
5893 return DRFLAC_FALSE;
5894 }
5895 }
5897 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5898 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5900 DRFLAC_ASSERT(targetByte <= rangeHi);
5902 *pLastSuccessfulSeekOffset = targetByte;
5903 return DRFLAC_TRUE;
5904}
5906static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5907{
5908 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5909#if 0
5910 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5911 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5912 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5913 return DRFLAC_FALSE;
5914 }
5915 }
5916#endif
5918 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5919}
5922static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5923{
5924 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5926 drflac_uint64 targetByte;
5927 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5928 drflac_uint64 pcmRangeHi = 0;
5929 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
5930 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
5931 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5933 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
5934 if (targetByte > byteRangeHi) {
5935 targetByte = byteRangeHi;
5936 }
5938 for (;;) {
5939 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
5940 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
5941 drflac_uint64 newPCMRangeLo;
5942 drflac_uint64 newPCMRangeHi;
5943 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
5945 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
5946 if (pcmRangeLo == newPCMRangeLo) {
5947 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
5948 break; /* Failed to seek to closest frame. */
5949 }
5951 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5952 return DRFLAC_TRUE;
5953 } else {
5954 break; /* Failed to seek forward. */
5955 }
5956 }
5958 pcmRangeLo = newPCMRangeLo;
5959 pcmRangeHi = newPCMRangeHi;
5961 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
5962 /* The target PCM frame is in this FLAC frame. */
5963 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
5964 return DRFLAC_TRUE;
5965 } else {
5966 break; /* Failed to seek to FLAC frame. */
5967 }
5968 } else {
5969 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
5971 if (pcmRangeLo > pcmFrameIndex) {
5972 /* We seeked too far forward. We need to move our target byte backward and try again. */
5973 byteRangeHi = lastSuccessfulSeekOffset;
5974 if (byteRangeLo > byteRangeHi) {
5975 byteRangeLo = byteRangeHi;
5976 }
5978 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
5979 if (targetByte < byteRangeLo) {
5980 targetByte = byteRangeLo;
5981 }
5982 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
5983 /* We didn't seek far enough. We need to move our target byte forward and try again. */
5985 /* If we're close enough we can just seek forward. */
5986 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
5987 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5988 return DRFLAC_TRUE;
5989 } else {
5990 break; /* Failed to seek to FLAC frame. */
5991 }
5992 } else {
5993 byteRangeLo = lastSuccessfulSeekOffset;
5994 if (byteRangeHi < byteRangeLo) {
5995 byteRangeHi = byteRangeLo;
5996 }
5998 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
5999 if (targetByte > byteRangeHi) {
6000 targetByte = byteRangeHi;
6001 }
6003 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
6004 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
6005 }
6006 }
6007 }
6008 }
6009 } else {
6010 /* Getting here is really bad. We just recover as best we can by moving to the first frame in the stream, and then abort. */
6011 break;
6012 }
6013 }
6015 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
6016 return DRFLAC_FALSE;
6017}
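/*
The first guess made by the binary search above can be sketched in isolation: take the size the skipped audio would have as raw PCM, scale it by an assumed
compression ratio, and clamp the result to the search range. This is only an illustration. `estimate_target_byte` and its `compressionRatio` parameter are
hypothetical names, and the sketch uses double precision where the code above uses float.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: guess the byte offset of a PCM frame that lies
// pcmFramesAhead frames past the frame stored at byteRangeLo.
static uint64_t estimate_target_byte(uint64_t byteRangeLo, uint64_t byteRangeHi, uint64_t pcmFramesAhead, uint32_t channels, uint32_t bitsPerSample, double compressionRatio)
{
    double uncompressedBytes;
    uint64_t targetByte;

    assert(byteRangeLo <= byteRangeHi);

    // Size of the skipped audio as if it were stored as raw PCM.
    uncompressedBytes = (double)pcmFramesAhead * channels * bitsPerSample / 8.0;

    // Scale by the assumed compression ratio, then clamp to the search range.
    targetByte = byteRangeLo + (uint64_t)(uncompressedBytes * compressionRatio);
    if (targetByte > byteRangeHi) {
        targetByte = byteRangeHi;
    }

    return targetByte;
}
```

A poor ratio only costs extra iterations. The loop above re-estimates the ratio from the last successful seek, so the first guess does not need to be accurate.
*/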
6019static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6020{
6021 drflac_uint64 byteRangeLo;
6022 drflac_uint64 byteRangeHi;
6023 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6025 /* Our algorithm currently assumes the FLAC stream is sitting at the start. */
6026 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
6027 return DRFLAC_FALSE;
6028 }
6030 /* If we're close enough to the start, just move to the start and seek forward. */
6031 if (pcmFrameIndex < seekForwardThreshold) {
6032 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
6033 }
6035 /*
6036 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
6037 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
6038 */
6039 byteRangeLo = pFlac->firstFLACFramePosInBytes;
6040 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6042 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
6043}
6044#endif /* !DR_FLAC_NO_CRC */
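/*
The overshoot branch of the binary search above pulls the upper bound down to the last successful seek offset and retries at the midpoint. A minimal sketch
of that single step follows. `example_search_state` and `narrow_after_overshoot` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical container for the three values the search loop keeps updating.
typedef struct
{
    uint64_t byteRangeLo;
    uint64_t byteRangeHi;
    uint64_t targetByte;
} example_search_state;

// Called when the FLAC frame that was found starts after the requested PCM frame.
static void narrow_after_overshoot(example_search_state* pState, uint64_t lastSuccessfulSeekOffset)
{
    assert(pState != NULL);

    pState->byteRangeHi = lastSuccessfulSeekOffset;
    if (pState->byteRangeLo > pState->byteRangeHi) {
        pState->byteRangeLo = pState->byteRangeHi;  // Keep the range well formed.
    }

    // Written as lo + (hi - lo)/2 so the sum cannot overflow.
    pState->targetByte = pState->byteRangeLo + ((pState->byteRangeHi - pState->byteRangeLo) / 2);
}
```

The undershoot branch is not plain bisection. It extrapolates a new target from the measured compression ratio, which is why it is left out of this sketch.
*/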
6046static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6047{
6048 drflac_uint32 iClosestSeekpoint = 0;
6049 drflac_bool32 isMidFrame = DRFLAC_FALSE;
6050 drflac_uint64 runningPCMFrameCount;
6051 drflac_uint32 iSeekpoint;
6054 DRFLAC_ASSERT(pFlac != NULL);
6056 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
6057 return DRFLAC_FALSE;
6058 }
6060 /* Do not use the seektable if pcmFrameIndex is not covered by it. */
6061 if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
6062 return DRFLAC_FALSE;
6063 }
6065 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6066 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6067 break;
6068 }
6070 iClosestSeekpoint = iSeekpoint;
6071 }
6073 /* There have been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
6074 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6075 return DRFLAC_FALSE;
6076 }
6077 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
6078 return DRFLAC_FALSE;
6079 }
6081#if !defined(DR_FLAC_NO_CRC)
6082 /* At this point we know the closest seekpoint. We can run a binary search starting from it, but only if the total PCM frame count is known. */
6083 if (pFlac->totalPCMFrameCount > 0) {
6084 drflac_uint64 byteRangeLo;
6085 drflac_uint64 byteRangeHi;
6087 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6088 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6090 /*
6091 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6092 value for byteRangeHi which will clamp it appropriately.
6094 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6095 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6096 */
6097 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6098 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6100 /* Basic validation on the seekpoints to ensure they're usable. */
6101 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6102 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6103 }
6105 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6106 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
6107 }
6108 }
6110 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6111 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6112 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6114 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6115 return DRFLAC_TRUE;
6116 }
6117 }
6118 }
6119 }
6120#endif /* !DR_FLAC_NO_CRC */
6122 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6124 /*
6125 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6126 from the seekpoint's first sample.
6127 */
6128 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6129 /* Optimized case. Just seek forward from where we are. */
6130 runningPCMFrameCount = pFlac->currentPCMFrame;
6132 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6133 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6134 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6135 return DRFLAC_FALSE;
6136 }
6137 } else {
6138 isMidFrame = DRFLAC_TRUE;
6139 }
6140 } else {
6141 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6142 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6144 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6145 return DRFLAC_FALSE;
6146 }
6148 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6149 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6150 return DRFLAC_FALSE;
6151 }
6152 }
6154 for (;;) {
6155 drflac_uint64 pcmFrameCountInThisFLACFrame;
6156 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6157 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6159 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6161 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6162 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6163 /*
6164 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6165 it never existed and keep iterating.
6166 */
6167 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6169 if (!isMidFrame) {
6170 drflac_result result = drflac__decode_flac_frame(pFlac);
6171 if (result == DRFLAC_SUCCESS) {
6172 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6173 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6174 } else {
6175 if (result == DRFLAC_CRC_MISMATCH) {
6176 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6177 } else {
6178 return DRFLAC_FALSE;
6179 }
6180 }
6181 } else {
6182 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6183 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6184 }
6185 } else {
6186 /*
6187 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6188 frame never existed and leave the running sample count untouched.
6189 */
6190 if (!isMidFrame) {
6191 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6192 if (result == DRFLAC_SUCCESS) {
6193 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6194 } else {
6195 if (result == DRFLAC_CRC_MISMATCH) {
6196 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6197 } else {
6198 return DRFLAC_FALSE;
6199 }
6200 }
6201 } else {
6202 /*
6203 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6204 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6205 */
6206 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6207 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6208 isMidFrame = DRFLAC_FALSE;
6209 }
6211 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6212 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6213 return DRFLAC_TRUE;
6214 }
6215 }
6217 next_iteration:
6218 /* Grab the next frame in preparation for the next iteration. */
6219 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6220 return DRFLAC_FALSE;
6221 }
6222 }
6223}
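/*
The seekpoint selection at the top of the function above is a linear scan that stops at the first seekpoint starting at or after the target, so the result
is the last seekpoint that starts strictly before it. The sketch below reproduces just that scan. `example_seekpoint` is a hypothetical stand-in holding
the three fields the function reads, not dr_flac's own type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the three fields the function above reads from each seekpoint.
typedef struct
{
    uint64_t firstPCMFrame;
    uint64_t flacFrameOffset;
    uint16_t pcmFrameCount;
} example_seekpoint;

// Returns the index of the last seekpoint that starts strictly before pcmFrameIndex, or 0 if there is none.
static uint32_t find_closest_seekpoint(const example_seekpoint* pSeekpoints, uint32_t seekpointCount, uint64_t pcmFrameIndex)
{
    uint32_t iClosest = 0;
    uint32_t i;

    assert(pSeekpoints != NULL && seekpointCount > 0);

    for (i = 0; i < seekpointCount; ++i) {
        if (pSeekpoints[i].firstPCMFrame >= pcmFrameIndex) {
            break;
        }
        iClosest = i;
    }

    return iClosest;
}
```

Note that a target sitting exactly on a seekpoint selects the seekpoint before it. That is harmless because the search that follows moves forward from the
selected seekpoint anyway.
*/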
6226#ifndef DR_FLAC_NO_OGG
6227typedef struct
6228{
6229 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6230 drflac_uint8 structureVersion; /* Always 0. */
6231 drflac_uint8 headerType;
6232 drflac_uint64 granulePosition;
6233 drflac_uint32 serialNumber;
6234 drflac_uint32 sequenceNumber;
6235 drflac_uint32 checksum;
6236 drflac_uint8 segmentCount;
6237 drflac_uint8 segmentTable[255];
6238} drflac_ogg_page_header;
6239#endif
6241typedef struct
6242{
6243 drflac_read_proc onRead;
6244 drflac_seek_proc onSeek;
6245 drflac_tell_proc onTell;
6246 drflac_meta_proc onMeta;
6247 drflac_container container;
6248 void* pUserData;
6249 void* pUserDataMD;
6250 drflac_uint32 sampleRate;
6251 drflac_uint8 channels;
6252 drflac_uint8 bitsPerSample;
6253 drflac_uint64 totalPCMFrameCount;
6254 drflac_uint16 maxBlockSizeInPCMFrames;
6255 drflac_uint64 runningFilePos;
6256 drflac_bool32 hasStreamInfoBlock;
6257 drflac_bool32 hasMetadataBlocks;
6258 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
6259 drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initialization. Only set if there is no STREAMINFO block. */
6261#ifndef DR_FLAC_NO_OGG
6262 drflac_uint32 oggSerial;
6263 drflac_uint64 oggFirstBytePos;
6264 drflac_ogg_page_header oggBosHeader;
6265#endif
6266} drflac_init_info;
6268static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6269{
6270 blockHeader = drflac__be2host_32(blockHeader);
6271 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6272 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6273 *blockSize = (blockHeader & 0x00FFFFFFUL);
6274}
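/*
To make the bit layout above concrete, here is a small sketch that decodes a metadata block header directly from its four bytes. It assembles the
big-endian value by hand instead of byte swapping, so it behaves the same on any host. `example_block_header` and `decode_block_header_bytes` are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint8_t  isLastBlock;
    uint8_t  blockType;
    uint32_t blockSize;
} example_block_header;

static example_block_header decode_block_header_bytes(const uint8_t* pBytes)
{
    example_block_header header;
    uint32_t value;

    assert(pBytes != NULL);

    // Assemble the big-endian value explicitly so the result does not depend on host byte order.
    value = ((uint32_t)pBytes[0] << 24) | ((uint32_t)pBytes[1] << 16) | ((uint32_t)pBytes[2] << 8) | (uint32_t)pBytes[3];

    header.isLastBlock = (uint8_t)((value & 0x80000000UL) >> 31);   // Top bit.
    header.blockType   = (uint8_t)((value & 0x7F000000UL) >> 24);   // Next 7 bits.
    header.blockSize   = (uint32_t)(value & 0x00FFFFFFUL);          // Low 24 bits.
    return header;
}
```

For example, the bytes 0x84 0x00 0x00 0x22 describe a final block of type 4 that is 34 bytes long.
*/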
6276static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6277{
6278 drflac_uint32 blockHeader;
6280 *blockSize = 0;
6281 if (onRead(pUserData, &blockHeader, 4) != 4) {
6282 return DRFLAC_FALSE;
6283 }
6285 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6286 return DRFLAC_TRUE;
6287}
6289static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6290{
6291 drflac_uint32 blockSizes;
6292 drflac_uint64 frameSizes = 0;
6293 drflac_uint64 importantProps;
6294 drflac_uint8 md5[16];
6296 /* min/max block size. */
6297 if (onRead(pUserData, &blockSizes, 4) != 4) {
6298 return DRFLAC_FALSE;
6299 }
6301 /* min/max frame size. */
6302 if (onRead(pUserData, &frameSizes, 6) != 6) {
6303 return DRFLAC_FALSE;
6304 }
6306 /* Sample rate, channels, bits per sample and total sample count. */
6307 if (onRead(pUserData, &importantProps, 8) != 8) {
6308 return DRFLAC_FALSE;
6309 }
6311 /* MD5 */
6312 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6313 return DRFLAC_FALSE;
6314 }
6316 blockSizes = drflac__be2host_32(blockSizes);
6317 frameSizes = drflac__be2host_64(frameSizes);
6318 importantProps = drflac__be2host_64(importantProps);
6320 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6321 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6322 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6323 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6324 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6325 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6326 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6327 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6328 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6330 return DRFLAC_TRUE;
6331}
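/*
The masks used above for `importantProps` can be hard to read. The same unpacking is sketched below with plain shifts: 20 bits of sample rate, 3 bits of
channel count minus one, 5 bits of bits per sample minus one, and 36 bits of total PCM frame count. `example_stream_props` and `unpack_stream_props`
are hypothetical names, and the input is assumed to already be in host byte order.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    uint32_t sampleRate;
    uint8_t  channels;
    uint8_t  bitsPerSample;
    uint64_t totalPCMFrameCount;
} example_stream_props;

static example_stream_props unpack_stream_props(uint64_t packed)
{
    example_stream_props props;

    props.sampleRate         = (uint32_t)((packed >> 44) & 0xFFFFF);    // Top 20 bits.
    props.channels           = (uint8_t)(((packed >> 41) & 0x7) + 1);   // 3 bits, stored minus one.
    props.bitsPerSample      = (uint8_t)(((packed >> 36) & 0x1F) + 1);  // 5 bits, stored minus one.
    props.totalPCMFrameCount = packed & 0xFFFFFFFFFULL;                 // Low 36 bits.

    return props;
}
```

Packing 44100 Hz, 2 channels and 16 bits per sample gives the field values 44100, 1 and 15 respectively, which this function turns back into 44100, 2 and 16.
*/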
6334static void* drflac__malloc_default(size_t sz, void* pUserData)
6335{
6336 (void)pUserData;
6337 return DRFLAC_MALLOC(sz);
6338}
6340static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6341{
6342 (void)pUserData;
6343 return DRFLAC_REALLOC(p, sz);
6344}
6346static void drflac__free_default(void* p, void* pUserData)
6347{
6348 (void)pUserData;
6349 DRFLAC_FREE(p);
6350}
6353static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6354{
6355 if (pAllocationCallbacks == NULL) {
6356 return NULL;
6357 }
6359 if (pAllocationCallbacks->onMalloc != NULL) {
6360 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6361 }
6363 /* Try using realloc(). */
6364 if (pAllocationCallbacks->onRealloc != NULL) {
6365 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6366 }
6368 return NULL;
6369}
6371static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6372{
6373 if (pAllocationCallbacks == NULL) {
6374 return NULL;
6375 }
6377 if (pAllocationCallbacks->onRealloc != NULL) {
6378 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6379 }
6381 /* Try emulating realloc() in terms of malloc()/free(). */
6382 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6383 void* p2;
6385 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6386 if (p2 == NULL) {
6387 return NULL;
6388 }
6390 if (p != NULL) {
6391 DRFLAC_COPY_MEMORY(p2, p, szOld);
6392 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6393 }
6395 return p2;
6396 }
6398 return NULL;
6399}
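/*
The fallback path above, which emulates realloc() with malloc() and free(), is sketched below as a standalone function. The callback struct and the
function names are hypothetical and only mirror the shape of the code above. There is one deliberate difference, flagged in the code: the sketch copies
at most szNew bytes, so it is also safe when the caller shrinks the block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical callback table with only malloc and free. It is not dr_flac's own type.
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} example_allocation_callbacks;

static void* example_malloc(size_t sz, void* pUserData) { (void)pUserData; return malloc(sz); }
static void  example_free(void* p, void* pUserData)     { (void)pUserData; free(p); }

static void* emulate_realloc(void* p, size_t szNew, size_t szOld, const example_allocation_callbacks* pCallbacks)
{
    void* p2;

    assert(pCallbacks != NULL && pCallbacks->onMalloc != NULL && pCallbacks->onFree != NULL);

    p2 = pCallbacks->onMalloc(szNew, pCallbacks->pUserData);
    if (p2 == NULL) {
        return NULL;    // The old block is left untouched, as with realloc().
    }

    if (p != NULL) {
        // Deviation from the code above: copy at most szNew bytes so that shrinking is also safe.
        memcpy(p2, p, (szOld < szNew) ? szOld : szNew);
        pCallbacks->onFree(p, pCallbacks->pUserData);
    }

    return p2;
}
```

The old size has to be passed in because a malloc/free pair gives no way to query how large an existing allocation is.
*/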
6401static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6402{
6403 if (p == NULL || pAllocationCallbacks == NULL) {
6404 return;
6405 }
6407 if (pAllocationCallbacks->onFree != NULL) {
6408 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6409 }
6410}
6413static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
6414{
6415 /*
6416 We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6417 we'll be sitting on byte 42.
6418 */
6419 drflac_uint64 runningFilePos = 42;
6420 drflac_uint64 seektablePos = 0;
6421 drflac_uint32 seektableSize = 0;
6423 (void)onTell;
6425 for (;;) {
6426 drflac_metadata metadata;
6427 drflac_uint8 isLastBlock = 0;
6428 drflac_uint8 blockType = 0;
6429 drflac_uint32 blockSize;
6430 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6431 return DRFLAC_FALSE;
6432 }
6433 runningFilePos += 4;
6435 metadata.type = blockType;
6436 metadata.pRawData = NULL;
6437 metadata.rawDataSize = 0;
6439 switch (blockType)
6440 {
6441 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6442 {
6443 if (blockSize < 4) {
6444 return DRFLAC_FALSE;
6445 }
6447 if (onMeta) {
6448 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6449 if (pRawData == NULL) {
6450 return DRFLAC_FALSE;
6451 }
6453 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6454 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6455 return DRFLAC_FALSE;
6456 }
6458 metadata.pRawData = pRawData;
6459 metadata.rawDataSize = blockSize;
6460 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6461 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6462 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6463 onMeta(pUserDataMD, &metadata);
6465 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6466 }
6467 } break;
6469 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6470 {
6471 seektablePos = runningFilePos;
6472 seektableSize = blockSize;
6474 if (onMeta) {
6475 drflac_uint32 seekpointCount;
6476 drflac_uint32 iSeekpoint;
6477 void* pRawData;
6479 seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6481 pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
6482 if (pRawData == NULL) {
6483 return DRFLAC_FALSE;
6484 }
6486 /* We need to read seekpoint by seekpoint and do some processing. */
6487 for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
6488 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
6490 if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
6491 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6492 return DRFLAC_FALSE;
6493 }
6495 /* Endian swap. */
6496 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6497 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6498 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6499 }
6501 metadata.pRawData = pRawData;
6502 metadata.rawDataSize = blockSize;
6503 metadata.data.seektable.seekpointCount = seekpointCount;
6504 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6506 onMeta(pUserDataMD, &metadata);
6508 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6509 }
6510 } break;
6512 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6513 {
6514 if (blockSize < 8) {
6515 return DRFLAC_FALSE;
6516 }
6518 if (onMeta) {
6519 void* pRawData;
6520 const char* pRunningData;
6521 const char* pRunningDataEnd;
6522 drflac_uint32 i;
6524 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6525 if (pRawData == NULL) {
6526 return DRFLAC_FALSE;
6527 }
6529 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6530 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6531 return DRFLAC_FALSE;
6532 }
6534 metadata.pRawData = pRawData;
6535 metadata.rawDataSize = blockSize;
6537 pRunningData = (const char*)pRawData;
6538 pRunningDataEnd = (const char*)pRawData + blockSize;
6540 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6542 /* Need space for the rest of the block */
6543 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6544 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6545 return DRFLAC_FALSE;
6546 }
6547 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
6548 metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6550 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
6551 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6552 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6553 return DRFLAC_FALSE;
6554 }
6555 metadata.data.vorbis_comment.pComments = pRunningData;
6557 /* Check that the comments section is valid before passing it to the callback */
6558 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6559 drflac_uint32 commentLength;
6561 if (pRunningDataEnd - pRunningData < 4) {
6562 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6563 return DRFLAC_FALSE;
6564 }
6566 commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6567 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6568 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6569 return DRFLAC_FALSE;
6570 }
6571 pRunningData += commentLength;
6572 }
6574 onMeta(pUserDataMD, &metadata);
6576 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6577 }
6578 } break;
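/*
The comment validation in the case above follows a general pattern for length-prefixed data: always compare a length against the bytes that remain,
and never compute "current position plus length" first, because that sum can wrap around. A self-contained sketch of the pattern follows. The function
names are hypothetical and the helper reads little-endian lengths, as Vorbis comments do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reads a little-endian 32-bit value without assuming alignment.
static uint32_t read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 1 if count length-prefixed entries fit inside the dataSize bytes starting at pData, 0 otherwise.
static int validate_length_prefixed_entries(const uint8_t* pData, size_t dataSize, uint32_t count)
{
    const uint8_t* pRunning = pData;
    const uint8_t* pEnd;
    uint32_t i;

    assert(pData != NULL);
    pEnd = pData + dataSize;

    for (i = 0; i < count; ++i) {
        uint32_t length;

        if ((size_t)(pEnd - pRunning) < 4) {
            return 0;   // No room for the length prefix.
        }
        length = read_le32(pRunning);
        pRunning += 4;

        // Compare against the bytes remaining rather than computing pRunning + length, which could overflow.
        if ((size_t)(pEnd - pRunning) < length) {
            return 0;
        }
        pRunning += length;
    }

    return 1;
}
```

Validating every entry before the callback fires means the callback's consumer can iterate the comments without repeating these checks.
*/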
6580 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6581 {
6582 if (blockSize < 396) {
6583 return DRFLAC_FALSE;
6584 }
6586 if (onMeta) {
6587 void* pRawData;
6588 const char* pRunningData;
6589 const char* pRunningDataEnd;
6590 size_t bufferSize;
6591 drflac_uint8 iTrack;
6592 drflac_uint8 iIndex;
6593 void* pTrackData;
6595 /*
6596 This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
6597 we need for storing the necessary data. The second pass will fill that buffer with usable data.
6598 */
6599 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6600 if (pRawData == NULL) {
6601 return DRFLAC_FALSE;
6602 }
6604 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6605 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6606 return DRFLAC_FALSE;
6607 }
6609 metadata.pRawData = pRawData;
6610 metadata.rawDataSize = blockSize;
6612 pRunningData = (const char*)pRawData;
6613 pRunningDataEnd = (const char*)pRawData + blockSize;
6615 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6616 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6617 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6618 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
6619 metadata.data.cuesheet.pTrackData = NULL; /* Will be filled later. */
6621 /* Pass 1: Calculate the size of the buffer for the track data. */
6622 {
6623 const char* pRunningDataSaved = pRunningData; /* Will be restored at the end in preparation for the second pass. */
6625 bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
6627 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6628 drflac_uint8 indexCount;
6629 drflac_uint32 indexPointSize;
6631 if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
6632 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6633 return DRFLAC_FALSE;
6634 }
6636 /* Skip to the index point count */
6637 pRunningData += 35;
6639 indexCount = pRunningData[0];
6640 pRunningData += 1;
6642 bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
6644 /* Quick validation check. */
6645 indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6646 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6647 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6648 return DRFLAC_FALSE;
6649 }
6651 pRunningData += indexPointSize;
6652 }
6654 pRunningData = pRunningDataSaved;
6655 }
6657 /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so it can be skipped here. */
6658 {
6659 char* pRunningTrackData;
6661 pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
6662 if (pTrackData == NULL) {
6663 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6664 return DRFLAC_FALSE;
6665 }
6667 pRunningTrackData = (char*)pTrackData;
6669 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6670 drflac_uint8 indexCount;
6672 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
6673 pRunningData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
6674 pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
6676 /* Grab the index count for the next part. */
6677 indexCount = pRunningData[0];
6678 pRunningData += 1;
6679 pRunningTrackData += 1;
6681 /* Extract each track index. */
6682 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6683 drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
6685 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
6686 pRunningData += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6687 pRunningTrackData += sizeof(drflac_cuesheet_track_index);
6689 pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
6690 }
6691 }
6693 metadata.data.cuesheet.pTrackData = pTrackData;
6694 }
6696 /* The original data is no longer needed. */
6697 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6698 pRawData = NULL;
6700 onMeta(pUserDataMD, &metadata);
6702 drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
6703 pTrackData = NULL;
6704 }
6705 } break;
6707 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6708 {
6709 if (blockSize < 32) {
6710 return DRFLAC_FALSE;
6711 }
6713 if (onMeta) {
6714 void* pRawData;
6715 const char* pRunningData;
6716 const char* pRunningDataEnd;
6718 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6719 if (pRawData == NULL) {
6720 return DRFLAC_FALSE;
6721 }
6723 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6724 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6725 return DRFLAC_FALSE;
6726 }
6728 metadata.pRawData = pRawData;
6729 metadata.rawDataSize = blockSize;
6731 pRunningData = (const char*)pRawData;
6732 pRunningDataEnd = (const char*)pRawData + blockSize;
6734 metadata.data.picture.type = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6735 metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6737 /* Need space for the rest of the block */
6738 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6739 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6740 return DRFLAC_FALSE;
6741 }
6742 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6743 metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6745 /* Need space for the rest of the block */
6746 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6747 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6748 return DRFLAC_FALSE;
6749 }
6750 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6751 metadata.data.picture.width = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6752 metadata.data.picture.height = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6753 metadata.data.picture.colorDepth = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6754 metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6755 metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6756 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6758 /* Need space for the picture after the block */
6759 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6760 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6761 return DRFLAC_FALSE;
6762 }
6764 onMeta(pUserDataMD, &metadata);
6766 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6767 }
6768 } break;
6770 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6771 {
6772 if (onMeta) {
6773 metadata.data.padding.unused = 0;
6775 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6776 if (!onSeek(pUserData, blockSize, DRFLAC_SEEK_CUR)) {
6777 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6778 } else {
6779 onMeta(pUserDataMD, &metadata);
6780 }
6781 }
6782 } break;
6784 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6785 {
6786 /* Invalid chunk. Just skip over this one. */
6787 if (onMeta) {
6788 if (!onSeek(pUserData, blockSize, DRFLAC_SEEK_CUR)) {
6789 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6790 }
6791 }
6792 } break;
6794 default:
6795 {
6796 /*
6797 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6798 can at the very least report the chunk to the application and let it look at the raw data.
6799 */
6800 if (onMeta) {
6801 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6802 if (pRawData == NULL) {
6803 return DRFLAC_FALSE;
6804 }
6806 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6807 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6808 return DRFLAC_FALSE;
6809 }
6811 metadata.pRawData = pRawData;
6812 metadata.rawDataSize = blockSize;
6813 onMeta(pUserDataMD, &metadata);
6815 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6816 }
6817 } break;
6818 }
6820 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6821 if (onMeta == NULL && blockSize > 0) {
6822 if (!onSeek(pUserData, blockSize, DRFLAC_SEEK_CUR)) {
6823 isLastBlock = DRFLAC_TRUE;
6824 }
6825 }
6827 runningFilePos += blockSize;
6828 if (isLastBlock) {
6829 break;
6830 }
6831 }
6833 *pSeektablePos = seektablePos;
6834 *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6835 *pFirstFramePos = runningFilePos;
6837 return DRFLAC_TRUE;
6838}
6840static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6841{
6842 /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
6844 drflac_uint8 isLastBlock;
6845 drflac_uint8 blockType;
6846 drflac_uint32 blockSize;
6848 (void)onSeek;
6850 pInit->container = drflac_container_native;
6852 /* The first metadata block should be the STREAMINFO block. */
6853 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6854 return DRFLAC_FALSE;
6855 }
6857 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6858 if (!relaxed) {
6859 /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
6860 return DRFLAC_FALSE;
6861 } else {
6862 /*
6863 Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6864 for that frame.
6865 */
6866 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6867 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6869 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6870 return DRFLAC_FALSE; /* Couldn't find a frame. */
6871 }
6873 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6874 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6875 }
6877 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6878 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6879 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6880 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6881 return DRFLAC_TRUE;
6882 }
6883 } else {
6884 drflac_streaminfo streaminfo;
6885 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6886 return DRFLAC_FALSE;
6887 }
6889 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6890 pInit->sampleRate = streaminfo.sampleRate;
6891 pInit->channels = streaminfo.channels;
6892 pInit->bitsPerSample = streaminfo.bitsPerSample;
6893 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6894 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6895 pInit->hasMetadataBlocks = !isLastBlock;
6897 if (onMeta) {
6898 drflac_metadata metadata;
6899 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6900 metadata.pRawData = NULL;
6901 metadata.rawDataSize = 0;
6902 metadata.data.streaminfo = streaminfo;
6903 onMeta(pUserDataMD, &metadata);
6904 }
6906 return DRFLAC_TRUE;
6907 }
6908}
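
/*
The first thing drflac__init_private__native() does is decode a metadata block header and check that it describes a 34-byte
STREAMINFO block. As a rough illustration of what that header holds, here is a minimal sketch that unpacks the 4 raw bytes:
1 bit for the "last block" flag, 7 bits for the block type and 24 bits (big-endian) for the block size. The names
example_block_header and example_decode_block_header are hypothetical and are not part of dr_flac.

```c
#include <stdint.h>

// Hypothetical container for the three fields of a FLAC metadata block header.
typedef struct
{
    int      isLastBlock;   // 1 when no more metadata blocks follow.
    unsigned blockType;     // 0 = STREAMINFO, 1 = PADDING, and so on.
    uint32_t blockSize;     // Size of the block body in bytes (24-bit value).
} example_block_header;

// Hypothetical helper. Unpacks the 4 raw header bytes into their fields.
static example_block_header example_decode_block_header(const uint8_t raw[4])
{
    example_block_header header;

    header.isLastBlock = (raw[0] & 0x80) >> 7;
    header.blockType   =  raw[0] & 0x7F;
    header.blockSize   = ((uint32_t)raw[1] << 16) | ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];

    return header;
}
```

With the bytes 0x80 0x00 0x00 0x22 this yields a last-block STREAMINFO header with a size of 34, which is the only combination
strict mode accepts for the first block.
*/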
6910#ifndef DR_FLAC_NO_OGG
6911#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6912#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6914typedef enum
6915{
6916 drflac_ogg_recover_on_crc_mismatch,
6917 drflac_ogg_fail_on_crc_mismatch
6918} drflac_ogg_crc_mismatch_recovery;
6920#ifndef DR_FLAC_NO_CRC
6921static drflac_uint32 drflac__crc32_table[] = {
6922 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6923 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6924 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6925 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6926 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6927 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6928 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6929 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6930 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6931 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6932 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
6933 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
6934 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
6935 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
6936 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
6937 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
6938 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
6939 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
6940 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
6941 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
6942 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
6943 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
6944 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
6945 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
6946 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
6947 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
6948 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
6949 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
6950 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
6951 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
6952 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
6953 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
6954 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
6955 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
6956 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
6957 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
6958 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
6959 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
6960 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
6961 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
6962 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
6963 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
6964 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
6965 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
6966 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
6967 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
6968 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
6969 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
6970 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
6971 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
6972 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
6973 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
6974 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
6975 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
6976 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
6977 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
6978 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
6979 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
6980 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
6981 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
6982 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
6983 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
6984 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
6985 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
6986};
6987#endif
6989static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
6990{
6991#ifndef DR_FLAC_NO_CRC
6992 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
6993#else
6994 (void)data;
6995 return crc32;
6996#endif
6997}
6999#if 0
7000static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
7001{
7002 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
7003 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
7004 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
7005 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
7006 return crc32;
7007}
7009static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
7010{
7011 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
7012 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
7013 return crc32;
7014}
7015#endif
7017static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
7018{
7019 /* This can be optimized. */
7020 drflac_uint32 i;
7021 for (i = 0; i < dataSize; ++i) {
7022 crc32 = drflac_crc32_byte(crc32, pData[i]);
7023 }
7024 return crc32;
7025}
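
/*
The table above corresponds to the CRC-32 used by Ogg: polynomial 0x04C11DB7, processed most significant bit first, with no
bit reflection and no final XOR. The sketch below computes the same checksum one bit at a time, which can be handy for
cross-checking the table or the DRFLAC_OGG_CAPTURE_PATTERN_CRC32 constant. The name example_crc32_bitwise is hypothetical
and is not part of dr_flac.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper. Bit-at-a-time equivalent of feeding each byte through the lookup table.
static uint32_t example_crc32_bitwise(uint32_t crc, const uint8_t* pData, size_t dataSize)
{
    size_t i;
    int bit;

    for (i = 0; i < dataSize; ++i) {
        crc ^= (uint32_t)pData[i] << 24;    // Bring the next byte into the top of the register.

        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x80000000u) {
                crc = (crc << 1) ^ 0x04C11DB7u;
            } else {
                crc <<= 1;
            }
        }
    }

    return crc;
}
```

Starting from 0 and feeding in the four bytes "OggS" gives 1605413199, and a single 0x01 byte gives 0x04C11DB7, which is the
second entry of the table.
*/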
7028static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
7029{
7030 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
7031}
7033static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
7034{
7035 return 27 + pHeader->segmentCount;
7036}
7038static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
7039{
7040 drflac_uint32 pageBodySize = 0;
7041 int i;
7043 for (i = 0; i < pHeader->segmentCount; ++i) {
7044 pageBodySize += pHeader->segmentTable[i];
7045 }
7047 return pageBodySize;
7048}
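
/*
The two helpers above are plain arithmetic on the page header: 27 fixed bytes plus one lacing value per segment for the header,
and the sum of the lacing values for the body. A minimal standalone sketch, using hypothetical names that are not part of
dr_flac:

```c
#include <stdint.h>

// Hypothetical helper. 27 fixed header bytes followed by one lacing value per segment.
static uint32_t example_page_header_size(uint8_t segmentCount)
{
    return 27u + segmentCount;
}

// Hypothetical helper. The body size is the sum of every lacing value in the segment table.
static uint32_t example_page_body_size(const uint8_t* pSegmentTable, uint8_t segmentCount)
{
    uint32_t pageBodySize = 0;
    unsigned i;

    for (i = 0; i < segmentCount; ++i) {
        pageBodySize += pSegmentTable[i];
    }

    return pageBodySize;
}
```

For a segment table of { 255, 255, 10 } this gives a 30-byte header and a 520-byte body. With the maximum of 255 segments of 255
bytes each, header plus body comes to 27 + 255 + 255*255 = 65307 bytes, which matches the value of DRFLAC_OGG_MAX_PAGE_SIZE.
*/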
7050static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7051{
7052 drflac_uint8 data[23];
7053 drflac_uint32 i;
7055 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
7057 if (onRead(pUserData, data, 23) != 23) {
7058 return DRFLAC_AT_END;
7059 }
7060 *pBytesRead += 23;
7062 /*
7063 It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
7064 us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
7065 like to have it map to the structure of the underlying data.
7066 */
7067 pHeader->capturePattern[0] = 'O';
7068 pHeader->capturePattern[1] = 'g';
7069 pHeader->capturePattern[2] = 'g';
7070 pHeader->capturePattern[3] = 'S';
7072 pHeader->structureVersion = data[0];
7073 pHeader->headerType = data[1];
7074 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
7075 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
7076 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
7077 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
7078 pHeader->segmentCount = data[22];
7080 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
7081 data[18] = 0;
7082 data[19] = 0;
7083 data[20] = 0;
7084 data[21] = 0;
7086 for (i = 0; i < 23; ++i) {
7087 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
7088 }
7091 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
7092 return DRFLAC_AT_END;
7093 }
7094 *pBytesRead += pHeader->segmentCount;
7096 for (i = 0; i < pHeader->segmentCount; ++i) {
7097 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
7098 }
7100 return DRFLAC_SUCCESS;
7101}
7103static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7104{
7105 drflac_uint8 id[4];
7107 *pBytesRead = 0;
7109 if (onRead(pUserData, id, 4) != 4) {
7110 return DRFLAC_AT_END;
7111 }
7112 *pBytesRead += 4;
7114 /* We need to read byte-by-byte until we find the OggS capture pattern. */
7115 for (;;) {
7116 if (drflac_ogg__is_capture_pattern(id)) {
7117 drflac_result result;
7119 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7121 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7122 if (result == DRFLAC_SUCCESS) {
7123 return DRFLAC_SUCCESS;
7124 } else {
7125 if (result == DRFLAC_CRC_MISMATCH) {
7126 continue;
7127 } else {
7128 return result;
7129 }
7130 }
7131 } else {
7132 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
7133 id[0] = id[1];
7134 id[1] = id[2];
7135 id[2] = id[3];
7136 if (onRead(pUserData, &id[3], 1) != 1) {
7137 return DRFLAC_AT_END;
7138 }
7139 *pBytesRead += 1;
7140 }
7141 }
7142}
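
/*
drflac_ogg__read_page_header() resynchronizes by sliding a 4-byte window over the stream one byte at a time until the window
holds "OggS". The sketch below applies the same idea to an in-memory buffer so the behaviour is easy to see in isolation. The
name example_find_capture_pattern is hypothetical and is not part of dr_flac.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper. Returns the offset of the first "OggS" in the buffer, or -1 if there is none.
static long example_find_capture_pattern(const uint8_t* pData, size_t dataSize)
{
    uint8_t id[4];
    size_t bytesRead;

    if (dataSize < 4) {
        return -1;
    }

    id[0] = pData[0];
    id[1] = pData[1];
    id[2] = pData[2];
    id[3] = pData[3];
    bytesRead = 4;

    for (;;) {
        if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
            return (long)(bytesRead - 4);   // The window starts 4 bytes behind the read position.
        }

        if (bytesRead == dataSize) {
            return -1;                      // Ran out of data before finding the pattern.
        }

        // Shift the window along by one byte and try again.
        id[0] = id[1];
        id[1] = id[2];
        id[2] = id[3];
        id[3] = pData[bytesRead];
        bytesRead += 1;
    }
}
```

Searching "xxOggSyy" returns 2. The match is case sensitive, so "Oggs" is not treated as a capture pattern.
*/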
7145/*
7146The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
7147in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
7148in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
7149dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
7150the physical Ogg bitstream are converted and delivered in native FLAC format.
7151*/
7152typedef struct
7153{
7154 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7155 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
7156 drflac_tell_proc onTell; /* The original onTell callback from drflac_open() and family. */
    void* pUserData;                    /* The user data passed to onRead and onSeek. This is the user data that was passed to drflac_open() and family. */
7158 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7159 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7160 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7161 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7162 drflac_ogg_page_header currentPageHeader;
7163 drflac_uint32 bytesRemainingInPage;
7164 drflac_uint32 pageDataSize;
7165 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7166} drflac_oggbs; /* oggbs = Ogg Bitstream */
7168static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7169{
7170 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7171 oggbs->currentBytePos += bytesActuallyRead;
7173 return bytesActuallyRead;
7174}
7176static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7177{
7178 if (origin == DRFLAC_SEEK_SET) {
7179 if (offset <= 0x7FFFFFFF) {
7180 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, DRFLAC_SEEK_SET)) {
7181 return DRFLAC_FALSE;
7182 }
7183 oggbs->currentBytePos = offset;
7185 return DRFLAC_TRUE;
7186 } else {
7187 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_SET)) {
7188 return DRFLAC_FALSE;
7189 }
            oggbs->currentBytePos = 0x7FFFFFFF; /* <-- Only this far so far. The DRFLAC_SEEK_CUR call below adds the remainder. */
7192 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, DRFLAC_SEEK_CUR);
7193 }
7194 } else {
7195 while (offset > 0x7FFFFFFF) {
7196 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_CUR)) {
7197 return DRFLAC_FALSE;
7198 }
7199 oggbs->currentBytePos += 0x7FFFFFFF;
7200 offset -= 0x7FFFFFFF;
7201 }
7203 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, DRFLAC_SEEK_CUR)) { /* <-- Safe cast thanks to the loop above. */
7204 return DRFLAC_FALSE;
7205 }
7206 oggbs->currentBytePos += offset;
7208 return DRFLAC_TRUE;
7209 }
7210}
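
/*
The seek callback takes an int offset, so drflac_oggbs__seek_physical() splits a 64-bit offset into steps of at most 0x7FFFFFFF
bytes. The sketch below shows that splitting for a forward relative seek, using a stand-in stream that only tracks its position.
All of the example_* names are hypothetical and are not part of dr_flac.

```c
#include <stdint.h>

#define EXAMPLE_MAX_SEEK 0x7FFFFFFF

// Hypothetical relative-seek callback. Returns non-zero on success.
typedef int (* example_seek_proc)(void* pUserData, int offset);

// Hypothetical helper. Seeks forward by a 64-bit amount using only int-sized steps.
static int example_seek_forward(example_seek_proc onSeek, void* pUserData, uint64_t offset)
{
    while (offset > EXAMPLE_MAX_SEEK) {
        if (!onSeek(pUserData, EXAMPLE_MAX_SEEK)) {
            return 0;
        }
        offset -= EXAMPLE_MAX_SEEK;
    }

    return onSeek(pUserData, (int)offset);  // Safe cast thanks to the loop above.
}

// Stand-in stream which records where it is and how many times it was asked to seek.
typedef struct
{
    uint64_t pos;
    unsigned calls;
} example_stream;

static int example_stream_seek(void* pUserData, int offset)
{
    example_stream* pStream = (example_stream*)pUserData;
    pStream->pos   += (uint64_t)offset;
    pStream->calls += 1;
    return 1;
}
```

Seeking forward by 5000000000 bytes from position 0 takes three callback invocations: two full steps of 0x7FFFFFFF followed by
the remaining 705032706 bytes.
*/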
7212static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7213{
7214 drflac_ogg_page_header header;
7215 for (;;) {
7216 drflac_uint32 crc32 = 0;
7217 drflac_uint32 bytesRead;
7218 drflac_uint32 pageBodySize;
7219#ifndef DR_FLAC_NO_CRC
7220 drflac_uint32 actualCRC32;
7221#endif
7223 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7224 return DRFLAC_FALSE;
7225 }
7226 oggbs->currentBytePos += bytesRead;
7228 pageBodySize = drflac_ogg__get_page_body_size(&header);
7229 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7230 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7231 }
7233 if (header.serialNumber != oggbs->serialNumber) {
7234 /* It's not a FLAC page. Skip it. */
7235 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, DRFLAC_SEEK_CUR)) {
7236 return DRFLAC_FALSE;
7237 }
7238 continue;
7239 }
7242 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7243 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7244 return DRFLAC_FALSE;
7245 }
7246 oggbs->pageDataSize = pageBodySize;
7248#ifndef DR_FLAC_NO_CRC
7249 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7250 if (actualCRC32 != header.checksum) {
7251 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7252 continue; /* CRC mismatch. Skip this page. */
7253 } else {
7254 /*
7255 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7256 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7257 seek did not fully complete.
7258 */
7259 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7260 return DRFLAC_FALSE;
7261 }
7262 }
7263#else
7264 (void)recoveryMethod; /* <-- Silence a warning. */
7265#endif
7267 oggbs->currentPageHeader = header;
7268 oggbs->bytesRemainingInPage = pageBodySize;
7269 return DRFLAC_TRUE;
7270 }
7271}
/* The functions below are unused at the moment, but I might be re-adding them later. */
7274#if 0
7275static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7276{
7277 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7278 drflac_uint8 iSeg = 0;
7279 drflac_uint32 iByte = 0;
7280 while (iByte < bytesConsumedInPage) {
7281 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7282 if (iByte + segmentSize > bytesConsumedInPage) {
7283 break;
7284 } else {
7285 iSeg += 1;
7286 iByte += segmentSize;
7287 }
7288 }
7290 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7291 return iSeg;
7292}
7294static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7295{
7296 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7297 for (;;) {
7298 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7300 drflac_uint8 bytesRemainingInSeg;
7301 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7303 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7304 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7305 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7306 if (segmentSize < 255) {
7307 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7308 atEndOfPage = DRFLAC_TRUE;
7309 }
7311 break;
7312 }
7314 bytesToEndOfPacketOrPage += segmentSize;
7315 }
7317 /*
        At this point we will have found either the packet or the end of the page. If we're at the end of the page we'll
7319 want to load the next page and keep searching for the end of the packet.
7320 */
7321 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, DRFLAC_SEEK_CUR);
7322 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7324 if (atEndOfPage) {
7325 /*
7326 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7327 straddle pages.
7328 */
            if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7330 return DRFLAC_FALSE;
7331 }
7333 /* If it's a fresh packet it most likely means we're at the next packet. */
7334 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7335 return DRFLAC_TRUE;
7336 }
7337 } else {
7338 /* We're at the next packet. */
7339 return DRFLAC_TRUE;
7340 }
7341 }
7342}
7344static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7345{
7346 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7348 /* What we're actually doing here is seeking to the start of the next packet. */
7349 return drflac_oggbs__seek_to_next_packet(oggbs);
7350}
7351#endif
7353static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7354{
7355 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7356 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7357 size_t bytesRead = 0;
7359 DRFLAC_ASSERT(oggbs != NULL);
7360 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7362 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7363 while (bytesRead < bytesToRead) {
7364 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7366 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7367 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7368 bytesRead += bytesRemainingToRead;
7369 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7370 break;
7371 }
7373 /* If we get here it means some of the requested data is contained in the next pages. */
7374 if (oggbs->bytesRemainingInPage > 0) {
7375 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7376 bytesRead += oggbs->bytesRemainingInPage;
7377 pRunningBufferOut += oggbs->bytesRemainingInPage;
7378 oggbs->bytesRemainingInPage = 0;
7379 }
7381 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7382 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7383 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7384 }
7385 }
7387 return bytesRead;
7388}
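
/*
drflac__on_read_ogg() hides page boundaries from the core decoder: it copies whatever is left in the current page, loads the
next page when more is needed, and reports a short read once the pages run out. The sketch below models that loop with page
bodies held in memory rather than read through callbacks. It skips headers, serial numbers and CRC checks entirely, and all
of the example_* names are hypothetical and are not part of dr_flac.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical in-memory stand-in for drflac_oggbs.
typedef struct
{
    const uint8_t* const* ppPages;      // One pointer per page body.
    const size_t* pPageSizes;           // Size in bytes of each page body.
    size_t pageCount;
    size_t nextPage;                    // Index of the next page to load.
    const uint8_t* pPageData;           // Body of the current page.
    size_t pageDataSize;
    size_t bytesRemainingInPage;
} example_paged_stream;

static int example_goto_next_page(example_paged_stream* pStream)
{
    if (pStream->nextPage == pStream->pageCount) {
        return 0;   // No more pages.
    }

    pStream->pPageData            = pStream->ppPages[pStream->nextPage];
    pStream->pageDataSize         = pStream->pPageSizes[pStream->nextPage];
    pStream->bytesRemainingInPage = pStream->pageDataSize;
    pStream->nextPage            += 1;
    return 1;
}

static size_t example_read(example_paged_stream* pStream, void* pBufferOut, size_t bytesToRead)
{
    uint8_t* pOut = (uint8_t*)pBufferOut;
    size_t bytesRead = 0;

    while (bytesRead < bytesToRead) {
        size_t bytesWanted = bytesToRead - bytesRead;
        size_t bytesToCopy = (pStream->bytesRemainingInPage < bytesWanted) ? pStream->bytesRemainingInPage : bytesWanted;

        if (bytesToCopy > 0) {
            memcpy(pOut + bytesRead, pStream->pPageData + (pStream->pageDataSize - pStream->bytesRemainingInPage), bytesToCopy);
            bytesRead += bytesToCopy;
            pStream->bytesRemainingInPage -= bytesToCopy;
        }

        // Still short, so the rest has to come from the next page.
        if (bytesRead < bytesToRead && !example_goto_next_page(pStream)) {
            break;
        }
    }

    return bytesRead;
}
```

With a 3-byte page followed by a 2-byte page, a 4-byte read spans both pages and returns 4, and a second 4-byte read returns
only the 1 byte that is left.
*/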
7390static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7391{
7392 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7393 int bytesSeeked = 0;
7395 DRFLAC_ASSERT(oggbs != NULL);
7396 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
7398 /* Seeking is always forward which makes things a lot simpler. */
7399 if (origin == DRFLAC_SEEK_SET) {
        if (!drflac_oggbs__seek_physical(oggbs, oggbs->firstBytePos, DRFLAC_SEEK_SET)) { /* <-- No cast. drflac_oggbs__seek_physical() takes a 64-bit offset. */
7401 return DRFLAC_FALSE;
7402 }
7404 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7405 return DRFLAC_FALSE;
7406 }
7408 return drflac__on_seek_ogg(pUserData, offset, DRFLAC_SEEK_CUR);
7409 } else if (origin == DRFLAC_SEEK_CUR) {
7410 while (bytesSeeked < offset) {
7411 int bytesRemainingToSeek = offset - bytesSeeked;
7412 DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
7414 if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7415 bytesSeeked += bytesRemainingToSeek;
7416 (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
7417 oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7418 break;
7419 }
7421 /* If we get here it means some of the requested data is contained in the next pages. */
7422 if (oggbs->bytesRemainingInPage > 0) {
7423 bytesSeeked += (int)oggbs->bytesRemainingInPage;
7424 oggbs->bytesRemainingInPage = 0;
7425 }
7427 DRFLAC_ASSERT(bytesRemainingToSeek > 0);
7428 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7429 /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
7430 return DRFLAC_FALSE;
7431 }
7432 }
7433 } else if (origin == DRFLAC_SEEK_END) {
7434 /* Seeking to the end is not supported. */
7435 return DRFLAC_FALSE;
7436 }
7438 return DRFLAC_TRUE;
7439}
7441static drflac_bool32 drflac__on_tell_ogg(void* pUserData, drflac_int64* pCursor)
7442{
7443 /*
7444 Not implemented for Ogg containers because we don't currently track the byte position of the logical bitstream. To support this, we'll need
7445 to track the position in drflac__on_read_ogg and drflac__on_seek_ogg.
7446 */
7447 (void)pUserData;
7448 (void)pCursor;
7449 return DRFLAC_FALSE;
7450}
7453static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7454{
7455 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7456 drflac_uint64 originalBytePos;
7457 drflac_uint64 runningGranulePosition;
7458 drflac_uint64 runningFrameBytePos;
7459 drflac_uint64 runningPCMFrameCount;
7461 DRFLAC_ASSERT(oggbs != NULL);
7463 originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
7465 /* First seek to the first frame. */
7466 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7467 return DRFLAC_FALSE;
7468 }
7469 oggbs->bytesRemainingInPage = 0;
7471 runningGranulePosition = 0;
7472 for (;;) {
7473 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7474 drflac_oggbs__seek_physical(oggbs, originalBytePos, DRFLAC_SEEK_SET);
7475 return DRFLAC_FALSE; /* Never did find that sample... */
7476 }
7478 runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7479 if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7480 break; /* The sample is somewhere in the previous page. */
7481 }
7483 /*
7484 At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7485 disregard any pages that do not begin a fresh packet.
7486 */
7487 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
7488 if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
7489 drflac_uint8 firstBytesInPage[2];
7490 firstBytesInPage[0] = oggbs->pageData[0];
7491 firstBytesInPage[1] = oggbs->pageData[1];
7493 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
7494 runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7495 }
7497 continue;
7498 }
7499 }
7500 }
7502 /*
    We found the page that is closest to the sample, so now we need to find it. The first thing to do is seek to the
7504 start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7505 a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7506 we find the one containing the target sample.
7507 */
7508 if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, DRFLAC_SEEK_SET)) {
7509 return DRFLAC_FALSE;
7510 }
7511 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7512 return DRFLAC_FALSE;
7513 }
7515 /*
7516 At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7517 looping over these frames until we find the one containing the sample we're after.
7518 */
7519 runningPCMFrameCount = runningGranulePosition;
7520 for (;;) {
7521 /*
7522 There are two ways to find the sample and seek past irrelevant frames:
7523 1) Use the native FLAC decoder.
7524 2) Use Ogg's framing system.
7526 Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7527 do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7528 duplication for the decoding of frame headers.
7530 Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7531 bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7532 standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
7533 the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7534 using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
7535 avoid the use of the drflac_bs object.
7537 Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7538 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7539 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7540 3) Simplicity.
7541 */
7542 drflac_uint64 firstPCMFrameInFLACFrame = 0;
7543 drflac_uint64 lastPCMFrameInFLACFrame = 0;
7544 drflac_uint64 pcmFrameCountInThisFrame;
7546 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7547 return DRFLAC_FALSE;
7548 }
7550 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7552 pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
7554 /* If we are seeking to the end of the file and we've just hit it, we're done. */
7555 if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7556 drflac_result result = drflac__decode_flac_frame(pFlac);
7557 if (result == DRFLAC_SUCCESS) {
7558 pFlac->currentPCMFrame = pcmFrameIndex;
7559 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
7560 return DRFLAC_TRUE;
7561 } else {
7562 return DRFLAC_FALSE;
7563 }
7564 }
7566 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7567 /*
7568 The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7569 it never existed and keep iterating.
7570 */
7571 drflac_result result = drflac__decode_flac_frame(pFlac);
7572 if (result == DRFLAC_SUCCESS) {
7573 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
7574 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
7575 if (pcmFramesToDecode == 0) {
7576 return DRFLAC_TRUE;
7577 }
7579 pFlac->currentPCMFrame = runningPCMFrameCount;
7581 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
7582 } else {
7583 if (result == DRFLAC_CRC_MISMATCH) {
7584 continue; /* CRC mismatch. Pretend this frame never existed. */
7585 } else {
7586 return DRFLAC_FALSE;
7587 }
7588 }
7589 } else {
7590 /*
7591 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7592 frame never existed and leave the running sample count untouched.
7593 */
7594 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7595 if (result == DRFLAC_SUCCESS) {
7596 runningPCMFrameCount += pcmFrameCountInThisFrame;
7597 } else {
7598 if (result == DRFLAC_CRC_MISMATCH) {
7599 continue; /* CRC mismatch. Pretend this frame never existed. */
7600 } else {
7601 return DRFLAC_FALSE;
7602 }
7603 }
7604 }
7605 }
7606}
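/*
Illustrative sketch only (not part of the library): the frame-walking loop above reduces to a linear search over
per-frame PCM frame counts. The names `example_frame` and `example_find_frame` are hypothetical, and the `crcOK`
flag stands in for a decode that would otherwise report DRFLAC_CRC_MISMATCH. As in the code above, a frame that
fails its CRC is skipped without advancing the running count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a FLAC frame: its length in PCM frames and whether its CRC validated.
typedef struct {
    uint32_t pcmFrameCount;
    int crcOK;
} example_frame;

// Returns the index of the frame containing `target`, or -1 if no frame contains it.
// On success, *pOffset receives the number of PCM frames to skip inside that frame.
// `running` is the PCM frame index of the first frame in the list (the page's starting position).
static int example_find_frame(const example_frame* pFrames, size_t frameCount, uint64_t running, uint64_t target, uint64_t* pOffset)
{
    size_t i;

    assert(pFrames != NULL && pOffset != NULL);
    assert(running <= target);

    for (i = 0; i < frameCount; i += 1) {
        if (!pFrames[i].crcOK) {
            continue;   // Pretend the corrupt frame never existed; the running count is untouched.
        }

        if (target < running + pFrames[i].pcmFrameCount) {
            *pOffset = target - running;
            return (int)i;
        }

        running += pFrames[i].pcmFrameCount;
    }

    return -1;
}
```

With three 4096-frame blocks where the second one is corrupt, a search for PCM frame 5000 lands in the third block
(index 2) at offset 904, because the corrupt block contributes nothing to the running count.
*/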
7610static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7611{
7612 drflac_ogg_page_header header;
7613 drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7614 drflac_uint32 bytesRead = 0;
7616 /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
7617 (void)relaxed;
7619 pInit->container = drflac_container_ogg;
7620 pInit->oggFirstBytePos = 0;
7622 /*
7623 We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7624 stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7625 any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7626 */
7627 if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7628 return DRFLAC_FALSE;
7629 }
7630 pInit->runningFilePos += bytesRead;
7632 for (;;) {
7633 int pageBodySize;
        /* Fail if we've moved past the beginning-of-stream pages without finding a FLAC stream. */
7636 if ((header.headerType & 0x02) == 0) {
7637 return DRFLAC_FALSE;
7638 }
7640 /* Check if it's a FLAC header. */
7641 pageBodySize = drflac_ogg__get_page_body_size(&header);
7642 if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
7643 /* It could be a FLAC page... */
7644 drflac_uint32 bytesRemainingInPage = pageBodySize;
7645 drflac_uint8 packetType;
7647 if (onRead(pUserData, &packetType, 1) != 1) {
7648 return DRFLAC_FALSE;
7649 }
7651 bytesRemainingInPage -= 1;
7652 if (packetType == 0x7F) {
7653 /* Increasingly more likely to be a FLAC page... */
7654 drflac_uint8 sig[4];
7655 if (onRead(pUserData, sig, 4) != 4) {
7656 return DRFLAC_FALSE;
7657 }
7659 bytesRemainingInPage -= 4;
7660 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
7661 /* Almost certainly a FLAC page... */
7662 drflac_uint8 mappingVersion[2];
7663 if (onRead(pUserData, mappingVersion, 2) != 2) {
7664 return DRFLAC_FALSE;
7665 }
7667 if (mappingVersion[0] != 1) {
7668 return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
7669 }
7671 /*
                    The next 2 bytes are the number of non-audio (header) packets, not including this one. We don't care about this because we're going to
                    be handling it in a generic way based on the serial number and packet types.
7674 */
7675 if (!onSeek(pUserData, 2, DRFLAC_SEEK_CUR)) {
7676 return DRFLAC_FALSE;
7677 }
7679 /* Expecting the native FLAC signature "fLaC". */
7680 if (onRead(pUserData, sig, 4) != 4) {
7681 return DRFLAC_FALSE;
7682 }
7684 if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
7685 /* The remaining data in the page should be the STREAMINFO block. */
7686 drflac_streaminfo streaminfo;
7687 drflac_uint8 isLastBlock;
7688 drflac_uint8 blockType;
7689 drflac_uint32 blockSize;
7690 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
7691 return DRFLAC_FALSE;
7692 }
7694 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
7695 return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
7696 }
7698 if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
7699 /* Success! */
7700 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
7701 pInit->sampleRate = streaminfo.sampleRate;
7702 pInit->channels = streaminfo.channels;
7703 pInit->bitsPerSample = streaminfo.bitsPerSample;
7704 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
7705 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
7706 pInit->hasMetadataBlocks = !isLastBlock;
7708 if (onMeta) {
7709 drflac_metadata metadata;
7710 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
7711 metadata.pRawData = NULL;
7712 metadata.rawDataSize = 0;
7713 metadata.data.streaminfo = streaminfo;
7714 onMeta(pUserDataMD, &metadata);
7715 }
7717 pInit->runningFilePos += pageBodySize;
7718 pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
7719 pInit->oggSerial = header.serialNumber;
7720 pInit->oggBosHeader = header;
7721 break;
7722 } else {
7723 /* Failed to read STREAMINFO block. Aww, so close... */
7724 return DRFLAC_FALSE;
7725 }
7726 } else {
7727 /* Invalid file. */
7728 return DRFLAC_FALSE;
7729 }
7730 } else {
7731 /* Not a FLAC header. Skip it. */
7732 if (!onSeek(pUserData, bytesRemainingInPage, DRFLAC_SEEK_CUR)) {
7733 return DRFLAC_FALSE;
7734 }
7735 }
7736 } else {
7737 /* Not a FLAC header. Seek past the entire page and move on to the next. */
7738 if (!onSeek(pUserData, bytesRemainingInPage, DRFLAC_SEEK_CUR)) {
7739 return DRFLAC_FALSE;
7740 }
7741 }
7742 } else {
7743 if (!onSeek(pUserData, pageBodySize, DRFLAC_SEEK_CUR)) {
7744 return DRFLAC_FALSE;
7745 }
7746 }
7748 pInit->runningFilePos += pageBodySize;
7751 /* Read the header of the next page. */
7752 if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7753 return DRFLAC_FALSE;
7754 }
7755 pInit->runningFilePos += bytesRead;
7756 }
7758 /*
7759 If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
7760 packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
    Ogg bitstream object.
7762 */
7763 pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
7764 return DRFLAC_TRUE;
7765}
7766#endif
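/*
Illustrative sketch only (not part of the library): the checks performed on the first Ogg packet above can be
expressed against an in-memory buffer. `example_is_ogg_flac_header` is a hypothetical helper. The byte offsets follow
the order in which the code above reads the fields: 1 byte packet type, 4 bytes "FLAC", 2 bytes mapping version,
2 bytes header packet count, 4 bytes "fLaC", a 4-byte metadata block header and the 34-byte STREAMINFO body, for a
total of 51 bytes. The 79 subtracted above is then consistent with a 27-byte Ogg page header plus a 1-byte segment
table in front of that 51-byte body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_OGG_FLAC_HEADER_PACKET_SIZE 51  // 1 + 4 + 2 + 2 + 4 + 4 + 34

// Returns 1 if the packet looks like an Ogg FLAC identification packet, 0 otherwise.
static int example_is_ogg_flac_header(const uint8_t* pPacket, size_t packetSize)
{
    uint32_t blockSize;

    assert(pPacket != NULL);

    if (packetSize != EXAMPLE_OGG_FLAC_HEADER_PACKET_SIZE) return 0;
    if (pPacket[0] != 0x7F) return 0;                   // Packet type.
    if (memcmp(pPacket + 1, "FLAC", 4) != 0) return 0;  // Ogg mapping signature.
    if (pPacket[5] != 1) return 0;                      // Major mapping version. Only 1.x is handled.

    // Byte 6 (minor version) and bytes 7..8 (header packet count) are deliberately not inspected.

    if (memcmp(pPacket + 9, "fLaC", 4) != 0) return 0;  // Native FLAC signature.

    // Metadata block header: the top bit of the first byte is the last-block flag and the low
    // 7 bits are the block type (0 = STREAMINFO). The next 3 bytes are the big-endian block size.
    if ((pPacket[13] & 0x7F) != 0) return 0;
    blockSize = ((uint32_t)pPacket[14] << 16) | ((uint32_t)pPacket[15] << 8) | (uint32_t)pPacket[16];
    if (blockSize != 34) return 0;

    return 1;
}
```

Unlike the code above, this sketch does not decode the STREAMINFO body or handle multiplexed streams. It only shows
which bytes are compared.
*/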
7768static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
7769{
7770 drflac_bool32 relaxed;
7771 drflac_uint8 id[4];
7773 if (pInit == NULL || onRead == NULL || onSeek == NULL) { /* <-- onTell is optional. */
7774 return DRFLAC_FALSE;
7775 }
7777 DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
7778 pInit->onRead = onRead;
7779 pInit->onSeek = onSeek;
7780 pInit->onTell = onTell;
7781 pInit->onMeta = onMeta;
7782 pInit->container = container;
7783 pInit->pUserData = pUserData;
7784 pInit->pUserDataMD = pUserDataMD;
7786 pInit->bs.onRead = onRead;
7787 pInit->bs.onSeek = onSeek;
7788 pInit->bs.onTell = onTell;
7789 pInit->bs.pUserData = pUserData;
7790 drflac__reset_cache(&pInit->bs);
7793 /* If the container is explicitly defined then we can try opening in relaxed mode. */
7794 relaxed = container != drflac_container_unknown;
7796 /* Skip over any ID3 tags. */
7797 for (;;) {
7798 if (onRead(pUserData, id, 4) != 4) {
7799 return DRFLAC_FALSE; /* Ran out of data. */
7800 }
7801 pInit->runningFilePos += 4;
7803 if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
7804 drflac_uint8 header[6];
7805 drflac_uint8 flags;
7806 drflac_uint32 headerSize;
7808 if (onRead(pUserData, header, 6) != 6) {
7809 return DRFLAC_FALSE; /* Ran out of data. */
7810 }
7811 pInit->runningFilePos += 6;
7813 flags = header[1];
7815 DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
7816 headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
7817 if (flags & 0x10) {
7818 headerSize += 10;
7819 }
7821 if (!onSeek(pUserData, headerSize, DRFLAC_SEEK_CUR)) {
7822 return DRFLAC_FALSE; /* Failed to seek past the tag. */
7823 }
7824 pInit->runningFilePos += headerSize;
7825 } else {
7826 break;
7827 }
7828 }
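/*
Illustrative sketch only (not part of the library): skipping an ID3v2 tag, as the loop above does, comes down to
decoding a synchsafe size and adding 10 bytes when the footer flag (0x10) is set. The helper names are hypothetical,
and the sketch works on a raw 10-byte header rather than on the split 4 + 6 byte reads used above.

```c
#include <assert.h>
#include <stdint.h>

// Decodes a 28-bit synchsafe integer stored as four big-endian bytes carrying 7 payload bits each.
static uint32_t example_unsynchsafe_32(const uint8_t b[4])
{
    return ((uint32_t)(b[0] & 0x7F) << 21) |
           ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) <<  7) |
           ((uint32_t)(b[3] & 0x7F) <<  0);
}

// Given the 10-byte ID3v2 header, returns how many bytes after the header still belong to the tag.
// Layout: "ID3", major version, revision, flags, then 4 synchsafe size bytes.
static uint32_t example_id3v2_bytes_to_skip(const uint8_t header[10])
{
    uint32_t size;

    assert(header[0] == 'I' && header[1] == 'D' && header[2] == '3');

    size = example_unsynchsafe_32(header + 6);
    if (header[5] & 0x10) {
        size += 10; // A footer is present and has the same size as the header.
    }

    return size;
}
```

For size bytes 00 00 02 01 the decoded size is (2 << 7) | 1 = 257 bytes, or 267 bytes when the footer flag is set.
*/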
7830 if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
7831 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7832 }
7833#ifndef DR_FLAC_NO_OGG
7834 if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
7835 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7836 }
7837#endif
7839 /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
7840 if (relaxed) {
7841 if (container == drflac_container_native) {
7842 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7843 }
7844#ifndef DR_FLAC_NO_OGG
7845 if (container == drflac_container_ogg) {
7846 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7847 }
7848#endif
7849 }
7851 /* Unsupported container. */
7852 return DRFLAC_FALSE;
7853}
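/*
Illustrative sketch only (not part of the library): the dispatch at the end of drflac__init_private() can be read as
"a recognized signature always wins, otherwise fall back to the caller's explicit container". The enum and function
below are hypothetical stand-ins for that decision and do no I/O.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    example_container_unknown,
    example_container_native,
    example_container_ogg
} example_container;

// `hint` is the container the caller asked for. example_container_unknown means "detect it for me".
// Returns the container to initialize, or example_container_unknown if the stream is unsupported.
static example_container example_detect_container(const uint8_t id[4], example_container hint)
{
    assert(id != NULL);

    if (memcmp(id, "fLaC", 4) == 0) return example_container_native;
    if (memcmp(id, "OggS", 4) == 0) return example_container_ogg;

    // No recognizable signature. Only an explicit hint allows a relaxed-mode attempt.
    return hint;
}
```

A headerless stream is therefore only attempted when the caller names the container explicitly. With an unknown
hint the result stays unknown, which corresponds to the "Unsupported container" return above.
*/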
7855static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
7856{
7857 DRFLAC_ASSERT(pFlac != NULL);
7858 DRFLAC_ASSERT(pInit != NULL);
7860 DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
7861 pFlac->bs = pInit->bs;
7862 pFlac->onMeta = pInit->onMeta;
7863 pFlac->pUserDataMD = pInit->pUserDataMD;
7864 pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
7865 pFlac->sampleRate = pInit->sampleRate;
7866 pFlac->channels = (drflac_uint8)pInit->channels;
7867 pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
7868 pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
7869 pFlac->container = pInit->container;
7870}
7873static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
7874{
7875 drflac_init_info init;
7876 drflac_uint32 allocationSize;
7877 drflac_uint32 wholeSIMDVectorCountPerChannel;
7878 drflac_uint32 decodedSamplesAllocationSize;
7879#ifndef DR_FLAC_NO_OGG
7880 drflac_oggbs* pOggbs = NULL;
7881#endif
7882 drflac_uint64 firstFramePos;
7883 drflac_uint64 seektablePos;
7884 drflac_uint32 seekpointCount;
7885 drflac_allocation_callbacks allocationCallbacks;
7886 drflac* pFlac;
7888 /* CPU support first. */
7889 drflac__init_cpu_caps();
7891 if (!drflac__init_private(&init, onRead, onSeek, onTell, onMeta, container, pUserData, pUserDataMD)) {
7892 return NULL;
7893 }
7895 if (pAllocationCallbacks != NULL) {
7896 allocationCallbacks = *pAllocationCallbacks;
7897 if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
7898 return NULL; /* Invalid allocation callbacks. */
7899 }
7900 } else {
7901 allocationCallbacks.pUserData = NULL;
7902 allocationCallbacks.onMalloc = drflac__malloc_default;
7903 allocationCallbacks.onRealloc = drflac__realloc_default;
7904 allocationCallbacks.onFree = drflac__free_default;
7905 }
7908 /*
7909 The size of the allocation for the drflac object needs to be large enough to fit the following:
7910 1) The main members of the drflac structure
7911 2) A block of memory large enough to store the decoded samples of the largest frame in the stream
7912 3) If the container is Ogg, a drflac_oggbs object
    The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
7915 the different SIMD instruction sets.
7916 */
7917 allocationSize = sizeof(drflac);
7919 /*
7920 The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
7921 we are supporting.
7922 */
7923 if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
7924 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
7925 } else {
7926 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
7927 }
7929 decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
7931 allocationSize += decodedSamplesAllocationSize;
7932 allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
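/*
Illustrative sketch only (not part of the library): the size computed above pads each channel up to a whole number
of SIMD vectors. The same arithmetic can be written with a round-up division. `example_decoded_samples_size` is a
hypothetical helper, and the vector size is passed in as a parameter instead of using DRFLAC_MAX_SIMD_VECTOR_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bytes needed for decoded samples, with each channel padded up to a whole number of SIMD vectors.
static size_t example_decoded_samples_size(uint32_t maxBlockSizeInPCMFrames, uint32_t channels, size_t simdVectorSizeInBytes)
{
    size_t samplesPerVector;
    size_t vectorsPerChannel;

    assert(simdVectorSizeInBytes >= sizeof(int32_t));
    assert((simdVectorSizeInBytes % sizeof(int32_t)) == 0);

    samplesPerVector  = simdVectorSizeInBytes / sizeof(int32_t);
    vectorsPerChannel = (maxBlockSizeInPCMFrames + samplesPerVector - 1) / samplesPerVector;   // Round up.

    return vectorsPerChannel * simdVectorSizeInBytes * channels;
}
```

With a 64-byte vector (an example value only), a 4096-frame stereo block needs 32768 bytes, while a 4097-frame block
rounds up to 257 vectors per channel and needs 32896 bytes. The extra vector of slack added above for alignment is
not included in this figure.
*/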
7934#ifndef DR_FLAC_NO_OGG
7935 /* There's additional data required for Ogg streams. */
7936 if (init.container == drflac_container_ogg) {
7937 allocationSize += sizeof(drflac_oggbs);
7939 pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
7940 if (pOggbs == NULL) {
7941 return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
7942 }
7944 DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
7945 pOggbs->onRead = onRead;
7946 pOggbs->onSeek = onSeek;
7947 pOggbs->onTell = onTell;
7948 pOggbs->pUserData = pUserData;
7949 pOggbs->currentBytePos = init.oggFirstBytePos;
7950 pOggbs->firstBytePos = init.oggFirstBytePos;
7951 pOggbs->serialNumber = init.oggSerial;
7952 pOggbs->bosPageHeader = init.oggBosHeader;
7953 pOggbs->bytesRemainingInPage = 0;
7954 }
7955#endif
7957 /*
7958 This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
    consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
7960 and decoding the metadata.
7961 */
7962 firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
7963 seektablePos = 0;
7964 seekpointCount = 0;
7965 if (init.hasMetadataBlocks) {
7966 drflac_read_proc onReadOverride = onRead;
7967 drflac_seek_proc onSeekOverride = onSeek;
7968 drflac_tell_proc onTellOverride = onTell;
7969 void* pUserDataOverride = pUserData;
7971#ifndef DR_FLAC_NO_OGG
7972 if (init.container == drflac_container_ogg) {
7973 onReadOverride = drflac__on_read_ogg;
7974 onSeekOverride = drflac__on_seek_ogg;
7975 onTellOverride = drflac__on_tell_ogg;
7976 pUserDataOverride = (void*)pOggbs;
7977 }
7978#endif
7980 if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onTellOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
7981 #ifndef DR_FLAC_NO_OGG
7982 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
7983 #endif
7984 return NULL;
7985 }
7987 allocationSize += seekpointCount * sizeof(drflac_seekpoint);
7988 }
7991 pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
7992 if (pFlac == NULL) {
7993 #ifndef DR_FLAC_NO_OGG
7994 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
7995 #endif
7996 return NULL;
7997 }
7999 drflac__init_from_info(pFlac, &init);
8000 pFlac->allocationCallbacks = allocationCallbacks;
8001 pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
8003#ifndef DR_FLAC_NO_OGG
8004 if (init.container == drflac_container_ogg) {
8005 drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
8006 DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));
8008 /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
8009 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8010 pOggbs = NULL;
        /* The Ogg bitstream needs to be layered on top of the original bitstream. */
8013 pFlac->bs.onRead = drflac__on_read_ogg;
8014 pFlac->bs.onSeek = drflac__on_seek_ogg;
8015 pFlac->bs.onTell = drflac__on_tell_ogg;
8016 pFlac->bs.pUserData = (void*)pInternalOggbs;
8017 pFlac->_oggbs = (void*)pInternalOggbs;
8018 }
8019#endif
8021 pFlac->firstFLACFramePosInBytes = firstFramePos;
8023 /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
8024#ifndef DR_FLAC_NO_OGG
8025 if (init.container == drflac_container_ogg)
8026 {
8027 pFlac->pSeekpoints = NULL;
8028 pFlac->seekpointCount = 0;
8029 }
8030 else
8031#endif
8032 {
8033 /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
8034 if (seektablePos != 0) {
8035 pFlac->seekpointCount = seekpointCount;
8036 pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
8038 DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
8039 DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
8041 /* Seek to the seektable, then just read directly into our seektable buffer. */
8042 if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, DRFLAC_SEEK_SET)) {
8043 drflac_uint32 iSeekpoint;
8045 for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
8046 if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
8047 /* Endian swap. */
8048 pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
8049 pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
8050 pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
8051 } else {
8052 /* Failed to read the seektable. Pretend we don't have one. */
8053 pFlac->pSeekpoints = NULL;
8054 pFlac->seekpointCount = 0;
8055 break;
8056 }
8057 }
8059 /* We need to seek back to where we were. If this fails it's a critical error. */
8060 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, DRFLAC_SEEK_SET)) {
8061 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8062 return NULL;
8063 }
8064 } else {
8065 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
8066 pFlac->pSeekpoints = NULL;
8067 pFlac->seekpointCount = 0;
8068 }
8069 }
8070 }
8073 /*
8074 If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
8075 the first frame.
8076 */
8077 if (!init.hasStreamInfoBlock) {
8078 pFlac->currentFLACFrame.header = init.firstFrameHeader;
8079 for (;;) {
8080 drflac_result result = drflac__decode_flac_frame(pFlac);
8081 if (result == DRFLAC_SUCCESS) {
8082 break;
8083 } else {
8084 if (result == DRFLAC_CRC_MISMATCH) {
8085 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
8086 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8087 return NULL;
8088 }
8089 continue;
8090 } else {
8091 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8092 return NULL;
8093 }
8094 }
8095 }
8096 }
8098 return pFlac;
8099}
8103#ifndef DR_FLAC_NO_STDIO
8104#include <stdio.h>
8105#ifndef DR_FLAC_NO_WCHAR
8106#include <wchar.h> /* For wcslen(), wcsrtombs() */
8107#endif
8109/* Errno */
/* drflac_result_from_errno() is only used for fopen() and _wfopen() so putting it inside DR_FLAC_NO_STDIO for now. If something else needs this later we can move it out. */
8111#include <errno.h>
8112static drflac_result drflac_result_from_errno(int e)
8113{
8114 switch (e)
8115 {
8116 case 0: return DRFLAC_SUCCESS;
8117 #ifdef EPERM
8118 case EPERM: return DRFLAC_INVALID_OPERATION;
8119 #endif
8120 #ifdef ENOENT
8121 case ENOENT: return DRFLAC_DOES_NOT_EXIST;
8122 #endif
8123 #ifdef ESRCH
8124 case ESRCH: return DRFLAC_DOES_NOT_EXIST;
8125 #endif
8126 #ifdef EINTR
8127 case EINTR: return DRFLAC_INTERRUPT;
8128 #endif
8129 #ifdef EIO
8130 case EIO: return DRFLAC_IO_ERROR;
8131 #endif
8132 #ifdef ENXIO
8133 case ENXIO: return DRFLAC_DOES_NOT_EXIST;
8134 #endif
8135 #ifdef E2BIG
8136 case E2BIG: return DRFLAC_INVALID_ARGS;
8137 #endif
8138 #ifdef ENOEXEC
8139 case ENOEXEC: return DRFLAC_INVALID_FILE;
8140 #endif
8141 #ifdef EBADF
8142 case EBADF: return DRFLAC_INVALID_FILE;
8143 #endif
8144 #ifdef ECHILD
8145 case ECHILD: return DRFLAC_ERROR;
8146 #endif
8147 #ifdef EAGAIN
8148 case EAGAIN: return DRFLAC_UNAVAILABLE;
8149 #endif
8150 #ifdef ENOMEM
8151 case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
8152 #endif
8153 #ifdef EACCES
8154 case EACCES: return DRFLAC_ACCESS_DENIED;
8155 #endif
8156 #ifdef EFAULT
8157 case EFAULT: return DRFLAC_BAD_ADDRESS;
8158 #endif
8159 #ifdef ENOTBLK
8160 case ENOTBLK: return DRFLAC_ERROR;
8161 #endif
8162 #ifdef EBUSY
8163 case EBUSY: return DRFLAC_BUSY;
8164 #endif
8165 #ifdef EEXIST
8166 case EEXIST: return DRFLAC_ALREADY_EXISTS;
8167 #endif
8168 #ifdef EXDEV
8169 case EXDEV: return DRFLAC_ERROR;
8170 #endif
8171 #ifdef ENODEV
8172 case ENODEV: return DRFLAC_DOES_NOT_EXIST;
8173 #endif
8174 #ifdef ENOTDIR
8175 case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
8176 #endif
8177 #ifdef EISDIR
8178 case EISDIR: return DRFLAC_IS_DIRECTORY;
8179 #endif
8180 #ifdef EINVAL
8181 case EINVAL: return DRFLAC_INVALID_ARGS;
8182 #endif
8183 #ifdef ENFILE
8184 case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8185 #endif
8186 #ifdef EMFILE
8187 case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8188 #endif
8189 #ifdef ENOTTY
8190 case ENOTTY: return DRFLAC_INVALID_OPERATION;
8191 #endif
8192 #ifdef ETXTBSY
8193 case ETXTBSY: return DRFLAC_BUSY;
8194 #endif
8195 #ifdef EFBIG
8196 case EFBIG: return DRFLAC_TOO_BIG;
8197 #endif
8198 #ifdef ENOSPC
8199 case ENOSPC: return DRFLAC_NO_SPACE;
8200 #endif
8201 #ifdef ESPIPE
8202 case ESPIPE: return DRFLAC_BAD_SEEK;
8203 #endif
8204 #ifdef EROFS
8205 case EROFS: return DRFLAC_ACCESS_DENIED;
8206 #endif
8207 #ifdef EMLINK
8208 case EMLINK: return DRFLAC_TOO_MANY_LINKS;
8209 #endif
8210 #ifdef EPIPE
8211 case EPIPE: return DRFLAC_BAD_PIPE;
8212 #endif
8213 #ifdef EDOM
8214 case EDOM: return DRFLAC_OUT_OF_RANGE;
8215 #endif
8216 #ifdef ERANGE
8217 case ERANGE: return DRFLAC_OUT_OF_RANGE;
8218 #endif
8219 #ifdef EDEADLK
8220 case EDEADLK: return DRFLAC_DEADLOCK;
8221 #endif
8222 #ifdef ENAMETOOLONG
8223 case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
8224 #endif
8225 #ifdef ENOLCK
8226 case ENOLCK: return DRFLAC_ERROR;
8227 #endif
8228 #ifdef ENOSYS
8229 case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
8230 #endif
8231 #if defined(ENOTEMPTY) && ENOTEMPTY != EEXIST /* In AIX, ENOTEMPTY and EEXIST use the same value. */
8232 case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
8233 #endif
8234 #ifdef ELOOP
8235 case ELOOP: return DRFLAC_TOO_MANY_LINKS;
8236 #endif
8237 #ifdef ENOMSG
8238 case ENOMSG: return DRFLAC_NO_MESSAGE;
8239 #endif
8240 #ifdef EIDRM
8241 case EIDRM: return DRFLAC_ERROR;
8242 #endif
8243 #ifdef ECHRNG
8244 case ECHRNG: return DRFLAC_ERROR;
8245 #endif
8246 #ifdef EL2NSYNC
8247 case EL2NSYNC: return DRFLAC_ERROR;
8248 #endif
8249 #ifdef EL3HLT
8250 case EL3HLT: return DRFLAC_ERROR;
8251 #endif
8252 #ifdef EL3RST
8253 case EL3RST: return DRFLAC_ERROR;
8254 #endif
8255 #ifdef ELNRNG
8256 case ELNRNG: return DRFLAC_OUT_OF_RANGE;
8257 #endif
8258 #ifdef EUNATCH
8259 case EUNATCH: return DRFLAC_ERROR;
8260 #endif
8261 #ifdef ENOCSI
8262 case ENOCSI: return DRFLAC_ERROR;
8263 #endif
8264 #ifdef EL2HLT
8265 case EL2HLT: return DRFLAC_ERROR;
8266 #endif
8267 #ifdef EBADE
8268 case EBADE: return DRFLAC_ERROR;
8269 #endif
8270 #ifdef EBADR
8271 case EBADR: return DRFLAC_ERROR;
8272 #endif
8273 #ifdef EXFULL
8274 case EXFULL: return DRFLAC_ERROR;
8275 #endif
8276 #ifdef ENOANO
8277 case ENOANO: return DRFLAC_ERROR;
8278 #endif
8279 #ifdef EBADRQC
8280 case EBADRQC: return DRFLAC_ERROR;
8281 #endif
8282 #ifdef EBADSLT
8283 case EBADSLT: return DRFLAC_ERROR;
8284 #endif
8285 #ifdef EBFONT
8286 case EBFONT: return DRFLAC_INVALID_FILE;
8287 #endif
8288 #ifdef ENOSTR
8289 case ENOSTR: return DRFLAC_ERROR;
8290 #endif
8291 #ifdef ENODATA
8292 case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
8293 #endif
8294 #ifdef ETIME
8295 case ETIME: return DRFLAC_TIMEOUT;
8296 #endif
8297 #ifdef ENOSR
8298 case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
8299 #endif
8300 #ifdef ENONET
8301 case ENONET: return DRFLAC_NO_NETWORK;
8302 #endif
8303 #ifdef ENOPKG
8304 case ENOPKG: return DRFLAC_ERROR;
8305 #endif
8306 #ifdef EREMOTE
8307 case EREMOTE: return DRFLAC_ERROR;
8308 #endif
8309 #ifdef ENOLINK
8310 case ENOLINK: return DRFLAC_ERROR;
8311 #endif
8312 #ifdef EADV
8313 case EADV: return DRFLAC_ERROR;
8314 #endif
8315 #ifdef ESRMNT
8316 case ESRMNT: return DRFLAC_ERROR;
8317 #endif
8318 #ifdef ECOMM
8319 case ECOMM: return DRFLAC_ERROR;
8320 #endif
8321 #ifdef EPROTO
8322 case EPROTO: return DRFLAC_ERROR;
8323 #endif
8324 #ifdef EMULTIHOP
8325 case EMULTIHOP: return DRFLAC_ERROR;
8326 #endif
8327 #ifdef EDOTDOT
8328 case EDOTDOT: return DRFLAC_ERROR;
8329 #endif
8330 #ifdef EBADMSG
8331 case EBADMSG: return DRFLAC_BAD_MESSAGE;
8332 #endif
8333 #ifdef EOVERFLOW
8334 case EOVERFLOW: return DRFLAC_TOO_BIG;
8335 #endif
8336 #ifdef ENOTUNIQ
8337 case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8338 #endif
8339 #ifdef EBADFD
8340 case EBADFD: return DRFLAC_ERROR;
8341 #endif
8342 #ifdef EREMCHG
8343 case EREMCHG: return DRFLAC_ERROR;
8344 #endif
8345 #ifdef ELIBACC
8346 case ELIBACC: return DRFLAC_ACCESS_DENIED;
8347 #endif
8348 #ifdef ELIBBAD
8349 case ELIBBAD: return DRFLAC_INVALID_FILE;
8350 #endif
8351 #ifdef ELIBSCN
8352 case ELIBSCN: return DRFLAC_INVALID_FILE;
8353 #endif
8354 #ifdef ELIBMAX
8355 case ELIBMAX: return DRFLAC_ERROR;
8356 #endif
8357 #ifdef ELIBEXEC
8358 case ELIBEXEC: return DRFLAC_ERROR;
8359 #endif
8360 #ifdef EILSEQ
8361 case EILSEQ: return DRFLAC_INVALID_DATA;
8362 #endif
8363 #ifdef ERESTART
8364 case ERESTART: return DRFLAC_ERROR;
8365 #endif
8366 #ifdef ESTRPIPE
8367 case ESTRPIPE: return DRFLAC_ERROR;
8368 #endif
8369 #ifdef EUSERS
8370 case EUSERS: return DRFLAC_ERROR;
8371 #endif
8372 #ifdef ENOTSOCK
8373 case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8374 #endif
8375 #ifdef EDESTADDRREQ
8376 case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8377 #endif
8378 #ifdef EMSGSIZE
8379 case EMSGSIZE: return DRFLAC_TOO_BIG;
8380 #endif
8381 #ifdef EPROTOTYPE
8382 case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8383 #endif
8384 #ifdef ENOPROTOOPT
8385 case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
8386 #endif
8387 #ifdef EPROTONOSUPPORT
8388 case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
8389 #endif
8390 #ifdef ESOCKTNOSUPPORT
8391 case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
8392 #endif
8393 #ifdef EOPNOTSUPP
8394 case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
8395 #endif
8396 #ifdef EPFNOSUPPORT
8397 case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
8398 #endif
8399 #ifdef EAFNOSUPPORT
8400 case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
8401 #endif
8402 #ifdef EADDRINUSE
8403 case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
8404 #endif
8405 #ifdef EADDRNOTAVAIL
8406 case EADDRNOTAVAIL: return DRFLAC_ERROR;
8407 #endif
8408 #ifdef ENETDOWN
8409 case ENETDOWN: return DRFLAC_NO_NETWORK;
8410 #endif
8411 #ifdef ENETUNREACH
8412 case ENETUNREACH: return DRFLAC_NO_NETWORK;
8413 #endif
8414 #ifdef ENETRESET
8415 case ENETRESET: return DRFLAC_NO_NETWORK;
8416 #endif
8417 #ifdef ECONNABORTED
8418 case ECONNABORTED: return DRFLAC_NO_NETWORK;
8419 #endif
8420 #ifdef ECONNRESET
8421 case ECONNRESET: return DRFLAC_CONNECTION_RESET;
8422 #endif
8423 #ifdef ENOBUFS
8424 case ENOBUFS: return DRFLAC_NO_SPACE;
8425 #endif
8426 #ifdef EISCONN
8427 case EISCONN: return DRFLAC_ALREADY_CONNECTED;
8428 #endif
8429 #ifdef ENOTCONN
8430 case ENOTCONN: return DRFLAC_NOT_CONNECTED;
8431 #endif
8432 #ifdef ESHUTDOWN
8433 case ESHUTDOWN: return DRFLAC_ERROR;
8434 #endif
8435 #ifdef ETOOMANYREFS
8436 case ETOOMANYREFS: return DRFLAC_ERROR;
8437 #endif
8438 #ifdef ETIMEDOUT
8439 case ETIMEDOUT: return DRFLAC_TIMEOUT;
8440 #endif
8441 #ifdef ECONNREFUSED
8442 case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
8443 #endif
8444 #ifdef EHOSTDOWN
8445 case EHOSTDOWN: return DRFLAC_NO_HOST;
8446 #endif
8447 #ifdef EHOSTUNREACH
8448 case EHOSTUNREACH: return DRFLAC_NO_HOST;
8449 #endif
8450 #ifdef EALREADY
8451 case EALREADY: return DRFLAC_IN_PROGRESS;
8452 #endif
8453 #ifdef EINPROGRESS
8454 case EINPROGRESS: return DRFLAC_IN_PROGRESS;
8455 #endif
8456 #ifdef ESTALE
8457 case ESTALE: return DRFLAC_INVALID_FILE;
8458 #endif
8459 #ifdef EUCLEAN
8460 case EUCLEAN: return DRFLAC_ERROR;
8461 #endif
8462 #ifdef ENOTNAM
8463 case ENOTNAM: return DRFLAC_ERROR;
8464 #endif
8465 #ifdef ENAVAIL
8466 case ENAVAIL: return DRFLAC_ERROR;
8467 #endif
8468 #ifdef EISNAM
8469 case EISNAM: return DRFLAC_ERROR;
8470 #endif
8471 #ifdef EREMOTEIO
8472 case EREMOTEIO: return DRFLAC_IO_ERROR;
8473 #endif
8474 #ifdef EDQUOT
8475 case EDQUOT: return DRFLAC_NO_SPACE;
8476 #endif
8477 #ifdef ENOMEDIUM
8478 case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
8479 #endif
8480 #ifdef EMEDIUMTYPE
8481 case EMEDIUMTYPE: return DRFLAC_ERROR;
8482 #endif
8483 #ifdef ECANCELED
8484 case ECANCELED: return DRFLAC_CANCELLED;
8485 #endif
8486 #ifdef ENOKEY
8487 case ENOKEY: return DRFLAC_ERROR;
8488 #endif
8489 #ifdef EKEYEXPIRED
8490 case EKEYEXPIRED: return DRFLAC_ERROR;
8491 #endif
8492 #ifdef EKEYREVOKED
8493 case EKEYREVOKED: return DRFLAC_ERROR;
8494 #endif
8495 #ifdef EKEYREJECTED
8496 case EKEYREJECTED: return DRFLAC_ERROR;
8497 #endif
8498 #ifdef EOWNERDEAD
8499 case EOWNERDEAD: return DRFLAC_ERROR;
8500 #endif
8501 #ifdef ENOTRECOVERABLE
8502 case ENOTRECOVERABLE: return DRFLAC_ERROR;
8503 #endif
8504 #ifdef ERFKILL
8505 case ERFKILL: return DRFLAC_ERROR;
8506 #endif
8507 #ifdef EHWPOISON
8508 case EHWPOISON: return DRFLAC_ERROR;
8509 #endif
8510 default: return DRFLAC_ERROR;
8511 }
8512}
8513/* End Errno */
8515/* fopen */
8516static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
8517{
8518#if defined(_MSC_VER) && _MSC_VER >= 1400
8519 errno_t err;
8520#endif
8522 if (ppFile != NULL) {
8523 *ppFile = NULL; /* Safety. */
8524 }
8526 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8527 return DRFLAC_INVALID_ARGS;
8528 }
8530#if defined(_MSC_VER) && _MSC_VER >= 1400
8531 err = fopen_s(ppFile, pFilePath, pOpenMode);
8532 if (err != 0) {
8533 return drflac_result_from_errno(err);
8534 }
8535#else
8536#if defined(_WIN32) || defined(__APPLE__)
8537 *ppFile = fopen(pFilePath, pOpenMode);
8538#else
8539 #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
8540 *ppFile = fopen64(pFilePath, pOpenMode);
8541 #else
8542 *ppFile = fopen(pFilePath, pOpenMode);
8543 #endif
8544#endif
8545 if (*ppFile == NULL) {
8546 drflac_result result = drflac_result_from_errno(errno);
8547 if (result == DRFLAC_SUCCESS) {
            result = DRFLAC_ERROR;   /* Just a safety check to make sure we never return success when *ppFile == NULL. */
8549 }
8551 return result;
8552 }
8553#endif
8555 return DRFLAC_SUCCESS;
8556}
8558/*
8559_wfopen() isn't always available in all compilation environments.
8561 * Windows only.
8562 * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
8563 * MinGW-64 (both 32- and 64-bit) seems to support it.
8564 * MinGW wraps it in !defined(__STRICT_ANSI__).
8565 * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
8567This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
8568fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
8569*/
8570#if defined(_WIN32)
8571 #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
8572 #define DRFLAC_HAS_WFOPEN
8573 #endif
8574#endif
8576#ifndef DR_FLAC_NO_WCHAR
8577static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
8578{
8579 if (ppFile != NULL) {
8580 *ppFile = NULL; /* Safety. */
8581 }
8583 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8584 return DRFLAC_INVALID_ARGS;
8585 }
8587#if defined(DRFLAC_HAS_WFOPEN)
8588 {
8589 /* Use _wfopen() on Windows. */
8590 #if defined(_MSC_VER) && _MSC_VER >= 1400
8591 errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
8592 if (err != 0) {
8593 return drflac_result_from_errno(err);
8594 }
8595 #else
8596 *ppFile = _wfopen(pFilePath, pOpenMode);
8597 if (*ppFile == NULL) {
8598 return drflac_result_from_errno(errno);
8599 }
8600 #endif
8601 (void)pAllocationCallbacks;
8602 }
8603#else
    /*
    Use fopen() on anything other than Windows. This requires converting the wide-character path to a
    multibyte string, which is awkward because fopen() interprets the path according to the current
    locale. The only real way I can think of to do this is with wcsrtombs(). Note that wcstombs() is
    apparently not thread-safe because it uses a static global mbstate_t object for maintaining state.
    I've checked this with -std=c89 and it works, but if somebody gets a compiler error I'll look into
    improving compatibility.
    */
8612 /*
8613 Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
8614 need to abort with an error. If you encounter a compiler lacking such support, add it to this list
8615 and submit a bug report and it'll be added to the library upstream.
8616 */
8617 #if defined(__DJGPP__)
8618 {
8619 /* Nothing to do here. This will fall through to the error check below. */
8620 }
8621 #else
8622 {
8623 mbstate_t mbs;
8624 size_t lenMB;
8625 const wchar_t* pFilePathTemp = pFilePath;
8626 char* pFilePathMB = NULL;
8627 char pOpenModeMB[32] = {0};
8629 /* Get the length first. */
8630 DRFLAC_ZERO_OBJECT(&mbs);
8631 lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
8632 if (lenMB == (size_t)-1) {
8633 return drflac_result_from_errno(errno);
8634 }
8636 pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
8637 if (pFilePathMB == NULL) {
8638 return DRFLAC_OUT_OF_MEMORY;
8639 }
8641 pFilePathTemp = pFilePath;
8642 DRFLAC_ZERO_OBJECT(&mbs);
8643 wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
8645 /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
8646 {
8647 size_t i = 0;
8648 for (;;) {
8649 if (pOpenMode[i] == 0) {
8650 pOpenModeMB[i] = '\0';
8651 break;
8652 }
8654 pOpenModeMB[i] = (char)pOpenMode[i];
8655 i += 1;
8656 }
8657 }
8659 *ppFile = fopen(pFilePathMB, pOpenModeMB);
8661 drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
8662 }
8663 #endif
8665 if (*ppFile == NULL) {
8666 return DRFLAC_ERROR;
8667 }
8668#endif
8670 return DRFLAC_SUCCESS;
8671}
8672#endif
8673/* End fopen */
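/*
The wcsrtombs() fallback in drflac_wfopen() above converts the path in two passes: one to measure, one to convert. Below is
a minimal, self-contained sketch of that pattern. The name example_wide_to_multibyte() is hypothetical and not part of
dr_flac, and plain malloc()/free() stand in for the allocation callbacks. Unlike the fallback above, the sketch also checks
the result of the second pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper: converts a wide string to a newly allocated multibyte string
// using the current locale. Returns NULL on failure. The caller frees the result.
static char* example_wide_to_multibyte(const wchar_t* pWide)
{
    mbstate_t state;
    const wchar_t* pSrc = pWide;
    size_t len;
    char* pOut;

    assert(pWide != NULL);

    // Pass 1: a NULL destination only reports the required length, excluding the terminator.
    memset(&state, 0, sizeof(state));
    len = wcsrtombs(NULL, &pSrc, 0, &state);
    if (len == (size_t)-1) {
        return NULL;    // A character cannot be represented in the current locale.
    }

    pOut = (char*)malloc(len + 1);
    if (pOut == NULL) {
        return NULL;
    }

    // Pass 2: reset the source pointer and the conversion state, then convert for real.
    pSrc = pWide;
    memset(&state, 0, sizeof(state));
    if (wcsrtombs(pOut, &pSrc, len + 1, &state) == (size_t)-1) {
        free(pOut);
        return NULL;
    }

    return pOut;
}
```

Because the destination has room for len + 1 bytes, the second pass also writes the null terminator.
*/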
8675static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
8676{
8677 return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
8678}
8680static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
8681{
8682 int whence = SEEK_SET;
8683 if (origin == DRFLAC_SEEK_CUR) {
8684 whence = SEEK_CUR;
8685 } else if (origin == DRFLAC_SEEK_END) {
8686 whence = SEEK_END;
8687 }
8689 return fseek((FILE*)pUserData, offset, whence) == 0;
8690}
8692static drflac_bool32 drflac__on_tell_stdio(void* pUserData, drflac_int64* pCursor)
8693{
8694 FILE* pFileStdio = (FILE*)pUserData;
8695 drflac_int64 result;
8697 /* These were all validated at a higher level. */
8698 DRFLAC_ASSERT(pFileStdio != NULL);
8699 DRFLAC_ASSERT(pCursor != NULL);
8701#if defined(_WIN32)
8702 #if defined(_MSC_VER) && _MSC_VER > 1200
8703 result = _ftelli64(pFileStdio);
8704 #else
8705 result = ftell(pFileStdio);
8706 #endif
8707#else
8708 result = ftell(pFileStdio);
8709#endif
8711 *pCursor = result;
8713 return DRFLAC_TRUE;
8714}
8718DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8719{
8720 drflac* pFlac;
8721 FILE* pFile;
8723 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8724 return NULL;
8725 }
8727 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, (void*)pFile, pAllocationCallbacks);
8728 if (pFlac == NULL) {
8729 fclose(pFile);
8730 return NULL;
8731 }
8733 return pFlac;
8734}
8736#ifndef DR_FLAC_NO_WCHAR
8737DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8738{
8739 drflac* pFlac;
8740 FILE* pFile;
8742 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8743 return NULL;
8744 }
8746 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, (void*)pFile, pAllocationCallbacks);
8747 if (pFlac == NULL) {
8748 fclose(pFile);
8749 return NULL;
8750 }
8752 return pFlac;
8753}
8754#endif
8756DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8757{
8758 drflac* pFlac;
8759 FILE* pFile;
8761 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8762 return NULL;
8763 }
8765 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8766 if (pFlac == NULL) {
8767 fclose(pFile);
8768 return pFlac;
8769 }
8771 return pFlac;
8772}
8774#ifndef DR_FLAC_NO_WCHAR
8775DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8776{
8777 drflac* pFlac;
8778 FILE* pFile;
8780 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8781 return NULL;
8782 }
8784 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8785 if (pFlac == NULL) {
8786 fclose(pFile);
8787 return pFlac;
8788 }
8790 return pFlac;
8791}
8792#endif
8793#endif /* DR_FLAC_NO_STDIO */
8795static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
8796{
8797 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8798 size_t bytesRemaining;
8800 DRFLAC_ASSERT(memoryStream != NULL);
8801 DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
8803 bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
8804 if (bytesToRead > bytesRemaining) {
8805 bytesToRead = bytesRemaining;
8806 }
8808 if (bytesToRead > 0) {
8809 DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
8810 memoryStream->currentReadPos += bytesToRead;
8811 }
8813 return bytesToRead;
8814}
8816static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
8817{
8818 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8819 drflac_int64 newCursor;
8821 DRFLAC_ASSERT(memoryStream != NULL);
8823 newCursor = memoryStream->currentReadPos;
8825 if (origin == DRFLAC_SEEK_SET) {
8826 newCursor = 0;
8827 } else if (origin == DRFLAC_SEEK_CUR) {
8828 newCursor = (drflac_int64)memoryStream->currentReadPos;
8829 } else if (origin == DRFLAC_SEEK_END) {
8830 newCursor = (drflac_int64)memoryStream->dataSize;
8831 } else {
8832 DRFLAC_ASSERT(!"Invalid seek origin");
8833 return DRFLAC_FALSE;
8834 }
8836 newCursor += offset;
8838 if (newCursor < 0) {
8839 return DRFLAC_FALSE; /* Trying to seek prior to the start of the buffer. */
8840 }
8841 if ((size_t)newCursor > memoryStream->dataSize) {
8842 return DRFLAC_FALSE; /* Trying to seek beyond the end of the buffer. */
8843 }
8845 memoryStream->currentReadPos = (size_t)newCursor;
8847 return DRFLAC_TRUE;
8848}
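/*
A minimal, self-contained sketch of the bounds-checked seek performed by drflac__on_seek_memory() above. The example_*
names are hypothetical stand-ins and not part of the dr_flac API, and long long is assumed to be available for the signed
intermediate cursor.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXAMPLE_SEEK_SET, EXAMPLE_SEEK_CUR, EXAMPLE_SEEK_END } example_seek_origin;

typedef struct {
    const unsigned char* data;
    size_t dataSize;
    size_t currentReadPos;
} example_memory_stream;

// Returns 1 on success and 0 on failure. The cursor is left untouched on failure.
static int example_seek(example_memory_stream* pStream, int offset, example_seek_origin origin)
{
    long long newCursor;

    assert(pStream != NULL);

    // Pick the base position for the requested origin.
    if (origin == EXAMPLE_SEEK_SET) {
        newCursor = 0;
    } else if (origin == EXAMPLE_SEEK_CUR) {
        newCursor = (long long)pStream->currentReadPos;
    } else if (origin == EXAMPLE_SEEK_END) {
        newCursor = (long long)pStream->dataSize;
    } else {
        return 0;
    }

    // Apply the offset in a signed type so that a negative result can be detected.
    newCursor += offset;

    if (newCursor < 0) {
        return 0;   // Before the start of the buffer.
    }
    if ((unsigned long long)newCursor > pStream->dataSize) {
        return 0;   // Past the end of the buffer.
    }

    pStream->currentReadPos = (size_t)newCursor;
    return 1;
}
```

Seeking to exactly dataSize is allowed. It leaves the cursor at the end of the buffer, where a read returns zero bytes.
*/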
8850static drflac_bool32 drflac__on_tell_memory(void* pUserData, drflac_int64* pCursor)
8851{
8852 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8854 DRFLAC_ASSERT(memoryStream != NULL);
8855 DRFLAC_ASSERT(pCursor != NULL);
8857 *pCursor = (drflac_int64)memoryStream->currentReadPos;
8858 return DRFLAC_TRUE;
8859}
8861DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8862{
8863 drflac__memory_stream memoryStream;
8864 drflac* pFlac;
8866 memoryStream.data = (const drflac_uint8*)pData;
8867 memoryStream.dataSize = dataSize;
8868 memoryStream.currentReadPos = 0;
8869 pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, drflac__on_tell_memory, &memoryStream, pAllocationCallbacks);
8870 if (pFlac == NULL) {
8871 return NULL;
8872 }
8874 pFlac->memoryStream = memoryStream;
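    /*
    This is an awful hack. The decoder was opened with a pointer to the stack-allocated memoryStream above, which stops
    being valid when this function returns. Now that the stream has been copied into the drflac object, re-point the user
    data at that copy.
    */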
8877#ifndef DR_FLAC_NO_OGG
8878 if (pFlac->container == drflac_container_ogg)
8879 {
8880 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8881 oggbs->pUserData = &pFlac->memoryStream;
8882 }
8883 else
8884#endif
8885 {
8886 pFlac->bs.pUserData = &pFlac->memoryStream;
8887 }
8889 return pFlac;
8890}
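/*
The re-pointing step at the end of drflac_open_memory() can be sketched in isolation as follows. This is an illustration
only: example_decoder and example_open_memory() are hypothetical, and malloc() stands in for the allocation callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const unsigned char* data;
    size_t dataSize;
    size_t currentReadPos;
} example_memory_stream;

typedef struct {
    void* pUserData;                        // Handed to the read and seek callbacks.
    example_memory_stream memoryStream;     // Owned copy that lives as long as the decoder.
} example_decoder;

static example_decoder* example_open_memory(const void* pData, size_t dataSize)
{
    example_memory_stream local;    // Lives on this function's stack only.
    example_decoder* pDecoder;

    assert(pData != NULL || dataSize == 0);

    local.data = (const unsigned char*)pData;
    local.dataSize = dataSize;
    local.currentReadPos = 0;

    pDecoder = (example_decoder*)malloc(sizeof(*pDecoder));
    if (pDecoder == NULL) {
        return NULL;
    }

    // While opening, the callbacks read through a pointer to the local stream.
    pDecoder->pUserData = &local;

    // Before returning, copy the stream into the decoder and re-point the user data at the
    // copy. Without this, pUserData would dangle as soon as the function returns.
    pDecoder->memoryStream = local;
    pDecoder->pUserData = &pDecoder->memoryStream;

    return pDecoder;
}
```

The copy is taken after opening so that it carries the read position reached while opening.
*/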
8892DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8893{
8894 drflac__memory_stream memoryStream;
8895 drflac* pFlac;
8897 memoryStream.data = (const drflac_uint8*)pData;
8898 memoryStream.dataSize = dataSize;
8899 memoryStream.currentReadPos = 0;
8900 pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, drflac__on_tell_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8901 if (pFlac == NULL) {
8902 return NULL;
8903 }
8905 pFlac->memoryStream = memoryStream;
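    /*
    This is an awful hack. The decoder was opened with a pointer to the stack-allocated memoryStream above, which stops
    being valid when this function returns. Now that the stream has been copied into the drflac object, re-point the user
    data at that copy.
    */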
8908#ifndef DR_FLAC_NO_OGG
8909 if (pFlac->container == drflac_container_ogg)
8910 {
8911 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8912 oggbs->pUserData = &pFlac->memoryStream;
8913 }
8914 else
8915#endif
8916 {
8917 pFlac->bs.pUserData = &pFlac->memoryStream;
8918 }
8920 return pFlac;
8921}
8925DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8926{
8927 return drflac_open_with_metadata_private(onRead, onSeek, onTell, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8928}
8929DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8930{
8931 return drflac_open_with_metadata_private(onRead, onSeek, onTell, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8932}
8934DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8935{
8936 return drflac_open_with_metadata_private(onRead, onSeek, onTell, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8937}
8938DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8939{
8940 return drflac_open_with_metadata_private(onRead, onSeek, onTell, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
8941}
8943DRFLAC_API void drflac_close(drflac* pFlac)
8944{
8945 if (pFlac == NULL) {
8946 return;
8947 }
8949#ifndef DR_FLAC_NO_STDIO
8950 /*
8951 If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
8952 was used by looking at the callbacks.
8953 */
8954 if (pFlac->bs.onRead == drflac__on_read_stdio) {
8955 fclose((FILE*)pFlac->bs.pUserData);
8956 }
8958#ifndef DR_FLAC_NO_OGG
8959 /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
8960 if (pFlac->container == drflac_container_ogg) {
8961 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8962 DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
8964 if (oggbs->onRead == drflac__on_read_stdio) {
8965 fclose((FILE*)oggbs->pUserData);
8966 }
8967 }
8968#endif
8969#endif
8971 drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
8972}
8975#if 0
8976static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8977{
8978 drflac_uint64 i;
8979 for (i = 0; i < frameCount; ++i) {
8980 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
8981 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
8982 drflac_uint32 right = left - side;
8984 pOutputSamples[i*2+0] = (drflac_int32)left;
8985 pOutputSamples[i*2+1] = (drflac_int32)right;
8986 }
8987}
8988#endif
8990static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8991{
8992 drflac_uint64 i;
8993 drflac_uint64 frameCount4 = frameCount >> 2;
8994 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8995 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8996 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8997 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8999 for (i = 0; i < frameCount4; ++i) {
9000 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9001 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9002 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9003 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9005 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9006 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9007 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9008 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9010 drflac_uint32 right0 = left0 - side0;
9011 drflac_uint32 right1 = left1 - side1;
9012 drflac_uint32 right2 = left2 - side2;
9013 drflac_uint32 right3 = left3 - side3;
9015 pOutputSamples[i*8+0] = (drflac_int32)left0;
9016 pOutputSamples[i*8+1] = (drflac_int32)right0;
9017 pOutputSamples[i*8+2] = (drflac_int32)left1;
9018 pOutputSamples[i*8+3] = (drflac_int32)right1;
9019 pOutputSamples[i*8+4] = (drflac_int32)left2;
9020 pOutputSamples[i*8+5] = (drflac_int32)right2;
9021 pOutputSamples[i*8+6] = (drflac_int32)left3;
9022 pOutputSamples[i*8+7] = (drflac_int32)right3;
9023 }
9025 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9026 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9027 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9028 drflac_uint32 right = left - side;
9030 pOutputSamples[i*2+0] = (drflac_int32)left;
9031 pOutputSamples[i*2+1] = (drflac_int32)right;
9032 }
9033}
9035#if defined(DRFLAC_SUPPORT_SSE2)
9036static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9037{
9038 drflac_uint64 i;
9039 drflac_uint64 frameCount4 = frameCount >> 2;
9040 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9041 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9042 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9043 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9045 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9047 for (i = 0; i < frameCount4; ++i) {
9048 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9049 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9050 __m128i right = _mm_sub_epi32(left, side);
9052 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9053 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9054 }
9056 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9057 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9058 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9059 drflac_uint32 right = left - side;
9061 pOutputSamples[i*2+0] = (drflac_int32)left;
9062 pOutputSamples[i*2+1] = (drflac_int32)right;
9063 }
9064}
9065#endif
9067#if defined(DRFLAC_SUPPORT_NEON)
9068static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9069{
9070 drflac_uint64 i;
9071 drflac_uint64 frameCount4 = frameCount >> 2;
9072 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9073 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9074 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9075 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9076 int32x4_t shift0_4;
9077 int32x4_t shift1_4;
9079 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9081 shift0_4 = vdupq_n_s32(shift0);
9082 shift1_4 = vdupq_n_s32(shift1);
9084 for (i = 0; i < frameCount4; ++i) {
9085 uint32x4_t left;
9086 uint32x4_t side;
9087 uint32x4_t right;
9089 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9090 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9091 right = vsubq_u32(left, side);
9093 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9094 }
9096 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9097 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9098 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9099 drflac_uint32 right = left - side;
9101 pOutputSamples[i*2+0] = (drflac_int32)left;
9102 pOutputSamples[i*2+1] = (drflac_int32)right;
9103 }
9104}
9105#endif
9107static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9108{
9109#if defined(DRFLAC_SUPPORT_SSE2)
9110 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9111 drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9112 } else
9113#elif defined(DRFLAC_SUPPORT_NEON)
9114 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9115 drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9116 } else
9117#endif
9118 {
9119 /* Scalar fallback. */
9120#if 0
9121 drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9122#else
9123 drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9124#endif
9125 }
9126}
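/*
All of the left/side variants above compute the same thing: shift both channels up, derive right as left - side, then
interleave. The following self-contained sketch shows that arithmetic without the decoder state. The function name and the
explicit shift0/shift1 parameters are hypothetical. In the real code the shifts come from unusedBitsPerSample plus each
subframe's wastedBitsPerSample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void example_decode_left_side(const int32_t* pLeft, const int32_t* pSide, size_t frameCount, uint32_t shift0, uint32_t shift1, int32_t* pOut)
{
    size_t i;

    assert(pLeft != NULL && pSide != NULL && pOut != NULL);

    for (i = 0; i < frameCount; ++i) {
        // Work in unsigned arithmetic so that shifting and wrap-around are well defined.
        uint32_t left  = (uint32_t)pLeft[i] << shift0;
        uint32_t side  = (uint32_t)pSide[i] << shift1;
        uint32_t right = left - side;

        pOut[i*2+0] = (int32_t)left;
        pOut[i*2+1] = (int32_t)right;
    }
}
```

With 24-bit input both shifts would typically be 8, which scales the samples up to the full 32-bit output range.
*/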
9129#if 0
9130static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9131{
9132 drflac_uint64 i;
9133 for (i = 0; i < frameCount; ++i) {
9134 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9135 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9136 drflac_uint32 left = right + side;
9138 pOutputSamples[i*2+0] = (drflac_int32)left;
9139 pOutputSamples[i*2+1] = (drflac_int32)right;
9140 }
9141}
9142#endif
9144static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9145{
9146 drflac_uint64 i;
9147 drflac_uint64 frameCount4 = frameCount >> 2;
9148 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9149 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9150 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9151 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9153 for (i = 0; i < frameCount4; ++i) {
9154 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
9155 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
9156 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
9157 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
9159 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
9160 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
9161 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
9162 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
9164 drflac_uint32 left0 = right0 + side0;
9165 drflac_uint32 left1 = right1 + side1;
9166 drflac_uint32 left2 = right2 + side2;
9167 drflac_uint32 left3 = right3 + side3;
9169 pOutputSamples[i*8+0] = (drflac_int32)left0;
9170 pOutputSamples[i*8+1] = (drflac_int32)right0;
9171 pOutputSamples[i*8+2] = (drflac_int32)left1;
9172 pOutputSamples[i*8+3] = (drflac_int32)right1;
9173 pOutputSamples[i*8+4] = (drflac_int32)left2;
9174 pOutputSamples[i*8+5] = (drflac_int32)right2;
9175 pOutputSamples[i*8+6] = (drflac_int32)left3;
9176 pOutputSamples[i*8+7] = (drflac_int32)right3;
9177 }
9179 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9180 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9181 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9182 drflac_uint32 left = right + side;
9184 pOutputSamples[i*2+0] = (drflac_int32)left;
9185 pOutputSamples[i*2+1] = (drflac_int32)right;
9186 }
9187}
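/*
The scalar routine above processes four frames per iteration and then finishes the remaining zero to three frames in a
tail loop. A stripped-down sketch of that structure for the right/side case is shown below. The function name is
hypothetical, and the shifts are assumed to be zero to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void example_decode_right_side_unrolled(const int32_t* pSide, const int32_t* pRight, size_t frameCount, int32_t* pOut)
{
    const uint32_t* pSideU32  = (const uint32_t*)pSide;
    const uint32_t* pRightU32 = (const uint32_t*)pRight;
    size_t frameCount4 = frameCount >> 2;   // Number of complete groups of four frames.
    size_t i;

    assert(pSide != NULL && pRight != NULL && pOut != NULL);

    // Main loop: four frames in, eight interleaved samples out.
    for (i = 0; i < frameCount4; ++i) {
        uint32_t right0 = pRightU32[i*4+0];
        uint32_t right1 = pRightU32[i*4+1];
        uint32_t right2 = pRightU32[i*4+2];
        uint32_t right3 = pRightU32[i*4+3];

        pOut[i*8+0] = (int32_t)(right0 + pSideU32[i*4+0]);
        pOut[i*8+1] = (int32_t)right0;
        pOut[i*8+2] = (int32_t)(right1 + pSideU32[i*4+1]);
        pOut[i*8+3] = (int32_t)right1;
        pOut[i*8+4] = (int32_t)(right2 + pSideU32[i*4+2]);
        pOut[i*8+5] = (int32_t)right2;
        pOut[i*8+6] = (int32_t)(right3 + pSideU32[i*4+3]);
        pOut[i*8+7] = (int32_t)right3;
    }

    // Tail loop: picks up where the main loop stopped and handles the leftover frames.
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        uint32_t right = pRightU32[i];

        pOut[i*2+0] = (int32_t)(right + pSideU32[i]);
        pOut[i*2+1] = (int32_t)right;
    }
}
```

With six frames, the main loop runs once for frames 0 to 3 and the tail loop handles frames 4 and 5.
*/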
9189#if defined(DRFLAC_SUPPORT_SSE2)
9190static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9191{
9192 drflac_uint64 i;
9193 drflac_uint64 frameCount4 = frameCount >> 2;
9194 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9195 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9196 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9197 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9199 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9201 for (i = 0; i < frameCount4; ++i) {
9202 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9203 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9204 __m128i left = _mm_add_epi32(right, side);
9206 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9207 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9208 }
9210 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9211 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9212 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9213 drflac_uint32 left = right + side;
9215 pOutputSamples[i*2+0] = (drflac_int32)left;
9216 pOutputSamples[i*2+1] = (drflac_int32)right;
9217 }
9218}
9219#endif
9221#if defined(DRFLAC_SUPPORT_NEON)
9222static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9223{
9224 drflac_uint64 i;
9225 drflac_uint64 frameCount4 = frameCount >> 2;
9226 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9227 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9228 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9229 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9230 int32x4_t shift0_4;
9231 int32x4_t shift1_4;
9233 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9235 shift0_4 = vdupq_n_s32(shift0);
9236 shift1_4 = vdupq_n_s32(shift1);
9238 for (i = 0; i < frameCount4; ++i) {
9239 uint32x4_t side;
9240 uint32x4_t right;
9241 uint32x4_t left;
9243 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9244 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9245 left = vaddq_u32(right, side);
9247 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9248 }
9250 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9251 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9252 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9253 drflac_uint32 left = right + side;
9255 pOutputSamples[i*2+0] = (drflac_int32)left;
9256 pOutputSamples[i*2+1] = (drflac_int32)right;
9257 }
9258}
9259#endif
9261static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9262{
9263#if defined(DRFLAC_SUPPORT_SSE2)
9264 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9265 drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9266 } else
9267#elif defined(DRFLAC_SUPPORT_NEON)
9268 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9269 drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9270 } else
9271#endif
9272 {
9273 /* Scalar fallback. */
9274#if 0
9275 drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9276#else
9277 drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9278#endif
9279 }
9280}
9283#if 0
9284static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9285{
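    drflac_uint64 i;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;    /* Unsigned views so the shifts below are well defined. */
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;

    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;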
9290 mid = (mid << 1) | (side & 0x01);
9292 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9293 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9294 }
9295}
9296#endif
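/*
The mid/side reconstruction in the reference code above can be tried in isolation with the sketch below. The function name
is hypothetical and not part of dr_flac. It assumes no wasted bits, and it assumes that >> on a negative int32_t is an
arithmetic shift, which the reference code relies on as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void example_decode_mid_side(const int32_t* pMid, const int32_t* pSide, size_t frameCount, uint32_t unusedBitsPerSample, int32_t* pOut)
{
    size_t i;

    assert(pMid != NULL && pSide != NULL && pOut != NULL);

    for (i = 0; i < frameCount; ++i) {
        uint32_t mid  = (uint32_t)pMid[i];
        uint32_t side = (uint32_t)pSide[i];

        // In FLAC, mid is (left + right) >> 1, which drops the low bit of the sum. That bit always
        // matches the low bit of side (left - right), so it can be restored here.
        mid = (mid << 1) | (side & 0x01);

        // mid is now left + right, so adding or subtracting side yields 2*left or 2*right.
        pOut[i*2+0] = (int32_t)((uint32_t)((int32_t)(mid + side) >> 1) << unusedBitsPerSample);
        pOut[i*2+1] = (int32_t)((uint32_t)((int32_t)(mid - side) >> 1) << unusedBitsPerSample);
    }
}
```

For example, left = 5 and right = 2 are encoded as mid = 3 and side = 3. Restoring the low bit gives 7, and the two outputs
become (7 + 3) >> 1 = 5 and (7 - 3) >> 1 = 2.
*/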
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    if (shift > 0) {
        /* (mid + side) and (mid - side) are always even, so the final ">> 1" can be folded into the left shift. */
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            /* Restore the low bit of mid, which always matches the low bit of side. */
            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    } else {
        /* No scaling needed (32-bit source), so only the signed ">> 1" remains. */
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    }

    /* Tail: the 0 to 3 frames left over after the unrolled loop. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;
    int32x4_t  wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t  wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
    uint32x4_t one4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
    one4         = vdupq_n_u32(1);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift4_0 = vdupq_n_s32(shift0);
    int32x4_t shift4_1 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));

        drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        left  = vshrq_n_u32(left,  16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    /* Right/side stereo: subframe 0 carries the side (difference) channel and subframe 1 carries right, so left = right + side. */
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    /* Main loop, unrolled to process 4 PCM frames per iteration. */
    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    /* Remaining 0..3 frames that did not fit into the unrolled loop. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);

        left = vshrq_n_u32(left, 16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    if (shift > 0) {
        /* The final >> 1 of the mid/side reconstruction is folded into the left shift by shifting by (unusedBitsPerSample - 1). */
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    } else {
        /* No unused bits (32-bit samples), so the >> 1 must be done as a signed shift. */
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = ((drflac_int32)(mid0 + side0) >> 1);
            temp1L = ((drflac_int32)(mid1 + side1) >> 1);
            temp2L = ((drflac_int32)(mid2 + side2) >> 1);
            temp3L = ((drflac_int32)(mid3 + side3) >> 1);

            temp0R = ((drflac_int32)(mid0 - side0) >> 1);
            temp1R = ((drflac_int32)(mid1 - side1) >> 1);
            temp2R = ((drflac_int32)(mid2 - side2) >> 1);
            temp3R = ((drflac_int32)(mid3 - side3) >> 1);

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        tempL0 >>= 16;
        tempL1 >>= 16;
        tempL2 >>= 16;
        tempL3 >>= 16;

        tempR0 >>= 16;
        tempR1 >>= 16;
        tempR2 >>= 16;
        tempR3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)tempL0;
        pOutputSamples[i*8+1] = (drflac_int16)tempR0;
        pOutputSamples[i*8+2] = (drflac_int16)tempL1;
        pOutputSamples[i*8+3] = (drflac_int16)tempR1;
        pOutputSamples[i*8+4] = (drflac_int16)tempL2;
        pOutputSamples[i*8+5] = (drflac_int16)tempR2;
        pOutputSamples[i*8+6] = (drflac_int16)tempL3;
        pOutputSamples[i*8+7] = (drflac_int16)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store them in the output buffer. */
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift0_4 = vdupq_n_s32(shift0);
    int32x4_t shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));

        left = vshrq_n_s32(left, 16);
        right = vshrq_n_s32(right, 16);

        drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    /* No output buffer: skip over the frames instead of writing them out. */
    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
                    }
                }
            }

            framesRead += frameCountThisIteration;
            pBufferOut += frameCountThisIteration * channelCount;
            framesToRead -= frameCountThisIteration;
            pFlac->currentPCMFrame += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    float factor = 1 / 2147483648.0;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left * factor;
        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
    __m128 factor;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    factor = _mm_set1_ps(1.0f / 8388608.0f);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);
        __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);

        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
    }
}
#endif
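
/*
The SSE2 path above shifts by eight fewer bits and scales by 1.0f / 8388608.0f (2^-23), where the scalar path shifts by the full amount and scales by
1 / 2147483648.0 (2^-31). For samples of 24 bits or fewer, which is the case the DRFLAC_ASSERT guards, the two scalings are expected to yield the same
float. The sketch below checks that on single samples. It is an illustration only: both helper names are hypothetical, and a two's complement target is
assumed.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: scale through the full 32-bit range, as the scalar path does.
static float sketch_scale_via_q31(int32_t sample, unsigned int unusedBits)
{
    assert(unusedBits >= 8 && unusedBits < 32);
    return (float)((int32_t)((uint32_t)sample << unusedBits) / 2147483648.0);
}

// Hypothetical helper: scale through a 24-bit range, as the SIMD paths do.
static float sketch_scale_via_q23(int32_t sample, unsigned int unusedBits)
{
    assert(unusedBits >= 8 && unusedBits < 32);
    return (int32_t)((uint32_t)sample << (unusedBits - 8)) / 8388608.0f;
}
```

For a 16-bit sample, `unusedBits` is 16; for a 24-bit sample it is 8, so the 24-bit variant performs no shift at all.
*/
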
10802#if defined(DRFLAC_SUPPORT_NEON)
10803static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10805 drflac_uint64 i;
10806 drflac_uint64 frameCount4 = frameCount >> 2;
10807 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10808 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10809 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10810 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10811 float32x4_t factor4;
10812 int32x4_t shift0_4;
10813 int32x4_t shift1_4;
10815 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10817 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10818 shift0_4 = vdupq_n_s32(shift0);
10819 shift1_4 = vdupq_n_s32(shift1);
10821 for (i = 0; i < frameCount4; ++i) {
10822 uint32x4_t left;
10823 uint32x4_t side;
10824 uint32x4_t right;
10825 float32x4_t leftf;
10826 float32x4_t rightf;
10828 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10829 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10830 right = vsubq_u32(left, side);
10831 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10832 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10834 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10835 }
10837 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10838 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10839 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10840 drflac_uint32 right = left - side;
10842 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10843 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10844 }
10846#endif
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
#endif
10885static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10887 drflac_uint64 i;
10888 drflac_uint64 frameCount4 = frameCount >> 2;
10889 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10890 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10891 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10892 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10893 float factor = 1 / 2147483648.0;
10895 for (i = 0; i < frameCount4; ++i) {
10896 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10897 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10898 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10899 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10901 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10902 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10903 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10904 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10906 drflac_uint32 left0 = right0 + side0;
10907 drflac_uint32 left1 = right1 + side1;
10908 drflac_uint32 left2 = right2 + side2;
10909 drflac_uint32 left3 = right3 + side3;
10911 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10912 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10913 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10914 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10915 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10916 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10917 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10918 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10919 }
10921 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10922 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10923 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10924 drflac_uint32 left = right + side;
10926 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10927 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10928 }
10931#if defined(DRFLAC_SUPPORT_SSE2)
10932static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10934 drflac_uint64 i;
10935 drflac_uint64 frameCount4 = frameCount >> 2;
10936 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10937 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10938 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10939 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10940 __m128 factor;
10942 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10944 factor = _mm_set1_ps(1.0f / 8388608.0f);
10946 for (i = 0; i < frameCount4; ++i) {
10947 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10948 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10949 __m128i left = _mm_add_epi32(right, side);
10950 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10951 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10953 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10954 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10955 }
10957 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10958 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10959 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10960 drflac_uint32 left = right + side;
10962 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10963 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10964 }
10966#endif
10968#if defined(DRFLAC_SUPPORT_NEON)
10969static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10971 drflac_uint64 i;
10972 drflac_uint64 frameCount4 = frameCount >> 2;
10973 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10974 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10975 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10976 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10977 float32x4_t factor4;
10978 int32x4_t shift0_4;
10979 int32x4_t shift1_4;
10981 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10983 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10984 shift0_4 = vdupq_n_s32(shift0);
10985 shift1_4 = vdupq_n_s32(shift1);
10987 for (i = 0; i < frameCount4; ++i) {
10988 uint32x4_t side;
10989 uint32x4_t right;
10990 uint32x4_t left;
10991 float32x4_t leftf;
10992 float32x4_t rightf;
10994 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10995 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10996 left = vaddq_u32(right, side);
10997 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10998 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
11000 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11001 }
11003 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11004 drflac_uint32 side = pInputSamples0U32[i] << shift0;
11005 drflac_uint32 right = pInputSamples1U32[i] << shift1;
11006 drflac_uint32 left = right + side;
11008 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
11009 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
11010 }
11012#endif
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        /* Restore the low bit of mid from the low bit of side. */
        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
    }
}
#endif
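
/*
The mid/side step is the least obvious of the stereo decorrelation modes, so a standalone sketch may help. In FLAC's mid/side mode the encoder stores
mid = (left + right) >> 1 and side = left - right. The bit dropped from mid has the same parity as side, which is why the decoders here OR the low bit of
side back into mid. The function below is hypothetical and not part of dr_flac. It ignores wasted bits, and it assumes a two's complement target on which
right-shifting a negative signed value is an arithmetic shift, as the code in this section appears to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: rebuilds interleaved stereo floats from mid and side channels.
static void sketch_decode_mid_side_f32(const int32_t* pMid, const int32_t* pSide, size_t frameCount, unsigned int unusedBits, float* pOut)
{
    size_t i;

    assert(unusedBits < 32);

    for (i = 0; i < frameCount; ++i) {
        uint32_t mid  = (uint32_t)pMid[i];
        uint32_t side = (uint32_t)pSide[i];
        int32_t left;
        int32_t right;

        // Rebuild (left + right): double mid, then restore the bit the encoder dropped.
        mid = (mid << 1) | (side & 0x01);

        // (mid + side) is now 2*left and (mid - side) is 2*right, so halve each one.
        left  = (int32_t)(mid + side) >> 1;
        right = (int32_t)(mid - side) >> 1;

        // Move the sample up to the full 32-bit range, then scale to [-1, 1).
        pOut[i*2+0] = (float)((int32_t)((uint32_t)left  << unusedBits) / 2147483648.0);
        pOut[i*2+1] = (float)((int32_t)((uint32_t)right << unusedBits) / 2147483648.0);
    }
}
```

For example, left = 3 and right = -2 encode to mid = 0 and side = 5. Decoding gives mid = 1 after the low bit is restored, then (1 + 5) >> 1 = 3 and
(1 - 5) >> 1 = -2.
*/
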
11051static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11053 drflac_uint64 i;
11054 drflac_uint64 frameCount4 = frameCount >> 2;
11055 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11056 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11057 drflac_uint32 shift = unusedBitsPerSample;
11058 float factor = 1 / 2147483648.0;
11060 if (shift > 0) {
11061 shift -= 1;
11062 for (i = 0; i < frameCount4; ++i) {
11063 drflac_uint32 temp0L;
11064 drflac_uint32 temp1L;
11065 drflac_uint32 temp2L;
11066 drflac_uint32 temp3L;
11067 drflac_uint32 temp0R;
11068 drflac_uint32 temp1R;
11069 drflac_uint32 temp2R;
11070 drflac_uint32 temp3R;
11072 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11073 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11074 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11075 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11077 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11078 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11079 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11080 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11082 mid0 = (mid0 << 1) | (side0 & 0x01);
11083 mid1 = (mid1 << 1) | (side1 & 0x01);
11084 mid2 = (mid2 << 1) | (side2 & 0x01);
11085 mid3 = (mid3 << 1) | (side3 & 0x01);
11087 temp0L = (mid0 + side0) << shift;
11088 temp1L = (mid1 + side1) << shift;
11089 temp2L = (mid2 + side2) << shift;
11090 temp3L = (mid3 + side3) << shift;
11092 temp0R = (mid0 - side0) << shift;
11093 temp1R = (mid1 - side1) << shift;
11094 temp2R = (mid2 - side2) << shift;
11095 temp3R = (mid3 - side3) << shift;
11097 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11098 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11099 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11100 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11101 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11102 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11103 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11104 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11105 }
11106 } else {
11107 for (i = 0; i < frameCount4; ++i) {
11108 drflac_uint32 temp0L;
11109 drflac_uint32 temp1L;
11110 drflac_uint32 temp2L;
11111 drflac_uint32 temp3L;
11112 drflac_uint32 temp0R;
11113 drflac_uint32 temp1R;
11114 drflac_uint32 temp2R;
11115 drflac_uint32 temp3R;
11117 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11118 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11119 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11120 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11122 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11123 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11124 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11125 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11127 mid0 = (mid0 << 1) | (side0 & 0x01);
11128 mid1 = (mid1 << 1) | (side1 & 0x01);
11129 mid2 = (mid2 << 1) | (side2 & 0x01);
11130 mid3 = (mid3 << 1) | (side3 & 0x01);
11132 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
11133 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
11134 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
11135 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
11137 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
11138 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
11139 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
11140 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
11142 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11143 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11144 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11145 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11146 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11147 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11148 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11149 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11150 }
11151 }
11153 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11154 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11155 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11157 mid = (mid << 1) | (side & 0x01);
11159 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
11160 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
11161 }
11164#if defined(DRFLAC_SUPPORT_SSE2)
11165static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11167 drflac_uint64 i;
11168 drflac_uint64 frameCount4 = frameCount >> 2;
11169 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11170 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11171 drflac_uint32 shift = unusedBitsPerSample - 8;
11172 float factor;
11173 __m128 factor128;
11175 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11177 factor = 1.0f / 8388608.0f;
11178 factor128 = _mm_set1_ps(factor);
11180 if (shift == 0) {
11181 for (i = 0; i < frameCount4; ++i) {
11182 __m128i mid;
11183 __m128i side;
11184 __m128i tempL;
11185 __m128i tempR;
11186 __m128 leftf;
11187 __m128 rightf;
11189 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11190 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11192 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11194 tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
11195 tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
11197 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11198 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11200 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11201 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11202 }
11204 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11205 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11206 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11208 mid = (mid << 1) | (side & 0x01);
11210 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11211 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11212 }
11213 } else {
11214 shift -= 1;
11215 for (i = 0; i < frameCount4; ++i) {
11216 __m128i mid;
11217 __m128i side;
11218 __m128i tempL;
11219 __m128i tempR;
11220 __m128 leftf;
11221 __m128 rightf;
11223 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11224 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11226 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11228 tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
11229 tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
11231 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11232 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11234 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11235 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11236 }
11238 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11239 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11240 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11242 mid = (mid << 1) | (side & 0x01);
11244 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11245 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11246 }
11247 }
11249#endif
11251#if defined(DRFLAC_SUPPORT_NEON)
11252static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11254 drflac_uint64 i;
11255 drflac_uint64 frameCount4 = frameCount >> 2;
11256 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11257 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11258 drflac_uint32 shift = unusedBitsPerSample - 8;
11259 float factor;
11260 float32x4_t factor4;
11261 int32x4_t shift4;
11262 int32x4_t wbps0_4; /* Wasted Bits Per Sample */
11263 int32x4_t wbps1_4; /* Wasted Bits Per Sample */
11265 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11267 factor = 1.0f / 8388608.0f;
11268 factor4 = vdupq_n_f32(factor);
11269 wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11270 wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11272 if (shift == 0) {
11273 for (i = 0; i < frameCount4; ++i) {
11274 int32x4_t lefti;
11275 int32x4_t righti;
11276 float32x4_t leftf;
11277 float32x4_t rightf;
11279 uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11280 uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11282 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11284 lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
11285 righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
11287 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11288 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11290 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11291 }
11293 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11294 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11295 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11297 mid = (mid << 1) | (side & 0x01);
11299 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11300 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11301 }
11302 } else {
11303 shift -= 1;
11304 shift4 = vdupq_n_s32(shift);
11305 for (i = 0; i < frameCount4; ++i) {
11306 uint32x4_t mid;
11307 uint32x4_t side;
11308 int32x4_t lefti;
11309 int32x4_t righti;
11310 float32x4_t leftf;
11311 float32x4_t rightf;
11313 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11314 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11316 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11318 lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11319 righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11321 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11322 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11324 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11325 }
11327 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11328 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11329 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11331 mid = (mid << 1) | (side & 0x01);
11333 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11334 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11335 }
11336 }
11338#endif
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}

#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
    }
}
#endif
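
/*
The scalar decoders in this section share one loop shape: a main loop that handles four PCM frames per iteration, followed by a tail loop for whatever is
left over. The sketch below shows that shape for the independent stereo case. It is an illustration under stated assumptions, not dr_flac code: the
function name is hypothetical, a single `shift` stands in for the per-subframe shifts, and a two's complement target is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: interleaves two planar channels into stereo floats.
static void sketch_interleave_stereo_f32(const int32_t* pIn0, const int32_t* pIn1, size_t frameCount, unsigned int shift, float* pOut)
{
    size_t i;
    size_t frameCount4 = frameCount >> 2;   // Number of complete groups of four frames.
    const float factor = 1.0f / 2147483648.0f;

    assert(shift < 32);

    // Main loop: four frames in, eight interleaved samples out.
    for (i = 0; i < frameCount4; ++i) {
        uint32_t l0 = (uint32_t)pIn0[i*4+0] << shift;
        uint32_t l1 = (uint32_t)pIn0[i*4+1] << shift;
        uint32_t l2 = (uint32_t)pIn0[i*4+2] << shift;
        uint32_t l3 = (uint32_t)pIn0[i*4+3] << shift;

        uint32_t r0 = (uint32_t)pIn1[i*4+0] << shift;
        uint32_t r1 = (uint32_t)pIn1[i*4+1] << shift;
        uint32_t r2 = (uint32_t)pIn1[i*4+2] << shift;
        uint32_t r3 = (uint32_t)pIn1[i*4+3] << shift;

        pOut[i*8+0] = (int32_t)l0 * factor;
        pOut[i*8+1] = (int32_t)r0 * factor;
        pOut[i*8+2] = (int32_t)l1 * factor;
        pOut[i*8+3] = (int32_t)r1 * factor;
        pOut[i*8+4] = (int32_t)l2 * factor;
        pOut[i*8+5] = (int32_t)r2 * factor;
        pOut[i*8+6] = (int32_t)l3 * factor;
        pOut[i*8+7] = (int32_t)r3 * factor;
    }

    // Tail loop: resumes at the first frame the main loop did not reach.
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOut[i*2+0] = (int32_t)((uint32_t)pIn0[i] << shift) * factor;
        pOut[i*2+1] = (int32_t)((uint32_t)pIn1[i] << shift) * factor;
    }
}
```

Calling it with a frame count of 6 exercises both loops: frames 0 to 3 go through the main loop and frames 4 and 5 through the tail.
*/
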
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    float factor = 1 / 2147483648.0;

    /* Process 4 PCM frames per iteration. */
    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
    }

    /* Handle the remaining 0 to 3 PCM frames. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
    }
}

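/*
The integer-to-float conversion used by the scalar path can be illustrated in isolation. The following is a minimal sketch and is not
part of the library: the helper name `example_sample_to_f32` is hypothetical, and it assumes the sample is held sign-extended in a
32-bit integer with no wasted bits. The sample is shifted up so that it occupies the full 32-bit range and is then scaled by 1/2^31.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, for illustration only. bitsPerSample must be in the range [1, 32].
static float example_sample_to_f32(int32_t sample, unsigned int bitsPerSample)
{
    uint32_t shifted;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);

    // Shift as unsigned so that negative samples do not trigger undefined behaviour.
    shifted = (uint32_t)sample << (32 - bitsPerSample);

    // Reinterpret as signed and scale by 1/2^31.
    return (float)((int32_t)shifted / 2147483648.0);
}
```

With this sketch, a 16-bit sample of 16384 maps to 0.5f and -32768 maps to -1.0f.
*/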
11408#if defined(DRFLAC_SUPPORT_SSE2)
11409static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11411 drflac_uint64 i;
11412 drflac_uint64 frameCount4 = frameCount >> 2;
11413 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11414 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11415 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11416 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11418 float factor = 1.0f / 8388608.0f;
11419 __m128 factor128 = _mm_set1_ps(factor);
11421 for (i = 0; i < frameCount4; ++i) {
11422 __m128i lefti;
11423 __m128i righti;
11424 __m128 leftf;
11425 __m128 rightf;
11427 lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11428 righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11430 leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11431 rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11433 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11434 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11435 }
11437 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11438 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11439 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11440 }
11442#endif
11444#if defined(DRFLAC_SUPPORT_NEON)
11445static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11447 drflac_uint64 i;
11448 drflac_uint64 frameCount4 = frameCount >> 2;
11449 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11450 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11451 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11452 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11454 float factor = 1.0f / 8388608.0f;
11455 float32x4_t factor4 = vdupq_n_f32(factor);
11456 int32x4_t shift0_4 = vdupq_n_s32(shift0);
11457 int32x4_t shift1_4 = vdupq_n_s32(shift1);
11459 for (i = 0; i < frameCount4; ++i) {
11460 int32x4_t lefti;
11461 int32x4_t righti;
11462 float32x4_t leftf;
11463 float32x4_t rightf;
11465 lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11466 righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11468 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11469 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11471 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11472 }
11474 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11475 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11476 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11477 }
11479#endif
11481static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11483#if defined(DRFLAC_SUPPORT_SSE2)
11484 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11485 drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11486 } else
11487#elif defined(DRFLAC_SUPPORT_NEON)
11488 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11489 drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11490 } else
11491#endif
11492 {
11493 /* Scalar fallback. */
11494#if 0
11495 drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11496#else
11497 drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11498#endif
11499 }
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    /* A NULL output buffer means the caller only wants to skip forward. */
    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
        }
    }

    return framesRead;
}

DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    if (pFlac == NULL) {
        return DRFLAC_FALSE;
    }

    /* Don't do anything if we're already on the seek point. */
    if (pFlac->currentPCMFrame == pcmFrameIndex) {
        return DRFLAC_TRUE;
    }

    /*
    If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
    when the decoder was opened.
    */
    if (pFlac->firstFLACFramePosInBytes == 0) {
        return DRFLAC_FALSE;
    }

    if (pcmFrameIndex == 0) {
        pFlac->currentPCMFrame = 0;
        return drflac__seek_to_first_frame(pFlac);
    } else {
        drflac_bool32 wasSuccessful = DRFLAC_FALSE;
        drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;

        /* Clamp the target PCM frame to the end of the stream. */
        if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
            pcmFrameIndex = pFlac->totalPCMFrameCount;
        }

        /* If the target PCM frame lies within the current FLAC frame we just move the position within that frame. */
        if (pcmFrameIndex > pFlac->currentPCMFrame) {
            /* Forward. */
            drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
            if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
                pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        } else {
            /* Backward. */
            drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
            drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
            drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
            if (currentFLACFramePCMFramesConsumed > offsetAbs) {
                pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        }

        /*
        Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
        we'll instead use Ogg's natural seeking facility.
        */
#ifndef DR_FLAC_NO_OGG
        if (pFlac->container == drflac_container_ogg)
        {
            wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
        }
        else
#endif
        {
            /* First try seeking via the seek table. If this fails, fall back to binary search and then to a brute force seek, which is much slower. */
            if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
            }

#if !defined(DR_FLAC_NO_CRC)
            /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
            if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
                wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
            }
#endif

            /* Fall back to brute force if all else fails. */
            if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
            }
        }

        if (wasSuccessful) {
            pFlac->currentPCMFrame = pcmFrameIndex;
        } else {
            /* Seek failed. Try putting the decoder back to its original state. */
            if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
                /* Failed to seek back to the original PCM frame. Fall back to 0. */
                drflac_seek_to_pcm_frame(pFlac, 0);
            }
        }

        return wasSuccessful;
    }
}

11683/* High Level APIs */
11685/* SIZE_MAX */
11686#if defined(SIZE_MAX)
11687 #define DRFLAC_SIZE_MAX SIZE_MAX
11688#else
11689 #if defined(DRFLAC_64BIT)
11690 #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11691 #else
11692 #define DRFLAC_SIZE_MAX 0xFFFFFFFF
11693 #endif
11694#endif
11695/* End SIZE_MAX */
/* Using a macro as the definition of the drflac__full_read_and_close_*() API family. Sue me. */
#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut) \
{ \
    type* pSampleData = NULL; \
    drflac_uint64 totalPCMFrameCount; \
    \
    DRFLAC_ASSERT(pFlac != NULL); \
    \
    totalPCMFrameCount = pFlac->totalPCMFrameCount; \
    \
    if (totalPCMFrameCount == 0) { \
        type buffer[4096]; \
        drflac_uint64 pcmFramesRead; \
        size_t sampleDataBufferSize = sizeof(buffer); \
        \
        pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
        if (pSampleData == NULL) { \
            goto on_error; \
        } \
        \
        while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
            if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
                type* pNewSampleData; \
                size_t newSampleDataBufferSize; \
                \
                newSampleDataBufferSize = sampleDataBufferSize * 2; \
                pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
                if (pNewSampleData == NULL) { \
                    drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
                    goto on_error; \
                } \
                \
                sampleDataBufferSize = newSampleDataBufferSize; \
                pSampleData = pNewSampleData; \
            } \
            \
            DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
            totalPCMFrameCount += pcmFramesRead; \
        } \
        \
        /* At this point everything should be decoded, but we just want to fill the unused part of the buffer with silence - need to \
           protect those ears from random noise! */ \
        DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
    } else { \
        drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
        if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
            goto on_error;  /* The decoded data is too big. */ \
        } \
        \
        pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks);    /* <-- Safe cast as per the check above. */ \
        if (pSampleData == NULL) { \
            goto on_error; \
        } \
        \
        totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
    } \
    \
    if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
    if (channelsOut) *channelsOut = pFlac->channels; \
    if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
    \
    drflac_close(pFlac); \
    return pSampleData; \
    \
on_error: \
    drflac_close(pFlac); \
    return NULL; \
}

DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)

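/*
When the total length of the stream is unknown, the macro above reads fixed-size chunks into a heap buffer that doubles in size
whenever it fills up, then zeroes the unused tail. The following is a minimal sketch of that pattern using plain malloc(), realloc()
and free() instead of the allocation callbacks. All names (`example_source`, `example_read`, `example_read_all`) are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical chunk source which hands out ints from a fixed array.
typedef struct
{
    const int* pData;
    size_t count;
    size_t cursor;
} example_source;

// Copies up to maxCount ints into pOut and returns how many were copied. Returns 0 at the end.
static size_t example_read(example_source* pSource, int* pOut, size_t maxCount)
{
    size_t remaining = pSource->count - pSource->cursor;
    size_t toCopy = (remaining < maxCount) ? remaining : maxCount;

    memcpy(pOut, pSource->pData + pSource->cursor, toCopy * sizeof(int));
    pSource->cursor += toCopy;
    return toCopy;
}

// Reads everything from the source. Returns NULL on allocation failure. Release the result with free().
static int* example_read_all(example_source* pSource, size_t* pCountOut)
{
    int chunk[4];
    size_t capacity = sizeof(chunk) / sizeof(chunk[0]);
    size_t total = 0;
    size_t n;
    int* pResult;

    assert(pSource != NULL && pCountOut != NULL);

    pResult = (int*)malloc(capacity * sizeof(int));
    if (pResult == NULL) {
        return NULL;
    }

    while ((n = example_read(pSource, chunk, sizeof(chunk) / sizeof(chunk[0]))) > 0) {
        // One doubling is always enough because a chunk never exceeds the initial capacity.
        if (total + n > capacity) {
            int* pNew = (int*)realloc(pResult, capacity * 2 * sizeof(int));
            if (pNew == NULL) {
                free(pResult);
                return NULL;
            }
            pResult = pNew;
            capacity *= 2;
        }

        memcpy(pResult + total, chunk, n * sizeof(int));
        total += n;
    }

    // Zero the unused tail so that it never contains uninitialized data.
    memset(pResult + total, 0, (capacity - total) * sizeof(int));

    *pCountOut = total;
    return pResult;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of up to half of the buffer being unused.
*/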
11772DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11774 drflac* pFlac;
11776 if (channelsOut) {
11777 *channelsOut = 0;
11778 }
11779 if (sampleRateOut) {
11780 *sampleRateOut = 0;
11781 }
11782 if (totalPCMFrameCountOut) {
11783 *totalPCMFrameCountOut = 0;
11784 }
11786 pFlac = drflac_open(onRead, onSeek, onTell, pUserData, pAllocationCallbacks);
11787 if (pFlac == NULL) {
11788 return NULL;
11789 }
11791 return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11794DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11796 drflac* pFlac;
11798 if (channelsOut) {
11799 *channelsOut = 0;
11800 }
11801 if (sampleRateOut) {
11802 *sampleRateOut = 0;
11803 }
11804 if (totalPCMFrameCountOut) {
11805 *totalPCMFrameCountOut = 0;
11806 }
11808 pFlac = drflac_open(onRead, onSeek, onTell, pUserData, pAllocationCallbacks);
11809 if (pFlac == NULL) {
11810 return NULL;
11811 }
11813 return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11816DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11818 drflac* pFlac;
11820 if (channelsOut) {
11821 *channelsOut = 0;
11822 }
11823 if (sampleRateOut) {
11824 *sampleRateOut = 0;
11825 }
11826 if (totalPCMFrameCountOut) {
11827 *totalPCMFrameCountOut = 0;
11828 }
11830 pFlac = drflac_open(onRead, onSeek, onTell, pUserData, pAllocationCallbacks);
11831 if (pFlac == NULL) {
11832 return NULL;
11833 }
11835 return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11838#ifndef DR_FLAC_NO_STDIO
11839DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11841 drflac* pFlac;
11843 if (sampleRate) {
11844 *sampleRate = 0;
11845 }
11846 if (channels) {
11847 *channels = 0;
11848 }
11849 if (totalPCMFrameCount) {
11850 *totalPCMFrameCount = 0;
11851 }
11853 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11854 if (pFlac == NULL) {
11855 return NULL;
11856 }
11858 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11861DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11863 drflac* pFlac;
11865 if (sampleRate) {
11866 *sampleRate = 0;
11867 }
11868 if (channels) {
11869 *channels = 0;
11870 }
11871 if (totalPCMFrameCount) {
11872 *totalPCMFrameCount = 0;
11873 }
11875 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11876 if (pFlac == NULL) {
11877 return NULL;
11878 }
11880 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11883DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11885 drflac* pFlac;
11887 if (sampleRate) {
11888 *sampleRate = 0;
11889 }
11890 if (channels) {
11891 *channels = 0;
11892 }
11893 if (totalPCMFrameCount) {
11894 *totalPCMFrameCount = 0;
11895 }
11897 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11898 if (pFlac == NULL) {
11899 return NULL;
11900 }
11902 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11904#endif
11906DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11908 drflac* pFlac;
11910 if (sampleRate) {
11911 *sampleRate = 0;
11912 }
11913 if (channels) {
11914 *channels = 0;
11915 }
11916 if (totalPCMFrameCount) {
11917 *totalPCMFrameCount = 0;
11918 }
11920 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11921 if (pFlac == NULL) {
11922 return NULL;
11923 }
11925 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11928DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11930 drflac* pFlac;
11932 if (sampleRate) {
11933 *sampleRate = 0;
11934 }
11935 if (channels) {
11936 *channels = 0;
11937 }
11938 if (totalPCMFrameCount) {
11939 *totalPCMFrameCount = 0;
11940 }
11942 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11943 if (pFlac == NULL) {
11944 return NULL;
11945 }
11947 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11950DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11952 drflac* pFlac;
11954 if (sampleRate) {
11955 *sampleRate = 0;
11956 }
11957 if (channels) {
11958 *channels = 0;
11959 }
11960 if (totalPCMFrameCount) {
11961 *totalPCMFrameCount = 0;
11962 }
11964 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11965 if (pFlac == NULL) {
11966 return NULL;
11967 }
11969 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks != NULL) {
        drflac__free_from_callbacks(p, pAllocationCallbacks);
    } else {
        drflac__free_default(p, NULL);
    }
}

DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = commentCount;
    pIter->pRunningData = (const char*)pComments;
}

DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
{
    drflac_int32 length;
    const char* pComment;

    /* Safety. */
    if (pCommentLengthOut) {
        *pCommentLengthOut = 0;
    }

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    /* Each comment is stored as a 32-bit little-endian length followed by that many bytes of text. The text is not null terminated. */
    length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
    pIter->pRunningData += 4;

    pComment = pIter->pRunningData;
    pIter->pRunningData += length;
    pIter->countRemaining -= 1;

    if (pCommentLengthOut) {
        *pCommentLengthOut = length;
    }

    return pComment;
}

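/*
The iteration performed by drflac_next_vorbis_comment() can be sketched without the rest of the library. This is an illustration
under stated assumptions, not the library's code: `example_comment_iterator`, `example_read_le32` and `example_next_comment` are
hypothetical names, and the input is assumed to be a well-formed run of length-prefixed comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for drflac_vorbis_comment_iterator.
typedef struct
{
    uint32_t countRemaining;
    const unsigned char* pRunningData;
} example_comment_iterator;

// Reads an unaligned 32-bit little-endian value one byte at a time.
static uint32_t example_read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns a pointer to the next comment (not null terminated) and its length, or NULL when exhausted.
static const char* example_next_comment(example_comment_iterator* pIter, uint32_t* pLengthOut)
{
    uint32_t length;
    const unsigned char* pComment;

    assert(pLengthOut != NULL);
    *pLengthOut = 0;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length   = example_read_le32(pIter->pRunningData);
    pComment = pIter->pRunningData + 4;

    // Advance past the text so that the next call starts at the following length prefix.
    pIter->pRunningData    = pComment + length;
    pIter->countRemaining -= 1;

    *pLengthOut = length;
    return (const char*)pComment;
}
```

Because the returned text is not null terminated, callers should always use the reported length rather than strlen().
*/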
DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = trackCount;
    pIter->pRunningData = (const char*)pTrackData;
}

DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
{
    drflac_cuesheet_track cuesheetTrack;
    const char* pRunningData;
    drflac_uint64 offsetHi;
    drflac_uint64 offsetLo;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return DRFLAC_FALSE;
    }

    pRunningData = pIter->pRunningData;

    /* The 64-bit track offset is stored big-endian. It is read as two 32-bit halves and then recombined. */
    offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    cuesheetTrack.offset = offsetLo | (offsetHi << 32);
    cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
    DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
    /* The flags byte is followed by 13 reserved bytes, hence the skip of 14. */
    cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
    cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
    cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
    cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);

    pIter->pRunningData = pRunningData;
    pIter->countRemaining -= 1;

    if (pCuesheetTrack) {
        *pCuesheetTrack = cuesheetTrack;
    }

    return DRFLAC_TRUE;
}

12069#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
12070 #pragma GCC diagnostic pop
12071#endif
12072#endif /* dr_flac_c */
12073#endif /* DR_FLAC_IMPLEMENTATION */
/*
REVISION HISTORY
================
v0.13.0 - TBD
  - API CHANGE: Seek origin enums have been renamed to match the naming convention used by other dr_libs libraries:
    - drflac_seek_origin_start   -> DRFLAC_SEEK_SET
    - drflac_seek_origin_current -> DRFLAC_SEEK_CUR
    - DRFLAC_SEEK_END (new)
  - API CHANGE: A new seek origin has been added to allow seeking from the end of the file. If you implement your own `onSeek` callback, you should now detect and handle `DRFLAC_SEEK_END`. If seeking to the end is not supported, return `DRFLAC_FALSE`. If you only use `*_open_file()` or `*_open_memory()`, you need not change anything.
  - API CHANGE: An `onTell` callback has been added to the following functions:
    - drflac_open()
    - drflac_open_relaxed()
    - drflac_open_with_metadata()
    - drflac_open_with_metadata_relaxed()
    - drflac_open_and_read_pcm_frames_s32()
    - drflac_open_and_read_pcm_frames_s16()
    - drflac_open_and_read_pcm_frames_f32()
  - Fix compilation for AIX OS.

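  As an illustration of the `onSeek` and `onTell` changes in v0.13.0, the following is a minimal sketch of a pair of callbacks for an
  in-memory stream. It is self-contained and therefore uses hypothetical stand-ins (`example_seek_origin`, `example_memory_stream`,
  `example_on_seek`, `example_on_tell`) instead of the real dr_flac types. The parameter lists are an assumption about the general
  shape of the callbacks, so check the `drflac_seek_proc` and `drflac_tell_proc` declarations in this file before adapting it.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-ins so that the sketch compiles on its own. In real code these come from dr_flac.h.
typedef enum
{
    EXAMPLE_SEEK_SET,
    EXAMPLE_SEEK_CUR,
    EXAMPLE_SEEK_END
} example_seek_origin;

typedef struct
{
    size_t size;    // Total size of the stream in bytes.
    size_t cursor;  // Current read position in bytes.
} example_memory_stream;

// Returns 1 on success and 0 on failure, mirroring DRFLAC_TRUE and DRFLAC_FALSE.
static int example_on_seek(void* pUserData, int offset, example_seek_origin origin)
{
    example_memory_stream* pStream = (example_memory_stream*)pUserData;
    long long base;
    long long target;

    assert(pStream != NULL);

    switch (origin) {
        case EXAMPLE_SEEK_SET: base = 0; break;
        case EXAMPLE_SEEK_CUR: base = (long long)pStream->cursor; break;
        case EXAMPLE_SEEK_END: base = (long long)pStream->size; break;
        default: return 0;
    }

    target = base + offset;
    if (target < 0 || target > (long long)pStream->size) {
        return 0;   // Out of range. Leave the cursor untouched.
    }

    pStream->cursor = (size_t)target;
    return 1;
}

// Reports the current position through pCursor.
static int example_on_tell(void* pUserData, long long* pCursor)
{
    example_memory_stream* pStream = (example_memory_stream*)pUserData;

    assert(pStream != NULL && pCursor != NULL);

    *pCursor = (long long)pStream->cursor;
    return 1;
}
```

  A stream that cannot seek relative to its end would instead return failure for the end origin, as described in the notes above.
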
12095v0.12.43 - 2024-12-17
12096 - Fix a possible buffer overflow during decoding.
  - Improve detection of ARM64EC.
12099v0.12.42 - 2023-11-02
12100 - Fix build for ARMv6-M.
12101 - Fix a compilation warning with GCC.
12103v0.12.41 - 2023-06-17
12104 - Fix an incorrect date in revision history. No functional change.
12106v0.12.40 - 2023-05-22
12107 - Minor code restructure. No functional change.
12109v0.12.39 - 2022-09-17
12110 - Fix compilation with DJGPP.
12111 - Fix compilation error with Visual Studio 2019 and the ARM build.
12112 - Fix an error with SSE 4.1 detection.
  - Add support for disabling wchar_t with DR_FLAC_NO_WCHAR.
12114 - Improve compatibility with compilers which lack support for explicit struct packing.
12115 - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
12116 allocation when loading an Ogg encapsulated file.
12118v0.12.38 - 2022-04-10
12119 - Fix compilation error on older versions of GCC.
12121v0.12.37 - 2022-02-12
12122 - Improve ARM detection.
12124v0.12.36 - 2022-02-07
12125 - Fix a compilation error with the ARM build.
12127v0.12.35 - 2022-02-06
12128 - Fix a bug due to underestimating the amount of precision required for the prediction stage.
12129 - Fix some bugs found from fuzz testing.
12131v0.12.34 - 2022-01-07
12132 - Fix some misalignment bugs when reading metadata.
12134v0.12.33 - 2021-12-22
12135 - Fix a bug with seeking when the seek table does not start at PCM frame 0.
12137v0.12.32 - 2021-12-11
12138 - Fix a warning with Clang.
12140v0.12.31 - 2021-08-16
12141 - Silence some warnings.
12143v0.12.30 - 2021-07-31
12144 - Fix platform detection for ARM64.
12146v0.12.29 - 2021-04-02
12147 - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
12148 - Fix a decoding error due to an incorrect validation check.
12150v0.12.28 - 2021-02-21
12151 - Fix a warning due to referencing _MSC_VER when it is undefined.
12153v0.12.27 - 2021-01-31
12154 - Fix a static analysis warning.
12156v0.12.26 - 2021-01-17
12157 - Fix a compilation warning due to _BSD_SOURCE being deprecated.
12159v0.12.25 - 2020-12-26
12160 - Update documentation.
12162v0.12.24 - 2020-11-29
12163 - Fix ARM64/NEON detection when compiling with MSVC.
12165v0.12.23 - 2020-11-21
12166 - Fix compilation with OpenWatcom.
12168v0.12.22 - 2020-11-01
12169 - Fix an error with the previous release.
12171v0.12.21 - 2020-11-01
12172 - Fix a possible deadlock when seeking.
12173 - Improve compiler support for older versions of GCC.
12175v0.12.20 - 2020-09-08
12176 - Fix a compilation error on older compilers.
12178v0.12.19 - 2020-08-30
12179 - Fix a bug due to an undefined 32-bit shift.
12181v0.12.18 - 2020-08-14
12182 - Fix a crash when compiling with clang-cl.
12184v0.12.17 - 2020-08-02
12185 - Simplify sized types.
12187v0.12.16 - 2020-07-25
12188 - Fix a compilation warning.
12190v0.12.15 - 2020-07-06
12191 - Check for negative LPC shifts and return an error.
12193v0.12.14 - 2020-06-23
12194 - Add include guard for the implementation section.
12196v0.12.13 - 2020-05-16
12197 - Add compile-time and run-time version querying.
12198 - DRFLAC_VERSION_MINOR
12199 - DRFLAC_VERSION_MAJOR
12200 - DRFLAC_VERSION_REVISION
12201 - DRFLAC_VERSION_STRING
12202 - drflac_version()
12203 - drflac_version_string()
12205v0.12.12 - 2020-04-30
12206 - Fix compilation errors with VC6.
12208v0.12.11 - 2020-04-19
12209 - Fix some pedantic warnings.
12210 - Fix some undefined behaviour warnings.
12212v0.12.10 - 2020-04-10
12213 - Fix some bugs when trying to seek with an invalid seek table.
12215v0.12.9 - 2020-04-05
12216 - Fix warnings.
12218v0.12.8 - 2020-04-04
12219 - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
12220 - Fix some static analysis warnings.
12221 - Minor documentation updates.
12223v0.12.7 - 2020-03-14
12224 - Fix compilation errors with VC6.
12226v0.12.6 - 2020-03-07
12227 - Fix compilation error with Visual Studio .NET 2003.
12229v0.12.5 - 2020-01-30
12230 - Silence some static analysis warnings.
12232v0.12.4 - 2020-01-29
12233 - Silence some static analysis warnings.
12235v0.12.3 - 2019-12-02
12236 - Fix some warnings when compiling with GCC and the -Og flag.
12237 - Fix a crash in out-of-memory situations.
12238 - Fix potential integer overflow bug.
12239 - Fix some static analysis warnings.
12240 - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
12241 - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
12243v0.12.2 - 2019-10-07
12244 - Internal code clean up.
12246v0.12.1 - 2019-09-29
12247 - Fix some Clang Static Analyzer warnings.
12248 - Fix an unused variable warning.
12250v0.12.0 - 2019-09-23
  - API CHANGE: Add support for user-defined memory allocation routines. This system allows the program to specify its own memory allocation
    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
12253 - drflac_open()
12254 - drflac_open_relaxed()
12255 - drflac_open_with_metadata()
12256 - drflac_open_with_metadata_relaxed()
12257 - drflac_open_file()
12258 - drflac_open_file_with_metadata()
12259 - drflac_open_memory()
12260 - drflac_open_memory_with_metadata()
12261 - drflac_open_and_read_pcm_frames_s32()
12262 - drflac_open_and_read_pcm_frames_s16()
12263 - drflac_open_and_read_pcm_frames_f32()
12264 - drflac_open_file_and_read_pcm_frames_s32()
12265 - drflac_open_file_and_read_pcm_frames_s16()
12266 - drflac_open_file_and_read_pcm_frames_f32()
12267 - drflac_open_memory_and_read_pcm_frames_s32()
12268 - drflac_open_memory_and_read_pcm_frames_s16()
12269 - drflac_open_memory_and_read_pcm_frames_f32()
12270 Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
12271 DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
12272 - Remove deprecated APIs:
12273 - drflac_read_s32()
12274 - drflac_read_s16()
12275 - drflac_read_f32()
12276 - drflac_seek_to_sample()
12277 - drflac_open_and_decode_s32()
12278 - drflac_open_and_decode_s16()
12279 - drflac_open_and_decode_f32()
12280 - drflac_open_and_decode_file_s32()
12281 - drflac_open_and_decode_file_s16()
12282 - drflac_open_and_decode_file_f32()
12283 - drflac_open_and_decode_memory_s32()
12284 - drflac_open_and_decode_memory_s16()
12285 - drflac_open_and_decode_memory_f32()
12286 - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
12287 by doing pFlac->totalPCMFrameCount*pFlac->channels.
12288 - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
12289 - Fix errors when seeking to the end of a stream.
12290 - Optimizations to seeking.
12291 - SSE improvements and optimizations.
12292 - ARM NEON optimizations.
12293 - Optimizations to drflac_read_pcm_frames_s16().
12294 - Optimizations to drflac_read_pcm_frames_s32().
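
As a rough illustration of the v0.12.0 allocation change above, the sketch below builds a set of callbacks that count live allocations through the user data pointer. The structure is a local stand-in declared only so the example compiles on its own; it is assumed to mirror the layout of drflac_allocation_callbacks, and every my_* name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

// Local stand-in so this sketch is self-contained. It is assumed to mirror the layout of
// drflac_allocation_callbacks; check the declaration in this file before relying on it.
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} my_allocation_callbacks;

// Hypothetical client-side context that the callbacks reach through pUserData.
typedef struct
{
    size_t liveAllocations;
} my_alloc_stats;

static void* my_malloc(size_t sz, void* pUserData)
{
    void* p = malloc(sz);
    if (p != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return p;
}

static void* my_realloc(void* p, size_t sz, void* pUserData)
{
    void* pNew = realloc(p, sz);
    // Growing from NULL creates a new live allocation. A size of 0 is not handled specially here.
    if (p == NULL && pNew != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return pNew;
}

static void my_free(void* p, void* pUserData)
{
    if (p != NULL) {
        ((my_alloc_stats*)pUserData)->liveAllocations -= 1;
        free(p);
    }
}

// Bundles the three routines together with the context pointer.
static my_allocation_callbacks my_make_callbacks(my_alloc_stats* pStats)
{
    my_allocation_callbacks callbacks;
    callbacks.pUserData = pStats;
    callbacks.onMalloc  = my_malloc;
    callbacks.onRealloc = my_realloc;
    callbacks.onFree    = my_free;
    return callbacks;
}
```

With the library's own type, a pointer to such a structure would go in the new last parameter, for example drflac_open_file("MySong.flac", &callbacks), while passing NULL keeps the DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE defaults.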

v0.11.10 - 2019-06-26
  - Fix a compiler error.

v0.11.9 - 2019-06-16
  - Silence some ThreadSanitizer warnings.

v0.11.8 - 2019-05-21
  - Fix warnings.

v0.11.7 - 2019-05-06
  - C89 fixes.

v0.11.6 - 2019-05-05
  - Add support for C89.
  - Fix a compiler warning when CRC is disabled.
  - Change license to choice of public domain or MIT-0.

v0.11.5 - 2019-04-19
  - Fix a compiler error with GCC.

v0.11.4 - 2019-04-17
  - Fix some warnings with GCC when compiling with -std=c99.

v0.11.3 - 2019-04-07
  - Silence warnings with GCC.

v0.11.2 - 2019-03-10
  - Fix a warning.

v0.11.1 - 2019-02-17
  - Fix a potential bug with seeking.

v0.11.0 - 2018-12-16
  - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
    drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
    and return PCM frame counts instead of sample counts. To upgrade you will need to divide the input count by the
    channel count, and multiply the return value by the channel count if you still need a sample count.
  - API CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
    the changes to drflac_read_*() apply.
  - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
    the changes to drflac_read_*() apply.
  - Optimizations.
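
The count conversion described for v0.11.0 can be sketched as a pair of helpers. This is only an illustration: the my_* names are hypothetical, and the standard uint64_t/uint32_t types are used here in place of the library's drflac_uint64/drflac_uint32 so the block stands alone.

```c
#include <stdint.h>

// One PCM frame holds one sample per channel, so an interleaved sample count
// converts to a PCM frame count by dividing by the channel count.
static uint64_t my_samples_to_pcm_frames(uint64_t sampleCount, uint32_t channels)
{
    if (channels == 0) {
        return 0;   // Guard against division by zero on an uninitialized channel count.
    }
    return sampleCount / channels;
}

// The reverse direction: a PCM frame count expands to channels samples per frame.
static uint64_t my_pcm_frames_to_samples(uint64_t pcmFrameCount, uint32_t channels)
{
    return pcmFrameCount * channels;
}
```

Under these assumptions, an old call that requested sampleCount samples would request my_samples_to_pcm_frames(sampleCount, pFlac->channels) frames from drflac_read_pcm_frames_s32(), and the returned frame count would go through my_pcm_frames_to_samples() wherever a sample count is still needed. The same multiplication reproduces the removed drflac.totalSampleCount from drflac.totalPCMFrameCount.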

v0.10.0 - 2018-09-11
  - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
    need to do it yourself via the callback API.
  - Fix the clang build.
  - Fix undefined behavior.
  - Fix errors with CUESHEET metadata blocks.
  - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
    Vorbis comment API.
  - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  - Minor optimizations.

v0.9.11 - 2018-08-29
  - Fix a bug with sample reconstruction.

v0.9.10 - 2018-08-07
  - Improve 64-bit detection.

v0.9.9 - 2018-08-05
  - Fix C++ build on older versions of GCC.

v0.9.8 - 2018-07-24
  - Fix compilation errors.

v0.9.7 - 2018-07-05
  - Fix a warning.

v0.9.6 - 2018-06-29
  - Fix some typos.

v0.9.5 - 2018-06-23
  - Fix some warnings.

v0.9.4 - 2018-06-14
  - Optimizations to seeking.
  - Clean up.

v0.9.3 - 2018-05-22
  - Bug fix.

v0.9.2 - 2018-05-12
  - Fix a compilation error due to a missing break statement.

v0.9.1 - 2018-04-29
  - Fix compilation error with Clang.

v0.9 - 2018-04-24
  - Fix Clang build.
  - Start using major.minor.revision versioning.

v0.8g - 2018-04-19
  - Fix build on non-x86/x64 architectures.

v0.8f - 2018-02-02
  - Stop pretending to support changing rate/channels mid stream.

v0.8e - 2018-02-01
  - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  - Fix a crash when the Rice partition order is invalid.

v0.8d - 2017-09-22
  - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
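
To illustrate what skipping an ID3 tag involves, here is a minimal sketch that computes how many bytes an ID3v2 tag occupies at the start of a buffer. It follows the ID3v2 header layout (a 10-byte header with a 4-byte "synchsafe" size) and is not presented as the decoder's own code; my_id3v2_tag_size is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the total number of bytes occupied by an ID3v2 tag at the start of pData,
// or 0 when the buffer does not begin with an ID3v2 header.
static uint32_t my_id3v2_tag_size(const uint8_t* pData, size_t size)
{
    uint32_t bodySize;

    if (size < 10 || pData[0] != 'I' || pData[1] != 'D' || pData[2] != '3') {
        return 0;
    }

    // The size field is "synchsafe": four bytes carrying 7 bits each, most significant first.
    bodySize = ((uint32_t)(pData[6] & 0x7F) << 21) |
               ((uint32_t)(pData[7] & 0x7F) << 14) |
               ((uint32_t)(pData[8] & 0x7F) <<  7) |
               ((uint32_t)(pData[9] & 0x7F));

    // 10 header bytes, plus a 10-byte footer when flag bit 0x10 is set.
    return 10 + bodySize + ((pData[5] & 0x10) ? 10 : 0);
}
```

A caller would seek forward by the returned byte count before looking for the stream marker; a return value of 0 means there is nothing to skip.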

v0.8c - 2017-09-07
  - Fix warning on non-x86/x64 architectures.

v0.8b - 2017-08-19
  - Fix build on non-x86/x64 architectures.

v0.8a - 2017-08-13
  - A small optimization for the Clang build.

v0.8 - 2017-08-12
  - API CHANGE: Rename dr_* types to drflac_*.
  - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  - Add support for custom implementations of malloc(), realloc(), etc.
  - Add CRC checking to Ogg encapsulated streams.
  - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  - Bug fixes.

v0.7 - 2017-07-23
  - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().

v0.6 - 2017-07-22
  - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
    never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
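
The three frame checks named in v0.6 can be sketched as follows. This is a bit-at-a-time illustration using the parameters from the FLAC format (14-bit sync code 11111111 111110, CRC-8 polynomial 0x07, CRC-16 polynomial 0x8005, both starting from 0). It makes no claim about how the decoder implements them internally, and the my_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Returns 1 when the two bytes begin with the 14-bit frame sync code 11111111 111110.
static int my_has_sync_code(uint8_t b0, uint8_t b1)
{
    return b0 == 0xFF && (b1 & 0xFC) == 0xF8;
}

// Bitwise CRC-8, polynomial 0x07, initial value 0, most significant bit first.
static uint8_t my_crc8(const uint8_t* pData, size_t size)
{
    uint8_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < size; i += 1) {
        crc ^= pData[i];
        for (bit = 0; bit < 8; bit += 1) {
            crc = (uint8_t)((crc & 0x80) ? ((crc << 1) ^ 0x07) : (crc << 1));
        }
    }
    return crc;
}

// Bitwise CRC-16, polynomial 0x8005, initial value 0, most significant bit first.
static uint16_t my_crc16(const uint8_t* pData, size_t size)
{
    uint16_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < size; i += 1) {
        crc ^= (uint16_t)(pData[i] << 8);
        for (bit = 0; bit < 8; bit += 1) {
            crc = (uint16_t)((crc & 0x8000) ? ((crc << 1) ^ 0x8005) : (crc << 1));
        }
    }
    return crc;
}
```

A frame would be treated as invalid, and skipped, when the sync code is missing, when my_crc8() over the header bytes differs from the stored header CRC, or when my_crc16() over the frame differs from the stored frame CRC.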

v0.5 - 2017-07-16
  - Fix typos.
  - Change drflac_bool* types to unsigned.
  - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.

v0.4f - 2017-03-10
  - Fix a couple of bugs with the bitstreaming code.

v0.4e - 2017-02-17
  - Fix some warnings.

v0.4d - 2016-12-26
  - Add support for 32-bit floating-point PCM decoding.
  - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  - Minor improvements to documentation.

v0.4c - 2016-12-26
  - Add support for signed 16-bit integer PCM decoding.

v0.4b - 2016-10-23
  - A minor change to drflac_bool8 and drflac_bool32 types.

v0.4a - 2016-10-11
  - Rename drBool32 to drflac_bool32 for styling consistency.

v0.4 - 2016-09-29
  - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
    keep it consistent with drflac_audio.

v0.3f - 2016-09-21
  - Fix a warning with GCC.

v0.3e - 2016-09-18
  - Fixed a bug where GCC 4.3+ was not getting properly identified.
  - Fixed a few typos.
  - Changed date formats to ISO 8601 (YYYY-MM-DD).

v0.3d - 2016-06-11
  - Minor clean up.

v0.3c - 2016-05-28
  - Fixed compilation error.

v0.3b - 2016-05-16
  - Fixed Linux/GCC build.
  - Updated documentation.

v0.3a - 2016-05-15
  - Minor fixes to documentation.

v0.3 - 2016-05-11
  - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  - Lots of clean up.

v0.2b - 2016-05-10
  - Bug fixes.

v0.2a - 2016-05-10
  - Made drflac_open_and_decode() more robust.
  - Removed an unused debugging variable.

v0.2 - 2016-05-09
  - Added support for Ogg encapsulation.
  - API CHANGE: Have the onSeek callback take a third argument which specifies whether or not the seek
    should be relative to the start or the current position. Also changes the seeking rules such that
    seeking offsets will never be negative.
  - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.

v0.1b - 2016-05-07
  - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  - Removed a stale comment.

v0.1a - 2016-05-05
  - Minor formatting changes.
  - Fixed a warning on the GCC build.

v0.1 - 2016-05-03
  - Initial versioned release.
*/

/*
This software is available as a choice of the following licenses. Choose
whichever you prefer.

===============================================================================
ALTERNATIVE 1 - Public Domain (www.unlicense.org)
===============================================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

===============================================================================
ALTERNATIVE 2 - MIT No Attribution
===============================================================================
Copyright 2023 David Reid

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/