/*
FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
dr_flac - v0.13.0 - TBD

David Reid - mackron@gmail.com

GitHub: https://github.com/mackron/dr_libs
*/

/*
Introduction
============
dr_flac is a single file library. To use it, do something like the following in one .c file.

    ```c
    #define DR_FLAC_IMPLEMENTATION
    #include "dr_flac.h"
    ```

You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:

    ```c
    drflac* pFlac = drflac_open_file("MySong.flac", NULL);
    if (pFlac == NULL) {
        // Failed to open FLAC file. Do not continue past this point.
    }

    // One drflac_int32 per channel for every PCM frame.
    drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
    drflac_uint64 framesRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
    ```

The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. The example above
opens a native FLAC stream, but dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
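
As an illustration of what "interleaved" means here, the sketch below splits an interleaved buffer (L0 R0 L1 R1 ...) into one buffer per channel. The
function name `deinterleave_s32` is hypothetical and not part of dr_flac, and `int32_t` stands in for `drflac_int32`.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Copies sample `iChannel` of every PCM frame
// into its own buffer. `ppChannelsOut` must hold `channels` pointers, each with room for
// `frameCount` samples.
static void deinterleave_s32(const int32_t* pInterleaved, size_t frameCount, unsigned int channels, int32_t** ppChannelsOut)
{
    size_t iFrame;
    unsigned int iChannel;

    for (iFrame = 0; iFrame < frameCount; iFrame += 1) {
        for (iChannel = 0; iChannel < channels; iChannel += 1) {
            ppChannelsOut[iChannel][iFrame] = pInterleaved[iFrame*channels + iChannel];
        }
    }
}
```

A PCM frame is one sample for each channel, so a buffer of `frameCount` frames always holds `frameCount * channels` samples.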

You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:

    ```c
    while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
        do_something();
    }
    ```

You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.

If you just want to quickly decode an entire FLAC file in one go you can do something like this:

    ```c
    unsigned int channels;
    unsigned int sampleRate;
    drflac_uint64 totalPCMFrameCount;
    drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
    if (pSampleData == NULL) {
        // Failed to open and decode FLAC file.
    }

    ...

    drflac_free(pSampleData, NULL);
    ```

You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
should be considered lossy.
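
To see why narrower output formats lose information, consider the sketch below. It shows one plausible way to reduce a 32-bit sample to 16-bit integer
and 32-bit float. The function names are hypothetical and this is not necessarily how dr_flac performs the conversion internally.

```c
#include <stdint.h>

// Hypothetical helper: keeps the top 16 bits of a 32-bit sample. The low 16 bits are
// discarded, so distinct inputs can map to the same output. Right-shifting a negative
// value is implementation-defined in C; most compilers use an arithmetic shift.
static int16_t sample_s32_to_s16(int32_t sample)
{
    return (int16_t)(sample >> 16);
}

// Hypothetical helper: scales a 32-bit sample into the range [-1, 1). A float has a
// 24-bit significand, so it cannot represent every 32-bit integer value exactly.
static float sample_s32_to_f32(int32_t sample)
{
    return (float)(sample / 2147483648.0);
}
```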


If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metadata()` or `drflac_open_memory_with_metadata()`.
The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metadata()` returns.

The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. This presents a problem in certain scenarios such as broadcast style
streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:

    `drflac_open_relaxed()`
    `drflac_open_with_metadata_relaxed()`

It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.



Build Options
=============
#define these options before including this file.

#define DR_FLAC_NO_STDIO
  Disables `drflac_open_file()` and family.

#define DR_FLAC_NO_OGG
  Disables support for Ogg/FLAC streams.

#define DR_FLAC_BUFFER_SIZE <number>
  Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
  Larger values mean more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
  you have a very efficient implementation of onRead(), or increasing it if it's very inefficient. Must be a multiple of 8.

#define DR_FLAC_NO_CRC
  Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
  be used if available. Otherwise the seek will be performed using brute force.

#define DR_FLAC_NO_SIMD
  Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.

#define DR_FLAC_NO_WCHAR
  Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.



Notes
=====
- dr_flac does not support changing the sample rate nor channel count mid stream.
- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  to differences in corrupted stream recovery logic between the two APIs.
*/

#ifndef dr_flac_h
#define dr_flac_h

#ifdef __cplusplus
extern "C" {
#endif

#define DRFLAC_STRINGIFY(x) #x
#define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)

#define DRFLAC_VERSION_MAJOR 0
#define DRFLAC_VERSION_MINOR 13
#define DRFLAC_VERSION_REVISION 0
#define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)

#include <stddef.h> /* For size_t. */

/* Sized Types */
typedef signed char drflac_int8;
typedef unsigned char drflac_uint8;
typedef signed short drflac_int16;
typedef unsigned short drflac_uint16;
typedef signed int drflac_int32;
typedef unsigned int drflac_uint32;
#if defined(_MSC_VER) && !defined(__clang__)
    typedef signed __int64 drflac_int64;
    typedef unsigned __int64 drflac_uint64;
#else
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic push
        #pragma GCC diagnostic ignored "-Wlong-long"
        #if defined(__clang__)
            #pragma GCC diagnostic ignored "-Wc++11-long-long"
        #endif
    #endif
    typedef signed long long drflac_int64;
    typedef unsigned long long drflac_uint64;
    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
        #pragma GCC diagnostic pop
    #endif
#endif
#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
    typedef drflac_uint64 drflac_uintptr;
#else
    typedef drflac_uint32 drflac_uintptr;
#endif
typedef drflac_uint8 drflac_bool8;
typedef drflac_uint32 drflac_bool32;
#define DRFLAC_TRUE 1
#define DRFLAC_FALSE 0
/* End Sized Types */

/* Decorations */
#if !defined(DRFLAC_API)
    #if defined(DRFLAC_DLL)
        #if defined(_WIN32)
            #define DRFLAC_DLL_IMPORT __declspec(dllimport)
            #define DRFLAC_DLL_EXPORT __declspec(dllexport)
            #define DRFLAC_DLL_PRIVATE static
        #else
            #if defined(__GNUC__) && __GNUC__ >= 4
                #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
                #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
                #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
            #else
                #define DRFLAC_DLL_IMPORT
                #define DRFLAC_DLL_EXPORT
                #define DRFLAC_DLL_PRIVATE static
            #endif
        #endif

        #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
            #define DRFLAC_API DRFLAC_DLL_EXPORT
        #else
            #define DRFLAC_API DRFLAC_DLL_IMPORT
        #endif
        #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
    #else
        #define DRFLAC_API extern
        #define DRFLAC_PRIVATE static
    #endif
#endif
/* End Decorations */

#if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
    #define DRFLAC_DEPRECATED __declspec(deprecated)
#elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
    #define DRFLAC_DEPRECATED __attribute__((deprecated))
#elif defined(__has_feature) /* Clang */
    #if __has_feature(attribute_deprecated)
        #define DRFLAC_DEPRECATED __attribute__((deprecated))
    #else
        #define DRFLAC_DEPRECATED
    #endif
#else
    #define DRFLAC_DEPRECATED
#endif

DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
DRFLAC_API const char* drflac_version_string(void);

/* Allocation Callbacks */
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void (* onFree)(void* p, void* pUserData);
} drflac_allocation_callbacks;
/* End Allocation Callbacks */

/*
As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values mean more speed,
but also more memory. In my testing there are diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
*/
#ifndef DR_FLAC_BUFFER_SIZE
#define DR_FLAC_BUFFER_SIZE 4096
#endif


/* Architecture Detection */
#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
#define DRFLAC_64BIT
#endif

#if defined(__x86_64__) || (defined(_M_X64) && !defined(_M_ARM64EC))
    #define DRFLAC_X64
#elif defined(__i386) || defined(_M_IX86)
    #define DRFLAC_X86
#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
    #define DRFLAC_ARM
#endif
/* End Architecture Detection */


#ifdef DRFLAC_64BIT
typedef drflac_uint64 drflac_cache_t;
#else
typedef drflac_uint32 drflac_cache_t;
#endif

/* The various metadata block types. */
#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
#define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
#define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127

/* The various picture types specified in the PICTURE block. */
#define DRFLAC_PICTURE_TYPE_OTHER 0
#define DRFLAC_PICTURE_TYPE_FILE_ICON 1
#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
#define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
#define DRFLAC_PICTURE_TYPE_COVER_BACK 4
#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
#define DRFLAC_PICTURE_TYPE_MEDIA 6
#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
#define DRFLAC_PICTURE_TYPE_ARTIST 8
#define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
#define DRFLAC_PICTURE_TYPE_BAND 10
#define DRFLAC_PICTURE_TYPE_COMPOSER 11
#define DRFLAC_PICTURE_TYPE_LYRICIST 12
#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
#define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
#define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20

typedef enum
{
    drflac_container_native,
    drflac_container_ogg,
    drflac_container_unknown
} drflac_container;

typedef enum
{
    DRFLAC_SEEK_SET,
    DRFLAC_SEEK_CUR,
    DRFLAC_SEEK_END
} drflac_seek_origin;

/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
typedef struct
{
    drflac_uint64 firstPCMFrame;
    drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
    drflac_uint16 pcmFrameCount;
} drflac_seekpoint;

typedef struct
{
    drflac_uint16 minBlockSizeInPCMFrames;
    drflac_uint16 maxBlockSizeInPCMFrames;
    drflac_uint32 minFrameSizeInPCMFrames;
    drflac_uint32 maxFrameSizeInPCMFrames;
    drflac_uint32 sampleRate;
    drflac_uint8 channels;
    drflac_uint8 bitsPerSample;
    drflac_uint64 totalPCMFrameCount;
    drflac_uint8 md5[16];
} drflac_streaminfo;

typedef struct
{
    /*
    The metadata type. Use this to know how to interpret the data below. Will be set to one of the
    DRFLAC_METADATA_BLOCK_TYPE_* tokens.
    */
    drflac_uint32 type;

    /*
    A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
    not modify the contents of this buffer. Use the structures below for more meaningful and structured
    information about the metadata. It's possible for this to be null.
    */
    const void* pRawData;

    /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
    drflac_uint32 rawDataSize;

    union
    {
        drflac_streaminfo streaminfo;

        struct
        {
            int unused;
        } padding;

        struct
        {
            drflac_uint32 id;
            const void* pData;
            drflac_uint32 dataSize;
        } application;

        struct
        {
            drflac_uint32 seekpointCount;
            const drflac_seekpoint* pSeekpoints;
        } seektable;

        struct
        {
            drflac_uint32 vendorLength;
            const char* vendor;
            drflac_uint32 commentCount;
            const void* pComments;
        } vorbis_comment;

        struct
        {
            char catalog[128];
            drflac_uint64 leadInSampleCount;
            drflac_bool32 isCD;
            drflac_uint8 trackCount;
            const void* pTrackData;
        } cuesheet;

        struct
        {
            drflac_uint32 type;
            drflac_uint32 mimeLength;
            const char* mime;
            drflac_uint32 descriptionLength;
            const char* description;
            drflac_uint32 width;
            drflac_uint32 height;
            drflac_uint32 colorDepth;
            drflac_uint32 indexColorCount;
            drflac_uint32 pictureDataSize;
            const drflac_uint8* pPictureData;
        } picture;
    } data;
} drflac_metadata;


/*
Callback for when data needs to be read from the client.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pBufferOut (out)
    The output buffer.

bytesToRead (in)
    The number of bytes to read.


Return Value
------------
The number of bytes actually read.


Remarks
-------
A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
you have reached the end of the stream.
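
As an illustration, a read callback over a block of memory might look like the sketch below. The `my_memory_stream` type and `my_on_read` name are
hypothetical and exist only for this example. The function has the same shape as `drflac_read_proc`.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical user data: a block of memory holding the encoded stream.
typedef struct
{
    const unsigned char* pData;
    size_t dataSize;
    size_t cursor;
} my_memory_stream;

// Same parameters and return type as drflac_read_proc.
static size_t my_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    my_memory_stream* pStream = (my_memory_stream*)pUserData;
    size_t bytesRemaining = pStream->dataSize - pStream->cursor;

    // Clamp to what is left. A short read tells the caller the stream has ended.
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        memcpy(pBufferOut, pStream->pData + pStream->cursor, bytesToRead);
        pStream->cursor += bytesToRead;
    }

    return bytesToRead;
}
```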
*/
typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);

/*
Callback for when data needs to be seeked.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

offset (in)
    The number of bytes to move, relative to the origin. Will never be negative.

origin (in)
    The origin of the seek - the current position, the start of the stream, or the end of the stream.


Return Value
------------
Whether or not the seek was successful.


Remarks
-------
Seeking relative to the start and the current position must always be supported. If seeking from the end of the stream is not supported, return DRFLAC_FALSE.

When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
and handled by returning DRFLAC_FALSE.
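
The bounds check described above can be sketched as follows. All `my_*` names are hypothetical stand-ins so the example compiles on its own:
`my_bool32` for `drflac_bool32` and `MY_SEEK_*` for `DRFLAC_SEEK_*`. This sketch chooses not to support seeking from the end.

```c
#include <stddef.h>

// Stand-ins for drflac_bool32 and drflac_seek_origin so the sketch is self-contained.
typedef unsigned int my_bool32;
typedef enum { MY_SEEK_SET, MY_SEEK_CUR, MY_SEEK_END } my_seek_origin;

// Hypothetical user data: only the stream size and read position matter for seeking.
typedef struct
{
    size_t dataSize;
    size_t cursor;
} my_seek_state;

static my_bool32 my_on_seek(void* pUserData, int offset, my_seek_origin origin)
{
    my_seek_state* pState = (my_seek_state*)pUserData;
    size_t base;

    if (offset < 0) {
        return 0;   // Documented as never negative. Rejected defensively.
    }

    if (origin == MY_SEEK_SET) {
        base = 0;
    } else if (origin == MY_SEEK_CUR) {
        base = pState->cursor;
    } else {
        return 0;   // Seeking from the end is not supported in this sketch.
    }

    // Refuse to move beyond the end of the stream and leave the cursor untouched.
    if ((size_t)offset > pState->dataSize - base) {
        return 0;
    }

    pState->cursor = base + (size_t)offset;
    return 1;
}
```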
*/
typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);

/*
Callback for when the current position in the stream needs to be retrieved.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pCursor (out)
    A pointer to a variable to receive the current position in the stream.


Return Value
------------
Whether or not the operation was successful.
*/
typedef drflac_bool32 (* drflac_tell_proc)(void* pUserData, drflac_int64* pCursor);

/*
Callback for when a metadata block is read.


Parameters
----------
pUserData (in)
    The user data that was passed to drflac_open() and family.

pMetadata (in)
    A pointer to a structure containing the data of the metadata block.


Remarks
-------
Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
*/
typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);


/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
typedef struct
{
    const drflac_uint8* data;
    size_t dataSize;
    size_t currentReadPos;
} drflac__memory_stream;

/* Structure for internal use. Used for bit streaming. */
typedef struct
{
    /* The function to call when more data needs to be read. */
    drflac_read_proc onRead;

    /* The function to call when the current read position needs to be moved. */
    drflac_seek_proc onSeek;

    /* The function to call when the current read position needs to be retrieved. */
    drflac_tell_proc onTell;

    /* The user data to pass around to onRead and onSeek. */
    void* pUserData;


    /*
    The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
    stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
    or not the bitstreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
    */
    size_t unalignedByteCount;

    /* The content of the unaligned bytes. */
    drflac_cache_t unalignedCache;

    /* The index of the next valid cache line in the "L2" cache. */
    drflac_uint32 nextL2Line;

    /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
    drflac_uint32 consumedBits;

    /*
    The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
    Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
    */
    drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
    drflac_cache_t cache;

    /*
    CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
    is reset to 0 at the beginning of each frame.
    */
    drflac_uint16 crc16;
    drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when the L1 cache is reloaded. */
    drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
} drflac_bs;

typedef struct
{
    /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
    drflac_uint8 subframeType;

    /* The number of wasted bits per sample as specified by the sub-frame header. */
    drflac_uint8 wastedBitsPerSample;

    /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
    drflac_uint8 lpcOrder;

    /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
    drflac_int32* pSamplesS32;
} drflac_subframe;

typedef struct
{
    /*
    If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
    always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
    */
    drflac_uint64 pcmFrameNumber;

    /*
    If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
    is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
    */
    drflac_uint32 flacFrameNumber;

    /* The sample rate of this frame. */
    drflac_uint32 sampleRate;

    /* The number of PCM frames in each sub-frame within this frame. */
    drflac_uint16 blockSizeInPCMFrames;

    /*
    The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
    will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
    */
    drflac_uint8 channelAssignment;

    /* The number of bits per sample within this frame. */
    drflac_uint8 bitsPerSample;

    /* The frame's CRC. */
    drflac_uint8 crc8;
} drflac_frame_header;

typedef struct
{
    /* The header. */
    drflac_frame_header header;

    /*
    The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
    this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
    */
    drflac_uint32 pcmFramesRemaining;

    /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
    drflac_subframe subframes[8];
} drflac_frame;

typedef struct
{
    /* The function to call when a metadata block is read. */
    drflac_meta_proc onMeta;

    /* The user data posted to the metadata callback function. */
    void* pUserDataMD;

    /* Memory allocation callbacks. */
    drflac_allocation_callbacks allocationCallbacks;


    /* The sample rate. Will be set to something like 44100. */
    drflac_uint32 sampleRate;

    /*
    The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
    value specified in the STREAMINFO block.
    */
    drflac_uint8 channels;

    /* The bits per sample. Will be set to something like 16, 24, etc. */
    drflac_uint8 bitsPerSample;

    /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
    drflac_uint16 maxBlockSizeInPCMFrames;

    /*
    The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
    the total PCM frame count is unknown. Likely the case with streams like internet radio.
    */
    drflac_uint64 totalPCMFrameCount;


    /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
    drflac_container container;

    /* The number of seekpoints in the seektable. */
    drflac_uint32 seekpointCount;


    /* Information about the frame the decoder is currently sitting on. */
    drflac_frame currentFLACFrame;


    /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
    drflac_uint64 currentPCMFrame;

    /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
    drflac_uint64 firstFLACFramePosInBytes;


    /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
    drflac__memory_stream memoryStream;


    /* A pointer to the decoded sample data. This is an offset of pExtraData. */
    drflac_int32* pDecodedSamples;

    /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
    drflac_seekpoint* pSeekpoints;

    /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
    void* _oggbs;

    /* Internal use only. Used for profiling and testing different seeking modes. */
    drflac_bool32 _noSeekTableSeek : 1;
    drflac_bool32 _noBinarySearchSeek : 1;
    drflac_bool32 _noBruteForceSeek : 1;

    /* The bit streamer. The raw FLAC data is fed through this object. */
    drflac_bs bs;

    /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
    drflac_uint8 pExtraData[1];
} drflac;


/*
Opens a FLAC decoder.


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onTell (in)
    The function to call when the current read position needs to be retrieved.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead and onSeek.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
Returns a pointer to an object representing the decoder.


Remarks
-------
Close the decoder with `drflac_close()`.

`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.

This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.

This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
from a block of memory respectively.

The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.

Use `drflac_open_with_metadata()` if you need access to metadata.


See Also
--------
drflac_open_file()
drflac_open_memory()
drflac_open_with_metadata()
drflac_close()
*/
DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC stream with relaxed validation of the header block.


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onTell (in)
    The function to call when the current read position needs to be retrieved.

container (in)
    Whether the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead and onSeek.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
The same as drflac_open(), except attempts to open the stream even when a header block is not present.

Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
as that is for internal use only.

Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.

Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
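
One way to apply the abort advice above is to give the read callback a byte budget, as in the sketch below. The `my_limited_stream` type, its
`scanLimit` field and the `my_on_read_limited` name are all hypothetical. The function has the same shape as `drflac_read_proc`.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical user data: a memory stream plus a cap on how many bytes may be handed out.
typedef struct
{
    const unsigned char* pData;
    size_t dataSize;
    size_t cursor;
    size_t bytesScanned;    // Total bytes handed out so far.
    size_t scanLimit;       // Give up once this many bytes have been handed out.
} my_limited_stream;

static size_t my_on_read_limited(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    my_limited_stream* pStream = (my_limited_stream*)pUserData;
    size_t bytesRemaining;
    size_t budgetRemaining;

    // Budget exhausted: returning 0 is treated as the end of the stream.
    if (pStream->bytesScanned >= pStream->scanLimit) {
        return 0;
    }

    bytesRemaining  = pStream->dataSize - pStream->cursor;
    budgetRemaining = pStream->scanLimit - pStream->bytesScanned;

    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }
    if (bytesToRead > budgetRemaining) {
        bytesToRead = budgetRemaining;
    }

    if (bytesToRead > 0) {
        memcpy(pBufferOut, pStream->pData + pStream->cursor, bytesToRead);
        pStream->cursor       += bytesToRead;
        pStream->bytesScanned += bytesToRead;
    }

    return bytesToRead;
}
```

If the open call succeeds, the budget would need to be raised or disabled before decoding, otherwise playback would stop at the limit.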
*/
DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).


Parameters
----------
onRead (in)
    The function to call when data needs to be read from the client.

onSeek (in)
    The function to call when the read position of the client data needs to move.

onTell (in)
    The function to call when the current read position needs to be retrieved.

onMeta (in)
    The function to call for every metadata block.

pUserData (in, optional)
    A pointer to application defined data that will be passed to onRead, onSeek and onMeta.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.


Return Value
------------
A pointer to an object representing the decoder.


Remarks
-------
Close the decoder with `drflac_close()`.

`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.

This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
metadata block except for STREAMINFO and PADDING blocks.

The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate between
the different metadata types.

The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.

Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
returned depending on whether or not the stream is being opened with metadata.


See Also
--------
drflac_open_file_with_metadata()
drflac_open_memory_with_metadata()
drflac_open()
drflac_close()
*/
DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.

See Also
--------
drflac_open_with_metadata()
drflac_open_relaxed()
*/
DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);

/*
Closes the given FLAC decoder.


Parameters
----------
pFlac (in)
    The decoder to close.


Remarks
-------
This will destroy the decoder object.


See Also
--------
drflac_open()
drflac_open_with_metadata()
drflac_open_file()
drflac_open_file_w()
drflac_open_file_with_metadata()
drflac_open_file_with_metadata_w()
drflac_open_memory()
drflac_open_memory_with_metadata()
*/
DRFLAC_API void drflac_close(drflac* pFlac);


/*
Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.


Parameters
----------
pFlac (in)
    The decoder.

framesToRead (in)
    The number of PCM frames to read.

pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples.


Return Value
------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.


Remarks
-------
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
*/
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);


/*
Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.


Parameters
----------
pFlac (in)
    The decoder.

framesToRead (in)
    The number of PCM frames to read.

pBufferOut (out, optional)
    A pointer to the buffer that will receive the decoded samples.


Return Value
------------
Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.


Remarks
-------
pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.

Note that this is lossy for streams where the bits per sample is larger than 16.
*/
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
923 A pointer to the buffer that will receive the decoded samples.
924
925
926Return Value
927------------
928Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
929
930
931Remarks
932-------
933pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
934
935Note that this is lossy for streams where the bits per sample is larger than 16.
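
As a rough illustration of why narrowing is lossy, the sketch below keeps only the 16 most significant bits of a 32-bit sample. It assumes samples are
scaled to the full 32-bit range. `example_s32_to_s16` is a hypothetical helper for this note only and is not necessarily how dr_flac converts internally.

```c
#include <stdint.h>

// Hypothetical helper: keeps the 16 most significant bits of a 32-bit sample.
// Dividing by 65536 truncates toward zero and avoids right-shifting a negative value.
static int16_t example_s32_to_s16(int32_t sample)
{
    return (int16_t)(sample / 65536);
}
```

Two different 32-bit inputs such as 0x12345678 and 0x12340000 both map to 0x1234, so the low 16 bits cannot be recovered afterwards.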
936*/
937DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
938
939/*
940Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
941
942
943Parameters
944----------
945pFlac (in)
946 The decoder.
947
948framesToRead (in)
949 The number of PCM frames to read.
950
951pBufferOut (out, optional)
952 A pointer to the buffer that will receive the decoded samples.
953
954
955Return Value
956------------
957Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
958
959
960Remarks
961-------
962pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
963
964Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
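
The sketch below shows one way this loss can appear. It assumes a full-range 32-bit sample is mapped to [-1, 1) by dividing by 2^31. A 32-bit float
has a 24-bit significand, so neighbouring integer samples can collapse to the same float. `example_s32_to_f32` is a hypothetical helper used only here.

```c
#include <stdint.h>

// Hypothetical helper: maps a full-range signed 32-bit sample to the range [-1, 1).
static float example_s32_to_f32(int32_t sample)
{
    return (float)(sample / 2147483648.0);
}
```

With this mapping, the samples 1073741824 and 1073741825 both convert to 0.5f, so the original integer cannot always be recovered.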
965*/
966DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
967
968/*
969Seeks to the PCM frame at the given index.
970
971
972Parameters
973----------
974pFlac (in)
975 The decoder.
976
977pcmFrameIndex (in)
    The index of the PCM frame to seek to.
979
980
981Return Value
------------
983`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
984*/
985DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
986
987
988
989#ifndef DR_FLAC_NO_STDIO
990/*
991Opens a FLAC decoder from the file at the given path.
992
993
994Parameters
995----------
996pFileName (in)
997 The path of the file to open, either absolute or relative to the current directory.
998
999pAllocationCallbacks (in, optional)
1000 A pointer to application defined callbacks for managing memory allocations.
1001
1002
1003Return Value
1004------------
1005A pointer to an object representing the decoder.
1006
1007
Remarks
-------
Close the decoder with drflac_close().

This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
at any given time, so keep this in mind if you have many decoders open at the same time.
1017
1018
1019See Also
1020--------
1021drflac_open_file_with_metadata()
1022drflac_open()
1023drflac_close()
1024*/
1025DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1026DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1027
1028/*
1029Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
1030
1031
1032Parameters
1033----------
1034pFileName (in)
1035 The path of the file to open, either absolute or relative to the current directory.
1036
onMeta (in)
    The callback to fire for each metadata block.

pUserData (in)
    A pointer to the user data to pass to the metadata callback.

pAllocationCallbacks (in, optional)
    A pointer to application defined callbacks for managing memory allocations.
1048
1049
1050Remarks
1051-------
1052Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1053
1054
1055See Also
1056--------
1057drflac_open_with_metadata()
1058drflac_open()
1059drflac_close()
1060*/
1061DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1062DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1063#endif
1064
1065/*
1066Opens a FLAC decoder from a pre-allocated block of memory
1067
1068
1069Parameters
1070----------
1071pData (in)
1072 A pointer to the raw encoded FLAC data.
1073
1074dataSize (in)
    The size in bytes of `pData`.
1076
1077pAllocationCallbacks (in)
1078 A pointer to application defined callbacks for managing memory allocations.
1079
1080
1081Return Value
1082------------
1083A pointer to an object representing the decoder.
1084
1085
1086Remarks
1087-------
1088This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1089
1090
1091See Also
1092--------
1093drflac_open()
1094drflac_close()
1095*/
1096DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1097
1098/*
1099Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
1100
1101
1102Parameters
1103----------
1104pData (in)
1105 A pointer to the raw encoded FLAC data.
1106
1107dataSize (in)
    The size in bytes of `pData`.
1109
1110onMeta (in)
1111 The callback to fire for each metadata block.
1112
1113pUserData (in)
1114 A pointer to the user data to pass to the metadata callback.
1115
1116pAllocationCallbacks (in)
1117 A pointer to application defined callbacks for managing memory allocations.
1118
1119
1120Remarks
1121-------
1122Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1123
1124
1125See Also
--------
1127drflac_open_with_metadata()
1128drflac_open()
1129drflac_close()
1130*/
1131DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1132
1133
1134
1135/* High Level APIs */
1136
1137/*
1138Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1139pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1140
1141You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1142case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1143
1144Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1145read samples into a dynamically sized buffer on the heap until no samples are left.
1146
1147Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
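
The dynamically sized buffer described above could be grown along the following lines. This is only a sketch: the doubling strategy, the starting
capacity, and every `example_` name are assumptions made for illustration and are not taken from dr_flac's implementation.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical chunk reader: fills pOut with up to `count` samples and returns how many were written.
typedef size_t (*example_read_proc)(void* pUserData, int32_t* pOut, size_t count);

// Reads until the source is exhausted, doubling the heap buffer whenever it fills up.
// Returns NULL on allocation failure. The caller releases the result with free().
static int32_t* example_read_all(example_read_proc onRead, void* pUserData, size_t* pTotalOut)
{
    size_t capacity = 4096;
    size_t total = 0;
    int32_t* pSamples = (int32_t*)malloc(capacity * sizeof(int32_t));
    if (pSamples == NULL) {
        return NULL;
    }

    for (;;) {
        size_t got = onRead(pUserData, pSamples + total, capacity - total);
        if (got == 0) {
            break;  // No samples are left.
        }
        total += got;

        if (total == capacity) {
            int32_t* pNew;
            if (capacity > SIZE_MAX / (2 * sizeof(int32_t))) {
                free(pSamples);  // Doubling would overflow size_t.
                return NULL;
            }
            capacity *= 2;
            pNew = (int32_t*)realloc(pSamples, capacity * sizeof(int32_t));
            if (pNew == NULL) {
                free(pSamples);
                return NULL;
            }
            pSamples = pNew;
        }
    }

    *pTotalOut = total;
    return pSamples;
}

// Demonstration source: produces `remaining` samples with increasing values.
typedef struct
{
    size_t remaining;
    int32_t next;
} example_counter_source;

static size_t example_counter_read(void* pUserData, int32_t* pOut, size_t count)
{
    example_counter_source* pSrc = (example_counter_source*)pUserData;
    size_t i;
    if (count > pSrc->remaining) {
        count = pSrc->remaining;
    }
    for (i = 0; i < count; i += 1) {
        pOut[i] = pSrc->next++;
    }
    pSrc->remaining -= count;
    return count;
}
```

A source that never reports the end, such as a broadcast stream, would make a loop like this grow without bound.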
1148*/
1149DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1150
1151/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1152DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1153
1154/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1155DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1156
1157#ifndef DR_FLAC_NO_STDIO
1158/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1159DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1160
1161/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1162DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1163
1164/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1165DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1166#endif
1167
1168/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1169DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1170
1171/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1172DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1173
1174/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1175DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1176
1177/*
1178Frees memory that was allocated internally by dr_flac.
1179
1180Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1181*/
1182DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1183
1184
1185/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1186typedef struct
1187{
1188 drflac_uint32 countRemaining;
1189 const char* pRunningData;
1190} drflac_vorbis_comment_iterator;
1191
1192/*
1193Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1194metadata block.
1195*/
1196DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1197
1198/*
1199Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1200returned string is NOT null terminated.
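
Because the returned string is not null terminated, callers need to use the reported length. The sketch below walks a buffer of comments, assuming each
one is stored as a 32-bit little-endian length followed by that many bytes. All `example_` names are hypothetical and the sketch performs no bounds
checking, so it trusts the lengths found in the data.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical iterator over length-prefixed comments. This is not dr_flac's own type.
typedef struct
{
    uint32_t countRemaining;
    const unsigned char* pRunningData;
} example_comment_iterator;

// Reads a 32-bit little-endian value one byte at a time, so alignment and host byte order do not matter.
static uint32_t example_read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns a pointer to the next comment and writes its length, or returns NULL when none remain.
static const char* example_next_comment(example_comment_iterator* pIter, uint32_t* pLengthOut)
{
    uint32_t length;
    const char* pComment;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length   = example_read_le32(pIter->pRunningData);
    pComment = (const char*)pIter->pRunningData + 4;

    // Step over the length field and the comment text.
    pIter->pRunningData   += 4 + length;
    pIter->countRemaining -= 1;

    if (pLengthOut != NULL) {
        *pLengthOut = length;
    }
    return pComment;
}
```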
1201*/
1202DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1203
1204
1205/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1206typedef struct
1207{
1208 drflac_uint32 countRemaining;
1209 const char* pRunningData;
1210} drflac_cuesheet_track_iterator;
1211
1212/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
1213typedef struct
1214{
1215 drflac_uint64 offset;
1216 drflac_uint8 index;
1217 drflac_uint8 reserved[3];
1218} drflac_cuesheet_track_index;
1219
1220typedef struct
1221{
1222 drflac_uint64 offset;
1223 drflac_uint8 trackNumber;
1224 char ISRC[12];
1225 drflac_bool8 isAudio;
1226 drflac_bool8 preEmphasis;
1227 drflac_uint8 indexCount;
1228 const drflac_cuesheet_track_index* pIndexPoints;
1229} drflac_cuesheet_track;
1230
1231/*
1232Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1233block.
1234*/
1235DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1236
/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more tracks. */
1238DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1239
1240
1241#ifdef __cplusplus
1242}
1243#endif
1244#endif /* dr_flac_h */
1245
1246
1247/************************************************************************************************************************************************************
1248 ************************************************************************************************************************************************************
1249
1250 IMPLEMENTATION
1251
1252 ************************************************************************************************************************************************************
1253 ************************************************************************************************************************************************************/
1254#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1255#ifndef dr_flac_c
1256#define dr_flac_c
1257
1258/* Disable some annoying warnings. */
1259#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1260 #pragma GCC diagnostic push
1261 #if __GNUC__ >= 7
1262 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1263 #endif
1264#endif
1265
1266#ifdef __linux__
1267 #ifndef _BSD_SOURCE
1268 #define _BSD_SOURCE
1269 #endif
1270 #ifndef _DEFAULT_SOURCE
1271 #define _DEFAULT_SOURCE
1272 #endif
1273 #ifndef __USE_BSD
1274 #define __USE_BSD
1275 #endif
1276 #include <endian.h>
1277#endif
1278
1279#include <stdlib.h>
1280#include <string.h>
1281
1282/* Inline */
1283#ifdef _MSC_VER
1284 #define DRFLAC_INLINE __forceinline
1285#elif defined(__GNUC__)
1286 /*
1287 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1288 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1289 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1290 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1291 I am using "__inline__" only when we're compiling in strict ANSI mode.
1292 */
1293 #if defined(__STRICT_ANSI__)
1294 #define DRFLAC_GNUC_INLINE_HINT __inline__
1295 #else
1296 #define DRFLAC_GNUC_INLINE_HINT inline
1297 #endif
1298
1299 #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
1300 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
1301 #else
1302 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
1303 #endif
1304#elif defined(__WATCOMC__)
1305 #define DRFLAC_INLINE __inline
1306#else
1307 #define DRFLAC_INLINE
1308#endif
1309/* End Inline */
1310
1311/*
1312Intrinsics Support
1313
1314There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1315
1316 "error: shift must be an immediate"
1317
Unfortunately dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1319*/
1320#if !defined(DR_FLAC_NO_SIMD)
1321 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1322 #if defined(_MSC_VER) && !defined(__clang__)
1323 /* MSVC. */
1324 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1325 #define DRFLAC_SUPPORT_SSE2
1326 #endif
1327 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1328 #define DRFLAC_SUPPORT_SSE41
1329 #endif
1330 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1331 /* Assume GNUC-style. */
1332 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1333 #define DRFLAC_SUPPORT_SSE2
1334 #endif
1335 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1336 #define DRFLAC_SUPPORT_SSE41
1337 #endif
1338 #endif
1339
1340 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1341 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1342 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1343 #define DRFLAC_SUPPORT_SSE2
1344 #endif
1345 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1346 #define DRFLAC_SUPPORT_SSE41
1347 #endif
1348 #endif
1349
1350 #if defined(DRFLAC_SUPPORT_SSE41)
1351 #include <smmintrin.h>
1352 #elif defined(DRFLAC_SUPPORT_SSE2)
1353 #include <emmintrin.h>
1354 #endif
1355 #endif
1356
1357 #if defined(DRFLAC_ARM)
1358 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1359 #define DRFLAC_SUPPORT_NEON
1360 #include <arm_neon.h>
1361 #endif
1362 #endif
1363#endif
1364
1365/* Compile-time CPU feature support. */
1366#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1367 #if defined(_MSC_VER) && !defined(__clang__)
1368 #if _MSC_VER >= 1400
1369 #include <intrin.h>
1370 static void drflac__cpuid(int info[4], int fid)
1371 {
1372 __cpuid(info, fid);
1373 }
1374 #else
1375 #define DRFLAC_NO_CPUID
1376 #endif
1377 #else
1378 #if defined(__GNUC__) || defined(__clang__)
1379 static void drflac__cpuid(int info[4], int fid)
1380 {
1381 /*
1382 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1383 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1384 supporting different assembly dialects.
1385
1386 What's basically happening is that we're saving and restoring the ebx register manually.
1387 */
1388 #if defined(DRFLAC_X86) && defined(__PIC__)
1389 __asm__ __volatile__ (
1390 "xchg{l} {%%}ebx, %k1;"
1391 "cpuid;"
1392 "xchg{l} {%%}ebx, %k1;"
1393 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1394 );
1395 #else
1396 __asm__ __volatile__ (
1397 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1398 );
1399 #endif
1400 }
1401 #else
1402 #define DRFLAC_NO_CPUID
1403 #endif
1404 #endif
1405#else
1406 #define DRFLAC_NO_CPUID
1407#endif
1408
1409static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1410{
1411#if defined(DRFLAC_SUPPORT_SSE2)
1412 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1413 #if defined(DRFLAC_X64)
1414 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1415 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1416 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1417 #else
1418 #if defined(DRFLAC_NO_CPUID)
1419 return DRFLAC_FALSE;
1420 #else
1421 int info[4];
1422 drflac__cpuid(info, 1);
1423 return (info[3] & (1 << 26)) != 0;
1424 #endif
1425 #endif
1426 #else
1427 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1428 #endif
1429#else
1430 return DRFLAC_FALSE; /* No compiler support. */
1431#endif
1432}
1433
1434static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1435{
1436#if defined(DRFLAC_SUPPORT_SSE41)
1437 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
1438 #if defined(__SSE4_1__) || defined(__AVX__)
1439 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
1440 #else
1441 #if defined(DRFLAC_NO_CPUID)
1442 return DRFLAC_FALSE;
1443 #else
1444 int info[4];
1445 drflac__cpuid(info, 1);
1446 return (info[2] & (1 << 19)) != 0;
1447 #endif
1448 #endif
1449 #else
1450 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1451 #endif
1452#else
1453 return DRFLAC_FALSE; /* No compiler support. */
1454#endif
1455}
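
/*
The runtime checks above test single bits of the registers returned by CPUID leaf 1: SSE2 is EDX bit 26 and SSE4.1 is ECX bit 19. The sketch below
isolates that bit test so it can be tried with hand-made register values. The `example_` names are hypothetical and nothing here executes CPUID.

```c
#include <stdint.h>

// Hypothetical helper: tests a single feature bit in a register value returned by CPUID.
static int example_cpuid_bit(uint32_t reg, unsigned int bit)
{
    return (int)((reg >> bit) & 1u);
}

// For CPUID leaf 1, SSE2 is reported in EDX bit 26.
static int example_has_sse2(uint32_t edx)
{
    return example_cpuid_bit(edx, 26);
}

// For CPUID leaf 1, SSE4.1 is reported in ECX bit 19.
static int example_has_sse41(uint32_t ecx)
{
    return example_cpuid_bit(ecx, 19);
}
```
*/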
1456
1457
1458#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1459 #define DRFLAC_HAS_LZCNT_INTRINSIC
1460#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1461 #define DRFLAC_HAS_LZCNT_INTRINSIC
1462#elif defined(__clang__)
1463 #if defined(__has_builtin)
1464 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1465 #define DRFLAC_HAS_LZCNT_INTRINSIC
1466 #endif
1467 #endif
1468#endif
1469
1470#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1471 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1472 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1473 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1474#elif defined(__clang__)
1475 #if defined(__has_builtin)
1476 #if __has_builtin(__builtin_bswap16)
1477 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1478 #endif
1479 #if __has_builtin(__builtin_bswap32)
1480 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1481 #endif
1482 #if __has_builtin(__builtin_bswap64)
1483 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1484 #endif
1485 #endif
1486#elif defined(__GNUC__)
1487 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1488 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1489 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1490 #endif
1491 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1492 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1493 #endif
1494#elif defined(__WATCOMC__) && defined(__386__)
1495 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1496 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1497 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1498 extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1499 extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1500 extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1501#pragma aux _watcom_bswap16 = \
1502 "xchg al, ah" \
1503 parm [ax] \
1504 value [ax] \
1505 modify nomemory;
1506#pragma aux _watcom_bswap32 = \
1507 "bswap eax" \
1508 parm [eax] \
1509 value [eax] \
1510 modify nomemory;
1511#pragma aux _watcom_bswap64 = \
1512 "bswap eax" \
1513 "bswap edx" \
1514 "xchg eax,edx" \
1515 parm [eax edx] \
1516 value [eax edx] \
1517 modify nomemory;
1518#endif
1519
1520
1521/* Standard library stuff. */
1522#ifndef DRFLAC_ASSERT
1523#include <assert.h>
1524#define DRFLAC_ASSERT(expression) assert(expression)
1525#endif
1526#ifndef DRFLAC_MALLOC
1527#define DRFLAC_MALLOC(sz) malloc((sz))
1528#endif
1529#ifndef DRFLAC_REALLOC
1530#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1531#endif
1532#ifndef DRFLAC_FREE
1533#define DRFLAC_FREE(p) free((p))
1534#endif
1535#ifndef DRFLAC_COPY_MEMORY
1536#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1537#endif
1538#ifndef DRFLAC_ZERO_MEMORY
1539#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1540#endif
1541#ifndef DRFLAC_ZERO_OBJECT
1542#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1543#endif
1544
1545#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1546
1547/* Result Codes */
1548typedef drflac_int32 drflac_result;
1549#define DRFLAC_SUCCESS 0
1550#define DRFLAC_ERROR -1 /* A generic error. */
1551#define DRFLAC_INVALID_ARGS -2
1552#define DRFLAC_INVALID_OPERATION -3
1553#define DRFLAC_OUT_OF_MEMORY -4
1554#define DRFLAC_OUT_OF_RANGE -5
1555#define DRFLAC_ACCESS_DENIED -6
1556#define DRFLAC_DOES_NOT_EXIST -7
1557#define DRFLAC_ALREADY_EXISTS -8
1558#define DRFLAC_TOO_MANY_OPEN_FILES -9
1559#define DRFLAC_INVALID_FILE -10
1560#define DRFLAC_TOO_BIG -11
1561#define DRFLAC_PATH_TOO_LONG -12
1562#define DRFLAC_NAME_TOO_LONG -13
1563#define DRFLAC_NOT_DIRECTORY -14
1564#define DRFLAC_IS_DIRECTORY -15
1565#define DRFLAC_DIRECTORY_NOT_EMPTY -16
1566#define DRFLAC_END_OF_FILE -17
1567#define DRFLAC_NO_SPACE -18
1568#define DRFLAC_BUSY -19
1569#define DRFLAC_IO_ERROR -20
1570#define DRFLAC_INTERRUPT -21
1571#define DRFLAC_UNAVAILABLE -22
1572#define DRFLAC_ALREADY_IN_USE -23
1573#define DRFLAC_BAD_ADDRESS -24
1574#define DRFLAC_BAD_SEEK -25
1575#define DRFLAC_BAD_PIPE -26
1576#define DRFLAC_DEADLOCK -27
1577#define DRFLAC_TOO_MANY_LINKS -28
1578#define DRFLAC_NOT_IMPLEMENTED -29
1579#define DRFLAC_NO_MESSAGE -30
1580#define DRFLAC_BAD_MESSAGE -31
1581#define DRFLAC_NO_DATA_AVAILABLE -32
1582#define DRFLAC_INVALID_DATA -33
1583#define DRFLAC_TIMEOUT -34
1584#define DRFLAC_NO_NETWORK -35
1585#define DRFLAC_NOT_UNIQUE -36
1586#define DRFLAC_NOT_SOCKET -37
1587#define DRFLAC_NO_ADDRESS -38
1588#define DRFLAC_BAD_PROTOCOL -39
1589#define DRFLAC_PROTOCOL_UNAVAILABLE -40
1590#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1591#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1592#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1593#define DRFLAC_SOCKET_NOT_SUPPORTED -44
1594#define DRFLAC_CONNECTION_RESET -45
1595#define DRFLAC_ALREADY_CONNECTED -46
1596#define DRFLAC_NOT_CONNECTED -47
1597#define DRFLAC_CONNECTION_REFUSED -48
1598#define DRFLAC_NO_HOST -49
1599#define DRFLAC_IN_PROGRESS -50
1600#define DRFLAC_CANCELLED -51
1601#define DRFLAC_MEMORY_ALREADY_MAPPED -52
1602#define DRFLAC_AT_END -53
1603
1604#define DRFLAC_CRC_MISMATCH -100
1605/* End Result Codes */
1606
1607
1608#define DRFLAC_SUBFRAME_CONSTANT 0
1609#define DRFLAC_SUBFRAME_VERBATIM 1
1610#define DRFLAC_SUBFRAME_FIXED 8
1611#define DRFLAC_SUBFRAME_LPC 32
1612#define DRFLAC_SUBFRAME_RESERVED 255
1613
1614#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1615#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1616
1617#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1618#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1619#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1620#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1621
1622#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES 18
1623#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES 36
1624#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES 12
1625
1626#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
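
/*
drflac_align(x, a) rounds `x` up to the next multiple of `a`. The sketch below repeats the same expression under a hypothetical name so it can be
compiled on its own. `EXAMPLE_ALIGN` and `example_aligned_size` exist only for this illustration.

```c
#include <stddef.h>

// Same expression as drflac_align, under a hypothetical name so the sketch stands alone.
#define EXAMPLE_ALIGN(x, a) ((((x) + (a) - 1) / (a)) * (a))

// Rounds a byte count up to the next multiple of `alignment`, which must be non-zero.
static size_t example_aligned_size(size_t sizeInBytes, size_t alignment)
{
    return EXAMPLE_ALIGN(sizeInBytes, alignment);
}
```

Values that are already a multiple of the alignment are returned unchanged.
*/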
1627
1628
1629DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1630{
1631 if (pMajor) {
1632 *pMajor = DRFLAC_VERSION_MAJOR;
1633 }
1634
1635 if (pMinor) {
1636 *pMinor = DRFLAC_VERSION_MINOR;
1637 }
1638
1639 if (pRevision) {
1640 *pRevision = DRFLAC_VERSION_REVISION;
1641 }
1642}
1643
1644DRFLAC_API const char* drflac_version_string(void)
1645{
1646 return DRFLAC_VERSION_STRING;
1647}
1648
1649
1650/* CPU caps. */
1651#if defined(__has_feature)
1652 #if __has_feature(thread_sanitizer)
1653 #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1654 #else
1655 #define DRFLAC_NO_THREAD_SANITIZE
1656 #endif
1657#else
1658 #define DRFLAC_NO_THREAD_SANITIZE
1659#endif
1660
1661#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1662static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1663#endif
1664
1665#ifndef DRFLAC_NO_CPUID
1666static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1667static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1668
1669/*
1670I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1671actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1672complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1673just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1674*/
1675DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1676{
1677 static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1678
1679 if (!isCPUCapsInitialized) {
1680 /* LZCNT */
1681#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1682 int info[4] = {0};
1683 drflac__cpuid(info, 0x80000001);
1684 drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
1685#endif
1686
1687 /* SSE2 */
1688 drflac__gIsSSE2Supported = drflac_has_sse2();
1689
1690 /* SSE4.1 */
1691 drflac__gIsSSE41Supported = drflac_has_sse41();
1692
1693 /* Initialized. */
1694 isCPUCapsInitialized = DRFLAC_TRUE;
1695 }
1696}
1697#else
1698static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1699
1700static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1701{
1702#if defined(DRFLAC_SUPPORT_NEON)
1703 #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1704 #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1705 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
1706 #else
1707 /* TODO: Runtime check. */
1708 return DRFLAC_FALSE;
1709 #endif
1710 #else
1711 return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
1712 #endif
1713#else
1714 return DRFLAC_FALSE; /* No compiler support. */
1715#endif
1716}
1717
1718DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1719{
1720 drflac__gIsNEONSupported = drflac__has_neon();
1721
1722#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1723 drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1724#endif
1725}
1726#endif
1727
1728
1729/* Endian Management */
1730static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1731{
1732#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
1733 return DRFLAC_TRUE;
1734#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1735 return DRFLAC_TRUE;
1736#else
1737 int n = 1;
1738 return (*(char*)&n) == 1;
1739#endif
1740}
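
/*
Host byte order matters because FLAC stores multi-byte fields in big-endian order. The sketch below shows a runtime probe similar in spirit to the
fallback above, plus a byte-by-byte big-endian read that gives the same result on any host. The `example_` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Returns 1 on a little-endian host and 0 otherwise, by inspecting the first byte of a known value.
static int example_is_little_endian(void)
{
    uint32_t n = 1;
    unsigned char firstByte;
    memcpy(&firstByte, &n, 1);
    return firstByte == 1;
}

// Decodes a 32-bit big-endian field one byte at a time, independent of host byte order.
static uint32_t example_read_be32(const unsigned char* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```
*/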
1741
1742static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1743{
1744#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1745 #if defined(_MSC_VER) && !defined(__clang__)
1746 return _byteswap_ushort(n);
1747 #elif defined(__GNUC__) || defined(__clang__)
1748 return __builtin_bswap16(n);
1749 #elif defined(__WATCOMC__) && defined(__386__)
1750 return _watcom_bswap16(n);
1751 #else
1752 #error "This compiler does not support the byte swap intrinsic."
1753 #endif
1754#else
1755 return ((n & 0xFF00) >> 8) |
1756 ((n & 0x00FF) << 8);
1757#endif
1758}
1759
1760static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1761{
1762#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1763 #if defined(_MSC_VER) && !defined(__clang__)
1764 return _byteswap_ulong(n);
1765 #elif defined(__GNUC__) || defined(__clang__)
1766 #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
1767 /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
1768 drflac_uint32 r;
1769 __asm__ __volatile__ (
1770 #if defined(DRFLAC_64BIT)
1771 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
1772 #else
1773 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1774 #endif
1775 );
1776 return r;
1777 #else
1778 return __builtin_bswap32(n);
1779 #endif
1780 #elif defined(__WATCOMC__) && defined(__386__)
1781 return _watcom_bswap32(n);
1782 #else
1783 #error "This compiler does not support the byte swap intrinsic."
1784 #endif
1785#else
1786 return ((n & 0xFF000000) >> 24) |
1787 ((n & 0x00FF0000) >> 8) |
1788 ((n & 0x0000FF00) << 8) |
1789 ((n & 0x000000FF) << 24);
1790#endif
1791}
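
/*
Example (not part of dr_flac): a minimal sketch of the portable fallback used above when no byte swap intrinsic is
available. The name example_bswap32 is hypothetical and the sketch assumes a C99 compiler with <stdint.h>.

```c
#include <stdint.h>

// Reverses the order of the four bytes in n using only masks and shifts.
static uint32_t example_bswap32(uint32_t n)
{
    return ((n & 0xFF000000u) >> 24) |   // byte 3 moves to byte 0
           ((n & 0x00FF0000u) >>  8) |   // byte 2 moves to byte 1
           ((n & 0x0000FF00u) <<  8) |   // byte 1 moves to byte 2
           ((n & 0x000000FFu) << 24);    // byte 0 moves to byte 3
}
```

Swapping twice returns the original value, which is a cheap sanity check for any byte swap routine.
*/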
1792
1793static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1794{
1795#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1796 #if defined(_MSC_VER) && !defined(__clang__)
1797 return _byteswap_uint64(n);
1798 #elif defined(__GNUC__) || defined(__clang__)
1799 return __builtin_bswap64(n);
1800 #elif defined(__WATCOMC__) && defined(__386__)
1801 return _watcom_bswap64(n);
1802 #else
1803 #error "This compiler does not support the byte swap intrinsic."
1804 #endif
1805#else
1806 /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
1807 return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
1808 ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
1809 ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
1810 ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
1811 ((n & ((drflac_uint64)0xFF000000 )) << 8) |
1812 ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
1813 ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
1814 ((n & ((drflac_uint64)0x000000FF )) << 56);
1815#endif
1816}
1817
1818
1819static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1820{
1821 if (drflac__is_little_endian()) {
1822 return drflac__swap_endian_uint16(n);
1823 }
1824
1825 return n;
1826}
1827
1828static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1829{
1830 if (drflac__is_little_endian()) {
1831 return drflac__swap_endian_uint32(n);
1832 }
1833
1834 return n;
1835}
1836
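static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
{
    /* Each byte is cast before shifting. A drflac_uint8 promotes to int, and shifting a value >= 0x80 left by 24 overflows a signed int. */
    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    return ((drflac_uint32)pNum[0] << 24) | ((drflac_uint32)pNum[1] << 16) | ((drflac_uint32)pNum[2] << 8) | (drflac_uint32)pNum[3];
}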
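
/*
Example (not part of dr_flac): a minimal sketch of assembling a 32-bit value from four bytes at an arbitrary address.
Because the bytes are combined arithmetically, the result does not depend on host endianness or on pointer alignment.
The names example_read_be32 and example_read_le32 are hypothetical.

```c
#include <stdint.h>

// Interprets four bytes as a big-endian (most significant byte first) value.
static uint32_t example_read_be32(const void* pData)
{
    const uint8_t* p = (const uint8_t*)pData;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Interprets the same four bytes as a little-endian (least significant byte first) value.
static uint32_t example_read_le32(const void* pData)
{
    const uint8_t* p = (const uint8_t*)pData;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For instance, passing the ASCII bytes of "fLaC" to example_read_be32() yields 0x664C6143 on any host.
*/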
1842
1843static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1844{
1845 if (drflac__is_little_endian()) {
1846 return drflac__swap_endian_uint64(n);
1847 }
1848
1849 return n;
1850}
1851
1852
1853static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1854{
1855 if (!drflac__is_little_endian()) {
1856 return drflac__swap_endian_uint32(n);
1857 }
1858
1859 return n;
1860}
1861
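static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
{
    /* Each byte is cast before shifting. A drflac_uint8 promotes to int, and shifting a value >= 0x80 left by 24 overflows a signed int. */
    const drflac_uint8* pNum = (const drflac_uint8*)pData;
    return (drflac_uint32)pNum[0] | ((drflac_uint32)pNum[1] << 8) | ((drflac_uint32)pNum[2] << 16) | ((drflac_uint32)pNum[3] << 24);
}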
1867
1868
static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
{
    drflac_uint32 result = 0;
    result |= (n & 0x7F000000) >> 3;
    result |= (n & 0x007F0000) >> 2;
    result |= (n & 0x00007F00) >> 1;
    result |= (n & 0x0000007F) >> 0;

    return result;
}
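
/*
Example (not part of dr_flac): a minimal sketch of the "unsynchsafe" conversion above. It assumes the input is a
32-bit value in which each byte carries only 7 payload bits (the top bit of every byte is zero), which is the
encoding ID3v2 uses for tag sizes. The name example_unsynchsafe_32 is hypothetical.

```c
#include <stdint.h>

// Packs the four 7-bit groups of n into one contiguous 28-bit value.
static uint32_t example_unsynchsafe_32(uint32_t n)
{
    return ((n & 0x7F000000u) >> 3) |   // bits 24..30 move down to 21..27
           ((n & 0x007F0000u) >> 2) |   // bits 16..22 move down to 14..20
           ((n & 0x00007F00u) >> 1) |   // bits  8..14 move down to  7..13
            (n & 0x0000007Fu);          // bits  0..6 stay in place
}
```

With the input 0x00000201 the two low groups are 2 and 1, so the decoded value is 2*128 + 1 = 257.
*/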
1879
1880
1881
1882/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
1883static drflac_uint8 drflac__crc8_table[] = {
1884 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
1885 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
1886 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
1887 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
1888 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
1889 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
1890 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
1891 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
1892 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
1893 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
1894 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
1895 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
1896 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
1897 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
1898 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
1899 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
1900};
1901
1902static drflac_uint16 drflac__crc16_table[] = {
1903 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
1904 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
1905 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
1906 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
1907 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
1908 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
1909 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
1910 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
1911 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
1912 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
1913 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
1914 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
1915 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
1916 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
1917 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
1918 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
1919 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
1920 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
1921 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
1922 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
1923 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
1924 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
1925 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
1926 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
1927 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
1928 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
1929 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
1930 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
1931 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
1932 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
1933 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
1934 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
1935};
1936
1937static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
1938{
1939 return drflac__crc8_table[crc ^ data];
1940}
1941
1942static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
1943{
1944#ifdef DR_FLAC_NO_CRC
1945 (void)crc;
1946 (void)data;
1947 (void)count;
1948 return 0;
1949#else
1950#if 0
1951 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
1952 drflac_uint8 p = 0x07;
1953 for (int i = count-1; i >= 0; --i) {
1954 drflac_uint8 bit = (data & (1 << i)) >> i;
1955 if (crc & 0x80) {
1956 crc = ((crc << 1) | bit) ^ p;
1957 } else {
1958 crc = ((crc << 1) | bit);
1959 }
1960 }
1961 return crc;
1962#else
1963 drflac_uint32 wholeBytes;
1964 drflac_uint32 leftoverBits;
1965 drflac_uint64 leftoverDataMask;
1966
1967 static drflac_uint64 leftoverDataMaskTable[8] = {
1968 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
1969 };
1970
1971 DRFLAC_ASSERT(count <= 32);
1972
1973 wholeBytes = count >> 3;
1974 leftoverBits = count - (wholeBytes*8);
1975 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
1976
1977 switch (wholeBytes) {
1978 case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
1979 case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
1980 case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
1981 case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
1982 case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
1983 }
1984 return crc;
1985#endif
1986#endif
1987}
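
/*
Example (not part of dr_flac): a minimal sketch showing how entries of an 8-bit CRC table can be derived bit by bit
from the polynomial 0x07 (the value used by the reference implementation above), and how a byte-wise CRC-8 is then
computed. The names example_crc8_table_entry and example_crc8 are hypothetical; the sketch recomputes each entry on
the fly instead of using a stored table.

```c
#include <stddef.h>
#include <stdint.h>

// Computes the table entry for one index by processing its 8 bits, most significant bit first.
static uint8_t example_crc8_table_entry(uint8_t index)
{
    uint8_t crc = index;
    int bit;
    for (bit = 0; bit < 8; ++bit) {
        if (crc & 0x80) {
            crc = (uint8_t)((crc << 1) ^ 0x07);   // top bit set: shift and apply the polynomial
        } else {
            crc = (uint8_t)(crc << 1);            // top bit clear: shift only
        }
    }
    return crc;
}

// Byte-wise CRC-8 with an initial value of 0, no bit reflection and no final XOR.
static uint8_t example_crc8(const uint8_t* pData, size_t size)
{
    uint8_t crc = 0;
    size_t i;
    for (i = 0; i < size; ++i) {
        crc = example_crc8_table_entry((uint8_t)(crc ^ pData[i]));
    }
    return crc;
}
```

The entries produced this way can be compared against the stored table, e.g. index 0x01 gives 0x07 and index 0xFF
gives 0xF3.
*/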
1988
1989static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
1990{
1991 return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
1992}
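
/*
Example (not part of dr_flac): a minimal sketch of the same byte-wise update for a 16-bit CRC with the polynomial
0x8005. The table entry is recomputed on the fly so the sketch stays self-contained; the names
example_crc16_table_entry and example_crc16 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// Computes one table entry: the index is placed in the high byte and 8 bits are shifted out.
static uint16_t example_crc16_table_entry(uint8_t index)
{
    uint16_t crc = (uint16_t)(index << 8);
    int bit;
    for (bit = 0; bit < 8; ++bit) {
        if (crc & 0x8000) {
            crc = (uint16_t)((crc << 1) ^ 0x8005);
        } else {
            crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

// Byte-wise CRC-16 with an initial value of 0, no bit reflection and no final XOR.
static uint16_t example_crc16(const uint8_t* pData, size_t size)
{
    uint16_t crc = 0;
    size_t i;
    for (i = 0; i < size; ++i) {
        // Same shape as drflac_crc16_byte(): the high byte of the CRC selects the table entry.
        crc = (uint16_t)((crc << 8) ^ example_crc16_table_entry((uint8_t)((crc >> 8) ^ pData[i])));
    }
    return crc;
}
```

Index 0x01 yields 0x8005 and index 0xFF yields 0x0202, matching the second and last entries of the stored table.
*/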
1993
1994static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
1995{
1996#ifdef DRFLAC_64BIT
1997 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
1998 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
1999 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2000 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2001#endif
2002 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2003 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2004 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2005 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2006
2007 return crc;
2008}
2009
2010static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2011{
2012 switch (byteCount)
2013 {
2014#ifdef DRFLAC_64BIT
2015 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2016 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2017 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2018 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2019#endif
2020 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2021 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2022 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2023 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2024 }
2025
2026 return crc;
2027}
2028
2029#if 0
2030static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2031{
2032#ifdef DR_FLAC_NO_CRC
2033 (void)crc;
2034 (void)data;
2035 (void)count;
2036 return 0;
2037#else
2038#if 0
    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
    drflac_uint16 p = 0x8005;
    for (int i = count-1; i >= 0; --i) {
        drflac_uint16 bit = (data & (1ULL << i)) >> i;
        if (crc & 0x8000) {
            crc = ((crc << 1) | bit) ^ p;
        } else {
            crc = ((crc << 1) | bit);
        }
    }

    return crc;
2051#else
2052 drflac_uint32 wholeBytes;
2053 drflac_uint32 leftoverBits;
2054 drflac_uint64 leftoverDataMask;
2055
2056 static drflac_uint64 leftoverDataMaskTable[8] = {
2057 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2058 };
2059
    DRFLAC_ASSERT(count <= 32);
2061
2062 wholeBytes = count >> 3;
2063 leftoverBits = count & 7;
2064 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2065
2066 switch (wholeBytes) {
2067 default:
2068 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2069 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2070 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2071 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2072 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2073 }
2074 return crc;
2075#endif
2076#endif
2077}
2078
2079static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2080{
2081#ifdef DR_FLAC_NO_CRC
2082 (void)crc;
2083 (void)data;
2084 (void)count;
2085 return 0;
2086#else
2087 drflac_uint32 wholeBytes;
2088 drflac_uint32 leftoverBits;
2089 drflac_uint64 leftoverDataMask;
2090
2091 static drflac_uint64 leftoverDataMaskTable[8] = {
2092 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2093 };
2094
2095 DRFLAC_ASSERT(count <= 64);
2096
2097 wholeBytes = count >> 3;
2098 leftoverBits = count & 7;
2099 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2100
2101 switch (wholeBytes) {
2102 default:
2103 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
2104 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
2105 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
2106 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
2107 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
2108 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
2109 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
2110 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
2111 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2112 }
2113 return crc;
2114#endif
2115}
2116
2117
2118static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2119{
2120#ifdef DRFLAC_64BIT
2121 return drflac_crc16__64bit(crc, data, count);
2122#else
2123 return drflac_crc16__32bit(crc, data, count);
2124#endif
2125}
2126#endif
2127
2128
2129#ifdef DRFLAC_64BIT
2130#define drflac__be2host__cache_line drflac__be2host_64
2131#else
2132#define drflac__be2host__cache_line drflac__be2host_32
2133#endif
2134
2135/*
2136BIT READING ATTEMPT #2
2137
2138This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
2139on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
2140is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
2141array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
2142from onRead() is read into.
2143*/
2144#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2145#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2146#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2147#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2148#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2149#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2150#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2151#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2152#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2153#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2154#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
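
/*
Example (not part of dr_flac): a minimal sketch of what the L1 selection macros above compute, written as plain
functions over a 64-bit cache. It assumes bitCount is in the range 1..63, because shifting a 64-bit value by 64 is
undefined. The names example_selection_mask and example_select_and_shift are hypothetical.

```c
#include <stdint.h>

// Returns a mask with the top bitCount bits set (bitCount must be 1..63).
static uint64_t example_selection_mask(uint32_t bitCount)
{
    return ~((~(uint64_t)0) >> bitCount);
}

// Extracts the top bitCount bits of the cache and moves them down to the least significant end.
static uint64_t example_select_and_shift(uint64_t cache, uint32_t bitCount)
{
    return (cache & example_selection_mask(bitCount)) >> (64 - bitCount);
}
```

With a cache of 0xABCD000000000000, selecting 12 bits gives 0xABC. A reader would then shift the cache left by 12 so
the next unread bit sits on the most significant bit again.
*/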
2155
2156
2157#ifndef DR_FLAC_NO_CRC
2158static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2159{
2160 bs->crc16 = 0;
2161 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2162}
2163
2164static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2165{
2166 if (bs->crc16CacheIgnoredBytes == 0) {
2167 bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2168 } else {
2169 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2170 bs->crc16CacheIgnoredBytes = 0;
2171 }
2172}
2173
2174static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2175{
2176 /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
2177 DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
2178
2179 /*
2180 The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2181 by the number of bits that have been consumed.
2182 */
2183 if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
2184 drflac__update_crc16(bs);
2185 } else {
2186 /* We only accumulate the consumed bits. */
2187 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
2188
2189 /*
2190 The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2191 so we can handle that later.
2192 */
2193 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2194 }
2195
2196 return bs->crc16;
2197}
2198#endif
2199
2200static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2201{
2202 size_t bytesRead;
2203 size_t alignedL1LineCount;
2204
2205 /* Fast path. Try loading straight from L2. */
2206 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2207 bs->cache = bs->cacheL2[bs->nextL2Line++];
2208 return DRFLAC_TRUE;
2209 }
2210
2211 /*
2212 If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2213 any left.
2214 */
2215 if (bs->unalignedByteCount > 0) {
        return DRFLAC_FALSE;   /* If we have any unaligned bytes it means there are no more aligned bytes left in the client. */
2217 }
2218
2219 bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2220
2221 bs->nextL2Line = 0;
2222 if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2223 bs->cache = bs->cacheL2[bs->nextL2Line++];
2224 return DRFLAC_TRUE;
2225 }
2226
2227
2228 /*
2229 If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2230 means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2231 and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2232 the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2233 */
2234 alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2235
2236 /* We need to keep track of any unaligned bytes for later use. */
2237 bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2238 if (bs->unalignedByteCount > 0) {
2239 bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2240 }
2241
2242 if (alignedL1LineCount > 0) {
2243 size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2244 size_t i;
2245 for (i = alignedL1LineCount; i > 0; --i) {
2246 bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
2247 }
2248
2249 bs->nextL2Line = (drflac_uint32)offset;
2250 bs->cache = bs->cacheL2[bs->nextL2Line++];
2251 return DRFLAC_TRUE;
2252 } else {
2253 /* If we get into this branch it means we weren't able to load any L1-aligned data. */
2254 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2255 return DRFLAC_FALSE;
2256 }
2257}
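
/*
Example (not part of dr_flac): a minimal sketch of the partial-fill handling above. When fewer lines than the buffer
holds were read, the valid lines are moved to the end of the buffer so that "next line == line count" still means
"buffer exhausted". The buffer size and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_LINE_COUNT 8

// Moves the first validCount lines to the end of the buffer and returns the index of the first valid line.
static size_t example_move_to_end(uint32_t* pLines, size_t validCount)
{
    size_t offset = EXAMPLE_LINE_COUNT - validCount;
    size_t i;

    // Copy backwards so that overlapping source and destination ranges are handled correctly.
    for (i = validCount; i > 0; --i) {
        pLines[i-1 + offset] = pLines[i-1];
    }

    return offset;
}

// Returns 1 if three valid lines end up in slots 5, 6 and 7 in their original order.
static int example_check_move(void)
{
    uint32_t lines[EXAMPLE_LINE_COUNT] = { 10, 20, 30, 0, 0, 0, 0, 0 };
    size_t next = example_move_to_end(lines, 3);
    return next == 5 && lines[5] == 10 && lines[6] == 20 && lines[7] == 30;
}
```
*/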
2258
2259static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2260{
2261 size_t bytesRead;
2262
2263#ifndef DR_FLAC_NO_CRC
2264 drflac__update_crc16(bs);
2265#endif
2266
2267 /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
2268 if (drflac__reload_l1_cache_from_l2(bs)) {
2269 bs->cache = drflac__be2host__cache_line(bs->cache);
2270 bs->consumedBits = 0;
2271#ifndef DR_FLAC_NO_CRC
2272 bs->crc16Cache = bs->cache;
2273#endif
2274 return DRFLAC_TRUE;
2275 }
2276
2277 /* Slow path. */
2278
2279 /*
2280 If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2281 few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2282 data from the unaligned cache.
2283 */
2284 bytesRead = bs->unalignedByteCount;
2285 if (bytesRead == 0) {
        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so mark the bits as consumed. */
2287 return DRFLAC_FALSE;
2288 }
2289
2290 DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2291 bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
2292
2293 bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2294 bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
2295 bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
2296
2297#ifndef DR_FLAC_NO_CRC
2298 bs->crc16Cache = bs->cache >> bs->consumedBits;
2299 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2300#endif
2301 return DRFLAC_TRUE;
2302}
2303
2304static void drflac__reset_cache(drflac_bs* bs)
2305{
2306 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
2307 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
2308 bs->cache = 0;
2309 bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
2310 bs->unalignedCache = 0;
2311
2312#ifndef DR_FLAC_NO_CRC
2313 bs->crc16Cache = 0;
2314 bs->crc16CacheIgnoredBytes = 0;
2315#endif
2316}
2317
2318
2319static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2320{
2321 DRFLAC_ASSERT(bs != NULL);
2322 DRFLAC_ASSERT(pResultOut != NULL);
2323 DRFLAC_ASSERT(bitCount > 0);
2324 DRFLAC_ASSERT(bitCount <= 32);
2325
2326 if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2327 if (!drflac__reload_cache(bs)) {
2328 return DRFLAC_FALSE;
2329 }
2330 }
2331
2332 if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /*
        If we want to load all 32 bits from a 32-bit cache we need to do it slightly differently because we can't do
        a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
        more optimal solution for this.
        */
2338#ifdef DRFLAC_64BIT
2339 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2340 bs->consumedBits += bitCount;
2341 bs->cache <<= bitCount;
2342#else
2343 if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2344 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2345 bs->consumedBits += bitCount;
2346 bs->cache <<= bitCount;
2347 } else {
            /* Cannot shift by 32 bits, so need to do it differently. */
2349 *pResultOut = (drflac_uint32)bs->cache;
2350 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2351 bs->cache = 0;
2352 }
2353#endif
2354
2355 return DRFLAC_TRUE;
2356 } else {
2357 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
2358 drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2359 drflac_uint32 bitCountLo = bitCount - bitCountHi;
2360 drflac_uint32 resultHi;
2361
2362 DRFLAC_ASSERT(bitCountHi > 0);
2363 DRFLAC_ASSERT(bitCountHi < 32);
2364 resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2365
2366 if (!drflac__reload_cache(bs)) {
2367 return DRFLAC_FALSE;
2368 }
2369 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
            /* This happens when we get to the end of the stream. */
2371 return DRFLAC_FALSE;
2372 }
2373
2374 *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2375 bs->consumedBits += bitCountLo;
2376 bs->cache <<= bitCountLo;
2377 return DRFLAC_TRUE;
2378 }
2379}
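
/*
Example (not part of dr_flac): a minimal sketch of reading a value that straddles a storage boundary by reading it in
parts and combining them as (hi << loBits) | lo. To stay short it reads directly from a byte array, so the boundary
is a byte rather than the 32- or 64-bit cache line used above. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    const uint8_t* pData;
    size_t size;
    size_t bitPos;   // Absolute position of the next unread bit.
} example_bitreader;

// Reads bitCount (1..32) bits, most significant bit first. Returns 1 on success, 0 on failure.
static int example_read_bits(example_bitreader* br, unsigned int bitCount, uint32_t* pResult)
{
    uint32_t result = 0;

    if (bitCount == 0 || bitCount > 32 || bitCount > br->size * 8 - br->bitPos) {
        return 0;   // Invalid request or not enough data left.
    }

    while (bitCount > 0) {
        unsigned int bitsLeftInByte = 8 - (unsigned int)(br->bitPos & 7);
        unsigned int take = (bitCount < bitsLeftInByte) ? bitCount : bitsLeftInByte;
        uint32_t byte = br->pData[br->bitPos >> 3];
        uint32_t part = (byte >> (bitsLeftInByte - take)) & ((1u << take) - 1u);

        result = (result << take) | part;   // Append this part below the bits gathered so far.
        br->bitPos += take;
        bitCount -= take;
    }

    *pResult = result;
    return 1;
}

// Reads a 14-bit field that spans two bytes.
static uint32_t example_demo_straddle(void)
{
    static const uint8_t data[2] = { 0xFF, 0xF8 };
    example_bitreader br = { data, sizeof(data), 0 };
    uint32_t value = 0;
    example_read_bits(&br, 14, &value);
    return value;
}
```

In example_demo_straddle() the first part is the 8 bits 0xFF and the second part is the 6 bits 0x3E, giving
(0xFF << 6) | 0x3E = 0x3FFE.
*/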
2380
static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
{
    drflac_uint32 result;

    DRFLAC_ASSERT(bs != NULL);
    DRFLAC_ASSERT(pResult != NULL);
    DRFLAC_ASSERT(bitCount > 0);
    DRFLAC_ASSERT(bitCount <= 32);

    if (!drflac__read_uint32(bs, bitCount, &result)) {
        return DRFLAC_FALSE;
    }

    /* Do not attempt to shift by 32 as it's undefined. */
    if (bitCount < 32) {
        drflac_uint32 signbit;
        signbit = ((result >> (bitCount-1)) & 0x01);
        result |= (~signbit + 1) << bitCount;
    }

    *pResult = (drflac_int32)result;
    return DRFLAC_TRUE;
}
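
/*
Example (not part of dr_flac): a minimal sketch of the sign extension step used above, isolated into a pure function.
It assumes a two's complement platform for the final conversion to a signed type. The name example_sign_extend is
hypothetical.

```c
#include <stdint.h>

// Treats the low bitCount bits (1..32) of value as a signed number and widens it to 32 bits.
static int32_t example_sign_extend(uint32_t value, unsigned int bitCount)
{
    if (bitCount < 32) {
        uint32_t signbit = (value >> (bitCount - 1)) & 0x01;
        // (~signbit + 1) is 0 when the sign bit is clear and 0xFFFFFFFF when it is set.
        value |= (~signbit + 1) << bitCount;
    }
    return (int32_t)value;
}
```

For a 3-bit field, the bit pattern 101 has its sign bit set, so it widens to -3, while 011 stays 3.
*/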
2404
2405#ifdef DRFLAC_64BIT
2406static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2407{
2408 drflac_uint32 resultHi;
2409 drflac_uint32 resultLo;
2410
2411 DRFLAC_ASSERT(bitCount <= 64);
2412 DRFLAC_ASSERT(bitCount > 32);
2413
2414 if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
2415 return DRFLAC_FALSE;
2416 }
2417
2418 if (!drflac__read_uint32(bs, 32, &resultLo)) {
2419 return DRFLAC_FALSE;
2420 }
2421
2422 *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
2423 return DRFLAC_TRUE;
2424}
2425#endif
2426
2427/* Function below is unused, but leaving it here in case I need to quickly add it again. */
2428#if 0
2429static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2430{
2431 drflac_uint64 result;
2432 drflac_uint64 signbit;
2433
2434 DRFLAC_ASSERT(bitCount <= 64);
2435
2436 if (!drflac__read_uint64(bs, bitCount, &result)) {
2437 return DRFLAC_FALSE;
2438 }
2439
    /* Do not attempt to shift by 64 as it's undefined. */
    if (bitCount < 64) {
        signbit = ((result >> (bitCount-1)) & 0x01);
        result |= (~signbit + 1) << bitCount;
    }
2442
2443 *pResultOut = (drflac_int64)result;
2444 return DRFLAC_TRUE;
2445}
2446#endif
2447
2448static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2449{
2450 drflac_uint32 result;
2451
2452 DRFLAC_ASSERT(bs != NULL);
2453 DRFLAC_ASSERT(pResult != NULL);
2454 DRFLAC_ASSERT(bitCount > 0);
2455 DRFLAC_ASSERT(bitCount <= 16);
2456
2457 if (!drflac__read_uint32(bs, bitCount, &result)) {
2458 return DRFLAC_FALSE;
2459 }
2460
2461 *pResult = (drflac_uint16)result;
2462 return DRFLAC_TRUE;
2463}
2464
2465#if 0
2466static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2467{
2468 drflac_int32 result;
2469
2470 DRFLAC_ASSERT(bs != NULL);
2471 DRFLAC_ASSERT(pResult != NULL);
2472 DRFLAC_ASSERT(bitCount > 0);
2473 DRFLAC_ASSERT(bitCount <= 16);
2474
2475 if (!drflac__read_int32(bs, bitCount, &result)) {
2476 return DRFLAC_FALSE;
2477 }
2478
2479 *pResult = (drflac_int16)result;
2480 return DRFLAC_TRUE;
2481}
2482#endif
2483
2484static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2485{
2486 drflac_uint32 result;
2487
2488 DRFLAC_ASSERT(bs != NULL);
2489 DRFLAC_ASSERT(pResult != NULL);
2490 DRFLAC_ASSERT(bitCount > 0);
2491 DRFLAC_ASSERT(bitCount <= 8);
2492
2493 if (!drflac__read_uint32(bs, bitCount, &result)) {
2494 return DRFLAC_FALSE;
2495 }
2496
2497 *pResult = (drflac_uint8)result;
2498 return DRFLAC_TRUE;
2499}
2500
2501static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2502{
2503 drflac_int32 result;
2504
2505 DRFLAC_ASSERT(bs != NULL);
2506 DRFLAC_ASSERT(pResult != NULL);
2507 DRFLAC_ASSERT(bitCount > 0);
2508 DRFLAC_ASSERT(bitCount <= 8);
2509
2510 if (!drflac__read_int32(bs, bitCount, &result)) {
2511 return DRFLAC_FALSE;
2512 }
2513
2514 *pResult = (drflac_int8)result;
2515 return DRFLAC_TRUE;
2516}
2517
2518
2519static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2520{
2521 if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2522 bs->consumedBits += (drflac_uint32)bitsToSeek;
2523 bs->cache <<= bitsToSeek;
2524 return DRFLAC_TRUE;
2525 } else {
2526 /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2527 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2528 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2529 bs->cache = 0;
2530
        /* Simple case. Seek in groups of as many bits as fit within a cache line. */
2532#ifdef DRFLAC_64BIT
2533 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2534 drflac_uint64 bin;
2535 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2536 return DRFLAC_FALSE;
2537 }
2538 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2539 }
2540#else
2541 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2542 drflac_uint32 bin;
2543 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2544 return DRFLAC_FALSE;
2545 }
2546 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2547 }
2548#endif
2549
2550 /* Whole leftover bytes. */
2551 while (bitsToSeek >= 8) {
2552 drflac_uint8 bin;
2553 if (!drflac__read_uint8(bs, 8, &bin)) {
2554 return DRFLAC_FALSE;
2555 }
2556 bitsToSeek -= 8;
2557 }
2558
2559 /* Leftover bits. */
2560 if (bitsToSeek > 0) {
2561 drflac_uint8 bin;
2562 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2563 return DRFLAC_FALSE;
2564 }
2565 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2566 }
2567
2568 DRFLAC_ASSERT(bitsToSeek == 0);
2569 return DRFLAC_TRUE;
2570 }
2571}
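/*
The straddling branch of drflac__seek_bits() splits the request into four stages. The block below is a minimal sketch of that arithmetic only,
not part of dr_flac: `seek_plan` and `plan_seek()` are hypothetical names, and the cache line width is passed in as a plain number instead of
being taken from DRFLAC_CACHE_L1_SIZE_BITS().

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t fromCache;     // Bits skipped inside what is left of the current cache line.
    size_t wholeLines;    // Number of full cache-line-sized reads.
    size_t wholeBytes;    // Number of 8-bit reads.
    size_t leftoverBits;  // Size of the final partial read (0 to 7 bits).
} seek_plan;

static seek_plan plan_seek(size_t bitsToSeek, size_t bitsRemainingInCache, size_t cacheLineBits)
{
    seek_plan plan = {0, 0, 0, 0};

    assert(cacheLineBits >= 8 && (cacheLineBits % 8) == 0);

    // Fast path: the whole seek fits in the bits that are already cached.
    if (bitsToSeek <= bitsRemainingInCache) {
        plan.fromCache = bitsToSeek;
        return plan;
    }

    // Otherwise drain the cache, then go by cache lines, then bytes, then bits.
    plan.fromCache = bitsRemainingInCache;
    bitsToSeek    -= bitsRemainingInCache;

    plan.wholeLines = bitsToSeek / cacheLineBits;
    bitsToSeek     %= cacheLineBits;

    plan.wholeBytes   = bitsToSeek / 8;
    plan.leftoverBits = bitsToSeek % 8;
    return plan;
}
```

For example, plan_seek(200, 10, 64) yields 10 bits from the cache, 2 whole lines, 7 whole bytes and 6 leftover bits.
*/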
2572
2573
2574/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). It will also update the CRC-16. */
2575static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2576{
2577 DRFLAC_ASSERT(bs != NULL);
2578
2579 /*
2580 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2581 thing to do is align to the next byte.
2582 */
2583 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2584 return DRFLAC_FALSE;
2585 }
2586
2587 for (;;) {
2588 drflac_uint8 hi;
2589
2590#ifndef DR_FLAC_NO_CRC
2591 drflac__reset_crc16(bs);
2592#endif
2593
2594 if (!drflac__read_uint8(bs, 8, &hi)) {
2595 return DRFLAC_FALSE;
2596 }
2597
2598 if (hi == 0xFF) {
2599 drflac_uint8 lo;
2600 if (!drflac__read_uint8(bs, 6, &lo)) {
2601 return DRFLAC_FALSE;
2602 }
2603
2604 if (lo == 0x3E) {
2605 return DRFLAC_TRUE;
2606 } else {
2607 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2608 return DRFLAC_FALSE;
2609 }
2610 }
2611 }
2612 }
2613
2614 /* Should never get here. */
2615 /*return DRFLAC_FALSE;*/
2616}
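/*
The two comparisons above (0xFF for the first byte, 0x3E for the next 6 bits) together match the 14-bit sync code. The block below is a
simplified sketch of the same test over a plain byte buffer. It is not dr_flac's implementation: `find_sync_code()` is a hypothetical helper,
it ignores the CRC-16 bookkeeping, and unlike the function above it tests every byte position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the byte offset of the first sync code in buf, or -1 if there is none.
static long find_sync_code(const uint8_t* buf, size_t size)
{
    size_t i;

    assert(buf != NULL);

    for (i = 0; i + 1 < size; ++i) {
        // 0xFF supplies the first 8 sync bits. The top 6 bits of the next byte must be 111110.
        if (buf[i] == 0xFF && (buf[i + 1] >> 2) == 0x3E) {
            return (long)i;
        }
    }

    return -1;
}
```

With the bytes 12 FF 00 FF F8 C9 the first 0xFF is rejected because it is followed by 0x00, and the function returns offset 3.
*/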
2617
2618
2619#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2620#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2621#endif
2622#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2623#define DRFLAC_IMPLEMENT_CLZ_MSVC
2624#endif
2625#if defined(__WATCOMC__) && defined(__386__)
2626#define DRFLAC_IMPLEMENT_CLZ_WATCOM
2627#endif
2628#ifdef __MRC__
2629#include <intrinsics.h>
2630#define DRFLAC_IMPLEMENT_CLZ_MRC
2631#endif
2632
2633static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2634{
2635 drflac_uint32 n;
2636 static drflac_uint32 clz_table_4[] = {
2637 0,
2638 4,
2639 3, 3,
2640 2, 2, 2, 2,
2641 1, 1, 1, 1, 1, 1, 1, 1
2642 };
2643
2644 if (x == 0) {
2645 return sizeof(x)*8;
2646 }
2647
2648 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2649 if (n == 0) {
2650#ifdef DRFLAC_64BIT
2651 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2652 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2653 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2654 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2655#else
2656 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2657 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2658 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2659#endif
2660 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
2661 }
2662
2663 return n - 1;
2664}
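/*
The table in drflac__clz_software() stores clz + 1 for each 4-bit value, with 0 reserved for "this nibble is empty". The block below is a
standalone 32-bit sketch of the same idea so it can be compiled and tested in isolation. `clz32_table()` is a hypothetical name and the
fixed-width types from <stdint.h> are an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t clz32_table(uint32_t x)
{
    // Entry i holds clz(i) + 1 for the 4-bit value i. Entry 0 means the nibble has no set bit.
    static const uint8_t table[16] = { 0, 4, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1 };
    uint32_t n;

    if (x == 0) {
        return 32;
    }

    n = table[x >> 28];
    if (n == 0) {
        // The top nibble is empty. Move the highest set bit into it in steps of 16, 8 and 4.
        if ((x & 0xFFFF0000u) == 0) { n  = 16; x <<= 16; }
        if ((x & 0xFF000000u) == 0) { n +=  8; x <<=  8; }
        if ((x & 0xF0000000u) == 0) { n +=  4; x <<=  4; }
        n += table[x >> 28];
    }

    assert(n > 0);  // n is clz + 1 at this point.
    return n - 1;
}
```

For example, clz32_table(0x00010000) walks through the 8-bit and 4-bit steps and returns 15.
*/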
2665
2666#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2667static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2668{
2669 /* Fast compile time check for ARM. */
2670#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2671 return DRFLAC_TRUE;
2672#elif defined(__MRC__)
2673 return DRFLAC_TRUE;
2674#else
2675 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2676 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2677 return drflac__gIsLZCNTSupported;
2678 #else
2679 return DRFLAC_FALSE;
2680 #endif
2681#endif
2682}
2683
2684static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2685{
2686 /*
2687 It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2688 to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
2689 it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
2690 64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
2691 around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
2692 the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
2693 in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
2694 getting clobbered?
2695
2696 I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
2697 assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2698
2699 Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2700 compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
2701 to know how to fix the inlined assembly for correctness' sake, however.
2702 */
2703
2704#if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2705 #ifdef DRFLAC_64BIT
2706 return (drflac_uint32)__lzcnt64(x);
2707 #else
2708 return (drflac_uint32)__lzcnt(x);
2709 #endif
2710#else
2711 #if defined(__GNUC__) || defined(__clang__)
2712 #if defined(DRFLAC_X64)
2713 {
2714 drflac_uint64 r;
2715 __asm__ __volatile__ (
2716 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2717 );
2718
2719 return (drflac_uint32)r;
2720 }
2721 #elif defined(DRFLAC_X86)
2722 {
2723 drflac_uint32 r;
2724 __asm__ __volatile__ (
2725 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2726 );
2727
2728 return r;
2729 }
2730 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2731 {
2732 unsigned int r;
2733 __asm__ __volatile__ (
2734 #if defined(DRFLAC_64BIT)
2735 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2736 #else
2737 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2738 #endif
2739 );
2740
2741 return r;
2742 }
2743 #else
2744 if (x == 0) {
2745 return sizeof(x)*8;
2746 }
2747 #ifdef DRFLAC_64BIT
2748 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2749 #else
2750 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2751 #endif
2752 #endif
2753 #else
2754 /* Unsupported compiler. */
2755 #error "This compiler does not support the lzcnt intrinsic."
2756 #endif
2757#endif
2758}
2759#endif
2760
2761#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2762#include <intrin.h> /* For BitScanReverse(). */
2763
2764static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2765{
2766 drflac_uint32 n;
2767
2768 if (x == 0) {
2769 return sizeof(x)*8;
2770 }
2771
2772#ifdef DRFLAC_64BIT
2773 _BitScanReverse64((unsigned long*)&n, x);
2774#else
2775 _BitScanReverse((unsigned long*)&n, x);
2776#endif
2777 return sizeof(x)*8 - n - 1;
2778}
2779#endif
2780
2781#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2782static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2783#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
2784/* Use the LZCNT instruction (only available on some processors since the 2010s). */
2785#pragma aux drflac__clz_watcom_lzcnt = \
2786 "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
2787 parm [eax] \
2788 value [eax] \
2789 modify nomemory;
2790#else
2791/* Use the 386+-compatible implementation. */
2792#pragma aux drflac__clz_watcom = \
2793 "bsr eax, eax" \
2794 "xor eax, 31" \
2795 parm [eax] nomemory \
2796 value [eax] \
2797 modify exact [eax] nomemory;
2798#endif
2799#endif
2800
2801static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2802{
2803#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2804 if (drflac__is_lzcnt_supported()) {
2805 return drflac__clz_lzcnt(x);
2806 } else
2807#endif
2808 {
2809#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2810 return drflac__clz_msvc(x);
2811#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
2812 return drflac__clz_watcom_lzcnt(x);
2813#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2814 return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
2815#elif defined(__MRC__)
2816 return __cntlzw(x);
2817#else
2818 return drflac__clz_software(x);
2819#endif
2820 }
2821}
2822
2823
2824static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2825{
2826 drflac_uint32 zeroCounter = 0;
2827 drflac_uint32 setBitOffsetPlus1;
2828
2829 while (bs->cache == 0) {
2830 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2831 if (!drflac__reload_cache(bs)) {
2832 return DRFLAC_FALSE;
2833 }
2834 }
2835
2836 if (bs->cache == 1) {
2837 /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
2838 *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
2839 if (!drflac__reload_cache(bs)) {
2840 return DRFLAC_FALSE;
2841 }
2842
2843 return DRFLAC_TRUE;
2844 }
2845
2846 setBitOffsetPlus1 = drflac__clz(bs->cache);
2847 setBitOffsetPlus1 += 1;
2848
2849 if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2850 /* This happens when we get to end of stream */
2851 return DRFLAC_FALSE;
2852 }
2853
2854 bs->consumedBits += setBitOffsetPlus1;
2855 bs->cache <<= setBitOffsetPlus1;
2856
2857 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
2858 return DRFLAC_TRUE;
2859}
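/*
drflac__seek_past_next_set_bit() is a unary decode: count zero bits, then consume the set bit that ends the run. The block below is a minimal
sketch of the single-cache-line case, including the full-width shift guard mentioned above. It is an illustration only: `mini_cache` is a
hypothetical 32-bit stand-in for drflac_bs, and where the real function reloads the cache this sketch simply fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t cache;         // Unread bits, left-aligned.
    uint32_t consumedBits;  // Number of bits already shifted out of cache.
} mini_cache;

static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) { return 32; }
    while ((x & 0x80000000u) == 0) { x <<= 1; n += 1; }
    return n;
}

// Skips the run of zero bits and the set bit that ends it. Returns 0 if no set bit remains.
static int seek_past_next_set_bit(mini_cache* c, uint32_t* pOffsetOut)
{
    uint32_t setBitOffsetPlus1;

    assert(c != NULL && pOffsetOut != NULL);

    if (c->cache == 0) {
        return 0;  // The real decoder reloads the cache here instead of failing.
    }

    setBitOffsetPlus1 = clz32(c->cache) + 1;
    c->consumedBits += setBitOffsetPlus1;

    // A shift by the full width (32) is undefined in C, so that case is handled separately.
    c->cache = (setBitOffsetPlus1 < 32) ? (c->cache << setBitOffsetPlus1) : 0;

    *pOffsetOut = setBitOffsetPlus1 - 1;
    return 1;
}
```

With cache = 0x10000000 the offset is 3 and 4 bits are consumed. With cache = 1 the offset is 31 and the guarded branch clears the cache.
*/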
2860
2861
2862
2863static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2864{
2865 DRFLAC_ASSERT(bs != NULL);
2866 DRFLAC_ASSERT(offsetFromStart > 0);
2867
2868 /*
2869 Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
2870 is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
2871 To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
2872 */
2873 if (offsetFromStart > 0x7FFFFFFF) {
2874 drflac_uint64 bytesRemaining = offsetFromStart;
2875 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_SET)) {
2876 return DRFLAC_FALSE;
2877 }
2878 bytesRemaining -= 0x7FFFFFFF;
2879
2880 while (bytesRemaining > 0x7FFFFFFF) {
2881 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_CUR)) {
2882 return DRFLAC_FALSE;
2883 }
2884 bytesRemaining -= 0x7FFFFFFF;
2885 }
2886
2887 if (bytesRemaining > 0) {
2888 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, DRFLAC_SEEK_CUR)) {
2889 return DRFLAC_FALSE;
2890 }
2891 }
2892 } else {
2893 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, DRFLAC_SEEK_SET)) {
2894 return DRFLAC_FALSE;
2895 }
2896 }
2897
2898 /* The cache should be reset to force a reload of fresh data from the client. */
2899 drflac__reset_cache(bs);
2900 return DRFLAC_TRUE;
2901}
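/*
The chunked seek in drflac__seek_to_byte() can be checked against a mock stream. The block below is a minimal sketch, not dr_flac code:
`mock_stream`, `mock_seek()` and `seek_origin` are hypothetical stand-ins for the user data, the onSeek callback and the
DRFLAC_SEEK_SET / DRFLAC_SEEK_CUR origins, and the loop is written as a single do/while where the function above uses separate branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ORIGIN_SET, ORIGIN_CUR } seek_origin;

typedef struct {
    uint64_t position;
    int      callCount;
} mock_stream;

// Callback taking a signed 32-bit offset, like onSeek.
static int mock_seek(mock_stream* s, int32_t offset, seek_origin origin)
{
    s->position = (origin == ORIGIN_SET) ? (uint64_t)offset : s->position + (uint64_t)offset;
    s->callCount += 1;
    return 1;
}

static int seek_to_byte(mock_stream* s, uint64_t offsetFromStart)
{
    uint64_t remaining = offsetFromStart;
    seek_origin origin = ORIGIN_SET;  // Only the first call is absolute.

    assert(s != NULL);

    do {
        int32_t step = (remaining > 0x7FFFFFFF) ? 0x7FFFFFFF : (int32_t)remaining;
        if (!mock_seek(s, step, origin)) {
            return 0;
        }
        remaining -= (uint64_t)step;
        origin = ORIGIN_CUR;          // Every later call is relative to the current position.
    } while (remaining > 0);

    return 1;
}
```

Seeking to byte 5000000000 takes three calls: one absolute seek of 0x7FFFFFFF, one relative seek of 0x7FFFFFFF and one relative seek of
the remaining 705032706 bytes.
*/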
2902
2903
2904static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2905{
2906 drflac_uint8 crc;
2907 drflac_uint64 result;
2908 drflac_uint8 utf8[7] = {0};
2909 int byteCount;
2910 int i;
2911
2912 DRFLAC_ASSERT(bs != NULL);
2913 DRFLAC_ASSERT(pNumberOut != NULL);
2914 DRFLAC_ASSERT(pCRCOut != NULL);
2915
2916 crc = *pCRCOut;
2917
2918 if (!drflac__read_uint8(bs, 8, utf8)) {
2919 *pNumberOut = 0;
2920 return DRFLAC_AT_END;
2921 }
2922 crc = drflac_crc8(crc, utf8[0], 8);
2923
2924 if ((utf8[0] & 0x80) == 0) {
2925 *pNumberOut = utf8[0];
2926 *pCRCOut = crc;
2927 return DRFLAC_SUCCESS;
2928 }
2929
2930 /*byteCount = 1;*/
2931 if ((utf8[0] & 0xE0) == 0xC0) {
2932 byteCount = 2;
2933 } else if ((utf8[0] & 0xF0) == 0xE0) {
2934 byteCount = 3;
2935 } else if ((utf8[0] & 0xF8) == 0xF0) {
2936 byteCount = 4;
2937 } else if ((utf8[0] & 0xFC) == 0xF8) {
2938 byteCount = 5;
2939 } else if ((utf8[0] & 0xFE) == 0xFC) {
2940 byteCount = 6;
2941 } else if ((utf8[0] & 0xFF) == 0xFE) {
2942 byteCount = 7;
2943 } else {
2944 *pNumberOut = 0;
2945 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
2946 }
2947
2948 /* Read extra bytes. */
2949 DRFLAC_ASSERT(byteCount > 1);
2950
2951 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
2952 for (i = 1; i < byteCount; ++i) {
2953 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
2954 *pNumberOut = 0;
2955 return DRFLAC_AT_END;
2956 }
2957 crc = drflac_crc8(crc, utf8[i], 8);
2958
2959 result = (result << 6) | (utf8[i] & 0x3F);
2960 }
2961
2962 *pNumberOut = result;
2963 *pCRCOut = crc;
2964 return DRFLAC_SUCCESS;
2965}
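/*
The coded number uses the UTF-8 length prefix scheme, extended to 7 bytes. The block below is a minimal sketch of the same bit layout over a
plain byte buffer. It is not dr_flac's implementation: `decode_coded_number()` is a hypothetical helper, it skips the CRC-8 update, and it
reports errors by returning 0 instead of a drflac_result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the number of bytes consumed, or 0 on a bad lead byte or truncated input.
static size_t decode_coded_number(const uint8_t* data, size_t size, uint64_t* pNumberOut)
{
    size_t byteCount;
    size_t i;
    uint64_t result;

    assert(data != NULL && pNumberOut != NULL);

    if (size == 0) {
        return 0;
    }

    if ((data[0] & 0x80) == 0) {  // 0xxxxxxx: the value fits in one byte.
        *pNumberOut = data[0];
        return 1;
    }

    if      ((data[0] & 0xE0) == 0xC0) { byteCount = 2; }
    else if ((data[0] & 0xF0) == 0xE0) { byteCount = 3; }
    else if ((data[0] & 0xF8) == 0xF0) { byteCount = 4; }
    else if ((data[0] & 0xFC) == 0xF8) { byteCount = 5; }
    else if ((data[0] & 0xFE) == 0xFC) { byteCount = 6; }
    else if (data[0] == 0xFE)          { byteCount = 7; }
    else { return 0; }            // 10xxxxxx and 0xFF are not valid lead bytes.

    if (size < byteCount) {
        return 0;
    }

    // The lead byte contributes the bits below its length prefix.
    result = (uint64_t)(data[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        result = (result << 6) | (uint64_t)(data[i] & 0x3F);  // 6 payload bits per extra byte.
    }

    *pNumberOut = result;
    return byteCount;
}
```

For example, the three bytes E2 82 AC decode to 0x20AC: the lead byte contributes 0x2, and the two following bytes contribute 0x02 and 0x2C.
*/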
2966
2967
2968static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
2969{
2970#if 1 /* Needs optimizing. */
2971 drflac_uint32 result = 0;
2972 while (x > 0) {
2973 result += 1;
2974 x >>= 1;
2975 }
2976
2977 return result;
2978#endif
2979}
2980
2981static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
2982{
2983 /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
2984 return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
2985}
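/*
Note that the loop in drflac__ilog2_u32() returns the bit length of x, that is floor(log2(x)) + 1 for x > 0 and 0 for x = 0. The block below
is a standalone sketch of the two functions above with a few worked values. `bit_length_u32()` and `needs_64_bit_prediction()` are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

// Same loop as drflac__ilog2_u32(): counts how many bits are needed to represent x.
static uint32_t bit_length_u32(uint32_t x)
{
    uint32_t result = 0;
    while (x > 0) {
        result += 1;
        x >>= 1;
    }

    assert(result <= 32);
    return result;
}

// Nonzero when the estimated width of the prediction sum does not fit in 32 bits.
static int needs_64_bit_prediction(uint32_t bitsPerSample, uint32_t order, uint32_t precision)
{
    return bitsPerSample + precision + bit_length_u32(order) > 32;
}
```

A 16-bit stream with order 8 and precision 12 gives 16 + 12 + 4 = 32, which still selects the 32-bit path. A 24-bit stream with order 12 and
precision 15 gives 24 + 15 + 4 = 43, which selects the 64-bit path.
*/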
2986
2987
2988/*
2989The next two functions are responsible for calculating the prediction.
2990
2991When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
2992safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
2993*/
2994#if defined(__clang__)
2995__attribute__((no_sanitize("signed-integer-overflow")))
2996#endif
2997static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
2998{
2999 drflac_int32 prediction = 0;
3000
3001 DRFLAC_ASSERT(order <= 32);
3002
3003 /* 32-bit version. */
3004
3005 /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
3006 switch (order)
3007 {
3008 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
3009 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
3010 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
3011 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
3012 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
3013 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
3014 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
3015 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
3016 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
3017 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
3018 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
3019 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
3020 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
3021 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
3022 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
3023 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
3024 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
3025 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
3026 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
3027 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
3028 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
3029 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
3030 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
3031 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
3032 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
3033 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
3034 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
3035 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
3036 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
3037 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
3038 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
3039 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3040 }
3041
3042 return (drflac_int32)(prediction >> shift);
3043}
3044
3045static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3046{
3047 drflac_int64 prediction;
3048
3049 DRFLAC_ASSERT(order <= 32);
3050
3051 /* 64-bit version. */
3052
3053 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3054#ifndef DRFLAC_64BIT
3055 if (order == 8)
3056 {
3057 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3058 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3059 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3060 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3061 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3062 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3063 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3064 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3065 }
3066 else if (order == 7)
3067 {
3068 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3069 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3070 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3071 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3072 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3073 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3074 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3075 }
3076 else if (order == 3)
3077 {
3078 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3079 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3080 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3081 }
3082 else if (order == 6)
3083 {
3084 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3085 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3086 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3087 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3088 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3089 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3090 }
3091 else if (order == 5)
3092 {
3093 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3094 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3095 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3096 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3097 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3098 }
3099 else if (order == 4)
3100 {
3101 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3102 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3103 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3104 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3105 }
3106 else if (order == 12)
3107 {
3108 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3109 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3110 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3111 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3112 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3113 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3114 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3115 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3116 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3117 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3118 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3119 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3120 }
3121 else if (order == 2)
3122 {
3123 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3124 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3125 }
3126 else if (order == 1)
3127 {
3128 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3129 }
3130 else if (order == 10)
3131 {
3132 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3133 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3134 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3135 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3136 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3137 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3138 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3139 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3140 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3141 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3142 }
3143 else if (order == 9)
3144 {
3145 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3146 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3147 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3148 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3149 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3150 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3151 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3152 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3153 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3154 }
3155 else if (order == 11)
3156 {
3157 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3158 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3159 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3160 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3161 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3162 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3163 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3164 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3165 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3166 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3167 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3168 }
3169 else
3170 {
3171 int j;
3172
3173 prediction = 0;
3174 for (j = 0; j < (int)order; ++j) {
3175 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
3176 }
3177 }
3178#endif
3179
3180 /*
3181 VC++ optimizes this to a single jmp instruction, but only in the 64-bit build. The 32-bit build generates less efficient code for some
3182 reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
3183 */
3184#ifdef DRFLAC_64BIT
3185 prediction = 0;
3186 switch (order)
3187 {
3188 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3189 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3190 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3191 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3192 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3193 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3194 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3195 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3196 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3197 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3198 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3199 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3200 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3201 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3202 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3203 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3204 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3205 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3206 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3207 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3208 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3209 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3210 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3211 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3212 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3213 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3214 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3215 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3216 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3217 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3218 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3219 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3220 }
3221#endif
3222
3223 return (drflac_int32)(prediction >> shift);
3224}
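/*
Both prediction functions above compute the same sum. The unrolled switch and if/else chains exist purely for speed. The block below is a
minimal loop-form sketch of that sum, accumulating in 64 bits. `predict()` is a hypothetical name, and like the code above it relies on
right-shifting a negative value being an arithmetic shift, which is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int32_t predict(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pDecodedSamples)
{
    int64_t prediction = 0;
    uint32_t j;

    assert(coefficients != NULL && pDecodedSamples != NULL);
    assert(order <= 32);

    // coefficients[0] pairs with the most recent sample, coefficients[1] with the one before it, and so on.
    for (j = 0; j < order; ++j) {
        prediction += (int64_t)coefficients[j] * pDecodedSamples[-(int)j - 1];
    }

    return (int32_t)(prediction >> shift);
}
```

pDecodedSamples points at the sample being decoded, so the function only reads the `order` samples before it. With previous samples 1, 2
and coefficients 2, -1 at shift 0, the prediction is 2*2 - 1*1 = 3.
*/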
3225
3226
3227#if 0
3228/*
3229Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3230sake of readability and should only be used as a reference.
3231*/
3232static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3233{
3234 drflac_uint32 i;
3235
3236 DRFLAC_ASSERT(bs != NULL);
3237 DRFLAC_ASSERT(pSamplesOut != NULL);
3238
3239 for (i = 0; i < count; ++i) {
3240 drflac_uint32 zeroCounter = 0;
3241 for (;;) {
3242 drflac_uint8 bit;
3243 if (!drflac__read_uint8(bs, 1, &bit)) {
3244 return DRFLAC_FALSE;
3245 }
3246
3247 if (bit == 0) {
3248 zeroCounter += 1;
3249 } else {
3250 break;
3251 }
3252 }
3253
3254 drflac_uint32 decodedRice;
3255 if (riceParam > 0) {
3256 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3257 return DRFLAC_FALSE;
3258 }
3259 } else {
3260 decodedRice = 0;
3261 }
3262
3263 decodedRice |= (zeroCounter << riceParam);
3264 if ((decodedRice & 0x01)) {
3265 decodedRice = ~(decodedRice >> 1);
3266 } else {
3267 decodedRice = (decodedRice >> 1);
3268 }
3269
3270
3271 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3272 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3273 } else {
3274 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3275 }
3276 }
3277
3278 return DRFLAC_TRUE;
3279}
3280#endif
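/*
The reference decoder above reads each residual in three steps: a unary run of zero bits, riceParam binary bits, then a sign unfold. The
block below is a self-contained sketch of those steps without the prediction stage. It is an illustration only: `bit_reader`, `read_bit()`
and `decode_rice()` are hypothetical, with a simple MSB-first reader over a byte array standing in for drflac_bs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t* data;
    size_t sizeInBits;
    size_t bitPos;
} bit_reader;

static int read_bit(bit_reader* br, uint32_t* pBit)
{
    if (br->bitPos >= br->sizeInBits) {
        return 0;
    }
    *pBit = (br->data[br->bitPos >> 3] >> (7 - (br->bitPos & 7))) & 1;
    br->bitPos += 1;
    return 1;
}

static int decode_rice(bit_reader* br, uint32_t riceParam, int32_t* pValueOut)
{
    uint32_t zeroCounter = 0;
    uint32_t lowBits = 0;
    uint32_t folded;
    uint32_t bit;
    uint32_t i;

    assert(br != NULL && pValueOut != NULL);
    assert(riceParam < 32);

    // Unary part: count zero bits up to the terminating 1 bit.
    for (;;) {
        if (!read_bit(br, &bit)) { return 0; }
        if (bit != 0) { break; }
        zeroCounter += 1;
    }

    // Binary part: riceParam low-order bits, most significant first.
    for (i = 0; i < riceParam; ++i) {
        if (!read_bit(br, &bit)) { return 0; }
        lowBits = (lowBits << 1) | bit;
    }

    // Undo the sign folding: even values map to n/2, odd values to -(n+1)/2.
    folded = (zeroCounter << riceParam) | lowBits;
    *pValueOut = (folded & 1) ? (int32_t)~(folded >> 1) : (int32_t)(folded >> 1);
    return 1;
}
```

With riceParam = 2, the byte 0x7C (bits 01111100) decodes as 01|11 giving folded value 7 and sample -4, then 1|10 giving folded value 2 and
sample 1. A third call fails because the stream ends inside the next code.
*/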
3281
3282#if 0
3283static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3284{
3285 drflac_uint32 zeroCounter = 0;
3286 drflac_uint32 decodedRice;
3287
3288 for (;;) {
3289 drflac_uint8 bit;
3290 if (!drflac__read_uint8(bs, 1, &bit)) {
3291 return DRFLAC_FALSE;
3292 }
3293
3294 if (bit == 0) {
3295 zeroCounter += 1;
3296 } else {
3297 break;
3298 }
3299 }
3300
3301 if (riceParam > 0) {
3302 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3303 return DRFLAC_FALSE;
3304 }
3305 } else {
3306 decodedRice = 0;
3307 }
3308
3309 *pZeroCounterOut = zeroCounter;
3310 *pRiceParamPartOut = decodedRice;
3311 return DRFLAC_TRUE;
3312}
3313#endif
3314
3315#if 0
3316static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3317{
3318 drflac_cache_t riceParamMask;
3319 drflac_uint32 zeroCounter;
3320 drflac_uint32 setBitOffsetPlus1;
3321 drflac_uint32 riceParamPart;
3322 drflac_uint32 riceLength;
3323
3324 DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
3325
3326 riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3327
3328 zeroCounter = 0;
3329 while (bs->cache == 0) {
3330 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3331 if (!drflac__reload_cache(bs)) {
3332 return DRFLAC_FALSE;
3333 }
3334 }
3335
3336 setBitOffsetPlus1 = drflac__clz(bs->cache);
3337 zeroCounter += setBitOffsetPlus1;
3338 setBitOffsetPlus1 += 1;
3339
3340 riceLength = setBitOffsetPlus1 + riceParam;
3341 if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3342 riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3343
3344 bs->consumedBits += riceLength;
3345 bs->cache <<= riceLength;
3346 } else {
3347 drflac_uint32 bitCountLo;
3348 drflac_cache_t resultHi;
3349
3350 bs->consumedBits += riceLength;
3351 bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
3352
3353 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
3354 bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3355 resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
3356
3357 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3358#ifndef DR_FLAC_NO_CRC
3359 drflac__update_crc16(bs);
3360#endif
3361 bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3362 bs->consumedBits = 0;
3363#ifndef DR_FLAC_NO_CRC
3364 bs->crc16Cache = bs->cache;
3365#endif
3366 } else {
3367 /* Slow path. We need to fetch more data from the client. */
3368 if (!drflac__reload_cache(bs)) {
3369 return DRFLAC_FALSE;
3370 }
3371 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3372 /* This happens when we get to end of stream */
3373 return DRFLAC_FALSE;
3374 }
3375 }
3376
3377 riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3378
3379 bs->consumedBits += bitCountLo;
3380 bs->cache <<= bitCountLo;
3381 }
3382
3383 pZeroCounterOut[0] = zeroCounter;
3384 pRiceParamPartOut[0] = riceParamPart;
3385
3386 return DRFLAC_TRUE;
3387}
3388#endif
3389
3390static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3391{
3392 drflac_uint32 riceParamPlus1 = riceParam + 1;
3393 /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
3394 drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3395 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3396
3397 /*
3398 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3399 no idea how this will work in practice...
3400 */
3401 drflac_cache_t bs_cache = bs->cache;
3402 drflac_uint32 bs_consumedBits = bs->consumedBits;
3403
3404 /* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
3405 drflac_uint32 lzcount = drflac__clz(bs_cache);
3406 if (lzcount < sizeof(bs_cache)*8) {
3407 pZeroCounterOut[0] = lzcount;
3408
3409 /*
3410 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3411 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3412 outside of this function at a higher level.
3413 */
3414 extract_rice_param_part:
3415 bs_cache <<= lzcount;
3416 bs_consumedBits += lzcount;
3417
3418 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3419 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3420 pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3421 bs_cache <<= riceParamPlus1;
3422 bs_consumedBits += riceParamPlus1;
3423 } else {
3424 drflac_uint32 riceParamPartHi;
3425 drflac_uint32 riceParamPartLo;
3426 drflac_uint32 riceParamPartLoBitCount;
3427
3428 /*
3429 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3430 line, reload the cache, and then combine it with the head of the next cache line.
3431 */
3432
3433 /* Grab the high part of the rice parameter part. */
3434 riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3435
3436 /* Before reloading the cache we need to grab the size in bits of the low part. */
3437 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3438 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3439
3440 /* Now reload the cache. */
3441 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3442 #ifndef DR_FLAC_NO_CRC
3443 drflac__update_crc16(bs);
3444 #endif
3445 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3446 bs_consumedBits = riceParamPartLoBitCount;
3447 #ifndef DR_FLAC_NO_CRC
3448 bs->crc16Cache = bs_cache;
3449 #endif
3450 } else {
3451 /* Slow path. We need to fetch more data from the client. */
3452 if (!drflac__reload_cache(bs)) {
3453 return DRFLAC_FALSE;
3454 }
3455 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3456 /* This happens when we reach the end of the stream. */
3457 return DRFLAC_FALSE;
3458 }
3459
3460 bs_cache = bs->cache;
3461 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3462 }
3463
3464 /* We should now have enough information to construct the rice parameter part. */
3465 riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3466 pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
3467
3468 bs_cache <<= riceParamPartLoBitCount;
3469 }
3470 } else {
3471 /*
3472 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3473 to drflac__clz() and we need to reload the cache.
3474 */
3475 drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3476 for (;;) {
3477 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3478 #ifndef DR_FLAC_NO_CRC
3479 drflac__update_crc16(bs);
3480 #endif
3481 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3482 bs_consumedBits = 0;
3483 #ifndef DR_FLAC_NO_CRC
3484 bs->crc16Cache = bs_cache;
3485 #endif
3486 } else {
3487 /* Slow path. We need to fetch more data from the client. */
3488 if (!drflac__reload_cache(bs)) {
3489 return DRFLAC_FALSE;
3490 }
3491
3492 bs_cache = bs->cache;
3493 bs_consumedBits = bs->consumedBits;
3494 }
3495
3496 lzcount = drflac__clz(bs_cache);
3497 zeroCounter += lzcount;
3498
3499 if (lzcount < sizeof(bs_cache)*8) {
3500 break;
3501 }
3502 }
3503
3504 pZeroCounterOut[0] = zeroCounter;
3505 goto extract_rice_param_part;
3506 }
3507
3508 /* Make sure the cache is restored at the end of it all. */
3509 bs->cache = bs_cache;
3510 bs->consumedBits = bs_consumedBits;
3511
3512 return DRFLAC_TRUE;
3513}
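/*
The function above splits each Rice code into a unary zero count and a fixed-width low part. As a rough illustration only, the sketch
below does the same on a single 64-bit word. Everything named sketch_* is hypothetical and not part of dr_flac. Unlike the real
function, it does not reload the cache when a code straddles a cache line, and it strips the stop bit instead of leaving it in the
returned part.

```c
#include <stdint.h>

// Hypothetical single-word bit reader: bits are consumed from the most significant end.
typedef struct {
    uint64_t cache;        // Unread bits, left-aligned.
    uint32_t consumedBits; // How many bits of the word have been consumed so far.
} sketch_bs;

// Portable count-leading-zeros. Returns 64 when x is zero.
static uint32_t sketch_clz64(uint64_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 64;
    }
    while ((x & 0x8000000000000000ULL) == 0) {
        x <<= 1;
        n += 1;
    }
    return n;
}

// Reads one Rice code. Returns 0 if the code does not fit in the remaining bits.
// riceParam is assumed to be less than 31.
static int sketch_read_rice_parts(sketch_bs* bs, uint32_t riceParam, uint32_t* pZeroCount, uint32_t* pLowBits)
{
    uint32_t lzcount = sketch_clz64(bs->cache);
    uint32_t needed  = lzcount + 1 + riceParam; // Zeros + stop bit + low bits.
    if (lzcount == 64 || bs->consumedBits + needed > 64) {
        return 0;
    }

    // Skip the zeros and the stop bit. Two shifts so that neither shift count can reach 64.
    bs->cache <<= lzcount;
    bs->cache <<= 1;

    // The next riceParam bits are the low-order part of the value.
    *pLowBits = (riceParam == 0) ? 0 : (uint32_t)(bs->cache >> (64 - riceParam));
    bs->cache <<= riceParam;
    bs->consumedBits += needed;

    *pZeroCount = lzcount;
    return 1;
}
```

With riceParam = 3, a word beginning with the bits 0001101 yields a zero count of 3 and a low part of 5.
*/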
3514
3515static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3516{
3517 drflac_uint32 riceParamPlus1 = riceParam + 1;
3518 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3519
3520 /*
3521 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3522 no idea how this will work in practice...
3523 */
3524 drflac_cache_t bs_cache = bs->cache;
3525 drflac_uint32 bs_consumedBits = bs->consumedBits;
3526
3527 /* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
3528 drflac_uint32 lzcount = drflac__clz(bs_cache);
3529 if (lzcount < sizeof(bs_cache)*8) {
3530 /*
3531 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3532 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3533 outside of this function at a higher level.
3534 */
3535 extract_rice_param_part:
3536 bs_cache <<= lzcount;
3537 bs_consumedBits += lzcount;
3538
3539 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3540 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3541 bs_cache <<= riceParamPlus1;
3542 bs_consumedBits += riceParamPlus1;
3543 } else {
3544 /*
3545 Getting here means the rice parameter part straddles the cache line. Since we are only seeking, we skip the tail of the
3546 current cache line, reload the cache, and then skip the remaining bits at the head of the next cache line.
3547 */
3548
3549 /* Before reloading the cache we need to grab the size in bits of the low part. */
3550 drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3551 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3552
3553 /* Now reload the cache. */
3554 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3555 #ifndef DR_FLAC_NO_CRC
3556 drflac__update_crc16(bs);
3557 #endif
3558 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3559 bs_consumedBits = riceParamPartLoBitCount;
3560 #ifndef DR_FLAC_NO_CRC
3561 bs->crc16Cache = bs_cache;
3562 #endif
3563 } else {
3564 /* Slow path. We need to fetch more data from the client. */
3565 if (!drflac__reload_cache(bs)) {
3566 return DRFLAC_FALSE;
3567 }
3568
3569 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3570 /* This happens when we reach the end of the stream. */
3571 return DRFLAC_FALSE;
3572 }
3573
3574 bs_cache = bs->cache;
3575 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3576 }
3577
3578 bs_cache <<= riceParamPartLoBitCount;
3579 }
3580 } else {
3581 /*
3582 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3583 to drflac__clz() and we need to reload the cache.
3584 */
3585 for (;;) {
3586 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3587 #ifndef DR_FLAC_NO_CRC
3588 drflac__update_crc16(bs);
3589 #endif
3590 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3591 bs_consumedBits = 0;
3592 #ifndef DR_FLAC_NO_CRC
3593 bs->crc16Cache = bs_cache;
3594 #endif
3595 } else {
3596 /* Slow path. We need to fetch more data from the client. */
3597 if (!drflac__reload_cache(bs)) {
3598 return DRFLAC_FALSE;
3599 }
3600
3601 bs_cache = bs->cache;
3602 bs_consumedBits = bs->consumedBits;
3603 }
3604
3605 lzcount = drflac__clz(bs_cache);
3606 if (lzcount < sizeof(bs_cache)*8) {
3607 break;
3608 }
3609 }
3610
3611 goto extract_rice_param_part;
3612 }
3613
3614 /* Make sure the cache is restored at the end of it all. */
3615 bs->cache = bs_cache;
3616 bs->consumedBits = bs_consumedBits;
3617
3618 return DRFLAC_TRUE;
3619}
3620
3621
3622static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3623{
3624 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3625 drflac_uint32 zeroCountPart0;
3626 drflac_uint32 riceParamPart0;
3627 drflac_uint32 riceParamMask;
3628 drflac_uint32 i;
3629
3630 DRFLAC_ASSERT(bs != NULL);
3631 DRFLAC_ASSERT(pSamplesOut != NULL);
3632
3633 (void)bitsPerSample;
3634 (void)order;
3635 (void)shift;
3636 (void)coefficients;
3637
3638 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3639
3640 i = 0;
3641 while (i < count) {
3642 /* Rice extraction. */
3643 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3644 return DRFLAC_FALSE;
3645 }
3646
3647 /* Rice reconstruction. */
3648 riceParamPart0 &= riceParamMask;
3649 riceParamPart0 |= (zeroCountPart0 << riceParam);
3650 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3651
3652 pSamplesOut[i] = riceParamPart0;
3653
3654 i += 1;
3655 }
3656
3657 return DRFLAC_TRUE;
3658}
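/*
The "Rice reconstruction" step above can be read in isolation. The sketch below is a minimal standalone version, assuming riceParam
is less than 32. sketch_rice_reconstruct is a hypothetical name and not part of dr_flac. It masks off the stop bit that the reader
leaves above the low bits, merges in the zero count, and then undoes the zigzag mapping with the same two-entry table.

```c
#include <stdint.h>

// Rebuilds a signed residual from the two parts of a Rice code.
// riceParamPart may still carry the unary stop bit above its low riceParam bits, hence the mask.
static int32_t sketch_rice_reconstruct(uint32_t zeroCount, uint32_t riceParamPart, uint32_t riceParam)
{
    static const uint32_t t[2] = {0x00000000, 0xFFFFFFFF};
    uint32_t mask  = ~(0xFFFFFFFFu << riceParam);
    uint32_t value = (riceParamPart & mask) | (zeroCount << riceParam);

    // Zigzag decode: even values map to 0, 1, 2, ... and odd values map to -1, -2, -3, ...
    value = (value >> 1) ^ t[value & 0x01];

    // Like the decoder above, this relies on the usual two's complement conversion to a signed type.
    return (int32_t)value;
}
```

For example, with riceParam = 2, a zero count of 1 and a part of 0x7 (stop bit plus the bits 11) the merged value is 7, which
zigzag-decodes to -4.
*/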
3659
3660static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3661{
3662 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3663 drflac_uint32 zeroCountPart0 = 0;
3664 drflac_uint32 zeroCountPart1 = 0;
3665 drflac_uint32 zeroCountPart2 = 0;
3666 drflac_uint32 zeroCountPart3 = 0;
3667 drflac_uint32 riceParamPart0 = 0;
3668 drflac_uint32 riceParamPart1 = 0;
3669 drflac_uint32 riceParamPart2 = 0;
3670 drflac_uint32 riceParamPart3 = 0;
3671 drflac_uint32 riceParamMask;
3672 const drflac_int32* pSamplesOutEnd;
3673 drflac_uint32 i;
3674
3675 DRFLAC_ASSERT(bs != NULL);
3676 DRFLAC_ASSERT(pSamplesOut != NULL);
3677
3678 if (lpcOrder == 0) {
3679 return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
3680 }
3681
3682 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3683 pSamplesOutEnd = pSamplesOut + (count & ~3);
3684
3685 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3686 while (pSamplesOut < pSamplesOutEnd) {
3687 /*
3688 Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
3689 against an array. Not sure why, but perhaps it's making more efficient use of registers?
3690 */
3691 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3692 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3693 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3694 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3695 return DRFLAC_FALSE;
3696 }
3697
3698 riceParamPart0 &= riceParamMask;
3699 riceParamPart1 &= riceParamMask;
3700 riceParamPart2 &= riceParamMask;
3701 riceParamPart3 &= riceParamMask;
3702
3703 riceParamPart0 |= (zeroCountPart0 << riceParam);
3704 riceParamPart1 |= (zeroCountPart1 << riceParam);
3705 riceParamPart2 |= (zeroCountPart2 << riceParam);
3706 riceParamPart3 |= (zeroCountPart3 << riceParam);
3707
3708 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3709 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3710 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3711 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3712
3713 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3714 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3715 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3716 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
3717
3718 pSamplesOut += 4;
3719 }
3720 } else {
3721 while (pSamplesOut < pSamplesOutEnd) {
3722 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3723 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3724 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3725 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3726 return DRFLAC_FALSE;
3727 }
3728
3729 riceParamPart0 &= riceParamMask;
3730 riceParamPart1 &= riceParamMask;
3731 riceParamPart2 &= riceParamMask;
3732 riceParamPart3 &= riceParamMask;
3733
3734 riceParamPart0 |= (zeroCountPart0 << riceParam);
3735 riceParamPart1 |= (zeroCountPart1 << riceParam);
3736 riceParamPart2 |= (zeroCountPart2 << riceParam);
3737 riceParamPart3 |= (zeroCountPart3 << riceParam);
3738
3739 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3740 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3741 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3742 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3743
3744 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3745 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3746 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3747 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
3748
3749 pSamplesOut += 4;
3750 }
3751 }
3752
3753 i = (count & ~3);
3754 while (i < count) {
3755 /* Rice extraction. */
3756 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3757 return DRFLAC_FALSE;
3758 }
3759
3760 /* Rice reconstruction. */
3761 riceParamPart0 &= riceParamMask;
3762 riceParamPart0 |= (zeroCountPart0 << riceParam);
3763 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3764 /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
3765
3766 /* Sample reconstruction. */
3767 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3768 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3769 } else {
3770 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3771 }
3772
3773 i += 1;
3774 pSamplesOut += 1;
3775 }
3776
3777 return DRFLAC_TRUE;
3778}
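/*
The "sample reconstruction" step adds each residual to a linear prediction of the previous samples. The prediction helpers are defined
elsewhere in this file, so the sketch below is only an assumption about their general shape, based on how FLAC linear prediction
works and on the SSE pre-load code, which pairs coefficients[0] with pSamplesOut[-1]. All sketch_* names are hypothetical. Unlike the
decoder above, the sketch keeps the residuals in the output buffer and restores them in place.

```c
#include <stdint.h>

// Predicts the sample at pDecodedSamples[0] from the `order` samples before it.
static int32_t sketch_predict(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pDecodedSamples)
{
    int64_t sum = 0;
    uint32_t j;
    for (j = 0; j < order; j += 1) {
        // coefficients[0] pairs with the most recent sample, pDecodedSamples[-1].
        sum += (int64_t)coefficients[j] * (int64_t)pDecodedSamples[-1 - (int32_t)j];
    }

    // Assumes >> on a negative value is an arithmetic shift, which is implementation-defined in C.
    return (int32_t)(sum >> shift);
}

// Turns residuals into samples in place. The `order` warm-up samples before pSamples must already be decoded.
static void sketch_restore(uint32_t order, int32_t shift, const int32_t* coefficients, int32_t* pSamples, uint32_t count)
{
    uint32_t i;
    for (i = 0; i < count; i += 1) {
        pSamples[i] += sketch_predict(order, shift, coefficients, pSamples + i);
    }
}
```

With coefficients {2, -1}, a shift of 0, warm-up samples 1 and 2, and residuals 0, 0, 1, the restored samples are 3, 4, 6.
*/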
3779
3780#if defined(DRFLAC_SUPPORT_SSE2)
3781static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3782{
3783 __m128i r;
3784
3785 /* Pack. */
3786 r = _mm_packs_epi32(a, b);
3787
3788 /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
3789 r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
3790
3791 /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
3792 r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3793 r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3794
3795 return r;
3796}
3797#endif
3798
3799#if defined(DRFLAC_SUPPORT_SSE41)
3800static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3801{
3802 return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3803}
3804
3805static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3806{
3807 __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3808 __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
3809 return _mm_add_epi32(x64, x32);
3810}
3811
3812static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3813{
3814 return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3815}
3816
3817static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3818{
3819 /*
3820 To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
3821 is shifted with zero bits, whereas the high side is shifted with sign bits.
3822 */
3823 __m128i lo = _mm_srli_epi64(x, count);
3824 __m128i hi = _mm_srai_epi32(x, count);
3825
3826 hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
3827
3828 return _mm_or_si128(lo, hi);
3829}
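/*
The low side / high side trick used by drflac__mm_srai_epi64() can be checked with plain integers. The sketch below models a single
64-bit lane, assuming count is less than 32. sketch_srai64 is a hypothetical name and not part of dr_flac. It works on unsigned bit
patterns so that it does not depend on how the compiler shifts negative values.

```c
#include <stdint.h>

// Scalar model of the trick for one 64-bit lane. Returns the bit pattern of the arithmetically shifted value.
static uint64_t sketch_srai64(uint64_t x, uint32_t count)
{
    uint32_t hi32 = (uint32_t)(x >> 32);
    uint32_t signFill = 0;
    uint64_t lo;
    uint64_t hi;

    // Low side: logical shift of the whole 64-bit value. Zeros enter at the top.
    lo = x >> count;

    // High side: arithmetic shift of the upper 32 bits, done by filling the vacated bits with the sign.
    if ((hi32 & 0x80000000u) != 0 && count > 0) {
        signFill = ~(0xFFFFFFFFu >> count);
    }
    hi = (uint64_t)((hi32 >> count) | signFill) << 32;

    // lo already holds the bits that crossed from the upper half into the lower half.
    // hi contributes the sign-extended upper half.
    return lo | hi;
}
```

For example, shifting the bit pattern of -256 right by 4 gives the bit pattern of -16.
*/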
3830
3831static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3832{
3833 int i;
3834 drflac_uint32 riceParamMask;
3835 drflac_int32* pDecodedSamples = pSamplesOut;
3836 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3837 drflac_uint32 zeroCountParts0 = 0;
3838 drflac_uint32 zeroCountParts1 = 0;
3839 drflac_uint32 zeroCountParts2 = 0;
3840 drflac_uint32 zeroCountParts3 = 0;
3841 drflac_uint32 riceParamParts0 = 0;
3842 drflac_uint32 riceParamParts1 = 0;
3843 drflac_uint32 riceParamParts2 = 0;
3844 drflac_uint32 riceParamParts3 = 0;
3845 __m128i coefficients128_0;
3846 __m128i coefficients128_4;
3847 __m128i coefficients128_8;
3848 __m128i samples128_0;
3849 __m128i samples128_4;
3850 __m128i samples128_8;
3851 __m128i riceParamMask128;
3852
3853 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3854
3855 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3856 riceParamMask128 = _mm_set1_epi32(riceParamMask);
3857
3858 /* Pre-load. */
3859 coefficients128_0 = _mm_setzero_si128();
3860 coefficients128_4 = _mm_setzero_si128();
3861 coefficients128_8 = _mm_setzero_si128();
3862
3863 samples128_0 = _mm_setzero_si128();
3864 samples128_4 = _mm_setzero_si128();
3865 samples128_8 = _mm_setzero_si128();
3866
3867 /*
3868 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3869 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3870 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3871 so I think there's opportunity for this to be simplified.
3872 */
3873#if 1
3874 {
3875 int runningOrder = order;
3876
3877 /* 0 - 3. */
3878 if (runningOrder >= 4) {
3879 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
3880 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
3881 runningOrder -= 4;
3882 } else {
3883 switch (runningOrder) {
3884 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
3885 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
3886 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
3887 }
3888 runningOrder = 0;
3889 }
3890
3891 /* 4 - 7 */
3892 if (runningOrder >= 4) {
3893 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
3894 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
3895 runningOrder -= 4;
3896 } else {
3897 switch (runningOrder) {
3898 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
3899 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
3900 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
3901 }
3902 runningOrder = 0;
3903 }
3904
3905 /* 8 - 11 */
3906 if (runningOrder == 4) {
3907 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
3908 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
3909 runningOrder -= 4;
3910 } else {
3911 switch (runningOrder) {
3912 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
3913 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
3914 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
3915 }
3916 runningOrder = 0;
3917 }
3918
3919 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
3920 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
3921 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
3922 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
3923 }
3924#else
3925 /* This causes strict-aliasing warnings with GCC. */
3926 switch (order)
3927 {
3928 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
3929 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
3930 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
3931 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
3932 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
3933 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
3934 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
3935 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
3936 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
3937 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
3938 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
3939 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
3940 }
3941#endif
3942
3943 /* For this version the prediction is computed one sample at a time, but Rice parts are read and samples are stored in groups of 4. */
3944 while (pDecodedSamples < pDecodedSamplesEnd) {
3945 __m128i prediction128;
3946 __m128i zeroCountPart128;
3947 __m128i riceParamPart128;
3948
3949 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
3950 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
3951 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
3952 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
3953 return DRFLAC_FALSE;
3954 }
3955
3956 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
3957 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
3958
3959 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
3960 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
3961 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
3962 /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
3963
3964 if (order <= 4) {
3965 for (i = 0; i < 4; i += 1) {
3966 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
3967
3968 /* Horizontal add and shift. */
3969 prediction128 = drflac__mm_hadd_epi32(prediction128);
3970 prediction128 = _mm_srai_epi32(prediction128, shift);
3971 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3972
3973 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3974 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3975 }
3976 } else if (order <= 8) {
3977 for (i = 0; i < 4; i += 1) {
3978 prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
3979 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3980
3981 /* Horizontal add and shift. */
3982 prediction128 = drflac__mm_hadd_epi32(prediction128);
3983 prediction128 = _mm_srai_epi32(prediction128, shift);
3984 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3985
3986 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
3987 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3988 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3989 }
3990 } else {
3991 for (i = 0; i < 4; i += 1) {
3992 prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
3993 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
3994 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3995
3996 /* Horizontal add and shift. */
3997 prediction128 = drflac__mm_hadd_epi32(prediction128);
3998 prediction128 = _mm_srai_epi32(prediction128, shift);
3999 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4000
4001 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4002 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4003 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4004 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4005 }
4006 }
4007
4008 /* We store samples in groups of 4. */
4009 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4010 pDecodedSamples += 4;
4011 }
4012
4013 /* Make sure we process the last few samples. */
4014 i = (count & ~3);
4015 while (i < (int)count) {
4016 /* Rice extraction. */
4017 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4018 return DRFLAC_FALSE;
4019 }
4020
4021 /* Rice reconstruction. */
4022 riceParamParts0 &= riceParamMask;
4023 riceParamParts0 |= (zeroCountParts0 << riceParam);
4024 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4025
4026 /* Sample reconstruction. */
4027 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4028
4029 i += 1;
4030 pDecodedSamples += 1;
4031 }
4032
4033 return DRFLAC_TRUE;
4034}
4035
4036static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4037{
4038 int i;
4039 drflac_uint32 riceParamMask;
4040 drflac_int32* pDecodedSamples = pSamplesOut;
4041 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4042 drflac_uint32 zeroCountParts0 = 0;
4043 drflac_uint32 zeroCountParts1 = 0;
4044 drflac_uint32 zeroCountParts2 = 0;
4045 drflac_uint32 zeroCountParts3 = 0;
4046 drflac_uint32 riceParamParts0 = 0;
4047 drflac_uint32 riceParamParts1 = 0;
4048 drflac_uint32 riceParamParts2 = 0;
4049 drflac_uint32 riceParamParts3 = 0;
4050 __m128i coefficients128_0;
4051 __m128i coefficients128_4;
4052 __m128i coefficients128_8;
4053 __m128i samples128_0;
4054 __m128i samples128_4;
4055 __m128i samples128_8;
4056 __m128i prediction128;
4057 __m128i riceParamMask128;
4058
4059 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4060
4061 DRFLAC_ASSERT(order <= 12);
4062
4063 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4064 riceParamMask128 = _mm_set1_epi32(riceParamMask);
4065
4066 prediction128 = _mm_setzero_si128();
4067
4068 /* Pre-load. */
4069 coefficients128_0 = _mm_setzero_si128();
4070 coefficients128_4 = _mm_setzero_si128();
4071 coefficients128_8 = _mm_setzero_si128();
4072
4073 samples128_0 = _mm_setzero_si128();
4074 samples128_4 = _mm_setzero_si128();
4075 samples128_8 = _mm_setzero_si128();
4076
4077#if 1
4078 {
4079 int runningOrder = order;
4080
4081 /* 0 - 3. */
4082 if (runningOrder >= 4) {
4083 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
4084 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
4085 runningOrder -= 4;
4086 } else {
4087 switch (runningOrder) {
4088 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
4089 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
4090 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
4091 }
4092 runningOrder = 0;
4093 }
4094
4095 /* 4 - 7 */
4096 if (runningOrder >= 4) {
4097 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
4098 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
4099 runningOrder -= 4;
4100 } else {
4101 switch (runningOrder) {
4102 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
4103 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
4104 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
4105 }
4106 runningOrder = 0;
4107 }
4108
4109 /* 8 - 11 */
4110 if (runningOrder == 4) {
4111 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
4112 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
4113 runningOrder -= 4;
4114 } else {
4115 switch (runningOrder) {
4116 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4117 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4118 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4119 }
4120 runningOrder = 0;
4121 }
4122
4123 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4124 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4125 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4126 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4127 }
4128#else
4129 switch (order)
4130 {
4131 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4132 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4133 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4134 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4135 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4136 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4137 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4138 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4139 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4140 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4141 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4142 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4143 }
4144#endif
4145
4146 /* Residuals are read four at a time, but each prediction depends on the previous output, so samples are reconstructed one at a time. */
4147 while (pDecodedSamples < pDecodedSamplesEnd) {
4148 __m128i zeroCountPart128;
4149 __m128i riceParamPart128;
4150
4151 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4152 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4153 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4154 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4155 return DRFLAC_FALSE;
4156 }
4157
4158 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4159 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4160
4161 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4162 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4163 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
4164
4165 for (i = 0; i < 4; i += 1) {
4166 prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
4167
4168 switch (order)
4169 {
4170 case 12:
4171 case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
4172 case 10:
4173 case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
4174 case 8:
4175 case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
4176 case 6:
4177 case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
4178 case 4:
4179 case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
4180 case 2:
4181 case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
4182 }
4183
4184 /* Horizontal add and shift. */
4185 prediction128 = drflac__mm_hadd_epi64(prediction128);
4186 prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4187 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4188
4189 /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
4190 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4191 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4192 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4193
4194 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4195 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4196 }
4197
4198 /* We store samples in groups of 4. */
4199 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4200 pDecodedSamples += 4;
4201 }
4202
4203 /* Make sure we process the last few samples. */
4204 i = (count & ~3);
4205 while (i < (int)count) {
4206 /* Rice extraction. */
4207 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4208 return DRFLAC_FALSE;
4209 }
4210
4211 /* Rice reconstruction. */
4212 riceParamParts0 &= riceParamMask;
4213 riceParamParts0 |= (zeroCountParts0 << riceParam);
4214 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4215
4216 /* Sample reconstruction. */
4217 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4218
4219 i += 1;
4220 pDecodedSamples += 1;
4221 }
4222
4223 return DRFLAC_TRUE;
4224}
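/*
Illustrative sketch, not part of dr_flac: the "Rice reconstruction" step used by the decode loops joins the unary zero
count with the low bits and then undoes the zigzag folding of the sign. The helper names here (rice_combine_model,
zigzag_decode_lut, zigzag_decode_neg) are hypothetical and exist only for this example. The second decoder mirrors the
branch-free mask that the vector code builds with NOT and ADD.

```c
#include <stdint.h>

// Join the two parts of a Rice code into the folded (zigzag) value.
uint32_t rice_combine_model(uint32_t zeroCount, uint32_t lowBits, uint8_t riceParam)
{
    uint32_t mask = (uint32_t)~((~0UL) << riceParam);
    return (lowBits & mask) | (zeroCount << riceParam);
}

// Table form, as in the scalar tail loops: pick an all-zeros or all-ones mask.
int32_t zigzag_decode_lut(uint32_t folded)
{
    static const uint32_t t[2] = {0x00000000, 0xFFFFFFFF};
    return (int32_t)((folded >> 1) ^ t[folded & 0x01]);
}

// Branch-free form, as in the vector loops: ~(x & 1) + 1 negates the low bit,
// which yields the same mask without a table lookup.
int32_t zigzag_decode_neg(uint32_t folded)
{
    uint32_t mask = ~(folded & 1u) + 1u;
    return (int32_t)((folded >> 1) ^ mask);
}
```

Folded values 0, 1, 2, 3, 4 decode to 0, -1, 1, -2, 2, so small magnitudes of either sign get short codes.
*/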
4225
4226static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4227{
4228 DRFLAC_ASSERT(bs != NULL);
4229 DRFLAC_ASSERT(pSamplesOut != NULL);
4230
4231 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
4232 if (lpcOrder > 0 && lpcOrder <= 12) {
4233 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4234 return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4235 } else {
4236 return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4237 }
4238 } else {
4239 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4240 }
4241}
4242#endif
4243
4244#if defined(DRFLAC_SUPPORT_NEON)
4245static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4246{
4247 vst1q_s32(p+0, x.val[0]);
4248 vst1q_s32(p+4, x.val[1]);
4249}
4250
4251static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4252{
4253 vst1q_u32(p+0, x.val[0]);
4254 vst1q_u32(p+4, x.val[1]);
4255}
4256
4257static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4258{
4259 vst1q_f32(p+0, x.val[0]);
4260 vst1q_f32(p+4, x.val[1]);
4261}
4262
4263static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4264{
4265 vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
4266}
4267
4268static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4269{
4270 vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
4271}
4272
4273static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4274{
4275 drflac_int32 x[4];
4276 x[3] = x3;
4277 x[2] = x2;
4278 x[1] = x1;
4279 x[0] = x0;
4280 return vld1q_s32(x);
4281}
4282
4283static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4284{
4285 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4286
4287 /* Reference */
4288 /*return drflac__vdupq_n_s32x4(
4289 vgetq_lane_s32(a, 0),
4290 vgetq_lane_s32(b, 3),
4291 vgetq_lane_s32(b, 2),
4292 vgetq_lane_s32(b, 1)
4293 );*/
4294
4295 return vextq_s32(b, a, 1);
4296}
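/*
Illustrative sketch, not part of dr_flac: a portable model of the one-lane "align right" used to slide the sample
history. The type vec4_model and both functions are hypothetical names for this example only; lane[0] is the lowest
lane. It is meant to show the lane movement of _mm_alignr_epi8(a, b, 4) and vextq_s32(b, a, 1), not to replace them.

```c
#include <stdint.h>

// Hypothetical 4-lane vector; lane[0] is the lowest lane.
typedef struct { int32_t lane[4]; } vec4_model;

// Drop the lowest lane of b and append the lowest lane of a at the top.
vec4_model alignr1_model(vec4_model a, vec4_model b)
{
    vec4_model r;
    r.lane[0] = b.lane[1];
    r.lane[1] = b.lane[2];
    r.lane[2] = b.lane[3];
    r.lane[3] = a.lane[0];
    return r;
}

// One step of the 12-sample history window: s0 holds the newest four samples,
// s8 the oldest four. The oldest sample falls off and newSample enters at the top of s0.
void slide_history_model(vec4_model* s0, vec4_model* s4, vec4_model* s8, int32_t newSample)
{
    vec4_model in = {{newSample, 0, 0, 0}};
    *s8 = alignr1_model(*s4, *s8);
    *s4 = alignr1_model(*s0, *s4);
    *s0 = alignr1_model(in, *s0);
}
```

After four such steps s0 contains exactly the four newly reconstructed samples in output order, which is why the
decode loops can store it directly.
*/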
4297
4298static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4299{
4300 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4301
4302 /* Reference */
4303 /*return drflac__vdupq_n_s32x4(
4304 vgetq_lane_s32(a, 0),
4305 vgetq_lane_s32(b, 3),
4306 vgetq_lane_s32(b, 2),
4307 vgetq_lane_s32(b, 1)
4308 );*/
4309
4310 return vextq_u32(b, a, 1);
4311}
4312
4313static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4314{
4315 /* The sum must end up in position 0. */
4316
4317 /* Reference */
4318 /*return vdup_n_s32(
4319 vgetq_lane_s32(x, 3) +
4320 vgetq_lane_s32(x, 2) +
4321 vgetq_lane_s32(x, 1) +
4322 vgetq_lane_s32(x, 0)
4323 );*/
4324
4325 int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4326 return vpadd_s32(r, r);
4327}
4328
4329static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4330{
4331 return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4332}
4333
4334static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4335{
4336 /* Reference */
4337 /*return drflac__vdupq_n_s32x4(
4338 vgetq_lane_s32(x, 0),
4339 vgetq_lane_s32(x, 1),
4340 vgetq_lane_s32(x, 2),
4341 vgetq_lane_s32(x, 3)
4342 );*/
4343
4344 return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4345}
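/*
Illustrative sketch, not part of dr_flac: why the coefficients are reversed before the decode loop. The history
vector is loaded oldest-first (lane 3 is the most recent sample), while coefficients[0] applies to the most recent
sample. Reversing the coefficients once lets a plain lane-wise multiply and horizontal add produce the prediction.
reverse4_model and dot4_model are hypothetical names used only here.

```c
#include <stdint.h>

// Reverse four lanes, as drflac__vrevq_s32 and
// _mm_shuffle_epi32(x, _MM_SHUFFLE(0, 1, 2, 3)) do.
void reverse4_model(const int32_t in[4], int32_t out[4])
{
    out[0] = in[3];
    out[1] = in[2];
    out[2] = in[1];
    out[3] = in[0];
}

// Lane-wise multiply followed by a horizontal add.
int32_t dot4_model(const int32_t a[4], const int32_t b[4])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}
```

With coefficients {c0, c1, c2, c3} and a window {s[-4], s[-3], s[-2], s[-1]}, the reversed dot product equals
c0*s[-1] + c1*s[-2] + c2*s[-3] + c3*s[-4].
*/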
4346
4347static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4348{
4349 return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
4350}
4351
4352static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4353{
4354 return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
4355}
4356
4357static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4358{
4359 int i;
4360 drflac_uint32 riceParamMask;
4361 drflac_int32* pDecodedSamples = pSamplesOut;
4362 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4363 drflac_uint32 zeroCountParts[4];
4364 drflac_uint32 riceParamParts[4];
4365 int32x4_t coefficients128_0;
4366 int32x4_t coefficients128_4;
4367 int32x4_t coefficients128_8;
4368 int32x4_t samples128_0;
4369 int32x4_t samples128_4;
4370 int32x4_t samples128_8;
4371 uint32x4_t riceParamMask128;
4372 int32x4_t riceParam128;
4373 int32x2_t shift64;
4374 uint32x4_t one128;
4375
4376 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4377
4378 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4379 riceParamMask128 = vdupq_n_u32(riceParamMask);
4380
4381 riceParam128 = vdupq_n_s32(riceParam);
4382 shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s32(). */
4383 one128 = vdupq_n_u32(1);
4384
4385 /*
4386 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4387 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
4388 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4389 so I think there's opportunity for this to be simplified.
4390 */
4391 {
4392 int runningOrder = order;
4393 drflac_int32 tempC[4] = {0, 0, 0, 0};
4394 drflac_int32 tempS[4] = {0, 0, 0, 0};
4395
4396 /* 0 - 3. */
4397 if (runningOrder >= 4) {
4398 coefficients128_0 = vld1q_s32(coefficients + 0);
4399 samples128_0 = vld1q_s32(pSamplesOut - 4);
4400 runningOrder -= 4;
4401 } else {
4402 switch (runningOrder) {
4403 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4404 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4405 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4406 }
4407
4408 coefficients128_0 = vld1q_s32(tempC);
4409 samples128_0 = vld1q_s32(tempS);
4410 runningOrder = 0;
4411 }
4412
4413 /* 4 - 7 */
4414 if (runningOrder >= 4) {
4415 coefficients128_4 = vld1q_s32(coefficients + 4);
4416 samples128_4 = vld1q_s32(pSamplesOut - 8);
4417 runningOrder -= 4;
4418 } else {
4419 switch (runningOrder) {
4420 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4421 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4422 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4423 }
4424
4425 coefficients128_4 = vld1q_s32(tempC);
4426 samples128_4 = vld1q_s32(tempS);
4427 runningOrder = 0;
4428 }
4429
4430 /* 8 - 11 */
4431 if (runningOrder == 4) {
4432 coefficients128_8 = vld1q_s32(coefficients + 8);
4433 samples128_8 = vld1q_s32(pSamplesOut - 12);
4434 runningOrder -= 4;
4435 } else {
4436 switch (runningOrder) {
4437 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4438 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4439 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4440 }
4441
4442 coefficients128_8 = vld1q_s32(tempC);
4443 samples128_8 = vld1q_s32(tempS);
4444 runningOrder = 0;
4445 }
4446
4447 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4448 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4449 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4450 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4451 }
4452
4453 /* Residuals are read four at a time, but each prediction depends on the previous output, so samples are reconstructed one at a time. */
4454 while (pDecodedSamples < pDecodedSamplesEnd) {
4455 int32x4_t prediction128;
4456 int32x2_t prediction64;
4457 uint32x4_t zeroCountPart128;
4458 uint32x4_t riceParamPart128;
4459
4460 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4461 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4462 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4463 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4464 return DRFLAC_FALSE;
4465 }
4466
4467 zeroCountPart128 = vld1q_u32(zeroCountParts);
4468 riceParamPart128 = vld1q_u32(riceParamParts);
4469
4470 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4471 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4472 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4473
4474 if (order <= 4) {
4475 for (i = 0; i < 4; i += 1) {
4476 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4477
4478 /* Horizontal add and shift. */
4479 prediction64 = drflac__vhaddq_s32(prediction128);
4480 prediction64 = vshl_s32(prediction64, shift64);
4481 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4482
4483 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4484 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4485 }
4486 } else if (order <= 8) {
4487 for (i = 0; i < 4; i += 1) {
4488 prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4489 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4490
4491 /* Horizontal add and shift. */
4492 prediction64 = drflac__vhaddq_s32(prediction128);
4493 prediction64 = vshl_s32(prediction64, shift64);
4494 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4495
4496 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4497 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4498 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4499 }
4500 } else {
4501 for (i = 0; i < 4; i += 1) {
4502 prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4503 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4504 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4505
4506 /* Horizontal add and shift. */
4507 prediction64 = drflac__vhaddq_s32(prediction128);
4508 prediction64 = vshl_s32(prediction64, shift64);
4509 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4510
4511 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4512 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4513 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4514 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4515 }
4516 }
4517
4518 /* We store samples in groups of 4. */
4519 vst1q_s32(pDecodedSamples, samples128_0);
4520 pDecodedSamples += 4;
4521 }
4522
4523 /* Make sure we process the last few samples. */
4524 i = (count & ~3);
4525 while (i < (int)count) {
4526 /* Rice extraction. */
4527 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4528 return DRFLAC_FALSE;
4529 }
4530
4531 /* Rice reconstruction. */
4532 riceParamParts[0] &= riceParamMask;
4533 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4534 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4535
4536 /* Sample reconstruction. */
4537 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4538
4539 i += 1;
4540 pDecodedSamples += 1;
4541 }
4542
4543 return DRFLAC_TRUE;
4544}
4545
4546static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4547{
4548 int i;
4549 drflac_uint32 riceParamMask;
4550 drflac_int32* pDecodedSamples = pSamplesOut;
4551 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4552 drflac_uint32 zeroCountParts[4];
4553 drflac_uint32 riceParamParts[4];
4554 int32x4_t coefficients128_0;
4555 int32x4_t coefficients128_4;
4556 int32x4_t coefficients128_8;
4557 int32x4_t samples128_0;
4558 int32x4_t samples128_4;
4559 int32x4_t samples128_8;
4560 uint32x4_t riceParamMask128;
4561 int32x4_t riceParam128;
4562 int64x1_t shift64;
4563 uint32x4_t one128;
4564 int64x2_t prediction128 = { 0 };
4565 uint32x4_t zeroCountPart128;
4566 uint32x4_t riceParamPart128;
4567
4568 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4569
4570 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4571 riceParamMask128 = vdupq_n_u32(riceParamMask);
4572
4573 riceParam128 = vdupq_n_s32(riceParam);
4574 shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(). */
4575 one128 = vdupq_n_u32(1);
4576
4577 /*
4578 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4579 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
4580 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4581 so I think there's opportunity for this to be simplified.
4582 */
4583 {
4584 int runningOrder = order;
4585 drflac_int32 tempC[4] = {0, 0, 0, 0};
4586 drflac_int32 tempS[4] = {0, 0, 0, 0};
4587
4588 /* 0 - 3. */
4589 if (runningOrder >= 4) {
4590 coefficients128_0 = vld1q_s32(coefficients + 0);
4591 samples128_0 = vld1q_s32(pSamplesOut - 4);
4592 runningOrder -= 4;
4593 } else {
4594 switch (runningOrder) {
4595 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4596 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4597 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4598 }
4599
4600 coefficients128_0 = vld1q_s32(tempC);
4601 samples128_0 = vld1q_s32(tempS);
4602 runningOrder = 0;
4603 }
4604
4605 /* 4 - 7 */
4606 if (runningOrder >= 4) {
4607 coefficients128_4 = vld1q_s32(coefficients + 4);
4608 samples128_4 = vld1q_s32(pSamplesOut - 8);
4609 runningOrder -= 4;
4610 } else {
4611 switch (runningOrder) {
4612 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4613 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4614 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4615 }
4616
4617 coefficients128_4 = vld1q_s32(tempC);
4618 samples128_4 = vld1q_s32(tempS);
4619 runningOrder = 0;
4620 }
4621
4622 /* 8 - 11 */
4623 if (runningOrder == 4) {
4624 coefficients128_8 = vld1q_s32(coefficients + 8);
4625 samples128_8 = vld1q_s32(pSamplesOut - 12);
4626 runningOrder -= 4;
4627 } else {
4628 switch (runningOrder) {
4629 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4630 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4631 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4632 }
4633
4634 coefficients128_8 = vld1q_s32(tempC);
4635 samples128_8 = vld1q_s32(tempS);
4636 runningOrder = 0;
4637 }
4638
4639 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4640 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4641 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4642 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4643 }
4644
4645 /* Residuals are read four at a time, but each prediction depends on the previous output, so samples are reconstructed one at a time. */
4646 while (pDecodedSamples < pDecodedSamplesEnd) {
4647 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4648 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4649 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4650 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4651 return DRFLAC_FALSE;
4652 }
4653
4654 zeroCountPart128 = vld1q_u32(zeroCountParts);
4655 riceParamPart128 = vld1q_u32(riceParamParts);
4656
4657 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4658 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4659 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4660
4661 for (i = 0; i < 4; i += 1) {
4662 int64x1_t prediction64;
4663
4664 prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
4665 switch (order)
4666 {
4667 case 12:
4668 case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4669 case 10:
4670 case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4671 case 8:
4672 case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4673 case 6:
4674 case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4675 case 4:
4676 case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4677 case 2:
4678 case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4679 }
4680
4681 /* Horizontal add and shift. */
4682 prediction64 = drflac__vhaddq_s64(prediction128);
4683 prediction64 = vshl_s64(prediction64, shift64);
4684 prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
4685
4686 /* Our value should be sitting in prediction64[0]. We need to combine this with our NEON samples. */
4687 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4688 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4689 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
4690
4691 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4692 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4693 }
4694
4695 /* We store samples in groups of 4. */
4696 vst1q_s32(pDecodedSamples, samples128_0);
4697 pDecodedSamples += 4;
4698 }
4699
4700 /* Make sure we process the last few samples. */
4701 i = (count & ~3);
4702 while (i < (int)count) {
4703 /* Rice extraction. */
4704 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4705 return DRFLAC_FALSE;
4706 }
4707
4708 /* Rice reconstruction. */
4709 riceParamParts[0] &= riceParamMask;
4710 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4711 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4712
4713 /* Sample reconstruction. */
4714 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4715
4716 i += 1;
4717 pDecodedSamples += 1;
4718 }
4719
4720 return DRFLAC_TRUE;
4721}
4722
4723static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4724{
4725 DRFLAC_ASSERT(bs != NULL);
4726 DRFLAC_ASSERT(pSamplesOut != NULL);
4727
4728 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
4729 if (lpcOrder > 0 && lpcOrder <= 12) {
4730 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4731 return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4732 } else {
4733 return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4734 }
4735 } else {
4736 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4737 }
4738}
4739#endif
4740
4741static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4742{
4743#if defined(DRFLAC_SUPPORT_SSE41)
4744 if (drflac__gIsSSE41Supported) {
4745 return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4746 } else
4747#elif defined(DRFLAC_SUPPORT_NEON)
4748 if (drflac__gIsNEONSupported) {
4749 return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4750 } else
4751#endif
4752 {
4753 /* Scalar fallback. */
4754 #if 0
4755 return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4756 #else
4757 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4758 #endif
4759 }
4760}
4761
4762/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
4763static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4764{
4765 drflac_uint32 i;
4766
4767 DRFLAC_ASSERT(bs != NULL);
4768
4769 for (i = 0; i < count; ++i) {
4770 if (!drflac__seek_rice_parts(bs, riceParam)) {
4771 return DRFLAC_FALSE;
4772 }
4773 }
4774
4775 return DRFLAC_TRUE;
4776}
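/*
Illustrative sketch, not part of dr_flac: the layout of a single Rice code as consumed by the residual readers. Each
code is a run of 0 bits ended by a 1 bit (the zero count), followed by riceParam raw low bits. The bit_reader type and
both functions are hypothetical stand-ins for this example and are far simpler than drflac_bs, which works on cached
words rather than single bits.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical MSB-first bit reader over a byte buffer.
typedef struct {
    const uint8_t* data;
    size_t sizeInBits;
    size_t pos;
} bit_reader;

static int read_bit(bit_reader* br, uint32_t* bitOut)
{
    if (br->pos >= br->sizeInBits) {
        return 0;
    }
    *bitOut = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos += 1;
    return 1;
}

// Reads one Rice code. Returns 1 on success and 0 if the buffer runs out.
int read_rice_parts_model(bit_reader* br, uint8_t riceParam, uint32_t* zeroCount, uint32_t* lowBits)
{
    uint32_t bit;
    uint32_t i;

    // Unary part: count 0 bits up to the terminating 1 bit.
    *zeroCount = 0;
    for (;;) {
        if (!read_bit(br, &bit)) return 0;
        if (bit) break;
        *zeroCount += 1;
    }

    // Binary part: riceParam raw bits, most significant first.
    *lowBits = 0;
    for (i = 0; i < riceParam; ++i) {
        if (!read_bit(br, &bit)) return 0;
        *lowBits = (*lowBits << 1) | bit;
    }
    return 1;
}
```

Seeking past a code needs the same walk over the bits with the results discarded, since the length of each code is
only known once its unary part has been read.
*/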
4777
4778#if defined(__clang__)
4779__attribute__((no_sanitize("signed-integer-overflow")))
4780#endif
4781static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4782{
4783 drflac_uint32 i;
4784
4785 DRFLAC_ASSERT(bs != NULL);
4786 DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
4787 DRFLAC_ASSERT(pSamplesOut != NULL);
4788
4789 for (i = 0; i < count; ++i) {
4790 if (unencodedBitsPerSample > 0) {
4791 if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4792 return DRFLAC_FALSE;
4793 }
4794 } else {
4795 pSamplesOut[i] = 0;
4796 }
4797
4798 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4799 pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
4800 } else {
4801 pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
4802 }
4803 }
4804
4805 return DRFLAC_TRUE;
4806}
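/*
Illustrative sketch, not part of dr_flac: how a residual and a linear prediction combine into an output sample. The
names predict_model_64 and reconstruct_model are hypothetical, and the sketch assumes shift >= 0 and an arithmetic
right shift for negative sums, which common compilers provide. It is not the body of drflac__calculate_prediction_64.

```c
#include <stdint.h>

// pDecodedSamples points at the sample being reconstructed; the <order>
// samples before it must already be decoded. coefficients[0] applies to
// the most recent sample.
int32_t predict_model_64(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pDecodedSamples)
{
    int64_t sum = 0;
    uint32_t j;
    for (j = 0; j < order; ++j) {
        sum += (int64_t)coefficients[j] * (int64_t)pDecodedSamples[-1 - (int32_t)j];
    }
    return (int32_t)(sum >> shift);
}

// pSamples holds <order> warm-up samples followed by room for <count> outputs.
void reconstruct_model(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* residuals, uint32_t count, int32_t* pSamples)
{
    uint32_t i;
    for (i = 0; i < count; ++i) {
        pSamples[order + i] = residuals[i] + predict_model_64(order, shift, coefficients, pSamples + order + i);
    }
}
```

The 64-bit accumulator matters because a product of two 32-bit values can need more than 32 bits; the 32-bit path is
only safe when the bit depth, order and precision keep the sum small enough.
*/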
4807
4808
4809/*
4810Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4811when the decoder is sitting at the very start of the RESIDUAL block. The first <lpcOrder> samples (the warm-up samples) are skipped rather than decoded here. The
4812<blockSize> and <lpcOrder> parameters are used to determine how many residual values need to be decoded.
4813*/
4814static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4815{
4816 drflac_uint8 residualMethod;
4817 drflac_uint8 partitionOrder;
4818 drflac_uint32 samplesInPartition;
4819 drflac_uint32 partitionsRemaining;
4820
4821 DRFLAC_ASSERT(bs != NULL);
4822 DRFLAC_ASSERT(blockSize != 0);
4823 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4824
4825 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4826 return DRFLAC_FALSE;
4827 }
4828
4829 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4830 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4831 }
4832
4833 /* Ignore the first <order> values. */
4834 pDecodedSamples += lpcOrder;
4835
4836 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4837 return DRFLAC_FALSE;
4838 }
4839
4840 /*
4841 From the FLAC spec:
4842 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4843 */
4844 if (partitionOrder > 8) {
4845 return DRFLAC_FALSE;
4846 }
4847
4848 /* Validation check. */
4849 if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
4850 return DRFLAC_FALSE;
4851 }
4852
4853 samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
4854 partitionsRemaining = (1 << partitionOrder);
4855 for (;;) {
4856 drflac_uint8 riceParam = 0;
4857 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4858 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4859 return DRFLAC_FALSE;
4860 }
4861 if (riceParam == 15) {
4862 riceParam = 0xFF;
4863 }
4864 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4865 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4866 return DRFLAC_FALSE;
4867 }
4868 if (riceParam == 31) {
4869 riceParam = 0xFF;
4870 }
4871 }
4872
4873 if (riceParam != 0xFF) {
4874 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
4875 return DRFLAC_FALSE;
4876 }
4877 } else {
4878 drflac_uint8 unencodedBitsPerSample = 0;
4879 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4880 return DRFLAC_FALSE;
4881 }
4882
4883 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
4884 return DRFLAC_FALSE;
4885 }
4886 }
4887
4888 pDecodedSamples += samplesInPartition;
4889
4890 if (partitionsRemaining == 1) {
4891 break;
4892 }
4893
4894 partitionsRemaining -= 1;
4895
4896 if (partitionOrder != 0) {
4897 samplesInPartition = blockSize / (1 << partitionOrder);
4898 }
4899 }
4900
4901 return DRFLAC_TRUE;
4902}
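
/*
The partition sizing used above can be sketched in isolation. This is only an illustration: example_samples_in_partition()
is a hypothetical helper, not part of dr_flac. It shows that only the first partition is shortened by the predictor
order, while every later partition holds blockSize / 2^partitionOrder residuals.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: number of residual values stored in partition `partitionIndex`.
static uint32_t example_samples_in_partition(uint32_t blockSize, uint32_t partitionOrder, uint32_t predictorOrder, uint32_t partitionIndex)
{
    uint32_t perPartition;

    assert(partitionOrder <= 8);  // Same limit the decoder enforces.
    perPartition = blockSize >> partitionOrder;

    // The warm-up samples were already read verbatim, so the first partition is shorter.
    if (partitionIndex == 0) {
        assert(perPartition >= predictorOrder);
        return perPartition - predictorOrder;
    }

    return perPartition;
}
```

For a 4096-sample block with partition order 3 and predictor order 2, the first partition holds 510 residuals and the
remaining seven hold 512 each.
*/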
4903
/*
Reads and seeks past the residual for the subframe the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. No samples are written. The <blockSize> and
<order> parameters determine how many residual values need to be skipped.
*/
4909static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4910{
4911 drflac_uint8 residualMethod;
4912 drflac_uint8 partitionOrder;
4913 drflac_uint32 samplesInPartition;
4914 drflac_uint32 partitionsRemaining;
4915
4916 DRFLAC_ASSERT(bs != NULL);
4917 DRFLAC_ASSERT(blockSize != 0);
4918
4919 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4920 return DRFLAC_FALSE;
4921 }
4922
4923 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4924 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4925 }
4926
4927 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4928 return DRFLAC_FALSE;
4929 }
4930
4931 /*
4932 From the FLAC spec:
4933 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4934 */
4935 if (partitionOrder > 8) {
4936 return DRFLAC_FALSE;
4937 }
4938
4939 /* Validation check. */
4940 if ((blockSize / (1 << partitionOrder)) <= order) {
4941 return DRFLAC_FALSE;
4942 }
4943
4944 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
4945 partitionsRemaining = (1 << partitionOrder);
4946 for (;;)
4947 {
4948 drflac_uint8 riceParam = 0;
4949 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4950 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4951 return DRFLAC_FALSE;
4952 }
4953 if (riceParam == 15) {
4954 riceParam = 0xFF;
4955 }
4956 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4957 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4958 return DRFLAC_FALSE;
4959 }
4960 if (riceParam == 31) {
4961 riceParam = 0xFF;
4962 }
4963 }
4964
4965 if (riceParam != 0xFF) {
4966 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
4967 return DRFLAC_FALSE;
4968 }
4969 } else {
4970 drflac_uint8 unencodedBitsPerSample = 0;
4971 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4972 return DRFLAC_FALSE;
4973 }
4974
4975 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
4976 return DRFLAC_FALSE;
4977 }
4978 }
4979
4980
4981 if (partitionsRemaining == 1) {
4982 break;
4983 }
4984
4985 partitionsRemaining -= 1;
4986 samplesInPartition = blockSize / (1 << partitionOrder);
4987 }
4988
4989 return DRFLAC_TRUE;
4990}
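
/*
A small sketch of the escape-code handling used by both residual functions above. The EXAMPLE_ names and helpers are
hypothetical, and the method values 0 and 1 are assumed to correspond to the PARTITIONED_RICE and PARTITIONED_RICE2
constants. An all-ones Rice parameter means the partition is stored unencoded, in which case seeking past it is a
simple multiplication.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_METHOD_RICE   0  // 4-bit Rice parameter, escape code 15.
#define EXAMPLE_METHOD_RICE2  1  // 5-bit Rice parameter, escape code 31.

// Returns 1 when `rawParam` is the escape code for the given coding method, 0 otherwise.
static int example_is_escape_code(uint8_t method, uint8_t rawParam)
{
    assert(method == EXAMPLE_METHOD_RICE || method == EXAMPLE_METHOD_RICE2);
    return (method == EXAMPLE_METHOD_RICE) ? (rawParam == 15) : (rawParam == 31);
}

// Number of bits occupied by the samples of an escaped partition (headers excluded).
static uint64_t example_escaped_partition_bits(uint8_t unencodedBitsPerSample, uint32_t samplesInPartition)
{
    return (uint64_t)unencodedBitsPerSample * samplesInPartition;
}
```
*/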
4991
4992
4993static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
4994{
4995 drflac_uint32 i;
4996
4997 /* Only a single sample needs to be decoded here. */
4998 drflac_int32 sample;
4999 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5000 return DRFLAC_FALSE;
5001 }
5002
5003 /*
5004 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
5005 we'll want to look at a more efficient way.
5006 */
5007 for (i = 0; i < blockSize; ++i) {
5008 pDecodedSamples[i] = sample;
5009 }
5010
5011 return DRFLAC_TRUE;
5012}
5013
5014static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5015{
5016 drflac_uint32 i;
5017
5018 for (i = 0; i < blockSize; ++i) {
5019 drflac_int32 sample;
5020 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5021 return DRFLAC_FALSE;
5022 }
5023
5024 pDecodedSamples[i] = sample;
5025 }
5026
5027 return DRFLAC_TRUE;
5028}
5029
5030static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5031{
5032 drflac_uint32 i;
5033
5034 static drflac_int32 lpcCoefficientsTable[5][4] = {
5035 {0, 0, 0, 0},
5036 {1, 0, 0, 0},
5037 {2, -1, 0, 0},
5038 {3, -3, 1, 0},
5039 {4, -6, 4, -1}
5040 };
5041
    /* Warm-up samples. The coefficients are not stored in the stream for FIXED subframes; they come from lpcCoefficientsTable. */
5043 for (i = 0; i < lpcOrder; ++i) {
5044 drflac_int32 sample;
5045 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5046 return DRFLAC_FALSE;
5047 }
5048
5049 pDecodedSamples[i] = sample;
5050 }
5051
5052 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
5053 return DRFLAC_FALSE;
5054 }
5055
5056 return DRFLAC_TRUE;
5057}
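
/*
To make the role of the coefficient table concrete, here is a minimal sketch of fixed-predictor restoration. It is not
how dr_flac structures its decoding (the real work happens in the residual functions), and example_restore_fixed() is a
hypothetical name. It assumes the buffer already holds the warm-up samples followed by the decoded residuals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same rows as lpcCoefficientsTable. The row index is the predictor order.
static const int32_t example_fixed_coefficients[5][4] = {
    {0,  0, 0,  0},
    {1,  0, 0,  0},
    {2, -1, 0,  0},
    {3, -3, 1,  0},
    {4, -6, 4, -1}
};

// On entry samples[0..order-1] are warm-up samples and samples[order..count-1] are residuals.
// On return every element is a restored sample.
static void example_restore_fixed(int32_t* samples, uint32_t count, uint32_t order)
{
    uint32_t i;
    uint32_t j;

    assert(samples != NULL);
    assert(order <= 4 && order <= count);

    for (i = order; i < count; ++i) {
        int64_t prediction = 0;
        for (j = 0; j < order; ++j) {
            prediction += (int64_t)example_fixed_coefficients[order][j] * samples[i - 1 - j];
        }
        samples[i] = (int32_t)(samples[i] + prediction);
    }
}
```

With order 2, warm-up samples 1 and 3 and all-zero residuals, the predictor extrapolates the line: 1, 3, 5, 7, 9.
*/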
5058
5059static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5060{
5061 drflac_uint8 i;
5062 drflac_uint8 lpcPrecision;
5063 drflac_int8 lpcShift;
5064 drflac_int32 coefficients[32];
5065
5066 /* Warm up samples. */
5067 for (i = 0; i < lpcOrder; ++i) {
5068 drflac_int32 sample;
5069 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5070 return DRFLAC_FALSE;
5071 }
5072
5073 pDecodedSamples[i] = sample;
5074 }
5075
5076 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5077 return DRFLAC_FALSE;
5078 }
5079 if (lpcPrecision == 15) {
5080 return DRFLAC_FALSE; /* Invalid. */
5081 }
5082 lpcPrecision += 1;
5083
5084 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5085 return DRFLAC_FALSE;
5086 }
5087
    /*
    From the FLAC specification:

        Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)

    Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders that support negative shifts. For now
    dr_flac does not support negative shifts because I don't have any reference files. When a reference file comes through I will consider adding support.
    */
5096 if (lpcShift < 0) {
5097 return DRFLAC_FALSE;
5098 }
5099
5100 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5101 for (i = 0; i < lpcOrder; ++i) {
5102 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5103 return DRFLAC_FALSE;
5104 }
5105 }
5106
5107 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
5108 return DRFLAC_FALSE;
5109 }
5110
5111 return DRFLAC_TRUE;
5112}
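
/*
The following sketch shows how the values read above (coefficients and a non-negative shift) would typically be
combined into a prediction. example_lpc_predict() is a hypothetical helper and the actual prediction code lives in the
residual decoding functions, so treat this as an illustration of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

// Predicts decoded[index] from the `order` samples before it. coefficients[0] applies to the
// most recent sample. The sum is kept in 64 bits to avoid overflow before the shift.
static int32_t example_lpc_predict(const int32_t* coefficients, uint32_t order, uint32_t shift, const int32_t* decoded, uint32_t index)
{
    int64_t sum = 0;
    uint32_t j;

    assert(index >= order);  // All history samples must exist.
    assert(shift < 32);      // The shift is a 5-bit field and negative values are rejected.

    for (j = 0; j < order; ++j) {
        sum += (int64_t)coefficients[j] * decoded[index - 1 - j];
    }

    // Right-shifting a negative value is implementation-defined in C. This sketch assumes an
    // arithmetic shift, which is what common compilers provide.
    return (int32_t)(sum >> shift);
}
```

The restored sample is then the prediction plus the residual for that position.
*/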
5113
5114
5115static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5116{
5117 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5118 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5119
5120 DRFLAC_ASSERT(bs != NULL);
5121 DRFLAC_ASSERT(header != NULL);
5122
5123 /* Keep looping until we find a valid sync code. */
5124 for (;;) {
5125 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5126 drflac_uint8 reserved = 0;
5127 drflac_uint8 blockingStrategy = 0;
5128 drflac_uint8 blockSize = 0;
5129 drflac_uint8 sampleRate = 0;
5130 drflac_uint8 channelAssignment = 0;
5131 drflac_uint8 bitsPerSample = 0;
5132 drflac_bool32 isVariableBlockSize;
5133
5134 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5135 return DRFLAC_FALSE;
5136 }
5137
5138 if (!drflac__read_uint8(bs, 1, &reserved)) {
5139 return DRFLAC_FALSE;
5140 }
5141 if (reserved == 1) {
5142 continue;
5143 }
5144 crc8 = drflac_crc8(crc8, reserved, 1);
5145
5146 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5147 return DRFLAC_FALSE;
5148 }
5149 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5150
5151 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5152 return DRFLAC_FALSE;
5153 }
5154 if (blockSize == 0) {
5155 continue;
5156 }
5157 crc8 = drflac_crc8(crc8, blockSize, 4);
5158
5159 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5160 return DRFLAC_FALSE;
5161 }
5162 crc8 = drflac_crc8(crc8, sampleRate, 4);
5163
5164 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5165 return DRFLAC_FALSE;
5166 }
5167 if (channelAssignment > 10) {
5168 continue;
5169 }
5170 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5171
5172 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5173 return DRFLAC_FALSE;
5174 }
5175 if (bitsPerSample == 3 || bitsPerSample == 7) {
5176 continue;
5177 }
5178 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5179
5180
5181 if (!drflac__read_uint8(bs, 1, &reserved)) {
5182 return DRFLAC_FALSE;
5183 }
5184 if (reserved == 1) {
5185 continue;
5186 }
5187 crc8 = drflac_crc8(crc8, reserved, 1);
5188
5189
5190 isVariableBlockSize = blockingStrategy == 1;
5191 if (isVariableBlockSize) {
5192 drflac_uint64 pcmFrameNumber;
5193 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5194 if (result != DRFLAC_SUCCESS) {
5195 if (result == DRFLAC_AT_END) {
5196 return DRFLAC_FALSE;
5197 } else {
5198 continue;
5199 }
5200 }
5201 header->flacFrameNumber = 0;
5202 header->pcmFrameNumber = pcmFrameNumber;
5203 } else {
5204 drflac_uint64 flacFrameNumber = 0;
5205 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5206 if (result != DRFLAC_SUCCESS) {
5207 if (result == DRFLAC_AT_END) {
5208 return DRFLAC_FALSE;
5209 } else {
5210 continue;
5211 }
5212 }
5213 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5214 header->pcmFrameNumber = 0;
5215 }
5216
5217
5218 DRFLAC_ASSERT(blockSize > 0);
5219 if (blockSize == 1) {
5220 header->blockSizeInPCMFrames = 192;
5221 } else if (blockSize <= 5) {
5222 DRFLAC_ASSERT(blockSize >= 2);
5223 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5224 } else if (blockSize == 6) {
5225 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5226 return DRFLAC_FALSE;
5227 }
5228 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5229 header->blockSizeInPCMFrames += 1;
5230 } else if (blockSize == 7) {
5231 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5232 return DRFLAC_FALSE;
5233 }
5234 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
5235 if (header->blockSizeInPCMFrames == 0xFFFF) {
5236 return DRFLAC_FALSE; /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
5237 }
5238 header->blockSizeInPCMFrames += 1;
5239 } else {
5240 DRFLAC_ASSERT(blockSize >= 8);
5241 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5242 }
5243
5244
5245 if (sampleRate <= 11) {
5246 header->sampleRate = sampleRateTable[sampleRate];
5247 } else if (sampleRate == 12) {
5248 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5249 return DRFLAC_FALSE;
5250 }
5251 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5252 header->sampleRate *= 1000;
5253 } else if (sampleRate == 13) {
5254 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5255 return DRFLAC_FALSE;
5256 }
5257 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5258 } else if (sampleRate == 14) {
5259 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5260 return DRFLAC_FALSE;
5261 }
5262 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5263 header->sampleRate *= 10;
5264 } else {
5265 continue; /* Invalid. Assume an invalid block. */
5266 }
5267
5268
5269 header->channelAssignment = channelAssignment;
5270
5271 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5272 if (header->bitsPerSample == 0) {
5273 header->bitsPerSample = streaminfoBitsPerSample;
5274 }
5275
5276 if (header->bitsPerSample != streaminfoBitsPerSample) {
            /* If this frame's bitsPerSample differs from that of STREAMINFO or the first frame, reject it. */
5278 return DRFLAC_FALSE;
5279 }
5280
5281 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5282 return DRFLAC_FALSE;
5283 }
5284
5285#ifndef DR_FLAC_NO_CRC
5286 if (header->crc8 != crc8) {
5287 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
5288 }
5289#endif
5290 return DRFLAC_TRUE;
5291 }
5292}
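
/*
Two details of the header parsing above are easy to check in isolation: the mapping from the 4-bit block size code to
a block size, and the 0xCE starting value of the CRC-8. The helpers below are hypothetical sketches, not dr_flac
functions. The CRC routine is a plain bitwise implementation and assumes the FLAC CRC-8 polynomial 0x07.

```c
#include <assert.h>
#include <stdint.h>

// Returns the block size implied by the code itself, or 0 when the size is stored after the
// frame number (codes 6 and 7) or the code is reserved (code 0).
static uint32_t example_block_size_from_code(uint8_t code)
{
    assert(code <= 15);

    if (code == 1) {
        return 192;
    }
    if (code >= 2 && code <= 5) {
        return 576u << (code - 2);
    }
    if (code >= 8) {
        return 256u << (code - 8);
    }

    return 0;
}

// Bitwise CRC-8 over the low `bitCount` bits of `data`, most significant bit first.
static uint8_t example_crc8(uint8_t crc, uint32_t data, uint32_t bitCount)
{
    assert(bitCount <= 32);

    while (bitCount > 0) {
        uint8_t bit = (uint8_t)((data >> (bitCount - 1)) & 1);
        uint8_t top = (uint8_t)((crc >> 7) & 1);

        crc = (uint8_t)(crc << 1);
        if (top != bit) {
            crc ^= 0x07;
        }

        bitCount -= 1;
    }

    return crc;
}
```

Feeding the 14-bit sync code 0x3FFE through example_crc8() with an initial value of 0 gives 0xCE, which is why the loop
above can start from that constant instead of re-hashing the sync code for every candidate frame.
*/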
5293
5294static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5295{
5296 drflac_uint8 header;
5297 int type;
5298
5299 if (!drflac__read_uint8(bs, 8, &header)) {
5300 return DRFLAC_FALSE;
5301 }
5302
5303 /* First bit should always be 0. */
5304 if ((header & 0x80) != 0) {
5305 return DRFLAC_FALSE;
5306 }
5307
    /*
    Default to 0 for the LPC order. It's important that this is always 0 for subframes that are neither
    LPC nor FIXED because it is used in a generic validation check later.
    */
5312 pSubframe->lpcOrder = 0;
5313
5314 type = (header & 0x7E) >> 1;
5315 if (type == 0) {
5316 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5317 } else if (type == 1) {
5318 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5319 } else {
5320 if ((type & 0x20) != 0) {
5321 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5322 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
5323 } else if ((type & 0x08) != 0) {
5324 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5325 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5326 if (pSubframe->lpcOrder > 4) {
5327 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5328 pSubframe->lpcOrder = 0;
5329 }
5330 } else {
5331 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5332 }
5333 }
5334
5335 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5336 return DRFLAC_FALSE;
5337 }
5338
5339 /* Wasted bits per sample. */
5340 pSubframe->wastedBitsPerSample = 0;
5341 if ((header & 0x01) == 1) {
5342 unsigned int wastedBitsPerSample;
5343 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5344 return DRFLAC_FALSE;
5345 }
5346 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
5347 }
5348
5349 return DRFLAC_TRUE;
5350}
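
/*
The bit tests in drflac__read_subframe_header() can be exercised on plain bytes. The sketch below mirrors those tests
with a hypothetical enum and function. It does not read the wasted-bits count itself, only the flag that says one
follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    EX_CONSTANT,
    EX_VERBATIM,
    EX_FIXED,
    EX_LPC,
    EX_RESERVED
} example_subframe_type;

// Classifies a subframe header byte. Layout: 1 zero bit, 6 type bits, 1 wasted-bits flag.
static example_subframe_type example_parse_subframe_header(uint8_t header, uint8_t* pOrder, int* pHasWastedBits)
{
    uint8_t type;

    assert(pOrder != NULL && pHasWastedBits != NULL);

    *pOrder = 0;
    *pHasWastedBits = (header & 0x01) != 0;

    if ((header & 0x80) != 0) {
        return EX_RESERVED;  // The first bit must be 0.
    }

    type = (uint8_t)((header & 0x7E) >> 1);
    if (type == 0) {
        return EX_CONSTANT;
    }
    if (type == 1) {
        return EX_VERBATIM;
    }
    if ((type & 0x20) != 0) {
        *pOrder = (uint8_t)((type & 0x1F) + 1);
        return EX_LPC;
    }
    if ((type & 0x08) != 0 && (type & 0x07) <= 4) {
        *pOrder = (uint8_t)(type & 0x07);
        return EX_FIXED;
    }

    return EX_RESERVED;
}
```

For example, the byte 0x14 is a FIXED subframe of order 2 and 0x4E is an LPC subframe of order 8.
*/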
5351
5352static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5353{
5354 drflac_subframe* pSubframe;
5355 drflac_uint32 subframeBitsPerSample;
5356
5357 DRFLAC_ASSERT(bs != NULL);
5358 DRFLAC_ASSERT(frame != NULL);
5359
5360 pSubframe = frame->subframes + subframeIndex;
5361 if (!drflac__read_subframe_header(bs, pSubframe)) {
5362 return DRFLAC_FALSE;
5363 }
5364
5365 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5366 subframeBitsPerSample = frame->header.bitsPerSample;
5367 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5368 subframeBitsPerSample += 1;
5369 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5370 subframeBitsPerSample += 1;
5371 }
5372
5373 if (subframeBitsPerSample > 32) {
5374 /* libFLAC and ffmpeg reject 33-bit subframes as well */
5375 return DRFLAC_FALSE;
5376 }
5377
5378 /* Need to handle wasted bits per sample. */
5379 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5380 return DRFLAC_FALSE;
5381 }
5382 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5383
5384 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5385
    /*
    pDecodedSamplesOut will be pointing to a buffer that was allocated with enough memory to store
    maxBlockSizeInPCMFrames samples (as specified in the FLAC header). We need to guard against an
    overflow here. At a higher level we are checking maxBlockSizeInPCMFrames from the header, but
    here we need to do an additional check to ensure this frame's block size fully encompasses any
    warm-up samples, the number of which is determined by the LPC order. For subframes that are
    neither LPC nor FIXED, the LPC order will have been set to 0 in drflac__read_subframe_header().
    */
5394 if (frame->header.blockSizeInPCMFrames < pSubframe->lpcOrder) {
5395 return DRFLAC_FALSE;
5396 }
5397
5398 switch (pSubframe->subframeType)
5399 {
5400 case DRFLAC_SUBFRAME_CONSTANT:
5401 {
5402 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5403 } break;
5404
5405 case DRFLAC_SUBFRAME_VERBATIM:
5406 {
5407 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5408 } break;
5409
5410 case DRFLAC_SUBFRAME_FIXED:
5411 {
5412 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5413 } break;
5414
5415 case DRFLAC_SUBFRAME_LPC:
5416 {
5417 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5418 } break;
5419
5420 default: return DRFLAC_FALSE;
5421 }
5422
5423 return DRFLAC_TRUE;
5424}
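
/*
The bits-per-sample adjustment at the top of drflac__decode_subframe() can be summarised as a pure function. This is a
sketch with hypothetical names. The channel assignment values 8, 9 and 10 are assumed to match the LEFT_SIDE,
RIGHT_SIDE and MID_SIDE constants, which is consistent with the channel count lookup table in this file.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_ASSIGNMENT_LEFT_SIDE   8
#define EXAMPLE_ASSIGNMENT_RIGHT_SIDE  9
#define EXAMPLE_ASSIGNMENT_MID_SIDE    10

// Returns the number of bits each stored sample occupies in the subframe, or 0 when the
// combination would be rejected by the decoder.
static uint32_t example_subframe_bits_per_sample(uint32_t frameBitsPerSample, uint8_t channelAssignment, int subframeIndex, uint8_t wastedBits)
{
    uint32_t bits = frameBitsPerSample;

    assert(channelAssignment <= 10);
    assert(subframeIndex >= 0);

    // The side channel is a difference of two channels, so it needs one extra bit.
    if ((channelAssignment == EXAMPLE_ASSIGNMENT_LEFT_SIDE || channelAssignment == EXAMPLE_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
        bits += 1;
    } else if (channelAssignment == EXAMPLE_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
        bits += 1;
    }

    if (bits > 32) {
        return 0;  // 33-bit subframes are rejected.
    }
    if (wastedBits >= bits) {
        return 0;  // Nothing would be left to read.
    }

    return bits - wastedBits;
}
```
*/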
5425
5426static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5427{
5428 drflac_subframe* pSubframe;
5429 drflac_uint32 subframeBitsPerSample;
5430
5431 DRFLAC_ASSERT(bs != NULL);
5432 DRFLAC_ASSERT(frame != NULL);
5433
5434 pSubframe = frame->subframes + subframeIndex;
5435 if (!drflac__read_subframe_header(bs, pSubframe)) {
5436 return DRFLAC_FALSE;
5437 }
5438
5439 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5440 subframeBitsPerSample = frame->header.bitsPerSample;
5441 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5442 subframeBitsPerSample += 1;
5443 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5444 subframeBitsPerSample += 1;
5445 }
5446
5447 /* Need to handle wasted bits per sample. */
5448 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5449 return DRFLAC_FALSE;
5450 }
5451 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5452
5453 pSubframe->pSamplesS32 = NULL;
5454
5455 switch (pSubframe->subframeType)
5456 {
5457 case DRFLAC_SUBFRAME_CONSTANT:
5458 {
5459 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5460 return DRFLAC_FALSE;
5461 }
5462 } break;
5463
5464 case DRFLAC_SUBFRAME_VERBATIM:
5465 {
5466 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5467 if (!drflac__seek_bits(bs, bitsToSeek)) {
5468 return DRFLAC_FALSE;
5469 }
5470 } break;
5471
5472 case DRFLAC_SUBFRAME_FIXED:
5473 {
5474 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5475 if (!drflac__seek_bits(bs, bitsToSeek)) {
5476 return DRFLAC_FALSE;
5477 }
5478
5479 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5480 return DRFLAC_FALSE;
5481 }
5482 } break;
5483
5484 case DRFLAC_SUBFRAME_LPC:
5485 {
5486 drflac_uint8 lpcPrecision;
5487
5488 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5489 if (!drflac__seek_bits(bs, bitsToSeek)) {
5490 return DRFLAC_FALSE;
5491 }
5492
5493 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5494 return DRFLAC_FALSE;
5495 }
5496 if (lpcPrecision == 15) {
5497 return DRFLAC_FALSE; /* Invalid. */
5498 }
5499 lpcPrecision += 1;
5500
5501
5502 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5503 if (!drflac__seek_bits(bs, bitsToSeek)) {
5504 return DRFLAC_FALSE;
5505 }
5506
5507 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5508 return DRFLAC_FALSE;
5509 }
5510 } break;
5511
5512 default: return DRFLAC_FALSE;
5513 }
5514
5515 return DRFLAC_TRUE;
5516}
5517
5518
5519static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5520{
5521 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5522
5523 DRFLAC_ASSERT(channelAssignment <= 10);
5524 return lookup[channelAssignment];
5525}
5526
5527static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5528{
5529 int channelCount;
5530 int i;
5531 drflac_uint8 paddingSizeInBits;
5532 drflac_uint16 desiredCRC16;
5533#ifndef DR_FLAC_NO_CRC
5534 drflac_uint16 actualCRC16;
5535#endif
5536
5537 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5538 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5539
5540 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5541 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5542 return DRFLAC_ERROR;
5543 }
5544
5545 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5546 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5547 if (channelCount != (int)pFlac->channels) {
5548 return DRFLAC_ERROR;
5549 }
5550
5551 for (i = 0; i < channelCount; ++i) {
5552 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5553 return DRFLAC_ERROR;
5554 }
5555 }
5556
5557 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5558 if (paddingSizeInBits > 0) {
5559 drflac_uint8 padding = 0;
5560 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5561 return DRFLAC_AT_END;
5562 }
5563 }
5564
5565#ifndef DR_FLAC_NO_CRC
5566 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5567#endif
5568 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5569 return DRFLAC_AT_END;
5570 }
5571
5572#ifndef DR_FLAC_NO_CRC
5573 if (actualCRC16 != desiredCRC16) {
5574 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5575 }
5576#endif
5577
5578 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5579
5580 return DRFLAC_SUCCESS;
5581}
5582
5583static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5584{
5585 int channelCount;
5586 int i;
5587 drflac_uint16 desiredCRC16;
5588#ifndef DR_FLAC_NO_CRC
5589 drflac_uint16 actualCRC16;
5590#endif
5591
5592 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5593 for (i = 0; i < channelCount; ++i) {
5594 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5595 return DRFLAC_ERROR;
5596 }
5597 }
5598
5599 /* Padding. */
5600 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5601 return DRFLAC_ERROR;
5602 }
5603
5604 /* CRC. */
5605#ifndef DR_FLAC_NO_CRC
5606 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5607#endif
5608 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5609 return DRFLAC_AT_END;
5610 }
5611
5612#ifndef DR_FLAC_NO_CRC
5613 if (actualCRC16 != desiredCRC16) {
5614 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5615 }
5616#endif
5617
5618 return DRFLAC_SUCCESS;
5619}
5620
5621static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5622{
5623 DRFLAC_ASSERT(pFlac != NULL);
5624
5625 for (;;) {
5626 drflac_result result;
5627
5628 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5629 return DRFLAC_FALSE;
5630 }
5631
5632 result = drflac__decode_flac_frame(pFlac);
5633 if (result != DRFLAC_SUCCESS) {
5634 if (result == DRFLAC_CRC_MISMATCH) {
5635 continue; /* CRC mismatch. Skip to the next frame. */
5636 } else {
5637 return DRFLAC_FALSE;
5638 }
5639 }
5640
5641 return DRFLAC_TRUE;
5642 }
5643}
5644
5645static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5646{
5647 drflac_uint64 firstPCMFrame;
5648 drflac_uint64 lastPCMFrame;
5649
5650 DRFLAC_ASSERT(pFlac != NULL);
5651
5652 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5653 if (firstPCMFrame == 0) {
5654 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5655 }
5656
5657 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5658 if (lastPCMFrame > 0) {
5659 lastPCMFrame -= 1; /* Needs to be zero based. */
5660 }
5661
5662 if (pFirstPCMFrame) {
5663 *pFirstPCMFrame = firstPCMFrame;
5664 }
5665 if (pLastPCMFrame) {
5666 *pLastPCMFrame = lastPCMFrame;
5667 }
5668}
5669
5670static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5671{
5672 drflac_bool32 result;
5673
5674 DRFLAC_ASSERT(pFlac != NULL);
5675
5676 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5677
5678 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5679 pFlac->currentPCMFrame = 0;
5680
5681 return result;
5682}
5683
5684static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5685{
5686 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5687 DRFLAC_ASSERT(pFlac != NULL);
5688 return drflac__seek_flac_frame(pFlac);
5689}
5690
5691
5692static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5693{
5694 drflac_uint64 pcmFramesRead = 0;
5695 while (pcmFramesToSeek > 0) {
5696 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5697 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5698 break; /* Couldn't read the next frame, so just break from the loop and return. */
5699 }
5700 } else {
5701 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5702 pcmFramesRead += pcmFramesToSeek;
5703 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
5704 pcmFramesToSeek = 0;
5705 } else {
5706 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5707 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5708 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5709 }
5710 }
5711 }
5712
5713 pFlac->currentPCMFrame += pcmFramesRead;
5714 return pcmFramesRead;
5715}
5716
5717
5718static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5719{
5720 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5721 drflac_uint64 runningPCMFrameCount;
5722
5723 DRFLAC_ASSERT(pFlac != NULL);
5724
5725 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5726 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5727 /* Seeking forward. Need to seek from the current position. */
5728 runningPCMFrameCount = pFlac->currentPCMFrame;
5729
5730 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5731 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5732 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5733 return DRFLAC_FALSE;
5734 }
5735 } else {
5736 isMidFrame = DRFLAC_TRUE;
5737 }
5738 } else {
5739 /* Seeking backwards. Need to seek from the start of the file. */
5740 runningPCMFrameCount = 0;
5741
5742 /* Move back to the start. */
5743 if (!drflac__seek_to_first_frame(pFlac)) {
5744 return DRFLAC_FALSE;
5745 }
5746
        /* Read the header of the first frame in preparation for sample-exact seeking below. */
5748 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5749 return DRFLAC_FALSE;
5750 }
5751 }
5752
    /*
    We need to find the frame that contains the target sample as quickly as possible. To do this, we iterate over each frame and inspect its
    header. If the header tells us that the frame contains the sample, we do a full decode of that frame.
    */
5757 for (;;) {
5758 drflac_uint64 pcmFrameCountInThisFLACFrame;
5759 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5760 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5761
5762 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5763
5764 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5765 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5766 /*
5767 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5768 it never existed and keep iterating.
5769 */
5770 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5771
5772 if (!isMidFrame) {
5773 drflac_result result = drflac__decode_flac_frame(pFlac);
5774 if (result == DRFLAC_SUCCESS) {
5775 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5776 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5777 } else {
5778 if (result == DRFLAC_CRC_MISMATCH) {
5779 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5780 } else {
5781 return DRFLAC_FALSE;
5782 }
5783 }
5784 } else {
5785 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5786 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5787 }
5788 } else {
5789 /*
5790 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5791 frame never existed and leave the running sample count untouched.
5792 */
5793 if (!isMidFrame) {
5794 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5795 if (result == DRFLAC_SUCCESS) {
5796 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5797 } else {
5798 if (result == DRFLAC_CRC_MISMATCH) {
5799 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5800 } else {
5801 return DRFLAC_FALSE;
5802 }
5803 }
5804 } else {
5805 /*
5806 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5807 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5808 */
5809 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5810 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5811 isMidFrame = DRFLAC_FALSE;
5812 }
5813
5814 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5815 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5816 return DRFLAC_TRUE;
5817 }
5818 }
5819
5820 next_iteration:
5821 /* Grab the next frame in preparation for the next iteration. */
5822 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5823 return DRFLAC_FALSE;
5824 }
5825 }
5826}
5827
5828
5829#if !defined(DR_FLAC_NO_CRC)
5830/*
5831We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
5832uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
5833location.
5834*/
5835#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
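
/*
The starting estimate described above can be sketched as follows. This is only an illustration: `estimate_target_byte` and its parameters are hypothetical
names that do not exist in dr_flac, and it uses the integer fraction 3/5 in place of the floating point factor so the result is exact.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Estimates the byte offset of a PCM frame that
// lies `pcmFrameDelta` frames past `byteRangeLo`, assuming a compression ratio of 0.6.
static uint64_t estimate_target_byte(uint64_t byteRangeLo, uint64_t byteRangeHi,
                                     uint64_t pcmFrameDelta, uint32_t channels, uint32_t bitsPerSample)
{
    uint64_t uncompressedBytes;
    uint64_t targetByte;

    assert(byteRangeLo <= byteRangeHi);

    uncompressedBytes = (pcmFrameDelta * channels * bitsPerSample) / 8;
    targetByte = byteRangeLo + (uncompressedBytes * 3) / 5;    // 3/5 == 0.6
    if (targetByte > byteRangeHi) {
        targetByte = byteRangeHi;    // Never aim past the known upper bound.
    }

    return targetByte;
}
```

For example, one second of 44.1kHz stereo 16-bit audio is 176400 bytes uncompressed, so the estimate lands 105840 bytes past byteRangeLo.
*/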
5836
5837static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5838{
5839 DRFLAC_ASSERT(pFlac != NULL);
5840 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5841 DRFLAC_ASSERT(targetByte >= rangeLo);
5842 DRFLAC_ASSERT(targetByte <= rangeHi);
5843
5844 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5845
5846 for (;;) {
5847 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5848 drflac_uint64 lastTargetByte = targetByte;
5849
5850 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
5851 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5852 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5853 if (targetByte == 0) {
5854 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5855 return DRFLAC_FALSE;
5856 }
5857
5858 /* Halve the byte location and continue. */
5859 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5860 rangeHi = targetByte;
5861 } else {
5862 /* Getting here should mean that we have seeked to an appropriate byte. */
5863
5864 /* Clear the details of the FLAC frame so we don't misreport data. */
5865 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5866
5867 /*
5868 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5869 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5870 so it needs to stay this way for now.
5871 */
5872#if 1
5873 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5874 /* Halve the byte location and continue. */
5875 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5876 rangeHi = targetByte;
5877 } else {
5878 break;
5879 }
5880#else
5881 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5882 /* Halve the byte location and continue. */
5883 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5884 rangeHi = targetByte;
5885 } else {
5886 break;
5887 }
5888#endif
5889 }
5890
5891 /* We already tried this byte and there are no more to try, break out. */
5892 if(targetByte == lastTargetByte) {
5893 return DRFLAC_FALSE;
5894 }
5895 }
5896
5897 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5898 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5899
5900 DRFLAC_ASSERT(targetByte <= rangeHi);
5901
5902 *pLastSuccessfulSeekOffset = targetByte;
5903 return DRFLAC_TRUE;
5904}
5905
5906static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5907{
5908 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5909#if 0
5910 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5911 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5912 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5913 return DRFLAC_FALSE;
5914 }
5915 }
5916#endif
5917
5918 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5919}
5920
5921
5922static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5923{
5924 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5925
5926 drflac_uint64 targetByte;
5927 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5928 drflac_uint64 pcmRangeHi = 0;
5929 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
5930 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
5931 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5932
5933 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
5934 if (targetByte > byteRangeHi) {
5935 targetByte = byteRangeHi;
5936 }
5937
5938 for (;;) {
5939 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
5940 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
5941 drflac_uint64 newPCMRangeLo;
5942 drflac_uint64 newPCMRangeHi;
5943 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
5944
5945 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
5946 if (pcmRangeLo == newPCMRangeLo) {
5947 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
5948 break; /* Failed to seek to closest frame. */
5949 }
5950
5951 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5952 return DRFLAC_TRUE;
5953 } else {
5954 break; /* Failed to seek forward. */
5955 }
5956 }
5957
5958 pcmRangeLo = newPCMRangeLo;
5959 pcmRangeHi = newPCMRangeHi;
5960
5961 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
5962 /* The target PCM frame is in this FLAC frame. */
5963 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
5964 return DRFLAC_TRUE;
5965 } else {
5966 break; /* Failed to seek to FLAC frame. */
5967 }
5968 } else {
5969 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
5970
5971 if (pcmRangeLo > pcmFrameIndex) {
5972 /* We seeked too far forward. We need to move our target byte backward and try again. */
5973 byteRangeHi = lastSuccessfulSeekOffset;
5974 if (byteRangeLo > byteRangeHi) {
5975 byteRangeLo = byteRangeHi;
5976 }
5977
5978 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
5979 if (targetByte < byteRangeLo) {
5980 targetByte = byteRangeLo;
5981 }
5982 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
5983 /* We didn't seek far enough. We need to move our target byte forward and try again. */
5984
5985 /* If we're close enough we can just seek forward. */
5986 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
5987 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5988 return DRFLAC_TRUE;
5989 } else {
5990 break; /* Failed to seek to FLAC frame. */
5991 }
5992 } else {
5993 byteRangeLo = lastSuccessfulSeekOffset;
5994 if (byteRangeHi < byteRangeLo) {
5995 byteRangeHi = byteRangeLo;
5996 }
5997
5998 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
5999 if (targetByte > byteRangeHi) {
6000 targetByte = byteRangeHi;
6001 }
6002
6003 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
6004 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
6005 }
6006 }
6007 }
6008 }
6009 } else {
            /* Getting here is really bad. We recover as best we can by moving to the first frame in the stream, and then abort. */
6011 break;
6012 }
6013 }
6014
6015 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
6016 return DRFLAC_FALSE;
6017}
6018
6019static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6020{
6021 drflac_uint64 byteRangeLo;
6022 drflac_uint64 byteRangeHi;
6023 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6024
    /* Our algorithm currently assumes the FLAC stream is sitting at the start. */
6026 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
6027 return DRFLAC_FALSE;
6028 }
6029
6030 /* If we're close enough to the start, just move to the start and seek forward. */
6031 if (pcmFrameIndex < seekForwardThreshold) {
6032 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
6033 }
6034
6035 /*
6036 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
6037 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
6038 */
6039 byteRangeLo = pFlac->firstFLACFramePosInBytes;
6040 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6041
6042 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
6043}
6044#endif /* !DR_FLAC_NO_CRC */
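
/*
The idea behind the byte-range search above can be sketched with a toy model. This is a simplification under stated assumptions: it uses plain bisection
rather than the compression ratio estimate, frames are held in an in-memory array instead of a bit stream, and every `toy_` name is hypothetical and not
part of dr_flac.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for a FLAC frame: where it starts in the file and which PCM frames it holds.
typedef struct
{
    uint64_t byteOffset;
    uint64_t firstPCMFrame;
    uint64_t pcmFrameCount;
} toy_frame;

// Models "seek to a byte, then sync to the next frame". Returns the index of the first
// frame starting at or after `byte`, or -1 if there is none.
static int toy_next_frame_at_or_after(const toy_frame* pFrames, int frameCount, uint64_t byte)
{
    int i;
    for (i = 0; i < frameCount; ++i) {
        if (pFrames[i].byteOffset >= byte) {
            return i;
        }
    }
    return -1;
}

// Returns the index of the frame containing pcmFrameIndex, or -1 if it is not found in [lo, hi].
static int toy_find_frame(const toy_frame* pFrames, int frameCount, uint64_t pcmFrameIndex, uint64_t lo, uint64_t hi)
{
    assert(pFrames != NULL);

    while (lo <= hi) {
        uint64_t mid = lo + (hi - lo)/2;
        int i = toy_next_frame_at_or_after(pFrames, frameCount, mid);

        if (i < 0 || pFrames[i].firstPCMFrame > pcmFrameIndex) {
            // Landed too far forward (or past the last frame). Pull the upper bound back.
            if (mid == lo) {
                return -1;
            }
            hi = mid - 1;
        } else if (pcmFrameIndex < pFrames[i].firstPCMFrame + pFrames[i].pcmFrameCount) {
            return i;    // The target PCM frame is inside this frame.
        } else {
            // Landed too early. Continue the search just past this frame's first byte.
            lo = pFrames[i].byteOffset + 1;
        }
    }

    return -1;
}
```

Each iteration either finds the frame or strictly shrinks [lo, hi], so the loop always terminates.
*/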
6045
6046static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6047{
6048 drflac_uint32 iClosestSeekpoint = 0;
6049 drflac_bool32 isMidFrame = DRFLAC_FALSE;
6050 drflac_uint64 runningPCMFrameCount;
6051 drflac_uint32 iSeekpoint;
6052
6053
6054 DRFLAC_ASSERT(pFlac != NULL);
6055
6056 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
6057 return DRFLAC_FALSE;
6058 }
6059
    /* Do not use the seek table if pcmFrameIndex is not covered by it. */
6061 if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
6062 return DRFLAC_FALSE;
6063 }
6064
6065 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6066 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6067 break;
6068 }
6069
6070 iClosestSeekpoint = iSeekpoint;
6071 }
6072
    /* There have been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
6074 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6075 return DRFLAC_FALSE;
6076 }
6077 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
6078 return DRFLAC_FALSE;
6079 }
6080
6081#if !defined(DR_FLAC_NO_CRC)
    /* At this point we know the closest seek point and can run a binary search starting from it. The binary search requires the total PCM frame count to be known. */
6083 if (pFlac->totalPCMFrameCount > 0) {
6084 drflac_uint64 byteRangeLo;
6085 drflac_uint64 byteRangeHi;
6086
6087 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6088 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6089
6090 /*
6091 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6092 value for byteRangeHi which will clamp it appropriately.
6093
6094 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6095 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6096 */
6097 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6098 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6099
6100 /* Basic validation on the seekpoints to ensure they're usable. */
6101 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6102 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6103 }
6104
6105 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6106 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
6107 }
6108 }
6109
6110 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6111 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6112 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6113
6114 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6115 return DRFLAC_TRUE;
6116 }
6117 }
6118 }
6119 }
6120#endif /* !DR_FLAC_NO_CRC */
6121
6122 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6123
6124 /*
6125 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6126 from the seekpoint's first sample.
6127 */
6128 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6129 /* Optimized case. Just seek forward from where we are. */
6130 runningPCMFrameCount = pFlac->currentPCMFrame;
6131
6132 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6133 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6134 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6135 return DRFLAC_FALSE;
6136 }
6137 } else {
6138 isMidFrame = DRFLAC_TRUE;
6139 }
6140 } else {
6141 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6142 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6143
6144 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6145 return DRFLAC_FALSE;
6146 }
6147
6148 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6149 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6150 return DRFLAC_FALSE;
6151 }
6152 }
6153
6154 for (;;) {
6155 drflac_uint64 pcmFrameCountInThisFLACFrame;
6156 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6157 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6158
6159 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6160
6161 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6162 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6163 /*
6164 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6165 it never existed and keep iterating.
6166 */
6167 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6168
6169 if (!isMidFrame) {
6170 drflac_result result = drflac__decode_flac_frame(pFlac);
6171 if (result == DRFLAC_SUCCESS) {
6172 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6173 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6174 } else {
6175 if (result == DRFLAC_CRC_MISMATCH) {
6176 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6177 } else {
6178 return DRFLAC_FALSE;
6179 }
6180 }
6181 } else {
6182 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6183 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6184 }
6185 } else {
6186 /*
6187 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6188 frame never existed and leave the running sample count untouched.
6189 */
6190 if (!isMidFrame) {
6191 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6192 if (result == DRFLAC_SUCCESS) {
6193 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6194 } else {
6195 if (result == DRFLAC_CRC_MISMATCH) {
6196 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6197 } else {
6198 return DRFLAC_FALSE;
6199 }
6200 }
6201 } else {
6202 /*
6203 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6204 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6205 */
6206 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6207 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6208 isMidFrame = DRFLAC_FALSE;
6209 }
6210
6211 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6212 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6213 return DRFLAC_TRUE;
6214 }
6215 }
6216
6217 next_iteration:
6218 /* Grab the next frame in preparation for the next iteration. */
6219 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6220 return DRFLAC_FALSE;
6221 }
6222 }
6223}
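
/*
The seekpoint selection at the top of the function above can be sketched in isolation. The `toy_` names are hypothetical and only the selection loop is
modelled; the validation of the chosen seekpoint and the actual seeking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint64_t firstPCMFrame;
    uint64_t flacFrameOffset;
    uint16_t pcmFrameCount;
} toy_seekpoint;

// Returns 1 and writes the chosen index to *pIndex, or returns 0 when the table cannot be used.
static int toy_find_closest_seekpoint(const toy_seekpoint* pSeekpoints, uint32_t seekpointCount, uint64_t pcmFrameIndex, uint32_t* pIndex)
{
    uint32_t iClosest = 0;
    uint32_t i;

    assert(pIndex != NULL);

    if (pSeekpoints == NULL || seekpointCount == 0) {
        return 0;
    }
    if (pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
        return 0;    // The target lies before the first seekpoint.
    }

    for (i = 0; i < seekpointCount; ++i) {
        if (pSeekpoints[i].firstPCMFrame >= pcmFrameIndex) {
            break;
        }
        iClosest = i;
    }

    *pIndex = iClosest;
    return 1;
}
```

Note that because the loop breaks on `>=`, a seekpoint whose first PCM frame equals the target is not selected. The one before it is used instead.
*/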
6224
6225
6226#ifndef DR_FLAC_NO_OGG
6227typedef struct
6228{
6229 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6230 drflac_uint8 structureVersion; /* Always 0. */
6231 drflac_uint8 headerType;
6232 drflac_uint64 granulePosition;
6233 drflac_uint32 serialNumber;
6234 drflac_uint32 sequenceNumber;
6235 drflac_uint32 checksum;
6236 drflac_uint8 segmentCount;
6237 drflac_uint8 segmentTable[255];
6238} drflac_ogg_page_header;
6239#endif
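
/*
As an illustration of how the fields of the page header structure above map onto the bytes of an Ogg page, here is a minimal sketch of a parser working on
an in-memory buffer. It assumes the standard Ogg page layout (27 fixed bytes with little-endian fields, followed by the segment table). The `toy_` names
are hypothetical and this is not how dr_flac reads pages internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    uint8_t  headerType;
    uint64_t granulePosition;
    uint32_t serialNumber;
    uint32_t sequenceNumber;
    uint32_t checksum;
    uint8_t  segmentCount;
    uint8_t  segmentTable[255];
} toy_ogg_page_header;

// Reads an n-byte little-endian unsigned integer.
static uint64_t toy_read_le(const uint8_t* p, size_t n)
{
    uint64_t v = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

// Returns 1 on success and writes the size of the page body to *pBodySize, or 0 on invalid input.
static int toy_parse_ogg_page_header(const uint8_t* pData, size_t dataSize, toy_ogg_page_header* pHeader, uint32_t* pBodySize)
{
    uint32_t i;

    assert(pData != NULL && pHeader != NULL && pBodySize != NULL);

    // Capture pattern must be "OggS" and the structure version must be 0.
    if (dataSize < 27 || memcmp(pData, "OggS", 4) != 0 || pData[4] != 0) {
        return 0;
    }

    pHeader->headerType      = pData[5];
    pHeader->granulePosition = toy_read_le(pData + 6, 8);
    pHeader->serialNumber    = (uint32_t)toy_read_le(pData + 14, 4);
    pHeader->sequenceNumber  = (uint32_t)toy_read_le(pData + 18, 4);
    pHeader->checksum        = (uint32_t)toy_read_le(pData + 22, 4);
    pHeader->segmentCount    = pData[26];

    if (dataSize < (size_t)27 + pHeader->segmentCount) {
        return 0;    // The segment table is truncated.
    }

    // The page body size is the sum of the lacing values in the segment table.
    *pBodySize = 0;
    for (i = 0; i < pHeader->segmentCount; ++i) {
        pHeader->segmentTable[i] = pData[27 + i];
        *pBodySize += pHeader->segmentTable[i];
    }

    return 1;
}
```
*/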
6240
6241typedef struct
6242{
6243 drflac_read_proc onRead;
6244 drflac_seek_proc onSeek;
6245 drflac_tell_proc onTell;
6246 drflac_meta_proc onMeta;
6247 drflac_container container;
6248 void* pUserData;
6249 void* pUserDataMD;
6250 drflac_uint32 sampleRate;
6251 drflac_uint8 channels;
6252 drflac_uint8 bitsPerSample;
6253 drflac_uint64 totalPCMFrameCount;
6254 drflac_uint16 maxBlockSizeInPCMFrames;
6255 drflac_uint64 runningFilePos;
6256 drflac_bool32 hasStreamInfoBlock;
6257 drflac_bool32 hasMetadataBlocks;
6258 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
    drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initialization. Only set if there is no STREAMINFO block. */
6260
6261#ifndef DR_FLAC_NO_OGG
6262 drflac_uint32 oggSerial;
6263 drflac_uint64 oggFirstBytePos;
6264 drflac_ogg_page_header oggBosHeader;
6265#endif
6266} drflac_init_info;
6267
6268static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6269{
6270 blockHeader = drflac__be2host_32(blockHeader);
6271 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6272 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6273 *blockSize = (blockHeader & 0x00FFFFFFUL);
6274}
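
/*
A worked example of the bit layout decoded above. The sketch below assembles the big-endian header from raw bytes so that it does not depend on host
endianness. `toy_decode_block_header` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

// Decodes a 4-byte metadata block header: 1 bit "last block" flag, 7 bits block type, 24 bits block size.
static void toy_decode_block_header(const uint8_t bytes[4], uint8_t* pIsLastBlock, uint8_t* pBlockType, uint32_t* pBlockSize)
{
    uint32_t header;

    assert(bytes != 0 && pIsLastBlock != 0 && pBlockType != 0 && pBlockSize != 0);

    header = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) | ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];

    *pIsLastBlock = (uint8_t)((header >> 31) & 0x01);
    *pBlockType   = (uint8_t)((header >> 24) & 0x7F);
    *pBlockSize   = header & 0x00FFFFFF;
}
```

With the bytes 0x84 0x00 0x01 0x2C this yields a last-block flag of 1, a block type of 4 and a block size of 300.
*/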
6275
6276static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6277{
6278 drflac_uint32 blockHeader;
6279
6280 *blockSize = 0;
6281 if (onRead(pUserData, &blockHeader, 4) != 4) {
6282 return DRFLAC_FALSE;
6283 }
6284
6285 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6286 return DRFLAC_TRUE;
6287}
6288
6289static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6290{
6291 drflac_uint32 blockSizes;
6292 drflac_uint64 frameSizes = 0;
6293 drflac_uint64 importantProps;
6294 drflac_uint8 md5[16];
6295
6296 /* min/max block size. */
6297 if (onRead(pUserData, &blockSizes, 4) != 4) {
6298 return DRFLAC_FALSE;
6299 }
6300
6301 /* min/max frame size. */
6302 if (onRead(pUserData, &frameSizes, 6) != 6) {
6303 return DRFLAC_FALSE;
6304 }
6305
6306 /* Sample rate, channels, bits per sample and total sample count. */
6307 if (onRead(pUserData, &importantProps, 8) != 8) {
6308 return DRFLAC_FALSE;
6309 }
6310
6311 /* MD5 */
6312 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6313 return DRFLAC_FALSE;
6314 }
6315
6316 blockSizes = drflac__be2host_32(blockSizes);
6317 frameSizes = drflac__be2host_64(frameSizes);
6318 importantProps = drflac__be2host_64(importantProps);
6319
6320 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6321 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6322 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6323 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6324 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6325 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6326 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6327 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6328 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6329
6330 return DRFLAC_TRUE;
6331}
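
/*
The field extraction above can be checked against a concrete STREAMINFO payload. The sketch below parses the same 34 bytes from a memory buffer using
plain shifts. It is an illustration only: the `toy_` names are hypothetical, the MD5 signature is skipped, and the frame sizes are named generically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint16_t minBlockSize;
    uint16_t maxBlockSize;
    uint32_t minFrameSize;
    uint32_t maxFrameSize;
    uint32_t sampleRate;
    uint8_t  channels;
    uint8_t  bitsPerSample;
    uint64_t totalPCMFrameCount;
} toy_streaminfo;

// Reads an n-byte big-endian unsigned integer.
static uint64_t toy_read_be(const uint8_t* p, size_t n)
{
    uint64_t v = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        v = (v << 8) | p[i];
    }
    return v;
}

static void toy_parse_streaminfo(const uint8_t data[34], toy_streaminfo* pInfo)
{
    uint64_t props;

    assert(data != NULL && pInfo != NULL);

    pInfo->minBlockSize = (uint16_t)toy_read_be(data + 0, 2);
    pInfo->maxBlockSize = (uint16_t)toy_read_be(data + 2, 2);
    pInfo->minFrameSize = (uint32_t)toy_read_be(data + 4, 3);
    pInfo->maxFrameSize = (uint32_t)toy_read_be(data + 7, 3);

    // 20 bits sample rate, 3 bits (channels - 1), 5 bits (bits per sample - 1), 36 bits total PCM frames.
    props = toy_read_be(data + 10, 8);
    pInfo->sampleRate         = (uint32_t)(props >> 44);
    pInfo->channels           = (uint8_t)(((props >> 41) & 0x07) + 1);
    pInfo->bitsPerSample      = (uint8_t)(((props >> 36) & 0x1F) + 1);
    pInfo->totalPCMFrameCount = props & UINT64_C(0xFFFFFFFFF);
}
```

For a 44.1kHz stereo 16-bit stream the eight property bytes begin with 0x0A 0xC4 0x42 0xF0.
*/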
6332
6333
6334static void* drflac__malloc_default(size_t sz, void* pUserData)
6335{
6336 (void)pUserData;
6337 return DRFLAC_MALLOC(sz);
6338}
6339
6340static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6341{
6342 (void)pUserData;
6343 return DRFLAC_REALLOC(p, sz);
6344}
6345
6346static void drflac__free_default(void* p, void* pUserData)
6347{
6348 (void)pUserData;
6349 DRFLAC_FREE(p);
6350}
6351
6352
6353static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6354{
6355 if (pAllocationCallbacks == NULL) {
6356 return NULL;
6357 }
6358
6359 if (pAllocationCallbacks->onMalloc != NULL) {
6360 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6361 }
6362
6363 /* Try using realloc(). */
6364 if (pAllocationCallbacks->onRealloc != NULL) {
6365 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6366 }
6367
6368 return NULL;
6369}
6370
6371static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6372{
6373 if (pAllocationCallbacks == NULL) {
6374 return NULL;
6375 }
6376
6377 if (pAllocationCallbacks->onRealloc != NULL) {
6378 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6379 }
6380
6381 /* Try emulating realloc() in terms of malloc()/free(). */
6382 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6383 void* p2;
6384
6385 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6386 if (p2 == NULL) {
6387 return NULL;
6388 }
6389
6390 if (p != NULL) {
6391 DRFLAC_COPY_MEMORY(p2, p, szOld);
6392 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6393 }
6394
6395 return p2;
6396 }
6397
6398 return NULL;
6399}
6400
6401static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6402{
6403 if (p == NULL || pAllocationCallbacks == NULL) {
6404 return;
6405 }
6406
6407 if (pAllocationCallbacks->onFree != NULL) {
6408 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6409 }
6410}
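
/*
The fallback behaviour of the allocation helpers above can be demonstrated with callbacks that provide only malloc and free. This is a sketch under
stated assumptions: the `toy_` names are hypothetical, the user data is an int counting live allocations, and unlike the helper above the emulated
realloc copies only the smaller of the old and new sizes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} toy_allocation_callbacks;

// Counting allocator. pUserData points at an int holding the number of live allocations.
static void* toy_malloc(size_t sz, void* pUserData)
{
    void* p = malloc(sz);
    if (p != NULL) {
        *(int*)pUserData += 1;
    }
    return p;
}

static void toy_free(void* p, void* pUserData)
{
    if (p != NULL) {
        *(int*)pUserData -= 1;
        free(p);
    }
}

static void* toy_realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const toy_allocation_callbacks* pCallbacks)
{
    assert(pCallbacks != NULL);

    if (pCallbacks->onRealloc != NULL) {
        return pCallbacks->onRealloc(p, szNew, pCallbacks->pUserData);
    }

    // No realloc available. Emulate it with malloc + copy + free.
    if (pCallbacks->onMalloc != NULL && pCallbacks->onFree != NULL) {
        void* p2 = pCallbacks->onMalloc(szNew, pCallbacks->pUserData);
        if (p2 == NULL) {
            return NULL;    // The original allocation is left untouched on failure.
        }

        if (p != NULL) {
            memcpy(p2, p, (szOld < szNew) ? szOld : szNew);
            pCallbacks->onFree(p, pCallbacks->pUserData);
        }

        return p2;
    }

    return NULL;
}
```

Growing a buffer through `toy_realloc_from_callbacks` with `onRealloc` set to NULL preserves its contents and leaves exactly one live allocation.
*/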
6411
6412
6413static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
6414{
6415 /*
6416 We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6417 we'll be sitting on byte 42.
6418 */
6419 drflac_uint64 runningFilePos = 42;
6420 drflac_uint64 seektablePos = 0;
6421 drflac_uint32 seektableSize = 0;
6422
6423 (void)onTell;
6424
6425 for (;;) {
6426 drflac_metadata metadata;
6427 drflac_uint8 isLastBlock = 0;
6428 drflac_uint8 blockType = 0;
6429 drflac_uint32 blockSize;
6430 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6431 return DRFLAC_FALSE;
6432 }
6433 runningFilePos += 4;
6434
6435 metadata.type = blockType;
6436 metadata.pRawData = NULL;
6437 metadata.rawDataSize = 0;
6438
6439 switch (blockType)
6440 {
6441 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6442 {
6443 if (blockSize < 4) {
6444 return DRFLAC_FALSE;
6445 }
6446
6447 if (onMeta) {
6448 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6449 if (pRawData == NULL) {
6450 return DRFLAC_FALSE;
6451 }
6452
6453 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6454 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6455 return DRFLAC_FALSE;
6456 }
6457
6458 metadata.pRawData = pRawData;
6459 metadata.rawDataSize = blockSize;
6460 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6461 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6462 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6463 onMeta(pUserDataMD, &metadata);
6464
6465 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6466 }
6467 } break;
6468
6469 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6470 {
6471 seektablePos = runningFilePos;
6472 seektableSize = blockSize;
6473
6474 if (onMeta) {
6475 drflac_uint32 seekpointCount;
6476 drflac_uint32 iSeekpoint;
6477 void* pRawData;
6478
6479 seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6480
6481 pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
6482 if (pRawData == NULL) {
6483 return DRFLAC_FALSE;
6484 }
6485
6486 /* We need to read seekpoint by seekpoint and do some processing. */
6487 for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
6488 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
6489
6490 if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
6491 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6492 return DRFLAC_FALSE;
6493 }
6494
6495 /* Endian swap. */
6496 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6497 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6498 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6499 }
6500
6501 metadata.pRawData = pRawData;
6502 metadata.rawDataSize = blockSize;
6503 metadata.data.seektable.seekpointCount = seekpointCount;
6504 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6505
6506 onMeta(pUserDataMD, &metadata);
6507
6508 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6509 }
6510 } break;
6511
6512 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6513 {
6514 if (blockSize < 8) {
6515 return DRFLAC_FALSE;
6516 }
6517
6518 if (onMeta) {
6519 void* pRawData;
6520 const char* pRunningData;
6521 const char* pRunningDataEnd;
6522 drflac_uint32 i;
6523
6524 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6525 if (pRawData == NULL) {
6526 return DRFLAC_FALSE;
6527 }
6528
6529 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6530 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6531 return DRFLAC_FALSE;
6532 }
6533
6534 metadata.pRawData = pRawData;
6535 metadata.rawDataSize = blockSize;
6536
6537 pRunningData = (const char*)pRawData;
6538 pRunningDataEnd = (const char*)pRawData + blockSize;
6539
6540 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6541
6542 /* Need space for the rest of the block */
6543 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6544 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6545 return DRFLAC_FALSE;
6546 }
6547 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
6548 metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6549
6550 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
6551 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6552 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6553 return DRFLAC_FALSE;
6554 }
6555 metadata.data.vorbis_comment.pComments = pRunningData;
6556
6557 /* Check that the comments section is valid before passing it to the callback */
6558 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6559 drflac_uint32 commentLength;
6560
6561 if (pRunningDataEnd - pRunningData < 4) {
6562 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6563 return DRFLAC_FALSE;
6564 }
6565
6566 commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6567 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6568 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6569 return DRFLAC_FALSE;
6570 }
6571 pRunningData += commentLength;
6572 }
6573
6574 onMeta(pUserDataMD, &metadata);
6575
6576 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6577 }
6578 } break;
6579
6580 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6581 {
6582 if (blockSize < 396) {
6583 return DRFLAC_FALSE;
6584 }
6585
6586 if (onMeta) {
6587 void* pRawData;
6588 const char* pRunningData;
6589 const char* pRunningDataEnd;
6590 size_t bufferSize;
6591 drflac_uint8 iTrack;
6592 drflac_uint8 iIndex;
6593 void* pTrackData;
6594
6595 /*
6596 This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
6597 we need for storing the necessary data. The second pass will fill that buffer with usable data.
6598 */
6599 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6600 if (pRawData == NULL) {
6601 return DRFLAC_FALSE;
6602 }
6603
6604 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6605 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6606 return DRFLAC_FALSE;
6607 }
6608
6609 metadata.pRawData = pRawData;
6610 metadata.rawDataSize = blockSize;
6611
6612 pRunningData = (const char*)pRawData;
6613 pRunningDataEnd = (const char*)pRawData + blockSize;
6614
6615 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6616 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6617 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6618 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
6619 metadata.data.cuesheet.pTrackData = NULL; /* Will be filled later. */
6620
6621 /* Pass 1: Calculate the size of the buffer for the track data. */
6622 {
6623 const char* pRunningDataSaved = pRunningData; /* Will be restored at the end in preparation for the second pass. */
6624
6625 bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
6626
6627 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6628 drflac_uint8 indexCount;
6629 drflac_uint32 indexPointSize;
6630
6631 if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
6632 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6633 return DRFLAC_FALSE;
6634 }
6635
6636 /* Skip to the index point count */
6637 pRunningData += 35;
6638
6639 indexCount = pRunningData[0];
6640 pRunningData += 1;
6641
6642 bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
6643
6644 /* Quick validation check. */
6645 indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6646 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6647 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6648 return DRFLAC_FALSE;
6649 }
6650
6651 pRunningData += indexPointSize;
6652 }
6653
6654 pRunningData = pRunningDataSaved;
6655 }
6656
6657 /* Pass 2: Allocate a buffer and fill the data. Validation was done in pass 1, so it can be skipped here. */
6658 {
6659 char* pRunningTrackData;
6660
6661 pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
6662 if (pTrackData == NULL) {
6663 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6664 return DRFLAC_FALSE;
6665 }
6666
6667 pRunningTrackData = (char*)pTrackData;
6668
6669 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6670 drflac_uint8 indexCount;
6671
6672 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
6673 pRunningData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
6674 pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
6675
6676 /* Grab the index count for the next part. */
6677 indexCount = pRunningData[0];
6678 pRunningData += 1;
6679 pRunningTrackData += 1;
6680
6681 /* Extract each track index. */
6682 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6683 drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
6684
6685 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
6686 pRunningData += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6687 pRunningTrackData += sizeof(drflac_cuesheet_track_index);
6688
6689 pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
6690 }
6691 }
6692
6693 metadata.data.cuesheet.pTrackData = pTrackData;
6694 }
6695
6696 /* The original data is no longer needed. */
6697 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6698 pRawData = NULL;
6699
6700 onMeta(pUserDataMD, &metadata);
6701
6702 drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
6703 pTrackData = NULL;
6704 }
6705 } break;
6706
6707 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6708 {
6709 if (blockSize < 32) {
6710 return DRFLAC_FALSE;
6711 }
6712
6713 if (onMeta) {
6714 void* pRawData;
6715 const char* pRunningData;
6716 const char* pRunningDataEnd;
6717
6718 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6719 if (pRawData == NULL) {
6720 return DRFLAC_FALSE;
6721 }
6722
6723 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6724 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6725 return DRFLAC_FALSE;
6726 }
6727
6728 metadata.pRawData = pRawData;
6729 metadata.rawDataSize = blockSize;
6730
6731 pRunningData = (const char*)pRawData;
6732 pRunningDataEnd = (const char*)pRawData + blockSize;
6733
6734 metadata.data.picture.type = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6735 metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6736
6737 /* Need space for the rest of the block */
6738 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6739 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6740 return DRFLAC_FALSE;
6741 }
6742 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6743 metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6744
6745 /* Need space for the rest of the block */
6746 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6747 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6748 return DRFLAC_FALSE;
6749 }
6750 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6751 metadata.data.picture.width = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6752 metadata.data.picture.height = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6753 metadata.data.picture.colorDepth = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6754 metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6755 metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6756 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6757
6758 /* Need space for the picture after the block */
6759 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6760 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6761 return DRFLAC_FALSE;
6762 }
6763
6764 onMeta(pUserDataMD, &metadata);
6765
6766 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6767 }
6768 } break;
6769
6770 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6771 {
6772 if (onMeta) {
6773 metadata.data.padding.unused = 0;
6774
6775 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6776 if (!onSeek(pUserData, blockSize, DRFLAC_SEEK_CUR)) {
6777 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6778 } else {
6779 onMeta(pUserDataMD, &metadata);
6780 }
6781 }
6782 } break;
6783
6784 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6785 {
6786 /* Invalid chunk. Just skip over this one. */
6787 if (onMeta) {
6788 if (!onSeek(pUserData, blockSize, DRFLAC_SEEK_CUR)) {
6789 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6790 }
6791 }
6792 } break;
6793
6794 default:
6795 {
6796 /*
6797 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6798 can at the very least report the chunk to the application and let it look at the raw data.
6799 */
6800 if (onMeta) {
6801 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6802 if (pRawData == NULL) {
6803 return DRFLAC_FALSE;
6804 }
6805
6806 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6807 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6808 return DRFLAC_FALSE;
6809 }
6810
6811 metadata.pRawData = pRawData;
6812 metadata.rawDataSize = blockSize;
6813 onMeta(pUserDataMD, &metadata);
6814
6815 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6816 }
6817 } break;
6818 }
6819
6820 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6821 if (onMeta == NULL && blockSize > 0) {
6822 if (!onSeek(pUserData, blockSize, DRFLAC_SEEK_CUR)) {
6823 isLastBlock = DRFLAC_TRUE;
6824 }
6825 }
6826
6827 runningFilePos += blockSize;
6828 if (isLastBlock) {
6829 break;
6830 }
6831 }
6832
6833 *pSeektablePos = seektablePos;
6834 *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6835 *pFirstFramePos = runningFilePos;
6836
6837 return DRFLAC_TRUE;
6838}
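/*
The VORBIS_COMMENT, CUESHEET and PICTURE cases above validate untrusted length fields by comparing them against the number of
bytes remaining (end - cursor) instead of adding the length to the cursor first. Below is a minimal, standalone sketch of that
pattern for a list of length-prefixed comments. It is an illustration only and not part of dr_flac: sketch_read_le32() and
sketch_validate_comments() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reads a little-endian 32-bit value from a possibly unaligned address.
static uint32_t sketch_read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 1 if `count` length-prefixed comments fit inside the `size` bytes at `data`, 0 otherwise.
static int sketch_validate_comments(const unsigned char* data, size_t size, uint32_t count)
{
    const unsigned char* p   = data;
    const unsigned char* end = data + size;
    uint32_t i;

    assert(data != NULL);

    for (i = 0; i < count; ++i) {
        uint32_t length;

        if ((size_t)(end - p) < 4) {
            return 0;   // No room for the 4-byte length prefix.
        }
        length = sketch_read_le32(p);
        p += 4;

        // Compare against the bytes remaining. Testing `p + length > end` instead could wrap around.
        if ((size_t)(end - p) < length) {
            return 0;
        }
        p += length;
    }

    return 1;
}
```

The subtraction (end - p) cannot be negative here because the cursor only advances after a successful check.
*/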
6839
6840static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6841{
6842 /* Precondition: The bit stream should be sitting just past the 4-byte id header. */
6843
6844 drflac_uint8 isLastBlock;
6845 drflac_uint8 blockType;
6846 drflac_uint32 blockSize;
6847
6848 (void)onSeek;
6849
6850 pInit->container = drflac_container_native;
6851
6852 /* The first metadata block should be the STREAMINFO block. */
6853 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6854 return DRFLAC_FALSE;
6855 }
6856
6857 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6858 if (!relaxed) {
6859 /* We're opening in strict mode and the first block is not a valid STREAMINFO block (wrong type or wrong size). Error. */
6860 return DRFLAC_FALSE;
6861 } else {
6862 /*
6863 Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6864 for that frame.
6865 */
6866 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6867 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6868
6869 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6870 return DRFLAC_FALSE; /* Couldn't find a frame. */
6871 }
6872
6873 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6874 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6875 }
6876
6877 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6878 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6879 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6880 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6881 return DRFLAC_TRUE;
6882 }
6883 } else {
6884 drflac_streaminfo streaminfo;
6885 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6886 return DRFLAC_FALSE;
6887 }
6888
6889 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6890 pInit->sampleRate = streaminfo.sampleRate;
6891 pInit->channels = streaminfo.channels;
6892 pInit->bitsPerSample = streaminfo.bitsPerSample;
6893 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6894 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6895 pInit->hasMetadataBlocks = !isLastBlock;
6896
6897 if (onMeta) {
6898 drflac_metadata metadata;
6899 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6900 metadata.pRawData = NULL;
6901 metadata.rawDataSize = 0;
6902 metadata.data.streaminfo = streaminfo;
6903 onMeta(pUserDataMD, &metadata);
6904 }
6905
6906 return DRFLAC_TRUE;
6907 }
6908}
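/*
The strict-mode check above relies on the 4-byte metadata block header having been decoded into a last-block flag, a block type
and a block size. As described in the FLAC format specification, the header is a 1-bit flag, a 7-bit type and a 24-bit big-endian
size, and STREAMINFO is type 0 with a 34-byte body. The sketch below decodes such a header from raw bytes. It is a standalone
illustration and not the code behind drflac__read_and_decode_block_header(): the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    int      isLastBlock;
    uint8_t  blockType;
    uint32_t blockSize;
} sketch_block_header;

// Splits the 4 header bytes into flag (1 bit), type (7 bits) and size (24 bits, big-endian).
static sketch_block_header sketch_decode_block_header(const uint8_t raw[4])
{
    sketch_block_header header;

    assert(raw != NULL);

    header.isLastBlock = (raw[0] & 0x80) != 0;
    header.blockType   = (uint8_t)(raw[0] & 0x7F);
    header.blockSize   = ((uint32_t)raw[1] << 16) | ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];

    return header;
}

// Mirrors the strict-mode test: type 0 (STREAMINFO) with a 34-byte body.
static int sketch_is_valid_streaminfo_header(const sketch_block_header* pHeader)
{
    assert(pHeader != NULL);
    return pHeader->blockType == 0 && pHeader->blockSize == 34;
}
```
*/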
6909
6910#ifndef DR_FLAC_NO_OGG
6911#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6912#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6913
6914typedef enum
6915{
6916 drflac_ogg_recover_on_crc_mismatch,
6917 drflac_ogg_fail_on_crc_mismatch
6918} drflac_ogg_crc_mismatch_recovery;
6919
6920#ifndef DR_FLAC_NO_CRC
6921static drflac_uint32 drflac__crc32_table[] = {
6922 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6923 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6924 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6925 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6926 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6927 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6928 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6929 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6930 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6931 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6932 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
6933 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
6934 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
6935 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
6936 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
6937 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
6938 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
6939 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
6940 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
6941 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
6942 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
6943 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
6944 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
6945 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
6946 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
6947 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
6948 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
6949 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
6950 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
6951 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
6952 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
6953 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
6954 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
6955 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
6956 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
6957 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
6958 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
6959 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
6960 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
6961 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
6962 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
6963 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
6964 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
6965 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
6966 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
6967 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
6968 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
6969 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
6970 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
6971 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
6972 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
6973 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
6974 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
6975 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
6976 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
6977 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
6978 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
6979 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
6980 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
6981 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
6982 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
6983 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
6984 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
6985 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
6986};
6987#endif
6988
6989static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
6990{
6991#ifndef DR_FLAC_NO_CRC
6992 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
6993#else
6994 (void)data;
6995 return crc32;
6996#endif
6997}
6998
6999#if 0
7000static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
7001{
7002 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
7003 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
7004 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
7005 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
7006 return crc32;
7007}
7008
7009static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
7010{
7011 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
7012 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
7013 return crc32;
7014}
7015#endif
7016
7017static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
7018{
7019 /* This can be optimized. */
7020 drflac_uint32 i;
7021 for (i = 0; i < dataSize; ++i) {
7022 crc32 = drflac_crc32_byte(crc32, pData[i]);
7023 }
7024 return crc32;
7025}
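/*
The lookup table above corresponds to an MSB-first (non-reflected) CRC-32 with polynomial 0x04C11DB7 and no final XOR. The sketch
below computes the same checksum one bit at a time, which can be handy for checking table entries. It is a standalone illustration
and not part of dr_flac: the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Feeds one byte into the CRC, one bit at a time, most significant bit first.
static uint32_t sketch_crc32_byte(uint32_t crc, uint8_t data)
{
    int bit;

    crc ^= (uint32_t)data << 24;
    for (bit = 0; bit < 8; ++bit) {
        if (crc & 0x80000000U) {
            crc = (crc << 1) ^ 0x04C11DB7U;
        } else {
            crc = (crc << 1);
        }
    }

    return crc;
}

// Feeds a whole buffer into the CRC.
static uint32_t sketch_crc32_buffer(uint32_t crc, const uint8_t* pData, size_t dataSize)
{
    size_t i;

    assert(pData != NULL || dataSize == 0);

    for (i = 0; i < dataSize; ++i) {
        crc = sketch_crc32_byte(crc, pData[i]);
    }

    return crc;
}
```

Under these assumptions, sketch_crc32_byte(0, i) reproduces table entry i, and running the four bytes of "OggS" through
sketch_crc32_buffer() with an initial value of 0 yields 1605413199, the value of DRFLAC_OGG_CAPTURE_PATTERN_CRC32.
*/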
7026
7027
7028static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
7029{
7030 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
7031}
7032
7033static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
7034{
7035 return 27 + pHeader->segmentCount;
7036}
7037
7038static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
7039{
7040 drflac_uint32 pageBodySize = 0;
7041 int i;
7042
7043 for (i = 0; i < pHeader->segmentCount; ++i) {
7044 pageBodySize += pHeader->segmentTable[i];
7045 }
7046
7047 return pageBodySize;
7048}
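/*
The two helpers above work on the Ogg segment table, where each entry (a "lacing value") is the size of one segment and a value
below 255 marks the end of a packet. The sketch below sums a segment table, counts the packets that end on the page, and shows
where the 65307 in DRFLAC_OGG_MAX_PAGE_SIZE comes from. It is a standalone illustration and not part of dr_flac: the sketch_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Sums the lacing values to get the page body size in bytes.
static uint32_t sketch_page_body_size(const uint8_t* pSegmentTable, uint8_t segmentCount)
{
    uint32_t size = 0;
    unsigned int i;

    assert(pSegmentTable != NULL || segmentCount == 0);

    for (i = 0; i < segmentCount; ++i) {
        size += pSegmentTable[i];
    }

    return size;
}

// Counts the packets that end on this page. A lacing value below 255 terminates a packet.
static uint32_t sketch_count_completed_packets(const uint8_t* pSegmentTable, uint8_t segmentCount)
{
    uint32_t count = 0;
    unsigned int i;

    assert(pSegmentTable != NULL || segmentCount == 0);

    for (i = 0; i < segmentCount; ++i) {
        if (pSegmentTable[i] < 255) {
            count += 1;
        }
    }

    return count;
}

// Largest possible page: 27 fixed header bytes + 255 lacing values + 255 segments of 255 bytes each.
static uint32_t sketch_max_page_size(void)
{
    return 27U + 255U + (255U * 255U);
}
```

A trailing lacing value of exactly 255 means the last packet continues on the next page, so it is not counted as completed.
*/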
7049
7050static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7051{
7052 drflac_uint8 data[23];
7053 drflac_uint32 i;
7054
7055 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
7056
7057 if (onRead(pUserData, data, 23) != 23) {
7058 return DRFLAC_AT_END;
7059 }
7060 *pBytesRead += 23;
7061
7062 /*
7063 It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
7064 us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
7065 like to have it map to the structure of the underlying data.
7066 */
7067 pHeader->capturePattern[0] = 'O';
7068 pHeader->capturePattern[1] = 'g';
7069 pHeader->capturePattern[2] = 'g';
7070 pHeader->capturePattern[3] = 'S';
7071
7072 pHeader->structureVersion = data[0];
7073 pHeader->headerType = data[1];
7074 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
7075 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
7076 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
7077 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
7078 pHeader->segmentCount = data[22];
7079
7080 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
7081 data[18] = 0;
7082 data[19] = 0;
7083 data[20] = 0;
7084 data[21] = 0;
7085
7086 for (i = 0; i < 23; ++i) {
7087 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
7088 }
7089
7090
7091 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
7092 return DRFLAC_AT_END;
7093 }
7094 *pBytesRead += pHeader->segmentCount;
7095
7096 for (i = 0; i < pHeader->segmentCount; ++i) {
7097 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
7098 }
7099
7100 return DRFLAC_SUCCESS;
7101}
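/*
The function above reads the 23 header bytes that follow the capture pattern: version (offset 0), header type (1), granule
position (2..9), serial number (10..13), sequence number (14..17), checksum (18..21) and segment count (22). Ogg stores the
multi-byte fields in little-endian order. The sketch below decodes the same layout with explicit shifts so that the result does
not depend on host byte order. It is a standalone illustration and not a drop-in replacement: the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint8_t  structureVersion;
    uint8_t  headerType;
    uint64_t granulePosition;
    uint32_t serialNumber;
    uint32_t sequenceNumber;
    uint32_t checksum;
    uint8_t  segmentCount;
} sketch_ogg_page_header;

static uint32_t sketch_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t sketch_le64(const uint8_t* p)
{
    return (uint64_t)sketch_le32(p) | ((uint64_t)sketch_le32(p + 4) << 32);
}

// Decodes the 23 bytes that follow the "OggS" capture pattern.
static sketch_ogg_page_header sketch_decode_page_header(const uint8_t data[23])
{
    sketch_ogg_page_header header;

    assert(data != NULL);

    header.structureVersion = data[0];
    header.headerType       = data[1];
    header.granulePosition  = sketch_le64(&data[2]);
    header.serialNumber     = sketch_le32(&data[10]);
    header.sequenceNumber   = sketch_le32(&data[14]);
    header.checksum         = sketch_le32(&data[18]);
    header.segmentCount     = data[22];

    return header;
}
```
*/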
7102
7103static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7104{
7105 drflac_uint8 id[4];
7106
7107 *pBytesRead = 0;
7108
7109 if (onRead(pUserData, id, 4) != 4) {
7110 return DRFLAC_AT_END;
7111 }
7112 *pBytesRead += 4;
7113
7114 /* We need to read byte-by-byte until we find the OggS capture pattern. */
7115 for (;;) {
7116 if (drflac_ogg__is_capture_pattern(id)) {
7117 drflac_result result;
7118
7119 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7120
7121 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7122 if (result == DRFLAC_SUCCESS) {
7123 return DRFLAC_SUCCESS;
7124 } else {
7125 if (result == DRFLAC_CRC_MISMATCH) {
7126 continue;
7127 } else {
7128 return result;
7129 }
7130 }
7131 } else {
7132 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
7133 id[0] = id[1];
7134 id[1] = id[2];
7135 id[2] = id[3];
7136 if (onRead(pUserData, &id[3], 1) != 1) {
7137 return DRFLAC_AT_END;
7138 }
7139 *pBytesRead += 1;
7140 }
7141 }
7142}
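/*
The loop above resynchronizes by sliding a 4-byte window over the stream one byte at a time until it matches "OggS". The sketch
below applies the same idea to an in-memory buffer and returns the offset of the first match. It is a standalone illustration and
not part of dr_flac: sketch_find_capture_pattern() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns the offset of the first "OggS" in the buffer, or (size_t)-1 if there is none.
static size_t sketch_find_capture_pattern(const uint8_t* pData, size_t dataSize)
{
    uint8_t id[4];
    size_t cursor;

    assert(pData != NULL || dataSize == 0);

    if (dataSize < 4) {
        return (size_t)-1;
    }

    id[0] = pData[0];
    id[1] = pData[1];
    id[2] = pData[2];
    id[3] = pData[3];
    cursor = 4;

    for (;;) {
        if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
            return cursor - 4;  // The window always covers the 4 bytes just before the cursor.
        }

        if (cursor == dataSize) {
            return (size_t)-1;  // Ran out of data without finding the pattern.
        }

        // Slide the window forward by one byte.
        id[0] = id[1];
        id[1] = id[2];
        id[2] = id[3];
        id[3] = pData[cursor];
        cursor += 1;
    }
}
```
*/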
7143
7144
7145/*
7146The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
7147in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
7148in such a way that the core sections assume everything is delivered in native format. Therefore, each encapsulation type that
7149dr_flac supports needs a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
7150the physical Ogg bitstream are converted and delivered in native FLAC format.
7151*/
7152typedef struct
7153{
7154 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7155 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
7156 drflac_tell_proc onTell; /* The original onTell callback from drflac_open() and family. */
7157 void* pUserData; /* The user data passed to onRead and onSeek. This is the user data that was passed to drflac_open() and family. */
7158 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7159 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7160 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7161 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7162 drflac_ogg_page_header currentPageHeader;
7163 drflac_uint32 bytesRemainingInPage;
7164 drflac_uint32 pageDataSize;
7165 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7166} drflac_oggbs; /* oggbs = Ogg Bitstream */
7167
7168static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7169{
7170 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7171 oggbs->currentBytePos += bytesActuallyRead;
7172
7173 return bytesActuallyRead;
7174}
7175
7176static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7177{
7178 if (origin == DRFLAC_SEEK_SET) {
7179 if (offset <= 0x7FFFFFFF) {
7180 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, DRFLAC_SEEK_SET)) {
7181 return DRFLAC_FALSE;
7182 }
7183 oggbs->currentBytePos = offset;
7184
7185 return DRFLAC_TRUE;
7186 } else {
7187 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_SET)) {
7188 return DRFLAC_FALSE;
7189 }
7190 oggbs->currentBytePos = 0x7FFFFFFF; /* <-- Only this much has been seeked so far. The relative seek below adds the remainder. */
7191
7192 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, DRFLAC_SEEK_CUR);
7193 }
7194 } else {
7195 while (offset > 0x7FFFFFFF) {
7196 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, DRFLAC_SEEK_CUR)) {
7197 return DRFLAC_FALSE;
7198 }
7199 oggbs->currentBytePos += 0x7FFFFFFF;
7200 offset -= 0x7FFFFFFF;
7201 }
7202
7203 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, DRFLAC_SEEK_CUR)) { /* <-- Safe cast thanks to the loop above. */
7204 return DRFLAC_FALSE;
7205 }
7206 oggbs->currentBytePos += offset;
7207
7208 return DRFLAC_TRUE;
7209 }
7210}
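/*
The seek callback takes an int offset, so the function above breaks a 64-bit offset into steps of at most 0x7FFFFFFF bytes. The
sketch below shows that chunking against a simplified callback that only records where it ends up. It is a standalone
illustration and not part of dr_flac: the callback signature is reduced to a relative seek, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Simplified relative-seek callback. Returns nonzero on success.
typedef int (*sketch_seek_proc)(void* pUserData, int offset);

// Example user data: tracks the position and the number of seek calls made.
typedef struct
{
    uint64_t position;
    uint32_t callCount;
} sketch_cursor;

static int sketch_on_seek(void* pUserData, int offset)
{
    sketch_cursor* pCursor = (sketch_cursor*)pUserData;

    pCursor->position  += (uint64_t)offset;
    pCursor->callCount += 1;

    return 1;
}

// Seeks forward by a 64-bit offset using steps that each fit in an int.
static int sketch_seek_forward(sketch_seek_proc onSeek, void* pUserData, uint64_t offset)
{
    assert(onSeek != NULL);

    while (offset > 0x7FFFFFFF) {
        if (!onSeek(pUserData, 0x7FFFFFFF)) {
            return 0;
        }
        offset -= 0x7FFFFFFF;
    }

    return onSeek(pUserData, (int)offset);  // Safe cast thanks to the loop above.
}
```

Note that any position bookkeeping has to advance by the size of each step actually taken, not by the full requested offset.
*/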
7211
7212static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7213{
7214 drflac_ogg_page_header header;
7215 for (;;) {
7216 drflac_uint32 crc32 = 0;
7217 drflac_uint32 bytesRead;
7218 drflac_uint32 pageBodySize;
7219#ifndef DR_FLAC_NO_CRC
7220 drflac_uint32 actualCRC32;
7221#endif
7222
7223 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7224 return DRFLAC_FALSE;
7225 }
7226 oggbs->currentBytePos += bytesRead;
7227
7228 pageBodySize = drflac_ogg__get_page_body_size(&header);
7229 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7230 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7231 }
7232
7233 if (header.serialNumber != oggbs->serialNumber) {
7234 /* It's not a FLAC page. Skip it. */
7235 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, DRFLAC_SEEK_CUR)) {
7236 return DRFLAC_FALSE;
7237 }
7238 continue;
7239 }
7240
7241
7242 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7243 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7244 return DRFLAC_FALSE;
7245 }
7246 oggbs->pageDataSize = pageBodySize;
7247
7248#ifndef DR_FLAC_NO_CRC
7249 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7250 if (actualCRC32 != header.checksum) {
7251 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7252 continue; /* CRC mismatch. Skip this page. */
7253 } else {
7254 /*
7255 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7256 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7257 seek did not fully complete.
7258 */
7259 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7260 return DRFLAC_FALSE;
7261 }
7262 }
7263#else
7264 (void)recoveryMethod; /* <-- Silence a warning. */
7265#endif
7266
7267 oggbs->currentPageHeader = header;
7268 oggbs->bytesRemainingInPage = pageBodySize;
7269 return DRFLAC_TRUE;
7270 }
7271}
7272
7273/* The functions below are unused at the moment, but I might be re-adding them later. */
7274#if 0
7275static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7276{
7277 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7278 drflac_uint8 iSeg = 0;
7279 drflac_uint32 iByte = 0;
7280 while (iByte < bytesConsumedInPage) {
7281 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7282 if (iByte + segmentSize > bytesConsumedInPage) {
7283 break;
7284 } else {
7285 iSeg += 1;
7286 iByte += segmentSize;
7287 }
7288 }
7289
7290 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7291 return iSeg;
7292}
7293
7294static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7295{
7296 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7297 for (;;) {
7298 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7299
7300 drflac_uint8 bytesRemainingInSeg;
7301 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7302
7303 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7304 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7305 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7306 if (segmentSize < 255) {
7307 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7308 atEndOfPage = DRFLAC_TRUE;
7309 }
7310
7311 break;
7312 }
7313
7314 bytesToEndOfPacketOrPage += segmentSize;
7315 }
7316
7317 /*
7318 At this point we will have found either the packet or the end of the page. If we're at the end of the page we'll
7319 want to load the next page and keep searching for the end of the packet.
7320 */
7321 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, DRFLAC_SEEK_CUR);
7322 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7323
7324 if (atEndOfPage) {
7325 /*
7326 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7327 straddle pages.
7328 */
7329 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7330 return DRFLAC_FALSE;
7331 }
7332
7333 /* If it's a fresh packet it most likely means we're at the next packet. */
7334 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7335 return DRFLAC_TRUE;
7336 }
7337 } else {
7338 /* We're at the next packet. */
7339 return DRFLAC_TRUE;
7340 }
7341 }
7342}
7343
7344static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7345{
7346 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7347
7348 /* What we're actually doing here is seeking to the start of the next packet. */
7349 return drflac_oggbs__seek_to_next_packet(oggbs);
7350}
7351#endif
7352
7353static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7354{
7355 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7356 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7357 size_t bytesRead = 0;
7358
7359 DRFLAC_ASSERT(oggbs != NULL);
7360 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7361
7362 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7363 while (bytesRead < bytesToRead) {
7364 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7365
7366 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7367 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7368 bytesRead += bytesRemainingToRead;
7369 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7370 break;
7371 }
7372
7373 /* If we get here it means some of the requested data is contained in the next pages. */
7374 if (oggbs->bytesRemainingInPage > 0) {
7375 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7376 bytesRead += oggbs->bytesRemainingInPage;
7377 pRunningBufferOut += oggbs->bytesRemainingInPage;
7378 oggbs->bytesRemainingInPage = 0;
7379 }
7380
7381 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7382 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7383 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7384 }
7385 }
7386
7387 return bytesRead;
7388}
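/*
The read callback above serves a request from the current page first and moves on to the next page when the current one runs
out, so a single read can span several pages and may return fewer bytes than requested at the end of the stream. The sketch below
shows the same control flow over a list of in-memory pages. It is a standalone illustration and not part of dr_flac: there is no
Ogg parsing or CRC checking, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    const uint8_t** ppPages;      // One pointer per page body.
    const size_t*   pPageSizes;   // Size in bytes of each page body.
    size_t          pageCount;
    size_t          iPage;        // Index of the page currently being read.
    size_t          offsetInPage; // Bytes already consumed from the current page.
} sketch_paged_reader;

// Copies up to bytesToRead bytes into bufferOut, crossing page boundaries as needed.
static size_t sketch_paged_read(sketch_paged_reader* pReader, void* bufferOut, size_t bytesToRead)
{
    uint8_t* pRunningBufferOut = (uint8_t*)bufferOut;
    size_t bytesRead = 0;

    assert(pReader != NULL);
    assert(bufferOut != NULL);

    while (bytesRead < bytesToRead && pReader->iPage < pReader->pageCount) {
        size_t pageSize        = pReader->pPageSizes[pReader->iPage];
        size_t remainingInPage = pageSize - pReader->offsetInPage;
        size_t remainingToRead = bytesToRead - bytesRead;
        size_t bytesToCopy     = (remainingInPage < remainingToRead) ? remainingInPage : remainingToRead;

        memcpy(pRunningBufferOut + bytesRead, pReader->ppPages[pReader->iPage] + pReader->offsetInPage, bytesToCopy);
        bytesRead             += bytesToCopy;
        pReader->offsetInPage += bytesToCopy;

        // Move to the next page once the current one is exhausted.
        if (pReader->offsetInPage == pageSize) {
            pReader->iPage       += 1;
            pReader->offsetInPage = 0;
        }
    }

    return bytesRead;   // May be less than bytesToRead if the pages ran out.
}
```
*/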

static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
    int bytesSeeked = 0;

    DRFLAC_ASSERT(oggbs != NULL);
    DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */

    /* Seeking is always forward which makes things a lot simpler. */
    if (origin == DRFLAC_SEEK_SET) {
        if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, DRFLAC_SEEK_SET)) {
            return DRFLAC_FALSE;
        }

        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
            return DRFLAC_FALSE;
        }

        return drflac__on_seek_ogg(pUserData, offset, DRFLAC_SEEK_CUR);
    } else if (origin == DRFLAC_SEEK_CUR) {
        while (bytesSeeked < offset) {
            int bytesRemainingToSeek = offset - bytesSeeked;
            DRFLAC_ASSERT(bytesRemainingToSeek >= 0);

            if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
                bytesSeeked += bytesRemainingToSeek;
                (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
                oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
                break;
            }

            /* If we get here it means some of the requested data is contained in the next pages. */
            if (oggbs->bytesRemainingInPage > 0) {
                bytesSeeked += (int)oggbs->bytesRemainingInPage;
                oggbs->bytesRemainingInPage = 0;
            }

            DRFLAC_ASSERT(bytesRemainingToSeek > 0);
            if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
                /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
                return DRFLAC_FALSE;
            }
        }
    } else if (origin == DRFLAC_SEEK_END) {
        /* Seeking to the end is not supported. */
        return DRFLAC_FALSE;
    }

    return DRFLAC_TRUE;
}

static drflac_bool32 drflac__on_tell_ogg(void* pUserData, drflac_int64* pCursor)
{
    /*
    Not implemented for Ogg containers because we don't currently track the byte position of the logical bitstream. To support this, we'll need
    to track the position in drflac__on_read_ogg and drflac__on_seek_ogg.
    */
    (void)pUserData;
    (void)pCursor;
    return DRFLAC_FALSE;
}
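
/*
Example (illustrative only, not part of the library). The comment in drflac__on_tell_ogg() says the logical byte position
would need to be tracked by the read and seek callbacks. One possible shape for that bookkeeping is sketched below. The
`example_logical_cursor` type and its functions are hypothetical, and the sketch assumes each update is applied only
after the corresponding read or seek has succeeded.

```c
#include <stddef.h>

// Hypothetical helper holding the position a tell callback would report.
typedef struct
{
    long long logicalPos;   // Bytes consumed from the logical bitstream, measured from its first byte.
} example_logical_cursor;

// Call after a successful read of bytesRead bytes.
static void example_cursor_on_read(example_logical_cursor* pCursor, size_t bytesRead)
{
    pCursor->logicalPos += (long long)bytesRead;
}

// Call after a successful seek relative to the start of the logical bitstream.
static void example_cursor_on_seek_set(example_logical_cursor* pCursor, int offset)
{
    pCursor->logicalPos = offset;
}

// Call after a successful forward seek relative to the current position.
static void example_cursor_on_seek_cur(example_logical_cursor* pCursor, int offset)
{
    pCursor->logicalPos += offset;
}

// Returns 1 and writes the position on success, 0 if there is nowhere to write it.
static int example_cursor_tell(const example_logical_cursor* pCursor, long long* pPos)
{
    if (pCursor == NULL || pPos == NULL) {
        return 0;
    }

    *pPos = pCursor->logicalPos;
    return 1;
}
```
*/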


static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
    drflac_uint64 originalBytePos;
    drflac_uint64 runningGranulePosition;
    drflac_uint64 runningFrameBytePos;
    drflac_uint64 runningPCMFrameCount;

    DRFLAC_ASSERT(oggbs != NULL);

    originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */

    /* First seek to the first frame. */
    if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
        return DRFLAC_FALSE;
    }
    oggbs->bytesRemainingInPage = 0;

    runningGranulePosition = 0;
    for (;;) {
        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
            drflac_oggbs__seek_physical(oggbs, originalBytePos, DRFLAC_SEEK_SET);
            return DRFLAC_FALSE; /* Never did find that sample... */
        }

        runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
        if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
            break; /* The sample is somewhere in the previous page. */
        }

        /*
        At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
        disregard any pages that do not begin a fresh packet.
        */
        if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
            if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
                drflac_uint8 firstBytesInPage[2];
                firstBytesInPage[0] = oggbs->pageData[0];
                firstBytesInPage[1] = oggbs->pageData[1];

                if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
                    runningGranulePosition = oggbs->currentPageHeader.granulePosition;
                }

                continue;
            }
        }
    }

    /*
    We found the page that is closest to the sample, so now we need to find it. The first thing to do is seek to the
    start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
    a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
    we find the one containing the target sample.
    */
    if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, DRFLAC_SEEK_SET)) {
        return DRFLAC_FALSE;
    }
    if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
        return DRFLAC_FALSE;
    }

    /*
    At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
    looping over these frames until we find the one containing the sample we're after.
    */
    runningPCMFrameCount = runningGranulePosition;
    for (;;) {
        /*
        There are two ways to find the sample and seek past irrelevant frames:
          1) Use the native FLAC decoder.
          2) Use Ogg's framing system.

        Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
        do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
        duplication for the decoding of frame headers.

        Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
        bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
        standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
        the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
        using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
        avoid the use of the drflac_bs object.

        Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
          1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
          2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
          3) Simplicity.
        */
        drflac_uint64 firstPCMFrameInFLACFrame = 0;
        drflac_uint64 lastPCMFrameInFLACFrame = 0;
        drflac_uint64 pcmFrameCountInThisFrame;

        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
            return DRFLAC_FALSE;
        }

        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);

        pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;

        /* If we are seeking to the end of the file and we've just hit it, we're done. */
        if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                pFlac->currentPCMFrame = pcmFrameIndex;
                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
                return DRFLAC_TRUE;
            } else {
                return DRFLAC_FALSE;
            }
        }

        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
            /*
            The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
            it never existed and keep iterating.
            */
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
                drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
                if (pcmFramesToDecode == 0) {
                    return DRFLAC_TRUE;
                }

                pFlac->currentPCMFrame = runningPCMFrameCount;

                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue; /* CRC mismatch. Pretend this frame never existed. */
                } else {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            /*
            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
            frame never existed and leave the running sample count untouched.
            */
            drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                runningPCMFrameCount += pcmFrameCountInThisFrame;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    continue; /* CRC mismatch. Pretend this frame never existed. */
                } else {
                    return DRFLAC_FALSE;
                }
            }
        }
    }
}
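
/*
Example (illustrative only, not part of the library). Two pieces of the seek above are easy to look at on their own: the
two-byte sync code test applied to the start of a page, and the running-count walk that picks the FLAC frame containing
the target PCM frame. The sketch below assumes the per-frame PCM frame counts are already available in an array, which
the real code instead obtains by reading each frame header. All `example_` names are hypothetical.

```c
#include <stddef.h>

// Returns 1 if the two bytes pass the same test the page scan uses: 0xFF followed by a byte whose top six bits are 111110.
static int example_is_flac_frame_sync(const unsigned char* pBytes)
{
    return pBytes[0] == 0xFF && (pBytes[1] & 0xFC) == 0xF8;
}

// pFrameSizes[i] is the number of PCM frames in the i'th FLAC frame. runningPCMFrameCount is the index of the first
// PCM frame in pFrameSizes[0]. On success returns 1, writes which FLAC frame holds pcmFrameIndex and how many PCM
// frames must be skipped inside it.
static int example_locate_pcm_frame(const unsigned int* pFrameSizes, size_t frameCount, unsigned long long runningPCMFrameCount, unsigned long long pcmFrameIndex, size_t* pFrameIndex, unsigned int* pFramesToSkip)
{
    size_t i;

    if (pcmFrameIndex < runningPCMFrameCount) {
        return 0;   // The target is before the starting point. This walk only moves forward.
    }

    for (i = 0; i < frameCount; i += 1) {
        if (pcmFrameIndex < runningPCMFrameCount + pFrameSizes[i]) {
            *pFrameIndex   = i;
            *pFramesToSkip = (unsigned int)(pcmFrameIndex - runningPCMFrameCount);
            return 1;
        }

        runningPCMFrameCount += pFrameSizes[i];
    }

    return 0;   // Ran out of frames before reaching the target.
}
```

Unlike the real function, this sketch does not special-case a seek to exactly one past the last PCM frame, and it has no
notion of skipping frames that fail their CRC check.
*/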



static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
{
    drflac_ogg_page_header header;
    drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
    drflac_uint32 bytesRead = 0;

    /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
    (void)relaxed;

    pInit->container = drflac_container_ogg;
    pInit->oggFirstBytePos = 0;

    /*
    We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
    stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
    any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
    */
    if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
        return DRFLAC_FALSE;
    }
    pInit->runningFilePos += bytesRead;

    for (;;) {
        int pageBodySize;

        /* Break if we're past the beginning of stream page. */
        if ((header.headerType & 0x02) == 0) {
            return DRFLAC_FALSE;
        }

        /* Check if it's a FLAC header. */
        pageBodySize = drflac_ogg__get_page_body_size(&header);
        if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
            /* It could be a FLAC page... */
            drflac_uint32 bytesRemainingInPage = pageBodySize;
            drflac_uint8 packetType;

            if (onRead(pUserData, &packetType, 1) != 1) {
                return DRFLAC_FALSE;
            }

            bytesRemainingInPage -= 1;
            if (packetType == 0x7F) {
                /* Increasingly more likely to be a FLAC page... */
                drflac_uint8 sig[4];
                if (onRead(pUserData, sig, 4) != 4) {
                    return DRFLAC_FALSE;
                }

                bytesRemainingInPage -= 4;
                if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
                    /* Almost certainly a FLAC page... */
                    drflac_uint8 mappingVersion[2];
                    if (onRead(pUserData, mappingVersion, 2) != 2) {
                        return DRFLAC_FALSE;
                    }

                    if (mappingVersion[0] != 1) {
                        return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
                    }

                    /*
                    The next 2 bytes are the number of non-audio packets, not including this one. We don't care about this because we're going to
                    be handling it in a generic way based on the serial number and packet types.
                    */
                    if (!onSeek(pUserData, 2, DRFLAC_SEEK_CUR)) {
                        return DRFLAC_FALSE;
                    }

                    /* Expecting the native FLAC signature "fLaC". */
                    if (onRead(pUserData, sig, 4) != 4) {
                        return DRFLAC_FALSE;
                    }

                    if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
                        /* The remaining data in the page should be the STREAMINFO block. */
                        drflac_streaminfo streaminfo;
                        drflac_uint8 isLastBlock;
                        drflac_uint8 blockType;
                        drflac_uint32 blockSize;
                        if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
                            return DRFLAC_FALSE;
                        }

                        if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
                            return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
                        }

                        if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
                            /* Success! */
                            pInit->hasStreamInfoBlock = DRFLAC_TRUE;
                            pInit->sampleRate = streaminfo.sampleRate;
                            pInit->channels = streaminfo.channels;
                            pInit->bitsPerSample = streaminfo.bitsPerSample;
                            pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
                            pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
                            pInit->hasMetadataBlocks = !isLastBlock;

                            if (onMeta) {
                                drflac_metadata metadata;
                                metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
                                metadata.pRawData = NULL;
                                metadata.rawDataSize = 0;
                                metadata.data.streaminfo = streaminfo;
                                onMeta(pUserDataMD, &metadata);
                            }

                            pInit->runningFilePos += pageBodySize;
                            pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
                            pInit->oggSerial = header.serialNumber;
                            pInit->oggBosHeader = header;
                            break;
                        } else {
                            /* Failed to read STREAMINFO block. Aww, so close... */
                            return DRFLAC_FALSE;
                        }
                    } else {
                        /* Invalid file. */
                        return DRFLAC_FALSE;
                    }
                } else {
                    /* Not a FLAC header. Skip it. */
                    if (!onSeek(pUserData, bytesRemainingInPage, DRFLAC_SEEK_CUR)) {
                        return DRFLAC_FALSE;
                    }
                }
            } else {
                /* Not a FLAC header. Seek past the entire page and move on to the next. */
                if (!onSeek(pUserData, bytesRemainingInPage, DRFLAC_SEEK_CUR)) {
                    return DRFLAC_FALSE;
                }
            }
        } else {
            if (!onSeek(pUserData, pageBodySize, DRFLAC_SEEK_CUR)) {
                return DRFLAC_FALSE;
            }
        }

        pInit->runningFilePos += pageBodySize;


        /* Read the header of the next page. */
        if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
            return DRFLAC_FALSE;
        }
        pInit->runningFilePos += bytesRead;
    }

    /*
    If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
    packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
    Ogg bitstream object.
    */
    pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
    return DRFLAC_TRUE;
}
#endif
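
/*
Example (illustrative only, not part of the library). The checks that drflac__init_private__ogg() performs on the
51-byte header packet can be sketched against an in-memory buffer as something like the following. The function name is
hypothetical, and the sketch assumes the whole packet has already been read into memory, whereas the real code reads it
piecewise through onRead and onSeek. The layout is 1 byte packet type, 4 bytes "FLAC", 2 bytes mapping version, 2 bytes
header packet count, 4 bytes "fLaC", a 4-byte metadata block header and the 34-byte STREAMINFO body, which adds up to 51.

```c
#include <stddef.h>
#include <string.h>

// Returns 1 if pPacket looks like the first packet of an Ogg encapsulated FLAC stream, 0 otherwise.
static int example_is_ogg_flac_header_packet(const unsigned char* pPacket, size_t packetSize)
{
    unsigned long blockSize;

    if (pPacket == NULL || packetSize != 51) {
        return 0;
    }
    if (pPacket[0] != 0x7F) {
        return 0;   // Wrong packet type.
    }
    if (memcmp(pPacket + 1, "FLAC", 4) != 0) {
        return 0;   // Missing the mapping signature.
    }
    if (pPacket[5] != 1) {
        return 0;   // Only version 1.x of the mapping is accepted. The minor version in pPacket[6] is ignored.
    }

    // pPacket[7] and pPacket[8] hold the header packet count, which is skipped here just like in the code above.

    if (memcmp(pPacket + 9, "fLaC", 4) != 0) {
        return 0;   // Missing the native FLAC signature.
    }

    // Metadata block header: 1 bit "is last" flag, 7 bits block type, 24 bits big-endian block size.
    if ((pPacket[13] & 0x7F) != 0) {
        return 0;   // The first block must be STREAMINFO, which is block type 0.
    }

    blockSize = ((unsigned long)pPacket[14] << 16) | ((unsigned long)pPacket[15] << 8) | (unsigned long)pPacket[16];
    return blockSize == 34;
}
```

The constant 79 used for oggFirstBytePos is consistent with this layout: a 27-byte Ogg page header plus a one-entry
segment table gives 28 bytes, and 28 + 51 = 79.
*/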

static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
{
    drflac_bool32 relaxed;
    drflac_uint8 id[4];

    if (pInit == NULL || onRead == NULL || onSeek == NULL) { /* <-- onTell is optional. */
        return DRFLAC_FALSE;
    }

    DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
    pInit->onRead = onRead;
    pInit->onSeek = onSeek;
    pInit->onTell = onTell;
    pInit->onMeta = onMeta;
    pInit->container = container;
    pInit->pUserData = pUserData;
    pInit->pUserDataMD = pUserDataMD;

    pInit->bs.onRead = onRead;
    pInit->bs.onSeek = onSeek;
    pInit->bs.onTell = onTell;
    pInit->bs.pUserData = pUserData;
    drflac__reset_cache(&pInit->bs);


    /* If the container is explicitly defined then we can try opening in relaxed mode. */
    relaxed = container != drflac_container_unknown;

    /* Skip over any ID3 tags. */
    for (;;) {
        if (onRead(pUserData, id, 4) != 4) {
            return DRFLAC_FALSE; /* Ran out of data. */
        }
        pInit->runningFilePos += 4;

        if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
            drflac_uint8 header[6];
            drflac_uint8 flags;
            drflac_uint32 headerSize;

            if (onRead(pUserData, header, 6) != 6) {
                return DRFLAC_FALSE; /* Ran out of data. */
            }
            pInit->runningFilePos += 6;

            flags = header[1];

            DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
            headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
            if (flags & 0x10) {
                headerSize += 10;
            }

            if (!onSeek(pUserData, headerSize, DRFLAC_SEEK_CUR)) {
                return DRFLAC_FALSE; /* Failed to seek past the tag. */
            }
            pInit->runningFilePos += headerSize;
        } else {
            break;
        }
    }

    if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
        return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#ifndef DR_FLAC_NO_OGG
    if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
        return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
    }
#endif

    /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
    if (relaxed) {
        if (container == drflac_container_native) {
            return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#ifndef DR_FLAC_NO_OGG
        if (container == drflac_container_ogg) {
            return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
        }
#endif
    }

    /* Unsupported container. */
    return DRFLAC_FALSE;
}
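
/*
Example (illustrative only, not part of the library). The ID3 skip in drflac__init_private() relies on the tag size
being stored as a synchsafe integer, where only the low 7 bits of each of the four bytes carry data. A minimal sketch of
that decoding, working directly on the 10-byte tag header rather than going through the library's own helpers, might
look like this. Both function names are hypothetical.

```c
// Decodes a synchsafe integer: four bytes, seven significant bits each, most significant byte first.
static unsigned int example_unsynchsafe_32(const unsigned char* pBytes)
{
    return ((unsigned int)(pBytes[0] & 0x7F) << 21) |
           ((unsigned int)(pBytes[1] & 0x7F) << 14) |
           ((unsigned int)(pBytes[2] & 0x7F) <<  7) |
           ((unsigned int)(pBytes[3] & 0x7F) <<  0);
}

// pTagHeader points at the 10-byte ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 size bytes.
// Returns how many bytes after those 10 still belong to the tag and need to be skipped.
static unsigned int example_id3v2_bytes_after_header(const unsigned char* pTagHeader)
{
    unsigned int size = example_unsynchsafe_32(pTagHeader + 6);

    if (pTagHeader[5] & 0x10) {
        size += 10; // The footer flag is set, so a 10-byte footer follows the tag body.
    }

    return size;
}
```

For the size bytes 00 00 02 01 this gives (2 << 7) | 1 = 257, or 267 when the footer flag is set.
*/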

static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
{
    DRFLAC_ASSERT(pFlac != NULL);
    DRFLAC_ASSERT(pInit != NULL);

    DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
    pFlac->bs = pInit->bs;
    pFlac->onMeta = pInit->onMeta;
    pFlac->pUserDataMD = pInit->pUserDataMD;
    pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
    pFlac->sampleRate = pInit->sampleRate;
    pFlac->channels = (drflac_uint8)pInit->channels;
    pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
    pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
    pFlac->container = pInit->container;
}


static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac_init_info init;
    drflac_uint32 allocationSize;
    drflac_uint32 wholeSIMDVectorCountPerChannel;
    drflac_uint32 decodedSamplesAllocationSize;
#ifndef DR_FLAC_NO_OGG
    drflac_oggbs* pOggbs = NULL;
#endif
    drflac_uint64 firstFramePos;
    drflac_uint64 seektablePos;
    drflac_uint32 seekpointCount;
    drflac_allocation_callbacks allocationCallbacks;
    drflac* pFlac;

    /* CPU support first. */
    drflac__init_cpu_caps();

    if (!drflac__init_private(&init, onRead, onSeek, onTell, onMeta, container, pUserData, pUserDataMD)) {
        return NULL;
    }

    if (pAllocationCallbacks != NULL) {
        allocationCallbacks = *pAllocationCallbacks;
        if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
            return NULL; /* Invalid allocation callbacks. */
        }
    } else {
        allocationCallbacks.pUserData = NULL;
        allocationCallbacks.onMalloc = drflac__malloc_default;
        allocationCallbacks.onRealloc = drflac__realloc_default;
        allocationCallbacks.onFree = drflac__free_default;
    }


    /*
    The size of the allocation for the drflac object needs to be large enough to fit the following:
      1) The main members of the drflac structure
      2) A block of memory large enough to store the decoded samples of the largest frame in the stream
      3) If the container is Ogg, a drflac_oggbs object

    The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
    the different SIMD instruction sets.
    */
    allocationSize = sizeof(drflac);

    /*
    The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
    we are supporting.
    */
    if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
    } else {
        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
    }

    decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;

    allocationSize += decodedSamplesAllocationSize;
    allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */

#ifndef DR_FLAC_NO_OGG
    /* There's additional data required for Ogg streams. */
    if (init.container == drflac_container_ogg) {
        allocationSize += sizeof(drflac_oggbs);

        pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
        if (pOggbs == NULL) {
            return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
        }

        DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
        pOggbs->onRead = onRead;
        pOggbs->onSeek = onSeek;
        pOggbs->onTell = onTell;
        pOggbs->pUserData = pUserData;
        pOggbs->currentBytePos = init.oggFirstBytePos;
        pOggbs->firstBytePos = init.oggFirstBytePos;
        pOggbs->serialNumber = init.oggSerial;
        pOggbs->bosPageHeader = init.oggBosHeader;
        pOggbs->bytesRemainingInPage = 0;
    }
#endif

    /*
    This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
    consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
    and decoding the metadata.
    */
    firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
    seektablePos = 0;
    seekpointCount = 0;
    if (init.hasMetadataBlocks) {
        drflac_read_proc onReadOverride = onRead;
        drflac_seek_proc onSeekOverride = onSeek;
        drflac_tell_proc onTellOverride = onTell;
        void* pUserDataOverride = pUserData;

#ifndef DR_FLAC_NO_OGG
        if (init.container == drflac_container_ogg) {
            onReadOverride = drflac__on_read_ogg;
            onSeekOverride = drflac__on_seek_ogg;
            onTellOverride = drflac__on_tell_ogg;
            pUserDataOverride = (void*)pOggbs;
        }
#endif

        if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onTellOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
        #ifndef DR_FLAC_NO_OGG
            drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
        #endif
            return NULL;
        }

        allocationSize += seekpointCount * sizeof(drflac_seekpoint);
    }


    pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
    if (pFlac == NULL) {
    #ifndef DR_FLAC_NO_OGG
        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
    #endif
        return NULL;
    }

    drflac__init_from_info(pFlac, &init);
    pFlac->allocationCallbacks = allocationCallbacks;
    pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);

#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg) {
        drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
        DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));

        /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
        pOggbs = NULL;

        /* The Ogg bitstream needs to be layered on top of the original bitstream. */
        pFlac->bs.onRead = drflac__on_read_ogg;
        pFlac->bs.onSeek = drflac__on_seek_ogg;
        pFlac->bs.onTell = drflac__on_tell_ogg;
        pFlac->bs.pUserData = (void*)pInternalOggbs;
        pFlac->_oggbs = (void*)pInternalOggbs;
    }
#endif

    pFlac->firstFLACFramePosInBytes = firstFramePos;

    /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
#ifndef DR_FLAC_NO_OGG
    if (init.container == drflac_container_ogg)
    {
        pFlac->pSeekpoints = NULL;
        pFlac->seekpointCount = 0;
    }
    else
#endif
    {
        /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
        if (seektablePos != 0) {
            pFlac->seekpointCount = seekpointCount;
            pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);

            DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
            DRFLAC_ASSERT(pFlac->bs.onRead != NULL);

            /* Seek to the seektable, then just read directly into our seektable buffer. */
            if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, DRFLAC_SEEK_SET)) {
                drflac_uint32 iSeekpoint;

                for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
                    if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
                        /* Endian swap. */
                        pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
                        pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
                        pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
                    } else {
                        /* Failed to read the seektable. Pretend we don't have one. */
                        pFlac->pSeekpoints = NULL;
                        pFlac->seekpointCount = 0;
                        break;
                    }
                }

                /* We need to seek back to where we were. If this fails it's a critical error. */
                if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, DRFLAC_SEEK_SET)) {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            } else {
                /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
                pFlac->pSeekpoints = NULL;
                pFlac->seekpointCount = 0;
            }
        }
    }


    /*
    If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
    the first frame.
    */
    if (!init.hasStreamInfoBlock) {
        pFlac->currentFLACFrame.header = init.firstFrameHeader;
        for (;;) {
            drflac_result result = drflac__decode_flac_frame(pFlac);
            if (result == DRFLAC_SUCCESS) {
                break;
            } else {
                if (result == DRFLAC_CRC_MISMATCH) {
                    if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
                        drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                        return NULL;
                    }
                    continue;
                } else {
                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
                    return NULL;
                }
            }
        }
    }

    return pFlac;
}
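
/*
Example (illustrative only, not part of the library). The sizing of the decoded sample buffer in
drflac_open_with_metadata_private() rounds each channel up to a whole number of SIMD vectors and then reserves one
extra vector's worth of bytes so the start of the buffer can be aligned. A rough sketch of that arithmetic is below. The
`example_` names are hypothetical, and the vector size of 64 bytes is an assumption made for the sake of the example,
not a statement about how DRFLAC_MAX_SIMD_VECTOR_SIZE is defined.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_SIMD_VECTOR_SIZE 64     // Assumed size in bytes of the widest SIMD vector.

// Bytes needed to hold one decoded FLAC frame, with every channel padded to a whole number of SIMD vectors.
static size_t example_decoded_samples_size(size_t maxBlockSizeInPCMFrames, size_t channels)
{
    size_t samplesPerVector = EXAMPLE_MAX_SIMD_VECTOR_SIZE / sizeof(int32_t);
    size_t wholeVectorCountPerChannel = (maxBlockSizeInPCMFrames + samplesPerVector - 1) / samplesPerVector;

    return wholeVectorCountPerChannel * EXAMPLE_MAX_SIMD_VECTOR_SIZE * channels;
}

// Rounds x up to the next multiple of a. Assumes a is greater than zero.
static size_t example_align_up(size_t x, size_t a)
{
    return ((x + a - 1) / a) * a;
}
```

With these assumptions a 4096-frame stereo block needs 256 vectors per channel, or 32768 bytes, while a 4097-frame mono
block spills into a 257th vector and needs 16448 bytes.
*/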



#ifndef DR_FLAC_NO_STDIO
#include <stdio.h>
#ifndef DR_FLAC_NO_WCHAR
#include <wchar.h>      /* For wcslen(), wcsrtombs() */
#endif

/* Errno */
/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_FLAC_NO_STDIO for now. If something else needs this later we can move it out. */
#include <errno.h>
static drflac_result drflac_result_from_errno(int e)
{
    switch (e)
    {
        case 0: return DRFLAC_SUCCESS;
    #ifdef EPERM
        case EPERM: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef ENOENT
        case ENOENT: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef ESRCH
        case ESRCH: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef EINTR
        case EINTR: return DRFLAC_INTERRUPT;
    #endif
    #ifdef EIO
        case EIO: return DRFLAC_IO_ERROR;
    #endif
    #ifdef ENXIO
        case ENXIO: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef E2BIG
        case E2BIG: return DRFLAC_INVALID_ARGS;
    #endif
    #ifdef ENOEXEC
        case ENOEXEC: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef EBADF
        case EBADF: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ECHILD
        case ECHILD: return DRFLAC_ERROR;
    #endif
    #ifdef EAGAIN
        case EAGAIN: return DRFLAC_UNAVAILABLE;
    #endif
    #ifdef ENOMEM
        case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
    #endif
    #ifdef EACCES
        case EACCES: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef EFAULT
        case EFAULT: return DRFLAC_BAD_ADDRESS;
    #endif
    #ifdef ENOTBLK
        case ENOTBLK: return DRFLAC_ERROR;
    #endif
    #ifdef EBUSY
        case EBUSY: return DRFLAC_BUSY;
    #endif
    #ifdef EEXIST
        case EEXIST: return DRFLAC_ALREADY_EXISTS;
    #endif
    #ifdef EXDEV
        case EXDEV: return DRFLAC_ERROR;
    #endif
    #ifdef ENODEV
        case ENODEV: return DRFLAC_DOES_NOT_EXIST;
    #endif
    #ifdef ENOTDIR
        case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
    #endif
    #ifdef EISDIR
        case EISDIR: return DRFLAC_IS_DIRECTORY;
    #endif
    #ifdef EINVAL
        case EINVAL: return DRFLAC_INVALID_ARGS;
    #endif
    #ifdef ENFILE
        case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
    #endif
    #ifdef EMFILE
        case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
    #endif
    #ifdef ENOTTY
        case ENOTTY: return DRFLAC_INVALID_OPERATION;
    #endif
    #ifdef ETXTBSY
        case ETXTBSY: return DRFLAC_BUSY;
    #endif
    #ifdef EFBIG
        case EFBIG: return DRFLAC_TOO_BIG;
    #endif
    #ifdef ENOSPC
        case ENOSPC: return DRFLAC_NO_SPACE;
    #endif
    #ifdef ESPIPE
        case ESPIPE: return DRFLAC_BAD_SEEK;
    #endif
    #ifdef EROFS
        case EROFS: return DRFLAC_ACCESS_DENIED;
    #endif
    #ifdef EMLINK
        case EMLINK: return DRFLAC_TOO_MANY_LINKS;
    #endif
    #ifdef EPIPE
        case EPIPE: return DRFLAC_BAD_PIPE;
    #endif
    #ifdef EDOM
        case EDOM: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef ERANGE
        case ERANGE: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef EDEADLK
        case EDEADLK: return DRFLAC_DEADLOCK;
    #endif
    #ifdef ENAMETOOLONG
        case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
    #endif
    #ifdef ENOLCK
        case ENOLCK: return DRFLAC_ERROR;
    #endif
    #ifdef ENOSYS
        case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
    #endif
    #if defined(ENOTEMPTY) && ENOTEMPTY != EEXIST /* In AIX, ENOTEMPTY and EEXIST use the same value. */
        case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
    #endif
    #ifdef ELOOP
        case ELOOP: return DRFLAC_TOO_MANY_LINKS;
    #endif
    #ifdef ENOMSG
        case ENOMSG: return DRFLAC_NO_MESSAGE;
    #endif
    #ifdef EIDRM
        case EIDRM: return DRFLAC_ERROR;
    #endif
    #ifdef ECHRNG
        case ECHRNG: return DRFLAC_ERROR;
    #endif
    #ifdef EL2NSYNC
        case EL2NSYNC: return DRFLAC_ERROR;
    #endif
    #ifdef EL3HLT
        case EL3HLT: return DRFLAC_ERROR;
    #endif
    #ifdef EL3RST
        case EL3RST: return DRFLAC_ERROR;
    #endif
    #ifdef ELNRNG
        case ELNRNG: return DRFLAC_OUT_OF_RANGE;
    #endif
    #ifdef EUNATCH
        case EUNATCH: return DRFLAC_ERROR;
    #endif
    #ifdef ENOCSI
        case ENOCSI: return DRFLAC_ERROR;
    #endif
    #ifdef EL2HLT
        case EL2HLT: return DRFLAC_ERROR;
    #endif
    #ifdef EBADE
        case EBADE: return DRFLAC_ERROR;
    #endif
    #ifdef EBADR
        case EBADR: return DRFLAC_ERROR;
    #endif
    #ifdef EXFULL
        case EXFULL: return DRFLAC_ERROR;
    #endif
    #ifdef ENOANO
        case ENOANO: return DRFLAC_ERROR;
    #endif
    #ifdef EBADRQC
        case EBADRQC: return DRFLAC_ERROR;
    #endif
    #ifdef EBADSLT
        case EBADSLT: return DRFLAC_ERROR;
    #endif
    #ifdef EBFONT
        case EBFONT: return DRFLAC_INVALID_FILE;
    #endif
    #ifdef ENOSTR
        case ENOSTR: return DRFLAC_ERROR;
    #endif
    #ifdef ENODATA
        case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
    #endif
    #ifdef ETIME
        case ETIME: return DRFLAC_TIMEOUT;
    #endif
    #ifdef ENOSR
        case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
    #endif
    #ifdef ENONET
        case ENONET: return DRFLAC_NO_NETWORK;
    #endif
    #ifdef ENOPKG
        case ENOPKG: return DRFLAC_ERROR;
    #endif
    #ifdef EREMOTE
        case EREMOTE: return DRFLAC_ERROR;
    #endif
    #ifdef ENOLINK
        case ENOLINK: return DRFLAC_ERROR;
    #endif
    #ifdef EADV
        case EADV: return DRFLAC_ERROR;
    #endif
    #ifdef ESRMNT
        case ESRMNT: return DRFLAC_ERROR;
    #endif
    #ifdef ECOMM
        case ECOMM: return DRFLAC_ERROR;
    #endif
    #ifdef EPROTO
        case EPROTO: return DRFLAC_ERROR;
    #endif
    #ifdef EMULTIHOP
        case EMULTIHOP: return DRFLAC_ERROR;
    #endif
    #ifdef EDOTDOT
8328 case EDOTDOT: return DRFLAC_ERROR;
8329 #endif
8330 #ifdef EBADMSG
8331 case EBADMSG: return DRFLAC_BAD_MESSAGE;
8332 #endif
8333 #ifdef EOVERFLOW
8334 case EOVERFLOW: return DRFLAC_TOO_BIG;
8335 #endif
8336 #ifdef ENOTUNIQ
8337 case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8338 #endif
8339 #ifdef EBADFD
8340 case EBADFD: return DRFLAC_ERROR;
8341 #endif
8342 #ifdef EREMCHG
8343 case EREMCHG: return DRFLAC_ERROR;
8344 #endif
8345 #ifdef ELIBACC
8346 case ELIBACC: return DRFLAC_ACCESS_DENIED;
8347 #endif
8348 #ifdef ELIBBAD
8349 case ELIBBAD: return DRFLAC_INVALID_FILE;
8350 #endif
8351 #ifdef ELIBSCN
8352 case ELIBSCN: return DRFLAC_INVALID_FILE;
8353 #endif
8354 #ifdef ELIBMAX
8355 case ELIBMAX: return DRFLAC_ERROR;
8356 #endif
8357 #ifdef ELIBEXEC
8358 case ELIBEXEC: return DRFLAC_ERROR;
8359 #endif
8360 #ifdef EILSEQ
8361 case EILSEQ: return DRFLAC_INVALID_DATA;
8362 #endif
8363 #ifdef ERESTART
8364 case ERESTART: return DRFLAC_ERROR;
8365 #endif
8366 #ifdef ESTRPIPE
8367 case ESTRPIPE: return DRFLAC_ERROR;
8368 #endif
8369 #ifdef EUSERS
8370 case EUSERS: return DRFLAC_ERROR;
8371 #endif
8372 #ifdef ENOTSOCK
8373 case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8374 #endif
8375 #ifdef EDESTADDRREQ
8376 case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8377 #endif
8378 #ifdef EMSGSIZE
8379 case EMSGSIZE: return DRFLAC_TOO_BIG;
8380 #endif
8381 #ifdef EPROTOTYPE
8382 case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8383 #endif
8384 #ifdef ENOPROTOOPT
8385 case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
8386 #endif
8387 #ifdef EPROTONOSUPPORT
8388 case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
8389 #endif
8390 #ifdef ESOCKTNOSUPPORT
8391 case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
8392 #endif
8393 #ifdef EOPNOTSUPP
8394 case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
8395 #endif
8396 #ifdef EPFNOSUPPORT
8397 case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
8398 #endif
8399 #ifdef EAFNOSUPPORT
8400 case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
8401 #endif
8402 #ifdef EADDRINUSE
8403 case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
8404 #endif
8405 #ifdef EADDRNOTAVAIL
8406 case EADDRNOTAVAIL: return DRFLAC_ERROR;
8407 #endif
8408 #ifdef ENETDOWN
8409 case ENETDOWN: return DRFLAC_NO_NETWORK;
8410 #endif
8411 #ifdef ENETUNREACH
8412 case ENETUNREACH: return DRFLAC_NO_NETWORK;
8413 #endif
8414 #ifdef ENETRESET
8415 case ENETRESET: return DRFLAC_NO_NETWORK;
8416 #endif
8417 #ifdef ECONNABORTED
8418 case ECONNABORTED: return DRFLAC_NO_NETWORK;
8419 #endif
8420 #ifdef ECONNRESET
8421 case ECONNRESET: return DRFLAC_CONNECTION_RESET;
8422 #endif
8423 #ifdef ENOBUFS
8424 case ENOBUFS: return DRFLAC_NO_SPACE;
8425 #endif
8426 #ifdef EISCONN
8427 case EISCONN: return DRFLAC_ALREADY_CONNECTED;
8428 #endif
8429 #ifdef ENOTCONN
8430 case ENOTCONN: return DRFLAC_NOT_CONNECTED;
8431 #endif
8432 #ifdef ESHUTDOWN
8433 case ESHUTDOWN: return DRFLAC_ERROR;
8434 #endif
8435 #ifdef ETOOMANYREFS
8436 case ETOOMANYREFS: return DRFLAC_ERROR;
8437 #endif
8438 #ifdef ETIMEDOUT
8439 case ETIMEDOUT: return DRFLAC_TIMEOUT;
8440 #endif
8441 #ifdef ECONNREFUSED
8442 case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
8443 #endif
8444 #ifdef EHOSTDOWN
8445 case EHOSTDOWN: return DRFLAC_NO_HOST;
8446 #endif
8447 #ifdef EHOSTUNREACH
8448 case EHOSTUNREACH: return DRFLAC_NO_HOST;
8449 #endif
8450 #ifdef EALREADY
8451 case EALREADY: return DRFLAC_IN_PROGRESS;
8452 #endif
8453 #ifdef EINPROGRESS
8454 case EINPROGRESS: return DRFLAC_IN_PROGRESS;
8455 #endif
8456 #ifdef ESTALE
8457 case ESTALE: return DRFLAC_INVALID_FILE;
8458 #endif
8459 #ifdef EUCLEAN
8460 case EUCLEAN: return DRFLAC_ERROR;
8461 #endif
8462 #ifdef ENOTNAM
8463 case ENOTNAM: return DRFLAC_ERROR;
8464 #endif
8465 #ifdef ENAVAIL
8466 case ENAVAIL: return DRFLAC_ERROR;
8467 #endif
8468 #ifdef EISNAM
8469 case EISNAM: return DRFLAC_ERROR;
8470 #endif
8471 #ifdef EREMOTEIO
8472 case EREMOTEIO: return DRFLAC_IO_ERROR;
8473 #endif
8474 #ifdef EDQUOT
8475 case EDQUOT: return DRFLAC_NO_SPACE;
8476 #endif
8477 #ifdef ENOMEDIUM
8478 case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
8479 #endif
8480 #ifdef EMEDIUMTYPE
8481 case EMEDIUMTYPE: return DRFLAC_ERROR;
8482 #endif
8483 #ifdef ECANCELED
8484 case ECANCELED: return DRFLAC_CANCELLED;
8485 #endif
8486 #ifdef ENOKEY
8487 case ENOKEY: return DRFLAC_ERROR;
8488 #endif
8489 #ifdef EKEYEXPIRED
8490 case EKEYEXPIRED: return DRFLAC_ERROR;
8491 #endif
8492 #ifdef EKEYREVOKED
8493 case EKEYREVOKED: return DRFLAC_ERROR;
8494 #endif
8495 #ifdef EKEYREJECTED
8496 case EKEYREJECTED: return DRFLAC_ERROR;
8497 #endif
8498 #ifdef EOWNERDEAD
8499 case EOWNERDEAD: return DRFLAC_ERROR;
8500 #endif
8501 #ifdef ENOTRECOVERABLE
8502 case ENOTRECOVERABLE: return DRFLAC_ERROR;
8503 #endif
8504 #ifdef ERFKILL
8505 case ERFKILL: return DRFLAC_ERROR;
8506 #endif
8507 #ifdef EHWPOISON
8508 case EHWPOISON: return DRFLAC_ERROR;
8509 #endif
8510 default: return DRFLAC_ERROR;
8511 }
8512}
8513/* End Errno */
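/*
The errno translation above can be illustrated with a much smaller sketch. Everything in it is hypothetical and for
illustration only: `my_result`, its enumerators and `my_result_from_errno()` are not part of dr_flac, and mapping 0 to
success is an assumption of this sketch. It shows the same two ideas: each errno constant is wrapped in #ifdef because
not every platform defines all of them, and anything unrecognized collapses to a generic error.

```c
#include <assert.h>
#include <errno.h>

// Hypothetical result codes standing in for the DRFLAC_* values.
typedef enum {
    MY_SUCCESS        =  0,
    MY_ERROR          = -1,
    MY_INVALID_ARGS   = -2,
    MY_DOES_NOT_EXIST = -3,
    MY_OUT_OF_RANGE   = -4
} my_result;

// Translates an errno value into a my_result. Constants that a platform does
// not define are simply compiled out and fall through to the default case.
static my_result my_result_from_errno(int e)
{
    switch (e) {
        case 0: return MY_SUCCESS;   // Assumption of this sketch: 0 means "no error".
    #ifdef ENODEV
        case ENODEV: return MY_DOES_NOT_EXIST;
    #endif
    #ifdef EINVAL
        case EINVAL: return MY_INVALID_ARGS;
    #endif
    #ifdef EDOM
        case EDOM: return MY_OUT_OF_RANGE;
    #endif
    #ifdef ERANGE
        case ERANGE: return MY_OUT_OF_RANGE;
    #endif
        default: return MY_ERROR;
    }
}
```

Several errno values may share one result code, as EDOM and ERANGE do here, so the mapping is lossy by design.
*/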
8514
8515/* fopen */
8516static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
8517{
8518#if defined(_MSC_VER) && _MSC_VER >= 1400
8519 errno_t err;
8520#endif
8521
8522 if (ppFile != NULL) {
8523 *ppFile = NULL; /* Safety. */
8524 }
8525
8526 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8527 return DRFLAC_INVALID_ARGS;
8528 }
8529
8530#if defined(_MSC_VER) && _MSC_VER >= 1400
8531 err = fopen_s(ppFile, pFilePath, pOpenMode);
8532 if (err != 0) {
8533 return drflac_result_from_errno(err);
8534 }
8535#else
8536#if defined(_WIN32) || defined(__APPLE__)
8537 *ppFile = fopen(pFilePath, pOpenMode);
8538#else
8539 #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
8540 *ppFile = fopen64(pFilePath, pOpenMode);
8541 #else
8542 *ppFile = fopen(pFilePath, pOpenMode);
8543 #endif
8544#endif
8545 if (*ppFile == NULL) {
8546 drflac_result result = drflac_result_from_errno(errno);
        if (result == DRFLAC_SUCCESS) {
            result = DRFLAC_ERROR;   /* Safety check so that success is never returned when *ppFile is NULL. */
        }
8550
8551 return result;
8552 }
8553#endif
8554
8555 return DRFLAC_SUCCESS;
8556}
8557
/*
_wfopen() is not available in every compilation environment.

    * Windows only.
    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
    * MinGW-64 (both 32- and 64-bit) seems to support it.
    * MinGW wraps it in !defined(__STRICT_ANSI__).
    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).

This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
fallback, so if you notice your compiler is not being detected properly, I'm happy to look at adding support.
*/
8570#if defined(_WIN32)
8571 #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
8572 #define DRFLAC_HAS_WFOPEN
8573 #endif
8574#endif
8575
8576#ifndef DR_FLAC_NO_WCHAR
8577static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
8578{
8579 if (ppFile != NULL) {
8580 *ppFile = NULL; /* Safety. */
8581 }
8582
8583 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8584 return DRFLAC_INVALID_ARGS;
8585 }
8586
8587#if defined(DRFLAC_HAS_WFOPEN)
8588 {
8589 /* Use _wfopen() on Windows. */
8590 #if defined(_MSC_VER) && _MSC_VER >= 1400
8591 errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
8592 if (err != 0) {
8593 return drflac_result_from_errno(err);
8594 }
8595 #else
8596 *ppFile = _wfopen(pFilePath, pOpenMode);
8597 if (*ppFile == NULL) {
8598 return drflac_result_from_errno(errno);
8599 }
8600 #endif
8601 (void)pAllocationCallbacks;
8602 }
8603#else
    /*
    Use fopen() on anything other than Windows. This requires a conversion, which is annoying because
    fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
    that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
    maintaining state. I've checked this with -std=c89 and it works, but if somebody gets a compiler
    error I'll look into improving compatibility.
    */

    /*
    Some compilers don't support wchar_t or wcsrtombs(), which are used below. In that case we just
    need to abort with an error. If you encounter a compiler lacking such support, add it to this list
    and submit a bug report so it can be added to the library upstream.
    */
8617 #if defined(__DJGPP__)
8618 {
8619 /* Nothing to do here. This will fall through to the error check below. */
8620 }
8621 #else
8622 {
8623 mbstate_t mbs;
8624 size_t lenMB;
8625 const wchar_t* pFilePathTemp = pFilePath;
8626 char* pFilePathMB = NULL;
8627 char pOpenModeMB[32] = {0};
8628
8629 /* Get the length first. */
8630 DRFLAC_ZERO_OBJECT(&mbs);
8631 lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
8632 if (lenMB == (size_t)-1) {
8633 return drflac_result_from_errno(errno);
8634 }
8635
8636 pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
8637 if (pFilePathMB == NULL) {
8638 return DRFLAC_OUT_OF_MEMORY;
8639 }
8640
8641 pFilePathTemp = pFilePath;
8642 DRFLAC_ZERO_OBJECT(&mbs);
8643 wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
8644
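        /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
        {
            size_t i = 0;
            for (;;) {
                if (pOpenMode[i] == 0) {
                    pOpenModeMB[i] = '\0';
                    break;
                }

                if (i >= sizeof(pOpenModeMB) - 1) {
                    /* The mode string does not fit in pOpenModeMB. Bail out rather than overflow the buffer. */
                    drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
                    return DRFLAC_INVALID_ARGS;
                }

                pOpenModeMB[i] = (char)pOpenMode[i];
                i += 1;
            }
        }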
8658
8659 *ppFile = fopen(pFilePathMB, pOpenModeMB);
8660
8661 drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
8662 }
8663 #endif
8664
8665 if (*ppFile == NULL) {
8666 return DRFLAC_ERROR;
8667 }
8668#endif
8669
8670 return DRFLAC_SUCCESS;
8671}
8672#endif
8673/* End fopen */
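/*
The non-Windows fallback in drflac_wfopen() converts the wide path in two passes with wcsrtombs(). The sketch below
isolates that technique using plain malloc() and free() instead of the allocation callbacks. The name
`wide_to_multibyte()` is hypothetical and not part of dr_flac, and unlike the code above this sketch also checks the
result of the second wcsrtombs() call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Converts a wide string to a newly malloc()'d multibyte string using the
// current locale. Returns NULL on conversion or allocation failure. The
// caller releases the result with free().
static char* wide_to_multibyte(const wchar_t* pWide)
{
    mbstate_t state;
    const wchar_t* pSrc = pWide;
    size_t len;
    char* pOut;

    assert(pWide != NULL);

    // Pass 1: a NULL destination only computes the required length, not
    // counting the null terminator.
    memset(&state, 0, sizeof(state));
    len = wcsrtombs(NULL, &pSrc, 0, &state);
    if (len == (size_t)-1) {
        return NULL;    // A character cannot be represented in this locale.
    }

    pOut = (char*)malloc(len + 1);
    if (pOut == NULL) {
        return NULL;
    }

    // Pass 2: reset the source pointer and the conversion state, then convert
    // into the buffer, leaving room for the terminator.
    pSrc = pWide;
    memset(&state, 0, sizeof(state));
    if (wcsrtombs(pOut, &pSrc, len + 1, &state) == (size_t)-1) {
        free(pOut);
        return NULL;
    }

    return pOut;
}
```

Because the conversion depends on the current locale, a path containing non-ASCII characters may fail to convert in
the default "C" locale.
*/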
8674
8675static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
8676{
8677 return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
8678}
8679
8680static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
8681{
8682 int whence = SEEK_SET;
8683 if (origin == DRFLAC_SEEK_CUR) {
8684 whence = SEEK_CUR;
8685 } else if (origin == DRFLAC_SEEK_END) {
8686 whence = SEEK_END;
8687 }
8688
8689 return fseek((FILE*)pUserData, offset, whence) == 0;
8690}
8691
8692static drflac_bool32 drflac__on_tell_stdio(void* pUserData, drflac_int64* pCursor)
8693{
8694 FILE* pFileStdio = (FILE*)pUserData;
8695 drflac_int64 result;
8696
8697 /* These were all validated at a higher level. */
8698 DRFLAC_ASSERT(pFileStdio != NULL);
8699 DRFLAC_ASSERT(pCursor != NULL);
8700
8701#if defined(_WIN32)
8702 #if defined(_MSC_VER) && _MSC_VER > 1200
8703 result = _ftelli64(pFileStdio);
8704 #else
8705 result = ftell(pFileStdio);
8706 #endif
8707#else
8708 result = ftell(pFileStdio);
8709#endif
8710
8711 *pCursor = result;
8712
8713 return DRFLAC_TRUE;
8714}
8715
8716
8717
8718DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8719{
8720 drflac* pFlac;
8721 FILE* pFile;
8722
8723 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8724 return NULL;
8725 }
8726
8727 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, (void*)pFile, pAllocationCallbacks);
8728 if (pFlac == NULL) {
8729 fclose(pFile);
8730 return NULL;
8731 }
8732
8733 return pFlac;
8734}
8735
8736#ifndef DR_FLAC_NO_WCHAR
8737DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8738{
8739 drflac* pFlac;
8740 FILE* pFile;
8741
8742 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8743 return NULL;
8744 }
8745
8746 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, (void*)pFile, pAllocationCallbacks);
8747 if (pFlac == NULL) {
8748 fclose(pFile);
8749 return NULL;
8750 }
8751
8752 return pFlac;
8753}
8754#endif
8755
8756DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8757{
8758 drflac* pFlac;
8759 FILE* pFile;
8760
8761 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8762 return NULL;
8763 }
8764
8765 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8766 if (pFlac == NULL) {
8767 fclose(pFile);
        return NULL;
8769 }
8770
8771 return pFlac;
8772}
8773
8774#ifndef DR_FLAC_NO_WCHAR
8775DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8776{
8777 drflac* pFlac;
8778 FILE* pFile;
8779
8780 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8781 return NULL;
8782 }
8783
8784 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, drflac__on_tell_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8785 if (pFlac == NULL) {
8786 fclose(pFile);
        return NULL;
8788 }
8789
8790 return pFlac;
8791}
8792#endif
8793#endif /* DR_FLAC_NO_STDIO */
8794
8795static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
8796{
8797 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8798 size_t bytesRemaining;
8799
8800 DRFLAC_ASSERT(memoryStream != NULL);
8801 DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
8802
8803 bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
8804 if (bytesToRead > bytesRemaining) {
8805 bytesToRead = bytesRemaining;
8806 }
8807
8808 if (bytesToRead > 0) {
8809 DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
8810 memoryStream->currentReadPos += bytesToRead;
8811 }
8812
8813 return bytesToRead;
8814}
8815
8816static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
8817{
8818 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8819 drflac_int64 newCursor;
8820
8821 DRFLAC_ASSERT(memoryStream != NULL);
8822
8823 newCursor = memoryStream->currentReadPos;
8824
8825 if (origin == DRFLAC_SEEK_SET) {
8826 newCursor = 0;
8827 } else if (origin == DRFLAC_SEEK_CUR) {
8828 newCursor = (drflac_int64)memoryStream->currentReadPos;
8829 } else if (origin == DRFLAC_SEEK_END) {
8830 newCursor = (drflac_int64)memoryStream->dataSize;
8831 } else {
8832 DRFLAC_ASSERT(!"Invalid seek origin");
8833 return DRFLAC_FALSE;
8834 }
8835
8836 newCursor += offset;
8837
8838 if (newCursor < 0) {
8839 return DRFLAC_FALSE; /* Trying to seek prior to the start of the buffer. */
8840 }
8841 if ((size_t)newCursor > memoryStream->dataSize) {
8842 return DRFLAC_FALSE; /* Trying to seek beyond the end of the buffer. */
8843 }
8844
8845 memoryStream->currentReadPos = (size_t)newCursor;
8846
8847 return DRFLAC_TRUE;
8848}
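/*
The memory read and seek callbacks above boil down to a cursor over a fixed buffer: reads are clamped to what is left,
and seeks are rejected when the target falls outside the buffer. The following self-contained sketch shows that
behaviour in isolation. `mem_stream`, `mem_read()` and `mem_seek()` are hypothetical names and not part of dr_flac.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { MEM_SEEK_SET, MEM_SEEK_CUR, MEM_SEEK_END } mem_seek_origin;

typedef struct {
    const unsigned char* data;
    size_t dataSize;
    size_t cursor;
} mem_stream;

// Copies up to bytesToRead bytes, clamped to what is left in the buffer, and
// returns the number of bytes actually copied.
static size_t mem_read(mem_stream* s, void* pOut, size_t bytesToRead)
{
    size_t remaining;

    assert(s != NULL && s->cursor <= s->dataSize);

    remaining = s->dataSize - s->cursor;
    if (bytesToRead > remaining) {
        bytesToRead = remaining;
    }

    if (bytesToRead > 0) {
        memcpy(pOut, s->data + s->cursor, bytesToRead);
        s->cursor += bytesToRead;
    }

    return bytesToRead;
}

// Returns 1 on success, or 0 if the target lies outside [0, dataSize]. The
// cursor is left untouched on failure.
static int mem_seek(mem_stream* s, int offset, mem_seek_origin origin)
{
    long long target;

    assert(s != NULL);

    switch (origin) {
        case MEM_SEEK_SET: target = 0; break;
        case MEM_SEEK_CUR: target = (long long)s->cursor; break;
        case MEM_SEEK_END: target = (long long)s->dataSize; break;
        default: return 0;
    }

    target += offset;
    if (target < 0 || (unsigned long long)target > s->dataSize) {
        return 0;
    }

    s->cursor = (size_t)target;
    return 1;
}
```

Seeking to exactly dataSize is allowed. It places the cursor at the end of the buffer, after which reads return 0.
*/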
8849
8850static drflac_bool32 drflac__on_tell_memory(void* pUserData, drflac_int64* pCursor)
8851{
8852 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8853
8854 DRFLAC_ASSERT(memoryStream != NULL);
8855 DRFLAC_ASSERT(pCursor != NULL);
8856
8857 *pCursor = (drflac_int64)memoryStream->currentReadPos;
8858 return DRFLAC_TRUE;
8859}
8860
8861DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8862{
8863 drflac__memory_stream memoryStream;
8864 drflac* pFlac;
8865
8866 memoryStream.data = (const drflac_uint8*)pData;
8867 memoryStream.dataSize = dataSize;
8868 memoryStream.currentReadPos = 0;
8869 pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, drflac__on_tell_memory, &memoryStream, pAllocationCallbacks);
8870 if (pFlac == NULL) {
8871 return NULL;
8872 }
8873
8874 pFlac->memoryStream = memoryStream;
8875
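    /* The stream was opened through a pointer to the stack-local memoryStream, so re-point the user data at the copy now stored in pFlac. This is an awful hack... */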
8877#ifndef DR_FLAC_NO_OGG
8878 if (pFlac->container == drflac_container_ogg)
8879 {
8880 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8881 oggbs->pUserData = &pFlac->memoryStream;
8882 }
8883 else
8884#endif
8885 {
8886 pFlac->bs.pUserData = &pFlac->memoryStream;
8887 }
8888
8889 return pFlac;
8890}
8891
8892DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8893{
8894 drflac__memory_stream memoryStream;
8895 drflac* pFlac;
8896
8897 memoryStream.data = (const drflac_uint8*)pData;
8898 memoryStream.dataSize = dataSize;
8899 memoryStream.currentReadPos = 0;
8900 pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, drflac__on_tell_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8901 if (pFlac == NULL) {
8902 return NULL;
8903 }
8904
8905 pFlac->memoryStream = memoryStream;
8906
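    /* The stream was opened through a pointer to the stack-local memoryStream, so re-point the user data at the copy now stored in pFlac. This is an awful hack... */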
8908#ifndef DR_FLAC_NO_OGG
8909 if (pFlac->container == drflac_container_ogg)
8910 {
8911 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8912 oggbs->pUserData = &pFlac->memoryStream;
8913 }
8914 else
8915#endif
8916 {
8917 pFlac->bs.pUserData = &pFlac->memoryStream;
8918 }
8919
8920 return pFlac;
8921}
8922
8923
8924
8925DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8926{
8927 return drflac_open_with_metadata_private(onRead, onSeek, onTell, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8928}
8929DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8930{
8931 return drflac_open_with_metadata_private(onRead, onSeek, onTell, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8932}
8933
8934DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8935{
8936 return drflac_open_with_metadata_private(onRead, onSeek, onTell, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8937}
8938DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8939{
8940 return drflac_open_with_metadata_private(onRead, onSeek, onTell, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
8941}
8942
8943DRFLAC_API void drflac_close(drflac* pFlac)
8944{
8945 if (pFlac == NULL) {
8946 return;
8947 }
8948
8949#ifndef DR_FLAC_NO_STDIO
    /*
    If the decoder was opened with drflac_open_file(), it owns the FILE handle and must close it here. Whether that is
    the case can be determined by checking whether the read callback is the stdio one.
    */
8954 if (pFlac->bs.onRead == drflac__on_read_stdio) {
8955 fclose((FILE*)pFlac->bs.pUserData);
8956 }
8957
8958#ifndef DR_FLAC_NO_OGG
    /* Ogg streams need slightly different cleanup: the main bit stream reads through the Ogg layer, so the stdio handle, if any, lives in the Ogg bit stream. */
8960 if (pFlac->container == drflac_container_ogg) {
8961 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8962 DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
8963
8964 if (oggbs->onRead == drflac__on_read_stdio) {
8965 fclose((FILE*)oggbs->pUserData);
8966 }
8967 }
8968#endif
8969#endif
8970
8971 drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
8972}
8973
8974
8975#if 0
8976static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8977{
8978 drflac_uint64 i;
8979 for (i = 0; i < frameCount; ++i) {
8980 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
8981 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
8982 drflac_uint32 right = left - side;
8983
8984 pOutputSamples[i*2+0] = (drflac_int32)left;
8985 pOutputSamples[i*2+1] = (drflac_int32)right;
8986 }
8987}
8988#endif
8989
8990static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8991{
8992 drflac_uint64 i;
8993 drflac_uint64 frameCount4 = frameCount >> 2;
8994 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8995 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8996 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8997 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8998
8999 for (i = 0; i < frameCount4; ++i) {
9000 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9001 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9002 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9003 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9004
9005 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9006 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9007 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9008 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9009
9010 drflac_uint32 right0 = left0 - side0;
9011 drflac_uint32 right1 = left1 - side1;
9012 drflac_uint32 right2 = left2 - side2;
9013 drflac_uint32 right3 = left3 - side3;
9014
9015 pOutputSamples[i*8+0] = (drflac_int32)left0;
9016 pOutputSamples[i*8+1] = (drflac_int32)right0;
9017 pOutputSamples[i*8+2] = (drflac_int32)left1;
9018 pOutputSamples[i*8+3] = (drflac_int32)right1;
9019 pOutputSamples[i*8+4] = (drflac_int32)left2;
9020 pOutputSamples[i*8+5] = (drflac_int32)right2;
9021 pOutputSamples[i*8+6] = (drflac_int32)left3;
9022 pOutputSamples[i*8+7] = (drflac_int32)right3;
9023 }
9024
9025 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9026 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9027 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9028 drflac_uint32 right = left - side;
9029
9030 pOutputSamples[i*2+0] = (drflac_int32)left;
9031 pOutputSamples[i*2+1] = (drflac_int32)right;
9032 }
9033}
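/*
The scalar left/side decoder above processes four frames per iteration and then finishes the remaining frames in a
tail loop. The sketch below shows the same structure in a compact, self-contained form. `decode_frame()` and
`decode_left_side()` are hypothetical names, and the per-frame helper replaces the hand-unrolled statements used above.
The subtraction is done on unsigned values so that wrap-around is well defined. Converting the result back to int32_t
assumes the usual two's complement behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Writes one interleaved stereo frame, reconstructing right = left - side.
static void decode_frame(uint32_t left, uint32_t side, int32_t* pOut)
{
    uint32_t right = left - side;   // Wraps modulo 2^32 by definition.
    pOut[0] = (int32_t)left;
    pOut[1] = (int32_t)right;
}

// Decodes frameCount frames from separate left and side channels into
// interleaved left/right output, shifting each channel up by its own amount.
static void decode_left_side(const int32_t* pLeft, const int32_t* pSide, size_t frameCount,
                             unsigned shiftLeft, unsigned shiftSide, int32_t* pOut)
{
    size_t i;
    size_t blockCount = frameCount >> 2;    // Number of complete groups of 4 frames.

    assert(shiftLeft < 32 && shiftSide < 32);

    for (i = 0; i < blockCount; ++i) {
        decode_frame((uint32_t)pLeft[i*4+0] << shiftLeft, (uint32_t)pSide[i*4+0] << shiftSide, pOut + i*8 + 0);
        decode_frame((uint32_t)pLeft[i*4+1] << shiftLeft, (uint32_t)pSide[i*4+1] << shiftSide, pOut + i*8 + 2);
        decode_frame((uint32_t)pLeft[i*4+2] << shiftLeft, (uint32_t)pSide[i*4+2] << shiftSide, pOut + i*8 + 4);
        decode_frame((uint32_t)pLeft[i*4+3] << shiftLeft, (uint32_t)pSide[i*4+3] << shiftSide, pOut + i*8 + 6);
    }

    // Tail: the 0 to 3 frames left over after the unrolled loop.
    for (i = blockCount << 2; i < frameCount; ++i) {
        decode_frame((uint32_t)pLeft[i] << shiftLeft, (uint32_t)pSide[i] << shiftSide, pOut + i*2);
    }
}
```

With six input frames, the unrolled loop handles the first four and the tail loop handles the last two.
*/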
9034
9035#if defined(DRFLAC_SUPPORT_SSE2)
9036static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9037{
9038 drflac_uint64 i;
9039 drflac_uint64 frameCount4 = frameCount >> 2;
9040 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9041 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9042 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9043 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9044
9045 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9046
9047 for (i = 0; i < frameCount4; ++i) {
9048 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9049 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9050 __m128i right = _mm_sub_epi32(left, side);
9051
9052 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9053 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9054 }
9055
9056 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9057 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9058 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9059 drflac_uint32 right = left - side;
9060
9061 pOutputSamples[i*2+0] = (drflac_int32)left;
9062 pOutputSamples[i*2+1] = (drflac_int32)right;
9063 }
9064}
9065#endif
9066
9067#if defined(DRFLAC_SUPPORT_NEON)
9068static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9069{
9070 drflac_uint64 i;
9071 drflac_uint64 frameCount4 = frameCount >> 2;
9072 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9073 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9074 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9075 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9076 int32x4_t shift0_4;
9077 int32x4_t shift1_4;
9078
9079 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9080
9081 shift0_4 = vdupq_n_s32(shift0);
9082 shift1_4 = vdupq_n_s32(shift1);
9083
9084 for (i = 0; i < frameCount4; ++i) {
9085 uint32x4_t left;
9086 uint32x4_t side;
9087 uint32x4_t right;
9088
9089 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9090 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9091 right = vsubq_u32(left, side);
9092
9093 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9094 }
9095
9096 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9097 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9098 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9099 drflac_uint32 right = left - side;
9100
9101 pOutputSamples[i*2+0] = (drflac_int32)left;
9102 pOutputSamples[i*2+1] = (drflac_int32)right;
9103 }
9104}
9105#endif
9106
9107static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9108{
9109#if defined(DRFLAC_SUPPORT_SSE2)
9110 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9111 drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9112 } else
9113#elif defined(DRFLAC_SUPPORT_NEON)
9114 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9115 drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9116 } else
9117#endif
9118 {
9119 /* Scalar fallback. */
9120#if 0
9121 drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9122#else
9123 drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9124#endif
9125 }
9126}
9127
9128
9129#if 0
9130static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9131{
9132 drflac_uint64 i;
9133 for (i = 0; i < frameCount; ++i) {
9134 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9135 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9136 drflac_uint32 left = right + side;
9137
9138 pOutputSamples[i*2+0] = (drflac_int32)left;
9139 pOutputSamples[i*2+1] = (drflac_int32)right;
9140 }
9141}
9142#endif
9143
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        pOutputSamples[i*8+0] = (drflac_int32)left0;
        pOutputSamples[i*8+1] = (drflac_int32)right0;
        pOutputSamples[i*8+2] = (drflac_int32)left1;
        pOutputSamples[i*8+3] = (drflac_int32)right1;
        pOutputSamples[i*8+4] = (drflac_int32)left2;
        pOutputSamples[i*8+5] = (drflac_int32)right2;
        pOutputSamples[i*8+6] = (drflac_int32)left3;
        pOutputSamples[i*8+7] = (drflac_int32)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);

        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


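/*
The right/side reconstruction used by the decode_right_side functions can be tried in isolation with the sketch
below. It is only an illustration: sketch_decode_right_side() and its parameter names are hypothetical and are not
part of dr_flac, and wasted bits are assumed to be zero so that a single shift of (32 - bits per sample) is enough.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Wasted bits are assumed to be zero.
static void sketch_decode_right_side(int32_t side_in, int32_t right_in, uint32_t bits_per_sample, int32_t* left_out, int32_t* right_out)
{
    uint32_t shift;
    uint32_t side;
    uint32_t right;

    assert(bits_per_sample >= 1 && bits_per_sample <= 32);
    shift = 32 - bits_per_sample;   // always in the range 0..31

    // Unsigned arithmetic keeps the shift and the wrap-around addition well defined.
    side  = (uint32_t)side_in  << shift;
    right = (uint32_t)right_in << shift;

    // The encoder stored side = left - right, so left = right + side.
    *left_out  = (int32_t)(right + side);
    *right_out = (int32_t)right;
}
```

For a 24-bit stream, side = 3 and right = 2 should give left = 5, with both outputs scaled by 256. The conversion
from an out-of-range unsigned value back to a signed one is implementation-defined in C, which the sketch accepts
in the same way the decoder does.
*/
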
#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        /* Cast the signed inputs directly. This function declares no pInputSamples0U32/pInputSamples1U32 aliases. */
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;
    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
    uint32x4_t one4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
    one4 = vdupq_n_u32(1);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


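/*
The mid/side arithmetic in the functions above is easier to follow on a single sample pair. The sketch below is a
minimal illustration under stated assumptions: sketch_decode_mid_side() and its parameter names are hypothetical
and not part of dr_flac, wasted bits are assumed to be zero, and the signed right shift is assumed to be
arithmetic, which is the same assumption the decoder itself makes.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Wasted bits are assumed to be zero.
static void sketch_decode_mid_side(int32_t mid_in, int32_t side_in, uint32_t bits_per_sample, int32_t* left_out, int32_t* right_out)
{
    uint32_t unused_bits;
    uint32_t mid;
    uint32_t side;

    assert(bits_per_sample >= 1 && bits_per_sample <= 32);
    unused_bits = 32 - bits_per_sample;   // always in the range 0..31

    mid  = (uint32_t)mid_in;
    side = (uint32_t)side_in;

    // The stored mid is (left + right) >> 1, which lost its low bit.
    // That bit always equals the low bit of side = left - right, so it can be restored.
    mid = (mid << 1) | (side & 0x01);

    // (mid + side) is 2*left and (mid - side) is 2*right. Halve, then scale to 32 bits.
    *left_out  = (int32_t)((uint32_t)((int32_t)(mid + side) >> 1) << unused_bits);
    *right_out = (int32_t)((uint32_t)((int32_t)(mid - side) >> 1) << unused_bits);
}
```

With 16-bit input, left = 5 and right = 2 are stored as mid = 3 and side = 3. Restoring the low bit gives mid = 7,
so the sums are 10 and 4, and halving recovers 5 and 2 before the shift by 16. When the unused bit count is
non-zero, the optimized paths above fold the halving into the scaling by shifting left by one bit less.
*/
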
#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift4_0 = vdupq_n_s32(shift0);
    int32x4_t shift4_1 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));

        drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


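/*
The scalar, SSE2 and NEON paths above all share one loop shape: a main loop that handles four frames at a time,
followed by a tail loop for the zero to three frames left over. The sketch below shows that shape for plain stereo
interleaving. sketch_interleave_stereo() and its parameter names are hypothetical and not part of dr_flac, and
the four-frame group is written as a short inner loop for brevity where the scalar path writes it out by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac.
static void sketch_interleave_stereo(const int32_t* in0, const int32_t* in1, size_t frame_count, uint32_t shift0, uint32_t shift1, int32_t* out)
{
    size_t i;
    size_t frame_count4 = frame_count >> 2;   // number of complete groups of four frames

    assert(shift0 < 32 && shift1 < 32);

    // Main loop: four frames, which is eight output samples, per iteration.
    for (i = 0; i < frame_count4; ++i) {
        size_t k;
        for (k = 0; k < 4; ++k) {
            out[i*8 + k*2 + 0] = (int32_t)((uint32_t)in0[i*4 + k] << shift0);
            out[i*8 + k*2 + 1] = (int32_t)((uint32_t)in1[i*4 + k] << shift1);
        }
    }

    // Tail loop: resume at the first frame the main loop did not cover.
    for (i = frame_count4 << 2; i < frame_count; ++i) {
        out[i*2 + 0] = (int32_t)((uint32_t)in0[i] << shift0);
        out[i*2 + 1] = (int32_t)((uint32_t)in1[i] << shift1);
    }
}
```

With five input frames the main loop runs once and covers frames 0 to 3, and the tail loop writes frame 4 to
out[8] and out[9].
*/
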
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                    }
                }
            }

            framesRead += frameCountThisIteration;
            pBufferOut += frameCountThisIteration * channelCount;
            framesToRead -= frameCountThisIteration;
            pFlac->currentPCMFrame += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

9808static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9809{
9810 drflac_uint64 i;
9811 drflac_uint64 frameCount4 = frameCount >> 2;
9812 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9813 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9814 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9815 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9816
9817 for (i = 0; i < frameCount4; ++i) {
9818 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9819 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9820 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9821 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9822
9823 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9824 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9825 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9826 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9827
9828 drflac_uint32 right0 = left0 - side0;
9829 drflac_uint32 right1 = left1 - side1;
9830 drflac_uint32 right2 = left2 - side2;
9831 drflac_uint32 right3 = left3 - side3;
9832
9833 left0 >>= 16;
9834 left1 >>= 16;
9835 left2 >>= 16;
9836 left3 >>= 16;
9837
9838 right0 >>= 16;
9839 right1 >>= 16;
9840 right2 >>= 16;
9841 right3 >>= 16;
9842
9843 pOutputSamples[i*8+0] = (drflac_int16)left0;
9844 pOutputSamples[i*8+1] = (drflac_int16)right0;
9845 pOutputSamples[i*8+2] = (drflac_int16)left1;
9846 pOutputSamples[i*8+3] = (drflac_int16)right1;
9847 pOutputSamples[i*8+4] = (drflac_int16)left2;
9848 pOutputSamples[i*8+5] = (drflac_int16)right2;
9849 pOutputSamples[i*8+6] = (drflac_int16)left3;
9850 pOutputSamples[i*8+7] = (drflac_int16)right3;
9851 }
9852
9853 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9854 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9855 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9856 drflac_uint32 right = left - side;
9857
9858 left >>= 16;
9859 right >>= 16;
9860
9861 pOutputSamples[i*2+0] = (drflac_int16)left;
9862 pOutputSamples[i*2+1] = (drflac_int16)right;
9863 }
9864}
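
/*
The left/side reconstruction above can be sketched in isolation as follows. This is a minimal illustration rather than dr_flac's own code: the helper
name decode_left_side_s16 is hypothetical, plain <stdint.h> types stand in for the drflac_* typedefs, and shift0/shift1 play the role of
unusedBitsPerSample plus each subframe's wastedBitsPerSample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of dr_flac): decodes left/side stereo to interleaved s16.
static void decode_left_side_s16(const int32_t* pLeft, const int32_t* pSide, size_t frameCount,
                                 uint32_t shift0, uint32_t shift1, int16_t* pOut)
{
    size_t i;

    assert(shift0 < 32 && shift1 < 32);

    for (i = 0; i < frameCount; ++i) {
        // Unsigned arithmetic keeps the left shift and the subtraction well defined when they wrap.
        uint32_t left  = (uint32_t)pLeft[i] << shift0;
        uint32_t side  = (uint32_t)pSide[i] << shift1;
        uint32_t right = left - side;   // side = left - right, so right = left - side.

        // Keep the top 16 bits of each left-aligned 32-bit sample.
        // The narrowing cast assumes the usual two's complement wrap-around.
        pOut[i*2+0] = (int16_t)(left  >> 16);
        pOut[i*2+1] = (int16_t)(right >> 16);
    }
}
```

For a 16-bit stream with no wasted bits both shifts would be 16, so a left sample of 1000 with a side sample of 1200 should come out as the
pair (1000, -200).
*/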

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        left = vshrq_n_u32(left, 16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left = pInputSamples0U32[i] << shift0;
        drflac_uint32 side = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
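
/*
The dispatcher above picks an accelerated path at run time and otherwise falls back to the scalar one. A rough sketch of that pattern is shown
below. Every name in it is hypothetical: g_isFastPathSupported stands in for a CPU feature flag such as drflac__gIsSSE2Supported, and the "fast"
function is plain C that only imitates the four-samples-per-iteration structure of the SIMD paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a runtime CPU feature flag.
static int g_isFastPathSupported = 1;

// One sample per iteration.
static void convert_s16__scalar(const int32_t* pIn, size_t count, uint32_t shift, int16_t* pOut)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        pOut[i] = (int16_t)(((uint32_t)pIn[i] << shift) >> 16);
    }
}

// Stand-in for a SIMD path: four samples per iteration, then a scalar tail for the remainder.
static void convert_s16__fast(const int32_t* pIn, size_t count, uint32_t shift, int16_t* pOut)
{
    size_t i;
    size_t count4 = count >> 2;

    for (i = 0; i < count4; ++i) {
        pOut[i*4+0] = (int16_t)(((uint32_t)pIn[i*4+0] << shift) >> 16);
        pOut[i*4+1] = (int16_t)(((uint32_t)pIn[i*4+1] << shift) >> 16);
        pOut[i*4+2] = (int16_t)(((uint32_t)pIn[i*4+2] << shift) >> 16);
        pOut[i*4+3] = (int16_t)(((uint32_t)pIn[i*4+3] << shift) >> 16);
    }

    for (i = (count4 << 2); i < count; ++i) {
        pOut[i] = (int16_t)(((uint32_t)pIn[i] << shift) >> 16);
    }
}

// Dispatcher: the same shape of condition as above, with the scalar function as the fallback.
static void convert_s16(const int32_t* pIn, size_t count, uint32_t bitsPerSample, uint32_t shift, int16_t* pOut)
{
    assert(shift < 32);

    if (g_isFastPathSupported && bitsPerSample <= 24) {
        convert_s16__fast(pIn, count, shift, pOut);
    } else {
        convert_s16__scalar(pIn, count, shift, pOut);
    }
}
```

Both paths are expected to produce identical output for the same input, which makes it straightforward to test one against the other by
toggling the flag.
*/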


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left = _mm_add_epi32(right, side);

        left = _mm_srai_epi32(left, 16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left = vaddq_u32(right, side);

        left = vshrq_n_u32(left, 16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left = right + side;

        left >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    if (shift > 0) {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = ((drflac_int32)(mid0 + side0) >> 1);
            temp1L = ((drflac_int32)(mid1 + side1) >> 1);
            temp2L = ((drflac_int32)(mid2 + side2) >> 1);
            temp3L = ((drflac_int32)(mid3 + side3) >> 1);

            temp0R = ((drflac_int32)(mid0 - side0) >> 1);
            temp1R = ((drflac_int32)(mid1 - side1) >> 1);
            temp2R = ((drflac_int32)(mid2 - side2) >> 1);
            temp3R = ((drflac_int32)(mid3 - side3) >> 1);

            temp0L >>= 16;
            temp1L >>= 16;
            temp2L >>= 16;
            temp3L >>= 16;

            temp0R >>= 16;
            temp1R >>= 16;
            temp2R >>= 16;
            temp3R >>= 16;

            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
        }
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
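
/*
The mid/side reconstruction above is the least obvious of the four stereo modes, so here is a small standalone sketch of the per-sample math. It
is an illustration under stated assumptions rather than dr_flac's own code: decode_mid_side_s16 is a hypothetical name, wasted bits are taken to
be zero, and the encoder is assumed to have stored mid = (left + right) >> 1 and side = left - right, as the FLAC format describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of dr_flac): decodes mid/side stereo to interleaved s16.
static void decode_mid_side_s16(const int32_t* pMid, const int32_t* pSide, size_t frameCount,
                                uint32_t unusedBits, int16_t* pOut)
{
    size_t i;

    assert(unusedBits < 32);

    for (i = 0; i < frameCount; ++i) {
        uint32_t mid  = (uint32_t)pMid[i];
        uint32_t side = (uint32_t)pSide[i];

        // The encoder's >> 1 dropped the low bit of (left + right). That bit has the same
        // parity as side = left - right, so it can be restored from side.
        mid = (mid << 1) | (side & 0x01);

        // Now mid + side == 2*left and mid - side == 2*right. The signed >> 1 halves them.
        // Right-shifting a negative value is implementation-defined in C. An arithmetic shift is assumed.
        pOut[i*2+0] = (int16_t)(((uint32_t)((int32_t)(mid + side) >> 1) << unusedBits) >> 16);
        pOut[i*2+1] = (int16_t)(((uint32_t)((int32_t)(mid - side) >> 1) << unusedBits) >> 16);
    }
}
```

As a worked case for a 16-bit stream (unusedBits of 16): left = 1000 and right = -201 encode to mid = 399 and side = 1201. Restoring the low
bit gives 799, so the outputs are (799 + 1201) >> 1 = 1000 and (799 - 1201) >> 1 = -201.
*/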

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            left = _mm_srai_epi32(left, 16);
            right = _mm_srai_epi32(right, 16);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift = unusedBitsPerSample;
    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));

            left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            left = vshrq_n_s32(left, 16);
            right = vshrq_n_s32(right, 16);

            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        tempL0 >>= 16;
        tempL1 >>= 16;
        tempL2 >>= 16;
        tempL3 >>= 16;

        tempR0 >>= 16;
        tempR1 >>= 16;
        tempR2 >>= 16;
        tempR3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)tempL0;
        pOutputSamples[i*8+1] = (drflac_int16)tempR0;
        pOutputSamples[i*8+2] = (drflac_int16)tempL1;
        pOutputSamples[i*8+3] = (drflac_int16)tempR1;
        pOutputSamples[i*8+4] = (drflac_int16)tempL2;
        pOutputSamples[i*8+5] = (drflac_int16)tempR2;
        pOutputSamples[i*8+6] = (drflac_int16)tempL3;
        pOutputSamples[i*8+7] = (drflac_int16)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
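
/*
The shift-then-truncate step used for independent stereo can be tried out with the sketch below. It assumes that unusedBitsPerSample equals
32 minus the stream's bit depth (that definition lives outside this part of the file, so treat it as an assumption), ignores wasted bits, and
uses a hypothetical helper name, interleave_stereo_s16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of dr_flac): interleaves two independent channels as s16.
static void interleave_stereo_s16(const int32_t* pIn0, const int32_t* pIn1, size_t frameCount,
                                  uint32_t bitsPerSample, int16_t* pOut)
{
    size_t i;
    uint32_t shift;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);
    shift = 32 - bitsPerSample;   // Assumed meaning of unusedBitsPerSample.

    for (i = 0; i < frameCount; ++i) {
        // Left-align each sample in 32 bits, then keep only the top 16 bits.
        pOut[i*2+0] = (int16_t)(((uint32_t)pIn0[i] << shift) >> 16);
        pOut[i*2+1] = (int16_t)(((uint32_t)pIn1[i] << shift) >> 16);
    }
}
```

With 24-bit input the shift is 8, so the sample 0x123456 becomes 0x1234 and the low 8 bits are simply discarded. No rounding or dithering is
applied in this sketch.
*/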
10537
10538#if defined(DRFLAC_SUPPORT_SSE2)
10539static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10540{
10541 drflac_uint64 i;
10542 drflac_uint64 frameCount4 = frameCount >> 2;
10543 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10544 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10545 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10546 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10547
10548 for (i = 0; i < frameCount4; ++i) {
10549 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10550 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10551
10552 left = _mm_srai_epi32(left, 16);
10553 right = _mm_srai_epi32(right, 16);
10554
10555 /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store them in the output buffer. */
10556 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10557 }
10558
10559 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10560 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10561 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10562 }
10563}
10564#endif
10565
10566#if defined(DRFLAC_SUPPORT_NEON)
10567static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10568{
10569 drflac_uint64 i;
10570 drflac_uint64 frameCount4 = frameCount >> 2;
10571 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10572 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10573 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10574 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10575
10576 int32x4_t shift0_4 = vdupq_n_s32(shift0);
10577 int32x4_t shift1_4 = vdupq_n_s32(shift1);
10578
10579 for (i = 0; i < frameCount4; ++i) {
10580 int32x4_t left;
10581 int32x4_t right;
10582
10583 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
10584 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
10585
10586 left = vshrq_n_s32(left, 16);
10587 right = vshrq_n_s32(right, 16);
10588
10589 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10590 }
10591
10592 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10593 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10594 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10595 }
10596}
10597#endif
10598
10599static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10600{
10601#if defined(DRFLAC_SUPPORT_SSE2)
10602 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10603 drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10604 } else
10605#elif defined(DRFLAC_SUPPORT_NEON)
10606 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10607 drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10608 } else
10609#endif
10610 {
10611 /* Scalar fallback. */
10612#if 0
10613 drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10614#else
10615 drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10616#endif
10617 }
10618}
10619
10620DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
10621{
10622 drflac_uint64 framesRead;
10623 drflac_uint32 unusedBitsPerSample;
10624
10625 if (pFlac == NULL || framesToRead == 0) {
10626 return 0;
10627 }
10628
10629 if (pBufferOut == NULL) {
10630 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
10631 }
10632
10633 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
10634 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
10635
10636 framesRead = 0;
10637 while (framesToRead > 0) {
10638 /* If we've run out of samples in this frame, go to the next. */
10639 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
10640 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
10641 break; /* Couldn't read the next frame, so just break from the loop and return. */
10642 }
10643 } else {
10644 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
10645 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
10646 drflac_uint64 frameCountThisIteration = framesToRead;
10647
10648 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
10649 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
10650 }
10651
10652 if (channelCount == 2) {
10653 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
10654 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
10655
10656 switch (pFlac->currentFLACFrame.header.channelAssignment)
10657 {
10658 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
10659 {
10660 drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10661 } break;
10662
10663 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
10664 {
10665 drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10666 } break;
10667
10668 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
10669 {
10670 drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10671 } break;
10672
10673 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
10674 default:
10675 {
10676 drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10677 } break;
10678 }
10679 } else {
10680 /* Generic interleaving. */
10681 drflac_uint64 i;
10682 for (i = 0; i < frameCountThisIteration; ++i) {
10683 unsigned int j;
10684 for (j = 0; j < channelCount; ++j) {
10685 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
10686 pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
10687 }
10688 }
10689 }
10690
10691 framesRead += frameCountThisIteration;
10692 pBufferOut += frameCountThisIteration * channelCount;
10693 framesToRead -= frameCountThisIteration;
10694 pFlac->currentPCMFrame += frameCountThisIteration;
10695 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
10696 }
10697 }
10698
10699 return framesRead;
10700}
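
/*
The generic interleaving branch above (used when the channel count is not 2) can be sketched in isolation like this. The helper name interleave_to_s16 and
its parameters are hypothetical, and a single shift is used for every channel to keep the sketch short, whereas the real code adds each subframe's wasted
bits to the shift.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: interleaves planar 32-bit samples into s16 frames.
static void interleave_to_s16(const int32_t* const* planes, size_t channelCount, size_t frameCount, uint32_t shift, int16_t* out)
{
    size_t i;
    size_t j;

    for (i = 0; i < frameCount; ++i) {
        for (j = 0; j < channelCount; ++j) {
            // Left-align the sample in 32 bits, then keep the top 16 bits.
            // Right-shifting a negative value is assumed to be an arithmetic shift, as in the code above.
            int32_t sampleS32 = (int32_t)((uint32_t)planes[j][i] << shift);
            out[i*channelCount + j] = (int16_t)(sampleS32 >> 16);
        }
    }
}
```

For 16-bit input the shift would be 16, so the samples come out unchanged and only the layout changes from one buffer per channel to one frame after another.
*/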
10701
10702
10703#if 0
10704static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10705{
10706 drflac_uint64 i;
10707 for (i = 0; i < frameCount; ++i) {
10708 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10709 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10710 drflac_uint32 right = left - side;
10711
10712 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10713 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10714 }
10715}
10716#endif
10717
10718static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10719{
10720 drflac_uint64 i;
10721 drflac_uint64 frameCount4 = frameCount >> 2;
10722 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10723 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10724 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10725 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10726
10727 float factor = 1 / 2147483648.0;
10728
10729 for (i = 0; i < frameCount4; ++i) {
10730 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
10731 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
10732 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
10733 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
10734
10735 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
10736 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
10737 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
10738 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
10739
10740 drflac_uint32 right0 = left0 - side0;
10741 drflac_uint32 right1 = left1 - side1;
10742 drflac_uint32 right2 = left2 - side2;
10743 drflac_uint32 right3 = left3 - side3;
10744
10745 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10746 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10747 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10748 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10749 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10750 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10751 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10752 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10753 }
10754
10755 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10756 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10757 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10758 drflac_uint32 right = left - side;
10759
10760 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10761 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10762 }
10763}
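
/*
For a single frame, the left/side reconstruction above reduces to the sketch below. The helper name left_side_to_f32 and its parameters are hypothetical,
and one shift is shared by both channels for brevity. The first subframe holds the left channel, the second holds side (left minus right), so right is
recovered as left minus side.

```c
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: decodes one left/side frame to two floats in [-1, 1).
static void left_side_to_f32(int32_t leftIn, int32_t sideIn, uint32_t shift, float* pLeftOut, float* pRightOut)
{
    // Unsigned arithmetic keeps the shift and the subtraction well defined on wrap-around.
    uint32_t left  = (uint32_t)leftIn << shift;
    uint32_t side  = (uint32_t)sideIn << shift;
    uint32_t right = left - side;

    // Reinterpret as signed and scale by 1 / 2^31.
    *pLeftOut  = (float)((int32_t)left  / 2147483648.0);
    *pRightOut = (float)((int32_t)right / 2147483648.0);
}
```

With 16-bit input (a shift of 16), left = 16384 and side = 8192 would give 0.5 and 0.25.
*/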
10764
10765#if defined(DRFLAC_SUPPORT_SSE2)
10766static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10767{
10768 drflac_uint64 i;
10769 drflac_uint64 frameCount4 = frameCount >> 2;
10770 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10771 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10772 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10773 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10774 __m128 factor;
10775
10776 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10777
10778 factor = _mm_set1_ps(1.0f / 8388608.0f);
10779
10780 for (i = 0; i < frameCount4; ++i) {
10781 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10782 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10783 __m128i right = _mm_sub_epi32(left, side);
10784 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10785 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10786
10787 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10788 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10789 }
10790
10791 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10792 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10793 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10794 drflac_uint32 right = left - side;
10795
10796 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10797 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10798 }
10799}
10800#endif
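
/*
The SSE2 path above shifts by 8 bits less than the scalar path and scales by 1/8388608 (2^-23) instead of 1/2147483648 (2^-31). The sketch below compares
the two scalings for a single sample. Both function names are hypothetical, wasted bits are ignored, and the comparison only makes sense under the same
bitsPerSample <= 24 condition that the SIMD paths assert.

```c
#include <stdint.h>

// Hypothetical helper: aligns the sample to 32 bits and scales by 1 / 2^31 (scalar-style).
static float to_f32_full_scale(int32_t sample, uint32_t bitsPerSample)
{
    uint32_t shift = 32 - bitsPerSample;
    return (int32_t)((uint32_t)sample << shift) * (1.0f / 2147483648.0f);
}

// Hypothetical helper: aligns the sample to 24 bits and scales by 1 / 2^23 (SIMD-style).
// Requires bitsPerSample <= 24 so that the shift does not underflow.
static float to_f32_24bit_scale(int32_t sample, uint32_t bitsPerSample)
{
    uint32_t shift = (32 - bitsPerSample) - 8;
    return (int32_t)((uint32_t)sample << shift) * (1.0f / 8388608.0f);
}
```

For in-range 16-bit and 24-bit samples the two functions should return the same value, since shifting by 8 bits less is offset by the scale factor being
2^8 times larger.
*/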
10801
10802#if defined(DRFLAC_SUPPORT_NEON)
10803static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10804{
10805 drflac_uint64 i;
10806 drflac_uint64 frameCount4 = frameCount >> 2;
10807 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10808 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10809 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10810 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10811 float32x4_t factor4;
10812 int32x4_t shift0_4;
10813 int32x4_t shift1_4;
10814
10815 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10816
10817 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10818 shift0_4 = vdupq_n_s32(shift0);
10819 shift1_4 = vdupq_n_s32(shift1);
10820
10821 for (i = 0; i < frameCount4; ++i) {
10822 uint32x4_t left;
10823 uint32x4_t side;
10824 uint32x4_t right;
10825 float32x4_t leftf;
10826 float32x4_t rightf;
10827
10828 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10829 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10830 right = vsubq_u32(left, side);
10831 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10832 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10833
10834 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10835 }
10836
10837 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10838 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10839 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10840 drflac_uint32 right = left - side;
10841
10842 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10843 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10844 }
10845}
10846#endif
10847
10848static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10849{
10850#if defined(DRFLAC_SUPPORT_SSE2)
10851 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10852 drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10853 } else
10854#elif defined(DRFLAC_SUPPORT_NEON)
10855 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10856 drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10857 } else
10858#endif
10859 {
10860 /* Scalar fallback. */
10861#if 0
10862 drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10863#else
10864 drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10865#endif
10866 }
10867}
10868
10869
10870#if 0
10871static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10872{
10873 drflac_uint64 i;
10874 for (i = 0; i < frameCount; ++i) {
10875 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10876 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10877 drflac_uint32 left = right + side;
10878
10879 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10880 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10881 }
10882}
10883#endif
10884
10885static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10886{
10887 drflac_uint64 i;
10888 drflac_uint64 frameCount4 = frameCount >> 2;
10889 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10890 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10891 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10892 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10893 float factor = 1 / 2147483648.0;
10894
10895 for (i = 0; i < frameCount4; ++i) {
10896 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10897 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10898 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10899 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10900
10901 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10902 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10903 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10904 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10905
10906 drflac_uint32 left0 = right0 + side0;
10907 drflac_uint32 left1 = right1 + side1;
10908 drflac_uint32 left2 = right2 + side2;
10909 drflac_uint32 left3 = right3 + side3;
10910
10911 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10912 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10913 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10914 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10915 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10916 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10917 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10918 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10919 }
10920
10921 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10922 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10923 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10924 drflac_uint32 left = right + side;
10925
10926 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10927 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10928 }
10929}
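
/*
The loop structure above (groups of 4 frames followed by a tail loop) can be sketched as below for the right/side case. The helper name right_side_to_f32
is hypothetical, one shift is shared by both channels, and the group of 4 is written as an inner loop for brevity where the code above unrolls it by hand.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: right/side decode to interleaved floats.
static void right_side_to_f32(const int32_t* pSide, const int32_t* pRight, size_t frameCount, uint32_t shift, float* pOut)
{
    const float factor = 1.0f / 2147483648.0f;
    size_t frameCount4 = frameCount >> 2;   // Number of complete groups of 4 frames.
    size_t i;
    size_t k;

    for (i = 0; i < frameCount4; ++i) {
        for (k = 0; k < 4; ++k) {
            uint32_t side  = (uint32_t)pSide[i*4+k]  << shift;
            uint32_t right = (uint32_t)pRight[i*4+k] << shift;
            uint32_t left  = right + side;   // side = left - right, so left = right + side.

            pOut[i*8 + k*2 + 0] = (int32_t)left  * factor;
            pOut[i*8 + k*2 + 1] = (int32_t)right * factor;
        }
    }

    // Tail: the 0 to 3 frames that do not fill a complete group of 4.
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        uint32_t side  = (uint32_t)pSide[i]  << shift;
        uint32_t right = (uint32_t)pRight[i] << shift;
        uint32_t left  = right + side;

        pOut[i*2+0] = (int32_t)left  * factor;
        pOut[i*2+1] = (int32_t)right * factor;
    }
}
```

With 5 frames, the first loop would handle frames 0 to 3 and the tail loop would handle frame 4.
*/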
10930
10931#if defined(DRFLAC_SUPPORT_SSE2)
10932static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10933{
10934 drflac_uint64 i;
10935 drflac_uint64 frameCount4 = frameCount >> 2;
10936 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10937 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10938 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10939 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10940 __m128 factor;
10941
10942 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10943
10944 factor = _mm_set1_ps(1.0f / 8388608.0f);
10945
10946 for (i = 0; i < frameCount4; ++i) {
10947 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10948 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10949 __m128i left = _mm_add_epi32(right, side);
10950 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10951 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10952
10953 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10954 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10955 }
10956
10957 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10958 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10959 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10960 drflac_uint32 left = right + side;
10961
10962 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10963 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10964 }
10965}
10966#endif
10967
10968#if defined(DRFLAC_SUPPORT_NEON)
10969static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10970{
10971 drflac_uint64 i;
10972 drflac_uint64 frameCount4 = frameCount >> 2;
10973 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10974 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10975 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10976 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10977 float32x4_t factor4;
10978 int32x4_t shift0_4;
10979 int32x4_t shift1_4;
10980
10981 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10982
10983 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10984 shift0_4 = vdupq_n_s32(shift0);
10985 shift1_4 = vdupq_n_s32(shift1);
10986
10987 for (i = 0; i < frameCount4; ++i) {
10988 uint32x4_t side;
10989 uint32x4_t right;
10990 uint32x4_t left;
10991 float32x4_t leftf;
10992 float32x4_t rightf;
10993
10994 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10995 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10996 left = vaddq_u32(right, side);
10997 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10998 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10999
11000 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11001 }
11002
11003 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11004 drflac_uint32 side = pInputSamples0U32[i] << shift0;
11005 drflac_uint32 right = pInputSamples1U32[i] << shift1;
11006 drflac_uint32 left = right + side;
11007
11008 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
11009 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
11010 }
11011}
11012#endif
11013
11014static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11015{
11016#if defined(DRFLAC_SUPPORT_SSE2)
11017 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11018 drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11019 } else
11020#elif defined(DRFLAC_SUPPORT_NEON)
11021 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11022 drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11023 } else
11024#endif
11025 {
11026 /* Scalar fallback. */
11027#if 0
11028 drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11029#else
11030 drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11031#endif
11032 }
11033}
11034
11035
11036#if 0
11037static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11038{
11039 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11040 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11041 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11042
11043 mid = (mid << 1) | (side & 0x01);
11044
11045 pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
11046 pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
11047 }
11048}
11049#endif
11050
11051static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11052{
11053 drflac_uint64 i;
11054 drflac_uint64 frameCount4 = frameCount >> 2;
11055 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11056 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11057 drflac_uint32 shift = unusedBitsPerSample;
11058 float factor = 1 / 2147483648.0;
11059
11060 if (shift > 0) {
11061 shift -= 1;
11062 for (i = 0; i < frameCount4; ++i) {
11063 drflac_uint32 temp0L;
11064 drflac_uint32 temp1L;
11065 drflac_uint32 temp2L;
11066 drflac_uint32 temp3L;
11067 drflac_uint32 temp0R;
11068 drflac_uint32 temp1R;
11069 drflac_uint32 temp2R;
11070 drflac_uint32 temp3R;
11071
11072 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11073 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11074 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11075 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11076
11077 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11078 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11079 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11080 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11081
11082 mid0 = (mid0 << 1) | (side0 & 0x01);
11083 mid1 = (mid1 << 1) | (side1 & 0x01);
11084 mid2 = (mid2 << 1) | (side2 & 0x01);
11085 mid3 = (mid3 << 1) | (side3 & 0x01);
11086
11087 temp0L = (mid0 + side0) << shift;
11088 temp1L = (mid1 + side1) << shift;
11089 temp2L = (mid2 + side2) << shift;
11090 temp3L = (mid3 + side3) << shift;
11091
11092 temp0R = (mid0 - side0) << shift;
11093 temp1R = (mid1 - side1) << shift;
11094 temp2R = (mid2 - side2) << shift;
11095 temp3R = (mid3 - side3) << shift;
11096
11097 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11098 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11099 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11100 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11101 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11102 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11103 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11104 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11105 }
11106 } else {
11107 for (i = 0; i < frameCount4; ++i) {
11108 drflac_uint32 temp0L;
11109 drflac_uint32 temp1L;
11110 drflac_uint32 temp2L;
11111 drflac_uint32 temp3L;
11112 drflac_uint32 temp0R;
11113 drflac_uint32 temp1R;
11114 drflac_uint32 temp2R;
11115 drflac_uint32 temp3R;
11116
11117 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11118 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11119 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11120 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11121
11122 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11123 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11124 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11125 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11126
11127 mid0 = (mid0 << 1) | (side0 & 0x01);
11128 mid1 = (mid1 << 1) | (side1 & 0x01);
11129 mid2 = (mid2 << 1) | (side2 & 0x01);
11130 mid3 = (mid3 << 1) | (side3 & 0x01);
11131
11132 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
11133 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
11134 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
11135 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
11136
11137 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
11138 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
11139 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
11140 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
11141
11142 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11143 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11144 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11145 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11146 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11147 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11148 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11149 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11150 }
11151 }
11152
11153 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11154 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11155 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11156
11157 mid = (mid << 1) | (side & 0x01);
11158
11159 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
11160 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
11161 }
11162}
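
/*
The mid/side reconstruction above can be sketched for one frame as follows, leaving out the wasted-bits shift and the float scaling. The helper name
mid_side_to_left_right is hypothetical. The encoder is assumed to store mid = (left + right) >> 1 and side = left - right, so the bit dropped from mid is
the low bit of side and can be restored before the sum and difference are halved.

```c
#include <stdint.h>

// Hypothetical helper, not part of dr_flac: recovers left/right from one mid/side frame.
static void mid_side_to_left_right(int32_t midIn, int32_t sideIn, int32_t* pLeftOut, int32_t* pRightOut)
{
    uint32_t mid  = (uint32_t)midIn;
    uint32_t side = (uint32_t)sideIn;

    // Restore the low bit that was lost when the encoder halved (left + right).
    mid = (mid << 1) | (side & 0x01);

    // mid + side = 2*left and mid - side = 2*right.
    // Right-shifting a negative value is assumed to be an arithmetic shift, as in the code above.
    *pLeftOut  = (int32_t)(mid + side) >> 1;
    *pRightOut = (int32_t)(mid - side) >> 1;
}
```

For example, left = 5 and right = 2 would be stored as mid = 3 and side = 3, and this sketch recovers 5 and 2 from those values.
*/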
11163
11164#if defined(DRFLAC_SUPPORT_SSE2)
11165static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11166{
11167 drflac_uint64 i;
11168 drflac_uint64 frameCount4 = frameCount >> 2;
11169 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11170 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11171 drflac_uint32 shift = unusedBitsPerSample - 8;
11172 float factor;
11173 __m128 factor128;
11174
11175 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11176
11177 factor = 1.0f / 8388608.0f;
11178 factor128 = _mm_set1_ps(factor);
11179
11180 if (shift == 0) {
11181 for (i = 0; i < frameCount4; ++i) {
11182 __m128i mid;
11183 __m128i side;
11184 __m128i tempL;
11185 __m128i tempR;
11186 __m128 leftf;
11187 __m128 rightf;
11188
11189 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11190 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11191
11192 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11193
11194 tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
11195 tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
11196
11197 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11198 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11199
11200 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11201 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11202 }
11203
11204 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11205 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11206 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11207
11208 mid = (mid << 1) | (side & 0x01);
11209
11210 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11211 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11212 }
11213 } else {
11214 shift -= 1;
11215 for (i = 0; i < frameCount4; ++i) {
11216 __m128i mid;
11217 __m128i side;
11218 __m128i tempL;
11219 __m128i tempR;
11220 __m128 leftf;
11221 __m128 rightf;
11222
11223 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11224 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11225
11226 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11227
11228 tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
11229 tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
11230
11231 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11232 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11233
11234 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11235 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11236 }
11237
11238 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11239 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11240 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11241
11242 mid = (mid << 1) | (side & 0x01);
11243
11244 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11245 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11246 }
11247 }
11248}
11249#endif
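
/*
In the SIMD mid/side paths the branch taken for a non-zero shift decrements shift by one instead of halving the sum and then
shifting it left. The sketch below illustrates why the two forms appear to agree. It is an illustration only: both function names
are hypothetical, and 'sum' stands for mid + side after the low bit has been restored, which is always even.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Halves the sum, then scales it up by 'shift' bits.
static int32_t example_halve_then_shift(uint32_t sum, uint32_t shift)
{
    assert(shift <= 31);
    return (int32_t)((uint32_t)((int32_t)sum >> 1) << shift);
}

// Hypothetical helper, not part of dr_flac. Folds the halving into the left shift.
// Only equivalent to the function above when 'sum' is even and 'shift' is at least 1.
static int32_t example_merged_shift(uint32_t sum, uint32_t shift)
{
    assert(shift >= 1 && shift <= 31);
    return (int32_t)(sum << (shift - 1));
}
```

Folding the halving into the shift saves one instruction per vector, which is presumably why the shift of zero case is handled
by its own branch.
*/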
11250
11251#if defined(DRFLAC_SUPPORT_NEON)
11252static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11253{
11254 drflac_uint64 i;
11255 drflac_uint64 frameCount4 = frameCount >> 2;
11256 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11257 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11258 drflac_uint32 shift = unusedBitsPerSample - 8;
11259 float factor;
11260 float32x4_t factor4;
11261 int32x4_t shift4;
11262 int32x4_t wbps0_4; /* Wasted Bits Per Sample */
11263 int32x4_t wbps1_4; /* Wasted Bits Per Sample */
11264
11265 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11266
11267 factor = 1.0f / 8388608.0f;
11268 factor4 = vdupq_n_f32(factor);
11269 wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11270 wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11271
11272 if (shift == 0) {
11273 for (i = 0; i < frameCount4; ++i) {
11274 int32x4_t lefti;
11275 int32x4_t righti;
11276 float32x4_t leftf;
11277 float32x4_t rightf;
11278
11279 uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11280 uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11281
11282 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11283
11284 lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
11285 righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
11286
11287 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11288 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11289
11290 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11291 }
11292
11293 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11294 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11295 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11296
11297 mid = (mid << 1) | (side & 0x01);
11298
11299 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11300 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11301 }
11302 } else {
11303 shift -= 1;
11304 shift4 = vdupq_n_s32(shift);
11305 for (i = 0; i < frameCount4; ++i) {
11306 uint32x4_t mid;
11307 uint32x4_t side;
11308 int32x4_t lefti;
11309 int32x4_t righti;
11310 float32x4_t leftf;
11311 float32x4_t rightf;
11312
11313 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11314 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11315
11316 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11317
11318 lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11319 righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11320
11321 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11322 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11323
11324 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11325 }
11326
11327 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11328 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11329 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11330
11331 mid = (mid << 1) | (side & 0x01);
11332
11333 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11334 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11335 }
11336 }
11337}
11338#endif
11339
11340static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11341{
11342#if defined(DRFLAC_SUPPORT_SSE2)
11343 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11344 drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11345 } else
11346#elif defined(DRFLAC_SUPPORT_NEON)
11347 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11348 drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11349 } else
11350#endif
11351 {
11352 /* Scalar fallback. */
11353#if 0
11354 drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11355#else
11356 drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11357#endif
11358 }
11359}
11360
11361#if 0
11362static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11363{
11364 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11365 pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11366 pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
11367 }
11368}
11369#endif
11370
11371static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11372{
11373 drflac_uint64 i;
11374 drflac_uint64 frameCount4 = frameCount >> 2;
11375 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11376 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11377 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11378 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11379 float factor = 1 / 2147483648.0;
11380
11381 for (i = 0; i < frameCount4; ++i) {
11382 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11383 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
11384 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
11385 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
11386
11387 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
11388 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
11389 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
11390 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
11391
11392 pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
11393 pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
11394 pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
11395 pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
11396 pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
11397 pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
11398 pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
11399 pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
11400 }
11401
11402 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11403 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11404 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11405 }
11406}
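
/*
The integer-to-float scaling used by the independent stereo path can be shown on a single sample. This is a hedged sketch rather
than the library's implementation: example_sample_to_f32 is a hypothetical name and wasted bits are assumed to be zero.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Converts one decoded sample of the given bit depth to a
// float in the range [-1, 1) by left-aligning it in 32 bits and dividing by 2^31.
static float example_sample_to_f32(int32_t sample, uint32_t bitsPerSample)
{
    uint32_t unusedBits;
    int32_t aligned;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);

    unusedBits = 32 - bitsPerSample;
    aligned = (int32_t)((uint32_t)sample << unusedBits);    // Shift as unsigned so negative samples are well defined.

    return (float)(aligned / 2147483648.0);
}
```

Left-aligning first means the same divisor works for every bit depth, so a 16-bit and a 24-bit sample at half of full scale both
come out as 0.5.
*/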
11407
11408#if defined(DRFLAC_SUPPORT_SSE2)
11409static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11410{
11411 drflac_uint64 i;
11412 drflac_uint64 frameCount4 = frameCount >> 2;
11413 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11414 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11415 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11416 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11417
11418 float factor = 1.0f / 8388608.0f;
11419 __m128 factor128 = _mm_set1_ps(factor);
11420
11421 for (i = 0; i < frameCount4; ++i) {
11422 __m128i lefti;
11423 __m128i righti;
11424 __m128 leftf;
11425 __m128 rightf;
11426
11427 lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11428 righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11429
11430 leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11431 rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11432
11433 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11434 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11435 }
11436
11437 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11438 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11439 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11440 }
11441}
11442#endif
11443
11444#if defined(DRFLAC_SUPPORT_NEON)
11445static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11446{
11447 drflac_uint64 i;
11448 drflac_uint64 frameCount4 = frameCount >> 2;
11449 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11450 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11451 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11452 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11453
11454 float factor = 1.0f / 8388608.0f;
11455 float32x4_t factor4 = vdupq_n_f32(factor);
11456 int32x4_t shift0_4 = vdupq_n_s32(shift0);
11457 int32x4_t shift1_4 = vdupq_n_s32(shift1);
11458
11459 for (i = 0; i < frameCount4; ++i) {
11460 int32x4_t lefti;
11461 int32x4_t righti;
11462 float32x4_t leftf;
11463 float32x4_t rightf;
11464
11465 lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11466 righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11467
11468 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11469 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11470
11471 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11472 }
11473
11474 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11475 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11476 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11477 }
11478}
11479#endif
11480
11481static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11482{
11483#if defined(DRFLAC_SUPPORT_SSE2)
11484 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11485 drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11486 } else
11487#elif defined(DRFLAC_SUPPORT_NEON)
11488 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11489 drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11490 } else
11491#endif
11492 {
11493 /* Scalar fallback. */
11494#if 0
11495 drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11496#else
11497 drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11498#endif
11499 }
11500}
11501
11502DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11503{
11504 drflac_uint64 framesRead;
11505 drflac_uint32 unusedBitsPerSample;
11506
11507 if (pFlac == NULL || framesToRead == 0) {
11508 return 0;
11509 }
11510
11511 if (pBufferOut == NULL) {
11512 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11513 }
11514
11515 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
11516 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
11517
11518 framesRead = 0;
11519 while (framesToRead > 0) {
11520 /* If we've run out of samples in this frame, go to the next. */
11521 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
11522 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11523 break; /* Couldn't read the next frame, so just break from the loop and return. */
11524 }
11525 } else {
11526 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11527 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11528 drflac_uint64 frameCountThisIteration = framesToRead;
11529
11530 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11531 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11532 }
11533
11534 if (channelCount == 2) {
11535 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
11536 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
11537
11538 switch (pFlac->currentFLACFrame.header.channelAssignment)
11539 {
11540 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11541 {
11542 drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11543 } break;
11544
11545 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11546 {
11547 drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11548 } break;
11549
11550 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11551 {
11552 drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11553 } break;
11554
11555 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11556 default:
11557 {
11558 drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11559 } break;
11560 }
11561 } else {
11562 /* Generic interleaving. */
11563 drflac_uint64 i;
11564 for (i = 0; i < frameCountThisIteration; ++i) {
11565 unsigned int j;
11566 for (j = 0; j < channelCount; ++j) {
11567 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11568 pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
11569 }
11570 }
11571 }
11572
11573 framesRead += frameCountThisIteration;
11574 pBufferOut += frameCountThisIteration * channelCount;
11575 framesToRead -= frameCountThisIteration;
11576 pFlac->currentPCMFrame += frameCountThisIteration;
11577 pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
11578 }
11579 }
11580
11581 return framesRead;
11582}
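
/*
The generic interleaving branch of drflac_read_pcm_frames_f32() handles any channel count other than two. A standalone sketch of
that layout is given below. It is not the library's code: example_interleave_f32 is a hypothetical name, and the input samples are
assumed to be already left-aligned in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Interleaves planar samples (one array per channel) into a
// float buffer laid out as frame 0 (channel 0, channel 1, ...), frame 1 (channel 0, channel 1, ...), and so on.
static void example_interleave_f32(const int32_t* const* ppChannels, unsigned int channelCount, size_t frameCount, float* pOut)
{
    size_t i;
    unsigned int j;

    assert(ppChannels != NULL && pOut != NULL);

    for (i = 0; i < frameCount; ++i) {
        for (j = 0; j < channelCount; ++j) {
            pOut[(i*channelCount) + j] = (float)(ppChannels[j][i] / 2147483648.0);
        }
    }
}
```

The output buffer must hold at least frameCount multiplied by channelCount floats.
*/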
11583
11584
11585DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
11586{
11587 if (pFlac == NULL) {
11588 return DRFLAC_FALSE;
11589 }
11590
11591 /* Don't do anything if we're already on the seek point. */
11592 if (pFlac->currentPCMFrame == pcmFrameIndex) {
11593 return DRFLAC_TRUE;
11594 }
11595
11596 /*
11597 If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
11598 when the decoder was opened.
11599 */
11600 if (pFlac->firstFLACFramePosInBytes == 0) {
11601 return DRFLAC_FALSE;
11602 }
11603
11604 if (pcmFrameIndex == 0) {
11605 pFlac->currentPCMFrame = 0;
11606 return drflac__seek_to_first_frame(pFlac);
11607 } else {
11608 drflac_bool32 wasSuccessful = DRFLAC_FALSE;
11609 drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
11610
11611 /* Clamp the target PCM frame to the end of the stream. */
11612 if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
11613 pcmFrameIndex = pFlac->totalPCMFrameCount;
11614 }
11615
11616 /* If the target PCM frame lies within the currently decoded FLAC frame we just adjust the position inside it, forward or backward. */
11617 if (pcmFrameIndex > pFlac->currentPCMFrame) {
11618 /* Forward. */
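11619 drflac_uint64 offset = pcmFrameIndex - pFlac->currentPCMFrame; /* Kept as 64-bit so a distance of 2^32 frames or more is not truncated into a false match. */
11620 if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
11621 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)offset; /* Safe cast as per the check above. */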
11622 pFlac->currentPCMFrame = pcmFrameIndex;
11623 return DRFLAC_TRUE;
11624 }
11625 } else {
11626 /* Backward. */
11627 drflac_uint64 offsetAbs = pFlac->currentPCMFrame - pcmFrameIndex; /* Kept as 64-bit so a distance of 2^32 frames or more is not truncated into a false match. */
11628 drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
11629 drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
11630 if (currentFLACFramePCMFramesConsumed > offsetAbs) {
11631 pFlac->currentFLACFrame.pcmFramesRemaining += (drflac_uint32)offsetAbs; /* Safe cast as per the check above. */
11632 pFlac->currentPCMFrame = pcmFrameIndex;
11633 return DRFLAC_TRUE;
11634 }
11635 }
11636
11637 /*
11638 Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
11639 we'll instead use Ogg's natural seeking facility.
11640 */
11641#ifndef DR_FLAC_NO_OGG
11642 if (pFlac->container == drflac_container_ogg)
11643 {
11644 wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
11645 }
11646 else
11647#endif
11648 {
11649 /* First try seeking via the seek table. If this fails, fall back to a binary search (when CRC support is compiled in) and finally to a brute force seek, which is much slower. */
11650 if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
11651 wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
11652 }
11653
11654#if !defined(DR_FLAC_NO_CRC)
11655 /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
11656 if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
11657 wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
11658 }
11659#endif
11660
11661 /* Fall back to brute force if all else fails. */
11662 if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
11663 wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
11664 }
11665 }
11666
11667 if (wasSuccessful) {
11668 pFlac->currentPCMFrame = pcmFrameIndex;
11669 } else {
11670 /* Seek failed. Try putting the decoder back to its original state. */
11671 if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
11672 /* Failed to seek back to the original PCM frame. Fall back to 0. */
11673 drflac_seek_to_pcm_frame(pFlac, 0);
11674 }
11675 }
11676
11677 return wasSuccessful;
11678 }
11679}
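
/*
Before falling back to a real seek, drflac_seek_to_pcm_frame() checks whether the target lies inside the FLAC frame that has
already been decoded. That fast path can be modelled on its own. The sketch below is a simplified model under stated assumptions:
example_position and example_try_seek_within_frame are hypothetical names, and the struct only mirrors the three fields the fast
path needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified model of the decoder's position state. Not dr_flac's actual struct.
typedef struct
{
    uint64_t currentPCMFrame;       // Absolute position in the stream.
    uint32_t blockSize;             // PCM frames in the currently decoded FLAC frame.
    uint32_t pcmFramesRemaining;    // PCM frames of that FLAC frame which have not been consumed yet.
} example_position;

// Returns 1 and updates the position if the target lies inside the decoded FLAC frame. Returns 0, leaving
// the position untouched, when a real seek (seek table, binary search or brute force) would be required.
static int example_try_seek_within_frame(example_position* pPos, uint64_t target)
{
    assert(pPos != NULL);

    if (target == pPos->currentPCMFrame) {
        return 1;   // Already there.
    }

    if (target > pPos->currentPCMFrame) {
        uint64_t offset = target - pPos->currentPCMFrame;
        if (pPos->pcmFramesRemaining > offset) {
            pPos->pcmFramesRemaining -= (uint32_t)offset;
            pPos->currentPCMFrame = target;
            return 1;
        }
    } else {
        uint64_t offset = pPos->currentPCMFrame - target;
        uint32_t consumed = pPos->blockSize - pPos->pcmFramesRemaining;
        if (consumed > offset) {
            pPos->pcmFramesRemaining += (uint32_t)offset;
            pPos->currentPCMFrame = target;
            return 1;
        }
    }

    return 0;
}
```

The strict comparisons mirror the function above: a target that falls exactly on a FLAC frame boundary is left to the slower
seeking strategies.
*/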
11680
11681
11682
11683/* High Level APIs */
11684
11685/* SIZE_MAX */
11686#if defined(SIZE_MAX)
11687 #define DRFLAC_SIZE_MAX SIZE_MAX
11688#else
11689 #if defined(DRFLAC_64BIT)
11690 #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11691 #else
11692 #define DRFLAC_SIZE_MAX 0xFFFFFFFF
11693 #endif
11694#endif
11695/* End SIZE_MAX */
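
/*
DRFLAC_SIZE_MAX is used further down to reject streams whose decoded size cannot be represented by size_t. A standalone sketch of
that kind of check follows. It is an assumption-laden illustration and not the macro's actual logic: example_decoded_size is a
hypothetical name, and the check is done by division so that the 64-bit product itself cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of dr_flac. Computes frameCount x channels x bytesPerSample.
// Returns 1 and writes the result when it fits in size_t. Returns 0 and leaves the output untouched otherwise.
static int example_decoded_size(uint64_t frameCount, uint32_t channels, uint32_t bytesPerSample, size_t* pSizeOut)
{
    uint64_t bytesPerFrame = (uint64_t)channels * bytesPerSample;   // Cannot wrap: both factors are below 2^32.

    assert(pSizeOut != NULL);

    if (bytesPerFrame != 0 && frameCount > (uint64_t)SIZE_MAX / bytesPerFrame) {
        return 0;   // The decoded data is too big.
    }

    *pSizeOut = (size_t)(frameCount * bytesPerFrame);
    return 1;
}
```

On a 32-bit build this rejects anything at or above 4 GiB of decoded audio before an allocation is attempted.
*/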
11696
11697
11698/* Using a macro as the definition of the drflac__full_read_and_close_*() API family. Sue me. */
11699#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11700static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11701{ \
11702 type* pSampleData = NULL; \
11703 drflac_uint64 totalPCMFrameCount; \
11704 \
11705 DRFLAC_ASSERT(pFlac != NULL); \
11706 \
11707 totalPCMFrameCount = pFlac->totalPCMFrameCount; \
11708 \
11709 if (totalPCMFrameCount == 0) { \
11710 type buffer[4096]; \
11711 drflac_uint64 pcmFramesRead; \
11712 size_t sampleDataBufferSize = sizeof(buffer); \
11713 \
11714 pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
11715 if (pSampleData == NULL) { \
11716 goto on_error; \
11717 } \
11718 \
11719 while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
11720 if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
11721 type* pNewSampleData; \
11722 size_t newSampleDataBufferSize; \
11723 \
11724 newSampleDataBufferSize = sampleDataBufferSize * 2; \
11725 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
11726 if (pNewSampleData == NULL) { \
11727 drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
11728 goto on_error; \
11729 } \
11730 \
11731 sampleDataBufferSize = newSampleDataBufferSize; \
11732 pSampleData = pNewSampleData; \
11733 } \
11734 \
11735 DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
11736 totalPCMFrameCount += pcmFramesRead; \
11737 } \
11738 \
11739 /* At this point everything should be decoded. Fill the unused part of the buffer with silence - need to \
11740 protect those ears from random noise! */ \
11741 DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
11742 } else { \
11743 drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
11744 if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
11745 goto on_error; /* The decoded data is too big. */ \
11746 } \
11747 \
11748 pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
11749 if (pSampleData == NULL) { \
11750 goto on_error; \
11751 } \
11752 \
11753 totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
11754 } \
11755 \
11756 if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
11757 if (channelsOut) *channelsOut = pFlac->channels; \
11758 if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
11759 \
11760 drflac_close(pFlac); \
11761 return pSampleData; \
11762 \
11763on_error: \
11764 drflac_close(pFlac); \
11765 return NULL; \
11766}
11767
11768DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11769DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11770DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
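
/*
When the total frame count is unknown, the macro above decodes in fixed-size chunks and doubles the output buffer whenever the next
chunk would not fit. The growth strategy can be sketched with the standard allocator. This is an illustration only and differs
from the macro in labelled ways: example_buffer and example_buffer_append are hypothetical names, realloc() stands in for the
allocation callbacks, size overflow is not checked, and on allocation failure the existing data is left intact instead of being
freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable sample buffer, not part of dr_flac.
typedef struct
{
    float* pData;
    size_t count;       // Samples stored so far.
    size_t capacity;    // Samples the allocation can hold.
} example_buffer;

// Appends n samples, doubling the capacity until they fit. Returns 1 on success and 0 on allocation failure.
static int example_buffer_append(example_buffer* pBuffer, const float* pSamples, size_t n)
{
    assert(pBuffer != NULL && (pSamples != NULL || n == 0));

    if (pBuffer->count + n > pBuffer->capacity) {
        size_t newCapacity = (pBuffer->capacity == 0) ? 64 : pBuffer->capacity;
        float* pNewData;

        while (newCapacity < pBuffer->count + n) {
            newCapacity *= 2;
        }

        pNewData = (float*)realloc(pBuffer->pData, newCapacity * sizeof(float));
        if (pNewData == NULL) {
            return 0;   // The old allocation is still valid and still owned by the caller.
        }

        pBuffer->pData = pNewData;
        pBuffer->capacity = newCapacity;
    }

    if (n > 0) {
        memcpy(pBuffer->pData + pBuffer->count, pSamples, n * sizeof(float));
    }
    pBuffer->count += n;

    return 1;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, at the cost of up to half of the buffer being unused at
the end, which is why the macro zero-fills the tail.
*/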
11771
11772DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11773{
11774 drflac* pFlac;
11775
11776 if (channelsOut) {
11777 *channelsOut = 0;
11778 }
11779 if (sampleRateOut) {
11780 *sampleRateOut = 0;
11781 }
11782 if (totalPCMFrameCountOut) {
11783 *totalPCMFrameCountOut = 0;
11784 }
11785
11786 pFlac = drflac_open(onRead, onSeek, onTell, pUserData, pAllocationCallbacks);
11787 if (pFlac == NULL) {
11788 return NULL;
11789 }
11790
11791 return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11792}
11793
11794DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11795{
11796 drflac* pFlac;
11797
11798 if (channelsOut) {
11799 *channelsOut = 0;
11800 }
11801 if (sampleRateOut) {
11802 *sampleRateOut = 0;
11803 }
11804 if (totalPCMFrameCountOut) {
11805 *totalPCMFrameCountOut = 0;
11806 }
11807
11808 pFlac = drflac_open(onRead, onSeek, onTell, pUserData, pAllocationCallbacks);
11809 if (pFlac == NULL) {
11810 return NULL;
11811 }
11812
11813 return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11814}
11815
11816DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_tell_proc onTell, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11817{
11818 drflac* pFlac;
11819
11820 if (channelsOut) {
11821 *channelsOut = 0;
11822 }
11823 if (sampleRateOut) {
11824 *sampleRateOut = 0;
11825 }
11826 if (totalPCMFrameCountOut) {
11827 *totalPCMFrameCountOut = 0;
11828 }
11829
11830 pFlac = drflac_open(onRead, onSeek, onTell, pUserData, pAllocationCallbacks);
11831 if (pFlac == NULL) {
11832 return NULL;
11833 }
11834
11835 return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11836}
11837
11838#ifndef DR_FLAC_NO_STDIO
11839DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11840{
11841 drflac* pFlac;
11842
11843 if (sampleRate) {
11844 *sampleRate = 0;
11845 }
11846 if (channels) {
11847 *channels = 0;
11848 }
11849 if (totalPCMFrameCount) {
11850 *totalPCMFrameCount = 0;
11851 }
11852
11853 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11854 if (pFlac == NULL) {
11855 return NULL;
11856 }
11857
11858 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11859}
11860
11861DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11862{
11863 drflac* pFlac;
11864
11865 if (sampleRate) {
11866 *sampleRate = 0;
11867 }
11868 if (channels) {
11869 *channels = 0;
11870 }
11871 if (totalPCMFrameCount) {
11872 *totalPCMFrameCount = 0;
11873 }
11874
11875 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11876 if (pFlac == NULL) {
11877 return NULL;
11878 }
11879
11880 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11881}
11882
11883DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11884{
11885 drflac* pFlac;
11886
11887 if (sampleRate) {
11888 *sampleRate = 0;
11889 }
11890 if (channels) {
11891 *channels = 0;
11892 }
11893 if (totalPCMFrameCount) {
11894 *totalPCMFrameCount = 0;
11895 }
11896
11897 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11898 if (pFlac == NULL) {
11899 return NULL;
11900 }
11901
11902 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11903}
11904#endif

DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
}

DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    drflac* pFlac;

    if (sampleRate) {
        *sampleRate = 0;
    }
    if (channels) {
        *channels = 0;
    }
    if (totalPCMFrameCount) {
        *totalPCMFrameCount = 0;
    }

    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
    if (pFlac == NULL) {
        return NULL;
    }

    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
}


DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (pAllocationCallbacks != NULL) {
        drflac__free_from_callbacks(p, pAllocationCallbacks);
    } else {
        drflac__free_default(p, NULL);
    }
}




DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = commentCount;
    pIter->pRunningData = (const char*)pComments;
}

DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
{
    drflac_int32 length;
    const char* pComment;

    /* Safety. */
    if (pCommentLengthOut) {
        *pCommentLengthOut = 0;
    }

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
    pIter->pRunningData += 4;

    pComment = pIter->pRunningData;
    pIter->pRunningData += length;
    pIter->countRemaining -= 1;

    if (pCommentLengthOut) {
        *pCommentLengthOut = length;
    }

    return pComment;
}
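
/*
A standalone sketch of the layout the iterator above walks: each comment is a 32-bit little-endian length followed by that many
bytes, with no null terminator. The `example_*` names are hypothetical and not part of dr_flac. The sketch reads the length one byte
at a time and assumes the buffer is well formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: read a 32-bit little-endian value from a possibly unaligned pointer.
static uint32_t example_read_le32(const unsigned char* p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Hypothetical cursor mirroring the two fields the iterator above uses.
typedef struct
{
    uint32_t countRemaining;
    const unsigned char* pRunningData;
} example_comment_iter;

// Returns a pointer to the next comment (not null-terminated) and writes its length, or returns NULL when done.
static const char* example_next_comment(example_comment_iter* pIter, uint32_t* pLengthOut)
{
    uint32_t length;
    const char* pComment;

    if (pLengthOut != NULL) {
        *pLengthOut = 0;
    }
    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return NULL;
    }

    length   = example_read_le32(pIter->pRunningData);
    pComment = (const char*)(pIter->pRunningData + 4);

    // Step over the 4-byte length prefix and the comment bytes.
    pIter->pRunningData   += 4 + (size_t)length;
    pIter->countRemaining -= 1;

    if (pLengthOut != NULL) {
        *pLengthOut = length;
    }
    return pComment;
}
```

Because the returned pointer is not null-terminated, a caller would typically copy `length` bytes into its own buffer before
treating the comment as a string.
*/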




DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
{
    if (pIter == NULL) {
        return;
    }

    pIter->countRemaining = trackCount;
    pIter->pRunningData = (const char*)pTrackData;
}

DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
{
    drflac_cuesheet_track cuesheetTrack;
    const char* pRunningData;
    drflac_uint64 offsetHi;
    drflac_uint64 offsetLo;

    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
        return DRFLAC_FALSE;
    }

    pRunningData = pIter->pRunningData;

    offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
    cuesheetTrack.offset = offsetLo | (offsetHi << 32);
    cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
    DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
    cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
    cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
    cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
    cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);

    pIter->pRunningData = pRunningData;
    pIter->countRemaining -= 1;

    if (pCuesheetTrack) {
        *pCuesheetTrack = cuesheetTrack;
    }

    return DRFLAC_TRUE;
}
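
/*
A standalone sketch of the byte layout the function above steps through, assuming the offsets implied by its pointer increments
(8 + 1 + 12 + 14 + 1 = 36 bytes before the index points). The `example_*` names are hypothetical and not part of dr_flac. Reading
the big-endian words one byte at a time avoids depending on pointer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: read a 32-bit big-endian value one byte at a time.
static uint32_t example_read_be32(const unsigned char* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Hypothetical helper: combine the high and low words the same way the function above does.
static uint64_t example_read_be64(const unsigned char* p)
{
    uint64_t hi = example_read_be32(p);
    uint64_t lo = example_read_be32(p + 4);
    return lo | (hi << 32);
}

// Hypothetical summary of the fixed-size part of one track record.
typedef struct
{
    uint64_t offset;
    uint8_t  trackNumber;
    uint8_t  flags;        // Bits 0x80 and 0x40 are the two flags tested above.
    uint8_t  indexCount;
} example_track_header;

// 8 (offset) + 1 (track number) + 12 (ISRC) + 14 (flags and padding) + 1 (index count).
#define EXAMPLE_TRACK_HEADER_SIZE 36

// Fills pOut and returns a pointer to the first byte after the fixed-size part, where the index points begin.
static const unsigned char* example_parse_track_header(const unsigned char* p, example_track_header* pOut)
{
    assert(p != NULL && pOut != NULL);

    pOut->offset      = example_read_be64(p);
    pOut->trackNumber = p[8];
    pOut->flags       = p[21];
    pOut->indexCount  = p[35];

    return p + EXAMPLE_TRACK_HEADER_SIZE;
}
```

The caller is assumed to have checked that at least 36 bytes are available before parsing.
*/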

#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
    #pragma GCC diagnostic pop
#endif
#endif  /* dr_flac_c */
#endif  /* DR_FLAC_IMPLEMENTATION */


/*
REVISION HISTORY
================
v0.13.0 - TBD
  - API CHANGE: Seek origin enums have been renamed to match the naming convention used by other dr_libs libraries:
    - drflac_seek_origin_start -> DRFLAC_SEEK_SET
    - drflac_seek_origin_current -> DRFLAC_SEEK_CUR
    - DRFLAC_SEEK_END (new)
  - API CHANGE: A new seek origin has been added to allow seeking from the end of the file. If you implement your own `onSeek` callback, you should now detect and handle `DRFLAC_SEEK_END`. If seeking to the end is not supported, return `DRFLAC_FALSE`. If you only use `*_open_file()` or `*_open_memory()`, you need not change anything.
  - API CHANGE: An `onTell` callback has been added to the following functions:
    - drflac_open()
    - drflac_open_relaxed()
    - drflac_open_with_metadata()
    - drflac_open_with_metadata_relaxed()
    - drflac_open_and_read_pcm_frames_s32()
    - drflac_open_and_read_pcm_frames_s16()
    - drflac_open_and_read_pcm_frames_f32()
  - Fix compilation for AIX OS.
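
One way a custom stream might handle the new seek origin and the `onTell` callback, sketched over an in-memory buffer. The
`example_*` types are stand-ins so the snippet compiles without this header, and the callback parameter lists are assumptions;
check the callback typedefs in this file before adapting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stand-ins so the sketch compiles on its own. In a real program the equivalents come from dr_flac.h.
typedef uint32_t example_bool32;
typedef enum { EXAMPLE_SEEK_SET, EXAMPLE_SEEK_CUR, EXAMPLE_SEEK_END } example_seek_origin;

// Hypothetical user data: a read cursor over a block of memory.
typedef struct
{
    const unsigned char* pData;
    size_t size;
    size_t cursor;
} example_stream;

// Seek callback. Returns 0 (false) and leaves the cursor alone when the target is out of range.
static example_bool32 example_on_seek(void* pUserData, int offset, example_seek_origin origin)
{
    example_stream* pStream = (example_stream*)pUserData;
    int64_t base;
    int64_t target;

    assert(pStream != NULL);

    switch (origin) {
        case EXAMPLE_SEEK_SET: base = 0;                        break;
        case EXAMPLE_SEEK_CUR: base = (int64_t)pStream->cursor; break;
        case EXAMPLE_SEEK_END: base = (int64_t)pStream->size;   break;
        default: return 0;
    }

    target = base + offset;
    if (target < 0 || (uint64_t)target > pStream->size) {
        return 0;
    }

    pStream->cursor = (size_t)target;
    return 1;
}

// Tell callback. Reports the current cursor position.
static example_bool32 example_on_tell(void* pUserData, int64_t* pCursor)
{
    example_stream* pStream = (example_stream*)pUserData;

    assert(pStream != NULL && pCursor != NULL);

    *pCursor = (int64_t)pStream->cursor;
    return 1;
}
```

With the real header, the equivalent functions would be passed to `drflac_open()` together with a read callback.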

v0.12.43 - 2024-12-17
  - Fix a possible buffer overflow during decoding.
  - Improve detection of ARM64EC.

v0.12.42 - 2023-11-02
  - Fix build for ARMv6-M.
  - Fix a compilation warning with GCC.

v0.12.41 - 2023-06-17
  - Fix an incorrect date in revision history. No functional change.

v0.12.40 - 2023-05-22
  - Minor code restructure. No functional change.

v0.12.39 - 2022-09-17
  - Fix compilation with DJGPP.
  - Fix compilation error with Visual Studio 2019 and the ARM build.
  - Fix an error with SSE 4.1 detection.
  - Add support for disabling wchar_t with DR_FLAC_NO_WCHAR.
  - Improve compatibility with compilers which lack support for explicit struct packing.
  - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
    allocation when loading an Ogg encapsulated file.

v0.12.38 - 2022-04-10
  - Fix compilation error on older versions of GCC.

v0.12.37 - 2022-02-12
  - Improve ARM detection.

v0.12.36 - 2022-02-07
  - Fix a compilation error with the ARM build.

v0.12.35 - 2022-02-06
  - Fix a bug due to underestimating the amount of precision required for the prediction stage.
  - Fix some bugs found from fuzz testing.

v0.12.34 - 2022-01-07
  - Fix some misalignment bugs when reading metadata.

v0.12.33 - 2021-12-22
  - Fix a bug with seeking when the seek table does not start at PCM frame 0.

v0.12.32 - 2021-12-11
  - Fix a warning with Clang.

v0.12.31 - 2021-08-16
  - Silence some warnings.

v0.12.30 - 2021-07-31
  - Fix platform detection for ARM64.

v0.12.29 - 2021-04-02
  - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
  - Fix a decoding error due to an incorrect validation check.

v0.12.28 - 2021-02-21
  - Fix a warning due to referencing _MSC_VER when it is undefined.

v0.12.27 - 2021-01-31
  - Fix a static analysis warning.

v0.12.26 - 2021-01-17
  - Fix a compilation warning due to _BSD_SOURCE being deprecated.

v0.12.25 - 2020-12-26
  - Update documentation.

v0.12.24 - 2020-11-29
  - Fix ARM64/NEON detection when compiling with MSVC.

v0.12.23 - 2020-11-21
  - Fix compilation with OpenWatcom.

v0.12.22 - 2020-11-01
  - Fix an error with the previous release.

v0.12.21 - 2020-11-01
  - Fix a possible deadlock when seeking.
  - Improve compiler support for older versions of GCC.

v0.12.20 - 2020-09-08
  - Fix a compilation error on older compilers.

v0.12.19 - 2020-08-30
  - Fix a bug due to an undefined 32-bit shift.

v0.12.18 - 2020-08-14
  - Fix a crash when compiling with clang-cl.

v0.12.17 - 2020-08-02
  - Simplify sized types.

v0.12.16 - 2020-07-25
  - Fix a compilation warning.

v0.12.15 - 2020-07-06
  - Check for negative LPC shifts and return an error.

v0.12.14 - 2020-06-23
  - Add include guard for the implementation section.

v0.12.13 - 2020-05-16
  - Add compile-time and run-time version querying.
    - DRFLAC_VERSION_MINOR
    - DRFLAC_VERSION_MAJOR
    - DRFLAC_VERSION_REVISION
    - DRFLAC_VERSION_STRING
    - drflac_version()
    - drflac_version_string()

v0.12.12 - 2020-04-30
  - Fix compilation errors with VC6.

v0.12.11 - 2020-04-19
  - Fix some pedantic warnings.
  - Fix some undefined behaviour warnings.

v0.12.10 - 2020-04-10
  - Fix some bugs when trying to seek with an invalid seek table.

v0.12.9 - 2020-04-05
  - Fix warnings.

v0.12.8 - 2020-04-04
  - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
  - Fix some static analysis warnings.
  - Minor documentation updates.

v0.12.7 - 2020-03-14
  - Fix compilation errors with VC6.

v0.12.6 - 2020-03-07
  - Fix compilation error with Visual Studio .NET 2003.

v0.12.5 - 2020-01-30
  - Silence some static analysis warnings.

v0.12.4 - 2020-01-29
  - Silence some static analysis warnings.

v0.12.3 - 2019-12-02
  - Fix some warnings when compiling with GCC and the -Og flag.
  - Fix a crash in out-of-memory situations.
  - Fix potential integer overflow bug.
  - Fix some static analysis warnings.
  - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
  - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.

v0.12.2 - 2019-10-07
  - Internal code clean up.

v0.12.1 - 2019-09-29
  - Fix some Clang Static Analyzer warnings.
  - Fix an unused variable warning.

v0.12.0 - 2019-09-23
  - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify its own memory allocation
    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
    - drflac_open()
    - drflac_open_relaxed()
    - drflac_open_with_metadata()
    - drflac_open_with_metadata_relaxed()
    - drflac_open_file()
    - drflac_open_file_with_metadata()
    - drflac_open_memory()
    - drflac_open_memory_with_metadata()
    - drflac_open_and_read_pcm_frames_s32()
    - drflac_open_and_read_pcm_frames_s16()
    - drflac_open_and_read_pcm_frames_f32()
    - drflac_open_file_and_read_pcm_frames_s32()
    - drflac_open_file_and_read_pcm_frames_s16()
    - drflac_open_file_and_read_pcm_frames_f32()
    - drflac_open_memory_and_read_pcm_frames_s32()
    - drflac_open_memory_and_read_pcm_frames_s16()
    - drflac_open_memory_and_read_pcm_frames_f32()
    Set this extra parameter to NULL to use the defaults, which matches the previous behaviour. Setting this to NULL will use
    DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
  - Remove deprecated APIs:
    - drflac_read_s32()
    - drflac_read_s16()
    - drflac_read_f32()
    - drflac_seek_to_sample()
    - drflac_open_and_decode_s32()
    - drflac_open_and_decode_s16()
    - drflac_open_and_decode_f32()
    - drflac_open_and_decode_file_s32()
    - drflac_open_and_decode_file_s16()
    - drflac_open_and_decode_file_f32()
    - drflac_open_and_decode_memory_s32()
    - drflac_open_and_decode_memory_s16()
    - drflac_open_and_decode_memory_f32()
  - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
    by doing pFlac->totalPCMFrameCount*pFlac->channels.
  - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
  - Fix errors when seeking to the end of a stream.
  - Optimizations to seeking.
  - SSE improvements and optimizations.
  - ARM NEON optimizations.
  - Optimizations to drflac_read_pcm_frames_s16().
  - Optimizations to drflac_read_pcm_frames_s32().
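
A sketch of a counting allocator that could be supplied through the extra parameter described for v0.12.0.
`example_allocation_callbacks` is a stand-in assumed to mirror the shape of `drflac_allocation_callbacks` (user data plus malloc,
realloc and free callbacks); the other `example_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Stand-in so the sketch compiles on its own. In a real program the equivalent struct comes from dr_flac.h.
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} example_allocation_callbacks;

// Hypothetical user data: tracks how many allocations are currently live.
typedef struct
{
    size_t liveAllocations;
} example_alloc_stats;

static void* example_malloc(size_t sz, void* pUserData)
{
    void* p;

    assert(pUserData != NULL);

    p = malloc(sz);
    if (p != NULL) {
        ((example_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return p;
}

// sz is assumed to be non-zero here.
static void* example_realloc(void* p, size_t sz, void* pUserData)
{
    void* pNew;

    assert(pUserData != NULL);

    pNew = realloc(p, sz);
    if (p == NULL && pNew != NULL) {
        ((example_alloc_stats*)pUserData)->liveAllocations += 1;   // realloc(NULL, sz) behaves like malloc(sz).
    }
    return pNew;
}

static void example_free(void* p, void* pUserData)
{
    assert(pUserData != NULL);

    if (p != NULL) {
        ((example_alloc_stats*)pUserData)->liveAllocations -= 1;
        free(p);
    }
}

// Bundles the three callbacks with the stats object as user data.
static example_allocation_callbacks example_make_callbacks(example_alloc_stats* pStats)
{
    example_allocation_callbacks callbacks;
    callbacks.pUserData = pStats;
    callbacks.onMalloc  = example_malloc;
    callbacks.onRealloc = example_realloc;
    callbacks.onFree    = example_free;
    return callbacks;
}
```

With the real struct, a pointer to it would go in the final parameter of the listed APIs, and passing NULL keeps the defaults.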

v0.11.10 - 2019-06-26
  - Fix a compiler error.

v0.11.9 - 2019-06-16
  - Silence some ThreadSanitizer warnings.

v0.11.8 - 2019-05-21
  - Fix warnings.

v0.11.7 - 2019-05-06
  - C89 fixes.

v0.11.6 - 2019-05-05
  - Add support for C89.
  - Fix a compiler warning when CRC is disabled.
  - Change license to choice of public domain or MIT-0.

v0.11.5 - 2019-04-19
  - Fix a compiler error with GCC.

v0.11.4 - 2019-04-17
  - Fix some warnings with GCC when compiling with -std=c99.

v0.11.3 - 2019-04-07
  - Silence warnings with GCC.

v0.11.2 - 2019-03-10
  - Fix a warning.

v0.11.1 - 2019-02-17
  - Fix a potential bug with seeking.

v0.11.0 - 2018-12-16
  - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
    drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
    and return PCM frame counts instead of sample counts. To upgrade, divide the input count by the channel count. The
    return value is now also a PCM frame count, so multiply it by the channel count if you still need a sample count.
  - API CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
    the changes to drflac_read_*() apply.
  - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
    the changes to drflac_read_*() apply.
  - Optimizations.
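
The count conversion described for v0.11.0 can be sketched as two small helpers. The `example_*` names are hypothetical. A sample
here means one value for one channel, and a PCM frame is one sample per channel.

```c
#include <assert.h>
#include <stdint.h>

// Convert an interleaved sample count (old APIs) to a PCM frame count (new APIs).
static uint64_t example_samples_to_pcm_frames(uint64_t sampleCount, uint32_t channels)
{
    assert(channels > 0);
    return sampleCount / channels;
}

// Convert a PCM frame count (new APIs) back to an interleaved sample count.
static uint64_t example_pcm_frames_to_samples(uint64_t pcmFrameCount, uint32_t channels)
{
    return pcmFrameCount * channels;
}
```

The second helper is the same arithmetic as `pFlac->totalPCMFrameCount*pFlac->channels`, which stands in for the removed total
sample count.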

v0.10.0 - 2018-09-11
  - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
    need to do it yourself via the callback API.
  - Fix the clang build.
  - Fix undefined behavior.
  - Fix errors with CUESHEET metadata blocks.
  - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
    Vorbis comment API.
  - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
  - Minor optimizations.

v0.9.11 - 2018-08-29
  - Fix a bug with sample reconstruction.

v0.9.10 - 2018-08-07
  - Improve 64-bit detection.

v0.9.9 - 2018-08-05
  - Fix C++ build on older versions of GCC.

v0.9.8 - 2018-07-24
  - Fix compilation errors.

v0.9.7 - 2018-07-05
  - Fix a warning.

v0.9.6 - 2018-06-29
  - Fix some typos.

v0.9.5 - 2018-06-23
  - Fix some warnings.

v0.9.4 - 2018-06-14
  - Optimizations to seeking.
  - Clean up.

v0.9.3 - 2018-05-22
  - Bug fix.

v0.9.2 - 2018-05-12
  - Fix a compilation error due to a missing break statement.

v0.9.1 - 2018-04-29
  - Fix compilation error with Clang.

v0.9 - 2018-04-24
  - Fix Clang build.
  - Start using major.minor.revision versioning.

v0.8g - 2018-04-19
  - Fix build on non-x86/x64 architectures.

v0.8f - 2018-02-02
  - Stop pretending to support changing rate/channels mid stream.

v0.8e - 2018-02-01
  - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
  - Fix a crash when the Rice partition order is invalid.

v0.8d - 2017-09-22
  - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.

v0.8c - 2017-09-07
  - Fix warning on non-x86/x64 architectures.

v0.8b - 2017-08-19
  - Fix build on non-x86/x64 architectures.

v0.8a - 2017-08-13
  - A small optimization for the Clang build.

v0.8 - 2017-08-12
  - API CHANGE: Rename dr_* types to drflac_*.
  - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
  - Add support for custom implementations of malloc(), realloc(), etc.
  - Add CRC checking to Ogg encapsulated streams.
  - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
  - Bug fixes.

v0.7 - 2017-07-23
  - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().

v0.6 - 2017-07-22
  - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
    never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.

v0.5 - 2017-07-16
  - Fix typos.
  - Change drflac_bool* types to unsigned.
  - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.

v0.4f - 2017-03-10
  - Fix a couple of bugs with the bitstreaming code.

v0.4e - 2017-02-17
  - Fix some warnings.

v0.4d - 2016-12-26
  - Add support for 32-bit floating-point PCM decoding.
  - Use drflac_int* and drflac_uint* sized types to improve compiler support.
  - Minor improvements to documentation.

v0.4c - 2016-12-26
  - Add support for signed 16-bit integer PCM decoding.

v0.4b - 2016-10-23
  - A minor change to drflac_bool8 and drflac_bool32 types.

v0.4a - 2016-10-11
  - Rename drBool32 to drflac_bool32 for styling consistency.

v0.4 - 2016-09-29
  - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
  - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
  - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
    keep it consistent with drflac_audio.

v0.3f - 2016-09-21
  - Fix a warning with GCC.

v0.3e - 2016-09-18
  - Fixed a bug where GCC 4.3+ was not getting properly identified.
  - Fixed a few typos.
  - Changed date formats to ISO 8601 (YYYY-MM-DD).

v0.3d - 2016-06-11
  - Minor clean up.

v0.3c - 2016-05-28
  - Fixed compilation error.

v0.3b - 2016-05-16
  - Fixed Linux/GCC build.
  - Updated documentation.

v0.3a - 2016-05-15
  - Minor fixes to documentation.

v0.3 - 2016-05-11
  - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
  - Lots of clean up.

v0.2b - 2016-05-10
  - Bug fixes.

v0.2a - 2016-05-10
  - Made drflac_open_and_decode() more robust.
  - Removed an unused debugging variable.

v0.2 - 2016-05-09
  - Added support for Ogg encapsulation.
  - API CHANGE: Have the onSeek callback take a third argument which specifies whether or not the seek
    should be relative to the start or the current position. Also changes the seeking rules such that
    seeking offsets will never be negative.
  - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.

v0.1b - 2016-05-07
  - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
  - Removed a stale comment.

v0.1a - 2016-05-05
  - Minor formatting changes.
  - Fixed a warning on the GCC build.

v0.1 - 2016-05-03
  - Initial versioned release.
*/

/*
This software is available as a choice of the following licenses. Choose
whichever you prefer.

===============================================================================
ALTERNATIVE 1 - Public Domain (www.unlicense.org)
===============================================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
software, either in source code form or as a compiled binary, for any purpose,
commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this
software dedicate any and all copyright interest in the software to the public
domain. We make this dedication for the benefit of the public at large and to
the detriment of our heirs and successors. We intend this dedication to be an
overt act of relinquishment in perpetuity of all present and future rights to
this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>

===============================================================================
ALTERNATIVE 2 - MIT No Attribution
===============================================================================
Copyright 2023 David Reid

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/