Module core::arch::x86_64 1.27.0
Platform-specific intrinsics for the x86_64 platform.
See the module documentation for more details.
Structs
| CpuidResult |
[ x86-64 ] Result of the `cpuid` instruction. |
| __m128 |
[ x86-64 ] 128-bit wide set of four `f32` types, x86-specific |
| __m256 |
[ x86-64 ] 256-bit wide set of eight `f32` types, x86-specific |
| __m128d |
[ x86-64 ] 128-bit wide set of two `f64` types, x86-specific |
| __m128i |
[ x86-64 ] 128-bit wide integer vector type, x86-specific |
| __m256d |
[ x86-64 ] 256-bit wide set of four `f64` types, x86-specific |
| __m256i |
[ x86-64 ] 256-bit wide integer vector type, x86-specific |
| __m64 |
[ Experimental ] [ x86-64 ] 64-bit wide integer vector type, x86-specific |
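These vector types are opaque to ordinary Rust operators; they are produced and consumed by the intrinsics listed below, normally behind runtime feature detection. A minimal sketch (assuming the `std::arch::x86_64` re-export and the `is_x86_feature_detected!` macro from `std`):

```rust
#[cfg(target_arch = "x86_64")]
fn demo() {
    if is_x86_feature_detected!("avx") {
        use std::arch::x86_64::*;
        // SAFETY: the `avx` feature was verified at runtime above.
        unsafe {
            let v: __m256 = _mm256_set1_ps(1.5); // broadcast 1.5 to all eight f32 lanes
            assert_eq!(_mm256_cvtss_f32(v), 1.5); // read back the lowest lane
        }
    }
}
```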
Constants
| _CMP_EQ_OQ |
[ x86-64 ] Equal (ordered, non-signaling) |
| _CMP_EQ_OS |
[ x86-64 ] Equal (ordered, signaling) |
| _CMP_EQ_UQ |
[ x86-64 ] Equal (unordered, non-signaling) |
| _CMP_EQ_US |
[ x86-64 ] Equal (unordered, signaling) |
| _CMP_FALSE_OQ |
[ x86-64 ] False (ordered, non-signaling) |
| _CMP_FALSE_OS |
[ x86-64 ] False (ordered, signaling) |
| _CMP_GE_OQ |
[ x86-64 ] Greater-than-or-equal (ordered, non-signaling) |
| _CMP_GE_OS |
[ x86-64 ] Greater-than-or-equal (ordered, signaling) |
| _CMP_GT_OQ |
[ x86-64 ] Greater-than (ordered, non-signaling) |
| _CMP_GT_OS |
[ x86-64 ] Greater-than (ordered, signaling) |
| _CMP_LE_OQ |
[ x86-64 ] Less-than-or-equal (ordered, non-signaling) |
| _CMP_LE_OS |
[ x86-64 ] Less-than-or-equal (ordered, signaling) |
| _CMP_LT_OQ |
[ x86-64 ] Less-than (ordered, non-signaling) |
| _CMP_LT_OS |
[ x86-64 ] Less-than (ordered, signaling) |
| _CMP_NEQ_OQ |
[ x86-64 ] Not-equal (ordered, non-signaling) |
| _CMP_NEQ_OS |
[ x86-64 ] Not-equal (ordered, signaling) |
| _CMP_NEQ_UQ |
[ x86-64 ] Not-equal (unordered, non-signaling) |
| _CMP_NEQ_US |
[ x86-64 ] Not-equal (unordered, signaling) |
| _CMP_NGE_UQ |
[ x86-64 ] Not-greater-than-or-equal (unordered, non-signaling) |
| _CMP_NGE_US |
[ x86-64 ] Not-greater-than-or-equal (unordered, signaling) |
| _CMP_NGT_UQ |
[ x86-64 ] Not-greater-than (unordered, non-signaling) |
| _CMP_NGT_US |
[ x86-64 ] Not-greater-than (unordered, signaling) |
| _CMP_NLE_UQ |
[ x86-64 ] Not-less-than-or-equal (unordered, non-signaling) |
| _CMP_NLE_US |
[ x86-64 ] Not-less-than-or-equal (unordered, signaling) |
| _CMP_NLT_UQ |
[ x86-64 ] Not-less-than (unordered, non-signaling) |
| _CMP_NLT_US |
[ x86-64 ] Not-less-than (unordered, signaling) |
| _CMP_ORD_Q |
[ x86-64 ] Ordered (non-signaling) |
| _CMP_ORD_S |
[ x86-64 ] Ordered (signaling) |
| _CMP_TRUE_UQ |
[ x86-64 ] True (unordered, non-signaling) |
| _CMP_TRUE_US |
[ x86-64 ] True (unordered, signaling) |
| _CMP_UNORD_Q |
[ x86-64 ] Unordered (non-signaling) |
| _CMP_UNORD_S |
[ x86-64 ] Unordered (signaling) |
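These predicate constants are passed as the immediate operand of the vcmp-style intrinsics such as `_mm_cmp_ps` and `_mm256_cmp_pd` listed under Functions below. A hedged sketch (on recent toolchains the immediate is a const generic; older releases took it as a plain `i32` argument):

```rust
use std::arch::x86_64::*;

// Returns a 4-bit mask with bit i set when a[i] < b[i] (ordered, quiet).
#[target_feature(enable = "avx")]
unsafe fn less_than_mask(a: __m256d, b: __m256d) -> i32 {
    let cmp = _mm256_cmp_pd::<_CMP_LT_OQ>(a, b); // all-ones lanes where a < b
    _mm256_movemask_pd(cmp)                      // collapse the sign bits into an i32
}
```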
| _MM_EXCEPT_DENORM |
[ x86-64 ] See `_mm_setcsr` |
| _MM_EXCEPT_DIV_ZERO |
[ x86-64 ] See `_mm_setcsr` |
| _MM_EXCEPT_INEXACT |
[ x86-64 ] See `_mm_setcsr` |
| _MM_EXCEPT_INVALID |
[ x86-64 ] See `_mm_setcsr` |
| _MM_EXCEPT_MASK |
[ x86-64 ] See `_MM_GET_EXCEPTION_STATE` |
| _MM_EXCEPT_OVERFLOW |
[ x86-64 ] See `_mm_setcsr` |
| _MM_EXCEPT_UNDERFLOW |
[ x86-64 ] See `_mm_setcsr` |
| _MM_FLUSH_ZERO_MASK |
[ x86-64 ] See `_MM_GET_FLUSH_ZERO_MODE` |
| _MM_FLUSH_ZERO_OFF |
[ x86-64 ] See `_mm_setcsr` |
| _MM_FLUSH_ZERO_ON |
[ x86-64 ] See `_mm_setcsr` |
| _MM_FROUND_CEIL |
[ x86-64 ] round up and do not suppress exceptions |
| _MM_FROUND_CUR_DIRECTION |
[ x86-64 ] use MXCSR.RC; see `_MM_SET_ROUNDING_MODE` |
| _MM_FROUND_FLOOR |
[ x86-64 ] round down and do not suppress exceptions |
| _MM_FROUND_NEARBYINT |
[ x86-64 ] use MXCSR.RC and suppress exceptions; see `_MM_SET_ROUNDING_MODE` |
| _MM_FROUND_NINT |
[ x86-64 ] round to nearest and do not suppress exceptions |
| _MM_FROUND_NO_EXC |
[ x86-64 ] suppress exceptions |
| _MM_FROUND_RAISE_EXC |
[ x86-64 ] do not suppress exceptions |
| _MM_FROUND_RINT |
[ x86-64 ] use MXCSR.RC and do not suppress exceptions; see `_MM_SET_ROUNDING_MODE` |
| _MM_FROUND_TO_NEAREST_INT |
[ x86-64 ] round to nearest |
| _MM_FROUND_TO_NEG_INF |
[ x86-64 ] round down |
| _MM_FROUND_TO_POS_INF |
[ x86-64 ] round up |
| _MM_FROUND_TO_ZERO |
[ x86-64 ] truncate |
| _MM_FROUND_TRUNC |
[ x86-64 ] truncate and do not suppress exceptions |
| _MM_HINT_NTA |
[ x86-64 ] See `_mm_prefetch` |
| _MM_HINT_T0 |
[ x86-64 ] See `_mm_prefetch` |
| _MM_HINT_T1 |
[ x86-64 ] See `_mm_prefetch` |
| _MM_HINT_T2 |
[ x86-64 ] See `_mm_prefetch` |
| _MM_MASK_DENORM |
[ x86-64 ] See `_mm_setcsr` |
| _MM_MASK_DIV_ZERO |
[ x86-64 ] See `_mm_setcsr` |
| _MM_MASK_INEXACT |
[ x86-64 ] See `_mm_setcsr` |
| _MM_MASK_INVALID |
[ x86-64 ] See `_mm_setcsr` |
| _MM_MASK_MASK |
[ x86-64 ] See `_MM_GET_EXCEPTION_MASK` |
| _MM_MASK_OVERFLOW |
[ x86-64 ] See `_mm_setcsr` |
| _MM_MASK_UNDERFLOW |
[ x86-64 ] See `_mm_setcsr` |
| _MM_ROUND_DOWN |
[ x86-64 ] See `_mm_setcsr` |
| _MM_ROUND_MASK |
[ x86-64 ] See `_MM_GET_ROUNDING_MODE` |
| _MM_ROUND_NEAREST |
[ x86-64 ] See `_mm_setcsr` |
| _MM_ROUND_TOWARD_ZERO |
[ x86-64 ] See `_mm_setcsr` |
| _MM_ROUND_UP |
[ x86-64 ] See `_mm_setcsr` |
| _SIDD_BIT_MASK |
[ x86-64 ] Mask only: return the bit mask |
| _SIDD_CMP_EQUAL_ANY |
[ x86-64 ] For each character in |
| _SIDD_CMP_EQUAL_EACH |
[ x86-64 ] The strings defined by |
| _SIDD_CMP_EQUAL_ORDERED |
[ x86-64 ] Search for the defined substring in the target |
| _SIDD_CMP_RANGES |
[ x86-64 ] For each character in |
| _SIDD_LEAST_SIGNIFICANT |
[ x86-64 ] Index only: return the least significant bit (Default) |
| _SIDD_MASKED_NEGATIVE_POLARITY |
[ x86-64 ] Negate results only before the end of the string |
| _SIDD_MASKED_POSITIVE_POLARITY |
[ x86-64 ] Do not negate results before the end of the string |
| _SIDD_MOST_SIGNIFICANT |
[ x86-64 ] Index only: return the most significant bit |
| _SIDD_NEGATIVE_POLARITY |
[ x86-64 ] Negate results |
| _SIDD_POSITIVE_POLARITY |
[ x86-64 ] Do not negate results (Default) |
| _SIDD_SBYTE_OPS |
[ x86-64 ] String contains signed 8-bit characters |
| _SIDD_SWORD_OPS |
[ x86-64 ] String contains signed 16-bit characters |
| _SIDD_UBYTE_OPS |
[ x86-64 ] String contains unsigned 8-bit characters (Default) |
| _SIDD_UNIT_MASK |
[ x86-64 ] Mask only: return the byte mask |
| _SIDD_UWORD_OPS |
[ x86-64 ] String contains unsigned 16-bit characters |
| _XCR_XFEATURE_ENABLED_MASK |
[ x86-64 ]
|
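The _SIDD_* flags select the operand width, comparison mode, polarity, and output format of the SSE4.2 string-comparison intrinsics (`_mm_cmpistri` and friends, listed under Functions below). A hedged sketch, assuming the const-generic immediate form of recent toolchains:

```rust
use std::arch::x86_64::*;

// Index of the first byte of `hay` that also occurs in `set`, or 16 if none.
// Implicit-length form: a zero byte in either operand terminates that string.
#[target_feature(enable = "sse4.2")]
unsafe fn first_byte_in_set(hay: &[u8; 16], set: &[u8; 16]) -> i32 {
    let a = _mm_loadu_si128(set.as_ptr() as *const __m128i);
    let b = _mm_loadu_si128(hay.as_ptr() as *const __m128i);
    _mm_cmpistri::<{ _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT }>(a, b)
}
```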
Functions
| _MM_GET_EXCEPTION_MASK⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_GET_EXCEPTION_STATE⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_GET_FLUSH_ZERO_MODE⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_GET_ROUNDING_MODE⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_SET_EXCEPTION_MASK⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_SET_EXCEPTION_STATE⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_SET_FLUSH_ZERO_MODE⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_SET_ROUNDING_MODE⚠ |
[ x86-64 and sse ] See `_mm_setcsr` |
| _MM_TRANSPOSE4_PS⚠ |
[ x86-64 and sse ] Transpose the 4x4 matrix formed by 4 rows of __m128 in place. |
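The _MM_GET_*/_MM_SET_* helpers read and write fields of the MXCSR control register. A minimal save-and-restore sketch that toggles flush-to-zero around a caller-supplied kernel (the `kernel` closure is a placeholder, not part of this API):

```rust
use std::arch::x86_64::*;

// Run `kernel` with denormal results flushed to zero, then restore MXCSR.
#[target_feature(enable = "sse")]
unsafe fn with_flush_to_zero(kernel: impl FnOnce()) {
    let previous = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    kernel();
    _MM_SET_FLUSH_ZERO_MODE(previous);
}
```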
| __cpuid⚠ |
[ x86-64 ] See `__cpuid_count` |
| __cpuid_count⚠ |
[ x86-64 ] Returns the result of the `cpuid` instruction for a given `leaf` (`EAX`) and `sub_leaf` (`ECX`) |
| __get_cpuid_max⚠ |
[ x86-64 ] Returns the highest-supported `leaf` (`EAX`) and sub-leaf (`ECX`) `cpuid` values |
| __rdtscp⚠ |
[ x86-64 ] Reads the current value of the processor’s time-stamp counter and
the `IA32_TSC_AUX MSR` |
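__cpuid and __cpuid_count fill a CpuidResult with the eax/ebx/ecx/edx outputs of the instruction. A hedged sketch querying leaf 1; the bit tested is CPUID.01H:ECX bit 25 (AES-NI) per the vendor manuals:

```rust
use std::arch::x86_64::{__cpuid, __get_cpuid_max};

fn has_aesni() -> bool {
    // SAFETY: the `cpuid` instruction is available on every x86-64 CPU.
    let (max_leaf, _) = unsafe { __get_cpuid_max(0) };
    if max_leaf < 1 {
        return false;
    }
    let leaf1 = unsafe { __cpuid(1) };
    leaf1.ecx & (1 << 25) != 0 // CPUID.01H:ECX.AESNI
}
```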
| _andn_u32⚠ |
[ x86-64 and ] bmi1Bitwise logical |
| _andn_u64⚠ |
[ x86-64 and ] bmi1Bitwise logical |
| _bextr2_u32⚠ |
[ x86-64 and ] bmi1Extracts bits of |
| _bextr2_u64⚠ |
[ x86-64 and ] bmi1Extracts bits of |
| _bextr_u32⚠ |
[ x86-64 and ] bmi1Extracts bits in range [ |
| _bextr_u64⚠ |
[ x86-64 and ] bmi1Extracts bits in range [ |
| _blcfill_u32⚠ |
[ x86-64 and ] tbmClears all bits below the least significant zero bit of |
| _blcfill_u64⚠ |
[ x86-64 and ] tbmClears all bits below the least significant zero bit of |
| _blci_u32⚠ |
[ x86-64 and ] tbmSets all bits of |
| _blci_u64⚠ |
[ x86-64 and ] tbmSets all bits of |
| _blcic_u32⚠ |
[ x86-64 and ] tbmSets the least significant zero bit of |
| _blcic_u64⚠ |
[ x86-64 and ] tbmSets the least significant zero bit of |
| _blcmsk_u32⚠ |
[ x86-64 and ] tbmSets the least significant zero bit of |
| _blcmsk_u64⚠ |
[ x86-64 and ] tbmSets the least significant zero bit of |
| _blcs_u32⚠ |
[ x86-64 and ] tbmSets the least significant zero bit of |
| _blcs_u64⚠ |
[ x86-64 and ] tbmSets the least significant zero bit of |
| _blsfill_u32⚠ |
[ x86-64 and ] tbmSets all bits of |
| _blsfill_u64⚠ |
[ x86-64 and ] tbmSets all bits of |
| _blsi_u32⚠ |
[ x86-64 and ] bmi1Extract lowest set isolated bit. |
| _blsi_u64⚠ |
[ x86-64 and ] bmi1Extract lowest set isolated bit. |
| _blsic_u32⚠ |
[ x86-64 and ] tbmClears least significant bit and sets all other bits. |
| _blsic_u64⚠ |
[ x86-64 and ] tbmClears least significant bit and sets all other bits. |
| _blsmsk_u32⚠ |
[ x86-64 and ] bmi1Get mask up to lowest set bit. |
| _blsmsk_u64⚠ |
[ x86-64 and ] bmi1Get mask up to lowest set bit. |
| _blsr_u32⚠ |
[ x86-64 and ] bmi1Resets the lowest set bit of |
| _blsr_u64⚠ |
[ x86-64 and ] bmi1Resets the lowest set bit of |
| _bswap⚠ |
[ x86-64 ] Return an integer with the reversed byte order of x |
| _bswap64⚠ |
[ x86-64 ] Return an integer with the reversed byte order of x |
| _bzhi_u32⚠ |
[ x86-64 and ] bmi2Zero higher bits of |
| _bzhi_u64⚠ |
[ x86-64 and ] bmi2Zero higher bits of |
| _fxrstor⚠ |
[ x86-64 and ] fxsrRestores the |
| _fxrstor64⚠ |
[ x86-64 and ] fxsrRestores the |
| _fxsave⚠ |
[ x86-64 and ] fxsrSaves the |
| _fxsave64⚠ |
[ x86-64 and ] fxsrSaves the |
| _lzcnt_u32⚠ |
[ x86-64 and ] lzcntCounts the leading most significant zero bits. |
| _lzcnt_u64⚠ |
[ x86-64 and ] lzcntCounts the leading most significant zero bits. |
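The scalar bit-manipulation intrinsics above operate on ordinary integers rather than vectors. A minimal sketch, guarded by runtime detection of the bmi1 and lzcnt features:

```rust
use std::arch::x86_64::{_blsi_u32, _lzcnt_u32};

fn demo_bits() {
    if is_x86_feature_detected!("bmi1") && is_x86_feature_detected!("lzcnt") {
        // SAFETY: both required features were detected at runtime.
        unsafe {
            let x: u32 = 0b0011_0100;
            assert_eq!(_blsi_u32(x), 0b0000_0100); // isolate the lowest set bit
            assert_eq!(_lzcnt_u32(x), 26);         // count leading zero bits
        }
    }
}
```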
| _mm256_abs_epi8⚠ |
[ x86-64 and ] avx2Computes the absolute values of packed 8-bit integers in |
| _mm256_abs_epi16⚠ |
[ x86-64 and ] avx2Computes the absolute values of packed 16-bit integers in |
| _mm256_abs_epi32⚠ |
[ x86-64 and ] avx2Computes the absolute values of packed 32-bit integers in |
| _mm256_add_epi8⚠ |
[ x86-64 and ] avx2Add packed 8-bit integers in |
| _mm256_add_epi16⚠ |
[ x86-64 and ] avx2Add packed 16-bit integers in |
| _mm256_add_epi32⚠ |
[ x86-64 and ] avx2Add packed 32-bit integers in |
| _mm256_add_epi64⚠ |
[ x86-64 and ] avx2Add packed 64-bit integers in |
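A typical pattern with the packed-add intrinsics is load, operate, store, using the unaligned load/store variants so no particular alignment is required. A hedged sketch performing eight 32-bit additions at once:

```rust
use std::arch::x86_64::*;

// Element-wise `a[i] + b[i]` (wrapping) for eight i32 values in one instruction.
#[target_feature(enable = "avx2")]
unsafe fn add_i32x8(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    let sum = _mm256_add_epi32(va, vb);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    out
}
```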
| _mm256_add_pd⚠ |
[ x86-64 and ] avxAdd packed double-precision (64-bit) floating-point elements
in |
| _mm256_add_ps⚠ |
[ x86-64 and ] avxAdd packed single-precision (32-bit) floating-point elements in |
| _mm256_adds_epi8⚠ |
[ x86-64 and ] avx2Add packed 8-bit integers in |
| _mm256_adds_epi16⚠ |
[ x86-64 and ] avx2Add packed 16-bit integers in |
| _mm256_adds_epu8⚠ |
[ x86-64 and ] avx2Add packed unsigned 8-bit integers in |
| _mm256_adds_epu16⚠ |
[ x86-64 and ] avx2Add packed unsigned 16-bit integers in |
| _mm256_addsub_pd⚠ |
[ x86-64 and ] avxAlternatively add and subtract packed double-precision (64-bit)
floating-point elements in |
| _mm256_addsub_ps⚠ |
[ x86-64 and ] avxAlternatively add and subtract packed single-precision (32-bit)
floating-point elements in |
| _mm256_alignr_epi8⚠ |
[ x86-64 and ] avx2Concatenate pairs of 16-byte blocks in |
| _mm256_and_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of a packed double-precision (64-bit)
floating-point elements
in |
| _mm256_and_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of packed single-precision (32-bit) floating-point
elements in |
| _mm256_and_si256⚠ |
[ x86-64 and ] avx2Compute the bitwise AND of 256 bits (representing integer data)
in |
| _mm256_andnot_pd⚠ |
[ x86-64 and ] avxCompute the bitwise NOT of packed double-precision (64-bit) floating-point
elements in |
| _mm256_andnot_ps⚠ |
[ x86-64 and ] avxCompute the bitwise NOT of packed single-precision (32-bit) floating-point
elements in |
| _mm256_andnot_si256⚠ |
[ x86-64 and ] avx2Compute the bitwise NOT of 256 bits (representing integer data)
in |
| _mm256_avg_epu8⚠ |
[ x86-64 and ] avx2Average packed unsigned 8-bit integers in |
| _mm256_avg_epu16⚠ |
[ x86-64 and ] avx2Average packed unsigned 16-bit integers in |
| _mm256_blend_epi16⚠ |
[ x86-64 and ] avx2Blend packed 16-bit integers from |
| _mm256_blend_epi32⚠ |
[ x86-64 and ] avx2Blend packed 32-bit integers from |
| _mm256_blend_pd⚠ |
[ x86-64 and ] avxBlend packed double-precision (64-bit) floating-point elements from
|
| _mm256_blend_ps⚠ |
[ x86-64 and ] avxBlend packed single-precision (32-bit) floating-point elements from
|
| _mm256_blendv_epi8⚠ |
[ x86-64 and ] avx2Blend packed 8-bit integers from |
| _mm256_blendv_pd⚠ |
[ x86-64 and ] avxBlend packed double-precision (64-bit) floating-point elements from
|
| _mm256_blendv_ps⚠ |
[ x86-64 and ] avxBlend packed single-precision (32-bit) floating-point elements from
|
| _mm256_broadcast_pd⚠ |
[ x86-64 and ] avxBroadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector. |
| _mm256_broadcast_ps⚠ |
[ x86-64 and ] avxBroadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector. |
| _mm256_broadcast_sd⚠ |
[ x86-64 and ] avxBroadcast a double-precision (64-bit) floating-point element from memory to all elements of the returned vector. |
| _mm256_broadcast_ss⚠ |
[ x86-64 and ] avxBroadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. |
| _mm256_broadcastb_epi8⚠ |
[ x86-64 and ] avx2Broadcast the low packed 8-bit integer from |
| _mm256_broadcastd_epi32⚠ |
[ x86-64 and ] avx2Broadcast the low packed 32-bit integer from |
| _mm256_broadcastq_epi64⚠ |
[ x86-64 and ] avx2Broadcast the low packed 64-bit integer from |
| _mm256_broadcastsd_pd⚠ |
[ x86-64 and ] avx2Broadcast the low double-precision (64-bit) floating-point element
from |
| _mm256_broadcastsi128_si256⚠ |
[ x86-64 and ] avx2Broadcast 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value. |
| _mm256_broadcastss_ps⚠ |
[ x86-64 and ] avx2Broadcast the low single-precision (32-bit) floating-point element
from |
| _mm256_broadcastw_epi16⚠ |
[ x86-64 and ] avx2Broadcast the low packed 16-bit integer from a to all elements of the 256-bit returned value |
| _mm256_bslli_epi128⚠ |
[ x86-64 and ] avx2Shift 128-bit lanes in |
| _mm256_bsrli_epi128⚠ |
[ x86-64 and ] avx2Shift 128-bit lanes in |
| _mm256_castpd128_pd256⚠ |
[ x86-64 and ] avxCasts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. |
| _mm256_castpd256_pd128⚠ |
[ x86-64 and ] avxCasts vector of type __m256d to type __m128d. |
| _mm256_castpd_ps⚠ |
[ x86-64 and ] avxCast vector of type __m256d to type __m256. |
| _mm256_castpd_si256⚠ |
[ x86-64 and ] avxCasts vector of type __m256d to type __m256i. |
| _mm256_castps128_ps256⚠ |
[ x86-64 and ] avxCasts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. |
| _mm256_castps256_ps128⚠ |
[ x86-64 and ] avxCasts vector of type __m256 to type __m128. |
| _mm256_castps_pd⚠ |
[ x86-64 and ] avxCast vector of type __m256 to type __m256d. |
| _mm256_castps_si256⚠ |
[ x86-64 and ] avxCasts vector of type __m256 to type __m256i. |
| _mm256_castsi128_si256⚠ |
[ x86-64 and ] avxCasts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. |
| _mm256_castsi256_pd⚠ |
[ x86-64 and ] avxCasts vector of type __m256i to type __m256d. |
| _mm256_castsi256_ps⚠ |
[ x86-64 and ] avxCasts vector of type __m256i to type __m256. |
| _mm256_castsi256_si128⚠ |
[ x86-64 and ] avxCasts vector of type __m256i to type __m128i. |
| _mm256_ceil_pd⚠ |
[ x86-64 and ] avxRound packed double-precision (64-bit) floating point elements in |
| _mm256_ceil_ps⚠ |
[ x86-64 and ] avxRound packed single-precision (32-bit) floating point elements in |
| _mm256_cmp_pd⚠ |
[ x86-64 and ] avxCompare packed double-precision (64-bit) floating-point
elements in |
| _mm256_cmp_ps⚠ |
[ x86-64 and ] avxCompare packed single-precision (32-bit) floating-point
elements in |
| _mm256_cmpeq_epi8⚠ |
[ x86-64 and ] avx2Compare packed 8-bit integers in |
| _mm256_cmpeq_epi16⚠ |
[ x86-64 and ] avx2Compare packed 16-bit integers in |
| _mm256_cmpeq_epi32⚠ |
[ x86-64 and ] avx2Compare packed 32-bit integers in |
| _mm256_cmpeq_epi64⚠ |
[ x86-64 and ] avx2Compare packed 64-bit integers in |
| _mm256_cmpgt_epi8⚠ |
[ x86-64 and ] avx2Compare packed 8-bit integers in |
| _mm256_cmpgt_epi16⚠ |
[ x86-64 and ] avx2Compare packed 16-bit integers in |
| _mm256_cmpgt_epi32⚠ |
[ x86-64 and ] avx2Compare packed 32-bit integers in |
| _mm256_cmpgt_epi64⚠ |
[ x86-64 and ] avx2Compare packed 64-bit integers in |
| _mm256_cvtepi16_epi32⚠ |
[ x86-64 and ] avx2Sign-extend 16-bit integers to 32-bit integers. |
| _mm256_cvtepi16_epi64⚠ |
[ x86-64 and ] avx2Sign-extend 16-bit integers to 64-bit integers. |
| _mm256_cvtepi32_epi64⚠ |
[ x86-64 and ] avx2Sign-extend 32-bit integers to 64-bit integers. |
| _mm256_cvtepi32_pd⚠ |
[ x86-64 and ] avxConvert packed 32-bit integers in |
| _mm256_cvtepi32_ps⚠ |
[ x86-64 and ] avxConvert packed 32-bit integers in |
| _mm256_cvtepi8_epi16⚠ |
[ x86-64 and ] avx2Sign-extend 8-bit integers to 16-bit integers. |
| _mm256_cvtepi8_epi32⚠ |
[ x86-64 and ] avx2Sign-extend 8-bit integers to 32-bit integers. |
| _mm256_cvtepi8_epi64⚠ |
[ x86-64 and ] avx2Sign-extend 8-bit integers to 64-bit integers. |
| _mm256_cvtepu16_epi32⚠ |
[ x86-64 and ] avx2Zero extend packed unsigned 16-bit integers in |
| _mm256_cvtepu16_epi64⚠ |
[ x86-64 and ] avx2Zero-extend the lower four unsigned 16-bit integers in |
| _mm256_cvtepu32_epi64⚠ |
[ x86-64 and ] avx2Zero-extend unsigned 32-bit integers in |
| _mm256_cvtepu8_epi16⚠ |
[ x86-64 and ] avx2Zero-extend unsigned 8-bit integers in |
| _mm256_cvtepu8_epi32⚠ |
[ x86-64 and ] avx2Zero-extend the lower eight unsigned 8-bit integers in |
| _mm256_cvtepu8_epi64⚠ |
[ x86-64 and ] avx2Zero-extend the lower four unsigned 8-bit integers in |
| _mm256_cvtpd_epi32⚠ |
[ x86-64 and ] avxConvert packed double-precision (64-bit) floating-point elements in |
| _mm256_cvtpd_ps⚠ |
[ x86-64 and ] avxConvert packed double-precision (64-bit) floating-point elements in |
| _mm256_cvtps_epi32⚠ |
[ x86-64 and ] avxConvert packed single-precision (32-bit) floating-point elements in |
| _mm256_cvtps_pd⚠ |
[ x86-64 and ] avxConvert packed single-precision (32-bit) floating-point elements in |
| _mm256_cvtsd_f64⚠ |
[ x86-64 and ] avx2Returns the first element of the input vector of |
| _mm256_cvtsi256_si32⚠ |
[ x86-64 and ] avx2Returns the first element of the input vector of |
| _mm256_cvtss_f32⚠ |
[ x86-64 and ] avxReturns the first element of the input vector of |
| _mm256_cvttpd_epi32⚠ |
[ x86-64 and ] avxConvert packed double-precision (64-bit) floating-point elements in |
| _mm256_cvttps_epi32⚠ |
[ x86-64 and ] avxConvert packed single-precision (32-bit) floating-point elements in |
| _mm256_div_pd⚠ |
[ x86-64 and ] avxCompute the division of each of the 4 packed 64-bit floating-point elements
in |
| _mm256_div_ps⚠ |
[ x86-64 and ] avxCompute the division of each of the 8 packed 32-bit floating-point elements
in |
| _mm256_dp_ps⚠ |
[ x86-64 and ] avxConditionally multiply the packed single-precision (32-bit) floating-point
elements in |
| _mm256_extract_epi8⚠ |
[ x86-64 and ] avx2Extract an 8-bit integer from |
| _mm256_extract_epi16⚠ |
[ x86-64 and ] avx2Extract a 16-bit integer from |
| _mm256_extract_epi32⚠ |
[ x86-64 and ] avx2Extract a 32-bit integer from |
| _mm256_extract_epi64⚠ |
[ x86-64 and ] avx2Extract a 64-bit integer from |
| _mm256_extractf128_pd⚠ |
[ x86-64 and ] avxExtract 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
| _mm256_extractf128_ps⚠ |
[ x86-64 and ] avxExtract 128 bits (composed of 4 packed single-precision (32-bit)
floating-point elements) from |
| _mm256_extractf128_si256⚠ |
[ x86-64 and ] avxExtract 128 bits (composed of integer data) from |
| _mm256_extracti128_si256⚠ |
[ x86-64 and ] avx2Extract 128 bits (of integer data) from |
| _mm256_floor_pd⚠ |
[ x86-64 and ] avxRound packed double-precision (64-bit) floating point elements in |
| _mm256_floor_ps⚠ |
[ x86-64 and ] avxRound packed single-precision (32-bit) floating point elements in |
| _mm256_fmadd_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm256_fmadd_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm256_fmaddsub_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm256_fmaddsub_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm256_fmsub_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm256_fmsub_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm256_fmsubadd_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm256_fmsubadd_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm256_fnmadd_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm256_fnmadd_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm256_fnmsub_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm256_fnmsub_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
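The fma intrinsics compute a multiply and an add (or subtract) with a single rounding step, which is both faster and slightly more accurate than separate _mm256_mul_ps/_mm256_add_ps calls. A minimal sketch:

```rust
use std::arch::x86_64::*;

// d[i] = a[i] * b[i] + c[i] with one rounding, on eight f32 lanes.
#[target_feature(enable = "fma")]
unsafe fn fused_mul_add(a: __m256, b: __m256, c: __m256) -> __m256 {
    _mm256_fmadd_ps(a, b, c)
}
```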
| _mm256_hadd_epi16⚠ |
[ x86-64 and ] avx2Horizontally add adjacent pairs of 16-bit integers in |
| _mm256_hadd_epi32⚠ |
[ x86-64 and ] avx2Horizontally add adjacent pairs of 32-bit integers in |
| _mm256_hadd_pd⚠ |
[ x86-64 and ] avxHorizontal addition of adjacent pairs in the two packed vectors
of 4 64-bit floating points |
| _mm256_hadd_ps⚠ |
[ x86-64 and ] avxHorizontal addition of adjacent pairs in the two packed vectors
of 8 32-bit floating points |
| _mm256_hadds_epi16⚠ |
[ x86-64 and ] avx2Horizontally add adjacent pairs of 16-bit integers in |
| _mm256_hsub_epi16⚠ |
[ x86-64 and ] avx2Horizontally subtract adjacent pairs of 16-bit integers in |
| _mm256_hsub_epi32⚠ |
[ x86-64 and ] avx2Horizontally subtract adjacent pairs of 32-bit integers in |
| _mm256_hsub_pd⚠ |
[ x86-64 and ] avxHorizontal subtraction of adjacent pairs in the two packed vectors
of 4 64-bit floating points |
| _mm256_hsub_ps⚠ |
[ x86-64 and ] avxHorizontal subtraction of adjacent pairs in the two packed vectors
of 8 32-bit floating points |
| _mm256_hsubs_epi16⚠ |
[ x86-64 and ] avx2Horizontally subtract adjacent pairs of 16-bit integers in |
| _mm256_i32gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i32gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i32gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i32gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i64gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i64gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i64gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_i64gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
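The gather intrinsics load elements from non-contiguous addresses computed as base pointer plus index times scale. A hedged sketch (const-generic scale on recent toolchains; the caller must guarantee every index is in bounds):

```rust
use std::arch::x86_64::*;

// out[i] = table[idx[i]] for eight indices at once. SCALE = 4 bytes per i32.
#[target_feature(enable = "avx2")]
unsafe fn gather_i32x8(table: &[i32], idx: &[i32; 8]) -> [i32; 8] {
    debug_assert!(idx.iter().all(|&i| (i as usize) < table.len()));
    let offsets = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
    let v = _mm256_i32gather_epi32::<4>(table.as_ptr(), offsets);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    out
}
```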
| _mm256_insert_epi8⚠ |
[ x86-64 and ] avxCopy |
| _mm256_insert_epi16⚠ |
[ x86-64 and ] avxCopy |
| _mm256_insert_epi32⚠ |
[ x86-64 and ] avxCopy |
| _mm256_insert_epi64⚠ |
[ x86-64 and ] avxCopy |
| _mm256_insertf128_pd⚠ |
[ x86-64 and ] avxCopy |
| _mm256_insertf128_ps⚠ |
[ x86-64 and ] avxCopy |
| _mm256_insertf128_si256⚠ |
[ x86-64 and ] avxCopy |
| _mm256_inserti128_si256⚠ |
[ x86-64 and ] avx2Copy |
| _mm256_lddqu_si256⚠ |
[ x86-64 and ] avxLoad 256-bits of integer data from unaligned memory into result.
This intrinsic may perform better than |
| _mm256_load_pd⚠ |
[ x86-64 and ] avxLoad 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from memory into result.
|
| _mm256_load_ps⚠ |
[ x86-64 and ] avxLoad 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from memory into result.
|
| _mm256_load_si256⚠ |
[ x86-64 and ] avxLoad 256-bits of integer data from memory into result.
|
| _mm256_loadu2_m128⚠ |
[ x86-64 and ] avx,sseLoad two 128-bit values (composed of 4 packed single-precision (32-bit)
floating-point elements) from memory, and combine them into a 256-bit
value.
|
| _mm256_loadu2_m128d⚠ |
[ x86-64 and ] avx,sse2Load two 128-bit values (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory, and combine them into a 256-bit
value.
|
| _mm256_loadu2_m128i⚠ |
[ x86-64 and ] avx,sse2Load two 128-bit values (composed of integer data) from memory, and combine
them into a 256-bit value.
|
| _mm256_loadu_pd⚠ |
[ x86-64 and ] avxLoad 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from memory into result.
|
| _mm256_loadu_ps⚠ |
[ x86-64 and ] avxLoad 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from memory into result.
|
| _mm256_loadu_si256⚠ |
[ x86-64 and ] avxLoad 256-bits of integer data from memory into result.
|
| _mm256_madd_epi16⚠ |
[ x86-64 and ] avx2Multiply packed signed 16-bit integers in |
| _mm256_maddubs_epi16⚠ |
[ x86-64 and ] avx2Vertically multiply each unsigned 8-bit integer from |
| _mm256_mask_i32gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i32gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i32gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i32gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i64gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i64gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i64gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_mask_i64gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm256_maskload_epi32⚠ |
[ x86-64 and ] avx2Load packed 32-bit integers from memory pointed by |
| _mm256_maskload_epi64⚠ |
[ x86-64 and ] avx2Load packed 64-bit integers from memory pointed by |
| _mm256_maskload_pd⚠ |
[ x86-64 and ] avxLoad packed double-precision (64-bit) floating-point elements from memory
into result using |
| _mm256_maskload_ps⚠ |
[ x86-64 and ] avxLoad packed single-precision (32-bit) floating-point elements from memory
into result using |
| _mm256_maskstore_epi32⚠ |
[ x86-64 and ] avx2Store packed 32-bit integers from |
| _mm256_maskstore_epi64⚠ |
[ x86-64 and ] avx2Store packed 64-bit integers from |
| _mm256_maskstore_pd⚠ |
[ x86-64 and ] avxStore packed double-precision (64-bit) floating-point elements from |
| _mm256_maskstore_ps⚠ |
[ x86-64 and ] avxStore packed single-precision (32-bit) floating-point elements from |
| _mm256_max_epi8⚠ |
[ x86-64 and ] avx2Compare packed 8-bit integers in |
| _mm256_max_epi16⚠ |
[ x86-64 and ] avx2Compare packed 16-bit integers in |
| _mm256_max_epi32⚠ |
[ x86-64 and ] avx2Compare packed 32-bit integers in |
| _mm256_max_epu8⚠ |
[ x86-64 and ] avx2Compare packed unsigned 8-bit integers in |
| _mm256_max_epu16⚠ |
[ x86-64 and ] avx2Compare packed unsigned 16-bit integers in |
| _mm256_max_epu32⚠ |
[ x86-64 and ] avx2Compare packed unsigned 32-bit integers in |
| _mm256_max_pd⚠ |
[ x86-64 and ] avxCompare packed double-precision (64-bit) floating-point elements
in |
| _mm256_max_ps⚠ |
[ x86-64 and ] avxCompare packed single-precision (32-bit) floating-point elements in |
| _mm256_min_epi8⚠ |
[ x86-64 and ] avx2Compare packed 8-bit integers in |
| _mm256_min_epi16⚠ |
[ x86-64 and ] avx2Compare packed 16-bit integers in |
| _mm256_min_epi32⚠ |
[ x86-64 and ] avx2Compare packed 32-bit integers in |
| _mm256_min_epu8⚠ |
[ x86-64 and ] avx2Compare packed unsigned 8-bit integers in |
| _mm256_min_epu16⚠ |
[ x86-64 and ] avx2Compare packed unsigned 16-bit integers in |
| _mm256_min_epu32⚠ |
[ x86-64 and ] avx2Compare packed unsigned 32-bit integers in |
| _mm256_min_pd⚠ |
[ x86-64 and ] avxCompare packed double-precision (64-bit) floating-point elements
in |
| _mm256_min_ps⚠ |
[ x86-64 and ] avxCompare packed single-precision (32-bit) floating-point elements in |
| _mm256_movedup_pd⚠ |
[ x86-64 and ] avxDuplicate even-indexed double-precision (64-bit) floating-point elements from "a", and return the results. |
| _mm256_movehdup_ps⚠ |
[ x86-64 and ] avxDuplicate odd-indexed single-precision (32-bit) floating-point elements
from |
| _mm256_moveldup_ps⚠ |
[ x86-64 and ] avxDuplicate even-indexed single-precision (32-bit) floating-point elements
from |
| _mm256_movemask_epi8⚠ |
[ x86-64 and ] avx2Create mask from the most significant bit of each 8-bit element in |
| _mm256_movemask_pd⚠ |
[ x86-64 and ] avxSet each bit of the returned mask based on the most significant bit of the
corresponding packed double-precision (64-bit) floating-point element in
|
| _mm256_movemask_ps⚠ |
[ x86-64 and ] avxSet each bit of the returned mask based on the most significant bit of the
corresponding packed single-precision (32-bit) floating-point element in
|
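The movemask intrinsics collapse the sign bit of each lane into an ordinary integer bitmask, which is convenient for branching or counting on comparison results. A minimal sketch:

```rust
use std::arch::x86_64::*;

// Number of lanes of `v` whose sign bit is set (e.g. negative floats).
#[target_feature(enable = "avx")]
unsafe fn negative_lane_count(v: __m256) -> u32 {
    (_mm256_movemask_ps(v) as u32).count_ones()
}
```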
| _mm256_mpsadbw_epu8⚠ |
[ x86-64 and ] avx2Compute the sum of absolute differences (SADs) of quadruplets of unsigned
8-bit integers in |
| _mm256_mul_epi32⚠ |
[ x86-64 and ] avx2Multiply the low 32-bit integers from each packed 64-bit element in
|
| _mm256_mul_epu32⚠ |
[ x86-64 and ] avx2Multiply the low unsigned 32-bit integers from each packed 64-bit
element in |
| _mm256_mul_pd⚠ |
[ x86-64 and avx ] Multiply packed double-precision (64-bit) floating-point elements
in |
| _mm256_mul_ps⚠ |
[ x86-64 and avx ] Multiply packed single-precision (32-bit) floating-point elements in |
| _mm256_mulhi_epi16⚠ |
[ x86-64 and ] avx2Multiply the packed 16-bit integers in |
| _mm256_mulhi_epu16⚠ |
[ x86-64 and ] avx2Multiply the packed unsigned 16-bit integers in |
| _mm256_mulhrs_epi16⚠ |
[ x86-64 and ] avx2Multiply packed 16-bit integers in |
| _mm256_mullo_epi16⚠ |
[ x86-64 and ] avx2Multiply the packed 16-bit integers in |
| _mm256_mullo_epi32⚠ |
[ x86-64 and ] avx2Multiply the packed 32-bit integers in |
| _mm256_or_pd⚠ |
[ x86-64 and ] avxCompute the bitwise OR packed double-precision (64-bit) floating-point
elements in |
| _mm256_or_ps⚠ |
[ x86-64 and ] avxCompute the bitwise OR packed single-precision (32-bit) floating-point
elements in |
| _mm256_or_si256⚠ |
[ x86-64 and ] avx2Compute the bitwise OR of 256 bits (representing integer data) in |
| _mm256_packs_epi16⚠ |
[ x86-64 and ] avx2Convert packed 16-bit integers from |
| _mm256_packs_epi32⚠ |
[ x86-64 and ] avx2Convert packed 32-bit integers from |
| _mm256_packus_epi16⚠ |
[ x86-64 and ] avx2Convert packed 16-bit integers from |
| _mm256_packus_epi32⚠ |
[ x86-64 and ] avx2Convert packed 32-bit integers from |
| _mm256_permute2f128_pd⚠ |
[ x86-64 and ] avxShuffle 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) selected by |
| _mm256_permute2f128_ps⚠ |
[ x86-64 and ] avxShuffle 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) selected by |
| _mm256_permute2f128_si256⚠ |
[ x86-64 and avx ] Shuffle 256 bits (composed of integer data) selected by |
| _mm256_permute2x128_si256⚠ |
[ x86-64 and ] avx2Shuffle 128-bits of integer data selected by |
| _mm256_permute4x64_epi64⚠ |
[ x86-64 and ] avx2Permutes 64-bit integers from |
| _mm256_permute4x64_pd⚠ |
[ x86-64 and ] avx2Shuffle 64-bit floating-point elements in |
| _mm256_permute_pd⚠ |
[ x86-64 and ] avxShuffle double-precision (64-bit) floating-point elements in |
| _mm256_permute_ps⚠ |
[ x86-64 and ] avxShuffle single-precision (32-bit) floating-point elements in |
| _mm256_permutevar8x32_epi32⚠ |
[ x86-64 and ] avx2Permutes packed 32-bit integers from |
| _mm256_permutevar8x32_ps⚠ |
[ x86-64 and avx2 ] Shuffle eight 32-bit floating-point elements in |
| _mm256_permutevar_pd⚠ |
[ x86-64 and ] avxShuffle double-precision (64-bit) floating-point elements in |
| _mm256_permutevar_ps⚠ |
[ x86-64 and ] avxShuffle single-precision (32-bit) floating-point elements in |
| _mm256_rcp_ps⚠ |
[ x86-64 and ] avxCompute the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in |
| _mm256_round_pd⚠ |
[ x86-64 and ] avxRound packed double-precision (64-bit) floating point elements in |
| _mm256_round_ps⚠ |
[ x86-64 and ] avxRound packed single-precision (32-bit) floating point elements in |
| _mm256_rsqrt_ps⚠ |
[ x86-64 and ] avxCompute the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in |
| _mm256_sad_epu8⚠ |
[ x86-64 and ] avx2Compute the absolute differences of packed unsigned 8-bit integers in |
| _mm256_set1_epi8⚠ |
[ x86-64 and ] avxBroadcast 8-bit integer |
| _mm256_set1_epi16⚠ |
[ x86-64 and ] avxBroadcast 16-bit integer |
| _mm256_set1_epi32⚠ |
[ x86-64 and ] avxBroadcast 32-bit integer |
| _mm256_set1_epi64x⚠ |
[ x86-64 and ] avxBroadcast 64-bit integer |
| _mm256_set1_pd⚠ |
[ x86-64 and ] avxBroadcast double-precision (64-bit) floating-point value |
| _mm256_set1_ps⚠ |
[ x86-64 and ] avxBroadcast single-precision (32-bit) floating-point value |
| _mm256_set_epi8⚠ |
[ x86-64 and ] avxSet packed 8-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_set_epi16⚠ |
[ x86-64 and ] avxSet packed 16-bit integers in returned vector with the supplied values. |
| _mm256_set_epi32⚠ |
[ x86-64 and ] avxSet packed 32-bit integers in returned vector with the supplied values. |
| _mm256_set_epi64x⚠ |
[ x86-64 and ] avxSet packed 64-bit integers in returned vector with the supplied values. |
| _mm256_set_m128⚠ |
[ x86-64 and ] avxSet packed __m256 returned vector with the supplied values. |
| _mm256_set_m128d⚠ |
[ x86-64 and ] avxSet packed __m256d returned vector with the supplied values. |
| _mm256_set_m128i⚠ |
[ x86-64 and ] avxSet packed __m256i returned vector with the supplied values. |
| _mm256_set_pd⚠ |
[ x86-64 and ] avxSet packed double-precision (64-bit) floating-point elements in returned vector with the supplied values. |
| _mm256_set_ps⚠ |
[ x86-64 and ] avxSet packed single-precision (32-bit) floating-point elements in returned vector with the supplied values. |
| _mm256_setr_epi8⚠ |
[ x86-64 and ] avxSet packed 8-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_epi16⚠ |
[ x86-64 and ] avxSet packed 16-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_epi32⚠ |
[ x86-64 and ] avxSet packed 32-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_epi64x⚠ |
[ x86-64 and ] avxSet packed 64-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_m128⚠ |
[ x86-64 and ] avxSet packed __m256 returned vector with the supplied values. |
| _mm256_setr_m128d⚠ |
[ x86-64 and ] avxSet packed __m256d returned vector with the supplied values. |
| _mm256_setr_m128i⚠ |
[ x86-64 and ] avxSet packed __m256i returned vector with the supplied values. |
| _mm256_setr_pd⚠ |
[ x86-64 and ] avxSet packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order. |
| _mm256_setr_ps⚠ |
[ x86-64 and ] avxSet packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order. |
| _mm256_setzero_pd⚠ |
[ x86-64 and ] avxReturn vector of type __m256d with all elements set to zero. |
| _mm256_setzero_ps⚠ |
[ x86-64 and ] avxReturn vector of type __m256 with all elements set to zero. |
| _mm256_setzero_si256⚠ |
[ x86-64 and ] avxReturn vector of type __m256i with all elements set to zero. |
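The _mm256_set_* constructors take their arguments from the highest lane down, while the _mm256_setr_* variants take them in memory order (lowest lane first); _mm256_setzero_* build all-zero vectors. A minimal sketch of the ordering difference:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx,avx2")]
unsafe fn set_order_demo() {
    let r = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7); // lane 0 = 0
    let s = _mm256_set_epi32(0, 1, 2, 3, 4, 5, 6, 7);  // lane 0 = 7
    assert_eq!(_mm256_cvtsi256_si32(r), 0); // read the lowest 32-bit lane
    assert_eq!(_mm256_cvtsi256_si32(s), 7);
}
```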
| _mm256_shuffle_epi8⚠ |
[ x86-64 and ] avx2Shuffle bytes from |
| _mm256_shuffle_epi32⚠ |
[ x86-64 and ] avx2Shuffle 32-bit integers in 128-bit lanes of |
| _mm256_shuffle_pd⚠ |
[ x86-64 and ] avxShuffle double-precision (64-bit) floating-point elements within 128-bit
lanes using the control in |
| _mm256_shuffle_ps⚠ |
[ x86-64 and ] avxShuffle single-precision (32-bit) floating-point elements in |
| _mm256_shufflehi_epi16⚠ |
[ x86-64 and ] avx2Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of |
| _mm256_shufflelo_epi16⚠ |
[ x86-64 and ] avx2Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of |
| _mm256_sign_epi8⚠ |
[ x86-64 and ] avx2Negate packed 8-bit integers in |
| _mm256_sign_epi16⚠ |
[ x86-64 and ] avx2Negate packed 16-bit integers in |
| _mm256_sign_epi32⚠ |
[ x86-64 and ] avx2Negate packed 32-bit integers in |
| _mm256_sll_epi16⚠ |
[ x86-64 and ] avx2Shift packed 16-bit integers in |
| _mm256_sll_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_sll_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm256_slli_epi16⚠ |
[ x86-64 and ] avx2Shift packed 16-bit integers in |
| _mm256_slli_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_slli_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm256_slli_si256⚠ |
[ x86-64 and ] avx2Shift 128-bit lanes in |
| _mm256_sllv_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_sllv_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm256_sqrt_pd⚠ |
[ x86-64 and ] avxReturn the square root of packed double-precision (64-bit) floating point
elements in |
| _mm256_sqrt_ps⚠ |
[ x86-64 and ] avxReturn the square root of packed single-precision (32-bit) floating point
elements in |
| _mm256_sra_epi16⚠ |
[ x86-64 and ] avx2Shift packed 16-bit integers in |
| _mm256_sra_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_srai_epi16⚠ |
[ x86-64 and ] avx2Shift packed 16-bit integers in |
| _mm256_srai_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_srav_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_srl_epi16⚠ |
[ x86-64 and ] avx2Shift packed 16-bit integers in |
| _mm256_srl_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_srl_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm256_srli_epi16⚠ |
[ x86-64 and ] avx2Shift packed 16-bit integers in |
| _mm256_srli_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_srli_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm256_srli_si256⚠ |
[ x86-64 and ] avx2Shift 128-bit lanes in |
| _mm256_srlv_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm256_srlv_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm256_store_pd⚠ |
[ x86-64 and ] avxStore 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from |
| _mm256_store_ps⚠ |
[ x86-64 and ] avxStore 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from |
| _mm256_store_si256⚠ |
[ x86-64 and ] avxStore 256-bits of integer data from |
| _mm256_storeu2_m128⚠ |
[ x86-64 and ] avx,sseStore the high and low 128-bit halves (each composed of 4 packed
single-precision (32-bit) floating-point elements) from |
| _mm256_storeu2_m128d⚠ |
[ x86-64 and ] avx,sse2Store the high and low 128-bit halves (each composed of 2 packed
double-precision (64-bit) floating-point elements) from |
| _mm256_storeu2_m128i⚠ |
[ x86-64 and ] avx,sse2Store the high and low 128-bit halves (each composed of integer data) from
|
| _mm256_storeu_pd⚠ |
[ x86-64 and ] avxStore 256-bits (composed of 4 packed double-precision (64-bit)
floating-point elements) from |
| _mm256_storeu_ps⚠ |
[ x86-64 and ] avxStore 256-bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from |
| _mm256_storeu_si256⚠ |
[ x86-64 and ] avxStore 256-bits of integer data from |
| _mm256_stream_pd⚠ |
[ x86-64 and ] avxMoves double-precision values from a 256-bit vector of |
| _mm256_stream_ps⚠ |
[ x86-64 and ] avxMoves single-precision floating point values from a 256-bit vector
of |
| _mm256_stream_si256⚠ |
[ x86-64 and ] avxMoves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon) |
| _mm256_sub_epi8⚠ |
[ x86-64 and ] avx2Subtract packed 8-bit integers in |
| _mm256_sub_epi16⚠ |
[ x86-64 and ] avx2Subtract packed 16-bit integers in |
| _mm256_sub_epi32⚠ |
[ x86-64 and ] avx2Subtract packed 32-bit integers in |
| _mm256_sub_epi64⚠ |
[ x86-64 and ] avx2Subtract packed 64-bit integers in |
| _mm256_sub_pd⚠ |
[ x86-64 and ] avxSubtract packed double-precision (64-bit) floating-point elements in |
| _mm256_sub_ps⚠ |
[ x86-64 and ] avxSubtract packed single-precision (32-bit) floating-point elements in |
| _mm256_subs_epi8⚠ |
[ x86-64 and ] avx2Subtract packed 8-bit integers in |
| _mm256_subs_epi16⚠ |
[ x86-64 and ] avx2Subtract packed 16-bit integers in |
| _mm256_subs_epu8⚠ |
[ x86-64 and ] avx2Subtract packed unsigned 8-bit integers in |
| _mm256_subs_epu16⚠ |
[ x86-64 and ] avx2Subtract packed unsigned 16-bit integers in |
| _mm256_testc_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm256_testc_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm256_testc_si256⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing integer data) in |
| _mm256_testnzc_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm256_testnzc_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm256_testnzc_si256⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing integer data) in |
| _mm256_testz_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm256_testz_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm256_testz_si256⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 256 bits (representing integer data) in |
| _mm256_undefined_pd⚠ |
[ x86-64 and ] avxReturn vector of type |
| _mm256_undefined_ps⚠ |
[ x86-64 and ] avxReturn vector of type |
| _mm256_undefined_si256⚠ |
[ x86-64 and ] avxReturn vector of type __m256i with undefined elements. |
| _mm256_unpackhi_epi8⚠ |
[ x86-64 and ] avx2Unpack and interleave 8-bit integers from the high half of each
128-bit lane in |
| _mm256_unpackhi_epi16⚠ |
[ x86-64 and ] avx2Unpack and interleave 16-bit integers from the high half of each
128-bit lane of |
| _mm256_unpackhi_epi32⚠ |
[ x86-64 and ] avx2Unpack and interleave 32-bit integers from the high half of each
128-bit lane of |
| _mm256_unpackhi_epi64⚠ |
[ x86-64 and ] avx2Unpack and interleave 64-bit integers from the high half of each
128-bit lane of |
| _mm256_unpackhi_pd⚠ |
[ x86-64 and ] avxUnpack and interleave double-precision (64-bit) floating-point elements
from the high half of each 128-bit lane in |
| _mm256_unpackhi_ps⚠ |
[ x86-64 and ] avxUnpack and interleave single-precision (32-bit) floating-point elements
from the high half of each 128-bit lane in |
| _mm256_unpacklo_epi8⚠ |
[ x86-64 and ] avx2Unpack and interleave 8-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_epi16⚠ |
[ x86-64 and ] avx2Unpack and interleave 16-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_epi32⚠ |
[ x86-64 and ] avx2Unpack and interleave 32-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_epi64⚠ |
[ x86-64 and ] avx2Unpack and interleave 64-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_pd⚠ |
[ x86-64 and ] avxUnpack and interleave double-precision (64-bit) floating-point elements
from the low half of each 128-bit lane in |
| _mm256_unpacklo_ps⚠ |
[ x86-64 and ] avxUnpack and interleave single-precision (32-bit) floating-point elements
from the low half of each 128-bit lane in |
| _mm256_xor_pd⚠ |
[ x86-64 and ] avxCompute the bitwise XOR of packed double-precision (64-bit) floating-point
elements in |
| _mm256_xor_ps⚠ |
[ x86-64 and ] avxCompute the bitwise XOR of packed single-precision (32-bit) floating-point
elements in |
| _mm256_xor_si256⚠ |
[ x86-64 and ] avx2Compute the bitwise XOR of 256 bits (representing integer data)
in |
| _mm256_zeroall⚠ |
[ x86-64 and ] avxZero the contents of all XMM or YMM registers. |
| _mm256_zeroupper⚠ |
[ x86-64 and ] avxZero the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified. |
| _mm256_zextpd128_pd256⚠ |
[ x86-64 and ] avx,sse2Constructs a 256-bit floating-point vector of |
| _mm256_zextps128_ps256⚠ |
[ x86-64 and ] avx,sseConstructs a 256-bit floating-point vector of |
| _mm256_zextsi128_si256⚠ |
[ x86-64 and ] avx,sse2Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero. |
| _mm_abs_epi8⚠ |
[ x86-64 and ] ssse3Compute the absolute value of packed 8-bit signed integers in |
| _mm_abs_epi16⚠ |
[ x86-64 and ] ssse3Compute the absolute value of each of the packed 16-bit signed integers in
|
| _mm_abs_epi32⚠ |
[ x86-64 and ] ssse3Compute the absolute value of each of the packed 32-bit signed integers in
|
| _mm_add_epi8⚠ |
[ x86-64 and ] sse2Add packed 8-bit integers in |
| _mm_add_epi16⚠ |
[ x86-64 and ] sse2Add packed 16-bit integers in |
| _mm_add_epi32⚠ |
[ x86-64 and ] sse2Add packed 32-bit integers in |
| _mm_add_epi64⚠ |
[ x86-64 and ] sse2Add packed 64-bit integers in |
| _mm_add_pd⚠ |
[ x86-64 and ] sse2Add packed double-precision (64-bit) floating-point elements in |
| _mm_add_ps⚠ |
[ x86-64 and ] sseAdds __m128 vectors. |
| _mm_add_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_add_ss⚠ |
[ x86-64 and ] sseAdds the first component of |
| _mm_adds_epi8⚠ |
[ x86-64 and ] sse2Add packed 8-bit integers in |
| _mm_adds_epi16⚠ |
[ x86-64 and ] sse2Add packed 16-bit integers in |
| _mm_adds_epu8⚠ |
[ x86-64 and ] sse2Add packed unsigned 8-bit integers in |
| _mm_adds_epu16⚠ |
[ x86-64 and ] sse2Add packed unsigned 16-bit integers in |
| _mm_addsub_pd⚠ |
[ x86-64 and ] sse3Alternatively add and subtract packed double-precision (64-bit)
floating-point elements in |
| _mm_addsub_ps⚠ |
[ x86-64 and ] sse3Alternatively add and subtract packed single-precision (32-bit)
floating-point elements in |
| _mm_aesdec_si128⚠ |
[ x86-64 and ] aesPerform one round of an AES decryption flow on data (state) in |
| _mm_aesdeclast_si128⚠ |
[ x86-64 and ] aesPerform the last round of an AES decryption flow on data (state) in |
| _mm_aesenc_si128⚠ |
[ x86-64 and ] aesPerform one round of an AES encryption flow on data (state) in |
| _mm_aesenclast_si128⚠ |
[ x86-64 and ] aesPerform the last round of an AES encryption flow on data (state) in |
| _mm_aesimc_si128⚠ |
[ x86-64 and ] aesPerform the |
| _mm_aeskeygenassist_si128⚠ |
[ x86-64 and ] aesAssist in expanding the AES cipher key. |
| _mm_alignr_epi8⚠ |
[ x86-64 and ] ssse3Concatenate 16-byte blocks in |
| _mm_and_pd⚠ |
[ x86-64 and ] sse2Compute the bitwise AND of packed double-precision (64-bit) floating-point
elements in |
| _mm_and_ps⚠ |
[ x86-64 and ] sseBitwise AND of packed single-precision (32-bit) floating-point elements. |
| _mm_and_si128⚠ |
[ x86-64 and ] sse2Compute the bitwise AND of 128 bits (representing integer data) in |
| _mm_andnot_pd⚠ |
[ x86-64 and ] sse2Compute the bitwise NOT of |
| _mm_andnot_ps⚠ |
[ x86-64 and ] sseBitwise AND-NOT of packed single-precision (32-bit) floating-point elements. |
| _mm_andnot_si128⚠ |
[ x86-64 and ] sse2Compute the bitwise NOT of 128 bits (representing integer data) in |
| _mm_avg_epu8⚠ |
[ x86-64 and ] sse2Average packed unsigned 8-bit integers in |
| _mm_avg_epu16⚠ |
[ x86-64 and ] sse2Average packed unsigned 16-bit integers in |
| _mm_blend_epi16⚠ |
[ x86-64 and ] sse4.1Blend packed 16-bit integers from |
| _mm_blend_epi32⚠ |
[ x86-64 and ] avx2Blend packed 32-bit integers from |
| _mm_blend_pd⚠ |
[ x86-64 and ] sse4.1Blend packed double-precision (64-bit) floating-point elements from |
| _mm_blend_ps⚠ |
[ x86-64 and ] sse4.1Blend packed single-precision (32-bit) floating-point elements from |
| _mm_blendv_epi8⚠ |
[ x86-64 and ] sse4.1Blend packed 8-bit integers from |
| _mm_blendv_pd⚠ |
[ x86-64 and ] sse4.1Blend packed double-precision (64-bit) floating-point elements from |
| _mm_blendv_ps⚠ |
[ x86-64 and ] sse4.1Blend packed single-precision (32-bit) floating-point elements from |
| _mm_broadcast_ss⚠ |
[ x86-64 and ] avxBroadcast a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. |
| _mm_broadcastb_epi8⚠ |
[ x86-64 and ] avx2Broadcast the low packed 8-bit integer from |
| _mm_broadcastd_epi32⚠ |
[ x86-64 and ] avx2Broadcast the low packed 32-bit integer from |
| _mm_broadcastq_epi64⚠ |
[ x86-64 and ] avx2Broadcast the low packed 64-bit integer from |
| _mm_broadcastsd_pd⚠ |
[ x86-64 and ] avx2Broadcast the low double-precision (64-bit) floating-point element
from |
| _mm_broadcastss_ps⚠ |
[ x86-64 and ] avx2Broadcast the low single-precision (32-bit) floating-point element
from |
| _mm_broadcastw_epi16⚠ |
[ x86-64 and ] avx2Broadcast the low packed 16-bit integer from a to all elements of the 128-bit returned value |
| _mm_bslli_si128⚠ |
[ x86-64 and ] sse2Shift |
| _mm_bsrli_si128⚠ |
[ x86-64 and ] sse2Shift |
| _mm_castpd_ps⚠ |
[ x86-64 and ] sse2Casts a 128-bit floating-point vector of |
| _mm_castpd_si128⚠ |
[ x86-64 and ] sse2Casts a 128-bit floating-point vector of |
| _mm_castps_pd⚠ |
[ x86-64 and ] sse2Casts a 128-bit floating-point vector of |
| _mm_castps_si128⚠ |
[ x86-64 and ] sse2Casts a 128-bit floating-point vector of |
| _mm_castsi128_pd⚠ |
[ x86-64 and ] sse2Casts a 128-bit integer vector into a 128-bit floating-point vector
of |
| _mm_castsi128_ps⚠ |
[ x86-64 and ] sse2Casts a 128-bit integer vector into a 128-bit floating-point vector
of |
| _mm_ceil_pd⚠ |
[ x86-64 and ] sse4.1Round the packed double-precision (64-bit) floating-point elements in |
| _mm_ceil_ps⚠ |
[ x86-64 and ] sse4.1Round the packed single-precision (32-bit) floating-point elements in |
| _mm_ceil_sd⚠ |
[ x86-64 and ] sse4.1Round the lower double-precision (64-bit) floating-point element in |
| _mm_ceil_ss⚠ |
[ x86-64 and ] sse4.1Round the lower single-precision (32-bit) floating-point element in |
| _mm_clflush⚠ |
[ x86-64 and ] sse2Invalidate and flush the cache line that contains |
| _mm_clmulepi64_si128⚠ |
[ x86-64 and ] pclmulqdqPerform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k). |
| _mm_cmp_pd⚠ |
[ x86-64 and ] avx,sse2Compare packed double-precision (64-bit) floating-point
elements in |
| _mm_cmp_ps⚠ |
[ x86-64 and ] avx,sseCompare packed single-precision (32-bit) floating-point
elements in |
| _mm_cmp_sd⚠ |
[ x86-64 and ] avx,sse2Compare the lower double-precision (64-bit) floating-point element in
|
| _mm_cmp_ss⚠ |
[ x86-64 and ] avx,sseCompare the lower single-precision (32-bit) floating-point element in
|
| _mm_cmpeq_epi8⚠ |
[ x86-64 and ] sse2Compare packed 8-bit integers in |
| _mm_cmpeq_epi16⚠ |
[ x86-64 and ] sse2Compare packed 16-bit integers in |
| _mm_cmpeq_epi32⚠ |
[ x86-64 and ] sse2Compare packed 32-bit integers in |
| _mm_cmpeq_epi64⚠ |
[ x86-64 and ] sse4.1Compare packed 64-bit integers in |
| _mm_cmpeq_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpeq_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpeq_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpeq_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpestra⚠ |
[ x86-64 and ] sse4.2Compare packed strings in |
| _mm_cmpestrc⚠ |
[ x86-64 and ] sse4.2Compare packed strings in |
| _mm_cmpestri⚠ |
[ x86-64 and ] sse4.2Compare packed strings |
| _mm_cmpestrm⚠ |
[ x86-64 and ] sse4.2Compare packed strings in |
| _mm_cmpestro⚠ |
[ x86-64 and ] sse4.2Compare packed strings in |
| _mm_cmpestrs⚠ |
[ x86-64 and ] sse4.2Compare packed strings in |
| _mm_cmpestrz⚠ |
[ x86-64 and ] sse4.2Compare packed strings in |
| _mm_cmpge_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpge_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpge_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpge_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpgt_epi8⚠ |
[ x86-64 and ] sse2Compare packed 8-bit integers in |
| _mm_cmpgt_epi16⚠ |
[ x86-64 and ] sse2Compare packed 16-bit integers in |
| _mm_cmpgt_epi32⚠ |
[ x86-64 and ] sse2Compare packed 32-bit integers in |
| _mm_cmpgt_epi64⚠ |
[ x86-64 and ] sse4.2Compare packed 64-bit integers in |
| _mm_cmpgt_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpgt_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpgt_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpgt_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpistra⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmpistrc⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmpistri⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmpistrm⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmpistro⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmpistrs⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmpistrz⚠ |
[ x86-64 and ] sse4.2Compare packed strings with implicit lengths in |
| _mm_cmple_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmple_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmple_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmple_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmplt_epi8⚠ |
[ x86-64 and ] sse2Compare packed 8-bit integers in |
| _mm_cmplt_epi16⚠ |
[ x86-64 and ] sse2Compare packed 16-bit integers in |
| _mm_cmplt_epi32⚠ |
[ x86-64 and ] sse2Compare packed 32-bit integers in |
| _mm_cmplt_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmplt_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmplt_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmplt_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpneq_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpneq_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpneq_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpneq_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpnge_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpnge_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpnge_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpnge_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpngt_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpngt_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpngt_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpngt_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpnle_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpnle_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpnle_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpnle_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpnlt_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpnlt_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpnlt_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpnlt_ss⚠ |
[ x86-64 and ] sseCompare the lowest |
| _mm_cmpord_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpord_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpord_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpord_ss⚠ |
[ x86-64 and ] sseCheck if the lowest |
| _mm_cmpunord_pd⚠ |
[ x86-64 and ] sse2Compare corresponding elements in |
| _mm_cmpunord_ps⚠ |
[ x86-64 and ] sseCompare each of the four floats in |
| _mm_cmpunord_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_cmpunord_ss⚠ |
[ x86-64 and ] sseCheck if the lowest |
| _mm_comieq_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_comieq_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_comige_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_comige_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_comigt_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_comigt_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_comile_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_comile_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_comilt_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_comilt_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_comineq_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_comineq_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_crc32_u8⚠ |
[ x86-64 and ] sse4.2Starting with the initial value in |
| _mm_crc32_u16⚠ |
[ x86-64 and ] sse4.2Starting with the initial value in |
| _mm_crc32_u32⚠ |
[ x86-64 and ] sse4.2Starting with the initial value in |
| _mm_crc32_u64⚠ |
[ x86-64 and ] sse4.2Starting with the initial value in |
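The `_mm_crc32_*` descriptions above are cut off; each call folds the next 8/16/32/64 bits of data into a running CRC-32C value. A hedged sketch guarded by runtime detection of SSE4.2, using one common CRC-32C convention (initial value of all ones, final inversion) — the helper name and convention are assumptions for illustration:

```rust
#[cfg(target_arch = "x86_64")]
fn crc32c_of_bytes(data: &[u8]) -> u32 {
    use std::arch::x86_64::*;
    assert!(is_x86_feature_detected!("sse4.2"));
    let mut crc: u32 = !0;
    for &byte in data {
        // SAFETY: SSE4.2 support was checked above.
        crc = unsafe { _mm_crc32_u8(crc, byte) };
    }
    !crc
}
```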
| _mm_cvt_si2ss⚠ |
[ x86-64 and ] sseAlias for |
| _mm_cvt_ss2si⚠ |
[ x86-64 and ] sseAlias for |
| _mm_cvtepi16_epi32⚠ |
[ x86-64 and ] sse4.1Sign extend packed 16-bit integers in |
| _mm_cvtepi16_epi64⚠ |
[ x86-64 and ] sse4.1Sign extend packed 16-bit integers in |
| _mm_cvtepi32_epi64⚠ |
[ x86-64 and ] sse4.1Sign extend packed 32-bit integers in |
| _mm_cvtepi32_pd⚠ |
[ x86-64 and ] sse2Convert the lower two packed 32-bit integers in |
| _mm_cvtepi32_ps⚠ |
[ x86-64 and ] sse2Convert packed 32-bit integers in |
| _mm_cvtepi8_epi16⚠ |
[ x86-64 and ] sse4.1Sign extend packed 8-bit integers in |
| _mm_cvtepi8_epi32⚠ |
[ x86-64 and ] sse4.1Sign extend packed 8-bit integers in |
| _mm_cvtepi8_epi64⚠ |
[ x86-64 and ] sse4.1Sign extend packed 8-bit integers in the low 8 bytes of |
| _mm_cvtepu16_epi32⚠ |
[ x86-64 and ] sse4.1Zero extend packed unsigned 16-bit integers in |
| _mm_cvtepu16_epi64⚠ |
[ x86-64 and ] sse4.1Zero extend packed unsigned 16-bit integers in |
| _mm_cvtepu32_epi64⚠ |
[ x86-64 and ] sse4.1Zero extend packed unsigned 32-bit integers in |
| _mm_cvtepu8_epi16⚠ |
[ x86-64 and ] sse4.1Zero extend packed unsigned 8-bit integers in |
| _mm_cvtepu8_epi32⚠ |
[ x86-64 and ] sse4.1Zero extend packed unsigned 8-bit integers in |
| _mm_cvtepu8_epi64⚠ |
[ x86-64 and ] sse4.1Zero extend packed unsigned 8-bit integers in |
| _mm_cvtpd_epi32⚠ |
[ x86-64 and ] sse2Convert packed double-precision (64-bit) floating-point elements in |
| _mm_cvtpd_ps⚠ |
[ x86-64 and ] sse2Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements |
| _mm_cvtps_epi32⚠ |
[ x86-64 and ] sse2Convert packed single-precision (32-bit) floating-point elements in |
| _mm_cvtps_pd⚠ |
[ x86-64 and ] sse2Convert packed single-precision (32-bit) floating-point elements in |
| _mm_cvtsd_f64⚠ |
[ x86-64 and ] sse2Return the lower double-precision (64-bit) floating-point element of "a". |
| _mm_cvtsd_si32⚠ |
[ x86-64 and ] sse2Convert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer. |
| _mm_cvtsd_si64⚠ |
[ x86-64 and ] sse2Convert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer. |
| _mm_cvtsd_si64x⚠ |
[ x86-64 and ] sse2Alias for |
| _mm_cvtsd_ss⚠ |
[ x86-64 and ] sse2Convert the lower double-precision (64-bit) floating-point element in |
| _mm_cvtsi128_si32⚠ |
[ x86-64 and ] sse2Return the lowest element of |
| _mm_cvtsi128_si64⚠ |
[ x86-64 and ] sse2Return the lowest element of |
| _mm_cvtsi128_si64x⚠ |
[ x86-64 and ] sse2Return the lowest element of |
| _mm_cvtsi32_sd⚠ |
[ x86-64 and ] sse2Return |
| _mm_cvtsi32_si128⚠ |
[ x86-64 and ] sse2Return a vector whose lowest element is |
| _mm_cvtsi32_ss⚠ |
[ x86-64 and ] sseConvert a 32 bit integer to a 32 bit float. The result vector is the input
vector |
| _mm_cvtsi64_sd⚠ |
[ x86-64 and ] sse2Return |
| _mm_cvtsi64_si128⚠ |
[ x86-64 and ] sse2Return a vector whose lowest element is |
| _mm_cvtsi64_ss⚠ |
[ x86-64 and ] sseConvert a 64 bit integer to a 32 bit float. The result vector is the input
vector |
| _mm_cvtsi64x_sd⚠ |
[ x86-64 and ] sse2Return |
| _mm_cvtsi64x_si128⚠ |
[ x86-64 and ] sse2Return a vector whose lowest element is |
| _mm_cvtss_f32⚠ |
[ x86-64 and ] sseExtract the lowest 32 bit float from the input vector. |
| _mm_cvtss_sd⚠ |
[ x86-64 and ] sse2Convert the lower single-precision (32-bit) floating-point element in |
| _mm_cvtss_si32⚠ |
[ x86-64 and ] sseConvert the lowest 32 bit float in the input vector to a 32 bit integer. |
| _mm_cvtss_si64⚠ |
[ x86-64 and ] sseConvert the lowest 32 bit float in the input vector to a 64 bit integer. |
| _mm_cvtt_ss2si⚠ |
[ x86-64 and ] sseAlias for |
| _mm_cvttpd_epi32⚠ |
[ x86-64 and ] sse2Convert packed double-precision (64-bit) floating-point elements in |
| _mm_cvttps_epi32⚠ |
[ x86-64 and ] sse2Convert packed single-precision (32-bit) floating-point elements in |
| _mm_cvttsd_si32⚠ |
[ x86-64 and ] sse2Convert the lower double-precision (64-bit) floating-point element in |
| _mm_cvttsd_si64⚠ |
[ x86-64 and ] sse2Convert the lower double-precision (64-bit) floating-point element in |
| _mm_cvttsd_si64x⚠ |
[ x86-64 and ] sse2Alias for |
| _mm_cvttss_si32⚠ |
[ x86-64 and ] sseConvert the lowest 32 bit float in the input vector to a 32 bit integer with truncation. |
| _mm_cvttss_si64⚠ |
[ x86-64 and ] sseConvert the lowest 32 bit float in the input vector to a 64 bit integer with truncation. |
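The `_mm_cvt*` conversions round according to the current MXCSR rounding mode, while the `_mm_cvtt*` variants truncate toward zero. A small sketch of the difference (SSE2 is baseline on x86_64; the asserts assume the default round-to-nearest-even mode):

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn round_vs_truncate() {
    use std::arch::x86_64::*;
    let x = _mm_setr_ps(1.7, -1.7, 2.5, -2.5);
    // Rounds using the current MXCSR mode (round-to-nearest-even by default).
    let rounded = _mm_cvtps_epi32(x);
    // Truncates toward zero regardless of MXCSR.
    let truncated = _mm_cvttps_epi32(x);
    let mut out = [0i32; 4];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, rounded);
    assert_eq!(out, [2, -2, 2, -2]);
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, truncated);
    assert_eq!(out, [1, -1, 2, -2]);
}
```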
| _mm_div_pd⚠ |
[ x86-64 and ] sse2Divide packed double-precision (64-bit) floating-point elements in |
| _mm_div_ps⚠ |
[ x86-64 and ] sseDivides __m128 vectors. |
| _mm_div_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_div_ss⚠ |
[ x86-64 and ] sseDivides the first component of |
| _mm_dp_pd⚠ |
[ x86-64 and ] sse4.1Returns the dot product of two __m128d vectors. |
| _mm_dp_ps⚠ |
[ x86-64 and ] sse4.1Returns the dot product of two __m128 vectors. |
| _mm_extract_epi8⚠ |
[ x86-64 and ] sse4.1Extract an 8-bit integer from |
| _mm_extract_epi16⚠ |
[ x86-64 and ] sse2Return the |
| _mm_extract_epi32⚠ |
[ x86-64 and ] sse4.1Extract a 32-bit integer from |

| _mm_extract_epi64⚠ |
[ x86-64 and ] sse4.1Extract a 64-bit integer from |
| _mm_extract_ps⚠ |
[ x86-64 and ] sse4.1Extract a single-precision (32-bit) floating-point element from |
| _mm_extract_si64⚠ |
[ x86-64 and ] sse4aExtracts the bit range specified by |
| _mm_floor_pd⚠ |
[ x86-64 and ] sse4.1Round the packed double-precision (64-bit) floating-point elements in |
| _mm_floor_ps⚠ |
[ x86-64 and ] sse4.1Round the packed single-precision (32-bit) floating-point elements in |
| _mm_floor_sd⚠ |
[ x86-64 and ] sse4.1Round the lower double-precision (64-bit) floating-point element in |
| _mm_floor_ss⚠ |
[ x86-64 and ] sse4.1Round the lower single-precision (32-bit) floating-point element in |
| _mm_fmadd_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm_fmadd_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm_fmadd_sd⚠ |
[ x86-64 and ] fmaMultiply the lower double-precision (64-bit) floating-point elements in
|
| _mm_fmadd_ss⚠ |
[ x86-64 and ] fmaMultiply the lower single-precision (32-bit) floating-point elements in
|
| _mm_fmaddsub_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm_fmaddsub_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm_fmsub_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm_fmsub_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm_fmsub_sd⚠ |
[ x86-64 and ] fmaMultiply the lower double-precision (64-bit) floating-point elements in
|
| _mm_fmsub_ss⚠ |
[ x86-64 and ] fmaMultiply the lower single-precision (32-bit) floating-point elements in
|
| _mm_fmsubadd_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm_fmsubadd_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm_fnmadd_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm_fnmadd_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm_fnmadd_sd⚠ |
[ x86-64 and ] fmaMultiply the lower double-precision (64-bit) floating-point elements in
|
| _mm_fnmadd_ss⚠ |
[ x86-64 and ] fmaMultiply the lower single-precision (32-bit) floating-point elements in
|
| _mm_fnmsub_pd⚠ |
[ x86-64 and ] fmaMultiply packed double-precision (64-bit) floating-point elements in |
| _mm_fnmsub_ps⚠ |
[ x86-64 and ] fmaMultiply packed single-precision (32-bit) floating-point elements in |
| _mm_fnmsub_sd⚠ |
[ x86-64 and ] fmaMultiply the lower double-precision (64-bit) floating-point elements in
|
| _mm_fnmsub_ss⚠ |
[ x86-64 and ] fmaMultiply the lower single-precision (32-bit) floating-point elements in
|
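The `fma` intrinsics compute a fused multiply-add with a single rounding step: `_mm_fmadd_ps` is `a * b + c` per lane, the `fnm*` forms negate the product, and the `*sub*` forms subtract `c`. A hedged sketch with runtime detection (the function name is illustrative):

```rust
#[cfg(target_arch = "x86_64")]
fn fma_demo() {
    use std::arch::x86_64::*;
    if is_x86_feature_detected!("fma") {
        // SAFETY: FMA support was verified at runtime.
        unsafe {
            let a = _mm_set1_ps(2.0);
            let b = _mm_set1_ps(3.0);
            let c = _mm_set1_ps(1.0);
            let r = _mm_fmadd_ps(a, b, c); // each lane: 2.0 * 3.0 + 1.0 = 7.0
            assert_eq!(_mm_cvtss_f32(r), 7.0);
        }
    }
}
```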
| _mm_getcsr⚠ |
[ x86-64 and ] sseGet the unsigned 32-bit value of the MXCSR control and status register. |
| _mm_hadd_epi16⚠ |
[ x86-64 and ] ssse3Horizontally add the adjacent pairs of values contained in 2 packed
128-bit vectors of |
| _mm_hadd_epi32⚠ |
[ x86-64 and ] ssse3Horizontally add the adjacent pairs of values contained in 2 packed
128-bit vectors of |
| _mm_hadd_pd⚠ |
[ x86-64 and ] sse3Horizontally add adjacent pairs of double-precision (64-bit)
floating-point elements in |
| _mm_hadd_ps⚠ |
[ x86-64 and ] sse3Horizontally add adjacent pairs of single-precision (32-bit)
floating-point elements in |
| _mm_hadds_epi16⚠ |
[ x86-64 and ] ssse3Horizontally add the adjacent pairs of values contained in 2 packed
128-bit vectors of |
| _mm_hsub_epi16⚠ |
[ x86-64 and ] ssse3Horizontally subtract the adjacent pairs of values contained in 2
packed 128-bit vectors of |
| _mm_hsub_epi32⚠ |
[ x86-64 and ] ssse3Horizontally subtract the adjacent pairs of values contained in 2
packed 128-bit vectors of |
| _mm_hsub_pd⚠ |
[ x86-64 and ] sse3Horizontally subtract adjacent pairs of double-precision (64-bit)
floating-point elements in |
| _mm_hsub_ps⚠ |
[ x86-64 and ] sse3Horizontally subtract adjacent pairs of single-precision (32-bit)
floating-point elements in |
| _mm_hsubs_epi16⚠ |
[ x86-64 and ] ssse3Horizontally subtract the adjacent pairs of values contained in 2
packed 128-bit vectors of |
| _mm_i32gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i32gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i32gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i32gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i64gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i64gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i64gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_i64gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_insert_epi8⚠ |
[ x86-64 and ] sse4.1Return a copy of |
| _mm_insert_epi16⚠ |
[ x86-64 and ] sse2Return a new vector where the |
| _mm_insert_epi32⚠ |
[ x86-64 and ] sse4.1Return a copy of |
| _mm_insert_epi64⚠ |
[ x86-64 and ] sse4.1Return a copy of |
| _mm_insert_ps⚠ |
[ x86-64 and ] sse4.1Select a single value in |
| _mm_insert_si64⚠ |
[ x86-64 and ] sse4aInserts the |
| _mm_lddqu_si128⚠ |
[ x86-64 and ] sse3Load 128-bits of integer data from unaligned memory.
This intrinsic may perform better than |
| _mm_lfence⚠ |
[ x86-64 and ] sse2Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction. |
| _mm_load1_pd⚠ |
[ x86-64 and ] sse2Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. |
| _mm_load1_ps⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_load_pd⚠ |
[ x86-64 and ] sse2Load 128-bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
|
| _mm_load_pd1⚠ |
[ x86-64 and ] sse2Load a double-precision (64-bit) floating-point element from memory into both elements of returned vector. |
| _mm_load_ps⚠ |
[ x86-64 and ] sseLoad four |
| _mm_load_ps1⚠ |
[ x86-64 and ] sseAlias for |
| _mm_load_sd⚠ |
[ x86-64 and ] sse2Loads a 64-bit double-precision value to the low element of a 128-bit floating-point vector and clears the upper element. |
| _mm_load_si128⚠ |
[ x86-64 and ] sse2Load 128-bits of integer data from memory into a new vector. |
| _mm_load_ss⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_loaddup_pd⚠ |
[ x86-64 and ] sse3Load a double-precision (64-bit) floating-point element from memory into both elements of the return vector. |
| _mm_loadh_pd⚠ |
[ x86-64 and ] sse2Loads a double-precision value into the high-order bits of a 128-bit
vector of |
| _mm_loadl_epi64⚠ |
[ x86-64 and ] sse2Load 64-bit integer from memory into first element of returned vector. |
| _mm_loadl_pd⚠ |
[ x86-64 and ] sse2Loads a double-precision value into the low-order bits of a 128-bit
vector of |
| _mm_loadr_pd⚠ |
[ x86-64 and ] sse2Load 2 double-precision (64-bit) floating-point elements from memory into
the returned vector in reverse order. |
| _mm_loadr_ps⚠ |
[ x86-64 and ] sseLoad four |
| _mm_loadu_pd⚠ |
[ x86-64 and ] sse2Load 128-bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
|
| _mm_loadu_ps⚠ |
[ x86-64 and ] sseLoad four |
| _mm_loadu_si128⚠ |
[ x86-64 and ] sse2Load 128-bits of integer data from memory into a new vector. |
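`_mm_load_si128` requires a 16-byte-aligned address, while `_mm_loadu_si128` accepts any address. A minimal sketch loading 16 bytes from a plain byte array with the unaligned form (the helper name is illustrative):

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn load_bytes(bytes: &[u8; 16]) -> std::arch::x86_64::__m128i {
    use std::arch::x86_64::*;
    // The unaligned load places bytes[0] in the lowest byte of the vector.
    _mm_loadu_si128(bytes.as_ptr() as *const __m128i)
}
```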
| _mm_madd_epi16⚠ |
[ x86-64 and ] sse2Multiply and then horizontally add signed 16 bit integers in |
| _mm_maddubs_epi16⚠ |
[ x86-64 and ] ssse3Multiply corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, add pairs of contiguous products with signed saturation, and write the 16-bit sums to the corresponding bits in the destination. |
| _mm_mask_i32gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i32gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i32gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i32gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i64gather_epi32⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i64gather_epi64⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i64gather_pd⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_mask_i64gather_ps⚠ |
[ x86-64 and ] avx2Return values from |
| _mm_maskload_epi32⚠ |
[ x86-64 and ] avx2Load packed 32-bit integers from memory pointed by |
| _mm_maskload_epi64⚠ |
[ x86-64 and ] avx2Load packed 64-bit integers from memory pointed by |
| _mm_maskload_pd⚠ |
[ x86-64 and ] avxLoad packed double-precision (64-bit) floating-point elements from memory
into result using |
| _mm_maskload_ps⚠ |
[ x86-64 and ] avxLoad packed single-precision (32-bit) floating-point elements from memory
into result using |
| _mm_maskmoveu_si128⚠ |
[ x86-64 and ] sse2Conditionally store 8-bit integer elements from |
| _mm_maskstore_epi32⚠ |
[ x86-64 and ] avx2Store packed 32-bit integers from |
| _mm_maskstore_epi64⚠ |
[ x86-64 and ] avx2Store packed 64-bit integers from |
| _mm_maskstore_pd⚠ |
[ x86-64 and ] avxStore packed double-precision (64-bit) floating-point elements from |
| _mm_maskstore_ps⚠ |
[ x86-64 and ] avxStore packed single-precision (32-bit) floating-point elements from |
| _mm_max_epi8⚠ |
[ x86-64 and ] sse4.1Compare packed 8-bit integers in |
| _mm_max_epi16⚠ |
[ x86-64 and ] sse2Compare packed 16-bit integers in |
| _mm_max_epi32⚠ |
[ x86-64 and ] sse4.1Compare packed 32-bit integers in |
| _mm_max_epu8⚠ |
[ x86-64 and ] sse2Compare packed unsigned 8-bit integers in |
| _mm_max_epu16⚠ |
[ x86-64 and ] sse4.1Compare packed unsigned 16-bit integers in |
| _mm_max_epu32⚠ |
[ x86-64 and ] sse4.1Compare packed unsigned 32-bit integers in |
| _mm_max_pd⚠ |
[ x86-64 and ] sse2Return a new vector with the maximum values from corresponding elements in
|
| _mm_max_ps⚠ |
[ x86-64 and ] sseCompare packed single-precision (32-bit) floating-point elements in |
| _mm_max_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_max_ss⚠ |
[ x86-64 and ] sseCompare the first single-precision (32-bit) floating-point element of |
| _mm_mfence⚠ |
[ x86-64 and ] sse2Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. |
| _mm_min_epi8⚠ |
[ x86-64 and ] sse4.1Compare packed 8-bit integers in |
| _mm_min_epi16⚠ |
[ x86-64 and ] sse2Compare packed 16-bit integers in |
| _mm_min_epi32⚠ |
[ x86-64 and ] sse4.1Compare packed 32-bit integers in |
| _mm_min_epu8⚠ |
[ x86-64 and ] sse2Compare packed unsigned 8-bit integers in |
| _mm_min_epu16⚠ |
[ x86-64 and ] sse4.1Compare packed unsigned 16-bit integers in |
| _mm_min_epu32⚠ |
[ x86-64 and ] sse4.1Compare packed unsigned 32-bit integers in |
| _mm_min_pd⚠ |
[ x86-64 and ] sse2Return a new vector with the minimum values from corresponding elements in
|
| _mm_min_ps⚠ |
[ x86-64 and ] sseCompare packed single-precision (32-bit) floating-point elements in |
| _mm_min_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_min_ss⚠ |
[ x86-64 and ] sseCompare the first single-precision (32-bit) floating-point element of |
| _mm_minpos_epu16⚠ |
[ x86-64 and ] sse4.1Finds the minimum unsigned 16-bit element in the 128-bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero. |
| _mm_move_epi64⚠ |
[ x86-64 and ] sse2Return a vector where the low element is extracted from |
| _mm_move_sd⚠ |
[ x86-64 and ] sse2Constructs a 128-bit floating-point vector of |
| _mm_move_ss⚠ |
[ x86-64 and ] sseReturn a |
| _mm_movedup_pd⚠ |
[ x86-64 and ] sse3Duplicate the low double-precision (64-bit) floating-point element
from |
| _mm_movehdup_ps⚠ |
[ x86-64 and ] sse3Duplicate odd-indexed single-precision (32-bit) floating-point elements
from |
| _mm_movehl_ps⚠ |
[ x86-64 and ] sseCombine higher half of |
| _mm_moveldup_ps⚠ |
[ x86-64 and ] sse3Duplicate even-indexed single-precision (32-bit) floating-point elements
from |
| _mm_movelh_ps⚠ |
[ x86-64 and ] sseCombine lower half of |
| _mm_movemask_epi8⚠ |
[ x86-64 and ] sse2Return a mask of the most significant bit of each element in |
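A common pattern combines a packed byte compare with `_mm_movemask_epi8` to locate a byte within a 16-byte block: each bit of the returned mask is the most significant bit of one lane. A sketch under SSE2 (baseline on x86_64); the helper name is made up for illustration:

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn find_byte(block: &[u8; 16], needle: u8) -> Option<usize> {
    use std::arch::x86_64::*;
    let chunk = _mm_loadu_si128(block.as_ptr() as *const __m128i);
    // Lanes equal to `needle` become 0xFF, all others become 0x00.
    let eq = _mm_cmpeq_epi8(chunk, _mm_set1_epi8(needle as i8));
    // One bit per lane: bit i is set when block[i] == needle.
    let mask = _mm_movemask_epi8(eq);
    if mask != 0 {
        Some(mask.trailing_zeros() as usize)
    } else {
        None
    }
}
```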
| _mm_movemask_pd⚠ |
[ x86-64 and ] sse2Return a mask of the most significant bit of each element in |
| _mm_movemask_ps⚠ |
[ x86-64 and ] sseReturn a mask of the most significant bit of each element in |
| _mm_mpsadbw_epu8⚠ |
[ x86-64 and ] sse4.1Subtracts 8-bit unsigned integer values and computes the absolute values of the differences. Sums of the absolute differences are then returned according to the bit fields in the immediate operand. |
| _mm_mul_epi32⚠ |
[ x86-64 and ] sse4.1Multiply the low 32-bit integers from each packed 64-bit
element in |
| _mm_mul_epu32⚠ |
[ x86-64 and ] sse2Multiply the low unsigned 32-bit integers from each packed 64-bit element
in |
| _mm_mul_pd⚠ |
[ x86-64 and ] sse2Multiply packed double-precision (64-bit) floating-point elements in |
| _mm_mul_ps⚠ |
[ x86-64 and ] sseMultiplies __m128 vectors. |
| _mm_mul_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_mul_ss⚠ |
[ x86-64 and ] sseMultiplies the first component of |
| _mm_mulhi_epi16⚠ |
[ x86-64 and ] sse2Multiply the packed 16-bit integers in |
| _mm_mulhi_epu16⚠ |
[ x86-64 and ] sse2Multiply the packed unsigned 16-bit integers in |
| _mm_mulhrs_epi16⚠ |
[ x86-64 and ] ssse3Multiply packed 16-bit signed integer values, truncate the 32-bit
product to the 18 most significant bits by right-shifting, round the
truncated value by adding 1, and write bits |
| _mm_mullo_epi16⚠ |
[ x86-64 and ] sse2Multiply the packed 16-bit integers in |
| _mm_mullo_epi32⚠ |
[ x86-64 and ] sse4.1Multiply the packed 32-bit integers in |
| _mm_or_pd⚠ |
[ x86-64 and ] sse2Compute the bitwise OR of |
| _mm_or_ps⚠ |
[ x86-64 and ] sseBitwise OR of packed single-precision (32-bit) floating-point elements. |
| _mm_or_si128⚠ |
[ x86-64 and ] sse2Compute the bitwise OR of 128 bits (representing integer data) in |
| _mm_packs_epi16⚠ |
[ x86-64 and ] sse2Convert packed 16-bit integers from |
| _mm_packs_epi32⚠ |
[ x86-64 and ] sse2Convert packed 32-bit integers from |
| _mm_packus_epi16⚠ |
[ x86-64 and ] sse2Convert packed 16-bit integers from |
| _mm_packus_epi32⚠ |
[ x86-64 and ] sse4.1Convert packed 32-bit integers from |
| _mm_pause⚠ |
[ x86-64 and ] sse2Provide a hint to the processor that the code sequence is a spin-wait loop. |
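`_mm_pause` is typically placed in the body of a spin loop to reduce power use and the pipeline penalty when the wait ends. A minimal sketch (the flag type and loop shape are illustrative, not prescribed by the intrinsic):

```rust
#[cfg(target_arch = "x86_64")]
fn spin_until_set(flag: &std::sync::atomic::AtomicBool) {
    use std::arch::x86_64::_mm_pause;
    use std::sync::atomic::Ordering;
    while !flag.load(Ordering::Acquire) {
        // Hint that this is a spin-wait loop; SSE2 is baseline on x86_64.
        unsafe { _mm_pause() };
    }
}
```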
| _mm_permute_pd⚠ |
[ x86-64 and ] avx,sse2Shuffle double-precision (64-bit) floating-point elements in |
| _mm_permute_ps⚠ |
[ x86-64 and ] avx,sseShuffle single-precision (32-bit) floating-point elements in |
| _mm_permutevar_pd⚠ |
[ x86-64 and ] avxShuffle double-precision (64-bit) floating-point elements in |
| _mm_permutevar_ps⚠ |
[ x86-64 and ] avxShuffle single-precision (32-bit) floating-point elements in |
| _mm_prefetch⚠ |
[ x86-64 and ] sseFetch the cache line that contains address |
| _mm_rcp_ps⚠ |
[ x86-64 and ] sseReturn the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in |
| _mm_rcp_ss⚠ |
[ x86-64 and ] sseReturn the approximate reciprocal of the first single-precision
(32-bit) floating-point element in |
| _mm_round_pd⚠ |
[ x86-64 and ] sse4.1Round the packed double-precision (64-bit) floating-point elements in |
| _mm_round_ps⚠ |
[ x86-64 and ] sse4.1Round the packed single-precision (32-bit) floating-point elements in |
| _mm_round_sd⚠ |
[ x86-64 and ] sse4.1Round the lower double-precision (64-bit) floating-point element in |
| _mm_round_ss⚠ |
[ x86-64 and ] sse4.1Round the lower single-precision (32-bit) floating-point element in |
| _mm_rsqrt_ps⚠ |
[ x86-64 and ] sseReturn the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in |
| _mm_rsqrt_ss⚠ |
[ x86-64 and ] sseReturn the approximate reciprocal square root of the first single-precision
(32-bit) floating-point element in |
| _mm_sad_epu8⚠ |
[ x86-64 and ] sse2Sum the absolute differences of packed unsigned 8-bit integers. |
| _mm_set1_epi8⚠ |
[ x86-64 and ] sse2Broadcast 8-bit integer |
| _mm_set1_epi16⚠ |
[ x86-64 and ] sse2Broadcast 16-bit integer |
| _mm_set1_epi32⚠ |
[ x86-64 and ] sse2Broadcast 32-bit integer |
| _mm_set1_epi64x⚠ |
[ x86-64 and ] sse2Broadcast 64-bit integer |
| _mm_set1_pd⚠ |
[ x86-64 and ] sse2Broadcast double-precision (64-bit) floating-point value a to all elements of the return value. |
| _mm_set1_ps⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_set_epi8⚠ |
[ x86-64 and ] sse2Set packed 8-bit integers with the supplied values. |
| _mm_set_epi16⚠ |
[ x86-64 and ] sse2Set packed 16-bit integers with the supplied values. |
| _mm_set_epi32⚠ |
[ x86-64 and ] sse2Set packed 32-bit integers with the supplied values. |
| _mm_set_epi64x⚠ |
[ x86-64 and ] sse2Set packed 64-bit integers with the supplied values, from highest to lowest. |
| _mm_set_pd⚠ |
[ x86-64 and ] sse2Set packed double-precision (64-bit) floating-point elements in the return value with the supplied values. |
| _mm_set_pd1⚠ |
[ x86-64 and ] sse2Broadcast double-precision (64-bit) floating-point value a to all elements of the return value. |
| _mm_set_ps⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_set_ps1⚠ |
[ x86-64 and ] sseAlias for |
| _mm_set_sd⚠ |
[ x86-64 and ] sse2Copy double-precision (64-bit) floating-point element |
| _mm_set_ss⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_setcsr⚠ |
[ x86-64 and ] sseSet the MXCSR register with the 32-bit unsigned integer value. |
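`_mm_getcsr` and `_mm_setcsr` read and write the MXCSR register using the `_MM_*` masks and flags listed in this module. A hedged sketch that enables flush-to-zero mode without disturbing the other control bits:

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn enable_flush_to_zero() {
    use std::arch::x86_64::*;
    let csr = _mm_getcsr();
    // Clear the flush-to-zero field, then set it to "on".
    _mm_setcsr((csr & !_MM_FLUSH_ZERO_MASK) | _MM_FLUSH_ZERO_ON);
}
```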
| _mm_setr_epi8⚠ |
[ x86-64 and ] sse2Set packed 8-bit integers with the supplied values in reverse order. |
| _mm_setr_epi16⚠ |
[ x86-64 and ] sse2Set packed 16-bit integers with the supplied values in reverse order. |
| _mm_setr_epi32⚠ |
[ x86-64 and ] sse2Set packed 32-bit integers with the supplied values in reverse order. |
| _mm_setr_pd⚠ |
[ x86-64 and ] sse2Set packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order. |
| _mm_setr_ps⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_setzero_pd⚠ |
[ x86-64 and ] sse2Returns packed double-precision (64-bit) floating-point elements with all zeros. |
| _mm_setzero_ps⚠ |
[ x86-64 and ] sseConstruct a |
| _mm_setzero_si128⚠ |
[ x86-64 and ] sse2Returns a vector with all elements set to zero. |
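The `_mm_set_*` constructors take their arguments from the highest element down to the lowest, while the `_mm_setr_*` variants take them in memory order; storing the vector back out makes the difference visible. A small sketch (SSE2 is baseline on x86_64):

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn set_vs_setr() {
    use std::arch::x86_64::*;
    let mut out = [0i32; 4];
    // Arguments are given highest-to-lowest, so element 0 receives 3.
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_set_epi32(0, 1, 2, 3));
    assert_eq!(out, [3, 2, 1, 0]);
    // The reversed constructor matches memory order directly.
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_setr_epi32(0, 1, 2, 3));
    assert_eq!(out, [0, 1, 2, 3]);
}
```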
| _mm_sfence⚠ |
[ x86-64 and ] ssePerform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. |
| _mm_sha1msg1_epu32⚠ |
[ x86-64 and ] shaPerform an intermediate calculation for the next four SHA1 message values
(unsigned 32-bit integers) using previous message values from |
| _mm_sha1msg2_epu32⚠ |
[ x86-64 and ] shaPerform the final calculation for the next four SHA1 message values
(unsigned 32-bit integers) using the intermediate result in |
| _mm_sha1nexte_epu32⚠ |
[ x86-64 and ] shaCalculate SHA1 state variable E after four rounds of operation from the
current SHA1 state variable |
| _mm_sha1rnds4_epu32⚠ |
[ x86-64 and ] shaPerform four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D)
from |
| _mm_sha256msg1_epu32⚠ |
[ x86-64 and ] shaPerform an intermediate calculation for the next four SHA256 message values
(unsigned 32-bit integers) using previous message values from |
| _mm_sha256msg2_epu32⚠ |
[ x86-64 and ] shaPerform the final calculation for the next four SHA256 message values
(unsigned 32-bit integers) using previous message values from |
| _mm_sha256rnds2_epu32⚠ |
[ x86-64 and ] shaPerform 2 rounds of SHA256 operation using an initial SHA256 state
(C,D,G,H) from |
| _mm_shuffle_epi8⚠ |
[ x86-64 and ] ssse3Shuffle bytes from |
| _mm_shuffle_epi32⚠ |
[ x86-64 and ] sse2Shuffle 32-bit integers in |
| _mm_shuffle_pd⚠ |
[ x86-64 and ] sse2Constructs a 128-bit floating-point vector of |
| _mm_shuffle_ps⚠ |
[ x86-64 and ] sseShuffle packed single-precision (32-bit) floating-point elements in |
| _mm_shufflehi_epi16⚠ |
[ x86-64 and ] sse2Shuffle 16-bit integers in the high 64 bits of |
| _mm_shufflelo_epi16⚠ |
[ x86-64 and ] sse2Shuffle 16-bit integers in the low 64 bits of |
| _mm_sign_epi8⚠ |
[ x86-64 and ] ssse3Negate packed 8-bit integers in |
| _mm_sign_epi16⚠ |
[ x86-64 and ] ssse3Negate packed 16-bit integers in |
| _mm_sign_epi32⚠ |
[ x86-64 and ] ssse3Negate packed 32-bit integers in |
| _mm_sll_epi16⚠ |
[ x86-64 and ] sse2Shift packed 16-bit integers in |
| _mm_sll_epi32⚠ |
[ x86-64 and ] sse2Shift packed 32-bit integers in |
| _mm_sll_epi64⚠ |
[ x86-64 and ] sse2Shift packed 64-bit integers in |
| _mm_slli_epi16⚠ |
[ x86-64 and ] sse2Shift packed 16-bit integers in |
| _mm_slli_epi32⚠ |
[ x86-64 and ] sse2Shift packed 32-bit integers in |
| _mm_slli_epi64⚠ |
[ x86-64 and ] sse2Shift packed 64-bit integers in |
| _mm_slli_si128⚠ |
[ x86-64 and ] sse2Shift |
| _mm_sllv_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm_sllv_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm_sqrt_pd⚠ |
[ x86-64 and ] sse2Return a new vector with the square root of each of the values in |
| _mm_sqrt_ps⚠ |
[ x86-64 and ] sseReturn the square root of packed single-precision (32-bit) floating-point
elements in |
| _mm_sqrt_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_sqrt_ss⚠ |
[ x86-64 and ] sseReturn the square root of the first single-precision (32-bit)
floating-point element in |
| _mm_sra_epi16⚠ |
[ x86-64 and ] sse2Shift packed 16-bit integers in |
| _mm_sra_epi32⚠ |
[ x86-64 and ] sse2Shift packed 32-bit integers in |
| _mm_srai_epi16⚠ |
[ x86-64 and ] sse2Shift packed 16-bit integers in |
| _mm_srai_epi32⚠ |
[ x86-64 and ] sse2Shift packed 32-bit integers in |
| _mm_srav_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm_srl_epi16⚠ |
[ x86-64 and ] sse2Shift packed 16-bit integers in |
| _mm_srl_epi32⚠ |
[ x86-64 and ] sse2Shift packed 32-bit integers in |
| _mm_srl_epi64⚠ |
[ x86-64 and ] sse2Shift packed 64-bit integers in |
| _mm_srli_epi16⚠ |
[ x86-64 and ] sse2Shift packed 16-bit integers in |
| _mm_srli_epi32⚠ |
[ x86-64 and ] sse2Shift packed 32-bit integers in |
| _mm_srli_epi64⚠ |
[ x86-64 and ] sse2Shift packed 64-bit integers in |
| _mm_srli_si128⚠ |
[ x86-64 and ] sse2Shift |
| _mm_srlv_epi32⚠ |
[ x86-64 and ] avx2Shift packed 32-bit integers in |
| _mm_srlv_epi64⚠ |
[ x86-64 and ] avx2Shift packed 64-bit integers in |
| _mm_store1_pd⚠ |
[ x86-64 and ] sse2Store the lower double-precision (64-bit) floating-point element from |
| _mm_store1_ps⚠ |
[ x86-64 and ] sseStore the lowest 32 bit float of |
| _mm_store_pd⚠ |
[ x86-64 and ] sse2Store 128-bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
| _mm_store_pd1⚠ |
[ x86-64 and ] sse2Store the lower double-precision (64-bit) floating-point element from |
| _mm_store_ps⚠ |
[ x86-64 and ] sseStore four 32-bit floats into aligned memory. |
| _mm_store_ps1⚠ |
[ x86-64 and ] sseAlias for |
| _mm_store_sd⚠ |
[ x86-64 and ] sse2Stores the lower 64 bits of a 128-bit vector of |
| _mm_store_si128⚠ |
[ x86-64 and ] sse2Store 128-bits of integer data from |
| _mm_store_ss⚠ |
[ x86-64 and ] sseStore the lowest 32 bit float of |
| _mm_storeh_pd⚠ |
[ x86-64 and ] sse2Stores the upper 64 bits of a 128-bit vector of |
| _mm_storel_epi64⚠ |
[ x86-64 and ] sse2Store the lower 64-bit integer |
| _mm_storel_pd⚠ |
[ x86-64 and ] sse2Stores the lower 64 bits of a 128-bit vector of |
| _mm_storer_pd⚠ |
[ x86-64 and ] sse2Store 2 double-precision (64-bit) floating-point elements from |
| _mm_storer_ps⚠ |
[ x86-64 and ] sseStore four 32-bit floats into aligned memory in reverse order. |
| _mm_storeu_pd⚠ |
[ x86-64 and ] sse2Store 128-bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
| _mm_storeu_ps⚠ |
[ x86-64 and ] sseStore four 32-bit floats into memory. There are no restrictions on memory
alignment. For aligned memory |
| _mm_storeu_si128⚠ |
[ x86-64 and ] sse2Store 128-bits of integer data from |
| _mm_stream_pd⚠ |
[ x86-64 and ] sse2Stores a 128-bit floating point vector of |
| _mm_stream_ps⚠ |
[ x86-64 and ] sseStores |
| _mm_stream_sd⚠ |
[ x86-64 and ] sse4aNon-temporal store of |
| _mm_stream_si32⚠ |
[ x86-64 and ] sse2Stores a 32-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm_stream_si64⚠ |
[ x86-64 and ] sse2Stores a 64-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm_stream_si128⚠ |
[ x86-64 and ] sse2Stores a 128-bit integer vector to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm_stream_ss⚠ |
[ x86-64 and ] sse4aNon-temporal store of |
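The `_mm_stream_*` stores write with a non-temporal hint, which can help when filling large buffers that will not be read back soon; an `_mm_sfence` is commonly issued after a run of streaming stores before the data is handed to another thread. A hedged sketch using the 32-bit form (the helper name and loop are illustrative):

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn stream_fill(dst: &mut [i32], value: i32) {
    use std::arch::x86_64::*;
    for slot in dst.iter_mut() {
        // Non-temporal store: hints that the cache line need not be kept.
        _mm_stream_si32(slot as *mut i32, value);
    }
    // Make the streaming stores globally visible before any later stores.
    _mm_sfence();
}
```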
| _mm_sub_epi8⚠ |
[ x86-64 and ] sse2Subtract packed 8-bit integers in |
| _mm_sub_epi16⚠ |
[ x86-64 and ] sse2Subtract packed 16-bit integers in |
| _mm_sub_epi32⚠ |
[ x86-64 and ] sse2Subtract packed 32-bit integers in |
| _mm_sub_epi64⚠ |
[ x86-64 and ] sse2Subtract packed 64-bit integers in |
| _mm_sub_pd⚠ |
[ x86-64 and ] sse2Subtract packed double-precision (64-bit) floating-point elements in |
| _mm_sub_ps⚠ |
[ x86-64 and ] sseSubtracts __m128 vectors. |
| _mm_sub_sd⚠ |
[ x86-64 and ] sse2Return a new vector with the low element of |
| _mm_sub_ss⚠ |
[ x86-64 and ] sseSubtracts the first component of |
| _mm_subs_epi8⚠ |
[ x86-64 and ] sse2Subtract packed 8-bit integers in |
| _mm_subs_epi16⚠ |
[ x86-64 and ] sse2Subtract packed 16-bit integers in |
| _mm_subs_epu8⚠ |
[ x86-64 and ] sse2Subtract packed unsigned 8-bit integers in |
| _mm_subs_epu16⚠ |
[ x86-64 and ] sse2Subtract packed unsigned 16-bit integers in |
| _mm_test_all_ones⚠ |
[ x86-64 and ] sse4.1Tests whether the specified bits in |
| _mm_test_all_zeros⚠ |
[ x86-64 and ] sse4.1Tests whether the specified bits in a 128-bit integer vector are all zeros. |
| _mm_test_mix_ones_zeros⚠ |
[ x86-64 and ] sse4.1Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. |
| _mm_testc_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm_testc_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm_testc_si128⚠ |
[ x86-64 and ] sse4.1Tests whether the specified bits in a 128-bit integer vector are all ones. |
| _mm_testnzc_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm_testnzc_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm_testnzc_si128⚠ |
[ x86-64 and ] sse4.1Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. |
| _mm_testz_pd⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm_testz_ps⚠ |
[ x86-64 and ] avxCompute the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm_testz_si128⚠ |
[ x86-64 and ] sse4.1Tests whether the specified bits in a 128-bit integer vector are all zeros. |
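The SSE4.1 `_mm_test*` helpers are built on the `ptest` instruction: `_mm_test_all_zeros(a, mask)` reports whether `a & mask` is zero, `_mm_test_all_ones(a)` whether every bit of `a` is set, and `_mm_test_mix_ones_zeros` whether the masked bits are neither all zeros nor all ones. A hedged sketch with runtime detection:

```rust
#[cfg(target_arch = "x86_64")]
fn ptest_demo() {
    use std::arch::x86_64::*;
    if is_x86_feature_detected!("sse4.1") {
        // SAFETY: SSE4.1 support was verified at runtime.
        unsafe {
            let zeros = _mm_setzero_si128();
            let ones = _mm_set1_epi8(-1); // every bit set
            assert_eq!(_mm_test_all_zeros(zeros, ones), 1);
            assert_eq!(_mm_test_all_ones(ones), 1);
            assert_eq!(_mm_test_mix_ones_zeros(_mm_set_epi32(0, 0, 0, 1), ones), 1);
        }
    }
}
```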
| _mm_tzcnt_32⚠ |
[ x86-64 and ] bmi1Counts the number of trailing least significant zero bits. |
| _mm_tzcnt_64⚠ |
[ x86-64 and ] bmi1Counts the number of trailing least significant zero bits. |
| _mm_ucomieq_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_ucomieq_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_ucomige_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_ucomige_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_ucomigt_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_ucomigt_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_ucomile_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_ucomile_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_ucomilt_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_ucomilt_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_ucomineq_sd⚠ |
[ x86-64 and ] sse2Compare the lower element of |
| _mm_ucomineq_ss⚠ |
[ x86-64 and ] sseCompare two 32-bit floats from the low-order bits of |
| _mm_undefined_pd⚠ |
[ x86-64 and ] sse2Return vector of type __m128d with undefined elements. |
| _mm_undefined_ps⚠ |
[ x86-64 and ] sseReturn vector of type __m128 with undefined elements. |
| _mm_undefined_si128⚠ |
[ x86-64 and ] sse2Return vector of type __m128i with undefined elements. |
| _mm_unpackhi_epi8⚠ |
[ x86-64 and ] sse2Unpack and interleave 8-bit integers from the high half of |
| _mm_unpackhi_epi16⚠ |
[ x86-64 and ] sse2Unpack and interleave 16-bit integers from the high half of |
| _mm_unpackhi_epi32⚠ |
[ x86-64 and ] sse2Unpack and interleave 32-bit integers from the high half of |
| _mm_unpackhi_epi64⚠ |
[ x86-64 and ] sse2Unpack and interleave 64-bit integers from the high half of |
| _mm_unpackhi_pd⚠ |
[ x86-64 and ] sse2The resulting |
| _mm_unpackhi_ps⚠ |
[ x86-64 and ] sseUnpack and interleave single-precision (32-bit) floating-point elements
from the higher half of |
| _mm_unpacklo_epi8⚠ |
[ x86-64 and ] sse2Unpack and interleave 8-bit integers from the low half of |
| _mm_unpacklo_epi16⚠ |
[ x86-64 and ] sse2Unpack and interleave 16-bit integers from the low half of |
| _mm_unpacklo_epi32⚠ |
[ x86-64 and ] sse2Unpack and interleave 32-bit integers from the low half of |
| _mm_unpacklo_epi64⚠ |
[ x86-64 and ] sse2Unpack and interleave 64-bit integers from the low half of |
| _mm_unpacklo_pd⚠ |
[ x86-64 and ] sse2The resulting |
| _mm_unpacklo_ps⚠ |
[ x86-64 and ] sseUnpack and interleave single-precision (32-bit) floating-point elements
from the lower half of |
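The `_mm_unpacklo_*`/`_mm_unpackhi_*` intrinsics interleave the lower or upper halves of two vectors, a common building block for transposes. A small sketch with 32-bit lanes (SSE2 is baseline on x86_64):

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn unpack_demo() {
    use std::arch::x86_64::*;
    let a = _mm_setr_epi32(0, 1, 2, 3);
    let b = _mm_setr_epi32(4, 5, 6, 7);
    let mut out = [0i32; 4];
    // Interleave the lower halves: a0, b0, a1, b1.
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_unpacklo_epi32(a, b));
    assert_eq!(out, [0, 4, 1, 5]);
    // Interleave the upper halves: a2, b2, a3, b3.
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_unpackhi_epi32(a, b));
    assert_eq!(out, [2, 6, 3, 7]);
}
```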
| _mm_xor_pd⚠ |
[ x86-64 and ] sse2Compute the bitwise XOR of |
| _mm_xor_ps⚠ |
[ x86-64 and ] sseBitwise exclusive OR of packed single-precision (32-bit) floating-point elements. |
| _mm_xor_si128⚠ |
[ x86-64 and ] sse2Compute the bitwise XOR of 128 bits (representing integer data) in |
| _mulx_u32⚠ |
[ x86-64 and ] bmi2Unsigned multiply without affecting flags. |
| _mulx_u64⚠ |
[ x86-64 and ] bmi2Unsigned multiply without affecting flags. |
| _pdep_u32⚠ |
[ x86-64 and ] bmi2Scatter contiguous low order bits of |
| _pdep_u64⚠ |
[ x86-64 and ] bmi2Scatter contiguous low order bits of |
| _pext_u32⚠ |
[ x86-64 and ] bmi2Gathers the bits of |
| _pext_u64⚠ |
[ x86-64 and ] bmi2Gathers the bits of |
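`_pdep_*` scatters the low bits of the source to the positions selected by the mask, and `_pext_*` gathers the masked bits back down into contiguous low bits. A hedged sketch with BMI2 detection (the function name is illustrative):

```rust
#[cfg(target_arch = "x86_64")]
fn bmi2_demo() {
    use std::arch::x86_64::*;
    if is_x86_feature_detected!("bmi2") {
        // SAFETY: BMI2 support was verified at runtime.
        unsafe {
            // Scatter the low 4 bits of the source into the nibble selected by the mask.
            assert_eq!(_pdep_u32(0b1010, 0b1111_0000), 0b1010_0000);
            // Gather that nibble back into the low 4 bits.
            assert_eq!(_pext_u32(0b1010_0000, 0b1111_0000), 0b1010);
        }
    }
}
```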
| _popcnt32⚠ |
[ x86-64 and ] popcntCounts the bits that are set. |
| _popcnt64⚠ |
[ x86-64 and ] popcntCounts the bits that are set. |
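A small sketch of `_popcnt32`, guarded by runtime detection (note that it takes and returns `i32`):

```rust
#[cfg(target_arch = "x86_64")]
fn popcnt_demo() {
    use std::arch::x86_64::*;
    if is_x86_feature_detected!("popcnt") {
        // SAFETY: POPCNT support was verified at runtime.
        unsafe {
            assert_eq!(_popcnt32(0b1011_0010), 4);
        }
    }
}
```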
| _rdrand16_step⚠ |
[ x86-64 and ] rdrandRead a hardware generated 16-bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. |
| _rdrand32_step⚠ |
[ x86-64 and ] rdrandRead a hardware generated 32-bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. |
| _rdrand64_step⚠ |
[ x86-64 and ] rdrandRead a hardware generated 64-bit random value and store the result in val. Return 1 if a random value was generated, and 0 otherwise. |
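The `_rdrand*_step` intrinsics write through their out-parameter and signal success through the return value, so the result must be checked before use. A hedged sketch (the wrapper name is illustrative):

```rust
#[cfg(target_arch = "x86_64")]
fn hardware_random_u64() -> Option<u64> {
    use std::arch::x86_64::*;
    if !is_x86_feature_detected!("rdrand") {
        return None;
    }
    let mut val: u64 = 0;
    // SAFETY: RDRAND support was verified at runtime.
    if unsafe { _rdrand64_step(&mut val) } == 1 {
        Some(val)
    } else {
        None // the hardware can transiently fail to produce a value
    }
}
```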
| _rdseed16_step⚠ |
[ x86-64 and ] rdseedRead a 16-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. |
| _rdseed32_step⚠ |
[ x86-64 and ] rdseedRead a 32-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. |
| _rdseed64_step⚠ |
[ x86-64 and ] rdseedRead a 64-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise. |
| _rdtsc⚠ |
[ x86-64 ] Reads the current value of the processor’s time-stamp counter. |
| _t1mskc_u32⚠ |
[ x86-64 and ] tbmClears all bits below the least significant zero of |
| _t1mskc_u64⚠ |
[ x86-64 and ] tbmClears all bits below the least significant zero of |
| _tzcnt_u32⚠ |
[ x86-64 and ] bmi1Counts the number of trailing least significant zero bits. |
| _tzcnt_u64⚠ |
[ x86-64 and ] bmi1Counts the number of trailing least significant zero bits. |
| _tzmsk_u32⚠ |
[ x86-64 and ] tbmSets all bits below the least significant one of |
| _tzmsk_u64⚠ |
[ x86-64 and ] tbmSets all bits below the least significant one of |
| _xgetbv⚠ |
[ x86-64 and ] xsaveReads the contents of the extended control register |
| _xrstor⚠ |
[ x86-64 and ] xsavePerform a full or partial restore of the enabled processor states using
the state information stored in memory at |
| _xrstor64⚠ |
[ x86-64 and ] xsavePerform a full or partial restore of the enabled processor states using
the state information stored in memory at |
| _xrstors⚠ |
[ x86-64 and ] xsave,xsavesPerform a full or partial restore of the enabled processor states using the
state information stored in memory at |
| _xrstors64⚠ |
[ x86-64 and ] xsave,xsavesPerform a full or partial restore of the enabled processor states using the
state information stored in memory at |
| _xsave⚠ |
[ x86-64 and ] xsavePerform a full or partial save of the enabled processor states to memory at
|
| _xsave64⚠ |
[ x86-64 and ] xsavePerform a full or partial save of the enabled processor states to memory at
|
| _xsavec⚠ |
[ x86-64 and ] xsave,xsavecPerform a full or partial save of the enabled processor states to memory
at |
| _xsavec64⚠ |
[ x86-64 and ] xsave,xsavecPerform a full or partial save of the enabled processor states to memory
at |
| _xsaveopt⚠ |
[ x86-64 and ] xsave,xsaveoptPerform a full or partial save of the enabled processor states to memory at
|
| _xsaveopt64⚠ |
[ x86-64 and ] xsave,xsaveoptPerform a full or partial save of the enabled processor states to memory at
|
| _xsaves⚠ |
[ x86-64 and ] xsave,xsavesPerform a full or partial save of the enabled processor states to memory at
|
| _xsaves64⚠ |
[ x86-64 and ] xsave,xsavesPerform a full or partial save of the enabled processor states to memory at
|
| _xsetbv⚠ |
[ x86-64 and ] xsaveCopy 64-bits from |
| _MM_SHUFFLE |
[ Experimental ] [x86-64 ] A utility function for creating masks to use with Intel shuffle and permute intrinsics. |
| _m_maskmovq⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConditionally copies the values from each 8-bit element in the first 64-bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64-bit integer vector operand. |
| _m_paddb⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 8-bit integers in |
| _m_paddd⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 32-bit integers in |
| _m_paddsb⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 8-bit integers in |
| _m_paddsw⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 16-bit integers in |
| _m_paddusb⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed unsigned 8-bit integers in |
| _m_paddusw⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed unsigned 16-bit integers in |
| _m_paddw⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 16-bit integers in |
| _m_pavgb⚠ |
[ Experimental ] [x86-64 and ] sse,mmxComputes the rounded averages of the packed unsigned 8-bit integer values and writes the averages to the corresponding bits in the destination. |
| _m_pavgw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxComputes the rounded averages of the packed unsigned 16-bit integer values and writes the averages to the corresponding bits in the destination. |
| _m_pextrw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxExtracts 16-bit element from a 64-bit vector of |
| _m_pinsrw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCopies data from the 64-bit vector of |
| _m_pmaxsw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 16-bit signed integers of |
| _m_pmaxub⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 8-bit unsigned integers of |
| _m_pminsw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 16-bit signed integers of |
| _m_pminub⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 8-bit unsigned integers of |
| _m_pmovmskb⚠ |
[ Experimental ] [x86-64 and ] sse,mmxTakes the most significant bit from each 8-bit element in a 64-bit integer vector to create a 16-bit mask value. Zero-extends the value to 32-bit integer and writes it to the destination. |
| _m_pmulhuw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxMultiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. |
| _m_psadbw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxSubtracts the corresponding 8-bit unsigned integer values of the two
64-bit vector operands and computes the absolute value of each
difference. The sum of the 8 absolute differences is then written to
bits |
| _m_pshufw⚠ |
[ Experimental ] [x86-64 and ] sse,mmxShuffles the 4 16-bit integers from a 64-bit integer vector to the destination, as specified by the immediate value operand. |
| _m_psubb⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 8-bit integers in |
| _m_psubd⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 32-bit integers in |
| _m_psubsb⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 8-bit integers in |
| _m_psubsw⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 16-bit integers in |
| _m_psubusb⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed unsigned 8-bit integers in |
| _m_psubusw⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed unsigned 16-bit integers in |
| _m_psubw⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 16-bit integers in |
| _mm_abs_pi8⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxCompute the absolute value of packed 8-bit integers in |
| _mm_abs_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxCompute the absolute value of packed 16-bit integers in |
| _mm_abs_pi32⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxCompute the absolute value of packed 32-bit integers in |
| _mm_add_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 8-bit integers in |
| _mm_add_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 16-bit integers in |
| _mm_add_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 32-bit integers in |
| _mm_add_si64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxAdds two signed or unsigned 64-bit integer values, returning the lower 64 bits of the sum. |
| _mm_adds_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 8-bit integers in |
| _mm_adds_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed 16-bit integers in |
| _mm_adds_pu8⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed unsigned 8-bit integers in |
| _mm_adds_pu16⚠ |
[ Experimental ] [x86-64 and ] mmxAdd packed unsigned 16-bit integers in |
| _mm_alignr_pi8⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxConcatenates the two 64-bit integer vector operands, and right-shifts the result by the number of bytes specified in the immediate operand. |
| _mm_avg_pu8⚠ |
[ Experimental ] [x86-64 and ] sse,mmxComputes the rounded averages of the packed unsigned 8-bit integer values and writes the averages to the corresponding bits in the destination. |
| _mm_avg_pu16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxComputes the rounded averages of the packed unsigned 16-bit integer values and writes the averages to the corresponding bits in the destination. |
| _mm_cmpgt_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxCompares whether each element of |
| _mm_cmpgt_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxCompares whether each element of |
| _mm_cmpgt_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxCompares whether each element of |
| _mm_cvt_pi2ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts two elements of a 64-bit vector of |
| _mm_cvt_ps2pi⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConvert the two lower packed single-precision (32-bit) floating-point
elements in |
| _mm_cvtpd_pi32⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxConverts the two double-precision floating-point elements of a
128-bit vector of |
| _mm_cvtpi16_ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts a 64-bit vector of |
| _mm_cvtpi32_pd⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxConverts the two signed 32-bit integer elements of a 64-bit vector of
|
| _mm_cvtpi32_ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts two elements of a 64-bit vector of |
| _mm_cvtpi32x2_ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts the two 32-bit signed integer values from each 64-bit vector
operand of |
| _mm_cvtpi8_ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts the lower 4 8-bit values of |
| _mm_cvtps_pi8⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConvert packed single-precision (32-bit) floating-point elements in |
| _mm_cvtps_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConvert packed single-precision (32-bit) floating-point elements in |
| _mm_cvtps_pi32⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConvert the two lower packed single-precision (32-bit) floating-point
elements in |
| _mm_cvtpu16_ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts a 64-bit vector of |
| _mm_cvtpu8_ps⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConverts the lower 4 8-bit values of |
| _mm_cvtt_ps2pi⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConvert the two lower packed single-precision (32-bit) floating-point
elements in |
| _mm_cvttpd_pi32⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxConverts the two double-precision floating-point elements of a
128-bit vector of |
| _mm_cvttps_pi32⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConvert the two lower packed single-precision (32-bit) floating-point
elements in |
| _mm_extract_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxExtracts 16-bit element from a 64-bit vector of |
| _mm_hadd_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxHorizontally add the adjacent pairs of values contained in 2 packed
64-bit vectors of |
| _mm_hadd_pi32⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxHorizontally add the adjacent pairs of values contained in 2 packed
64-bit vectors of |
| _mm_hadds_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxHorizontally add the adjacent pairs of values contained in 2 packed
64-bit vectors of |
| _mm_hsub_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxHorizontally subtracts the adjacent pairs of values contained in 2
packed 64-bit vectors of |
| _mm_hsub_pi32⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxHorizontally subtracts the adjacent pairs of values contained in 2
packed 64-bit vectors of |
| _mm_hsubs_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxHorizontally subtracts the adjacent pairs of values contained in 2
packed 64-bit vectors of |
| _mm_insert_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCopies data from the 64-bit vector of |
| _mm_loadh_pi⚠ |
[ Experimental ] [x86-64 and ] sseSet the upper two single-precision floating-point values with 64 bits of
data loaded from the address |
| _mm_loadl_pi⚠ |
[ Experimental ] [x86-64 and ] sseLoad two floats from |
| _mm_maddubs_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxMultiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination. |
| _mm_maskmove_si64⚠ |
[ Experimental ] [x86-64 and ] sse,mmxConditionally copies the values from each 8-bit element in the first 64-bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64-bit integer vector operand. |
| _mm_max_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 16-bit signed integers of |
| _mm_max_pu8⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 8-bit unsigned integers of |
| _mm_min_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 16-bit signed integers of |
| _mm_min_pu8⚠ |
[ Experimental ] [x86-64 and ] sse,mmxCompares the packed 8-bit unsigned integers of |
| _mm_movemask_pi8⚠ |
[ Experimental ] [x86-64 and ] sse,mmxTakes the most significant bit from each 8-bit element in a 64-bit integer vector to create a 16-bit mask value. Zero-extends the value to 32-bit integer and writes it to the destination. |
| _mm_movepi64_pi64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxReturns the lower 64 bits of a 128-bit integer vector as a 64-bit integer. |
| _mm_movpi64_epi64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxMoves the 64-bit operand to a 128-bit integer vector, zeroing the upper bits. |
| _mm_mul_su32⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxMultiplies 32-bit unsigned integer values contained in the lower bits of the two 64-bit integer vectors and returns the 64-bit unsigned product. |
| _mm_mulhi_pu16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxMultiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. |
| _mm_mulhrs_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxMultiplies packed 16-bit signed integer values, truncates the 32-bit
products to the 18 most significant bits by right-shifting, rounds the
truncated value by adding 1, and writes bits |
| _mm_mullo_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxMultiplies packed 16-bit integer values and writes the low-order 16 bits of each 32-bit product to the corresponding bits in the destination. |
| _mm_packs_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxConvert packed 16-bit integers from |
| _mm_packs_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxConvert packed 32-bit integers from |
| _mm_sad_pu8⚠ |
[ Experimental ] [x86-64 and ] sse,mmxSubtracts the corresponding 8-bit unsigned integer values of the two
64-bit vector operands and computes the absolute value of each
difference. The sum of the 8 absolute differences is then written to
bits |
| _mm_set1_epi64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxInitializes both values in a 128-bit vector of |
| _mm_set1_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxBroadcast 8-bit integer a to all elements of dst. |
| _mm_set1_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxBroadcast 16-bit integer a to all elements of dst. |
| _mm_set1_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxBroadcast 32-bit integer a to all elements of dst. |
| _mm_set_epi64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxInitializes both 64-bit values in a 128-bit vector of |
| _mm_set_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxSet packed 8-bit integers in dst with the supplied values. |
| _mm_set_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxSet packed 16-bit integers in dst with the supplied values. |
| _mm_set_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxSet packed 32-bit integers in dst with the supplied values. |
| _mm_setr_epi64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxConstructs a 128-bit integer vector, initialized in reverse order with the specified 64-bit integral values. |
| _mm_setr_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxSet packed 8-bit integers in dst with the supplied values in reverse order. |
| _mm_setr_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxSet packed 16-bit integers in dst with the supplied values in reverse order. |
| _mm_setr_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxSet packed 32-bit integers in dst with the supplied values in reverse order. |
| _mm_setzero_si64⚠ |
[ Experimental ] [x86-64 and ] mmxConstructs a 64-bit integer vector initialized to zero. |
| _mm_shuffle_pi8⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxShuffle packed 8-bit integers in |
| _mm_shuffle_pi16⚠ |
[ Experimental ] [x86-64 and ] sse,mmxShuffles the 4 16-bit integers from a 64-bit integer vector to the destination, as specified by the immediate value operand. |
| _mm_sign_pi8⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxNegate packed 8-bit integers in |
| _mm_sign_pi16⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxNegate packed 16-bit integers in |
| _mm_sign_pi32⚠ |
[ Experimental ] [x86-64 and ] ssse3,mmxNegate packed 32-bit integers in |
| _mm_storeh_pi⚠ |
[ Experimental ] [x86-64 and ] sseStore the upper half of |
| _mm_storel_pi⚠ |
[ Experimental ] [x86-64 and ] sseStore the lower half of |
| _mm_stream_pi⚠ |
[ Experimental ] [x86-64 and ] sse,mmxStore 64-bits of integer data from a into memory using a non-temporal memory hint. |
| _mm_sub_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 8-bit integers in |
| _mm_sub_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 16-bit integers in |
| _mm_sub_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 32-bit integers in |
| _mm_sub_si64⚠ |
[ Experimental ] [x86-64 and ] sse2,mmxSubtracts signed or unsigned 64-bit integer values and writes the difference to the corresponding bits in the destination. |
| _mm_subs_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 8-bit integers in |
| _mm_subs_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed 16-bit integers in |
| _mm_subs_pu8⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed unsigned 8-bit integers in |
| _mm_subs_pu16⚠ |
[ Experimental ] [x86-64 and ] mmxSubtract packed unsigned 16-bit integers in |
| _mm_unpackhi_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxUnpacks the upper four elements from two |
| _mm_unpackhi_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxUnpacks the upper two elements from two |
| _mm_unpackhi_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxUnpacks the upper element from two |
| _mm_unpacklo_pi8⚠ |
[ Experimental ] [x86-64 and ] mmxUnpacks the lower four elements from two |
| _mm_unpacklo_pi16⚠ |
[ Experimental ] [x86-64 and ] mmxUnpacks the lower two elements from two |
| _mm_unpacklo_pi32⚠ |
[ Experimental ] [x86-64 and ] mmxUnpacks the lower element from two |
| has_cpuid |
[ Experimental ] [x86-64 ] Does the host support the |