<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Fast SSE Select Operation</title>
	<atom:link href="http://mark.santaniello.com/archives/315/feed" rel="self" type="application/rss+xml" />
	<link>http://mark.santaniello.com/archives/315</link>
	<description>the body of a very slow loop</description>
	<pubDate>Wed, 07 Jan 2009 01:49:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Revisiting Fast SSE Select at mark++</title>
		<link>http://mark.santaniello.com/archives/315#comment-42014</link>
		<dc:creator>Revisiting Fast SSE Select at mark++</dc:creator>
		<pubDate>Tue, 08 Apr 2008 22:51:08 +0000</pubDate>
		<guid isPermaLink="false">http://mark.santaniello.net/archives/315#comment-42014</guid>
		<description>[...] started thinking about this SSE select operation again, probably round about the time I learned of AMD&#8217;s SSEPlus and its [...]</description>
		<content:encoded><![CDATA[<p>[...] started thinking about this SSE select operation again, probably round about the time I learned of AMD&#8217;s SSEPlus and its [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark</title>
		<link>http://mark.santaniello.com/archives/315#comment-28034</link>
		<dc:creator>Mark</dc:creator>
		<pubDate>Thu, 05 Apr 2007 19:49:25 +0000</pubDate>
		<guid isPermaLink="false">http://mark.santaniello.net/archives/315#comment-28034</guid>
		<description>Wendy helped me to understand why my code is *not* susceptible to the problem you describe, Kent.  The answer does not have much of anything to do with "const", despite what I said previously.

The reason my version is safe is quite simply that my _mm_sel_ps_xor function does not ever modify any of its parameters.

Sure, the const qualification helps prevent a mistake, but as long as I am careful not to modify my reference parameters, I'm golden.

Insert head-slapping "duh" here.  

For illustrative purposes, here's a version that is vulnerable to the problem you describe:

&lt;pre&gt;&lt;code&gt;
__m128 _mm_sel_ps_xor_BAD_BAD_BAD( __m128&#038; a, __m128&#038; b, const __m128&#038; mask)
{
    b = _mm_xor_ps( b, a );
    return _mm_xor_ps( a, _mm_and_ps( mask, b ) );
}
&lt;/code&gt;&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>Wendy helped me to understand why my code is *not* susceptible to the problem you describe, Kent.  The answer does not have much of anything to do with &#8220;const&#8221;, despite what I said previously.</p>
<p>The reason my version is safe is quite simply that my _mm_sel_ps_xor function does not ever modify any of its parameters.</p>
<p>Sure, the const qualification helps prevent a mistake, but as long as I am careful not to modify my reference parameters, I&#8217;m golden.</p>
<p>Insert head-slapping &#8220;duh&#8221; here.  </p>
<p>For illustrative purposes, here&#8217;s a version that is vulnerable to the problem you describe:</p>
<pre><code>
__m128 _mm_sel_ps_xor_BAD_BAD_BAD( __m128&#038; a, __m128&#038; b, const __m128&#038; mask)
{
    b = _mm_xor_ps( b, a );
    return _mm_xor_ps( a, _mm_and_ps( mask, b ) );
}
</code></pre>
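<p>A minimal sketch of the aliasing case, for anyone who wants to try it. The names sel_xor_safe, sel_xor_bad, all_lanes_equal and half_mask are hypothetical, and the body of the safe version is an assumption about the shape of _mm_sel_ps_xor (it is not quoted in this thread). Passing the same variable as both a and b leaves the non-mutating version intact and zeroes the result of the mutating one.</p>

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_xor_ps, _mm_and_ps, _mm_cmplt_ps
#include <cassert>

// Non-mutating XOR select (assumed shape of the safe version):
// lanes where mask is all-ones take b, lanes where it is zero take a.
static inline __m128 sel_xor_safe(const __m128& a, const __m128& b, const __m128& mask)
{
    return _mm_xor_ps(a, _mm_and_ps(mask, _mm_xor_ps(b, a)));
}

// Mutating variant from the comment above, with an explicit return type.
static inline __m128 sel_xor_bad(__m128& a, __m128& b, const __m128& mask)
{
    b = _mm_xor_ps(b, a);  // if a and b alias, this zeroes both
    return _mm_xor_ps(a, _mm_and_ps(mask, b));
}

// Hypothetical helper: true when all four lanes compare equal to expected.
static bool all_lanes_equal(__m128 v, float expected)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] == expected && out[1] == expected &&
           out[2] == expected && out[3] == expected;
}

// Mask with lanes 0 and 1 set, lanes 2 and 3 clear.
static __m128 half_mask()
{
    return _mm_cmplt_ps(_mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f), _mm_set1_ps(2.0f));
}

// Selecting x against itself should return x unchanged.
bool safe_version_survives_aliasing()
{
    __m128 x = _mm_set1_ps(3.0f);
    return all_lanes_equal(sel_xor_safe(x, x, half_mask()), 3.0f);
}

// The mutating version clears x before reading it back, so the result is zero.
bool bad_version_wipes_bits()
{
    __m128 x = _mm_set1_ps(3.0f);
    return all_lanes_equal(sel_xor_bad(x, x, half_mask()), 0.0f);
}

// Both observations as runtime checks.
void check_aliasing_cases()
{
    assert(safe_version_survives_aliasing());
    assert(bad_version_wipes_bits());
}
```

<p>The only difference between the two functions is the write to b, which is what makes the aliased call destructive.</p>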
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark</title>
		<link>http://mark.santaniello.com/archives/315#comment-27755</link>
		<dc:creator>Mark</dc:creator>
		<pubDate>Thu, 05 Apr 2007 05:32:34 +0000</pubDate>
		<guid isPermaLink="false">http://mark.santaniello.net/archives/315#comment-27755</guid>
		<description>I think my version is OK with const references, right?  The compiler must not modify either of the source memory locations.</description>
		<content:encoded><![CDATA[<p>I think my version is OK with const references, right?  The compiler must not modify either of the source memory locations.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kent</title>
		<link>http://mark.santaniello.com/archives/315#comment-27654</link>
		<dc:creator>Kent</dc:creator>
		<pubDate>Thu, 05 Apr 2007 00:51:55 +0000</pubDate>
		<guid isPermaLink="false">http://mark.santaniello.net/archives/315#comment-27654</guid>
		<description>Hey bud,

I love this optimization too, but it's not perfect.  It has a tiny flaw if you pass two references to the same memory location.  If that happens, the XORs will wipe out all bits, no matter the mask.  Ideally, if you merge a memory location with itself, you want to get the source back unmodified.  You can only use the XOR approach if you can guarantee that &#38;a != &#38;b.

If you can't, you need to use the andnot approach.</description>
		<content:encoded><![CDATA[<p>Hey bud,</p>
<p>I love this optimization too, but it&#8217;s not perfect.  It has a tiny flaw if you pass two references to the same memory location.  If that happens, the XORs will wipe out all bits, no matter the mask.  Ideally, if you merge a memory location with itself, you want to get the source back unmodified.  You can only use the XOR approach if you can guarantee that &amp;a != &amp;b.</p>
<p>If you can&#8217;t, you need to use the andnot approach.</p>
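<p>For reference, a rough sketch of the two formulations side by side. It assumes the "andnot approach" means the usual and/andnot/or select; the names sel_andnot, sel_xor and lanes_are are hypothetical. With a mask whose lanes are all-ones or all-zeros, both pick b where the mask is set and a where it is clear.</p>

```cpp
#include <xmmintrin.h>  // SSE: _mm_and_ps, _mm_andnot_ps, _mm_or_ps, _mm_xor_ps
#include <cassert>

// AND/ANDNOT/OR select: (mask AND b) OR (NOT mask AND a).
static inline __m128 sel_andnot(const __m128& a, const __m128& b, const __m128& mask)
{
    // _mm_andnot_ps(x, y) computes (~x) & y.
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// XOR select: a XOR (mask AND (b XOR a)). Does not write to its inputs.
static inline __m128 sel_xor(const __m128& a, const __m128& b, const __m128& mask)
{
    return _mm_xor_ps(a, _mm_and_ps(mask, _mm_xor_ps(b, a)));
}

// Hypothetical helper: compare the four lanes against expected values.
static bool lanes_are(__m128 v, float e0, float e1, float e2, float e3)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] == e0 && out[1] == e1 && out[2] == e2 && out[3] == e3;
}

// Exercise both selects on distinct inputs, then the andnot form on aliased inputs.
void check_select_forms()
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    // Lanes 0 and 1 all-ones, lanes 2 and 3 zero.
    __m128 mask = _mm_cmplt_ps(_mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f), _mm_set1_ps(2.0f));

    assert(lanes_are(sel_andnot(a, b, mask), 10.0f, 20.0f, 3.0f, 4.0f));
    assert(lanes_are(sel_xor(a, b, mask), 10.0f, 20.0f, 3.0f, 4.0f));

    // Selecting a against itself returns a unchanged.
    assert(lanes_are(sel_andnot(a, a, mask), 1.0f, 2.0f, 3.0f, 4.0f));
}
```

<p>The and/andnot/or form never combines a with b before masking, so it does not depend on the two arguments being distinct.</p>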
]]></content:encoded>
	</item>
</channel>
</rss>
