Re: Strange: Rosetta faster than M1

Quincey Morris

OK, this is embarrassing. I’m so used to looking at Swift code these days that my brain just automatically translated your code into Swift, and I never realized it was C. So the Swift forums aren’t where I should have suggested — although I see over there that you might be getting some kind of answer anyway. It’s an interesting question.

Does this mean that Swift is some kind of cult that has taken over my brain? Am I auto translating the entire world into Swift? Can I get rehab for this?

Admittedly, the people who might be interested in this overlap pretty well with Swift engineers, but I’m sorry I didn’t make a better suggestion.

On Sep 21, 2022, at 17:49, Quincey Morris <quinceymorris@...> wrote:

I really think you should start by asking this question over in the Swift forums, in case there is some compiler-specific answer that can immediately explain the result. It could be a compiler code generation deficiency, but there are many other possibilities. For example, a colleague of mine speculated that there could be reasons why the M1 Mac ran the Rosetta translation on a *performance* core, but ran the Apple Silicon version on an *efficiency* core.

You can also investigate the performance yourself, by running (say) the Time Profiler template in Instruments. It’s unclear which instrument might provide informative results, so you might need to try a couple of different templates, focusing on different things.

This might ultimately be an Apple support question, but I’d imagine there are numerous people on the Swift forums who’d enjoy puzzling out the answer. :)

On Sep 20, 2022, at 22:59, Gerriet M. Denkmann <gerriet@...> wrote:

On 20 Sep 2022, at 19:42, Alex Zavatone via <zav@...> wrote:

It might seem like a primitive approach, but logging with time stamps should be able to highlight where the suckiness is. Run a log that displays the time delta from the last logging statement so that you are only looking at the deltas. Then run each version and see where the slowness is. That should tell you, right?

I did this:

typedef uint32_t limb;
typedef uint64_t bigLimb;

const uint len = 50000;
const int shiftLimb = sizeof(limb) * 8;   // 32 bits per limb

limb *someArray = malloc( len * sizeof(limb) );   // note: contents uninitialized
bigLimb someBig = 0;

for (bigLimb factor = 1; factor < len; factor++ ) {
    for (uint idx = 0 ; idx < len ; idx++) {
        someBig += factor * someArray[idx];
        someArray[idx] = (limb)(someBig);   // keep the low 32 bits
        someBig >>= shiftLimb;              // carry the high 32 bits
    }
}

and ran it in Release mode (-Os = Fastest, Smallest).
(In Debug mode (-O0) Rosetta time = M1 time).

                       Rosetta   M1      Rosetta time / M1 time
with "someBig >>= shiftLimb":
                       1.8       3.35    0.54
without the shift:
                       1.32      0.924   1.43

So it seems that Rosetta handles the 64-bit shift way better than the native Apple Silicon build: with the shift, the translated x86_64 binary runs in roughly half the time of the native one.

Which kind of looks like a bug.

