New .nops directive, to aid Linux alternatives patching?

New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
Hello,

I realise this is a little bit niche, but how feasible would it be to
introduce a new .nops directive which takes a size parameter, and
outputs long nops covering the number of specified bytes?

For kernel development, when creating alternative patch points for
boot-time instruction/functionality selection, we commonly end up with
the different alternatives having different lengths.  For patching
safety, the compile-time alternative needs to be extended with nops so
the largest alternative can fit in, if we choose to select it.

At the moment, alignment directives have optimisations to pad with long
nops up to the alignment boundary.  However, the alignment properties
are problematic, especially when trying to patch an individual
instruction or two in a hotpath.

At the moment, automatic size calculations can be performed in the
following way:

/*
 * Define an alternative between two instructions. If @feature is
 * present, early code in apply_alternatives() replaces @oldinstr with
 * @newinstr. ".skip" directive takes care of proper instruction padding
 * in case @newinstr is longer than @oldinstr.
 */
.macro ALTERNATIVE oldinstr, newinstr, feature
140:
        \oldinstr
141:
        .skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90
142:

        .pushsection .altinstructions,"a"
        altinstruction_entry 140b,143f,\feature,142b-140b,144f-143f,142b-141b
        .popsection

        .pushsection .altinstr_replacement,"ax"
143:
        \newinstr
144:
        .popsection
.endm

Here the .skip directive adds sufficient bytes of single-byte nop
instructions.  While this is functionally correct, it renders the
disassembly unintelligible (especially for longer alternatives, as we've
seen with the Spectre/SP2 mitigations), and comes with a runtime
performance hit (single-byte nops are deliberately not optimised in newer
pipelines to avoid breaking naive timing loops).
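
To make the .skip size expression above easier to follow, here is a
minimal Python sketch of the arithmetic it performs.  It relies on the
documented gas behaviour that a true comparison evaluates to -1 and a
false one to 0.  The function names gas_gt and skip_padding are
illustrative only; they do not exist in the kernel or in binutils.

```python
def gas_gt(a: int, b: int) -> int:
    """Model the gas '>' operator: -1 when true, 0 when false."""
    return -1 if a > b else 0


def skip_padding(old_len: int, new_len: int) -> int:
    """Number of 0x90 bytes the .skip expression in ALTERNATIVE emits.

    old_len stands for (141b-140b), new_len for (144f-143f).
    """
    diff = new_len - old_len
    # -(diff > 0) is 1 when the replacement is longer, otherwise 0,
    # so the product is max(0, diff).
    return -gas_gt(diff, 0) * diff
```

So a 2-byte original with a 5-byte replacement gets 3 bytes of padding,
and no padding is added when the replacement is the same size or
shorter.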

The runtime perf hit can be addressed late in boot by re-patching at
runtime with long nops.  However, patching comes with a nonzero chance
of tripping over an NMI/MCE and having the interrupt handler encounter a
half-patched instruction, and it would be better to avoid needless
repatching if we possibly can.


Anyway, what I'm trying to say is that having a .nops directive which
could produce an exact number of optimised nops would be very helpful. 
Is it the kind of feature which would be considered useful upstream?

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
<[hidden email]> wrote:
> Hello,
>
> I realise this is a little bit niche, but how feasible would it be to
> introduce a new .nops directive which takes a size parameter, and
> outputs long nops covering the number of specified bytes?
>

Sounds to me you want a pseudo NOP instruction:

pseudo-NOP N

which generates a long NOP of N bytes.  Is that correct?  If yes,
what is the range of N?

--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 08/02/2018 20:10, H.J. Lu wrote:

> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
> <[hidden email]> wrote:
>> Hello,
>>
>> I realise this is a little bit niche, but how feasible would it be to
>> introduce a new .nops directive which takes a size parameter, and
>> outputs long nops covering the number of specified bytes?
>>
> Sounds to me you want a pseudo NOP instruction:
>
> pseudo-NOP N
>
> which generates a long NOP with N byte.  Is that correct.  If yes,
> what is the range of N?

Currently 255 based on other implementation limits, and I expect that
ought to be long enough for anyone.  There is one existing user for
N=43, and I expect that to grow a bit.

The real answer probably depends on the point at which it is more
efficient to jmp rather than wasting decode bandwidth decoding nops, and
I don't know the answer, but expect that it isn't larger than 255.

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
<[hidden email]> wrote:

> On 08/02/2018 20:10, H.J. Lu wrote:
>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>> <[hidden email]> wrote:
>>> Hello,
>>>
>>> I realise this is a little bit niche, but how feasible would it be to
>>> introduce a new .nops directive which takes a size parameter, and
>>> outputs long nops covering the number of specified bytes?
>>>
>> Sounds to me you want a pseudo NOP instruction:
>>
>> pseudo-NOP N
>>
>> which generates a long NOP with N byte.  Is that correct.  If yes,
>> what is the range of N?
>
> Currently 255 based on other implementation limits, and I expect that
> ought to be long enough for anyone.  There is one existing user for
> N=43, and I expect that to grow a bit.
>
> The real answer properly depends at what point it is more efficient to
> jmp rather than wasting decode bandwidth decoding nops, and I don't know
> the answer, but expect that it isn't larger than 255.
>

How about

{nop} N

If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
instruction over nops.  Does it work for you?


--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:

> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
> <[hidden email]> wrote:
>> On 08/02/2018 20:10, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>> <[hidden email]> wrote:
>>>> Hello,
>>>>
>>>> I realise this is a little bit niche, but how feasible would it be to
>>>> introduce a new .nops directive which takes a size parameter, and
>>>> outputs long nops covering the number of specified bytes?
>>>>
>>> Sounds to me you want a pseudo NOP instruction:
>>>
>>> pseudo-NOP N
>>>
>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>> what is the range of N?
>>
>> Currently 255 based on other implementation limits, and I expect that
>> ought to be long enough for anyone.  There is one existing user for
>> N=43, and I expect that to grow a bit.
>>
>> The real answer properly depends at what point it is more efficient to
>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>> the answer, but expect that it isn't larger than 255.
>>
>
> How about
>
> {nop} N
>
> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
> instruction over nops.  Does it work for you?

N will be limited to 255.



--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 08/02/2018 20:28, H.J. Lu wrote:

> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>> <[hidden email]> wrote:
>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>> <[hidden email]> wrote:
>>>>> Hello,
>>>>>
>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>> outputs long nops covering the number of specified bytes?
>>>>>
>>>> Sounds to me you want a pseudo NOP instruction:
>>>>
>>>> pseudo-NOP N
>>>>
>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>> what is the range of N?
>>> Currently 255 based on other implementation limits, and I expect that
>>> ought to be long enough for anyone.  There is one existing user for
>>> N=43, and I expect that to grow a bit.
>>>
>>> The real answer properly depends at what point it is more efficient to
>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>> the answer, but expect that it isn't larger than 255.
>>>
>> How about
>>
>> {nop} N
>>
>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>> instruction over nops.  Does it work for you?
> N will be limited to 255.

Do you mean up to 255 bytes of adjacent long nops, or still a jump if
over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
certainly slower than executing through the nops.  The ORM isn't clear
where the split lies, and I expect it is very uarch specific.

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
<[hidden email]> wrote:

> On 08/02/2018 20:28, H.J. Lu wrote:
>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>> <[hidden email]> wrote:
>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>> <[hidden email]> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>
>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>
>>>>> pseudo-NOP N
>>>>>
>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>> what is the range of N?
>>>> Currently 255 based on other implementation limits, and I expect that
>>>> ought to be long enough for anyone.  There is one existing user for
>>>> N=43, and I expect that to grow a bit.
>>>>
>>>> The real answer properly depends at what point it is more efficient to
>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>> the answer, but expect that it isn't larger than 255.
>>>>
>>> How about
>>>
>>> {nop} N
>>>
>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>> instruction over nops.  Does it work for you?
>> N will be limited to 255.
>
> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
> certainly slower than executing through the nops.  The ORM isn't clear
> where the split lies, and I expect it is very uarch specific.

How about this

{nop} N, L
{nop} N

N is <= 255. If L is missing, L is 15.

If N < L then
  Long NOPs up to N bytes
else
  jmp + long nops up to N bytes.
fi
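
Read literally, the rule above can be sketched in Python as follows.
The function name choose_fill is hypothetical, and the sketch only
models which strategy is picked, not the bytes that would be emitted.

```python
def choose_fill(n: int, l: int = 15) -> str:
    """Pick the fill strategy for '{nop} N, L' as proposed above."""
    if not 0 <= n <= 255:
        raise ValueError("N is limited to 255")
    # Strictly less than L: plain long nops.  Otherwise: jmp over nops.
    return "nops" if n < l else "jmp+nops"
```

With the default L of 15, a 14-byte request is filled with nops while a
15-byte request already gets a jmp; passing a larger L, as in
choose_fill(28, 40), keeps the fill as plain nops.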


--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 08/02/2018 20:36, H.J. Lu wrote:

> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
> <[hidden email]> wrote:
>> On 08/02/2018 20:28, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>> <[hidden email]> wrote:
>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>> <[hidden email]> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>
>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>
>>>>>> pseudo-NOP N
>>>>>>
>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>> what is the range of N?
>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>> N=43, and I expect that to grow a bit.
>>>>>
>>>>> The real answer properly depends at what point it is more efficient to
>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>> the answer, but expect that it isn't larger than 255.
>>>>>
>>>> How about
>>>>
>>>> {nop} N
>>>>
>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>> instruction over nops.  Does it work for you?
>>> N will be limited to 255.
>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>> certainly slower than executing through the nops.  The ORM isn't clear
>> where the split lies, and I expect it is very uarch specific.
> How about this
>
> {nop} N, L
> {nop} N
>
> N is < =255. If L is missing, L is 15.
>
> If N < L then
>   Long NOPs up to N bytes
> else
>   jmp + long nops up to N bytes.
> fi

I'm afraid that I don't think that will be very helpful in that form. 
Are there technical reasons why you don't want to emit more than a
single 15-byte long nop?

First of all, 9-byte long nops are the longest you can use without
suffering decode stalls on most processors due to excess segment
prefixes, which is why both Linux and Xen top out there when dynamically
adding new nops.

Secondly, I don't understand why you want the jmp.  I think it would be
entirely reasonable to make it the programmer's problem to work out when
a jmp is more efficient.  If the patchsites really do get stupidly long,
we could make a boot-time u-arch calculation to decide whether the jmp
or the nops are better, but shorter patchsites are better so I don't
expect such a feature to get any production use where using a jmp would
be beneficial.

Ideally, such an implementation would just emit as many long nops as
would fill up the space requested.  One trick however to consider is
that if you've got N+10 bytes remaining and emitting N-sized long nops
(where N is most likely 9), then emitting an N+8 long nop and a 2-byte
long nop is more efficient to execute than an N+9 nop and a singlebyte
nop, as the singlebyte nop can't be optimised during execution.
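
One possible reading of the trick described above, sketched in Python:
fill greedily with nops of at most max_nop bytes, but shorten a chunk by
one byte when that would otherwise strand a single-byte nop at the end.
The function name split_nops and the default of 9 are assumptions made
for illustration; this is not a description of how any existing
assembler behaves.

```python
def split_nops(size: int, max_nop: int = 9) -> list[int]:
    """Split `size` bytes into nop lengths, avoiding a lone 1-byte nop."""
    if size < 0 or max_nop < 1:
        raise ValueError("size must be >= 0 and max_nop >= 1")
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = min(remaining, max_nop)
        # Taking this chunk would leave exactly one byte: give one byte
        # back so the tail becomes a 2-byte nop instead.
        if remaining - chunk == 1 and chunk > 2:
            chunk -= 1
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

For example, split_nops(10) gives [8, 2] rather than [9, 1], and the
43-byte patch site mentioned earlier in the thread becomes
[9, 9, 9, 9, 7].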

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <[hidden email]> wrote:

> On 08/02/2018 20:36, H.J. Lu wrote:
>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>> <[hidden email]> wrote:
>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>> <[hidden email]> wrote:
>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>> <[hidden email]> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>
>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>
>>>>>>> pseudo-NOP N
>>>>>>>
>>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>>> what is the range of N?
>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>> N=43, and I expect that to grow a bit.
>>>>>>
>>>>>> The real answer properly depends at what point it is more efficient to
>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>
>>>>> How about
>>>>>
>>>>> {nop} N
>>>>>
>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>> instruction over nops.  Does it work for you?
>>>> N will be limited to 255.
>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>> certainly slower than executing through the nops.  The ORM isn't clear
>>> where the split lies, and I expect it is very uarch specific.
>> How about this
>>
>> {nop} N, L
>> {nop} N
>>
>> N is < =255. If L is missing, L is 15.
>>
>> If N < L then
>>   Long NOPs up to N bytes
>> else
>>   jmp + long nops up to N bytes.
>> fi
>
> I'm afraid that I don't think that will be very helpful in that form.
> Are there technical reasons why you don't want to emit more than a
> single 15byte long nop?
>

Doesn't

{nop} 28, 40

generate 2 x 14-byte nops?


--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Alan Modra-3
In reply to this post by Andrew Cooper
On Thu, Feb 08, 2018 at 11:47:53PM +0000, Andrew Cooper wrote:

> On 08/02/2018 20:36, H.J. Lu wrote:
> > How about this
> >
> > {nop} N, L
> > {nop} N
> >
> > N is < =255. If L is missing, L is 15.
> >
> > If N < L then
> >   Long NOPs up to N bytes
> > else
> >   jmp + long nops up to N bytes.
> > fi
>
> I'm afraid that I don't think that will be very helpful in that form. 

Wrong punctuation.  You missed a full stop after "think".

Just how does this not give you what you were asking for at the
beginning of this thread?

--
Alan Modra
Australia Development Lab, IBM

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
In reply to this post by H.J. Lu-30
On 09/02/2018 00:24, H.J. Lu wrote:

> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <[hidden email]> wrote:
>> On 08/02/2018 20:36, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>> <[hidden email]> wrote:
>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>> <[hidden email]> wrote:
>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>> <[hidden email]> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>
>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>
>>>>>>>> pseudo-NOP N
>>>>>>>>
>>>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>>>> what is the range of N?
>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>
>>>>>>> The real answer properly depends at what point it is more efficient to
>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>
>>>>>> How about
>>>>>>
>>>>>> {nop} N
>>>>>>
>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>> instruction over nops.  Does it work for you?
>>>>> N will be limited to 255.
>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>> where the split lies, and I expect it is very uarch specific.
>>> How about this
>>>
>>> {nop} N, L
>>> {nop} N
>>>
>>> N is < =255. If L is missing, L is 15.
>>>
>>> If N < L then
>>>   Long NOPs up to N bytes
>>> else
>>>   jmp + long nops up to N bytes.
>>> fi
>> I'm afraid that I don't think that will be very helpful in that form.
>> Are there technical reasons why you don't want to emit more than a
>> single 15byte long nop?
>>
> Doesn't
>
> {nop} 28, 40
>
> generate 2 x 14-byte nops?

By the above logic, yes.  I still don't see the value in the L
parameter, because I don't expect an average programmer to know how to
choose it sensibly.  Then again, a compiler generating code for a
specified uarch probably could have some idea of what value to feed in.

If the semantics were a little more like:

{nop} N => N bytes of nops with no jumps
{nop} N, L => as above

Then this might be more useful.

I expect N will typically be an expression rather than an absolute
number, because the usecase I've proposed is for filling in a specific,
calculated number of bytes.  (In particular, what commonly happens is
that memory references in alternatives are the thing which cause the
exact length to fluctuate.)  When there is a sensible uarch value for L,
that can be fed in, but shouldn't be mandatory.  In particular, if it
is unknown, 15 is almost certainly the wrong default for it.

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 4:45 PM, Andrew Cooper <[hidden email]> wrote:

> On 09/02/2018 00:24, H.J. Lu wrote:
>> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <[hidden email]> wrote:
>>> On 08/02/2018 20:36, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>>> <[hidden email]> wrote:
>>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>>> <[hidden email]> wrote:
>>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>>
>>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>>
>>>>>>>>> pseudo-NOP N
>>>>>>>>>
>>>>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>>>>> what is the range of N?
>>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>>
>>>>>>>> The real answer properly depends at what point it is more efficient to
>>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>>
>>>>>>> How about
>>>>>>>
>>>>>>> {nop} N
>>>>>>>
>>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>>> instruction over nops.  Does it work for you?
>>>>>> N will be limited to 255.
>>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>>> where the split lies, and I expect it is very uarch specific.
>>>> How about this
>>>>
>>>> {nop} N, L
>>>> {nop} N
>>>>
>>>> N is < =255. If L is missing, L is 15.
>>>>
>>>> If N < L then
>>>>   Long NOPs up to N bytes
>>>> else
>>>>   jmp + long nops up to N bytes.
>>>> fi
>>> I'm afraid that I don't think that will be very helpful in that form.
>>> Are there technical reasons why you don't want to emit more than a
>>> single 15byte long nop?
>>>
>> Doesn't
>>
>> {nop} 28, 40
>>
>> generate 2 x 14-byte nops?
>
> By the above logic, yes.  I still don't see the value in the L
> parameter, because I don't expect an average programmer to know how to
> choose it sensibly.  Then again, a compiler generating code for a
> specified uarch probably could have some idea of what value to feed in.
>
> If the semantics were a little more like:
>
> {nop} N => N bytes of nops with no jumps
> {nop} N, L => as above
>
> Then this might be more useful.
>
> I expect N will typically be an expression rather than an absolute
> number, because the usecase I've proposed is for filling in a specific,
> calculated number of bytes.  (In particular, what commonly happens is
> that memory references in alternatives are the thing which cause the
> exact length to fluctuate.)  When there is a sensible uarch value for L,
> that can be fed in, but shouldn't be mandatory.  In particular, if it
> unknown, 15 is almost certainly the wrong default for it.

So, you want

.nop SIZE

and

.jump SIZE

which are similar to '.skip SIZE, FILL', but fill SIZE bytes with nops or
jmp + nops.

--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Thu, Feb 8, 2018 at 5:14 PM, H.J. Lu <[hidden email]> wrote:

> On Thu, Feb 8, 2018 at 4:45 PM, Andrew Cooper <[hidden email]> wrote:
>> On 09/02/2018 00:24, H.J. Lu wrote:
>>> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <[hidden email]> wrote:
>>>> On 08/02/2018 20:36, H.J. Lu wrote:
>>>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>>>> <[hidden email]> wrote:
>>>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>>>> <[hidden email]> wrote:
>>>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>>>
>>>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>>>
>>>>>>>>>> pseudo-NOP N
>>>>>>>>>>
>>>>>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>>>>>> what is the range of N?
>>>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>>>
>>>>>>>>> The real answer properly depends at what point it is more efficient to
>>>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>>>
>>>>>>>> How about
>>>>>>>>
>>>>>>>> {nop} N
>>>>>>>>
>>>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>>>> instruction over nops.  Does it work for you?
>>>>>>> N will be limited to 255.
>>>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>>>> where the split lies, and I expect it is very uarch specific.
>>>>> How about this
>>>>>
>>>>> {nop} N, L
>>>>> {nop} N
>>>>>
>>>>> N is < =255. If L is missing, L is 15.
>>>>>
>>>>> If N < L then
>>>>>   Long NOPs up to N bytes
>>>>> else
>>>>>   jmp + long nops up to N bytes.
>>>>> fi
>>>> I'm afraid that I don't think that will be very helpful in that form.
>>>> Are there technical reasons why you don't want to emit more than a
>>>> single 15byte long nop?
>>>>
>>> Doesn't
>>>
>>> {nop} 28, 40
>>>
>>> generate 2 x 14-byte nops?
>>
>> By the above logic, yes.  I still don't see the value in the L
>> parameter, because I don't expect an average programmer to know how to
>> choose it sensibly.  Then again, a compiler generating code for a
>> specified uarch probably could have some idea of what value to feed in.
>>
>> If the semantics were a little more like:
>>
>> {nop} N => N bytes of nops with no jumps
>> {nop} N, L => as above
>>
>> Then this might be more useful.
>>
>> I expect N will typically be an expression rather than an absolute
>> number, because the usecase I've proposed is for filling in a specific,
>> calculated number of bytes.  (In particular, what commonly happens is
>> that memory references in alternatives are the thing which cause the
>> exact length to fluctuate.)  When there is a sensible uarch value for L,
>> that can be fed in, but shouldn't be mandatory.  In particular, if it
>> unknown, 15 is almost certainly the wrong default for it.
>
> So, you want
>
> .nop SIZE
>
> and
>
> .jump SIZE
>
> which are similar to '.skip SIZE , FILL'.  But they fill SIZE with nops or
> jmp + nops.
>

Or

.nop SIZE, JUMP_SIZE

If SIZE < JUMP_SIZE then
  SIZE of nops.
else
  SIZE of jmp + nops.
fi

--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 09/02/18 02:22, H.J. Lu wrote:

> On Thu, Feb 8, 2018 at 5:14 PM, H.J. Lu <[hidden email]> wrote:
>> On Thu, Feb 8, 2018 at 4:45 PM, Andrew Cooper <[hidden email]> wrote:
>>> On 09/02/2018 00:24, H.J. Lu wrote:
>>>> On Thu, Feb 8, 2018 at 3:47 PM, Andrew Cooper <[hidden email]> wrote:
>>>>> On 08/02/2018 20:36, H.J. Lu wrote:
>>>>>> On Thu, Feb 8, 2018 at 12:33 PM, Andrew Cooper
>>>>>> <[hidden email]> wrote:
>>>>>>> On 08/02/2018 20:28, H.J. Lu wrote:
>>>>>>>> On Thu, Feb 8, 2018 at 12:27 PM, H.J. Lu <[hidden email]> wrote:
>>>>>>>>> On Thu, Feb 8, 2018 at 12:18 PM, Andrew Cooper
>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>> On 08/02/2018 20:10, H.J. Lu wrote:
>>>>>>>>>>> On Thu, Feb 8, 2018 at 11:26 AM, Andrew Cooper
>>>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I realise this is a little bit niche, but how feasible would it be to
>>>>>>>>>>>> introduce a new .nops directive which takes a size parameter, and
>>>>>>>>>>>> outputs long nops covering the number of specified bytes?
>>>>>>>>>>>>
>>>>>>>>>>> Sounds to me you want a pseudo NOP instruction:
>>>>>>>>>>>
>>>>>>>>>>> pseudo-NOP N
>>>>>>>>>>>
>>>>>>>>>>> which generates a long NOP with N byte.  Is that correct.  If yes,
>>>>>>>>>>> what is the range of N?
>>>>>>>>>> Currently 255 based on other implementation limits, and I expect that
>>>>>>>>>> ought to be long enough for anyone.  There is one existing user for
>>>>>>>>>> N=43, and I expect that to grow a bit.
>>>>>>>>>>
>>>>>>>>>> The real answer properly depends at what point it is more efficient to
>>>>>>>>>> jmp rather than wasting decode bandwidth decoding nops, and I don't know
>>>>>>>>>> the answer, but expect that it isn't larger than 255.
>>>>>>>>>>
>>>>>>>>> How about
>>>>>>>>>
>>>>>>>>> {nop} N
>>>>>>>>>
>>>>>>>>> If N is less than 15 bytes, it generates a long nop.   Otherwise, we use a jump
>>>>>>>>> instruction over nops.  Does it work for you?
>>>>>>>> N will be limited to 255.
>>>>>>> Do you mean up to 255 bytes of adjacent long nops, or still a jump if
>>>>>>> over 15 bytes?  For alternatives in the range of 15-30, a jmp is almost
>>>>>>> certainly slower than executing through the nops.  The ORM isn't clear
>>>>>>> where the split lies, and I expect it is very uarch specific.
>>>>>> How about this
>>>>>>
>>>>>> {nop} N, L
>>>>>> {nop} N
>>>>>>
>>>>>> N is < =255. If L is missing, L is 15.
>>>>>>
>>>>>> If N < L then
>>>>>>   Long NOPs up to N bytes
>>>>>> else
>>>>>>   jmp + long nops up to N bytes.
>>>>>> fi
>>>>> I'm afraid that I don't think that will be very helpful in that form.
>>>>> Are there technical reasons why you don't want to emit more than a
>>>>> single 15byte long nop?
>>>>>
>>>> Doesn't
>>>>
>>>> {nop} 28, 40
>>>>
>>>> generate 2 x 14-byte nops?
>>> By the above logic, yes.  I still don't see the value in the L
>>> parameter, because I don't expect an average programmer to know how to
>>> choose it sensibly.  Then again, a compiler generating code for a
>>> specified uarch probably could have some idea of what value to feed in.
>>>
>>> If the semantics were a little more like:
>>>
>>> {nop} N => N bytes of nops with no jumps
>>> {nop} N, L => as above
>>>
>>> Then this might be more useful.
>>>
>>> I expect N will typically be an expression rather than an absolute
>>> number, because the usecase I've proposed is for filling in a specific,
>>> calculated number of bytes.  (In particular, what commonly happens is
>>> that memory references in alternatives are the thing which cause the
>>> exact length to fluctuate.)  When there is a sensible uarch value for L,
>>> that can be fed in, but shouldn't be mandatory.  In particular, if it
>>> unknown, 15 is almost certainly the wrong default for it.
>> So, you want
>>
>> .nop SIZE
>>
>> and
>>
>> .jump SIZE
>>
>> which are similar to '.skip SIZE , FILL'.  But they fill SIZE with nops or
>> jmp + nops.
>>
> Or
>
> .nop SIZE, JUMP_SIZE
>
> If SIZE < JUMP_SIZE then
>   SIZE of nops.
> else
>   SIZE of jmp + nops.
> fi

I'm still not sure why you want the jump functionality in the first
place, but yes - this latest option would work.

FWIW, jumping over code with alternatives is typically done like:

ALTERNATIVE "jmp .L\@_skip", "", FEATURE_X
...
.L\@_skip:

At that point, only the 2- or 5-byte jmp is being dynamically
modified.  The converse case is where we begin with 2 or 5 bytes of
nops and dynamically insert the jmp.
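
For reference, the 2 and 5 byte sizes correspond to the two relative jmp
encodings on x86: opcode EB with an 8-bit displacement, and opcode E9 with
a 32-bit displacement (in 32/64-bit code).  A minimal Python sketch of the
size choice follows; the name encode_jmp is made up for illustration and
is not part of any tool discussed in this thread.

```python
import struct

def encode_jmp(rel):
    """Encode an x86 relative jmp.

    rel is the displacement measured from the end of the jmp instruction.
    """
    if -128 <= rel <= 127:
        # jmp rel8: opcode EB + 1 displacement byte = 2 bytes
        return b"\xeb" + struct.pack("<b", rel)
    # jmp rel32: opcode E9 + 4 displacement bytes = 5 bytes
    return b"\xe9" + struct.pack("<i", rel)

short = encode_jmp(10)    # target within rel8 range
near = encode_jmp(300)    # target needs rel32
print(len(short), len(near))
```

Whichever form is used, the patch site only has to cover those 2 or 5
bytes, regardless of how much code is being skipped.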

If we're in the line for other related feature requests, how about being
able to optionally specify the maximum length of individual nops?  e.g.

.nop SIZE [, MAX_NOP = 9 [, JUMP_SIZE = -1]]

If SIZE < JUMP_SIZE then
  SIZE of nops (of MAX_NOP len or less).
else
  SIZE of jmp + nops.
fi

uarch considerations also affect the maximum length of long nops which
can be executed without suffering decode stalls.  A sensible default (on
64-bit capable processors) is 9, rather than the 15 which would be the
more obvious answer.  However, in the case of inserting the jmp, we
don't end up executing the nops, at which point decode stalls are not of
any concern.
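
To make the proposed semantics concrete, here is a small Python model of
the three-argument form.  It is only a sketch under assumptions:
plan_fill and its return format are invented for illustration;
JUMP_SIZE = -1 is read as "never emit a jmp" (taken literally, SIZE < -1
would never be true); full-length nops are placed first with the
remainder last; and the jmp is assumed to be the 2-byte form when the
skipped padding fits in a signed 8-bit displacement, else the 5-byte form.

```python
def plan_fill(size, max_nop=9, jump_size=-1):
    """Model of '.nop SIZE [, MAX_NOP [, JUMP_SIZE]]' as proposed above.

    Returns ("nops", [lengths]) or ("jmp", jmp_len, padding_bytes).
    """
    if size < 0 or max_nop < 1:
        raise ValueError("size must be >= 0 and max_nop >= 1")

    # Assumption: a negative JUMP_SIZE disables the jmp form entirely.
    # A jmp also needs at least 2 bytes of room.
    use_jmp = jump_size >= 0 and size >= jump_size and size >= 2
    if use_jmp:
        # Assumption: short jmp (2 bytes) if the padding fits in rel8.
        jmp_len = 2 if size - 2 <= 127 else 5
        # The padding is never executed, so its nop lengths do not matter.
        return ("jmp", jmp_len, size - jmp_len)

    # No jmp: split SIZE into nops of at most MAX_NOP bytes each.
    full, rest = divmod(size, max_nop)
    return ("nops", [max_nop] * full + ([rest] if rest else []))

print(plan_fill(20))          # nops only
print(plan_fill(20, 9, 16))   # jmp over the padding
```

With the defaults, 20 bytes become nops of 9, 9 and 2 bytes; with a
JUMP_SIZE of 16 the same 20 bytes become a 2-byte jmp over 18 bytes of
padding.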

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Fri, Feb 9, 2018 at 3:35 AM, Andrew Cooper <[hidden email]> wrote:

> If we're in the line for other related feature requests, how about being
> able to optionally specify the maximum length of individual nops?  e.g.
>
> .nop SIZE [, MAX_NOP = 9 [, JUMP_SIZE = -1]]

OK, let's go with

 .nop SIZE [, MAX_NOP = 9]

It is easier to implement with 2 arguments.   MAX_NOP must be a constant.



--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 09/02/18 11:55, H.J. Lu wrote:

> OK, let's go with
>
>  .nop SIZE [, MAX_NOP = 9]
>
> It is easier to implement with 2 arguments.   MAX_NOP must be a constant.

Sounds good to me.

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Fri, Feb 9, 2018 at 5:29 AM, Andrew Cooper <[hidden email]> wrote:

> Sounds good to me.

Please try users/hjl/nop branch:

https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/nop

It implements:

.nop SIZE [, MAX_NOP = 10]

The maximum SIZE is 255.

--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 10/02/18 15:44, H.J. Lu wrote:

> Please try users/hjl/nop branch:
>
> https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/nop

Oh - thank you!  I was about to ask if there were any pointers to get
started hacking on binutils.

As for the functionality, there are unfortunately some issues.  Given
this source:

        .text
single:
        nop

pseudo_1:
        .nop 1

pseudo_8:
        .nop 8

pseudo_8_4:
        .nop 8, 4

pseudo_20:
        .nop 20

I get the following disassembly:

0000000000000000 <single>:
   0:    90                       nop

0000000000000001 <pseudo_1>:
   1:    66 90                    xchg   %ax,%ax

0000000000000003 <pseudo_8>:
   3:    66 0f 1f 84 00 00 00     nopw   0x0(%rax,%rax,1)
   a:    00 00

000000000000000c <pseudo_8_4>:
   c:    90                       nop
   d:    0f 1f 40 00              nopl   0x0(%rax)
  11:    0f 1f 40 00              nopl   0x0(%rax)

0000000000000015 <pseudo_20>:
  15:    90                       nop
  16:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  1d:    00 00 00
  20:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
  27:    00 00 00

The MAX_NOP part looks to be working as intended (including reducing
below the default of 10), but there appears to be an off-by-one
somewhere, as each block comes out one byte longer than requested.
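
The excess can be read straight off the label addresses in the
disassembly above.  The short Python check below does the subtraction;
the addresses and requested sizes are copied from the listing, while the
variable and function names are only illustrative.

```python
# (label, start address, requested size), taken from the listing above.
layout = [
    ("pseudo_1", 0x01, 1),
    ("pseudo_8", 0x03, 8),
    ("pseudo_8_4", 0x0c, 8),
    ("pseudo_20", 0x15, 20),
]
# First byte after the last instruction: the nopw at 0x20 is 10 bytes long.
END = 0x20 + 10

def excess_bytes(layout, end):
    """Return, per label, emitted size minus requested size."""
    starts = [addr for _, addr, _ in layout] + [end]
    result = {}
    for i, (name, _, requested) in enumerate(layout):
        emitted = starts[i + 1] - starts[i]
        result[name] = emitted - requested
    return result

excess = excess_bytes(layout, END)
print(excess)
```

Every block, with or without an explicit MAX_NOP, is exactly one byte
longer than the size passed to .nop.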

Furthermore, attempting to use .nop 30 yields:

/tmp/ccI2Eakp.s: Assembler messages:
/tmp/ccI2Eakp.s: Fatal error: can't write 145268933551616 bytes to section .text of nops.o: 'Bad value'

I can't obviously tie the reported number to anything, but it does appear
to depend on the current position in the section.  Inserting more regular
instructions ahead of the .nop 30 causes the reported number to get
larger until it overflows.
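
One observation that may help, offered as a guess rather than a
diagnosis: written out as little-endian bytes, the reported size contains
the start of the 10-byte nop pattern seen in the disassembly above
(66 2e 0f 1f 84 00 ...).  The Python snippet below only checks that
arithmetic; it does not explain why the value changes with the position
in the section.

```python
reported = 145268933551616

# The bogus byte count, viewed as an 8-byte little-endian integer.
raw = reported.to_bytes(8, "little")

# The 10-byte nop from the disassembly: nopw %cs:0x0(%rax,%rax,1)
long_nop = bytes.fromhex("66 2e 0f 1f 84 00 00 00 00 00")

print(hex(reported))
print(raw.hex(" "))
# After the first byte, the value matches the leading bytes of the nop.
print(raw[1:] == long_nop[:7])
```

If that match is not a coincidence, it would suggest nop opcode bytes are
being read back as a length somewhere, but that is speculation from the
number alone.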

~Andrew

Re: New .nops directive, to aid Linux alternatives patching?

H.J. Lu-30
On Sat, Feb 10, 2018 at 9:22 AM, Andrew Cooper
<[hidden email]> wrote:

> The MAX_NOP part looks to be working as intended (including reducing
> below the default of 10), but there appears to be an off-by-one
> somewhere, as one too many nops are emitted in the block.
>
> Furthermore, attempting to use .nop 30 yields:
>
> /tmp/ccI2Eakp.s: Assembler messages:
> /tmp/ccI2Eakp.s: Fatal error: can't write 145268933551616 bytes to
> section .text of nops.o: 'Bad value'

Please try my branch again.  It should be fixed.


--
H.J.

Re: New .nops directive, to aid Linux alternatives patching?

Andrew Cooper
On 11/02/2018 00:59, H.J. Lu wrote:

>>> Please try users/hjl/nop branch:
>>>
>>> https://github.com/hjl-tools/binutils-gdb/tree/users/hjl/nop
>> Oh - thank you!  I was about to ask if there were any pointers to get
>> started hacking on binutils.
>>
>> As for the functionality, there are unfortunately some issues.  Given
>> this source:
>>
>>         .text
>> single:
>>         nop
>>
>> pseudo_1:
>>         .nop 1
>>
>> pseudo_8:
>>         .nop 8
>>
>> pseudo_8_4:
>>         .nop 8, 4
>>
>> pseudo_20:
>>         .nop 20
>>
>> I get the following disassembly:
>>
>> 0000000000000000 <single>:
>>    0:    90                       nop
>>
>> 0000000000000001 <pseudo_1>:
>>    1:    66 90                    xchg   %ax,%ax
>>
>> 0000000000000003 <pseudo_8>:
>>    3:    66 0f 1f 84 00 00 00     nopw   0x0(%rax,%rax,1)
>>    a:    00 00
>>
>> 000000000000000c <pseudo_8_4>:
>>    c:    90                       nop
>>    d:    0f 1f 40 00              nopl   0x0(%rax)
>>   11:    0f 1f 40 00              nopl   0x0(%rax)
>>
>> 0000000000000015 <pseudo_20>:
>>   15:    90                       nop
>>   16:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
>>   1d:    00 00 00
>>   20:    66 2e 0f 1f 84 00 00     nopw   %cs:0x0(%rax,%rax,1)
>>   27:    00 00 00
>>
>> The MAX_NOP part looks to be working as intended (including reducing
>> below the default of 10), but there appears to be an off-by-one
>> somewhere, as one too many nops are emitted in the block.
>>
>> Furthermore, attempting to use .nop 30 yields:
>>
>> /tmp/ccI2Eakp.s: Assembler messages:
>> /tmp/ccI2Eakp.s: Fatal error: can't write 145268933551616 bytes to
>> section .text of nops.o: 'Bad value'
> Please try my branch again.  It should be fixed.

Thanks.  All of that looks to be in order.
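
As a rough sketch of the splitting I would expect from the test cases
above (".nop 8, 4" as two 4-byte nops, ".nop 20" as two 10-byte nops),
here is a small Python model.  It is not code from the branch:
split_nops is a hypothetical helper, and the ordering (full-length nops
first, remainder last) is an assumption made for illustration.

```python
def split_nops(size, max_nop):
    """Return the byte lengths of the individual nops filling `size` bytes.

    Hypothetical model only: assumes full-length nops are emitted first
    and any shorter remainder last; the real assembler may order them
    differently.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if max_nop < 1:
        raise ValueError("max_nop must be at least 1")
    full, rest = divmod(size, max_nop)
    chunks = [max_nop] * full      # as many maximum-length nops as fit
    if rest:
        chunks.append(rest)        # one shorter nop for the leftover bytes
    return chunks


if __name__ == "__main__":
    for size, max_nop in [(1, 10), (8, 10), (8, 4), (20, 10)]:
        print(size, max_nop, split_nops(size, max_nop))
```

Whatever the ordering, the lengths must sum to exactly SIZE, which is
the property the earlier off-by-one violated.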

However, when trying to build larger examples, I've started hitting:

/tmp/ccvxOy2v.s: Assembler messages:
/tmp/ccvxOy2v.s: Internal error in md_convert_frag at
../../gas/config/tc-i386.c:9510.

This is the gas_assert (fragP->fr_var != BFD_RELOC_X86_NOP) that you added.

It occurs when the calculation of the number of nops to insert evaluates
to 0, and a simple ".nop 0" managed to reproduce the issue.  The
calculation evaluating to 0 is a side effect of the existing logic to
evaluate how much, if any, padding is required, and follows this kind of
pattern:

.nop -(((144f-143f)-(141b-140b)) > 0)*((144f-143f)-(141b-140b))

and evaluates to 0 when 144f-143f is equal to or smaller than 141b-140b.
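
To spell out the arithmetic: in gas a true comparison evaluates to -1
and a false one to 0, so the leading minus turns the comparison into a
0/1 multiplier and the whole expression amounts to max(0, new - old).
A minimal Python model follows; gas_gt and pad_bytes are hypothetical
names, with old_len standing for 141b-140b and new_len for 144f-143f.

```python
def gas_gt(a, b):
    """Model the gas '>' operator: true is -1, false is 0."""
    return -1 if a > b else 0


def pad_bytes(old_len, new_len):
    """Model -(((new)-(old)) > 0) * ((new)-(old)) as gas evaluates it."""
    diff = new_len - old_len
    return -gas_gt(diff, 0) * diff


if __name__ == "__main__":
    # Replacement longer, equal, and shorter than the original.
    for old_len, new_len in [(2, 5), (5, 5), (5, 2)]:
        print(old_len, new_len, pad_bytes(old_len, new_len))
```

The second and third cases are the ones that end up as ".nop 0".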

~Andrew