Support for model functions with keyword-only and positional-only arguments #976

Open
schtandard opened this issue Nov 30, 2024 · 10 comments


@schtandard
Contributor

First Time Issue Code

Not applicable.

Description

As mentioned in this discussion, models currently do not support model functions with keyword-only or positional-only arguments. It would be nice if that support could be added. Here's an example for keyword-only arguments.

import lmfit
import matplotlib.pyplot as plt
import numpy as np

def foo(x, *, a=1, b=1, kind='linear'):
    if kind == 'linear':
        return a + b * x
    elif kind == 'exponential':
        return a * np.exp(b * x)

mdl = lmfit.Model(foo)
# 'a' and 'b' are not parameters, 'kind' is not an independent variable.
print(mdl.independent_vars, mdl.param_names)
# ['x'] []
# Should be: ['x', 'kind'] ['a', 'b']

# 'a' and 'b' are not varied.
# 'kind' is ignored.
x = np.linspace(-10, 10, 21)
y = -2 * np.exp(x / 2)
fitres = mdl.fit(y, x=x, kind='exponential')
fitres.plot_fit()
plt.show()

# This fails.
mdl = lmfit.Model(foo, param_names=['a', 'b'])

A quick fix for supporting these only requires adding fpar.KEYWORD_ONLY to the tuple in

elif fpar.kind in (fpar.POSITIONAL_ONLY, fpar.POSITIONAL_OR_KEYWORD):
as far as I can see (however, see thoughts below).
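For reference, here is a small standard-library sketch of how inspect reports the example signature. It only illustrates why the kind check above skips a, b, and kind; it is not lmfit code.

```python
import inspect

def foo(x, *, a=1, b=1, kind='linear'):
    ...

# Map each argument name to the name of its parameter kind.
kinds = {name: par.kind.name
         for name, par in inspect.signature(foo).parameters.items()}

for name, kind_name in kinds.items():
    print(f"{name}: {kind_name}")
# x: POSITIONAL_OR_KEYWORD
# a: KEYWORD_ONLY
# b: KEYWORD_ONLY
# kind: KEYWORD_ONLY
```

Everything after the bare * is KEYWORD_ONLY, so a check that only accepts POSITIONAL_ONLY and POSITIONAL_OR_KEYWORD never sees those three arguments.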

Supporting positional-only arguments will require changes to Model.make_funcargs() (and functions using it), as it currently returns all arguments as keyword-arguments. My approach would be to have it return two values: A tuple of positional arguments and a dictionary of keyword-arguments. (I would probably return arguments that can be given in either form as positional arguments.) This function does not start with a _ but it does seem to be undocumented, so I'm unsure if changing its interface is acceptable? (If not, an alternative would be creating a new method _make_funcargs() with the new behavior that is used internally, while keeping make_funcargs() with unchanged behavior, raising an exception when positional-only arguments are involved.)
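A rough sketch of the two-value return idea, using only the standard library. The helper name split_funcargs is hypothetical and not part of lmfit, and the sketch assumes that a value is supplied for every named argument and that the function has no *args or **kwargs.

```python
import inspect

def split_funcargs(func, values):
    """Split a name -> value mapping into (args, kwargs) for calling func.

    Hypothetical helper for illustration only. Arguments that can be
    passed either way are returned positionally.
    """
    args, kwargs = [], {}
    for name, par in inspect.signature(func).parameters.items():
        if par.kind in (par.POSITIONAL_ONLY, par.POSITIONAL_OR_KEYWORD):
            args.append(values[name])
        elif par.kind == par.KEYWORD_ONLY:
            kwargs[name] = values[name]
    return tuple(args), kwargs

def model_func(x, /, amp, *, kind='linear'):
    return (x, amp, kind)

args, kwargs = split_funcargs(model_func, {'kind': 'exp', 'amp': 2.0, 'x': 5})
result = model_func(*args, **kwargs)
print(args, kwargs, result)
```

Here x and amp end up in the tuple and kind in the dictionary, so the call succeeds even though x cannot be given by keyword.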

In any case, there are some further things to contemplate related to the automatic determination of independent variables. These are not exclusively due to the addition of keyword-only arguments, but additional special cases arise. Not addressing this immediately would lead to being locked into the current behavior for the new argument specifications, which is why I bring it up here. Let me know if I should rather open a separate issue. My understanding of the current situation is this:

  • Every model function argument must either be a parameter or an independent variable.
  • If not explicitly given, the first argument of the model function without a default value is used as an independent variable. Such an argument must exist.
  • In addition, every argument that has a non-numerical default value is considered an independent variable (but not arguments without a default value).

Here are some possible function signatures and what results when using them as model functions (i.e. mdl = lmfit.Model(foo), as above) after fpar.KEYWORD_ONLY has been added to the line mentioned above.

def foo(x=3, a=1, b=1, kind='linear'):

Error, because there is no argument without a default value (though the error message IndexError: pop from empty list is a bit obscure).

def foo(x=3, *, a, b=1, kind='linear'):

a and kind are independent variables, x and b are parameters.

def foo(x, *, a, b=1, kind='linear'):

x and kind are independent variables, a and b are parameters.

To me, some of these are a bit counter-intuitive. The second case is especially strange, certainly due to the algorithm not expecting arguments without default values to come after ones with a default, which can only happen with keyword-only arguments.
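The three outcomes above can be reproduced with a simplified stand-in for the rules as described in this post (with keyword-only arguments already admitted). This is not lmfit's implementation; classify_current is a hypothetical name and the ordering of the returned lists is not meant to match lmfit.

```python
import inspect

def is_numeric(value):
    # bool is a subclass of int, so exclude it explicitly.
    return (isinstance(value, (int, float, complex))
            and not isinstance(value, bool))

def classify_current(func):
    """Return (independent_vars, param_names) under the described rules."""
    no_default, numeric, non_numeric = [], [], []
    for name, par in inspect.signature(func).parameters.items():
        if par.default is par.empty:
            no_default.append(name)
        elif is_numeric(par.default):
            numeric.append(name)
        else:
            non_numeric.append(name)
    # The first argument without a default becomes an independent variable.
    # This raises IndexError when every argument has a default.
    first = no_default.pop(0)
    return [first] + non_numeric, no_default + numeric

def case1(x=3, a=1, b=1, kind='linear'): ...
def case2(x=3, *, a, b=1, kind='linear'): ...
def case3(x, *, a, b=1, kind='linear'): ...

print(classify_current(case2))  # (['a', 'kind'], ['x', 'b'])
print(classify_current(case3))  # (['x', 'kind'], ['a', 'b'])
try:
    classify_current(case1)
except IndexError as exc:
    print("case1 fails:", exc)
```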

At the risk of getting side-tracked: When explicitly specifying independent variables and parameters, it seems that unspecified arguments are added to the parameters (unless they have non-numerical default values), but none are added to the independent variables. This way, function arguments can be "lost" and become inaccessible through the model methods.

def foo(x, a=1, b=1, kind='linear'):
    pass

mdl = lmfit.Model(foo, independent_vars=['x'], parameters=['a'])
print(mdl.independent_vars, mdl.param_names)
# ['x'] ['a', 'b']
# 'kind' got lost.

I would propose updating the rules as follows:

  • Any explicitly specified independent variables or parameters are set, of course.
  • All remaining arguments are distributed to the two categories. Those with non-numerical defaults are independent variables, those without or with numerical defaults are parameters.
  • If independent variables were not explicitly specified, the first function argument is made one, irrespective of its default value.
    • Optionally, this may be omitted in case the function exclusively has keyword-only arguments.
    • pro: Those arguments are generally considered unordered, so picking the first one, while being well-defined, is not really in the spirit of keyword-only arguments.
    • con: It's a complication of the rules.

I think these rules should leave the behavior of all currently supported model functions unchanged and eliminate some of the strangeness mentioned above. (No arguments can be "lost", they are all either independent variables or parameters. In all three of the examples above, x and kind would be independent variables.)
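A minimal sketch of the proposed rules, again as a standalone stand-in rather than lmfit code. The name classify_proposed is hypothetical, the optional exception for purely keyword-only signatures is not implemented, and leaving the first argument alone when it was explicitly listed as a parameter is an assumption of this sketch.

```python
import inspect

def is_numeric(value):
    return (isinstance(value, (int, float, complex))
            and not isinstance(value, bool))

def classify_proposed(func, independent_vars=None, param_names=None):
    """Return (independent_vars, param_names) under the proposed rules."""
    named = {name: par
             for name, par in inspect.signature(func).parameters.items()
             if par.kind not in (par.VAR_POSITIONAL, par.VAR_KEYWORD)}
    indep = list(independent_vars or [])
    params = list(param_names or [])
    # No explicit independent variables: take the first argument,
    # irrespective of its default value.
    if independent_vars is None and named:
        first = next(iter(named))
        if first not in params:
            indep.append(first)
    # Distribute every argument that has not been assigned yet.
    for name, par in named.items():
        if name in indep or name in params:
            continue
        if par.default is par.empty or is_numeric(par.default):
            params.append(name)
        else:
            indep.append(name)
    return indep, params

def case1(x=3, a=1, b=1, kind='linear'): ...
def case2(x=3, *, a, b=1, kind='linear'): ...
def case3(x, *, a, b=1, kind='linear'): ...
def lost(x, a=1, b=1, kind='linear'): ...

for f in (case1, case2, case3):
    print(f.__name__, classify_proposed(f))
print(classify_proposed(lost, independent_vars=['x'], param_names=['a']))
```

All four calls give x and kind as independent variables and a and b as parameters, so no argument is dropped.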

I am in principle happy to provide a PR for this (once the details above are decided), but I am very pressed for time and since the necessary changes turned out not to be so tiny, I would only get to it maybe after Christmas.

A Minimal, Complete, and Verifiable example

See above, here's the main one again.

import lmfit
import matplotlib.pyplot as plt
import numpy as np

def foo(x, *, a=1, b=1, kind='linear'):
    if kind == 'linear':
        return a + b * x
    elif kind == 'exponential':
        return a * np.exp(b * x)

mdl = lmfit.Model(foo)
# 'a' and 'b' are not parameters, 'kind' is not an independent variable.
print(mdl.independent_vars, mdl.param_names)
# ['x'] []
# Should be: ['x', 'kind'] ['a', 'b']

# 'a' and 'b' are not varied.
# 'kind' is ignored.
x = np.linspace(-10, 10, 21)
y = -2 * np.exp(x / 2)
fitres = mdl.fit(y, x=x, kind='exponential')
fitres.plot_fit()
plt.show()

# This fails.
mdl = lmfit.Model(foo, param_names=['a', 'b'])

Fit report:

Actually, fit report creation fails with the following traceback.

Traceback (most recent call last):
  File "c:\Users\wilde\Desktop\tempTest\lmfit\argtest.py", line 21, in <module>
    print(fitres.fit_report())
          ^^^^^^^^^^^^^^^^^^^
  File "D:\wilde\nobackup\misc_git_repos\lmfit-py\lmfit\model.py", line 1832, in fit_report
    report = fit_report(self, modelpars=modelpars, show_correl=show_correl,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\wilde\nobackup\misc_git_repos\lmfit-py\lmfit\printfuncs.py", line 139, in fit_report
    namelen = max(len(n) for n in parnames)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: max() iterable argument is empty

Error message:

Traceback (most recent call last):
  File "c:\Users\wilde\Desktop\tempTest\lmfit\argtest.py", line 24, in <module>
    mdl = lmfit.Model(foo, param_names=['a', 'b'])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\wilde\nobackup\misc_git_repos\lmfit-py\lmfit\model.py", line 305, in __init__
    self._parse_params()
  File "D:\wilde\nobackup\misc_git_repos\lmfit-py\lmfit\model.py", line 612, in _parse_params
    raise ValueError(self._invalid_par % (arg, fname))
ValueError: Invalid parameter name ('a') for function foo

Version information

Python: 3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:47:54) [MSC v.1941 64 bit (AMD64)]

lmfit: 1.3.2.post13+g39354856, scipy: 1.14.1, numpy: 2.1.3,asteval: 1.0.5, uncertainties: 3.2.2

Link(s)

discussion

@newville
Member

newville commented Dec 1, 2024

@schtandard I may not understand your proposed changes.

Any explicitly specified independent variables or parameters are set, of course.

That's not a change.

All remaining arguments are distributed to the two categories. Those with non-numerical defaults are independent variables, those without or with numerical defaults are parameters.

That's not a change either.

If independent variables were not explicitly specified, the first function argument is made one, irrespective of its default value.

That's not a change either.

So, what change are you proposing?

I would be inclined to allow but ignore the bare * function argument, and to raise an exception with a bare / function argument.
The meaning of * (all remaining function arguments can only be called as keyword arguments) doesn't do much for us.
On the other hand, a bare / (all previous arguments can only be called as positional arguments, not as keyword arguments) will never work for lmfit.Model(), as it passes all function parameters by name with a func(**dict) call.

Just to be clear:

a) The first argument of the function does not need to be an independent variable. That is the default, but it can be specified otherwise.

b) Positional arguments will generally become Parameters.

c) All Parameters need a default value. Positional arguments have a default value of None (which becomes the equally useless -np.inf). Keyword arguments will become Parameters if their value is numerical (which becomes the Parameter's default value). Keyword arguments with non-numerical values will become independent variables.

I don't think we need to change any of these.

@schtandard
Contributor Author

All remaining arguments are distributed to the two categories. Those with non-numerical defaults are independent variables, those without or with numerical defaults are parameters.

That's not a change either.

Yes, it is. As I said, explicitly specified independent variables are not currently amended, which is why arguments can be "lost". Here's the example again.

def foo(x, a=1, b=1, kind='linear'):
    pass

mdl = lmfit.Model(foo, independent_vars=['x'], parameters=['a'])
print(mdl.independent_vars, mdl.param_names)
# ['x'] ['a', 'b']
# 'kind' got lost.

If independent variables were not explicitly specified, the first function argument is made one, irrespective of its default value.

That's not a change either.

Yes, it is. Currently, the first argument without a default value is used. If all arguments have default values, an error is raised. In the new case of keyword-only arguments, such an argument may exist but not be the first argument. Here are the example signatures for the two relevant cases again.

def foo(x=3, a=1, b=1, kind='linear'):
def foo(x=3, *, a, b=1, kind='linear'):

I would be inclined to allow but ignore the bare * function argument, and to raise an exception with a bare / function argument.

Not supporting positional-only arguments is another option, of course.

On the other hand, a bare / (all previous arguments can only be called as positional arguments, not as keyword arguments) will never work for lmfit.Model(), as it passes all function parameters by name with a func(**dict) call.

But this could be changed as outlined in my post above, no? Adapting make_funcargs() and its use (mainly in eval()) should do the trick, I think.

Just to be clear: ...

All of this is clear, yes, but I see a danger of confusing terminology: You (and the lmfit code) seem to use "positional argument" equivalently with "argument without a default value". While not quite exact, this is good enough as long as only POSITIONAL_OR_KEYWORD arguments are present, which can be given in either form. With keyword-only arguments, however, you might have arguments without a default value that must be given by keyword. Conversely, with positional-only arguments, you might have arguments with a default value that must be given by position.
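The distinction can be checked directly with inspect: the kind of a parameter and whether it has a default are reported independently. The function f below is made up purely for illustration.

```python
import inspect

def f(p=1, /, q=2, *, r):
    return p, q, r

# For each argument: (kind name, has a default value?)
table = {name: (par.kind.name, par.default is not par.empty)
         for name, par in inspect.signature(f).parameters.items()}

for name, (kind_name, has_default) in table.items():
    print(f"{name}: kind={kind_name}, has_default={has_default}")
# p: kind=POSITIONAL_ONLY, has_default=True
# q: kind=POSITIONAL_OR_KEYWORD, has_default=True
# r: kind=KEYWORD_ONLY, has_default=False
```

Here p must be given by position although it has a default, and r must be given by keyword although it has none.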

@newville
Member

newville commented Dec 1, 2024

@schtandard I won't get into an argument about how the code works.

I am certainly willing to leave the use of a bare "*" unsupported. Adding "support" for it would be to ignore it. It would not affect the default behavior for how lmfit.Model distinguishes independent variables from Parameters.

With keyword-only arguments, however, you might have arguments without a default value that must be given by keyword. Conversely, with positional-only arguments, you might have arguments with a default value that must be given by position.

Lmfit.Model always calls with keyword arguments, assigning a value even when no default is provided. It never uses positional-only arguments.

It seems to me that adding support for bare * could cause more confusion, including "well, if you support bare '*', why not bare '/'"?

@schtandard
Contributor Author

Well ok, if always calling using keyword arguments is an unalterable design choice, then supporting positional-only arguments is impossible, of course. Keyword-only arguments are vastly more common in any case (and considered good practice in common cases, as far as I know), so that's also a reasonable choice apart from implementation blockers. I mainly tried to include positional-only arguments because you brought them up in the discussion; I've never used them myself. For me, the more "confusing" thing is that keyword-only arguments are not supported, especially since arbitrary keyword arguments (i.e. **kwargs) are.

Now for me just making the one-line change that ignores the * is also fine, so if you want to do that, I can make a PR tomorrow. But since you did seem to think that my proposed rules for distinguishing independent variables from parameters were how it is already done (i.e. how it's supposed to be done), I will ask one last time:

Is it really the desired behavior that with the model function signature

def foo(x=3, *, a, b=1, kind='linear'):

a is an independent variable and x is not?

And is it really the desired behavior that with the model function signature

def foo(x=3, a=1, b=1, kind='linear'):

no automatic selection of independent variables can be performed?

@newville
Member

newville commented Dec 2, 2024

@schtandard

def foo(x=3, a=1, b=1, kind='linear'):
    ...
model = lmfit.Model(foo)

failing with a non-helpful error message seems like the most important problem I've seen here so far.

I don't have a strong opinion on what the preferred behavior for guessing independent variables with

def foo(x=3, *, a, b=1, kind='linear'):
    ...
model = lmfit.Model(foo)

should be. I find that syntax sort of "weird beyond belief". In this case, foo() must be called with a keyword parameter "a" but no default value is given in the signature, the traditional signal of a keyword argument. x can be called as a positional argument, but has a default value. So

foo(4.0, a=[])
foo(a='duck', x=9.3)
foo(a=foo)
foo(x=None, a=9.3)

are all allowed. But of course

foo(a=numpy, 8)

is a syntax error. Because that one just makes no sense at all. And clarity is a goal...? Sure.

I suppose we should try to support ignoring the bare * though I find this example to be sort of a good argument for explicitly not supporting it. The more I see of this bare * the less I like it. I expect that this will take some study, and I fear this is also opening a can of worms of confusion and corner cases. This is not going to happen quickly.

It is going to be very hard to convince me to support /.

Lmfit programmatically inspects and calls the user's Model function. We can (and we do) make some assumptions and demands on the call signatures that we support. Whether we support bare * or / are choices we can make, and those choices might be No.

@schtandard
Contributor Author

schtandard commented Dec 2, 2024

Alright, let me know once you've thought about it. I feel that making the first argument an independent variable regardless of default values is worthwhile even without supporting keyword-only arguments (x=3 in the function signature would normally become a parameter, just as x, so why should they behave differently when in first position?) and avoiding "lost" arguments is independent of keyword-only arguments in any case. As I said, these could be made a separate issue if you prefer, but they should definitely be decided before supporting keyword-only arguments because they affect how the new edge-case is treated. (For all pure keyword-or-positional signatures, the proposed change would just allow more cases that currently produce problems or errors.)

@newville
Member

newville commented Dec 9, 2024

@schtandard I had a chance to play with inspect on functions with bare *. I think that perhaps the simplest change, which might also be the best one, would be to
a) explicitly allow KEYWORD_ONLY parameters
b) explicitly forbid POSITIONAL_ONLY parameters
c) use the first positional or keyword parameter as a default independent variable

That would change the code at

pos_args = []
to be:

            pos_args = []
            sig = inspect.signature(self.func)
            for fnam, fpar in sig.parameters.items():
                if fpar.kind == fpar.VAR_KEYWORD:
                    keywords_ = fnam
                elif fpar.kind in (fpar.POSITIONAL_ONLY,
                                   fpar.KEYWORD_ONLY,
                                   fpar.POSITIONAL_OR_KEYWORD):
                    default_vals[fnam] = fpar.default
                    if (isinstance(fpar.default, (float, int, complex))
                       and not isinstance(fpar.default, bool)):
                        kw_args[fnam] = fpar.default
                        pos_args.append(fnam)
                    elif fpar.default == fpar.empty:
                        pos_args.append(fnam)
                    else:
                        kw_args[fnam] = fpar.default
                        indep_vars.append(fnam)
                elif fpar.kind == fpar.POSITIONAL_ONLY:
                    raise ValueError("positional only arguments with '/' is not supported")
                elif fpar.kind == fpar.VAR_POSITIONAL:
                    raise ValueError(f"varargs '*{fnam}' is not supported") 

with that change made locally, this:

import inspect
from lmfit import Model

def f(x=3, *, a, b=1, kind='linear'):
    return

for name, arg in inspect.signature(f).parameters.items():
    print(f"name={name}, kind={arg.kind}, default={arg.default}")

mod = Model(f)

print('Independent Variables: ', mod.independent_vars)
print('Parameter root names: ', mod._param_root_names)
print('Default values: ', mod.def_vals)

gives

name=x, kind=POSITIONAL_OR_KEYWORD, default=3
name=a, kind=KEYWORD_ONLY, default=<class 'inspect._empty'>
name=b, kind=KEYWORD_ONLY, default=1
name=kind, kind=KEYWORD_ONLY, default=linear
Independent Variables:  ['x', 'kind']
Parameter root names:  ['a', 'b']
Default values:  {'x': 3, 'b': 1}

I think that would work for your case, and seems perfectly reasonable to me. Selecting a as the independent variable seems slightly weirder here, but might be reasonable too. Opinions?

To be clear, this would also explicitly forbid bare /, which I think we need to do.

I still have the feeling this is going to cause some pain somewhere down the road. But I do think that the current status of failing (and with not very clear messages) for functions with that signature is not acceptable.

@schtandard
Contributor Author

@schtandard I had a chance to play with inspect on functions with bare *. I think that perhaps the simplest change, which might also be the best one, would be to
a) explicitly allow KEYWORD_ONLY parameters
b) explicitly forbid POSITIONAL_ONLY parameters
c) use the first positional or keyword parameter as a default independent variable

Perfect, then we are in agreement.

That would change the code at [...]

That looks right, except that you need to remove fpar.POSITIONAL_ONLY from the first elif and you probably want to write are instead of is in the corresponding error message.
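With both corrections applied, the loop might look like the standalone sketch below. Wrapping it in a function named parse_signature and returning the collected values are choices made here for illustration only; the surrounding Model code is not reproduced, and the emptiness check uses "is" instead of "==".

```python
import inspect

def parse_signature(func):
    """Standalone version of the proposed loop (illustration only)."""
    pos_args, kw_args, indep_vars = [], {}, []
    keywords_ = None
    sig = inspect.signature(func)
    for fnam, fpar in sig.parameters.items():
        if fpar.kind == fpar.VAR_KEYWORD:
            keywords_ = fnam
        elif fpar.kind in (fpar.KEYWORD_ONLY, fpar.POSITIONAL_OR_KEYWORD):
            if (isinstance(fpar.default, (float, int, complex))
                    and not isinstance(fpar.default, bool)):
                # Numerical default: candidate parameter.
                kw_args[fnam] = fpar.default
                pos_args.append(fnam)
            elif fpar.default is fpar.empty:
                # No default: candidate parameter without a value.
                pos_args.append(fnam)
            else:
                # Non-numerical default: independent variable.
                kw_args[fnam] = fpar.default
                indep_vars.append(fnam)
        elif fpar.kind == fpar.POSITIONAL_ONLY:
            raise ValueError("positional only arguments with '/' are not supported")
        elif fpar.kind == fpar.VAR_POSITIONAL:
            raise ValueError(f"varargs '*{fnam}' is not supported")
    return pos_args, kw_args, indep_vars, keywords_

def f(x=3, *, a, b=1, kind='linear'):
    return

def g(x, /, amp=1):
    return

print(parse_signature(f))
try:
    parse_signature(g)
except ValueError as exc:
    print("rejected:", exc)
```

The POSITIONAL_ONLY branch is now reachable, so a signature containing a bare / is rejected with the intended message.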

I think that would work for your case, and seems perfectly reasonable to me. Selecting a as the independent variable seems slightly weirder here, but might be reasonable too. Opinions?

I strongly prefer the proposed behavior over choosing a as the independent variable. "If I don't specify any, the first argument will be considered an independent variable" is also much simpler to understand than the alternative. (Also, in that case, it's just as reasonable to wonder why a is chosen at all, after all there is another independent variable in any case (kind). And models without independent variables, while not very useful, also work just fine.)

To be clear, this would also explicitly forbid bare /, which I think we need to do.

I don't think we need to, but I agree that it's best to do so for now. If somebody comes along with a compelling use-case, it can still be added. (One way would be to automatically create a "pass-through" function with the same signature except that all arguments can be given by keyword that passes the values through to the original function and then use that as the model function.)
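A possible shape for such a pass-through function, sketched with the standard library only. The name keyword_passthrough is hypothetical, nothing like it exists in lmfit as far as this thread shows, and the sketch assumes the wrapped function has no *args or **kwargs.

```python
import inspect

def keyword_passthrough(func):
    """Wrap func so that its positional-only arguments accept keywords."""
    sig = inspect.signature(func)
    pos_only = [p.name for p in sig.parameters.values()
                if p.kind == p.POSITIONAL_ONLY]
    # Same signature, but positional-only arguments may be given by keyword.
    new_sig = sig.replace(parameters=[
        p.replace(kind=p.POSITIONAL_OR_KEYWORD)
        if p.kind == p.POSITIONAL_ONLY else p
        for p in sig.parameters.values()])

    def wrapper(*args, **kwargs):
        bound = new_sig.bind(*args, **kwargs)
        bound.apply_defaults()
        values = dict(bound.arguments)
        # Hand the originally positional-only values over by position.
        positional = [values.pop(name) for name in pos_only]
        return func(*positional, **values)

    wrapper.__name__ = func.__name__
    wrapper.__signature__ = new_sig
    return wrapper

def func(x, /, amp=1, center=0, sigma=1):
    return (x, amp, center, sigma)

wrapped = keyword_passthrough(func)
print(wrapped(x=99, amp=3))  # (99, 3, 0, 1)
print([p.kind.name for p in inspect.signature(wrapped).parameters.values()])
```

Code that inspects wrapped sees only POSITIONAL_OR_KEYWORD arguments, so it can keep calling by name while func still receives x by position.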


This is probably something you will want to do separately, if at all, but the whole parsing logic could be simplified quite a bit after this change, I think. The distinction between kw_args and pos_args seems to be needless now. What's in pos_args now could easily be identified by having inspect._empty in the default_vals (which is done for asteval functions already but not for standard functions) or by being absent from that dictionary. I wanted to give it a shot myself, but I'm rather confused by self.opts and how it is used. I would have expected it to just be rolled into self.independent_vars_defvals (and parameter hints) and then not be used anywhere else, but this doesn't seem to be the case. Anyway, the change you propose above (with the corrections) should be fine, so cleaning up the code is probably best left for another day.

@schtandard
Contributor Author

I guess I should open a separate issue about the possibility of "losing" function arguments. Or maybe I'll just make a PR after the other stuff is dealt with; the solution seems obvious in this case.

@newville
Member

@schtandard Thanks for getting back on this. Yeah, I basically agree with everything you say.

fpar.POSITIONAL_ONLY needs to move, and the error message should be fixed.

I agree that in the example, making x an independent variable makes the most sense, and "first argument" is easier to explain than "first argument without a default value".

I agree that the code might be made simpler... Model has a lot going on!

I agree that it ought to be technically possible to make POSITIONAL_ONLY work - it would be more work to track those, but possible. The general case for POSITIONAL_ONLY (some names used in the signature can change, I guess) seems much weaker to me than KEYWORD_ONLY (don't rely on argument order after this point). The whole design of Model does depend on using the actual names in the signature.

Anyway, I think we could not follow the "POSITIONAL_ONLY" intention in any case. With

def func(x, /, amp=1, center=0, sigma=1):
    ...

model = Model(func)
pars = model.make_params(amp=3, center=30, sigma=4)

you would still evaluate that with x as a keyword argument:

init = model.eval(pars, x=99)

which seems kind of odd.

I can start a PR, but if you would like to contribute to it, that would be great.

@newville newville mentioned this issue Dec 15, 2024