Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cron scaler: Fix unexpected instance count change #3838

Merged
merged 4 commits into from
Dec 8, 2022

Conversation

zou2699
Copy link
Contributor

@zou2699 zou2699 commented Nov 10, 2022

fix IsActive in time boundary

Signed-off-by: tux [email protected]

Provide a description of what has been changed

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated and is aligned with our changelog requirements

Fixes #
fix IsActive in time boundary
for example, when we use this cron configuration to scale pod at the last day of a month

start: 0 8 28-31 * *
end: 0 20 1 * *

if runTime is 2022-01-30T7:59:59.9999999
might happen
nextStartTime 2022-01-30T8:00:00.000000
currentTime 2022-01-30T8:00:00.000111
at this time currentTime < nextStartTime will be false,IsActive will return true,trigger unexpected scaler

Fixes: #3854

fix IsActive in time boundary

Signed-off-by: tux <[email protected]>
@zou2699 zou2699 requested a review from a team as a code owner November 10, 2022 10:31
@JorTurFer
Copy link
Member

Hey,
What is the problem you are trying to solve? I mean, even the description is true, doing that change moves this "issue" to the end time instead of starting time. I think that you shouldn't expect millisecond accuracy in the cron scaler, because it's evaluated every X seconds, indeed.
The scenario you described is correct, but the trigger will be activated/deactivated in the next cycle, usually in 30 seconds

@zou2699
Copy link
Contributor Author

zou2699 commented Nov 11, 2022

We use the following expression to expand the capacity from the end of 8:00 am on the last day of each month to 8:00 pm on the 1st of the next month

start: "0 8 L * *"
end: "0 20 1 * *"

But the preexisting dependency library cron does not support L (the last day of month), see issue robfig/cron#464

So we use the following expression instead

start:"0 8 28,29,30,31 * *"
end: "0 20 1 * *"

But currently cronScaler.IsActive may trigger an issue like

     // if  execTime=2022.10.30 07:59:59.999
      nextStartTime, startTimecronErr := getCronTime(location, s.metadata.start)
      // got nextStartTime=2022.10.30 08:00:00.000
       ......
      nextEndTime, endTimecronErr := getCronTime(location, s.metadata.end)
     // got nextEndTime=2022.11.01 20:00:00.000
      ......
     currentTime := time.Now().Unix()
     // got currentTime=2022.10.30 08:00:00.111
     switch {
       case nextStartTime < nextEndTime && currentTime < nextStartTime:
            return false, nil
       // will use this case, return true, will scale up pod
      // but in the next cycle,  will retrun false       
       case currentTime <= nextEndTime:
           return true, nil
       default:
           return false, nil
}

this will cause an unexpected scale up event

when we get the currentTime before getting nextStartTime in cronScaler.IsActive, we can avoid this mistake

@zroubalik
Copy link
Member

@zou2699 could you please open an issue first, with clear explanation of the problem? Thanks!

@zroubalik
Copy link
Member

@zou2699 do you see any usecases (that don't suffer from this edge case) where this fix could break the existing behavior?

@zou2699
Copy link
Contributor Author

zou2699 commented Nov 15, 2022

@zroubalik This will not affect other usecases

@JorTurFer
Copy link
Member

JorTurFer commented Nov 15, 2022

I think that the change doesn't change the behavior, more than changing the edge case of 59:999. It's true that the opposite case (00.001) will change, but as I said, I think that it's a really "forced" edge case and the change will be done in the next polling instead of current (30 secs later) and we are assuming that this evaluated exactly at 59.999 which is so complicated

@JorTurFer
Copy link
Member

BTW @zou2699 , are you sure that the problem is related with this? I mean, what pooling interval do you have? Because if you are not checking ever 1 second (and even in that case), I'd say that it's really complex to happen

@zou2699
Copy link
Contributor Author

zou2699 commented Nov 16, 2022

The generation of nextStartTime depends on the current time point, but the currentTime here is obtained after nextStartTime is generated, which will lead to the following problems

For example, in the above example, we don't want an unexpected instance count change at 2022.10.30 07:59:59.999, we only expect to instance count change at 2022.10.31 08:00:00.000 -> 2022.11.01 20:00:00.000 .

Through this adjustment, the nextStartTime of the time that is not the last day (October 28, October 29, October 30) is always greater than the currentTime:

case1:
currentTime: 2022-10-28 07:59:59.999
nextStartTime: 2022-10-28 08:00:00.000
case2:
currentTime: 2022-10-28 08:00:00.001
nextStartTime: 2022-10-29 08:00:00.000

to avoid unexpected instance count changes, thus supporting 'L' in Cron expressions

@zroubalik
Copy link
Member

@zou2699 could you please fix the static check problems and add entry to the changelog, we can merge it then.

Signed-off-by: tux <[email protected]>
CHANGELOG.md Outdated Show resolved Hide resolved
@zroubalik
Copy link
Member

zroubalik commented Dec 8, 2022

/run-e2e cron*
Update: You can check the progress here

Signed-off-by: tux <[email protected]>
CHANGELOG.md Show resolved Hide resolved
@zroubalik zroubalik changed the title Update cron_scaler.go Cron scaler: Fix unexpected instance count change Dec 8, 2022
@zroubalik zroubalik merged commit 35e6b6b into kedacore:main Dec 8, 2022
@zou2699 zou2699 deleted the zou2699-patch-1 branch December 9, 2022 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cron scaler may have jitter
3 participants