v2.7.0
Major features
- Metax sGPU topology awareness by (@Kyrie336) in #1193
- NVIDIA Resourcequota by (@FouoF) in #1359
- Kunlunxin topology-aware scheduling by (@FouoF) in #1141
- Kunlunxin vXPU support (#1016) by (@ouyangluwei163) and (@archlitchi) in #1337
- Enflame GCU topology-awareness (#1040) by (@zhaikangqi331) in #1334
- AWS-neuron device and device-core allocation by (@archlitchi) in #1238
- Aggregated Scheduling Failure Events by (@Wangmin362) in #1333
Major bug fixes
- fix: Before executing MIG partitioning, suppress NVML usage in o… by (@Goend) in #1095
- Fix golint-CI by (@archlitchi) in #1127
- fix: override node score failure for kunlun (#1137) by (@ouyangluwei163) in #1138
- fix: Multi-node scoring nodes are inaccurate by (@ouyangluwei163) in #1147
- fix: An error occurred while creating an Iluvatar pod by (@ouyangluwei163) in #1149
- Fix e2e CI by (@archlitchi) in #1165
- fix: Add option to overwrite schedulerName by (@Shouren) in #1163
- fix: using go-safecast to fix incorrect conversion of numbers by (@Shouren) in #1183
- fix: deal with security issues reported by Trivy in image by (@Shouren) in #1189
- fix: wrong Pod UID and empty Pod name in log of webhook.go by (@Shouren) in #1092
- fix: concurrent map writes error in scheduler.calcScore #1269 by (@Shouren) in #1270
- fix: release dangling node lock by (@peachest) in #1271
- fix: incorrect NUMA node information being retrieved (issue #1275) by (@abstractmj) in #1276
- fix(security): resolve issues reported by Code scanning in Security by (@Shouren) in #1280
- fix: fix golangci-lint error by (@DSFans2014) in #1319
- Fix: device allocation missing containers with no device request by (@FouoF) in #1299
- fix: update int8Slice to uint8Slice for better type clarity and consistency by (@yxxhero) in #1357
What's Changed
Documentation
- documentation: add Known Issues for dynamic mig support by (@Goend) in #1122
- docs: fix broken link by (@lixd) in #1125
- Clearly list supported devices' doc references in README by (@FouoF) in #1155
- docs: update ascend910b-support docs by (@DSFans2014) in #1321
Other Changes
- Optimize Fit-in-device logic to make it device-specific by (@archlitchi) in #1097
- feat(scheduler): make node lock timeout configurable by (@Kevinz857) in #1117
- feature: MIG mode change (#1116) by (@ouyangluwei163) in #1124
- feat: Add new labels in .github/release.yml by (@Shouren) in #1066
- feat(scheduler-role): use a scoped-down role for scheduler by (@Antvirf) in #1152
- feat(helm): optionally disable admission webhook by (@Antvirf) in #1145
- remove redundant metrics for vgpu allocation by (@FouoF) in #1169
- refactor: clean up code and improve maintainability by (@Wangmin362) in #1195
- refactor: Ranging over SplitSeq is more efficient by (@Shouren) in #1239
- feat: set NodeLockTimeout from env by (@miaobyte) in #1244
- refactor: move watchAndFeedback function to feedback.go by (@miaobyte) in #1248
- feat: add informer-based pod cache to reduce API server load by (@miaobyte) in #1250
- feat: Add option to disable device plugin in values.yaml by (@FouoF) in #1274
- refactor(util/nodelock): replace manual polling with k8s.io/client-go/util/retry by (@mayooot) in #1252
- refactor: Remove annotation in Devices interfaces by (@Shouren) in #1343
- feat: update the Ascend910 scheduling policy by (@DSFans2014) in #1344
- feat(nvidia): default gpucores=100 when memory is exclusive and cores… by (@xrwang8) in #1354
- Prerelease-v2.6 by (@archlitchi) in #1108
- add new reviewers Shouren and ouyangluwei163 by (@wawa0210) in #1131
- Support topology-awareness for Kunlunxin device by (@archlitchi) in #1121
- Support Metax sGPU Qos Policy by (@Kyrie336) in #1123
- add global image for chart by (@calvin0327) in #1133
- fix: Skip admission webhook when Pod's scheduler is already assigned. by (@ghostloda) in #1041
- Add node configs to docs by (@wylswz) in #1159
- build(deps): upgrade golang to 1.24.4 by (@Shouren) in #1172
- build(deps): Upgrade golang image in ci to 1.24.4 by (@Shouren) in #1176
- build(deps): Upgrade controller-runtime to 0.21.0 by (@Shouren) in #1171
- build(deps): Bump github.com/NVIDIA/nvidia-container-toolkit by (@Shouren) in #1170
- Add unit tests for Fit Function for enflame, hygon, metax, mthreads, nvidia by (@Wangmin362) in #1199
- [Misc] update hami-core version by (@chaunceyjiang) in #1201
- Improve the impl of DevicePluginConfigs.Nodeconfig overwriting NvidiaConfig by (@FouoF) in #1158
- Add unit tests for cambricon's Fit Function by (@Wangmin362) in #1198
- Add unit tests for Ascend's Fit Function by (@Wangmin362) in #1197
- Fix unnecessary repeated calculation when generating pod resource requests by (@litaixun) in #1215
- Fix the log message when updating node annotations by (@litaixun) in #1214
- If the memory requested for a MIG device equals the template value, it results in CardNotFoundCustomFilterRule by (@zgqqiang) in #1179
- updated dri section to combine text for better readability by (@mpetason) in #1216
- feat: Add nvidia gpu topology scheduler by (@fyp711) in #1028
- add issue translate robot by (@wawa0210) in #1232
- add issue translate robot by (@wawa0210) in #1234
- perf(util/nodelock): Use clientset Patch instead of Update. by (@mayooot) in #1192
- Update hami-core and fix readme documents by (@archlitchi) in #1240
- Update hami-core version to fix by (@archlitchi) in #1256
- [Snyk] Security upgrade tensorflow/tensorflow from latest-gpu to 2.20.0rc0-gpu by (@wawa0210) in #1243
- feat: Add a 'Close stale issues and PRs' action to the GitHub workflow by (@Shouren) in #1083
- Welcome fyp711 to become a HAMi member by (@wawa0210) in #1288
- Add values readme by (@clcc2019) in #1267
- Support Metax sGPU device health check by (@Kyrie336) in #1295
- Optimize pkg/util.go and distribute logics to corresponding logics by (@archlitchi) in #1296
- cleanup: Clear and correct ascend device name by (@FouoF) in #1315
- bugfix: Nvidia card abnormal pod will still continue to schedule by (@zgqqiang) in #1336
- Fix CI, add 910B4-1 template and fix vGPUmonitor metrics error by (@archlitchi) in #1345
- add httpTargetPort to values.yaml by (@flpanbin) in #1356
- Update kunlunxin documents by (@archlitchi) in #1366
- update chart version and hami-core by (@archlitchi) in #1369
New Contributors
- Kevinz857 (@Kevinz857)
- FouoF (@FouoF)
- Antvirf (@Antvirf)
- wylswz (@wylswz)
- litaixun (@litaixun)
- zgqqiang (@zgqqiang)
- mpetason (@mpetason)
- fyp711 (@fyp711)
- mayooot (@mayooot)
- miaobyte (@miaobyte)
- peachest (@peachest)
- abstractmj (@abstractmj)
- clcc2019 (@clcc2019)
- DSFans2014 (@DSFans2014)
- xrwang8 (@xrwang8)
Full Changelog: https://github.com/Project-HAMi/HAMi/compare/v2.6.1...v2.7.0