DOI: 10.1145/3638550.3641126
research-article
Open access

Creating Edge AI from Cloud-based LLMs

Published: 28 February 2024

Abstract

Cyber-human and cyber-physical systems have tight end-to-end latency bounds, typically on the order of a few tens of milliseconds. In contrast, cloud-based large language models (LLMs) have end-to-end latencies that are two to three orders of magnitude larger. This paper shows how to bridge this gap by using LLMs as offline compilers that generate task-specific code, so that no LLM access is needed at run time. We provide three case studies as proofs of concept, and discuss the challenges in generalizing this technique to broader uses.

Published In

HOTMOBILE '24: Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications
February 2024
167 pages
ISBN:9798400704970
DOI:10.1145/3638550
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. edge computing
  2. machine learning
  3. cloudlets
  4. large language models
  5. generative AI
  6. wearable cognitive assistance
  7. drones

Qualifiers

  • Research-article

Conference

HOTMOBILE '24

Acceptance Rates

Overall Acceptance Rate 96 of 345 submissions, 28%
