Apple has released a technical paper outlining the models that power Apple Intelligence, the suite of generative AI features set to debut on iOS, macOS, and iPadOS in the coming months.
In response to allegations about its training practices, Apple clarified that it did not use private user data to train the models behind Apple Intelligence. Instead, the company said, training drew on a combination of publicly available, open-source, and licensed data, with the public web data collected by Apple’s web crawler, Applebot.
The paper sheds light on the development of the Apple Foundation Models (AFM), first introduced at WWDC 2024, and reiterates that their training data was sourced responsibly, including licensed material from publishers alongside publicly available web data.
To strengthen the models’ capabilities, Apple also incorporated math-related content drawn from high-quality, publicly available sources. In total, the training dataset for the AFM models came to roughly 6.3 trillion tokens.
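For context on that figure, a token is a sub-word chunk of text rather than a whole word. The minimal sketch below counts tokens using the open GPT-2 tokenizer from the tiktoken library purely as an illustration; Apple’s own tokenizer for the AFM models is not public, so the exact counts it produces would differ.

```python
# Illustration only: how text is split into tokens and counted.
# Uses the open GPT-2 encoding as a stand-in tokenizer, not Apple's.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

text = "Apple Intelligence is built on the Apple Foundation Models."
tokens = enc.encode(text)

# A short sentence like this yields roughly a dozen tokens, which
# gives a sense of the scale implied by a 6.3-trillion-token dataset.
print(f"{len(tokens)} tokens: {tokens}")
```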
To refine the AFM models and address potential issues, Apple also relied on human feedback and synthetic data, and the company emphasized its commitment to responsible AI principles throughout development.
While the paper offers few groundbreaking revelations, it helps position Apple as an ethical player in an AI space where the legality of training models on public web data is the subject of ongoing debate and litigation.
With the future of generative AI models still uncertain, Apple is focused on upholding ethical standards while navigating a complex legal landscape.