Data Access

Current AI systems rely on access to vast amounts of data to function properly. However, a conscious AI of the future might very well be more human-like in the way it thinks, able to make decisions with far less data available. We still need to be cautious about what data strong AI has access to, the same way we are now with weak AI. Imagine what kind of chaos a conscious AI with malicious intentions could cause if it had access to all of our data.

Could there be a step between a fully developed artificial consciousness and the AI we have now? At the moment, AI trains its decision-making capabilities on Big Data. The data should fulfill the four V’s: Volume (there is a lot of it), Variety (the data takes many forms), Velocity (frequent data updates) and Veracity (relevant and accurate data). The development towards AGI will probably proceed in phases, according to the state of our computational capacity to process Big Data and to simulate consciousness, in whatever way the latter is best achieved. As an example, consider the language capabilities of AI. Today, data gathered from written text is used to teach AI a language in the field of natural language processing. An intermediary step towards AGI could be to generalize knowledge from one language to another; that alone would significantly reduce the data an AI system needs in order to learn all languages. A fully fledged AGI could take this to the extreme and pick up a new language from minimal data in that language. The same pattern could apply to every type of learning the AGI does.
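To make the cross-lingual step concrete, here is a minimal sketch of zero-shot transfer using a multilingual sentence encoder. The specific model name, the toy sentiment dataset and the pairing of sentence-transformers with scikit-learn are illustrative assumptions on our part, not something the argument above prescribes:

```python
# Minimal sketch: cross-lingual transfer with a multilingual sentence encoder.
# Assumes `sentence-transformers` and `scikit-learn` are installed; the model
# name and the tiny toy dataset below are illustrative stand-ins.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Train a sentiment classifier on English examples only.
english_texts = [
    "I love this film", "What a fantastic experience",
    "I hate this film", "What a terrible experience",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
clf = LogisticRegression().fit(encoder.encode(english_texts), labels)

# Evaluate zero-shot on another language: no Swedish training data was used,
# the shared multilingual embedding space carries the knowledge across.
swedish_texts = ["Jag älskar den här filmen", "Jag hatar den här filmen"]
print(clf.predict(encoder.encode(swedish_texts)))  # expected: [1 0]
```

Because both languages land in the same embedding space, a classifier trained only on English transfers to Swedish essentially for free, which is exactly the kind of data reduction described above.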

With this proposal in mind, that AGI may be well off with access to less data rather than more as current trends suggest, we can address the problem of ensuring our personal data stays safe even with an AGI present. The legislation we have now, for example the GDPR in the EU, sets restrictions on automated processing, which includes AI and consequently also AGI. The GDPR (Article 22) restricts automated systems from making decisions with a legal effect on individuals, such as limiting the data subject’s freedom to associate with others or denying someone’s citizenship, to name a few. Beyond that, restrictions also cover similarly significant actions, such as affecting an individual’s financial circumstances or denying an employment opportunity. Even with these in place, current AI research relying on Big Data is not hindered, so if all turns out as we have proposed, these restrictions should not pose a problem to the development of AGI either. This is an important factor to consider, since where is the value in ensuring data safety if the measures taken prohibit the invention of the very thing we are trying to create? Luckily, we could in principle think in surprisingly similar terms about personal data safety in the future, provided the need for such data does not grow uncontrollably.

Then we need to assess whether or not AGI should be heavily involved in our lives, in which case access to personal data is unavoidable. If legislators have already come together today and collectively put restrictions on weak AI, it can be assumed that we do not want an even more advanced AI heavily controlling our lives. This is surely the right approach considering the core principle of AI safety engineering: better safe than sorry. Given our personal information, an AGI could breach every level of our digital society and wreak unimaginable havoc if it acted in unintended ways. On the other hand, it could use its cleverness to do immense good with more access, but we can’t really afford to take that bet. So, let’s try to create an AGI with only the absolutely necessary access rights: enough to make decisions, advance humankind’s knowledge and do other great things, but not to interfere with our private lives.
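Granting only the absolutely necessary access rights is, in software terms, the principle of least privilege. Below is a minimal sketch of what that could look like around an AI system; every class and capability name here is a hypothetical illustration, not an existing API:

```python
# Minimal sketch of least-privilege access control for an AI system.
# All names are hypothetical: the AI may only invoke capabilities
# it was explicitly granted, everything else is denied by default.
class PermissionDenied(Exception):
    pass

class CapabilityGate:
    def __init__(self, granted: set):
        self.granted = granted  # the only actions the AI may perform

    def invoke(self, capability: str, action, *args):
        if capability not in self.granted:
            raise PermissionDenied(f"AGI has no '{capability}' right")
        return action(*args)

# Grant research-oriented rights, withhold anything touching private lives.
gate = CapabilityGate(granted={"read_public_datasets", "run_simulations"})

gate.invoke("run_simulations", lambda: print("simulating..."))  # allowed
try:
    gate.invoke("read_personal_email", lambda: None)            # denied
except PermissionDenied as err:
    print(err)
```

The design choice worth noting is the default-deny stance: instead of enumerating what the AGI must not do, we enumerate the few things it may do.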

A solution to the problem of privacy would be to ensure that the party in control of the AI does not know whom the data used in training belongs to. There is an emerging field called federated learning that contrasts with the traditional way of training AI, where all user data is stored in a centrally located datacenter, by instead performing training locally on the users’ devices. This method is mostly adopted by companies using machine learning, the dominant form of AI right now; examples include the suggestions you get when typing on your phone, when searching on the internet, or any content recommendation for that matter. The same principle could be used in the future when feeding data into an AGI, both to keep personal data safe and to still provide the information the AGI needs to make decisions, and to be developed in the first place.
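The core algorithm in this field is federated averaging: each device trains its own copy of the model on data that never leaves the device, and a central server only averages the resulting weights. Here is a minimal numpy sketch with a toy linear model and made-up client data, just to show the flow:

```python
# Minimal sketch of federated averaging (FedAvg) with numpy.
# Toy setup: each client fits a linear model y = w*x on private data;
# only the weight w leaves the device, the raw (x, y) pairs never do.
import numpy as np

rng = np.random.default_rng(0)
true_w = 3.0

# Private datasets, one per simulated device.
clients = []
for _ in range(5):
    x = rng.normal(size=50)
    y = true_w * x + rng.normal(scale=0.1, size=50)
    clients.append((x, y))  # this data never leaves the "device"

global_w = 0.0
for round_ in range(10):
    local_ws = []
    for x, y in clients:
        w = global_w
        for _ in range(5):                       # a few local gradient steps
            grad = 2 * np.mean((w * x - y) * x)  # d/dw of mean squared error
            w -= 0.1 * grad
        local_ws.append(w)                       # only the weight is shared
    global_w = float(np.mean(local_ws))          # server averages the updates

print(f"learned w = {global_w:.3f} (true w = {true_w})")
```

The server only ever sees the per-client weights, never the raw data, which is precisely the property that would let an AGI learn from personal data without any party centrally holding it.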

Other articles to read on this topic:

  1. Artificial Intelligence And Data Privacy
  2. The Future Of AI Will Be About Less Data, Not More
  3. AI And The Future Of Privacy

Written by: Daniel Holmberg