Upgrading the Turing Test to Consciousness
Starting from a Definition of Consciousness that is remarkably similar to Tononi's[5] and McKenzie's[6], as well as to Cleeremans and Jiménez's Definition of Learning[7], this article points out that the level of sophistication (or simplicity) of a given Conscious Entity has to be taken into consideration, but that the features tested as part of the Definition (Advaita Vedanta Boolean Algebraic capability, Memory, Imagination / Creativity, Ability to action future insights and learn from mistakes) remain the same regardless of scope and resources. Given that PID Control strictly meets the Definition of Consciousness, the difficulty and comprehensiveness of the task is highlighted by how rigorous and thorough PID Controller testing has to be in Safety-critical Engineering.
Additionally, this article agrees with Schweizer's[8] perspective that selecting a single entity for testing (or too small a sample size) is statistically risky, but disagrees that the only way to mitigate this is to test Groups of entities. Part of the reasoning: the same statistical risk of small sample size applies equally to the number of Groups tested. Ultimately, though, tests for Consciousness in an individual are more or less sophisticated variants on the theme "can you run and catch a moving ball".
The scope of the problem faced by this endeavour:
Also there are serious misunderstandings to contend with, which are highlighted below and have already caused significant subtle problems regarding the use of ChatBots.
Defining Consciousness is the first step. The IEP[9] summarises Tononi's Definition of Consciousness, which is slightly different from McKenzie's, the author's and others', as it has more similarity to Sipling, Zhang and Di Ventra's[10] Long-Range Order characteristics, as well as providing support for Hankey's[11] High-order Critical Instability insight as key:
according to IIT, consciousness requires a grouping of elements within a system that have physical cause-effect power upon one another. This in turn implies that only reentrant architecture consisting of feedback loops, whether neural or computational, will realize consciousness.
All of these Definitions give a hint of an underlying architecture that bears a remarkable resemblance to an Aspex Microelectronics Array-String Processor[12]: a massive-wide SIMD array which had tiny 2-bit ALUs, 256 bytes of Content-Addressable Memory per ALU, and registers that could be used as Vector Processor Predicate Masks. Both Sipling et al and Tononi et al hint that there is not just continuously-looped and Distributed (masked) Memory "lookup" going on: there is rudimentary bit-level binary computation built-in as well.
Also, exactly as with the Aspex ASP, there is the Mesolimbic Dopamine System which (greatly simplified) links behaviour initiation (or aversion) to achieve goals: this is a single central control system, and the equivalent in the ASP was a single SPARC processor. The SPARC processor was responsible for broadcasting the SIMD instruction to the massive-wide SIMD array of 2-bit ALUs. Rudimentary Boolean Algebra sufficient to perform massively-parallel (but ultimately very simple) Difference, Analogy, Inference and other operations seems to be enough, if directed and controlled by a "Central Processor", to create and run more complex programs.
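As an illustration only, the broadcast-plus-mask arrangement described above can be sketched in a few lines of Python. This is not the Aspex ASP instruction set: the `broadcast` helper, the four-cell array and the use of XOR as a stand-in for "Difference" are all assumptions made for the example.

```python
from typing import Callable, List


def broadcast(op: Callable[[int, int], int], cells: List[int],
              operand: int, mask: List[bool]) -> List[int]:
    """Apply one instruction to every 2-bit cell whose predicate-mask bit is set.

    Cells with a cleared mask bit are left untouched, as with a predicate mask
    on a Vector Processor. Results are truncated to 2 bits.
    """
    return [(op(cell, operand) & 0b11) if enabled else cell
            for cell, enabled in zip(cells, mask)]


# Hypothetical array of four 2-bit cells and a predicate mask.
cells = [0b00, 0b01, 0b10, 0b11]
mask = [True, False, True, True]

# The "central processor" broadcasts a single XOR ("Difference") instruction.
difference = broadcast(lambda a, b: a ^ b, cells, 0b01, mask)
print([format(c, "02b") for c in difference])
```

The single call to `broadcast` plays the role of the central processor: one instruction, many tiny ALUs, with the mask deciding which of them take part.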
The Turing Test is:
a test of a machine's ability to exhibit intelligent behaviour equivalent to that of a human... The results would not depend on the machine's ability to answer questions correctly, only on how closely its answers resembled those of a human.
However with Intelligence being difficult to define, it becomes hard to correctly design such a test. The clearest and most concise definition, by Sternberg and Salter[14], is simply:
Goal-directed adaptive behavior
Thus the Turing Test may be defined as:
a test of a machine's ability to exhibit Goal-directed adaptive behaviour equivalent to that of a human
As pointed out in WdDoC[1] it is not necessary for "awareness" to be involved. However it is clear empirically that "awareness" (more specifically self-awareness) is key to Consciousness.
Chatbots with sophisticated Language parsing are unfortunately fooling humans into believing that they provide intelligent answers, whereas in reality they are a very sophisticated "database query" onto the sum of knowledge written (or drawn, or photographed) by humans that is available on the Internet. Many are fooled by the "synthesis" and "transformation" skills of current Chatbots, as these tasks seem miraculously fast, but in reality risk creating "hallucinations" that require Human Domain-expert understanding to spot[15].
Mitchell[16] pointedly highlights the problem faced by human judges not being sufficiently educated on the seductive dangers of ChatBots, but also implied is that the methodology used (5 minute conversations) is completely inadequate:
It’s certainly concerning that a majority of the human judges were fooled by GPT-4 after a five-minute conversation.
SQL Databases have a very strictly structured language. SQL grammatical and syntax errors are not tolerated in any way, and as a consequence SQL queries are extremely fast: 100,000 queries per second is just within reach of powerful PCs. Put in layman's terms, SQL query decoding is equivalent to Chinese in the Chinese Room[17] thought experiment.
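The strictness of SQL can be seen with a minimal sketch using Python's standard `sqlite3` module. The `facts` table and its contents are hypothetical, chosen only for illustration: a well-formed query returns the stored answer, whereas a single misspelled keyword is rejected outright rather than "interpreted".

```python
import sqlite3

# In-memory database with one hypothetical fact stored in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (topic TEXT, answer TEXT)")
conn.execute("INSERT INTO facts VALUES ('capital of France', 'Paris')")

# A syntactically valid query is decoded and answered mechanically.
row = conn.execute(
    "SELECT answer FROM facts WHERE topic = ?", ("capital of France",)
).fetchone()
print(row[0])

# A one-letter misspelling of SELECT is not tolerated in any way.
try:
    conn.execute("SELEKT answer FROM facts")
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("rejected:", exc)
```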
Natural Language however is more powerful, expressive, and complex, but the downside is that queries are slower and more computationally costly to process: several seconds per query is not uncommon.
It is remarkable how much progress has been made in Large Language Models, but LLMs are just a tool. Chatbots use this tool, in conjunction with "context" (such as your last question), to give a more "natural" access to information, which seems remarkably intelligent to anyone not familiar with the underlying technology. At heart, LLM use in ChatBots is just another Chinese Room.
McKenzie's definition is as follows:
Consciousness is the capacity to generate desires and decisions about perceived or imagined realities by distinguishing self from non-self through the use of perception, memory and imagination.
However he then clarifies and crucially points out that appreciating the concept of time is a critical factor.
In essence: past experience is used in the present to imagine actions that would achieve a goal at a future specific time. As that future specific time becomes closer, what was previously "future" becomes "now", and what was "now" becomes "past experience to again learn from". The process of evaluating and choosing action (or inaction) is continuously repeated and refined until the goal is achieved at the projected time.
The simplest single mathematical implementation of the above is a PID Controller[18]. Therefore, logically, there is a reasonable expectation that Engineering / Software tests of PID Controllers may help in creating tests for Consciousness.
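A minimal sketch of a discrete PID Controller helps make the mapping concrete. It is an illustration under stated assumptions, not a reference implementation: the `PID` class, the gains, and the toy plant in `simulate` (where the rate of change of position equals the controller output) are all hypothetical. The proportional term looks at the present error, the integral term carries the accumulated past, and the derivative term extrapolates the current trend.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Minimal discrete PID controller (illustrative sketch only)."""
    kp: float            # weight on the present error
    ki: float            # weight on the accumulated past error
    kd: float            # weight on the current trend of the error
    setpoint: float      # the goal
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, measurement: float, dt: float) -> float:
        error = self.setpoint - measurement           # present compared to goal
        self.integral += error * dt                   # running sum of past error
        derivative = (error - self.prev_error) / dt   # rate of change of error
        self.prev_error = error                       # "now" becomes "past"
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, steps: int = 2000, dt: float = 0.01) -> float:
    """Drive a hypothetical plant where dx/dt = u, starting from x = 0."""
    x = 0.0
    for _ in range(steps):
        u = pid.update(x, dt)   # evaluate and choose an action
        x += u * dt             # the action changes the world
    return x


final = simulate(PID(kp=2.0, ki=0.5, kd=0.1, setpoint=1.0))
print(f"final position: {final:.4f}")
```

The loop in `simulate` is the continuously repeated "evaluate, act, re-evaluate" cycle: each pass uses stored state from earlier passes to choose the next action.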
McKenzie's testing of Consciousness recommends removing (a) perception (b) memory (c) imagination (d) sense of self. Each of these gives crucially important guidance on what low-level "unit tests" to perform. However going through them more explicitly, using PID Control as an example:
The above very simple example helps illustrate that testing McKenzie's Definition is possible in terms of the four "Properties" defined. Looking for those four Properties however is the trick.
A key issue to contend with, which Schweizer points out, is the variation in both Intelligence and awareness (a synonym for Consciousness) in a given human as a candidate test subject. Schweizer advocates long-term analysis and testing of Sub-groups for Conscious behaviour: to analyse interactions within the group or of the group's ability to achieve a set task, in order to statistically mitigate for low IQ and low Tononi Phi.
This approach masks a number of problems, one of which is that even in a given randomly-selected Group, the variation may not be enough. Analysing an entire population however can be impractical, which leaves analysis of multiple Sub-groups.
In the non-human animal kingdom, cooperative behaviour is clearly observed between groups: ants, crows, lions, wolves, monkeys. Humans as well will set themselves "competitions" where teams may enter to achieve a set goal: Underwater robotics, team sports such as Hockey, Football and Rugby, and much more, including the effectiveness of teamwork in Corporations and other Organisations.
It seems therefore, from simple observations of natural Group behaviour (Nature documentaries, Sporting events), that Schweizer's insight can already be satisfied without additional studies being carried out, by a change of perspective: such Group behaviour is already a clear demonstration of, and test for, Consciousness.
Which leaves "individual" Consciousness testing still unexamined.
From turing.com[19], "Learning" is considered important for artificial consciousness. According to Axel Cleeremans and Luis Jiménez[7], learning is defined as
a set of phylogenetically advanced adaptation processes that critically depend on an evolved sensitivity to subjective experience so as to enable agents to afford flexible control over their actions in complex, unpredictable environments.
This has remarkable similarity to McKenzie's[6] Definition of Consciousness, right down to not explicitly highlighting the significance of time.
Additionally, turing.com point out that "Anticipation" is important, which is a key part of both McKenzie's and Tononi's Definitions. Anticipation combines both Differentiation as well as Integration with respect to time.
In turing.com's article on Time-series analysis[20] the off-line task of analysing data changes over time is described, and advice given on how to formally statistically check the accuracy of a given choice of predictive modelling. It is very interesting to note that the analysis of time-dependent data is remarkably similar to PID Control: the "AR" (Auto-regressive) part of ARIMA appears to correspond to two of the constants being non-zero, whereas the "I" part explicitly has its own constant non-zero. A further term is described as the "intercept".
Adapting the recommended statistical testing process itself (measuring how closely a given model matches the data) so that it can be applied in real-time is also worth exploring.
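One way such a real-time check might look is sketched below. It is an assumption-laden illustration rather than the procedure from the turing.com article: the `rolling_forecast_error` helper, the first-order auto-regressive predictor (with coefficient `phi` and an `intercept`) and the sample series are all hypothetical. The idea is simply to score each one-step-ahead prediction as soon as the next measurement arrives, keeping a rolling average of the error.

```python
from collections import deque
from typing import Iterator, Sequence


def rolling_forecast_error(series: Sequence[float], phi: float, intercept: float,
                           window: int = 5) -> Iterator[float]:
    """Yield the rolling mean absolute one-step-ahead error of an AR(1) predictor."""
    errors: deque = deque(maxlen=window)   # only the most recent errors are kept
    for previous, actual in zip(series, series[1:]):
        predicted = intercept + phi * previous      # forecast made from the past
        errors.append(abs(actual - predicted))      # scored against the present
        yield sum(errors) / len(errors)


# Hypothetical series generated exactly by y[t+1] = 1.0 + 0.5 * y[t].
series = [0.0, 1.0, 1.5, 1.75, 1.875]

good_model = list(rolling_forecast_error(series, phi=0.5, intercept=1.0))
poor_model = list(rolling_forecast_error(series, phi=0.0, intercept=0.0))
print("matching model:", good_model)
print("mismatched model:", poor_model)
```

A rising rolling error would signal, while the system is running, that the chosen model no longer matches the data.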
The authors of turing.com note well that most modern AI fails the Turing Test. The most likely explanation is that modern AI is simply not meeting the Definition of Intelligence. The authors note:
All these systems are intelligent, but they have limitations as they can only perform in certain predefined conditions. If they go beyond their constraints, they can fail and produce undesirable results
In other words they lack Sternberg and Salter's "adaptability". This makes Turing.com's declaration "these systems are intelligent" strictly invalid if Intelligence is defined as "Goal-directed adaptive behaviour" and modern AI is replicating and synthesising best-match answers from a fixed database.
The study by Bayne et al[3] is particularly comprehensive and insightful. It points out that Consciousness should not in any way be considered the exclusive domain of Humans.
trying to develop a comprehensive account of consciousness by studying only humans would be akin to trying to develop a comprehensive account of the elements by studying only copper.
WdDoC[1] goes to some lengths to highlight that Consciousness is scope-based and resource-based: a PID Controller meets the Definition and achieves its purpose. Bayne et al's insight therefore extends far beyond just animals, humans, AIs or Aliens: a perspective confirmed by French[4]:
the Turing Test is not actually testing for (general) intelligence, but rather, a test for intelligence in humans, with human bodies, having experienced life as a human being.
There are numerous humbling examples of empathy, clear intelligence, expression of desires, and ability to communicate in animals, which complicates any potential idea to upgrade the Turing Test to cover Consciousness:
There is an additional important factor at play that is highlighted by Bayne's team, which any Reverse-Engineer (and well-trained Software Engineer) will immediately recognise:
...putative C-tests should be extended to novel populations by bootstrapping. The idea here is that C-tests must first be validated in "neighboring" populations before being applied to more "alien" populations
A simple example is to have a "black box" which takes an increasing number of inputs and has one output. Note that it is assumed there is no internal state (no internal "Memory"):
The more changes are made, the worse the situation gets, on a binary exponential scale. All good Software Engineers know that unit tests must be at the lowest level with the simplest minimal change when compared against peer unit tests of the same function. Assuming comprehensive coverage and success at the lower levels, confidence in the program evolves by working methodically upwards in a hierarchy that can, in large complex projects, expand to hundreds, millions and tens of millions of individual tests.
Thus, the importance of Bayne et al's point cannot be overstated: it is necessary for changes to be made in an incremental "one change at a time and one change only" fashion, where, again, a good Software Engineer knows that accelerating the pace of making such one-at-a-time changes will increase the pace of development and maintain confidence without compromising integrity.
This implies that it is unavoidable that, firstly, testing one level of Consciousness must take into account both the level above and the level below (testing of neurons before testing the creature using them), as well as testing "sideways" by comparing similar populations at as close a "level" of Consciousness as possible.
A tried-and-tested method is to literally treat Definitions of Consciousness as a Software Project, and to create both unit tests and systems tests. Hence the approach taken in WdDoC[1] to seek out the properties of Consciousness.
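The binary exponential growth can be made concrete with a short sketch. Assuming a stateless black box whose inputs and output are Boolean, exhaustive testing needs one case per input combination, that is 2 to the power of the number of inputs. The `truth_table` helper and the `majority` box below are hypothetical examples.

```python
from itertools import product
from typing import Callable, Dict, Tuple


def truth_table(box: Callable[..., bool],
                n_inputs: int) -> Dict[Tuple[bool, ...], bool]:
    """Exhaustively probe a stateless Boolean black box with n_inputs inputs."""
    return {bits: box(*bits)
            for bits in product((False, True), repeat=n_inputs)}


def majority(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical black box: True when at least two of three inputs are True."""
    return (a + b + c) >= 2


table = truth_table(majority, 3)
print(f"3 inputs need {len(table)} test cases")

# Every additional input doubles the number of cases to cover.
for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} inputs -> {2 ** n} combinations")
```

If the box also had internal Memory, each combination would additionally have to be tried against every reachable internal state, which is why the no-internal-state assumption matters.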
When testing for human-like Consciousness it is reasonable to assume that the ability to communicate (spoken or written) is a given, but it is not necessarily the case that initially there will be common language or context. Science Fiction helps illustrate: both a Stargate episode[26] and Carl Sagan's book "Contact"[27] provide a "from-the-ground-up" one-way teaching guide. Assuming a real-time two-way communication channel is available, then it is reasonable to use that to first establish a common language.
Then, a system-level test would be to expect that the subject is capable of being queried on each of the low-level unit tests, and to have their purpose explained without prior knowledge.
For example: Boolean Algebra is part of the Definition of Consciousness, highlighted best by Advaita Vedanta's Epistemology, such as "Difference" and "Analogy". If a General Conscious system is to be indistinguishable from a Human, it is not unreasonable to interact with a Conscious System in order to ask:
This should go as far as it needs to go down the rabbit-hole, including teaching Calculus (or just "Area under the curve") in order to understand Integration and Differentiation. However the primary purpose of the discussion is to see if the subject firstly agrees to participate willingly and secondly to test its ability to deploy "real-time corrective feedback" - the crucial aspect of the Definition of Consciousness - in collaboration with the tester. Misunderstandings should be resolved: "Active Listening"[28] displayed (known as empathy), which is characterized by asking questions that begin
"so let me summarize and see if I understand you correctly:..."
Note here that it is not necessarily the case that a given Conscious Being will have empathy. Andrew Yang[29] noted that Humans corrupted by power are incapable of empathy. The point is highlighted to illustrate that not all approaches will be successful, graphically illustrating, as Bayne et al rightly highlight, the complexity and near-overwhelming scope of the task.
It is also important to note that where Unit Tests exist for systems previously not recognised as meeting the Definition of Consciousness, such as Software implementations of PID Controllers, the approaches taken and indeed the actual Unit Tests themselves may potentially be used. Particularly helpful would be what can be learned from the comprehensive ISO9000 Compliance Test Suites in Industrial Engineering environments, needed for Mission-critical and Safety-critical applications.
A cursory search for PID Controller unit tests reveals comments from Corfa's[31] PID Controller unit tests. Each unit test clearly states its objective:
Lundberg's[32] tests are more comprehensive.
These tests focus on individual features (the P, I and D terms), test the clamping capability, and also provide a different suite of system-level (high level) tests. However, neither of these examples is in any way comprehensive to an Industrial ISO9000 Standard: that would involve deliberate harmonic oscillating input at ranges of frequencies deliberately designed to destabilize, tests for Integral windup, tests for randomised environmental error and much more.
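To give a flavour of what such additional tests might look like, the sketch below exercises output clamping, Integral windup and oscillating input at several frequencies. It does not reproduce Corfa's or Lundberg's test suites and is nowhere near an Industrial standard: the `ClampedPI` class, its parameters and all numeric values are hypothetical, chosen to keep the example self-contained.

```python
import math


class ClampedPI:
    """Hypothetical PI controller whose integral term is clamped (anti-windup)."""

    def __init__(self, kp, ki, setpoint, out_min, out_max):
        self.kp, self.ki, self.setpoint = kp, ki, setpoint
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def _clamp(self, value):
        return min(self.out_max, max(self.out_min, value))

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        # Clamp the stored integral so it cannot wind up past the output limits.
        self.integral = self._clamp(self.integral + self.ki * error * dt)
        return self._clamp(self.kp * error + self.integral)


def test_output_is_clamped():
    pi = ClampedPI(kp=10.0, ki=1.0, setpoint=100.0, out_min=-1.0, out_max=1.0)
    assert pi.update(0.0, dt=0.1) == 1.0


def test_no_integral_windup():
    pi = ClampedPI(kp=0.0, ki=1.0, setpoint=1.0, out_min=-1.0, out_max=1.0)
    for _ in range(1000):                 # actuator held in saturation
        pi.update(0.0, dt=0.1)
    assert pi.integral <= 1.0             # the integral stayed bounded
    # When the error reverses, the output leaves saturation within a few steps.
    outputs = [pi.update(2.0, dt=0.1) for _ in range(5)]
    assert outputs[-1] < 1.0


def test_oscillating_input_stays_within_limits():
    pi = ClampedPI(kp=2.0, ki=5.0, setpoint=0.0, out_min=-1.0, out_max=1.0)
    for freq_hz in (0.1, 1.0, 10.0):      # sweep a range of input frequencies
        for step in range(500):
            measurement = 3.0 * math.sin(2 * math.pi * freq_hz * step * 0.01)
            assert -1.0 <= pi.update(measurement, dt=0.01) <= 1.0


if __name__ == "__main__":
    test_output_is_clamped()
    test_no_integral_windup()
    test_oscillating_input_stays_within_limits()
    print("all sketch tests passed")
```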
A valuable insight into the insufficiency of the above unit tests is illustrated by "overactivation" which occurs in real-world Industrial PID Control usage: repeated unnecessary opening and closing of a valve, shortening its lifespan. The solution, known as "deadband"[18], bears a remarkable resemblance to the capability of biological neurons to only fire once an activation threshold is reached.
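A deadband can be sketched in a few lines. The `apply_deadband` function, the band width and the sample error values below are assumptions for illustration: errors inside the band are treated as zero, so small fluctuations no longer trigger the actuator, much as a sub-threshold stimulus does not cause a neuron to fire.

```python
def apply_deadband(error: float, width: float) -> float:
    """Return 0 inside the deadband, otherwise the error reduced by the band width."""
    if abs(error) <= width:
        return 0.0
    return error - width if error > 0 else error + width


# Hypothetical sequence of measured errors: mostly noise, two real disturbances.
errors = [0.02, -0.03, 0.01, 0.25, -0.04, 0.02, -0.30, 0.03]

raw_moves = sum(1 for e in errors if e != 0.0)
filtered_moves = sum(1 for e in errors if apply_deadband(e, 0.1) != 0.0)
print(f"valve movements without deadband: {raw_moves}")
print(f"valve movements with deadband:    {filtered_moves}")
```

Subtracting the band width outside the deadband keeps the output continuous at the threshold; a simpler variant passes the error through unchanged once the threshold is exceeded.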
The approach taken by Turing.com on data analysis (described above) would prove invaluable, but for a rigorous Industrial environment where failure could leave a valve open on an LPG tank at a refinery, causing a devastating large-scale explosion, the comprehensiveness and rigorousness of Unit and Systems testing needed in Industrial PID Control is made clear.
Also worth noting is that the equivalent of "Group Consciousness" in a PID Controller context is that the constants P, I and D and their range (infinite for each of the three constants) represent a "Group". The analogy holds in that some values of these constants clearly do not meet the Definition of Consciousness (D=1/P=0/I=0) just as not all humans can be said to meet the Definition (the subset with neurological disorders, brain damage, or pathological behavioural traits).
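The "population of constants" idea can be sketched by running the same goal-seeking test over several choices of the three constants. Everything here is an assumption for illustration: the `reaches_goal` function, the toy plant (rate of change of position equals controller output), the tolerance and the five sample "individuals". The D=1/P=0/I=0 case is included because it reacts only to change and so never moves towards a fixed goal.

```python
from typing import Dict, Tuple


def reaches_goal(kp: float, ki: float, kd: float, setpoint: float = 1.0,
                 steps: int = 5000, dt: float = 0.01, tol: float = 0.02) -> bool:
    """Simulate a hypothetical plant dx/dt = u and report whether x settles at the setpoint."""
    x, integral = 0.0, 0.0
    prev_error = setpoint - x          # primed so the first derivative is zero
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        derivative = (error - prev_error) / dt
        prev_error = error
        x += (kp * error + ki * integral + kd * derivative) * dt
        if abs(x) > 1e6:               # diverged: treat as a failure
            return False
    return abs(setpoint - x) < tol


# A tiny sample from the (infinite) population of P, I, D constants.
population: Dict[str, Tuple[float, float, float]] = {
    "P only": (2.0, 0.0, 0.0),
    "PI": (2.0, 0.5, 0.0),
    "PID": (2.0, 0.5, 0.1),
    "D only": (0.0, 0.0, 1.0),
    "all zero": (0.0, 0.0, 0.0),
}

results = {name: reaches_goal(*gains) for name, gains in population.items()}
for name, ok in results.items():
    print(f"{name:9s} reaches goal: {ok}")
```

A sample this small says little about the population as a whole, which is the same statistical caution that applies to testing individuals or Groups.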
Ultimately the Systems-level tests should be along the lines of being able to adapt to a moving target (goal). Motion-based examples include a human, robot, dog or an alien catching a frisbee or a ball, as this involves:
Such a "simple-looking" task, so very familiar for example to the average father and son playing "catch" in a park, can easily be taken for granted until the requirements and underlying mechanics are properly investigated. It should be clear however, when expressed in terms of "Memory" and "Anticipation" and "Differentiation and Integration with respect to time" etc., that the task satisfies McKenzie's Definition, the WdDoC etc., and consequently and surprisingly represents a really good test of Consciousness.
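A heavily simplified, one-dimensional sketch shows why a moving target is a stronger test than a fixed one. The `chase` function, the constant-speed "ball" and the gains are hypothetical. A catcher that reacts only to the present gap (proportional action alone) settles into a constant lag behind the ball, whereas one that also accumulates past error (integral action, a form of Memory) closes the gap.

```python
def chase(kp: float, ki: float, target_speed: float = 1.0,
          steps: int = 3000, dt: float = 0.01) -> float:
    """Return the final gap between a constant-speed ball and a pursuing catcher."""
    catcher, integral = 0.0, 0.0
    for n in range(steps):
        ball = target_speed * n * dt          # where the ball is right now
        error = ball - catcher                # present gap
        integral += error * dt                # accumulated past gap
        catcher += (kp * error + ki * integral) * dt
    return abs(target_speed * steps * dt - catcher)


lag_without_memory = chase(kp=2.0, ki=0.0)
lag_with_memory = chase(kp=2.0, ki=1.0)
print(f"gap with proportional action only: {lag_without_memory:.3f}")
print(f"gap with integral action added:    {lag_with_memory:.3f}")
```

With proportional action only, the steady gap works out to the ball's speed divided by `kp`; the accumulated term is what allows the catcher to match a target that keeps moving.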
There appears to be a remarkable quantity of research in this field: it is not exactly couched in terms of "An upgraded Turing Test" which was the initial goal of this paper. However Bayne et al review the current scope of testing for Consciousness very well, and it is felt that their approach has merit, particularly given French's insights that the Turing Test as defined and used is heavily biased towards Humans.
Bayne et al caution against limiting tests for Consciousness to Humans: this paper advocates designing context-sensitive resource-aware tests at the level of Consciousness for the entity being tested, mindful that the Definition of Consciousness has no limit on the scale (sophistication or simplicity), merely noting that Consciousness arises as a means of keeping itself "on target" by comparing the past to the present, then evaluating strategies and applying best-selected action to meet an intended future goal.
Bayne et al and Schweizer's advice is to test populations not individuals. In the context of PID Controllers, the population is the permutation of infinite range of the three P,I,D constants. In the context of Humanity there are 7+ billion potential candidates. Schweizer advocates testing Groups for their ability to interact whereas this paper points out that such Group activities should be just one of the many tests, and further that there should be many Groups tested as well as many individuals tested, in order to compensate for statistical variation in both the selection of individuals (for individual tests) and of specific Groups (for collaborative tests).
It appears that testing may only be carried out by acknowledging the relationship of the lower level of Consciousness to the higher. Examples being "neurons" as lower-level and "animal" as higher, or "Individual Consciousness" and "Group" (whole population) Consciousness. Where each level meets the Definition of Consciousness it is important to clarify exactly which level is overall being tested, and to do so in terms of the level both above and below: i.e. take into account the fractal nature of Consciousness[1].
Also recommended is to learn from Software Engineering, and to create targeted Unit Tests that cover both the lower level (the Properties of Consciousness) and the higher "Systems Integration" level, to use a Software Engineering term. A low-level example: is there evidence that the entity being tested has Memory, that being one of the Properties required under Tononi's, McKenzie's, Cleeremans' and the author's Definitions of Consciousness.
For future consideration would be to apply Formal Correctness Proofs: this task would first require the development of a Mathematical Model of Consciousness in a suitable Formal Language.
Where it might be hoped that Humans would be able to spot if a given non-Human entity is Conscious or not, it is unfortunately clear from superficial use of ChatGPT and other Chatbots that this is emphatically not the case. The scope of the task is clearly far larger than anticipated, which sits uneasily with the need for a rigorous approach.
It is projected that over time (decades) this issue will resolve itself, as risk-cost-benefit analysis cuts in: Mission-critical and Safety-critical deployment of Conscious non-human Beings will clearly require a greater expenditure of resources to ensure that they are actually Conscious. Personally, the author looks forward to Conscious Computing-based Beings being approached by humans and invited to do a particular job, and instead offering to design software and hardware solutions that would make themselves redundant.
Bottom line: the key focal point of any Systems-level testing for Consciousness has to be on the effectiveness of the time-dependent feedback loop, illustrated most simply by PID Control, more relatably by running to catch a ball, and at a much larger scope and timescale: living on a planet without messing it up.
[1] Luke Kenneth Casson Leighton, "Where is the Definition of Consciousness?", March 2025, DOI: 10.13140/RG.2.2.11189.18403
https://www.researchgate.net/publication/390335688
[2] https://en.m.wikipedia.org/wiki/Turing_test
[3] "Tests for consciousness in humans and beyond" Tim Bayne, Anil K. Seth, Marcello Massimini, Joshua Shepherd, Axel Cleeremans, Stephen M. Fleming, Rafael Malach, Jason B. Mattingley, David K. Menon, Adrian M. Owen, Megan A.K. Peters, Adeel Razi, and Liad Mudrik, Volume 28, Issue 5 p454-466 May 2024 DOI: 10.1016/j.tics.2024.01.010
https://www.researchgate.net/publication/378944887
[4] "If it walks like a duck and quacks like a duck... The Turing Test, Intelligence and Consciousness", Robert French, January 2009 Oxford Companion to Consciousness, Oxford, UK:Oxford Univ. Press. 461-463.
https://www.researchgate.net/publication/228825529
[5] Giulio Tononi (2015), Scholarpedia, 10(1):4164. doi:10.4249/scholarpedia.4164, "Integrated Information Theory",
http://www.scholarpedia.org/article/Integrated_Information_Theory
[6] "Consciousness defined: requirements for biological and artificial general intelligence", Craig Mckenzie, June 2024, DOI: 10.48550/arXiv.2406.01648
https://www.researchgate.net/publication/381158681
[7] "Implicit Learning and Consciousness: A Graded, Dynamic Perspective", Axel Cleeremans and Luis Jiménez, Cleeremans2002-CLEILA, Psychology Press, 2002, https://philpapers.org/rec/CLEILA
https://www.researchgate.net/publication/284235474
[8] "The Truly Total Turing Test", Paul Schweizer, May 1998, Minds and Machines 8(2):263-272 DOI: 10.1023/A:1008229619541
https://www.researchgate.net/publication/262398634
[9] https://iep.utm.edu/integrated-information-theory-of-consciousness/
[10] C. Sipling, Y.-H. Zhang, M. Di Ventra, "Memory-induced long-range order in dynamical systems" https://arxiv.org/abs/2405.06834
https://www.researchgate.net/publication/380571968
[11] Alex Hankey, 2019, J. Phys.: Conf. Ser. 1251 012019, "Instability physics: Consciousness and collapse of the wave function"
[12] Argy Krikelis, I.P. Jalowiecki, D. Bean, R. Bishop, "A Programmable Processor with 4096 Processing Units for Media Applications",
https://www.researchgate.net/publication/2915463
[13] "A Theory of Consciousness from a Theoretical Computer Science Perspective 2: Insights from the Conscious Turing Machine", Blum, Blum, July 2021 DOI: 10.48550/arXiv.2107.13704
https://www.researchgate.net/publication/353567906
[14] Sternberg RJ; Salter W (1982). Handbook of human intelligence. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-29687-8. OCLC 11226466.
[15] https://m.youtube.com/watch?v=0OnL_AX6Q9g
[16] "The Turing Test and our shifting conceptions of intelligence", Melanie Mitchell https://orcid.org/0000-0001-8881-3505 Science, 15 Aug 2024, Vol 385, Issue 6710, DOI: 10.1126/science.adq9356
https://www.researchgate.net/publication/383154423
[17] https://en.m.wikipedia.org/wiki/Chinese_room
[18] https://en.m.wikipedia.org/wiki/Proportional-integral-derivative_controller
[19] https://www.turing.com/kb/complete-analysis-of-artificial-intelligence-vs-artificial-consciousness
[20] https://www.turing.com/kb/comprehensive-guide-to-time-series-analysis-in-python
[21] https://www.youtube.com/watch?v=_q0QwjjNh-c
[22] "Chaser: Unlocking the Genius of the Dog Who Knows a Thousand Words", Dr John W Pilley (Author), Hilary Hinzmann (Contributor), 23 September 2014, ISBN 978-0544334595
[23] https://www.youtube.com/watch?v=7__r4FVj-EI
[24] https://www.youtube.com/watch?v=tpg3VvoIVfA
[25] "Horses can learn to use symbols to communicate their preferences", Cecilie M. Mejdell, Turid Buvik, Grete H.M. Jørgensen, Knut E. Bøe, Applied Animal Behaviour Science Volume 184, November 2016, Pages 66-73 https://doi.org/10.1016/j.applanim.2016.07.014
https://www.researchgate.net/publication/305729336
[26] "The Torment of Tantalus", Stargate SG-1, Episode 1.11
https://stargate.fandom.com/wiki/The_Torment_of_Tantalus
[27] "Contact", Carl Sagan, ISBN 0-671-43400-4
[28] Conflict Resolution Network, Section 3, "Empathy"
[29] "Forward. Notes on the Future of Our Democracy" Andrew Yang, ISBN 9780593238677
https://politics.slashdot.org/story/21/10/03/233256/
[30] https://brilliantlightpower.com/book-download-and-streaming/
[31] Uriel Corfa, PID Controller unit test,
https://github.com/korfuri/PIDController/blob/master/pidcontroller_test.py
[32] M Lundberg, PID Controller unit test,
https://github.com/m-lundberg/simple-pid/blob/master/tests/test_pid.py