MYCIN is a decision support system developed at Stanford University in the early to mid-1970s to assist physicians in diagnosing infectious diseases. The system, also known as an "expert system," would ask a series of questions designed to emulate the thinking of an expert in infectious disease (hence "expert"), and from the responses produce a list of possible diagnoses with probabilities, as well as recommended treatment (hence "decision support"). Written in Lisp (actually a family of languages) geared toward artificial intelligence, MYCIN was one of the pioneering expert systems and was the first such system implemented for the medical field. The case histories of ten patients with different types of meningitis were submitted to MYCIN as well as to eight human physicians, including a resident, a research fellow, and five faculty specialists in infectious disease. Outside specialists gave MYCIN the highest score for accuracy of diagnosis and effectiveness of treatment.
|Published (Last):||23 August 2019|
Mycin (Shortliffe; Buchanan and Shortliffe), developed in the mid-1970s, was the first expert system to demonstrate impressive levels of performance in a medical domain. Because the cause of an infection was often unknown, Mycin would first diagnose the cause and then prescribe therapy.
Typically, Mycin would ask a series of questions about a particular case and then suggest one or more therapies to cover all the likely causes of the infection. Sometimes, cases involved more than one infection. How should such a system be evaluated? One option is to ask experts what they think of it. Good idea, bad execution. What we need is not opinions or impressions, but relatively objective measures of performance. We might ask the experts to assess whether Mycin offered the correct therapy recommendation in each of, say, 10 cases.
The experts would be told how to grade Mycin when it offered one of several possible recommendations, and how to assign points to adequate but suboptimal therapy recommendations. This approach is more objective but still flawed. Mycin was evaluated with a single-blind study. Shortliffe asked each of eight humans to solve ten therapy recommendation problems. These were real, representative problems from a case library at Stanford Medical School.
Shortliffe collected ten recommendations from each of the eight humans, ten from Mycin, and the ten recommendations made originally by the attending physicians in each case, for a total of 100 recommendations. These were then shuffled and given to a panel of eight expert judges. Each judge was asked to score each recommendation as (a) equivalent to their own best judgment, (b) not equivalent but acceptable, or (c) unacceptable.
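The pooling and shuffling step described above can be sketched in a few lines of Python. This is only an illustration of the single-blind idea, not Shortliffe's actual procedure: the function name `build_blind_pool`, the solver labels, and the fixed random seed are all assumptions introduced here.

```python
import random

def build_blind_pool(solver_names, n_cases, seed=0):
    """Pool every (solver, case) recommendation, shuffle, and hide the source.

    Returns the blinded items that judges would see, plus a private key
    mapping each item id back to the solver who produced it.
    """
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    entries = [(solver, case) for solver in solver_names for case in range(n_cases)]
    rng.shuffle(entries)

    blinded = []  # what the judges receive: no solver name attached
    key = {}      # kept by the experimenter until scoring is finished
    for item_id, (solver, case) in enumerate(entries):
        blinded.append((item_id, case))
        key[item_id] = solver
    return blinded, key

# Hypothetical labels: eight human problem solvers, Mycin, and the
# original attending physicians, each contributing ten recommendations.
solvers = [f"human_{i}" for i in range(1, 9)] + ["mycin", "attending_physician"]
blinded, key = build_blind_pool(solvers, n_cases=10)

print(len(blinded))  # 8 * 10 + 10 + 10 = 100
```

Keeping the key separate from the blinded items is the point of the design: judges score each item on its merits, and attribution is restored only after all scores are in.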
Imagine an expert system for portfolio management, the business of buying and selling stocks, bonds and other securities for investment. I built a system for a related problem as part of my Ph.D. research. Naturally I wondered who the experts were. One place to look is a ranking, published annually, of pension-fund managers, the folks who invest our money for our old age. Few managers stay near the top of that ranking from one year to the next. The handful that do could be considered expert; the rest are lucky one year, unlucky the next.
Picking stocks is notoriously difficult (see Rudd and Clasing), which is why I avoided the problem in my dissertation research. But suppose I had built a stock picking system. How could I have evaluated it?
An impractical approach is to invest a lot of money and measure the profit five years later. A better alternative might be to convene a panel of experts, as Shortliffe did for Mycin, and ask whether my stock-picking program picked the same stocks as the panel. The trouble is that the experts might not be expert: few outperform a random stock-picking strategy. Nonexpert performance is the essential control condition. Surely, though, Professors of Medicine at Stanford University must be real experts.
Obviously, nothing could be learned from the proposed condition; the Professors would perform splendidly and the novices would not. Imagine you have built a state-of-the-art parser for English sentences and you decide to evaluate it by comparing its parse trees with those of expert linguists.
If your parser produces the same parse trees as the experts, then it will be judged expert. You construct a set of test sentences, just as Shortliffe assembled a set of test cases, and present them to the experts. Here are the test sentences: Bill ran home. The cat is on the mat. Mary kissed John. Then someone suggests the obvious control condition: Ask a ten-year-old child to parse the sentences. Not surprisingly, the child parses these trivial sentences just as well as the experts and your parser.
Five of the panel were faculty at Stanford Medical School, one was a senior resident, one a senior postdoctoral fellow, and one a senior medical student. This panel and Mycin each solved ten problems, then Shortliffe shipped the solutions, without attribution, to eight judges around the country.
The results are shown in Figure 3. By including novices on the Mycin evaluation panel, Shortliffe achieved three aims. One possible explanation for Mycin performing as well as the experts is that neither Mycin nor the experts are any good. Shortliffe controlled against this explanation by showing that neither Mycin nor the experts often agreed with the novice panel members. Before discussing the hypothesis, consider again the results in Figure 3.
What is the x axis of the graph? It is unlabeled because the factors that determine performance have not been explicitly identified. What could these factors be? Mycin certainly does mental arithmetic more accurately and more quickly than Stanford faculty; perhaps this is why it performed so well.
Mycin remembers everything it is told; perhaps this explains its performance. Mycin reasons correctly with conditional probabilities, and many doctors do not (Eddy); perhaps this is why it did so well.
Even if you know nothing about Mycin or medicine, you can tell from Figure 3 that these explanations fall short. The mental arithmetic skills of Stanford faculty are probably no better than those of postdoctoral fellows, residents, or even medical students; nor are the faculty any more capable of reasoning about conditional probabilities than the other human therapy recommenders; yet the faculty outperformed the others.
To explain these results, we are looking for something that faculty have in abundance, something that distinguishes fellows from residents from medical students. That something is knowledge. This is the hypothesis that the Mycin evaluation tests: that knowledge determines performance.

The evaluation also suggests some general guidelines. Judges should not know the source of what they are scoring. For example, do not tell them whether recommendations were produced by a program or a person. To control for the possibility that high performance is due to easy problems, include a control group of problem solvers who can solve easy problems but not difficult ones.
For example, if a student performs as well as faculty, then the problems are probably easy. For example, if a chimpanzee throwing darts at the Big Board picks stocks as well as professional portfolio managers, then the latter do not set a high standard. To test the hypothesis that a factor affects performance, select at least two and ideally more levels of the factor and compare performance at each level.
For example, to test the hypothesis that knowledge affects performance, measure the performance of problem solvers with four different levels of knowledge--faculty, post-doc, resident, student. Note that this is an observation experiment because problem solvers are classified according to their level of knowledge.
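The multi-level comparison described above can be sketched as a small Python script. It groups scores by knowledge level and checks whether average performance falls as knowledge decreases. The helper names `mean_by_level` and `declines_with_level` are hypothetical, and the scores are invented placeholders, not the Mycin results.

```python
from collections import defaultdict

# Knowledge levels named in the text, ordered from most to least experienced.
LEVELS = ["faculty", "post-doc", "resident", "student"]

def mean_by_level(records):
    """Average the scores recorded for each level of the factor."""
    grouped = defaultdict(list)
    for level, score in records:
        grouped[level].append(score)
    return {level: sum(scores) / len(scores) for level, scores in grouped.items()}

def declines_with_level(means, ordered_levels):
    """True if mean performance never rises as we move down the ordered levels."""
    values = [means[level] for level in ordered_levels]
    return all(a >= b for a, b in zip(values, values[1:]))

# PLACEHOLDER scores: number of judges (out of 8) accepting a recommendation.
# These values are made up for illustration; they are NOT the study's data.
records = [
    ("faculty", 6), ("faculty", 4),
    ("post-doc", 5), ("post-doc", 3),
    ("resident", 4), ("resident", 2),
    ("student", 3), ("student", 1),
]

means = mean_by_level(records)
for level in LEVELS:
    print(level, means[level])
print("declines with less knowledge:", declines_with_level(means, LEVELS))
```

The same summary doubles as the easy-problem check: if the student mean were as high as the faculty mean, the test problems would probably be too easy to tell the levels apart.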
The knowledge-is-power hypothesis might also be tested in a manipulation experiment with Mycin by directly manipulating the amount Mycin knows, adding and subtracting rules from its knowledge base, and observing the effects on performance (see chapter 6).
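A manipulation experiment of this kind can be sketched with a toy rule-based classifier. Everything below is hypothetical: the rules, findings, and test cases are invented, and the first-match inference is a deliberate simplification rather than Mycin's actual reasoning method. The point is only the experimental loop, which varies the number of rules and measures accuracy at each setting.

```python
def diagnose(rules, findings):
    """Return the conclusion of the first rule whose conditions all hold, else None."""
    for conditions, conclusion in rules:
        if conditions <= findings:  # every condition is among the findings
            return conclusion
    return None

def accuracy(rules, cases):
    """Fraction of test cases for which the rule base gives the known answer."""
    correct = sum(1 for findings, answer in cases if diagnose(rules, findings) == answer)
    return correct / len(cases)

# Toy knowledge base: (set of required findings, conclusion). Invented for illustration.
RULES = [
    (frozenset({"f1", "f2"}), "A"),
    (frozenset({"f3"}), "B"),
    (frozenset({"f4"}), "C"),
]

# Toy test cases: (observed findings, correct conclusion). Also invented.
CASES = [
    (frozenset({"f1", "f2"}), "A"),
    (frozenset({"f3"}), "B"),
    (frozenset({"f4"}), "C"),
    (frozenset({"f3", "f5"}), "B"),
]

# Manipulate the independent variable (amount of knowledge) directly.
results = {k: accuracy(RULES[:k], CASES) for k in range(len(RULES) + 1)}
for k, acc in results.items():
    print(f"{k} rules: accuracy {acc}")
```

Because the experimenter sets the number of rules rather than observing it, any change in accuracy can be attributed to the change in knowledge, which is what separates this design from the observation experiment.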
In all these designs, it is best to have more than two levels of the independent variable. With only two, the functional relationship between x and y must be approximated by a straight line.
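A short numerical sketch shows why two levels are not enough. The response function below is a made-up curve chosen for illustration, and `line_through` is a hypothetical helper. With only the two extreme levels measured, the best available model is the straight line between them, and a third, intermediate level exposes how far that line can be from the truth.

```python
def line_through(p, q):
    """Return the straight-line function passing through points p and q."""
    (x0, y0), (x1, y1) = p, q
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)

def true_response(x):
    """Hypothetical nonlinear relationship between factor level and performance."""
    return x ** 2

# Two-level design: measure only the lowest and highest levels of the factor.
low, high = 0, 4
predict = line_through((low, true_response(low)), (high, true_response(high)))

# A third level in between reveals the curvature the line cannot represent.
mid = 2
gap = predict(mid) - true_response(mid)

print("line predicts:", predict(mid))
print("actual value: ", true_response(mid))
print("error at the middle level:", gap)
```

With three or more levels the middle measurement is observed rather than interpolated, so a curved relationship shows up in the data instead of being hidden by the linear assumption.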