Keyboard / Mouse Input Device Design??

Peter Olcott · Nov 1, 2006

Le Chaud Lapin said:
With those parameters, it is indeed possible to find matches. How
could you not? If your software runs on the same computer as the
windows that it is monitoring, then certainly if you render a piece of
text using the parameters that match what is displayed, you will have
an exact match, even with effects of anti-aliasing, transformations,
etc.

However, I should point out again. Given that the user of your
software has to specify these parameters anyway, and given that text
that was not generated by the underlying font system will not, in
general, be recognized by your software, it remains that the most
important elements of recognition are pieces of text that are generated
by the GUI system.

But it is possible to intercept _all_ rendering of such text through
well-defined APIs. In other words, if I were interested in knowing if
there were a window that had the word "JFET" in it, I have two options.

1. Use your system and enter the above information.
2. Use my hypothetical system, and just specify "JFET".

Do you see? By interposition into the GUI subsystem, it becomes far
easier to describe what you are looking for. Font face, point size,
styling, and color become irrelevant if they don't matter.

There is something else that is important. With your system, it seems
that you are taking snapshots. The problem with snapshots is that
there is a chance you will miss something, unless you are planning to
bump up the rate of frame-grabbing so fast that you miss nothing. With
my hypothetical system, there would never be a need to take a
snapshot. You'd always know the state of the system.

-Le Chaud Lapin-
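
The interposition idea in the quoted post can be sketched as a toy model in Python. This is not a real GUI hook: `TextInterceptor`, `watch`, and `draw_text` are hypothetical names invented for illustration, and the post does not say which platform calls would actually be intercepted. The sketch only shows why a watcher that sees every draw-text request needs the search string and nothing else.

```python
from typing import Callable, List, Tuple

Callback = Callable[[str, str], None]


class TextInterceptor:
    """Hypothetical stand-in for a hook on the GUI's draw-text primitive."""

    def __init__(self) -> None:
        self._watches: List[Tuple[str, Callback, bool]] = []

    def watch(self, needle: str, callback: Callback,
              ignore_case: bool = False) -> None:
        """Register a string to look for; no font, size, or color needed."""
        self._watches.append((needle, callback, ignore_case))

    def draw_text(self, window: str, text: str) -> None:
        """Called for every string an application asks the GUI to render."""
        for needle, callback, ignore_case in self._watches:
            haystack = text.lower() if ignore_case else text
            target = needle.lower() if ignore_case else needle
            if target in haystack:
                callback(window, text)
        # A real hook would now forward the call to the original primitive.


hits: List[Tuple[str, str]] = []
hook = TextInterceptor()
hook.watch("JFET", lambda window, text: hits.append((window, text)))

# Simulated applications rendering text through the hooked primitive.
hook.draw_text("Datasheet viewer", "N-channel JFET amplifier")
hook.draw_text("Editor", "Untitled document")
```

After the two simulated calls, `hits` holds only the entry for the window whose text contains "JFET". Text drawn as a bitmap never reaches `draw_text`, which is the limitation raised later in the thread.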

My system is the only possible way that is inherently compatible with every
system, platform, and application. There are many cases where the required
information is unavailable from the system internals. My system handles all of
those cases. Now that we have dual core machines, it is possible, using a DFA, to
process many screens very quickly. I expect that my system could even play and
win fast-paced video games.
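
The post gives no detail on how the DFA is built, so the following Python sketch is only one guess at the general idea, not the author's design. It assumes each glyph of a known font instance is stored as a sequence of pixel columns packed into integers, and builds a trie-shaped DFA over those column values. The glyph shapes, `build_dfa`, and `scan` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-row glyphs; each column is packed into an int (bit 0 = top row).
GLYPHS: Dict[str, Tuple[int, ...]] = {
    "I": (0b111,),
    "L": (0b111, 0b100),
    "T": (0b001, 0b111, 0b001),
}


def build_dfa(glyphs: Dict[str, Tuple[int, ...]]):
    """Return (transitions, accepting) for a trie-shaped DFA over column values."""
    transitions: List[Dict[int, int]] = [{}]  # state 0 is the start state
    accepting: Dict[int, str] = {}
    for name, columns in glyphs.items():
        state = 0
        for col in columns:
            if col not in transitions[state]:
                transitions.append({})
                transitions[state][col] = len(transitions) - 1
            state = transitions[state][col]
        accepting[state] = name
    return transitions, accepting


def scan(columns: List[int], transitions, accepting) -> List[Tuple[int, str]]:
    """Greedy longest-match scan of one strip of pixel columns.

    The DFA is restarted at each unmatched column; matching is exact,
    so any pixel difference (for example from anti-aliasing) is a miss.
    """
    found: List[Tuple[int, str]] = []
    i = 0
    while i < len(columns):
        state, j, best = 0, i, None
        while j < len(columns) and columns[j] in transitions[state]:
            state = transitions[state][columns[j]]
            j += 1
            if state in accepting:
                best = (accepting[state], j)
        if best is not None:
            found.append((i, best[0]))
            i = best[1]
        else:
            i += 1
    return found


TRANSITIONS, ACCEPTING = build_dfa(GLYPHS)
strip = [0, 0b001, 0b111, 0b001, 0, 0b111, 0b100, 0, 0b111, 0]
matches = scan(strip, TRANSITIONS, ACCEPTING)
```

Each column costs one dictionary lookup per step, which is the property that makes table-driven matching fast. Whether that is enough for the frame rates claimed in the post is not something this sketch can show.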

Peter Olcott · Nov 1, 2006

Le Chaud Lapin said:
Ok, I see what you are doing now. I hate to rain on anyone's parade,
especially one where the objective is ambitious, but you should know that
what you are doing, the ultimate result, could be done in a way that is
probably superior in many respects to the image-based method.

One example is simple. Let's say that a programmer wants to use your
software to know whenever the string "You Have Mail" appears anywhere
on the screen, knowing that there is a mail application that pops up a
window with this message. He specifies the font family, point size,
style, and background/foreground colors of the little window that
contains this message. To get this information, he spends 10 minutes
repeatedly sending mail messages to himself to force the window to
pop up, and when it does, he eyeballs the message to ascertain the
parameters. Finally he goes to your software and enters arguments for
these parameters. Then he tells your software to run, and specifies a
rate-of-grab of frame buffers so that the window, which pops up for
only three seconds, is not missed.

Compare that to not having to force anything to pop up or eyeball
anything, simply typing in "you have mail", checking a case-insensitive
box, and being done with it. No rate-of-grab would be necessary
because there would be no frame grabbing. The monitoring software
would simply "know" the state of the entire GUI system at any point in
time.

Certainly you will agree that, if this is what your software does, the
latter method has significant advantages?

-Le Chaud Lapin-
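
The rate-of-grab concern can be made concrete with a small Python sketch. It assumes grabs happen at fixed intervals starting at time zero and that a grab is instantaneous; both are simplifications, and the function names are hypothetical. Times are integer milliseconds to avoid floating-point rounding.

```python
def first_grab_at_or_after(t_ms: int, period_ms: int) -> int:
    """Time of the first periodic grab (0, period, 2*period, ...) at or after t_ms."""
    return -(-t_ms // period_ms) * period_ms  # ceiling division, then scale


def popup_is_captured(start_ms: int, visible_ms: int, period_ms: int) -> bool:
    """True if some grab falls inside the interval [start, start + visible)."""
    return first_grab_at_or_after(start_ms, period_ms) < start_ms + visible_ms


# The three-second popup from the post, appearing 10.5 s after monitoring starts.
missed_at_5s = popup_is_captured(10_500, 3_000, 5_000)   # one grab every 5 s
caught_at_2s = popup_is_captured(10_500, 3_000, 2_000)   # one grab every 2 s
caught_at_100fps = popup_is_captured(10_500, 3_000, 10)  # 100 grabs per second
```

Under these assumptions a popup is guaranteed to be seen whenever the grab period is no longer than the time it stays visible, so the three-second window only requires one grab every three seconds. Anything that flashes for less than one grab period can still be missed.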

And significant disadvantages, for example a false positive match. For something
as simple as that, my system might be able to process as many as 100 frames per
second. In fact, false positives may be the biggest problem with the approach you
are proposing compared with my method. Another problem is that there are times
when this "text" message is displayed using a bitmap, rather than text itself.

Roger Hamlett · Nov 1, 2006

Peter Olcott said:
My system is the only possible way that is inherently compatible with
every system, platform, and application. There are many cases where the
required information is unavailable from the system internals. My system
handles all of those cases. Now that we have dual core machines, it is
possible, using a DFA, to process many screens very quickly. I expect
that my system could even play and win fast-paced video games. <snipped>

While I don't actually 'like' the keyboard interface approach being asked
for, these devices are readily available, and so it should be easy to test
how it all behaves. Look at the Hagstrom KE72
http://www.hagstromelectronics.com/, or the PI Engineering X-Keys control
board (the USB versions may well be the better long-term solution)
http://www.piengineering.com/ . The Vetra Systems VIP module might also
be worth a look (allows direct RS232 input to the keyboard port)
http://www.vetra.com/Elimina2.htm.

Best Wishes

Le Chaud Lapin · Nov 1, 2006

Peter said:
And significant disadvantages, for example a false positive match. For something
as simple as that, my system might be able to process as many as 100 frames per
second. In fact, false positives may be the biggest problem with the approach you
are proposing compared with my method. Another problem is that there are times
when this "text" message is displayed using a bitmap, rather than text itself.

The method I was suggesting could not generate a false positive for
text that is not regarded as simply an image. The reason is that the
very objective of your software, to determine what text is being
rendered, is actually accomplished before the text even hits the screen.
If there is any program anywhere on the computer that tries to display
"MOSFET" using any DRAW-TEXT primitive in the system, my method would
catch it. So in fact, I would get a 100% hit rate on text that is
normally rendered by the system.

For text where the programmer first converted it to an image and told
the GUI subsystem to render it, my method would fail with OCR. But
then, the problem reverts to OCR anyway.

Now consider: we do not have an exhaustive list of fonts to be used, so
your method would have to have that to approach a hit rate of 90%
without help from the user. Of course, if the user tells you what the
font face is, etc, and all of these things, then yes, your software
would approach 100%.

However, as mentioned, my gut feeling is that
"in-line-interception-of-text" is superior to "snapshot-of-graphics".
One has to imagine the headache vs. % effectiveness of using
each model.

Which would you rather have? 100% hit rate on 95% (perhaps) of the
situations by simply declaring what text needs to be sought, or 98%+ hit
rate on 98% of the situations with painstaking determination of color,
font face, pitch, and foreground/background color each time, not to
mention the possibility that you will miss an "easy" true positive
because you're taking snapshots?

-Le Chaud Lapin-
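
For reference, the two alternatives in the closing question can be compared with simple arithmetic, sketched here in Python. Multiplying hit rate by the share of situations covered is only one possible reading of the figures, which the post itself offers as rough guesses, and `overall_coverage` is a hypothetical helper.

```python
def overall_coverage(hit_rate: float, share_of_situations: float) -> float:
    """Fraction of all occurrences found, assuming zero hits outside the covered share."""
    return hit_rate * share_of_situations


interception = overall_coverage(1.00, 0.95)  # 100% hit rate on 95% of situations
snapshot = overall_coverage(0.98, 0.98)      # 98% hit rate on 98% of situations

print(f"interception: {interception:.4f}")
print(f"snapshot:     {snapshot:.4f}")
```

On that reading the two figures come out close (0.95 against roughly 0.96), so the question turns mainly on the setup effort and missed-snapshot risk the post describes rather than on raw coverage.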

Peter Olcott · Nov 1, 2006

Le Chaud Lapin said:
The method I was suggesting could not generate a false positive for
text that is not regarded as simply an image. The reason is that the
very objective of your software, to determine what text is being
rendered, is actually accomplished before the text even hits the screen.
If there is any program anywhere on the computer that tries to display
"MOSFET" using any DRAW-TEXT primitive in the system, my method would
catch it. So in fact, I would get a 100% hit rate on text that is
normally rendered by the system.

Including all the cases where the string "MOSFET" was not placed in the right
place of the right window to be the correct trigger event for the required
action.

For text where the programmer first converted it to an image and told
the GUI subsystem to render it, my method would fail with OCR. But
then, the problem reverts to OCR anyway.

So my system can ALWAYS work, whereas yours only works some of the time.
SeeScreen is inherently compatible with every system, platform and application.

The next best alternative is a hodgepodge conglomeration of many different
complex technologies limited to simulating user actions on far fewer
applications and operating system platforms.

Now consider: we do not have an exhaustive list of fonts to be used, so
your method would have to have that to approach a hit rate of 90%
without help from the user. Of course, if the user tells you what the
font face is, etc, and all of these things, then yes, your software
would approach 100%.

I already posted one of several ways that my system can determine with certainty
the exact set of FontInstances.

Zak · Nov 2, 2006

Peter said:
I already have MS Windows completely handled, the answer there is mouse_event(),
keybd_event() and SendInput(). What I am looking for is a solution for Mac OS
and Linux. Oh yeah, I forgot, there are (or used to be) a few apps that took their
input directly from the hardware, so I might not have every single MS Windows
app. One MS DOS app that did this was PC Anywhere.
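
As an illustration of the Windows route mentioned above, here is a minimal Python sketch that builds a key press and release for `SendInput()` using `ctypes`. The structure layouts and constants follow the Win32 headers, but the helper names `key_events` and `send_key` are hypothetical, and fixed-width ctypes types are used in place of `DWORD`, `WORD`, and `LONG` so that the definitions also load on non-Windows systems. The post does not say how the author's own code calls these functions.

```python
import ctypes
import sys

# Win32 constants (winuser.h)
INPUT_KEYBOARD = 1
KEYEVENTF_KEYUP = 0x0002
VK_RETURN = 0x0D

ULONG_PTR = ctypes.c_size_t  # pointer-sized unsigned integer


class MOUSEINPUT(ctypes.Structure):
    # Included so the union has the same size as the real Win32 INPUT union.
    _fields_ = [("dx", ctypes.c_int32), ("dy", ctypes.c_int32),
                ("mouseData", ctypes.c_uint32), ("dwFlags", ctypes.c_uint32),
                ("time", ctypes.c_uint32), ("dwExtraInfo", ULONG_PTR)]


class KEYBDINPUT(ctypes.Structure):
    _fields_ = [("wVk", ctypes.c_uint16), ("wScan", ctypes.c_uint16),
                ("dwFlags", ctypes.c_uint32), ("time", ctypes.c_uint32),
                ("dwExtraInfo", ULONG_PTR)]


class _INPUTUNION(ctypes.Union):
    _fields_ = [("mi", MOUSEINPUT), ("ki", KEYBDINPUT)]


class INPUT(ctypes.Structure):
    _anonymous_ = ("u",)
    _fields_ = [("type", ctypes.c_uint32), ("u", _INPUTUNION)]


def key_events(vk: int):
    """Build a two-element INPUT array: key down, then key up."""
    down = INPUT(type=INPUT_KEYBOARD, ki=KEYBDINPUT(wVk=vk))
    up = INPUT(type=INPUT_KEYBOARD,
               ki=KEYBDINPUT(wVk=vk, dwFlags=KEYEVENTF_KEYUP))
    return (INPUT * 2)(down, up)


def send_key(vk: int) -> int:
    """Inject one key press into the Windows input stream (Windows only)."""
    if sys.platform != "win32":
        raise OSError("SendInput is only available on Windows")
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.SendInput.argtypes = (ctypes.c_uint, ctypes.POINTER(INPUT),
                                 ctypes.c_int)
    user32.SendInput.restype = ctypes.c_uint
    events = key_events(vk)
    sent = user32.SendInput(len(events), events, ctypes.sizeof(INPUT))
    if sent != len(events):
        raise ctypes.WinError(ctypes.get_last_error())
    return sent
```

On Windows, `send_key(VK_RETURN)` would inject an Enter key press into whatever window has focus. Applications that read the keyboard hardware directly, as the post notes, would not see events injected this way.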

Look at KVM switches that run over IP and allow a client on the PC to
connect.

Some servers have remote management cards that do something similar. You
connect with VNC to control mouse and keyboard.

Thomas
