Maker Pro

Keyboard / Mouse Input Device Design??


Le Chaud Lapin

Jan 1, 1970
0
Joel said:
In my mind this would still make a lot more sense to do in software. If his
fancy technology can recognize what pixel needs to be clicked on anyway, a
Windows driver can trivially move the mouse pointer to that spot and "click
it."

True. The Raw Input interface would not only eliminate the hardware, it
would execute in user-mode as a standard EXE, with network communication
so that all machines are controllable from a central machine.

http://windowssdk.msdn.microsoft.com/en-us/library/ms645536.aspx

-Le Chaud Lapin-
 

Peter Olcott

Jan 1, 1970
0
Jan Panteltje said:
I am saying that to capture the screen in Linux, you must have access to the
display buffer.
If it runs X Windows, a buffer is created in the application.

A sequence would be something like this:
Display *mydisplay;
mydisplay = XOpenDisplay(displayname);
XDrawPoint(mydisplay, topwindow, xptestgc, x, y);
etc... (omitted important stuff).

'mydisplay' will be internal to the application; it can even be a local
variable...
So if your application needs to read, say, graphic text or boxes from the
display, you have no access.
You cannot grab data from the graphics card either, as they are all different
(bypassing the driver).
There are many modes in X too, and there is the framebuffer that can be used.
And then the same again for applications that use vgalib or just the
basic text console.
I would not know how to do it in a way that worked for _all_ systems known
on Linux.

So you can't simply get the pixels comprising the display screen on Linux?
 

Peter Olcott

Jan 1, 1970
0
Joel Kolstad said:
Why does it have to be hardware? Since you already specified that you're
running on Intel platforms, writing drivers for Windows will probably cover
>99%...

I already have Windows 100% handled; I am looking into making this technology
universal.
 

Peter Olcott

Jan 1, 1970
0
Joel Kolstad said:
One thing the OP may not realize is that, on all modern Windows OSes, you can
generally plug in as many keyboards and mice as you like and Windows takes
care of "merging" their input streams.

So I have a suspicion the OP really has no need whatsoever to build a
"wedge"-type piece of hardware, but rather just a device that emulates a
keyboard or mouse and can be (in some unspecified programmatic manner!) told
what to "type" or where to "move and click".

In my mind this would still make a lot more sense to do in software. If his
fancy technology can recognize what pixel needs to be clicked on anyway, a
Windows driver can trivially move the mouse pointer to that spot and "click
it."

---Joel

I already have MS Windows completely handled; the answer there is mouse_event(),
keybd_event() and SendInput(). What I am looking for is a solution for Mac OS
and Linux. Oh yeah, I forgot: there are (or used to be) a few apps that took
their input directly from the hardware, so I might not have every single MS
Windows app covered. One MS DOS app that did this was PC Anywhere.
 

Peter Olcott

Jan 1, 1970
0
James Waldby said:
Jan said:
..."Peter Olcott"... wrote ...
"Jan Panteltje" ... wrote ...
Hi, I forgot to tell you in comp.os.linux.development.apps that the part
where you read the screen is also problematic, as X has its own graphics
drivers, etc.
Are you saying that doing a screen capture is hard, or recognizing the
graphical objects from the screen capture? (I got the last part solved).

I am saying that to capture the screen in Linux, you must have access to the
display buffer.
If it runs X Windows, a buffer is created in the application [snip re X and buffers]
I would not know how to do it in a way that worked for _all_ systems known
on Linux.

This is assuming your application reads what is on the screen, which is what
you told me.

On any Unix system where the ImageMagick package is installed, one can
use "import" to get an image from any part of the X display. See e.g.
http://linux.about.com/library/cmd/blcmdl1_import.htm or man import.
It is straightforward to exec import from a C program and then read
the image file, which can be in any of the one hundred graphics formats
that ImageMagick supports.
-jiw

Good, what about sending keystrokes and mouse actions to Linux?
 

Peter Olcott

Jan 1, 1970
0
qrk said:
Is this a homework question? I haven't seen these devices for years!
The old macro keyboards that inserted between keyboard and computer
were probably under $100. The modern solution is to use something like Macro
Express <http://www.macros.com/>, which is much more convenient to use.

One of the applications that I am creating with my www.SeeScreen.com patented
technology is a better macro recorder than could otherwise exist. Since SeeScreen
lets any application always be able to intelligently see what's on the screen,
now for the first time a macro recorder can reliably operate the mouse. Before
SeeScreen, macro recorders were blind and could not "see" where to click the
mouse.
 

Le Chaud Lapin

Jan 1, 1970
0
Peter said:
So you can't simply get the pixels comprising the display screen on Linux?

Not sure if you are asking this question rhetorically, but yes you can,
using XWD or, programmatically, using XGetImage on the root window.

The question is whether you intend to track all screen state, or take
snapshots. If you intend to take only snapshots, then on both Windows
and Unix, you're looking at 20 lines of code or less.

I noticed that your application allows the specification of font
family, font face, etc., when scanning for text. This has implications
on the scope of what you are doing. Naturally, you're not proposing to
be able to recognize any piece of "graphical text", are you?

-Le Chaud Lapin-
 

Peter Olcott

Jan 1, 1970
0
petrus bitbyter said:
Hard to say, because the prerequisite constraints are unknown to me. I'd say
80 to 100 hours. Half of that time would be used for the real work, i.e. making
a ready-to-use prototype (including enclosure, connectors, power supply and so
on). The other half for gathering information, like details of the protocols
and availability of components, for obtaining them, and for making decisions.
After all, when you want to go to mass production, component price and
availability are important issues.

For instance, when I need to be fast, I'd go for an ATmega162. It's like
killing a mosquito with a cannon, but it's the smallest one with two UARTs I
know of for the moment, and it's not cheap. And there you go: looking for a
cheaper solution requires time, so you're slowed down. I should not be
astonished if even this very posting starts a discussion about simpler,
cheaper solutions. Which is, after all, the main value of the group.

petrus bitbyter
Okay, great. So the project is feasible.
 

Le Chaud Lapin

Jan 1, 1970
0
Peter said:
One of the applications that I am creating with my www.SeeScreen.com patented
technology is a better macro recorder than could otherwise exist. Since SeeScreen
lets any application always be able to intelligently see what's on the screen,
now for the first time a macro recorder can reliably operate the mouse. Before
SeeScreen, macro recorders were blind and could not "see" where to click the
mouse.

I see. Let me clarify for the other readers: you are _not_ proposing
to recognize any arbitrary "text". It is possible to render text in
Windows using a font not known to the system, using bit-blt on an
internally rendered bitmap. Certainly you would not be able to
recognize a custom "dingbat font".

That said, I can vouch for the value of your application (if it
actually works. :])

Most macro recorders today are somewhat dysfunctional. (Macro
recorders monitor, in software, the input stream to Windows, and
attempt to replay the stream back after the computer has been rebooted,
to get the application into the state it assumed when a human entered
the input.)

The reason that they are broken has primarily to do with timing: when
it comes time to play back the keystrokes, they have no idea how
rapidly the application is responding to playback. So the macro
recorder might replay input too fast, playing keystrokes that were
meant for a window that has not come alive yet. The keystrokes are
then lost. The operator of the macro recorder will try to combat this
by guessing how long it takes a window to be "born" and "come to life",
waiting that amount of time before playing input to that window. But
this is error prone: raise or lower the CPU performance, and it breaks.

So Peter's application apparently takes a snapshot of the screen, finds
all the windows, finds the title bars in the windows, edit boxes, etc.,
and presents that data when it is requested by a function that wants
it. In this case, the operator of the macro recorder will no longer
have to guess how long it takes windows to pop up. He will simply say,
"Wait until there is a window with "Google Talk" in the title bar".

If this is what you are doing, and you are not doing generalized OCR,
which you said on your site you were not, then I am a bit puzzled, as
it would be possible to do the same by intervention into the GUI
subsystem of Windows. And unlike the frame grab method, where you
watch the pixels and therefore cannot "keep up" with state transitions
on the screen, you would know pretty much the "exact" state of the
screen at all times.

-Le Chaud Lapin-
 

Le Chaud Lapin

Jan 1, 1970
0
Peter said:
Okay, great. So the project is feasible.

Not only is it feasible, but the more I think about it, the more I
realize that you should do this in software. The reason is that if
you are already writing software that must exist on the host system,
then you already have control over what that system does.

There is no commercially available system where software cannot get
close to the hardware ports. For example, on Windows, the keyboard
hardware is controlled by a keyboard driver (one per keyboard). Before
a GUI window gets a pressed key, it must pass from this driver into
another device driver called WIN32K.SYS. WIN32K.SYS gets the keys one
by one in its raw input thread. So you have options. You can inject
input from user mode (e.g. with SendInput), write a device driver (not
trivial) that intercedes between KBDHID.SYS and WIN32K.SYS, or write a
driver that sits right up against the keyboard hardware and feeds
KBDHID.SYS.

On Linux and other Unices, writing a device driver borders on trivial
compared to Windows, so you could do the same thing there.

I would seriously reconsider doing this in hardware, since the amount
of effort to get 99% of your market is significantly less in software.

The ratio of material cost for the hardware method vs. the software method
is infinite.

-Le Chaud Lapin-
 

Peter Olcott

Jan 1, 1970
0
Le Chaud Lapin said:
Not sure if you are asking this question rhetorically, but yes you can,
using XWD or, programmatically, using XGetImage on the root window.

The question is whether you intend to track all screen state, or take
snapshots. If you intend to take only snapshots, then on both Windows
and Unix, you're looking at 20 lines of code or less.

I noticed that your application allows the specification of font
family, font face, etc., when scanning for text. This has implications
on the scope of what you are doing. Naturally, you're not proposing to
be able to recognize any piece of "graphical text", are you?

-Le Chaud Lapin-

I can't tell what you are asking. If you are asking whether my technology can
recognize text from screenshots, the answer is yes. Here are some more details:
http://www.seescreen.com/Unique.html
 

Le Chaud Lapin

Jan 1, 1970
0
I can't tell what you are asking. If you are asking whether my technology can
recognize text from screenshots, the answer is yes. Here are some more details:
http://www.seescreen.com/Unique.html

Ok, so even though you're using a DFA in your algorithm, the overall
model is still stochastic. I see in many places 100% recognition,
which, naturally, makes anyone skeptical. To get 100% recognition of
arbitrary text, you have to know a priori the Bezier sets of not only
all font families currently known, but those that have yet to be made,
which seems absurd.

I think you should be clearer about the effectiveness of your tools
and how they work. Instead of just saying "it recognizes," say a bit
more. Since you already have a patent, it does not hurt to be more
complete in your description.

-Le Chaud Lapin-
 

Peter Olcott

Jan 1, 1970
0
Le Chaud Lapin said:
Not sure if you are asking this question rhetorically, but yes you can,
using XWD or, programmatically, using XGetImage on the root window.

The question is whether you intend to track all screen state, or take
snapshots. If you intend to take only snapshots, then on both Windows
and Unix, you're looking at 20 lines of code or less.

I noticed that your application allows the specification of font
family, font face, etc., when scanning for text. This has implications
on the scope of what you are doing. Naturally, you're not proposing to
be able to recognize any piece of "graphical text", are you?

-Le Chaud Lapin-

Now that I have read your other posts I can more easily understand your
question. My system can recognize any machine-generated text. It must be
provided with exactly what to look for; this is typically done by specifying one
or more FontInstances:
(a) Font Typeface Name
(b) Point Size or PixelHeight
(c) Style including (Bold, Italic, Underline, and StrikeOut)
(d) Foreground Color and BackGround Color
 

James Waldby

Jan 1, 1970
0
Peter said:
"James Waldby" ... wrote ...

Good, what about sending keystrokes and mouse actions to Linux?

The C language Xlib "SendEvent" protocol request does that, on
systems where the X Window System is in use. For a Perl version see http://search.cpan.org/dist/X11-SendEvent/SendEvent.pm

Jan is correct that not all Linux systems and applications use
X. I suspect that more than 95% of end-user Unix and Linux
systems use X, but imagine that a large fraction of servers
do not. E.g., the tens of thousands of machines in Google's
server farms don't need displays. If you plan to address
machines such as those, you'd need another set of system calls.
-jiw
 

Peter Olcott

Jan 1, 1970
0
Le Chaud Lapin said:
Ok, so even though you're using a DFA in your algorithm, the overall
model is still stochastic.

No, it is not stochastic at all; the whole process is completely deterministic.
I see in many places 100% recognition,
which, naturally, makes anyone skeptical. To get 100% recognition of
arbitrary text, you have to know a priori the Bezier sets of not only
all font families currently known, but those that have yet to be made,
which seems absurd.

You must provide it with the means of knowing the precise pixel pattern of
every glyph that it must recognize; this is typically done by specifying a
FontInstance:
(a) Font Typeface Name
(b) Point Size or PixelHeight
(c) Style including (Bold, Italic, Underline, and StrikeOut)
(d) Foreground Color and BackGround Color

It can process many different FontInstances simultaneously. This part of the
system is operational and fully tested. It can provide 100% accuracy on any
FontInstance that is not inherently ambiguous. The default FontInstance for much
of MS Windows, Tahoma 8 point, is processed with 100% accuracy.
Simple heuristics can be applied to get very close to 100% accuracy on most
FontInstances.
 

Peter Olcott

Jan 1, 1970
0
Le Chaud Lapin said:
Ok, so even though you're using a DFA in your algorithm, the overall
model is still stochastic. I see in many places 100% recognition,
which, naturally, makes anyone skeptical. To get 100% recognition of
arbitrary text, you have to know a priori the Bezier sets of not only
all font families currently known, but those that have yet to be made,
which seems absurd.

Actually, it could be set up to process all font families currently known. The
simplest way to do this would be to build the DFA for the lower-case vowels of
every FontInstance in the colors of black on white. Then the text would be
required to be transformed to black on white. Now it could quickly determine the
correct FontInstance on its own, and then load up the appropriate full DFA(s).
This assumes machine-generated text that is not dithered or anti-aliased. With
dithering, the problem of transforming the text to black on white becomes more
complex, yet still feasible.
 

Le Chaud Lapin

Jan 1, 1970
0
Peter said:
Now that I have read your other posts I can more easily understand your
question. My system can recognize any machine-generated text. It must be
provided with exactly what to look for; this is typically done by specifying one
or more FontInstances:
(a) Font Typeface Name
(b) Point Size or PixelHeight
(c) Style including (Bold, Italic, Underline, and StrikeOut)
(d) Foreground Color and BackGround Color

With those parameters, it is indeed possible to find matches. How
could you not? If your software runs on the same computer as the
windows that it is monitoring, then certainly if you render a piece of
text using the parameters that match what is displayed, you will have
an exact match, even with effects of anti-aliasing, transformations,
etc.

However, I should point out again: given that the user of your
software has to specify these parameters anyway, and given that text
that was not generated by the underlying font system will not, in
general, be recognized by your software, it remains that the most
important targets of recognition are pieces of text that are generated
by the GUI system.

But it is possible to intercept _all_ rendering of such text through
well-defined APIs. In other words, if I were interested in knowing
whether there were a window with the word "JFET" in it, I would have two
options.

1. Use your system and enter the above information.
2. Use my hypothetical system, and just specify "JFET".

Do you see? By interposition into the GUI subsystem, it becomes far
easier to describe what you are looking for. Font face, point size,
styling, and color become irrelevant when they don't matter.

There is something else that is important. With your system, it seems
that you are taking snapshots. The problem with snapshots is that
there is a chance you will miss something, unless you are planning to
bump up the rate of frame-grabbing so fast that you miss nothing. With
my hypothetical system, there would never be a need to take a
snapshot. You'd always know the state of the system.

-Le Chaud Lapin-
 

Le Chaud Lapin

Jan 1, 1970
0
Peter said:
Actually, it could be set up to process all font families currently known. The
simplest way to do this would be to build the DFA for the lower-case vowels of
every FontInstance in the colors of black on white. Then the text would be
required to be transformed to black on white. Now it could quickly determine the
correct FontInstance on its own, and then load up the appropriate full DFA(s).
This assumes machine-generated text that is not dithered or anti-aliased. With
dithering, the problem of transforming the text to black on white becomes more
complex, yet still feasible.

Ok, I see what you are doing now. I hate to rain on anyone's parade,
especially one where the objective is ambitious, but you should know that
what you are doing, the ultimate result, could be done in a way that is
probably superior in many respects to the image-based method.

One example is simple. Let's say that a programmer wants to use your
software to know whenever the string "You Have Mail" appears anywhere
on the screen, knowing that there is a mail application that pops up a
window with this message. He specifies the font family, point size,
style, and background/foreground colors of the little window that
contains this message. To get this information, he spends 10 minutes
repeatedly sending mail messages to himself to force the window to
popup, and when it does, he eyeballs the message to ascertain the
parameters. Finally he goes to your software and enters arguments for
these parameters. Then he tells your software to run, and specifies a
rate-of-grab of frame buffers so that the window, which pops up for
only three seconds, is not missed.

Compare that to not having to force anything to pop up or eyeball
anything: simply typing in "you have mail", checking the case-insensitive
box, and being done with it. No rate-of-grab would be necessary
because there would be no frame grabbing. The monitoring software
would simply "know" the state of the entire GUI system at any point in
time.

Certainly you will agree that, if this is what your software does, the
latter method has significant advantages?

-Le Chaud Lapin-
 

Jan Panteltje

Jan 1, 1970
0
So you can't simply get the pixels comprising the display screen on Linux?

I am sure you can, but because of the large amount of stuff that
potentially _can_ run, X11 (own drivers), text console (own drivers),
vgalib, you'd first have to find out what is running and how, I think,
before you can access any display buffer.
There may be more than one graphics card too :)
 