OT: card storage

D Yuniskis · May 17, 2010

Hi Joseph,

Not only has it been done, it is still available, mouse included.
But most such units today are just repurposed laptops, probably more
user hacks than commercial sales. It is a great excuse to buy a laptop
that is otherwise relatively underpowered.

This is something that *looks* like a laptop but, in
reality, is just the laptop's *screen* with a video
INput and keyboard with a PS/2 or USB OUTput?

Pointers, please?

Paul Keinanen · May 17, 2010

The most tedious are the print manuals that were never
available (publicly) in electronic form. In addition
to the actual *scanning*, there is a lot of work
getting the manuals "disassembled" to the point where
they *can* be scanned (e.g., ripping "perfect binding").

Have you tried photographing those pages using a digital camera?

This way, the book only needs to be opened to 90 degrees, compared to
180 degrees on an ordinary flat scanner.

A 12 Mpix photograph of an A4 page works out to about 14 pixels/mm.
Compare this to the older CRT computer monitors that had a 0.25 mm
pitch (4 pixels/mm) shadow mask, thus the picture can even be enlarged
on the screen, compared to the original page size.

Of course, the camera's Bayer filter will reduce the effective
resolution, as will the quality of the optics.
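
As a rough check of the figures above, here is a minimal sketch of the arithmetic. It assumes a 4:3 sensor of 4000 x 3000 pixels (one common 12 Mpix layout) and that the A4 page fills the frame as well as it can; the function name is made up for illustration.

```python
# A4 page dimensions in millimetres (ISO 216).
A4_MM = (210.0, 297.0)

def pixels_per_mm(sensor_px, page_mm=A4_MM):
    """Limiting pixel density when the page fills the frame as well as it can."""
    short_px, long_px = sorted(sensor_px)
    short_mm, long_mm = sorted(page_mm)
    # The tighter of the two axes limits the usable density.
    return min(short_px / short_mm, long_px / long_mm)

# Assumption: 4000 x 3000 pixel sensor (12 Mpix, 4:3 aspect ratio).
density = pixels_per_mm((4000, 3000))
crt_density = 1 / 0.25  # 0.25 mm shadow mask pitch -> 4 pixels/mm

print(f"{density:.1f} pixels/mm, roughly {density * 25.4:.0f} dpi")
print(f"possible enlargement on a 0.25 mm pitch CRT: {density / crt_density:.1f}x")
```

With these assumptions the density lands between 13 and 14 pixels/mm, in line with the estimate above, before any losses from the Bayer filter or the lens.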

JosephKK · May 17, 2010

Hi Joseph,

This is something that *looks* like a laptop but, in
reality, is just the laptop's *screen* with a video
INput and keyboard with a PS/2 or USB OUTput?

Pointers, please?

Not knowing your specific application, I suggest using this in your
favorite search engine:

X-terminal device

You could probably find brick-like devices that do what you want.

D Yuniskis · May 17, 2010

Hi Joseph,

Not knowing your specific application, I suggest using this in your
favorite search engine:

X-terminal device

You could probably find brick-like devices that do what you want.

No, X Terminals, thin clients, etc. are different beasts entirely.
I have several from NCD, Neoware, HP, etc. (I tend to like
using them because they have very little maintenance overhead)

I want a portable LCD monitor and a portable keyboard.
I want to be able to connect that monitor to any old PC
just like any *other* LCD monitor.
I want to be able to connect that keyboard to that same PC
just like any other keyboard.

Gee, a laptop has an LCD *panel* in it. And a set of keys.
Even laptops that are DOG SLOW have these things!

I.e., a laptop that one is likely to DISCARD (recycle) as being
too slow would make an ideal device to *gut* and convert into
a "portable LCD monitor with attached portable keyboard"
(i.e., no CPU inside).

*That* is what I am asking for.

I can do this with a (*fast*) laptop and a video digitizer
card and some software. But, if I could remove the processor
from the laptop along with 95% of its guts and leave *just*
the LCD panel and keyboard, I could get the same sort of
capabilities by adding enough guts to turn it into *just*
an "LCD display".

This should be possible (as a "hackable") with a small board
installed in place of the laptop's guts. I.e., take the guts
out of an LCD monitor and wire them to the connector for the
LCD *panel* inside the gutted laptop (the keyboard is a
simple thing to hack)

D Yuniskis · May 17, 2010

Hi Michael,

The problem is that LCD display isn't a monitor. It is mated to the
LCD video interface on the laptop's system board.

Correct. But, the LCD *panel* in the laptop is no different from
the LCD *panel* in an LCD monitor. I.e., there is nothing to
prevent it from being used as such.

The keyboard isn't encoded, it is just a set of raw, matrixed
switches.

Keyboard debounce and encode is a trivial task. :>
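
For anyone who has not done it before, a minimal sketch of the debounce-and-encode step might look like the following. The 2x3 key map, the three-scan stability rule, and the class name are all assumptions made for illustration; a real adapter would scan the laptop's actual matrix and translate the events into PS/2 or USB HID codes.

```python
# Hypothetical 2x3 key matrix; a real laptop matrix is larger and model-specific.
KEYMAP = [["A", "B", "C"],
          ["D", "E", "F"]]
STABLE_SCANS = 3  # assumption: a change must persist for 3 scans to be accepted

class Debouncer:
    def __init__(self, rows, cols):
        self.state = [[False] * cols for _ in range(rows)]  # debounced key state
        self.count = [[0] * cols for _ in range(rows)]      # scans disagreeing with state

    def update(self, raw):
        """Feed one raw matrix scan; return a list of (key, pressed) events."""
        events = []
        for r, row in enumerate(raw):
            for c, level in enumerate(row):
                if level == self.state[r][c]:
                    self.count[r][c] = 0  # bounce ended, restart the count
                    continue
                self.count[r][c] += 1
                if self.count[r][c] >= STABLE_SCANS:
                    self.state[r][c] = level
                    self.count[r][c] = 0
                    events.append((KEYMAP[r][c], level))
        return events

def scan_with_e(pressed):
    """Build a fake raw scan in which only the 'E' switch has the given level."""
    return [[False, False, False], [False, pressed, False]]

debouncer = Debouncer(2, 3)
# One bounce followed by a clean press: the event fires only on the last scan.
for level in [True, False, True, True, True]:
    print(debouncer.update(scan_with_e(level)))
```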

<http://www.earthlcd.com/> makes some small SVGA LCD monitors, starting
at 6.4 inches.

It hasn't come up lately, but for a while someone asked that question
about once a month. Then they would find that it cost more for the
custom electronics than to buy a new monitor.

Understood. But I'm not looking for an LCD monitor, per se.
What I want is the portability that the laptop offers for
screen *and* keyboard.

I.e., currently, I carry a 7" portable LCD monitor and a
(generic) PC keyboard to interact with headless machines.
I would much prefer the LCD and keyboard be in one neat little
case -- like a laptop -- so easier to carry and store.

Making a suitable interface (to convert a "laptop LCD + keypad"
into a "portable monitor + keyboard") isn't a real problem. The
bigger problem is finding a suitable laptop on which to do this.
(i.e., so that the design effort for the interface isn't "wasted"
on a "qty one" product). Everyone would have their own special
wants -- namely, that it work with *their* (scrap) laptop
(fitting *their* laptop's keypad and display connections,
fitting *their* laptop's mechanical constraints, etc.)

It would only make sense to do if you had a known supply of
a particular laptop (or netbook, etc.) to use.

D Yuniskis · May 17, 2010

Hi Paul,

Paul said:
Have you tried photographing those pages using a digital camera?

It's too labor intensive. You have to arrange for the book
to be held open "enough", even lighting, transfer photos
to PC, convert to TIFF, trim them, import them (in the correct
order) to a PDF, etc.

A lot of the time, you end up with crappy image quality "in the
binding edge" as the paper curls and you can't get a clear
view of stuff at that edge, etc.

Instead, I bring the manuals to a print shop and have them
*cut* the binding edge off of the pages. They have large,
electric stack paper cutters (do ~1000 pages at a time
*without* the inevitable skew that a manual/guillotine paper
cutter imparts to the cut!). Then, I can just feed the
"individual pages" through the document feeder (instead of
having to manually flip pages, etc.).

It ends up destroying the original *bound* document (<shrug>)
but most of the folks who look for this sort of information
would gladly see a paper document "sacrificed" if it makes
that document more readily available (in electronic form).

It's just a hugely BORING activity (tedious?) and demands
large blocks of time to get anything done. So, I don't
rush to "do more of it"! :>

This works for most "A" size manuals. Things with fold-out
pages (e.g., a B size fold-out in a "regular" manual)
have to be processed differently. I have a 12x17 scanner
for larger documents. Anything bigger than that I have to
piece together from partial scans (which gets *really*
time consuming!)

This way, the book only needs to be opened to 90 degrees, compared to
180 degrees on an ordinary flat scanner.

Even that approach won't work for a lot of materials.
E.g., the scanner in the original KRM was deliberately
designed (for obvious reasons!) to be able to scan
a book in this way. With particular care to being
able to bring the camera *deep* into the binding edge.
I.e., the scanner glass came right to the edge of the
case over which the book's binding would sit (the 3rd
photo in http://www.kurzweiltech.com/raybio.html is
*just* the scanner)

Yet, it still had problems with some printed materials.
"Perfect" binding sucks :>

JosephKK · May 17, 2010

Hi Joseph,

This is something that *looks* like a laptop but, in
reality, is just the laptop's *screen* with a video
INput and keyboard with a PS/2 or USB OUTput?

Pointers, please?

Sounds a lot like an X-terminal. Due to volume issues the laptop may yet
be lower cost.

D Yuniskis · May 17, 2010

Hi Joseph,

Sounds a lot like an X-terminal. Due to volume issues the laptop may yet
be lower cost.

No. An X terminal has a processor in it, understands the
X protocol, has a network interface, etc.

I.e., if I gave you an LCD monitor and a keyboard, you could
never run xdm -- unless you added a processor and a NIC.
The device I am describing could be used "as a TV" (with
an NTSC-VGA adapter) -- something you aren't going to do
with an X Terminal.

Paul Keinanen · May 17, 2010

Hi Paul,

It's too labor intensive. You have to arrange for the book
to be held open "enough", even lighting, transfer photos
to PC, convert to TIFF, trim them, import them (in the correct
order) to a PDF, etc.

I have seen video footage of a machine used by a library (sorry, I
did not record the details), which opened the book about 120 degrees.
One arm took the next page down to horizontal level, a horizontal
glass sheet was put on the page to make sure the page was truly
horizontal, the flash was fired, the glass sheet was
removed, and the sequence restarted.

The sequence took about 2-3 seconds. Apparently some auto-focus was
used, since the distance between the lens and the paper changed each
time a new page was added.

The odd pages could be processed in one run and a separate run would
be required to process the even pages (including flipping the page and
inverting the picture order).
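
Merging the two runs afterwards is simple bookkeeping. Here is a minimal sketch, assuming the second run delivers the even pages in reverse order (last page first); the function name and the page labels are made up for illustration.

```python
def interleave(odd_run, even_run_reversed):
    """Merge a front-side run with a back-side run that was captured last page first."""
    even_run = list(reversed(even_run_reversed))
    # A book with an odd page count has one more front side than back sides.
    if len(odd_run) - len(even_run) not in (0, 1):
        raise ValueError("odd and even runs do not belong together")
    pages = []
    for i, front in enumerate(odd_run):
        pages.append(front)
        if i < len(even_run):
            pages.append(even_run[i])
    return pages

print(interleave(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
```

The length check catches the most common accident, a page skipped or double-fed in one of the two runs.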

A lot of the time, you end up with crappy image quality "in the
binding edge" as the paper curls and you can't get a clear
view of stuff at that edge, etc.

Put a heavy glass sheet on the page you are photographing, this will
flatten out the page and you will get equal focus across the page.

Instead, I bring the manuals to a print shop and have them
*cut* the binding edge off of the pages. They have large,
electric stack paper cutters (do ~1000 pages at a time
*without* the inevitable skew that a manual/guillotine paper
cutter imparts to the cut!). Then, I can just feed the
"individual pages" through the document feeder (instead of
having to manually flip pages, etc.).

If you go to the trouble of carrying the manuals to the print shop, why
not let them scan the pages?

In a law-abiding print shop you may have to prove that you have the
copyright to make your own copies.

----------------

Then there is the question how to store the scanned pages and also how
to distribute the (web) pages in a bandwidth efficient way.

I previously thought that storing and displaying the scanned pages as
simply bilevel (1 bit/pixel) bitmaps (typically run length encoded as
in faxes) would be sufficient, however, such page pictures look
horrible and the OCR software does not reliably make sense of the
text.
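
For reference, the run-length step behind fax-style bilevel coding can be sketched as below. This shows only the conversion of a scan line into alternating white/black run lengths; the variable-length code tables that real fax coding applies to those run lengths are omitted, and the function names are made up for illustration.

```python
def run_lengths(row):
    """Encode one scan line of 0 (white) / 1 (black) pixels as alternating run lengths.

    By convention the first run is white, so a line that starts with black
    begins with a white run of length 0.
    """
    runs = []
    current, length = 0, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs

def expand(runs):
    """Rebuild the scan line from its alternating run lengths."""
    row, colour = [], 0
    for length in runs:
        row.extend([colour] * length)
        colour ^= 1
    return row

line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
print(run_lengths(line))
```

Long white margins collapse into a single number, which is why this works so well on clean text and so poorly on grainy scans where every speck starts a new run.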

1 bit/pixel is really too little and 8 bits/pixel would be excessive.
How many bits/pixel would be sufficient for pleasant visual rendering
or required by OCR software?

D Yuniskis · May 17, 2010

Hi Paul,

Paul said:
I have seen video footage of a machine used by a library (sorry, I
did not record the details), which opened the book about 120 degrees.
One arm took the next page down to horizontal level, a horizontal
glass sheet was put on the page to make sure the page was truly
horizontal, the flash was fired, the glass sheet was
removed, and the sequence restarted.

I suspect such a device is considerably beyond my *practical*
budget! :>

The sequence took about 2-3 seconds. Apparently some auto-focus was
used, since the distance between the lens and the paper changed each
time a new page was added.

The odd pages could be processed in one run and a separate run would
be required to process the even pages (including flipping the page and
inverting the picture order).

Yes. I do similarly when running the sheets through the
document feeder. Once "prepared", I can do 5 or 6 pages
a minute -- not too bad but, when you have tens of thousands
of pages... :<

If you go to the trouble of carrying the manuals to the print shop, why
not let them scan the pages?

1) I didn't realize they could do this
2) it's probably not inexpensive
3) copyright issues

In a law-abiding print shop you may have to prove that you have the
copyright to make your own copies.

Exactly. It seems like the attitude towards this waxes and wanes.
And, no doubt, varies based on who's working on that day, etc.

Then there is the question how to store the scanned pages and also how
to distribute the (web) pages in a bandwidth efficient way.

I previously thought that storing and displaying the scanned pages as
simply bilevel (1 bit/pixel) bitmaps (typically run length encoded as
in faxes) would be sufficient, however, such page pictures look
horrible and the OCR software does not reliably make sense of the
text.

That is where the manual aspects come into play. You need to review
the results of the scan to decide how best to proceed. I've not found
any "magic bullet" -- unless you don't care about size (or quality).

1 bit/pixel is really too little and 8 bits/pixel would be excessive.
How many bits/pixel would be sufficient for pleasant visual rendering
or required by OCR software?

It depends on the sizes of the typefaces used. Note that this
can vary within a document.

And, whether there are illustrations, etc.

Sometimes, you get really grainy images -- as if there was
dust on the scanner (though it is NOT the scanner that is
the source of the problem).

For decent typeface sizes, I will use 1bpp at 400-600dpi.
This is readable *and* OCR-able (not to be confused with
ocre-able -- which is the ability to turn something into ocre!)
Other times, I will use 8bpp and drop down to 300dpi
(trying to balance the added image depth against the
decreased resolution).
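
To put numbers on that trade-off, here is a minimal sketch of the uncompressed size per page for each mode. It assumes a US letter ("A" size) page of 8.5 x 11 inches; the function name is made up for illustration, and real files will be smaller after compression.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumption: US letter page, 8.5 x 11 inches.
modes = [("1bpp @ 400dpi", 400, 1),
         ("1bpp @ 600dpi", 600, 1),
         ("8bpp @ 300dpi", 300, 8)]
for label, dpi, bpp in modes:
    size = raw_page_bytes(8.5, 11, dpi, bpp)
    print(f"{label}: {size / 1e6:.1f} MB uncompressed")
```

Before compression, 8bpp at 300dpi takes twice the space of 1bpp at 600dpi, since halving the resolution saves a factor of four while the added depth costs a factor of eight.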

I wrote some utilities to create *4* bit TIFFs but very few
programs will recognize this encoding (despite adhering to the
letter of the spec).
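
The packing itself is not complicated. A minimal sketch follows, assuming two pixels per byte with the first pixel in the high nibble (which, as far as I know, matches TIFF's default fill order) and rows padded to a whole byte. The function names are made up for illustration; this is not the code from the utilities mentioned above.

```python
def pack_nibbles(pixels):
    """Pack 4-bit values (0..15) two per byte, first pixel in the high nibble."""
    if any(not 0 <= p <= 15 for p in pixels):
        raise ValueError("pixel values must fit in 4 bits")
    padded = list(pixels) + [0] * (len(pixels) % 2)  # pad odd-length rows
    return bytes((padded[i] << 4) | padded[i + 1]
                 for i in range(0, len(padded), 2))

def unpack_nibbles(data, count):
    """Recover the first `count` 4-bit values from packed bytes."""
    out = []
    for byte in data:
        out.extend((byte >> 4, byte & 0x0F))
    return out[:count]

row = [1, 2, 3]
packed = pack_nibbles(row)
print(packed.hex(), unpack_nibbles(packed, len(row)))
```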

I generally avoid the OCR stage as it requires *lots* of
proofreading. Images often get mishandled. Text often
gets misrecognized (remember, these are "computer manuals"
so "pigx" and "pigy" might be real "words" despite the OCR
package's attempts to "fix" them into "pigs" and "piggy", etc.).
I figure just creating the (electronic) documents is
enough of a "donation" so if folks want to grumble, they
can go find better versions (hint: most of this stuff is
simply NOT AVAILABLE). :>

"If you don't like what I'm serving for dinner, you're welcome
to eat elsewhere..."

Jeffrey D Angus · May 17, 2010

D said:
Exactly. It seems like the attitude towards this waxes
and wanes. And, no doubt, varies based on who's working
on that day, etc.

My standard comment to those that "make the decision" at
places like Kinko's is: "These copies are being made because
the technicians at my shop are complete assh**es and destroy
original manuals every time they pick one up. Now if I can
keep them from writing on the monitor screens with a Sharpie
I'd be a happy camper."

Usually I get the eye roll, but after they stop laughing,
they authorize the full copying and/or scanning of the
documents.

Jeff

D Yuniskis · May 18, 2010

Hi Jeff,

My standard comment to those that "make the decision" at
places like Kinko's is: "These copies are being made because
the technicians at my shop are complete assh**es and destroy
original manuals every time they pick one up. Now if I can
keep them from writing on the monitor screens with a Sharpie
I'd be a happy camper."

Usually I get the eye roll, but after they stop laughing,
they authorize the full copying and/or scanning of the
documents.

Ha! I'm not sure I want to rely on that sort of
response... :<

JosephKK · May 18, 2010

Hi Joseph,

No. An X terminal has a processor in it, understands the
X protocol, has a network interface, etc.

I.e., if I gave you an LCD monitor and a keyboard, you could
never run xdm -- unless you added a processor and a NIC.
The device I am describing could be used "as a TV" (with
an NTSC-VGA adapter) -- something you aren't going to do
with an X Terminal.

All read up. Now I see what you want. The best approach still looks
like a really serious hack of a laptop. The monitor part is going to be
really tough. Keyboard and one or more pointing devices should be pretty
easy.
You may have to hack the power brick, or the batteries, or both.
Once USB3 (5 Gb/s) becomes common you only need the one interface.

Paul Keinanen · May 18, 2010

Hi Paul,

That is where the manual aspects come into play. You need to review
the results of the scan to decide how best to proceed. I've not found
any "magic bullet" -- unless you don't care about size (or quality).

I think it is important to keep the scanning/storage format
distinct from the publishing format.

These days 1 TB of storage costs practically nothing (and another TB
for backup). IMHO the source should be scanned and stored with the
best available resolution and bit planes, possibly with some very mild
compression.

You can then make some 1 bit/pixel encoding with heavy compression
for publishing.

After a few years, when better software is available, you can reprocess
your digital source archives without rescanning the original documents,
in order to produce smaller or higher-quality publishing
formats.

It depends on the sizes of the typefaces used. Note that this
can vary within a document.

And, whether there are illustrations, etc.

Sometimes, you get really grainy images -- as if there was
dust on the scanner (though it is NOT the scanner that is
the source of the problem).

For decent typeface sizes, I will use 1bpp at 400-600dpi.
This is readable *and* OCR-able (not to be confused with
ocre-able -- which is the ability to turn something into ocre!)
Other times, I will use 8bpp and drop down to 300dpi
(trying to balance the added image depth against the
decreased resolution).

I wrote some utilities to create *4* bit TIFFs but very few
programs will recognize this encoding (despite adhering to the
letter of the spec).

4 bit/pixel might be a usable format for _storage_, since this can
register the varying illumination, the whiteness of the paper and how
black the ink is. This might be usable information when postprocessing
to 1 bit/pixel.
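
A minimal sketch of that postprocessing step, assuming 4-bit grey values where 0 is black and 15 is white: the threshold is placed halfway between the darkest ink and the brightest paper actually found on the page, rather than at a fixed level. The function name is made up for illustration, and a serious tool would probably use a local threshold to cope with uneven illumination.

```python
def to_bilevel(gray4):
    """Reduce rows of 4-bit grey values (0 = black, 15 = white) to 1 = ink, 0 = paper."""
    flat = [value for row in gray4 for value in row]
    ink, paper = min(flat), max(flat)
    # Midpoint between the measured ink and paper levels of this page.
    threshold = (ink + paper) / 2
    return [[1 if value < threshold else 0 for value in row] for row in gray4]

# Yellowed paper (12..13) with slightly faded ink (2..4).
page = [[13, 12, 3, 13],
        [12, 4, 2, 12]]
print(to_bilevel(page))
```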

I generally avoid the OCR stage as it requires *lots* of
proofreading. Images often get mishandled. Text often
gets misrecognized (remember, these are "computer manuals"
so "pigx" and "pigy" might be real "words" despite the OCR
package's attempts to "fix" them into "pigs" and "piggy", etc.).

As a compromise, you might publish the scans as bit maps, however, it
might be a good idea to run your original scans through some OCR
software and use the result to build an index. While a "pig" might be
a bit unexpected in a computer manual index, there is much less manual
proofreading.

IMHO the worst problem with scanned documents is that they do not
usually contain a searchable index, so including even a somewhat flaky
index would be a great service.
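
Building such an index from unproofread OCR text needs very little code. Here is a minimal sketch, assuming the OCR output is already available as one string per page; the function name, the word pattern, and the sample text are made up for illustration.

```python
import re
from collections import defaultdict

def build_index(pages, min_length=3):
    """Map each lower-cased word to the sorted page numbers where OCR found it."""
    index = defaultdict(set)
    for number, text in pages.items():
        # Keep identifiers such as "pigx" intact; no dictionary "correction".
        for word in re.findall(r"[A-Za-z][A-Za-z0-9_]*", text):
            if len(word) >= min_length:
                index[word.lower()].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}

ocr_pages = {1: "DROP removes the top item", 2: "Use drop after DUP"}
index = build_index(ocr_pages)
print(index["drop"], index["dup"])
```

A misread word only costs one wrong or missing index entry, while the page images themselves stay untouched.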

I figure just creating the (electronic) documents is
enough of a "donation" so if folks want to grumble, they
can go find better versions (hint: most of this stuff is
simply NOT AVAILABLE). :>

"If you don't like what I'm serving for dinner, you're welcome
to eat elsewhere..."

Scanning fragile (and often disintegrating) paper documents is a way
of preserving our cultural heritage.

Unfortunately, intellectual property laws (with protection terms lasting
decades after the IP holder's death) may in fact cause a loss of
human intellectual heritage.

Albert van der Horst · May 18, 2010

Hi Paul,

I suspect such a device is considerably beyond my *practical*
budget! :>

Yes. I do similarly when running the sheets through the
document feeder. Once "prepared", I can do 5 or 6 pages
a minute -- not too bad but, when you have tens of thousands
of pages... :<

At the Dutch tax office I fell in love with a Fujitsu scanner
capable of two-sided scanning.
(It is on sale, *refurbished*, second hand... Where else do you hear
of PC equipment that is worth refurbishing and then still fetches
hundreds of euros?)

There are machines like the Brother MFC-8860DN.
This scans sheets two-sided, dozens at a time, with the sheet
feeder. (It prints two-sided too. It copies. It faxes.)
It is not cheap, but seems like a good deal and is well supported
under Linux.

Groetjes Albert.

Jeffrey D Angus · May 18, 2010

Michael said:
That's just a sample of Jeff's 'VERY' warped sense of humor. You'll
get used to it. ;-)

Aye, but it works. That's the bottom line.

Jeff

D Yuniskis · May 18, 2010

Hi Paul,

Paul said:
I think it is important to keep the scanning/storage format
distinct from the publishing format.

In my case, they are one and the same. I'm not in this as a "business"
(I am uncompensated for the *many* hours it takes to convert the
documents)

These days 1 TB of storage costs practically nothing (and another TB
for backup). IMHO the source should be scanned and stored with the
best available resolution and bit planes, possibly with some very mild
compression.

You'd be amazed at how quickly that eats up disk space! I scanned
a disintegrating book on origami a few years ago seeking to
preserve color, etc. It was over 100MB compressed. You can't
store very many books if you preserve that much detail. :<

You can then make some 1 bit/pixel encoding with heavy compression
for publishing.

After a few years, when better software is available, you can reprocess
your digital source archives without rescanning the original documents,
in order to produce smaller or higher-quality publishing
formats.

4 bit/pixel might be a usable format for _storage_, since this can
register the varying illumination, the whiteness of the paper and how
black the ink is. This might be usable information when postprocessing
to 1 bit/pixel.

But, it's a "proprietary format", then. I used this on a manual
I produced and it was nothing but trouble since I had to explicitly
"unpack" each image before I could create the final artwork...
then, repack everything to conserve space on disk.

As a compromise, you might publish the scans as bit maps, however, it
might be a good idea to run your original scans through some OCR
software and use the result to build an index. While a "pig" might be
a bit unexpected in a computer manual index, there is much less manual
proofreading.

IMHO the worst problem with scanned documents is that they do not
usually contain a searchable index, so including even a somewhat flaky
index would be a great service.

I guess I look at it differently. The original PAPER document
didn't have a (electronic) searchable index and "somehow" seemed
to work. So, if the electronic document doesn't have that
searchable index, it's no *loss* (it's just not a *gain*!).

E.g., I have lots of novels that I would love to preserve
in this way. I don't care if they are available as text.
I just want to be able to re-read them after the paper
versions have disintegrated (paperbacks being notoriously
short-lived). So, an "image" of a page that my brain
can process -- even if it doesn't have enough fidelity for
an OCR package to handle -- is quite adequate.

Scanning fragile (and often disintegrating) paper documents is a way
of preserving our cultural heritage.

Unfortunately, intellectual property laws (with protection terms lasting
decades after the IP holder's death) may in fact cause a loss of
human intellectual heritage.

See AEK's work at bitsavers.org. Be prepared to be blown away!
(be friendly to the server as I think it's his personal expense)

D Yuniskis · May 18, 2010

Michael said:
Have you tried 'Paperport'? Its compression is impressive. Its .max
file format makes small files, and you can drag the individual pages
into chapters or whole documents. The basic version was shipped with a
lot of flatbed scanners a few years ago, and includes a stand alone
reader.

I have to use formats that are "open" and/or widely
accepted (which often ends up with them being "open").
I don't live in *just* the "Windows World"

Paul Keinanen · May 19, 2010

Hi Paul,

You'd be amazed at how quickly that eats up disk space! I scanned
a disintegrating book on origami a few years ago seeking to
preserve color, etc. It was over 100MB compressed. You can't
store very many books if you preserve that much detail. :<

The storage cost for those 100 MB would be about one cent with current
1 TB drives.
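
The arithmetic, as a minimal sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a drive price of 100 currency units, which is a placeholder rather than a quoted price. The function name is made up for illustration.

```python
def storage_cost(size_mb, drive_price, drive_tb=1.0):
    """Cost of storing size_mb megabytes on a drive, using 1 TB = 1,000,000 MB."""
    return drive_price * size_mb / (drive_tb * 1_000_000)

# Assumption: a 1 TB drive costs 100 units of your currency.
cost = storage_cost(100, drive_price=100.0)
books_per_drive = 1_000_000 // 100  # 100 MB books on a 1 TB drive

print(f"cost per 100 MB book: {cost:.2f}")
print(f"books per drive: {books_per_drive}")
```

Doubling the figure for a backup drive still leaves the cost per book at a couple of cents.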

Albert van der Horst · May 19, 2010

I think it is important to keep the scanning/storage format
distinct from the publishing format.

These days 1 TB of storage costs practically nothing (and another TB
for backup). IMHO the source should be scanned and stored with the
best available resolution and bit planes, possibly with some very mild
compression.

You can then make some 1 bit/pixel encoding with heavy compression
for publishing.

After a few years, when better software is available, you can reprocess
your digital source archives without rescanning the original documents,
in order to produce smaller or higher-quality publishing
formats.

Good advice.

4 bit/pixel might be a usable format for _storage_, since this can
register the varying illumination, the whiteness of the paper and how
black the ink is. This might be usable information when postprocessing
to 1 bit/pixel.

As a compromise, you might publish the scans as bit maps, however, it
might be a good idea to run your original scans through some OCR
software and use the result to build an index. While a "pig" might be
a bit unexpected in a computer manual index, there is much less manual
proofreading.

It seems that Adobe has software to add OCR to a bitmap document.
That means the text is searchable. For an example, see the old issues
of Forth Dimensions (http://www.forth.org) under the heading
Forth Online documentation. So although you're looking at
a scan you can search for e.g. DROP and get it right most of the
time.

(But I'm convinced that there will be a time when you OCR
a 19th-century book, and the result will be better than
the original.)

IMHO the worst problem with scanned documents is that they do not
usually contain a searchable index, so including even a somewhat flaky
index would be a great service.

See above.

Scanning fragile (and often disintegrating) paper documents is a way
of preserving our cultural heritage.

Unfortunately, intellectual property laws (with protection terms lasting
decades after the IP holder's death) may in fact cause a loss of
human intellectual heritage.

This is one of my grave concerns. The program SchoonSchip of
(Nobel prize winner) Veltman has a nice manual that is free.
The original manual (197x, mainly of historic interest) sits behind a
(ca.) 30 Euro fee. (I'm involved with this, trying to port SchoonSchip
from 68K assembler to Intel.)
It is not hard to imagine a hardcore Elsevier executive deciding to drop
all papers not downloaded for 5 years.
(This has been a seminal activity for the "standard model"
in physics, but what do they know ...)

Throughout history it has been a fight to keep libraries in shape.
We don't need another destructive force, besides wars and
ignorance.

Please note that IP laws give protection. We are under no obligation
to exercise these rights to the full. A movement that establishes
the habit of pushing all legacy documentation into the public domain
would get my backing.

Groetjes Albert
