Google Developers Day US - Theorizing from Data

Theorizing from Data: Avoiding the Capital Mistake
Peter Norvig
"It is a capital mistake to theorize before one has data." Sir Arthur Conan Doyle's words from 1891 remain true today. Researchers in computational linguistics and information retrieval now have a million times more data than was available 30 years ago. This talk explores what this data can do for problems in language understanding, translation, information extraction, and inference, and extrapolates to what more data may bring in the future.

Channel: News & Politics
Uploaded: June 5, 2007 at 3:32 pm
Author: GoogleDeveloperDay

Length: 00:52:51
Rating: 4.75
Views: 15062

Tags: GDD07 GDD07US Theorizing from Data

Video Comments:
54spiritedwill54 (May 21, 2008 at 11:38 pm)
Quite interesting...
xHardstyleAddictx (March 16, 2008 at 5:09 pm)
It still amazes me over and over again how smart some people can be. I'm getting my Professional Bachelor of Informatics in 2 months and I feel really dumb compared to these people. But then again, they have their years of experience and I only have my 3 years at college. I find this topic very interesting, though a little bit hard to understand at certain times.
LethalCoke (March 9, 2008 at 8:11 pm)
He mentioned a DVD that Google sold which had their collection of English words. Anyone know how to obtain it?

Any help will be VERY appreciated^^
pixiemotion (November 23, 2007 at 3:27 am)
Very interesting overview, but the question session at the end revealed a rather low competence among the audience, which is too bad -- there are some much more interesting theoretical questions to be asked. For one, this type of machine translation seems to be founded on having some sort of parallel aligned texts; this is relatively easy for German and English, as shown in the examples, since they're very similar languages both syntactically and lexically.
pixiemotion (November 23, 2007 at 3:28 am)
But what happens when you try aligning e.g. polysynthetic languages such as Greenlandic (where a single word may express what in English would be a ten letter sentence) and analytic languages such as Chinese (where the average word length is, what, 2.5 letters?). There are a lot of challenges to be met, and it'd be very interesting to see how Norvig and the Google MT team are dealing with them.
pixiemotion (November 23, 2007 at 3:30 am)
The basic method of combining a probabilistic translation model with a language model is relatively old news (Brown et al., 1990), and the same criticisms that applied 17 years ago have not been answered here: what do you do with language pairs that differ?

Now, if they manage to translate English-Klingon, that'd be impressive.
Erudecorp (December 22, 2007 at 6:41 pm)
You would put that into the search criteria, and have it search words within words (synthetic) or context (analytic). English is a mix of synthetic and analytic already, so you can see it already has those capabilities.
PGTaboada (July 6, 2007 at 5:11 pm)
IMHO one of the best sessions I have seen from GDD US.
loseryouser (June 25, 2007 at 7:50 pm)
Thanks Peter Norvig, your shirt gave me a seizure.. ;)